WO2022178312A1 - Methods of stratifying and treating coronavirus infection - Google Patents

Methods of stratifying and treating coronavirus infection

Info

Publication number
WO2022178312A1
Authority
WO
WIPO (PCT)
Prior art keywords
cells
genes
cell
interferon
ciliated
Application number
PCT/US2022/017082
Other languages
French (fr)
Inventor
Alexander K. Shalek
Jose ORDOVAS-MONTANES
Carly ZIEGLER
Sarah GLOVER
Bruce Horwitz
Vincent MIAO
Anna OWINGS
Andrew NAVIA
Ying Tang
Joshua BROMLEY
Original Assignee
Massachusetts Institute Of Technology
The Children’s Medical Center Corporation
University Of Mississippi Medical Center
Application filed by Massachusetts Institute Of Technology, The Children’s Medical Center Corporation, University Of Mississippi Medical Center
Priority to US18/277,612 (published as US20240229166A9)
Priority to EP22757036.3A (published as EP4295151A4)
Publication of WO2022178312A1

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/5005 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving human or animal cells
    • G01N33/5091 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving human or animal cells for testing the pathological state of an organism
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61P SPECIFIC THERAPEUTIC ACTIVITY OF CHEMICAL COMPOUNDS OR MEDICINAL PREPARATIONS
    • A61P31/00 Antiinfectives, i.e. antibiotics, antiseptics, chemotherapeutics
    • A61P31/12 Antivirals
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869 Methods for sequencing
    • C12Q1/6874 Methods for sequencing involving nucleic acid arrays, e.g. sequencing by hybridisation
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/70 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving virus or bacteriophage
    • C12Q1/701 Specific hybridization probes
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/5005 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving human or animal cells
    • G01N33/5008 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving human or animal cells for testing or evaluating the effect of chemical or biological compounds, e.g. drugs, cosmetics
    • G01N33/5044 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving human or animal cells for testing or evaluating the effect of chemical or biological compounds, e.g. drugs, cosmetics involving specific cell types
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/5005 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving human or animal cells
    • G01N33/5008 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving human or animal cells for testing or evaluating the effect of chemical or biological compounds, e.g. drugs, cosmetics
    • G01N33/5082 Supracellular entities, e.g. tissue, organisms
    • G01N33/5088 Supracellular entities, e.g. tissue, organisms of vertebrates
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/53 Immunoassay; Biospecific binding assay; Materials therefor
    • G01N33/569 Immunoassay; Biospecific binding assay; Materials therefor for microorganisms, e.g. protozoa, bacteria, viruses
    • G01N33/56966 Animal cells
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/53 Immunoassay; Biospecific binding assay; Materials therefor
    • G01N33/569 Immunoassay; Biospecific binding assay; Materials therefor for microorganisms, e.g. protozoa, bacteria, viruses
    • G01N33/56983 Viruses
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/112 Disease subtyping, staging or classification
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/158 Expression markers
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N2333/00 Assays involving biological materials from specific organisms or of a specific nature
    • G01N2333/005 Assays involving biological materials from specific organisms or of a specific nature from viruses
    • G01N2333/08 RNA viruses
    • G01N2333/165 Coronaviridae, e.g. avian infectious bronchitis virus
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N2333/00 Assays involving biological materials from specific organisms or of a specific nature
    • G01N2333/435 Assays involving biological materials from specific organisms or of a specific nature from animals; from humans
    • G01N2333/52 Assays involving cytokines
    • G01N2333/555 Interferons [IFN]
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N2800/00 Detection or diagnosis of diseases
    • G01N2800/26 Infectious diseases, e.g. generalised sepsis
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N2800/00 Detection or diagnosis of diseases
    • G01N2800/52 Predicting or monitoring the response to treatment, e.g. for selection of therapy based on assay results in personalised medicine; Prognosis
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N2800/00 Detection or diagnosis of diseases
    • G01N2800/56 Staging of a disease; Further complications associated with the disease

Definitions

  • the subject matter disclosed herein is generally directed to determining whether a subject is at risk for severe respiratory disease from a coronavirus infection and treating the subject.
  • The novel coronavirus clade SARS-CoV-2 emerged in late 2019 and has quickly led to one of the most devastating global pandemics in modern history. SARS-CoV-2 infection can cause severe respiratory COVID-19. However, many individuals present with isolated upper respiratory symptoms, suggesting potential to constrain viral pathology to the nasopharynx. Which cells SARS-CoV-2 primarily targets and how infection influences the respiratory epithelium remain incompletely understood.
  • the present invention provides for a method of treating a barrier tissue infection in a subject in need thereof comprising: detecting one or more indicators of infection from a sample obtained from the subject, wherein the sample comprises one or more of epithelial, immune, stromal, and neuronal cells; comparing the indicators to control/healthy samples or disease reference values to determine whether the subject will progress to a risk group selected from: mild/moderate or severe; and administering one or more treatments if one or more indicators are present.
  • the barrier tissue infection is a respiratory barrier tissue infection.
  • mild subjects are asymptomatic or symptomatic and not hospitalized, wherein moderate subjects are hospitalized and do not require oxygen by non-invasive ventilation or high flow, and wherein severe subjects are hospitalized and require oxygen by non-invasive ventilation, high flow, or intubation and mechanical ventilation.
  • the infection is a viral infection.
  • the viral infection is a coronavirus.
  • the coronavirus is SARS-CoV-2 or a variant thereof.
  • mild/moderate subjects have a WHO score of 1-5 and severe subjects have a WHO score of 6-8.
  • one or more indicators of infection are selected from the group consisting of: decreased interferon-stimulated gene (ISG) induction; upregulation of one or more anti-viral factors or IFN-responsive genes; reduction of mature ciliated cell population or increased immature ciliated cell population; increased secretory cell population; increased deuterosomal cell population; increased ciliated cell population; increased goblet cell population; decreased expression of Type II interferon specific genes; increased expression of Type I interferon specific genes; increased MHC-I and MHC-II genes; increased developing ciliated cell populations; altered expression of one or more genes in a cell type selected from any of Tables 2-4; altered expression of one or more genes in a cell type selected from Table 5; increased expression of IFITM3 and IFI44L; increased expression of EIF2AK2; increased expression of TMPRSS4, TMPRSS2, CTSS, and CTSD; upregulation of cholesterol and lipid biosynthesis; and increased abundance of low-density lipoprotein receptors.
  • one or more interferon-stimulated genes are detected, wherein if the one or more interferon-stimulated genes are downregulated the subject is at risk for severe disease and if the one or more interferon-stimulated genes are upregulated the subject is not at risk for severe disease.
  • the one or more interferon-stimulated genes are selected from the group consisting of STAT1, STAT2, IRF1, and IRF9.
  • the one or more indicators of infection are detected in infected host cells and compared to reference values in infected host cells from a risk group.
  • one or more anti-viral factors or IFN-responsive genes are detected in virally-infected cells, wherein if the one or more anti-viral factors or IFN-responsive genes are downregulated or absent in virally-infected cells the subject is at risk for severe disease and if the one or more anti-viral factors or IFN-responsive genes are upregulated in virally-infected cells the subject is not at risk for severe disease.
  • the one or more anti-viral factors or IFN-responsive genes are selected from the group consisting of EIF2AK2, STAT1 and STAT2.
  • the secretory cells comprise one or both of: KRT13 KRT24 high Secretory Cells and Early Response Secretory Cells.
  • the secretory cells express CXCL8.
  • the goblet cells comprise one or both of: AZGP1 high Goblet Cells and SCGB1A1 high Goblet Cells.
  • the ciliated cells comprise one or more upregulated genes selected from the group consisting of IFI27, IFIT1, IFI6, IFITM3, and GBP3.
  • one or both of the ciliated cells and the goblet cells comprise increased gene expression of one or more IFN genes selected from any of Tables 2-4.
  • ACE2 expression is upregulated compared to other epithelial cells among one or more of secretory cells, goblet cells, ciliated cells, developing ciliated cells, and deuterosomal cells.
  • the mature ciliated cells are BEST4 high cilia high ciliated cells.
  • the MHC-I and MHC-II genes comprise at least one or more of: HLA-A, HLA-C, HLA-F, HLA-E, HLA-DRB1, and HLA-DRA.
  • the upregulated cholesterol and lipid biosynthesis genes comprise at least one or more of: FDFT1, MVK, FDPS, ACAT2, and HMGCS1.
  • detecting one or more indicators is performed by using Simpson’s index.
  • a subject is determined to belong to the severe risk group if one or more of the following is detected in the sample: proinflammatory cytokines comprising at least one or more of: IL1B, TNF, CXCL8, CCL2, CCL3, CXCL9, CXCL10, and CXCL11; upregulation of alarmins comprising one or both of: S100A8 and S100A9; 14% - 26% of all epithelial cells are secretory cells; elevated BPIFA1 high Secretory cells; elevated KRT13 KRT24 high secretory cells; macrophage population increase as compared to other immune cells; upregulated genes in ciliated cells comprising one or both of: IL5RA and NLRP1; no increase of at least one or more of: type I, type II, and type III interferon abundance; elevated stress response factors comprising at least one or more of: HSPA8, HSPA1A, and DUSP1; increased expression of one or more genes differentially expressed in COVID-19
  • a subject is determined to belong to the mild/moderate risk group if one or more of the following is detected in the sample: 4% - 12% of all epithelial cells are Secretory Cells; 10% - 20% of all epithelial cells comprise Interferon Responsive Ciliated Cells; upregulated ciliated cell genes comprising at least one or more of: IFI44L, STAT1, IFITM1, MX1, IFITM3, OAS1, OAS2, OAS3, STAT2, TAP1, HLA-C, ADAR, XAF1, IRF1, CTSS, and CTSB; increase in type I interferon abundance; high expression of interferon-responsive genes; decreased expression of one or more genes differentially expressed in COVID-19 WHO 6-8 according to Table 3 or Table 4; induction of type I interferon responses; and high abundance of IFI6 and IFI27.
  • the interferon-responsive genes comprise at least one or more of: STAT1, MX1, HLA-B, and HLA-C.
  • the interferon response occurs in at least one or more of: MUC5AC high Goblet Cells, SCGB1A1 high Goblet Cells, Early Response Secretory Cells, Deuterosomal Cells, Interferon Responsive Ciliated Cells, and BEST4 high Cilia high Ciliated Cells.
  • the treatment is administered according to the determined risk group. In certain example embodiments, the treatment involves administering a preventative or therapeutic intervention according to the determined risk group. In certain example embodiments, if the subject is determined to be at risk for progression to the severe risk group, the subject is administered a treatment comprising one or more treatments selected from the group consisting of: one or more antivirals; blood-derived immune-based therapy; one or more corticosteroids; one or more interferons; one or more interferon Type I agonists; one or more interleukin-1 inhibitors; one or more kinase inhibitors; one or more TLR agonists; a glucocorticoid; and an interleukin-6 inhibitor.
  • the subject is administered a treatment comprising one or more of: one or more antivirals; one or more antibiotics; and one or more cholesterol biosynthesis inhibitors.
  • the treatment comprises an antiviral.
  • the antiviral inhibits viral replication.
  • the antiviral is Paxlovid, molnupiravir, or remdesivir.
  • the treatment is an immune-based therapy.
  • the immune-based therapy is a blood-derived product comprising at least one or more of: a convalescent plasma and an immunoglobulin.
  • the immune-based therapy is an immunomodulator comprising at least one or more of: a corticosteroid, a glucocorticoid, an interferon, an interferon Type I agonist, an interleukin-1 inhibitor, an interleukin-6 inhibitor, a kinase inhibitor, and a TLR agonist.
  • the corticosteroid comprises at least one of: methylprednisolone, hydrocortisone, and dexamethasone.
  • the glucocorticoid comprises at least one of: cortisone, prednisone, prednisolone, methylprednisolone, dexamethasone, betamethasone, triamcinolone, fludrocortisone acetate, deoxycorticosterone acetate, and hydrocortisone.
  • the interferon comprises at least one or more of: interferon beta-1b and interferon alpha-2b.
  • the interleukin-1 inhibitor comprises anakinra.
  • the interleukin-6 inhibitor comprises at least one or more of: an anti-interleukin-6 receptor monoclonal antibody and an anti-interleukin-6 monoclonal antibody.
  • the anti-interleukin-6 receptor monoclonal antibody is tocilizumab.
  • the anti-interleukin-6 monoclonal antibody is siltuximab.
  • the kinase inhibitor comprises at least one or more of: a Bruton's tyrosine kinase inhibitor and a Janus kinase inhibitor.
  • the Bruton's tyrosine kinase inhibitor comprises at least one or more of: acalabrutinib, ibrutinib, and zanubrutinib.
  • the Janus kinase inhibitor comprises at least one or more of: baricitinib, ruxolitinib, and tofacitinib.
  • the TLR agonist comprises at least one or more of: imiquimod, BCG, and MPL.
  • the treatment comprises inhibiting cholesterol biosynthesis.
  • inhibiting cholesterol biosynthesis comprises administering HMG-CoA reductase inhibitors.
  • the HMG-CoA reductase inhibitor comprises at least one or more of: simvastatin, atorvastatin, lovastatin, pravastatin, fluvastatin, rosuvastatin, and pitavastatin.
  • the treatment comprises an antibiotic.
  • the treatment comprises one or more agents capable of shifting epithelial cells to express an antiviral signature. In certain example embodiments, the treatment comprises one or more agents capable of suppressing a myeloid inflammatory response. In certain example embodiments, the treatment comprises an RNA-guided nuclease system. In certain example embodiments, the RNA-guided nuclease system is a CRISPR system. In certain example embodiments, the CRISPR system comprises a CRISPR-Cas base editing system, a prime editor system, or a CAST system.
  • the treatment is administered before severe disease.
  • the one or more cell types are detected using one or more markers differentially expressed in the cell types.
  • the one or more cell types or one or more genes are detected by immunohistochemistry (IHC), fluorescence activated cell sorting (FACS), fluorescently bar-coded oligonucleotide probes, RNA FISH (fluorescent in situ hybridization), RNA-seq, or any combination thereof.
  • IHC: immunohistochemistry
  • FACS: fluorescence activated cell sorting
  • RNA FISH: RNA fluorescent in situ hybridization
  • single cell expression is inferred from bulk RNA-seq.
  • expression is determined by single cell RNA-seq.
  • the present invention provides for a method of screening for agents capable of shifting epithelial cells from a SARS-CoV-2 severe phenotype to a mild/moderate phenotype comprising: treating a sample comprising epithelial cells with a drug candidate; detecting modulation of any indicators of infection according to any of the preceding claims; and identifying the drug, wherein the one or more indicators shift towards a mild/moderate phenotype.
  • the sample comprises epithelial cells infected with SARS-CoV-2.
  • the sample comprises epithelial cells expressing one or more SARS-CoV-2 genes.
  • the sample is an organoid or tissue model.
  • the sample is an animal model.
  • cell types are detected using one or more markers selected from Table 1.
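The WHO-score grouping and the Simpson's index mentioned in the statements above can be illustrated with a minimal Python sketch. The function names are hypothetical. The mapping of WHO score 0 to "control" is taken from the color-bar legend in the figure descriptions, and the source does not say which variant of Simpson's index is meant, so the common form D = sum(p_i^2) over cell-type proportions is assumed here.

```python
from collections import Counter


def who_risk_group(who_score: int) -> str:
    """Map a WHO severity score to the risk groups named in the text.

    1-5 -> mild/moderate, 6-8 -> severe; 0 is treated as control.
    """
    if who_score == 0:
        return "control"
    if 1 <= who_score <= 5:
        return "mild/moderate"
    if 6 <= who_score <= 8:
        return "severe"
    raise ValueError(f"WHO score outside 0-8: {who_score}")


def simpson_index(cell_type_labels) -> float:
    """Simpson's index D = sum(p_i^2) over cell-type proportions p_i.

    Assumption: the text only names the index; this variant is the
    probability that two randomly drawn cells share a cell type.
    """
    counts = Counter(cell_type_labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no cells provided")
    return sum((n / total) ** 2 for n in counts.values())
```

A sample dominated by one cell type gives D close to 1, while an even mix of many cell types gives D close to 0.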
  • FIGS. 1A-1O Cellular composition of nasopharyngeal swabs.
  • FIG. 1A Schematic of method for viable cryopreservation of nasopharyngeal swabs, cellular isolation, and scRNA-seq using the Seq-Well S^3 platform (created with BioRender).
  • FIG. 1B UMAP of 32,588 single-cell transcriptomes from all participants, colored by cell type (following iterative Louvain clustering).
  • FIG. 1C UMAP as in B, colored by SARS-CoV-2 PCR status at time of swab.
  • FIG. 1D UMAP as in B, colored by peak level of respiratory support (WHO COVID-19 severity scale).
  • FIG. 1E UMAP as in B, colored by participant.
  • FIG. 1F Violin plots of cluster marker genes (FDR < 0.01) for coarse cell type annotations (as in B).
  • FIG. 1G Proportional abundance of coarse cell types by participant (ordered within each disease cohort by increasing Ciliated cell abundance).
  • FIG. 1H Proportional abundance of participants by coarse cell types. Shades of red: COVID-19. Shades of blue: Control.
  • FIG. 1I Expression of entry factors for SARS-CoV-2 and other common upper respiratory viruses.
  • FIG. 1J Proportion of Goblet Cells by sample. Statistical test above graph represents Kruskal-Wallis test results across all cohorts (following Bonferroni-correction). Statistical significance asterisks within box represent significant results from Dunn's post-hoc testing. * Bonferroni-corrected p-value < 0.05, ** q < 0.01, *** q < 0.001.
  • FIG. 1K Proportion of Secretory Cells by sample.
  • FIG. 1L Proportion of Deuterosomal Cells by sample.
  • FIG. 1M Proportion of Developing Ciliated Cells by sample.
  • FIG. 1N Proportion of Ciliated Cells by sample.
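The per-sample cell type proportions and the Bonferroni correction referred to in the captions above can be sketched as follows. This is only an illustration with hypothetical function names. It does not reproduce the Kruskal-Wallis or Dunn's tests themselves, and it assumes the standard Bonferroni adjustment (multiply each p-value by the number of tests, capped at 1).

```python
from collections import Counter


def cell_type_proportions(labels):
    """Fraction of cells of each type within one sample."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cell_type: n / total for cell_type, n in counts.items()}


def bonferroni(p_values):
    """Bonferroni-adjusted p-values: p * m, capped at 1.0."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Example: one swab with four annotated cells (made-up labels).
sample = ["Ciliated", "Ciliated", "Goblet", "Secretory"]
proportions = cell_type_proportions(sample)
```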
  • FIGS. 2A-2R Altered epithelial cell composition and recovery in the nasopharynx during COVID-19.
  • FIG. 2A UMAP of 28,948 epithelial cell types following re-clustering, colored by coarse cell types. Lines represent smoothed estimate of cellular differentiation trajectories (RNA velocity estimates via scVelo using intronic:exonic splice ratios).
  • FIG. 2B UMAP as in A, colored by SARS-CoV-2 PCR status at time of swab.
  • FIG. 2C UMAP as in A, colored by peak level of respiratory support (WHO illness severity scale).
  • FIG. 2D UMAP as in A, colored by detailed cell annotations.
  • FIG. 2E UMAP of 28,948 epithelial cell types following re-clustering, colored by coarse cell types. Lines represent smoothed estimate of cellular differentiation trajectories (RNA velocity estimates via scVelo using intronic:exonic splice ratios).
  • FIG. 2F UMAP of 9,209 Basal, Goblet, and Secretory Cells, following sub-clustering and resolution of detailed cell annotations.
  • FIG. 2G UMAP of only Basal, Goblet, and Secretory Cells as in F, colored by SARS-CoV-2 PCR status at time of swab.
  • FIG. 2H UMAP of only Basal, Goblet, and Secretory Cells as in F, colored by inferred velocity pseudotime (darker blue shades: precursor cells, lighter yellow shades: more terminally differentiated cell types).
  • FIG. 2I UMAP of only Basal, Goblet, and Secretory Cells as in F, colored by inferred velocity pseudotime (darker blue shades: precursor cells, lighter yellow shades: more terminally differentiated cell types).
  • FIG. 2J UMAP of 13,913 Ciliated Cells, following sub-clustering and resolution of detailed cell annotations.
  • FIG. 2K UMAP of Ciliated Cells as in J, colored by SARS-CoV-2 PCR status at time of swab.
  • FIG. 2L UMAP of Ciliated Cells as in J, colored by inferred velocity pseudotime (darker blue shades: precursor cells, lighter yellow shades: more terminally differentiated cell types).
  • FIG. 2M Plot of gene expression by Ciliated Cell velocity pseudotime for select genes (all significantly correlated with velocity expression). Points colored by detailed cell type annotations.
  • FIG. 2N Proportion of Secretory Cell subtypes (detailed annotation) by sample, normalized to all epithelial cells.
  • FIG. 2O Proportion of Ciliated Cell subtypes (detailed annotation) by sample, normalized to all epithelial cells.
  • FIG. 2P UMAP of 13,210 epithelial cells (using UMAP embedding from A) from SARS-CoV-2 PCR negative participants (Control). Lines represent smoothed estimate of cellular differentiation trajectories (via RNA velocity) calculated using only cells from Control participants.
  • FIG. 2Q UMAP of 15,738 epithelial cells (using UMAP embedding from A) from SARS-CoV-2 PCR positive participants (COVID-19).
  • FIG. 2R UMAP of 32,588 cells from all participants, shaded by detailed cell type. Arrows represent smoothed estimate of cellular differentiation trajectories inferred by RNA Velocity.
  • FIGS. 3A-3J Cell-type specific and shared transcriptional responses to SARS-CoV-2 infection.
  • FIG. 3B Top: Volcano plots of average log fold change vs.
  • FIG. 3C Heatmap of significantly DE genes between Interferon Responsive Ciliated Cells from different disease cohorts.
  • FIG. 3E Heatmap of significantly DE genes between MUC5AC high Goblet Cells from different disease cohorts.
  • FIG. 3G Top: Dot plot of IFNGR1/2 and IFNAR1/2 gene expression by selected cell types.
  • FIGS. 4A-4H Co-detection of human and SARS-CoV-2 RNA.
  • FIG. 4A Metatranscriptomic classification of all single-cell RNA-seq reads using Kraken2. Results shown from selected respiratory viruses. Only results with greater than 5 reads are shown.
  • FIG. 4B Normalized abundance of SARS-CoV-2 aligning UMI from all single-cell RNA-seq reads (including those derived from ambient/low-quality cell barcodes). P < 0.0001 by Kruskal-Wallis test. Pairwise comparisons using Dunn's post-hoc testing. ** p < 0.01, *** p < 0.001.
  • FIG. 4C Proportional abundance of Secretory cells (all) vs. total SARS-CoV-2 UMI (normalized to M total UMI).
  • FIG. 4D Proportional abundance of FOXJ1 high Ciliated cells vs. total SARS-CoV-2 UMI (normalized to M total UMI).
  • FIG. 4E SARS-CoV-2 UMI per high-quality cell barcode. Results following correction for ambient viral reads.
  • FIG. 4F Schematic for SARS-CoV-2 genome and subgenomic RNA species.
  • FIG. 4G Schematic for SARS-CoV-2 genomic features annotated in the custom reference gtf.
  • FIG. 4H Heatmap of SARS-CoV-2 gene expression among SARS-CoV-2 RNA+ single cells (following correction for ambient viral reads).
  • Top color bar indicates disease and severity cohort (red: COVID-19 WHO 1-5, pink: COVID-19 WHO 6-8, black: COVID-19 convalescent, blue: Control WHO 0).
  • Top heatmap SARS-CoV-2 genes and regions organized from 5’ to 3’.
  • Bottom heatmap alignment to 70-mer regions directly surrounding viral transcription regulatory sequence (TRS) sites, suggestive of spliced RNA species (joining of the leader to body regions) vs. unspliced RNA species (alignment across TRS).
  • TRS: viral transcription regulatory sequence
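Two quantities in the captions above lend themselves to a short illustration: the read-count cutoff used for the metatranscriptomic results ("greater than 5 reads") and the normalization of SARS-CoV-2 UMI to total UMI. The sketch below uses hypothetical function names and assumes that "M total UMI" means per million total UMI; the source does not spell this out.

```python
def filter_min_reads(read_counts: dict, min_reads: int = 5) -> dict:
    """Keep only taxa with strictly more than `min_reads` reads."""
    return {taxon: n for taxon, n in read_counts.items() if n > min_reads}


def viral_umi_per_million(viral_umi: int, total_umi: int) -> float:
    """SARS-CoV-2 UMI per million total UMI (assumed meaning of 'M')."""
    if total_umi <= 0:
        raise ValueError("total_umi must be positive")
    return viral_umi * 1_000_000 / total_umi


# Example with made-up counts: 50 viral UMI out of 2,000,000 total.
normalized = viral_umi_per_million(50, 2_000_000)
```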
  • FIGS. 5A-5E Cellular targets of SARS-CoV-2 in the nasopharynx.
  • FIG. 5A Summary schematic of top SARS-CoV-2 RNA+ cells (created with BioRender).
  • FIG. 5B SARS-CoV-2 RNA+ cell abundance (top) and percent (bottom) per participant. Results following correction for ambient viral reads.
  • FIG. 5C Abundance of SARS-CoV-2 RNA+ cells by detailed cell type, bars colored by participant. Results following correction for ambient viral reads.
  • FIG. 5D Dot plot of SARS-CoV-2 RNA presence by sample (columns) and detailed cell types (rows).
  • Dot size reflects fraction of a given participant and cell type containing SARS-CoV-2 RNA (following viral ambient correction). Dot color reflects fraction of aligned reads corresponding to the SARS-CoV-2 positive strand (yellow) vs. negative strand (black). Dot plot across columns: alignment of viral reads by participant, separated by RNA species type. Dot plot across rows: alignment of viral reads by detailed cell type, separated by RNA species type.
  • FIG. 5E Percent ACE2+ cells vs. percent SARS-CoV-2 RNA+ cells by coarse cell type (left) and detailed cell type (right).
  • FIGS. 6A-6F Intrinsic and bystander responses to SARS-CoV-2 infection.
  • FIG. 6A Violin plot of selected genes upregulated in SARS-CoV-2 RNA+ cells in at least 3 individual cell type comparisons. Dark red: SARS-CoV-2 RNA+ cells, red: bystander cells from COVID-19 participants, blue: cells from Control participants. From left to right the scale is log(1 + UMI per 10K)
  • FIG. 6B Enriched gene ontologies among genes consistently up- or down-regulated among SARS-CoV-2 RNA+ cells across cell types.
  • FIG. 6C Heatmap of genes consistently higher in SARS-CoV-2 RNA+ cells across multiple cell types.
  • Colors represent log fold changes between SARS-CoV-2 RNA+ cells and bystander cells (SARS-CoV-2 RNA- cells, from COVID-19 infected donors) by cell type. Restricted to cell types with at least 5 SARS-CoV-2 RNA+ cells. Yellow: upregulated among SARS-CoV-2 RNA+ cells, blue: upregulated among bystander cells.
  • FIG. 6D Heatmap of genes consistently higher in bystander cells across multiple cell types.
  • FIG. 6F Percent ACE2+ cells vs. percent SARS-CoV-2 RNA+ cells by detailed cell type. Left: cells from participants with mild/moderate COVID-19. Right: cells from participants with severe COVID-19. Point size reflects average type I interferon specific module score among SARS-CoV-2 RNA+ cells.
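The expression scale named in the FIG. 6A legend, log(1 + UMI per 10K), can be illustrated with a minimal sketch. The function name `log_normalize`, the use of the natural logarithm, and the example counts are assumptions made for illustration only; the source does not state the log base or the software used.

```python
import math


def log_normalize(umi_counts, scale=10_000):
    """Return log(1 + UMI per `scale`) for one cell.

    `umi_counts` maps gene symbol -> raw UMI count for a single cell.
    The natural logarithm is an assumption; the source only says "log".
    """
    total = sum(umi_counts.values())
    if total == 0:
        # A cell with no UMI has zero normalized expression for every gene.
        return {gene: 0.0 for gene in umi_counts}
    return {
        gene: math.log1p(count / total * scale)
        for gene, count in umi_counts.items()
    }


# Hypothetical counts for a single cell (not data from the source).
cell = {"MX1": 3, "IFITM3": 7, "ACE2": 0, "FOXJ1": 90}
normalized = log_normalize(cell)
```

Scaling each cell to a common total before taking the logarithm makes cells with different sequencing depth comparable on one axis.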
  • FIGS. 7A-7N Participant cohort and cellular composition of nasopharyngeal swabs.
  • FIG. 7A Cohort composition and participant demographics.
  • FIG. 7B IgM and IgG titers among Control WHO 0 and COVID-19 participants.
  • FIG. 7C Detailed schematic of sample preparation and cell processing from nasal swabs (created with BioRender).
  • FIG. 7D Single cell quality metrics by cohort (after filtering for low-quality cells).
  • FIG. 7E Single cell quality metrics by participant (after filtering for low quality cells).
  • FIG. 7F Quality metrics for matched fresh vs. frozen nasal swabs from two participants (P1 and P2).
  • FIG. 7G UMAP of cell types from P1.
  • FIG. 7H UMAP of cell types from P2.
  • FIG. 7I Percent composition of each cell type by fresh (grey circles) or frozen (black squares) processing.
  • FIG. 7J UMAP from P1 as in G, colored by fresh (grey) vs. frozen (black).
  • FIG. 7K UMAP from P2 as in H, colored by fresh (grey) vs. frozen (black).
  • FIG. 7L Comparison of WHO severity at swab and peak.
  • FIG. 7M Comparison of WHO severity at swab and peak.
  • FIG. 8A Proportional abundance of detailed epithelial cell types by participant.
  • FIG. 8B Expression of entry factors for SARS-CoV-2 and other common upper respiratory viruses among detailed epithelial cell types. Dot size represents fraction of cell type (rows) expressing a given gene (columns). Dot hue represents average expression.
  • FIG. 8C Plot of gene expression by epithelial cell velocity pseudotime. Select genes significantly associated with ciliated cell pseudotime. Points colored by coarse cell type annotations. Top: alignment to unspliced (intronic) regions. Bottom: alignment to spliced (exonic) regions.
  • FIG. 8D The first stage annotations.
  • FIG. 8E Flow cytometry and gating scheme of immune cells from a fresh nasopharyngeal (NP) swab. Representative healthy participant. Bottom right: quantification of cellular proportions.
  • FIG. 8F Flow cytometry and gating scheme of epithelial cells from an NP swab. Representative data from a participant with severe COVID-19.
  • FIG. 8G Secretory cell proportion of live, CD45- cells from NP swabs.
  • FIGS. 9A-9L COVID-19-induced changes to nasopharynx-resident immune cells.
  • FIG. 9A UMAP of 3,640 immune cells following re-clustering, colored by coarse cell types.
  • FIG. 9B UMAP as in A, colored by detailed cell annotations.
  • FIG. 9C UMAP as in A, colored by level of respiratory support (WHO illness severity scale).
  • FIG. 9D UMAP as in A, colored by SARS-CoV-2 PCR status at time of swab.
  • FIG. 9F Violin plots of cluster marker genes (FDR < 0.01) for detailed immune cell type annotations (as in B).
  • FIG. 9G Violin plots of cluster marker genes (FDR < 0.01) for detailed immune cell type annotations (as in B).
  • FIG. 9H Proportion of immune cell subtypes by sample and cohort, normalized to all immune cells. Statistical test above graph represents Kruskal-Wallis test results across all cohorts (following Bonferroni-correction).
  • FIG. 9F Heatmap of significantly DE genes between Macrophages (all, coarse annotation) from different disease cohorts.
  • FIG. 9J Heatmap of significantly DE genes between T Cells (all, coarse annotation) from different disease cohorts.
  • FIG. 9L Violin plots of gene module scores, split by Control WHO 0 (blue), COVID-19 WHO 1-5 (red), and COVID-19 WHO 6-8 (pink).
  • Gene modules represent transcriptional responses of human basal cells from the nasal epithelium following in vitro treatment with IFNA or IFNG. Significance by Wilcoxon signed-rank test. P-values following Bonferroni-correction: * p < 0.05, ** p < 0.01, *** p < 0.001.
  • FIG. 9L Proportion of interferon responsive macrophages vs. proportion of interferon responsive cytotoxic CD8 T cells per sample, normalized to total immune cells. Including all samples, Control and COVID-19 groups.
  • FIGS. 10A-10H Cell-type specific and shared transcriptional responses to SARS-CoV-2 infection.
  • FIG. 10A Abundance of significant differentially expressed genes by coarse cell type between Control WHO 0 and COVID-19 WHO 1-5 samples (left), Control WHO 0 and COVID-19 WHO 6-8 samples (middle) and COVID-19 WHO 1-5 vs. COVID-19 WHO 6-8 samples (right). FDR-corrected p < 0.001, log2 fold change > 0.25.
  • FIG. 10B Heatmap of significantly DE genes between Ciliated Cells (all, coarse annotation) from different disease cohorts.
  • FIG. 10C Heatmap of significantly DE genes between Ciliated Cells (all, coarse annotation) from different disease cohorts.
  • FIG. 10D Interferon gene module scores across all detailed epithelial cell types, split by Control WHO 0 (blue), COVID-19 WHO 1-5 (red), and COVID-19 WHO 6-8 (pink). Gene modules represent transcriptional responses of human basal cells from the nasal epithelium following in vitro treatment with IFNA or IFNG.
  • FIG. 10E Dot plot of ACE2 expression across select coarse and detailed epithelial cell types and subsets.
  • FIG. 10F Dot plot of ACE2 expression across select coarse and detailed epithelial cell types and subsets.
  • FIG. 10G Violin plots of select genes upregulated among ciliated cells in COVID-19 WHO 1-5 participants compared to Control WHO 0 (PARP14, ISG15) and in COVID-19 WHO 6-8 participants compared to Control WHO 0 (FKBP5). Cells separated by participant treatment with corticosteroids. *** FDR-corrected p < 0.001.
  • FIG. 10H Dot plot of type I and type III interferons among ciliated, goblet, and squamous cells. Left: healthy vs. influenza A/B virus infected participants from Cao et al., 2020. Right: Control WHO 0 vs. COVID-19 WHO 1-5, vs. COVID-19 WHO 6-8 participants. Datasets processed and scaled identically.
  • FIGS. 11A-11J Detection of SARS-CoV-2 RNA from single-cell RNA-seq data.
  • FIG. 11A Metatranscriptomic classification of all single-cell RNA-seq reads using Kraken2: reads per sample annotated as unclassified.
  • FIG. 11B Metatranscriptomic classification of all single-cell RNA-seq reads using Kraken2: reads per sample annotated as Homo sapiens.
  • FIG. 11C Metatranscriptomic classification of all single-cell RNA-seq reads using Kraken2: reads per sample annotated as SARS-related coronaviruses.
  • FIG. 11D Total recovered cells per sample vs. normalized abundance of SARS-CoV-2 aligning UMI from all single-cell RNA-seq reads (including those derived from ambient/low-quality cell barcodes).
  • FIG. 11E Normalized abundance of SARS-CoV-2 aligning UMI from all single-cell RNA-seq reads across all COVID-19 participants. Dashed line represents partition between “Viral High” vs “Viral Low” samples.
  • FIG. 11F Proportional abundance of selected cell types according to total SARS-CoV-2 abundance among COVID-19 samples. Statistical test above graph represents Kruskal-Wallis test statistic across all cohorts. Statistical significance asterisks within box represent significant results from Dunn's post-hoc testing. Bonferroni-corrected p-value: * p < 0.05, ** p < 0.01, *** p < 0.001.
  • FIG. 11G Bonferroni-corrected p-value: * p < 0.05, ** p < 0.01, *** p < 0.001.
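The partition of samples into “Viral High” and “Viral Low” described for FIG. 11E can be sketched as follows. This is a minimal illustration that assumes "normalized abundance" means SARS-CoV-2 aligning UMI per million total UMI. The threshold value, the sample names, and the function names are hypothetical, because the source does not state where the dashed line falls.

```python
def viral_umi_per_million(viral_umi, total_umi):
    """SARS-CoV-2 aligning UMI per million total UMI (assumed normalization)."""
    if total_umi <= 0:
        raise ValueError("total_umi must be positive")
    return viral_umi / total_umi * 1_000_000


def partition_samples(samples, threshold):
    """Split samples into "Viral High" / "Viral Low" at a caller-supplied threshold.

    `samples` maps sample name -> (viral_umi, total_umi).
    """
    groups = {"Viral High": [], "Viral Low": []}
    for name, (viral, total) in samples.items():
        abundance = viral_umi_per_million(viral, total)
        label = "Viral High" if abundance >= threshold else "Viral Low"
        groups[label].append(name)
    return groups


# Hypothetical samples and threshold, for illustration only.
samples = {
    "S1": (5_000, 2_000_000),
    "S2": (4, 1_000_000),
    "S3": (120, 3_000_000),
}
groups = partition_samples(samples, threshold=1_000)
```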
  • FIG. 11H Quality metrics among 415 SARS-CoV-2 RNA+ cells (associated with high-quality cell barcodes and following ambient viral RNA correction).
  • Left abundance of SARS-CoV-2 aligning UMI vs. percent of all aligned reads (per cell barcode) aligning to SARS-CoV-2.
  • Middle abundance of human (GRCh38)-aligning UMI vs. abundance of SARS-CoV-2 aligning UMI.
  • FIG. 11I Quality metrics among 415 SARS-CoV-2 RNA+ cells (associated with high-quality cell barcodes and following ambient viral RNA correction).
  • Left abundance of SARS-CoV-2 aligning UMI vs. percent of all aligned reads (per cell barcode) aligning to SARS-CoV-2.
  • Middle abundance of human (GRCh38)-aligning UMI vs
  • FIGS. 12A-12H SARS-CoV-2 RNA species and cell types containing viral reads.
  • FIG. 12A Schematic of method to distinguish unspliced from spliced SARS-CoV-2 RNA species by searching for reads which align across a spliced or genomic Transcription Regulatory Sequence (TRS, 6mer).
  • FIG. 12B Abundance of SARS-CoV-2 aligning UMI/Cell per detailed cell type (following ambient viral RNA correction), split by UMI aligning to the viral positive strand, negative strand, 70-mer region across an unspliced TRS, and 70-mer region across a spliced TRS.
  • FIG. 12C Schematic of method to distinguish unspliced from spliced SARS-CoV-2 RNA species by searching for reads which align across a spliced or genomic Transcription Regulatory Sequence (TRS, 6mer).
  • FIG. 12D Dot plot of SARS-CoV-2 unspliced TRS aligning UMI by participant (columns) and detailed cell type (rows).
  • FIG. 12E Dot plot of SARS-CoV-2 spliced TRS aligning UMI by participant (columns) and detailed cell type (rows).
  • FIG. 12F Percent ACE2+ cells vs.
  • FIG. 12G Abundance of SARS-CoV-2 negative strand aligning reads by coarse epithelial cell types.
  • FIG. 12H Abundance of SARS-CoV-2 negative strand aligning reads by detailed ciliated cell types.
  • FIGS. 13A-13C Intrinsic and bystander responses to SARS-CoV-2 infection.
  • FIG. 13A Violin plots of select genes upregulated in SARS-CoV-2 RNA+ Cells when compared to matched bystanders. Plotting only SARS-CoV-2 RNA+ Cells from COVID-19 WHO 1-5 participants (red) and COVID-19 WHO 6-8 participants (pink). Top row: SARS-CoV-2 RNA expression by alignment type.
  • FIG. 13B Heatmaps of log fold changes between SARS-CoV-2 RNA+ cells and bystander cells by cell types. Gene sets derived from four CRISPR screens for important host factors in the SARS-CoV-2 viral life cycle. Restricted to cell types with at least 5 SARS-CoV-2 RNA+ cells.
  • FIG. 13C Heatmap of Spearman's correlation between 73 clinical parameters, demographic data, or results from scRNA-seq. Includes individuals from healthy (Control WHO 0), COVID-19 mild/moderate (COVID-19 WHO 1-5) and COVID-19 severe (COVID-19 WHO 6-8) groups. Colored squares represent statistically significant associations by permutation test (p < 0.01; red: positive Spearman's rho; blue: negative Spearman's rho).
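The FIG. 13C legend refers to Spearman's correlation with significance assessed by permutation test. A minimal standard-library sketch of that combination is given below. The two-sided permutation scheme, the number of permutations, the fixed seed, and the example values are assumptions for illustration, not details taken from the source.

```python
import random


def rank(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for Spearman's rho (assumed scheme)."""
    rng = random.Random(seed)
    observed = abs(spearman_rho(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(spearman_rho(x, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical paired measurements, for illustration only.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 4, 5, 9, 11, 15, 20, 31]
rho = spearman_rho(x, y)
p_value = permutation_p_value(x, y)
```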
  • a “biological sample” may contain whole cells and/or live cells and/or cell debris.
  • the biological sample may contain (or be derived from) a “bodily fluid”.
  • a “bodily fluid” encompasses embodiments wherein the bodily fluid is selected from amniotic fluid, aqueous humour, vitreous humour, bile, blood serum, breast milk, cerebrospinal fluid, cerumen (earwax), chyle, chyme, endolymph, perilymph, exudates, feces, female ejaculate, gastric acid, gastric juice, lymph, mucus (including nasal drainage and phlegm), pericardial fluid, peritoneal fluid, pleural fluid, pus, rheum, saliva, sebum (skin oil), semen, sputum, synovial fluid, sweat, tears, urine, vaginal secretion, vomit and mixtures of one or more thereof.
  • Biological samples include cell cultures, bodily fluids, cell cultures from bodily fluids. Bodily fluids may be obtained from a mammal organism, for example by puncture, or other collecting or sampling procedures.
  • the terms “subject,” “individual,” and “patient” are used interchangeably herein to refer to a vertebrate, preferably a mammal, more preferably a human. Mammals include, but are not limited to, murines, simians, humans, farm animals, sport animals, and pets. Tissues, cells and their progeny of a biological entity obtained in vivo or cultured in vitro are also encompassed. [0049] Various embodiments are described hereinafter.
  • Embodiments disclosed herein provide methods of determining whether a subject is at risk for severe respiratory disease from a coronavirus infection and treating subjects at risk prophylactically or subjects suffering from severe respiratory disease.
  • SARS-CoV-2 the virus that causes COVID-19, relies on efficient replication within cells of the human upper airways for infection and transmission. In some individuals, the virus accesses lower respiratory tissues, causing pneumonia, acute respiratory distress syndrome, and systemic effects which lead to profound morbidity and mortality.
  • peripheral correlates of immunity during COVID-19 how SARS-CoV-2 impacts its primary target tissue, the human nasopharynx, remains unclear.
  • Applicants present a cohort of over 60 samples from healthy individuals and participants with COVID-19, representing a wide spectrum of disease states from ambulatory to critically ill.
  • Applicants collected viable cells and performed single-cell RNA-seq, simultaneously profiling both host and viral RNA.
  • Applicants performed scRNA-seq on nasopharyngeal swabs from 58 healthy and COVID-19 participants.
  • Applicants find that following infection with SARS-CoV-2 the upper respiratory epithelium undergoes massive expansion and diversification of secretory cells and preferential loss of mature ciliated cells.
  • epithelial cells express anti-viral/interferon-responsive genes, while cells in severe COVID-19 have muted anti-viral responses despite equivalent viral loads.
  • Applicants characterized cell-associated SARS-CoV-2 RNA and identified rare cells with RNA intermediates strongly suggestive of active replication.
  • SARS-CoV-2 RNA+ host cells Applicants found remarkable diversity and heterogeneity both within and across individuals, including developing/immature and interferon-responsive ciliated cells, KRT13+ “hillock”-like cells, and unique subsets of secretory, goblet, and squamous cells.
  • SARS-CoV-2 RNA+ host-target cells are highly heterogeneous, including developing ciliated, interferon-responsive ciliated, AZGP1 high goblet, and KRT13+ “hillock”-like cells, and Applicants identify genes associated with susceptibility, resistance, or infection response.
  • SARS-CoV-2 RNA+ cells Applicants detected genes that were enriched compared to uninfected bystanders, suggesting involvement in either the cell-intrinsic response or susceptibility to infection. These included anti-viral genes (e.g., MX1, IFITM3, EIF2AK2), proteases (e.g., CTSL, TMPRSS2), and pathways involved in cholesterol biosynthesis.
  • the present invention stratifies subjects based on their risk of developing severe respiratory disease or if the subject is predicted to have mild/moderate disease.
  • the present invention also provides for predicting the risk of developing severe respiratory disease in subjects who initially present as asymptomatic or as mild/moderate disease.
  • severe refers to a subject having intubation and mechanical ventilation, ventilation with additional organ support, or death.
  • mild refers to a subject having no limitation of activities, limitation of activities, hospitalized and no oxygen therapy, oxygen by mask or nasal prongs, non-invasive ventilation or high-flow oxygen.
  • moderate refers to a subject having no limitation of activities, limitation of activities, hospitalized and no oxygen therapy, oxygen by mask or nasal prongs, non-invasive ventilation or high-flow oxygen.
  • cell subsets refers to a cell that can be distinguished by a parent cell type, but expresses a specific gene signature or cell state that can further distinguish the cell from other cells of the parent cell type.
  • cell subsets are also referred to by a cluster (i.e., the different cell subsets cluster together). In certain embodiments, shifts in cell types or subsets of a cell type are used to predict a disease state and for selecting a treatment.
  • shifts in cell states in cell types or subsets of a cell type are used to predict a disease state and for selecting a treatment.
  • cell state refers to the expression of genes in specific cell subsets.
  • gene expression is not limited to mRNA expression and may also include proteins.
  • the cell subset frequency and/or cell states can be detected for screening novel therapeutics.
  • the present invention provides for subsets of epithelial cell types and immune cells.
  • intrinsic immune responses are differentially induced in different patient populations (e.g., severe, mild or moderate).
  • intrinsic immune states or conditions are monitored or detected during treatment.
  • the frequency of the cell subsets is shifted in disease states.
  • Disease states may include disease severity or response to any treatment in the standard of care for the disease.
  • one or more cell subsets associated with a disease state or risk group is detected or shifted to treat a subject in need thereof.
  • the cell subsets can be identified using one or more marker genes specific for the subset.
  • the cell subsets that are shifted include KRT13 KRT24 high Secretory Cells, Early Response Secretory Cells, CXCL8 Secretory Cells, AZGP1 high Goblet Cells, SCGB1A1 high Goblet Cells, IFI27; IFIT1; IFI6; IFITM3; and GBP3 ciliated cells, any IFN gene ciliated cells, any IFN goblet cells, ACE2 epithelial cells, ACE2 secretory cells, ACE2 goblet cells, ACE2 ciliated cells, ACE2 developing ciliated cells, ACE2 deuterosomal cells, BEST4 high cilia high ciliated cells.
  • scRNA-seq single cell RNA sequencing
  • 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 or more genes are detected.
  • detecting 2 or more of the subset markers increases the probability of detecting a cell subset.
  • specific cell types or cell subtypes differentially express genes based on the disease state or risk of the disease state.
  • Applicants have identified specific differentially expressed genes in specific cell types using single cell RNA sequencing (scRNA-seq).
  • Applicants identified differentially expressed genes in specific cell types between subjects having different severity of disease (see, e.g., Tables 2-4).
  • genes differentially expressed between WHO score 0 (healthy) and WHO score 1-5 (mild/moderate) (Table 2) indicate genes that are expressed in subjects to reduce virus severity.
  • a treatment would increase expression of one or more of these genes.
  • detection of one or more of these genes indicates that the subject does not have a severe disease or risk of severe disease.
  • genes differentially expressed between WHO score 0 (healthy) and WHO score 6-8 (severe) indicate genes that are expressed in subjects to reduce virus severity and/or generate an intrinsic immune response that leads to severe disease.
  • a treatment would decrease expression of one or more of these genes.
  • detection of one or more of these genes indicates that the subject has a severe disease or risk of severe disease.
  • genes differentially expressed between WHO score 1-5 (mild/moderate) and WHO score 6-8 (severe) (Table 4) indicate genes that are expressed in subjects that generate an intrinsic immune response that leads to severe disease.
  • a treatment would decrease expression of one or more of these genes.
  • detection of one or more of these genes indicates that the subject has a severe disease or risk of severe disease.
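The three comparisons above rest on grouping subjects by WHO score: 0 (healthy), 1-5 (mild/moderate), and 6-8 (severe). A minimal sketch of that grouping is shown below. The function name is hypothetical, the group labels follow the cohort names used in the figure legends, and convalescent participants are not handled.

```python
def severity_group(who_score):
    """Map a WHO score (integer 0-8) to the cohort label used in the text."""
    if not isinstance(who_score, int) or not 0 <= who_score <= 8:
        raise ValueError("WHO score must be an integer from 0 to 8")
    if who_score == 0:
        return "Control WHO 0"       # healthy
    if who_score <= 5:
        return "COVID-19 WHO 1-5"    # mild/moderate
    return "COVID-19 WHO 6-8"        # severe


# Group labels for every score on the scale.
labels = {score: severity_group(score) for score in range(9)}
```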
  • a cell state associated with a disease state or risk group is detected or shifted to treat a subject in need thereof.
  • the cell states can be identified using one or more differentially expressed genes in specific cell types between risk groups. In certain embodiments, 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 or more genes are detected. In certain embodiments, 10, 20, 30, 40, 50, 60, 70, 80, 90 or more than 100 genes are detected. In certain embodiments, detecting 2 or more of the differentially expressed genes increases the probability of detecting a subject having a cell state indicative of a specific intrinsic immune state and risk of severe disease.
  • the methods of the present invention use control values for the frequency of subsets and cell states.
  • the present nasal swab single cell atlas provides for the frequency of cell subsets and cell states for each of healthy WHO score 0 and COVID WHO score 1-8 subjects.
  • Cells such as disclosed herein may in the context of the present specification be said to “comprise the expression” or conversely to “not express” one or more markers, such as one or more genes or gene products; or be described as “positive” or conversely as “negative” for one or more markers, such as one or more genes or gene products; or be said to “comprise” a defined “gene or gene product signature”.
  • Such terms are commonplace and well-understood by the skilled person when characterizing cell phenotypes.
  • a skilled person would conclude the presence or evidence of a distinct signal for the marker when carrying out a measurement capable of detecting or quantifying the marker in or on the cell.
  • the presence or evidence of the distinct signal for the marker would be concluded based on a comparison of the measurement result obtained for the cell to a result of the same measurement carried out for a negative control (for example, a cell known to not express the marker) and/or a positive control (for example, a cell known to express the marker).
  • a positive cell may generate a signal for the marker that is at least 1.5-fold higher than a signal generated for the marker by a negative control cell or than an average signal generated for the marker by a population of negative control cells, e.g., at least 2-fold, at least 4-fold, at least 10-fold, at least 20-fold, at least 30-fold, at least 40-fold, at least 50-fold higher or even higher.
  • a positive cell may generate a signal for the marker that is 3.0 or more standard deviations, e.g., 3.5 or more, 4.0 or more, 4.5 or more, or 5.0 or more standard deviations, higher than an average signal generated for the marker by a population of negative control cells.
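The two criteria above for calling a cell positive (a signal at least 1.5-fold above the negative-control average, or at least 3.0 standard deviations above it) can be sketched with the standard library. The function name, the choice to report both calls separately, and the example signal values are assumptions for illustration.

```python
from statistics import mean, stdev


def marker_calls(signal, negative_controls, min_fold=1.5, min_sd=3.0):
    """Return (fold_call, sd_call) for one cell's marker signal.

    fold_call: signal is at least `min_fold` times the negative-control mean.
    sd_call:   signal is at least `min_sd` sample standard deviations above
               the negative-control mean.
    """
    baseline = mean(negative_controls)
    spread = stdev(negative_controls)  # requires at least two control values
    fold_call = baseline > 0 and signal >= min_fold * baseline
    sd_call = signal >= baseline + min_sd * spread
    return fold_call, sd_call


# Hypothetical negative-control signals, for illustration only.
negatives = [1.0, 1.2, 0.8, 1.1, 0.9]
positive_example = marker_calls(1.6, negatives)
negative_example = marker_calls(1.2, negatives)
```

Raising `min_fold` or `min_sd` corresponds to the stricter alternatives listed in the text (for example 2-fold or 4.0 standard deviations).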
  • a cell subset may be present or not present. In certain embodiments, a cell subset may be 5, 10, 20, 30, 40, 50, 60, 70, 80 or 90% more frequent in a parent cell population as compared to a control level.
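The comparison of a cell subset's frequency in its parent population against a control level can be written as a short calculation. The sketch below assumes "X% more frequent" means a relative increase over the control frequency rather than a difference in percentage points; the source does not say which is intended, and the counts are hypothetical.

```python
def subset_frequency(subset_count, parent_count):
    """Fraction of the parent cell population that belongs to the subset."""
    if parent_count <= 0:
        raise ValueError("parent_count must be positive")
    return subset_count / parent_count


def percent_more_frequent(sample_freq, control_freq):
    """Relative increase of a subset frequency over a control level, in percent."""
    if control_freq <= 0:
        raise ValueError("control_freq must be positive")
    return (sample_freq - control_freq) / control_freq * 100


# Hypothetical counts: 30 subset cells among 200 parent cells vs. a 10% control level.
sample_freq = subset_frequency(30, 200)
increase = percent_more_frequent(sample_freq, control_freq=0.10)
```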
  • the cell state is a gene program comprising one or more up and down regulated genes.
  • Clusters (subsets) and gene programs as described herein can also be described as a metagene.
  • a “metagene” refers to a pattern or aggregate of gene expression and not an actual gene. Each metagene may represent a collection or aggregate of genes behaving in a functionally correlated fashion within the genome. The metagene can be increased if the pattern is increased.
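A metagene, as an aggregate of gene expression rather than an actual gene, can be illustrated by averaging normalized expression over a gene set. Taking the mean is one common choice and is an assumption here; the source does not prescribe an aggregation method. The gene set reuses interferon-associated symbols named elsewhere in the text, and the expression values are hypothetical.

```python
def metagene_score(expression, gene_set):
    """Mean expression over the genes in `gene_set` that were measured.

    `expression` maps gene symbol -> normalized expression for one cell.
    Genes absent from `expression` are skipped rather than counted as zero.
    """
    measured = [expression[gene] for gene in gene_set if gene in expression]
    if not measured:
        raise ValueError("none of the genes in gene_set were measured")
    return sum(measured) / len(measured)


# Hypothetical normalized expression for one cell.
cell = {"IFI27": 2.0, "IFIT1": 1.0, "IFI6": 3.0, "ACE2": 0.5}
interferon_set = ["IFI27", "IFIT1", "IFI6", "GBP3"]
score = metagene_score(cell, interferon_set)
```

The score rises when the pattern as a whole rises, which matches the statement that the metagene can be increased if the pattern is increased.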
  • gene program or “program” can be used interchangeably with “cell state”, “biological program”, “expression program”, “transcriptional program”, “expression profile”, “signature”, “gene signature” or “expression program” and may refer to a set of genes that share a role in a biological function (e.g., an antiviral program, inflammatory program, cell differentiation program, proliferation program).
  • Biological programs can include a pattern of gene expression that results in a corresponding physiological event or phenotypic trait (e.g., inflammation).
  • Biological programs can include up to several hundred genes that are expressed in a spatially and temporally controlled fashion. Expression of individual genes can be shared between biological programs.
  • Expression of individual genes can be shared among different single cell subtypes; however, expression of a biological program may be cell subtype specific or temporally specific (e.g., the biological program is expressed in a cell subtype at a specific time). Multiple biological programs may include the same gene, reflecting the gene's roles in different processes. Expression of a biological program may be regulated by a master switch, such as a nuclear receptor or transcription factor.
  • a “signature” or “gene program” may encompass any gene or genes, protein or proteins, or epigenetic element(s) whose expression profile or whose occurrence is associated with a specific cell type, subtype, or cell state of a specific cell type or subtype within a population of cells.
  • any of gene or genes, protein or proteins, or epigenetic element(s) may be substituted.
  • Levels of expression or activity or prevalence may be compared between different cells in order to characterize or identify for instance signatures specific for cell (sub)populations.
  • Increased or decreased expression or activity or prevalence of signature genes may be compared between different cells in order to characterize or identify for instance specific cell (sub)populations.
  • a signature may include a gene or genes, protein or proteins, or epigenetic element(s) whose expression or occurrence is specific to a cell (sub)population, such that expression or occurrence is exclusive to the cell (sub)population.
  • a gene signature as used herein may thus refer to any set of up- and down-regulated genes that are representative of a cell type or subtype.
  • a gene signature as used herein may also refer to any set of up- and down-regulated genes between different cells or cell (sub)populations derived from a gene-expression profile.
  • a gene signature may comprise a list of genes differentially expressed in a distinction of interest.
  • the signature as defined herein can be used to indicate the presence of a cell type, a subtype of the cell type, the state of the microenvironment of a population of cells, a particular cell type population or subpopulation, and/or the overall status of the entire cell (sub)population. Furthermore, the signature may be indicative of cells within a population of cells in vivo. The signature may also be used to suggest for instance particular therapies, or to follow up treatment, or to suggest ways to modulate immune systems. The presence of subtypes or cell states may be determined by subtype specific or cell state specific signatures.
  • the presence of these specific cell (sub)types or cell states may be determined by applying the signature genes to bulk sequencing data in a sample.
  • the signatures of the present invention may be microenvironment specific, such as their expression in a particular spatio-temporal context.
  • signatures as discussed herein are specific to a particular pathological context.
  • a combination of cell subtypes having a particular signature may indicate an outcome.
  • the signatures can be used to deconvolute the network of cells present in a particular pathological condition.
  • the presence of specific cells and cell subtypes are indicative of a particular response to treatment, such as including increased or decreased susceptibility to treatment.
  • the signature may indicate the presence of one particular cell type.
  • the novel signatures are used to detect multiple cell states or hierarchies that occur in subpopulations of immune cells that are linked to particular pathological condition (e.g., inflammation), or linked to a particular outcome or progression of the disease (e.g., autoimmunity), or linked to a particular response to treatment of the disease.
  • the signature according to certain embodiments of the present invention may comprise or consist of one or more genes, proteins and/or epigenetic elements, such as for instance 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more.
  • the signature may comprise or consist of two or more genes, proteins and/or epigenetic elements, such as for instance 2, 3, 4, 5, 6, 7, 8, 9, 10 or more.
  • the signature may comprise or consist of three or more genes, proteins and/or epigenetic elements, such as for instance 3, 4, 5, 6, 7, 8, 9, 10 or more.
  • the signature may comprise or consist of four or more genes, proteins and/or epigenetic elements, such as for instance 4, 5, 6, 7, 8, 9, 10 or more.
  • the signature may comprise or consist of five or more genes, proteins and/or epigenetic elements, such as for instance 5, 6, 7, 8, 9, 10 or more. In certain embodiments, the signature may comprise or consist of six or more genes, proteins and/or epigenetic elements, such as for instance 6, 7, 8, 9, 10 or more. In certain embodiments, the signature may comprise or consist of seven or more genes, proteins and/or epigenetic elements, such as for instance 7, 8, 9, 10 or more. In certain embodiments, the signature may comprise or consist of eight or more genes, proteins and/or epigenetic elements, such as for instance 8, 9, 10 or more. In certain embodiments, the signature may comprise or consist of nine or more genes, proteins and/or epigenetic elements, such as for instance 9, 10 or more.
  • the signature may comprise or consist of ten or more genes, proteins and/or epigenetic elements, such as for instance 10, 11, 12, 13, 14, 15, or more. It is to be understood that a signature according to the invention may for instance also include genes or proteins as well as epigenetic elements combined.
  • genes/proteins include genes/proteins which are up- or down-regulated as well as genes/proteins which are turned on or off.
  • up- or down-regulation in certain embodiments, such up- or down-regulation is preferably at least two-fold, such as two-fold, three-fold, four-fold, five-fold, or more, such as for instance at least ten-fold, at least 20-fold, at least 30-fold, at least 40-fold, at least 50-fold, or more.
  • differential expression may be determined based on common statistical tests, as is known in the art.
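Selecting differentially expressed genes by a fold-change cutoff together with a multiple-testing-corrected p-value can be sketched as below. The Benjamini-Hochberg procedure is used as one common statistical correction and is an assumption, since the text refers only to common statistical tests. The two-fold cutoff (|log2 fold change| >= 1) follows the preferred threshold stated above, while the FDR cutoff, gene values, and function names are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for offset, idx in enumerate(reversed(order)):
        position = m - offset  # 1-based rank of this p-value
        running_min = min(running_min, p_values[idx] * m / position)
        adjusted[idx] = running_min
    return adjusted


def select_de_genes(results, min_abs_log2_fc=1.0, max_fdr=0.01):
    """Keep genes passing both the fold-change and the FDR cutoff.

    `results` maps gene symbol -> (log2 fold change, raw p-value).
    """
    genes = list(results)
    fdr = benjamini_hochberg([results[gene][1] for gene in genes])
    return [
        gene
        for gene, q in zip(genes, fdr)
        if abs(results[gene][0]) >= min_abs_log2_fc and q <= max_fdr
    ]


# Hypothetical test results, for illustration only.
results = {
    "MX1": (2.3, 0.0001),
    "IFITM3": (1.4, 0.002),
    "CTSL": (0.4, 0.0005),
    "ACE2": (1.1, 0.2),
}
selected = select_de_genes(results)
```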
  • differentially expressed genes/proteins, or differential epigenetic elements may be differentially expressed on a single cell level, or may be differentially expressed on a cell population level.
  • the differentially expressed genes/proteins or epigenetic elements as discussed herein, such as constituting the gene signatures as discussed herein, when considered at the cell population level, refer to genes that are differentially expressed in all or substantially all cells of the population (such as at least 80%, preferably at least 90%, such as at least 95% of the individual cells). This allows one to define a particular subpopulation of tumor cells.
  • a “subpopulation” of cells preferably refers to a particular subset of cells of a particular cell type which can be distinguished or are uniquely identifiable and set apart from other cells of this cell type.
  • the cell subpopulation may be phenotypically characterized, and is preferably characterized by the signature as discussed herein.
  • a cell (sub)population as referred to herein may consist of a (sub)population of cells of a particular cell type characterized by a specific cell state.
  • by induction or alternatively suppression of a particular signature is preferably meant induction or alternatively suppression (or upregulation or downregulation) of at least one gene/protein and/or epigenetic element of the signature, such as for instance at least two, at least three, at least four, at least five, at least six, or all genes/proteins and/or epigenetic elements of the signature.
  • all gene name symbols refer to the gene as commonly known in the art. The examples described herein that refer to the human gene names are to be understood to also encompass mouse genes, as well as genes in any other organism (e.g., homologous, orthologous genes). Any reference to the gene symbol is a reference made to the entire gene or variants of the gene.
  • any reference to the gene symbol is also a reference made to the gene product (e.g., protein).
  • the term homolog may apply to the relationship between genes separated by the event of speciation (e.g., ortholog).
  • Orthologs are genes in different species that evolved from a common ancestral gene by speciation. Normally, orthologs retain the same function in the course of evolution.
  • Gene symbols may be those referred to by the HUGO Gene Nomenclature Committee (HGNC) or National Center for Biotechnology Information (NCBI).
  • the signature as described herein may encompass any of the genes described herein.
  • the disease is a viral infection.
  • the virus infects a barrier tissue.
  • a “barrier cell” or “barrier tissues” refers generally to various epithelial tissues of the body such as, but not limited to, those that line the respiratory system, digestive system, urinary system, and reproductive system as well as cutaneous systems.
  • the epithelial barrier may vary in composition between tissues but is composed of basal and apical components, or crypt/villus components in the case of intestine.
  • the disease is caused by a differential immune response (e.g., subjects have different immune responses to SARS-CoV-2 which affects severity of COVID-19 disease).
  • immune responses are coordinated by immune cells and epithelial cells.
  • the term “immune cell” as used throughout this specification generally encompasses any cell derived from a hematopoietic stem cell that plays a role in the immune response. The term is intended to encompass immune cells of both the innate and the adaptive immune system.
  • the immune cell as referred to herein may be a leukocyte, at any stage of differentiation (e.g., a stem cell, a progenitor cell, a mature cell) or any activation stage.
  • Immune cells include lymphocytes (such as natural killer cells, T-cells (including, e.g., thymocytes, Th or Tc; Th1, Th2, Th17, Thαβ, CD4+, CD8+, effector Th, memory Th, regulatory Th, CD4+/CD8+ thymocytes, CD4-/CD8- thymocytes, γδ T cells, etc.) or B-cells (including, e.g., pro-B cells, early pro-B cells, late pro-B cells, pre-B cells, large pre-B cells, small pre-B cells, immature or mature B-cells, producing antibodies of any isotype, T1 B-cells, T2 B-cells, naive B-cells, GC B-cells, plasmablasts, memory B-cells, plasma cells, follicular B-cells, marginal zone B-cells, B-1 cells, B-2 cells, regulatory B cells, etc.), such as for instance, monocytes (including
  • immune response refers to a response by a cell of the immune system, such as a B cell, T cell (CD4+ or CD8+), regulatory T cell, antigen-presenting cell, dendritic cell, monocyte, macrophage, NKT cell, NK cell, basophil, eosinophil, or neutrophil, to a stimulus.
  • the response is specific for a particular antigen (an “antigen-specific response”) and refers to a response by a CD4 T cell, CD8 T cell, or B cell via their antigen-specific receptor.
  • an immune response is a T cell response, such as a CD4+ response or a CD8+ response.
  • Such responses by these cells can include, for example, cytotoxicity, proliferation, cytokine or chemokine production, trafficking, or phagocytosis, and can be dependent on the nature of the immune cell undergoing the response.
  • An immune response can also be an innate immune response (see, e.g., Artis D, Spits H. The biology of innate lymphoid cells. Nature. 2015;517(7534):293-301).
  • the viral infection is a coronavirus infection.
  • coronavirus refers to enveloped viruses with a positive-sense single-stranded RNA genome and a nucleocapsid of helical symmetry that constitute the subfamily Orthocoronavirinae, in the family Coronaviridae (see, e.g., Woo PC, Huang Y, Lau SK, Yuen KY. Coronavirus genomics and bioinformatics analysis. Viruses. 2010;2(8): 1804-1820).
  • the present disclosure relates to and/or involves SARS-CoV-2.
  • Severe acute respiratory syndrome coronavirus 2 is the virus causing the ongoing Coronavirus Disease 2019 (COVID-19) pandemic (see, e.g., Zhou, et al. (2020). A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature 579, 270-273).
  • the virus is SARS-CoV-2 or variants thereof.
  • the disease treated is COVID-19.
  • SARS-CoV-2 is the third zoonotic betacoronavirus to cause a human outbreak after SARS-CoV in 2002 and Middle East respiratory syndrome coronavirus (MERS-CoV) in 2012 (de Wit et al., 2016, SARS and MERS: recent insights into emerging coronaviruses. Nat Rev Microbiol 14, 523-534).
  • the term “variant” refers to any virus having one or more mutations as compared to a known virus.
  • a strain is a genetic variant or subtype of a virus.
  • the terms 'strain', 'variant', and 'isolate' may be used interchangeably.
  • a variant has developed a “specific group of mutations” that causes the variant to behave differently than that of the strain it originated from.
  • Genetic variants of SARS-CoV-2 have been emerging and circulating around the world throughout the COVID-19 pandemic (see, e.g., The US Centers for Disease Control and Prevention; www.cdc.gov/coronavirus/2019-ncov/variants/variant-info.html).
  • Exemplary, non-limiting variants applicable to the present disclosure include variants of SARS-CoV-2, particularly those having substitutions of therapeutic concern.
  • Table A shows exemplary, non-limiting genetic substitutions in SARS-CoV-2 variants.
  • PANGO Phylogenetic Assignment of Named Global Outbreak
  • the SARS-CoV-2 variant is and/or includes: B.1.1.7, also known as Alpha (WHO) or UK variant, having the following spike protein substitutions: 69del, 70del, 144del, (E484K*), (S494P*), N501Y, A570D, D614G, P681H, T716I, S982A, and D1118H (K1191N*); B.1.351, also known as Beta (WHO) or South Africa variant, having the following spike protein substitutions: D80A, D215G, 241del, 242del, 243del, K417N, E484K, N501Y, D614G, and A701V; B.1.427, also known as Epsilon (WHO) or US California variant, having the following spike protein substitutions: L452R, and D614G; B.1.429, also known as Epsilon (WHO) or US California variant, having the following spike protein substitutions: L452
  • the SARS-CoV-2 variant is classified and/or otherwise identified as a Variant of Concern (VOC) by the World Health Organization and/or the U.S. Centers for Disease Control.
  • VOC is a variant for which there is evidence of an increase in transmissibility, more severe disease (e.g., increased hospitalizations or deaths), significant reduction in neutralization by antibodies generated during previous infection or vaccination, reduced effectiveness of treatments or vaccines, or diagnostic detection failures.
  • the SARS-CoV-2 variant is classified and/or otherwise identified as a Variant of High Consequence (VHC) by the World Health Organization and/or the U.S. Centers for Disease Control.
  • MCMs medical countermeasures
  • the SARS-CoV-2 variant is classified and/or otherwise identified as a Variant of Interest (VOI) by the World Health Organization and/or the U.S. Centers for Disease Control.
  • a VOI is a variant with specific genetic markers that have been associated with changes to receptor binding, reduced neutralization by antibodies generated against previous infection or vaccination, reduced efficacy of treatments, potential diagnostic impact, or predicted increase in transmissibility or disease severity.
  • the SARS-CoV-2 variant is classified and/or is otherwise identified as a Variant of Note (VON).
  • VON refers to both “variants of concern” and “variants of note” as the two phrases are used and defined by Pangolin (cov-lineages.org) and provided in their available “VOC reports” available at cov-lineages.org.
  • the SARS-CoV-2 variant is a VOC.
  • the SARS-CoV-2 variant is or includes an Alpha variant (e.g., Pango lineage B.1.1.7), a Beta variant (e.g., Pango lineage B.1.351, B.1.351.1, B.1.351.2, and/or B.1.351.3), a Delta variant (e.g., Pango lineage B.1.617.2, AY.1, AY.2, AY.3 and/or AY.3.1), a Gamma variant (e.g., Pango lineage P.1, P.1.1, P.1.2, P.1.4, P.1.6, and/or P.1.7), an Omicron variant (B.1.1.529) or any combination thereof.
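  • By way of non-limiting illustration only, the correspondence between the Pango lineages and the variant labels listed above can be held in a simple lookup table, as sketched below. The table contains only the lineages named in the preceding paragraph; the names `VOC_LINEAGES` and `who_label_for_lineage` are hypothetical.

```python
# Variant labels and the exemplary Pango lineages listed for each of them above.
VOC_LINEAGES = {
    "Alpha": {"B.1.1.7"},
    "Beta": {"B.1.351", "B.1.351.1", "B.1.351.2", "B.1.351.3"},
    "Delta": {"B.1.617.2", "AY.1", "AY.2", "AY.3", "AY.3.1"},
    "Gamma": {"P.1", "P.1.1", "P.1.2", "P.1.4", "P.1.6", "P.1.7"},
    "Omicron": {"B.1.1.529"},
}

# Reverse index: lineage name -> variant label.
LINEAGE_TO_LABEL = {
    lineage: label
    for label, lineages in VOC_LINEAGES.items()
    for lineage in lineages
}


def who_label_for_lineage(lineage):
    """Return the variant label for a listed lineage, or None if it is not listed."""
    return LINEAGE_TO_LABEL.get(lineage.strip().upper())
```

  • For example, `who_label_for_lineage("B.1.351.2")` returns `"Beta"`, while a lineage absent from the table, such as `"B.1.525"`, returns `None`.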
  • the SARS-CoV-2 variant is a VOI.
  • the SARS-CoV-2 variant is or includes an Eta variant (e.g., Pango lineage B.1.525 (Spike protein substitutions A67V, 69del, 70del, 144del, E484K, D614G, Q677H, F888L)); an Iota variant (e.g., Pango lineage B.1.526 (Spike protein substitutions L5F, (D80G*), T95I, (Y144-*), (F157S*), D253G, (L452R*), (S477N*), E484K, D614G, A701V, (T859N*), (D950H*), (Q957R*))); a Kappa variant (e.g., Pango lineage B.1.617.1 (Spike protein substitutions (T95I), G142D, E154K, L452
  • the SARS-CoV-2 variant is a VON.
  • the SARS-CoV-2 variant is or includes Pango lineage variant P.1 (alias, B.1.1.28.1.) as described in Rambaut et al. 2020. Nat. Microbiol.
  • detecting cell subset markers or differentially expressed genes can be used to determine a treatment for a subject suffering from a disease or stratify a subject based on risk of developing severe disease (e.g., COVID-19).
  • the invention provides biomarkers (e.g., phenotype specific or cell subtype) for the identification, diagnosis, prognosis and manipulation of cell properties, for use in a variety of diagnostic and/or therapeutic indications.
  • Biomarkers in the context of the present invention encompasses, without limitation nucleic acids, proteins, reaction products, and metabolites, together with their polymorphisms, mutations, variants, modifications, subunits, fragments, and other analytes or sample-derived measures.
  • biomarkers include the signature genes or signature gene products, and/or cells as described herein.
  • the terms “diagnosis” and “monitoring” are commonplace and well-understood in medical practice.
  • the term “diagnosis” generally refers to the process or act of recognising, deciding on or concluding on a disease or condition in a subject on the basis of symptoms and signs and/or from results of various diagnostic procedures (such as, for example, from knowing the presence, absence and/or quantity of one or more biomarkers characteristic of the diagnosed disease or condition).
  • the terms “prognosing” or “prognosis” generally refer to an anticipation of the progression of a disease or condition and the prospect (e.g., the probability, duration, and/or extent) of recovery.
  • a good prognosis of the diseases or conditions taught herein may generally encompass anticipation of a satisfactory partial or complete recovery from the diseases or conditions, preferably within an acceptable time period.
  • a good prognosis of such may more commonly encompass anticipation of not further worsening or aggravating of such, preferably within a given time period.
  • a poor prognosis of the diseases or conditions as taught herein may generally encompass anticipation of a substandard recovery and/or unsatisfactorily slow recovery, or to substantially no recovery or even further worsening of such.
  • the biomarkers of the present invention are useful in methods of identifying patient populations who would benefit from treatment based on a detected level of expression, activity and/or function of one or more biomarkers. These biomarkers are also useful in monitoring subjects undergoing treatments and therapies for suitable or aberrant response(s) to determine efficaciousness of the treatment or therapy and for selecting or modifying therapies and treatments that would be efficacious in treating, delaying the progression of or otherwise ameliorating a symptom.
  • the biomarkers provided herein are useful for selecting a group of patients at a specific state of a disease with accuracy that facilitates selection of treatments.
  • monitoring generally refers to the follow-up of a disease or a condition in a subject for any changes which may occur over time.
  • the terms also encompass prediction of a disease.
  • the terms “predicting” or “prediction” generally refer to an advance declaration, indication or foretelling of a disease or condition in a subject not (yet) having said disease or condition.
  • a prediction of a disease or condition in a subject may indicate a probability, chance or risk that the subject will develop said disease or condition, for example within a certain time period or by a certain age.
  • Said probability, chance or risk may be indicated inter alia as an absolute value, range or statistics, or may be indicated relative to a suitable control subject or subject population (such as, e.g., relative to a general, normal or healthy subject or subject population).
  • the probability, chance or risk that a subject will develop a disease or condition may be advantageously indicated as increased or decreased, or as fold-increased or fold-decreased relative to a suitable control subject or subject population.
  • the term “prediction” of the conditions or diseases as taught herein in a subject may also particularly mean that the subject has a 'positive' prediction of such, i.e., that the subject is at risk of having such (e.g., the risk is significantly increased vis-a-vis a control subject or subject population).
  • prediction of no diseases or conditions as taught herein in a subject may particularly mean that the subject has a 'negative' prediction of such, i.e., that the subject's risk of having such is not significantly increased vis-a-vis a control subject or subject population.
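  • By way of non-limiting illustration only, a risk expressed as fold-increased or fold-decreased relative to a control population can be computed as sketched below. The function names and the example counts are hypothetical, and the sketch does not assess whether a difference is statistically significant.

```python
def relative_risk(cases_in_group, group_size, cases_in_control, control_size):
    """Risk in a group of interest divided by risk in a control population."""
    risk_group = cases_in_group / group_size
    risk_control = cases_in_control / control_size
    if risk_control == 0:
        raise ValueError("control risk is zero; relative risk is undefined")
    return risk_group / risk_control


def describe_risk(rr):
    """Express a relative risk as fold-increased or fold-decreased versus control."""
    if rr > 1:
        return f"{rr:.1f}-fold increased"
    if rr < 1:
        return f"{1 / rr:.1f}-fold decreased"
    return "unchanged"


# Hypothetical counts: 30 of 100 subjects in the group of interest develop the
# condition, compared with 10 of 100 control subjects.
rr = relative_risk(30, 100, 10, 100)
summary = describe_risk(rr)
```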
  • an altered quantity or phenotype of the cells in the subject compared to a control subject having normal status or not having a disease indicates response to treatment.
  • the methods may rely on comparing the quantity of cell populations, biomarkers, or gene or gene product signatures measured in samples from patients with reference values, wherein said reference values represent known predictions, diagnoses and/or prognoses of diseases or conditions as taught herein.
  • distinct reference values may represent the prediction of a risk (e.g., an abnormally elevated risk) of having a given disease or condition as taught herein vs. the prediction of no or normal risk of having said disease or condition.
  • distinct reference values may represent predictions of differing degrees of risk of having such disease or condition.
  • distinct reference values can represent the diagnosis of a given disease or condition as taught herein vs. the diagnosis of no such disease or condition (such as, e.g., the diagnosis of healthy, or recovered from said disease or condition, etc.).
  • distinct reference values may represent the diagnosis of such disease or condition of varying severity.
  • distinct reference values may represent a good prognosis for a given disease or condition as taught herein vs. a poor prognosis for said disease or condition.
  • distinct reference values may represent varyingly favourable or unfavourable prognoses for such disease or condition.
  • Such comparison may generally include any means to determine the presence or absence of at least one difference and optionally of the size of such difference between values being compared.
  • a comparison may include a visual inspection, an arithmetical or statistical comparison of measurements. Such statistical comparisons include, but are not limited to, applying a rule.
  • Reference values may be established according to known procedures previously employed for other cell populations, biomarkers and gene or gene product signatures.
  • a reference value may be established in an individual or a population of individuals characterised by a particular diagnosis, prediction and/or prognosis of said disease or condition (i.e., for whom said diagnosis, prediction and/or prognosis of the disease or condition holds true).
  • Such population may comprise without limitation 2 or more, 10 or more, 100 or more, or even several hundred or more individuals.
  • a “deviation” of a first value from a second value may generally encompass any direction (e.g., increase: first value > second value; or decrease: first value < second value) and any extent of alteration.
  • a deviation may encompass a decrease in a first value by, without limitation, at least about 10% (about 0.9-fold or less), or by at least about 20% (about 0.8-fold or less), or by at least about 30% (about 0.7-fold or less), or by at least about 40% (about 0.6-fold or less), or by at least about 50% (about 0.5-fold or less), or by at least about 60% (about 0.4-fold or less), or by at least about 70% (about 0.3-fold or less), or by at least about 80% (about 0.2-fold or less), or by at least about 90% (about 0.1-fold or less), relative to a second value with which a comparison is being made.
  • a deviation may encompass an increase of a first value by, without limitation, at least about 10% (about 1.1-fold or more), or by at least about 20% (about 1.2-fold or more), or by at least about 30% (about 1.3-fold or more), or by at least about 40% (about 1.4-fold or more), or by at least about 50% (about 1.5-fold or more), or by at least about 60% (about 1.6-fold or more), or by at least about 70% (about 1.7-fold or more), or by at least about 80% (about 1.8-fold or more), or by at least about 90% (about 1.9-fold or more), or by at least about 100% (about 2-fold or more), or by at least about 150% (about 2.5-fold or more), or by at least about 200% (about 3-fold or more), or by at least about 500% (about 6-fold or more), or by at least about 700% (about 8-fold or more), or the like, relative to a second value with which a comparison is being made.
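  • By way of non-limiting illustration only, the correspondence between percent changes and fold values recited above (e.g., a 10% decrease being about 0.9-fold, a 200% increase being about 3-fold) follows from fold = 1 + percent/100, as sketched below. The function names are hypothetical.

```python
def percent_change_to_fold(percent_change):
    """Convert a signed percent change into a fold value (fold = 1 + percent / 100)."""
    return 1.0 + percent_change / 100.0


def deviation(first_value, second_value):
    """Direction, signed percent change, and fold of a first value versus a second value."""
    fold = first_value / second_value
    percent = (fold - 1.0) * 100.0
    if fold > 1:
        direction = "increase"
    elif fold < 1:
        direction = "decrease"
    else:
        direction = "none"
    return direction, percent, fold


# A first value of 45 compared with a second value of 50 is a 10% decrease (0.9-fold).
direction, percent, fold = deviation(45.0, 50.0)
```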
  • a deviation may refer to a statistically significant observed alteration.
  • a deviation may refer to an observed alteration which falls outside of error margins of reference values in a given population (as expressed, for example, by standard deviation or standard error, or by a predetermined multiple thereof, e.g., ±1xSD or ±2xSD or ±3xSD, or ±1xSE or ±2xSE or ±3xSE).
  • Deviation may also refer to a value falling outside of a reference range defined by values in a given population (for example, outside of a range which comprises ≥40%, ≥50%, ≥60%, ≥70%, ≥75% or ≥80% or ≥85% or ≥90% or ≥95% or even ≥100% of values in said population).
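  • By way of non-limiting illustration only, the two criteria above (a value lying outside a multiple of the standard deviation of reference values, or outside a reference range comprising a chosen fraction of the reference values) can be sketched as follows. The function names and the way the central range is trimmed are assumptions made for this example.

```python
import math
from statistics import mean, stdev


def outside_sd_band(value, reference_values, k=2.0):
    """True if value lies more than k sample standard deviations from the reference mean."""
    mu = mean(reference_values)
    sd = stdev(reference_values)
    return abs(value - mu) > k * sd


def central_range(reference_values, coverage=0.95):
    """Bounds of a central range containing at least `coverage` of the reference values.

    An equal number of values is trimmed from each tail, rounded down so that
    the range never covers less than the requested fraction.
    """
    ordered = sorted(reference_values)
    n = len(ordered)
    # round() guards against floating-point noise before flooring.
    drop = math.floor(round(n * (1.0 - coverage) / 2.0, 9))
    return ordered[drop], ordered[n - 1 - drop]


def outside_reference_range(value, reference_values, coverage=0.95):
    """True if value falls outside the central reference range."""
    low, high = central_range(reference_values, coverage)
    return value < low or value > high


# Hypothetical reference measurements from a population of ten individuals.
reference = [10, 11, 9, 10, 12, 8, 10, 11, 9, 10]
```

  • With these example values the reference mean is 10 and the sample standard deviation is about 1.15, so a measurement of 13 lies outside the ±2xSD band while a measurement of 11 does not.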
  • a deviation may be concluded if an observed alteration is beyond a given threshold or cut-off.
  • such threshold or cut-off may be selected as generally known in the art to provide for a chosen sensitivity and/or specificity of the prediction methods, e.g., sensitivity and/or specificity of at least 50%, or at least 60%, or at least 70%, or at least 80%, or at least 85%, or at least 90%, or at least 95%.
  • receiver-operating characteristic (ROC) curve analysis can be used to select an optimal cut-off value of the quantity of a given immune cell population, biomarker or gene or gene product signatures, for clinical use of the present diagnostic tests, based on acceptable sensitivity and specificity, or related performance measures which are well-known per se, such as positive predictive value (PPV), negative predictive value (NPV), positive likelihood ratio (LR+), negative likelihood ratio (LR-), Youden index, or similar.
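  • By way of non-limiting illustration only, selection of a cut-off from ROC-type analysis can be sketched as below, here by maximizing the Youden index (sensitivity + specificity - 1) over candidate cut-offs. The function names, the convention that a score at or above the cut-off is called positive, and the example data are assumptions; the sketch assumes both positive and negative subjects are present.

```python
def confusion_counts(scores, labels, cutoff):
    """Counts of true/false positives/negatives when score >= cutoff is called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y)
    fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and not y)
    fn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y)
    tn = sum(1 for s, y in zip(scores, labels) if s < cutoff and not y)
    return tp, fp, fn, tn


def performance(scores, labels, cutoff):
    """Sensitivity, specificity and related measures at a given cut-off."""
    tp, fp, fn, tn = confusion_counts(scores, labels, cutoff)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "youden": sensitivity + specificity - 1.0,
        "ppv": tp / (tp + fp) if (tp + fp) else None,
        "npv": tn / (tn + fn) if (tn + fn) else None,
        "lr_pos": sensitivity / (1.0 - specificity) if specificity < 1.0 else float("inf"),
        "lr_neg": (1.0 - sensitivity) / specificity if specificity > 0.0 else float("inf"),
    }


def best_cutoff_by_youden(scores, labels):
    """Candidate cut-off (taken from the observed scores) with the highest Youden index."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda c: performance(scores, labels, c)["youden"])


# Hypothetical biomarker scores; label 1 = condition present, 0 = condition absent.
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 1, 1, 1]
cutoff = best_cutoff_by_youden(scores, labels)
metrics = performance(scores, labels, cutoff)
```

  • The Youden index is only one of the performance measures named above; a cut-off could equally be chosen to meet a minimum sensitivity or specificity.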
  • the subject is determined to belong to or at risk to progress to the severe risk group if one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, etc. or more) of proinflammatory cytokines comprising at least one or more of: IL1B, TNF, CXCL8, CCL2, CCL3, CXCL9, CXCL10, and CXCL11; upregulation of alarmins comprising one or both of: S100A8 and S100A9; 14% - 26% of all epithelial cells are secretory cells; elevated BPIFA1 high secretory cells; elevated KRT13 KRT24 high secretory cells; macrophage population increase as compared to other immune cells; upregulated genes in ciliated cells comprising one or both of: IL5RA and NLRP1; no increase of at least one or more of: type I, type II, and type III interferon abundance; elevated stress response factors comprising at least one or more of: HSPA8, HSPA1A, and DUSP1; and reduced or absent antiviral/interferon response, and reduced or absent mature ciliated cells is detected.
  • the subject is determined to belong to the mild/moderate risk group if one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, etc. or more) of 4% - 12% of all epithelial cells are Secretory Cells; 10% - 20% of all epithelial cells comprise Interferon Responsive Ciliated Cells; upregulated ciliated cell genes comprising at least one or more of: IFI44L, STAT1, IFITM1, MX1, IFITM3, OAS1, OAS2, OAS3, STAT2, TAP1, HLA-C, ADAR, XAF1, IRF1, CTSS, and CTSB; increase in type I interferon abundance; high expression of interferon-responsive genes; induction of type I interferon responses; and high abundance of IFI6 and IFI27 is detected.
  • detection of one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, etc. or more) cell subset markers or differentially expressed genes found in Table 2 in a sample from a subject stratifies the subject into the mild/moderate risk group.
  • detection of one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, etc. or more) cell subset markers or differentially expressed genes found in Table 3 in a sample from a subject stratifies the subject into the severe risk group.
  • detection of cell subset markers or differentially expressed genes found in Table 3 in a sample from a subject stratifies the subject into the mild/moderate risk group or severe risk group.
  • detection of one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, etc. or more) cell subset markers or differentially expressed genes found in Table 5 in a sample from a subject stratifies the subject as being at risk of developing the disease or as having the disease.
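  • By way of non-limiting illustration only, a subset of the stratification criteria recited above can be tabulated for a sample as sketched below. The function name, the input representation (cell-type fractions as numbers between 0 and 1 and a flat set of upregulated gene symbols), and the decision to report matched criteria rather than a final group are assumptions made for this example; in particular, the sketch ignores in which cell type each gene is upregulated and omits criteria such as macrophage abundance and interferon abundance.

```python
# Gene sets copied from the criteria recited above.
SEVERE_CYTOKINES = frozenset(
    {"IL1B", "TNF", "CXCL8", "CCL2", "CCL3", "CXCL9", "CXCL10", "CXCL11"}
)
ALARMINS = frozenset({"S100A8", "S100A9"})
SEVERE_CILIATED_GENES = frozenset({"IL5RA", "NLRP1"})
STRESS_FACTORS = frozenset({"HSPA8", "HSPA1A", "DUSP1"})
MILD_CILIATED_GENES = frozenset(
    {"IFI44L", "STAT1", "IFITM1", "MX1", "IFITM3", "OAS1", "OAS2", "OAS3",
     "STAT2", "TAP1", "HLA-C", "ADAR", "XAF1", "IRF1", "CTSS", "CTSB"}
)


def matched_criteria(secretory_fraction, ifn_ciliated_fraction, upregulated_genes):
    """List which of the sketched criteria a sample meets, per risk group."""
    genes = set(upregulated_genes)
    severe, mild = [], []
    if genes & SEVERE_CYTOKINES:
        severe.append("proinflammatory cytokines")
    if genes & ALARMINS:
        severe.append("alarmins")
    if 0.14 <= secretory_fraction <= 0.26:
        severe.append("secretory cells 14-26% of epithelial cells")
    if genes & SEVERE_CILIATED_GENES:
        severe.append("ciliated cell genes IL5RA/NLRP1")
    if genes & STRESS_FACTORS:
        severe.append("stress response factors")
    if 0.04 <= secretory_fraction <= 0.12:
        mild.append("secretory cells 4-12% of epithelial cells")
    if 0.10 <= ifn_ciliated_fraction <= 0.20:
        mild.append("interferon responsive ciliated cells 10-20% of epithelial cells")
    if genes & MILD_CILIATED_GENES:
        mild.append("interferon-responsive ciliated cell genes")
    if {"IFI6", "IFI27"} <= genes:
        mild.append("IFI6 and IFI27")
    return {"severe": severe, "mild/moderate": mild}


# Hypothetical sample: 20% secretory cells, 2% interferon responsive ciliated cells.
result = matched_criteria(0.20, 0.02, {"IL1B", "S100A8"})
```

  • How many matched criteria are required, and how a sample matching criteria of both groups is handled, is left to the embodiment; the text above recites "one or more".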
  • a sample can be collected with a nasal swab, endoscopy, polyester tipped swabs, plastic curettes, or cytology brushes (Lai PS, et al. J Allergy Clin Immunol. 2015; 136(4)).
  • Tissue samples for diagnosis, prognosis or detecting may be obtained by endoscopy.
  • a sample may be obtained by endoscopy and analyzed by FACS.
  • endoscopy refers to a procedure that uses an endoscope to examine the interior of a hollow organ or cavity of the body.
  • the endoscope may include a camera and a light source.
  • the endoscope may include tools for dissection or for obtaining a biological sample.
  • a cutting tool can be attached to the end of the endoscope, and the apparatus can then be used to perform surgery.
  • Applications of endoscopy that can be used with the present invention include, but are not limited to examination of the oesophagus, stomach and duodenum (esophagogastroduodenoscopy); small intestine (enteroscopy); large intestine/colon (colonoscopy, sigmoidoscopy); bile duct; rectum (rectoscopy) and anus (anoscopy), both also referred to as (proctoscopy); respiratory tract; nose (rhinoscopy); lower respiratory tract (bronchoscopy); ear (otoscope); urinary tract (cystoscopy); female reproductive system (gynoscopy); cervix (colposcopy); uterus (hysteroscopy); fallopian tubes (falloposcopy); normally closed body cavities (through a small incision); abdominal or pelvic cavity (laparoscopy); interior of a joint (arthroscopy); or
  • nasopharyngeal samples are collected by a trained healthcare provider using FLOQSwabs (Copan 1109 flocked swabs) following the manufacturer's instructions.
  • Collectors don personal protective equipment (PPE), including a gown, non-sterile gloves, a protective N95 mask, a bouffant, and a face shield. The patient's head is tilted back slightly, and the swab is inserted along the nasal septum, above the floor of the nasal passage to the nasopharynx until slight resistance is felt. The swab is then left in place for several seconds to absorb secretions and is slowly removed while rotating the swab.
  • the swab is then placed into a cryogenic vial with 900 μL of heat inactivated fetal bovine serum (FBS) and 100 μL of dimethyl sulfoxide (DMSO).
  • Vials are placed into a Mr. Frosty Freezing Container (Thermo Fisher Scientific) for optimal cell preservation.
  • a Mr. Frosty containing the vials is placed in a cooler with dry ice for transportation from patient areas to the laboratory for processing. Once in the laboratory, the Mr. Frosty is placed into a -80°C freezer overnight, and on the next day, the vials are moved to liquid nitrogen storage containers.
  • swabs in freezing media (90% FBS/10% DMSO) were stored in liquid nitrogen until immediately prior to dissociation. This approach ensures that all cells and cellular material from the nasal swab (whether directly attached to the nasal swab, or released during the washing and digestion process), are exposed first to DTT for 15 minutes, followed by an Accutase digestion for 30 minutes. Briefly, nasal swabs in freezing media were thawed, and each swab was rinsed in RPMI before incubation in 1 mL RPMI/10 mM DTT (Sigma) for 15 minutes at 37°C with agitation.
  • the nasal swab was incubated in 1 mL Accutase (Sigma) for 30 minutes at 37°C with agitation.
  • the 1 mL RPMI/10 mM DTT from the nasal swab incubation was centrifuged at 400 g for 5 minutes at 4°C to pellet cells, the supernatant was discarded, and the cell pellet was resuspended in 1 mL Accutase and incubated for 30 minutes at 37°C with agitation.
  • the original cryovial containing the freezing media and the original swab washings were combined and centrifuged at 400 g for 5 minutes at 4°C.
  • the cell pellet was then resuspended in RPMI/10 mM DTT, and incubated for 15 minutes at 37°C with agitation, centrifuged as above, the supernatant was aspirated, and the cell pellet was resuspended in 1 mL Accutase, and incubated for 30 minutes at 37°C with agitation. All cells were combined following Accutase digestion and filtered using a 70 μm nylon strainer. The filter and swab were washed with RPMI/10% FBS/4 mM EDTA, and all washings combined.
  • Dissociated, filtered cells were centrifuged at 400 g for 10 minutes at 4°C, and resuspended in 200 μL RPMI/10% FBS for counting. Cells were diluted to 20,000 cells in 200 μL for scRNA-seq. For the majority of swabs, fewer than 20,000 cells total were recovered. In these instances, all cells were input into scRNA-seq.
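  • By way of non-limiting illustration only, the loading rule described above (dilute to 20,000 cells in 200 μL, or input all cells when fewer than 20,000 were recovered) can be expressed as the following calculation. The function name and the returned fields are hypothetical; the measured concentration is assumed to come from the cell count performed after resuspension in 200 μL.

```python
def scrna_seq_input(cells_per_ul, suspension_volume_ul=200.0,
                    target_cells=20000, target_volume_ul=200.0):
    """Volume of cell suspension and diluent to load for scRNA-seq.

    If fewer than target_cells were recovered in total, the whole suspension
    is used without dilution.
    """
    total_cells = cells_per_ul * suspension_volume_ul
    if total_cells <= target_cells:
        return {
            "cells_loaded": total_cells,
            "suspension_ul": suspension_volume_ul,
            "diluent_ul": 0.0,
        }
    suspension_ul = target_cells / cells_per_ul
    return {
        "cells_loaded": float(target_cells),
        "suspension_ul": suspension_ul,
        "diluent_ul": target_volume_ul - suspension_ul,
    }


# Hypothetical counts: 250 cells/uL (50,000 cells recovered) and 50 cells/uL (10,000 cells).
plan_high = scrna_seq_input(250.0)
plan_low = scrna_seq_input(50.0)
```

  • With 250 cells/μL, 80 μL of suspension is combined with 120 μL of diluent; with 50 cells/μL, the full 200 μL containing 10,000 cells is used.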
  • the signature genes, biomarkers, and/or cells may be detected by immunofluorescence, immunohistochemistry (IHC), fluorescence activated cell sorting (FACS), mass spectrometry (MS), mass cytometry (CyTOF), RNA-seq, single cell RNA-seq (described further herein), quantitative RT-PCR, single cell qPCR, FISH, RNA-FISH, MERFISH (multiplex (in situ) RNA FISH) (Chen et al., Spatially resolved, highly multiplexed RNA profiling in single cells.
  • detection may comprise primers and/or probes or fluorescently bar-coded oligonucleotide probes for hybridization to RNA (see e.g., Geiss GK, et al., Direct multiplexed measurement of gene expression with color-coded probe pairs. Nat Biotechnol. 2008 Mar;26(3):317-25).
  • a tissue sample may be obtained and analyzed for specific cell markers (IHC) or specific transcripts (e.g., RNA-FISH).
  • Tissue samples for diagnosis, prognosis or detecting may be obtained by endoscopy.
  • a sample may be obtained by endoscopy and analyzed by FACS.
  • endoscopy refers to a procedure that uses an endoscope to examine the interior of a hollow organ or cavity of the body.
  • the endoscope may include a camera and a light source.
  • the endoscope may include tools for dissection or for obtaining a biological sample (e.g., a biopsy).
  • the present invention also may comprise a kit with a detection reagent that binds to one or more biomarkers or can be used to detect one or more biomarkers.
  • Immunoassay methods are based on the reaction of an antibody to its corresponding target or analyte and can detect the analyte in a sample depending on the specific assay format.
  • monoclonal antibodies are often used because of their specific epitope recognition.
  • Polyclonal antibodies have also been successfully used in various immunoassays because of their increased affinity for the target as compared to monoclonal antibodies.
  • Immunoassays have been designed for use with a wide range of biological sample matrices.
  • Immunoassay formats have been designed to provide qualitative, semi-quantitative, and quantitative results.
  • Quantitative results may be generated through the use of a standard curve created with known concentrations of the specific analyte to be detected.
  • the response or signal from an unknown sample is plotted onto the standard curve, and a quantity or value corresponding to the target in the unknown sample is established.
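  • By way of non-limiting illustration only, reading an unknown sample off a standard curve can be sketched as below. Piecewise-linear interpolation between neighbouring standards is used here purely for simplicity and is an assumption; immunoassay standard curves are frequently fitted with non-linear models instead. The function name and the example values are hypothetical.

```python
def interpolate_concentration(signal, standards):
    """Estimate analyte concentration from a signal using a standard curve.

    `standards` is a list of (concentration, signal) pairs measured from
    known concentrations; signal is assumed to increase with concentration.
    """
    points = sorted(standards, key=lambda pair: pair[1])
    for (c0, s0), (c1, s1) in zip(points, points[1:]):
        if s0 <= signal <= s1:
            if s1 == s0:
                return c0
            # Linear interpolation between the two bracketing standards.
            return c0 + (signal - s0) * (c1 - c0) / (s1 - s0)
    raise ValueError("signal outside the range of the standard curve")


# Hypothetical standards: (concentration, measured signal).
standards = [(0.0, 0.05), (10.0, 0.25), (50.0, 1.05), (100.0, 2.05)]
estimate = interpolate_concentration(0.65, standards)
```

  • Signals outside the range spanned by the standards raise an error rather than being extrapolated, since the curve is not characterized there.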
  • ELISA or EIA can be quantitative for the detection of an analyte/biomarker. This method relies on attachment of a label to either the analyte or the antibody and the label component includes, either directly or indirectly, an enzyme. ELISA tests may be formatted for direct, indirect, competitive, or sandwich detection of the analyte. Other methods rely on labels such as, for example, radioisotopes (¹²⁵I) or fluorescence.
  • Additional techniques include, for example, agglutination, nephelometry, turbidimetry, Western blot, immunoprecipitation, immunocytochemistry, immunohistochemistry, flow cytometry, Luminex assay, and others (see ImmunoAssay: A Practical Guide, edited by Brian Law, published by Taylor & Francis, Ltd., 2005 edition).
  • Exemplary assay formats include enzyme-linked immunosorbent assay (ELISA), radioimmunoassay, fluorescent, chemiluminescence, and fluorescence resonance energy transfer (FRET) or time resolved-FRET (TR-FRET) immunoassays.
  • methods of detecting biomarkers include biomarker immunoprecipitation followed by quantitative methods that allow size and peptide level discrimination, such as gel electrophoresis, capillary electrophoresis, planar electrochromatography, and the like.
  • Methods of detecting and/or quantifying a detectable label or signal generating material depend on the nature of the label.
  • the products of reactions catalyzed by appropriate enzymes can be, without limitation, fluorescent, luminescent, or radioactive or they may absorb visible or ultraviolet light.
  • detectors suitable for detecting such detectable labels include, without limitation, x-ray film, radioactivity counters, scintillation counters, spectrophotometers, colorimeters, fluorometers, luminometers, and densitometers.
  • Any of the methods for detection can be performed in any format that allows for any suitable preparation, processing, and analysis of the reactions. This can be, for example, in multi-well assay plates (e.g., 96 wells or 384 wells) or using any suitable array or microarray. Stock solutions for various agents can be made manually or robotically, and all subsequent pipetting, diluting, mixing, distribution, washing, incubating, sample readout, data collection and analysis can be done robotically using commercially available analysis software, robotics, and detection instrumentation capable of detecting a detectable label.
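For robotic handling of multi-well plates, a well index usually has to be mapped to a plate coordinate. The sketch below assumes the conventional layouts of 8 x 12 wells for a 96-well plate and 16 x 24 wells for a 384-well plate, with row-major, zero-based indexing; the function name is hypothetical.

```python
import string

# Conventional plate layouts as (rows, columns); assumed, not from the source.
PLATE_SHAPES = {96: (8, 12), 384: (16, 24)}

def well_name(index, n_wells=96):
    """Map a zero-based, row-major well index to a label such as 'A1'."""
    rows, cols = PLATE_SHAPES[n_wells]
    if not 0 <= index < rows * cols:
        raise IndexError("well index out of range for this plate")
    row, col = divmod(index, cols)
    return f"{string.ascii_uppercase[row]}{col + 1}"
```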
  • Such applications are hybridization assays in which a nucleic acid that displays “probe” nucleic acids for each of the genes to be assayed/profiled in the profile to be generated is employed.
  • a sample of target nucleic acids is first prepared from the initial nucleic acid sample being assayed, where preparation may include labeling of the target nucleic acids with a label, e.g., a member of a signal producing system.
  • the sample is contacted with the array under hybridization conditions, whereby complexes are formed between target nucleic acids that are complementary to probe sequences attached to the array surface.
  • the presence of hybridized complexes is then detected, either qualitatively or quantitatively.
  • an array of “probe” nucleic acids that includes a probe for each of the biomarkers whose expression is being assayed is contacted with target nucleic acids as described above. Contact is carried out under hybridization conditions, e.g., stringent hybridization conditions as described above, and unbound nucleic acid is then removed.
  • the resultant pattern of hybridized nucleic acids provides information regarding expression for each of the biomarkers that have been probed, where the expression information is in terms of whether or not the gene is expressed and, typically, at what level, where the expression data, i.e., expression profile, may be both qualitative and quantitative.
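An expression profile that is both qualitative (expressed or not) and quantitative (at what level) might be summarized as in the sketch below. The background-ratio cutoff, the function name, and the gene labels are all hypothetical assumptions made for illustration; the disclosure does not specify a threshold.

```python
def call_expression(signals, background, min_ratio=2.0):
    """Summarize array signals as a qualitative and quantitative profile.

    signals:    {biomarker: hybridization signal} (hypothetical values)
    background: signal measured where no target is hybridized
    min_ratio:  assumed cutoff; a probe counts as "expressed" when its
                signal is at least min_ratio times the background
    """
    if background <= 0:
        raise ValueError("background must be positive")
    profile = {}
    for gene, value in signals.items():
        ratio = value / background
        profile[gene] = {"expressed": ratio >= min_ratio, "level": ratio}
    return profile

# Hypothetical signals for two probes on the array.
profile = call_expression({"GENE_A": 1200.0, "GENE_B": 90.0}, background=100.0)
```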
  • Optimal hybridization conditions will depend on the length (e.g., oligomer vs. polynucleotide greater than 200 bases) and type (e.g., RNA, DNA, PNA) of labeled probe and immobilized polynucleotide or oligonucleotide.
  • General parameters for specific (i.e., stringent) hybridization conditions for nucleic acids are described in Sambrook et al., supra, and in Ausubel et al., “Current Protocols in Molecular Biology”, Greene Publishing and Wiley-Interscience, NY (1987), which is incorporated in its entirety for all purposes.
  • hybridization conditions are hybridization in 5×SSC plus 0.2% SDS at 65°C for 4 hours followed by washes at 25°C in low stringency wash buffer (1×SSC plus 0.2% SDS) followed by 10 minutes at 25°C in high stringency wash buffer (0.1×SSC plus 0.2% SDS) (see Schena et al., Proc. Natl. Acad. Sci. USA, Vol. 93, p. 10614 (1996)).
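The hybridization and wash buffers named above differ only in SSC strength, so their preparation reduces to the dilution relation C1·V1 = C2·V2. The sketch below assumes 20×SSC and 10% SDS stock solutions, which are common laboratory stocks but are not specified in the source; the function name is hypothetical.

```python
def buffer_recipe(final_ml, ssc_x, sds_pct, ssc_stock_x=20.0, sds_stock_pct=10.0):
    """Volumes of stock solutions and water for an SSC/SDS buffer.

    Applies C1*V1 = C2*V2 to each component. The default stock
    concentrations (20x SSC, 10% SDS) are assumptions.
    """
    ssc_ml = final_ml * ssc_x / ssc_stock_x
    sds_ml = final_ml * sds_pct / sds_stock_pct
    water_ml = final_ml - ssc_ml - sds_ml
    if water_ml < 0:
        raise ValueError("stocks are too dilute for the requested buffer")
    return {"ssc_stock_ml": ssc_ml, "sds_stock_ml": sds_ml, "water_ml": water_ml}

# 100 mL each of the hybridization, low stringency, and high stringency buffers.
hyb = buffer_recipe(100.0, 5.0, 0.2)
low = buffer_recipe(100.0, 1.0, 0.2)
high = buffer_recipe(100.0, 0.1, 0.2)
```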
  • Useful hybridization conditions are also provided in, e.g., Tijssen, “Hybridization With Nucleic Acid Probes”, Elsevier Science Publishers B.V. (1993) and Kricka, “Nonisotopic DNA Probe Techniques”, Academic Press, San Diego, Calif. (1992).
  • the invention involves single cell RNA sequencing (see, e.g., Kalisky, T., Blainey, P. & Quake, S. R. Genomic Analysis at the Single-Cell Level. Annual review of genetics 45, 431-445, (2011); Kalisky, T. & Quake, S. R. Single-cell genomics. Nature Methods 8, 311-314 (2011); Islam, S. et al. Characterization of the single-cell transcriptional landscape by highly multiplex RNA-seq. Genome Research, (2011); Tang, F. et al. RNA-Seq analysis to capture the transcriptome landscape of a single cell. Nature Protocols 5, 516-535, (2010); Tang, F. et al.
  • the invention involves plate based single cell RNA sequencing (see, e.g., Picelli, S. et al., 2014, “Full-length RNA-seq from single cells using Smart-seq2” Nature protocols 9, 171-181, doi:10.1038/nprot.2014.006).
  • the invention involves high-throughput single-cell RNA-seq.
  • Macosko et al. 2015, “Highly Parallel Genome-wide Expression Profiling of Individual Cells Using Nanoliter Droplets” Cell 161, 1202-1214; International patent application number PCT/US2015/049178, published as WO2016/040476 on March 17, 2016; Klein et al., 2015, “Droplet Barcoding for Single-Cell Transcriptomics Applied to Embryonic Stem Cells” Cell 161, 1187-1201; International patent application number PCT/US2016/027734, published as WO2016168584A1 on October 20, 2016; Zheng, et al., 2016, “Haplotyping germline and cancer genomes with high-throughput linked-read sequencing” Nature Biotechnology 34, 303-311; Zheng, et al., 2017, “Massively parallel digital transcriptional profiling of single cells” Nat.
  • the invention involves single nucleus RNA sequencing.
  • Swiech et al., 2014 “In vivo interrogation of gene function in the mammalian brain using CRISPR-Cas9” Nature Biotechnology Vol. 33, pp. 102-106; Habib et al., 2016, “Div-Seq: Single-nucleus RNA-Seq reveals dynamics of rare adult newborn neurons” Science, Vol. 353, Issue 6302, pp. 925-928; Habib et al., 2017, “Massively parallel single-nucleus RNA-seq with DroNc-seq” Nat Methods. 2017 Oct;14(10):955-958; International Patent Application No.
  • Biomarker detection may also be evaluated using mass spectrometry methods.
  • a variety of configurations of mass spectrometers can be used to detect biomarker values.
  • Several types of mass spectrometers are available or can be produced with various configurations.
  • a mass spectrometer has the following major components: a sample inlet, an ion source, a mass analyzer, a detector, a vacuum system, an instrument-control system, and a data system. Differences in the sample inlet, ion source, and mass analyzer generally define the type of instrument and its capabilities.
  • an inlet can be a capillary-column liquid chromatography source or can be a direct probe or stage such as used in matrix-assisted laser desorption.
  • Common ion sources are, for example, electrospray, including nanospray and microspray or matrix-assisted laser desorption.
  • Common mass analyzers include a quadrupole mass filter, ion trap mass analyzer and time-of-flight mass analyzer. Additional mass spectrometry methods are well known in the art (see Burlingame et al., Anal. Chem. 70:647R-716R (1998); Kinter and Sherman, New York (2000)).
  • Protein biomarkers and biomarker values can be detected and measured by any of the following: electrospray ionization mass spectrometry (ESI-MS), ESI-MS/MS, ESI-MS/(MS)n, matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF-MS), surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF-MS), desorption/ionization on silicon (DIOS), secondary ion mass spectrometry (SIMS), quadrupole time-of-flight (Q-TOF), tandem time-of-flight (TOF/TOF) technology, called ultraflex III TOF/TOF, atmospheric pressure chemical ionization mass spectrometry (APCI-MS), APCI-MS/MS, APCI-(MS)n, atmospheric pressure photoionization mass spectrometry (APPI-MS), APPI-MS
  • Sample preparation strategies are used to label and enrich samples before mass spectroscopic characterization of protein biomarkers and determination of biomarker values.
  • Labeling methods include but are not limited to isobaric tag for relative and absolute quantitation (iTRAQ) and stable isotope labeling with amino acids in cell culture (SILAC).
  • Capture reagents used to selectively enrich samples for candidate biomarker proteins prior to mass spectroscopic analysis include but are not limited to aptamers, antibodies, nucleic acid probes, chimeras, small molecules, an F(ab')2 fragment, a single chain antibody fragment, an Fv fragment, a single chain Fv fragment, a nucleic acid, a lectin, a ligand-binding receptor, affibodies, nanobodies, ankyrins, domain antibodies, alternative antibody scaffolds (e.g.
  • the methods of the present invention are used to select a treatment within the current standard of care and provide for less toxicity and improved treatment.
  • standard of care refers to the current treatment that is accepted by medical experts as a proper treatment for a certain type of disease and that is widely used by healthcare professionals. Standard of care is also called best practice, standard medical care, and standard therapy.
  • a subject having a mild or moderate phenotype will recover without any treatment.
  • a subject having a severe phenotype requires treatment in order to recover.
  • severe subjects or subjects at risk for severe disease as determined by detecting cell subsets and/or differentially expressed genes are treated with one or more agents as described further herein.
  • subjects already suffering from severe disease are treated.
  • subjects at risk for severe disease are treated.
  • the treatment results in induction of a phenotype identified in mild/moderate subjects (e.g., antiviral response).
  • treatment or “treating,” or “palliating” or “ameliorating” are used interchangeably. These terms refer to an approach for obtaining beneficial or desired results including but not limited to a therapeutic benefit and/or a prophylactic benefit.
  • therapeutic benefit is meant any therapeutically relevant improvement in or effect on one or more diseases, conditions, or symptoms under treatment.
  • the compositions may be administered to a subject at risk of developing a particular disease, condition, or symptom, or to a subject reporting one or more of the physiological symptoms of a disease, even though the disease, condition, or symptom may not have yet been manifested.
  • “treating” includes ameliorating, curing, preventing it from becoming worse, slowing the rate of progression, or preventing the disorder from re-occurring (i.e., to prevent a relapse).
  • the therapeutic agents are administered in an effective amount or therapeutically effective amount.
  • effective amount or “therapeutically effective amount” refers to the amount of an agent that is sufficient to effect beneficial or desired results.
  • the therapeutically effective amount may vary depending upon one or more of: the subject and disease condition being treated, the weight and age of the subject, the severity of the disease condition, the manner of administration and the like, which can readily be determined by one of ordinary skill in the art.
  • the term also applies to a dose that will provide an image for detection by any one of the imaging methods described herein.
  • the specific dose may vary depending on one or more of: the particular agent chosen, the dosing regimen to be followed, whether it is administered in combination with other compounds, timing of administration, the tissue to be imaged, and the physical delivery system in which it is carried.
  • the present invention provides for one or more therapeutic agents capable of shifting a phenotype as described herein. In certain embodiments, the present invention provides for one or more therapeutic agents against one or more of the targets identified. In certain embodiments, the one or more agents comprises a small molecule inhibitor, small molecule degrader (e.g., ATTEC, AUTAC, LYTAC, or PROTAC), genetic modifying agent, antibody, antibody fragment, antibody-like protein scaffold, aptamer, protein, or any combination thereof.
  • therapeutic agent refers to a molecule or compound that confers some beneficial effect upon administration to a subject.
  • the beneficial effect includes enablement of diagnostic determinations; amelioration of a disease, symptom, disorder, or pathological condition; reducing or preventing the onset of a disease, symptom, disorder or condition; and generally counteracting a disease, symptom, disorder or pathological condition.
  • an agent against one of the targets is used in combination with a treatment already known or used clinically.
  • targeting the combination may require less of the agent as compared to the current standard of care and provide for less toxicity and improved treatment.
  • the one or more agent is an antiviral.
  • an antiviral inhibits viral replication.
  • the antiviral is Paxlovid.
  • EUA emergency use authorization
  • Pfizer's Paxlovid (nirmatrelvir tablets and ritonavir tablets, co-packaged for oral use) for the treatment of mild-to-moderate coronavirus disease (COVID-19) in adults and pediatric patients (12 years of age and older weighing at least 40 kilograms or about 88 pounds) with positive results of direct SARS-CoV-2 testing, and who are at high risk for progression to severe COVID-19, including hospitalization or death (Paxlovid EUA Letter of Authorization issued December 22, 2021).
  • the antiviral is molnupiravir.
  • the U.S. Food and Drug Administration issued an emergency use authorization (EUA) for Merck's molnupiravir for the treatment of mild-to-moderate coronavirus disease (COVID-19) in adults with positive results of direct SARS-CoV-2 viral testing, and who are at high risk for progression to severe COVID-19, including hospitalization or death, and for whom alternative COVID-19 treatment options authorized by the FDA are not accessible or clinically appropriate (Molnupiravir EUA Letter of Authorization issued February 11, 2022).
  • the antiviral is remdesivir.
  • the one or more agent is immune-based therapy.
  • the immune-based therapy is a blood-derived product.
  • the blood-derived product is convalescent plasma.
  • the blood-derived product is immunoglobulin.
  • the immune-based therapy is immunoglobulin.
  • the immune-based therapy is one or more of: a corticosteroid, a glucocorticoid, an interferon, an interferon Type I agonist, an interleukin-1 inhibitor, an interleukin-6 inhibitor, a kinase inhibitor, and a TLR agonist.
  • the corticosteroid comprises at least one of: methylprednisolone, hydrocortisone, and dexamethasone.
  • the glucocorticoid comprises at least one of: cortisone, prednisone, prednisolone, methylprednisolone, dexamethasone, betamethasone, triamcinolone, fludrocortisone acetate, deoxycorticosterone acetate, and hydrocortisone.
  • the interferon comprises at least one or more of: interferon beta-1b and interferon alpha-2b.
  • the interleukin-1 inhibitor comprises anakinra.
  • the interleukin-6 inhibitor comprises at least one or more of: anti-interleukin-6 receptor monoclonal antibodies and anti-interleukin-6 monoclonal antibody.
  • the anti-interleukin-6 receptor monoclonal antibody is tocilizumab.
  • the anti-interleukin-6 monoclonal antibody is siltuximab.
  • the kinase inhibitor comprises at least one or more of Bruton's tyrosine kinase inhibitor and Janus kinase inhibitor.
  • the Bruton's tyrosine kinase inhibitor comprises at least one or more of: acalabrutinib, ibrutinib, and zanubrutinib.
  • the Janus kinase inhibitor comprises at least one or more of: baricitinib, ruxolitinib and tofacitinib.
  • the TLR agonist comprises at least one or more of: imiquimod, BCG, and MPL.
  • the treatment comprises inhibiting cholesterol biosynthesis.
  • inhibiting cholesterol biosynthesis comprises administering HMG-CoA reductase inhibitors.
  • the HMG-CoA reductase inhibitor comprises at least one or more of: simvastatin, atorvastatin, lovastatin, pravastatin, fluvastatin, rosuvastatin, and pitavastatin.
  • the treatment comprises one or more agents capable of shifting epithelial cells to express an antiviral signature.
  • the treatment comprises one or more agents capable of suppressing a myeloid inflammatory response.
  • the one or more agent is an antibody.
  • an antibody targets one or more surface genes or polypeptides.
  • antibody is used interchangeably with the term “immunoglobulin” herein, and includes intact antibodies, fragments of antibodies, e.g., Fab, F(ab')2 fragments, and intact antibodies and fragments that have been mutated either in their constant and/or variable region (e.g., mutations to produce chimeric, partially humanized, or fully humanized antibodies, as well as to produce antibodies with a desired trait, e.g., enhanced binding and/or reduced FcR binding).
  • fragment refers to a part or portion of an antibody or antibody chain comprising fewer amino acid residues than an intact or complete antibody or antibody chain. Fragments can be obtained via chemical or enzymatic treatment of an intact or complete antibody or antibody chain. Fragments can also be obtained by recombinant means. Exemplary fragments include Fab, Fab', F(ab')2, Fabc, Fd, dAb, VHH and scFv and/or Fv fragments.
  • a preparation of antibody protein having less than about 50% of non-antibody protein (also referred to herein as a “contaminating protein”), or of chemical precursors, is considered to be “substantially free.” A preparation having less than about 40%, 30%, 20%, 10%, and more preferably 5% (by dry weight), of non-antibody protein, or of chemical precursors, is likewise considered to be substantially free.
  • the antibody protein or biologically active portion thereof is recombinantly produced, it is also preferably substantially free of culture medium, i.e., culture medium represents less than about 30%, preferably less than about 20%, more preferably less than about 10%, and most preferably less than about 5% of the volume or mass of the protein preparation.
  • antigen-binding fragment refers to a polypeptide fragment of an immunoglobulin or antibody that binds antigen or competes with intact antibody (i.e., with the intact antibody from which they were derived) for antigen binding (i.e., specific binding).
  • antibody encompasses any Ig class or any Ig subclass (e.g., the IgG1, IgG2, IgG3, and IgG4 subclasses of IgG) obtained from any source (e.g., humans and non-human primates, and in rodents, lagomorphs, caprines, bovines, equines, ovines, etc.).
  • Ig class or “immunoglobulin class”, as used herein, refers to the five classes of immunoglobulin that have been identified in humans and higher mammals, IgG, IgM, IgA, IgD, and IgE.
  • Ig subclass refers to the two subclasses of IgM (H and L), three subclasses of IgA (IgA1, IgA2, and secretory IgA), and four subclasses of IgG (IgG1, IgG2, IgG3, and IgG4) that have been identified in humans and higher mammals.
  • the antibodies can exist in monomeric or polymeric form; for example, IgM antibodies exist in pentameric form, and IgA antibodies exist in monomeric, dimeric or multimeric form.
  • IgG subclass refers to the four subclasses of immunoglobulin class IgG - IgG1, IgG2, IgG3, and IgG4 that have been identified in humans and higher mammals by the heavy chains of the immunoglobulins, γ1 - γ4, respectively.
  • single-chain immunoglobulin or “single-chain antibody” (used interchangeably herein) refers to a protein having a two-polypeptide chain structure consisting of a heavy and a light chain, said chains being stabilized, for example, by interchain peptide linkers, which has the ability to specifically bind antigen.
  • domain refers to a globular region of a heavy or light chain polypeptide comprising peptide loops (e.g., comprising 3 to 4 peptide loops) stabilized, for example, by β-pleated sheet and/or intrachain disulfide bond.
  • Domains are further referred to herein as “constant” or “variable”, based on the relative lack of sequence variation within the domains of various class members in the case of a “constant” domain, or the significant variation within the domains of various class members in the case of a “variable” domain.
  • Antibody or polypeptide “domains” are often referred to interchangeably in the art as antibody or polypeptide “regions”.
  • the “constant” domains of an antibody light chain are referred to interchangeably as “light chain constant regions”, “light chain constant domains”, “CL” regions or “CL” domains.
  • the “constant” domains of an antibody heavy chain are referred to interchangeably as “heavy chain constant regions”, “heavy chain constant domains”, “CH” regions or “CH” domains.
  • variable domains of an antibody light chain are referred to interchangeably as “light chain variable regions”, “light chain variable domains”, “VL” regions or “VL” domains.
  • variable domains of an antibody heavy chain are referred to interchangeably as “heavy chain variable regions”, “heavy chain variable domains”, “VH” regions or “VH” domains.
  • region can also refer to a part or portion of an antibody chain or antibody chain domain (e.g., a part or portion of a heavy or light chain or a part or portion of a constant or variable domain, as defined herein), as well as more discrete parts or portions of said chains or domains.
  • light and heavy chains or light and heavy chain variable domains include “complementarity determining regions” or “CDRs” interspersed among “framework regions” or “FRs”, as defined herein.
  • the term “conformation” refers to the tertiary structure of a protein or polypeptide (e.g., an antibody, antibody chain, domain or region thereof).
  • the phrase “light (or heavy) chain conformation” refers to the tertiary structure of a light (or heavy) chain variable region.
  • the phrase “antibody conformation” or “antibody fragment conformation” refers to the tertiary structure of an antibody or fragment thereof.
  • antibody-like protein scaffolds or “engineered protein scaffolds” broadly encompasses proteinaceous non-immunoglobulin specific-binding agents, typically obtained by combinatorial engineering (such as site-directed random mutagenesis in combination with phage display or other molecular selection techniques).
  • Such scaffolds are derived from robust and small soluble monomeric proteins (such as Kunitz inhibitors or lipocalins) or from a stably folded extra-membrane domain of a cell surface receptor (such as protein A, fibronectin or the ankyrin repeat).
  • Curr Opin Biotechnol 2007, 18:295-304 include without limitation affibodies, based on the Z-domain of staphylococcal protein A, a three-helix bundle of 58 residues providing an interface on two of its alpha-helices (Nygren, Alternative binding proteins: Affibody binding proteins developed from a small three-helix bundle scaffold. FEBS J 2008, 275:2668-2676); engineered Kunitz domains based on a small (ca.
  • anticalins derived from the lipocalins, a diverse family of eight-stranded beta-barrel proteins (ca. 180 residues) that naturally form binding sites for small ligands by means of four structurally variable loops at the open end, which are abundant in humans, insects, and many other organisms (Skerra, Alternative binding proteins: Anticalins — harnessing the structural plasticity of the lipocalin ligand pocket to engineer novel binding activities.
  • DARPins designed ankyrin repeat domains (166 residues), which provide a rigid interface arising from typically three repeated beta-turns
  • avimers multimerized LDLR-A module
  • avimers Silverman et al., Multivalent avimer proteins evolved by exon shuffling of a family of human receptor domains. Nat Biotechnol 2005, 23:1556-1561
  • cysteine-rich knottin peptides Kolmar, Alternative binding proteins: biological activity and therapeutic potential of cystine-knot miniproteins.
  • “Specific binding” of an antibody means that the antibody exhibits appreciable affinity for a particular antigen or epitope and, generally, does not exhibit significant cross reactivity. “Appreciable” binding includes binding with an affinity of at least 25 μM. Antibodies with affinities greater than 1 × 10⁷ M⁻¹ (or a dissociation coefficient of 1 μM or less or a dissociation coefficient of 1 nM or less) typically bind with correspondingly greater specificity.
  • antibodies of the invention bind with a range of affinities, for example, 100nM or less, 75nM or less, 50nM or less, 25nM or less, for example 10nM or less, 5nM or less, 1nM or less, or in embodiments 500pM or less, 100pM or less, 50pM or less or 25pM or less.
  • An antibody that “does not exhibit significant crossreactivity” is one that will not appreciably bind to an entity other than its target (e.g., a different epitope or a different molecule).
  • an antibody that specifically binds to a target molecule will appreciably bind the target molecule but will not significantly react with non-target molecules or peptides.
  • An antibody specific for a particular epitope will, for example, not significantly crossreact with remote epitopes on the same protein or peptide.
  • Specific binding can be determined according to any art-recognized means for determining such binding. Preferably, specific binding is determined according to Scatchard analysis and/or competitive binding assays.
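A Scatchard analysis plots bound/free ligand against bound ligand; for a single class of binding sites the relation B/F = (Bmax - B)/Kd is a straight line with slope -1/Kd and x-intercept Bmax. The sketch below fits that line by ordinary least squares. It is illustrative only: the single-site model, the function name, and the synthetic data are assumptions, and no experimental values from the disclosure are used.

```python
def scatchard_fit(bound, free):
    """Fit B/F = (Bmax - B)/Kd by ordinary least squares.

    bound, free: equal-length sequences of bound and free ligand
                 concentrations (same units).
    Returns (Kd, Bmax) in those units. Assumes one class of sites.
    """
    x = list(bound)
    y = [b / f for b, f in zip(bound, free)]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    kd = -1.0 / slope           # slope of the Scatchard line is -1/Kd
    bmax = -intercept / slope   # x-intercept of the line is Bmax
    return kd, bmax

# Synthetic single-site data generated from known (hypothetical) parameters.
true_kd, true_bmax = 5.0, 100.0
free = [0.5, 1.0, 2.0, 5.0, 10.0, 20.0, 50.0]
bound = [true_bmax * f / (true_kd + f) for f in free]
kd, bmax = scatchard_fit(bound, free)
```

Because the synthetic data follow the single-site model exactly, the fit recovers the parameters used to generate them; real binding data would scatter around the line.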
  • affinity refers to the strength of the binding of a single antigen-combining site with an antigenic determinant. Affinity depends on the closeness of stereochemical fit between antibody combining sites and antigen determinants, on the size of the area of contact between them, on the distribution of charged and hydrophobic groups, etc. Antibody affinity can be measured by equilibrium dialysis or by the kinetic BIACORE™ method. The dissociation constant, Kd, and the association constant, Ka, are quantitative measures of affinity.
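For a simple one-to-one binding equilibrium, Kd and Ka are reciprocals (Kd = 1/Ka), so an affinity quoted in M⁻¹ can be compared directly with a threshold quoted in molar units. The helper names below are hypothetical, and the sketch assumes that simple reciprocal relationship.

```python
def kd_from_ka(ka_per_molar):
    """Dissociation constant (M) from association constant (1/M)."""
    if ka_per_molar <= 0:
        raise ValueError("Ka must be positive")
    return 1.0 / ka_per_molar

def meets_affinity(kd_molar, max_kd_molar):
    """True when binding is at least as tight as the stated Kd limit."""
    return kd_molar <= max_kd_molar

# An association constant of 1e7 per molar corresponds to a Kd of 1e-7 M.
kd = kd_from_ka(1e7)
kd_nanomolar = kd * 1e9
```

A smaller Kd means tighter binding, which is why the comparison in `meets_affinity` uses "less than or equal to".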
  • the term “monoclonal antibody” refers to an antibody derived from a clonal population of antibody-producing cells (e.g., B lymphocytes or B cells) which is homogeneous in structure and antigen specificity.
  • the term “polyclonal antibody” refers to a plurality of antibodies originating from different clonal populations of antibody-producing cells which are heterogeneous in their structure and epitope specificity, but which recognize a common antigen.
  • Monoclonal and polyclonal antibodies may exist within bodily fluids, as crude preparations, or may be purified, as described herein.
  • binding portion of an antibody includes one or more complete domains, e.g., a pair of complete domains, as well as fragments of an antibody that retain the ability to specifically bind to a target molecule. It has been shown that the binding function of an antibody can be performed by fragments of a full-length antibody. Binding fragments are produced by recombinant DNA techniques, or by enzymatic or chemical cleavage of intact immunoglobulins. Binding fragments include Fab, Fab', F(ab')2, Fabc, Fd, dAb, Fv, single chains, single-chain antibodies, e.g., scFv, and single domain antibodies.
  • “Humanized” forms of non-human (e.g., murine) antibodies are chimeric antibodies that contain minimal sequence derived from non-human immunoglobulin.
  • humanized antibodies are human immunoglobulins (recipient antibody) in which residues from a hypervariable region of the recipient are replaced by residues from a hypervariable region of a non-human species (donor antibody) such as mouse, rat, rabbit, or nonhuman primate having the desired specificity, affinity, and capacity.
  • FR residues of the human immunoglobulin are replaced by corresponding non-human residues.
  • humanized antibodies may comprise residues that are not found in the recipient antibody or in the donor antibody. These modifications are made to further refine antibody performance.
  • the humanized antibody will comprise substantially all of at least one, and typically two, variable domains, in which all or substantially all of the hypervariable regions correspond to those of a nonhuman immunoglobulin and all or substantially all of the FR regions are those of a human immunoglobulin sequence.
  • the humanized antibody optionally also will comprise at least a portion of an immunoglobulin constant region (Fc), typically that of a human immunoglobulin.
  • portions of antibodies or epitope-binding proteins encompassed by the present definition include: (i) the Fab fragment, having VL, CL, VH and CH1 domains; (ii) the Fab' fragment, which is a Fab fragment having one or more cysteine residues at the C-terminus of the CH1 domain; (iii) the Fd fragment having VH and CH1 domains; (iv) the Fd' fragment having VH and CH1 domains and one or more cysteine residues at the C-terminus of the CH1 domain; (v) the Fv fragment having the VL and VH domains of a single arm of an antibody; (vi) the dAb fragment (Ward et al., 341 Nature 544 (1989)) which consists of a VH domain or a VL domain that binds antigen; (vii) isolated CDR regions or isolated CDR regions presented in a functional framework; (viii) F(ab')2 fragments which
  • a “blocking” antibody or an antibody “antagonist” is one which inhibits or reduces biological activity of the antigen(s) it binds (e.g., CD160).
  • the blocking antibodies or antagonist antibodies or portions thereof described herein completely inhibit the biological activity of the antigen(s).
  • Antibodies may act as agonists or antagonists of the recognized polypeptides.
  • the present invention includes antibodies which disrupt receptor/ligand interactions either partially or fully.
  • the invention features both receptor-specific antibodies and ligand-specific antibodies.
  • the invention also features receptor-specific antibodies which do not prevent ligand binding but prevent receptor activation.
  • Receptor activation (i.e., signaling) can be determined by techniques described herein or otherwise known in the art. For example, receptor activation can be determined by detecting the phosphorylation (e.g., tyrosine or serine/threonine) of the receptor or of one of its downstream substrates by immunoprecipitation followed by western blot analysis.
  • antibodies are provided that inhibit ligand activity or receptor activity by at least 95%, at least 90%, at least 85%, at least 80%, at least 75%, at least 70%, at least 60%, or at least 50% of the activity in absence of the antibody.
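  • The inhibition levels listed above can be checked against assay readings with a short calculation. The following is a minimal sketch, assuming that inhibition is expressed as the fractional reduction in activity relative to the activity measured in the absence of the antibody; the function names and the example readings are hypothetical and are not taken from this disclosure.

```python
from typing import List

# Percent levels listed in the text ("at least 95%, ... at least 50%").
INHIBITION_THRESHOLDS = (95, 90, 85, 80, 75, 70, 60, 50)


def percent_inhibition(activity_with_antibody: float,
                       activity_without_antibody: float) -> float:
    """Return inhibition as a percentage of the antibody-free activity.

    Assumption: inhibition = 100 * (1 - with / without).
    """
    if activity_without_antibody <= 0:
        raise ValueError("baseline activity must be positive")
    return 100.0 * (1.0 - activity_with_antibody / activity_without_antibody)


def thresholds_met(inhibition: float) -> List[int]:
    """List every 'at least N%' level that the measured inhibition satisfies."""
    return [t for t in INHIBITION_THRESHOLDS if inhibition >= t]


if __name__ == "__main__":
    # Made-up readings in arbitrary units: 120 with antibody, 1000 without.
    value = percent_inhibition(120.0, 1000.0)
    print(f"{value:.1f}% inhibition; thresholds met: {thresholds_met(value)}")
```

  • Under this assumed definition, a reading that falls from 1000 to 120 arbitrary units would meet the "at least 85%" level but not the "at least 90%" level.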
  • the invention also features receptor-specific antibodies which both prevent ligand binding and receptor activation as well as antibodies that recognize the receptor-ligand complex.
  • neutralizing antibodies which bind the ligand and prevent binding of the ligand to the receptor, as well as antibodies which bind the ligand, thereby preventing receptor activation, but do not prevent the ligand from binding the receptor.
  • antibodies which activate the receptor are also included in the invention. These antibodies may act as receptor agonists, i.e., potentiate or activate either all or a subset of the biological activities of the ligand-mediated receptor activation, for example, by inducing dimerization of the receptor.
  • the antibodies may be specified as agonists, antagonists or inverse agonists for biological activities comprising the specific biological activities of the peptides disclosed herein.
  • the antibody agonists and antagonists can be made using methods known in the art. See, e.g., PCT publication WO 96/40281; U.S. Pat. No. 5,811,097; Deng et al., Blood 92(6): 1981-1988 (1998); Chen et al., Cancer Res. 58(16):3668-3678 (1998); Harrop et al., J. Immunol. 161(4): 1786-1794 (1998); Zhu et al., Cancer Res. 58(15):3209-3214 (1998); Yoon et al., J.
  • the antibodies as defined for the present invention include derivatives that are modified, i.e., by the covalent attachment of any type of molecule to the antibody such that covalent attachment does not prevent the antibody from generating an anti-idiotypic response.
  • the antibody derivatives include antibodies that have been modified, e.g., by glycosylation, acetylation, pegylation, phosphorylation, amidation, derivatization by known protecting/blocking groups, proteolytic cleavage, linkage to a cellular ligand or other protein, etc. Any of numerous chemical modifications may be carried out by known techniques, including, but not limited to, specific chemical cleavage, acetylation, formylation, metabolic synthesis of tunicamycin, etc. Additionally, the derivative may contain one or more non-classical amino acids.
  • Simple binding assays can be used to screen for or detect agents that bind to a target protein, or disrupt the interaction between proteins (e.g., a receptor and a ligand). Because certain targets of the present invention are transmembrane proteins, assays that use the soluble forms of these proteins rather than full-length protein can be used, in some embodiments. Soluble forms include, for example, those lacking the transmembrane domain and/or those comprising the IgV domain or fragments thereof which retain their ability to bind their cognate binding partners. Further, agents that inhibit or enhance protein interactions for use in the compositions and methods described herein, can include recombinant peptido-mimetics.
  • Detection methods useful in screening assays include antibody-based methods, detection of a reporter moiety, detection of cytokines as described herein, and detection of a gene signature as described herein.
  • affinity biosensor methods may be based on the piezoelectric effect, electrochemistry, or optical methods, such as ellipsometry, optical wave guidance, and surface plasmon resonance (SPR).
  • bispecific antibodies are used to target specific cell types (e.g., virally infected cells).
  • Bi-specific antigen-binding constructs, e.g., bi-specific antibodies (bsAb) or BiTEs, bind two antigens (see, e.g., Suurs et al., A review of bispecific antibodies and antibody constructs in oncology and clinical challenges. Pharmacol Ther. 2019 Sep;201:103-119; and Huehls, et al., Bispecific T cell engagers for cancer immunotherapy. Immunol Cell Biol. 2015 Mar; 93(3): 290-296).
  • the bi-specific antigen-binding construct includes two antigen-binding polypeptide constructs, e.g., antigen binding domains.
  • the antigen-binding construct is derived from known antibodies or antigen-binding constructs.
  • the antigen-binding polypeptide constructs comprise two antigen binding domains that comprise antibody fragments.
  • the first antigen binding domain and second antigen binding domain each independently comprises an antibody fragment selected from the group of: an scFv, a Fab, and an Fc domain.
  • the antibody fragments may be the same format or different formats from each other.
  • the antigen-binding polypeptide constructs comprise a first antigen binding domain comprising an scFv and a second antigen binding domain comprising a Fab.
  • the antigen-binding polypeptide constructs comprise a first antigen binding domain and a second antigen binding domain, wherein both antigen binding domains comprise an scFv.
  • the first and second antigen binding domains each comprise a Fab.
  • the first and second antigen binding domains each comprise an Fc domain. Any combination of antibody formats is suitable for the bi-specific antibody constructs disclosed herein.
  • the one or more agent is an aptamer.
  • Nucleic acid aptamers are nucleic acid species that have been engineered through repeated rounds of in vitro selection or equivalently, SELEX (systematic evolution of ligands by exponential enrichment) to bind to various molecular targets such as small molecules, proteins, nucleic acids, cells, tissues and organisms. Nucleic acid aptamers have specific binding affinity to molecules through interactions other than classic Watson-Crick base pairing. Aptamers are useful in biotechnological and therapeutic applications as they offer molecular recognition properties similar to antibodies.
  • RNA aptamers may be expressed from a DNA construct.
  • a nucleic acid aptamer may be linked to another polynucleotide sequence.
  • the polynucleotide sequence may be a double stranded DNA polynucleotide sequence.
  • the aptamer may be covalently linked to one strand of the polynucleotide sequence.
  • the aptamer may be ligated to the polynucleotide sequence.
  • the polynucleotide sequence may be configured, such that the polynucleotide sequence may be linked to a solid support or ligated to another polynucleotide sequence.
  • Aptamers, like peptides generated by phage display or monoclonal antibodies (“mAbs”), are capable of specifically binding to selected targets and modulating the target's activity, e.g., through binding, aptamers may block their target's ability to function.
  • a typical aptamer is 10-15 kDa in size (30-45 nucleotides), binds its target with sub-nanomolar affinity, and discriminates against closely related targets (e.g., aptamers will typically not bind other proteins from the same gene family).
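  • The correspondence between the stated mass range and nucleotide count can be checked with simple arithmetic. The sketch below assumes an average mass of roughly 330 Da per nucleotide, a common rule of thumb that is not stated in this disclosure; the actual mass depends on sequence, sugar chemistry, and any modifications.

```python
# Assumed rule-of-thumb average mass per nucleotide (not from the source text).
AVG_NUCLEOTIDE_MASS_DA = 330.0


def approx_aptamer_mass_kda(n_nucleotides: int,
                            avg_mass_da: float = AVG_NUCLEOTIDE_MASS_DA) -> float:
    """Estimate aptamer mass in kDa from its length in nucleotides."""
    if n_nucleotides <= 0:
        raise ValueError("length must be a positive number of nucleotides")
    return n_nucleotides * avg_mass_da / 1000.0


if __name__ == "__main__":
    for length in (30, 45):
        print(f"{length} nt is roughly {approx_aptamer_mass_kda(length):.2f} kDa")
```

  • With this assumed average, 30 and 45 nucleotides correspond to roughly 9.9 kDa and 14.9 kDa, consistent with the 10-15 kDa range given above.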
  • aptamers are capable of using the same types of binding interactions (e.g., hydrogen bonding, electrostatic complementarity, hydrophobic contacts, steric exclusion) that drive affinity and specificity in antibody-antigen complexes.
  • Aptamers have a number of desirable characteristics for use in research and as therapeutics and diagnostics including high specificity and affinity, biological efficacy, and excellent pharmacokinetic properties. In addition, they offer specific competitive advantages over antibodies and other protein biologics. Aptamers are chemically synthesized and are readily scaled as needed to meet production demand for research, diagnostic or therapeutic applications. Aptamers are chemically robust. They are intrinsically adapted to regain activity following exposure to factors such as heat and denaturants and can be stored for extended periods (>1 yr) at room temperature as lyophilized powders. Not being bound by a theory, aptamers bound to a solid support or beads may be stored for extended periods.
  • Oligonucleotides in their phosphodiester form may be quickly degraded by intracellular and extracellular enzymes such as endonucleases and exonucleases.
  • Aptamers can include modified nucleotides conferring improved characteristics on the ligand, such as improved in vivo stability or improved delivery characteristics. Examples of such modifications include chemical substitutions at the ribose and/or phosphate and/or base positions. SELEX identified nucleic acid ligands containing modified nucleotides are described, e.g., in U.S. Pat. No.
  • Modifications of aptamers may also include modifications at exocyclic amines, substitution of 4-thiouridine, substitution of 5-bromo or 5-iodo-uracil; backbone modifications, phosphorothioate or alkyl phosphate modifications, methylations, and unusual base-pairing combinations such as the isobases isocytidine and isoguanosine. Modifications can also include 3' and 5' modifications such as capping. As used herein, the term phosphorothioate encompasses one or more non-bridging oxygen atoms in a phosphodiester bond replaced by one or more sulfur atoms.
  • the oligonucleotides comprise modified sugar groups, for example, one or more of the hydroxyl groups is replaced with halogen, aliphatic groups, or functionalized as ethers or amines.
  • the 2'-position of the furanose residue is substituted by any of an O-methyl, O-alkyl, O-allyl, S-alkyl, S-allyl, or halo group.
  • aptamers include aptamers with improved off-rates as described in International Patent Publication No. WO 2009012418, “Method for generating aptamers with improved off-rates,” incorporated herein by reference in its entirety.
  • aptamers are chosen from a library of aptamers.
  • Such libraries include, but are not limited to, those described in Rohloff et al., “Nucleic Acid Ligands With Protein-like Side Chains: Modified Aptamers and Their Use as Diagnostic and Therapeutic Agents,” Molecular Therapy Nucleic Acids (2014) 3, e201. Aptamers are also commercially available (see, e.g., SomaLogic, Inc., Boulder, Colorado). In certain embodiments, the present invention may utilize any aptamer containing any modification as described herein.
Small Molecules
  • the one or more agents is a small molecule.
  • small molecule refers to compounds, preferably organic compounds, with a size comparable to those organic molecules generally used in pharmaceuticals.
  • Preferred small organic molecules range in size up to about 5000 Da, e.g., up to about 4000, preferably up to 3000 Da, more preferably up to 2000 Da, even more preferably up to about 1000 Da, e.g., up to about 900, 800, 700, 600 or up to about 500 Da.
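  • The nested size preferences above can be expressed as a simple lookup. The following is a minimal sketch that treats each "up to about" value as a hard cutoff, which is a simplifying assumption; the function name and the example masses are hypothetical.

```python
from typing import Optional

# Upper size limits (in Da) listed in the text, from tightest to loosest.
SIZE_LIMITS_DA = (500, 600, 700, 800, 900, 1000, 2000, 3000, 4000, 5000)


def tightest_size_limit(mass_da: float) -> Optional[int]:
    """Return the smallest listed limit that the mass does not exceed.

    Returns None when the mass is above the largest listed limit (5000 Da).
    Assumption: "up to about N Da" is read as mass <= N.
    """
    if mass_da <= 0:
        raise ValueError("mass must be positive")
    for limit in SIZE_LIMITS_DA:
        if mass_da <= limit:
            return limit
    return None


if __name__ == "__main__":
    for mass in (350.0, 1200.0, 6000.0):  # made-up example masses in Da
        print(mass, "->", tightest_size_limit(mass))
```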
  • the small molecule may act as an antagonist or agonist (e.g., blocking an enzyme active site or activating a receptor by binding to a ligand binding site).
  • One type of small molecule applicable to the present invention is a degrader molecule (see, e.g., Ding, et al., Emerging New Concepts of Degrader Technologies, Trends Pharmacol Sci. 2020 Jul;41(7):464-474).
  • the terms “degrader” and “degrader molecule” refer to all compounds capable of specifically targeting a protein for degradation (e.g., ATTEC, AUTAC, LYTAC, or PROTAC, reviewed in Ding, et al. 2020).
  • PROTAC stands for Proteolysis Targeting Chimera.
  • LYTACs are particularly advantageous for cell surface proteins as described herein (e.g., CD160).
  • the one or more modulating agents may be a genetic modifying agent.
  • the genetic modifying agents may manipulate nucleic acids (e.g., genomic DNA or mRNA).
  • the genetic modulating agent can be used to up- or downregulate expression of a gene by targeting a nuclease or functional domain to a DNA or RNA sequence.
  • the genetic modifying agent may comprise an RNA-guided nuclease system (e.g., CRISPR system), RNAi system, a zinc finger nuclease, a TALE, or a meganuclease.
  • one or more genes capable of shifting cell composition or cell states is modified by a genetic modifying agent (e.g., one or more genes in Tables 1-5).
  • a genetic modifying agent is used in subjects already having severe disease.
  • a polynucleotide of the present invention described elsewhere herein can be modified using a CRISPR-Cas and/or Cas-based system (e.g., genomic DNA or mRNA, preferably, for a disease gene).
  • the nucleotide sequence may be or encode one or more components of a CRISPR-Cas system.
  • the nucleotide sequences may be or encode guide RNAs.
  • the nucleotide sequences may also encode CRISPR proteins, variants thereof, or fragments thereof.
  • a CRISPR-Cas or CRISPR system refers collectively to transcripts and other elements involved in the expression of or directing the activity of CRISPR-associated (“Cas”) genes, including sequences encoding a Cas gene, a tracr (trans-activating CRISPR) sequence (e.g., tracrRNA or an active partial tracrRNA), a tracr-mate sequence (encompassing a “direct repeat” and a tracrRNA-processed partial direct repeat in the context of an endogenous CRISPR system), a guide sequence (also referred to as a “spacer” in the context of an endogenous CRISPR system), or “RNA(s)” as that term is herein used (e.g., RNA(s) to guide Cas, such as Cas9, e.g., CRISPR RNA and transactivating (tracr) RNA).
  • a CRISPR system is characterized by elements that promote the formation of a CRISPR complex at the site of a target sequence (also referred to as a protospacer in the context of an endogenous CRISPR system). See, e.g., Shmakov et al. (2015) “Discovery and Functional Characterization of Diverse Class 2 CRISPR-Cas Systems”, Molecular Cell, DOI: dx.doi.org/10.1016/j.molcel.2015.10.008.
  • CRISPR-Cas systems generally fall into two classes based on the architectures of their effector modules, and each class is further subdivided by type and subtype. The two classes are Class 1 and Class 2. Class 1 CRISPR-Cas systems have effector modules composed of multiple Cas proteins, some of which form crRNA-binding complexes, while Class 2 CRISPR-Cas systems include a single, multi-domain crRNA-binding protein.
  • the CRISPR-Cas system that can be used to modify a polynucleotide of the present invention described herein can be a Class 1 CRISPR-Cas system. In some embodiments, the CRISPR-Cas system that can be used to modify a polynucleotide of the present invention described herein can be a Class 2 CRISPR-Cas system.
  • Class 1 CRISPR-Cas systems are divided into Types I, III, and IV. Makarova et al. 2020. Nat. Rev. Microbiol. 18: 67-83, particularly as described in Figure 1.
  • Type I CRISPR-Cas systems are divided into 9 subtypes (I-A, I-B, I-C, I-D, I-E, I-F1, I-F2, I-F3, and I-G). Makarova et al., 2020.
  • Type I CRISPR-Cas systems can contain a Cas3 protein that can have helicase activity.
  • Type III CRISPR-Cas systems are divided into 6 subtypes (III-A, III-B, III-C, III-D, III-E, and III-F).
  • Type III CRISPR-Cas systems can contain a Cas10 that can include an RNA recognition motif called Palm and a cyclase domain that can cleave polynucleotides.
  • Type IV CRISPR-Cas systems are divided into 3 subtypes (IV-A, IV-B, and IV-C). Makarova et al., 2020.
  • Class 1 systems also include CRISPR-Cas variants, including Type I-A, I-B, I-E, I-F and I-U variants, which can include variants carried by transposons and plasmids, including versions of subtype I-F encoded by a large family of Tn7-like transposons and smaller groups of Tn7-like transposons that encode similarly degraded subtype I-B systems.
  • the Class 1 systems typically use a multi-protein effector complex, which can, in some embodiments, include ancillary proteins, such as one or more proteins in a complex referred to as a CRISPR-associated complex for antiviral defense (Cascade), one or more adaptation proteins (e.g., Cas1, Cas2, RNA nuclease), and/or one or more accessory proteins (e.g., Cas4, DNA nuclease), CRISPR associated Rossmann fold (CARF) domain containing proteins, and/or RNA transcriptase.
  • the backbone of the Class 1 CRISPR-Cas system effector complexes can be formed by RNA recognition motif domain-containing protein(s) of the repeat-associated mysterious proteins (RAMPs) family subunits (e.g., Cas5, Cas6, and/or Cas7).
  • RAMP proteins are characterized by having one or more RNA recognition motif domains. In some embodiments, multiple copies of RAMPs can be present.
  • the Class 1 CRISPR-Cas system can include 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 or more Cas5, Cas6, and/or Cas7 proteins.
  • the Cas6 protein is an RNase, which can be responsible for pre-crRNA processing. When present in a Class 1 CRISPR-Cas system, Cas6 can be optionally physically associated with the effector complex.
  • Class 1 CRISPR-Cas system effector complexes can, in some embodiments, also include a large subunit.
  • the large subunit can be composed of or include a Cas8 and/or Cas10 protein. See, e.g., Figures 1 and 2. Koonin EV, Makarova KS. 2019. Phil. Trans. R. Soc. B 374: 20180087, DOI: 10.1098/rstb.2018.0087 and Makarova et al. 2020.
  • Class 1 CRISPR-Cas system effector complexes can, in some embodiments, include a small subunit (for example, Cas11). See, e.g., Figures 1 and 2. Koonin EV, Makarova KS. 2019. Origins and Evolution of CRISPR-Cas systems. Phil. Trans. R. Soc. B 374: 20180087, DOI: 10.1098/rstb.2018.0087.
  • In some embodiments, the Class 1 CRISPR-Cas system can be a Type I CRISPR-Cas system. In some embodiments, the Type I CRISPR-Cas system can be a subtype I-A CRISPR-Cas system.
  • the Type I CRISPR-Cas system can be a subtype I-B CRISPR-Cas system. In some embodiments, the Type I CRISPR-Cas system can be a subtype I-C CRISPR-Cas system. In some embodiments, the Type I CRISPR-Cas system can be a subtype I-D CRISPR-Cas system. In some embodiments, the Type I CRISPR-Cas system can be a subtype I-E CRISPR-Cas system. In some embodiments, the Type I CRISPR-Cas system can be a subtype I-F1 CRISPR-Cas system.
  • the Type I CRISPR-Cas system can be a subtype I-F2 CRISPR-Cas system. In some embodiments, the Type I CRISPR-Cas system can be a subtype I-F3 CRISPR-Cas system. In some embodiments, the Type I CRISPR-Cas system can be a subtype I-G CRISPR-Cas system.
  • the Type I CRISPR-Cas system can be a CRISPR-Cas variant, such as a Type I-A, I-B, I-E, I-F, or I-U variant, which can include variants carried by transposons and plasmids, including versions of subtype I-F encoded by a large family of Tn7-like transposons and smaller groups of Tn7-like transposons that encode similarly degraded subtype I-B systems, as previously described.
  • the Class 1 CRISPR-Cas system can be a Type III CRISPR-Cas system.
  • the Type III CRISPR-Cas system can be a subtype III-A CRISPR-Cas system.
  • the Type III CRISPR-Cas system can be a subtype III-B CRISPR-Cas system.
  • the Type III CRISPR-Cas system can be a subtype III-C CRISPR-Cas system.
  • the Type III CRISPR-Cas system can be a subtype III-D CRISPR-Cas system. In some embodiments, the Type III CRISPR-Cas system can be a subtype III-E CRISPR-Cas system. In some embodiments, the Type III CRISPR-Cas system can be a subtype III-F CRISPR-Cas system.
  • the Class 1 CRISPR-Cas system can be a Type IV CRISPR-Cas system.
  • the Type IV CRISPR-Cas system can be a subtype IV-A CRISPR-Cas system.
  • the Type IV CRISPR-Cas system can be a subtype IV-B CRISPR-Cas system.
  • the Type IV CRISPR-Cas system can be a subtype IV-C CRISPR-Cas system.
  • the effector complex of a Class 1 CRISPR-Cas system can, in some embodiments, include a Cas3 protein that is optionally fused to a Cas2 protein, a Cas4, a Cas5, a Cas6, a Cas7, a Cas8, a Cas10, a Cas11, or a combination thereof.
  • the effector complex of a Class 1 CRISPR-Cas system can have multiple copies, such as 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, or 14, of any one or more Cas proteins.
  • the CRISPR-Cas system is a Class 2 CRISPR-Cas system.
  • Class 2 systems are distinguished from Class 1 systems in that they have a single, large, multi-domain effector protein.
  • the Class 2 system can be a Type II, Type V, or Type VI system, which are described in Makarova et al. “Evolutionary classification of CRISPR-Cas systems: a burst of class 2 and derived variants” Nature Reviews Microbiology, 18:67-83 (Feb 2020), incorporated herein by reference.
  • Each type of Class 2 system is further divided into subtypes. See Makarova et al. 2020, particularly at Figure 2.
  • Class 2 Type II systems can be divided into 4 subtypes: II-A, II-B, II-C1, and II-C2.
  • Class 2 Type V systems can be divided into 17 subtypes: V-A, V-B1, V-B2, V-C, V-D, V-E, V-F1, V-F1 (V-U3), V-F2, V-F3, V-G, V-H, V-I, V-K (V-U5), V-U1, V-U2, and V-U4.
  • Class 2 Type VI systems can be divided into 5 subtypes: VI-A, VI-B1, VI-B2, VI-C, and VI-D.
  • Type V systems differ from Type II effectors (e.g., Cas9), which contain two nuclease domains that are each responsible for the cleavage of one strand of the target DNA, with the HNH nuclease inserted inside the RuvC-like nuclease domain sequence.
  • the Type V systems (e.g., Cas12) only contain a RuvC-like nuclease domain that cleaves both strands.
  • Type VI (Cas13) effectors are unrelated to the effectors of Type II and V systems, contain two HEPN domains, and target RNA. Cas13 proteins also display collateral activity that is triggered by target recognition. Some Type V systems have also been found to possess this collateral activity with two single-stranded DNA in in vitro contexts.
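  • The class, type, and subtype hierarchy described in this section can be captured in a small lookup structure. The sketch below is illustrative only: the subtype lists are transcribed from the enumerations in this section (following Makarova et al. 2020 as cited), and the dictionary layout and the function name are hypothetical choices.

```python
from typing import Dict, List, Optional, Tuple

# Class -> Type -> subtypes, transcribed from the enumerations in this section.
CRISPR_CLASSIFICATION: Dict[str, Dict[str, List[str]]] = {
    "Class 1": {
        "Type I": ["I-A", "I-B", "I-C", "I-D", "I-E",
                   "I-F1", "I-F2", "I-F3", "I-G"],
        "Type III": ["III-A", "III-B", "III-C", "III-D", "III-E", "III-F"],
        "Type IV": ["IV-A", "IV-B", "IV-C"],
    },
    "Class 2": {
        "Type II": ["II-A", "II-B", "II-C1", "II-C2"],
        "Type V": ["V-A", "V-B1", "V-B2", "V-C", "V-D", "V-E", "V-F1",
                   "V-F1 (V-U3)", "V-F2", "V-F3", "V-G", "V-H", "V-I",
                   "V-K (V-U5)", "V-U1", "V-U2", "V-U4"],
        "Type VI": ["VI-A", "VI-B1", "VI-B2", "VI-C", "VI-D"],
    },
}


def classify(subtype: str) -> Optional[Tuple[str, str]]:
    """Return (class, type) for a subtype label, or None if it is not listed."""
    for class_name, types in CRISPR_CLASSIFICATION.items():
        for type_name, subtypes in types.items():
            if subtype in subtypes:
                return class_name, type_name
    return None


if __name__ == "__main__":
    for label in ("I-F2", "V-K (V-U5)", "VI-D"):
        print(label, "->", classify(label))
```

  • Class 1 entries correspond to multi-protein effector complexes and Class 2 entries to single, multi-domain effector proteins, as described above.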
  • the Class 2 system is a Type II system.
  • the Type II CRISPR-Cas system is a II-A CRISPR-Cas system.
  • the Type II CRISPR-Cas system is a II-B CRISPR-Cas system.
  • the Type II CRISPR-Cas system is a II-C1 CRISPR-Cas system.
  • the Type II CRISPR-Cas system is a II-C2 CRISPR-Cas system.
  • the Type II system is a Cas9 system.
  • the Type II system includes a Cas9.
  • the Class 2 system is a Type V system.
  • the Type V CRISPR-Cas system is a V-A CRISPR-Cas system.
  • the Type V CRISPR-Cas system is a V-B1 CRISPR-Cas system.
  • the Type V CRISPR-Cas system is a V-B2 CRISPR-Cas system.
  • the Type V CRISPR-Cas system is a V-C CRISPR-Cas system.
  • the Type V CRISPR-Cas system is a V-D CRISPR-Cas system.
  • the Type V CRISPR-Cas system is a V-E CRISPR-Cas system.
  • the Type V CRISPR-Cas system is a V-F1 CRISPR-Cas system.
  • the Type V CRISPR-Cas system is a V-F1 (V-U3) CRISPR-Cas system.
  • the Type V CRISPR-Cas system is a V-F2 CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system is a V-F3 CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system is a V-G CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system is a V-H CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system is a V-I CRISPR-Cas system.
  • the Type V CRISPR-Cas system is a V-K (V-U5) CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system is a V-U1 CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system is a V-U2 CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system is a V-U4 CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system includes a Cas12a (Cpf1), Cas12b (C2c1), Cas12c (C2c3), CasX, and/or Cas14.
  • the Class 2 system is a Type VI system.
  • the Type VI CRISPR-Cas system is a VI-A CRISPR-Cas system.
  • the Type VI CRISPR-Cas system is a VI-B1 CRISPR-Cas system. In some embodiments, the Type VI CRISPR-Cas system is a VI-B2 CRISPR-Cas system. In some embodiments, the Type VI CRISPR-Cas system is a VI-C CRISPR-Cas system. In some embodiments, the Type VI CRISPR-Cas system is a VI-D CRISPR-Cas system. In some embodiments, the Type VI CRISPR-Cas system includes a Cas13a (C2c2), Cas13b (Group 29/30), Cas13c, and/or Cas13d.
Specialized Cas-based Systems
  • the system is a Cas-based system that is capable of performing a specialized function or activity.
  • the Cas protein may be fused, operably coupled to, or otherwise associated with one or more functional domains.
  • the Cas protein may be a catalytically dead Cas protein (“dCas”) and/or have nickase activity.
  • a nickase is a Cas protein that cuts only one strand of a double stranded target.
  • the dCas or nickase provides a sequence-specific targeting functionality that delivers the functional domain to or proximate to a target sequence.
  • Example functional domains that may be fused to, operably coupled to, or otherwise associated with a Cas protein can be or include, but are not limited to, a nuclear localization signal (NLS) domain, a nuclear export signal (NES) domain, a translational activation domain, a transcriptional activation domain (e.g., VP64, p65, MyoD1, HSF1, RTA, and SET7/9), a translation initiation domain, a transcriptional repression domain (e.g., a KRAB domain, NuE domain, NcoR domain, and a SID domain such as a SID4X domain), a nuclease domain (e.g., FokI), a histone modification domain (e.g., a histone acetyltransferase), a light inducible/controllable domain, a chemically inducible/controllable domain, a transposase domain, a homologous recombination machinery domain, a recombinase domain, an integrase domain, and combinations thereof.
  • the functional domains can have one or more of the following activities: methylase activity, demethylase activity, translation activation activity, translation initiation activity, translation repression activity, transcription activation activity, transcription repression activity, transcription release factor activity, histone modification activity, nuclease activity, single-strand RNA cleavage activity, double-strand RNA cleavage activity, single-strand DNA cleavage activity, double-strand DNA cleavage activity, molecular switch activity, chemical inducibility, light inducibility, and nucleic acid binding activity.
  • the one or more functional domains may comprise epitope tags or reporters.
  • epitope tags include histidine (His) tags, V5 tags, FLAG tags, influenza hemagglutinin (HA) tags, Myc tags, VSV-G tags, and thioredoxin (Trx) tags.
  • reporters include, but are not limited to, glutathione-S-transferase (GST), horseradish peroxidase (HRP), chloramphenicol acetyltransferase (CAT), beta-galactosidase, beta-glucuronidase, luciferase, green fluorescent protein (GFP), HcRed, DsRed, cyan fluorescent protein (CFP), yellow fluorescent protein (YFP), and auto-fluorescent proteins including blue fluorescent protein (BFP).
  • the one or more functional domain(s) may be positioned at, near, and/or in proximity to a terminus of the effector protein (e.g., a Cas protein). In embodiments having two or more functional domains, each of the two can be positioned at or near or in proximity to a terminus of the effector protein (e.g., a Cas protein). In some embodiments, such as those where the functional domain is operably coupled to the effector protein, the one or more functional domains can be tethered or linked via a suitable linker (including, but not limited to, GlySer linkers) to the effector protein (e.g., a Cas protein). When there is more than one functional domain, the functional domains can be the same or different.
  • all the functional domains are the same. In some embodiments, all of the functional domains are different from each other. In some embodiments, at least two of the functional domains are different from each other. In some embodiments, at least two of the functional domains are the same as each other.
  • the CRISPR-Cas system is a split CRISPR-Cas system. See, e.g., Zetsche et al., 2015. Nat. Biotechnol. 33(2): 139-142 and WO 2019/018423, the compositions and techniques of which can be used in and/or adapted for use with the present invention.
  • Split CRISPR-Cas proteins are set forth in further detail herein and in documents incorporated herein by reference.
  • each part of a split CRISPR protein is attached to a member of a specific binding pair, and when bound with each other, the members of the specific binding pair maintain the parts of the CRISPR protein in proximity.
  • each part of a split CRISPR protein is associated with an inducible binding pair.
  • An inducible binding pair is one which is capable of being switched “on” or “off” by a protein or small molecule that binds to both members of the inducible binding pair.
  • CRISPR proteins may preferably be split between domains, leaving domains intact.
  • said Cas split domains e.g., RuvC and HNH domains in the case of Cas9
  • the reduced size of the split Cas compared to the wild type Cas allows other methods of delivery of the systems to the cells, such as the use of cell penetrating peptides as described herein.
  • a polynucleotide of the present invention described elsewhere herein can be modified using a base editing system.
  • a Cas protein is connected or fused to a nucleotide deaminase.
  • the Cas-based system can be a base editing system.
  • base editing refers generally to the process of polynucleotide modification via a CRISPR-Cas-based or Cas-based system that does not include excising nucleotides to make the modification. Base editing can convert base pairs at precise locations without generating excess undesired editing byproducts that can be made using traditional CRISPR-Cas systems.
  • the nucleotide deaminase may be a DNA base editor used in combination with a DNA binding Cas protein such as, but not limited to, Class 2 Type II and Type V systems.
  • Two classes of DNA base editors are generally known: cytosine base editors (CBEs) and adenine base editors (ABEs).
  • CBEs convert a C•G base pair into a T•A base pair.
  • ABEs convert an A•T base pair to a G•C base pair.
  • CBEs and ABEs can mediate all four possible transition mutations (C to T, A to G, T to C, and G to A).
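  • The way two editor classes yield four transition mutations can be illustrated with a toy model. The sketch below assumes that a CBE converts C to T and an ABE converts A to G on whichever strand is edited, and reports the result on the top (reference) strand; it ignores editing windows, PAM requirements, and bystander edits, and the function name and example sequence are hypothetical.

```python
# Watson-Crick complements for DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Assumed chemistry of each editor class on the strand it edits.
EDITOR_CONVERSIONS = {"CBE": ("C", "T"), "ABE": ("A", "G")}


def edit_top_strand(top: str, position: int, editor: str,
                    target_strand: str = "top") -> str:
    """Return the top-strand sequence after one base edit.

    `position` is a 0-based index on the top strand. When `target_strand`
    is "bottom", the editor acts on the base paired with top[position] and
    the top strand is updated to stay complementary after repair.
    """
    source, product = EDITOR_CONVERSIONS[editor]
    if target_strand == "top":
        edited_base = top[position]
        new_top_base = product
    elif target_strand == "bottom":
        edited_base = COMPLEMENT[top[position]]
        new_top_base = COMPLEMENT[product]
    else:
        raise ValueError("target_strand must be 'top' or 'bottom'")
    if edited_base != source:
        raise ValueError(f"{editor} needs {source} on the {target_strand} strand")
    return top[:position] + new_top_base + top[position + 1:]


if __name__ == "__main__":
    seq = "ACGT"
    print(edit_top_strand(seq, 1, "CBE", "top"))     # C to T on the top strand
    print(edit_top_strand(seq, 2, "CBE", "bottom"))  # read as G to A on top
    print(edit_top_strand(seq, 0, "ABE", "top"))     # A to G on the top strand
    print(edit_top_strand(seq, 3, "ABE", "bottom"))  # read as T to C on top
```

  • In this model the T to C and G to A outcomes arise when the editor acts on the opposite strand, which is how two editor classes cover all four transitions.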
  • the base editing system includes a CBE and/or an ABE.
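The conversions listed above can be sketched in Python. This is an illustrative model only: the names `EDITOR_CONVERSIONS`, `apply_base_edit`, and `reachable_transitions` are hypothetical, and editing windows, sequence context, and efficiency are not modeled.

```python
# Hypothetical sketch of the base-pair changes made by the two DNA base editor classes.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Change made on the strand that carries the deaminated base (names are assumptions).
EDITOR_CONVERSIONS = {
    "CBE": ("C", "T"),  # C•G -> T•A
    "ABE": ("A", "G"),  # A•T -> G•C
}

def apply_base_edit(seq, position, editor, target_strand="top"):
    """Return the top-strand sequence after one edit at a 0-based position.

    If the deaminated base lies on the bottom strand, the top strand shows the
    complementary change (G to A for a CBE, T to C for an ABE).
    """
    src, dst = EDITOR_CONVERSIONS[editor]
    if target_strand == "bottom":
        src, dst = COMPLEMENT[src], COMPLEMENT[dst]
    if seq[position] != src:
        raise ValueError(f"{editor} expects {src} at position {position}, found {seq[position]}")
    return seq[:position] + dst + seq[position + 1:]

def reachable_transitions():
    """Collect the top-strand changes reachable by CBEs and ABEs on either strand."""
    changes = set()
    for src, dst in EDITOR_CONVERSIONS.values():
        changes.add((src, dst))
        changes.add((COMPLEMENT[src], COMPLEMENT[dst]))
    return changes
```

Under these assumptions, `reachable_transitions()` yields the four transitions named above (C to T, A to G, T to C, and G to A), because each editor class can act on either strand of the duplex.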
  • a polynucleotide of the present invention described elsewhere herein can be modified using a base editing system. Rees and Liu. 2018. Nat. Rev. Genet. 19(12):770-788.
  • Base editors also generally do not need a DNA donor template and/or rely on homology-directed repair. Komor et al. 2016.
  • the catalytically disabled Cas protein can be a variant or modified Cas that has nickase functionality and can generate a nick in the non-edited DNA strand to induce cells to repair the non-edited strand using the edited strand as a template.
  • Base editors may be further engineered to optimize conversion of nucleotides (e.g., A:T to G:C). Richter et al. 2020. Nature Biotechnology. doi.org/10.1038/s41587-020-0453-z.
  • Example Type V base editing systems are described in WO 2018/213708, WO 2018/213726, PCT/US2018/067207, PCT/US2018/067225, and PCT/US2018/067307, which are incorporated by reference herein.
  • the base editing system may be an RNA base editing system.
  • a nucleotide deaminase capable of converting nucleotide bases may be fused to a Cas protein.
  • the Cas protein will need to be capable of binding RNA.
  • Example RNA binding Cas proteins include, but are not limited to, RNA-binding Cas9s such as Francisella novicida Cas9 (“FnCas9”), and Class 2 Type VI Cas systems.
  • the nucleotide deaminase may be a cytidine deaminase or an adenosine deaminase, or an adenosine deaminase engineered to have cytidine deaminase activity.
  • the RNA base editor may be used to delete or introduce a post-translational modification site in the expressed mRNA.
  • RNA base editors can provide edits where finer temporal control may be needed, for example in modulating a particular immune response.
  • Example Type VI RNA base editing systems are described in Cox et al. 2017.
  • a polynucleotide of the present invention described elsewhere herein can be modified using a prime editing system (See e.g., Anzalone et al. 2019. Nature. 576: 149-157). Like base editing systems, prime editing systems can be capable of targeted modification of a polynucleotide without generating double-stranded breaks and do not require donor templates. Further, prime editing systems can be capable of all 12 possible combination swaps. Prime editing can operate via a “search-and-replace” methodology and can mediate targeted insertions, deletions, all 12 possible base-to-base conversions, and combinations thereof.
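The count of 12 base-to-base conversions follows from pairing each of the four DNA bases with each of the three others. A minimal sketch, with the hypothetical names `classify` and `CONVERSIONS`, enumerates them and separates the four transitions from the eight transversions:

```python
from itertools import permutations

PURINES = {"A", "G"}

def classify(src, dst):
    """Label a base-to-base change as a transition or a transversion."""
    same_class = (src in PURINES) == (dst in PURINES)
    return "transition" if same_class else "transversion"

# Every ordered pair of distinct bases: 4 * 3 = 12 conversions.
CONVERSIONS = {(s, d): classify(s, d) for s, d in permutations("ACGT", 2)}
```

The sketch only counts and labels the conversions; it does not model how a prime editing system installs them.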
  • a prime editing system, as exemplified by PE1, PE2, and PE3 (Id.), can include a reverse transcriptase fused or otherwise coupled or associated with an RNA-programmable nickase, and a prime-editing extended guide RNA (pegRNA) to facilitate direct copying of genetic information from the extension on the pegRNA into the target polynucleotide.
  • Embodiments that can be used with the present invention include these and variants thereof.
  • Prime editing can have the advantage of lower off-target activity than traditional CRISPR-Cas systems, along with few byproducts and greater or similar efficiency as compared to traditional CRISPR-Cas systems.
  • the prime editing guide molecule can specify both the target polynucleotide information (e.g., sequence) and contain a new polynucleotide cargo that replaces target polynucleotides.
  • the PE system can nick the target polynucleotide at a target site to expose a 3’ hydroxyl group, which can prime reverse transcription of an edit-encoding extension region of the guide molecule (e.g., a prime editing guide molecule or peg guide molecule) directly into the target site in the target polynucleotide. See e.g., Anzalone et al. 2019. Nature. 576: 149-157, particularly at Figures 1b, 1c, related discussion, and Supplementary discussion.
  • a prime editing system can be composed of a Cas polypeptide having nickase activity, a reverse transcriptase, and a guide molecule.
  • the Cas polypeptide can lack nuclease activity.
  • the guide molecule can include a target binding sequence as well as a primer binding sequence and a template containing the edited polynucleotide sequence.
  • the guide molecule, Cas polypeptide, and/or reverse transcriptase can be coupled together or otherwise associate with each other to form an effector complex and edit a target sequence.
  • the Cas polypeptide is a Class 2, Type V Cas polypeptide.
  • the Cas polypeptide is a Cas9 polypeptide (e.g., is a Cas9 nickase). In some embodiments, the Cas polypeptide is fused to the reverse transcriptase. In some embodiments, the Cas polypeptide is linked to the reverse transcriptase.
  • the prime editing system can be a PE1 system or variant thereof, a PE2 system or variant thereof, or a PE3 (e.g., PE3, PE3b) system. See e.g., Anzalone et al. 2019. Nature. 576: 149-157, particularly at pgs. 2-3, Figs. 2a, 3a-3f, 4a-4b, Extended data Figs. 3a-3b, 4,
  • the peg guide molecule can be about 10 to about 200 or more nucleotides in length, such as 10 to/or 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32,
  • a polynucleotide of the present invention described elsewhere herein can be modified using a CRISPR Associated Transposase (“CAST”) system.
  • A CAST system can include a Cas protein that is catalytically inactive, or engineered to be catalytically active, and can further comprise a transposase (or subunits thereof) that catalyzes RNA-guided DNA transposition.
  • Such systems are able to insert DNA sequences at a target site in a DNA molecule without relying on host cell repair machinery.
  • CAST systems can be Class 1 or Class 2 CAST systems. An example Class 1 system is described in Klompe et al.
  • the CRISPR-Cas or Cas-Based system described herein can, in some embodiments, include one or more guide molecules.
  • guide molecule, guide sequence and guide polynucleotide refer to polynucleotides capable of guiding Cas to a target genomic locus and are used interchangeably as in foregoing cited documents such as WO 2014/093622 (PCT/US2013/074667).
  • a guide sequence is any polynucleotide sequence having sufficient complementarity with a target polynucleotide sequence to hybridize with the target sequence and direct sequence-specific binding of a CRISPR complex to the target sequence.
  • the guide molecule can be a polynucleotide.
  • a guide sequence within a nucleic acid-targeting guide RNA
  • a guide sequence may direct sequence-specific binding of a nucleic acid-targeting complex to a target nucleic acid sequence
  • the components of a nucleic acid-targeting CRISPR system sufficient to form a nucleic acid-targeting complex, including the guide sequence to be tested, may be provided to a host cell having the corresponding target nucleic acid sequence, such as by transfection with vectors encoding the components of the nucleic acid-targeting complex, followed by an assessment of preferential targeting (e.g., cleavage) within the target nucleic acid sequence, such as by Surveyor assay (Qui et al. 2004.
  • cleavage of a target nucleic acid sequence may be evaluated in a test tube by providing the target nucleic acid sequence, components of a nucleic acid-targeting complex, including the guide sequence to be tested and a control guide sequence different from the test guide sequence, and comparing binding or rate of cleavage at the target sequence between the test and control guide sequence reactions.
  • Other assays are possible and will occur to those skilled in the art.
  • the guide molecule is an RNA.
  • the guide molecule(s) (also referred to interchangeably herein as guide polynucleotide and guide sequence) that are included in the CRISPR-Cas or Cas based system can be any polynucleotide sequence having sufficient complementarity with a target nucleic acid sequence to hybridize with the target nucleic acid sequence and direct sequence-specific binding of a nucleic acid-targeting complex to the target nucleic acid sequence.
  • the degree of complementarity when optimally aligned using a suitable alignment algorithm, can be about or more than about 50%, 60%, 75%, 80%, 85%, 90%, 95%, 97.5%, 99%, or more.
  • Optimal alignment may be determined with the use of any suitable algorithm for aligning sequences, non-limiting examples of which include the Smith-Waterman algorithm, the Needleman-Wunsch algorithm, algorithms based on the Burrows-Wheeler Transform (e.g., the Burrows Wheeler Aligner), ClustalW, Clustal X, BLAT, Novoalign (Novocraft Technologies; available at www.novocraft.com), ELAND (Illumina, San Diego, CA), SOAP (available at soap.genomics.org.cn), and Maq (available at maq.sourceforge.net).
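The degree of complementarity described above can be illustrated with a deliberately simplified calculation. The sketch below does not run any of the alignment algorithms named above: it assumes an ungapped, equal-length comparison between an RNA guide and an RNA target site and counts only Watson-Crick pairs. The function names are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna):
    """Reverse complement of an RNA string written 5' to 3'."""
    return "".join(COMPLEMENT[b] for b in reversed(rna))

def percent_complementarity(guide, target):
    """Percent of guide positions that base-pair with an equal-length target site.

    Both arguments are RNA strings written 5' to 3'. The guide pairs with the
    target in antiparallel orientation, so the guide is compared with the
    reverse complement of the target.
    """
    if len(guide) != len(target):
        raise ValueError("this sketch compares equal-length sequences only")
    expected = reverse_complement(target)
    matches = sum(g == e for g, e in zip(guide, expected))
    return 100.0 * matches / len(guide)
```

With a 20-nucleotide guide, a single mismatch lowers the value from 100% to 95%, which shows how the percentage thresholds listed above map onto mismatch counts for a guide of a given length.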
  • a guide sequence and hence a nucleic acid-targeting guide, may be selected to target any target nucleic acid sequence.
  • the target sequence may be DNA.
  • the target sequence may be any RNA sequence.
  • the target sequence may be a sequence within an RNA molecule selected from the group consisting of messenger RNA (mRNA), pre-mRNA, ribosomal RNA (rRNA), transfer RNA (tRNA), micro-RNA (miRNA), small interfering RNA (siRNA), small nuclear RNA (snRNA), small nucleolar RNA (snoRNA), double stranded RNA (dsRNA), non-coding RNA (ncRNA), long non-coding RNA (lncRNA), and small cytoplasmatic RNA (scRNA).
  • the target sequence may be a sequence within an RNA molecule selected from the group consisting of mRNA, pre-mRNA, and rRNA. In some preferred embodiments, the target sequence may be a sequence within an RNA molecule selected from the group consisting of ncRNA and lncRNA. In some more preferred embodiments, the target sequence may be a sequence within an mRNA molecule or a pre-mRNA molecule.
  • a nucleic acid-targeting guide is selected to reduce the degree of secondary structure within the nucleic acid-targeting guide. In some embodiments, about or less than about 75%, 50%, 40%, 30%, 25%, 20%, 15%, 10%, 5%, 1%, or fewer of the nucleotides of the nucleic acid-targeting guide participate in self-complementary base pairing when optimally folded. Optimal folding may be determined by any suitable polynucleotide folding algorithm. Some programs are based on calculating the minimal Gibbs free energy. An example of one such algorithm is mFold, as described by Zuker and Stiegler (Nucleic Acids Res. 9 (1981), 133-148).
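The fraction of guide nucleotides involved in self-complementary pairing can be approximated with a much simpler method than free-energy minimization. The sketch below swaps in the Nussinov base-pair maximization algorithm, which is not mFold and ignores thermodynamics; it gives only an upper-bound style estimate of pairing. The function names, the inclusion of G-U wobble pairs, and the default minimum hairpin loop of three bases are assumptions.

```python
# Allowed pairs, including G-U wobble (an assumption of this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def max_base_pairs(rna, min_loop=3):
    """Nussinov dynamic program: maximum number of nested base pairs.

    min_loop is the minimum number of unpaired bases enclosed by a hairpin.
    """
    n = len(rna)
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: base j stays unpaired
            for k in range(i, j - min_loop):  # case 2: base j pairs with base k
                if (rna[k], rna[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1] if n else 0

def fraction_paired(rna, min_loop=3):
    """Fraction of nucleotides that sit in a base pair in the maximal structure."""
    return 2 * max_base_pairs(rna, min_loop) / len(rna) if rna else 0.0
```

A candidate guide could then be compared against one of the percentage thresholds above, for example by checking `fraction_paired(guide) <= 0.25`, bearing in mind that a free-energy-based folding program may predict a different structure.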
  • a guide RNA or crRNA may comprise, consist essentially of, or consist of a direct repeat (DR) sequence and a guide sequence or spacer sequence.
  • the guide RNA or crRNA may comprise, consist essentially of, or consist of a direct repeat sequence fused or linked to a guide sequence or spacer sequence.
  • the direct repeat sequence may be located upstream (i.e., 5’) from the guide sequence or spacer sequence. In other embodiments, the direct repeat sequence may be located downstream (i.e., 3’) from the guide sequence or spacer sequence.
  • the crRNA comprises a stem loop, preferably a single stem loop.
  • the direct repeat sequence forms a stem loop, preferably a single stem loop.
  • the spacer length of the guide RNA is from 15 to 35 nt. In certain embodiments, the spacer length of the guide RNA is at least 15 nucleotides. In certain embodiments, the spacer length is from 15 to 17 nt, e.g., 15, 16, or 17 nt, from 17 to 20 nt, e.g., 17, 18, 19, or 20 nt, from 20 to 24 nt, e.g., 20, 21, 22, 23, or 24 nt, from 23 to 25 nt, e.g., 23, 24, or 25 nt, from 24 to 27 nt, e.g., 24, 25, 26, or 27 nt, from 27 to 30 nt, e.g., 27, 28, 29, or 30 nt, from 30 to 35 nt, e.g., 30, 31, 32, 33, 34, or 35 nt, or 35 nt or longer.
  • the “tracrRNA” sequence or analogous terms includes any polynucleotide sequence that has sufficient complementarity with a crRNA sequence to hybridize.
  • the degree of complementarity between the tracrRNA sequence and crRNA sequence along the length of the shorter of the two when optimally aligned is about or more than about 25%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 97.5%, 99%, or higher.
  • the tracr sequence is about or more than about 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 40, 50, or more nucleotides in length.
  • the tracr sequence and crRNA sequence are contained within a single transcript, such that hybridization between the two produces a transcript having a secondary structure, such as a hairpin.
  • degree of complementarity is with reference to the optimal alignment of the sca sequence and tracr sequence, along the length of the shorter of the two sequences.
  • Optimal alignment may be determined by any suitable alignment algorithm and may further account for secondary structures, such as self-complementarity within either the sca sequence or tracr sequence.
  • the degree of complementarity between the tracr sequence and sca sequence along the length of the shorter of the two when optimally aligned is about or more than about 25%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 97.5%, 99%, or higher.
  • the degree of complementarity between a guide sequence and its corresponding target sequence can be about or more than about 50%, 60%, 75%, 80%, 85%, 90%, 95%, 97.5%, 99%, or 100%; a guide or RNA or sgRNA can be about or more than about 5,
  • RNA or sgRNA can be less than about 75, 50, 45, 40, 35, 30, 25, 20, 15, 12, or fewer nucleotides in length; and tracr RNA can be 30 or 50 nucleotides in length.
  • the degree of complementarity between a guide sequence and its corresponding target sequence is greater than 94.5% or 95% or 95.5% or 96% or 96.5% or 97% or 97.5% or 98% or 98.5% or 99% or 99.5% or 99.9%, or 100%.
  • Off target is less than 100% or 99.9% or 99.5% or 99% or 99% or 98.5% or 98% or 97.5% or 97% or 96.5% or 96% or 95.5% or 95% or 94.5% or 94% or 93% or 92% or 91% or 90% or 89% or 88% or 87% or 86% or 85% or 84% or 83% or 82% or 81% or 80% complementarity between the sequence and the guide, with it advantageous that off target is 100% or 99.9% or 99.5% or 99% or 99% or 98.5% or 98% or 97.5% or 97% or 96.5% or 96% or 95.5% or 95% or 94.5% complementarity between the sequence and the guide.
  • the guide RNA (capable of guiding Cas to a target locus) may comprise (1) a guide sequence capable of hybridizing to a genomic target locus in the eukaryotic cell; (2) a tracr sequence; and (3) a tracr mate sequence. All (1) to (3) may reside in a single RNA, i.e., an sgRNA (arranged in a 5’ to 3’ orientation), or the tracr RNA may be a different RNA than the RNA containing the guide and tracr sequence. The tracr hybridizes to the tracr mate sequence and directs the CRISPR/Cas complex to the target sequence.
  • each RNA may be optimized to be shortened from their respective native lengths, and each may be independently chemically modified to protect from degradation by cellular RNase or otherwise increase stability.
  • target sequence refers to a sequence to which a guide sequence is designed to have complementarity, where hybridization between a target sequence and a guide sequence promotes the formation of a CRISPR complex.
  • a target sequence may comprise RNA polynucleotides.
  • target RNA refers to an RNA polynucleotide being or comprising the target sequence.
  • the target polynucleotide can be a polynucleotide or a part of a polynucleotide to which a part of the guide sequence is designed to have complementarity with and to which the effector function mediated by the complex comprising the CRISPR effector protein and a guide molecule is to be directed.
  • a target sequence is located in the nucleus or cytoplasm of a cell.
  • the guide sequence can specifically bind a target sequence in a target polynucleotide.
  • the target polynucleotide may be DNA.
  • the target polynucleotide may be RNA.
  • the target polynucleotide can have one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, etc. or more) target sequences.
  • the target polynucleotide can be on a vector.
  • the target polynucleotide can be genomic DNA.
  • the target polynucleotide can be episomal. Other forms of the target polynucleotide are described elsewhere herein.
  • the target sequence may be DNA.
  • the target sequence may be any RNA sequence.
  • the target sequence may be a sequence within an RNA molecule selected from the group consisting of messenger RNA (mRNA), pre-mRNA, ribosomal RNA (rRNA), transfer RNA (tRNA), micro-RNA (miRNA), small interfering RNA (siRNA), small nuclear RNA (snRNA), small nucleolar RNA (snoRNA), double stranded RNA (dsRNA), non-coding RNA (ncRNA), long non-coding RNA (lncRNA), and small cytoplasmatic RNA (scRNA).
  • the target sequence (also referred to herein as a target polynucleotide) may be a sequence within an RNA molecule selected from the group consisting of mRNA, pre-mRNA, and rRNA. In some preferred embodiments, the target sequence may be a sequence within an RNA molecule selected from the group consisting of ncRNA and lncRNA. In some more preferred embodiments, the target sequence may be a sequence within an mRNA molecule or a pre-mRNA molecule.
  • PAM elements are sequences that can be recognized and bound by Cas proteins. Cas proteins/effector complexes can then unwind the dsDNA at a position adjacent to the PAM element. It will be appreciated that Cas proteins and systems that include them that target RNA do not require PAM sequences (Marraffini et al. 2010. Nature. 463:568-571). Instead, many rely on PFSs, which are discussed elsewhere herein.
  • the target sequence should be associated with a PAM (protospacer adjacent motif) or PFS (protospacer flanking sequence or site), that is, a short sequence recognized by the CRISPR complex.
  • the target sequence should be selected, such that its complementary sequence in the DNA duplex (also referred to herein as the non-target sequence) is upstream or downstream of the PAM.
  • the complementary sequence of the target sequence is downstream or 3’ of the PAM or upstream or 5’ of the PAM.
  • the precise sequence and length requirements for the PAM differ depending on the Cas protein used, but PAMs are typically 2-5 base pair sequences adjacent to the protospacer (that is, the target sequence). Examples of the natural PAM sequences for different Cas proteins are provided herein below and the skilled person will be able to identify further PAM sequences for use with a given Cas protein.
  • the CRISPR effector protein may recognize a 3’ PAM.
  • the CRISPR effector protein may recognize a 3’ PAM which is 5’H, wherein H is A, C or U.
  • engineering of the PAM Interacting (PI) domain on the Cas protein may allow programming of PAM specificity, improve target site recognition fidelity, and increase the versatility of the CRISPR-Cas protein, for example as described for Cas9 in Kleinstiver BP et al. Engineered CRISPR-Cas9 nucleases with altered PAM specificities. Nature. 2015 Jul 23;523(7561):481-5. doi: 10.1038/nature14592. As further detailed herein, the skilled person will understand that Cas13 proteins may be modified analogously.
  • Gao et al, “Engineered Cpf1 Enzymes with Altered PAM Specificities,” bioRxiv 091611; doi: dx.doi.org/10.1101/091611 (Dec. 4, 2016).
  • Doench et al. created a pool of sgRNAs, tiling across all possible target sites of a panel of six endogenous mouse and three endogenous human genes and quantitatively assessed their ability to produce null alleles of their target gene by antibody staining and flow cytometry. The authors showed that optimization of the PAM improved activity and also provided an on-line tool for designing sgRNAs.
  • PAM sequences can be identified in a polynucleotide using appropriate design tools, which are available commercially as well as online.
  • Such freely available tools include, but are not limited to, CRISPRFinder and CRISPRTarget. Mojica et al. 2009. Microbiol. 155(Pt. 3):733-740; Altschul et al. 1990. J. Mol. Biol. 215:403-410; Biswas et al. 2013. RNA Biol. 10:817-827; and Grissa et al. 2007. Nucleic Acids Res. 35:W52-57.
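Computational PAM identification can be illustrated with a short scan of a DNA sequence for a degenerate motif. This is a minimal sketch and not the method used by CRISPRFinder or CRISPRTarget. The function name `find_3prime_pam_sites`, the default motif `NGG`, and the 20-nucleotide protospacer length are illustrative assumptions; the motif should be replaced with the PAM of the Cas protein actually used. Only the strand supplied is scanned.

```python
import re

# IUPAC nucleotide codes expressed as regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def find_3prime_pam_sites(dna, pam="NGG", spacer_len=20):
    """Return (protospacer_start, protospacer, pam_match) tuples on the given strand.

    Assumes the PAM lies immediately 3' of the protospacer. A lookahead is used
    so that overlapping PAM occurrences are all reported.
    """
    dna = dna.upper()
    pattern = re.compile("(?=(" + "".join(IUPAC[c] for c in pam.upper()) + "))")
    sites = []
    for m in pattern.finditer(dna):
        pam_start = m.start()
        start = pam_start - spacer_len
        if start >= 0:  # skip PAMs too close to the 5' end for a full protospacer
            sites.append((start, dna[start:pam_start], m.group(1)))
    return sites
```

To cover both strands under the same assumptions, the function could be called a second time on the reverse complement of the input.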
  • Experimental approaches to PAM identification can include, but are not limited to, plasmid depletion assays (Jiang et al. 2013. Nat.
  • CRISPR-Cas systems that target RNA do not typically rely on PAM sequences. Instead, such systems typically recognize protospacer flanking sites (PFSs).
  • Type VI CRISPR-Cas systems typically recognize protospacer flanking sites (PFSs) instead of PAMs.
  • PFSs represent an analogue to PAMs for RNA targets.
  • Type VI CRISPR-Cas systems employ a Cas13.
  • Some Cas13 proteins analyzed to date, such as Cas13a (C2c2) identified from Leptotrichia shahii (LshCas13a), have a specific discrimination against G at the 3’ end of the target RNA.
  • The presence of a C at the corresponding crRNA repeat site can indicate that nucleotide pairing at this position is rejected.
  • some Cas13 proteins e.g., LwaCas13a and PspCas13b
  • Type VI proteins such as subtype B have 5’-recognition of D (G, T, A) and a 3’-motif requirement of NAN or NNA.
  • Cas13b protein identified in Bergeyella zoohelcum (BzCas13b). See e.g., Gleditzsch et al. 2019. RNA Biology. 16(4):504-517.
  • target sequence e.g., target sequence recognition than those that target DNA (e.g., Type V and type II).
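The PFS preferences described above reduce to simple checks on the bases flanking the protospacer in the target RNA. The sketch below encodes two of them: a 3’ flanking base of H (A, C or U, i.e., not G), and the Type VI-B pattern of a 5’ D with a 3’ NAN or NNA motif. The function names are hypothetical, T is treated as U, and the rules are taken at face value from the text rather than from any specific Cas13 ortholog's measured preferences.

```python
D_BASES = set("GUA")  # IUPAC D for RNA: any base except C
H_BASES = set("ACU")  # IUPAC H for RNA: any base except G

def _rna(seq):
    """Upper-case a sequence and write any T as U."""
    return seq.upper().replace("T", "U")

def passes_3prime_h_rule(flank_3prime):
    """True if the first base 3' of the protospacer is A, C or U (not G)."""
    flank = _rna(flank_3prime)
    return bool(flank) and flank[0] in H_BASES

def passes_type_vi_b_rule(base_5prime, flank_3prime):
    """True if the 5' flanking base is D and the 3' flank matches NAN or NNA."""
    five = _rna(base_5prime)
    three = _rna(flank_3prime)[:3]
    if len(five) != 1 or five not in D_BASES or len(three) < 3:
        return False
    return three[1] == "A" or three[2] == "A"
```

Proteins reported to have no PFS preference would simply skip these checks.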
  • the polynucleotide is modified using a Zinc Finger nuclease or system thereof.
  • One type of programmable DNA-binding domain is provided by artificial zinc-finger (ZF) technology, which involves arrays of ZF modules to target new DNA-binding sites in the genome. Each finger module in a ZF array targets three DNA bases. A customized array of individual zinc finger domains is assembled into a ZF protein (ZFP).
  • ZFPs can comprise a functional domain.
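Because each finger module targets three DNA bases, the size of a ZF array follows directly from the length of the intended binding site. A minimal sketch, with hypothetical function names, that splits a site into the three-base subsites handled by successive fingers; it ignores the orientation of the array relative to the site and any context effects between neighboring fingers:

```python
def split_into_finger_triplets(site):
    """Split a DNA site into the 3-base subsites recognized by successive fingers."""
    if len(site) % 3 != 0:
        raise ValueError("site length must be a multiple of 3 in this sketch")
    return [site[i:i + 3] for i in range(0, len(site), 3)]

def fingers_needed(site_length):
    """Number of finger modules needed to cover a site of the given length."""
    return -(-site_length // 3)  # ceiling division
```

Under this simple model, a 9-base site corresponds to a three-finger array and an 18-base site to a six-finger array.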
  • the first synthetic zinc finger nucleases (ZFNs) were developed by fusing a ZF protein to the catalytic domain of the Type IIS restriction enzyme FokI. (Kim, Y. G. et al., 1994, Chimeric restriction endonuclease, Proc. Natl. Acad. Sci. U.S.A. 91, 883-887; Kim, Y. G. et al., 1996, Hybrid restriction enzymes: zinc finger fusions to Fok I cleavage domain. Proc. Natl. Acad. Sci. U.S.A. 93, 1156-1160).
  • ZFPs can also be designed as transcription activators and repressors and have been used to target many genes in a wide variety of organisms. Exemplary methods of genome editing using ZFNs can be found for example in U.S. Patent Nos. 6,534,261, 6,607,882, 6,746,838,
  • a TALE nuclease or TALE nuclease system can be used to modify a polynucleotide.
  • the methods provided herein use isolated, non-naturally occurring, recombinant or engineered DNA binding proteins that comprise TALE monomers or half monomers as a part of their organizational structure that enable the targeting of nucleic acid sequences with improved efficiency and expanded specificity.
  • Naturally occurring TALEs or “wild type TALEs” are nucleic acid binding proteins secreted by numerous species of proteobacteria.
  • TALE polypeptides contain a nucleic acid binding domain composed of tandem repeats of highly conserved monomer polypeptides that are predominantly 33, 34 or 35 amino acids in length and that differ from each other mainly in amino acid positions 12 and 13.
  • the nucleic acid is DNA.
  • polypeptide monomers “TALE monomers” or “monomers” will be used to refer to the highly conserved repetitive polypeptide sequences within the TALE nucleic acid binding domain and the term “repeat variable di-residues” or “RVD” will be used to refer to the highly variable amino acids at positions 12 and 13 of the polypeptide monomers.
  • the amino acid residues of the RVD are depicted using the IUPAC single letter code for amino acids.
  • a general representation of a TALE monomer which is comprised within the DNA binding domain is $X_{1-11}$-($X_{12}X_{13}$)-$X_{14-33\text{ or }34\text{ or }35}$, where the subscript indicates the amino acid position and X represents any amino acid.
  • $X_{12}X_{13}$ indicate the RVDs.
  • the variable amino acid at position 13 is missing or absent and in such monomers, the RVD consists of a single amino acid.
  • the RVD may be alternatively represented as X*, where X represents $X_{12}$ and (*) indicates that $X_{13}$ is absent.
  • the DNA binding domain comprises several repeats of TALE monomers and this may be represented as ($X_{1-11}$-($X_{12}X_{13}$)-$X_{14-33\text{ or }34\text{ or }35}$)$_z$, where in an advantageous embodiment, z is at least 5 to 40. In a further advantageous embodiment, z is at least 10 to 26.
  • the TALE monomers can have a nucleotide binding affinity that is determined by the identity of the amino acids in its RVD.
  • polypeptide monomers with an RVD of NI can preferentially bind to adenine (A)
  • monomers with an RVD of NG can preferentially bind to thymine (T)
  • monomers with an RVD of HD can preferentially bind to cytosine (C)
  • monomers with an RVD of NN can preferentially bind to both adenine (A) and guanine (G).
  • monomers with an RVD of IG can preferentially bind to T.
  • the number and order of the polypeptide monomer repeats in the nucleic acid binding domain of a TALE determines its nucleic acid target specificity.
  • monomers with an RVD of NS can recognize all four base pairs and can bind to A, T, G or C.
  • the structure and function of TALEs is further described in, for example, Moscou et al., Science 326:1501 (2009); Boch et al., Science 326:1509-1512 (2009); and Zhang et al., Nature Biotechnology 29:149-153 (2011).
  • polypeptides used in methods of the invention can be isolated, non-naturally occurring, recombinant or engineered nucleic acid-binding proteins that have nucleic acid or DNA binding regions containing polypeptide monomer repeats that are designed to target specific nucleic acid sequences.
  • polypeptide monomers having an RVD of HN or NH preferentially bind to guanine and thereby allow the generation of TALE polypeptides with high binding specificity for guanine containing target nucleic acid sequences.
  • polypeptide monomers having RVDs RN, NN, NK, SN, NH, KN, HN, NQ, HH, RG, KH, RH and SS can preferentially bind to guanine.
  • polypeptide monomers having RVDs RN, NK, NQ, HH, KH, RH, SS and SN can preferentially bind to guanine and can thus allow the generation of TALE polypeptides with high binding specificity for guanine containing target nucleic acid sequences.
  • polypeptide monomers having RVDs HH, KH, NH, NK, NQ, RH, RN and SS can preferentially bind to guanine and thereby allow the generation of TALE polypeptides with high binding specificity for guanine containing target nucleic acid sequences.
  • the RVDs that have high binding specificity for guanine are RN, NH, RH and KH.
  • polypeptide monomers having an RVD of NV can preferentially bind to adenine and guanine.
  • monomers having RVDs of H*, HA, KA, N*, NA, NC, NS, RA, and S* bind to adenine, guanine, cytosine and thymine with comparable affinity.
  • the predetermined N-terminal to C-terminal order of the one or more polypeptide monomers of the nucleic acid or DNA binding domain determines the corresponding predetermined target nucleic acid sequence to which the polypeptides of the invention will bind.
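The relationship between RVD order and target sequence can be sketched as a lookup. The table below covers only a subset of the RVD preferences stated above (NI, NG, IG, HD, NN, NV, HN, NH, NS) and reports ambiguous preferences with IUPAC codes. The names `RVD_TO_BASES` and `predicted_target` are hypothetical, and the sketch assumes that the N- to C-terminal monomer order reads the target 5’ to 3’ and that each monomer contacts exactly one base.

```python
# Base preferences for a subset of the RVDs described in the text.
RVD_TO_BASES = {
    "NI": "A",
    "NG": "T",
    "IG": "T",
    "HD": "C",
    "NN": "AG",
    "NV": "AG",
    "HN": "G",
    "NH": "G",
    "NS": "ACGT",
}

# IUPAC code for each set of preferred bases used above.
IUPAC_FROM_SET = {"A": "A", "C": "C", "G": "G", "T": "T", "AG": "R", "ACGT": "N"}

def predicted_target(rvds):
    """Map an N- to C-terminal list of RVDs to a 5'-to-3' IUPAC target string."""
    return "".join(IUPAC_FROM_SET[RVD_TO_BASES[r]] for r in rvds)
```

For example, the RVD series NI, NG, HD, NN, NH, NS maps to `ATCRGN` under these assumptions. The initial thymine and the terminal half-monomer are not represented.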
  • the monomers and at least one or more half monomers are “specifically ordered to target” the genomic locus or gene of interest.
  • In plant genomes, the natural TALE-binding sites always begin with a thymine (T), which may be specified by a cryptic signal within the non-repetitive N-terminus of the TALE polypeptide; in some cases, this region may be referred to as repeat 0.
  • TALE binding sites do not necessarily have to begin with a thymine (T) and polypeptides of the invention may target DNA sequences that begin with T, A, G or C.
  • the tandem repeat of TALE monomers always ends with a half-length repeat or a stretch of sequence that may share identity with only the first 20 amino acids of a repetitive full-length TALE monomer and this half repeat may be referred to as a half-monomer. Therefore, it follows that the length of the nucleic acid or DNA being targeted is equal to the number of full monomers plus two.
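As a worked reading of the relation above, and assuming that the two extra positions are the initial thymine specified by repeat 0 and the base contacted by the terminal half-monomer, the target length can be written as:

```latex
L_{\text{target}} = \underbrace{1}_{\text{repeat 0 (T)}} + N_{\text{full}} + \underbrace{1}_{\text{half-monomer}} = N_{\text{full}} + 2
```

For instance, a hypothetical array with $N_{\text{full}} = 18$ full monomers would correspond to a 20-nucleotide target under this reading.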
  • TALE polypeptide binding efficiency may be increased by including amino acid sequences from the “capping regions” that are directly N-terminal or C-terminal of the DNA binding region of naturally occurring TALEs into the engineered TALEs at positions N-terminal or C-terminal of the engineered TALE DNA binding region.
  • the TALE polypeptides described herein further comprise an N-terminal capping region and/or a C-terminal capping region.
  • An exemplary amino acid sequence of an N-terminal capping region is:
  • the DNA binding domain comprising the repeat TALE monomers and the C-terminal capping region provide structural basis for the organization of different domains in the d-TALEs or polypeptides of the invention.
  • N-terminal and/or C-terminal capping regions are not necessary to enhance the binding activity of the DNA binding region. Therefore, in certain embodiments, fragments of the N-terminal and/or C-terminal capping regions are included in the TALE polypeptides described herein.
  • the TALE polypeptides described herein contain an N-terminal capping region fragment that includes at least 10, 20, 30, 40, 50, 54, 60, 70, 80, 87, 90, 94, 100, 102, 110, 117, 120, 130, 140, 147, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260 or 270 amino acids of an N-terminal capping region.
  • the N-terminal capping region fragment amino acids are of the C-terminus (the DNA-binding region proximal end) of an N-terminal capping region.
  • N-terminal capping region fragments that include the C-terminal 240 amino acids enhance binding activity equal to the full-length capping region, while fragments that include the C-terminal 147 amino acids retain greater than 80% of the efficacy of the full-length capping region, and fragments that include the C-terminal 117 amino acids retain greater than 50% of the activity of the full-length capping region.
  • the TALE polypeptides described herein contain a C-terminal capping region fragment that includes at least 6, 10, 20, 30, 37, 40, 50, 60, 68, 70, 80, 90, 100, 110, 120, 127, 130, 140, 150, 155, 160, 170, 180 amino acids of a C-terminal capping region.
  • the C-terminal capping region fragment amino acids are of the N-terminus (the DNA-binding region proximal end) of a C-terminal capping region.
  • C-terminal capping region fragments that include the C-terminal 68 amino acids enhance binding activity equal to the full-length capping region, while fragments that include the C-terminal 20 amino acids retain greater than 50% of the efficacy of the full-length capping region.
  • the capping regions of the TALE polypeptides described herein do not need to have identical sequences to the capping region sequences provided herein.
  • the capping region of the TALE polypeptides described herein have sequences that are at least 50%, 60%, 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identical or share identity to the capping region amino acid sequences provided herein.
  • Sequence identity is related to sequence homology. Homology comparisons may be conducted by eye, or more usually, with the aid of readily available sequence comparison programs. These commercially available computer programs may calculate percent (%) homology between two or more sequences and may also calculate the sequence identity shared by two or more amino acid or nucleic acid sequences.
  • the capping region of the TALE polypeptides described herein have sequences that are at least 95% identical or share identity to the capping region amino acid sequences provided herein.
  • Sequence homologies can be generated by any of a number of computer programs known in the art, which include, but are not limited to, BLAST or FASTA. Suitable computer programs for carrying out alignments like the GCG Wisconsin Bestfit package may also be used. Once the software has produced an optimal alignment, it is possible to calculate % homology, preferably % sequence identity. The software typically does this as part of the sequence comparison and generates a numerical result.
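The percent identity step described above can be sketched in a few lines. This is a minimal illustration, assuming the two sequences have already been aligned by a tool such as BLAST or FASTA and that gap columns are excluded from the denominator (other conventions exist). The function name and the example fragments are hypothetical and are not sequences provided herein.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gap columns (one possible convention, an assumption)
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Hypothetical aligned fragments used only to exercise the function.
reference = "MKT-LLVAGA"
candidate = "MKTALLVSGA"
identity = percent_identity(reference, candidate)
print(f"{identity:.1f}% identical; meets 95% threshold: {identity >= 95.0}")
```

A threshold such as "at least 95% identical" then reduces to a comparison against the returned value; the choice of denominator should be stated whenever such a figure is reported.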
  • the TALE polypeptides of the invention include a nucleic acid binding domain linked to the one or more effector domains.
  • effector domain or “regulatory and functional domain” refer to a polypeptide sequence that has an activity other than binding to the nucleic acid sequence recognized by the nucleic acid binding domain.
  • the polypeptides of the invention may be used to target the one or more functions or activities mediated by the effector domain to a particular target DNA sequence to which the nucleic acid binding domain specifically binds.
  • the activity mediated by the effector domain is a biological activity.
  • the effector domain is a transcriptional inhibitor (i.e., a repressor domain), such as an mSin interaction domain (SID), SID4X domain or a Krüppel-associated box (KRAB) or fragments of the KRAB domain.
  • the effector domain is an enhancer of transcription (i.e., an activation domain), such as the VP16, VP64 or p65 activation domain.
  • the nucleic acid binding is linked, for example, with an effector domain that includes, but is not limited to, a transposase, integrase, recombinase, resolvase, invertase, protease, DNA methyltransferase, DNA demethylase, histone acetylase, histone deacetylase, nuclease, transcriptional repressor, transcriptional activator, transcription factor recruiting, protein nuclear-localization signal or cellular uptake signal.
  • the effector domain is a protein domain which exhibits activities which include but are not limited to transposase activity, integrase activity, recombinase activity, resolvase activity, invertase activity, protease activity, DNA methyltransferase activity, DNA demethylase activity, histone acetylase activity, histone deacetylase activity, nuclease activity, nuclear-localization signaling activity, transcriptional repressor activity, transcriptional activator activity, transcription factor recruiting activity, or cellular uptake signaling activity.
  • Other preferred embodiments of the invention may include any combination of the activities described herein.
  • a meganuclease or system thereof can be used to modify a polynucleotide.
  • Meganucleases are endodeoxyribonucleases characterized by a large recognition site (double-stranded DNA sequences of 12 to 40 base pairs). Exemplary methods for using meganucleases can be found in US Patent Nos. 8,163,514, 8,133,697, 8,021,867, 8,119,361, 8,119,381, 8,124,369, and 8,129,134, which are specifically incorporated by reference.
  • one or more components in the composition for engineering cells may comprise one or more sequences related to nucleus targeting and transportation. Such sequences may facilitate the one or more components in the composition for targeting a sequence within a cell.
  • nuclear localization sequences (NLSs)
  • the NLSs used in the context of the present disclosure are heterologous to the proteins.
  • Non-limiting examples of NLSs include an NLS sequence derived from: the NLS of the SV40 virus large T-antigen, having the amino acid sequence (SEQ ID NO: 3) or (SEQ ID NO: 4); the NLS from nucleoplasmin (e.g., the nucleoplasmin bipartite NLS with the sequence (SEQ ID NO: 5)); the c-myc NLS having the amino acid sequence (SEQ ID NO: 6) or (SEQ ID NO: 7); the hRNPA1 M9 NLS having the sequence (SEQ ID NO: 8); the sequence (SEQ ID NO: 9) of the IBB domain from importin-alpha; the sequences (SEQ ID NO: 10) and (SEQ ID NO: 11) of the myoma T protein; the sequence PQPKKKPL (SEQ ID NO: 12) of human p53; the sequence (SEQ ID NO: 13)
  • the one or more NLSs are of sufficient strength to drive accumulation of the DNA-targeting Cas protein in a detectable amount in the nucleus of a eukaryotic cell.
  • strength of nuclear localization activity may derive from the number of NLSs in the CRISPR-Cas protein, the particular NLS(s) used, or a combination of these factors.
  • Detection of accumulation in the nucleus may be performed by any suitable technique.
  • a detectable marker may be fused to the nucleic acid-targeting protein, such that location within a cell may be visualized, such as in combination with a means for detecting the location of the nucleus (e.g., a stain specific for the nucleus such as DAPI).
  • Cell nuclei may also be isolated from cells, the contents of which may then be analyzed by any suitable process for detecting protein, such as immunohistochemistry, Western blot, or enzyme activity assay. Accumulation in the nucleus may also be determined indirectly, such as by an assay for the effect of nucleic acid-targeting complex formation (e.g., assay for deaminase activity at the target sequence, or assay for altered gene expression activity affected by DNA-targeting complex formation and/or DNA-targeting), as compared to a control not exposed to the CRISPR-Cas protein and deaminase protein, or exposed to a CRISPR-Cas and/or deaminase protein lacking the one or more NLSs.
  • the CRISPR-Cas and/or nucleotide deaminase proteins may be provided with 1 or more, such as with 2, 3, 4, 5, 6, 7, 8, 9, 10, or more heterologous NLSs.
  • the proteins comprise about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more NLSs at or near the amino-terminus, about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more NLSs at or near the carboxy-terminus, or a combination of these (e.g., zero or at least one or more NLS at the amino-terminus and zero or at least one or more NLS at the carboxy-terminus).
  • an NLS is considered near the N- or C-terminus when the nearest amino acid of the NLS is within about 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 40, 50, or more amino acids along the polypeptide chain from the N- or C-terminus.
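The "near the N- or C-terminus" criterion above can be checked programmatically. The following is a minimal sketch, assuming distance is counted as the number of residues between the terminus and the nearest residue of the NLS (one reading of the text). The function names, the 50-residue default, and the example protein are hypothetical; PQPKKKPL is the human p53 sequence named above.

```python
def nls_terminal_distances(protein: str, nls: str):
    """Return (distance_from_N, distance_from_C) for the first NLS match, or None."""
    start = protein.find(nls)
    if start == -1:
        return None
    end = start + len(nls)        # index one past the last NLS residue
    dist_n = start                # residues between the N-terminus and the NLS
    dist_c = len(protein) - end   # residues between the NLS and the C-terminus
    return dist_n, dist_c


def is_near_terminus(protein: str, nls: str, max_distance: int = 50) -> bool:
    """True if the NLS lies within max_distance residues of either terminus."""
    distances = nls_terminal_distances(protein, nls)
    if distances is None:
        return False
    return min(distances) <= max_distance


# Hypothetical protein: 11 residues, then the p53 NLS, then 100 residues.
protein = "M" + "A" * 10 + "PQPKKKPL" + "G" * 100
print(nls_terminal_distances(protein, "PQPKKKPL"))
print(is_near_terminus(protein, "PQPKKKPL", max_distance=50))
```

The threshold is a parameter because the text lists many acceptable values (about 1 to 50 or more amino acids).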
  • an NLS attached to the C-terminal of the protein.
  • the CRISPR-Cas protein and the deaminase protein are delivered to the cell or expressed within the cell as separate proteins.
  • each of the CRISPR-Cas and deaminase protein can be provided with one or more NLSs as described herein.
  • the CRISPR-Cas and deaminase proteins are delivered to the cell or expressed within the cell as a fusion protein.
  • one or both of the CRISPR-Cas and deaminase proteins are provided with one or more NLSs.
  • the one or more NLS can be provided on the adaptor protein, provided that this does not interfere with aptamer binding.
  • the one or more NLS sequences may also function as linker sequences between the nucleotide deaminase and the CRISPR-Cas protein.
  • guides of the disclosure comprise specific binding sites (e.g. aptamers) for adapter proteins, which may be linked to or fused to a nucleotide deaminase or catalytic domain thereof.
  • when a guide forms a CRISPR complex (e.g., CRISPR-Cas protein binding to guide and target), the adapter proteins bind and the nucleotide deaminase or catalytic domain thereof associated with the adapter protein is positioned in a spatial orientation which is advantageous for the attributed function to be effective.
  • the one or more modified guide may be modified at the tetra loop, the stem loop 1, stem loop 2, or stem loop 3, as described herein, preferably at either the tetra loop or stem loop 2, and in some cases at both the tetra loop and stem loop 2.
  • a component in the systems may comprise one or more nuclear export signals (NES), one or more nuclear localization signals (NLS), or any combinations thereof.
  • the NES may be an HIV Rev NES.
  • the NES may be MAPK NES.
  • when the component is a protein, the NES or NLS may be at the C terminus of the component. Alternatively, or additionally, the NES or NLS may be at the N terminus of the component.
  • the Cas protein and optionally said nucleotide deaminase protein or catalytic domain thereof comprise one or more heterologous nuclear export signal(s) (NES(s)) or nuclear localization signal(s) (NLS(s)), preferably an HIV Rev NES or MAPK NES, preferably C-terminal.
  • the composition for engineering cells comprises a template, e.g., a recombination template.
  • a template may be a component of another vector as described herein, contained in a separate vector, or provided as a separate polynucleotide.
  • a recombination template is designed to serve as a template in homologous recombination, such as within or near a target sequence nicked or cleaved by a nucleic acid-targeting effector protein as a part of a nucleic acid-targeting complex.
  • the template nucleic acid alters the sequence of the target position. In an embodiment, the template nucleic acid results in the incorporation of a modified, or non-naturally occurring base into the target nucleic acid.
  • the template sequence may undergo a breakage mediated or catalyzed recombination with the target sequence.
  • the template nucleic acid may include sequence that corresponds to a site on the target sequence that is cleaved by a Cas protein mediated cleavage event.
  • the template nucleic acid may include sequence that corresponds to both, a first site on the target sequence that is cleaved in a first Cas protein mediated event, and a second site on the target sequence that is cleaved in a second Cas protein mediated event.
  • the template nucleic acid can include sequence which results in an alteration in the coding sequence of a translated sequence, e.g., one which results in the substitution of one amino acid for another in a protein product, e.g., transforming a mutant allele into a wild type allele, transforming a wild type allele into a mutant allele, and/or introducing a stop codon, insertion of an amino acid residue, deletion of an amino acid residue, or a nonsense mutation.
  • the template nucleic acid can include sequence which results in an alteration in a non-coding sequence, e.g., an alteration in an exon or in a 5' or 3' non-translated or non-transcribed region.
  • Such alterations include an alteration in a control element, e.g., a promoter, enhancer, and an alteration in a cis-acting or trans-acting control element.
  • a template nucleic acid having homology with a target position in a target gene may be used to alter the structure of a target sequence.
  • the template sequence may be used to alter an unwanted structure, e.g., an unwanted or mutant nucleotide.
  • the template nucleic acid may include sequence which, when integrated, results in: decreasing the activity of a positive control element; increasing the activity of a positive control element; decreasing the activity of a negative control element; increasing the activity of a negative control element; decreasing the expression of a gene; increasing the expression of a gene; increasing resistance to a disorder or disease; increasing resistance to viral entry; correcting a mutation or altering an unwanted amino acid residue conferring, increasing, abolishing or decreasing a biological property of a gene product, e.g., increasing the enzymatic activity of an enzyme, or increasing the ability of a gene product to interact with another molecule.
  • the template nucleic acid may include sequence which results in: a change in sequence of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 or more nucleotides of the target sequence.
  • a template polynucleotide may be of any suitable length, such as about or more than about 10, 15, 20, 25, 50, 75, 100, 150, 200, 500, 1000, or more nucleotides in length.
  • the template nucleic acid may be 20+/-10, 30+/-10, 40+/-10, 50+/-10, 60+/-10, 70+/-10, 80+/-10, 90+/-10, 100+/-10, 110+/-10, 120+/-10, 130+/-10, 140+/-10, 150+/-10, 160+/-10, 170+/-10, 180+/-10, 190+/-10, 200+/-10, 210+/-10, or 220+/-10 nucleotides in length.
  • the template nucleic acid may be 30+/-20, 40+/-20, 50+/-20, 60+/-20, 70+/-20, 80+/-20, 90+/-20, 100+/-20, 110+/-20, 120+/-20, 130+/-20, 140+/-20, 150+/-20, 160+/-20, 170+/-20, 180+/-20, 190+/-20, 200+/-20, 210+/-20, or 220+/-20 nucleotides in length.
  • the template nucleic acid is 10 to 1,000, 20 to 900, 30 to 800, 40 to 700, 50 to 600, 50 to 500, 50 to 400, 50 to 300, 50 to 200, or 50 to 100 nucleotides in length.
  • the template polynucleotide is complementary to a portion of a polynucleotide comprising the target sequence.
  • a template polynucleotide might overlap with one or more nucleotides of a target sequence (e.g., about or more than about 1, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100 or more nucleotides).
  • the nearest nucleotide of the template polynucleotide is within about 1, 5, 10, 15, 20, 25, 50, 75, 100, 200, 300, 400, 500, 1000, 5000, 10000, or more nucleotides from the target sequence.
  • the exogenous polynucleotide template comprises a sequence to be integrated (e.g., a mutated gene).
  • the sequence for integration may be a sequence endogenous or exogenous to the cell. Examples of a sequence to be integrated include polynucleotides encoding a protein or a noncoding RNA (e.g., a microRNA).
  • the sequence for integration may be operably linked to an appropriate control sequence or sequences.
  • the sequence to be integrated may provide a regulatory function.
  • An upstream or downstream sequence may comprise from about 20 bp to about 2500 bp, for example, about 50, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400, or 2500 bp.
  • the exemplary upstream or downstream sequence has about 200 bp to about 2000 bp, about 600 bp to about 1000 bp, or more particularly about 700 bp to about 1000 bp.
  • one or both homology arms may be shortened to avoid including certain sequence repeat elements.
  • a 5' homology arm may be shortened to avoid a sequence repeat element.
  • a 3' homology arm may be shortened to avoid a sequence repeat element.
  • both the 5' and the 3' homology arms may be shortened to avoid including certain sequence repeat elements.
  • the exogenous polynucleotide template may further comprise a marker.
  • a marker may make it easy to screen for targeted integrations. Examples of suitable markers include restriction sites, fluorescent proteins, or selectable markers.
  • the exogenous polynucleotide template of the disclosure can be constructed using recombinant techniques (see, for example, Sambrook et al., 2001 and Ausubel et al., 1996).
  • a template nucleic acid for correcting a mutation may be designed for use as a single-stranded oligonucleotide.
  • 5' and 3' homology arms may range up to about 200 base pairs (bp) in length, e.g., at least 25, 50, 75, 100, 125, 150, 175, or 200 bp in length.
  • a template nucleic acid for correcting a mutation may be designed for use with a homology-independent targeted integration system.
  • Suzuki et al. describe in vivo genome editing via CRISPR/Cas9 mediated homology-independent targeted integration (2016, Nature 540:144-149).
  • Schmid-Burgk, et al. describe use of the CRISPR-Cas9 system to introduce a double-strand break (DSB) at a user-defined genomic location and insertion of a universal donor DNA (Nat Commun. 2016 Jul 28;7:12338).
  • Gao, et al. describe “Plug-and-Play Protein Modification Using Homology-Independent Universal Genome Engineering” (Neuron. 2019 Aug 21;103(4):583-597).
  • the genetic modulating agents may be interfering RNAs.
  • a disease caused by a dominant mutation in a gene is targeted by silencing the mutated gene using RNAi.
  • the nucleotide sequence may comprise coding sequence for one or more interfering RNAs.
  • the nucleotide sequence may be interfering RNA (RNAi).
  • RNAi refers to any type of interfering RNA, including but not limited to, siRNAi, shRNAi, endogenous microRNA and artificial microRNA.
  • RNAi can include both gene silencing RNAi molecules, and also RNAi effector molecules which activate the expression of a gene.
  • a modulating agent may comprise silencing one or more endogenous genes.
  • gene silencing by an RNAi molecule, for example a siRNA or miRNA, refers to a decrease in the mRNA level in a cell for a target gene by at least about 5%, about 10%, about 20%, about 30%, about 40%, about 50%, about 60%, about 70%, about 80%, about 90%, about 95%, about 99%, about 100% of the mRNA level found in the cell without the presence of the miRNA or RNA interference molecule.
  • the mRNA levels are decreased by at least about 70%, about 80%, about 90%, about 95%, about 99%, about 100%.
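The decrease in mRNA level described above is a simple ratio against the untreated cell. The following is a minimal sketch of that arithmetic; the function name and the example measurements are hypothetical, and the units only need to match between the two inputs.

```python
def percent_knockdown(mrna_treated: float, mrna_control: float) -> float:
    """Percent decrease in mRNA level relative to a control without the RNAi molecule."""
    if mrna_control <= 0:
        raise ValueError("control mRNA level must be positive")
    return 100.0 * (1.0 - mrna_treated / mrna_control)


# Hypothetical normalized expression values, not data from this document.
control_level = 200.0
treated_level = 50.0
knockdown = percent_knockdown(treated_level, control_level)
print(f"knockdown: {knockdown:.0f}%; at least about 70%: {knockdown >= 70.0}")
```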
  • a “siRNA” refers to a nucleic acid that forms a double stranded RNA, which double stranded RNA has the ability to reduce or inhibit expression of a gene or target gene when the siRNA is present or expressed in the same cell as the target gene.
  • the double stranded RNA siRNA can be formed by the complementary strands.
  • a siRNA refers to a nucleic acid that can form a double stranded siRNA.
  • the sequence of the siRNA can correspond to the full-length target gene, or a subsequence thereof.
  • the siRNA is at least about 15-50 nucleotides in length (e.g., each complementary sequence of the double stranded siRNA is about 15-50 nucleotides in length, and the double stranded siRNA is about 15-50 base pairs in length, preferably about 19-30 base nucleotides, preferably about 20-25 nucleotides in length, e.g., 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 nucleotides in length).
  • shRNA, or small hairpin RNA (also called stem loop), is a type of siRNA.
  • these shRNAs are composed of a short, e.g., about 19 to about 25 nucleotide, antisense strand, followed by a nucleotide loop of about 5 to about 9 nucleotides, and the analogous sense strand.
  • the sense strand can precede the nucleotide loop structure and the antisense strand can follow.
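The shRNA layout described above (antisense strand, loop, sense strand, or the reverse order) can be assembled as a string. This is a minimal sketch under stated assumptions: the 9-nucleotide loop `UUCAAGAGA` and the 21-nucleotide target are illustrative choices not taken from this document, and the function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement_rna(seq: str) -> str:
    """Reverse complement of an RNA sequence written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def build_shrna(sense: str, loop: str = "UUCAAGAGA", antisense_first: bool = True) -> str:
    """Concatenate antisense, loop, and sense strands into one hairpin sequence."""
    sense = sense.upper().replace("T", "U")
    if not set(sense) <= set("ACGU"):
        raise ValueError("sense strand must contain only A, C, G, U (or T)")
    if not 19 <= len(sense) <= 25:
        raise ValueError("stem strand should be about 19 to 25 nucleotides")
    if not 5 <= len(loop) <= 9:
        raise ValueError("loop should be about 5 to 9 nucleotides")
    antisense = reverse_complement_rna(sense)
    if antisense_first:
        return antisense + loop + sense
    return sense + loop + antisense


# Hypothetical 21-nt target region used only for illustration.
sense = "GCAAGCUGACCCUGAAGUUCA"
shrna = build_shrna(sense)
print(shrna, len(shrna))
```

Passing `antisense_first=False` produces the alternative order in which the sense strand precedes the loop.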
  • microRNA or “miRNA”, used interchangeably herein, are endogenous RNAs, some of which are known to regulate the expression of protein-coding genes at the posttranscriptional level. Endogenous microRNAs are small RNAs naturally present in the genome that are capable of modulating the productive utilization of mRNA.
  • artificial microRNA includes any type of RNA sequence, other than endogenous microRNA, which is capable of modulating the productive utilization of mRNA. MicroRNA sequences have been described in publications such as Lim, et al., Genes & Development, 17, p.
  • miRNA-like stem-loops can be expressed in cells as a vehicle to deliver artificial miRNAs and short interfering RNAs (siRNAs) for the purpose of modulating the expression of endogenous genes through the miRNA and/or RNAi pathways.
  • double stranded RNA or “dsRNA” refers to RNA molecules that are comprised of two strands. Double-stranded molecules include those comprised of a single RNA molecule that doubles back on itself to form a two-stranded structure. For example, the stem loop structure of the progenitor molecules from which the single-stranded miRNA is derived, called the pre-miRNA (Bartel et al. 2004. Cell 116:281-297), comprises a dsRNA molecule.
  • the cell subset frequency and/or differential cell states can be detected for screening of novel therapeutic agents.
  • the present invention can be used to identify improved treatments by monitoring the identified cell states in a subject undergoing an experimental treatment.
  • an organoid system is used to detect shifts in the identified cell states to identify agents capable of shifting a subject from a severe disease state to a mild/moderate state (see, e.g., Yin X, Mead BE, Safaee H, Langer R, Karp JM, Levy O. Engineering Stem Cell Organoids. Cell Stem Cell. 2016;18(1):25-38).
  • organoid or “epithelial organoid” refers to a cell cluster or aggregate that resembles an organ, or part of an organ, and possesses cell types relevant to that particular organ.
  • Organoid systems have been described previously, for example, for brain, retinal, stomach, lung, thyroid, small intestine, colon, liver, kidney, pancreas, prostate, mammary gland, fallopian tube, taste buds, salivary glands, and esophagus (see, e.g., Clevers, Modeling Development and Disease with Organoids, Cell. 2016 Jun 16;165(7):1586-1597).
  • a tissue system or tissue explant is used to detect shifts in the identified cell states to identify agents capable of shifting a subject from a severe disease state to a mild/moderate state (see, e.g., Grivel JC, Margolis L. Use of human tissue explants to study human infectious agents. Nat Protoc. 2009;4(2):256-269).
  • an animal model is used to detect shifts in the identified cell states to identify agents capable of shifting a subject from a severe disease state to a mild/moderate state (see, e.g., Munoz-Fontela C, Dowling WE, Funnell SGP, et al. Animal models for COVID-19. Nature. 2020;586(7830):509-515).
  • candidate agents are screened.
  • agent broadly encompasses any condition, substance or agent capable of modulating one or more phenotypic aspects of a cell or cell population as disclosed herein. Such conditions, substances or agents may be of physical, chemical, biochemical and/or biological nature.
  • candidate agent refers to any condition, substance or agent that is being examined for the ability to modulate one or more phenotypic aspects of a cell or cell population as disclosed herein in a method comprising applying the candidate agent to the cell or cell population (e.g., exposing the cell or cell population to the candidate agent or contacting the cell or cell population with the candidate agent) and observing whether the desired modulation takes place.
  • Agents may include any potential class of biologically active conditions, substances or agents, such as for instance antibodies, proteins, peptides, nucleic acids, oligonucleotides, small molecules, or combinations thereof, as described herein.
  • therapeutic agent refers to a molecule or compound that confers some beneficial effect upon administration to a subject.
  • the beneficial effect includes enablement of diagnostic determinations; amelioration of a disease, symptom, disorder, or pathological condition; reducing or preventing the onset of a disease, symptom, disorder or condition; and generally counteracting a disease, symptom, disorder or pathological condition.
  • the present invention provides for gene signature screening to identify agents that shift expression of the gene targets described herein (e.g., cell subset markers and differentially expressed genes).
  • the concept of signature screening was introduced by Stegmaier et al. (Gene expression-based high-throughput screening (GE-HTS) and application to leukemia differentiation. Nature Genet. 36, 257-263 (2004)), who realized that if a gene-expression signature was the proxy for a phenotype of interest, it could be used to find small molecules that effect that phenotype without knowledge of a validated drug target.
  • the gene signatures or biological programs of the present invention may be used to screen for drugs that reduce the signature or biological program in cells as described herein.
  • the Connectivity Map is a collection of genome-wide transcriptional expression data from cultured human cells treated with bioactive small molecules and simple pattern-matching algorithms that together enable the discovery of functional connections between drugs, genes and diseases through the transitory feature of common gene-expression changes (see, Lamb et al., The Connectivity Map: Using Gene-Expression Signatures to Connect Small Molecules, Genes, and Disease. Science 29 Sep 2006: Vol. 313, Issue 5795, pp. 1929-1935, DOI: 10.1126/science.1132939; and Lamb, J., The Connectivity Map: a new tool for biomedical research. Nature Reviews Cancer January 2007: Vol. 7, pp. 54-60).
  • Cmap can be used to identify small molecules capable of modulating a gene signature or biological program of the present invention in silico.
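The idea of ranking small molecules by how strongly they reverse a gene signature can be illustrated with a toy score. This sketch is a deliberately simplified stand-in and is not the pattern-matching statistic used by the Connectivity Map. The gene names, compound names, and log fold-change values are hypothetical placeholders.

```python
from statistics import mean


def reversal_score(profile: dict, up_genes: list, down_genes: list) -> float:
    """Mean change of signature-up genes minus mean change of signature-down genes.

    A negative score suggests the perturbation pushes expression against the signature.
    """
    up = [profile[g] for g in up_genes if g in profile]
    down = [profile[g] for g in down_genes if g in profile]
    if not up or not down:
        raise ValueError("profile must cover at least one up and one down gene")
    return mean(up) - mean(down)


# Hypothetical signature and compound-induced log fold changes.
signature_up = ["GENE_A", "GENE_B"]
signature_down = ["GENE_C", "GENE_D"]
profiles = {
    "compound_1": {"GENE_A": -1.5, "GENE_B": -0.5, "GENE_C": 1.0, "GENE_D": 2.0},
    "compound_2": {"GENE_A": 1.0, "GENE_B": 0.5, "GENE_C": -0.5, "GENE_D": 0.0},
}

# Most strongly reversing compounds first.
ranked = sorted(
    profiles,
    key=lambda name: reversal_score(profiles[name], signature_up, signature_down),
)
for name in ranked:
    print(name, reversal_score(profiles[name], signature_up, signature_down))
```

A real screen would use a rank-based enrichment statistic over genome-wide profiles; the difference of means here only conveys the direction-of-change logic.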
  • NP Nasopharyngeal
  • UMMC University of Mississippi Medical Center
  • This cohort consisted of 35 individuals who had a positive SARS-CoV-2 PCR NP swab on the day of hospital presentation.
  • a Control group consisted of 15 individuals who were asymptomatic and had a negative SARS-CoV-2 NP PCR, 6 intubated individuals in the intensive care unit without a recent history of COVID-19 and negative SARS-CoV-2 NP PCR, and 2 additional individuals with recent history of COVID-19 and negative SARS-CoV-2 NP PCR, classified as “Convalescent” (Table 6, see Methods for full inclusion and exclusion criteria).
  • stromal cell populations such as endothelial cells, fibroblasts, or pericytes, which were found in previous scRNA-seq datasets from nasal epithelial surgical samples 47,48 .
  • Applicants readily identified both Basal Cells, by their expression of canonical marker genes including TP63, KRT15, and KRT5, and Mitotic Basal Cells, based on the added expression of genes involved in the cell cycle such as MKI67 and TOP2A (Figure 1F).
  • Applicants also distinguished between MUC5AC-expressing goblet cells and BPIFA1-expressing secretory cells.
  • Applicants also resolved a population of ionocytes, a recently identified specialized subtype of secretory cell present in respiratory epithelia defined by expression of transcription factors FOXI1 and FOXI2, as well as CFTR, and thus thought to play a role in mucus viscosity 49,50.
  • Squamous cells were identified by their expression of SCEL, as well as multiple SPRR- genes, and likely derive from pharyngeal/oral squamous cells as well as those within the nasal epithelium.
  • GIP gastric inhibitory polypeptide
  • Ciliated cells were the most numerous epithelial cell type recovered in this dataset, defined by expression of transcription factor FOXJ1 as well as numerous genes involved in the formation of cilia, e.g., DLEC1, DNAH11, and CFAP43. Similar to intermediate/developing cells of the secretory and goblet lineage, Applicants also identified two populations of precursor ciliated cells. One, termed Developing Ciliated Cells, expressed canonical Ciliated Cell genes such as FOXJ1, CAPSL, and PIFO, though at lower levels than mature Ciliated Cells and without the expression of cilia-forming genes.
  • a cluster defined by expression of DEUP1, which is critical for centriole amplification as a precursor to cilium assembly, together with CCNO, CDC20, FOXN4, and HES6: a recently defined cell type termed Deuterosomal Cells 48, which represent an intermediate cell type in which Secretory cells trans-differentiate into Ciliated Cells.
  • Immune cells represent a minority of recovered cells, yet Applicants resolved multiple distinct clusters and cell types, representing major myeloid and lymphoid populations.
  • among lymphoid cells, Applicants recovered T cells, identified by CD3E, CD2, TRBC2 expression, and B cells, identified by MS4A1, CD79A, CD79B expression.
  • among myeloid cell types, Applicants recovered a large population of Macrophages (CD14, FCGR3A, VCAN), Dendritic Cells (CCR7, CD86), and Plasmacytoid DCs (IRF7, IL3RA).
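The marker genes listed above for immune populations can be turned into a simple per-cell score. This is a minimal sketch, assuming normalized expression values per gene for a single cell; it is not the clustering procedure used by Applicants. The function names and the example cell are hypothetical, while the marker lists are the ones named in the text.

```python
# Marker genes as listed in the text for each immune cell type.
MARKERS = {
    "T cell": ["CD3E", "CD2", "TRBC2"],
    "B cell": ["MS4A1", "CD79A", "CD79B"],
    "Macrophage": ["CD14", "FCGR3A", "VCAN"],
    "Dendritic cell": ["CCR7", "CD86"],
    "Plasmacytoid DC": ["IRF7", "IL3RA"],
}


def score_cell(expression: dict) -> dict:
    """Average expression of each cell type's markers; missing genes count as 0."""
    return {
        cell_type: sum(expression.get(gene, 0.0) for gene in genes) / len(genes)
        for cell_type, genes in MARKERS.items()
    }


def assign_label(expression: dict) -> str:
    """Label a cell with the highest-scoring marker set."""
    scores = score_cell(expression)
    return max(scores, key=scores.get)


# Hypothetical normalized expression for one cell.
cell = {"MS4A1": 3.0, "CD79A": 2.0, "CD79B": 1.0, "CD86": 0.5}
print(score_cell(cell))
print(assign_label(cell))
```

In practice such scores are usually computed on cluster averages rather than single cells, since individual cells often have zero counts for any given marker.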
  • SARS-CoV-2 spike protein requires “priming”, or cleavage by host proteases, to enable membrane fusion and viral release into the cell. Since early 2020, researchers have identified TMPRSS2, TMPRSS4, CTSL, and FURIN as capable of spike protein cleavage and critical for viral entry 51.
  • TMPRSS2, thought to be the principal host factor for SARS-CoV-2 S cleavage, is found in highest abundance on Squamous Cells, followed by modest expression on all other epithelial cell types.
  • Expression of CTSL and other cathepsins was found across diverse epithelial and myeloid cell types.
  • ANPEP and DPP4, host receptors targeted by other human coronaviruses causing upper respiratory diseases, are found primarily on Goblet Cells and Secretory Cells.
  • CDHR3, the receptor utilized by Rhinovirus C, is found primarily on Ciliated Cells and Developing Ciliated Cells.
  • Deuterosomal cells, which represent a developmental intermediate as secretory/goblet cells trans-differentiate into ciliated cells, were significantly increased among Control WHO 7-8, COVID-19 WHO 1-5, and COVID-19 WHO 6-8 samples, with the strongest increases observed from participants with severe COVID-19 compared to healthy controls (Figure 1L). Likewise, Developing Ciliated Cells were significantly increased among participants with severe COVID-19 (Figure 1M).
  • Example 2 Epithelial Diversity and Remodeling Following SARS-CoV-2 Infection
  • Applicants sought to more completely delineate the diversity of epithelial cells through iterative clustering and sub-clustering among epithelial cell types (see Methods). This enabled Applicants to divide the 10 “Coarse” epithelial cell types into 25 “Detailed” cell types/states (Figure 2A-2E, Figure 8A, Table 1). Among some cell types, Applicants did not find additional within-type diversity, and thus the “Coarse” annotations (Figure 2A) are equivalent to the “Detailed” identities (Figure 2D).
  • SERPINB11 high Secretory Cells (which, similar to MUC5AC high Goblet Cells, represented a more “generic” Secretory Cell phenotype), BPIFA1 high Secretory Cells, Early Response Secretory Cells (which expressed genes such as JUN, EGR1, FOS, NR4A1), KRT24 KRT13 high Secretory Cells (which are highly similar to previously described KRT13+ “hillock” cells), BPIFA1 and Chemokine high Secretory Cells (example chemokines include CXCL8, CXCL2, CXCL1, and CXCL3), and Interferon Responsive Secretory Cells (defined by higher expression of broad anti-viral genes including IFITM3, IFI6, and MX1).
  • Squamous Cells were also found - detailed Squamous Cell subtypes include CCL5 high Squamous Cells, VEGFA high Squamous cells (which express multiple vascular endothelial genes including VEGFA and VWF), SPRR2D high Squamous Cells (which, in addition to SPRR2D, express the highest abundances of multiple SPRR- genes including SPRR2A, SPRR1B, SPRR2E, and SPRR3 ), and HOPX high Squamous Cells.
  • Ciliated Cells could be further divided into 5 distinct subtypes: Interferon Responsive Ciliated Cells (expressing anti-viral genes similar to other “Interferon Responsive” subsets, such as IFIT1, IFIT3, IFI6), FOXJ1 high Ciliated Cells, Early Response FOXJ1 high Ciliated Cells (which, in addition to high FOXJ1, also express higher abundances of genes such as JUN, EGR1, FOS than other ciliated cell subtypes), Cilia high Ciliated Cells (which broadly express the highest abundances of structural cilia genes, such as DLEC1 and CFAP100), and BEST4 high Cilia high Ciliated Cells (which, in addition to cilia components, also express the ion channel BEST4).
  • ACE2 expression was previously identified as highest among Secretory, Goblet, and Ciliated Cells 35,36 - here Applicants observe substantial within-cell-type heterogeneity in ACE2 expression among each of these cell types. Notably, among Goblet cells, AZGP1 high Goblet Cells express the highest abundance of ACE2 mRNA, suggesting this cell type may be a preferential target for SARS-CoV-2 infection.
  • RNA velocity analysis leverages the dynamic relationships between expression of unspliced (intron-containing) and spliced (exonic) RNA across thousands of variable genes, enabling 1) estimation of the directionality of transitions between distinct cells and cell types, and 2) identification of putative driver genes behind these transitions.
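  • As an illustration of the unspliced/spliced relationship described above, the following minimal sketch uses the simple steady-state formulation of RNA velocity (velocity = unspliced - gamma x spliced). This is not the dynamical model referenced elsewhere in this document; the function names and toy values are assumptions chosen for illustration only.

```python
# Steady-state RNA velocity sketch (illustrative; not the dynamical model).
# For one gene, gamma is the ratio of unspliced to spliced RNA at steady state.
# A positive velocity suggests the gene is being induced, a negative one repressed.

def fit_gamma(unspliced_values, spliced_values):
    """Least-squares slope through the origin: gamma = sum(u*s) / sum(s*s)."""
    numerator = sum(u * s for u, s in zip(unspliced_values, spliced_values))
    denominator = sum(s * s for s in spliced_values)
    if denominator == 0:
        raise ValueError("spliced counts are all zero; gamma is undefined")
    return numerator / denominator


def steady_state_velocity(unspliced, spliced, gamma):
    """Velocity for one gene in one cell under the steady-state model."""
    return unspliced - gamma * spliced


# Toy data (hypothetical): cells near steady state for a single gene.
gamma = fit_gamma([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
# A cell with more unspliced RNA than the steady-state ratio predicts.
velocity = steady_state_velocity(unspliced=3.0, spliced=2.0, gamma=gamma)
print(gamma, velocity)
```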
  • Vector fields (black lines and arrows) represent a smoothed estimate of cellular transitions based on RNA velocity.
  • RNA velocity appropriately places Basal Cells and Mitotic Basal Cells as the “root” or “origin” of cellular transitions, which then progress through the Developing Secretory and Goblet Cells to the Secretory Cells and Goblet Cells.
  • RNA velocity curves predict multiple routes for development between distinct subtypes. This observation is consistent with the current understanding of respiratory secretory cell plasticity and capacity for de-differentiation.
  • Interferon Responsive Ciliated Cells and Early Response FOXJ1 high Ciliated Cells represent phenotypic deviations from this ordered progression, and therefore appear collapsed/unresolved along this trajectory with the same pseudotime range as FOXJ1 high Ciliated Cells.
  • Regions annotated as multiple Secretory Cell subsets and Developing Ciliated Cells were uniquely captured from COVID-19 participants.
  • Example 3 Alterations to Nasal Mucosal Immune Populations in COVID-19
  • Applicants further clustered and annotated detailed immune cell populations. Multiple cell types could not be further subdivided from their coarse annotation (Figure IB, Figure 9A-9E), including Mast Cells, Plasmacytoid DCs, B Cells, and Dendritic Cells.
  • Among Macrophages (coarse annotation), Applicants resolved 5 distinct subtypes (Figure 9B).
  • FFAR4 high Macrophages were defined by expression of FFAR4, MRC1, CHIT1, and SIGLEC11, as well as chemotactic factors including CCL18 and CCL15, genes involved in leukotriene synthesis (ALOX5, ALOX5AP, LTA4H), and toll-like receptors TLR8 and TLR2 (Table 1, Figure 9F).
  • Interferon Responsive Macrophages were distinguished by elevated expression of anti-viral genes such as IFIT3, IFIT2, ISG15, and MX1, akin to the epithelial subsets labeled “Interferon Responsive”, along with CXCL9, CXCL10, and CXCL11, which are likely indicative of IFNγ stimulation.
  • MSR1 C1QB high Macrophages are defined by cathepsin expression (CTSD, CTSL, CTSB) and elevated expression of complement (C1QB, C1QA, C1QC) and lipid binding proteins (APOE, APOC, and NPC2).
  • ITGAX high Macrophages were distinguished from other immune cell types by ITGAX, VCAN, PSAP, FTL, FTH1 and CD163 (though these genes are shared by other specialized macrophage subsets).
  • T cells were largely CD69 and CD8A high, consistent with a T resident memory-like phenotype, and Applicants were not able to resolve a separate cluster of CD4 T cells.
  • Two specialized subtypes of CD8 T Cells were annotated from this dataset: one defined by exceptionally high expression of Early Response genes (FOSB, NR4A2, and CCL5), and the other termed Interferon Responsive Cytotoxic CD8 T Cells, defined by granzyme and perforin expression (GZMB, GZMA, GNLY, PRF1, GZMH), anti-viral genes (ISG20, IFIT3, APOBEC3C, GBP5) and genes associated with effector CD8 T cell function (LAG3, IL2RB, IKZF3, TBX21).
  • Example 4 Cellular Behaviors Associated with COVID-19 Disease Trajectory
  • COVID-19 elicits major cell compositional changes within the nasopharyngeal mucosa, including expansion of the secretory cell/deuterosomal cell compartments to repopulate lost mature ciliated cells, and recruitment of highly inflammatory myeloid cells.
  • Ciliated cells in mild/moderate COVID-19 robustly induced type I interferon-specific gene signatures, both compared to cells from healthy controls, as well as individuals with severe COVID-19.
  • Only a few genes were suggestive of a type II response, including induction of MHC-II genes among mild/moderate COVID-19 cases.
  • Ciliated cells from individuals with severe COVID-19 did not significantly induce type I or type II interferon responsive genes, potentially underlying poor control of viral spread.
  • Type II specific genes were globally blunted across all cell types from COVID-19 samples when compared to type I module scores (Figure 3G, Figure 10D). Further, the absence of a transcriptional response to secreted interferon could not be explained by a lack of either interferon alpha receptor (IFNAR1, IFNAR2) or interferon gamma receptor (IFNGR1, IFNGR2) expression.
  • Previous work has identified ACE2, the host receptor for SARS-CoV-2, as among the interferon-induced genes in nasal epithelial cells. Indeed, Applicants observe modest upregulation of this gene among cells from COVID-19 participants compared to healthy controls.
  • Inflammatory and Interferon Responsive Macrophages represent the primary sources of local TNF, IL6, and IL10, and uniquely express high abundances of chemoattractant molecules such as CCL3, CCL2, CXCL8, CXCL9, CXCL10, and CXCL11.
  • RNA-sequencing protocols utilize poly-adenylated RNA capture and reverse transcription to generate snapshots of the transcriptional status of each individual cell.
  • Because pathogens and commensal microbes also utilize poly-adenylation for RNA intermediates, or contain poly-adenylated stretches of RNA within their genomes, they may also be represented within single-cell RNA-seq libraries.
  • Applicants aimed to differentiate SARS-CoV-2 UMI derived from ambient or low-quality cell barcodes from those truly reflecting intracellular RNA molecules.
  • Applicants filtered to only viral UMIs associated with cells presented in Figure 1, thereby removing those associated with low-quality cell barcodes ( Figure 11G).
  • Next, using a combination of computational tools to 1) estimate the proportion of ambient RNA contamination per single cell and 2) estimate the abundance of SARS-CoV-2 RNA within the extracellular/ambient environment (i.e., not cell-associated), Applicants were able to test whether the amount of viral RNA associated with a given single-cell transcriptome was significantly higher than would be expected from ambient spillover.
  • SARS-CoV-2 RNA+ cells from participants with negative SARS-CoV-2 PCR: two from a participant classified as “Convalescent”, and one from a Control participant.
  • Among participants with any SARS-CoV-2 RNA+ cell, Applicants found 20 +/- 7 (mean +/- SEM) SARS-CoV-2 RNA+ cells per sample (range 1-119), amounting to 4 +/- 1.3% (range 0.1-24%) of the recovered cells per sample.
  • The abundance of SARS-CoV-2 UMI ranged from 1 to 12,612, corresponding to 0.01-98% of all human and viral UMI per cell.
  • The viral replication complex then produces both 1) negative strand genomic RNA intermediates, which serve as templates for further positive strand genomic RNA, and 2) nested subgenomic mRNAs, which are constructed from a 5’ leader sequence fused to a 3’ sequence encoding structural proteins for production of viral progeny (e.g., Spike, Envelope, Membrane, Nucleocapsid).
  • Generation of nested subgenomic mRNAs relies on discontinuous transcription occurring between pairs of 6-mer transcriptional regulatory sequences (TRS), one 3’ to the leader sequence (termed leader TRS, or TRS-L), and others 5’ to each gene coding sequence (termed body TRS, or TRS-B).
  • Short SARS-CoV-2-aligning UMI could be readily distinguished by their strandedness (aligning to the negative vs. positive strand) and whether they fell within coding regions, across an intact TRS (indicating RNA splicing had not occurred for that RNA molecule at that splice site) or across a TRS with leader-to-body fusions (corresponding to subgenomic RNA, Figure 4F, 4G, Figure 12A).
  • Single cells containing higher abundances of spliced or negative strand aligning reads are therefore more likely to represent truly virally infected cells with a functional viral replication and transcription complex.
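  • The categories of viral UMI described above can be sketched as a small classifier. This is a hedged illustration only: the record layout (`ViralUmi`), the category labels, and the rule that negative strand alignments are assigned first are assumptions, not the Applicants' implementation.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ViralUmi:
    """One SARS-CoV-2-aligning UMI (hypothetical record layout)."""
    strand: str               # "+" for positive strand, "-" for negative strand
    overlaps_trs: bool        # alignment crosses a transcriptional regulatory sequence
    leader_body_fusion: bool  # alignment shows a leader-to-body junction at that TRS


def classify_umi(umi):
    """Assign a UMI to one category; negative strand takes priority (assumption)."""
    if umi.strand not in ("+", "-"):
        raise ValueError(f"unexpected strand: {umi.strand!r}")
    if umi.strand == "-":
        return "negative_strand"
    if umi.overlaps_trs:
        return "spliced_trs" if umi.leader_body_fusion else "unspliced_trs"
    return "positive_strand_other"


def summarize_cell(umis):
    """Count UMI per category for a single cell barcode."""
    return Counter(classify_umi(u) for u in umis)


# Toy cell (hypothetical values).
cell = [
    ViralUmi("+", False, False),
    ViralUmi("-", False, False),
    ViralUmi("+", True, True),
    ViralUmi("+", True, False),
]
print(summarize_cell(cell))
```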
  • Highest-confidence SARS-CoV-2 RNA+ cells (spliced UMI, negative strand UMI, > 100 SARS-CoV-2 UMI) tended to be found among MUC5AC high Goblet Cells, AZGP1 high Goblet Cells, BPIFA1 high Secretory Cells, KRT24 KRT13 high Secretory Cells, CCL5 high Squamous Cells, Developing Ciliated Cells, and each Ciliated Cell subtype.
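  • A minimal sketch of how the three listed criteria might be evaluated per cell. The text does not state whether the criteria are combined with "any" or "all", so the `require_all` switch below is an assumption, as are the function names.

```python
def confidence_criteria(spliced_umi, negative_strand_umi, total_viral_umi, min_total=100):
    """Report which of the three listed criteria a cell meets."""
    return {
        "has_spliced_umi": spliced_umi > 0,
        "has_negative_strand_umi": negative_strand_umi > 0,
        "exceeds_total_threshold": total_viral_umi > min_total,  # "> 100" in the text
    }


def is_highest_confidence(spliced_umi, negative_strand_umi, total_viral_umi, require_all=False):
    """Combine the criteria; the any/all choice is an assumption, not from the source."""
    met = confidence_criteria(spliced_umi, negative_strand_umi, total_viral_umi).values()
    return all(met) if require_all else any(met)


# Toy example: a cell with 3 spliced UMI, no negative strand UMI, 40 viral UMI total.
print(confidence_criteria(3, 0, 40))
print(is_highest_confidence(3, 0, 40), is_highest_confidence(3, 0, 40, require_all=True))
```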
  • A high proportion of Interferon Responsive Macrophages contained SARS-CoV-2 genomic material, and rare ITGAX high Macrophages were found to contain UMI aligning to viral negative strand or spliced TRS regions - likely representing myeloid cells that have recently engulfed virally-infected epithelial cells or free virions. Applicants did not find major differences in the presumptive cellular tropism by the severity of COVID-19.
  • A few cell types were commonly found to be SARS-CoV-2 RNA+ across all participants (including participants with only rare viral RNA+ cells): most frequently, participants had at least one Developing Ciliated or Squamous cell with SARS-CoV-2 RNA, followed by Goblet Cells, Cilia high Ciliated Cells, and FOXJ1 high Ciliated Cells (Figure 5C).
  • Among participants with the highest abundances of SARS-CoV-2 RNA+ cells, viral RNA was spread broadly across many different cell types, including those outside of the expected tropism for SARS-CoV-2 (e.g., also found within Basal Cells, Ionocytes).
  • The cell types harboring the highest proportions of SARS-CoV-2 RNA+ cells represent the same cell types uniquely expanded or induced within COVID-19 participants, such as KRT24 KRT13 high Secretory Cells, AZGP1 high Goblet Cells, and Interferon Responsive Ciliated Cells, and contain the highest abundances of ACE2-expressing cells (Figure 5C, Figure 12F). Whether these cell types represent specific phenotypes elicited by intrinsic viral infection (potentially alongside induction of anti-viral genes) or are uniquely susceptible to SARS-CoV-2 entry (e.g., enhanced entry factor expression) will require further investigation.
  • Ciliated cells contain among the highest abundances of SARS-CoV-2 RNA molecules per cell, including positive strand-aligning reads, negative strand-aligning reads, and spliced TRS reads (Figure 12G).
  • IFN responsive ciliated cells, despite representing one of the most frequent “targets” of viral infection, contain the lowest per-cell abundances of SARS-CoV-2 RNA, potentially reflecting the impact of elevated anti-viral factors curbing high levels of intracellular viral replication (Figure 12H).
  • SARS-CoV-2 RNA appeared to robustly stimulate expression of genes involved in anti-viral sensing and defense (e.g., MX1, IRF1, OAS1, OAS2), as well as genes involved in antigen presentation via MHC class I (Figure 6C, Table 5).
  • SARS-CoV-2 RNA+ cells expressed significantly higher abundances of multiple proteases (TMPRSS4, TMPRSS2, CTSS, CTSD) involved in the cleavage of SARS-CoV-2 spike protein, a required step for viral entry. This suggests that within a given cell type, natural variations in the abundance of genes which support the viral life cycle partially account for which cells are successfully targeted by the virus.
  • IFITM3 and IFITM1 are interferon-inducible factors that can disrupt viral release from endocytic compartments among a wide diversity of viral species.
  • IFITMs can instead facilitate entry by human betacoronaviruses. Therefore, enrichment of these factors within presumptive infected cells may reflect viral hijacking of a conserved host anti-viral responsive pathway.
  • SARS-CoV-2 RNA+ cells including FDFT1, MVK, FDPS, ACAT2, HMGCS1, all enzymes involved in the mevalonate synthesis pathway.
  • SARS-CoV-2 RNA+ cells showed increased abundance of low-density lipoprotein receptors LDLR and LRP8 compared to matched bystanders.
  • Various genes involved in cholesterol metabolism were recently identified as critical host factors for SARS-CoV-2 replication via CRISPR screens from multiple independent research groups 56,57.
  • IFNAR1 was substantially increased in many bystander cells compared to both cells from SARS-CoV-2 negative participants as well as matched SARS-CoV-2 RNA+ cells ( Figure 6D). Blunting of interferon alpha signaling via downregulation of IFNAR1 within SARS-CoV-2 RNA+ cells may partially explain high levels of viral replication compared to neighboring cells.
  • EIF2AK2, which encodes protein kinase R and drives host cell apoptosis following recognition of intracellular double-stranded RNA, is among the most reliably expressed and upregulated genes among SARS-CoV-2 RNA+ cells compared to matched bystanders across diverse cell types, suggesting rapid activation of this gene following intrinsic PAMP recognition of SARS-CoV-2 replication intermediates (Krahling et al., 2009).
  • Applicants have created a comprehensive map of SARS-CoV-2 infection of the human nasopharynx using scRNA-seq, and identified tissue correlates of protection and disease severity within a large human cohort.
  • Applicants begin to untangle the myriad factors that underlie restriction of viral infection to the upper respiratory tract vs. expansion to the lower airways and lung parenchyma or support the development of severe lower respiratory tract disease (Figure 13C).
  • This study defines major compositional differences in the nasal epithelia during COVID-19 and directly relates these to NP viral load, cellular tropism, and cell-intrinsic responses to SARS-CoV-2.
  • Applicants identify marked variability in the induction of anti-viral gene expression that is associated with peak disease severity and may precede development of severe respiratory damage. Applicants find that anti-viral gene expression is profoundly blunted in cells isolated from individuals who develop severe disease, even in cells containing SARS-CoV-2 RNA.
  • Applicants provide a direct investigation into the host factors that enable or restrict SARS-CoV-2 replication within epithelial cells in vivo.
  • Applicants recapitulate expected “hits” based on well-described host factors involved in viral replication, e.g., TMPRSS2, TMPRSS4 enrichment among presumptive virally infected cells.
  • Applicants similarly observed expression of anti-viral genes which were globally enriched among cells from mild/moderate COVID-19 participants, with even higher expression among the viral RNA+ cells themselves.
  • Sample Collection and Biobanking - Nasopharyngeal samples were collected by a trained healthcare provider using FLOQSwabs (Copan flocked swabs) following the manufacturer's instructions. Collectors would don personal protective equipment (PPE), including a gown, non-sterile gloves, a protective N95 mask, a bouffant, and a face shield. The patient's head was then tilted back slightly, and the swab was inserted along the nasal septum, above the floor of the nasal passage, to the nasopharynx until slight resistance was felt. The swab was then left in place for several seconds to absorb secretions and slowly removed while rotating the swab. The procedure was then repeated with a second swab in the other naris.
  • The swabs were then placed into a cryogenic vial with 900 µL of heat inactivated fetal bovine serum (FBS) and 100 µL of dimethyl sulfoxide (DMSO).
  • The vials were then placed into a Thermo Scientific Mr. Frosty Freezing Container for optimal cell preservation.
  • The Mr. Frosty containing the vials was then placed in a cooler with dry ice for transportation from the patient area to the laboratory for processing. Once in the laboratory, the Mr. Frosty was placed into the -80°C freezer overnight, and on the next day the vials were moved to the liquid nitrogen storage container.
  • Nasal swabs in freezing media were thawed, and each swab was rinsed in RPMI before incubation in 1 mL RPMI/10 mM DTT (Sigma) for 15 minutes at 37°C with agitation.
  • The nasal swab was incubated in 1 mL Accutase (Sigma) for 30 minutes at 37°C with agitation.
  • The 1 mL RPMI/10 mM DTT from the nasal swab incubation was centrifuged at 400 g for 5 minutes at 4°C to pellet cells, the supernatant was discarded, and the cell pellet was resuspended in 1 mL Accutase and incubated for 30 minutes at 37°C with agitation.
  • The original cryovial containing the freezing media and the original swab washings were combined and centrifuged at 400 g for 5 minutes at 4°C.
  • The cell pellet was then resuspended in RPMI/10 mM DTT and incubated for 15 minutes at 37°C with agitation, then centrifuged as above; the supernatant was aspirated, and the cell pellet was resuspended in 1 mL Accutase and incubated for 30 minutes at 37°C with agitation. All cells were combined following Accutase digestion and filtered using a 70 µm nylon strainer. The filter and swab were washed with RPMI/10% FBS/4 mM EDTA, and all washings were combined.
  • Dissociated, filtered cells were centrifuged at 400 g for 10 minutes at 4°C, and resuspended in 200 µL RPMI/10% FBS for counting. Cells were diluted to 20,000 cells in 200 µL for scRNA-seq. For the majority of swabs, fewer than 20,000 cells total were recovered. In these instances, all cells were input into scRNA-seq.
  • scRNA-seq - Seq-Well S^3 was run as previously described 44,46. Briefly, a maximum of 20,000 single cells were deposited onto Seq-Well arrays preloaded with a single barcoded mRNA capture bead per well. Cells were allowed to settle by gravity into wells for 10 minutes, after which the arrays were washed with PBS and RPMI, sealed with a semi-permeable membrane for 30 minutes, and incubated in lysis buffer (5 M guanidinium thiocyanate/1 mM EDTA/1% BME/0.5% sarkosyl) for 20 minutes.
  • Arrays were then incubated in a hybridization buffer (2M NaCl/8% v/v PEG8000) for 40 minutes, and then the beads were removed from the arrays and collected in 1.5 mL tubes in wash buffer (2M NaCl/3 mM MgCl2/20 mM Tris-HCl/8% v/v PEG8000). Beads were resuspended in a reverse transcription master mix, and reverse transcription, exonuclease digestion, second strand synthesis, and whole transcriptome amplification were carried out as previously described.
  • A custom reference was created by combining human GRCh38 (from CellRanger version 3.0.0, Ensembl 93) and SARS-CoV-2 RNA genomes.
  • The SARS-CoV-2 viral sequence and GTF are as described in Kim et al. 2020 (github.com/hyeshik/sars-cov-2-transcriptome, BetaCov/South Korea/KCDC03/2020 based on NC_045512.2).
  • The GTF includes all CDS regions (as of this annotation of the transcriptome, the CDS regions completely cover the RNA genome without overlapping segments), and regions were added to describe the 5’ UTR (“SARSCoV2_5prime”), the 3’ UTR (“SARSCoV2_3prime”), and reads aligning to anywhere within the Negative Strand (“SARSCoV2_NegStrand”). Trailing A’s at the 3’ end of the virus were excluded from the SARS-CoV-2 FASTA, as these were found to drive spurious viral alignment in pre-COVID-19 samples.
  • Alignment references were tested against a diverse set of pre-COVID-19 samples and in vitro SARS-CoV-2 infected human bronchial epithelial cultures (Ravindra et al.) to confirm specificity of viral aligning reads (data not shown). Aligned cell-by-gene matrices were merged across all study participants, and cells were filtered to eliminate barcodes with fewer than 200 UMI, fewer than 150 unique genes, or greater than 50% mitochondrial reads (cutoffs determined by distributions of reads across cells, see Figure 7C). Of the 61 nasal swabs thawed and processed, 3 contained no high-quality cell barcodes after sequencing (NB: these samples contained ⁇5,000 viable cells prior to Seq-Well array loading).
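  • The barcode filtering thresholds above can be expressed as a simple predicate. This sketch assumes that a barcode is removed if it fails any one of the three cutoffs; the function and parameter names are hypothetical.

```python
def passes_qc(n_umi, n_genes, mito_fraction, min_umi=200, min_genes=150, max_mito=0.50):
    """Keep a barcode only if it meets all three cutoffs stated in the text."""
    return n_umi >= min_umi and n_genes >= min_genes and mito_fraction <= max_mito


# Toy barcodes (hypothetical values): (UMI count, unique genes, mitochondrial fraction).
barcodes = {
    "cell_a": (1500, 800, 0.10),  # passes
    "cell_b": (180, 160, 0.05),   # too few UMI
    "cell_c": (900, 400, 0.65),   # too many mitochondrial reads
}
kept = [name for name, metrics in barcodes.items() if passes_qc(*metrics)]
print(kept)
```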
  • Using the JackStraw function within Seurat, Applicants selected the first 36 principal components that described the majority of variance within the dataset, and used these for defining a nearest neighbor graph and Uniform Manifold Approximation and Projection (UMAP) plot.
  • Cells were clustered using Louvain clustering, and the resolution parameter was chosen by maximizing the average silhouette score across all clusters.
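  • The resolution-selection step can be illustrated with a small silhouette calculation. This sketch makes several simplifying assumptions: points are one-dimensional, the candidate labelings stand in for Louvain output at different resolutions, and the score is averaged over cells rather than over clusters. All names and values are hypothetical.

```python
from statistics import mean


def average_silhouette(points, labels):
    """Mean silhouette over all points, using absolute difference as distance."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # convention: singleton clusters score 0
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(abs(point - q) for q in own) / (len(own) - 1)
        # b: mean distance to the nearest other cluster
        b = min(mean(abs(point - q) for q in members)
                for other, members in clusters.items() if other != label)
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return mean(scores)


def best_resolution(points, labels_by_resolution):
    """Pick the resolution whose labeling maximizes the average silhouette."""
    return max(labels_by_resolution,
               key=lambda res: average_silhouette(points, labels_by_resolution[res]))


# Toy data: two well-separated groups; the higher resolution over-splits them.
points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
candidates = {0.4: [0, 0, 0, 1, 1, 1], 1.2: [0, 0, 1, 2, 2, 3]}
print(best_resolution(points, candidates))
```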
  • Differentially expressed genes between each cluster and all other cells were calculated using the FindAllMarkers function, test.use set to “bimod”. Clusters were merged if they failed to contain sets of significantly differentially expressed genes.
  • Applicants proceeded iteratively through each cluster and subcluster until “terminal” cell subsets/cell states were identified - Applicants defined “terminal” cell states as those for which principal components analysis and Louvain clustering did not confidently identify additional sub-states, as measured by abundance of differentially expressed genes between potential clusters.
  • Applicants pooled all cells determined to be of epithelial origin and applied the methods for dimensionality reduction as above (dispersion cutoff > 1, 30 principal components).
  • Applicants applied similar approaches for immune cell types, including iterative subclustering to resolve and annotate all constituent cell types and subtypes, and combined all immune cells for visualization purposes in Figure 10.
  • Cell cycle scoring utilized gene lists from Tirosh et al. Gene module scores were calculated using the AddModuleScore function within Seurat.
  • RNA Velocity and Pseudotemporal Ordering of Epithelial Cells - RNA velocity was modeled using the scVelo package, version 0.2.3.
  • Using cluster annotations previously assigned from iterative clustering in Seurat, cells from epithelial cell types were pre-processed according to the scVelo pipeline: genes were normalized using default parameters (pp.filter_and_normalize), principal components and nearest neighbors in PCA space were calculated (using defaults of 30 PCs, 30 nearest neighbors), and the first and second order moments of nearest neighbors were computed, which are used as inputs into velocity estimates (pp.moments).
  • Top velocity transition “driver” genes were identified by high “fit likelihood” parameters from the dynamical model, and are used for visualization in Figure 9G.
  • The same approaches were used for modeling RNA velocity among only Ciliated Cells (Figure 2H-2K), Basal, Secretory, and Goblet Cells (Figure 2L-2O), and only COVID-19 or only Control cells (Figure 3A).
  • The velocity pseudotime was calculated using the tl.velocity_pseudotime function with default settings.
  • Applicants employed CellBender (github.com/broadinstitute/CellBender).
  • Input UMI count matrices contained the top 10,000 cell barcodes, therefore including at least 70% cell barcodes sampling the ambient RNA of the low-quality cell pool.
  • CellBender's remove-background function was run with default parameters and --fpr 0.01 --expected-cells 500 --low-count-threshold 5.
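  • For reference, the stated options can be assembled into a command line as sketched below. The input and output file names are hypothetical placeholders, and the sketch only builds and prints the command rather than running it.

```python
import shlex


def cellbender_command(input_path, output_path, expected_cells=500, fpr=0.01,
                       low_count_threshold=5):
    """Build the argument list for `cellbender remove-background` with the stated options."""
    return [
        "cellbender", "remove-background",
        "--input", input_path,    # hypothetical path to the raw UMI count matrix
        "--output", output_path,  # hypothetical path for the corrected output
        "--expected-cells", str(expected_cells),
        "--fpr", str(fpr),
        "--low-count-threshold", str(low_count_threshold),
    ]


command = cellbender_command("sample_raw_counts.h5", "sample_cellbender.h5")
print(shlex.join(command))
```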
  • Applicants calculated the proportion of ambient contamination per high-quality cell by comparing to the single-cell's transcriptome pre-correction, and summed all UMI from background/low-quality cell barcodes to recover an estimate of the total ambient pool.
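  • The two quantities described above can be sketched as follows, assuming per-barcode counts are available as gene-to-UMI dictionaries before and after ambient correction. The data layout and function names are illustrative assumptions.

```python
from collections import Counter


def ambient_fraction(pre_counts, post_counts):
    """Fraction of a cell's UMI removed by ambient correction: (pre - post) / pre."""
    total_pre = sum(pre_counts.values())
    if total_pre == 0:
        raise ValueError("cell has no UMI before correction")
    total_post = sum(post_counts.get(gene, 0) for gene in pre_counts)
    return (total_pre - total_post) / total_pre


def ambient_pool(background_barcodes):
    """Sum UMI per gene across background/low-quality barcodes."""
    pool = Counter()
    for counts in background_barcodes:
        pool.update(counts)
    return pool


# Toy example (hypothetical counts).
pre = {"GENE_A": 8, "SARSCoV2": 2}
post = {"GENE_A": 7, "SARSCoV2": 1}
pool = ambient_pool([{"GENE_A": 3, "SARSCoV2": 1}, {"GENE_A": 5}])
print(ambient_fraction(pre, post), dict(pool))
```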
  • n SARS-CoV-2 UMI per cell
  • x total UMI per cell
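  • One way the comparison against ambient spillover could be set up, using n and x as defined above, is a one-sided exact binomial test. This excerpt does not name the test, so the binomial form and the null probability (the cell's ambient fraction multiplied by the viral share of the ambient pool) are assumptions made for illustration.

```python
import math


def null_probability(ambient_fraction, viral_share_of_ambient_pool):
    """Assumed chance that any one UMI in the cell is an ambient viral UMI."""
    return ambient_fraction * viral_share_of_ambient_pool


def log_binom_pmf(k, x, p):
    """log P(X = k) for X ~ Binomial(x, p), computed in log space for stability."""
    return (math.lgamma(x + 1) - math.lgamma(k + 1) - math.lgamma(x - k + 1)
            + k * math.log(p) + (x - k) * math.log1p(-p))


def binom_upper_tail(n, x, p):
    """P(X >= n): probability of seeing at least n viral UMI among x total by chance."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    if n <= 0:
        return 1.0
    if n > x:
        return 0.0
    log_terms = [log_binom_pmf(k, x, p) for k in range(n, x + 1)]
    largest = max(log_terms)
    return min(1.0, math.exp(largest) * sum(math.exp(t - largest) for t in log_terms))


# Toy example: 50 viral UMI among 1,000 total, 10% ambient, 1% of ambient pool viral.
p_null = null_probability(0.10, 0.01)
print(binom_upper_tail(50, 1000, p_null))
```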
  • Gene ontology analysis was run using the Database for Annotation, Visualization, and Integrated Discovery (DAVID).
  • Gene set enrichment analysis (GSEA)
  • Gene lists corresponding to “Shared IFN Response”, “Type I IFN Specific Response” and “Type II IFN Specific Response” are derived from previously-published population RNA-seq data from nasal epithelial basal cells treated in vitro with 0.1 ng/mL - 10 ng/mL IFNA or IFNG for 12 hours. Module scores were calculated using the Seurat function AddModuleScore with default inputs.
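  • The idea behind a module score can be sketched as the mean expression of the module genes minus the mean expression of a control gene set. This is a simplification: Seurat's AddModuleScore selects control genes from expression-matched bins, whereas here the control genes are supplied by the caller. The control genes and expression values below are hypothetical.

```python
from statistics import mean


def module_score(expression, module_genes, control_genes):
    """Mean module expression minus mean control expression for one cell."""
    module = [expression[g] for g in module_genes if g in expression]
    control = [expression[g] for g in control_genes if g in expression]
    if not module or not control:
        raise ValueError("no module or control genes found in this cell")
    return mean(module) - mean(control)


# Toy cell: normalized expression values (hypothetical).
cell = {"IFIT1": 3.0, "MX1": 1.0, "ACTB": 1.0, "GAPDH": 1.0}
score = module_score(cell, module_genes=["IFIT1", "MX1"], control_genes=["ACTB", "GAPDH"])
print(score)
```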
  • Angiotensin-converting enzyme 2 is a functional receptor for the SARS coronavirus. Nature (2003) doi:10.1038/nature02145.
  • Table 4B. Expressed in COVID-19 WHO 1-5 (mild/moderate) individuals. Table 5. Common Differentially Expressed Genes Between SARS-CoV-2 RNA+ Cells and Bystander Cells. Related to Figure 6. log2 fold change between SARS-CoV-2 RNA+ cells (high, positive values) and matched bystander cells (low, negative values). Columns: detailed cell types with at least 5 SARS-CoV-2 RNA+ cells

Abstract

The subject matter disclosed herein is generally directed to stratifying and treating coronavirus infections based on intrinsic immune states.

Description

METHODS OF STRATIFYING AND TREATING CORONAVIRUS INFECTION
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the benefit of U.S. Provisional Application Nos. 63/151,002 filed February 18, 2021 and 63/203,514 filed July 26, 2021. The entire contents of the above-identified applications are hereby fully incorporated herein by reference.
STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH
[0002] This invention was made with government support under Grant Nos. GM007753, AI118672, and DK122532, awarded by the National Institutes of Health. The government has certain rights in the invention.
REFERENCE TO AN ELECTRONIC SEQUENCE LISTING
[0003] The contents of the electronic sequence listing ("BROD-5375WP_ST25.txt"; Size is 8,000 bytes and it was created on February 17, 2022) is herein incorporated by reference in its entirety.
TECHNICAL FIELD
[0004] The subject matter disclosed herein is generally directed to determining whether a subject is at risk for severe respiratory disease from a coronavirus infection and treating the subject.
BACKGROUND
[0005] The novel coronavirus clade SARS-CoV-2 emerged in late 2019 and has quickly led to one of the most devastating global pandemics in modern history. SARS-CoV-2 infection can cause severe respiratory COVID-19. However, many individuals present with isolated upper respiratory symptoms, suggesting potential to constrain viral pathology to the nasopharynx. Which cells SARS-CoV-2 primarily targets and how infection influences the respiratory epithelium remains incompletely understood. Similar to other successful respiratory viruses, high replication within the nasopharynx (Pan et al., 2020; Sanche et al., 2020) and viral shedding by asymptomatic or presymptomatic individuals contributes to high transmissibility (Fears et al., 2020; Meyerowitz et al., 2021) and rapid community spread (Arons et al., 2020; Sakurai et al., 2020; Wang et al., 2020c). COVID-19, the disease caused by SARS-CoV-2 infection, occurs in a fraction of those infected by the virus and carries profound morbidity and mortality. The clinical pictures of COVID-19 vary widely - from some individuals who experience few mild symptoms to some with prolonged and severe disease characterized by pneumonia, acute respiratory distress syndrome, and diverse systemic effects impacting various other tissues (Guan et al., 2020; Huang et al., 2020a). To facilitate effective preventative and therapeutic strategies for COVID-19, differentiating the host protective mechanisms that support rapid viral clearance and limit disease severity from those that drive severe and fatal outcomes is essential.
[0006] Rapid mobilization of the scientific community and a commitment to open data sharing early in the COVID-19 pandemic enabled researchers across the globe to study SARS-CoV-2 and build initial models of disease pathogenesis (Chan et al., 2020a; Wu et al., 2020; Zhou et al., 2020). By analogy to related human betacoronaviruses (Frieman and Baric, 2008), we currently understand viral tropism and disease progression to begin with SARS-CoV-2 entry through the mouth or nares, where it initially replicates within epithelial cells of the human nasopharynx, generating an upper respiratory infection over several days (Harrison et al., 2020). A subset of patients develop symptoms of lower respiratory tract disease, where a combination of inflammatory immune responses and direct viral-mediated pathogenesis can lead to diffuse damage to distal airways, alveoli, and vasculature (Ackermann et al., 2020; Borczuk et al., 2020). However, the precise early targets for SARS-CoV-2 in the nasopharynx, the scope of potential host cells, and the variance in viral tropism across patients and disease courses have yet to be defined. A clearer understanding of viral tropism, how the airway epithelium responds to infection, and the relationship to disease outcome may critically inform future therapeutic or prophylactic strategies.
[0007] Citation or identification of any document in this application is not an admission that such a document is available as prior art to the present invention.
SUMMARY
[0008] In one aspect, the present invention provides for a method of treating a barrier tissue infection in a subject in need thereof comprising: detecting one or more indicators of infection from a sample obtained from the subject, wherein the sample comprises one or more of epithelial, immune, stromal, and neuronal cells; comparing the indicators to control/healthy samples or disease reference values to determine whether the subject will progress to a risk group selected from: mild/moderate or severe; and administering one or more treatments if one or more indicators are present.
[0009] In certain embodiments, the barrier tissue infection is a respiratory barrier tissue infection. In certain embodiments, mild subjects are asymptomatic or symptomatic and not hospitalized, wherein moderate subjects are hospitalized and do not require oxygen by non-invasive ventilation or high flow, and wherein severe subjects are hospitalized and require oxygen by non-invasive ventilation, high flow, or intubation and mechanical ventilation. In certain embodiments, the infection is a viral infection. In certain embodiments, the viral infection is a coronavirus. In certain embodiments, the coronavirus is SARS-CoV2 or a variant thereof. In certain embodiments, mild/moderate subjects have a WHO score of 1-5 and severe subjects have a WHO score of 6-8.
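The WHO score ranges stated above can be sketched as a small lookup. This is a minimal illustration and not part of the claimed method: the function name `who_risk_group` is hypothetical, and the "control" label for a score of 0 is an assumption taken from the "Control WHO 0" wording used in the figure legends.

```python
def who_risk_group(who_score: int) -> str:
    """Map a WHO severity score (0-8) to the risk groups named in the text.

    Assumption: a score of 0 is labeled "control", following the
    "Control WHO 0" wording of the figure legends.
    """
    if not 0 <= who_score <= 8:
        raise ValueError("WHO score must be between 0 and 8")
    if who_score == 0:
        return "control"
    if who_score <= 5:
        return "mild/moderate"  # WHO score 1-5
    return "severe"             # WHO score 6-8
```

For instance, a participant with a peak score of 7 would fall into the severe group under this mapping.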
[0010] In certain example embodiments, one or more indicators of infection are selected from the group consisting of: decreased interferon-stimulated gene (ISG) induction; upregulation of one or more anti-viral factors or IFN-responsive genes; reduction of mature ciliated cell population or increased immature ciliated cell population; increased secretory cell population; increased deuterosomal cell population; increased ciliated cell population; increased goblet cell population; decreased expression in Type II interferon specific genes; increased expression in Type I interferon specific genes; increased MHC-I and MHC-II genes; increased developing ciliated cell populations; altered expression of one or more genes in a cell type selected from any of Tables 2-4; altered expression of one or more genes in a cell type selected from Table 5; increased expression of IFITM3 and IFI44L; increased expression of EIF2AK2; increased expression of TMPRSS4, TMPRSS2, CTSS, CTSD; upregulation of cholesterol and lipid biosynthesis; and increased abundance of low-density lipoprotein receptors LDLR and LRP8.
[0011] In certain embodiments, one or more interferon-stimulated genes are detected, wherein if the one or more interferon-stimulated genes are downregulated the subject is at risk for severe disease and if the one or more interferon-stimulated genes are upregulated the subject is not at risk for severe disease. In certain embodiments, the one or more interferon-stimulated genes are selected from the group consisting of STAT1, STAT2, IRF1, and IRF9.
[0012] In certain embodiments, the one or more indicators of infection are detected in infected host cells and compared to reference values in infected host cells from a risk group. In certain embodiments, one or more anti-viral factors or IFN-responsive genes are detected in virally-infected cells, wherein if the one or more anti-viral factors or IFN-responsive genes are downregulated or absent in virally-infected cells the subject is at risk for severe disease and if the one or more anti-viral factors or IFN-responsive genes are upregulated in virally-infected cells the subject is not at risk for severe disease. In certain embodiments, the one or more anti-viral factors or IFN-responsive genes are selected from the group consisting of EIF2AK2, STAT1 and STAT2.
[0013] In certain example embodiments, the secretory cells comprise one or both of: KRT13 KRT24 high Secretory Cells and Early Response Secretory Cells. In certain example embodiments, the secretory cells express CXCL8. In certain example embodiments, the goblet cells comprise one or both of: AZGP1 high Goblet Cells and SCGB1A1 high Goblet Cells. In certain example embodiments, the ciliated cells comprise one or more upregulated genes selected from the group consisting of IFI27, IFIT1, IFI6, IFITM3, and GBP3. In certain example embodiments, one or both of the ciliated cells and the goblet cells comprise increased gene expression of one or more IFN genes selected from any of Tables 2-4. In certain example embodiments, ACE2 expression is upregulated compared to other epithelial cells among one or more of secretory cells, goblet cells, ciliated cells, developing ciliated cells, and deuterosomal cells. In certain example embodiments, the mature ciliated cells are BEST4 high cilia high ciliated cells. In certain example embodiments, the MHC-I and MHC-II genes comprise at least one or more of: HLA-A, HLA-C, HLA-F, HLA-E, HLA-DRB1, and HLA-DRA. In certain example embodiments, the upregulated cholesterol and lipid biosynthesis genes comprise at least one or more of: FDFT1, MVK, FDPS, ACAT2, and HMGCS1. In certain example embodiments, detecting one or more indicators is performed by using Simpson’s index.
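As an illustration of the Simpson's index mentioned above, the following minimal sketch computes it from per-cell type labels. The text does not say which form of the index is used, so both the concentration form (the sum of squared proportions) and its complement are shown as assumptions; the function names are hypothetical.

```python
from collections import Counter


def simpson_index(labels):
    """Simpson's index: sum of squared cell type proportions.

    Higher values mean the sample is dominated by fewer cell types.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one cell label is required")
    return sum((n / total) ** 2 for n in counts.values())


def simpson_diversity(labels):
    """Complement form (1 - index): higher values mean more diversity."""
    return 1.0 - simpson_index(labels)
```

A sample split evenly between two epithelial cell types gives an index of 0.5, while a sample containing a single cell type gives an index of 1 and a diversity of 0.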
[0014] In certain example embodiments, a subject is determined to belong to the severe risk group if one or more of the following is detected in the sample: proinflammatory cytokines comprising at least one or more of: IL1B, TNF, CXCL8, CCL2, CCL3, CXCL9, CXCL10, and CXCL11; upregulation of alarmins comprising one or both of: S100A8 and S100A9; 14% - 26% of all epithelial cells are secretory cells; elevated BPIFA1 high Secretory cells; elevated KRT13 KRT24 high secretory cells; macrophage population increase as compared to other immune cells; upregulated genes in ciliated cells comprising one or both of: IL5RA and NLRP1; no increase of at least one or more of: type I, type II, and type III interferon abundance; elevated stress response factors comprising at least one or more of: HSPA8, HSPA1A, and DUSP1; increased expression of one or more genes differentially expressed in COVID-19 WHO 6-8 according to Table 3 or Table 4; reduced or absent antiviral/interferon response; and reduced or absent mature ciliated cells. In certain example embodiments, the macrophage population comprises at least one or more of: ITGAX High Macrophages, FFAR High Macrophages, Inflammatory Macrophages, and Interferon Responsive Macrophages.
[0015] In certain example embodiments, a subject is determined to belong to the mild/moderate risk group if one or more of the following is detected in the sample: 4% - 12% of all epithelial cells are Secretory Cells; 10% - 20% of all epithelial cells comprise Interferon Responsive Ciliated Cells; upregulated ciliated cell genes comprising at least one or more of: IFI44L, STAT1, IFITM1, MX1, IFITM3, OAS1, OAS2, OAS3, STAT2, TAP1, HLA-C, ADAR, XAF1, IRF1, CTSS, and CTSB; increase in type I interferon abundance; high expression of interferon-responsive genes; decreased expression of one or more genes differentially expressed in COVID-19 WHO 6-8 according to Table 3 or Table 4; induction of type I interferon responses; and high abundance of IFI6 and IFI27.
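The secretory cell ranges given for the two risk groups (14% - 26% of epithelial cells for severe, 4% - 12% for mild/moderate) can be expressed as a simple check. This is a sketch under stated assumptions: the function name is hypothetical, the range endpoints are treated as inclusive, and the "indeterminate" label for values outside both ranges is an illustrative choice that the text does not define. In the text this indicator is one of several, so the result is not a classification on its own.

```python
def secretory_fraction_group(n_secretory: int, n_epithelial: int) -> str:
    """Compare the secretory share of epithelial cells to the stated ranges.

    Assumptions: endpoints are inclusive, and values outside both ranges
    are reported as "indeterminate" (a label not used in the text).
    """
    if n_epithelial <= 0:
        raise ValueError("n_epithelial must be positive")
    if not 0 <= n_secretory <= n_epithelial:
        raise ValueError("n_secretory must be between 0 and n_epithelial")
    percent = 100.0 * n_secretory / n_epithelial
    if 14.0 <= percent <= 26.0:
        return "severe"
    if 4.0 <= percent <= 12.0:
        return "mild/moderate"
    return "indeterminate"
```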
[0016] In certain example embodiments, the interferon-responsive genes comprise at least one or more of: STAT1, MX1, HLA-B, and HLA-C. In certain example embodiments, the interferon response occurs in at least one or more of: MUC5AC high Goblet Cells, SCGB1A1 high Goblet Cells, Early Response Secretory Cells, Deuterosomal Cells, Interferon Responsive Ciliated Cells, and BEST4 high Cilia high Ciliated Cells.
[0017] In certain example embodiments, the treatment is administered according to determined risk group. In certain example embodiments, the treatment involves administering a preventative or therapeutic intervention according to the determined risk group. In certain example embodiments, if the subject is determined to be at risk for progression to the severe risk group the subject is administered a treatment comprising one or more treatments selected from the group consisting of: one or more antiviral; blood-derived immune-based therapy; one or more corticosteroid; one or more interferon; one or more interferon Type I agonists; one or more interleukin-1 inhibitors; one or more kinase inhibitors; one or more TLR agonists; a glucocorticoid; and an interleukin-6 inhibitor.
[0018] In certain example embodiments, if the subject is determined to be at risk for progression to either risk group the subject is administered a treatment comprising one or more of: one or more antiviral; one or more antibiotic; and one or more cholesterol biosynthesis inhibitor.
[0019] In certain example embodiments, the treatment comprises an antiviral. In certain example embodiments, the antiviral inhibits viral replication. In certain example embodiments, the antiviral is paxlovid, molnupiravir and remdesivir.
[0020] In certain example embodiments, the treatment is an immune-based therapy. In certain example embodiments, the immune-based therapy is a blood-derived product comprising at least one or more of: a convalescent plasma and an immunoglobulin. In certain example embodiments, the immune-based therapy is an immunomodulator comprising at least one or more of: a corticosteroid, a glucocorticoid, an interferon, an interferon Type I agonist, an interleukin-1 inhibitor, an interleukin-6 inhibitor, a kinase inhibitor, and a TLR agonist. In certain example embodiments, the corticosteroid comprises at least one of: methylprednisolone, hydrocortisone, and dexamethasone. In certain example embodiments, the glucocorticoid comprises at least one of: cortisone, prednisone, prednisolone, methylprednisolone, dexamethasone, betamethasone, triamcinolone, fludrocortisone acetate, deoxycorticosterone acetate, and hydrocortisone. In certain example embodiments, the interferon comprises at least one or more of: interferon beta-1b and interferon alpha-2b. In certain example embodiments, the interleukin-1 inhibitor comprises anakinra. In certain example embodiments, the interleukin-6 inhibitor comprises at least one or more of: anti-interleukin-6 receptor monoclonal antibodies and anti-interleukin-6 monoclonal antibody. In certain example embodiments, the anti-interleukin-6 receptor monoclonal antibody is tocilizumab. In certain example embodiments, the anti-interleukin-6 monoclonal antibody is siltuximab. In certain example embodiments, the kinase inhibitor comprises at least one or more of Bruton's tyrosine kinase inhibitor and Janus kinase inhibitor. In certain example embodiments, the Bruton's tyrosine kinase inhibitor comprises at least one or more of: acalabrutinib, ibrutinib, and zanubrutinib. In certain example embodiments, the Janus kinase inhibitor comprises at least one or more of: baricitinib, ruxolitinib and tofacitinib. In certain example embodiments, the TLR agonist comprises at least one or more of: imiquimod, BCG, and MPL.
[0021] In certain example embodiments, the treatment comprises inhibiting cholesterol biosynthesis. In certain example embodiments, inhibiting cholesterol biosynthesis comprises administering HMG-CoA reductase inhibitors. In certain example embodiments, the HMG-CoA reductase inhibitor comprises at least one or more of: simvastatin, atorvastatin, lovastatin, pravastatin, fluvastatin, rosuvastatin, and pitavastatin. In certain example embodiments, the treatment comprises an antibiotic.
[0022] In certain example embodiments, the treatment comprises one or more agents capable of shifting epithelial cells to express an antiviral signature. In certain example embodiments, the treatment comprises one or more agents capable of suppressing a myeloid inflammatory response. In certain example embodiments, the treatment comprises an RNA-guided nuclease system. In certain example embodiments, the RNA-guided nuclease system is a CRISPR system. In certain example embodiments, the CRISPR system comprises a CRISPR-Cas base editing system, a prime editor system, or a CAST system.
[0023] In certain example embodiments, the treatment is administered before severe disease. In certain example embodiments, the infection is a viral infection. In certain example embodiments, the viral infection is a coronavirus. In certain example embodiments, the coronavirus is SARS-CoV2 or a variant thereof.
[0024] In certain example embodiments, the one or more cell types are detected using one or more markers differentially expressed in the cell types. In certain example embodiments, the one or more cell types or one or more genes are detected by immunohistochemistry (IHC), fluorescence activated cell sorting (FACS), fluorescently bar-coded oligonucleotide probes, RNA FISH (fluorescent in situ hybridization), RNA-seq, or any combination thereof. In certain example embodiments, single cell expression is inferred from bulk RNA-seq. In certain example embodiments, expression is determined by single cell RNA-seq.
[0025] In another aspect, the present invention provides for a method of screening for agents capable of shifting epithelial cells from a SARS-CoV2 severe phenotype to a mild/moderate phenotype comprising: treating a sample comprising epithelial cells with a drug candidate; detecting modulation of any indicators of infection according to any of the preceding claims; and identifying the drug, wherein the one or more indicators shift towards a mild/moderate phenotype. In certain example embodiments, the sample comprises epithelial cells infected with SARS-CoV2. In certain example embodiments, the sample comprises epithelial cells expressing one or more SARS-CoV2 genes. In certain example embodiments, the sample is an organoid or tissue model. In certain example embodiments, the sample is an animal model. In certain example embodiments, cell types are detected using one or more markers selected from Table 1.
[0026] These and other aspects, objects, features, and advantages of the example embodiments will become apparent to those having ordinary skill in the art upon consideration of the following detailed description of example embodiments.
BRIEF DESCRIPTION OF THE DRAWINGS
[0027] An understanding of the features and advantages of the present invention will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the invention may be utilized, and the accompanying drawings of which:
[0028] FIGS. 1A-1O - Cellular composition of nasopharyngeal swabs. FIG. 1A. Schematic of method for viable cryopreservation of nasopharyngeal swabs, cellular isolation, and scRNA-seq using the Seq-Well S^3 platform (created with BioRender). FIG. 1B. UMAP of 32,588 single-cell transcriptomes from all participants, colored by cell type (following iterative Louvain clustering). FIG. 1C. UMAP as in B, colored by SARS-CoV-2 PCR status at time of swab. FIG. 1D. UMAP as in B, colored by peak level of respiratory support (WHO COVID-19 severity scale). FIG. 1E. UMAP as in B, colored by participant. FIG. 1F. Violin plots of cluster marker genes (FDR < 0.01) for coarse cell type annotations (as in B). FIG. 1G. Proportional abundance of coarse cell types by participant (ordered within each disease cohort by increasing Ciliated cell abundance). FIG. 1H. Proportional abundance of participants by coarse cell types. Shades of red: COVID-19. Shades of blue: Control. FIG. 1I. Expression of entry factors for SARS-CoV-2 and other common upper respiratory viruses. Dot size represents fraction of cell type (rows) expressing a given gene (columns). Dot hue represents scaled average expression. FIG. 1J. Proportion of Goblet Cells by sample. Statistical test above graph represents Kruskal-Wallis test results across all cohorts (following Bonferroni-correction). Statistical significance asterisks within box represent significant results from Dunn's post-hoc testing. * Bonferroni-corrected p-value < 0.05, ** q < 0.01, *** q < 0.001. FIG. 1K. Proportion of Secretory Cells by sample. FIG. 1L. Proportion of Deuterosomal Cells by sample. FIG. 1M. Proportion of Developing Ciliated Cells by sample. FIG. 1N. Proportion of Ciliated Cells by sample. FIG. 1O. Simpson's Diversity index across epithelial cell types in COVID-19 vs. Control. Significance by Student's t-test.
[0029] FIGS. 2A-2R - Altered epithelial cell composition and recovery in the nasopharynx during COVID-19. FIG. 2A. UMAP of 28,948 epithelial cell types following re-clustering, colored by coarse cell types. Lines represent smoothed estimate of cellular differentiation trajectories (RNA velocity estimates via scVelo using intronic:exonic splice ratios). FIG. 2B. UMAP as in A, colored by SARS-CoV-2 PCR status at time of swab. FIG. 2C. UMAP as in A, colored by peak level of respiratory support (WHO illness severity scale). FIG. 2D. UMAP as in A, colored by detailed cell annotations. FIG. 2E. Violin plots of cluster marker genes (FDR < 0.01) for detailed epithelial cell type annotations (as in D). FIG. 2F. UMAP of 9,209 Basal, Goblet, and Secretory Cells, following sub-clustering and resolution of detailed cell annotations. FIG. 2G. UMAP of only Basal, Goblet, and Secretory Cells as in F, colored by SARS-CoV-2 PCR status at time of swab. FIG. 2H. UMAP of only Basal, Goblet, and Secretory Cells as in F, colored by inferred velocity pseudotime (darker blue shades: precursor cells, lighter yellow shades: more terminally differentiated cell types). FIG. 2I. Plot of gene expression by Basal, Goblet, and Secretory Cell velocity pseudotime for select genes. Points colored by detailed cell type annotations. FIG. 2J. UMAP of 13,913 Ciliated Cells, following sub-clustering and resolution of detailed cell annotations. FIG. 2K. UMAP of Ciliated Cells as in J, colored by SARS-CoV-2 PCR status at time of swab. FIG. 2L. UMAP of Ciliated Cells as in J, colored by inferred velocity pseudotime (darker blue shades: precursor cells, lighter yellow shades: more terminally differentiated cell types). FIG. 2M. Plot of gene expression by Ciliated Cell velocity pseudotime for select genes (all significantly correlated with velocity expression). Points colored by detailed cell type annotations. FIG. 2N. Proportion of Secretory Cell subtypes (detailed annotation) by sample, normalized to all epithelial cells. FIG. 2O. Proportion of Ciliated Cell subtypes (detailed annotation) by sample, normalized to all epithelial cells. FIG. 2P. UMAP of 13,210 epithelial cells (using UMAP embedding from A) from SARS-CoV-2 PCR negative participants (Control). Lines represent smoothed estimate of cellular differentiation trajectories (via RNA velocity) calculated using only cells from Control participants. FIG. 2Q. UMAP of 15,738 epithelial cells (using UMAP embedding from A) from SARS-CoV-2 PCR positive participants (COVID-19). Lines represent smoothed estimate of cellular differentiation trajectories (via RNA velocity) calculated using only cells from COVID-19 participants. Named cell types highlight those significantly altered between disease cohorts. FIG. 2R. UMAP of 32,588 cells from all participants, shaded by detailed cell type. Arrows represent smoothed estimate of cellular differentiation trajectories inferred by RNA Velocity.
[0030] FIGS. 3A-3J - Cell-type specific and shared transcriptional responses to SARS-CoV-2 infection. FIG. 3A. Abundance of significant differentially expressed (DE) genes by detailed cell type between Control WHO 0 vs. COVID-19 WHO 1-5 samples (left), Control WHO 0 and COVID-19 WHO 6-8 samples (middle), COVID-19 WHO 1-5 and COVID-19 WHO 6-8 samples (right). Restricted to genes with FDR-corrected p < 0.001, log2 fold change > 0.25. ø = comparison not tested due to too few cells in one group. FIG. 3B. Top: Volcano plots of average log fold change vs. -log10(FDR-adjusted p-value) for Ciliated cells (coarse annotation). Left: Control WHO 0 vs. COVID-19 WHO 1-5 (mild/moderate). Middle: Control WHO 0 vs. COVID-19 WHO 6-8 (severe). Right: COVID-19 WHO 1-5 (mild/moderate) vs. COVID-19 WHO 6-8 (severe). Horizontal red dashed line: FDR-adjusted p-value cutoff of 0.05 for significance. Bottom: gene set enrichment analysis plots across shared, type I interferon specific, and type II interferon specific stimulated genes. Genes are ranked by their average log fold change (FC) between each comparison. Black lines represent the ranked location of genes belonging to the annotated gene set. Bar height represents running enrichment score (NES: Normalized Enrichment Score). P-values following Bonferroni-correction: *** corrected p < 0.001, ** p < 0.01, * p < 0.05. FIG. 3C. Heatmap of significantly DE genes between Interferon Responsive Ciliated Cells from different disease cohorts. FIG. 3D. Top: Volcano plots related to C. Average log fold change vs. -log10(FDR-adjusted p-value) for Interferon Responsive Ciliated cells. Horizontal red dashed line: 0.05 cutoff for significance. Bottom: gene set enrichment analysis across shared, type I, and type II interferon stimulated genes. FIG. 3E. Heatmap of significantly DE genes between MUC5AC high Goblet Cells from different disease cohorts. FIG. 3F. Top: Volcano plots related to E. Average log fold change vs. -log10(FDR-adjusted p-value) for MUC5AC high Goblet Cells. Horizontal red dashed line: 0.05 cutoff for significance. Bottom: gene set enrichment analysis across shared, type I, and type II interferon stimulated genes. FIG. 3G. Top: Dot plot of IFNGR1/2 and IFNAR1/2 gene expression by selected cell types. Bottom: Violin plots of gene module scores across selected cell types, split by Control WHO 0 (blue), COVID-19 WHO 1-5 (red), and COVID-19 WHO 6-8 (pink). Gene modules represent transcriptional responses of human basal cells from the nasal epithelium following in vitro treatment with IFNA or IFNG. Significance by Wilcoxon signed-rank test. P-values following Bonferroni-correction: * p < 0.05, ** p < 0.01, *** p < 0.001. FIG. 3H. Common DE genes across detailed cell types. Left (red): genes upregulated in multiple cell types when comparing COVID-19 WHO 1-5 vs. Control WHO 0. Right (pink): genes upregulated in multiple cell types when comparing COVID-19 WHO 6-8 vs. Control WHO 0. FIG. 3I. Relative abundances of IgG autoantibodies for human type I, II, and III interferons via multiplexed human antigen microarray (see Methods). Blue circles: Control WHO 0, n=5; red circles: COVID-19 WHO 1-5, n=12; pink squares: COVID-19 WHO 6-8, n=8. Large pink squares: autoantibodies against 12 type I interferons from a single donor: COVID-19 Participant 27 (peak WHO severity score: 8, swab WHO severity score: 5). FIG. 3J. Average expression of STAT1, STAT2, IRF1, and IRF9 among ciliated cells by participant. For each gene: left: participants separated by disease group, determined by participants’ peak WHO severity score. Statistical testing by Kruskal-Wallis test across disease groups (** p = 0.0018) with Dunn’s post hoc testing: * p < 0.05, ** p < 0.01, *** p < 0.001. Right: participants in COVID-19 WHO 6-8 group, separated by level of severity at time of nasal swab. Statistical testing by Wilcoxon signed-rank test, n.s. non-significant, p > 0.05.
[0031] FIGS. 4A-4H - Co-detection of human and SARS-CoV-2 RNA. FIG. 4A. Metatranscriptomic classification of all single-cell RNA-seq reads using Kraken2. Results shown from selected respiratory viruses. Only results with greater than 5 reads are shown. FIG. 4B. Normalized abundance of SARS-CoV-2 aligning UMI from all single-cell RNA-seq reads (including those derived from ambient/low-quality cell barcodes). P < 0.0001 by Kruskal-Wallis test. Pairwise comparisons using Dunn's post-hoc testing. ** p < 0.01, *** p < 0.001. FIG. 4C. Proportional abundance of Secretory cells (all) vs. total SARS-CoV-2 UMI (normalized to M total UMI). FIG. 4D. Proportional abundance of FOXJ1 high Ciliated cells vs. total SARS-CoV-2 UMI (normalized to M total UMI). FIG. 4E. SARS-CoV-2 UMI per high-quality cell barcode. Results following correction for ambient viral reads. FIG. 4F. Schematic for SARS-CoV-2 genome and subgenomic RNA species. FIG. 4G. Schematic for SARS-CoV-2 genomic features annotated in the custom reference gtf. FIG. 4H. Heatmap of SARS-CoV-2 genes expression among SARS-CoV-2 RNA+ single cells (following correction for ambient viral reads). Top color bar indicates disease and severity cohort (red: COVID-19 WHO 1-5, pink: COVID-19 WHO 6-8, black: COVID-19 convalescent, blue: Control WHO 0). Top heatmap: SARS-CoV-2 genes and regions organized from 5’ to 3’. Bottom heatmap: alignment to 70-mer regions directly surrounding viral transcription regulatory sequence (TRS) sites, suggestive of spliced RNA species (joining of the leader to body regions) vs. unspliced RNA species (alignment across TRS).
[0032] FIGS. 5A-5E - Cellular targets of SARS-CoV-2 in the nasopharynx. FIG. 5A. Summary schematic of top SARS-CoV-2 RNA+ cells (created with BioRender). FIG. 5B. SARS-CoV-2 RNA+ cell abundance (top) and percent (bottom) per participant. Results following correction for ambient viral reads. FIG. 5C. Abundance of SARS-CoV-2 RNA+ cells by detailed cell type, bars colored by participant. Results following correction for ambient viral reads. FIG. 5D. Dot plot of SARS-CoV-2 RNA presence by sample (columns) and detailed cell types (rows). Dot size reflects fraction of a given participant and cell type containing SARS-CoV-2 RNA (following viral ambient correction). Dot color reflects fraction of aligned reads corresponding to the SARS-CoV-2 positive strand (yellow) vs. negative strand (black). Dot plot across columns: alignment of viral reads by participant, separated by RNA species type. Dot plot across rows: alignment of viral reads by detailed cell type, separated by RNA species type. FIG. 5E. Percent ACE2+ cells vs. percent SARS-CoV-2 RNA+ cells by coarse cell type (left) and detailed cell type (right).
[0033] FIGS. 6A-6F - Intrinsic and bystander responses to SARS-CoV-2 infection. FIG. 6A. Violin plot of selected genes upregulated in SARS-CoV-2 RNA+ cells in at least 3 individual cell type comparisons. Dark red: SARS-CoV-2 RNA+ cells, red: bystander cells from COVID-19 participants, blue: cells from Control participants. From left to right the scale is log(1 + UMI per 10K). FIG. 6B. Enriched gene ontologies among genes consistently up- or down-regulated among SARS-CoV-2 RNA+ cells across cell types. FIG. 6C. Heatmap of genes consistently higher in SARS-CoV-2 RNA+ cells across multiple cell types. Colors represent log fold changes between SARS-CoV-2 RNA+ cells and bystander cells (SARS-CoV-2 RNA- cells, from COVID-19 infected donors) by cell type. Restricted to cell types with at least 5 SARS-CoV-2 RNA+ cells. Yellow: upregulated among SARS-CoV-2 RNA+ cells, blue: upregulated among bystander cells. FIG. 6D. Heatmap of genes consistently higher in bystander cells across multiple cell types. FIG. 6E. Top: Violin plots of SARS-CoV-2 aligning reads among SARS-CoV-2 RNA+ cells. Statistical significance by Wilcoxon rank sum test. Bottom: select differentially expressed genes between SARS-CoV-2 RNA+ cells from participants with mild/moderate COVID-19 (red) vs. severe COVID-19 (pink). Statistical significance by likelihood ratio test assuming an underlying negative binomial distribution. *** FDR-corrected p < 0.001, ** p < 0.01, * p < 0.05. FIG. 6F. Percent ACE2+ cells vs. percent SARS-CoV-2 RNA+ cells by detailed cell type. Left: cells from participants with mild/moderate COVID-19. Right: cells from participants with severe COVID-19. Point size reflects average type I interferon specific module score among SARS-CoV-2 RNA+ cells.
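The expression scale quoted for FIG. 6A, log(1 + UMI per 10K), can be illustrated with a short sketch. It assumes a natural logarithm and a per-cell dictionary of raw UMI counts; the function name `log_normalize` and the input layout are hypothetical and are not taken from the source.

```python
import math


def log_normalize(umi_counts, scale=10_000):
    """Return log(1 + UMI per `scale`) for each gene in one cell.

    `umi_counts` maps gene name to raw UMI count for a single cell.
    Assumption: the logarithm is natural, which the legend does not state.
    """
    total = sum(umi_counts.values())
    if total <= 0:
        raise ValueError("cell has no UMI counts")
    return {
        gene: math.log1p(count * scale / total)
        for gene, count in umi_counts.items()
    }
```

Under this scaling, a gene contributing 1 of a cell's 10,000 UMI maps to log(2), and a gene with no counts maps to 0.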
[0034] FIGS. 7A-7N - Participant cohort and cellular composition of nasopharyngeal swabs. FIG. 7A. Cohort composition and participant demographics. FIG. 7B. IgM and IgG titers among Control WHO 0 and COVID-19 participants. FIG. 7C. Detailed schematic of sample preparation and cell processing from nasal swabs (created with BioRender). FIG. 7D. Single cell quality metrics by cohort (after filtering for low-quality cells). FIG. 7E. Single cell quality metrics by participant (after filtering for low quality cells). FIG. 7F. Quality metrics for matched fresh vs. frozen nasal swabs from two participants (P1 and P2). FIG. 7G. UMAP of cell types from P1. FIG. 7H. UMAP of cell types from P2. FIG. 7I. Percent composition of each cell type by fresh (grey circles) or frozen (black squares) processing. FIG. 7J. UMAP from P1 as in G, colored by fresh (grey) vs. frozen (black). FIG. 7K. UMAP from P2 as in H, colored by fresh (grey) vs. frozen (black). FIG. 7L. Comparison of WHO severity at swab and peak. FIG. 7M. Comparison of WHO severity at swab and peak. FIG. 7N. Number of high-quality cells/array recovered for single-cell RNA-seq by disease group. Statistical testing by Kruskal-Wallis test (p = 0.37) with Dunn's post hoc testing, all p > 0.05.
[0035] FIGS. 8A-8G - COVID-19-induced changes to epithelial diversity and differentiation. FIG. 8A. Proportional abundance of detailed epithelial cell types by participant. FIG. 8B. Expression of entry factors for SARS-CoV-2 and other common upper respiratory viruses among detailed epithelial cell types. Dot size represents fraction of cell type (rows) expressing a given gene (columns). Dot hue represents average expression. FIG. 8C. Plot of gene expression by epithelial cell velocity pseudotime. Select genes significantly associated with ciliated cell pseudotime. Points colored by coarse cell type annotations. Top: alignment to unspliced (intronic) regions. Bottom: alignment to spliced (exonic) regions. FIG. 8D. Proportion of Goblet Cell subtypes (detailed annotation) by sample, normalized to all epithelial cells. Statistical test above graph represents Kruskal-Wallis test results across all cohorts (following Bonferroni-correction). FIG. 8E. Flow cytometry and gating scheme of immune cells from a fresh nasopharyngeal (NP) swab. Representative healthy participant. Bottom right: quantification of cellular proportions. FIG. 8F. Flow cytometry and gating scheme of epithelial cells from an NP swab. Representative data from a participant with severe COVID-19. FIG. 8G. Secretory cell proportion of live, CD45- cells from NP swabs. Healthy donors (Control WHO 0): n=7. Severe COVID-19 (COVID-19 WHO 6-8): n=7. Secretory cells identified as Live, CD45-ATubulin-CD271-CD49f-CD66c+ cells. Statistical testing: Wilcoxon signed-rank test: ** p = 0.0047.
[0036] FIGS. 9A-9L - COVID-19-induced changes to nasopharynx-resident immune cells. FIG. 9A. UMAP of 3,640 immune cells following re-clustering, colored by coarse cell types. FIG. 9B. UMAP as in A, colored by detailed cell annotations. FIG. 9C. UMAP as in A, colored by level of respiratory support (WHO illness severity scale). FIG. 9D. UMAP as in A, colored by SARS-CoV-2 PCR status at time of swab. FIG. 9E. UMAP as in A, colored by participant. FIG. 9F. Violin plots of cluster marker genes (FDR < 0.01) for detailed immune cell type annotations (as in B). FIG. 9G. Proportional abundance of detailed immune cell types by participant. FIG. 9H. Proportion of immune cell subtypes by sample and cohort, normalized to all immune cells. Statistical test above graph represents Kruskal-Wallis test results across all cohorts (following Bonferroni-correction). FIG. 9I. Heatmap of significantly DE genes between Macrophages (all, coarse annotation) from different disease cohorts. FIG. 9J. Heatmap of significantly DE genes between T Cells (all, coarse annotation) from different disease cohorts. FIG. 9K. Top: Dot plot of IFNGR1/2 and IFNAR1/2 gene expression among all detailed immune subtypes. Bottom: Violin plots of gene module scores, split by Control WHO 0 (blue), COVID-19 WHO 1-5 (red), and COVID-19 WHO 6-8 (pink). Gene modules represent transcriptional responses of human basal cells from the nasal epithelium following in vitro treatment with IFNA or IFNG. Significance by Wilcoxon signed-rank test. P-values following Bonferroni-correction: * p < 0.05, ** p < 0.01, *** p < 0.001. FIG. 9L. Proportion of interferon responsive macrophages vs. proportion of interferon responsive cytotoxic CD8 T cells per sample, normalized to total immune cells. Including all samples, Control and COVID-19 groups.
[0037] FIGS. 10A-10H - Cell-type specific and shared transcriptional responses to SARS-CoV-2 infection. FIG. 10A. Abundance of significant differentially expressed genes by coarse cell type between Control WHO 0 and COVID-19 WHO 1-5 samples (left), Control WHO 0 and COVID-19 WHO 6-8 samples (middle) and COVID-19 WHO 1-5 vs. COVID-19 WHO 6-8 samples (right). FDR-corrected p < 0.001, log2 fold change > 0.25. FIG. 10B. Heatmap of significantly DE genes between Ciliated Cells (all, coarse annotation) from different disease cohorts. FIG. 10C. Venn diagram of significantly upregulated genes among Ciliated Cells between COVID-19 WHO 1-5 vs. Control WHO 0 (red) and COVID-19 WHO 6-8 vs. Control WHO 0 (pink). Asterisk: genes impacted by steroid treatment within each cohort. FIG. 10D. Interferon gene module scores across all detailed epithelial cell types, split by Control WHO 0 (blue), COVID-19 WHO 1-5 (red), and COVID-19 WHO 6-8 (pink). Gene modules represent transcriptional responses of human basal cells from the nasal epithelium following in vitro treatment with IFNA or IFNG. FIG. 10E. Dot plot of ACE2 expression across select coarse and detailed epithelial cell types and subsets. FIG. 10F. Dot plot of interferon and cytokine expression among detailed epithelial and immune cell types. FIG. 10G. Violin plots of select genes upregulated among ciliated cells in COVID-19 WHO 1-5 participants compared to Control WHO 0 (PARP14, ISG15) and in COVID-19 WHO 6-8 participants compared to Control WHO 0 (FKBP5). Cells separated by participant treatment with corticosteroids. *** FDR-corrected p < 0.001. FIG. 10H. Dot plot of type I and type III interferons among ciliated, goblet, and squamous cells. Left: healthy vs. influenza A/B virus infected participants from Cao et al., 2020. Right: Control WHO 0 vs. COVID-19 WHO 1-5, vs. COVID-19 WHO 6-8 participants. Datasets processed and scaled identically.
[0038] FIGS. 11A-11J - Detection of SARS-CoV-2 RNA from single-cell RNA-seq data.
FIG. 11A. Metatranscriptomic classification of all single-cell RNA-seq reads using Kraken2: reads per sample annotated as unclassified. FIG. 11B. Metatranscriptomic classification of all single-cell RNA-seq reads using Kraken2: reads per sample annotated as Homo sapiens. FIG. 11C. Metatranscriptomic classification of all single-cell RNA-seq reads using Kraken2: reads per sample annotated as SARS-related coronaviruses. FIG. 11D. Total recovered cells per sample vs. normalized abundance of SARS-CoV-2 aligning UMI from all single-cell RNA-seq reads (including those derived from ambient/low-quality cell barcodes). FIG. 11E. Normalized abundance of SARS-CoV-2 aligning UMI from all single-cell RNA-seq reads across all COVID-19 participants. Dashed line represents partition between “Viral High” vs “Viral Low” samples. FIG. 11F. Proportional abundance of selected cell types according to total SARS-CoV-2 abundance among COVID-19 samples. Statistical test above graph represents Kruskal-Wallis test statistic across all cohorts. Statistical significance asterisks within box represent significant results from Dunn's post-hoc testing. Bonferroni-corrected p-value: * p < 0.05, ** p < 0.01, *** p < 0.001. FIG. 11G. Abundance of SARS-CoV-2 aligning UMI/cell by participant prior to (top) and following (bottom) ambient viral RNA correction. FIG. 11H. Quality metrics among 415 SARS-CoV-2 RNA+ cells (associated with high-quality cell barcodes and following ambient viral RNA correction). Left: abundance of SARS-CoV-2 aligning UMI vs. percent of all aligned reads (per cell barcode) aligning to SARS-CoV-2. Middle: abundance of human (GRCh38)-aligning UMI vs. abundance of SARS-CoV-2 aligning UMI. Right: abundance of human (GRCh38) aligning UMI vs. percent of all aligned reads (per cell barcode) aligning to human genes. FIG. 11I. Normalized abundance of SARS-CoV-2 aligning UMI vs. anti-SARS-CoV-2 IgM (left) or IgG titers (right). Plasma samples taken on same day of nasopharyngeal swab. Subset of Control WHO 0 (blue circles, n=13) and COVID-19 (red circles, mild/moderate: n=8; pink squares, severe: n=15) participants. Dashed lines: lower limit of detection: 100; upper limit of detection: 100,000; positive threshold: 5,000. Pearson's correlation of COVID-19 samples: IgM: r = -0.59, ** p = 0.0028; IgG: r = -0.60, ** p = 0.0025. FIG. 11J. Percent SARS-CoV-2 RNA+ cells (associated with high-quality cell barcodes and following ambient viral RNA correction) per donor, separated by disease group. Statistical test above graph represents Kruskal-Wallis test statistic across all groups. Statistical significance asterisks within box represent significant results from Dunn’s post-hoc testing. * p < 0.05, ** p < 0.01.
[0039] FIGS. 12A-12H - SARS-CoV-2 RNA species and cell types containing viral reads.
FIG. 12A. Schematic of method to distinguish unspliced from spliced SARS-CoV-2 RNA species by searching for reads which align across a spliced or genomic Transcription Regulatory Sequence (TRS, 6mer). FIG. 12B. Abundance of SARS-CoV-2 aligning UMI/Cell per detailed cell type (following ambient viral RNA correction), split by UMI aligning to the viral positive strand, negative strand, 70-mer region across an unspliced TRS, and 70-mer region across a spliced TRS. FIG. 12C. Abundance of SARS-CoV-2 aligning UMI/Cell per participant (following ambient viral RNA correction), split by UMI aligning to the viral positive strand, negative strand, 70-mer region across an unspliced TRS, and 70-mer region across a spliced TRS. FIG. 12D. Dot plot of SARS-CoV-2 unspliced TRS aligning UMI by participant (columns) and detailed cell type (rows). FIG. 12E. Dot plot of SARS-CoV-2 spliced TRS aligning UMI by participant (columns) and detailed cell type (rows). FIG. 12F. Percent ACE2+ cells vs. percent SARS-CoV-2 RNA+ (after ambient correction) by detailed cell type. Including only cells from COVID-19 participants. Statistical testing using Spearman’s correlation. FIG. 12G. Abundance of SARS-CoV-2 negative strand aligning reads by coarse epithelial cell types. FIG. 12H. Abundance of SARS-CoV-2 negative strand aligning reads by detailed ciliated cell types.
[0040] FIGS. 13A-13C - Intrinsic and bystander responses to SARS-CoV-2 infection.
FIG. 13A. Violin plots of select genes upregulated in SARS-CoV-2 RNA+ Cells when compared to matched bystanders. Plotting only SARS-CoV-2 RNA+ Cells from COVID-19 WHO 1-5 participants (red) and COVID-19 WHO 6-8 participants (pink). Top row: SARS-CoV-2 RNA expression by alignment type. FIG. 13B. Heatmaps of log fold changes between SARS-CoV-2 RNA+ cells and bystander cells by cell types. Gene sets derived from four CRISPR screens for important host factors in the SARS-CoV-2 viral life cycle. Restricted to cell types with at least 5 SARS-CoV-2 RNA+ cells. Yellow: upregulated among SARS-CoV-2 RNA+ cells, blue: upregulated among bystander cells. FIG. 13C. Heatmap of Spearman's correlation between 73 clinical parameters, demographic data, or results from scRNA-seq. Includes individuals from healthy (Control WHO 0), COVID-19 mild/moderate (COVID-19 WHO 1-5) and COVID-19 severe (COVID-19 WHO 6-8) groups. Colored squares represent statistically significant associations by permutation test (p < 0.01; red: positive Spearman's rho; blue: negative Spearman's rho).
[0041] The figures herein are for illustrative purposes only and are not necessarily drawn to scale.
DETAILED DESCRIPTION OF THE EXAMPLE EMBODIMENTS
General Definitions
[0042] Unless defined otherwise, technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure pertains. Definitions of common terms and techniques in molecular biology may be found in Molecular Cloning: A Laboratory Manual, 2nd edition (1989) (Sambrook, Fritsch, and Maniatis); Molecular Cloning: A Laboratory Manual, 4th edition (2012) (Green and Sambrook); Current Protocols in Molecular Biology (1987) (F.M. Ausubel et al. eds.); the series Methods in Enzymology (Academic Press, Inc.): PCR 2: A Practical Approach (1995) (M.J. MacPherson, B.D. Hames, and G.R. Taylor eds.): Antibodies, A Laboratory Manual (1988) (Harlow and Lane, eds.): Antibodies A Laboratory Manual, 2nd edition 2013 (E.A. Greenfield ed.); Animal Cell Culture (1987) (R.I. Freshney, ed.); Benjamin Lewin, Genes IX, published by Jones and Bartlett, 2008 (ISBN 0763752223); Kendrew et al. (eds.), The Encyclopedia of Molecular Biology, published by Blackwell Science Ltd., 1994 (ISBN 0632021829); Robert A. Meyers (ed.), Molecular Biology and Biotechnology: a Comprehensive Desk Reference, published by VCH Publishers, Inc., 1995 (ISBN 9780471185710); Singleton et al., Dictionary of Microbiology and Molecular Biology 2nd ed., J. Wiley & Sons (New York, N.Y. 1994), March, Advanced Organic Chemistry Reactions, Mechanisms and Structure 4th ed., John Wiley & Sons (New York, N.Y. 1992); and Marten H. Hofker and Jan van Deursen, Transgenic Mouse Methods and Protocols, 2nd edition (2011).
[0043] As used herein, the singular forms “a”, “an”, and “the” include both singular and plural referents unless the context clearly dictates otherwise.
[0044] The term “optional” or “optionally” means that the subsequent described event, circumstance or substituent may or may not occur, and that the description includes instances where the event or circumstance occurs and instances where it does not.
[0045] The recitation of numerical ranges by endpoints includes all numbers and fractions subsumed within the respective ranges, as well as the recited endpoints.
[0046] The terms “about” or “approximately” as used herein when referring to a measurable value such as a parameter, an amount, a temporal duration, and the like, are meant to encompass variations of and from the specified value, such as variations of +/-10% or less, +/-5% or less, +/-1% or less, and +/-0.1% or less of and from the specified value, insofar as such variations are appropriate to perform in the disclosed invention. It is to be understood that the value to which the modifier “about” or “approximately” refers is itself also specifically, and preferably, disclosed.
[0047] As used herein, a “biological sample” may contain whole cells and/or live cells and/or cell debris. The biological sample may contain (or be derived from) a “bodily fluid”. The present invention encompasses embodiments wherein the bodily fluid is selected from amniotic fluid, aqueous humour, vitreous humour, bile, blood serum, breast milk, cerebrospinal fluid, cerumen (earwax), chyle, chyme, endolymph, perilymph, exudates, feces, female ejaculate, gastric acid, gastric juice, lymph, mucus (including nasal drainage and phlegm), pericardial fluid, peritoneal fluid, pleural fluid, pus, rheum, saliva, sebum (skin oil), semen, sputum, synovial fluid, sweat, tears, urine, vaginal secretion, vomit and mixtures of one or more thereof. Biological samples include cell cultures, bodily fluids, and cell cultures from bodily fluids. Bodily fluids may be obtained from a mammalian organism, for example by puncture, or other collecting or sampling procedures.
[0048] The terms “subject,” “individual,” and “patient” are used interchangeably herein to refer to a vertebrate, preferably a mammal, more preferably a human. Mammals include, but are not limited to, murines, simians, humans, farm animals, sport animals, and pets. Tissues, cells and their progeny of a biological entity obtained in vivo or cultured in vitro are also encompassed.
[0049] Various embodiments are described hereinafter.
It should be noted that the specific embodiments are not intended as an exhaustive description or as a limitation to the broader aspects discussed herein. One aspect described in conjunction with a particular embodiment is not necessarily limited to that embodiment and can be practiced with any other embodiment(s). Reference throughout this specification to “one embodiment”, “an embodiment,” “an example embodiment,” means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, appearances of the phrases “in one embodiment,” “in an embodiment,” or “an example embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment, but may. Furthermore, the particular features, structures or characteristics may be combined in any suitable manner, as would be apparent to a person skilled in the art from this disclosure, in one or more embodiments. Furthermore, while some embodiments described herein include some but not other features included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the invention. For example, in the appended claims, any of the claimed embodiments can be used in any combination.
[0050] Reference is made to a manuscript entitled "Impaired local intrinsic immunity to SARS-CoV-2 infection in severe COVID-19," uploaded to bioRxiv on February 18, 2021 and having the following authors: Carly G. K. Ziegler, Vincent N. Miao, Anna H. Owings, Andrew W. Navia, Ying Tang, Joshua D. Bromley, Peter Lotfy, Meredith Sloan, Hannah Laird, Haley B. Williams, Micayla George, Riley S. Drake, Taylor Christian, Adam Parker, Campbell B. Sindel, Molly W. Burger, Yilianys Pride, Mohammad Hasan, George E. Abraham III, Michal Senitko, Tanya O. Robinson, Alex K. Shalek, Sarah C. Glover, Bruce H. Horwitz, Jose Ordovas-Montanes. Reference is also made to US Patent Application 16/631,898, published as US20200158716A1 and claiming priority to PCT/US2018/042557. Reference is also made to Ziegler CGK, Miao VN, Owings AH, et al. Impaired local intrinsic immunity to SARS-CoV-2 infection in severe COVID-19. Cell. 2021;184(18):4713-4733.e22. doi:10.1016/j.cell.2021.07.023. All publications, published patent documents, and patent applications cited herein are hereby incorporated by reference to the same extent as though each individual publication, published patent document, or patent application was specifically and individually indicated as being incorporated by reference.
OVERVIEW
[0051] Embodiments disclosed herein provide methods of determining whether a subject is at risk for severe respiratory disease from a coronavirus infection and treating subjects at risk prophylactically or subjects suffering from severe respiratory disease. SARS-CoV-2, the virus that causes COVID-19, relies on efficient replication within cells of the human upper airways for infection and transmission. In some individuals, the virus accesses lower respiratory tissues, causing pneumonia, acute respiratory distress syndrome, and systemic effects which lead to profound morbidity and mortality. Despite major advances in understanding peripheral correlates of immunity during COVID-19, how SARS-CoV-2 impacts its primary target tissue, the human nasopharynx, remains unclear. Here, Applicants present a cohort of over 60 samples from healthy individuals and participants with COVID-19, representing a wide spectrum of disease states from ambulatory to critically ill. Using standard nasopharyngeal swabs, Applicants collected viable cells and performed single-cell RNA-seq, simultaneously profiling both host and viral RNA. Applicants performed scRNA-seq on nasopharyngeal swabs from 58 healthy and COVID-19 participants. Applicants find that following infection with SARS-CoV-2 the upper respiratory epithelium undergoes massive expansion and diversification of secretory cells and preferential loss of mature ciliated cells. During COVID-19, Applicants observe expansion of secretory, loss of ciliated, and epithelial cell repopulation via deuterosomal expansion. Active repopulation of lost ciliated cells appears to occur through secretory cell transdifferentiation via deuterosomal cell intermediates. Epithelial cells from participants with mild/moderate COVID-19 showed extensive induction of genes associated with anti-viral and type I interferon responses. 
In contrast, cells from participants with severe lower respiratory symptoms appear globally stunted in their anti-viral capacity, despite substantially higher local inflammatory myeloid populations and equivalent nasal viral loads, suggesting an essential role for intrinsic, local epithelial immunity in curbing and constraining viral infection. In mild/moderate COVID-19, epithelial cells express anti-viral/interferon-responsive genes, while cells in severe COVID-19 have muted anti-viral responses despite equivalent viral loads. Through a custom computational pipeline, Applicants characterized cell-associated SARS-CoV-2 RNA and identified rare cells with RNA intermediates strongly suggestive of active replication. Among SARS-CoV-2 RNA+ host cells, Applicants found remarkable diversity and heterogeneity both within and across individuals, including developing/immature and interferon-responsive ciliated cells, KRT13+ “hillock”-like cells, and unique subsets of secretory, goblet, and squamous cells. SARS-CoV-2 RNA+ host-target cells are highly heterogeneous, including developing ciliated, interferon-responsive ciliated, AZGP1high goblet, and KRT13+ “hillock”-like cells, and Applicants identify genes associated with susceptibility, resistance, or infection response. Finally, among SARS-CoV-2 RNA+ cells, Applicants detected genes that were enriched compared to uninfected bystanders, suggesting involvement in either the cell-intrinsic response or susceptibility to infection. These included anti-viral genes (e.g., MX1, IFITM3, EIF2AK2), proteases (e.g., CTSL, TMPRSS2), and pathways involved in cholesterol biosynthesis. Together, this work defines the protective and detrimental host responses to SARS-CoV-2, determines the direct viral targets of infection, and suggests that failed cell-intrinsic anti-viral epithelial immunity in the nasal mucosa underlies the progression to severe COVID-19. 
The study defines protective and detrimental responses to SARS-CoV-2, the direct viral targets of infection, and suggests that failed nasal epithelial anti-viral immunity may underlie and precede severe COVID-19.
[0052] The present invention stratifies subjects based on whether they are at risk of developing severe respiratory disease or are predicted to have mild/moderate disease. The present invention also provides for predicting the risk of developing severe respiratory disease in subjects who initially present as asymptomatic or with mild/moderate disease. As used herein, the term “severe” refers to a subject having intubation and mechanical ventilation, ventilation with additional organ support, or death. As used herein, the term “mild” refers to a subject having no limitation of activities, limitation of activities, hospitalized and no oxygen therapy, oxygen by mask or nasal prongs, non-invasive ventilation or high-flow oxygen. As used herein, the term “moderate” refers to a subject having no limitation of activities, limitation of activities, hospitalized and no oxygen therapy, oxygen by mask or nasal prongs, non-invasive ventilation or high-flow oxygen.
Figure imgf000023_0001
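The cohort labels used throughout the figures (Control WHO 0, COVID-19 WHO 1-5, COVID-19 WHO 6-8) can be illustrated with a minimal sketch. The function name `who_cohort` and the integer encoding of the score are assumptions made for illustration only; the grouping itself follows the mild/moderate (1-5) and severe (6-8) ranges recited herein.

```python
def who_cohort(score: int) -> str:
    """Map a WHO severity score (0-8) to a cohort label used in this disclosure.

    Hypothetical helper: the name and the integer encoding are illustrative.
    """
    if isinstance(score, bool) or not isinstance(score, int) or not 0 <= score <= 8:
        raise ValueError("WHO score must be an integer from 0 to 8")
    if score == 0:
        return "Control WHO 0"        # healthy
    if score <= 5:
        return "COVID-19 WHO 1-5"     # mild/moderate
    return "COVID-19 WHO 6-8"         # severe


if __name__ == "__main__":
    for s in (0, 3, 7):
        print(s, "->", who_cohort(s))
```

Rejecting out-of-range scores, rather than silently assigning a cohort, is a design choice of this sketch and not a requirement stated in the disclosure.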
[0053] The present invention provides for cell subsets and cell states identified using single cell RNA sequencing of nasopharyngeal swabs from a large patient cohort of SARS-CoV-2 positive subjects. As used herein, a cell subset refers to cells that can be identified by a parent cell type but that express a specific gene signature or cell state that further distinguishes them from other cells of the parent cell type. As used herein, cell subsets are also referred to by a cluster (i.e., the different cell subsets cluster together). In certain embodiments, shifts in cell types or subsets of a cell type are used to predict a disease state and for selecting a treatment. In certain embodiments, shifts in cell states in cell types or subsets of a cell type are used to predict a disease state and for selecting a treatment. As used herein, cell state refers to the expression of genes in specific cell subsets. As used herein, gene expression is not limited to mRNA expression and may also include proteins. In certain embodiments, the cell subset frequency and/or cell states can be detected for screening novel therapeutics. The present invention provides for subsets of epithelial cell types and immune cells. In certain embodiments, intrinsic immune responses are differentially induced in different patient populations (e.g., severe, mild or moderate). In certain embodiments, intrinsic immune states or conditions are monitored or detected during treatment. In certain embodiments, the frequency of the cell subsets is shifted in disease states. Disease states may include disease severity or response to any treatment in the standard of care for the disease.
[0054] In certain embodiments, one or more cell subsets associated with a disease state or risk group is detected or shifted to treat a subject in need thereof. In certain embodiments, the cell subsets can be identified using one or more marker genes specific for the subset. 
In certain embodiments, the cell subsets that are shifted include KRT13 KRT24 high Secretory Cells, Early Response Secretory Cells, CXCL8 Secretory Cells, AZGP1 high Goblet Cells, SCGB1A1 high Goblet Cells, IFI27; IFIT1; IFI6; IFITM3; and GBP3 ciliated cells, any IFN gene ciliated cells, any IFN goblet cells, ACE2 epithelial cells, ACE2 secretory cells, ACE2 goblet cells, ACE2 ciliated cells, ACE2 developing ciliated cells, ACE2 deuterosomal cells, BEST4 high cilia high ciliated cells. Applicants have identified specific markers for each cell subset using single cell RNA sequencing (scRNA-seq) (see, e.g., Table 1). In certain embodiments, 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 or more genes are detected. In certain embodiments, detecting 2 or more of the subset markers increases the probability of detecting a cell subset.
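As a minimal sketch of how a shift in cell subset frequency might be quantified, the following computes the fraction of a subset within a parent population and its change relative to a control level. The function names are hypothetical, the labels are toy data, and the reading of "more frequent" as a relative (not percentage-point) increase is an assumption.

```python
from collections import Counter


def subset_frequency(labels, subset):
    """Fraction of cells in the parent population annotated as `subset`."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("parent population is empty")
    return counts[subset] / total


def percent_more_frequent(sample_freq, control_freq):
    """Relative increase (%) of a subset frequency over a control frequency.

    Assumption: the comparison is relative to the control level.
    """
    if control_freq <= 0:
        raise ValueError("control frequency must be positive")
    return 100.0 * (sample_freq - control_freq) / control_freq


if __name__ == "__main__":
    # Toy annotation of ten epithelial cells from one swab (illustrative only).
    cells = ["Goblet Cells"] * 5 + ["Ciliated Cells"] * 5
    freq = subset_frequency(cells, "Goblet Cells")
    print(freq, percent_more_frequent(freq, control_freq=0.25))
```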
[0055] In certain embodiments, specific cell types or cell subtypes differentially express genes based on the disease state or risk of the disease state. Applicants have identified specific differentially expressed genes in specific cell types using single cell RNA sequencing (scRNA-seq). In particular, Applicants identified differentially expressed genes in specific cell types between subjects having different severity of disease (see, e.g., Tables 2-4). In certain embodiments, genes differentially expressed between WHO score 0 (healthy) and WHO score 1-5 (mild/moderate) (Table 2) indicate genes that are expressed in subjects to reduce virus severity. In certain embodiments, a treatment would increase expression of one or more of these genes. In certain embodiments, detection of one or more of these genes indicates that the subject does not have a severe disease or risk of severe disease. In certain embodiments, genes differentially expressed between WHO score 0 (healthy) and WHO score 6-8 (severe) (Table 3) indicate genes that are expressed in subjects to reduce virus severity and/or generate an intrinsic immune response that leads to severe disease. In certain embodiments, a treatment would decrease expression of one or more of these genes. In certain embodiments, detection of one or more of these genes indicates that the subject has a severe disease or risk of severe disease. In certain embodiments, genes differentially expressed between WHO score 1-5 (mild/moderate) and WHO score 6-8 (severe) (Table 4) indicate genes that are expressed in subjects to generate an intrinsic immune response that leads to severe disease. In certain embodiments, a treatment would decrease expression of one or more of these genes. In certain embodiments, detection of one or more of these genes indicates that the subject has a severe disease or risk of severe disease.
[0056] In certain embodiments, a cell state associated with a disease state or risk group is detected or shifted to treat a subject in need thereof. In certain embodiments, the cell states can be identified using one or more differentially expressed genes in specific cell types between risk groups. In certain embodiments, 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 or more genes are detected. In certain embodiments, 10, 20, 30, 40, 50, 60, 70, 80, 90 or more than 100 genes are detected. In certain embodiments, detecting 2 or more of the differentially expressed genes increases the probability of detecting a subject having a cell state indicative of a specific intrinsic immune state and risk of severe disease.
[0057] In certain embodiments, the methods of the present invention use control values for the frequency of subsets and cell states. For example, the present nasal swab single cell atlas provides for the frequency of cell subsets and cell states for each of healthy WHO score 0 and COVID WHO score 1-8 subjects. Cells such as disclosed herein may in the context of the present specification be said to “comprise the expression” or conversely to “not express” one or more markers, such as one or more genes or gene products; or be described as “positive” or conversely as “negative” for one or more markers, such as one or more genes or gene products; or be said to “comprise” a defined “gene or gene product signature”.
[0058] Such terms are commonplace and well-understood by the skilled person when characterizing cell phenotypes. By means of additional guidance, when a cell is said to be positive for or to express or comprise expression of a given marker, such as a given gene or gene product, a skilled person would conclude the presence or evidence of a distinct signal for the marker when carrying out a measurement capable of detecting or quantifying the marker in or on the cell. Suitably, the presence or evidence of the distinct signal for the marker would be concluded based on a comparison of the measurement result obtained for the cell to a result of the same measurement carried out for a negative control (for example, a cell known to not express the marker) and/or a positive control (for example, a cell known to express the marker). Where the measurement method allows for a quantitative assessment of the marker, a positive cell may generate a signal for the marker that is at least 1.5-fold higher than a signal generated for the marker by a negative control cell or than an average signal generated for the marker by a population of negative control cells, e.g., at least 2-fold, at least 4-fold, at least 10-fold, at least 20-fold, at least 30-fold, at least 40-fold, at least 50-fold higher or even higher. Further, a positive cell may generate a signal for the marker that is 3.0 or more standard deviations, e.g., 3.5 or more, 4.0 or more, 4.5 or more, or 5.0 or more standard deviations, higher than an average signal generated for the marker by a population of negative control cells. In regard to frequency, a cell subset may be present or not present. In certain embodiments, a cell subset may be 5, 10, 20, 30, 40, 50, 60, 70, 80 or 90% more frequent in a parent cell population as compared to a control level.
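The two quantitative criteria above (a signal at least 1.5-fold higher than the negative-control average, or 3.0 or more standard deviations above it) can be sketched as follows. The function name `is_marker_positive`, the use of the sample standard deviation, and the toy signal values are assumptions for illustration; no particular measurement platform is implied.

```python
from statistics import mean, stdev


def is_marker_positive(signal, negative_controls, min_fold=1.5, min_sd=3.0):
    """Evaluate a cell's marker signal against a negative-control population.

    Returns (fold_ok, sd_ok):
      fold_ok - signal is at least `min_fold` times the control average
      sd_ok   - signal is at least `min_sd` sample standard deviations
                above the control average
    Hypothetical helper; needs at least two control measurements.
    """
    baseline = mean(negative_controls)
    spread = stdev(negative_controls)
    fold_ok = baseline > 0 and signal >= min_fold * baseline
    sd_ok = spread > 0 and signal >= baseline + min_sd * spread
    return fold_ok, sd_ok


if __name__ == "__main__":
    controls = [1.0, 1.2, 0.8, 1.0]  # toy negative-control signals
    print(is_marker_positive(2.0, controls))
    print(is_marker_positive(1.2, controls))
```

The two criteria are reported separately because the passage presents them as alternative, not cumulative, ways of concluding that a cell is positive.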
[0059] In certain embodiments, the cell state is a gene program comprising one or more up and down regulated genes. Clusters (subsets) and gene programs as described herein can also be described as a metagene. As used herein a “metagene” refers to a pattern or aggregate of gene expression and not an actual gene. Each metagene may represent a collection or aggregate of genes behaving in a functionally correlated fashion within the genome. The metagene can be increased if the pattern is increased. As used herein the term “gene program” or “program” can be used interchangeably with “cell state”, “biological program”, “expression program”, “transcriptional program”, “expression profile”, “signature”, “gene signature” or “expression program” and may refer to a set of genes that share a role in a biological function (e.g., an antiviral program, inflammatory program, cell differentiation program, proliferation program). Biological programs can include a pattern of gene expression that result in a corresponding physiological event or phenotypic trait (e.g., inflammation). Biological programs can include up to several hundred genes that are expressed in a spatially and temporally controlled fashion. Expression of individual genes can be shared between biological programs. Expression of individual genes can be shared among different single cell subtypes; however, expression of a biological program may be cell subtype specific or temporally specific (e.g., the biological program is expressed in a cell subtype at a specific time). Multiple biological programs may include the same gene, reflecting the gene's roles in different processes. Expression of a biological program may be regulated by a master switch, such as a nuclear receptor or transcription factor.
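One simple way to summarize a gene program as a single metagene value is to average the expression of its up-regulated genes and subtract the average of its down-regulated genes. This is a minimal sketch under that assumption, not the scoring method used by Applicants; the function name, the expression values, and the placeholder gene `GENE_DOWN` are hypothetical, while MX1, IFITM3 and EIF2AK2 are anti-viral genes named in the overview above.

```python
def program_score(expression, up_genes, down_genes=()):
    """Score one cell for a gene program (metagene).

    expression: dict mapping gene symbol -> normalized expression value
    up_genes / down_genes: genes expected to be up- / down-regulated
    Genes absent from `expression` are treated as not detected (0.0).
    """
    def average(genes):
        values = [expression.get(g, 0.0) for g in genes]
        return sum(values) / len(values) if values else 0.0

    return average(up_genes) - average(down_genes)


if __name__ == "__main__":
    # Toy normalized expression for a single cell (illustrative values).
    cell = {"MX1": 2.0, "IFITM3": 4.0, "EIF2AK2": 0.0, "GENE_DOWN": 1.0}
    antiviral = ["MX1", "IFITM3", "EIF2AK2"]
    print(program_score(cell, antiviral))
    print(program_score(cell, antiviral, down_genes=["GENE_DOWN"]))
```

Because the score aggregates over many genes, the same gene can contribute to several programs, consistent with the statement above that expression of individual genes can be shared between biological programs.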
[0060] As used herein a “signature” or “gene program” may encompass any gene or genes, protein or proteins, or epigenetic element(s) whose expression profile or whose occurrence is associated with a specific cell type, subtype, or cell state of a specific cell type or subtype within a population of cells. For ease of discussion, when discussing gene expression, any of gene or genes, protein or proteins, or epigenetic element(s) may be substituted. Levels of expression or activity or prevalence may be compared between different cells in order to characterize or identify for instance signatures specific for cell (sub)populations. Increased or decreased expression or activity or prevalence of signature genes may be compared between different cells in order to characterize or identify for instance specific cell (sub)populations. The detection of a signature in single cells may be used to identify and quantitate for instance specific cell (sub)populations. A signature may include a gene or genes, protein or proteins, or epigenetic element(s) whose expression or occurrence is specific to a cell (sub)population, such that expression or occurrence is exclusive to the cell (sub)population. A gene signature as used herein, may thus refer to any set of up- and down-regulated genes that are representative of a cell type or subtype. A gene signature as used herein, may also refer to any set of up- and down-regulated genes between different cells or cell (sub)populations derived from a gene-expression profile. For example, a gene signature may comprise a list of genes differentially expressed in a distinction of interest.
[0061] The signature as defined herein (be it a gene signature, protein signature or other genetic or epigenetic signature) can be used to indicate the presence of a cell type, a subtype of the cell type, the state of the microenvironment of a population of cells, a particular cell type population or subpopulation, and/or the overall status of the entire cell (sub)population. Furthermore, the signature may be indicative of cells within a population of cells in vivo. The signature may also be used to suggest for instance particular therapies, or to follow up treatment, or to suggest ways to modulate immune systems. The presence of subtypes or cell states may be determined by subtype specific or cell state specific signatures. The presence of these specific cell (sub)types or cell states may be determined by applying the signature genes to bulk sequencing data in a sample. Not being bound by a theory, the signatures of the present invention may be microenvironment specific, such as their expression in a particular spatio-temporal context. Not being bound by a theory, signatures as discussed herein are specific to a particular pathological context. Not being bound by a theory, a combination of cell subtypes having a particular signature may indicate an outcome. Not being bound by a theory, the signatures can be used to deconvolute the network of cells present in a particular pathological condition. Not being bound by a theory, the presence of specific cells and cell subtypes is indicative of a particular response to treatment, such as increased or decreased susceptibility to treatment. The signature may indicate the presence of one particular cell type. In one embodiment, the novel signatures are used to detect multiple cell states or hierarchies that occur in subpopulations of immune cells that are linked to a particular pathological condition (e.g., inflammation), or linked to a particular outcome or progression of the disease (e.g., autoimmunity), or linked to a particular response to treatment of the disease.
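By way of non-limiting illustration, applying signature genes to bulk expression data, as mentioned above, might be sketched as a simple score. The scoring rule (mean of up-regulated genes minus mean of down-regulated genes), the function name, and the expression values are assumptions made for this example only and are not a method defined by the disclosure.

```python
def signature_score(expression, up_genes, down_genes=()):
    """Hypothetical score: mean expression of up-regulated signature genes
    minus mean expression of down-regulated signature genes.

    `expression` maps gene symbol -> expression value for one sample.
    Signature genes missing from `expression` are ignored.
    """
    def mean_of(genes):
        present = [expression[g] for g in genes if g in expression]
        return sum(present) / len(present) if present else 0.0

    return mean_of(up_genes) - mean_of(down_genes)


# Illustrative (made-up) values for one bulk sample.
sample = {"MX1": 6.0, "OAS1": 4.0, "DUSP1": 1.0}
score = signature_score(sample, up_genes=["MX1", "OAS1"], down_genes=["DUSP1"])
```

A higher score would suggest, under these assumptions, that the sample resembles the cell state represented by the signature; in practice the score would be compared with reference values.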
[0062] The signature according to certain embodiments of the present invention may comprise or consist of one or more genes, proteins and/or epigenetic elements, such as for instance 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more. In certain embodiments, the signature may comprise or consist of two or more genes, proteins and/or epigenetic elements, such as for instance 2, 3, 4, 5, 6, 7, 8, 9, 10 or more. In certain embodiments, the signature may comprise or consist of three or more genes, proteins and/or epigenetic elements, such as for instance 3, 4, 5, 6, 7, 8, 9, 10 or more. In certain embodiments, the signature may comprise or consist of four or more genes, proteins and/or epigenetic elements, such as for instance 4, 5, 6, 7, 8, 9, 10 or more. In certain embodiments, the signature may comprise or consist of five or more genes, proteins and/or epigenetic elements, such as for instance 5, 6, 7, 8, 9, 10 or more. In certain embodiments, the signature may comprise or consist of six or more genes, proteins and/or epigenetic elements, such as for instance 6, 7, 8, 9, 10 or more. In certain embodiments, the signature may comprise or consist of seven or more genes, proteins and/or epigenetic elements, such as for instance 7, 8, 9, 10 or more. In certain embodiments, the signature may comprise or consist of eight or more genes, proteins and/or epigenetic elements, such as for instance 8, 9, 10 or more. In certain embodiments, the signature may comprise or consist of nine or more genes, proteins and/or epigenetic elements, such as for instance 9, 10 or more. In certain embodiments, the signature may comprise or consist of ten or more genes, proteins and/or epigenetic elements, such as for instance 10, 11, 12, 13, 14, 15, or more. It is to be understood that a signature according to the invention may for instance also include genes or proteins as well as epigenetic elements combined.
[0063] It is to be understood that “differentially expressed” genes/proteins include genes/proteins which are up- or down-regulated as well as genes/proteins which are turned on or off. When referring to up- or down-regulation, in certain embodiments, such up- or down-regulation is preferably at least two-fold, such as two-fold, three-fold, four-fold, five-fold, or more, such as for instance at least ten-fold, at least 20-fold, at least 30-fold, at least 40-fold, at least 50-fold, or more. Alternatively, or in addition, differential expression may be determined based on common statistical tests, as is known in the art.
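By way of non-limiting illustration, the fold-change criterion described above can be sketched as follows. The function names, the pseudocount, and the example values are hypothetical; only the default two-fold threshold is taken from the text.

```python
def fold_change(case_mean, control_mean, pseudocount=1e-9):
    """Ratio of mean expression in case versus control.

    The small pseudocount (an assumption of this sketch) avoids division
    by zero for genes that are switched off in the control.
    """
    return (case_mean + pseudocount) / (control_mean + pseudocount)


def classify_regulation(case_mean, control_mean, min_fold=2.0):
    """Label a gene 'up', 'down', or 'unchanged' using a symmetric fold threshold."""
    fc = fold_change(case_mean, control_mean)
    if fc >= min_fold:
        return "up"
    if fc <= 1.0 / min_fold:
        return "down"
    return "unchanged"


label = classify_regulation(case_mean=8.0, control_mean=2.0)  # four-fold increase
```

A statistical test, as the paragraph notes, may be used instead of or in addition to such a threshold.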
[0064] As discussed herein, differentially expressed genes/proteins, or differential epigenetic elements may be differentially expressed on a single cell level, or may be differentially expressed on a cell population level. Preferably, the differentially expressed genes/proteins or epigenetic elements as discussed herein, such as those constituting the gene signatures as discussed herein, when referred to at the cell population level, are genes that are differentially expressed in all or substantially all cells of the population (such as at least 80%, preferably at least 90%, such as at least 95% of the individual cells). This allows one to define a particular subpopulation of tumor cells. As referred to herein, a “subpopulation” of cells preferably refers to a particular subset of cells of a particular cell type which can be distinguished or are uniquely identifiable and set apart from other cells of this cell type. The cell subpopulation may be phenotypically characterized, and is preferably characterized by the signature as discussed herein. A cell (sub)population as referred to herein may consist of a (sub)population of cells of a particular cell type characterized by a specific cell state.
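By way of non-limiting illustration, the population-level criterion above (a gene altered in at least 80%, 90%, or 95% of individual cells) might be checked as sketched below. The sketch uses simple detection above a threshold as a stand-in for a per-cell differential-expression call, which is an assumption of this example; the function names and values are hypothetical.

```python
def fraction_expressing(values, min_expression=0.0):
    """Fraction of cells whose expression value exceeds `min_expression`."""
    if not values:
        raise ValueError("at least one cell is required")
    return sum(1 for v in values if v > min_expression) / len(values)


def is_population_level(values, min_fraction=0.80, min_expression=0.0):
    """True if the gene is detected in at least `min_fraction` of the cells."""
    return fraction_expressing(values, min_expression) >= min_fraction


# Illustrative per-cell counts for one gene in ten cells (made-up values).
counts = [3, 0, 5, 2, 1, 4, 7, 1, 2, 6]
meets_80 = is_population_level(counts, min_fraction=0.80)
meets_95 = is_population_level(counts, min_fraction=0.95)
```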
[0065] When referring to induction, or alternatively suppression of a particular signature, preferably is meant induction or alternatively suppression (or upregulation or downregulation) of at least one gene/protein and/or epigenetic element of the signature, such as for instance at least two, at least three, at least four, at least five, at least six, or all genes/proteins and/or epigenetic elements of the signature.
[0066] As used herein, all gene name symbols refer to the gene as commonly known in the art. The examples described herein that refer to the human gene names are to be understood to also encompass mouse genes, as well as genes in any other organism (e.g., homologous, orthologous genes). Any reference to the gene symbol is a reference made to the entire gene or variants of the gene. Any reference to the gene symbol is also a reference made to the gene product (e.g., protein). The term, homolog, may apply to the relationship between genes separated by the event of speciation (e.g., ortholog). Orthologs are genes in different species that evolved from a common ancestral gene by speciation. Normally, orthologs retain the same function in the course of evolution. Gene symbols may be those referred to by the HUGO Gene Nomenclature Committee (HGNC) or National Center for Biotechnology Information (NCBI). The signature as described herein may encompass any of the genes described herein.
Diseases
[0067] In certain embodiments, the disease is a viral infection. In certain embodiments, the virus infects a barrier tissue. As used herein a “barrier cell” or “barrier tissues” refers generally to various epithelial tissues of the body such as, but not limited to, those that line the respiratory system, digestive system, urinary system, and reproductive system as well as cutaneous systems. The epithelial barrier may vary in composition between tissues but is composed of basal and apical components, or crypt/villus components in the case of intestine.
[0068] In certain embodiments, the disease is caused by a differential immune response (e.g., subjects have different immune responses to SARS-CoV-2 which affect the severity of COVID-19 disease). In certain embodiments, immune responses are coordinated by immune cells and epithelial cells. The term “immune cell” as used throughout this specification generally encompasses any cell derived from a hematopoietic stem cell that plays a role in the immune response. The term is intended to encompass immune cells of both the innate and adaptive immune system. The immune cell as referred to herein may be a leukocyte, at any stage of differentiation (e.g., a stem cell, a progenitor cell, a mature cell) or any activation stage. Immune cells include lymphocytes (such as natural killer cells, T-cells (including, e.g., thymocytes, Th or Tc; Th1, Th2, Th17, Thαβ, CD4+, CD8+, effector Th, memory Th, regulatory Th, CD4+/CD8+ thymocytes, CD4-/CD8- thymocytes, γδ T cells, etc.) or B-cells (including, e.g., pro-B cells, early pro-B cells, late pro-B cells, pre-B cells, large pre-B cells, small pre-B cells, immature or mature B-cells, producing antibodies of any isotype, T1 B-cells, T2 B-cells, naive B-cells, GC B-cells, plasmablasts, memory B-cells, plasma cells, follicular B-cells, marginal zone B-cells, B-1 cells, B-2 cells, regulatory B cells, etc.)), such as for instance, monocytes (including, e.g., classical, non-classical, or intermediate monocytes), (segmented or banded) neutrophils, eosinophils, basophils, mast cells, histiocytes, microglia, including various subtypes, maturation, differentiation, or activation stages, such as for instance hematopoietic stem cells, myeloid progenitors, lymphoid progenitors, myeloblasts, promyelocytes, myelocytes, metamyelocytes, monoblasts, promonocytes, lymphoblasts, prolymphocytes, small lymphocytes, macrophages (including, e.g., Kupffer cells, stellate macrophages, M1 or M2 macrophages), (myeloid or lymphoid) dendritic cells (including, e.g., Langerhans cells, conventional or myeloid dendritic cells, plasmacytoid dendritic cells, mDC-1, mDC-2, Mo-DC, HP-DC, veiled cells), granulocytes, polymorphonuclear cells, antigen-presenting cells (APC), etc. As used throughout this specification, “immune response” refers to a response by a cell of the immune system, such as a B cell, T cell (CD4+ or CD8+), regulatory T cell, antigen-presenting cell, dendritic cell, monocyte, macrophage, NKT cell, NK cell, basophil, eosinophil, or neutrophil, to a stimulus. In some embodiments, the response is specific for a particular antigen (an “antigen-specific response”) and refers to a response by a CD4 T cell, CD8 T cell, or B cell via their antigen-specific receptor. In some embodiments, an immune response is a T cell response, such as a CD4+ response or a CD8+ response. Such responses by these cells can include, for example, cytotoxicity, proliferation, cytokine or chemokine production, trafficking, or phagocytosis, and can be dependent on the nature of the immune cell undergoing the response. An immune response can also be an innate immune response (see, e.g., Artis D, Spits H. The biology of innate lymphoid cells. Nature. 2015;517(7534):293-301).
[0069] In certain embodiments, the viral infection is a coronavirus infection. As used herein, “coronavirus” refers to enveloped viruses with a positive-sense single-stranded RNA genome and a nucleocapsid of helical symmetry that constitute the subfamily Orthocoronavirinae, in the family Coronaviridae (see, e.g., Woo PC, Huang Y, Lau SK, Yuen KY. Coronavirus genomics and bioinformatics analysis. Viruses. 2010;2(8):1804-1820). The present disclosure relates to and/or involves SARS-CoV-2. Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the virus causing the ongoing Coronavirus Disease 2019 (COVID-19) pandemic (see, e.g., Zhou, et al. (2020). A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature 579, 270-273). In preferred embodiments, the virus is SARS-CoV-2 or variants thereof. In preferred embodiments, the disease treated is COVID-19. SARS-CoV-2 is the third zoonotic betacoronavirus to cause a human outbreak after SARS-CoV in 2002 and Middle East respiratory syndrome coronavirus (MERS-CoV) in 2012 (de Wit et al., 2016, SARS and MERS: recent insights into emerging coronaviruses. Nat Rev Microbiol 14, 523-534). As used herein, the term “variant” refers to any virus having one or more mutations as compared to a known virus. A strain is a genetic variant or subtype of a virus. The terms 'strain', 'variant', and 'isolate' may be used interchangeably. In certain embodiments, a variant has developed a “specific group of mutations” that causes the variant to behave differently from the strain from which it originated.
[0070] While there are many thousands of variants of SARS-CoV-2 (Koyama, Takahiko; Platt, Daniela; Parida, Laxmi (June 2020). “Variant analysis of SARS-CoV-2 genomes”. Bulletin of the World Health Organization. 98: 495-504), there are also much larger groupings called clades. Several different clade nomenclatures for SARS-CoV-2 have been proposed. As of December 2020, GISAID, referring to SARS-CoV-2 as hCoV-19, identified seven clades (O, S, L, V, G, GH, and GR) (Alm E, Broberg EK, Connor T, et al. Geographical and temporal distribution of SARS-CoV-2 clades in the WHO European Region, January to June 2020 [published correction appears in Euro Surveill. 2020 Aug;25(33):]. Euro Surveill. 2020;25(32):2001410). Also as of December 2020, Nextstrain identified five (19A, 19B, 20A, 20B, and 20C) (cited in Alm et al. 2020). Guan et al. identified five global clades (G614, S84, V251, I378 and D392) (Guan Q, Sadykov M, Mfarrej S, et al. A genetic barcode of SARS-CoV-2 for monitoring global distribution of different clades during the COVID-19 pandemic. Int J Infect Dis. 2020;100:216-223). Rambaut et al. proposed the term “lineage” in a 2020 article in Nature Microbiology; as of December 2020, there have been five major lineages (A, B, B.1, B.1.1, and B.1.777) identified (Rambaut, A.; Holmes, E.C.; O'Toole, Á.; et al. “A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology”. Nature Microbiology. 5: 1403-1407).
[0071] Genetic variants of SARS-CoV-2 have been emerging and circulating around the world throughout the COVID-19 pandemic (see, e.g., The US Centers for Disease Control and Prevention; www.cdc.gov/coronavirus/2019-ncov/variants/variant-info.html). Exemplary, non-limiting variants applicable to the present disclosure include variants of SARS-CoV-2, particularly those having substitutions of therapeutic concern. Table A shows exemplary, non-limiting genetic substitutions in SARS-CoV-2 variants.
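[Table A: exemplary, non-limiting genetic substitutions in SARS-CoV-2 variants (image imgf000033_0001; not reproduced in this text)]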
Phylogenetic Assignment of Named Global Outbreak (PANGO) Lineages is a software tool developed by members of the Rambaut Lab. The associated web application was developed by the Centre for Genomic Pathogen Surveillance in South Cambridgeshire and is intended to implement the dynamic nomenclature of SARS-CoV-2 lineages, known as the PANGO nomenclature. It is available at cov-lineages.org.
[0072] In some embodiments, the SARS-CoV-2 variant is and/or includes: B.1.1.7, also known as Alpha (WHO) or UK variant, having the following spike protein substitutions: 69del, 70del, 144del, (E484K*), (S494P*), N501Y, A570D, D614G, P681H, T716I, S982A, and D1118H (K1191N*); B.1.351, also known as Beta (WHO) or South Africa variant, having the following spike protein substitutions: D80A, D215G, 241del, 242del, 243del, K417N, E484K, N501Y, D614G, and A701V; B.1.427, also known as Epsilon (WHO) or US California variant, having the following spike protein substitutions: L452R, and D614G; B.1.429, also known as Epsilon (WHO) or US California variant, having the following spike protein substitutions: S13I, W152C, L452R, and D614G; B.1.617.2, also known as Delta (WHO) or India variant, having the following spike protein substitutions: T19R, (G142D), 156del, 157del, R158G, L452R, T478K, D614G, P681R, and D950N; P.1, also known as Gamma (WHO) or Japan/Brazil variant, having the following spike protein substitutions: L18F, T20N, P26S, D138Y, R190S, K417T, E484K, N501Y, D614G, H655Y, and T1027I; and B.1.1.529, also known as Omicron (WHO), having the following spike protein substitutions: A67V, del69-70, T95I, del142-144, Y145D, del211, L212I, ins214EPE, G339D, S371L, S373P, S375F, K417N, N440K, G446S, S477N, T478K, E484A, Q493R, G496S, Q498R, N501Y, Y505H, T547K, D614G, H655Y, N679K, P681H, N764K, D796Y, N856K, Q954H, N969K, L981F, or any combination thereof.
[0073] In some embodiments, the SARS-CoV-2 variant is classified and/or otherwise identified as a Variant of Concern (VOC) by the World Health Organization and/or the U.S. Centers for Disease Control. A VOC is a variant for which there is evidence of an increase in transmissibility, more severe disease (e.g., increased hospitalizations or deaths), significant reduction in neutralization by antibodies generated during previous infection or vaccination, reduced effectiveness of treatments or vaccines, or diagnostic detection failures.
[0074] In some embodiments, the SARS-CoV-2 variant is classified and/or otherwise identified as a Variant of High Consequence (VHC) by the World Health Organization and/or the U.S. Centers for Disease Control. A variant of high consequence has clear evidence that prevention measures or medical countermeasures (MCMs) have significantly reduced effectiveness relative to previously circulating variants.
[0075] In some embodiments, the SARS-CoV-2 variant is classified and/or otherwise identified as a Variant of Interest (VOI) by the World Health Organization and/or the U.S. Centers for Disease Control. A VOI is a variant with specific genetic markers that have been associated with changes to receptor binding, reduced neutralization by antibodies generated against previous infection or vaccination, reduced efficacy of treatments, potential diagnostic impact, or predicted increase in transmissibility or disease severity.
[0076] In some embodiments, the SARS-CoV-2 variant is classified and/or is otherwise identified as a Variant of Note (VON). As used herein, VON refers to both “variants of concern” and “variants of note” as the two phrases are used and defined by Pangolin (cov-lineages.org) and provided in the “VOC reports” available at cov-lineages.org.
[0077] In some embodiments, the SARS-CoV-2 variant is a VOC. In some embodiments, the SARS-CoV-2 variant is or includes an Alpha variant (e.g., Pango lineage B.1.1.7); a Beta variant (e.g., Pango lineage B.1.351, B.1.351.1, B.1.351.2, and/or B.1.351.3); a Delta variant (e.g., Pango lineage B.1.617.2, AY.1, AY.2, AY.3 and/or AY.3.1); a Gamma variant (e.g., Pango lineage P.1, P.1.1, P.1.2, P.1.4, P.1.6, and/or P.1.7); an Omicron variant (e.g., Pango lineage B.1.1.529); or any combination thereof.
[0078] In some embodiments, the SARS-CoV-2 variant is a VOI. In some embodiments, the SARS-CoV-2 variant is or includes an Eta variant (e.g., Pango lineage B.1.525 (Spike protein substitutions A67V, 69del, 70del, 144del, E484K, D614G, Q677H, F888L)); an Iota variant (e.g., Pango lineage B.1.526 (Spike protein substitutions L5F, (D80G*), T95I, (Y144-*), (F157S*), D253G, (L452R*), (S477N*), E484K, D614G, A701V, (T859N*), (D950H*), (Q957R*))); a Kappa variant (e.g., Pango lineage B.1.617.1 (Spike protein substitutions (T95I), G142D, E154K, L452R, E484Q, D614G, P681R, Q1071H)); Pango lineage variant B.1.617.2 (Spike protein substitutions T19R, G142D, L452R, E484Q, D614G, P681R, D950N); a Lambda variant (e.g., Pango lineage C.37); or any combination thereof.
[0079] In some embodiments, the SARS-CoV-2 variant is a VON. In some embodiments, the SARS-CoV-2 variant is or includes Pango lineage variant P.1 (alias B.1.1.28.1, as described in Rambaut et al. 2020. Nat. Microbiol. 5:1403-1407) (spike protein substitutions: T20N, P26S, D138Y, R190S, K417T, E484K, N501Y, H655Y, T1027I); an Alpha variant (e.g., Pango lineage B.1.1.7); a Beta variant (e.g., Pango lineage B.1.351, B.1.351.1, B.1.351.2, and/or B.1.351.3); Pango lineage variant B.1.617.2 (Spike protein substitutions T19R, G142D, L452R, E484Q, D614G, P681R, D950N); an Eta variant (e.g., Pango lineage B.1.525); Pango lineage variant A.23.1 (as described in Bugembe et al. medRxiv. 2021. doi: https://doi.org/10.1101/2021.02.08.21251393) (spike protein substitutions: F157L, V367F, Q613H, P681R); or any combination thereof.
DIAGNOSTIC METHODS
[0080] In certain embodiments, detecting cell subset markers or differentially expressed genes can be used to determine a treatment for a subject suffering from a disease or to stratify a subject based on risk of developing severe disease (e.g., COVID-19). The invention provides biomarkers (e.g., phenotype specific or cell subtype) for the identification, diagnosis, prognosis and manipulation of cell properties, for use in a variety of diagnostic and/or therapeutic indications. Biomarkers in the context of the present invention encompass, without limitation, nucleic acids, proteins, reaction products, and metabolites, together with their polymorphisms, mutations, variants, modifications, subunits, fragments, and other analytes or sample-derived measures. In certain embodiments, biomarkers include the signature genes or signature gene products, and/or cells as described herein.
[0081] The terms “diagnosis” and “monitoring” are commonplace and well-understood in medical practice. By means of further explanation and without limitation the term “diagnosis” generally refers to the process or act of recognising, deciding on or concluding on a disease or condition in a subject on the basis of symptoms and signs and/or from results of various diagnostic procedures (such as, for example, from knowing the presence, absence and/or quantity of one or more biomarkers characteristic of the diagnosed disease or condition).
[0082] The terms “prognosing” or “prognosis” generally refer to an anticipation on the progression of a disease or condition and the prospect (e.g., the probability, duration, and/or extent) of recovery. A good prognosis of the diseases or conditions taught herein may generally encompass anticipation of a satisfactory partial or complete recovery from the diseases or conditions, preferably within an acceptable time period. A good prognosis of such may more commonly encompass anticipation of not further worsening or aggravating of such, preferably within a given time period. A poor prognosis of the diseases or conditions as taught herein may generally encompass anticipation of a substandard recovery and/or unsatisfactorily slow recovery, or to substantially no recovery or even further worsening of such.
[0083] The biomarkers of the present invention are useful in methods of identifying patient populations who would benefit from treatment based on a detected level of expression, activity and/or function of one or more biomarkers. These biomarkers are also useful in monitoring subjects undergoing treatments and therapies for suitable or aberrant response(s) to determine efficaciousness of the treatment or therapy and for selecting or modifying therapies and treatments that would be efficacious in treating, delaying the progression of or otherwise ameliorating a symptom. The biomarkers provided herein are useful for selecting a group of patients at a specific state of a disease with accuracy that facilitates selection of treatments.
[0084] The term “monitoring” generally refers to the follow-up of a disease or a condition in a subject for any changes which may occur over time.
[0085] The terms also encompass prediction of a disease. The terms “predicting” or “prediction” generally refer to an advance declaration, indication or foretelling of a disease or condition in a subject not (yet) having said disease or condition. For example, a prediction of a disease or condition in a subject may indicate a probability, chance or risk that the subject will develop said disease or condition, for example within a certain time period or by a certain age. Said probability, chance or risk may be indicated inter alia as an absolute value, range or statistics, or may be indicated relative to a suitable control subject or subject population (such as, e.g., relative to a general, normal or healthy subject or subject population). Hence, the probability, chance or risk that a subject will develop a disease or condition may be advantageously indicated as increased or decreased, or as fold-increased or fold-decreased relative to a suitable control subject or subject population. As used herein, the term “prediction” of the conditions or diseases as taught herein in a subject may also particularly mean that the subject has a 'positive' prediction of such, i.e., that the subject is at risk of having such (e.g., the risk is significantly increased vis-a-vis a control subject or subject population). The term “prediction of no” diseases or conditions as taught herein in a subject may particularly mean that the subject has a 'negative' prediction of such, i.e., that the subject's risk of having such is not significantly increased vis-a-vis a control subject or subject population.
[0086] Suitably, an altered quantity or phenotype of the cells in the subject compared to a control subject having normal status or not having a disease indicates response to treatment. Hence, the methods may rely on comparing the quantity of cell populations, biomarkers, or gene or gene product signatures measured in samples from patients with reference values, wherein said reference values represent known predictions, diagnoses and/or prognoses of diseases or conditions as taught herein.
[0087] For example, distinct reference values may represent the prediction of a risk (e.g., an abnormally elevated risk) of having a given disease or condition as taught herein vs. the prediction of no or normal risk of having said disease or condition. In another example, distinct reference values may represent predictions of differing degrees of risk of having such disease or condition.
[0088] In a further example, distinct reference values can represent the diagnosis of a given disease or condition as taught herein vs. the diagnosis of no such disease or condition (such as, e.g., the diagnosis of healthy, or recovered from said disease or condition, etc.). In another example, distinct reference values may represent the diagnosis of such disease or condition of varying severity.
[0089] In yet another example, distinct reference values may represent a good prognosis for a given disease or condition as taught herein vs. a poor prognosis for said disease or condition. In a further example, distinct reference values may represent varyingly favourable or unfavourable prognoses for such disease or condition.
[0090] Such comparison may generally include any means to determine the presence or absence of at least one difference and optionally of the size of such difference between values being compared. A comparison may include a visual inspection, an arithmetical or statistical comparison of measurements. Such statistical comparisons include, but are not limited to, applying a rule.
[0091] Reference values may be established according to known procedures previously employed for other cell populations, biomarkers and gene or gene product signatures. For example, a reference value may be established in an individual or a population of individuals characterised by a particular diagnosis, prediction and/or prognosis of said disease or condition (i.e., for whom said diagnosis, prediction and/or prognosis of the disease or condition holds true). Such population may comprise without limitation 2 or more, 10 or more, 100 or more, or even several hundred or more individuals.
[0092] A “deviation” of a first value from a second value may generally encompass any direction (e.g., increase: first value > second value; or decrease: first value < second value) and any extent of alteration.
[0093] For example, a deviation may encompass a decrease in a first value by, without limitation, at least about 10% (about 0.9-fold or less), or by at least about 20% (about 0.8-fold or less), or by at least about 30% (about 0.7-fold or less), or by at least about 40% (about 0.6-fold or less), or by at least about 50% (about 0.5-fold or less), or by at least about 60% (about 0.4-fold or less), or by at least about 70% (about 0.3-fold or less), or by at least about 80% (about 0.2-fold or less), or by at least about 90% (about 0.1-fold or less), relative to a second value with which a comparison is being made.
[0094] For example, a deviation may encompass an increase of a first value by, without limitation, at least about 10% (about 1.1-fold or more), or by at least about 20% (about 1.2-fold or more), or by at least about 30% (about 1.3-fold or more), or by at least about 40% (about 1.4-fold or more), or by at least about 50% (about 1.5-fold or more), or by at least about 60% (about 1.6-fold or more), or by at least about 70% (about 1.7-fold or more), or by at least about 80% (about 1.8-fold or more), or by at least about 90% (about 1.9-fold or more), or by at least about 100% (about 2-fold or more), or by at least about 150% (about 2.5-fold or more), or by at least about 200% (about 3-fold or more), or by at least about 500% (about 6-fold or more), or by at least about 700% (about 8-fold or more), or the like, relative to a second value with which a comparison is being made.
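By way of non-limiting illustration, the correspondence between the percentage changes and the fold values recited in the two preceding paragraphs can be sketched as follows. The function names are hypothetical; the arithmetic is simply fold = 1 ± percent/100.

```python
def increase_to_fold(percent):
    """An increase of `percent` % relative to the second value, expressed as a fold value."""
    return 1.0 + percent / 100.0


def decrease_to_fold(percent):
    """A decrease of `percent` % relative to the second value, expressed as a fold value."""
    return 1.0 - percent / 100.0


def percent_deviation(first_value, second_value):
    """Signed percentage deviation of the first value from the second value."""
    return (first_value - second_value) / second_value * 100.0


# A 150% increase corresponds to about 2.5-fold; a 90% decrease to about 0.1-fold.
fold_up = increase_to_fold(150)
fold_down = decrease_to_fold(90)
```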
[0095] Preferably, a deviation may refer to a statistically significant observed alteration. For example, a deviation may refer to an observed alteration which falls outside of error margins of reference values in a given population (as expressed, for example, by standard deviation or standard error, or by a predetermined multiple thereof, e.g., ±1×SD or ±2×SD or ±3×SD, or ±1×SE or ±2×SE or ±3×SE). Deviation may also refer to a value falling outside of a reference range defined by values in a given population (for example, outside of a range which comprises ≥40%, ≥50%, ≥60%, ≥70%, ≥75% or ≥80% or ≥85% or ≥90% or ≥95% or even ≥100% of values in said population).
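By way of non-limiting illustration, testing whether an observed value falls outside a predetermined multiple of the standard deviation of reference values might be sketched as below. The function name, the default multiple, and the reference values are hypothetical; the sample standard deviation is used, which is an assumption of this sketch.

```python
import statistics


def deviates_from_reference(value, reference_values, k=2.0):
    """True if `value` lies more than k sample standard deviations from the reference mean."""
    mean = statistics.mean(reference_values)
    sd = statistics.stdev(reference_values)  # sample SD; needs at least two values
    return abs(value - mean) > k * sd


# Illustrative (made-up) reference measurements from a control population.
reference = [10, 12, 11, 9, 13, 10, 11, 12]
outside = deviates_from_reference(15, reference, k=2.0)
inside = deviates_from_reference(12, reference, k=2.0)
```

A standard-error margin could be substituted by dividing the standard deviation by the square root of the number of reference values.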
[0096] In a further embodiment, a deviation may be concluded if an observed alteration is beyond a given threshold or cut-off. Such threshold or cut-off may be selected as generally known in the art to provide for a chosen sensitivity and/or specificity of the prediction methods, e.g., sensitivity and/or specificity of at least 50%, or at least 60%, or at least 70%, or at least 80%, or at least 85%, or at least 90%, or at least 95%.
[0097] For example, receiver-operating characteristic (ROC) curve analysis can be used to select an optimal cut-off value of the quantity of a given immune cell population, biomarker or gene or gene product signatures, for clinical use of the present diagnostic tests, based on acceptable sensitivity and specificity, or related performance measures which are well-known per se, such as positive predictive value (PPV), negative predictive value (NPV), positive likelihood ratio (LR+), negative likelihood ratio (LR-), Youden index, or similar.
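By way of non-limiting illustration, selecting a cut-off by maximising the Youden index over candidate thresholds can be sketched as follows. The function names and the example scores are hypothetical, a score at or above the cut-off is assumed to be called positive, and both classes are assumed to be present in the labels.

```python
def sensitivity_specificity(scores, labels, cutoff):
    """Sensitivity and specificity when a score >= cutoff is called positive.

    `labels` holds 1 for subjects with the condition and 0 for subjects without.
    """
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= cutoff)
    fn = sum(1 for s, y in pairs if y == 1 and s < cutoff)
    tn = sum(1 for s, y in pairs if y == 0 and s < cutoff)
    fp = sum(1 for s, y in pairs if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def youden_optimal_cutoff(scores, labels):
    """Return (cutoff, J) maximising Youden's J = sensitivity + specificity - 1.

    Every observed score is tried as a candidate cut-off; ties keep the lowest cut-off.
    """
    best_cutoff, best_j = None, float("-inf")
    for cutoff in sorted(set(scores)):
        sens, spec = sensitivity_specificity(scores, labels, cutoff)
        j = sens + spec - 1.0
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Illustrative (made-up) biomarker quantities and disease labels.
cutoff, j = youden_optimal_cutoff([0.1, 0.2, 0.3, 0.6, 0.7, 0.9], [0, 0, 0, 1, 1, 1])
```

Other performance measures named above, such as PPV or likelihood ratios, could be computed from the same four counts.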
Stratification of Subjects
[0098] In certain embodiments, the subject is determined to belong to or at risk to progress to the severe risk group if one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, etc. or more) of proinflammatory cytokines comprising at least one or more of: IL1B, TNF, CXCL8, CCL2, CCL3, CXCL9, CXCL10, and CXCL11; upregulation of alarmins comprising one or both of: S100A8 and S100A9; 14% - 26% of all epithelial cells are secretory cells; elevated BPIFA1 high secretory cells; elevated KRT13 KRT24 high secretory cells; macrophage population increase as compared to other immune cells; upregulated genes in ciliated cells comprising one or both of: IL5RA and NLRP1; no increase of at least one or more of: type I, type II, and type III interferon abundance; elevated stress response factors comprising at least one or more of: HSPA8, HSPA1A, and DUSP1; and reduced or absent antiviral/interferon response, and reduced or absent mature ciliated cells is detected. In certain embodiments, the subject is determined to belong to the mild/moderate risk group if one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, etc. or more) of 4% - 12% of all epithelial cells are secretory cells; 10% - 20% of all epithelial cells comprise interferon responsive ciliated cells; upregulated ciliated cell genes comprising at least one or more of: IFI44L, STAT1, IFITM1, MX1, IFITM3, OAS1, OAS2, OAS3, STAT2, TAP1, HLA-C, ADAR, XAF1, IRF1, CTSS, and CTSB; increase in type I interferon abundance; high expression of interferon-responsive genes; induction of type I interferon responses; and high abundance of IFI6 and IFI27 is detected.
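By way of non-limiting illustration, a subset of the criteria recited above can be sketched as a rule check. The function name, the input format, and the choice of which criteria to include are assumptions of this example; the gene lists and percentage ranges are taken from the paragraph above. The sketch only reports which criteria are met and does not decide between groups when criteria from both are met, since the text does not specify such a rule. The gene set passed in is assumed to have been derived from the relevant cell type (e.g., ciliated cells for the interferon-responsive genes).

```python
SEVERE_CYTOKINES = {"IL1B", "TNF", "CXCL8", "CCL2", "CCL3", "CXCL9", "CXCL10", "CXCL11"}
SEVERE_ALARMINS = {"S100A8", "S100A9"}
MILD_CILIATED_GENES = {
    "IFI44L", "STAT1", "IFITM1", "MX1", "IFITM3", "OAS1", "OAS2", "OAS3",
    "STAT2", "TAP1", "HLA-C", "ADAR", "XAF1", "IRF1", "CTSS", "CTSB",
}


def matched_criteria(secretory_fraction, ifn_ciliated_fraction, upregulated_genes):
    """Return (severe, mild_moderate): lists naming the criteria that are met.

    Fractions are proportions of all epithelial cells (0.0 to 1.0).
    """
    genes = set(upregulated_genes)
    severe, mild = [], []
    if genes & SEVERE_CYTOKINES:
        severe.append("proinflammatory cytokines")
    if genes & SEVERE_ALARMINS:
        severe.append("alarmins")
    if 0.14 <= secretory_fraction <= 0.26:
        severe.append("secretory cells 14%-26%")
    if 0.04 <= secretory_fraction <= 0.12:
        mild.append("secretory cells 4%-12%")
    if 0.10 <= ifn_ciliated_fraction <= 0.20:
        mild.append("interferon responsive ciliated cells 10%-20%")
    if genes & MILD_CILIATED_GENES:
        mild.append("upregulated ciliated cell genes")
    return severe, mild


# Illustrative (made-up) sample summary.
severe, mild = matched_criteria(0.20, 0.02, {"IL1B", "S100A8"})
```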
[0099] In certain embodiments, one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, etc. or more) cell subset markers or differentially expressed genes found in Table 2, when detected in a sample from a subject, stratify the subject into the mild/moderate risk group. In certain embodiments, one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, etc. or more) cell subset markers or differentially expressed genes found in Table 3, when detected in a sample from a subject, stratify the subject into the severe risk group. In certain embodiments, one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, etc. or more) cell subset markers or differentially expressed genes found in Table 3, when detected in a sample from a subject, stratify the subject into the mild/moderate risk group or severe risk group. In certain embodiments, one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, etc. or more) cell subset markers or differentially expressed genes found in Table 5, when detected in a sample from a subject, stratify the subject as being at risk of developing the disease or as having the disease.
Sample Collection
[0100] In some embodiments, a sample can be collected with a nasal swab, endoscopy, polyester tipped swabs, plastic curettes, or cytology brushes (Lai PS, et al. J Allergy Clin Immunol. 2015; 136(4)). Tissue samples for diagnosis, prognosis or detecting may be obtained by endoscopy. In one embodiment, a sample may be obtained by endoscopy and analyzed by FACS. As used herein, “endoscopy” refers to a procedure that uses an endoscope to examine the interior of a hollow organ or cavity of the body. The endoscope may include a camera and a light source. The endoscope may include tools for dissection or for obtaining a biological sample. A cutting tool can be attached to the end of the endoscope, and the apparatus can then be used to perform surgery. Applications of endoscopy that can be used with the present invention include, but are not limited to, examination of the oesophagus, stomach and duodenum (esophagogastroduodenoscopy); small intestine (enteroscopy); large intestine/colon (colonoscopy, sigmoidoscopy); bile duct; rectum (rectoscopy) and anus (anoscopy), both also referred to as (proctoscopy); respiratory tract; nose (rhinoscopy); lower respiratory tract (bronchoscopy); ear (otoscopy); urinary tract (cystoscopy); female reproductive system (gynoscopy); cervix (colposcopy); uterus (hysteroscopy); fallopian tubes (falloposcopy); normally closed body cavities (through a small incision); abdominal or pelvic cavity (laparoscopy); interior of a joint (arthroscopy); or organs of the chest (thoracoscopy and mediastinoscopy).
[0101] In one non-limiting example, nasopharyngeal samples are collected by a trained healthcare provider using FLOQSwabs (Copan 1109 flocked swabs) following the manufacturer's instructions. Collectors don personal protective equipment (PPE), including a gown, non-sterile gloves, a protective N95 mask, a bouffant, and a face shield. The patient's head is tilted back slightly, and the swab is inserted along the nasal septum, above the floor of the nasal passage, to the nasopharynx until slight resistance is felt. The swab is then left in place for several seconds to absorb secretions and is slowly removed while rotating the swab. The swab is then placed into a cryogenic vial with 900 μL of heat-inactivated fetal bovine serum (FBS) and 100 μL of dimethyl sulfoxide (DMSO). Vials are placed into a Mr. Frosty Freezing Container (Thermo Fisher Scientific) for optimal cell preservation. A Mr. Frosty containing the vials is placed in a cooler with dry ice for transportation from patient areas to the laboratory for processing. Once in the laboratory, the Mr. Frosty is placed into a -80°C freezer overnight, and on the next day, the vials are moved to liquid nitrogen storage containers.
[0102] In one non-limiting example, swabs in freezing media (90% FBS/10% DMSO) were stored in liquid nitrogen until immediately prior to dissociation. This approach ensures that all cells and cellular material from the nasal swab (whether directly attached to the nasal swab, or released during the washing and digestion process) are exposed first to DTT for 15 minutes, followed by an Accutase digestion for 30 minutes. Briefly, nasal swabs in freezing media were thawed, and each swab was rinsed in RPMI before incubation in 1 mL RPMI/10 mM DTT (Sigma) for 15 minutes at 37°C with agitation. Next, the nasal swab was incubated in 1 mL Accutase (Sigma) for 30 minutes at 37°C with agitation. The 1 mL RPMI/10 mM DTT from the nasal swab incubation was centrifuged at 400 g for 5 minutes at 4°C to pellet cells, the supernatant was discarded, and the cell pellet was resuspended in 1 mL Accutase and incubated for 30 minutes at 37°C with agitation. The original cryovial containing the freezing media and the original swab washings were combined and centrifuged at 400 g for 5 minutes at 4°C. The cell pellet was then resuspended in RPMI/10 mM DTT and incubated for 15 minutes at 37°C with agitation, then centrifuged as above; the supernatant was aspirated, and the cell pellet was resuspended in 1 mL Accutase and incubated for 30 minutes at 37°C with agitation. All cells were combined following Accutase digestion and filtered using a 70 μm nylon strainer. The filter and swab were washed with RPMI/10% FBS/4 mM EDTA, and all washings were combined. Dissociated, filtered cells were centrifuged at 400 g for 10 minutes at 4°C and resuspended in 200 μL RPMI/10% FBS for counting. Cells were diluted to 20,000 cells in 200 μL for scRNA-seq. For the majority of swabs, fewer than 20,000 cells total were recovered. In these instances, all cells were input into scRNA-seq.
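The final dilution step can be illustrated with a short calculation. The sketch below assumes that cells were counted in the 200 μL resuspension described above and that the count is expressed as cells per μL; the function and variable names are hypothetical and are not part of the described protocol.

```python
# Minimal sketch of the loading arithmetic, assuming a 200 uL resuspension
# and a target of 20,000 cells in 200 uL (both values from the text).
# Function and key names are hypothetical.

TARGET_CELLS = 20_000
TARGET_VOLUME_UL = 200.0
SUSPENSION_VOLUME_UL = 200.0


def loading_plan(cells_per_ul):
    """Return the volume of cell suspension and of diluent to combine."""
    total_cells = cells_per_ul * SUSPENSION_VOLUME_UL
    if total_cells <= TARGET_CELLS:
        # Fewer cells than the target were recovered: input all cells.
        return {
            "suspension_ul": SUSPENSION_VOLUME_UL,
            "diluent_ul": 0.0,
            "cells_loaded": total_cells,
        }
    # Otherwise take only the volume that contains the target cell number
    # and bring it up to the target volume with diluent.
    suspension_ul = TARGET_CELLS / cells_per_ul
    return {
        "suspension_ul": suspension_ul,
        "diluent_ul": TARGET_VOLUME_UL - suspension_ul,
        "cells_loaded": float(TARGET_CELLS),
    }


plan_high = loading_plan(400.0)  # 80,000 cells recovered
plan_low = loading_plan(50.0)    # 10,000 cells recovered
```

With a count of 400 cells/μL, 50 μL of suspension is combined with 150 μL of diluent; with 50 cells/μL, the full 200 μL is used, matching the handling of low-yield swabs described above.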
Detection of Biomarkers
[0103] In one embodiment, the signature genes, biomarkers, and/or cells may be detected by immunofluorescence, immunohistochemistry (IHC), fluorescence activated cell sorting (FACS), mass spectrometry (MS), mass cytometry (CyTOF), RNA-seq, single cell RNA-seq (described further herein), quantitative RT-PCR, single cell qPCR, FISH, RNA-FISH, MERFISH (multiplex (in situ) RNA FISH) (Chen et al., Spatially resolved, highly multiplexed RNA profiling in single cells. Science, 2015, 348:aaa6090; and Xia et al., Multiplexed detection of RNA using MERFISH and branched DNA amplification. Sci Rep. 2019 May 22;9(1):7721. doi: 10.1038/s41598-019-43943-8), ExSeq (Alon, S. et al. Expansion Sequencing: Spatially Precise In Situ Transcriptomics in Intact Biological Systems, biorxiv.org/lookup/doi/10.1101/2020.05.13.094268 (2020) doi:10.1101/2020.05.13.094268), and/or by in situ hybridization. Other methods including absorbance assays and colorimetric assays are known in the art and may be used herein. Detection may comprise primers and/or probes or fluorescently bar-coded oligonucleotide probes for hybridization to RNA (see e.g., Geiss GK, et al., Direct multiplexed measurement of gene expression with color-coded probe pairs. Nat Biotechnol. 2008 Mar;26(3):317-25).
[0104] In certain embodiments, a tissue sample may be obtained and analyzed for specific cell markers (IHC) or specific transcripts (e.g., RNA-FISH). Tissue samples for diagnosis, prognosis or detecting may be obtained by endoscopy. In one embodiment, a sample may be obtained by endoscopy and analyzed by FACS. As used herein, “endoscopy” refers to a procedure that uses an endoscope to examine the interior of a hollow organ or cavity of the body. The endoscope may include a camera and a light source. The endoscope may include tools for dissection or for obtaining a biological sample (e.g., a biopsy).
[0105] The present invention also may comprise a kit with a detection reagent that binds to one or more biomarkers or can be used to detect one or more biomarkers.
Immunoassays
[0106] Immunoassay methods are based on the reaction of an antibody to its corresponding target or analyte and can detect the analyte in a sample depending on the specific assay format. To improve specificity and sensitivity of an assay method based on immunoreactivity, monoclonal antibodies are often used because of their specific epitope recognition. Polyclonal antibodies have also been successfully used in various immunoassays because of their increased affinity for the target as compared to monoclonal antibodies. Immunoassays have been designed for use with a wide range of biological sample matrices. Immunoassay formats have been designed to provide qualitative, semi-quantitative, and quantitative results.
[0107] Quantitative results may be generated through the use of a standard curve created with known concentrations of the specific analyte to be detected. The response or signal from an unknown sample is plotted onto the standard curve, and a quantity or value corresponding to the target in the unknown sample is established.
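As one simplified illustration of reading an unknown off a standard curve, the following sketch uses piecewise-linear interpolation between standards. This is an assumption made for brevity: immunoassay software commonly fits a sigmoidal model (for example, a four-parameter logistic) instead. All names and values below are hypothetical.

```python
# Simplified sketch: piecewise-linear interpolation on a standard curve.
# Assumes the signal increases strictly with concentration; real assays
# often use a fitted sigmoidal model rather than linear segments.
from bisect import bisect_left


def interpolate_concentration(standards, signal):
    """Estimate concentration from signal.

    standards: iterable of (known_concentration, measured_signal) pairs.
    """
    points = sorted(standards, key=lambda pair: pair[1])
    signals = [s for _, s in points]
    if not signals[0] <= signal <= signals[-1]:
        raise ValueError("signal outside the range of the standard curve")
    i = bisect_left(signals, signal)
    if signals[i] == signal:
        return points[i][0]
    # Interpolate linearly between the two bracketing standards.
    (c0, s0), (c1, s1) = points[i - 1], points[i]
    return c0 + (c1 - c0) * (signal - s0) / (s1 - s0)


# Hypothetical standards: (concentration in pg/mL, optical density).
standards = [(0.0, 0.05), (10.0, 0.25), (50.0, 1.05), (100.0, 2.05)]
estimate = interpolate_concentration(standards, 0.65)
```

In this hypothetical example, a signal of 0.65 falls halfway between the 10 and 50 pg/mL standards, giving an estimate of about 30 pg/mL. Signals outside the range of the standards are rejected rather than extrapolated.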
[0108] Numerous immunoassay formats have been designed. ELISA or EIA can be quantitative for the detection of an analyte/biomarker. This method relies on attachment of a label to either the analyte or the antibody and the label component includes, either directly or indirectly, an enzyme. ELISA tests may be formatted for direct, indirect, competitive, or sandwich detection of the analyte. Other methods rely on labels such as, for example, radioisotopes (I-125) or fluorescence. Additional techniques include, for example, agglutination, nephelometry, turbidimetry, Western blot, immunoprecipitation, immunocytochemistry, immunohistochemistry, flow cytometry, Luminex assay, and others (see Immunoassay: A Practical Guide, edited by Brian Law, published by Taylor & Francis, Ltd., 2005 edition).
[0109] Exemplary assay formats include enzyme-linked immunosorbent assay (ELISA), radioimmunoassay, fluorescent, chemiluminescence, and fluorescence resonance energy transfer (FRET) or time resolved-FRET (TR-FRET) immunoassays. Examples of procedures for detecting biomarkers include biomarker immunoprecipitation followed by quantitative methods that allow size and peptide level discrimination, such as gel electrophoresis, capillary electrophoresis, planar electrochromatography, and the like.
[0110] Methods of detecting and/or quantifying a detectable label or signal generating material depend on the nature of the label. The products of reactions catalyzed by appropriate enzymes (where the detectable label is an enzyme; see above) can be, without limitation, fluorescent, luminescent, or radioactive or they may absorb visible or ultraviolet light. Examples of detectors suitable for detecting such detectable labels include, without limitation, x-ray film, radioactivity counters, scintillation counters, spectrophotometers, colorimeters, fluorometers, luminometers, and densitometers.
[0111] Any of the methods for detection can be performed in any format that allows for any suitable preparation, processing, and analysis of the reactions. This can be, for example, in multi-well assay plates (e.g., 96 wells or 384 wells) or using any suitable array or microarray. Stock solutions for various agents can be made manually or robotically, and all subsequent pipetting, diluting, mixing, distribution, washing, incubating, sample readout, data collection and analysis can be done robotically using commercially available analysis software, robotics, and detection instrumentation capable of detecting a detectable label.
Hybridization assays
[0112] Such applications are hybridization assays in which a nucleic acid that displays “probe” nucleic acids for each of the genes to be assayed/profiled in the profile to be generated is employed. In these assays, a sample of target nucleic acids is first prepared from the initial nucleic acid sample being assayed, where preparation may include labeling of the target nucleic acids with a label, e.g., a member of a signal producing system. Following target nucleic acid sample preparation, the sample is contacted with the array under hybridization conditions, whereby complexes are formed between target nucleic acids that are complementary to probe sequences attached to the array surface. The presence of hybridized complexes is then detected, either qualitatively or quantitatively. Specific hybridization technology which may be practiced to generate the expression profiles employed in the subject methods includes the technology described in U.S. Pat. Nos. 5,143,854; 5,288,644; 5,324,633; 5,432,049; 5,470,710; 5,492,806; 5,503,980; 5,510,270; 5,525,464; 5,547,839; 5,580,732; 5,661,028; 5,800,992; the disclosures of which are herein incorporated by reference; as well as WO 95/21265; WO 96/31622; WO 97/10365; WO 97/27317; EP 373 203; and EP 785 280. In these methods, an array of “probe” nucleic acids that includes a probe for each of the biomarkers whose expression is being assayed is contacted with target nucleic acids as described above. Contact is carried out under hybridization conditions, e.g., stringent hybridization conditions as described above, and unbound nucleic acid is then removed. The resultant pattern of hybridized nucleic acids provides information regarding expression for each of the biomarkers that have been probed, where the expression information is in terms of whether or not the gene is expressed and, typically, at what level, where the expression data, i.e., expression profile, may be both qualitative and quantitative.
[0113] Optimal hybridization conditions will depend on the length (e.g., oligomer vs. polynucleotide greater than 200 bases) and type (e.g., RNA, DNA, PNA) of labeled probe and immobilized polynucleotide or oligonucleotide. General parameters for specific (i.e., stringent) hybridization conditions for nucleic acids are described in Sambrook et al., supra, and in Ausubel et al., “Current Protocols in Molecular Biology”, Greene Publishing and Wiley-Interscience, NY (1987), which is incorporated in its entirety for all purposes. When cDNA microarrays are used, typical hybridization conditions are hybridization in 5xSSC plus 0.2% SDS at 65°C for 4 hours followed by washes at 25°C in low stringency wash buffer (1xSSC plus 0.2% SDS) followed by 10 minutes at 25°C in high stringency wash buffer (0.1xSSC plus 0.2% SDS) (see Schena et al., Proc. Natl. Acad. Sci. USA, Vol. 93, p. 10614 (1996)). Useful hybridization conditions are also provided in, e.g., Tijssen, “Hybridization With Nucleic Acid Probes”, Elsevier Science Publishers B.V. (1993) and Kricka, “Nonisotopic DNA Probe Techniques”, Academic Press, San Diego, Calif. (1992).
Single cell sequencing
[0114] In certain embodiments, the invention involves single cell RNA sequencing (see, e.g., Kalisky, T., Blainey, P. & Quake, S. R. Genomic Analysis at the Single-Cell Level. Annual review of genetics 45, 431-445, (2011); Kalisky, T. & Quake, S. R. Single-cell genomics. Nature Methods 8, 311-314 (2011); Islam, S. et al. Characterization of the single-cell transcriptional landscape by highly multiplex RNA-seq. Genome Research, (2011); Tang, F. et al. RNA-Seq analysis to capture the transcriptome landscape of a single cell. Nature Protocols 5, 516-535, (2010); Tang, F. et al. mRNA-Seq whole-transcriptome analysis of a single cell. Nature Methods 6, 377-382, (2009); Ramskold, D. et al. Full-length mRNA-Seq from single-cell levels of RNA and individual circulating tumor cells. Nature Biotechnology 30, 777-782, (2012); and Hashimshony, T., Wagner, F., Sher, N. & Yanai, I. CEL-Seq: Single-Cell RNA-Seq by Multiplexed Linear Amplification. Cell Reports, Volume 2, Issue 3, p666-673, 2012).
[0115] In certain embodiments, the invention involves plate based single cell RNA sequencing (see, e.g., Picelli, S. et al., 2014, “Full-length RNA-seq from single cells using Smart-seq2” Nature protocols 9, 171-181, doi:10.1038/nprot.2014.006).
[0116] In certain embodiments, the invention involves high-throughput single-cell RNA-seq. In this regard reference is made to Macosko et al., 2015, “Highly Parallel Genome-wide Expression Profiling of Individual Cells Using Nanoliter Droplets” Cell 161, 1202-1214; International patent application number PCT/US2015/049178, published as WO2016/040476 on March 17, 2016; Klein et al., 2015, “Droplet Barcoding for Single-Cell Transcriptomics Applied to Embryonic Stem Cells” Cell 161, 1187-1201; International patent application number PCT/US2016/027734, published as WO2016168584A1 on October 20, 2016; Zheng, et al., 2016, “Haplotyping germline and cancer genomes with high-throughput linked-read sequencing” Nature Biotechnology 34, 303-311; Zheng, et al., 2017, “Massively parallel digital transcriptional profiling of single cells” Nat. Commun. 8, 14049 doi: 10.1038/ncomms14049; International patent publication number WO2014210353A2; Zilionis, et al., 2017, “Single-cell barcoding and sequencing using droplet microfluidics” Nat Protoc. Jan;12(1):44-73; Cao et al., 2017, “Comprehensive single cell transcriptional profiling of a multicellular organism by combinatorial indexing” bioRxiv preprint first posted online Feb. 2, 2017, doi: dx.doi.org/10.1101/104844; Rosenberg et al., 2017, “Scaling single cell transcriptomics through split pool barcoding” bioRxiv preprint first posted online Feb. 2, 2017, doi: dx.doi.org/10.1101/105163; Rosenberg et al., “Single-cell profiling of the developing mouse brain and spinal cord with split-pool barcoding” Science 15 Mar 2018; Vitak, et al., “Sequencing thousands of single-cell genomes with combinatorial indexing” Nature Methods, 14(3):302-308, 2017; Cao, et al., Comprehensive single-cell transcriptional profiling of a multicellular organism. Science, 357(6352):661-667, 2017; Gierahn et al., “Seq-Well: portable, low-cost RNA sequencing of single cells at high throughput” Nature Methods 14, 395-398 (2017); and Hughes, et al., “Highly Efficient, Massively-Parallel Single-Cell RNA-Seq Reveals Cellular States and Molecular Features of Human Skin Pathology” bioRxiv 689273; doi: doi.org/10.1101/689273, all the contents and disclosure of each of which are herein incorporated by reference in their entirety.
[0117] In certain embodiments, the invention involves single nucleus RNA sequencing. In this regard reference is made to Swiech et al., 2014, “In vivo interrogation of gene function in the mammalian brain using CRISPR-Cas9” Nature Biotechnology Vol. 33, pp. 102-106; Habib et al., 2016, “Div-Seq: Single-nucleus RNA-Seq reveals dynamics of rare adult newborn neurons” Science, Vol. 353, Issue 6302, pp. 925-928; Habib et al., 2017, “Massively parallel single-nucleus RNA-seq with DroNc-seq” Nat Methods. 2017 Oct;14(10):955-958; International Patent Application No. PCT/US2016/059239, published as WO2017164936 on September 28, 2017; International Patent Application No. PCT/US2018/060860, published as WO/2019/094984 on May 16, 2019; International Patent Application No. PCT/US2019/055894, published as WO/2020/077236 on April 16, 2020; and Drokhlyansky, et al., “The enteric nervous system of the human and mouse colon at a single-cell resolution,” bioRxiv 746743; doi: doi.org/10.1101/746743, which are herein incorporated by reference in their entirety.
MS methods
[0118] Biomarker detection may also be evaluated using mass spectrometry methods. A variety of configurations of mass spectrometers can be used to detect biomarker values. Several types of mass spectrometers are available or can be produced with various configurations. In general, a mass spectrometer has the following major components: a sample inlet, an ion source, a mass analyzer, a detector, a vacuum system, an instrument-control system, and a data system. Differences in the sample inlet, ion source, and mass analyzer generally define the type of instrument and its capabilities. For example, an inlet can be a capillary-column liquid chromatography source or can be a direct probe or stage such as used in matrix-assisted laser desorption. Common ion sources are, for example, electrospray, including nanospray and microspray, or matrix-assisted laser desorption. Common mass analyzers include a quadrupole mass filter, ion trap mass analyzer and time-of-flight mass analyzer. Additional mass spectrometry methods are well known in the art (see Burlingame et al., Anal. Chem. 70:647R-716R (1998); Kinter and Sherman, New York (2000)).
[0119] Protein biomarkers and biomarker values can be detected and measured by any of the following: electrospray ionization mass spectrometry (ESI-MS), ESI-MS/MS, ESI-MS/(MS)n, matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF-MS), surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF-MS), desorption/ionization on silicon (DIOS), secondary ion mass spectrometry (SIMS), quadrupole time-of-flight (Q-TOF), tandem time-of-flight (TOF/TOF) technology, called ultraflex III TOF/TOF, atmospheric pressure chemical ionization mass spectrometry (APCI-MS), APCI-MS/MS, APCI-(MS)n, atmospheric pressure photoionization mass spectrometry (APPI-MS), APPI-MS/MS, and APPI-(MS)n, quadrupole mass spectrometry, Fourier transform mass spectrometry (FTMS), quantitative mass spectrometry, and ion trap mass spectrometry.
[0120] Sample preparation strategies are used to label and enrich samples before mass spectroscopic characterization of protein biomarkers and determination of biomarker values. Labeling methods include but are not limited to isobaric tag for relative and absolute quantitation (iTRAQ) and stable isotope labeling with amino acids in cell culture (SILAC). Capture reagents used to selectively enrich samples for candidate biomarker proteins prior to mass spectroscopic analysis include but are not limited to aptamers, antibodies, nucleic acid probes, chimeras, small molecules, an F(ab')2 fragment, a single chain antibody fragment, an Fv fragment, a single chain Fv fragment, a nucleic acid, a lectin, a ligand-binding receptor, affybodies, nanobodies, ankyrins, domain antibodies, alternative antibody scaffolds (e.g., diabodies, etc.), imprinted polymers, avimers, peptidomimetics, peptoids, peptide nucleic acids, threose nucleic acid, a hormone receptor, a cytokine receptor, and synthetic receptors, and modifications and fragments of these.
THERAPEUTIC METHODS
Treatment selection
[0121] In certain embodiments, the methods of the present invention are used to select a treatment within the current standard of care and provide for less toxicity and improved treatment. The term “standard of care” as used herein refers to the current treatment that is accepted by medical experts as a proper treatment for a certain type of disease and that is widely used by healthcare professionals. Standard of care is also called best practice, standard medical care, and standard therapy. In certain embodiments, a subject having a mild or moderate phenotype will recover without any treatment. In certain embodiments, a subject having a severe phenotype requires treatment in order to recover. In certain embodiments, severe subjects or subjects at risk for severe disease as determined by detecting cell subsets and/or differentially expressed genes are treated with one or more agents as described further herein. In certain embodiments, subjects already suffering from severe disease are treated. In certain embodiments, subjects at risk for severe disease are treated. In certain embodiments, the treatment results in induction of a phenotype identified in mild/moderate subjects (e.g., antiviral response).
[0122] As used herein, “treatment” or “treating,” or “palliating” or “ameliorating” are used interchangeably. These terms refer to an approach for obtaining beneficial or desired results including but not limited to a therapeutic benefit and/or a prophylactic benefit. By therapeutic benefit is meant any therapeutically relevant improvement in or effect on one or more diseases, conditions, or symptoms under treatment. For prophylactic benefit, the compositions may be administered to a subject at risk of developing a particular disease, condition, or symptom, or to a subject reporting one or more of the physiological symptoms of a disease, even though the disease, condition, or symptom may not have yet been manifested. As used herein “treating” includes ameliorating or curing the disorder, preventing the disorder from becoming worse, slowing the rate of progression, or preventing the disorder from re-occurring (i.e., to prevent a relapse).
[0123] In certain embodiments, the therapeutic agents are administered in an effective amount or therapeutically effective amount. The term “effective amount” or “therapeutically effective amount” refers to the amount of an agent that is sufficient to effect beneficial or desired results. The therapeutically effective amount may vary depending upon one or more of: the subject and disease condition being treated, the weight and age of the subject, the severity of the disease condition, the manner of administration and the like, which can readily be determined by one of ordinary skill in the art. The term also applies to a dose that will provide an image for detection by any one of the imaging methods described herein. The specific dose may vary depending on one or more of: the particular agent chosen, the dosing regimen to be followed, whether it is administered in combination with other compounds, timing of administration, the tissue to be imaged, and the physical delivery system in which it is carried.
Therapeutic Agents
[0124] In certain embodiments, the present invention provides for one or more therapeutic agents capable of shifting a phenotype as described herein. In certain embodiments, the present invention provides for one or more therapeutic agents against one or more of the targets identified. In certain embodiments, the one or more agents comprises a small molecule inhibitor, small molecule degrader (e.g., ATTEC, AUTAC, LYTAC, or PROTAC), genetic modifying agent, antibody, antibody fragment, antibody-like protein scaffold, aptamer, protein, or any combination thereof.
[0125] The terms “therapeutic agent”, “therapeutic capable agent” or “treatment agent” are used interchangeably and refer to a molecule or compound that confers some beneficial effect upon administration to a subject. The beneficial effect includes enablement of diagnostic determinations; amelioration of a disease, symptom, disorder, or pathological condition; reducing or preventing the onset of a disease, symptom, disorder or condition; and generally counteracting a disease, symptom, disorder or pathological condition.
[0126] In certain embodiments, the therapeutic agents are administered in an effective amount or therapeutically effective amount. The term “effective amount” or “therapeutically effective amount” refers to the amount of an agent that is sufficient to effect beneficial or desired results. The therapeutically effective amount may vary depending upon one or more of: the subject and disease condition being treated, the weight and age of the subject, the severity of the disease condition, the manner of administration and the like, which can readily be determined by one of ordinary skill in the art. The term also applies to a dose that will provide an image for detection by any one of the imaging methods described herein. The specific dose may vary depending on one or more of: the particular agent chosen, the dosing regimen to be followed, whether it is administered in combination with other compounds, timing of administration, the tissue to be imaged, and the physical delivery system in which it is carried.
[0127] In certain embodiments, an agent against one of the targets is used in combination with a treatment already known or used clinically. In certain embodiments, targeting the combination may require less of the agent as compared to the current standard of care and provide for less toxicity and improved treatment.
Antiviral
[0128] In certain embodiments, the one or more agent is an antiviral. In certain embodiments, an antiviral inhibits viral replication. In certain embodiments, the antiviral is Paxlovid. The U.S. Food and Drug Administration issued an emergency use authorization (EUA) for Pfizer’s Paxlovid (nirmatrelvir tablets and ritonavir tablets, co-packaged for oral use) for the treatment of mild-to-moderate coronavirus disease (COVID-19) in adults and pediatric patients (12 years of age and older weighing at least 40 kilograms or about 88 pounds) with positive results of direct SARS-CoV-2 testing, and who are at high risk for progression to severe COVID-19, including hospitalization or death (Paxlovid EUA Letter of Authorization issued December 22, 2021). In certain embodiments, the antiviral is molnupiravir. The U.S. Food and Drug Administration issued an emergency use authorization (EUA) for Merck's molnupiravir for the treatment of mild-to-moderate coronavirus disease (COVID-19) in adults with positive results of direct SARS-CoV-2 viral testing, and who are at high risk for progression to severe COVID-19, including hospitalization or death, and for whom alternative COVID-19 treatment options authorized by the FDA are not accessible or clinically appropriate (Molnupiravir EUA Letter of Authorization issued February 11, 2022). In certain embodiments, the antiviral is remdesivir.
Immune-Based Therapy
[0129] In certain embodiments, the one or more agent is immune-based therapy. In certain embodiments, the immune-based therapy is a blood-derived product. In certain embodiments, the blood-derived product is convalescent plasma. In certain embodiments, the blood-derived product is immunoglobulin. In certain embodiments, the immune-based therapy is immunoglobulin. In certain embodiments, the immune-based therapy is one or more of: a corticosteroid, a glucocorticoid, an interferon, an interferon Type I agonist, an interleukin-1 inhibitor, an interleukin-6 inhibitor, a kinase inhibitor, and a TLR agonist. In certain embodiments, the corticosteroid comprises at least one of: methylprednisolone, hydrocortisone, and dexamethasone. In certain embodiments, the glucocorticoid comprises at least one of: cortisone, prednisone, prednisolone, methylprednisolone, dexamethasone, betamethasone, triamcinolone, fludrocortisone acetate, deoxycorticosterone acetate, and hydrocortisone. In certain embodiments, the interferon comprises at least one or more of: interferon beta-1b and interferon alpha-2b. In certain embodiments, the interleukin-1 inhibitor comprises anakinra. In certain embodiments, the interleukin-6 inhibitor comprises at least one or more of: anti-interleukin-6 receptor monoclonal antibodies and anti-interleukin-6 monoclonal antibody. In certain embodiments, the anti-interleukin-6 receptor monoclonal antibody is tocilizumab. In certain embodiments, the anti-interleukin-6 monoclonal antibody is siltuximab. In certain embodiments, the kinase inhibitor comprises at least one or more of Bruton's tyrosine kinase inhibitor and Janus kinase inhibitor. In certain embodiments, the Bruton's tyrosine kinase inhibitor comprises at least one or more of: acalabrutinib, ibrutinib, and zanubrutinib. In certain embodiments, the Janus kinase inhibitor comprises at least one or more of: baricitinib, ruxolitinib and tofacitinib. In certain embodiments, the TLR agonist comprises at least one or more of: imiquimod, BCG, and MPL.
Other Treatment Options
[0130] In certain embodiments, the treatment comprises inhibiting cholesterol biosynthesis. In certain embodiments, inhibiting cholesterol biosynthesis comprises administering HMG-CoA reductase inhibitors. In certain embodiments, the HMG-CoA reductase inhibitor comprises at least one or more of: simvastatin, atorvastatin, lovastatin, pravastatin, fluvastatin, rosuvastatin, and pitavastatin. In certain embodiments, the treatment comprises one or more agents capable of shifting epithelial cells to express an antiviral signature. In certain embodiments, the treatment comprises one or more agents capable of suppressing a myeloid inflammatory response.
Antibodies
[0131] In certain embodiments, the one or more agent is an antibody. In certain embodiments, an antibody targets one or more surface genes or polypeptides. The term “antibody” is used interchangeably with the term “immunoglobulin” herein, and includes intact antibodies, fragments of antibodies, e.g., Fab, F(ab')2 fragments, and intact antibodies and fragments that have been mutated either in their constant and/or variable region (e.g., mutations to produce chimeric, partially humanized, or fully humanized antibodies, as well as to produce antibodies with a desired trait, e.g., enhanced binding and/or reduced FcR binding). The term “fragment” refers to a part or portion of an antibody or antibody chain comprising fewer amino acid residues than an intact or complete antibody or antibody chain. Fragments can be obtained via chemical or enzymatic treatment of an intact or complete antibody or antibody chain. Fragments can also be obtained by recombinant means. Exemplary fragments include Fab, Fab', F(ab')2, Fabc, Fd, dAb, VHH and scFv and/or Fv fragments.
[0132] As used herein, a preparation of antibody protein having less than about 50% of non-antibody protein (also referred to herein as a “contaminating protein”), or of chemical precursors, is considered to be “substantially free.” Preferably, a preparation having less than about 40%, 30%, 20%, or 10%, and more preferably less than about 5% (by dry weight), of non-antibody protein, or of chemical precursors, is considered to be substantially free. When the antibody protein or biologically active portion thereof is recombinantly produced, it is also preferably substantially free of culture medium, i.e., culture medium represents less than about 30%, preferably less than about 20%, more preferably less than about 10%, and most preferably less than about 5% of the volume or mass of the protein preparation.
[0133] The term “antigen-binding fragment” refers to a polypeptide fragment of an immunoglobulin or antibody that binds antigen or competes with intact antibody (i.e., with the intact antibody from which they were derived) for antigen binding (i.e., specific binding). As such these antibodies or fragments thereof are included in the scope of the invention, provided that the antibody or fragment binds specifically to a target molecule.
[0134] It is intended that the term “antibody” encompass any Ig class or any Ig subclass (e.g., the IgG1, IgG2, IgG3, and IgG4 subclasses of IgG) obtained from any source (e.g., humans and non-human primates, and in rodents, lagomorphs, caprines, bovines, equines, ovines, etc.).
[0135] The term “Ig class” or “immunoglobulin class”, as used herein, refers to the five classes of immunoglobulin that have been identified in humans and higher mammals, IgG, IgM, IgA, IgD, and IgE. The term “Ig subclass” refers to the two subclasses of IgM (H and L), three subclasses of IgA (IgA1, IgA2, and secretory IgA), and four subclasses of IgG (IgG1, IgG2, IgG3, and IgG4) that have been identified in humans and higher mammals. The antibodies can exist in monomeric or polymeric form; for example, IgM antibodies exist in pentameric form, and IgA antibodies exist in monomeric, dimeric or multimeric form.
[0136] The term “IgG subclass” refers to the four subclasses of immunoglobulin class IgG - IgG1, IgG2, IgG3, and IgG4 that have been identified in humans and higher mammals by the heavy chains of the immunoglobulins, γ1 - γ4, respectively. The term “single-chain immunoglobulin” or “single-chain antibody” (used interchangeably herein) refers to a protein having a two-polypeptide chain structure consisting of a heavy and a light chain, said chains being stabilized, for example, by interchain peptide linkers, which has the ability to specifically bind antigen. The term “domain” refers to a globular region of a heavy or light chain polypeptide comprising peptide loops (e.g., comprising 3 to 4 peptide loops) stabilized, for example, by β-pleated sheet and/or intrachain disulfide bond. Domains are further referred to herein as “constant” or “variable”, based on the relative lack of sequence variation within the domains of various class members in the case of a “constant” domain, or the significant variation within the domains of various class members in the case of a “variable” domain. Antibody or polypeptide “domains” are often referred to interchangeably in the art as antibody or polypeptide “regions”.
The “constant” domains of an antibody light chain are referred to interchangeably as “light chain constant regions”, “light chain constant domains”, “CL” regions or “CL” domains. The “constant” domains of an antibody heavy chain are referred to interchangeably as “heavy chain constant regions”, “heavy chain constant domains”, “CH” regions or “CH” domains. The “variable” domains of an antibody light chain are referred to interchangeably as “light chain variable regions”, “light chain variable domains”, “VL” regions or “VL” domains. The “variable” domains of an antibody heavy chain are referred to interchangeably as “heavy chain variable regions”, “heavy chain variable domains”, “VH” regions or “VH” domains.
[0137] The term “region” can also refer to a part or portion of an antibody chain or antibody chain domain (e.g., a part or portion of a heavy or light chain or a part or portion of a constant or variable domain, as defined herein), as well as more discrete parts or portions of said chains or domains. For example, light and heavy chains or light and heavy chain variable domains include “complementarity determining regions” or “CDRs” interspersed among “framework regions” or “FRs”, as defined herein.
[0138] The term “conformation” refers to the tertiary structure of a protein or polypeptide (e.g., an antibody, antibody chain, domain or region thereof). For example, the phrase “light (or heavy) chain conformation” refers to the tertiary structure of a light (or heavy) chain variable region, and the phrase “antibody conformation” or “antibody fragment conformation” refers to the tertiary structure of an antibody or fragment thereof.
[0139] The term “antibody-like protein scaffolds” or “engineered protein scaffolds” broadly encompasses proteinaceous non-immunoglobulin specific-binding agents, typically obtained by combinatorial engineering (such as site-directed random mutagenesis in combination with phage display or other molecular selection techniques). Usually, such scaffolds are derived from robust and small soluble monomeric proteins (such as Kunitz inhibitors or lipocalins) or from a stably folded extra-membrane domain of a cell surface receptor (such as protein A, fibronectin or the ankyrin repeat).
[0140] Such scaffolds have been extensively reviewed in Binz et al. (Engineering novel binding proteins from nonimmunoglobulin domains. Nat Biotechnol 2005, 23:1257-1268), Gebauer and Skerra (Engineered protein scaffolds as next-generation antibody therapeutics. Curr Opin Chem Biol. 2009, 13:245-55), Gill and Damle (Biopharmaceutical drug discovery using novel protein scaffolds. Curr Opin Biotechnol 2006, 17:653-658), Skerra (Engineered protein scaffolds for molecular recognition. J Mol Recognit 2000, 13:167-187), and Skerra (Alternative non-antibody scaffolds for molecular recognition. Curr Opin Biotechnol 2007, 18:295-304), and include without limitation affibodies, based on the Z-domain of staphylococcal protein A, a three-helix bundle of 58 residues providing an interface on two of its alpha-helices (Nygren, Alternative binding proteins: Affibody binding proteins developed from a small three-helix bundle scaffold. FEBS J 2008, 275:2668-2676); engineered Kunitz domains based on a small (ca. 58 residues) and robust, disulphide-crosslinked serine protease inhibitor, typically of human origin (e.g., LACI-D1), which can be engineered for different protease specificities (Nixon and Wood, Engineered protein inhibitors of proteases. Curr Opin Drug Discov Dev 2006, 9:261-268); monobodies or adnectins based on the 10th extracellular domain of human fibronectin III (10Fn3), which adopts an Ig-like beta-sandwich fold (94 residues) with 2-3 exposed loops, but lacks the central disulphide bridge (Koide and Koide, Monobodies: antibody mimics based on the scaffold of the fibronectin type III domain. Methods Mol Biol 2007, 352:95-109); anticalins derived from the lipocalins, a diverse family of eight-stranded beta-barrel proteins (ca. 180 residues) that naturally form binding sites for small ligands by means of four structurally variable loops at the open end, which are abundant in humans, insects, and many other organisms (Skerra, Alternative binding proteins: Anticalins - harnessing the structural plasticity of the lipocalin ligand pocket to engineer novel binding activities. FEBS J 2008, 275:2677-2683); DARPins, designed ankyrin repeat domains (166 residues), which provide a rigid interface arising from typically three repeated beta-turns (Stumpp et al., DARPins: a new generation of protein therapeutics. Drug Discov Today 2008, 13:695-701); avimers (multimerized LDLR-A module) (Silverman et al., Multivalent avimer proteins evolved by exon shuffling of a family of human receptor domains. Nat Biotechnol 2005, 23:1556-1561); and cysteine-rich knottin peptides (Kolmar, Alternative binding proteins: biological activity and therapeutic potential of cystine-knot miniproteins. FEBS J 2008, 275:2684-2690).
[0141] “Specific binding” of an antibody means that the antibody exhibits appreciable affinity for a particular antigen or epitope and, generally, does not exhibit significant cross-reactivity. “Appreciable” binding includes binding with an affinity of at least 25 μM. Antibodies with affinities greater than 1 × 10⁷ M⁻¹ (or a dissociation coefficient of 1 μM or less or a dissociation coefficient of 1 nM or less) typically bind with correspondingly greater specificity. Values intermediate of those set forth herein are also intended to be within the scope of the present invention and antibodies of the invention bind with a range of affinities, for example, 100 nM or less, 75 nM or less, 50 nM or less, 25 nM or less, for example 10 nM or less, 5 nM or less, 1 nM or less, or in embodiments 500 pM or less, 100 pM or less, 50 pM or less or 25 pM or less. An antibody that “does not exhibit significant cross-reactivity” is one that will not appreciably bind to an entity other than its target (e.g., a different epitope or a different molecule). For example, an antibody that specifically binds to a target molecule will appreciably bind the target molecule but will not significantly react with non-target molecules or peptides. An antibody specific for a particular epitope will, for example, not significantly cross-react with remote epitopes on the same protein or peptide. Specific binding can be determined according to any art-recognized means for determining such binding. Preferably, specific binding is determined according to Scatchard analysis and/or competitive binding assays.
[0142] As used herein, the term “affinity” refers to the strength of the binding of a single antigen-combining site with an antigenic determinant. Affinity depends on the closeness of stereochemical fit between antibody combining sites and antigen determinants, on the size of the area of contact between them, on the distribution of charged and hydrophobic groups, etc. Antibody affinity can be measured by equilibrium dialysis or by the kinetic BIACORE™ method. The dissociation constant, Kd, and the association constant, Ka, are quantitative measures of affinity.
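The relationship between the two constants can be illustrated with a short sketch. For a simple 1:1 binding equilibrium, Kd is the reciprocal of Ka, so an association constant of 1 × 10⁷ M⁻¹ corresponds to a dissociation constant of about 100 nM. The function names below are hypothetical and chosen only for illustration; the cut-off values are the example affinities listed in paragraph [0141], and the 1:1 binding model is an assumption of this sketch rather than a requirement of the disclosure.

```python
# Illustrative sketch only: converts an association constant (Ka) to a
# dissociation constant (Kd), assuming simple 1:1 binding (Kd = 1 / Ka).

# Example affinity cut-offs from paragraph [0141], expressed in nM
# (500 pM = 0.5 nM, 100 pM = 0.1 nM, 50 pM = 0.05 nM, 25 pM = 0.025 nM).
THRESHOLDS_NM = (100, 75, 50, 25, 10, 5, 1, 0.5, 0.1, 0.05, 0.025)


def kd_from_ka(ka_per_molar: float) -> float:
    """Return Kd in mol/L given Ka in L/mol (hypothetical helper)."""
    if ka_per_molar <= 0:
        raise ValueError("Ka must be positive")
    return 1.0 / ka_per_molar


def tightest_threshold_met(kd_molar: float):
    """Return the smallest listed cut-off (in nM) such that Kd is at or
    below it, or None if Kd is weaker than every listed cut-off."""
    kd_nm = kd_molar * 1e9  # convert mol/L to nM
    met = [t for t in THRESHOLDS_NM if kd_nm <= t]
    return min(met) if met else None


if __name__ == "__main__":
    kd = kd_from_ka(4e8)  # Ka = 4 x 10^8 M^-1, i.e. Kd of about 2.5 nM
    print(f"Kd = {kd * 1e9:.2f} nM, tightest cut-off met: "
          f"{tightest_threshold_met(kd)} nM or less")
```

Measured values from equilibrium dialysis or surface plasmon resonance would replace the example Ka in practice; the sketch only shows how a reported Ka maps onto the "X nM or less" ranges.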
[0143] As used herein, the term “monoclonal antibody” refers to an antibody derived from a clonal population of antibody-producing cells (e.g., B lymphocytes or B cells) which is homogeneous in structure and antigen specificity. The term “polyclonal antibody” refers to a plurality of antibodies originating from different clonal populations of antibody-producing cells which are heterogeneous in their structure and epitope specificity, but which recognize a common antigen. Monoclonal and polyclonal antibodies may exist within bodily fluids, as crude preparations, or may be purified, as described herein.
[0144] The term “binding portion” of an antibody (or “antibody portion”) includes one or more complete domains, e.g., a pair of complete domains, as well as fragments of an antibody that retain the ability to specifically bind to a target molecule. It has been shown that the binding function of an antibody can be performed by fragments of a full-length antibody. Binding fragments are produced by recombinant DNA techniques, or by enzymatic or chemical cleavage of intact immunoglobulins. Binding fragments include Fab, Fab', F(ab')2, Fabc, Fd, dAb, Fv, single chains, single-chain antibodies, e.g., scFv, and single domain antibodies.
[0145] “Humanized” forms of non-human (e.g., murine) antibodies are chimeric antibodies that contain minimal sequence derived from non-human immunoglobulin. For the most part, humanized antibodies are human immunoglobulins (recipient antibody) in which residues from a hypervariable region of the recipient are replaced by residues from a hypervariable region of a non-human species (donor antibody) such as mouse, rat, rabbit, or nonhuman primate having the desired specificity, affinity, and capacity. In some instances, FR residues of the human immunoglobulin are replaced by corresponding non-human residues. Furthermore, humanized antibodies may comprise residues that are not found in the recipient antibody or in the donor antibody. These modifications are made to further refine antibody performance. In general, the humanized antibody will comprise substantially all of at least one, and typically two, variable domains, in which all or substantially all of the hypervariable regions correspond to those of a nonhuman immunoglobulin and all or substantially all of the FR regions are those of a human immunoglobulin sequence. The humanized antibody optionally also will comprise at least a portion of an immunoglobulin constant region (Fc), typically that of a human immunoglobulin. 
[0146] Examples of portions of antibodies or epitope-binding proteins encompassed by the present definition include: (i) the Fab fragment, having VL, CL, VH and CH1 domains; (ii) the Fab' fragment, which is a Fab fragment having one or more cysteine residues at the C-terminus of the CH1 domain; (iii) the Fd fragment having VH and CH1 domains; (iv) the Fd' fragment having VH and CH1 domains and one or more cysteine residues at the C-terminus of the CH1 domain; (v) the Fv fragment having the VL and VH domains of a single arm of an antibody; (vi) the dAb fragment (Ward et al., 341 Nature 544 (1989)) which consists of a VH domain or a VL domain that binds antigen; (vii) isolated CDR regions or isolated CDR regions presented in a functional framework; (viii) F(ab')2 fragments which are bivalent fragments including two Fab' fragments linked by a disulphide bridge at the hinge region; (ix) single chain antibody molecules (e.g., single chain Fv; scFv) (Bird et al., 242 Science 423 (1988); and Huston et al., 85 PNAS 5879 (1988)); (x) “diabodies” with two antigen binding sites, comprising a heavy chain variable domain (VH) connected to a light chain variable domain (VL) in the same polypeptide chain (see, e.g., EP 404,097; WO 93/11161; Hollinger et al., 90 PNAS 6444 (1993)); (xi) “linear antibodies” comprising a pair of tandem Fd segments (VH-CH1-VH-CH1) which, together with complementary light chain polypeptides, form a pair of antigen binding regions (Zapata et al., Protein Eng. 8(10):1057-62 (1995); and U.S. Patent No. 5,641,870).
[0147] As used herein, a “blocking” antibody or an antibody “antagonist” is one which inhibits or reduces biological activity of the antigen(s) it binds (e.g., CD160). In certain embodiments, the blocking antibodies or antagonist antibodies or portions thereof described herein completely inhibit the biological activity of the antigen(s).
[0148] Antibodies may act as agonists or antagonists of the recognized polypeptides. For example, the present invention includes antibodies which disrupt receptor/ligand interactions either partially or fully. The invention features both receptor-specific antibodies and ligand-specific antibodies. The invention also features receptor-specific antibodies which do not prevent ligand binding but prevent receptor activation. Receptor activation (i.e., signaling) may be determined by techniques described herein or otherwise known in the art. For example, receptor activation can be determined by detecting the phosphorylation (e.g., tyrosine or serine/threonine) of the receptor or of one of its downstream substrates by immunoprecipitation followed by western blot analysis. In specific embodiments, antibodies are provided that inhibit ligand activity or receptor activity by at least 95%, at least 90%, at least 85%, at least 80%, at least 75%, at least 70%, at least 60%, or at least 50% of the activity in absence of the antibody.
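As a hedged illustration of how the inhibition levels above could be evaluated, the sketch below computes percent inhibition relative to activity measured in the absence of the antibody. The function names and the example readings are hypothetical; any quantitative readout of receptor or ligand activity (for example, a densitometry value from a western blot) is assumed to be expressed in the same arbitrary units for both conditions.

```python
# Illustrative sketch only: percent inhibition relative to the
# no-antibody control. All names and example values are hypothetical.


def percent_inhibition(activity_with_antibody: float,
                       activity_without_antibody: float) -> float:
    """Return inhibition as a percentage of the control activity."""
    if activity_without_antibody <= 0:
        raise ValueError("control activity must be positive")
    ratio = activity_with_antibody / activity_without_antibody
    return 100.0 * (1.0 - ratio)


def meets_inhibition_level(activity_with_antibody: float,
                           activity_without_antibody: float,
                           level_percent: float) -> bool:
    """True if inhibition is at least `level_percent` (e.g., 50, 70, 95)."""
    inhibition = percent_inhibition(activity_with_antibody,
                                    activity_without_antibody)
    return inhibition >= level_percent


if __name__ == "__main__":
    # Hypothetical readings: 20 units with antibody, 80 units without.
    print(percent_inhibition(20.0, 80.0))            # 75.0
    print(meets_inhibition_level(20.0, 80.0, 70.0))  # True
    print(meets_inhibition_level(20.0, 80.0, 80.0))  # False
```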
[0149] The invention also features receptor-specific antibodies which both prevent ligand binding and receptor activation as well as antibodies that recognize the receptor-ligand complex. Likewise, encompassed by the invention are neutralizing antibodies which bind the ligand and prevent binding of the ligand to the receptor, as well as antibodies which bind the ligand, thereby preventing receptor activation, but do not prevent the ligand from binding the receptor. Further included in the invention are antibodies which activate the receptor. These antibodies may act as receptor agonists, i.e., potentiate or activate either all or a subset of the biological activities of the ligand-mediated receptor activation, for example, by inducing dimerization of the receptor. The antibodies may be specified as agonists, antagonists or inverse agonists for biological activities comprising the specific biological activities of the peptides disclosed herein. The antibody agonists and antagonists can be made using methods known in the art. See, e.g., PCT publication WO 96/40281; U.S. Pat. No. 5,811,097; Deng et al., Blood 92(6):1981-1988 (1998); Chen et al., Cancer Res. 58(16):3668-3678 (1998); Harrop et al., J. Immunol. 161(4):1786-1794 (1998); Zhu et al., Cancer Res. 58(15):3209-3214 (1998); Yoon et al., J. Immunol. 160(7):3170-3179 (1998); Prat et al., J. Cell. Sci. 111(Pt 2):237-247 (1998); Pitard et al., J. Immunol. Methods 205(2):177-190 (1997); Liautard et al., Cytokine 9(4):233-241 (1997); Carlson et al., J. Biol. Chem. 272(17):11295-11301 (1997); Taryman et al., Neuron 14(4):755-762 (1995); Muller et al., Structure 6(9):1153-1167 (1998); Bartunek et al., Cytokine 8(1):14-20 (1996).
[0150] The antibodies as defined for the present invention include derivatives that are modified, i.e., by the covalent attachment of any type of molecule to the antibody such that covalent attachment does not prevent the antibody from generating an anti-idiotypic response. For example, but not by way of limitation, the antibody derivatives include antibodies that have been modified, e.g., by glycosylation, acetylation, pegylation, phosphorylation, amidation, derivatization by known protecting/blocking groups, proteolytic cleavage, linkage to a cellular ligand or other protein, etc. Any of numerous chemical modifications may be carried out by known techniques, including, but not limited to specific chemical cleavage, acetylation, formylation, metabolic synthesis of tunicamycin, etc. Additionally, the derivative may contain one or more non-classical amino acids.
[0151] Simple binding assays can be used to screen for or detect agents that bind to a target protein, or disrupt the interaction between proteins (e.g., a receptor and a ligand). Because certain targets of the present invention are transmembrane proteins, assays that use the soluble forms of these proteins rather than full-length protein can be used, in some embodiments. Soluble forms include, for example, those lacking the transmembrane domain and/or those comprising the IgV domain or fragments thereof which retain their ability to bind their cognate binding partners. Further, agents that inhibit or enhance protein interactions for use in the compositions and methods described herein, can include recombinant peptido-mimetics.
[0152] Detection methods useful in screening assays include antibody-based methods, detection of a reporter moiety, detection of cytokines as described herein, and detection of a gene signature as described herein.
[0153] Another variation of assays to determine binding of a receptor protein to a ligand protein is through the use of affinity biosensor methods. Such methods may be based on the piezoelectric effect, electrochemistry, or optical methods, such as ellipsometry, optical wave guidance, and surface plasmon resonance (SPR).
Bispecific antibodies
[0154] In certain embodiments, bispecific antibodies are used to target specific cell types (e.g., viral infected cells). Bi-specific antigen-binding constructs, e.g., bi-specific antibodies (bsAb) or BiTEs, bind two antigens (see, e.g., Suurs et al., A review of bispecific antibodies and antibody constructs in oncology and clinical challenges. Pharmacol Ther. 2019 Sep;201: 103-119; and Huehls, et al., Bispecific T cell engagers for cancer immunotherapy. Immunol Cell Biol. 2015 Mar; 93(3): 290-296). The bi-specific antigen-binding construct includes two antigen-binding polypeptide constructs, e.g., antigen binding domains. In some embodiments, the antigen-binding construct is derived from known antibodies or antigen-binding constructs. In some embodiments, the antigen- binding polypeptide constructs comprise two antigen binding domains that comprise antibody fragments. In some embodiments, the first antigen binding domain and second antigen binding domain each independently comprises an antibody fragment selected from the group of: an scFv, a Fab, and an Fc domain. The antibody fragments may be the same format or different formats from each other. For example, in some embodiments, the antigen-binding polypeptide constructs comprise a first antigen binding domain comprising an scFv and a second antigen binding domain comprising a Fab. In some embodiments, the antigen-binding polypeptide constructs comprise a first antigen binding domain and a second antigen binding domain, wherein both antigen binding domains comprise an scFv. In some embodiments, the first and second antigen binding domains each comprise a Fab. In some embodiments, the first and second antigen binding domains each comprise an Fc domain. Any combination of antibody formats is suitable for the bi-specific antibody constructs disclosed herein.
Aptamers
[0155] In certain embodiments, the one or more agent is an aptamer. Nucleic acid aptamers are nucleic acid species that have been engineered through repeated rounds of in vitro selection or equivalently, SELEX (systematic evolution of ligands by exponential enrichment) to bind to various molecular targets such as small molecules, proteins, nucleic acids, cells, tissues and organisms. Nucleic acid aptamers have specific binding affinity to molecules through interactions other than classic Watson-Crick base pairing. Aptamers are useful in biotechnological and therapeutic applications as they offer molecular recognition properties similar to antibodies. In addition to their discriminate recognition, aptamers offer advantages over antibodies as they can be engineered completely in a test tube, are readily produced by chemical synthesis, possess desirable storage properties, and elicit little or no immunogenicity in therapeutic applications. In certain embodiments, RNA aptamers may be expressed from a DNA construct. In other embodiments, a nucleic acid aptamer may be linked to another polynucleotide sequence. The polynucleotide sequence may be a double stranded DNA polynucleotide sequence. The aptamer may be covalently linked to one strand of the polynucleotide sequence. The aptamer may be ligated to the polynucleotide sequence. The polynucleotide sequence may be configured, such that the polynucleotide sequence may be linked to a solid support or ligated to another polynucleotide sequence.
[0156] Aptamers, like peptides generated by phage display or monoclonal antibodies (“mAbs”), are capable of specifically binding to selected targets and modulating the target's activity, e.g., through binding, aptamers may block their target's ability to function. A typical aptamer is 10-15 kDa in size (30-45 nucleotides), binds its target with sub-nanomolar affinity, and discriminates against closely related targets (e.g., aptamers will typically not bind other proteins from the same gene family). Structural studies have shown that aptamers are capable of using the same types of binding interactions (e.g., hydrogen bonding, electrostatic complementarity, hydrophobic contacts, steric exclusion) that drive affinity and specificity in antibody-antigen complexes.
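The size range quoted above can be checked with a rough calculation. The sketch below assumes an average mass of about 330 Da per nucleotide, a commonly used rule of thumb for single-stranded nucleic acids that is not stated in this disclosure; the constant and function names are hypothetical. Under that assumption, 30 to 45 nucleotides corresponds to roughly 10 to 15 kDa, consistent with the stated range.

```python
# Illustrative sketch only: estimates aptamer mass from its length.
# ASSUMPTION: ~330 Da average mass per nucleotide (rule of thumb for
# single-stranded nucleic acids; not a value taken from this document).
AVERAGE_NUCLEOTIDE_MASS_DA = 330.0


def estimate_aptamer_mass_kda(length_nt: int) -> float:
    """Return an approximate molecular mass in kDa for an aptamer of
    `length_nt` nucleotides (hypothetical helper)."""
    if length_nt <= 0:
        raise ValueError("length must be a positive number of nucleotides")
    return length_nt * AVERAGE_NUCLEOTIDE_MASS_DA / 1000.0


if __name__ == "__main__":
    for n in (30, 45):
        print(f"{n} nt is roughly {estimate_aptamer_mass_kda(n):.1f} kDa")
```

Modified nucleotides (for example, 2'-fluoro or 2'-O-methyl substitutions) or conjugated groups would shift the true mass, so the estimate is only an order-of-magnitude check.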
[0157] Aptamers have a number of desirable characteristics for use in research and as therapeutics and diagnostics including high specificity and affinity, biological efficacy, and excellent pharmacokinetic properties. In addition, they offer specific competitive advantages over antibodies and other protein biologics. Aptamers are chemically synthesized and are readily scaled as needed to meet production demand for research, diagnostic or therapeutic applications. Aptamers are chemically robust. They are intrinsically adapted to regain activity following exposure to factors such as heat and denaturants and can be stored for extended periods (>1 yr) at room temperature as lyophilized powders. Not being bound by a theory, aptamers bound to a solid support or beads may be stored for extended periods.
[0158] Oligonucleotides in their phosphodiester form may be quickly degraded by intracellular and extracellular enzymes such as endonucleases and exonucleases. Aptamers can include modified nucleotides conferring improved characteristics on the ligand, such as improved in vivo stability or improved delivery characteristics. Examples of such modifications include chemical substitutions at the ribose and/or phosphate and/or base positions. SELEX-identified nucleic acid ligands containing modified nucleotides are described, e.g., in U.S. Pat. No. 5,660,985, which describes oligonucleotides containing nucleotide derivatives chemically modified at the 2' position of ribose, 5 position of pyrimidines, and 8 position of purines, U.S. Pat. No. 5,756,703 which describes oligonucleotides containing various 2'-modified pyrimidines, and U.S. Pat. No. 5,580,737 which describes highly specific nucleic acid ligands containing one or more nucleotides modified with 2'-amino (2'-NH2), 2'-fluoro (2'-F) and/or 2'-O-methyl (2'-OMe) substituents. Modifications of aptamers may also include modifications at exocyclic amines, substitution of 4-thiouridine, substitution of 5-bromo or 5-iodo-uracil; backbone modifications, phosphorothioate or alkyl phosphate modifications, methylations, and unusual base-pairing combinations such as the isobases isocytidine and isoguanosine. Modifications can also include 3' and 5' modifications such as capping. As used herein, the term phosphorothioate encompasses one or more non-bridging oxygen atoms in a phosphodiester bond replaced by one or more sulfur atoms. In further embodiments, the oligonucleotides comprise modified sugar groups, for example, one or more of the hydroxyl groups is replaced with halogen, aliphatic groups, or functionalized as ethers or amines. In one embodiment, the 2'-position of the furanose residue is substituted by any of an O-methyl, O-alkyl, O-allyl, S-alkyl, S-allyl, or halo group.
Methods of synthesis of 2'-modified sugars are described, e.g., in Sproat, et al., Nucl. Acid Res. 19:733-738 (1991); Cotten, et al., Nucl. Acid Res. 19:2629-2635 (1991); and Hobbs, et al., Biochemistry 12:5138-5145 (1973). Other modifications are known to one of ordinary skill in the art. In certain embodiments, aptamers include aptamers with improved off-rates as described in International Patent Publication No. WO 2009012418, “Method for generating aptamers with improved off-rates,” incorporated herein by reference in its entirety. In certain embodiments aptamers are chosen from a library of aptamers. Such libraries include, but are not limited to, those described in Rohloff et al., “Nucleic Acid Ligands With Protein-like Side Chains: Modified Aptamers and Their Use as Diagnostic and Therapeutic Agents,” Molecular Therapy Nucleic Acids (2014) 3, e201. Aptamers are also commercially available (see, e.g., SomaLogic, Inc., Boulder, Colorado). In certain embodiments, the present invention may utilize any aptamer containing any modification as described herein.
Small Molecules
[0159] In certain embodiments, the one or more agents is a small molecule. The term “small molecule” refers to compounds, preferably organic compounds, with a size comparable to those organic molecules generally used in pharmaceuticals. The term excludes biological macromolecules (e.g., proteins, peptides, nucleic acids, etc.). Preferred small organic molecules range in size up to about 5000 Da, e.g., up to about 4000, preferably up to 3000 Da, more preferably up to 2000 Da, even more preferably up to about 1000 Da, e.g., up to about 900, 800, 700, 600 or up to about 500 Da. In certain embodiments, the small molecule may act as an antagonist or agonist (e.g., blocking an enzyme active site or activating a receptor by binding to a ligand binding site).
[0160] One type of small molecule applicable to the present invention is a degrader molecule (see, e.g., Ding, et al., Emerging New Concepts of Degrader Technologies, Trends Pharmacol Sci. 2020 Jul;41(7):464-474). The terms “degrader” and “degrader molecule” refer to all compounds capable of specifically targeting a protein for degradation (e.g., ATTEC, AUTAC, LYTAC, or PROTAC, reviewed in Ding, et al. 2020). Proteolysis Targeting Chimera (PROTAC) technology is a rapidly emerging alternative therapeutic strategy with the potential to address many of the challenges currently faced in modern drug development programs. PROTAC technology employs small molecules that recruit target proteins for ubiquitination and removal by the proteasome (see, e.g., Zhou et al., Discovery of a Small-Molecule Degrader of Bromodomain and Extra-Terminal (BET) Proteins with Picomolar Cellular Potencies and Capable of Achieving Tumor Regression. J. Med. Chem. 2018, 61, 462-481; Bondeson and Crews, Targeted Protein Degradation by Small Molecules, Annu Rev Pharmacol Toxicol. 2017 Jan 6; 57: 107-123; and Lai et al., Modular PROTAC Design for the Degradation of Oncogenic BCR-ABL. Angew Chem Int Ed Engl. 2016 Jan 11; 55(2): 807-810).
In certain embodiments, LYTACs are particularly advantageous for cell surface proteins as described herein (e.g., CD160).
Genetic Modifying Agents
[0161] In certain embodiments, the one or more modulating agents may be a genetic modifying agent. The genetic modifying agents may manipulate nucleic acids (e.g., genomic DNA or mRNA). The genetic modulating agent can be used to up- or downregulate expression of a gene either by targeting a nuclease or functional domain to a DNA or RNA sequence. The genetic modifying agent may comprise an RNA-guided nuclease system (e.g., CRISPR system), RNAi system, a zinc finger nuclease, a TALE, or a meganuclease. In certain embodiments, one or more genes capable of shifting cell composition or cell states is modified by a genetic modifying agent (e.g., one or more genes in Tables 1-5). In certain embodiments, a genetic modifying agent is used in subjects already having severe disease.
CRISPR-Cas Modification
[0162] In some embodiments, a polynucleotide of the present invention described elsewhere herein can be modified using a CRISPR-Cas and/or Cas-based system (e.g., genomic DNA or mRNA, preferably, for a disease gene). The nucleotide sequence may be or encode one or more components of a CRISPR-Cas system. For example, the nucleotide sequences may be or encode guide RNAs. The nucleotide sequences may also encode CRISPR proteins, variants thereof, or fragments thereof.
[0163] In general, a CRISPR-Cas or CRISPR system as used herein and in other documents, such as WO 2014/093622 (PCT/US2013/074667), refers collectively to transcripts and other elements involved in the expression of or directing the activity of CRISPR-associated (“Cas”) genes, including sequences encoding a Cas gene, a tracr (trans-activating CRISPR) sequence (e.g., tracrRNA or an active partial tracrRNA), a tracr-mate sequence (encompassing a “direct repeat” and a tracrRNA-processed partial direct repeat in the context of an endogenous CRISPR system), a guide sequence (also referred to as a “spacer” in the context of an endogenous CRISPR system), or “RNA(s)” as that term is herein used (e.g., RNA(s) to guide Cas, such as Cas9, e.g., CRISPR RNA and transactivating (tracr) RNA or a single guide RNA (sgRNA) (chimeric RNA)) or other sequences and transcripts from a CRISPR locus. In general, a CRISPR system is characterized by elements that promote the formation of a CRISPR complex at the site of a target sequence (also referred to as a protospacer in the context of an endogenous CRISPR system). See, e.g., Shmakov et al. (2015) “Discovery and Functional Characterization of Diverse Class 2 CRISPR-Cas Systems”, Molecular Cell, DOI: dx.doi.org/10.1016/j.molcel.2015.10.008.
[0164] CRISPR-Cas systems generally fall into two classes based on the architecture of their effector modules, and each class is further subdivided by type and subtype. The two classes are Class 1 and Class 2. Class 1 CRISPR-Cas systems have effector modules composed of multiple Cas proteins, some of which form crRNA-binding complexes, while Class 2 CRISPR-Cas systems include a single, multi-domain crRNA-binding protein.
[0165] In some embodiments, the CRISPR-Cas system that can be used to modify a polynucleotide of the present invention described herein can be a Class 1 CRISPR-Cas system. In some embodiments, the CRISPR-Cas system that can be used to modify a polynucleotide of the present invention described herein can be a Class 2 CRISPR-Cas system.
Class 1 CRISPR-Cas Systems
[0166] In some embodiments, the CRISPR-Cas system that can be used to modify a polynucleotide of the present invention described herein can be a Class 1 CRISPR-Cas system. Class 1 CRISPR-Cas systems are divided into Types I, III, and IV. See Makarova et al. 2020. Nat. Rev. Microbiol. 18: 67-83, particularly as described in Figure 1. Type I CRISPR-Cas systems are divided into 9 subtypes (I-A, I-B, I-C, I-D, I-E, I-F1, I-F2, I-F3, and I-G). Makarova et al., 2020. Class 1, Type I CRISPR-Cas systems can contain a Cas3 protein that can have helicase activity. Type III CRISPR-Cas systems are divided into 6 subtypes (III-A, III-B, III-C, III-D, III-E, and III-F). Type III CRISPR-Cas systems can contain a Cas10 that can include an RNA recognition motif called Palm and a cyclase domain that can cleave polynucleotides. Makarova et al., 2020. Type IV CRISPR-Cas systems are divided into 3 subtypes (IV-A, IV-B, and IV-C). Makarova et al., 2020. Class 1 systems also include CRISPR-Cas variants, including Type I-A, I-B, I-E, I-F and I-U variants, which can include variants carried by transposons and plasmids, including versions of subtype I-F encoded by a large family of Tn7-like transposons and smaller groups of Tn7-like transposons that encode similarly degraded subtype I-B systems. Peters et al., PNAS 114 (35) (2017); DOI: 10.1073/pnas.1709035114; see also, Makarova et al. 2018. The CRISPR Journal, v. 1, n5, Figure 5.
[0167] The Class 1 systems typically use a multi-protein effector complex, which can, in some embodiments, include ancillary proteins, such as one or more proteins in a complex referred to as a CRISPR-associated complex for antiviral defense (Cascade), one or more adaptation proteins (e.g., Cas1, Cas2, RNA nuclease), and/or one or more accessory proteins (e.g., Cas4, DNA nuclease), CRISPR associated Rossmann fold (CARF) domain containing proteins, and/or RNA transcriptase.
[0168] The backbone of the Class 1 CRISPR-Cas system effector complexes can be formed by RNA recognition motif domain-containing protein(s) of the repeat-associated mysterious proteins (RAMPs) family subunits (e.g., Cas5, Cas6, and/or Cas7). RAMP proteins are characterized by having one or more RNA recognition motif domains. In some embodiments, multiple copies of RAMPs can be present. In some embodiments, the Class 1 CRISPR-Cas system can include 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 or more Cas5, Cas6, and/or Cas7 proteins. In some embodiments, the Cas6 protein is an RNase, which can be responsible for pre-crRNA processing. When present in a Class 1 CRISPR-Cas system, Cas6 can be optionally physically associated with the effector complex.
[0169] Class 1 CRISPR-Cas system effector complexes can, in some embodiments, also include a large subunit. The large subunit can be composed of or include a Cas8 and/or Cas10 protein. See, e.g., Figures 1 and 2. Koonin EV, Makarova KS. 2019. Phil. Trans. R. Soc. B 374: 20180087, DOI: 10.1098/rstb.2018.0087 and Makarova et al. 2020.
[0170] Class 1 CRISPR-Cas system effector complexes can, in some embodiments, include a small subunit (for example, Cas11). See, e.g., Figures 1 and 2. Koonin EV, Makarova KS. 2019 Origins and Evolution of CRISPR-Cas systems. Phil. Trans. R. Soc. B 374: 20180087, DOI: 10.1098/rstb.2018.0087.
[0171] In some embodiments, the Class 1 CRISPR-Cas system can be a Type I CRISPR-Cas system. In some embodiments, the Type I CRISPR-Cas system can be a subtype I-A CRISPR-Cas system. In some embodiments, the Type I CRISPR-Cas system can be a subtype I-B CRISPR-Cas system. In some embodiments, the Type I CRISPR-Cas system can be a subtype I-C CRISPR-Cas system. In some embodiments, the Type I CRISPR-Cas system can be a subtype I-D CRISPR-Cas system. In some embodiments, the Type I CRISPR-Cas system can be a subtype I-E CRISPR-Cas system. In some embodiments, the Type I CRISPR-Cas system can be a subtype I-F1 CRISPR-Cas system. In some embodiments, the Type I CRISPR-Cas system can be a subtype I-F2 CRISPR-Cas system. In some embodiments, the Type I CRISPR-Cas system can be a subtype I-F3 CRISPR-Cas system. In some embodiments, the Type I CRISPR-Cas system can be a subtype I-G CRISPR-Cas system. In some embodiments, the Type I CRISPR-Cas system can be a CRISPR-Cas variant, such as a Type I-A, I-B, I-E, I-F or I-U variant, which can include variants carried by transposons and plasmids, including versions of subtype I-F encoded by a large family of Tn7-like transposons and smaller groups of Tn7-like transposons that encode similarly degraded subtype I-B systems as previously described.
[0172] In some embodiments, the Class 1 CRISPR-Cas system can be a Type III CRISPR-Cas system. In some embodiments, the Type III CRISPR-Cas system can be a subtype III-A CRISPR-Cas system. In some embodiments, the Type III CRISPR-Cas system can be a subtype III-B CRISPR-Cas system. In some embodiments, the Type III CRISPR-Cas system can be a subtype III-C CRISPR-Cas system. In some embodiments, the Type III CRISPR-Cas system can be a subtype III-D CRISPR-Cas system. In some embodiments, the Type III CRISPR-Cas system can be a subtype III-E CRISPR-Cas system. In some embodiments, the Type III CRISPR-Cas system can be a subtype III-F CRISPR-Cas system.
[0173] In some embodiments, the Class 1 CRISPR-Cas system can be a Type IV CRISPR-Cas system. In some embodiments, the Type IV CRISPR-Cas system can be a subtype IV-A CRISPR-Cas system. In some embodiments, the Type IV CRISPR-Cas system can be a subtype IV-B CRISPR-Cas system. In some embodiments, the Type IV CRISPR-Cas system can be a subtype IV-C CRISPR-Cas system.
[0174] The effector complex of a Class 1 CRISPR-Cas system can, in some embodiments, include a Cas3 protein that is optionally fused to a Cas2 protein, a Cas4, a Cas5, a Cas6, a Cas7, a Cas8, a Cas10, a Cas11, or a combination thereof. In some embodiments, the effector complex of a Class 1 CRISPR-Cas system can have multiple copies, such as 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, or 14, of any one or more Cas proteins.
Class 2 CRISPR-Cas Systems
[0175] The compositions, systems, and methods described in greater detail elsewhere herein can be designed and adapted for use with Class 2 CRISPR-Cas systems. Thus, in some embodiments, the CRISPR-Cas system is a Class 2 CRISPR-Cas system. Class 2 systems are distinguished from Class 1 systems in that they have a single, large, multi-domain effector protein. In certain example embodiments, the Class 2 system can be a Type II, Type V, or Type VI system, which are described in Makarova et al. “Evolutionary classification of CRISPR-Cas systems: a burst of class 2 and derived variants” Nature Reviews Microbiology, 18:67-83 (Feb 2020), incorporated herein by reference. Each type of Class 2 system is further divided into subtypes. See Makarova et al. 2020, particularly at Figure 2. Class 2, Type II systems can be divided into 4 subtypes: II-A, II-B, II-C1, and II-C2. Class 2, Type V systems can be divided into 17 subtypes: V-A, V-B1, V-B2, V-C, V-D, V-E, V-F1, V-F1 (V-U3), V-F2, V-F3, V-G, V-H, V-I, V-K (V-U5), V-U1, V-U2, and V-U4. Class 2, Type VI systems can be divided into 5 subtypes: VI-A, VI-B1, VI-B2, VI-C, and VI-D.
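By way of non-limiting illustration only, the Class 2 subtype lists recited above can be held in a simple lookup table. The sketch below is a convenience for the reader and not part of any described method; the dictionary layout and the function name `type_of_subtype` are hypothetical.

```python
# Illustrative lookup table built from the Class 2 subtype lists recited above.
# The dictionary layout and function name are hypothetical conveniences.
CLASS2_SUBTYPES = {
    "II": ["II-A", "II-B", "II-C1", "II-C2"],
    "V": [
        "V-A", "V-B1", "V-B2", "V-C", "V-D", "V-E", "V-F1", "V-F1 (V-U3)",
        "V-F2", "V-F3", "V-G", "V-H", "V-I", "V-K (V-U5)", "V-U1", "V-U2", "V-U4",
    ],
    "VI": ["VI-A", "VI-B1", "VI-B2", "VI-C", "VI-D"],
}


def type_of_subtype(subtype):
    """Return the Class 2 type ("II", "V", or "VI") for a subtype label, or None."""
    for cas_type, subtypes in CLASS2_SUBTYPES.items():
        if subtype in subtypes:
            return cas_type
    return None
```

For example, `type_of_subtype("VI-B2")` returns the Type VI key, and an unlisted label returns `None`.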
[0176] The distinguishing feature of these types is that their effector complexes consist of a single, large, multi-domain protein. Type V systems differ from Type II effectors (e.g., Cas9), which contain two nuclease domains that are each responsible for the cleavage of one strand of the target DNA, with the HNH nuclease inserted inside the RuvC-like nuclease domain sequence. The Type V systems (e.g., Cas12) only contain a RuvC-like nuclease domain that cleaves both strands. Type VI (Cas13) effectors are unrelated to the effectors of Type II and V systems, contain two HEPN domains, and target RNA. Cas13 proteins also display collateral activity that is triggered by target recognition. Some Type V systems have also been found to possess this collateral activity toward single-stranded DNA in in vitro contexts.
[0177] In some embodiments, the Class 2 system is a Type II system. In some embodiments, the Type II CRISPR-Cas system is a II-A CRISPR-Cas system. In some embodiments, the Type II CRISPR-Cas system is a II-B CRISPR-Cas system. In some embodiments, the Type II CRISPR-Cas system is a II-C1 CRISPR-Cas system. In some embodiments, the Type II CRISPR-Cas system is a II-C2 CRISPR-Cas system. In some embodiments, the Type II system is a Cas9 system. In some embodiments, the Type II system includes a Cas9.
[0178] In some embodiments, the Class 2 system is a Type V system. In some embodiments, the Type V CRISPR-Cas system is a V-A CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system is a V-B1 CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system is a V-B2 CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system is a V-C CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system is a V-D CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system is a V-E CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system is a V-F1 CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system is a V-F1 (V-U3) CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system is a V-F2 CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system is a V-F3 CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system is a V-G CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system is a V-H CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system is a V-I CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system is a V-K (V-U5) CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system is a V-U1 CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system is a V-U2 CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system is a V-U4 CRISPR-Cas system. In some embodiments, the Type V CRISPR-Cas system includes a Cas12a (Cpf1), Cas12b (C2c1), Cas12c (C2c3), CasX, and/or Cas14.
[0179] In some embodiments, the Class 2 system is a Type VI system. In some embodiments, the Type VI CRISPR-Cas system is a VI-A CRISPR-Cas system. In some embodiments, the Type VI CRISPR-Cas system is a VI-B1 CRISPR-Cas system. In some embodiments, the Type VI CRISPR-Cas system is a VI-B2 CRISPR-Cas system. In some embodiments, the Type VI CRISPR-Cas system is a VI-C CRISPR-Cas system. In some embodiments, the Type VI CRISPR-Cas system is a VI-D CRISPR-Cas system. In some embodiments, the Type VI CRISPR-Cas system includes a Cas13a (C2c2), Cas13b (Group 29/30), Cas13c, and/or Cas13d.
Specialized Cas-based Systems
[0180] In some embodiments, the system is a Cas-based system that is capable of performing a specialized function or activity. For example, the Cas protein may be fused, operably coupled to, or otherwise associated with one or more functional domains. In certain example embodiments, the Cas protein may be a catalytically dead Cas protein (“dCas”) and/or have nickase activity. A nickase is a Cas protein that cuts only one strand of a double stranded target. In such embodiments, the dCas or nickase provides a sequence specific targeting functionality that delivers the functional domain to or proximate to a target sequence. Example functional domains that may be fused to, operably coupled to, or otherwise associated with a Cas protein can be or include, but are not limited to, a nuclear localization signal (NLS) domain, a nuclear export signal (NES) domain, a translational activation domain, a transcriptional activation domain (e.g., VP64, p65, MyoD1, HSF1, RTA, and SET7/9), a translation initiation domain, a transcriptional repression domain (e.g., a KRAB domain, NuE domain, NcoR domain, and a SID domain such as a SID4X domain), a nuclease domain (e.g., FokI), a histone modification domain (e.g., a histone acetyltransferase), a light inducible/controllable domain, a chemically inducible/controllable domain, a transposase domain, a homologous recombination machinery domain, a recombinase domain, an integrase domain, and combinations thereof. Methods for generating catalytically dead Cas9 or a nickase Cas9 (WO 2014/204725, Ran et al. Cell. 2013 Sept 12; 154(6): 1380-1389), Cas12 (Liu et al. Nature Communications, 8, 2095 (2017)), and Cas13 (WO 2019/005884, WO 2019/060746) are known in the art and incorporated herein by reference.
[0181] In some embodiments, the functional domains can have one or more of the following activities: methylase activity, demethylase activity, translation activation activity, translation initiation activity, translation repression activity, transcription activation activity, transcription repression activity, transcription release factor activity, histone modification activity, nuclease activity, single-strand RNA cleavage activity, double-strand RNA cleavage activity, single-strand DNA cleavage activity, double-strand DNA cleavage activity, molecular switch activity, chemical inducibility, light inducibility, and nucleic acid binding activity. In some embodiments, the one or more functional domains may comprise epitope tags or reporters. Non-limiting examples of epitope tags include histidine (His) tags, V5 tags, FLAG tags, influenza hemagglutinin (HA) tags, Myc tags, VSV-G tags, and thioredoxin (Trx) tags. Examples of reporters include, but are not limited to, glutathione-S-transferase (GST), horseradish peroxidase (HRP), chloramphenicol acetyltransferase (CAT), beta-galactosidase, beta-glucuronidase, luciferase, green fluorescent protein (GFP), HcRed, DsRed, cyan fluorescent protein (CFP), yellow fluorescent protein (YFP), and auto-fluorescent proteins including blue fluorescent protein (BFP).
[0182] The one or more functional domain(s) may be positioned at, near, and/or in proximity to a terminus of the effector protein (e.g., a Cas protein). In embodiments having two or more functional domains, each of the functional domains can be positioned at or near or in proximity to a terminus of the effector protein (e.g., a Cas protein). In some embodiments, such as those where the functional domain is operably coupled to the effector protein, the one or more functional domains can be tethered or linked via a suitable linker (including, but not limited to, GlySer linkers) to the effector protein (e.g., a Cas protein). When there is more than one functional domain, the functional domains can be the same or different. In some embodiments, all the functional domains are the same. In some embodiments, all of the functional domains are different from each other. In some embodiments, at least two of the functional domains are different from each other. In some embodiments, at least two of the functional domains are the same as each other.
[0183] Other suitable functional domains can be found, for example, in International Patent Publication No. WO 2019/018423.
Split CRISPR-Cas systems
[0184] In some embodiments, the CRISPR-Cas system is a split CRISPR-Cas system. See e.g., Zetsche et al., 2015. Nat. Biotechnol. 33(2): 139-142 and WO 2019/018423, the compositions and techniques of which can be used in and/or adapted for use with the present invention. Split CRISPR-Cas proteins are set forth herein and, in further detail, in documents incorporated herein by reference. In certain embodiments, each part of a split CRISPR protein is attached to a member of a specific binding pair, and when bound with each other, the members of the specific binding pair maintain the parts of the CRISPR protein in proximity. In certain embodiments, each part of a split CRISPR protein is associated with an inducible binding pair. An inducible binding pair is one which is capable of being switched “on” or “off” by a protein or small molecule that binds to both members of the inducible binding pair. In some embodiments, CRISPR proteins may preferably be split between domains, leaving domains intact. In particular embodiments, said Cas split domains (e.g., RuvC and HNH domains in the case of Cas9) can be simultaneously or sequentially introduced into the cell such that said split Cas domain(s) process the target nucleic acid sequence in the algae cell. The reduced size of the split Cas compared to the wild type Cas allows other methods of delivery of the systems to the cells, such as the use of cell penetrating peptides as described herein.
DNA and RNA Base Editing
[0185] In some embodiments, a polynucleotide of the present invention described elsewhere herein can be modified using a base editing system. In some embodiments, a Cas protein is connected or fused to a nucleotide deaminase. Thus, in some embodiments the Cas-based system can be a base editing system. As used herein “base editing” refers generally to the process of polynucleotide modification via a CRISPR-Cas-based or Cas-based system that does not include excising nucleotides to make the modification. Base editing can convert base pairs at precise locations without generating excess undesired editing byproducts that can be made using traditional CRISPR-Cas systems.
[0186] In certain example embodiments, the nucleotide deaminase may be a DNA base editor used in combination with a DNA binding Cas protein such as, but not limited to, Class 2 Type II and Type V systems. Two classes of DNA base editors are generally known: cytosine base editors (CBEs) and adenine base editors (ABEs). CBEs convert a C·G base pair into a T·A base pair (Komor et al. 2016. Nature. 533:420-424; Nishida et al. 2016. Science. 353; and Li et al. Nat. Biotech. 36:324-327) and ABEs convert an A·T base pair to a G·C base pair. Collectively, CBEs and ABEs can mediate all four possible transition mutations (C to T, A to G, T to C, and G to A). Rees and Liu. 2018. Nat. Rev. Genet. 19(12): 770-788, particularly at Figures 1b, 2a-2c, 3a-3f, and Table 1. In some embodiments, the base editing system includes a CBE and/or an ABE. In some embodiments, a polynucleotide of the present invention described elsewhere herein can be modified using a base editing system. Rees and Liu. 2018. Nat. Rev. Genet. 19(12):770-788. Base editors also generally do not need a DNA donor template or rely on homology-directed repair. Komor et al. 2016. Nature. 533:420-424; Nishida et al. 2016. Science. 353; and Gaudelli et al. 2017. Nature. 551:464-471. Upon binding to a target locus in the DNA, base pairing between the guide RNA of the system and the target DNA strand leads to displacement of a small segment of ssDNA in an “R-loop”. Nishimasu et al. Cell. 156:935-949. DNA bases within the ssDNA bubble are modified by the enzyme component, such as a deaminase. In some systems, the catalytically disabled Cas protein can be a variant or modified Cas that has nickase functionality and can generate a nick in the non-edited DNA strand to induce cells to repair the non-edited strand using the edited strand as a template. Komor et al. 2016. Nature. 533:420-424; Nishida et al. 2016. Science. 353; and Gaudelli et al. 2017. Nature. 551:464-471. Base editors may be further engineered to optimize conversion of nucleotides (e.g., A:T to G:C). Richter et al. 2020. Nature Biotechnology. doi.org/10.1038/s41587-020-0453-z.
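By way of non-limiting illustration only, the transition chemistry described above (C to T for a CBE, A to G for an ABE) can be sketched as an idealized string operation on the protospacer of the edited strand. The function name `base_edit`, the default editing window of positions 4 to 8, and the assumption that every eligible base in the window is converted are all hypothetical simplifications; actual editing windows and efficiencies vary by editor and sequence context.

```python
def base_edit(protospacer, editor, window=(4, 8)):
    """Return the protospacer after idealized deamination inside the window.

    editor: "CBE" converts C to T; "ABE" converts A to G.
    window: 1-based inclusive positions counted from the PAM-distal end.
            The default is an assumed, illustrative window only.
    """
    conversions = {"CBE": ("C", "T"), "ABE": ("A", "G")}
    if editor not in conversions:
        raise ValueError("editor must be 'CBE' or 'ABE'")
    source, product = conversions[editor]
    start, end = window
    edited = []
    for position, base in enumerate(protospacer.upper(), start=1):
        # Only bases of the editable identity that fall inside the window change.
        if start <= position <= end and base == source:
            edited.append(product)
        else:
            edited.append(base)
    return "".join(edited)
```

The complementary-strand transitions (G to A and T to C) follow from applying the same conversions to the opposite strand.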
[0187] Other example Type V base editing systems are described in WO 2018/213708, WO 2018/213726, PCT/US2018/067207, PCT/US2018/067225, and PCT/US2018/067307, which are incorporated by reference herein.
[0188] In certain example embodiments, the base editing system may be an RNA base editing system. As with DNA base editors, a nucleotide deaminase capable of converting nucleotide bases may be fused to a Cas protein. However, in these embodiments, the Cas protein will need to be capable of binding RNA. Example RNA binding Cas proteins include, but are not limited to, RNA-binding Cas9s such as Francisella novicida Cas9 (“FnCas9”), and Class 2 Type VI Cas systems. The nucleotide deaminase may be a cytidine deaminase or an adenosine deaminase, or an adenosine deaminase engineered to have cytidine deaminase activity. In certain example embodiments, the RNA base editor may be used to delete or introduce a post-translation modification site in the expressed mRNA. In contrast to DNA base editors, whose edits are permanent in the modified cell, RNA base editors can provide edits where finer temporal control may be needed, for example in modulating a particular immune response. Example Type VI RNA-base editing systems are described in Cox et al. 2017. Science 358: 1019-1027, WO 2019/005884, WO 2019/005886, WO 2019/071048, PCT/US20018/05179, PCT/US2018/067207, which are incorporated herein by reference. An example FnCas9 system that may be adapted for RNA base editing purposes is described in WO 2016/106236, which is incorporated herein by reference.
[0189] An example method for delivery of base-editing systems, including use of a split-intein approach to divide CBE and ABE into reconstitutable halves, is described in Levy et al. Nature Biomedical Engineering doi.org/10.1038/s41441-019-0505-5 (2019), which is incorporated herein by reference.
Prime Editors
[0190] In some embodiments, a polynucleotide of the present invention described elsewhere herein can be modified using a prime editing system (See e.g., Anzalone et al. 2019. Nature. 576: 149-157). Like base editing systems, prime editing systems can be capable of targeted modification of a polynucleotide without generating double stranded breaks and do not require donor templates. Further, prime editing systems can be capable of all 12 possible combination swaps. Prime editing can operate via a “search-and-replace” methodology and can mediate targeted insertions, deletions, all 12 possible base-to-base conversions, and combinations thereof. Generally, a prime editing system, as exemplified by PE1, PE2, and PE3 (Id.), can include a reverse transcriptase fused or otherwise coupled or associated with an RNA-programmable nickase, and a prime-editing extended guide RNA (pegRNA) to facilitate direct copying of genetic information from the extension on the pegRNA into the target polynucleotide. Embodiments that can be used with the present invention include these and variants thereof. Prime editing can have the advantage of lower off-target activity than traditional CRISPR-Cas systems along with few byproducts and greater or similar efficiency as compared to traditional CRISPR-Cas systems.
[0191] In some embodiments, the prime editing guide molecule can specify both the target polynucleotide information (e.g., sequence) and contain a new polynucleotide cargo that replaces target polynucleotides. To initiate transfer from the guide molecule to the target polynucleotide, the PE system can nick the target polynucleotide at a target site to expose a 3’ hydroxyl group, which can prime reverse transcription of an edit-encoding extension region of the guide molecule (e.g., a prime editing guide molecule or peg guide molecule) directly into the target site in the target polynucleotide. See e.g., Anzalone et al. 2019. Nature. 576: 149-157, particularly at Figures 1b, 1c, related discussion, and Supplementary discussion.
[0192] In some embodiments, a prime editing system can be composed of a Cas polypeptide having nickase activity, a reverse transcriptase, and a guide molecule. The Cas polypeptide can lack nuclease activity. The guide molecule can include a target binding sequence as well as a primer binding sequence and a template containing the edited polynucleotide sequence. The guide molecule, Cas polypeptide, and/or reverse transcriptase can be coupled together or otherwise associate with each other to form an effector complex and edit a target sequence. In some embodiments, the Cas polypeptide is a Class 2, Type V Cas polypeptide. In some embodiments, the Cas polypeptide is a Cas9 polypeptide (e.g., is a Cas9 nickase). In some embodiments, the Cas polypeptide is fused to the reverse transcriptase. In some embodiments, the Cas polypeptide is linked to the reverse transcriptase.
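By way of non-limiting illustration only, the relationship between the nicked strand, the primer binding sequence, and the edit-encoding template of a peg guide molecule can be sketched as follows. The sketch assumes that the primer binding sequence is the reverse complement of the bases immediately 5’ of the nick on the nicked strand, that the template is the reverse complement of the desired edited sequence 3’ of the nick, and that the template precedes the primer binding sequence in the 3’ extension. The function names, the `nick_index` parameter, the default `pbs_length` of 13, and the spacer-scaffold-extension order used in `assemble_peg_guide` are hypothetical choices made for illustration and are not recited in the text above.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Reverse complement of a DNA string written 5' to 3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def peg_extension(nicked_strand, nick_index, edited_downstream, pbs_length=13):
    """Return the 3' extension (template followed by primer binding sequence).

    nicked_strand: strand that is nicked, written 5' to 3' (DNA letters).
    nick_index: number of bases 5' of the nick (hypothetical parameter).
    edited_downstream: desired sequence 3' of the nick, including the edit.
    """
    if nick_index < pbs_length or nick_index > len(nicked_strand):
        raise ValueError("nick_index must leave room for the primer binding sequence")
    # The exposed 3' end of the nicked strand anneals to the primer binding sequence.
    primer = nicked_strand[nick_index - pbs_length:nick_index]
    primer_binding = reverse_complement(primer)
    # Reverse transcription copies the template, yielding edited_downstream.
    template = reverse_complement(edited_downstream)
    return template + primer_binding


def assemble_peg_guide(spacer, scaffold, extension):
    """Concatenate components 5' to 3' and write the result as RNA."""
    return (spacer + scaffold + extension).upper().replace("T", "U")
```

Under these assumptions the extension is simply the reverse complement of the primer region joined to the edited downstream sequence, which offers a quick consistency check on any designed peg guide molecule.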
[0193] In some embodiments, the prime editing system can be a PE1 system or variant thereof, a PE2 system or variant thereof, or a PE3 (e.g., PE3, PE3b) system. See e.g., Anzalone et al. 2019. Nature. 576: 149-157, particularly at pgs. 2-3, Figs. 2a, 3a-3f, 4a-4b, Extended Data Figs. 3a-3b, 4,
[0194] The peg guide molecule can be about 10 to about 200 or more nucleotides in length, such as 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, or 200 or more nucleotides in length. Optimization of the peg guide molecule can be accomplished as described in Anzalone et al. 2019. Nature. 576: 149-157, particularly at pg. 3, Fig. 2a-2b, and Extended Data Figs. 5a-c.
CRISPR Associated Transposase (CAST) Systems
[0195] In some embodiments, a polynucleotide of the present invention described elsewhere herein can be modified using a CRISPR Associated Transposase (“CAST”) system. A CAST system can include a Cas protein that is catalytically inactive, or engineered to be catalytically active, and can further comprise a transposase (or subunits thereof) that catalyzes RNA-guided DNA transposition. Such systems are able to insert DNA sequences at a target site in a DNA molecule without relying on host cell repair machinery. CAST systems can be Class 1 or Class 2 CAST systems. An example Class 1 system is described in Klompe et al. Nature, doi:10.1038/s41586-019-1323, which is incorporated herein by reference. An example Class 2 system is described in Strecker et al. Science. 10.1126/science.aax9181 (2019), and PCT/US2019/066835, which are incorporated herein by reference.
Guide Molecules
[0196] The CRISPR-Cas or Cas-based system described herein can, in some embodiments, include one or more guide molecules. The terms guide molecule, guide sequence, and guide polynucleotide refer to polynucleotides capable of guiding Cas to a target genomic locus and are used interchangeably as in foregoing cited documents such as WO 2014/093622 (PCT/US2013/074667). In general, a guide sequence is any polynucleotide sequence having sufficient complementarity with a target polynucleotide sequence to hybridize with the target sequence and direct sequence-specific binding of a CRISPR complex to the target sequence. The guide molecule can be a polynucleotide.
[0197] The ability of a guide sequence (within a nucleic acid-targeting guide RNA) to direct sequence-specific binding of a nucleic acid-targeting complex to a target nucleic acid sequence may be assessed by any suitable assay. For example, the components of a nucleic acid-targeting CRISPR system sufficient to form a nucleic acid-targeting complex, including the guide sequence to be tested, may be provided to a host cell having the corresponding target nucleic acid sequence, such as by transfection with vectors encoding the components of the nucleic acid-targeting complex, followed by an assessment of preferential targeting (e.g., cleavage) within the target nucleic acid sequence, such as by Surveyor assay (Qiu et al. 2004. BioTechniques. 36(4): 702-707). Similarly, cleavage of a target nucleic acid sequence may be evaluated in a test tube by providing the target nucleic acid sequence, components of a nucleic acid-targeting complex, including the guide sequence to be tested and a control guide sequence different from the test guide sequence, and comparing binding or rate of cleavage at the target sequence between the test and control guide sequence reactions. Other assays are possible and will occur to those skilled in the art.
[0198] In some embodiments, the guide molecule is an RNA. The guide molecule(s) (also referred to interchangeably herein as guide polynucleotide and guide sequence) that are included in the CRISPR-Cas or Cas-based system can be any polynucleotide sequence having sufficient complementarity with a target nucleic acid sequence to hybridize with the target nucleic acid sequence and direct sequence-specific binding of a nucleic acid-targeting complex to the target nucleic acid sequence. In some embodiments, the degree of complementarity, when optimally aligned using a suitable alignment algorithm, can be about or more than about 50%, 60%, 75%, 80%, 85%, 90%, 95%, 97.5%, 99%, or more. Optimal alignment may be determined with the use of any suitable algorithm for aligning sequences, non-limiting examples of which include the Smith-Waterman algorithm, the Needleman-Wunsch algorithm, algorithms based on the Burrows-Wheeler Transform (e.g., the Burrows Wheeler Aligner), ClustalW, Clustal X, BLAT, Novoalign (Novocraft Technologies; available at www.novocraft.com), ELAND (Illumina, San Diego, CA), SOAP (available at soap.genomics.org.cn), and Maq (available at maq.sourceforge.net).
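By way of non-limiting illustration only, a degree of complementarity of the kind described above can be estimated by globally aligning the guide sequence against the reverse complement of the target strand and counting matched positions. The sketch below uses a basic Needleman-Wunsch recursion with assumed scores (match +1, mismatch -1, gap -1) and expresses the result as a percentage of the guide length; the function names, the scoring values, and the choice of denominator are hypothetical and other conventions are equally possible.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Reverse complement of a DNA string written 5' to 3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def aligned_matches(a, b, match=1, mismatch=-1, gap=-1):
    """Count identical aligned positions in one optimal global alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diagonal, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back from the bottom-right corner, counting matched columns.
    i, j, matches = len(a), len(b), 0
    while i > 0 and j > 0:
        diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if score[i][j] == diagonal:
            matches += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return matches


def percent_complementarity(guide, target):
    """Percentage of guide positions paired with the target strand."""
    guide = guide.upper().replace("U", "T")
    target = target.upper().replace("U", "T")
    if not guide:
        return 0.0
    return 100.0 * aligned_matches(guide, reverse_complement(target)) / len(guide)
```

A guide that is the exact reverse complement of the target strand scores 100 under this convention, and each substitution in a guide of 20 nucleotides lowers the value by 5 percentage points.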
[0199] A guide sequence, and hence a nucleic acid-targeting guide, may be selected to target any target nucleic acid sequence. The target sequence may be DNA. The target sequence may be any RNA sequence. In some embodiments, the target sequence may be a sequence within an RNA molecule selected from the group consisting of messenger RNA (mRNA), pre-mRNA, ribosomal RNA (rRNA), transfer RNA (tRNA), micro-RNA (miRNA), small interfering RNA (siRNA), small nuclear RNA (snRNA), small nucleolar RNA (snoRNA), double stranded RNA (dsRNA), non-coding RNA (ncRNA), long non-coding RNA (lncRNA), and small cytoplasmic RNA (scRNA). In some preferred embodiments, the target sequence may be a sequence within an RNA molecule selected from the group consisting of mRNA, pre-mRNA, and rRNA. In some preferred embodiments, the target sequence may be a sequence within an RNA molecule selected from the group consisting of ncRNA and lncRNA. In some more preferred embodiments, the target sequence may be a sequence within an mRNA molecule or a pre-mRNA molecule.
[0200] In some embodiments, a nucleic acid-targeting guide is selected to reduce the degree of secondary structure within the nucleic acid-targeting guide. In some embodiments, about or less than about 75%, 50%, 40%, 30%, 25%, 20%, 15%, 10%, 5%, 1%, or fewer of the nucleotides of the nucleic acid-targeting guide participate in self-complementary base pairing when optimally folded. Optimal folding may be determined by any suitable polynucleotide folding algorithm. Some programs are based on calculating the minimal Gibbs free energy. An example of one such algorithm is mFold, as described by Zuker and Stiegler (Nucleic Acids Res. 9 (1981), 133-148). Another example folding algorithm is the online webserver RNAfold, developed at the Institute for Theoretical Chemistry at the University of Vienna, using the centroid structure prediction algorithm (see e.g., A.R. Gruber et al., 2008, Cell 106(1): 23-24; and PA Carr and GM Church, 2009, Nature Biotechnology 27(12): 1151-62).
[0201] In certain embodiments, a guide RNA or crRNA may comprise, consist essentially of, or consist of a direct repeat (DR) sequence and a guide sequence or spacer sequence. In certain embodiments, the guide RNA or crRNA may comprise, consist essentially of, or consist of a direct repeat sequence fused or linked to a guide sequence or spacer sequence. In certain embodiments, the direct repeat sequence may be located upstream (i.e., 5’) from the guide sequence or spacer sequence. In other embodiments, the direct repeat sequence may be located downstream (i.e., 3’) from the guide sequence or spacer sequence.
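By way of non-limiting illustration only, the fraction of guide nucleotides that participate in self-complementary base pairing can be roughly bounded with a base-pair maximization recursion of the Nussinov type. This is a deliberately simple stand-in and is neither mFold nor RNAfold: it ignores free energies and merely maximizes the number of nested Watson-Crick and G-U pairs. The function names and the minimum hairpin loop of 3 nucleotides are assumptions made for the sketch.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(rna, min_loop=3):
    """Maximum number of nested base pairs (Nussinov-style recursion)."""
    rna = rna.upper().replace("T", "U")
    n = len(rna)
    if n == 0:
        return 0
    best = [[0] * n for _ in range(n)]
    # Fill by increasing subsequence span; shorter spans cannot hold a pair.
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            candidate = max(best[i + 1][j], best[i][j - 1])  # leave i or j unpaired
            if (rna[i], rna[j]) in PAIRS:
                candidate = max(candidate, best[i + 1][j - 1] + 1)  # pair i with j
            for k in range(i + 1, j):
                candidate = max(candidate, best[i][k] + best[k + 1][j])  # split in two
            best[i][j] = candidate
    return best[0][n - 1]


def fraction_paired(rna, min_loop=3):
    """Upper-bound fraction of nucleotides engaged in self-pairing."""
    if not rna:
        return 0.0
    return 2.0 * max_base_pairs(rna, min_loop) / len(rna)
```

Because the recursion maximizes pair count rather than thermodynamic stability, the value it returns is best read as an upper bound when screening candidate guides against a threshold such as those listed above.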
[0202] In certain embodiments, the crRNA comprises a stem loop, preferably a single stem loop. In certain embodiments, the direct repeat sequence forms a stem loop, preferably a single stem loop.
[0203] In certain embodiments, the spacer length of the guide RNA is from 15 to 35 nt. In certain embodiments, the spacer length of the guide RNA is at least 15 nucleotides. In certain embodiments, the spacer length is from 15 to 17 nt, e.g., 15, 16, or 17 nt, from 17 to 20 nt, e.g., 17, 18, 19, or 20 nt, from 20 to 24 nt, e.g., 20, 21, 22, 23, or 24 nt, from 23 to 25 nt, e.g., 23, 24, or 25 nt, from 24 to 27 nt, e.g., 24, 25, 26, or 27 nt, from 27 to 30 nt, e.g., 27, 28, 29, or 30 nt, from 30 to 35 nt, e.g., 30, 31, 32, 33, 34, or 35 nt, or 35 nt or longer.
[0204] The “tracrRNA” sequence or analogous terms includes any polynucleotide sequence that has sufficient complementarity with a crRNA sequence to hybridize. In some embodiments, the degree of complementarity between the tracrRNA sequence and crRNA sequence along the length of the shorter of the two when optimally aligned is about or more than about 25%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 97.5%, 99%, or higher. In some embodiments, the tracr sequence is about or more than about 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 40, 50, or more nucleotides in length. In some embodiments, the tracr sequence and crRNA sequence are contained within a single transcript, such that hybridization between the two produces a transcript having a secondary structure, such as a hairpin.
[0205] In general, degree of complementarity is with reference to the optimal alignment of the sca sequence and tracr sequence, along the length of the shorter of the two sequences. Optimal alignment may be determined by any suitable alignment algorithm and may further account for secondary structures, such as self-complementarity within either the sca sequence or tracr sequence. In some embodiments, the degree of complementarity between the tracr sequence and sca sequence along the length of the shorter of the two when optimally aligned is about or more than about 25%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 97.5%, 99%, or higher.
[0206] In some embodiments, the degree of complementarity between a guide sequence and its corresponding target sequence can be about or more than about 50%, 60%, 75%, 80%, 85%, 90%, 95%, 97.5%, 99%, or 100%; a guide or RNA or sgRNA can be about or more than about 5, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 75, or more nucleotides in length; or guide or RNA or sgRNA can be less than about 75, 50, 45, 40, 35, 30, 25, 20, 15, 12, or fewer nucleotides in length; and tracr RNA can be 30 or 50 nucleotides in length. In some embodiments, the degree of complementarity between a guide sequence and its corresponding target sequence is greater than 94.5% or 95% or 95.5% or 96% or 96.5% or 97% or 97.5% or 98% or 98.5% or 99% or 99.5% or 99.9%, or 100%. Off target is less than 100% or 99.9% or 99.5% or 99% or 99% or 98.5% or 98% or 97.5% or 97% or 96.5% or 96% or 95.5% or 95% or 94.5% or 94% or 93% or 92% or 91% or 90% or 89% or 88% or 87% or 86% or 85% or 84% or 83% or 82% or 81% or 80% complementarity between the sequence and the guide, with it advantageous that off target is 100% or 99.9% or 99.5% or 99% or 99% or 98.5% or 98% or 97.5% or 97% or 96.5% or 96% or 95.5% or 95% or 94.5% complementarity between the sequence and the guide.
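The percentages in [0206] can be made concrete with a small calculation. The sketch below assumes an ungapped, antiparallel pairing of a guide with an equal-length target and counts only Watson-Crick pairs; the function name and these simplifications are illustrative assumptions, not a definition taken from the disclosure, which allows any suitable alignment algorithm.

```python
# Watson-Crick partners in RNA; DNA input is converted by replacing T with U.
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def percent_complementarity(guide: str, target: str) -> float:
    """Percent of guide positions that Watson-Crick pair with the target.

    Both sequences are written 5'->3'. Pairing is antiparallel, so guide
    position i is compared with the target read from its 3' end.
    Assumes equal lengths and no gaps (an illustrative simplification).
    """
    guide = guide.upper().replace("T", "U")
    target = target.upper().replace("T", "U")
    if not guide or len(guide) != len(target):
        raise ValueError("guide and target must be non-empty and equal in length")
    matches = sum(
        RNA_COMPLEMENT.get(g) == t for g, t in zip(guide, reversed(target))
    )
    return 100.0 * matches / len(guide)


if __name__ == "__main__":
    print(percent_complementarity("AAGGCU", "AGCCUU"))  # fully complementary
    print(percent_complementarity("AAGGCU", "AGCCUA"))  # one mismatch in six
```

Under this counting scheme a single mismatch in a 20-nucleotide guide corresponds to 95% complementarity, and two mismatches to 90%.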
[0207] In some embodiments according to the invention, the guide RNA (capable of guiding Cas to a target locus) may comprise (1) a guide sequence capable of hybridizing to a genomic target locus in the eukaryotic cell; (2) a tracr sequence; and (3) a tracr mate sequence. All (1) to (3) may reside in a single RNA, i.e., an sgRNA (arranged in a 5’ to 3’ orientation), or the tracr RNA may be a different RNA than the RNA containing the guide and tracr sequence. The tracr hybridizes to the tracr mate sequence and directs the CRISPR/Cas complex to the target sequence. Where the tracr RNA is on a different RNA than the RNA containing the guide and tracr sequence, the length of each RNA may be optimized to be shortened from their respective native lengths, and each may be independently chemically modified to protect from degradation by cellular RNase or otherwise increase stability.
[0208] Many modifications to guide sequences are known in the art and are further contemplated within the context of this invention. Various modifications may be used to increase the specificity of binding to the target sequence and/or increase the activity of the Cas protein and/or reduce off-target effects. Example guide sequence modifications are described in PCT US2019/045582, specifically paragraphs [0178]-[0333], which is incorporated herein by reference.
Target Sequences, PAMs, and PFSs
Target Sequences
[0209] In the context of formation of a CRISPR complex, “target sequence” refers to a sequence to which a guide sequence is designed to have complementarity, where hybridization between a target sequence and a guide sequence promotes the formation of a CRISPR complex. A target sequence may comprise RNA polynucleotides. The term “target RNA” refers to an RNA polynucleotide being or comprising the target sequence. In other words, the target polynucleotide can be a polynucleotide or a part of a polynucleotide to which a part of the guide sequence is designed to have complementarity with and to which the effector function mediated by the complex comprising the CRISPR effector protein and a guide molecule is to be directed. In some embodiments, a target sequence is located in the nucleus or cytoplasm of a cell.
[0210] The guide sequence can specifically bind a target sequence in a target polynucleotide. The target polynucleotide may be DNA. The target polynucleotide may be RNA. The target polynucleotide can have one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, etc. or more) target sequences. The target polynucleotide can be on a vector. The target polynucleotide can be genomic DNA. The target polynucleotide can be episomal. Other forms of the target polynucleotide are described elsewhere herein.
[0211] The target sequence may be DNA. The target sequence may be any RNA sequence. In some embodiments, the target sequence may be a sequence within an RNA molecule selected from the group consisting of messenger RNA (mRNA), pre-mRNA, ribosomal RNA (rRNA), transfer RNA (tRNA), micro-RNA (miRNA), small interfering RNA (siRNA), small nuclear RNA (snRNA), small nucleolar RNA (snoRNA), double stranded RNA (dsRNA), non-coding RNA (ncRNA), long non-coding RNA (lncRNA), and small cytoplasmic RNA (scRNA). In some preferred embodiments, the target sequence (also referred to herein as a target polynucleotide) may be a sequence within an RNA molecule selected from the group consisting of mRNA, pre-mRNA, and rRNA. In some preferred embodiments, the target sequence may be a sequence within an RNA molecule selected from the group consisting of ncRNA and lncRNA. In some more preferred embodiments, the target sequence may be a sequence within an mRNA molecule or a pre-mRNA molecule.
PAM and PFS Elements
[0212] PAM elements are sequences that can be recognized and bound by Cas proteins. Cas proteins/effector complexes can then unwind the dsDNA at a position adjacent to the PAM element. It will be appreciated that Cas proteins and systems that include them that target RNA do not require PAM sequences (Marraffini et al. 2010. Nature. 463:568-571). Instead, many rely on PFSs, which are discussed elsewhere herein. In certain embodiments, the target sequence should be associated with a PAM (protospacer adjacent motif) or PFS (protospacer flanking sequence or site), that is, a short sequence recognized by the CRISPR complex. Depending on the nature of the CRISPR-Cas protein, the target sequence should be selected, such that its complementary sequence in the DNA duplex (also referred to herein as the non-target sequence) is upstream or downstream of the PAM. In the embodiments, the complementary sequence of the target sequence is downstream or 3’ of the PAM or upstream or 5’ of the PAM. The precise sequence and length requirements for the PAM differ depending on the Cas protein used, but PAMs are typically 2-5 base pair sequences adjacent the protospacer (that is, the target sequence). Examples of the natural PAM sequences for different Cas proteins are provided herein below and the skilled person will be able to identify further PAM sequences for use with a given Cas protein.
[0213] The ability to recognize different PAM sequences depends on the Cas polypeptide(s) included in the system. See e.g., Gleditzsch et al. 2019. RNA Biology. 16(4):504-517. Table A below shows several Cas polypeptides and the PAM sequence they recognize.
[Table A: Cas polypeptides and the PAM sequences they recognize; provided as an image in the original publication and not reproduced here]
[0214] In a preferred embodiment, the CRISPR effector protein may recognize a 3’ PAM. In certain embodiments, the CRISPR effector protein may recognize a 3’ PAM which is 5’H, wherein H is A, C or U.
[0215] Further, engineering of the PAM Interacting (PI) domain on the Cas protein may allow programming of PAM specificity, improve target site recognition fidelity, and increase the versatility of the CRISPR-Cas protein, for example as described for Cas9 in Kleinstiver BP et al. Engineered CRISPR-Cas9 nucleases with altered PAM specificities. Nature. 2015 Jul 23;523(7561):481-5. doi: 10.1038/nature14592. As further detailed herein, the skilled person will understand that Cas13 proteins may be modified analogously. Gao et al., “Engineered Cpf1 Enzymes with Altered PAM Specificities,” bioRxiv 091611; doi: dx.doi.org/10.1101/091611 (Dec. 4, 2016). Doench et al. created a pool of sgRNAs, tiling across all possible target sites of a panel of six endogenous mouse and three endogenous human genes and quantitatively assessed their ability to produce null alleles of their target gene by antibody staining and flow cytometry. The authors showed that optimization of the PAM improved activity and also provided an on-line tool for designing sgRNAs.
[0216] PAM sequences can be identified in a polynucleotide using an appropriate design tool; such tools are available commercially as well as online. Freely available tools include, but are not limited to, CRISPRFinder and CRISPRTarget. See Mojica et al. 2009. Microbiol. 155(Pt. 3):733-740; Altschul et al. 1990. J. Mol. Biol. 215:403-410; Biswas et al. 2013. RNA Biol. 10:817-827; and Grissa et al. 2007. Nucleic Acid Res. 35:W52-57. Experimental approaches to PAM identification can include, but are not limited to, plasmid depletion assays (Jiang et al. 2013. Nat. Biotechnol. 31:233-239; Esvelt et al. 2013. Nat. Methods. 10:1116-1121; Kleinstiver et al. 2015. Nature. 523:481-485), screening with a high-throughput in vivo method called PAM-SCANR (Pattanayak et al. 2013. Nat. Biotechnol. 31:839-843 and Leenay et al. 2016. Mol. Cell. 16:253), and negative screening (Zetsche et al. 2015. Cell. 163:759-771).
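As an illustration of computational PAM identification, the following sketch scans one strand of a DNA sequence for a 3’ PAM written in IUPAC notation and reports the protospacer immediately upstream of each hit. It is not CRISPRFinder or CRISPRTarget. The function name, the 20-nt protospacer length, and the default motif NGG (the motif widely reported for Streptococcus pyogenes Cas9, used here only as an example) are assumptions; the reverse strand and 5’ PAMs are not handled.

```python
import re
from typing import List, Tuple

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def find_pam_sites(
    dna: str, pam: str = "NGG", protospacer_len: int = 20
) -> List[Tuple[int, str, str]]:
    """Return (protospacer_start, protospacer, pam) for every 3' PAM hit.

    Only the given strand is scanned; hits too close to the 5' end to leave
    room for a full-length protospacer are skipped.
    """
    dna = dna.upper()
    pattern = "".join(IUPAC[base] for base in pam.upper())
    # A lookahead reports overlapping PAM occurrences as separate hits.
    regex = re.compile(f"(?=({pattern}))")
    hits = []
    for match in regex.finditer(dna):
        pam_start = match.start()
        start = pam_start - protospacer_len
        if start >= 0:
            hits.append((start, dna[start:pam_start], match.group(1)))
    return hits


if __name__ == "__main__":
    for hit in find_pam_sites("ACGTGG", pam="NGG", protospacer_len=3):
        print(hit)
```

To cover both strands, the same function could be run on the reverse complement of the input and the coordinates mapped back.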
[0217] As previously mentioned, CRISPR-Cas systems that target RNA do not typically rely on PAM sequences. Instead, such systems typically recognize protospacer flanking sites (PFSs). Thus, Type VI CRISPR-Cas systems typically recognize PFSs instead of PAMs. PFSs represent an analogue to PAMs for RNA targets. Type VI CRISPR-Cas systems employ a Cas13. Some Cas13 proteins analyzed to date, such as Cas13a (C2c2) identified from Leptotrichia shahii (LshCas13a), have a specific discrimination against G at the 3’ end of the target RNA. The presence of a C at the corresponding crRNA repeat site can indicate that nucleotide pairing at this position is rejected. However, some Cas13 proteins (e.g., LwaCas13a and PspCas13b) do not seem to have a PFS preference. See e.g., Gleditzsch et al. 2019. RNA Biology. 16(4):504-517.
[0218] Some Type VI proteins, such as subtype B, have 5’-recognition of D (G, T, A) and a 3’-motif requirement of NAN or NNA. One example is the Cas13b protein identified in Bergeyella zoohelcum (BzCas13b). See e.g., Gleditzsch et al. 2019. RNA Biology. 16(4):504-517.
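The PFS rules stated in [0217] and [0218] can be written as simple predicates. The sketch below encodes only what those paragraphs say: LshCas13a disfavors G immediately 3’ of the target, and BzCas13b prefers D (not C) on the 5’ side and NAN or NNA on the 3’ side. The function names and the convention that each flank is passed as a 5’->3’ string whose nucleotide nearest the protospacer is the last character (5’ flank) or first character (3’ flank) are assumptions for illustration.

```python
def _to_rna(seq: str) -> str:
    """Upper-case and convert DNA-style T to RNA-style U."""
    return seq.upper().replace("T", "U")


def lsh_cas13a_pfs_ok(flank_3p: str) -> bool:
    """True when the nucleotide immediately 3' of the target is not G."""
    flank = _to_rna(flank_3p)
    return len(flank) >= 1 and flank[0] in {"A", "C", "U"}


def bz_cas13b_pfs_ok(flank_5p: str, flank_3p: str) -> bool:
    """True when the 5' flank ends in D (A, G or U) and the 3' flank
    starts with NAN or NNA."""
    five = _to_rna(flank_5p)
    three = _to_rna(flank_3p)
    if len(five) < 1 or len(three) < 3:
        return False
    five_ok = five[-1] in {"A", "G", "U"}  # D = not C
    three_ok = three[1] == "A" or three[2] == "A"  # NAN or NNA
    return five_ok and three_ok


if __name__ == "__main__":
    print(lsh_cas13a_pfs_ok("A"), lsh_cas13a_pfs_ok("G"))
    print(bz_cas13b_pfs_ok("A", "GAG"), bz_cas13b_pfs_ok("C", "GAG"))
```

Proteins reported to have no PFS preference, such as LwaCas13a and PspCas13b, would need no such filter.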
[0219] Overall, Type VI CRISPR-Cas systems appear to have less restrictive rules for substrate (e.g., target sequence) recognition than those that target DNA (e.g., Type V and Type II).
Zinc Finger Nucleases
[0220] In some embodiments, the polynucleotide is modified using a Zinc Finger nuclease or system thereof. One type of programmable DNA-binding domain is provided by artificial zinc-finger (ZF) technology, which involves arrays of ZF modules to target new DNA-binding sites in the genome. Each finger module in a ZF array targets three DNA bases. A customized array of individual zinc finger domains is assembled into a ZF protein (ZFP).
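Because each finger module addresses three DNA bases, a candidate ZF binding site can be viewed as a series of triplets, one per finger. The sketch below only performs that bookkeeping; the function name is hypothetical, and the choice of which finger recognizes which triplet, as well as strand orientation, is left out because the text above does not specify it.

```python
from typing import List


def split_into_triplets(site: str) -> List[str]:
    """Split a DNA binding site into consecutive 3-bp subsites.

    The number of triplets equals the number of finger modules that an
    array would need to cover the site.
    """
    site = site.upper()
    if not site or len(site) % 3 != 0:
        raise ValueError("site length must be a positive multiple of 3")
    return [site[i:i + 3] for i in range(0, len(site), 3)]


if __name__ == "__main__":
    triplets = split_into_triplets("GCGTGGGCG")
    print(len(triplets), triplets)
```

A 9-bp site therefore calls for a three-finger array, and an 18-bp site for six fingers.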
[0221] ZFPs can comprise a functional domain. The first synthetic zinc finger nucleases (ZFNs) were developed by fusing a ZF protein to the catalytic domain of the Type IIS restriction enzyme FokI. (Kim, Y. G. et al., 1994, Chimeric restriction endonuclease, Proc. Natl. Acad. Sci. U.S.A. 91, 883-887; Kim, Y. G. et al., 1996, Hybrid restriction enzymes: zinc finger fusions to Fok I cleavage domain. Proc. Natl. Acad. Sci. U.S.A. 93, 1156-1160). Increased cleavage specificity can be attained with decreased off target activity by use of paired ZFN heterodimers, each targeting different nucleotide sequences separated by a short spacer. (Doyon, Y. et al., 2011, Enhancing zinc-finger-nuclease activity with improved obligate heterodimeric architectures. Nat. Methods 8, 74-79). ZFPs can also be designed as transcription activators and repressors and have been used to target many genes in a wide variety of organisms. Exemplary methods of genome editing using ZFNs can be found for example in U.S. Patent Nos. 6,534,261, 6,607,882, 6,746,838, 6,794,136, 6,824,978, 6,866,997, 6,933,113, 6,979,539, 7,013,219, 7,030,215, 7,220,719, 7,241,573, 7,241,574, 7,585,849, 7,595,376, 6,903,185, and 6,479,626, all of which are specifically incorporated by reference.
TALE Nucleases
[0222] In some embodiments, a TALE nuclease or TALE nuclease system can be used to modify a polynucleotide. In some embodiments, the methods provided herein use isolated, non-naturally occurring, recombinant or engineered DNA binding proteins that comprise TALE monomers or half monomers as a part of their organizational structure that enable the targeting of nucleic acid sequences with improved efficiency and expanded specificity.
[0223] Naturally occurring TALEs or “wild type TALEs” are nucleic acid binding proteins secreted by numerous species of proteobacteria. TALE polypeptides contain a nucleic acid binding domain composed of tandem repeats of highly conserved monomer polypeptides that are predominantly 33, 34 or 35 amino acids in length and that differ from each other mainly in amino acid positions 12 and 13. In advantageous embodiments the nucleic acid is DNA. As used herein, the term “polypeptide monomers”, “TALE monomers” or “monomers” will be used to refer to the highly conserved repetitive polypeptide sequences within the TALE nucleic acid binding domain and the term “repeat variable di-residues” or “RVD” will be used to refer to the highly variable amino acids at positions 12 and 13 of the polypeptide monomers. As provided throughout the disclosure, the amino acid residues of the RVD are depicted using the IUPAC single letter code for amino acids. A general representation of a TALE monomer which is comprised within the DNA binding domain is X_{1-11}-(X_{12}X_{13})-X_{14-33 or 34 or 35}, where the subscript indicates the amino acid position and X represents any amino acid. X_{12}X_{13} indicate the RVDs. In some polypeptide monomers, the variable amino acid at position 13 is missing or absent and in such monomers, the RVD consists of a single amino acid. In such cases the RVD may be alternatively represented as X*, where X represents X_{12} and (*) indicates that X_{13} is absent. The DNA binding domain comprises several repeats of TALE monomers and this may be represented as (X_{1-11}-(X_{12}X_{13})-X_{14-33 or 34 or 35})_z, where in an advantageous embodiment, z is at least 5 to 40. In a further advantageous embodiment, z is at least 10 to 26.
[0224] The TALE monomers can have a nucleotide binding affinity that is determined by the identity of the amino acids in its RVD. For example, polypeptide monomers with an RVD of NI can preferentially bind to adenine (A), monomers with an RVD of NG can preferentially bind to thymine (T), monomers with an RVD of HD can preferentially bind to cytosine (C) and monomers with an RVD of NN can preferentially bind to both adenine (A) and guanine (G). In some embodiments, monomers with an RVD of IG can preferentially bind to T. Thus, the number and order of the polypeptide monomer repeats in the nucleic acid binding domain of a TALE determines its nucleic acid target specificity. In some embodiments, monomers with an RVD of NS can recognize all four base pairs and can bind to A, T, G or C. The structure and function of TALEs is further described in, for example, Moscou et al., Science 326:1501 (2009); Boch et al., Science 326:1509-1512 (2009); and Zhang et al., Nature Biotechnology 29:149-153 (2011).
[0225] The polypeptides used in methods of the invention can be isolated, non-naturally occurring, recombinant or engineered nucleic acid-binding proteins that have nucleic acid or DNA binding regions containing polypeptide monomer repeats that are designed to target specific nucleic acid sequences.
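The RVD-to-base preferences listed in [0224] amount to a lookup table, so the order of monomers can be translated into a predicted target consensus. The sketch below uses only the RVDs named in that paragraph (NI, NG, IG, HD, NN, NS) and expresses ambiguous preferences with IUPAC codes (R for A or G, N for any base). The function and dictionary names are hypothetical, and real binding preferences are graded rather than all-or-nothing.

```python
from typing import Iterable

# Preferences as stated in paragraph [0224]; values are sets of DNA bases.
RVD_TO_BASES = {
    "NI": "A",
    "NG": "T",
    "IG": "T",
    "HD": "C",
    "NN": "AG",    # binds both adenine and guanine
    "NS": "ACGT",  # recognizes all four bases
}

# IUPAC symbols for the base sets used above.
IUPAC_CODE = {"A": "A", "C": "C", "G": "G", "T": "T", "AG": "R", "ACGT": "N"}


def rvds_to_consensus(rvds: Iterable[str]) -> str:
    """Translate an ordered series of RVDs into an IUPAC consensus string."""
    return "".join(IUPAC_CODE[RVD_TO_BASES[rvd.upper()]] for rvd in rvds)


if __name__ == "__main__":
    print(rvds_to_consensus(["NI", "NG", "HD", "NN", "NS"]))
```

An unrecognized RVD raises `KeyError`, which is deliberate here: the table is limited to the RVDs named in the paragraph above rather than guessing preferences for others.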
[0226] As described herein, polypeptide monomers having an RVD of HN or NH preferentially bind to guanine and thereby allow the generation of TALE polypeptides with high binding specificity for guanine containing target nucleic acid sequences. In some embodiments, polypeptide monomers having RVDs RN, NN, NK, SN, NH, KN, HN, NQ, HH, RG, KH, RH and SS can preferentially bind to guanine. In some embodiments, polypeptide monomers having RVDs RN, NK, NQ, HH, KH, RH, SS and SN can preferentially bind to guanine and can thus allow the generation of TALE polypeptides with high binding specificity for guanine containing target nucleic acid sequences. In some embodiments, polypeptide monomers having RVDs HH, KH, NH, NK, NQ, RH, RN and SS can preferentially bind to guanine and thereby allow the generation of TALE polypeptides with high binding specificity for guanine containing target nucleic acid sequences. In some embodiments, the RVDs that have high binding specificity for guanine are RN, NH, RH and KH. Furthermore, polypeptide monomers having an RVD of NV can preferentially bind to adenine and guanine. In some embodiments, monomers having RVDs of H*, HA, KA, N*, NA, NC, NS, RA, and S* bind to adenine, guanine, cytosine and thymine with comparable affinity.
[0227] The predetermined N-terminal to C-terminal order of the one or more polypeptide monomers of the nucleic acid or DNA binding domain determines the corresponding predetermined target nucleic acid sequence to which the polypeptides of the invention will bind. As used herein the monomers and at least one or more half monomers are “specifically ordered to target” the genomic locus or gene of interest. In plant genomes, the natural TALE-binding sites always begin with a thymine (T), which may be specified by a cryptic signal within the non-repetitive N-terminus of the TALE polypeptide; in some cases, this region may be referred to as repeat 0. In animal genomes, TALE binding sites do not necessarily have to begin with a thymine (T) and polypeptides of the invention may target DNA sequences that begin with T, A, G or C. The tandem repeat of TALE monomers always ends with a half-length repeat or a stretch of sequence that may share identity with only the first 20 amino acids of a repetitive full-length TALE monomer and this half repeat may be referred to as a half-monomer. Therefore, it follows that the length of the nucleic acid or DNA being targeted is equal to the number of full monomers plus two.
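The length relationship at the end of [0227] (target length equals the number of full monomers plus two) can be checked with a small design sketch. It assumes that the first target base is addressed by the N-terminal region (repeat 0), the last base by the half-monomer, and every base in between by one full monomer. The base-to-RVD choices (NI for A, NG for T, HD for C, NH for G) are drawn from the preferences described in [0224] and [0226]; the function name, the returned dictionary layout, and the assumption that the half-monomer carries its own RVD are illustrative.

```python
from typing import Dict, List, Union

# One RVD per base, chosen from the preferences described in the text.
BASE_TO_RVD = {"A": "NI", "T": "NG", "C": "HD", "G": "NH"}


def design_tale_rvds(
    target: str, require_5p_t: bool = True
) -> Dict[str, Union[str, List[str]]]:
    """Assign RVDs to a DNA target, separating full monomers from the
    terminal half-monomer.

    The first base is left to the N-terminal region (repeat 0); set
    require_5p_t=False to allow targets that do not begin with T.
    """
    target = target.upper()
    if len(target) < 3:
        raise ValueError("target must contain at least 3 bases")
    if require_5p_t and target[0] != "T":
        raise ValueError("target is expected to begin with T")
    rvds = [BASE_TO_RVD[base] for base in target[1:]]
    return {
        "repeat_0_base": target[0],
        "full_monomers": rvds[:-1],
        "half_monomer": rvds[-1],
    }


if __name__ == "__main__":
    design = design_tale_rvds("TACGT")
    print(design)
    # Target length = number of full monomers + 2
    print(len("TACGT") == len(design["full_monomers"]) + 2)
```

For the 5-base example, three full monomers plus the repeat 0 position and the half-monomer account for all five bases.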
[0228] As described in Zhang et al., Nature Biotechnology 29:149-153 (2011), TALE polypeptide binding efficiency may be increased by including amino acid sequences from the “capping regions” that are directly N-terminal or C-terminal of the DNA binding region of naturally occurring TALEs into the engineered TALEs at positions N-terminal or C-terminal of the engineered TALE DNA binding region. Thus, in certain embodiments, the TALE polypeptides described herein further comprise an N-terminal capping region and/or a C-terminal capping region.
[0229] An exemplary amino acid sequence of an N-terminal capping region is:
[0230] [Amino acid sequence provided as an image in the original publication; not reproduced here] (SEQ ID NO: 1)
[0231] An exemplary amino acid sequence of a C-terminal capping region is:
[0232] [Amino acid sequence provided as an image in the original publication; not reproduced here] (SEQ ID NO: 2)
[0233] As used herein, the predetermined “N-terminus” to “C-terminus” orientation of the N-terminal capping region, the DNA binding domain comprising the repeat TALE monomers, and the C-terminal capping region provides a structural basis for the organization of different domains in the d-TALEs or polypeptides of the invention.
[0234] The entire N-terminal and/or C-terminal capping regions are not necessary to enhance the binding activity of the DNA binding region. Therefore, in certain embodiments, fragments of the N-terminal and/or C-terminal capping regions are included in the TALE polypeptides described herein.
[0235] In certain embodiments, the TALE polypeptides described herein contain an N-terminal capping region fragment that includes at least 10, 20, 30, 40, 50, 54, 60, 70, 80, 87, 90, 94, 100, 102, 110, 117, 120, 130, 140, 147, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260 or 270 amino acids of an N-terminal capping region. In certain embodiments, the N-terminal capping region fragment amino acids are of the C-terminus (the DNA-binding region proximal end) of an N-terminal capping region. As described in Zhang et al., Nature Biotechnology 29:149-153 (2011), N-terminal capping region fragments that include the C-terminal 240 amino acids enhance binding activity equal to the full-length capping region, while fragments that include the C-terminal 147 amino acids retain greater than 80% of the efficacy of the full-length capping region, and fragments that include the C-terminal 117 amino acids retain greater than 50% of the activity of the full-length capping region.
[0236] In some embodiments, the TALE polypeptides described herein contain a C-terminal capping region fragment that includes at least 6, 10, 20, 30, 37, 40, 50, 60, 68, 70, 80, 90, 100, 110, 120, 127, 130, 140, 150, 155, 160, 170, 180 amino acids of a C-terminal capping region. In certain embodiments, the C-terminal capping region fragment amino acids are of the N-terminus (the DNA-binding region proximal end) of a C-terminal capping region. As described in Zhang et al., Nature Biotechnology 29:149-153 (2011), C-terminal capping region fragments that include the C-terminal 68 amino acids enhance binding activity equal to the full-length capping region, while fragments that include the C-terminal 20 amino acids retain greater than 50% of the efficacy of the full-length capping region.
[0237] In certain embodiments, the capping regions of the TALE polypeptides described herein do not need to have identical sequences to the capping region sequences provided herein. Thus, in some embodiments, the capping region of the TALE polypeptides described herein have sequences that are at least 50%, 60%, 70%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identical or share identity to the capping region amino acid sequences provided herein. Sequence identity is related to sequence homology. Homology comparisons may be conducted by eye, or more usually, with the aid of readily available sequence comparison programs. These commercially available computer programs may calculate percent (%) homology between two or more sequences and may also calculate the sequence identity shared by two or more amino acid or nucleic acid sequences. In some preferred embodiments, the capping region of the TALE polypeptides described herein have sequences that are at least 95% identical or share identity to the capping region amino acid sequences provided herein.
[0238] Sequence homologies can be generated by any of a number of computer programs known in the art, which include, but are not limited to, BLAST or FASTA. Suitable computer programs for carrying out alignments like the GCG Wisconsin Bestfit package may also be used. Once the software has produced an optimal alignment, it is possible to calculate % homology, preferably % sequence identity. The software typically does this as part of the sequence comparison and generates a numerical result.
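Once an alignment has been produced, the percent identity calculation described in [0238] is a column count. The sketch below takes two already-aligned sequences (gaps written as "-") and reports the percentage of alignment columns holding identical residues. It does not perform the alignment itself, which BLAST, FASTA or Bestfit would supply. The function name is hypothetical, and the denominator (all columns other than gap-gap columns) is one convention among several used by different programs.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns with identical residues.

    Inputs must be the two rows of a pairwise alignment, equal in length,
    with '-' marking gaps. Columns where both rows are gaps are ignored;
    a gap opposite a residue counts as a non-identical column.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("alignment has no residue columns")
    identical = sum(a == b for a, b in columns)
    return 100.0 * identical / len(columns)


if __name__ == "__main__":
    print(round(percent_identity("MKT-LV", "MKTALV"), 2))
```

A capping region variant could then be compared with a threshold from [0237], for example `percent_identity(row_a, row_b) >= 95.0`.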
[0239] In some embodiments described herein, the TALE polypeptides of the invention include a nucleic acid binding domain linked to the one or more effector domains. The terms “effector domain” or “regulatory and functional domain” refer to a polypeptide sequence that has an activity other than binding to the nucleic acid sequence recognized by the nucleic acid binding domain. By combining a nucleic acid binding domain with one or more effector domains, the polypeptides of the invention may be used to target the one or more functions or activities mediated by the effector domain to a particular target DNA sequence to which the nucleic acid binding domain specifically binds.
[0240] In some embodiments of the TALE polypeptides described herein, the activity mediated by the effector domain is a biological activity. For example, in some embodiments the effector domain is a transcriptional inhibitor (i.e., a repressor domain), such as an mSin interaction domain (SID), SID4X domain or a Krüppel-associated box (KRAB) or fragments of the KRAB domain. In some embodiments the effector domain is an enhancer of transcription (i.e., an activation domain), such as the VP16, VP64 or p65 activation domain. In some embodiments, the nucleic acid binding domain is linked, for example, with an effector domain that includes, but is not limited to, a transposase, integrase, recombinase, resolvase, invertase, protease, DNA methyltransferase, DNA demethylase, histone acetylase, histone deacetylase, nuclease, transcriptional repressor, transcriptional activator, transcription factor recruiting, protein nuclear-localization signal or cellular uptake signal.
[0241] In some embodiments, the effector domain is a protein domain which exhibits activities which include but are not limited to transposase activity, integrase activity, recombinase activity, resolvase activity, invertase activity, protease activity, DNA methyltransferase activity, DNA demethylase activity, histone acetylase activity, histone deacetylase activity, nuclease activity, nuclear-localization signaling activity, transcriptional repressor activity, transcriptional activator activity, transcription factor recruiting activity, or cellular uptake signaling activity. Other preferred embodiments of the invention may include any combination of the activities described herein.
Meganucleases
[0242] In some embodiments, a meganuclease or system thereof can be used to modify a polynucleotide. Meganucleases are endodeoxyribonucleases characterized by a large recognition site (double-stranded DNA sequences of 12 to 40 base pairs). Exemplary methods for using meganucleases can be found in US Patent Nos. 8,163,514, 8,133,697, 8,021,867, 8,119,361, 8,119,381, 8,124,369, and 8,129,134, which are specifically incorporated by reference.
SEQUENCES RELATED TO NUCLEUS TARGETING AND TRANSPORTATION
[0243] In some embodiments, one or more components (e.g., the Cas protein and/or deaminase, Zn Finger protein, TALE, or meganuclease) in the composition for engineering cells may comprise one or more sequences related to nucleus targeting and transportation. Such sequence may facilitate the one or more components in the composition for targeting a sequence within a cell. In order to improve targeting of the CRISPR-Cas protein and/or the nucleotide deaminase protein or catalytic domain thereof used in the methods of the present disclosure to the nucleus, it may be advantageous to provide one or both of these components with one or more nuclear localization sequences (NLSs).
[0244] In some embodiments, the NLSs used in the context of the present disclosure are heterologous to the proteins. Non-limiting examples of NLSs include an NLS sequence derived from: the NLS of the SV40 virus large T-antigen, having the amino acid sequence
[sequence shown as an image in the original] (SEQ ID NO: 3) or [sequence shown as an image in the original] (SEQ ID NO: 4); the NLS from nucleoplasmin (e.g., the nucleoplasmin bipartite NLS with the sequence [sequence shown as an image in the original] (SEQ ID NO: 5)); the c-myc NLS having the amino acid sequence [sequence shown as an image in the original] (SEQ ID NO: 6) or [sequence shown as an image in the original] (SEQ ID NO: 7); the hRNPA1 M9 NLS having the sequence [sequence shown as an image in the original] (SEQ ID NO: 8); the sequence [sequence shown as an image in the original] (SEQ ID NO: 9) of the IBB domain from importin-alpha; the sequences [sequence shown as an image in the original] (SEQ ID NO: 10) and [sequence shown as an image in the original] (SEQ ID NO: 11) of the myoma T protein; the sequence PQPKKKPL (SEQ ID NO: 12) of human p53; the sequence [sequence shown as an image in the original] (SEQ ID NO: 13) of mouse c-abl IV; the sequences [sequence shown as an image in the original] (SEQ ID NO: 14) and [sequence shown as an image in the original] (SEQ ID NO: 15) of the influenza virus NS1; the sequence [sequence shown as an image in the original] (SEQ ID NO: 16) of the Hepatitis virus delta antigen; the sequence [sequence shown as an image in the original] (SEQ ID NO: 17) of the mouse Mx1 protein; the sequence [sequence shown as an image in the original] (SEQ ID NO: 18) of the human poly(ADP-ribose) polymerase; and the sequence [sequence shown as an image in the original]
(SEQ ID NO: 19) of the steroid hormone receptors (human) glucocorticoid. In general, the one or more NLSs are of sufficient strength to drive accumulation of the DNA-targeting Cas protein in a detectable amount in the nucleus of a eukaryotic cell. In general, strength of nuclear localization activity may derive from the number of NLSs in the CRISPR-Cas protein, the particular NLS(s) used, or a combination of these factors. Detection of accumulation in the nucleus may be performed by any suitable technique. For example, a detectable marker may be fused to the nucleic acid-targeting protein, such that location within a cell may be visualized, such as in combination with a means for detecting the location of the nucleus (e.g., a stain specific for the nucleus such as DAPI). Cell nuclei may also be isolated from cells, the contents of which may then be analyzed by any suitable process for detecting protein, such as immunohistochemistry, Western blot, or enzyme activity assay. Accumulation in the nucleus may also be determined indirectly, such as by an assay for the effect of nucleic acid-targeting complex formation (e.g., assay for deaminase activity) at the target sequence, or assay for altered gene expression activity affected by DNA-targeting complex formation and/or DNA-targeting, as compared to a control not exposed to the CRISPR-Cas protein and deaminase protein, or exposed to a CRISPR-Cas and/or deaminase protein lacking the one or more NLSs.
[0245] The CRISPR-Cas and/or nucleotide deaminase proteins may be provided with 1 or more, such as with 2, 3, 4, 5, 6, 7, 8, 9, 10, or more heterologous NLSs. In some embodiments, the proteins comprise about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more NLSs at or near the amino-terminus, about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more NLSs at or near the carboxy-terminus, or a combination of these (e.g., zero or at least one or more NLS at the amino-terminus and zero or at least one or more NLS at the carboxy-terminus). When more than one NLS is present, each may be selected independently of the others, such that a single NLS may be present in more than one copy and/or in combination with one or more other NLSs present in one or more copies. In some embodiments, an NLS is considered near the N- or C-terminus when the nearest amino acid of the NLS is within about 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 40, 50, or more amino acids along the polypeptide chain from the N- or C-terminus. In preferred embodiments of the CRISPR-Cas proteins, an NLS is attached to the C-terminus of the protein.
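The "near the terminus" criterion of [0245] can be expressed as a distance check along the polypeptide chain. The sketch below locates every copy of an NLS in a protein sequence and reports how many residues separate it from each terminus. The function name, the returned fields, and the default cut-off of 50 residues (the largest explicit value listed above) are assumptions; the example uses the p53 sequence PQPKKKPL (SEQ ID NO: 12) because it is the only NLS spelled out in the text, embedded in an artificial protein made up for the demonstration.

```python
from typing import Dict, List, Union


def locate_nls(
    protein: str, nls: str, max_distance: int = 50
) -> List[Dict[str, Union[int, bool]]]:
    """Find each NLS occurrence and flag whether it lies near a terminus.

    Distance is the number of residues between the terminus and the
    nearest amino acid of the NLS (0 means the NLS is at the very end).
    """
    protein, nls = protein.upper(), nls.upper()
    if not nls:
        raise ValueError("nls must be non-empty")
    hits = []
    start = protein.find(nls)
    while start != -1:
        end = start + len(nls)  # exclusive end index
        dist_n = start
        dist_c = len(protein) - end
        hits.append({
            "start": start,
            "dist_from_n_terminus": dist_n,
            "dist_from_c_terminus": dist_c,
            "near_n_terminus": dist_n <= max_distance,
            "near_c_terminus": dist_c <= max_distance,
        })
        start = protein.find(nls, start + 1)
    return hits


if __name__ == "__main__":
    # Artificial protein: Met, the p53 NLS, then 100 alanines.
    demo = "M" + "PQPKKKPL" + "A" * 100
    for hit in locate_nls(demo, "PQPKKKPL", max_distance=10):
        print(hit)
```

Counting the returned hits also gives the number of NLS copies present, which relates to the statement that a single NLS may be present in more than one copy.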
[0246] In certain embodiments, the CRISPR-Cas protein and the deaminase protein are delivered to the cell or expressed within the cell as separate proteins. In these embodiments, each of the CRISPR-Cas and deaminase protein can be provided with one or more NLSs as described herein. In certain embodiments, the CRISPR-Cas and deaminase proteins are delivered to the cell or expressed within the cell as a fusion protein. In these embodiments, one or both of the CRISPR-Cas and deaminase protein is provided with one or more NLSs. Where the nucleotide deaminase is fused to an adaptor protein (such as MS2) as described above, the one or more NLSs can be provided on the adaptor protein, provided that this does not interfere with aptamer binding. In particular embodiments, the one or more NLS sequences may also function as linker sequences between the nucleotide deaminase and the CRISPR-Cas protein.
[0247] In certain embodiments, guides of the disclosure comprise specific binding sites (e.g., aptamers) for adapter proteins, which may be linked to or fused to a nucleotide deaminase or catalytic domain thereof. When such a guide forms a CRISPR complex (e.g., CRISPR-Cas protein binding to guide and target), the adapter proteins bind, and the nucleotide deaminase or catalytic domain thereof associated with the adapter protein is positioned in a spatial orientation which is advantageous for the attributed function to be effective.
[0248] The skilled person will understand that modifications to the guide which allow for binding of the adapter + nucleotide deaminase, but not proper positioning of the adapter + nucleotide deaminase (e.g., due to steric hindrance within the three-dimensional structure of the CRISPR complex) are modifications which are not intended. The one or more modified guides may be modified at the tetra loop, the stem loop 1, stem loop 2, or stem loop 3, as described herein, preferably at either the tetra loop or stem loop 2, and in some cases at both the tetra loop and stem loop 2.
[0249] In some embodiments, a component (e.g., the dead Cas protein, the nucleotide deaminase protein or catalytic domain thereof, or a combination thereof) in the systems may comprise one or more nuclear export signals (NES), one or more nuclear localization signals (NLS), or any combinations thereof. In some cases, the NES may be an HIV Rev NES. In certain cases, the NES may be a MAPK NES. When the component is a protein, the NES or NLS may be at the C terminus of the component. Alternatively, or additionally, the NES or NLS may be at the N terminus of the component. In some examples, the Cas protein and optionally said nucleotide deaminase protein or catalytic domain thereof comprise one or more heterologous nuclear export signal(s) (NES(s)) or nuclear localization signal(s) (NLS(s)), preferably an HIV Rev NES or MAPK NES, preferably C-terminal.
Templates
[0250] In some embodiments, the composition for engineering cells comprises a template, e.g., a recombination template. A template may be a component of another vector as described herein, contained in a separate vector, or provided as a separate polynucleotide. In some embodiments, a recombination template is designed to serve as a template in homologous recombination, such as within or near a target sequence nicked or cleaved by a nucleic acid-targeting effector protein as a part of a nucleic acid-targeting complex.
[0251] In an embodiment, the template nucleic acid alters the sequence of the target position. In an embodiment, the template nucleic acid results in the incorporation of a modified, or non- naturally occurring base into the target nucleic acid.
[0252] The template sequence may undergo a breakage mediated or catalyzed recombination with the target sequence. In an embodiment, the template nucleic acid may include sequence that corresponds to a site on the target sequence that is cleaved by a Cas protein mediated cleavage event. In an embodiment, the template nucleic acid may include sequence that corresponds to both, a first site on the target sequence that is cleaved in a first Cas protein mediated event, and a second site on the target sequence that is cleaved in a second Cas protein mediated event.
[0253] In certain embodiments, the template nucleic acid can include sequence which results in an alteration in the coding sequence of a translated sequence, e.g., one which results in the substitution of one amino acid for another in a protein product, e.g., transforming a mutant allele into a wild type allele, transforming a wild type allele into a mutant allele, and/or introducing a stop codon, insertion of an amino acid residue, deletion of an amino acid residue, or a nonsense mutation. In certain embodiments, the template nucleic acid can include sequence which results in an alteration in a non-coding sequence, e.g., an alteration in an exon or in a 5' or 3' non-translated or non-transcribed region. Such alterations include an alteration in a control element, e.g., a promoter, enhancer, and an alteration in a cis-acting or trans-acting control element.
[0254] A template nucleic acid having homology with a target position in a target gene may be used to alter the structure of a target sequence. The template sequence may be used to alter an unwanted structure, e.g., an unwanted or mutant nucleotide. The template nucleic acid may include sequence which, when integrated, results in: decreasing the activity of a positive control element; increasing the activity of a positive control element; decreasing the activity of a negative control element; increasing the activity of a negative control element; decreasing the expression of a gene; increasing the expression of a gene; increasing resistance to a disorder or disease; increasing resistance to viral entry; correcting a mutation or altering an unwanted amino acid residue conferring, increasing, abolishing or decreasing a biological property of a gene product, e.g., increasing the enzymatic activity of an enzyme, or increasing the ability of a gene product to interact with another molecule.
[0255] The template nucleic acid may include sequence which results in a change in sequence of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 or more nucleotides of the target sequence.
[0256] A template polynucleotide may be of any suitable length, such as about or more than about 10, 15, 20, 25, 50, 75, 100, 150, 200, 500, 1000, or more nucleotides in length. In an embodiment, the template nucleic acid may be 20+/-10, 30+/-10, 40+/-10, 50+/-10, 60+/-10, 70+/-10, 80+/-10, 90+/-10, 100+/-10, 110+/-10, 120+/-10, 130+/-10, 140+/-10, 150+/-10, 160+/-10, 170+/-10, 180+/-10, 190+/-10, 200+/-10, 210+/-10, or 220+/-10 nucleotides in length. In an embodiment, the template nucleic acid may be 30+/-20, 40+/-20, 50+/-20, 60+/-20, 70+/-20, 80+/-20, 90+/-20, 100+/-20, 110+/-20, 120+/-20, 130+/-20, 140+/-20, 150+/-20, 160+/-20, 170+/-20, 180+/-20, 190+/-20, 200+/-20, 210+/-20, or 220+/-20 nucleotides in length. In an embodiment, the template nucleic acid is 10 to 1,000, 20 to 900, 30 to 800, 40 to 700, 50 to 600, 50 to 500, 50 to 400, 50 to 300, 50 to 200, or 50 to 100 nucleotides in length.
[0257] In some embodiments, the template polynucleotide is complementary to a portion of a polynucleotide comprising the target sequence. When optimally aligned, a template polynucleotide might overlap with one or more nucleotides of a target sequence (e.g., about or more than about 1, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100 or more nucleotides). In some embodiments, when a template sequence and a polynucleotide comprising a target sequence are optimally aligned, the nearest nucleotide of the template polynucleotide is within about 1, 5, 10, 15, 20, 25, 50, 75, 100, 200, 300, 400, 500, 1000, 5000, 10000, or more nucleotides from the target sequence.
[0258] The exogenous polynucleotide template comprises a sequence to be integrated (e.g., a mutated gene). The sequence for integration may be a sequence endogenous or exogenous to the cell. Examples of a sequence to be integrated include polynucleotides encoding a protein or a noncoding RNA (e.g., a microRNA). Thus, the sequence for integration may be operably linked to an appropriate control sequence or sequences. Alternatively, the sequence to be integrated may provide a regulatory function.
[0259] An upstream or downstream sequence may comprise from about 20 bp to about 2500 bp, for example, about 50, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400, or 2500 bp. In some methods, the exemplary upstream or downstream sequences have about 200 bp to about 2000 bp, about 600 bp to about 1000 bp, or more particularly about 700 bp to about 1000 bp.
[0261] In certain embodiments, one or both homology arms may be shortened to avoid including certain sequence repeat elements. For example, a 5' homology arm may be shortened to avoid a sequence repeat element. In other embodiments, a 3' homology arm may be shortened to avoid a sequence repeat element. In some embodiments, both the 5' and the 3' homology arms may be shortened to avoid including certain sequence repeat elements.
[0262] In some methods, the exogenous polynucleotide template may further comprise a marker. Such a marker may make it easy to screen for targeted integrations. Examples of suitable markers include restriction sites, fluorescent proteins, or selectable markers. The exogenous polynucleotide template of the disclosure can be constructed using recombinant techniques (see, for example, Sambrook et al., 2001 and Ausubel et al., 1996).
[0263] In certain embodiments, a template nucleic acid for correcting a mutation may be designed for use as a single-stranded oligonucleotide. When using a single-stranded oligonucleotide, 5' and 3' homology arms may range up to about 200 base pairs (bp) in length, e.g., at least 25, 50, 75, 100, 125, 150, 175, or 200 bp in length.
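By way of non-limiting illustration, the layout of a single-stranded oligonucleotide template with 5' and 3' homology arms flanking a single corrected base can be sketched as follows. The function name, its parameters, and the 50-nucleotide default arm length are hypothetical choices made for this sketch; only the upper bound of about 200 nucleotides per arm is taken from the paragraph above.

```python
def design_ssodn(reference: str, edit_pos: int, new_base: str, arm_len: int = 50) -> str:
    """Return a donor sequence: 5' homology arm + corrected base + 3' homology arm.

    `reference` is the unedited target strand, `edit_pos` is the 0-based index
    of the base to replace, and `arm_len` is the requested length of each arm.
    All names and defaults here are illustrative assumptions.
    """
    reference = reference.upper()
    new_base = new_base.upper()
    if len(new_base) != 1 or new_base not in "ACGT":
        raise ValueError("new_base must be a single DNA base")
    if not 0 <= edit_pos < len(reference):
        raise ValueError("edit_pos is outside the reference sequence")
    if not 1 <= arm_len <= 200:
        raise ValueError("arm_len should be between 1 and about 200 nucleotides")

    # Arms are truncated automatically if the edit sits near either end.
    five_prime_arm = reference[max(0, edit_pos - arm_len):edit_pos]
    three_prime_arm = reference[edit_pos + 1:edit_pos + 1 + arm_len]
    return five_prime_arm + new_base + three_prime_arm
```

In practice, arm lengths, strand choice, and the avoidance of repeat elements would be tuned for each target; the sketch only shows how the arms relate to the edited position.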
[0264] In certain embodiments, a template nucleic acid for correcting a mutation may be designed for use with a homology-independent targeted integration system. Suzuki et al. describe in vivo genome editing via CRISPR/Cas9 mediated homology-independent targeted integration (2016, Nature 540:144-149). Schmid-Burgk, et al. describe use of the CRISPR-Cas9 system to introduce a double-strand break (DSB) at a user-defined genomic location and insertion of a universal donor DNA (Nat Commun. 2016 Jul 28;7:12338). Gao, et al. describe “Plug-and-Play Protein Modification Using Homology-Independent Universal Genome Engineering” (Neuron. 2019 Aug 21;103(4):583-597).
RNAi
[0265] In some embodiments, the genetic modulating agents may be interfering RNAs. In certain embodiments, a disease caused by a dominant mutation in a gene is targeted by silencing the mutated gene using RNAi. In some cases, the nucleotide sequence may comprise coding sequence for one or more interfering RNAs. In certain examples, the nucleotide sequence may be interfering RNA (RNAi). As used herein, the term “RNAi” refers to any type of interfering RNA, including but not limited to, siRNA, shRNA, endogenous microRNA and artificial microRNA. For instance, it includes sequences previously identified as siRNA, regardless of the mechanism of down-stream processing of the RNA (i.e., although siRNAs are believed to have a specific method of in vivo processing resulting in the cleavage of mRNA, such sequences can be incorporated into the vectors in the context of the flanking sequences described herein). The term “RNAi” can include both gene silencing RNAi molecules, and also RNAi effector molecules which activate the expression of a gene.
[0266] In certain embodiments, a modulating agent may comprise silencing one or more endogenous genes. As used herein, “gene silencing” or “gene silenced” in reference to an activity of an RNAi molecule, for example a siRNA or miRNA, refers to a decrease in the mRNA level in a cell for a target gene by at least about 5%, about 10%, about 20%, about 30%, about 40%, about 50%, about 60%, about 70%, about 80%, about 90%, about 95%, about 99%, or about 100% of the mRNA level found in the cell without the presence of the miRNA or RNA interference molecule. In one preferred embodiment, the mRNA levels are decreased by at least about 70%, about 80%, about 90%, about 95%, about 99%, or about 100%.
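By way of non-limiting illustration, the percentage decrease in mRNA level used in the above definition can be computed as sketched below. The function names and the choice of 70% as the default threshold (taken from the preferred embodiment) are assumptions made for this sketch; how the two mRNA levels are measured and normalized is outside its scope.

```python
def percent_knockdown(treated_mrna: float, control_mrna: float) -> float:
    """Percent decrease in target mRNA relative to cells without the RNAi molecule."""
    if control_mrna <= 0:
        raise ValueError("control mRNA level must be positive")
    if treated_mrna < 0:
        raise ValueError("treated mRNA level cannot be negative")
    return 100.0 * (1.0 - treated_mrna / control_mrna)


def is_silenced(treated_mrna: float, control_mrna: float, threshold_pct: float = 70.0) -> bool:
    """True when the decrease meets the chosen threshold (illustrative default: 70%)."""
    return percent_knockdown(treated_mrna, control_mrna) >= threshold_pct
```

For example, a target measured at 25 units with the RNAi molecule and 100 units without it corresponds to a 75% decrease, which meets a 70% threshold.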
[0267] As used herein, a “siRNA” refers to a nucleic acid that forms a double stranded RNA, which double stranded RNA has the ability to reduce or inhibit expression of a gene or target gene when the siRNA is present or expressed in the same cell as the target gene. The double stranded RNA siRNA can be formed by the complementary strands. In one embodiment, a siRNA refers to a nucleic acid that can form a double stranded siRNA. The sequence of the siRNA can correspond to the full-length target gene, or a subsequence thereof. Typically, the siRNA is about 15-50 nucleotides in length (e.g., each complementary sequence of the double stranded siRNA is about 15-50 nucleotides in length, and the double stranded siRNA is about 15-50 base pairs in length, preferably about 19-30 nucleotides, preferably about 20-25 nucleotides in length, e.g., 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 nucleotides in length).
[0268] As used herein “shRNA” or “small hairpin RNA” (also called stem loop) is a type of siRNA. In one embodiment, these shRNAs are composed of a short, e.g., about 19 to about 25 nucleotide, antisense strand, followed by a nucleotide loop of about 5 to about 9 nucleotides, and the analogous sense strand. Alternatively, the sense strand can precede the nucleotide loop structure and the antisense strand can follow.
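By way of non-limiting illustration, the antisense-loop-sense arrangement described above can be sketched as follows. The function names and the default 9-nucleotide loop sequence are assumptions chosen for this sketch, and any loop of about 5 to about 9 nucleotides could be substituted; the length limits follow the ranges given in the paragraph above.

```python
# Watson-Crick pairing for RNA bases.
_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence written 5' to 3'."""
    return rna.translate(_COMPLEMENT)[::-1]


def build_shrna(sense: str, loop: str = "UUCAAGAGA", antisense_first: bool = True) -> str:
    """Assemble a hairpin from a sense strand and a loop (illustrative sketch).

    `sense` matches the target mRNA; the antisense strand is its reverse
    complement. The default loop is an assumed example, not a requirement.
    """
    sense = sense.upper().replace("T", "U")
    loop = loop.upper().replace("T", "U")
    if set(sense + loop) - set("ACGU"):
        raise ValueError("sequences may contain only A, C, G, U (or T)")
    if not 19 <= len(sense) <= 25:
        raise ValueError("stem strands are expected to be about 19 to about 25 nucleotides")
    if not 5 <= len(loop) <= 9:
        raise ValueError("loop is expected to be about 5 to about 9 nucleotides")

    antisense = reverse_complement(sense)
    if antisense_first:
        return antisense + loop + sense
    # Alternative orientation: sense strand precedes the loop.
    return sense + loop + antisense
```

Because the two stem strands are reverse complements of each other, the transcript can fold back on itself at the loop in either orientation.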
[0269] The terms “microRNA” or “miRNA”, used interchangeably herein, refer to endogenous RNAs, some of which are known to regulate the expression of protein-coding genes at the posttranscriptional level. Endogenous microRNAs are small RNAs naturally present in the genome that are capable of modulating the productive utilization of mRNA. The term artificial microRNA includes any type of RNA sequence, other than endogenous microRNA, which is capable of modulating the productive utilization of mRNA. MicroRNA sequences have been described in publications such as Lim, et al., Genes & Development, 17, p. 991-1008 (2003), Lim et al., Science 299, 1540 (2003), Lee and Ambros, Science 294, 862 (2001), Lau et al., Science 294, 858-861 (2001), Lagos-Quintana et al., Current Biology, 12, 735-739 (2002), Lagos-Quintana et al., Science 294, 853-857 (2001), and Lagos-Quintana et al., RNA, 9, 175-179 (2003), which are incorporated by reference. Multiple microRNAs can also be incorporated into a precursor molecule. Furthermore, miRNA-like stem-loops can be expressed in cells as a vehicle to deliver artificial miRNAs and short interfering RNAs (siRNAs) for the purpose of modulating the expression of endogenous genes through the miRNA and/or RNAi pathways.
[0270] As used herein, “double stranded RNA” or “dsRNA” refers to RNA molecules that are comprised of two strands. Double-stranded molecules include those comprised of a single RNA molecule that doubles back on itself to form a two-stranded structure. For example, the stem loop structure of the progenitor molecules from which the single-stranded miRNA is derived, called the pre-miRNA (Bartel et al. 2004. Cell 116:281-297), comprises a dsRNA molecule.
SCREENING METHODS
Identifying Novel and Improved Treatments
[0272] In certain embodiments, the cell subset frequency and/or differential cell states (e.g., intrinsic immune response) can be detected for screening of novel therapeutic agents. In certain embodiments, the present invention can be used to identify improved treatments by monitoring the identified cell states in a subject undergoing an experimental treatment. In certain embodiments, an organoid system is used to detect shifts in the identified cell states to identify agents capable of shifting a subject from a severe disease state to a mild/moderate state (see, e.g., Yin X, Mead BE, Safaee H, Langer R, Karp JM, Levy O. Engineering Stem Cell Organoids. Cell Stem Cell. 2016;18(1):25-38). As used herein, the term “organoid” or “epithelial organoid” refers to a cell cluster or aggregate that resembles an organ, or part of an organ, and possesses cell types relevant to that particular organ. Organoid systems have been described previously, for example, for brain, retinal, stomach, lung, thyroid, small intestine, colon, liver, kidney, pancreas, prostate, mammary gland, fallopian tube, taste buds, salivary glands, and esophagus (see, e.g., Clevers, Modeling Development and Disease with Organoids, Cell. 2016 Jun 16;165(7):1586-1597). In certain embodiments, a tissue system or tissue explant is used to detect shifts in the identified cell states to identify agents capable of shifting a subject from a severe disease state to a mild/moderate state (see, e.g., Grivel JC, Margolis L. Use of human tissue explants to study human infectious agents. Nat Protoc. 2009;4(2):256-269). In certain embodiments, an animal model is used to detect shifts in the identified cell states to identify agents capable of shifting a subject from a severe disease state to a mild/moderate state (see, e.g., Munoz-Fontela C, Dowling WE, Funnell SGP, et al. Animal models for COVID-19. Nature. 2020;586(7830):509-515).
[0273] In certain embodiments, candidate agents are screened. The term “agent” broadly encompasses any condition, substance or agent capable of modulating one or more phenotypic aspects of a cell or cell population as disclosed herein. Such conditions, substances or agents may be of physical, chemical, biochemical and/or biological nature. The term “candidate agent” refers to any condition, substance or agent that is being examined for the ability to modulate one or more phenotypic aspects of a cell or cell population as disclosed herein in a method comprising applying the candidate agent to the cell or cell population (e.g., exposing the cell or cell population to the candidate agent or contacting the cell or cell population with the candidate agent) and observing whether the desired modulation takes place.
[0274] Agents may include any potential class of biologically active conditions, substances or agents, such as for instance antibodies, proteins, peptides, nucleic acids, oligonucleotides, small molecules, or combinations thereof, as described herein.
[0275] The terms “therapeutic agent”, “therapeutic capable agent” or “treatment agent” are used interchangeably and refer to a molecule or compound that confers some beneficial effect upon administration to a subject. The beneficial effect includes enablement of diagnostic determinations; amelioration of a disease, symptom, disorder, or pathological condition; reducing or preventing the onset of a disease, symptom, disorder or condition; and generally counteracting a disease, symptom, disorder or pathological condition.
[0276] In certain embodiments, the present invention provides for gene signature screening to identify agents that shift expression of the gene targets described herein (e.g., cell subset markers and differentially expressed genes). The concept of signature screening was introduced by Stegmaier et al. (Gene expression-based high-throughput screening (GE-HTS) and application to leukemia differentiation. Nature Genet. 36, 257-263 (2004)), who realized that if a gene-expression signature was the proxy for a phenotype of interest, it could be used to find small molecules that effect that phenotype without knowledge of a validated drug target. The gene signatures or biological programs of the present invention may be used to screen for drugs that reduce the signature or biological program in cells as described herein.
[0277] The Connectivity Map (CMap) is a collection of genome-wide transcriptional expression data from cultured human cells treated with bioactive small molecules and simple pattern-matching algorithms that together enable the discovery of functional connections between drugs, genes and diseases through the transitory feature of common gene-expression changes (see, Lamb et al., The Connectivity Map: Using Gene-Expression Signatures to Connect Small Molecules, Genes, and Disease. Science 29 Sep 2006: Vol. 313, Issue 5795, pp. 1929-1935, DOI: 10.1126/science.1132939; and Lamb, J., The Connectivity Map: a new tool for biomedical research. Nature Reviews Cancer January 2007: Vol. 7, pp. 54-60). In certain embodiments, CMap can be used to identify small molecules capable of modulating a gene signature or biological program of the present invention in silico.
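By way of non-limiting illustration, the idea of matching a query signature against compound-induced expression profiles can be sketched with a deliberately simplified score. This is not the Connectivity Map scoring method, which relies on a rank-based enrichment statistic; the mean-difference score, the function names, and the placeholder gene and compound names below are all assumptions made for this sketch.

```python
from statistics import mean


def reversal_score(profile: dict, up_genes: list, down_genes: list) -> float:
    """Mean change of signature-up genes minus mean change of signature-down genes.

    `profile` maps gene symbol -> log fold change after compound treatment.
    A strongly negative score suggests the compound opposes the signature.
    """
    up = [profile[g] for g in up_genes if g in profile]
    down = [profile[g] for g in down_genes if g in profile]
    if not up or not down:
        raise ValueError("profile must cover at least one up and one down gene")
    return mean(up) - mean(down)


def rank_compounds(profiles: dict, up_genes: list, down_genes: list) -> list:
    """Return (compound, score) pairs sorted from most reversing to most mimicking."""
    scores = {name: reversal_score(p, up_genes, down_genes) for name, p in profiles.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical inputs: placeholder genes and compounds, not data from the disclosure.
signature_up = ["GENE_A", "GENE_B"]
signature_down = ["GENE_C", "GENE_D"]
toy_profiles = {
    "compound_1": {"GENE_A": -2.0, "GENE_B": -1.0, "GENE_C": 1.5, "GENE_D": 0.5},
    "compound_2": {"GENE_A": 1.0, "GENE_B": 1.0, "GENE_C": -1.0, "GENE_D": -1.0},
}
ranked = rank_compounds(toy_profiles, signature_up, signature_down)
```

With these toy inputs, the compound that lowers the up-regulated genes and raises the down-regulated genes ranks first as the candidate most likely to reduce the signature.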
[0278] Further embodiments are illustrated in the following Examples which are given for illustrative purposes only and are not intended to limit the scope of the invention.
EXAMPLES
Example 1 - Defining Cellular Diversity in the Human Nasopharynx (Nasopharyngeal Mucosa)
[0279] Here, Applicants present a comprehensive analysis of the cellular phenotypes in the nasal mucosa during early SARS-CoV-2 infection. To achieve this, Applicants developed tissue handling protocols that enabled high-quality scRNA-seq from frozen nasopharyngeal swabs collected from a large patient cohort (n = 58) and created a detailed map of epithelial and immune cell diversity. Applicants found that SARS-CoV-2 infection leads to a dramatic loss of mature ciliated cells, which is associated with secretory cell expansion, differentiation, and the accumulation of deuterosomal cell intermediates - potentially involved in the compensatory repopulation of damaged ciliated epithelium. While Applicants observe broad induction of interferon-responsive and anti-viral genes in cells from individuals with mild/moderate COVID-19, severe COVID-19 is characterized by a dramatically blunted interferon response, and mucosal recruitment of highly inflammatory myeloid populations, which represent the primary sources of tissue pro-inflammatory cytokines including TNF, IL1B, and CXCL8. Further, using unbiased whole-transcriptomic amplification, Applicants map not only host cellular RNA, but also cell-associated SARS-CoV-2 RNA, allowing Applicants to trace viral tropism to specific epithelial subsets and identify host pathways linked with susceptibility or resistance to viral infection. Together, the data suggest that an early failure of intrinsic anti-viral immunity among nasal epithelial cells responding to SARS-CoV-2 infection may underlie and predict progression to severe COVID-19.
[0280] Nasopharyngeal (NP) swabs were collected from 58 individuals from the University of Mississippi Medical Center (UMMC) between April and September 2020. This cohort consisted of 35 individuals who had a positive SARS-CoV-2 PCR NP swab on the day of hospital presentation. A Control group consisted of 15 individuals who were asymptomatic and had a negative SARS-CoV-2 NP PCR, 6 intubated individuals in the intensive care unit without a recent history of COVID-19 and negative SARS-CoV-2 NP PCR, and 2 additional individuals with recent history of COVID-19 and negative SARS-CoV-2 NP PCR, classified as “Convalescent” (Table 6, see Methods for full inclusion and exclusion criteria). 38 individuals were diagnosed with COVID-19, and nasopharyngeal swabs were collected within the first 3 days following admission to the hospital. Using the World Health Organization (WHO) guidelines for stratification and classification of COVID-19 severity based on the level of required respiratory support, 16 of the individuals were considered COVID-19 mild/moderate (WHO score 1-5) and 22 had severe COVID-19 (WHO score 6-8) (see Methods, Table 6, Figures 7A, 7B for complete demographic and clinical information). Grouping by WHO score reflects the peak disease severity, rather than the severity at the moment samples were collected: Applicants grouped individuals with COVID-19 based on the maximum (“peak”) level of required respiratory support (World Health Organization, 2020). Samples from the nasopharyngeal epithelium were taken by a trained healthcare provider and rapidly processed and cryopreserved to maintain cellular viability (Figure 1A, Figure 7C). Swabs were later processed to recover single-cell suspensions (mean +/- SEM: 57,000 +/- 15,000 total cells recovered per swab), before generating single-cell transcriptomes using the Seq-Well S3 platform (refs. 44-46).
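By way of non-limiting illustration, the grouping of COVID-19 participants by peak WHO score described above (1-5 mild/moderate, 6-8 severe) can be sketched as follows. The function names and the list-of-scores input are assumptions made for this sketch; the score cut-offs are those stated in the preceding paragraph.

```python
def peak_who_score(scores_over_time: list) -> int:
    """Peak severity is the maximum WHO score recorded during the disease course."""
    if not scores_over_time:
        raise ValueError("at least one WHO score is required")
    return max(scores_over_time)


def covid_severity_group(peak_score: int) -> str:
    """Map a peak WHO score to the severity group used for COVID-19 participants."""
    if 1 <= peak_score <= 5:
        return "mild/moderate"
    if 6 <= peak_score <= 8:
        return "severe"
    raise ValueError("COVID-19 participants are expected to have a WHO score of 1-8")


# Hypothetical participant whose support requirement peaked after sampling.
example_group = covid_severity_group(peak_who_score([3, 4, 7, 5]))
```

A participant sampled at a score of 3 who later required intubation is therefore grouped as severe, consistent with grouping by peak rather than sampling-time severity.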
[0281] Among all COVID-19 and Control samples, Applicants recovered 32,871 genes across 32,588 cells (following filtering and quality control), with an average recovery of 562 +/- 69 cells per swab (mean +/- SEM). Among recovered cells, Applicants found roughly equivalent transcriptomic quality following uniform preprocessing steps and filtering (see Methods) between COVID-19 and Control participants, despite high variability in cellular recovery and quality of recovered cells between participants (Figures 7D, 7E). Following dimensionality reduction and clustering approaches to resolve individual cell types and cell states, Applicants annotated 18 clusters corresponding to distinct cell types across immune and epithelial identities (Figures 1B-1E, Table 1). As tissue sampling relied on surface-resident cells that were gently scraped off of the nasopharyngeal epithelium, Applicants did not expect to recover stromal cell populations such as endothelial cells, fibroblasts, or pericytes, which were found in previous scRNA-seq datasets from nasal epithelial surgical samples (refs. 47, 48). Among epithelial cell types, Applicants readily identified both Basal Cells, by their expression of canonical marker genes including TP63, KRT15, and KRT5, and Mitotic Basal Cells, based on the added expression of genes involved in the cell cycle such as MKI67 and TOP2A (Figure 1F). Applicants resolved large populations of both Secretory Cells and Goblet Cells, identified by expression of KRT7, CXCL17, F3, AQP5, and CP. Despite strong transcriptional similarity between Secretory and Goblet Cells, Applicants distinguished between the two cell types based on expression of MUC5AC, which defines Goblet Cells, and BPIFA1, which Applicants found primarily expressed within Secretory Cell types and diminished in MUC5AC-high cells. Applicants also designated a small population of cells Developing Secretory and Goblet Cells based on their lower expression of classic Secretory/Goblet Cell genes, as well as persistent expression of some Basal Cell markers (e.g., persistent COL7A1 and DST expression, but diminishing KRT5/KRT15 expression). Goblet and secretory cells were thus separated into MUC5AC-expressing goblet cells and BPIFA1-expressing secretory cells. Applicants also resolved a population of ionocytes, a recently-identified specialized subtype of secretory cell present in respiratory epithelia defined by expression of transcription factors FOXI1 and FOXI2, as well as CFTR, and thus thought to play a role in mucus viscosity (refs. 49, 50). Squamous cells were identified by their expression of SCEL, as well as multiple SPRR genes, and likely derive from pharyngeal/oral squamous cells as well as those within the nasal epithelium. Applicants also recovered a very small population of cells Applicants term “Enteroendocrine Cells”, based on unique expression of gastric inhibitory polypeptide (GIP), which is typically produced by intestinal and gastric enteroendocrine cells, and LGR5, which classically marks stem cell populations in the gastrointestinal mucosa.
[0282] Ciliated cells were the most numerous epithelial cell type recovered in this dataset, defined by expression of transcription factor FOXJ1 as well as numerous genes involved in the formation of cilia, e.g., DLEC1, DNAH11, and CFAP43. Similar to intermediate/developing cells of the secretory and goblet lineage, Applicants also identified two populations of precursor ciliated cells. One, termed Developing Ciliated Cells, expressed canonical Ciliated Cell genes such as FOXJ1, CAPSL, and PIFO, though at lower levels than mature Ciliated Cells and without the expression of cilia-forming genes. Applicants also identified a cluster defined by expression of DEUP1, which is critical for centriole amplification as a precursor to cilium assembly. Together with co-expression of CCNO, CDC20, FOXN4, and HES6, these cells match a recently-defined cell type termed Deuterosomal Cells (ref. 48), which represent an intermediate cell type as Secretory Cells trans-differentiate into Ciliated Cells.
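By way of non-limiting illustration, the marker genes named above can be used to label a cell by its highest-scoring marker set, as sketched below. This is a simplified marker-scoring example and is not the dimensionality reduction and clustering approach used to annotate the dataset; the function name, the mean-expression score, and the example expression values are assumptions made for this sketch, and only a subset of the named markers is included.

```python
# Marker genes as named in the description above (subset, for illustration).
MARKERS = {
    "Basal Cells": ["TP63", "KRT15", "KRT5"],
    "Goblet Cells": ["MUC5AC"],
    "Secretory Cells": ["BPIFA1"],
    "Ionocytes": ["FOXI1"],
    "Squamous Cells": ["SCEL"],
    "Ciliated Cells": ["FOXJ1", "DLEC1", "DNAH11", "CFAP43"],
    "Deuterosomal Cells": ["DEUP1", "CCNO", "CDC20", "FOXN4", "HES6"],
}


def annotate_cell(expression: dict) -> str:
    """Label a cell by the marker set with the highest mean expression.

    `expression` maps gene symbol -> normalized expression for one cell;
    genes that are absent are treated as not detected (0.0).
    """
    scores = {
        cell_type: sum(expression.get(gene, 0.0) for gene in genes) / len(genes)
        for cell_type, genes in MARKERS.items()
    }
    best = max(scores, key=scores.get)
    # Cells with no marker signal at all are left unlabeled.
    return best if scores[best] > 0 else "Unassigned"
```

A per-cell rule of this kind ignores dropout and shared markers, which is one reason cluster-level annotation is generally preferred for sparse single-cell data.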
[0283] Immune cells represent a minority of recovered cells, yet Applicants resolved multiple distinct clusters and cell types, representing major myeloid and lymphoid populations. Among lymphoid cells, Applicants recovered T cells, identified by CD3E, CD2, and TRBC2 expression, and B cells, identified by MS4A1, CD79A, and CD79B expression. Among myeloid cell types, Applicants recovered a large population of Macrophages (CD14, FCGR3A, VCAN), Dendritic Cells (CCR7, CD86), and Plasmacytoid DCs (IRF7, IL3RA). Relative to true tissue-resident abundances, Applicants under-recovered granulocyte populations, likely due to the intrinsic fragility of these cell types and the cryopreservation methods required in the sample pipeline. Applicants recovered a very small population of Mast Cells, defined by expression of GATA2, TPSB2, and PTGS2. Among two samples, Applicants recovered Erythroblast-like cells, defined by expression of hemoglobin subunits including HBB and HBA2. With the exception of Erythroblasts, each cell type was represented by cells from numerous participants, and from each participant Applicants recovered a diversity of cell types and states, though the cellular composition was highly variable between distinct individuals (Figure 1G, 1H).
[0284] Applicants directly tested whether cell types collected from nasal swabs following cryopreservation were representative of cellular composition extracted from a freshly swabbed nasal epithelium, or if certain cell types were lost during freezing (Figure 7F-7K). Recovery of viable cells, technical metrics of single-cell library quality, and cellular proportions after clustering and analysis were all largely stable between matched fresh and cryopreserved swabs taken from the same individual. Importantly, no “new” cell types were recovered from the freshly processed samples (from healthy participants), thus supporting adequate data representation of the nasal mucosa even following on-swab cryopreservation.
[0285] Applicants interrogated each cell type for its expression of host factors utilized by common respiratory viruses for cellular entry (Figure 1I; refs. 35, 51-55). Applicants found ACE2 expression highest among Secretory Cells and Goblet Cells, and to a lesser extent on Ciliated Cells, Developing Ciliated Cells, Deuterosomal Cells, and Squamous Cells - suggesting these cells are likely targets for SARS-CoV-2 (and other betacoronaviruses that use ACE2 as their primary cellular entry factor). SARS-CoV-2 spike protein requires “priming” or cleavage by host proteases to enable membrane fusion and viral release into the cell. Since early 2020, researchers have identified TMPRSS2, TMPRSS4, CTSL, and FURIN as capable of spike protein cleavage and critical for viral entry (ref. 51). TMPRSS2, thought to be the principal host factor for SARS-CoV-2 S cleavage, is found in highest abundance on Squamous Cells, followed by modest expression on all other epithelial cell types. Similarly, CTSL (and other cathepsins) was found across diverse epithelial and myeloid cell types. ANPEP and DPP4, host receptors targeted by other human coronaviruses causing upper respiratory diseases, are found primarily on Goblet Cells and Secretory Cells. As expected, CDHR3, the receptor utilized by Rhinovirus C, is found primarily on Ciliated Cells and Developing Ciliated Cells.
[0286] Next, Applicants binned both Control and COVID-19 participants by their level of respiratory support according to the WHO scoring system: Control WHO 0 (comprising healthy SARS-CoV-2 PCR negative participants, n=15), Control WHO 7-8 (SARS-CoV-2 PCR negative, intubated participants treated in the ICU for non-COVID-19 diagnoses, n=6), COVID-19 WHO 1-5 (SARS-CoV-2 PCR positive, mild/moderate disease, n=14), and COVID-19 WHO 6-8 (SARS-CoV-2 PCR positive, intubated, severe disease, n=21). Applicants compared proportional cell type abundances from the coarse cell type annotations across these four disease cohorts (Figure 1J-1N). Applicants found that the abundance of Ciliated Cells (all, coarse annotation) was significantly impacted by cohort (Bonferroni-corrected p = 0.025) and was significantly reduced among COVID-19 WHO 6-8 participants compared to healthy controls (mean +/- SEM 17.1 +/- 3.6 % of COVID-19 WHO 6-8 samples were Ciliated Cells, compared to 46.7 +/- 7.4 % of Control WHO 0, p < 0.01) (Figure 1N). Deuterosomal cells, which represent a developmental intermediate as secretory/goblet cells trans-differentiate into ciliated cells, were significantly increased among Control WHO 7-8, COVID-19 WHO 1-5, and COVID-19 WHO 6-8 samples, with the strongest increases observed from participants with severe COVID-19 compared to healthy controls (Figure 1L). Likewise, Developing Ciliated Cells were significantly increased among participants with severe COVID-19 (Figure 1M). Secretory cells were also dramatically increased among all COVID-19 participants compared to non-COVID-19 controls: 20.4 +/- 5.0% (mean +/- SEM) of all epithelial cells were Secretory Cells within severe COVID-19 participants, mild/moderate COVID-19 participants contained 8.3 +/- 2.8% Secretory Cells, and on average fewer than 4% of cells per participant were Secretory among either Control WHO 0 or Control WHO 7-8 samples (Figure 1K).
Changes in Goblet Cell abundance, however, did not reach significance, although Goblet Cells were substantially increased in a subset of participants from the COVID-19 mild/moderate and severe groups (Figure 1J). Intriguingly, expansion of secretory cells and loss of ciliated cells resulted in a net gain in epithelial diversity, as measured by Simpson's index, which quantifies the richness of the epithelial “ecosystem” (Figure 1O).
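As an illustration of the diversity measure referred to above, the following is a minimal sketch of a Simpson-type index computed from per-cell type labels. The source does not state which form of Simpson's index was used; the sketch assumes the Gini-Simpson form (one minus the sum of squared proportions), and the function name and example labels are hypothetical.

```python
from collections import Counter


def simpson_diversity(cell_type_labels):
    """Gini-Simpson diversity, 1 - sum(p_i ** 2), over cell type labels.

    ASSUMPTION: the source only says "Simpson's index"; the exact variant
    is not specified, so this form is illustrative.
    """
    counts = Counter(cell_type_labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one cell label is required")
    # Sum of squared proportions is the probability that two cells drawn
    # with replacement share a type; its complement grows with diversity.
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


# Hypothetical samples: one dominated by a single type, one evenly mixed.
dominated = ["Ciliated"] * 4
mixed = ["Ciliated", "Ciliated", "Secretory", "Secretory",
         "Goblet", "Goblet", "Basal", "Basal"]
print(simpson_diversity(dominated), simpson_diversity(mixed))
```

Under this form, a sample made of a single cell type scores 0, and the score rises as cells are spread more evenly over more types.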
Example 2 - Epithelial Diversity and Remodeling Following SARS-CoV-2 Infection
[0287] Next, Applicants sought to more completely delineate the diversity of epithelial cells through iterative clustering and sub-clustering among epithelial cell types (see Methods). This enabled Applicants to divide the 10 “Coarse” epithelial cell types into 25 “Detailed” cell types/states (Figure 2A-2E, Figure 8A, Table 1). Among some cell types, Applicants did not find additional within-type diversity, and thus the “Coarse” annotations (Figure 2A) are equivalent to the “Detailed” identities (Figure 2D). This applied to Ionocytes, Deuterosomal Cells, Developing Secretory and Goblet Cells, Basal Cells, Mitotic Basal Cells, and Developing Ciliated Cells. Applicants split Goblet Cells (Coarse annotation) into 4 distinct Detailed annotations: MUC5AC high Goblet Cells (which lacked additional specialized markers beyond classic Goblet Cell identifiers), SCGB1A1 high Goblet Cells, AZGP1 high Goblet Cells, and AZGP1 SCGB3A1 LTF high Goblet Cells (each named by a representative defining marker or marker set). Secretory Cells were divided into 6 distinct Detailed subtypes: SERPINB11 high Secretory Cells (which, similar to MUC5AC high Goblet Cells, represented a more “generic” Secretory Cell phenotype), BPIFA1 high Secretory Cells, Early Response Secretory Cells (which expressed genes such as JUN, EGR1, FOS, NR4A1), KRT24 KRT13 high Secretory Cells (which are highly similar to previously-described KRT13+ “hillock” cells), BPIFA1 and Chemokine high Secretory Cells (example chemokines include CXCL8, CXCL2, CXCL1, and CXCL3), and Interferon Responsive Secretory Cells (defined by higher expression of broad anti-viral genes including IFITM3, IFI6, and MX1).
Subsets of Squamous Cells were also found - detailed Squamous Cell subtypes include CCL5 high Squamous Cells, VEGFA high Squamous cells (which express multiple vascular endothelial genes including VEGFA and VWF), SPRR2D high Squamous Cells (which, in addition to SPRR2D, express the highest abundances of multiple SPRR- genes including SPRR2A, SPRR1B, SPRR2E, and SPRR3), and HOPX high Squamous Cells. Finally, Ciliated Cells could be further divided into 5 distinct subtypes: Interferon Responsive Ciliated Cells (expressing anti-viral genes similar to other “Interferon Responsive” subsets, such as IFIT1, IFIT3, IFI6), FOXJ1 high Ciliated Cells, Early Response FOXJ1 high Ciliated Cells (which, in addition to high FOXJ1, also express higher abundances of genes such as JUN, EGR1, FOS than other ciliated cell subtypes), Cilia high Ciliated Cells (which broadly express the highest abundances of structural cilia genes, such as DLEC1 and CFAP100), and BEST4 high Cilia high Ciliated Cells (which, in addition to cilia components, also express the ion channel BEST4).
[0288] Here, Applicants again examined the epithelial subtypes for their expression of host entry factors which facilitate viral entry among common upper respiratory pathogens (Figure 8B). ACE2 was previously identified as highest among Secretory, Goblet, and Ciliated Cells 35,36 - here Applicants observe substantial within-cell type heterogeneity in ACE2 expression among each of these cell types. Notably, among Goblet cells, AZGP1 high Goblet Cells express the highest abundance of ACE2 mRNA, suggesting this cell type may be a preferential target for SARS-CoV-2 infection. Likewise, Early Response Secretory Cells, KRT24 KRT13 high Secretory Cells, and Interferon Responsive Secretory cells all express elevated abundances of ACE2, and many other Secretory and Goblet Cell types express detectable, but lower, levels of ACE2. Similarly, multiple detailed subsets of Ciliated Cells expressed ACE2; however, Cilia high and BEST4 high Cilia high Ciliated Cells notably did not appear to be actively transcribing ACE2 mRNA.
[0289] To map the differentiation trajectories and lineage relationships between epithelial cell types, Applicants analyzed single-cell RNA velocity (scVelo) across all epithelial cells. RNA velocity analysis leverages the dynamic relationships between expression of unspliced (intron- containing) and spliced (exonic) RNA across thousands of variable genes, enabling 1) estimation of the directionality of transitions between distinct cells and cell types, and 2) identification of putative driver genes behind these transitions. Overlaying the UMAPs of cell type identities and associated metadata in Figures 2A-2D, vector fields (black lines and arrows) represent a smoothed estimate of cellular transitions based on RNA velocity. Globally, RNA velocity appropriately places Basal Cells and Mitotic Basal Cells as the “root” or “origin” of cellular transitions, which then progress through the Developing Secretory and Goblet Cells to the Secretory Cells and Goblet Cells. Applicants hypothesize that the squamous cells recovered in this dataset arise from a distinct set of basal cells present in oral/upper esophageal mucosa, therefore their differentiation intermediates and trajectory are poorly represented here. Likewise, Applicants do not recover intermediate cell types for Ionocytes, so cannot trace their development from basal cells. Developing Ciliated Cells and Ciliated Cells are placed “later” in the differentiation trajectory, distal to development of both Secretory and Deuterosomal Cells, which is consistent with current models where ciliated cells represent a terminally differentiated state and may arise from these precursor cell types. By visualizing spliced and unspliced forms of representative markers underlying ciliated cell development, Applicants can visualize the transition from precursor Secretory Cell to Deuterosomal Cells to Developing Ciliated Cells, and finally mature Ciliated Cells differentiation (Figure 8C).
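The spliced/unspliced relationship that RNA velocity exploits can be sketched for a single gene as follows. This is a deliberately simplified steady-state illustration, not the scVelo implementation: it assumes the unspliced-to-spliced ratio at steady state can be fit by a least-squares line through the origin over all cells, and it takes velocity as the residual of unspliced abundance from that line. All function names and numbers are hypothetical.

```python
def fit_steady_state_ratio(spliced, unspliced):
    """Least-squares slope through the origin of unspliced on spliced counts.

    ASSUMPTION: a simple all-cell fit; published tools use more refined
    fitting (for example, extreme quantiles or full dynamical models).
    """
    denominator = sum(s * s for s in spliced)
    if denominator == 0:
        raise ValueError("spliced counts are all zero; ratio is undefined")
    return sum(s * u for s, u in zip(spliced, unspliced)) / denominator


def velocity(unspliced_count, spliced_count, gamma):
    """Positive when a cell has more unspliced RNA than steady state predicts
    (gene being induced); negative when it has less (gene being repressed)."""
    return unspliced_count - gamma * spliced_count


# Hypothetical per-cell abundances for one gene.
spliced = [1.0, 2.0, 3.0, 4.0]
unspliced = [0.5, 1.0, 1.5, 2.0]
gamma = fit_steady_state_ratio(spliced, unspliced)
print(gamma, velocity(2.0, 2.0, gamma))
```

Aggregating such per-gene velocities across many variable genes is what allows a direction of transition to be estimated for each cell.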
[0290] Applicants next mapped and visualized developmental transitions and relationships between Basal, Goblet, and Secretory cell subtypes from the detailed cluster annotations (Figure 2F-2I). As observed when considering all epithelial cells (Figure 2A), Basal Cells and Mitotic Basal Cells were accurately predicted to represent the “root” of this differentiation trajectory. From here, TP63, KRT5 and LGR6 expression gradually declines across Basal and Developing Secretory and Goblet Cells, while expression of Secretory and Goblet Cell specific markers such as KRT7 and AQP5 progressively increases. The transition from Basal to Secretory and Goblet cell types through Developing Secretory and Goblet Cells is marked by transient upregulation of FGFR3 and progressive downregulation of EGFR. Notably, transitions between detailed Secretory and detailed Goblet cells are substantially less linear than among the coarse cell types or as seen in ciliated cells. RNA velocity curves predict multiple routes for development between distinct subtypes. This observation is consistent with the current understanding of respiratory secretory cell plasticity and capacity for de-differentiation.
[0291] Ciliated Cell subtypes were analyzed by their RNA velocity and pseudotemporal ordering in the same manner. Here, a focused UMAP with only Developing Ciliated Cells and Ciliated Cells is presented and overlaid with vector fields representing RNA velocity transitions (Figures 2J-2M). The velocity pseudotime predicts progression from Developing Ciliated Cells, to FOXJ1 high Ciliated Cells, to BEST4 high Cilia high Ciliated Cells, and terminating in Cilia high Ciliated Cells (Figure 2M). Interferon Responsive Ciliated Cells and Early Response FOXJ1 high Ciliated Cells represent phenotypic deviations from this ordered progression, and therefore appear collapsed/unresolved along this trajectory with the same pseudotime range as FOXJ1 high Ciliated Cells.
[0292] Applicants next connected the composition of the detailed nasal epithelial microenvironment to the disease status of the participant (Figures 2N-2Q). Applicants mapped epithelial cell diversity and differentiation trajectories as before, including either cells from SARS-CoV-2 negative participants (Figure 2P) or cells from SARS-CoV-2 positive participants (Figure 2Q). Notably, cells from Control participants poorly populated the intermediate regions that bridge Secretory and Goblet Cell types to mature Ciliated Cells. Conversely, regions annotated as multiple Secretory Cell subsets and Developing Ciliated Cells were uniquely captured from COVID-19 participants. Dysregulated abundances of mature ciliated cell subsets were also observed, with decreased proportions of both Cilia high and BEST4 high Cilia high Ciliated Cells (representing the most terminally differentiated branches of ciliated cell development) among COVID-19 participants compared to healthy controls (Figure 2O). Interferon Responsive Ciliated Cells were substantially increased among COVID-19 participants - averaging 15.9% of all epithelial cells among mild/moderate COVID-19 participants, compared to fewer than 1% among healthy controls. Among Secretory cell subtypes, BPIFA1 high Secretory cells were significantly elevated among participants with severe COVID-19, as were KRT13 KRT24 high Secretory Cells (Figure 2N). Goblet Cells, Ionocytes, and Squamous Cells were largely unchanged by cohort; however, SCGB1A1 high Goblet Cells were modestly increased among both mild/moderate and severe COVID-19 participants (Figure 8D).
[0293] Together, the analysis defines both the cellular diversity among cells collected from nasopharyngeal swabs, as well as the nuanced developmental relationships between epithelial cells of the upper airway. Further, Applicants observe substantial expansion of immature/intermediate and specialized subtypes of secretory, goblet, and ciliated cells during COVID-19, presumably as a result of direct viral targeting and pathology, as well as part of the intrinsic capacity of the nasal epithelium to regenerate and repopulate following damage.
Example 3 - Alterations to Nasal Mucosal Immune Populations in COVID-19
[0294] As with epithelial cells, Applicants further clustered and annotated detailed immune cell populations. Multiple cell types could not be further subdivided from their coarse annotation (Figure 1B, Figure 9A-9E), including Mast Cells, Plasmacytoid DCs, B Cells, and Dendritic Cells. Among Macrophages (coarse annotation), Applicants resolved 5 distinct subtypes (Figure 9B). FFAR4 high Macrophages were defined by expression of FFAR4, MRC1, CHIT1, and SIGLEC11, as well as chemotactic factors including CCL18, CCL15, genes involved in leukotriene synthesis (ALOX5, ALOX5AP, LTA4H), and toll-like receptors TLR8 and TLR2 (Table 1, Figure 9F). Interferon Responsive Macrophages were distinguished by elevated expression of anti-viral genes such as IFIT3, IFIT2, ISG15, and MX1, akin to the epithelial subsets labeled “Interferon Responsive”, along with CXCL9, CXCL10, CXCL11, which are likely indicative of IFNγ stimulation. MSR1 C1QB high Macrophages are defined by cathepsin expression (CTSD, CTSL, CTSB) and elevated expression of complement (C1QB, C1QA, C1QC), and lipid binding proteins (APOE, APOC, and NPC2). The fourth “specialized” subtype of Macrophage Applicants found was termed “Inflammatory Macrophages”, which uniquely expressed inflammatory cytokines such as CCL3, CCL3L1, IL1B, CXCL2, and CXCL3. The remaining “ITGAX high” Macrophages were distinguished from other immune cell types by ITGAX, VCAN, PSAP, FTL, FTH1 and CD163 (though these genes are shared by other specialized macrophage subsets). T cells were largely CD69 and CD8A high, consistent with a T resident memory-like phenotype, and Applicants were not able to resolve a separate cluster of CD4 T cells.
Two specialized subtypes of CD8 T Cells were annotated from this dataset: one defined by exceptionally high expression of Early Response genes (FOSB, NR4A2, and CCL5), and the other termed Interferon Responsive Cytotoxic CD8 T Cells, defined by granzyme and perforin expression (GZMB, GZMA, GNLY, PRF1, GZMH), anti-viral genes (ISG20, IFIT3, APOBEC3C, GBP5) and genes associated with effector CD8 T cell function (LAG3, IL2RB, IKZF3, TBX21).
[0295] Among immune cells, Macrophages were markedly increased relative to other immune cell types during severe COVID-19 (Figure 9G, 9H). Multiple specialized myeloid cell types were uniquely detected and enriched among COVID-19 participants, albeit in a subset of participants, and biased to severe COVID-19 cases: ITGAX high Macrophages, FFAR4 high Macrophages, Inflammatory Macrophages, and Interferon Responsive Macrophages (Figure 9H). Though rare, plasmacytoid DCs and mast cells were only recovered as > 1% of immune cells among COVID-19 participants. Somewhat surprisingly, T Cells and T Cell subtypes were not dramatically altered between disease cohorts. Finally, Applicants assessed the correlation between distinct cell types across all participants. When samples from all disease cohorts were considered, Applicants found that the proportional abundances of Dendritic Cells, Mast Cells, and Macrophages were highly correlated with one another (p < 0.01), likely indicative of the coordinated recruitment of these immune subtypes during inflammation. Among detailed immune cell types, Interferon Responsive Macrophages were highly correlated with Interferon Responsive Cytotoxic CD8 T Cells (p < 0.01), suggesting direct communication between IFNG-expressing tissue resident T cells and CXCL9/10/11 expressing myeloid cells.
[0296] These analyses demonstrate how the epithelial and immune compartments are dramatically altered during COVID-19, likely reflecting both protective anti-viral and regenerative responses, as well as pathologic changes underlying progression to severe disease.
Example 4 - Cellular Behaviors Associated with COVID-19 Disease Trajectory
[0297] Thus far, Applicants have characterized how COVID-19 elicits major cell compositional changes within the nasopharyngeal mucosa, including expansion of the secretory cell/deuterosomal cell compartments to repopulate lost mature ciliated cells, and recruitment of highly inflammatory myeloid cells. Next, Applicants examined how each individual cell type responds during COVID-19. Here, Applicants restricted the analysis to pairwise comparisons between Control WHO 0, COVID-19 WHO 1-5 (mild/moderate), and COVID-19 WHO 6-8 (severe), and compared both high-level “Coarse” cell types (Figure 1B, Tables 2-4), and “Detailed” cell subsets (Figures 2A, 1D, Figure 9B). Among all coarse cell types, the largest magnitude transcriptional changes (measured by the number of differentially expressed (DE) genes with FDR < 0.001, and log fold change > 0.25) were observed primarily within the epithelial compartment, most strikingly within Ciliated Cells, Developing Ciliated Cells, Secretory Cells, Goblet Cells, and Ionocytes (Figure 10A). Notably, as Applicants had previously discovered substantial heterogeneity among some of these coarse cell types, namely Secretory and Goblet Cells, it is unsurprising that many of these differentially expressed genes (e.g., between Goblet Cells from Control WHO 0 participants vs. Goblet Cells from COVID-19 WHO 6-8 participants) reflect novel cell subtypes that emerge or dominate during COVID-19 and may partially confound true “cell-type intrinsic” transcriptional responses. Therefore, Applicants similarly compared transcriptomic responses among the detailed cell type annotations between disease cohorts (Figure 3A).
Here, the largest transcriptional changes were found among AZGP1 high Goblet Cells, Early Response FOXJ1 high Ciliated Cells, FOXJ1 high Ciliated Cells, Goblet Cells, SERPINB11 high Secretory Cells, Early Response Secretory Cells, and Interferon Responsive Ciliated Cells. Broadly, major differences were observed in the identity of cell types with large transcriptional responses - with mild/moderate COVID-19 driving differences principally in multiple Ciliated Cell subtypes, MUC5AC high Goblet Cells, and Ionocytes, while severe COVID-19 included major perturbations among Basal cells, AZGP1 high Goblet Cells, and various Ciliated Cell types. Finally, when Applicants directly compared mild/moderate to severe COVID-19, multiple cell types showed robust differential gene expression, most drastically among Ciliated Cell subtypes (Interferon Responsive Ciliated Cells, FOXJ1 high Ciliated Cells, Early Response FOXJ1 high Ciliated Cells, Developing Ciliated Cells), Ionocytes, SERPINB11 high Secretory Cells, Early Response Secretory Cells, and AZGP1 high Goblet Cells.
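The magnitude measure used in this paragraph, the number of DE genes passing the FDR and log fold change thresholds, can be sketched as a simple filter over a table of test results. The record layout, the function name, and the use of the absolute log fold change are assumptions made for illustration; the source states only "FDR < 0.001, and log fold change > 0.25". The gene names and values are placeholders, not data from the study.

```python
def count_de_genes(results, fdr_cutoff=0.001, log_fc_cutoff=0.25):
    """Count genes passing both thresholds for one cell type comparison.

    ASSUMPTION: the fold-change threshold is applied to the absolute value,
    so that up- and down-regulated genes are both counted.
    """
    return sum(
        1
        for row in results
        if row["fdr"] < fdr_cutoff and abs(row["log_fc"]) > log_fc_cutoff
    )


# Placeholder results for one hypothetical comparison.
example_results = [
    {"gene": "GENE_A", "log_fc": 1.20, "fdr": 1e-8},   # passes both
    {"gene": "GENE_B", "log_fc": -0.60, "fdr": 5e-4},  # passes both (down)
    {"gene": "GENE_C", "log_fc": 0.10, "fdr": 1e-6},   # effect too small
    {"gene": "GENE_D", "log_fc": 0.90, "fdr": 0.02},   # not significant
]
print(count_de_genes(example_results))
```

Repeating this count for every cell type and every cohort pair gives the per-cell-type response magnitudes that the comparison above ranks.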
[0298] First, Applicants examined the specific DE genes among Ciliated Cells (all, coarse annotation) between each cohort (Figure 3B, Tables 2-4). Compared to Ciliated Cells from Control WHO 0 participants, cells from both mild/moderate COVID-19 and severe COVID-19 robustly upregulated genes involved in the host response to virus, including IFI27, IFIT1, IFI6, IFITM3, and GBP3, and both cohorts induced expression of MHC-I and MHC-II genes (including HLA-A, HLA-C, HLA-F, HLA-E, HLA-DRB1, HLA-DRA) and other factors involved in antigen processing and presentation (Figures 10B, 10C). Notably, large sets of interferon-responsive and anti-viral genes were exclusively induced among Ciliated Cells from COVID-19 WHO 1-5 participants when compared to Control WHO 0 participants, and in a direct comparison of Ciliated Cells from mild/moderate COVID-19 to severe COVID-19, the cells from individuals with mild/moderate disease showed strong upregulation of diverse anti-viral factors, including IFI44L, STAT1, IFITM1, MX1, IFITM3, OAS1, OAS2, OAS3, STAT2, TAP1, HLA-C, ADAR, XAF1, IRF1, CTSS, CTSB, and many others. Ciliated Cells from severe COVID-19 uniquely upregulated IL5RA and NLRP1 (compared to both control and mild/moderate COVID-19). Together, these differentially expressed gene lists are suggestive of exposure to secreted inflammatory factors and type I/II/III interferons, as well as direct cellular sensing of viral products. Using previously published data from human nasal basal cells treated in vitro with either type I (IFNA) or type II (IFNG) interferon 36, Applicants created gene sets that represented the “shared” gene responses to type I and type II interferon, and the cellular responses specific to either type (Figure 3B). Using gene set enrichment analysis, Applicants tested whether the genes that discriminate Ciliated cells from different disease cohorts (e.g., mild/moderate COVID-19 vs. severe COVID-19) imply exposure to specific interferon types.
Applicants found that Ciliated cells in mild/moderate COVID-19 robustly induced type I interferon-specific gene signatures, both compared to cells from healthy controls, as well as individuals with severe COVID-19. Conversely, only a few genes were suggestive of a type II response, including induction of MHC-II genes among mild/moderate COVID-19 cases. Further, when compared to healthy individuals, Ciliated cells from individuals with severe COVID-19 did not significantly induce type I or type II interferon responsive genes, potentially underlying poor control of viral spread.
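The construction of “shared” and type-specific interferon gene sets described in this paragraph amounts to set operations on two response lists, as sketched below. The function names and gene identifiers are placeholders. The overlap fraction shown is a crude stand-in used only to make the idea concrete; it is not gene set enrichment analysis, which operates on ranked gene lists.

```python
def split_interferon_gene_sets(type_i_response, type_ii_response):
    """Partition two induced-gene lists into shared and type-specific sets."""
    type_i, type_ii = set(type_i_response), set(type_ii_response)
    return {
        "shared": type_i & type_ii,
        "type_I_specific": type_i - type_ii,
        "type_II_specific": type_ii - type_i,
    }


def overlap_fraction(de_genes, gene_set):
    """Fraction of a gene set found among DE genes.

    ASSUMPTION: a simple overlap summary for illustration, not GSEA.
    """
    if not gene_set:
        return 0.0
    return len(set(de_genes) & set(gene_set)) / len(gene_set)


# Placeholder gene identifiers, not genes reported in the source.
gene_sets = split_interferon_gene_sets(
    ["GENE_A", "GENE_B", "GENE_C"],  # hypothetical type I induced genes
    ["GENE_C", "GENE_D"],            # hypothetical type II induced genes
)
de_in_mild = ["GENE_A", "GENE_B", "GENE_X"]
for name, members in sorted(gene_sets.items()):
    print(name, overlap_fraction(de_in_mild, members))
```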
[0299] Applicants next investigated whether these effects were observed among other cell types and subsets. Surprisingly, even among cells defined as “Interferon Responsive” Ciliated Cells, cells from mild/moderate COVID-19 participants expressed higher fold changes of interferon-responsive genes compared to cells from COVID-19 WHO 6-8 participants or Control WHO 0 (Figures 3C, 3D, Tables 2-4). Other detailed epithelial cell types displayed a similar pattern: broad interferon-responsive genes (largely type I specific) were strongly upregulated among cells from mild/moderate COVID-19 participants, while cells from severe COVID-19 upregulated few shared markers with mild/moderate COVID-19 participants, and instead skewed towards inflammatory genes such as S100A8 and S100A9 rather than anti-viral factors (Figure 3E-3H). In some cases, cells from individuals with severe COVID-19 expressed levels of interferon responsive or anti-viral genes indistinguishable from healthy controls. Strongest induction of type I specific interferon responses among mild/moderate COVID-19 cases was observed in MUC5AC high Goblet Cells, SCGB1A1 high Goblet Cells, Early Response Secretory Cells, Deuterosomal Cells, Interferon Responsive Ciliated Cells, and BEST4 high Cilia high Ciliated Cells (Figure 3G). Rare cell types from severe COVID-19 individuals induced comparable type I interferon responses to their mild/moderate counterparts, including AZGP1 SCGB3A1 LTF high Goblet Cells, Interferon Responsive Secretory Cells, and VEGFA high Squamous Cells. Expression of type II specific genes was globally blunted across all cell types from COVID-19 samples when compared to type I module scores (Figure 3G, Figure 10D). Further, the absence of a transcriptional response to secreted interferon could not be explained by a lack of either interferon alpha receptor (IFNAR1, IFNAR2) or interferon gamma receptor (IFNGR1, IFNGR2) expression.
Previous work has identified ACE2, the host receptor for SARS-CoV-2, as among the interferon-induced genes in nasal epithelial cells. Indeed, Applicants observe modest upregulation of this gene among cells from COVID-19 participants compared to healthy controls. Further, some of the cell subtypes identified as expanded during COVID-19 (e.g., Interferon Responsive Ciliated Cells, BPIFA1 high Secretory Cells, BPIFA1 and Chemokine high Secretory Cells, and KRT24 KRT13 high Secretory Cells) express relatively high abundances of ACE2 (Figure 10E).
[0300] Together, across all detailed cell types, cells within the COVID-19 WHO 1-5 cohort recurrently upregulated interferon-responsive factors including STAT1, MX1, HLA-B, HLA-C , among others (compared to matched cell types among Control WHO 0 participants), while cells from the COVID-19 WHO 6-8 cohort repeatedly induced a distinct set of genes, including S100A9, S100A8 and stress response factors (HSPA8 , HSPA1A, DUSP1, Figure 3H).
[0301] Applicants were curious as to whether depressed interferon and anti-viral responses could be explained by higher rates of steroid treatment among the severe COVID-19 group (Table 1). Applicants therefore stratified the cohorts further into Steroid-Treated vs. Untreated, and assessed expression of genes previously identified as DE between Control WHO 0, COVID-19 WHO 1-5, and COVID-19 WHO 6-8. For some genes, steroid treatment partially suppressed the interferon response within each cohort - for instance, Ciliated Cells from Untreated COVID-19 WHO 1-5 participants showed higher abundances of IFITM1, OAS2, IFI6, and IFI27 than their Steroid-Treated counterparts - while still maintaining strong differences in expression between cohorts (with abundance in COVID-19 WHO 1-5 > COVID-19 WHO 6-8 > Control WHO 0, see annotations on Figure 10C). Intriguingly, induction of FKBP5 expression among Ciliated Cells from severe COVID-19 participants was fully explained by steroid treatment, which is consistent with the role for this protein in modulating glucocorticoid receptor activity. Other sets of anti-viral genes were equivalently expressed within each cohort, independent of steroid treatment, including STAT1, STAT2, IFI44, and ISG15. For many anti-viral factors in multiple cell types, Applicants observed no effect of steroid treatment on the intrinsic anti-viral response during COVID-19.
[0302] Together, these data demonstrate global blunting of the local anti-viral/interferon response among nasopharyngeal epithelial cells during severe COVID-19. Applicants next attempted to query the source of local interferon, particularly in the COVID-19 WHO 1-5 samples where cell types appeared to be maximally responding to interferon stimulation. Notably, Applicants expect many of the tissue-resident immune cells to reside principally within the deeper lamina propria and submucosal spaces, and these cells are therefore poorly represented in the dataset due to sampling type (swabbing of surface epithelial cells). Accordingly, Applicants found exceedingly few immune cell types producing interferons: IFNA and IFNB were absent, rare IFNL1 UMI were observed among T cells and Macrophages, and IFNG was robustly produced from cytotoxic CD8 T cells, despite limited evidence for type II responses among epithelial cells (Figure 10F). Further, Applicants could not detect expression of any interferon types among epithelial cells, which is dramatically different from previous observations of robust type I/III interferon expression among nasal ciliated cells during influenza A and B infection (Figure 10G). Rather, Applicants found robust induction of other inflammatory molecules from both immune and epithelial cell types. CXCL8 was produced by several specialized secretory cell types, including those uniquely expanded in COVID-19. Inflammatory and Interferon Responsive Macrophages represent the primary sources of local TNF, IL6, and IL10, and uniquely express high abundances of chemoattractant molecules such as CCL3, CCL2, CXCL8, CXCL9, CXCL10, and CXCL11 (Figure 10F).
[0303] Applicants directly tested whether the lack of an IFN-stimulated response among nasal epithelial cells in severe COVID-19 participants could be explained by autoantibody mediated inhibition of secreted interferons as reported in other cohorts (Bastard, P., et al. (2020). Autoantibodies against type I IFNs in patients with life-threatening COVID-19. Science; Bastard, P., et al. (2021). Preexisting autoantibodies to type I IFNs underlie critical COVID-19 pneumonia in patients with APS-1. J. Exp. Med. 218; Wang, N., et al. (2020a). Retrospective Multicenter Cohort Study Shows Early Interferon Therapy Is Associated with Favorable Clinical Responses in COVID-19 Patients. Cell Host Microbe). Using matched plasma collected at the time of NP swab, Applicants analyzed a subset of 25 participants for IgG and IgM antibodies targeting a large panel of potential antigens (using a microarray-based antibody hybridization platform, see Methods). Here Applicants found evidence for IgG autoantibodies targeting IFN-ω and 11 IFNα subtypes in 1/8 participants who developed severe COVID-19, 0/12 participants with mild/moderate disease, and 0/5 healthy donors (Figure 3I). Applicants caution against generalizing this result due to the limited cohort size; Applicants note, however, that the findings agree well with the expected proportion (~10%) of severe individuals with autoantibodies to IFN- components from published data (Bastard et al., 2020).
[0304] To better understand participant-to-participant variability in anti-viral and IFN-responsive gene signatures, Applicants analyzed the average expression of STAT1, STAT2, IRF1, and IRF9 - key transcription factors responsible for the induction of IFN-stimulated gene expression and IFN-induced genes themselves - among ciliated cells from each participant (Figure 3J). Applicants found that the expression of STAT1, STAT2, and IRF1 was indistinguishable among cells from control WHO 0, control WHO 7-8, and COVID-19 WHO 6-8 participants. IRF9 was diminished among COVID-19 WHO 6-8 participants and control WHO 7-8 participants compared to healthy donors and participants with mild or moderate COVID-19. Intriguingly, despite the absence of autoantibodies directed at type I interferons, nearly all participants who developed severe COVID-19 failed to induce STAT1, STAT2, IRF1, and IRF9 expression (among other IFN-stimulated genes). Even individuals who had milder disease and limited requirement for respiratory support at the time of nasal swab, but later went on to develop severe or fatal COVID-19 (swab WHO 1-5, peak WHO 6-8), already had diminished STAT1 expression at the time of nasal swab (Figure 3J). This suggests a potential predictive value of poor interferon-stimulated gene (ISG) induction.
Example 5 - Targets of SARS-CoV-2 Infection in the Nasopharynx
[0305] Given a comprehensive picture of host cell biology during COVID-19 and across the spectrum of disease severity, Applicants next tested whether the observed epithelial phenotypes were associated with altered viral loads. Single cell RNA-sequencing protocols utilize poly-adenylated RNA capture and reverse transcription to generate snapshots of the transcriptional status of each individual cell. As other pathogens and commensal microbes also utilize poly-adenylation for RNA intermediates, or contain poly-adenylated stretches of RNA within their genomes, they may also be represented within single-cell RNA-seq libraries. First, to perform an unbiased search for co-detected viral, bacterial, and fungal genomic material, Applicants used metatranscriptomic classification (implemented with Kraken2) to assign reads according to a comprehensive reference database (previously described, see Methods). As expected, the majority (28/38) of swabs from individuals with COVID-19 contained reads classified as SARS coronavirus species (Figure 4A, Figures 11A-11C). Among samples containing SARS coronavirus genomic material, the read abundance ranged from 2e0 to 8.8e6 reads (1.8e-3 to 1.9e4 reads/M total reads). Applicants found little evidence for co-occurring respiratory viral infections, which may be partially explained by the season when many of the swabs were collected (April-September 2020) and concurrent social distancing practices. Swabs from two individuals were found to contain rare reads classified as Influenza A virus species (maximum 5 reads per donor, within range for spurious classification), and Applicants found no evidence for other seasonal human coronaviruses, Influenza B virus, metapneumovirus, or orthopneumovirus. Swabs from two individuals with mild/moderate COVID-19 were found to contain exceptionally high abundances of reads classified as Rhinovirus A (2.1e5 and 2.4e5 reads).
Finally, Applicants recovered SARS coronavirus assigned reads from two participants from the Control WHO 0 cohort and one individual classified as Convalescent (> 40 days following resolution of mild COVID-19).
[0306] Next, Applicants analyzed all SARS-CoV-2-aligned UMI following alignment to a joint genome containing both human and SARS-CoV-2. Applicants took the sum of all SARS-CoV-2 aligning UMI from a given participant - both from high-quality single-cell transcriptomes and low-quality/ambient RNA - as a representative measure of the total SARS-CoV-2 burden within the tissue microenvironment. As observed using metatranscriptomic classification, Applicants found relatively low/spurious alignments to SARS-CoV-2 among Control participants, while swabs from COVID-19 participants contained a wide range of SARS-CoV-2 aligning reads (Figure 4B, Figures 11D, 11E). Samples from COVID-19 WHO 6-8 participants contained significantly higher abundances of SARS-CoV-2 aligning reads than both control cohorts, with an average of 1.1e2 +/- 2.8e0 (geometric mean +/- SEM) UMI per million aligned UMI (ranging from 0 to 1.5e5 per sample). Swabs from participants with mild/moderate COVID-19 contained slightly fewer SARS-CoV-2 aligning UMI, with an average of 1.1e1 +/- 4.3e0 (geometric mean +/- SEM) UMI per M.
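The per-sample normalization and summary statistic described in [0306] can be illustrated with a minimal sketch. The function names and the pseudocount are assumptions introduced here for illustration; the source does not state how samples with zero SARS-CoV-2 UMI were handled when computing the geometric mean.

```python
import math


def umi_per_million(viral_umi, total_aligned_umi):
    """Scale a viral UMI count to UMI per million aligned UMI for one sample."""
    if total_aligned_umi <= 0:
        raise ValueError("total_aligned_umi must be positive")
    return viral_umi * 1e6 / total_aligned_umi


def geometric_mean(values, pseudocount=1.0):
    """Geometric mean of per-sample abundances.

    ASSUMPTION: a pseudocount is added before taking logs (and subtracted
    afterwards) so that samples with zero viral UMI can be included.
    """
    if not values:
        raise ValueError("values must not be empty")
    logs = [math.log(v + pseudocount) for v in values]
    return math.exp(sum(logs) / len(logs)) - pseudocount


# Hypothetical example: three samples as (viral UMI, total aligned UMI).
samples = [(50, 1_000_000), (0, 2_000_000), (12_000, 4_000_000)]
per_million = [umi_per_million(v, t) for v, t in samples]
summary = geometric_mean(per_million)
```

A geometric rather than arithmetic mean is the natural summary here because the reported abundances span several orders of magnitude.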
[0307] Given the large diversity in SARS-CoV-2 abundance across all COVID-19 participants, Applicants interrogated whether cell composition correlated with total SARS-CoV-2 (NB: contemporaneous work by Applicants has evaluated the accuracy of single-cell RNA-seq derived estimates of total SARS-CoV-2 abundance with more established protocols such as Real-Time RT-PCR). Among all cell types, Applicants found that Secretory Cells were significantly positively correlated with the total viral abundance (Spearman's rho = 0.49, Bonferroni-corrected p = 0.0015), while FOXJ1 high Ciliated Cells were significantly negatively correlated (Spearman's rho = -0.43, Bonferroni-corrected p = 0.020, Figure 4C, 4D). This observation is in line with findings outlined in Figures 1 and 2 where epithelial cell destruction during SARS-CoV-2 infection drives preferential loss of differentiated ciliated cell types, and secretory cells may expand to repopulate lost epithelial cell types. Next, Applicants binned the samples from COVID-19 participants into “Viral Low” and “Viral High” groupings (based on an arbitrary cutoff of 1e3 SARS-CoV-2 UMI per M, though the findings were robust to a range of partition choices, Figures 11E, 11F). Interferon Responsive Ciliated Cells were expanded among “Viral High” COVID-19 samples and plasmacytoid DCs were absent from “Viral High” samples.
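The statistics named in [0307] (Spearman's rho, Bonferroni correction, and the 1e3 UMI per M partition) can be sketched as follows. This is an illustrative, standard-library implementation rather than the software Applicants used; the function names are hypothetical, and treating a sample exactly at the cutoff as “Viral High” is an assumption.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / math.sqrt(vx * vy)


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def viral_group(umi_per_m, cutoff=1e3):
    """Bin a sample by total SARS-CoV-2 UMI per million.

    ASSUMPTION: samples exactly at the cutoff are labelled "Viral High".
    """
    return "Viral High" if umi_per_m >= cutoff else "Viral Low"
```

In use, `spearman_rho` would be evaluated once per cell type (cell-type proportion per sample against total viral UMI per M), with the resulting p-values passed together to `bonferroni`.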
[0308] Next, Applicants aimed to differentiate SARS-CoV-2 UMI derived from ambient or low-quality cell barcodes from those truly reflecting intracellular RNA molecules. First, Applicants filtered to only viral UMIs associated with cells presented in Figure 1, thereby removing those associated with low-quality cell barcodes (Figure 11G). Next, using a combination of computational tools to 1) estimate the proportion of ambient RNA contamination per single cell and 2) estimate the abundance of SARS-CoV-2 RNA within the extracellular/ambient environment (i.e., not cell-associated), Applicants were able to test whether the amount of viral RNA associated with a given single-cell transcriptome was significantly higher than would be expected from ambient spillover. Together, this enabled Applicants to identify cell barcodes whose SARS-CoV-2 aligning UMI were likely driven by spurious contamination, and annotate single cells that contain probable cell-associated or intracellular SARS-CoV-2 RNA (Figure 4E, Figure 11G). Across all single cells, this analysis recovered 415 high-confidence SARS-CoV-2 RNA+ cells across 21 participants, and Applicants confirmed that cell assignment as “SARS-CoV-2 RNA+” was not driven by technical factors such as sequencing depth or cell complexity (Figure 11H). 262 cells (of 12,909) were from participants with severe COVID-19 and 150 (of 5,194) from mild/moderate COVID-19. Applicants found 3 SARS-CoV-2 RNA+ cells from participants with negative SARS-CoV-2 PCR: two from a participant classified as “Convalescent”, and one from a Control participant. Among participants with any SARS-CoV-2 RNA+ cell, Applicants found 20 +/- 7 (mean +/- SEM) SARS-CoV-2 RNA+ cells per sample (range 1-119), amounting to 4 +/-1.3% (range 0.1-24%) of the recovered cells per sample. Within a given single cell, the abundance of SARS-CoV-2 UMI ranged from 1 to 12,612, corresponding to 0.01-98% of all human and viral UMI per cell.
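The ambient-spillover test described in [0308] can be illustrated with a simple statistical sketch. The source states only that a combination of computational tools was used; the binomial model below, the function names, and the parameter names are assumptions chosen for illustration. Here each UMI in a cell is treated as ambient-derived viral RNA with probability equal to the cell's estimated contamination fraction multiplied by the viral fraction of the ambient pool.

```python
import math


def binomial_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p), summing log-space pmf terms."""
    if k <= 0:
        return 1.0
    if k > n or p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * log_p + (n - i) * log_q
        )
        total += math.exp(log_pmf)
    return min(1.0, total)


def ambient_excess_pvalue(viral_umi, total_umi,
                          contamination_fraction, ambient_viral_fraction):
    """P-value that a cell's viral UMI count exceeds ambient expectation.

    ASSUMPTION: under the null hypothesis, every viral UMI in the cell comes
    from ambient RNA, so each UMI is viral with probability
    contamination_fraction * ambient_viral_fraction.
    """
    p_null = contamination_fraction * ambient_viral_fraction
    return binomial_upper_tail(viral_umi, total_umi, p_null)


# Hypothetical cell: 50 viral UMI of 1,000 total, 10% estimated ambient
# contamination, and 0.1% of ambient UMI being viral (expected ~0.1 UMI).
p_value = ambient_excess_pvalue(50, 1000, 0.10, 0.001)
```

A cell would be annotated as SARS-CoV-2 RNA+ only when this p-value, after multiple-testing correction across cells, falls below a chosen threshold; the threshold itself is not specified in this section.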
[0309] To further understand the biological significance behind SARS-CoV-2 aligning UMI within a single cell, and to better identify cells with the highest likelihood of actively supporting viral replication, Applicants analyzed the specific viral sequences and their alignment regions in the viral genome. During SARS-CoV-2 infection, viral uncoating from endosomal vesicles releases the positive, single-stranded, 5’ capped, poly-adenylated genome into the host cytosol (Figure 4F, 4G). Here, translation of non-structural proteins proceeds first by templating directly off of the viral genome, generating a replication and transcription complex. The viral replication complex then produces both 1) negative strand genomic RNA intermediates, which serve as templates for further positive strand genomic RNA and 2) nested subgenomic mRNAs which are constructed from a 5’ leader sequence fused to a 3’ sequence encoding structural proteins for production of viral progeny (e.g., Spike, Envelope, Membrane, Nucleocapsid). Generation of nested subgenomic mRNAs relies on discontinuous transcription occurring between pairs of 6-mer transcriptional regulatory sequences (TRS), one 3’ to the leader sequence (termed leader TRS, or TRS-L), and others 5’ to each gene coding sequence (termed body TRS, or TRS-B). Applicants reasoned that short SARS-CoV-2 aligning UMI could be readily distinguished by their strandedness (aligning to the negative vs. positive strand) and whether they fell within coding regions, across intact TRS (indicating RNA splicing had not occurred for that RNA molecule at that splice site) or across a TRS with leader-to-body fusions (corresponding to subgenomic RNA, Figure 4F, 4G, Figure 12A). Single cells containing higher abundances of spliced or negative strand aligning reads are therefore more likely to represent truly virally infected cells with a functional viral replication and transcription complex.
Critically, the co-detection of host transcriptomic and viral genomic material associated with a single cell barcode cannot definitively establish the presence of intracellular virus and/or productive infection. Rather, Applicants integrate these and other aspects of the host and viral transcriptomes to refine and contextualize the confidence in “SARS-CoV-2 RNA+” cells.
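The read-level distinctions drawn in [0309] (negative strand, leader-to-body fusion across a TRS, intact TRS, or other positive-strand alignment) can be sketched as a small classifier. Everything below is hypothetical: the `Alignment` record, the toy TRS coordinates (which are not real SARS-CoV-2 genome positions), the tolerance, and the priority order of the categories are assumptions made for illustration only.

```python
from dataclasses import dataclass, field

# Toy coordinates on an imaginary genome; NOT real SARS-CoV-2 positions.
TRS_LEADER = 70
TRS_BODY = {"geneA": 500, "geneB": 800}
JUNCTION_TOLERANCE = 5  # bases of slack when matching a junction to a TRS


@dataclass
class Alignment:
    start: int                 # 0-based inclusive start on the viral genome
    end: int                   # exclusive end
    strand: str                # "+" or "-"
    junctions: list = field(default_factory=list)  # (donor, acceptor) pairs


def classify(aln):
    """Assign one category per aligned UMI, checked in a fixed priority order."""
    if aln.strand == "-":
        return "negative_strand"
    for donor, acceptor in aln.junctions:
        near_leader = abs(donor - TRS_LEADER) <= JUNCTION_TOLERANCE
        near_body = any(abs(acceptor - pos) <= JUNCTION_TOLERANCE
                        for pos in TRS_BODY.values())
        if near_leader and near_body:
            return "spliced_trs"  # leader-to-body fusion (subgenomic RNA)
    if not aln.junctions:
        for pos in TRS_BODY.values():
            if aln.start < pos < aln.end:
                return "unspliced_trs"  # read runs contiguously across a body TRS
    return "other_positive"


reads = [
    Alignment(100, 150, "-"),
    Alignment(40, 530, "+", [(70, 500)]),
    Alignment(480, 530, "+"),
    Alignment(600, 650, "+"),
]
labels = [classify(r) for r in reads]
```

In practice the junctions would be derived from the aligner's split-read output, and per-cell counts of each category would then feed the confidence tiers discussed in the following paragraphs.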
[0310] The majority of SARS-CoV-2 aligning UMI among SARS-CoV-2 RNA+ cells was found heavily biased towards the 3’ end of the genome, attributed to the 3’ UTR, ORF10, and N gene regions, as expected due to poly-A priming (Figure 4H). A majority (68.7%) of SARS-CoV-2 RNA+ cells contained reads aligning to the viral negative strand, increasing the likelihood that many of these cells represent true targets of SARS-CoV-2 virions in vivo. In addition to negative strand alignment, Applicants found that roughly 1/4 of the SARS-CoV-2 RNA+ cells contained at least 100 UMI that map to more than 20 distinct viral genomic locations per cell. Finally, comparing spliced to unspliced UMI, Applicants found a minor fraction of cells with reads mapping directly across a spliced TRS sequence (4.6%), while 35% of SARS-CoV-2 RNA+ cells contained reads mapping across the equivalent 70mer window around an unspliced TRS. Notably, single cells containing reads aligning to spliced (subgenomic) RNA were heavily skewed toward those cells that contained the highest overall abundances of viral UMI - this may be an accurate reflection of coronavirus biology, wherein subgenomic RNA are most frequent within cells robustly producing new virions and total viral genomic material, but also points to inherent limitations in the detection of low-frequency RNA species by single-cell RNA-seq technologies.
[0311] Next, Applicants integrated 1) the strand and splice information among SARS-CoV-2 aligning UMIs, 2) participant-to-participant diversity and 3) cell type annotations to gain a comprehensive picture of the identity and range of SARS-CoV-2 RNA+ cells within the nasopharyngeal mucosa (Figure 5A-D, Figure 12A-12E). Applicants found incredible diversity in both the identity of SARS-CoV-2 RNA+ cells, as well as the distribution of SARS-CoV-2 RNA+ cells within and across participants. The majority of SARS-CoV-2 RNA+ cells were Ciliated, Goblet, Secretory or Squamous. Highest-confidence SARS-CoV-2 RNA+ cells (spliced UMI, negative strand UMI, > 100 SARS-CoV-2 UMI) tended to be found among MUC5AC high Goblet Cells, AZGP1 high Goblet Cells, BPIFA1 high Secretory Cells, KRT24 KRT13 high Secretory Cells, CCL5 high Squamous Cells, Developing Ciliated Cells, and each Ciliated Cell subtype. A high proportion of Interferon Responsive Macrophages contained SARS-CoV-2 genomic material, and rare ITGAX high Macrophages were found to contain UMI aligning to viral negative strand or spliced TRS regions - likely representing myeloid cells that have recently engulfed virally-infected epithelial cells or free virions. Applicants did not find major differences in the presumptive cellular tropism by the severity of COVID-19. A few cell types were commonly found to be SARS-CoV-2 RNA+ across all participants (including participants with only rare viral RNA+ cells): most frequently, participants had at least one Developing Ciliated or Squamous cell with SARS-CoV-2 RNA, followed by Goblet Cells, Cilia high Ciliated Cells, and FOXJ1 high Ciliated Cells (Figure 5C). However, among the individuals with the highest abundances of SARS-CoV-2 RNA+ cells, viral RNA was spread broadly across many different cell types, including those outside of the expected tropism for SARS-CoV-2 (e.g., also found within Basal Cells, Ionocytes).
Further, the cell types harboring the highest proportions of SARS-CoV-2 RNA+ cells represent the same cell types uniquely expanded or induced within COVID-19 participants, such as KRT24 KRT13 high Secretory Cells, AZGP1 high Goblet Cells, and Interferon Responsive Ciliated Cells, and contain the highest abundances of ACE2-expressing cells (Figure 5C, Figure 12F). Whether these cell types represent specific phenotypes elicited by intrinsic viral infection (potentially alongside induction of anti-viral genes) or are uniquely susceptible to SARS-CoV-2 entry (e.g., enhanced entry factor expression) will require further investigation. Developing ciliated cells contain among the highest numbers of SARS-CoV-2 RNA molecules per cell, including positive strand-aligning reads, negative strand-aligning reads, and spliced TRS reads (Figure 12G). Among ciliated cell subtypes, IFN responsive ciliated cells, despite representing one of the most frequent “targets” of viral infection, contain the lowest per-cell abundances of SARS-CoV-2 RNA, potentially reflecting the impact of elevated anti-viral factors curbing high levels of intracellular viral replication (Figure 12H).
Example 6 - Cell Intrinsic Responses to SARS-CoV-2 Infection
[0312] Above, Applicants carefully mapped the specific cell types and states harboring SARS-CoV-2 RNA+ cells, identifying the subsets of epithelial cells that appear to actively support viral replication in vivo across distinct individuals (Figure 5). Further, Applicants have characterized robust and cell-type-specific host responses among cells from COVID-19 participants, ostensibly representing both the bystander cell response to local virus and an inflammatory microenvironment, as well as the intrinsic response to intracellular SARS-CoV-2 RNA (Figure 3). Here, by directly comparing single cells containing SARS-CoV-2 RNA to their matched bystanders, Applicants aimed to map both the cell-intrinsic response to direct viral infection, as well as the host cell identities that may potentiate or enable SARS-CoV-2 replication and tropism.
[0313] To control for variability among different SARS-CoV-2 RNA+ cell types and individuals, Applicants compared SARS-CoV-2 RNA+ cells to bystander cells of the same cell type and participant. Among cell types with at least 5 SARS-CoV-2 RNA+ cells, Applicants observed robust and specific transcriptional changes compared to both matched bystander cells as well as cells from healthy individuals (Figures 6A, 6B). Notably, many of the genes previously identified as increased within all cells from COVID-19 donors, e.g., anti-viral factors IFITM3, MX1, IFI44L, and IRF1, were also upregulated among SARS-CoV-2 RNA+ cells compared to matched bystanders within multiple cell types. SARS-CoV-2 RNA+ cells from participants with mild/moderate COVID-19 showed stronger induction of anti-viral and interferon responsive pathways compared to those with severe COVID-19, despite equivalent abundances of cell-associated viral UMI (Figure 13A).
EIF2AK2, which encodes protein kinase R and drives host cell apoptosis following recognition of intracellular double-stranded RNA, was among the most reliably expressed and upregulated genes among SARS-CoV-2 RNA+ cells compared to matched bystanders across diverse cell types, suggesting rapid activation of this locus following intrinsic PAMP recognition of SARS-CoV-2 replication intermediates. Therefore, direct sensing of intracellular viral products amplifies interferon-responsive and anti-viral gene upregulation, though these pathways are also elevated within bystander cells. The majority of genes induced within SARS-CoV-2 RNA+ cells were shared across diverse cell types, suggesting a conserved anti-viral response, as well as common features that facilitate or restrict infection (Figure 6B-6D, Table 5). SARS-CoV-2 RNA appeared to robustly stimulate expression of genes involved in anti-viral sensing and defense (e.g., MX1, IRF1, OAS1, OAS2), as well as genes involved in antigen presentation via MHC class I (Figure 6C, Table 5). SARS-CoV-2 RNA+ cells expressed significantly higher abundances of multiple proteases involved in the cleavage of SARS-CoV-2 spike protein, a required step for viral entry (TMPRSS4, TMPRSS2, CTSS, CTSD). This suggests that within a given cell type, natural variations in the abundance of genes which support the viral life cycle partially account for which cells are successfully targeted by the virus. Among the core anti-viral/interferon-responsive gene sets induced within SARS-CoV-2 RNA+ cells, Applicants found repeated and robust upregulation of IFITM3 and IFITM1. Multiple studies have demonstrated that while these two interferon-inducible factors can disrupt viral release from endocytic compartments among a wide diversity of viral species, IFITMs can instead facilitate entry by human betacoronaviruses.
Therefore, enrichment of these factors within presumptive infected cells may reflect viral hijacking of a conserved host anti-viral responsive pathway. Genes involved in cholesterol and lipid biosynthesis were also upregulated among SARS-CoV-2 RNA+ cells, including FDFT1, MVK, FDPS, ACAT2, and HMGCS1, all enzymes involved in the mevalonate synthesis pathway. In addition, SARS-CoV-2 RNA+ cells showed increased abundance of low-density lipoprotein receptors LDLR and LRP8 compared to matched bystanders. Intriguingly, various genes involved in cholesterol metabolism were recently identified as critical host factors for SARS-CoV-2 replication via CRISPR screens from multiple independent research groups56,57. Further, these groups found that direct inhibition of cholesterol biosynthesis decreased SARS-CoV-2 (as well as coronavirus strains 229E and OC43) replication within cell lines, and suggested that S-mediated entry relies on host cholesterol. Applicants queried the full collections of presumptive replication factors identified by four published CRISPR screens56-59, and found significant enrichment among SARS-CoV-2 RNA+ cells for RAB GTPases (e.g., RAB9A, RHOC, RASEF), vacuolar ATPase H+ pump subunits, as well as transcriptional modulators such as SPEN, SLTM, CREBBP, SMAD4 and EGR1 (Figure 13B).
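The matched comparison described in [0313] (SARS-CoV-2 RNA+ cells versus bystander cells of the same cell type and participant, restricted to cell types with at least 5 RNA+ cells) can be sketched as follows. The record layout, function name, and pseudocount are hypothetical, and a log2 ratio of mean expression stands in for whatever differential expression test Applicants actually applied.

```python
import math
from collections import Counter, defaultdict


def matched_log2_fold_changes(cells, gene, min_positive=5, pseudocount=1.0):
    """Per (participant, cell type) log2 fold change of RNA+ vs bystander cells.

    cells: iterable of dicts with keys "participant", "cell_type",
           "viral_rna_positive" (bool) and "expr" (gene -> normalized value).
    ASSUMPTION: a pseudocount stabilizes the ratio when a mean is zero.
    """
    cells = list(cells)
    # Keep only cell types with enough SARS-CoV-2 RNA+ cells overall.
    positives = Counter(c["cell_type"] for c in cells if c["viral_rna_positive"])
    eligible = {ct for ct, n in positives.items() if n >= min_positive}

    strata = defaultdict(lambda: {True: [], False: []})
    for c in cells:
        if c["cell_type"] in eligible:
            key = (c["participant"], c["cell_type"])
            strata[key][c["viral_rna_positive"]].append(c["expr"].get(gene, 0.0))

    result = {}
    for key, groups in strata.items():
        if not groups[True] or not groups[False]:
            continue  # no matched bystanders (or no RNA+ cells) in this stratum
        mean_pos = sum(groups[True]) / len(groups[True])
        mean_by = sum(groups[False]) / len(groups[False])
        result[key] = math.log2((mean_pos + pseudocount) / (mean_by + pseudocount))
    return result


def _cell(participant, cell_type, positive, value):
    """Build one hypothetical cell record for the example below."""
    return {"participant": participant, "cell_type": cell_type,
            "viral_rna_positive": positive, "expr": {"EIF2AK2": value}}


example = (
    [_cell("P1", "Ciliated", True, 3.0) for _ in range(5)]
    + [_cell("P1", "Ciliated", False, 1.0) for _ in range(5)]
    + [_cell("P1", "Secretory", True, 4.0) for _ in range(2)]
    + [_cell("P1", "Secretory", False, 1.0) for _ in range(3)]
)
fold_changes = matched_log2_fold_changes(example, "EIF2AK2")
```

Stratifying by both participant and cell type means that each fold change compares cells sharing the same tissue microenvironment, which is the control for inter-individual variability that [0313] describes.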
[0314] Finally, Applicants found multiple previously unappreciated genes implicated in susceptibility and response to SARS-CoV-2 infection, including S100/Calbindin genes such as S100A6, S100A4, and S100A9, which may directly play a role in leukocyte recruitment to infected cells. IFNAR1 was substantially increased in many bystander cells compared to both cells from SARS-CoV-2 negative participants as well as matched SARS-CoV-2 RNA+ cells (Figure 6D). Blunting of interferon alpha signaling via downregulation of IFNAR1 within SARS-CoV-2 RNA+ cells may partially explain high levels of viral replication compared to neighboring cells. Moreover, this may represent a novel mechanism for interferon antagonism by SARS-CoV-2. Finally, bystander cells expressed significantly higher abundances of MHC-II molecules compared to SARS-CoV-2 RNA+ cells, including HLA-DQB1, HLA-DRB1, HLA-DRB5, HLA-DRA, and CD74.
[0315] Anti-viral factors were largely absent from presumptive virally infected cells in participants who developed severe COVID-19, despite equivalent abundances of cell-associated viral UMIs, and elevated UMIs/cell aligning to the viral negative strand (Figure 6E, Figure 13A). EIF2AK2, which encodes protein kinase R and drives host cell apoptosis following recognition of intracellular double-stranded RNA, is among the most reliably expressed and upregulated genes among SARS-CoV-2 RNA+ cells compared to matched bystanders across diverse cell types, suggesting rapid activation of this gene following intrinsic PAMP recognition of SARS-CoV-2 replication intermediates (Krahling et al., (2009). Severe Acute Respiratory Syndrome Coronavirus Triggers Apoptosis via Protein Kinase R but Is Resistant to Its Antiviral Activity. J. Virol.). Neither EIF2AK2 nor IFN-responsive transcription factors such as STAT1 and STAT2 were expressed within SARS-CoV-2 RNA+ cells from participants who developed severe COVID-19 (Figure 6E). This suggests that direct sensing of intracellular viral products may amplify IFN-responsive and anti-viral gene upregulation, though these pathways are only induced among SARS-CoV-2 RNA+ cells from participants with mild/moderate COVID-19 (Figure 6F). Together, this suggests a failure of the intrinsic immune response to viral infection among nasal epithelial cells in individuals who develop severe COVID-19.
Example 7 - Discussion
[0316] Here, Applicants have created a comprehensive map of SARS-CoV-2 infection of the human nasopharynx using scRNA-seq, and identified tissue correlates of protection and disease severity within a large human cohort. By linking a detailed census of cell types and states across disease outcomes, Applicants begin to untangle the myriad factors that underlie restriction of viral infection to the upper respiratory tract vs. expansion to the lower airways and lung parenchyma or support the development of severe lower respiratory tract disease (Figure 13C). This study defines major compositional differences in the nasal epithelia during COVID-19 and directly relates these to NP viral load, cellular tropism, and cell-intrinsic responses to SARS-CoV-2. Further, Applicants identify marked variability in the induction of anti-viral gene expression that is associated with peak disease severity and may precede development of severe respiratory damage. Applicants find that anti-viral gene expression is profoundly blunted in cells isolated from individuals who develop severe disease, even in cells containing SARS-CoV-2 RNA.
[0317] First, Applicants find that mature ciliated cells decline dramatically within the nasopharynx of COVID-19 samples, directly correlated with the tissue abundance of SARS-CoV-2 RNA at the time of sampling. Conversely, secretory cell populations expand among samples with high viral loads, which potentially represents a conserved response for epithelial re-population of lost mature ciliated cells through a recently identified mechanism of secretory/goblet trans-differentiation, using deuterosomal cells as intermediates. Accordingly, deuterosomal cells and immature/developing ciliated cells were dramatically expanded among COVID-19 samples, suggesting interdependence between each of these compartments in maintaining epithelial homeostasis during viral challenge. Further work is required to understand how this process relates to epithelial responses in other common upper respiratory viral infections and inflammatory states. Broadly, SARS-CoV-2 infection induced dramatic increases in the diversity of epithelial cell types, both with respect to shifted compositional balance among major cell identities, and also via expansion of specialized secretory and goblet cell subsets, including a subset termed KRT13 KRT24 high Secretory Cells, which closely match the recently-identified KRT13 “hillock” cell, previously associated with epithelial regions experiencing rapid cellular turnover and inflammation49. Other specialized subsets of secretory and goblet cells, such as Early Response Secretory Cells, AZGP1 high Goblet Cells, and SCGB1A1 high Goblet Cells, are expanded among COVID-19 participants, but are found within discrete subsets of individuals and are not homogeneous across the disease cohorts Applicants sample here. Indeed, heterogeneous responses in the epithelial compartment between individuals with COVID-19 underscore the need for larger cohort studies, with a focus on longitudinal responses following initial infection.
[0318] Beyond compositional changes during COVID-19, this study found that individuals who developed severe disease exhibited profoundly blunted anti-viral responses and diminished expression of interferon-responsive genes compared to individuals with milder courses. This effect was observed among diverse cell types, including those thought to represent direct targets of viral infection, such as ciliated cells and secretory cells, and also bystanders and co-resident immune cells. Notably, individuals with severe COVID-19 disease had equivalent or even elevated levels of nasal SARS-CoV-2 RNA at the time of sampling, and contained expanded inflammatory and type II interferon-responsive macrophages compared to mild/moderate cases. Surprisingly, even among mild cases with robust interferon stimulated gene expression, Applicants found little to no type I/III interferon transcription amongst any recovered cell types. In a related study mapping the nasal epithelium during influenza infection, the authors found extensive upregulation of IFNA, IFNB1, and IFNL1-3 within ciliated cells and goblet cells, both highlighting the capacity of superficial nasal epithelial cells to secrete local interferons during viral infection, but also the technical capacity of the scRNA-seq platform used in both studies to capture interferon mRNA. The precise source and signal which motivates a broad anti-viral response among mild COVID-19 cases in this study remains unknown, and may originate from immune cells contained deeper within the respiratory mucosa (therefore inaccessible through the superficial sampling used here), or may derive from direct PAMP/DAMP sensing or alternative inflammatory signals. Indeed, published peripheral immune studies comparing mild and severe COVID-19 also observe diminished type I and type III interferon abundances, and note restricted interferon stimulated gene expression among circulating immune cells17,18.
The close association between disease severity and weak anti-viral gene expression among nasal epithelial cells is also intriguing given recent observations of inborn defects in TLR3, IRF7, IRF9, and IFNAR1 or direct antibody-mediated neutralization of secreted type I interferons within individuals who develop severe COVID-19 (refs. 32-34). Even among cells containing SARS-CoV-2 RNA, individuals who developed severe disease failed to induce expression of classic anti-viral factors including MX1, IFITM1, and ISG15, which were all robustly associated with intracellular viral RNA within mild/moderate cases. Further, Applicants found lower nasal viral loads were associated with elevated detection of tissue plasmacytoid DCs, suggesting diminished or delayed recruitment of these cells may partially explain how local viral replication proceeds to such high abundances. These findings strongly suggest severe infection can arise in the setting of an intrinsic impairment of epithelial anti-viral immunity. Further, human betacoronaviruses including MERS, SARS-CoV, and SARS-CoV-2 all exhibit multiple strategies to avoid triggering pattern recognition receptor pathways, including degradation of host mRNA within infected cells, sequestration of viral replication intermediates (e.g., double stranded RNA) from host sensors, and direct inhibition of immune effector molecules, thereby leading to diminished induction of anti-viral pathways and blunted autocrine and paracrine interferon signaling. Applicants surmise that the combined effects of a viral strain with naturally poor interferon induction in a host with intrinsic defects in immune or epithelial anti-viral responses drive prolonged viral replication in the upper airway, which eventually leads to immunopathology characteristic of severe COVID-19.
[0319] Critically, this work does not address the dynamics of nasal epithelial anti-viral responses during SARS-CoV-2 infection, nor does it directly relate failed intrinsic epithelial immunity in the nasopharynx to potential interferon or anti-viral responses in the lung or distal airways. Indeed, related work suggests type III interferons are present in the lungs, but not the nasopharynx, during SARS-CoV-2 infection, and may contribute to tissue damage late in disease course60. Further, as the individuals in this cohort were intentionally sampled as early within their disease course as possible, and the majority have elevated viral levels within their nasopharynx, the findings have an unclear relation to the tissue response during hyper-inflammatory “late” stages of COVID-19. However, among individuals who develop severe COVID-19, Applicants observe unique recruitment of highly inflammatory macrophages that represent the major tissue sources of proinflammatory cytokines including IL1B, TNF, CXCL8, CCL2, CCL3 and CXCL9/10/11 - of likely relation to the immune dysregulation characterized by elevation of the same factors in the periphery in late, severe disease. In addition, Applicants note specific upregulation of alarmins S100A8/S100A9 (which together form TLR4 and RAGE ligand calprotectin) among epithelial cells in severe COVID-19 compared to mild and control counterparts, and even higher expression of S100A9 within SARS-CoV-2 RNA+ cells from those same individuals. A recent study identified these as potential biomarkers of severe COVID-19, and proposed that these factors directly drive excessive inflammation and precede the massive cytokine release characteristic of late disease.
This work suggests that severe COVID-19-specific expression of calprotectin may originate instead within the virally-infected nasal epithelia, and suggests that further work to understand the epithelial cell regulation of S100A8/A9 gene expression may help clarify maladaptive responses to SARS-CoV-2 infection.
[0320] Finally, Applicants provide a direct investigation into the host factors that enable or restrict SARS-CoV-2 replication within epithelial cells in vivo. Here, Applicants recapitulate expected “hits” based on well-described host factors involved in viral replication, e.g., TMPRSS2, TMPRSS4 enrichment among presumptive virally infected cells. Applicants similarly observed expression of anti-viral genes which were globally enriched among cells from mild/moderate COVID-19 participants, with even higher expression among the viral RNA+ cells themselves. In accordance with previous studies into the nasal epithelial response to influenza infection, Applicants observed bystander epithelial cell upregulation of both MHC-I and MHC-II family genes, but found that SARS-CoV-2 RNA+ cells only expressed MHC-I, and uniformly downregulated MHC-II genes compared to matched bystanders. To Applicants' knowledge, downregulation of host cell pathways for antigen presentation by coronaviruses has not been previously described. A recent study found that CIITA and CD74 can intrinsically block entry of a range of viruses (including SARS-CoV-2) via endosomal sequestration, and therefore cells that upregulate these (and other) components of MHC-II machinery may naturally restrict viral entry.
[0321] Together, this work demonstrates that many of the factors that determine the clinical trajectory following SARS-CoV-2 infection stem from initial host-viral encounters in the nasopharyngeal epithelium. Further, it implies that dysregulated tissue immunity may be subverted by focusing preventative or therapeutic interventions early within the nasopharynx, thereby bolstering anti-viral responses and curbing pathological inflammatory signaling prior to development of severe respiratory dysfunction or systemic disease.
Example 8 - Methods
[0322] Study Participants and Design - Subjects 18 years and older were recruited from the University of Mississippi Medical Center (UMMC) (Jackson, Mississippi) between April 2020 and September 2020. All patients were enrolled in the prospective study at UMMC, which included patients with COVID-19 who were inpatient hospitalized as well as non-COVID-19 (control) who were outpatient and seen at UMMC Acute Respiratory Clinic or UMMC GI Endoscopy. Inclusion criteria for COVID-19 participants included fever, cough, sore throat and/or shortness of breath with presumed diagnosis of COVID-19 upper respiratory tract infection. The patients all weighed 110 lbs or greater. Non-COVID-19 (control) participants all had a negative SARS-CoV-2 test, weighed 110 pounds or greater, and were seen in either GI Endoscopy or UMMC Acute Respiratory Clinic. Exclusion criteria for both cohorts included a history of blood transfusion within 4 weeks and subjects who could not be assigned a definitive COVID-19 diagnosis from either nucleic acid testing or Chest CT imaging. For the nasopharyngeal (NP) samples, 38 individuals with COVID-19 were included, both male (n=20) and female (n=18). 21 of the participants were non-COVID-19 (control) - 11 identified as male, 10 as female. The median age of COVID-19 participants was 56.5 years old; the median age of Control participants was 62 years old. Among hospitalized participants, samples were collected between Day 1 to Day 3 of hospitalization. The Institutional Review Board approved the study, and all subjects provided written informed consent, or their legally authorized representative provided it on their behalf. Research samples were collected from volunteers in the form of nasal swabs. A healthcare provider collected the nasopharyngeal sample using two cotton swabs. COVID-19 participants were classified according to the 8-level ordinal scale proposed by the WHO representing severity and level of respiratory support required.
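The cohort labels used throughout the Examples (e.g., Control WHO 0, Control WHO 7-8, COVID-19 WHO 1-5, COVID-19 WHO 6-8) follow from the 8-level WHO ordinal score. A minimal sketch of that grouping is given below. The function name is hypothetical, the score bands are inferred from the labels appearing in the text rather than from a stated rule, and Convalescent participants are not modelled.

```python
def severity_group(covid_positive, who_score):
    """Map COVID-19 status and a WHO ordinal score (0-8) to a cohort label.

    ASSUMPTION: the bands (0, 1-5, 6-8 and 7-8 for controls) are inferred
    from the cohort names used in the text; the caller decides whether the
    score passed in is the score at the time of swab or the peak score.
    """
    if not isinstance(who_score, int) or not 0 <= who_score <= 8:
        raise ValueError("who_score must be an integer from 0 to 8")
    if covid_positive:
        if 1 <= who_score <= 5:
            return "COVID-19 WHO 1-5"   # mild/moderate
        if who_score >= 6:
            return "COVID-19 WHO 6-8"   # severe
        raise ValueError("WHO 0 is not used for COVID-19 participants here")
    if who_score == 0:
        return "Control WHO 0"
    if who_score >= 7:
        return "Control WHO 7-8"
    raise ValueError("no control cohort with this score is described in the text")


# Hypothetical participant with mild disease at swab who later became severe.
at_swab = severity_group(True, 3)
at_peak = severity_group(True, 7)
```

Keeping the swab-time and peak scores separate makes it possible to identify participants such as those discussed in [0304], who were WHO 1-5 at the time of swab but reached WHO 6-8 at peak.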
[0323] Sample Collection and Biobanking - Nasopharyngeal samples were collected by a trained healthcare provider using FLOQSwabs (Copan flocked swabs) following the manufacturer's instructions. Collectors would don personal protective equipment (PPE), including a gown, non-sterile gloves, a protective N95 mask, a bouffant, and a face shield. The patient's head was then tilted back slightly, and the swab inserted along the nasal septum, above the floor of the nasal passage to the nasopharynx until slight resistance was felt. The swab was then left in place for several seconds to absorb secretions and slowly removed while rotating the swab. A second swab was then completed in the other naris. The swabs were then placed into a cryogenic vial with 900 μL of heat inactivated fetal bovine serum (FBS) and 100 μL of dimethyl sulfoxide (DMSO). The vials were then placed into a Thermo Scientific Mr. Frosty Freezing Container for optimal cell preservation. The Mr. Frosty containing the vials was then placed in a cooler with dry ice for transportation from the patient area to the laboratory for processing. Once in the laboratory, the Mr. Frosty was placed into the -80°C Freezer overnight and then on the next day, the vials were moved to the liquid nitrogen storage container.
[0324] Dissociation and Collection of Viable Single Cells from Nasal Swabs - Swabs in freezing media (90% FBS/10% DMSO) were stored in liquid nitrogen until immediately prior to dissociation. A detailed sample protocol can be found here: protocols.io/view/human-nasopharyngeal-swab-processing-for-viable-si-bjhkkj4w.html. This approach ensures that all cells and cellular material from the nasal swab (whether directly attached to the nasal swab, or released during the washing and digestion process), are exposed first to DTT for 15 minutes, followed by an Accutase digestion for 30 minutes.
Briefly, nasal swabs in freezing media were thawed, and each swab was rinsed in RPMI before incubation in 1 mL RPMI/10 mM DTT (Sigma) for 15 minutes at 37°C with agitation. Next, the nasal swab was incubated in 1 mL Accutase (Sigma) for 30 minutes at 37°C with agitation. The 1 mL RPMI/10 mM DTT from the nasal swab incubation was centrifuged at 400 g for 5 minutes at 4°C to pellet cells, the supernatant was discarded, and the cell pellet was resuspended in 1 mL Accutase and incubated for 30 minutes at 37°C with agitation. The original cryovial containing the freezing media and the original swab washings were combined and centrifuged at 400 g for 5 minutes at 4°C. The cell pellet was then resuspended in RPMI/10 mM DTT, and incubated for 15 minutes at 37°C with agitation, centrifuged as above, the supernatant was aspirated, and the cell pellet was resuspended in 1 mL Accutase, and incubated for 30 minutes at 37°C with agitation. All cells were combined following Accutase digestion and filtered using a 70 μm nylon strainer. The filter and swab were washed with RPMI/10% FBS/4 mM EDTA, and all washings combined. Dissociated, filtered cells were centrifuged at 400 g for 10 minutes at 4°C, and resuspended in 200 μL RPMI/10% FBS for counting. Cells were diluted to 20,000 cells in 200 μL for scRNA-seq. For the majority of swabs, fewer than 20,000 cells total were recovered. In these instances, all cells were input into scRNA-seq.
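By way of non-limiting illustration, the dilution step described above reduces to simple arithmetic. The Python sketch below assumes that counting yields a concentration in cells per μL for the 200 μL suspension; the function name, its parameters, and the example concentrations are hypothetical and are not part of the described protocol.

```python
def loading_volumes(cells_per_ul, suspension_ul=200.0,
                    target_cells=20000, load_ul=200.0):
    """Return (cell suspension uL, media uL, cells loaded) for one array.

    Assumption: the whole suspension is loaded when it holds no more
    than the target number of cells, as described for most swabs.
    """
    total_cells = cells_per_ul * suspension_ul
    if total_cells <= target_cells:
        # Fewer cells than the target were recovered: input all cells.
        return suspension_ul, 0.0, total_cells
    # Otherwise take the volume holding the target cell number and
    # bring it up to the loading volume with media.
    cell_volume = target_cells / cells_per_ul
    return cell_volume, load_ul - cell_volume, float(target_cells)
```

For example, a suspension counted at 400 cells per μL would be loaded as 50 μL of cells plus 150 μL of media, whereas a suspension at 50 cells per μL would be loaded in full.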
[0325] scRNA-seq - Seq-Well S3 was run as previously described (refs. 44, 46). Briefly, a maximum of 20,000 single cells were deposited onto Seq-Well arrays preloaded with a single barcoded mRNA capture bead per well. Cells were allowed to settle by gravity into wells for 10 minutes, after which the arrays were washed with PBS and RPMI, sealed with a semi-permeable membrane for 30 minutes, and incubated in lysis buffer (5 M guanidinium thiocyanate/1 mM EDTA/1% BME/0.5% sarkosyl) for 20 minutes. Arrays were then incubated in a hybridization buffer (2 M NaCl/8% v/v PEG8000) for 40 minutes, and then the beads were removed from the arrays and collected in 1.5 mL tubes in wash buffer (2 M NaCl/3 mM MgCl2/20 mM Tris-HCl/8% v/v PEG8000). Beads were resuspended in a reverse transcription master mix, and reverse transcription, exonuclease digestion, second strand synthesis, and whole transcriptome amplification were carried out as previously described. Libraries were generated using Illumina Nextera XT Library Prep Kits and sequenced on NextSeq 500/550 High Output v2.5 kits to an average depth of 180 million aligned reads per array: read 1: 21 (cell barcode, UMI), read 2: 50 (digital gene expression), index 1: 8 (N700 barcode).
[0326] Data Preprocessing and Quality Control - Pooled libraries were demultiplexed using bcl2fastq (v2.17.1.14) with default settings (mask_short_adapter_reads 10, minimum trimmed read length 10, implemented using Cumulus, snapshot 4, cumulus.readthedocs.io/en/stable/bcl2fastq.html). Libraries were aligned using STAR within the Drop-Seq Computational Protocol (github.com/broadinstitute/Drop-seq) and implemented on Cumulus (cumulus.readthedocs.io/en/latest/drop_seq.html, snapshot 9, default parameters). A custom reference was created by combining human GRCh38 (from CellRanger version 3.0.0, Ensembl 93) and SARS-CoV-2 RNA genomes. The SARS-CoV-2 viral sequence and GTF are as described in Kim et al. 2020 (github.com/hyeshik/sars-cov-2-transcriptome, BetaCov/South Korea/KCDC03/2020 based on NC_045512.2). The GTF includes all CDS regions (as of this annotation of the transcriptome, the CDS regions completely cover the RNA genome without overlapping segments), and regions were added to describe the 5’ UTR (“SARSCoV2_5prime”), the 3’ UTR (“SARSCoV2_3prime”), and reads aligning to anywhere within the Negative Strand (“SARSCoV2_NegStrand”). Trailing A’s at the 3’ end of the virus were excluded from the SARS-CoV-2 FASTA, as these were found to drive spurious viral alignment in pre-COVID-19 samples. Finally, additional small sequences were appended to the FASTA and GTF that differentiate reads that align to the 70-nucleotide region around the viral TRS sequence - either across the intact, unspliced genomic sequences (e.g., named “SARSCoV2_Unspliced_S” or “SARSCoV2_Unspliced_Leader”) or various spliced RNA species (e.g., “SARSCoV2_Spliced_Leader_TRS_S”), see schematics in Figures 12K, 12L. Alignment references were tested against a diverse set of pre-COVID-19 samples and in vitro SARS-CoV-2 infected human bronchial epithelial cultures (Ravindra et al.) to confirm specificity of viral aligning reads (data not shown). Aligned cell-by-gene matrices were merged across all study participants, and cells were filtered to eliminate barcodes with fewer than 200 UMI, fewer than 150 unique genes, or greater than 50% mitochondrial reads (cutoffs determined by distributions of reads across cells, see Figure 7C). Of the 61 nasal swabs thawed and processed, 3 contained no high-quality cell barcodes after sequencing (NB: these samples contained < 5,000 viable cells prior to Seq-Well array loading). This resulted in a final dataset of 32,871 genes and 32,588 cells across 58 study participants (35 COVID-19 individuals, 21 control individuals, 2 COVID-19 convalescent individuals). Preprocessing, alignment, and data filtering were applied equivalently to samples from the fresh vs. frozen cohort. For analysis of RNA velocity, Applicants also recovered both exonic and intronic alignment information using DropEst (implemented on Cumulus, cumulus.readthedocs.io/en/latest/drop_seq.html, snapshot 9, dropest_velocyto true, run_dropest true).
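By way of non-limiting illustration, the barcode filter described above can be sketched in Python as follows. The record type, field names, and function names are hypothetical; only the three cutoffs (200 UMI, 150 unique genes, 50% mitochondrial reads) are taken from the text, and the sketch assumes that a barcode failing any one cutoff is discarded.

```python
from dataclasses import dataclass


@dataclass
class BarcodeStats:
    barcode: str
    n_umi: int        # total UMI assigned to the barcode
    n_genes: int      # number of unique genes detected
    n_mito_umi: int   # UMI assigned to mitochondrial genes


def passes_qc(cell, min_umi=200, min_genes=150, max_mito_fraction=0.5):
    """True if the barcode meets all three quality cutoffs."""
    if cell.n_umi == 0:
        return False  # nothing captured; also avoids division by zero
    if cell.n_umi < min_umi or cell.n_genes < min_genes:
        return False
    return cell.n_mito_umi / cell.n_umi <= max_mito_fraction


def filter_barcodes(cells, **cutoffs):
    """Keep only barcodes that pass quality control, preserving order."""
    return [cell for cell in cells if passes_qc(cell, **cutoffs)]
```

Keeping the cutoffs as keyword arguments makes it straightforward to re-run the filter at other thresholds when inspecting the distributions used to choose them.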
[0327] Cell Clustering and Annotation - Dimensionality reduction, cell clustering and differential gene analysis were all achieved using the Seurat (v3.1.5) package in R programming language (v3.0.2). Dimensionality reduction was carried out by running principal components analysis over the 3,483 most variable genes with dispersion > 0.8 (tested over a range of dispersion > 0.7 to dispersion > 1.2; dispersion > 0.8 was determined as optimal based on number of variable genes, and general stability of clustering results across these cutoffs was confirmed). Only variable genes from human transcripts were considered for dimensionality reduction and clustering. Using the JackStraw function within Seurat, Applicants selected the first 36 principal components that described the majority of variance within the dataset, and used these for defining a nearest neighbor graph and Uniform Manifold Approximation and Projection (UMAP) plot. Cells were clustered using Louvain clustering, and the resolution parameter was chosen by maximizing the average silhouette score across all clusters. Differentially expressed genes between each cluster and all other cells were calculated using the FindAllMarkers function, with test.use set to “bimod”. Clusters were merged if they failed to contain sets of significantly differentially expressed genes. Applicants proceeded iteratively through each cluster and subcluster until “terminal” cell subsets/cell states were identified - Applicants defined “terminal” cell states as those for which principal components analysis and Louvain clustering did not confidently identify additional sub-states, as measured by abundance of differentially expressed genes between potential clusters. For visualization in Figures 2, 3, and Figure 9, Applicants pooled all cells determined to be of epithelial origin and repeated dimensionality reduction using the methods above (dispersion cutoff > 1, 30 principal components). Applicants applied similar approaches for immune cell types, including iterative subclustering to resolve and annotate all constituent cell types and subtypes, and combined all immune cells for visualization purposes in Figure 10. Cell cycle scoring utilized gene lists from Tirosh et al. Gene module scores were calculated using the AddModuleScore function within Seurat.
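By way of non-limiting illustration, choosing a clustering resolution by maximizing the average silhouette score can be sketched in Python as follows. This is a simplified stand-in rather than the Seurat implementation: it assumes Euclidean distances in a reduced (e.g., principal component) space, averages the silhouette over cells, and takes as input cluster labels already computed at each candidate resolution. All function names are hypothetical.

```python
from math import dist


def mean_silhouette(points, labels):
    """Average silhouette width over all cells (Euclidean distance)."""
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)
    if len(clusters) < 2:
        return float("-inf")  # silhouette is undefined for one cluster
    scores = []
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def best_resolution(points, labelings):
    """labelings maps each candidate resolution to its cluster labels."""
    return max(labelings, key=lambda r: mean_silhouette(points, labelings[r]))
```

The pairwise computation is quadratic in the number of cells, so for a dataset of this size one would typically score a random subsample of cells at each resolution.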
[0328] RNA Velocity and Pseudotemporal Ordering of Epithelial Cells - RNA velocity was modeled using the scVelo package, version 0.2.3. Using cluster annotations previously assigned from iterative clustering in Seurat, cells from epithelial cell types were pre-processed according to the scVelo pipeline: genes were normalized using default parameters (pp.filter_and_normalize), principal components and nearest neighbors in PCA space were calculated (using defaults of 30 PCs, 30 nearest neighbors), and the first and second order moments of nearest neighbors were computed, which are used as inputs into velocity estimates (pp.moments). RNA velocity was estimated using the scVelo tools tl.recover_dynamics with default input parameters, which maps the full splicing kinetics for all genes, and tl.velocity, with mode = “dynamical”. Top velocity transition “driver” genes were identified by high “fit_likelihood” parameters from the dynamical model, and are used for visualization in Figure 9G. The same approaches were used for modeling RNA velocity among only Ciliated Cells (Figure 2H-2K), Basal, Secretory, and Goblet Cells (Figure 2L-2O), and only COVID-19 or only Control cells (Figure 3A). For RNA velocity analysis of Ciliated Cells or Basal, Secretory and Goblet Cells, the velocity pseudotime was calculated using the tl.velocity_pseudotime function with default settings.
[0329] Metagenomic Classification of Reads from Single-Cell RNA-Seq - To identify co-detected microbial taxa present in the cell-associated or ambient RNA of nasopharyngeal swabs, Applicants used the Kraken2 software implemented using the Broad Institute viral-ngs pipelines on Terra (github.com/broadinstitute/viral-pipelines/tree/master). A previously-published reference database included human, archaea, bacteria, plasmid, viral, fungi, and protozoa species and was constructed on May 5, 2020, and therefore included sequences belonging to the novel SARS-CoV-2 virus. Inputs to Kraken2 were: kraken2_db_tgz = "gs://pathogen-public-dbs/v1/kraken2-broad-20200505.tar.zst", krona_taxonomy_db_kraken2_tgz = "gs://pathogen-public-dbs/v1/krona.taxonomy-20200505.tab.zst", ncbi_taxdump_tgz = "gs://pathogen-public-dbs/v1/taxdump-20200505.tar.gz", trim_clip_db = "gs://pathogen-public-dbs/v0/contaminants.clip_db.fasta" and spikein_db = "gs://pathogen-public-dbs/v0/ERCC_96_nopolyA.fasta". Species with fewer than 5 reads were considered spurious and excluded.
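By way of non-limiting illustration, the read-count cutoff applied to the classification output can be sketched in Python as follows. The function name and the input layout (a mapping from species name to classified read count for one sample) are hypothetical and do not reflect the Kraken2 report format.

```python
def filter_spurious_species(read_counts, min_reads=5):
    """Drop species supported by fewer than min_reads classified reads."""
    return {
        species: reads
        for species, reads in read_counts.items()
        if reads >= min_reads
    }
```

Under this sketch a species with exactly 5 reads is retained, consistent with excluding only those with fewer than 5.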
[0330] Correction for Ambient Viral RNA - Single-cell data from high-throughput single-cell RNA-seq platforms frequently contain low levels of non-specific RNA assigned to cell barcodes that do not represent true cell-derived transcriptomic material, but rather contamination from the ambient pool of RNA. To safeguard against spurious assignment of SARS-CoV-2 RNA to cells without true intracellular viral material, i.e., viral RNA non-specifically picked up from the microenvironment as a component of ambient RNA contamination, Applicants employed the following corrections and statistical tests to correct for ambient viral RNA and enable confident assignments for SARS-CoV-2 RNA+ cells. Similar to approaches previously described, Applicants tested whether the abundance of viral RNA within a given single cell was significantly higher than expected by chance given the estimate of ambient RNA contaminating that cell, as well as the proportion of viral RNA of the total ambient RNA pool. First, this required modeling and estimating the ambient RNA fraction associated with each individual swab. Here, Applicants employed CellBender (github.com/broadinstitute/CellBender), a software package built to learn the ambient RNA profile and provide an ambient RNA-corrected output. Input UMI count matrices contained the top 10,000 cell barcodes, and therefore included at least 70% cell barcodes that sample the ambient RNA of the low-quality cell pool. CellBender's remove-background function was run with default parameters and --fpr 0.01 --expected-cells 500 --low-count-threshold 5. Using the corrected output from each sample's count matrix following CellBender, Applicants calculated the proportion of ambient contamination per high-quality cell by comparing to the single cell's transcriptome pre-correction, and summed all UMI from background/low-quality cell barcodes to recover an estimate of the total ambient pool. 
Next, Applicants tested whether the abundance of viral RNA in a given single cell was significantly above the null abundance given the ambient RNA characteristics using an exact binomial test (implemented in R using binom.test):
\[ P(X \ge n) = \sum_{k=n}^{x} \binom{x}{k} p^{k} q^{x-k} \]
[0331] where n = SARS-CoV-2 UMI per cell, x = total UMI per cell,
[0332] p = (ambient fraction per cell)*(SARS-CoV-2 UMI fraction of all ambient UMI), and q = 1-p.
[0333] P-values were FDR-corrected within sample, and cells whose SARS-CoV-2 UMI abundance was significant at FDR < 0.01 were considered “SARS-CoV-2 RNA+”.
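By way of non-limiting illustration, the test described above can be sketched in Python as a one-sided (upper-tail) exact binomial test followed by within-sample multiple-testing correction. The sketch makes two assumptions that the text does not state explicitly: that the test is one-sided, and that the FDR correction is the Benjamini-Hochberg procedure. Function names and the input layout are hypothetical, and the arguments use descriptive names (viral UMI as the number of successes, total UMI as the number of trials) rather than single-letter symbols.

```python
from math import exp, lgamma, log


def binom_upper_tail(successes, trials, p):
    """P(X >= successes) for X ~ Binomial(trials, p), summed in log space."""
    if successes <= 0:
        return 1.0
    if successes > trials or p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    log_p, log_q = log(p), log(1.0 - p)
    total = 0.0
    for k in range(successes, trials + 1):
        log_pmf = (lgamma(trials + 1) - lgamma(k + 1) - lgamma(trials - k + 1)
                   + k * log_p + (trials - k) * log_q)
        total += exp(log_pmf)
    return min(1.0, total)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_viral_positive(cells, ambient_viral_fraction, fdr=0.01):
    """cells: (viral UMI, total UMI, ambient fraction) per cell of one sample."""
    pvalues = [
        binom_upper_tail(viral, total, ambient * ambient_viral_fraction)
        for viral, total, ambient in cells
    ]
    return [q < fdr for q in benjamini_hochberg(pvalues)]
```

Summing the probability mass in log space avoids overflow of the binomial coefficient for cells with thousands of UMI.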
[0334] Differential Expression by Cohort, Cell Type, or Viral RNA Status - To compare gene expression between cells from distinct donor cohorts, Applicants employed a negative binomial generalized linear model. Cells from each cell type belonging to either COVID-19 WHO 1-5 (mild/moderate), COVID-19 WHO 6-8 (severe), or Control WHO 0 were compared in a pairwise manner, implemented using the Seurat FindAllMarkers function. Applicants considered genes as differentially expressed with an FDR-adjusted p value < 0.001 and log fold change > 0.25. To compare gene expression between SARS-CoV-2 RNA+ cells and bystander cells (from COVID-19 participants, but without intracellular viral RNA), Applicants again used a negative binomial generalized linear model, but instead implemented using DESeq2. Applicants only tested cell types containing at least 15 SARS-CoV-2 RNA+ cells, and for each cell type, Applicants restricted the bystander cells to the same participants as the SARS-CoV-2 RNA+ cells. Next, given the large discrepancies in cell number between SARS-CoV-2 RNA+ and bystander groups among most cell types, Applicants randomly sub-sampled the bystander cells to at most 4x the number of SARS-CoV-2 RNA+ cells. Further, Applicants selected bystander cell subsets that matched the cell quality distribution of the SARS-CoV-2 RNA+ cells, based on binned deciles of UMI/Cell. DESeq2 was run with default parameters and test = “Wald”. Gene ontology analysis was run using the Database for Annotation, Visualization, and Integrated Discovery (DAVID). Gene set enrichment analysis (GSEA) was completed using the R package fgsea over genes ranked by average log fold change in expression between each cohort, including all genes with an average expression > 0.5 UMI within each respective cell type. Gene lists corresponding to “Shared IFN Response”, “Type I IFN Specific Response” and “Type II IFN Specific Response” are derived from previously-published population RNA-seq data from nasal epithelial basal cells treated in vitro with 0.1 ng/mL - 10 ng/mL IFNA or IFNG for 12 hours. Module scores were calculated using the Seurat function AddModuleScore with default inputs.
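By way of non-limiting illustration, quality-matched sub-sampling of bystander cells can be sketched in Python as follows. The text does not specify how the deciles were defined or how sampling was performed within bins, so the sketch assumes that decile edges are computed from the UMI counts of the SARS-CoV-2 RNA+ cells and that, within each bin, up to four bystander cells are drawn at random per RNA+ cell. The function name, arguments, and seed are hypothetical.

```python
import bisect
import random
import statistics
from collections import defaultdict


def matched_bystanders(positive_umi, bystander_umi, ratio=4, seed=0):
    """Sample bystander cell IDs matched to RNA+ cells by UMI decile.

    positive_umi: UMI per cell for RNA+ cells (at least 2 values needed).
    bystander_umi: mapping of bystander cell ID to UMI per cell.
    """
    # Nine cut points that split the RNA+ UMI distribution into deciles.
    edges = statistics.quantiles(positive_umi, n=10)

    def decile(umi):
        return bisect.bisect_right(edges, umi)  # bin index 0..9

    positives_per_bin = defaultdict(int)
    for umi in positive_umi:
        positives_per_bin[decile(umi)] += 1

    pool = defaultdict(list)
    for cell_id, umi in sorted(bystander_umi.items()):
        pool[decile(umi)].append(cell_id)

    rng = random.Random(seed)  # fixed seed for reproducibility
    chosen = []
    for bin_index, n_positive in sorted(positives_per_bin.items()):
        candidates = pool.get(bin_index, [])
        k = min(len(candidates), ratio * n_positive)
        chosen.extend(rng.sample(candidates, k))
    return chosen
```

Sampling within bins caps the bystander group at four times the RNA+ group overall while keeping the two UMI/Cell distributions comparable, so that differences in capture depth are less likely to be mistaken for differential expression.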
[0335] Statistical Testing - All statistical tests were implemented either in R (v4.0.2) or Prism (v6) software. Comparisons between cell type proportions by cohort were tested using a Kruskal-Wallis test with Bonferroni correction, implemented in R using the kruskal.test and p.adjust functions. Post-tests for between-group pairwise comparisons used Dunn's test. Spearman correlation was used where appropriate, implemented using the cor.test function in R. All testing for differential expression was implemented in R using either Seurat, scVelo, or DESeq2, and all results were FDR-corrected as noted in specific Methods sections. P-values, n, and all summary statistics are provided either in the results section, figure legends, figure panels, or tables.
[0336] Data and Code Availability - Prism (v6), R (v4.0.2) packages ggplot2 (v3.3.2), Seurat (v3.2.2), ComplexHeatmap (v2.7.3), circlize (v0.4.11), and fgsea (v1.16.0), and Python (v3.8.3) package scVelo (v0.3.0) were used for visualization.
References
1. Meyerowitz, E. A., Richterman, A., Gandhi, R. T. & Sax, P. E. Transmission of SARS-CoV-2: A Review of Viral, Host, and Environmental Factors. Ann. Intern. Med. (2021) doi:10.7326/m20-5008.
2. Fears, A. C. et al. Persistence of Severe Acute Respiratory Syndrome Coronavirus 2 in Aerosol Suspensions. Emerg. Infect. Dis. (2020) doi:10.3201/eid2609.201806.
3. Pan, Y., Zhang, D., Yang, P., Poon, L. L. M. & Wang, Q. Viral load of SARS-CoV-2 in clinical samples. The Lancet Infectious Diseases (2020) doi:10.1016/S1473-3099(20)30113-4.
4. Sanche, S. et al. High Contagiousness and Rapid Spread of Severe Acute Respiratory Syndrome Coronavirus 2. Emerg. Infect. Dis. (2020) doi:10.3201/eid2607.200282.
5. Arons, M. M. et al. Presymptomatic SARS-CoV-2 Infections and Transmission in a Skilled Nursing Facility. N. Engl. J. Med. (2020) doi:10.1056/nejmoa2008457.
6. Wang, Y. et al. Clinical outcome of 55 asymptomatic cases at the time of hospital admission infected with SARS-Coronavirus-2 in Shenzhen, China. J. Infect. Dis. (2020) doi:10.1093/infdis/jiaa119.
7. Sakurai, A. et al. Natural History of Asymptomatic SARS-CoV-2 Infection. N. Engl. J. Med. (2020) doi:10.1056/nejmc2013020.
8. Guan, W. et al. Clinical Characteristics of Coronavirus Disease 2019 in China. N. Engl. J. Med. (2020) doi:10.1056/nejmoa2002032.
9. Huang, C. et al. Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. Lancet (2020) doi:10.1016/S0140-6736(20)30183-5.
10. Chan, J. F. W. et al. Genomic characterization of the 2019 novel human-pathogenic coronavirus isolated from a patient with atypical pneumonia after visiting Wuhan. Emerg. Microbes Infect. (2020) doi:10.1080/22221751.2020.1719902.
11. Frieman, M. & Baric, R. Mechanisms of Severe Acute Respiratory Syndrome Pathogenesis and Innate Immunomodulation. Microbiol. Mol. Biol. Rev. (2008) doi:10.1128/mmbr.00015-08.
12. Harrison, A. G., Lin, T. & Wang, P. Mechanisms of SARS-CoV-2 Transmission and Pathogenesis. Trends in Immunology (2020) doi:10.1016/j.it.2020.10.004.
13. Lucas, C. et al. Longitudinal analyses reveal immunological misfiring in severe COVID-19. Nature (2020) doi:10.1038/s41586-020-2588-y.
14. Mathew, D. et al. Deep immune profiling of COVID-19 patients reveals distinct immunotypes with therapeutic implications. Science (2020) doi:10.1126/science.abc8511.
15. Schulte-Schrepping, J. et al. Severe COVID-19 Is Marked by a Dysregulated Myeloid Cell Compartment. Cell (2020) doi:10.1016/j.cell.2020.08.001.
16. Su, Y. et al. Multi-Omics Resolves a Sharp Disease-State Shift between Mild and Moderate COVID-19. Cell (2020) doi:10.1016/j.cell.2020.10.037.
17. Galani, I. E. et al. Untuned antiviral immunity in COVID-19 revealed by temporal type I/III interferon patterns and flu comparison. Nat. Immunol. (2021) doi:10.1038/s41590-020-00840-x.
18. Hadjadj, J. et al. Impaired type I interferon activity and inflammatory responses in severe COVID-19 patients. Science (2020) doi:10.1126/science.abc6027.
19. Szabo, P. A. et al. Analysis of respiratory and systemic immune responses in COVID-19 reveals mechanisms of disease pathogenesis. medRxiv (2020) doi:10.1101/2020.10.15.20208041.
20. Speranza, E. et al. Single-cell RNA sequencing reveals SARS-CoV-2 infection dynamics in lungs of African green monkeys. Sci. Transl. Med. (2021) doi:10.1126/scitranslmed.abe8146.
21. Munster, V. et al. Respiratory disease and virus shedding in rhesus macaques inoculated with SARS-CoV-2. bioRxiv (2020) doi:10.1101/2020.03.21.001628.
22. Chandrashekar, A. et al. SARS-CoV-2 infection protects against rechallenge in rhesus macaques. Science (2020) doi:10.1126/science.abc4776.
23. Chan, J. F. W. et al. Simulation of the Clinical and Pathological Manifestations of Coronavirus Disease 2019 (COVID-19) in a Golden Syrian Hamster Model: Implications for Disease Pathogenesis and Transmissibility. Clin. Infect. Dis. (2020) doi:10.1093/cid/ciaa325.
24. Sia, S. F. et al. Pathogenesis and transmission of SARS-CoV-2 in golden hamsters. Nature (2020) doi:10.1038/s41586-020-2342-5.
25. Sun, S. H. et al. A Mouse Model of SARS-CoV-2 Infection and Pathogenesis. Cell Host Microbe (2020) doi:10.1016/j.chom.2020.05.020.
26. Bao, L. et al. The pathogenicity of SARS-CoV-2 in hACE2 transgenic mice. Nature (2020) doi:10.1038/s41586-020-2312-y.
27. Jiang, R. D. et al. Pathogenesis of SARS-CoV-2 in Transgenic Mice Expressing Human Angiotensin-Converting Enzyme 2. Cell (2020) doi:10.1016/j.cell.2020.05.027.
28. Kim, Y. I. et al. Infection and Rapid Transmission of SARS-CoV-2 in Ferrets. Cell Host Microbe (2020) doi:10.1016/j.chom.2020.03.023.
29. Richard, M. et al. SARS-CoV-2 is transmitted via contact and via the air between ferrets. Nat. Commun. (2020) doi:10.1038/s41467-020-17367-2.
30. Ravindra, N. G. et al. Single-cell longitudinal analysis of SARS-CoV-2 infection in human airway epithelium. bioRxiv (2020) doi:10.1101/2020.05.06.081695.
31. Blanco-Melo, D. et al. SARS-CoV-2 launches a unique transcriptional signature from in vitro, ex vivo, and in vivo systems. bioRxiv (2020) doi:10.1101/2020.03.24.004655.
32. Zhang, Q. et al. Inborn errors of type I IFN immunity in patients with life-threatening COVID-19. Science (2020) doi:10.1126/science.abd4570.
33. Bastard, P. et al. Autoantibodies against type I IFNs in patients with life-threatening COVID-19. Science (2020) doi:10.1126/science.abd4585.
34. Combes, A. J. et al. Global Absence and Targeting of Protective Immune States in Severe COVID-19. bioRxiv (2020).
35. Sungnak, W. et al. SARS-CoV-2 entry factors are highly expressed in nasal epithelial cells together with innate immune genes. Nat. Med. (2020) doi:10.1038/s41591-020-0868-6.
36. Ziegler, C. G. K. et al. SARS-CoV-2 Receptor ACE2 Is an Interferon-Stimulated Gene in Human Airway Epithelial Cells and Is Detected in Specific Cell Subsets across Tissues. Cell (2020) doi:10.1016/j.cell.2020.04.035.
37. Huang, N. et al. Integrated Single-Cell Atlases Reveal an Oral SARS-CoV-2 Infection and Transmission Axis. medRxiv (2020).
38. Muus, C. et al. Integrated analyses of single-cell atlases reveal age, gender, and smoking status associations with cell type-specific expression of mediators of SARS-CoV-2 viral entry and highlights inflammatory programs in putative target cells. bioRxiv (2020) doi:10.1101/2020.04.19.049254.
39. Lukassen, S. et al. SARS-CoV-2 receptor ACE2 and TMPRSS2 are predominantly expressed in a transient secretory cell type in subsegmental bronchial branches. bioRxiv (2020) doi:10.1101/2020.03.13.991455.
40. Chua, R. L. et al. COVID-19 severity correlates with airway epithelium-immune cell interactions identified by single-cell analysis. Nat. Biotechnol. (2020) doi:10.1038/s41587-020-0602-4.
41. Schaefer, I. M. et al. In situ detection of SARS-CoV-2 in lungs and airways of patients with COVID-19. Mod. Pathol. (2020) doi:10.1038/s41379-020-0595-z.
42. Hou, Y. J. et al. SARS-CoV-2 Reverse Genetics Reveals a Variable Infection Gradient in the Respiratory Tract. Cell (2020) doi:10.1016/j.cell.2020.05.042.
43. Zhu, N. et al. Morphogenesis and cytopathic effect of SARS-CoV-2 infection in human airway epithelial cells. Nat. Commun. (2020) doi:10.1038/s41467-020-17796-z.
44. Gierahn, T. M. et al. Seq-Well: Portable, low-cost RNA sequencing of single cells at high throughput. Nat. Methods (2017) doi:10.1038/nmeth.4179.
45. Hughes, T. K. et al. Highly Efficient, Massively-Parallel Single-Cell RNA-Seq Reveals Cellular States and Molecular Features of Human Skin Pathology. bioRxiv (2019) doi:10.1101/689273.
46. Aicher, T. P. et al. Seq-Well: A sample-efficient, portable picowell platform for massively parallel single-cell RNA sequencing. in Methods in Molecular Biology (2019) doi:10.1007/978-1-4939-9240-9_8.
47. Ordovas-Montanes, J. et al. Allergic inflammatory memory in human respiratory epithelial progenitor cells. Nature (2018) doi:10.1038/s41586-018-0449-8.
48. Garcia, S. R. et al. Novel dynamics of human mucociliary differentiation revealed by single-cell RNA sequencing of nasal epithelial cultures. Development (2019) doi:10.1242/dev.177428.
49. Montoro, D. T. et al. A revised airway epithelial hierarchy includes CFTR-expressing ionocytes. Nature (2018) doi:10.1038/s41586-018-0393-7.
50. Plasschaert, L. W. et al. A single-cell atlas of the airway epithelium reveals the CFTR-rich pulmonary ionocyte. Nature (2018) doi:10.1038/s41586-018-0394-6.
51. Hoffmann, M. et al. SARS-CoV-2 Cell Entry Depends on ACE2 and TMPRSS2 and Is Blocked by a Clinically Proven Protease Inhibitor. Cell (2020) doi:10.1016/j.cell.2020.02.052.
52. Li, W. et al. Angiotensin-converting enzyme 2 is a functional receptor for the SARS coronavirus. Nature (2003) doi:10.1038/nature02145.
53. Yan, R. et al. Structural basis for the recognition of SARS-CoV-2 by full-length human ACE2. Science (2020) doi:10.1126/science.abb2762.
54. Wrapp, D. et al. Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation. Science (2020) doi:10.1126/science.aax0902.
55. Wang, Q. et al. Structural and Functional Basis of SARS-CoV-2 Entry by Using Human ACE2. Cell (2020) doi:10.1016/j.cell.2020.03.045.
56. Daniloski, Z. et al. Identification of Required Host Factors for SARS-CoV-2 Infection in Human Cells. Cell (2021) doi:10.1016/j.cell.2020.10.030.
57. Wang, R. et al. Genetic Screens Identify Host Factors for SARS-CoV-2 and Common Cold Coronaviruses. Cell (2021) doi:10.1016/j.cell.2020.12.004.
58. Wei, J. et al. Genome-wide CRISPR Screens Reveal Host Factors Critical for SARS-CoV-2 Infection. Cell (2021) doi:10.1016/j.cell.2020.10.028.
59. Schneider, W. M. et al. Genome-Scale Identification of SARS-CoV-2 and Pan-coronavirus Host Factor Networks. Cell (2021) doi:10.1016/j.cell.2020.12.006.
60. Broggi, A., Granucci, F. & Zanoni, I. Type III interferons: Balancing tissue tolerance and resistance to pathogen invasion. J. Exp. Med. (2020) doi:10.1084/jem.20190295.
Tables
Table 1. Cell Type Marker Genes (related to Figures 1, 2, 9)
Table 1A. Coarse Cell Types (see Figure 1)
[Table 1A is reproduced as images imgf000139_0001 through imgf000158_0001.]
Table 1B. Detailed Epithelial Cell Types (see Figure 2)
[Table 1B is reproduced as images imgf000158_0002 through imgf000202_0001.]
Table 1C. Detailed Immune Cell types (see Figure 9)
[Table 1C is reproduced as images imgf000202_0002 through imgf000206_0001.]
Table 2. Differentially Expressed Genes Between Cell Types from Control WHO 0 vs. COVID-19 WHO 1-5 (Mild/Moderate). Related to Figure 3. Results from the comparison of cells from each cell type between Control WHO 0 vs. COVID-19 WHO 1-5 (mild/moderate) individuals. (Implemented using the FindAllMarkers function in Seurat, test.use = "negbinom"; Genes included with adjusted pvalue < 0.001, logFC > 0.25; Cell types without sufficient cells to test or fewer than 5 significant genes meeting the cutoffs are not listed).
Table 2A. Expressed in COVID-19 WHO 1-5 (mild/moderate) individuals
[Table 2A is reproduced as images imgf000206_0002 through imgf000208_0001.]
Table 2B. Expressed in Control WHO 0 individuals
[Table 2B is reproduced as images imgf000208_0002 and imgf000209_0001.]
Table 3. Differentially Expressed Genes Between Cell Types from Control WHO 0 vs. COVID-19 WHO 6-8 (Severe). Related to Figure 3. Results from the comparison of cells from each cell type between Control WHO 0 vs. COVID-19 WHO 6-8 (severe) individuals. (Implemented using the FindAllMarkers function in Seurat, test.use = "negbinom"; Genes included with adjusted pvalue < 0.001, logFC > 0.25; Cell types without sufficient cells to test or fewer than 5 significant genes meeting the cutoffs are not listed).
Table 3A. Expressed in COVID-19 WHO 6-8 (severe) individuals
[Table 3A is reproduced as images imgf000209_0002 through imgf000214_0001.]
Table 3B. Expressed in Control WHO 0 individuals
[Table 3B is reproduced as images imgf000214_0002 through imgf000216_0001.]
Table 4. Differentially Expressed Genes Between Cell Types from COVID-19 WHO 1-5 (Mild/Moderate) vs. COVID-19 WHO 6-8 (Severe). Related to Figure 3. Results from the comparison of cells from each cell type between COVID-19 WHO 1-5 (mild/moderate) vs. COVID-19 WHO 6-8 (severe) individuals. (Implemented using the FindAllMarkers function in Seurat, test.use = "negbinom"; Genes included with adjusted pvalue < 0.001, logFC > 0.25; Cell types without sufficient cells to test or fewer than 5 significant genes meeting the cutoffs are not listed).
Table 4A. Expressed in COVID-19 WHO 6-8 (severe) individuals
[Table 4A is reproduced as images imgf000216_0002 through imgf000220_0001.]
Table 4B. Expressed in COVID-19 WHO 1-5 (mild/moderate) individuals
[Table 4B is reproduced as images imgf000220_0002 through imgf000225_0001.]
Table 5. Common Differentially Expressed Genes Between SARS-CoV-2 RNA+ Cells and Bystander Cells. Related to Figure 6. log2 fold change between SARS-CoV-2 RNA+ cells (high, positive values) and matched bystander cells (low, negative values). Columns: detailed cell types with at least 5 SARS-CoV-2 RNA+ cells
[Table 5 is reproduced as images imgf000226_0001 through imgf000249_0001.]
Table 6. Participant characteristics
Figure imgf000250_0001
[0337] Various modifications and variations of the described methods, pharmaceutical compositions, and kits of the invention will be apparent to those skilled in the art without departing from the scope and spirit of the invention. Although the invention has been described in connection with specific embodiments, it will be understood that it is capable of further modifications and that the invention as claimed should not be unduly limited to such specific embodiments. Indeed, various modifications of the described modes for carrying out the invention that are obvious to those skilled in the art are intended to be within the scope of the invention. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known customary practice within the art to which the invention pertains and as may be applied to the essential features hereinbefore set forth.

Claims

What is claimed:
1. A method of treating a barrier tissue infection in a subject in need thereof comprising: detecting one or more indicators of infection from a sample obtained from the subject, wherein the sample comprises one or more of epithelial, immune, stromal, and neuronal cells; comparing the indicators to control/healthy samples or disease reference values to determine whether the subject will progress to a risk group selected from:
(i) mild/moderate disease; or
(ii) severe disease; and administering one or more treatments if one or more indicators are present.
2. The method of claim 1, wherein the barrier tissue infection is a respiratory barrier tissue infection.
3. The method of claim 2, wherein mild subjects are asymptomatic or symptomatic and not hospitalized, wherein moderate subjects are hospitalized and do not require oxygen by non-invasive ventilation or high flow, and wherein severe subjects are hospitalized and require oxygen by non-invasive ventilation, high flow, or intubation and mechanical ventilation.
4. The method of any of claims 1 to 3, wherein the infection is a viral infection.
5. The method of claim 4, wherein the viral infection is a coronavirus.
6. The method of claim 5, wherein the coronavirus is SARS-CoV2 or variant thereof.
7. The method of claim 6, wherein mild/moderate subjects have a WHO score of 1-5 and severe subjects have a WHO score of 6-8.
8. The method of any of claims 1 to 7, wherein the one or more indicators of infection are selected from the group consisting of: a) decreased interferon-stimulated gene (ISG) induction; b) upregulation of one or more anti-viral factors or IFN-responsive genes; c) reduction of mature ciliated cell population or increased immature ciliated cell population; d) increased secretory cell population; e) increased deuterosomal cell population; f) increased ciliated cell population; g) increased goblet cell population; h) decreased expression in Type II interferon specific genes; i) increased expression in Type I interferon specific genes; j) increased MHC-I and MHC-II genes; k) increased developing ciliated cell populations; l) altered expression of one or more genes in a cell type selected from any of Tables 2-4; m) altered expression of one or more genes in a cell type selected from Table 5; n) increased expression of IFITM3 and IFI44L; o) increased expression of EIF2AK2; p) increased expression of TMPRSS4, TMPRSS2, CTSS, CTSD; q) upregulation of cholesterol and lipid biosynthesis; and r) increased abundance of low-density lipoprotein receptors LDLR and LRP8.
9. The method of claim 8, wherein one or more interferon-stimulated genes are detected, wherein if the one or more interferon-stimulated genes are downregulated the subject is at risk for severe disease and if the one or more interferon-stimulated genes are upregulated the subject is not at risk for severe disease.
10. The method of claim 9, wherein the one or more interferon-stimulated genes are selected from the group consisting of STAT1, STAT2, IRF1, and IRF9.
11. The method of any of claims 1 to 10, wherein the one or more indicators of infection are detected in infected host cells and compared to reference values in infected host cells from a risk group.
12. The method of claim 11, wherein one or more anti-viral factors or IFN-responsive genes are detected in virally-infected cells, wherein if the one or more anti-viral factors or IFN-responsive genes are downregulated or absent in virally-infected cells the subject is at risk for severe disease and if the one or more anti-viral factors or IFN-responsive genes are upregulated in virally-infected cells the subject is not at risk for severe disease.
13. The method of claim 12, wherein the one or more anti-viral factors or IFN-responsive genes are selected from the group consisting of EIF2AK2, STAT1 and STAT2.
14. The method of any of claims 8 to 13, wherein the secretory cells comprise one or both of: KRT13 KRT24 high Secretory Cells and Early Response Secretory Cells.
15. The method of any of claims 8 to 13, wherein the secretory cells express CXCL8.
16. The method of any of claims 8 to 13, wherein the goblet cells comprise one or both of: AZGP1 high Goblet Cells and SCGB1A1 high Goblet Cells.
17. The method of any of claims 8 to 13, where the ciliated cells comprise one or more upregulated genes selected from the group consisting of IFI27, IFIT1, IFI6, IFITM3, and GBP3.
18. The method of any of claims 8 to 13, wherein one or both of the ciliated cells and the goblet cells comprise increased gene expression of one or more IFN genes selected from any of Tables 2-4.
19. The method of any of claims 8 to 13, wherein ACE2 expression is upregulated compared to other epithelial cells among one or more of secretory cells, goblet cells, ciliated cells, developing ciliated cells, and deuterosomal cells.
20. The method of any of claims 8 to 13, wherein the mature ciliated cells are BEST4 high cilia high ciliated cells.
21. The method of any of claims 8 to 13, wherein the MHC-I and MHC-II genes comprise at least one or more of: HLA-A, HLA-C, HLA-F, HLA-E, HLA-DRB1, and HLA-DRA.
22. The method of any of claims 8 to 13, wherein the upregulated cholesterol and lipid biosynthesis genes comprise at least one or more of: FDFT1, MVK, FDPS, ACAT2, and HMGCS1.
23. The method of any of claims 1 to 22, wherein detecting one or more indicators is performed by using Simpson's index.
24. The method of any of claims 1 to 23, where a subject will progress to the severe risk group if one or more of the following is detected in the sample: a) proinflammatory cytokines comprising at least one or more of: IL1B, TNF, CXCL8, CCL2, CCL3, CXCL9, CXCL10, and CXCL11; b) upregulation of alarmins comprising one or both of: S100A8 and S100A9; c) 14% - 26% of all epithelial cells are secretory cells; d) elevated BPIFA1 high Secretory cells; e) elevated KRT13 KRT24 high secretory cells; f) macrophage population increase as compared to other immune cells; g) upregulated genes in ciliated cells comprising one or both of: IL5RA and NLRP1; h) no increase of at least one or more of: type I, type II, and type III interferon abundance; i) elevated stress response factors comprising at least one or more of: HSPA8, HSPA1A, and DUSP1; j) increased expression of one or more genes differentially expressed in COVID-19 WHO 6-8 according to Table 3 or Table 4; k) reduced or absent antiviral/interferon response; and l) reduced or absent mature ciliated cells.
25. The method of claim 24, wherein the macrophage population comprises at least one or more of: ITGAX High Macrophages, FFAR High Macrophages, Inflammatory Macrophages, and Interferon Responsive Macrophages.
26. The method of any of claims 1 to 23, where a subject is determined to belong to the mild/moderate risk group if one or more of the following is detected in the sample: a) 4% - 12% of all epithelial cells are Secretory Cells; b) 10% - 20% of all epithelial cells comprise Interferon Responsive Ciliated Cells; c) upregulated ciliated cell genes comprising at least one or more of: IFI44L, STAT1, IFITM1, MX1, IFITM3, OAS1, OAS2, OAS3, STAT2, TAP1, HLA-C, ADAR, XAF1, IRF1, CTSS, and CTSB; d) increase in type I interferon abundance; e) high expression of interferon-responsive genes; f) decreased expression of one or more genes differentially expressed in COVID-19 WHO 6-8 according to Table 3 or Table 4; g) induction of type I interferon responses; and h) high abundance of IFI6 and IFI27.
27. The method of claim 26, where the interferon-responsive genes comprise at least one or more of: STAT1, MX1, HLA-B, and HLA-C.
28. The method of claim 26, where the interferon response occurs in at least one or more of: MUC5AC high Goblet Cells, SCGB1A1 high Goblet Cells, Early Response Secretory Cells, Deuterosomal Cells, Interferon Responsive Ciliated Cells, and BEST4 high Cilia high Ciliated Cells.
29. The method of any of claims 1 to 28, wherein the treatment is administered according to determined risk group.
30. The method of claim 29, where the treatment involves administering a preventative or therapeutic intervention according to the determined risk group.
31. The method of claim 29 or 30, wherein if the subject is determined to be at risk for progression to the severe risk group the subject is administered a treatment comprising one or more treatments selected from the group consisting of: a) one or more antiviral; b) blood-derived immune-based therapy; c) one or more corticosteroid; d) one or more interferon; e) one or more interferon Type I agonists; f) one or more interleukin-1 inhibitors; g) one or more kinase inhibitors; h) one or more TLR agonists; i) a glucocorticoid; and j) interleukin-6 inhibitor.
32. The method of claim 29 or 30, wherein if the subject is determined to be at risk for progression to either risk group the subject is administered a treatment comprising one or more of: a) one or more antiviral; b) one or more antibiotic; and c) one or more cholesterol biosynthesis inhibitor.
33. The method of any of claims 29 to 32, where the treatment comprises an antiviral.
34. The method of claim 33, where the antiviral inhibits viral replication.
35. The method of claim 34, where the antiviral is selected from the group consisting of paxlovid, molnupiravir and remdesivir.
36. The method of any of claims 29 to 32, where the treatment is an immune-based therapy.
37. The method of claim 36, where the immune-based therapy is a blood-derived product comprising at least one or more of: a convalescent plasma and an immunoglobulin.
38. The method of claim 37, where the immune-based therapy is an immunomodulator comprising at least one or more of: a corticosteroid, a glucocorticoid, an interferon, an interferon Type I agonist, an interleukin-1 inhibitor, an interleukin-6 inhibitor, a kinase inhibitor, and a TLR agonist.
39. The method of claim 38, where the corticosteroid comprises at least one of: methylprednisolone, hydrocortisone, and dexamethasone.
40. The method of claim 38, where the glucocorticoid comprises at least one of: cortisone, prednisone, prednisolone, methylprednisolone, dexamethasone, betamethasone, triamcinolone, fludrocortisone acetate, deoxycorticosterone acetate, and hydrocortisone.
41. The method of claim 38, where the interferon comprises at least one or more of: interferon beta-1b and interferon alpha-2b.
42. The method of claim 38, where the interleukin-1 inhibitor comprises anakinra.
43. The method of claim 38, where the interleukin-6 inhibitor comprises at least one or more of: anti-interleukin-6 receptor monoclonal antibodies and anti-interleukin-6 monoclonal antibody.
44. The method of claim 43, where the anti-interleukin-6 receptor monoclonal antibody is tocilizumab.
45. The method of claim 43, where the anti-interleukin-6 monoclonal antibody is siltuximab.
46. The method of claim 38, where the kinase inhibitor comprises at least one or more of: a Bruton's tyrosine kinase inhibitor and a Janus kinase inhibitor.
47. The method of claim 46, where the Bruton's tyrosine kinase inhibitor comprises at least one or more of: acalabrutinib, ibrutinib, and zanubrutinib.
48. The method of claim 46, where the Janus kinase inhibitor comprises at least one or more of: baricitinib, ruxolitinib and tofacitinib.
49. The method of claim 38, where the TLR agonist comprises at least one or more of: imiquimod, BCG, and MPL.
50. The method of any of claims 29 to 32, wherein the treatment comprises inhibiting cholesterol biosynthesis.
51. The method of claim 50, wherein inhibiting cholesterol biosynthesis comprises administering HMG-CoA reductase inhibitors.
52. The method of claim 51, wherein the HMG-CoA reductase inhibitor comprises at least one or more of: simvastatin, atorvastatin, lovastatin, pravastatin, fluvastatin, rosuvastatin, and pitavastatin.
53. The method of any of claims 29 to 32, where the treatment comprises an antibiotic.
54. The method of claim 1, wherein the treatment comprises one or more agents capable of shifting epithelial cells to express an antiviral signature.
55. The method of claim 1, wherein the treatment comprises one or more agents capable of suppressing a myeloid inflammatory response.
56. The method of claim 1, wherein the treatment comprises a CRISPR-Cas system.
57. The method of claim 56, wherein the CRISPR system comprises a CRISPR-Cas base editing system, a prime editor system, or a CAST system.
58. The method of any of the preceding claims, wherein the treatment is administered before disease onset.
59. The method of any of the preceding claims, wherein the one or more cell types are detected using one or more markers differentially expressed in the cell types.
60. The method of any of the preceding claims, wherein the one or more cell types or one or more genes are detected by immunohistochemistry (IHC), fluorescence activated cell sorting (FACS), fluorescently bar-coded oligonucleotide probes, RNA FISH (fluorescent in situ hybridization), RNA-seq, or any combination thereof.
61. The method of claim 60, wherein single cell expression is inferred from bulk RNA-seq.
62. The method of claim 61, wherein expression is determined by single cell RNA-seq.
63. A method of screening for agents capable of shifting epithelial cells from a SARS-CoV2 severe phenotype to a mild/moderate phenotype comprising: a. treating a sample comprising epithelial cells with a drug candidate; b. detecting modulation of any indicators of infection according to any of the preceding claims; and c. identifying the drug, wherein the one or more indicators shift towards a mild/moderate phenotype.
64. The method of claim 63, wherein the sample comprises epithelial cells infected with SARS-CoV2.
65. The method of claim 63, wherein the sample comprises epithelial cells expressing one or more SARS-CoV2 genes.
66. The method of any of claims 63 to 65, wherein the sample is an organoid or tissue model.
67. The method of any of claims 63 to 65, wherein the sample is an animal model.
68. The method of any of the preceding claims, wherein cell types are detected using one or more markers selected from Table 1.
69. A method of detecting susceptibility to a barrier tissue infection in a subject in need thereof comprising: detecting one or more indicators of susceptibility from a sample obtained from the subject, wherein the sample comprises one or more of epithelial, immune, stromal, and neuronal cells; comparing the indicators to control/healthy samples or disease reference values to determine whether the subject belongs to a risk group selected from mild/moderate; or severe.
70. The method of claim 69, wherein the barrier tissue infection is a respiratory barrier tissue infection.
71. The method of claim 70, wherein mild subjects are asymptomatic or symptomatic and not hospitalized, wherein moderate subjects are hospitalized and do not require oxygen by non-invasive ventilation or high flow, and wherein severe subjects are hospitalized and require oxygen by non-invasive ventilation, high flow, or intubation and mechanical ventilation.
72. The method of any of claims 69 to 71, wherein the infection is a viral infection.
73. The method of claim 72, wherein the viral infection is a coronavirus.
74. The method of claim 73, wherein the coronavirus is SARS-CoV2 or variant thereof.
75. The method of claim 74, wherein mild/moderate subjects have a WHO score of 1-5 and severe subjects have a WHO score of 6-8.
76. The method of any of claims 69 to 75, wherein the one or more indicators of susceptibility are selected from the group consisting of: a) decreased interferon-stimulated gene (ISG) induction; b) upregulation of one or more anti-viral factors or IFN-responsive genes; c) reduction of mature ciliated cell population or increased immature ciliated cell population; d) increased secretory cell population; e) increased deuterosomal cell population; f) increased ciliated cell population; g) increased goblet cell population; h) decreased expression in Type II interferon specific genes; i) increased expression in Type I interferon specific genes; j) increased MHC-I and MHC-II genes; k) increased developing ciliated cell populations; l) altered expression of one or more genes in a cell type selected from any of Tables 2-4; m) altered expression of one or more genes in a cell type selected from Table 5; n) increased expression of IFITM3 and IFI44L; o) increased expression of EIF2AK2; p) increased expression of TMPRSS4, TMPRSS2, CTSS, CTSD; q) upregulation of cholesterol and lipid biosynthesis; and r) increased abundance of low-density lipoprotein receptors LDLR and LRP8.
77. The method of claim 76, wherein one or more interferon-stimulated genes are detected, wherein if the one or more interferon-stimulated genes are downregulated the subject is at risk for severe disease and if the one or more interferon-stimulated genes are upregulated the subject is not at risk for severe disease.
78. The method of claim 77, wherein the one or more interferon-stimulated genes are selected from the group consisting of STAT1, STAT2, IRF1, and IRF9.
79. The method of any of claims 69 to 78, wherein the one or more indicators of infection are detected in infected host cells and compared to reference values in infected host cells from a risk group.
80. The method of claim 79, wherein one or more anti-viral factors or IFN-responsive genes are detected in virally-infected cells, wherein if the one or more anti-viral factors or IFN-responsive genes are downregulated or absent in virally-infected cells the subject is at risk for severe disease and if the one or more anti-viral factors or IFN-responsive genes are upregulated in virally-infected cells the subject is not at risk for severe disease.
81. The method of claim 80, wherein the one or more anti-viral factors or IFN-responsive genes are selected from the group consisting of EIF2AK2, STAT1 and STAT2.
82. The method of claim 70, wherein the secretory cells comprise one or both of: KRT13 KRT24 high Secretory Cells and Early Response Secretory Cells.
83. The method of claim 70, wherein the secretory cells express CXCL8.
84. The method of claim 70, wherein the goblet cells comprise one or both of: AZGP1 high Goblet Cells and SCGB1A1 high Goblet Cells.
85. The method of claim 70, where the ciliated cells comprise one or more upregulated genes selected from the group consisting of IFI27, IFIT1, IFI6, IFITM3, and GBP3.
86. The method of claim 70, wherein one or both of the ciliated cells and the goblet cells comprise increased gene expression of one or more IFN genes selected from any of Tables 2-4.
87. The method of claim 70, wherein ACE2 expression is upregulated compared to other epithelial cells among one or more of secretory cells, goblet cells, ciliated cells, developing ciliated cells, and deuterosomal cells.
88. The method of claim 70, wherein the mature ciliated cells are BEST4 high cilia high ciliated cells.
89. The method of claim 70, wherein the MHC-I and MHC-II genes comprise at least one or more of: HLA-A, HLA-C, HLA-F, HLA-E, HLA-DRB1, and HLA-DRA.
90. The method of claim 70, wherein the upregulated cholesterol and lipid biosynthesis genes comprise at least one or more of: FDFT1, MVK, FDPS, ACAT2, and HMGCS1.
91. The method of claim 69, wherein detecting one or more indicators is performed by using Simpson's index.
92. The method of claim 69, where a subject is determined to belong to the severe risk group if one or more of the following is detected in the sample: a) proinflammatory cytokines comprising at least one or more of: IL1B, TNF, CXCL8, CCL2, CCL3, CXCL9, CXCL10, and CXCL11; b) upregulation of alarmins comprising one or both of: S100A8 and S100A9; c) 14% - 26% of all epithelial cells are secretory cells; d) elevated BPIFA1 high Secretory cells; e) elevated KRT13 KRT24 high secretory cells; f) macrophage population increase as compared to other immune cells; g) upregulated genes in ciliated cells comprising one or both of: IL5RA and NLRP1; h) no increase of at least one or more of: type I, type II, and type III interferon abundance; i) elevated stress response factors comprising at least one or more of: HSPA8, HSPA1A, and DUSP1; j) increased expression of one or more genes differentially expressed in COVID-19 WHO 6-8 according to Table 3 or Table 4; k) reduced or absent antiviral/interferon response; and l) reduced or absent mature ciliated cells.
93. The method of claim 92, wherein the macrophage population comprises at least one or more of: ITGAX High Macrophages, FFAR High Macrophages, Inflammatory Macrophages, and Interferon Responsive Macrophages.
94. The method of claim 69, where a subject is determined to belong to the mild/moderate risk group if one or more of the following is detected in the sample: a) 4% - 12% of all epithelial cells are Secretory Cells; b) 10% - 20% of all epithelial cells comprise Interferon Responsive Ciliated Cells; c) upregulated ciliated cell genes comprising at least one or more of: IFI44L, STAT1, IFITM1, MX1, IFITM3, OAS1, OAS2, OAS3, STAT2, TAP1, HLA-C, ADAR, XAF1, IRF1, CTSS, and CTSB; d) increase in type I interferon abundance; e) high expression of interferon-responsive genes; f) decreased expression of one or more genes differentially expressed in COVID-19 WHO 6-8 according to Table 3 or Table 4; g) induction of type I interferon responses; and h) high abundance of IFI6 and IFI27.
95. The method of claim 94, where the interferon-responsive genes comprise at least one or more of: STAT1, MX1, HLA-B, and HLA-C.
96. The method of claim 94, where the interferon response occurs in at least one or more of: MUC5AC high Goblet Cells, SCGB1A1 high Goblet Cells, Early Response Secretory Cells, Deuterosomal Cells, Interferon Responsive Ciliated Cells, and BEST4 high Cilia high Ciliated Cells.
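Two of the quantitative rules recited in the claims above can be sketched in code. The example below is a minimal illustration and not the claimed method itself: it maps a WHO score to the two risk groups using the ranges in claims 7 and 75 (1-5 mild/moderate, 6-8 severe), and it computes Simpson's index, named in claims 23 and 91, over cell-type counts. The claims do not state which form of Simpson's index is intended, so the sketch assumes the concentration form D = sum of p_i squared. The function names and the input layout are hypothetical.

```python
def who_risk_group(who_score):
    """Map an integer WHO score to a risk group.

    Ranges follow claims 7 and 75: 1-5 is mild/moderate, 6-8 is severe.
    Scores outside 1-8 are rejected because the claims do not assign them.
    """
    if not isinstance(who_score, int) or not 1 <= who_score <= 8:
        raise ValueError("WHO score must be an integer from 1 to 8")
    return "mild/moderate" if who_score <= 5 else "severe"


def simpson_index(cell_type_counts):
    """Return Simpson's index D = sum(p_i ** 2) over cell-type proportions.

    cell_type_counts maps a cell-type label to a non-negative cell count
    (a hypothetical input layout). Using the concentration form is an
    assumption; 1 - D and 1 / D are other common conventions.
    """
    counts = list(cell_type_counts.values())
    if any(c < 0 for c in counts):
        raise ValueError("counts must be non-negative")
    total = sum(counts)
    if total == 0:
        raise ValueError("at least one cell is required")
    return sum((c / total) ** 2 for c in counts)
```

With this convention, D approaches 1 when a single cell type dominates the sample and falls as the cell types become more evenly represented; 1 - D gives the corresponding Gini-Simpson diversity.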
PCT/US2022/017082 2021-02-18 2022-02-18 Methods of stratifying and treating coronavirus infection WO2022178312A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US18/277,612 US20240229166A9 (en) 2021-02-18 2022-02-18 Methods of stratifying and treating coronavirus infection
EP22757036.3A EP4295151A4 (en) 2021-02-18 2022-02-18 METHODS OF STRATIFICATION AND TREATMENT OF CORONAVIRUS INFECTION

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US202163151002P 2021-02-18 2021-02-18
US63/151,002 2021-02-18
US202163203514P 2021-07-26 2021-07-26
US63/203,514 2021-07-26

Publications (1)

Publication Number Publication Date
WO2022178312A1 (en) 2022-08-25

Family

ID=82931850

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2022/017082 WO2022178312A1 (en) 2021-02-18 2022-02-18 Methods of stratifying and treating coronavirus infection

Country Status (3)

Country Link
US (1) US20240229166A9 (en)
EP (1) EP4295151A4 (en)
WO (1) WO2022178312A1 (en)

Also Published As

Publication number Publication date
EP4295151A1 (en) 2023-12-27
US20240132976A1 (en) 2024-04-25
EP4295151A4 (en) 2025-03-05
US20240229166A9 (en) 2024-07-11

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 22757036; Country of ref document: EP; Kind code of ref document: A1)
WWE Wipo information: entry into national phase (Ref document number: 2022757036; Country of ref document: EP)
NENP Non-entry into the national phase (Ref country code: DE)
ENP Entry into the national phase (Ref document number: 2022757036; Country of ref document: EP; Effective date: 20230918)