WO2022125702A1 - Analysis of host gene expression for diagnosis of severe acute respiratory syndrome coronavirus 2 infection - Google Patents

Info

Publication number
WO2022125702A1
Authority
WO
WIPO (PCT)
Prior art keywords
genes
sars
cov
test sample
rna expression
Prior art date
Application number
PCT/US2021/062474
Other languages
French (fr)
Inventor
Charles Y. CHIU
Dianna L. NG
Andrea C. GRANADOS
Yale A. SANTOS
Venice SERVELLITA
Original Assignee
The Regents Of The University Of California
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by The Regents Of The University Of California
Publication of WO2022125702A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/70 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving virus or bacteriophage
    • C12Q1/701 Specific hybridization probes
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/158 Expression markers
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A50/00 TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE in human health protection, e.g. against extreme weather
    • Y02A50/30 Against vector-borne diseases, e.g. mosquito-borne, fly-borne, tick-borne or waterborne diseases whose impact is exacerbated by climate change

Definitions

  • the present disclosure relates to methods of characterizing gene expression of a mammalian host suspected of having an acute respiratory illness.
  • the methods are suitable for determining whether the host is infected with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).
  • Coronavirus disease 2019 (COVID-19) is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), a virus that spreads from person to person through droplet and contact transmission. Since it was first detected in December 2019, SARS-CoV-2 has spread worldwide, resulting in a devastating global pandemic. As of mid-October 2020, there have been over 39 million confirmed cases of COVID-19, and over 1 million confirmed deaths worldwide, according to the World Health Organization (WHO Weekly Operational Update on COVID-19). In many parts of the world, the spread of SARS-CoV-2 is still uncontrolled. This means that millions more are at risk of contracting COVID-19.
  • Controlling the spread of COVID-19 depends in large part on the ability to identify SARS-CoV-2 infected individuals.
  • Two types of tests are generally employed for detecting active SARS-CoV-2 infections (Vandenberg et al., Nat Rev Microbiol. 2020 Oct 14;1-13. doi: 10.1038/s41579-020-00461-z).
  • the first detects the presence of SARS-CoV-2 nucleic acids in a sample, for example, by specifically amplifying a region of the viral genome using a polymerase chain reaction.
  • the second detects the presence of viral antigens, for example, using an antibody that specifically binds to a SARS-CoV-2 protein.
  • Both types of tests may produce erroneous results, namely false positives and false negatives. False negatives are particularly a problem when viral titers in a biological sample are low.
  • FIGS. 1A-1H provide an overview of sample collection and metatranscriptomic analysis.
  • FIG. 1A shows a flow chart of nasopharyngeal (NP) swab and whole blood (WB) sample collection for metatranscriptomic next-generation sequencing (NGS).
  • FIG. 1C-1E show analyses of the viral and bacterial metatranscriptome as determined by NGS.
  • FIG. 1C shows box-and-whiskers plots of abundance
  • FIG. 1D shows box-and-whiskers plots of Chao Richness Scores
  • Patients were stratified by the inclusion (“Including Respiratory Viral Reads”) or exclusion (“Exclusion Respiratory Viral Reads”) of respiratory viral reads, as indicated above the plots.
  • FIG. 1F shows box-and-whiskers plots of abundance
  • FIG. 1G shows box-and-whiskers plots of Chao Richness Score
  • the median is represented by a dotted line
  • whiskers represent the minimum and maximum values
  • jitters represent the distribution of the population.
  • For FIGS. 1C-H, statistical analysis was conducted by the Kruskal-Wallis test, followed by the Nemenyi test for post-hoc analysis.
  • FIGS. 2A-2D show Venn diagrams of differentially expressed genes (DEGs).
  • FIG. 2A shows a comparison of NP swab DEGs in SARS-CoV-2 (left) and influenza (right) patients.
  • FIG. 2B shows a comparison of NP swab (left) and WB (right) DEGs in SARS-CoV-2 patients.
  • FIG. 2C shows a comparison of NP swab DEGs in COVID-19 hospitalized patients (left) and outpatients (right).
  • FIG. 2D shows a comparison of NP swab (left) and WB (right) DEGs in influenza patients.
  • the DEGs are calculated relative to the donor controls, and the shared DEGs are listed in boxes below each diagram.
  • the plurality of genes does not comprise one or more of the genes shown in the box of FIG. 2A, FIG. 2B, FIG. 2C, and/or FIG. 2D.
  • FIG. 3 shows an overview of the design and distribution of samples for training and test sets in layer 1 and layer 2 of a diagnostic classifier for COVID-19.
  • Layer 1 differentiates between SARS-CoV-2 positive and SARS-CoV-2 negative cases (excluding influenza and seasonal coronavirus infections).
  • Layer 2 differentiates SARS-CoV-2 from influenza and seasonal coronavirus infections.
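  • As a rough illustration of how the two layers could be chained, the sketch below applies a layer 1 score and, only for samples that pass it, a layer 2 score. The cascade order, the scoring callables, the 0.5 cutoff, and the label strings are all hypothetical placeholders introduced here for illustration; they are not the trained classifier of the disclosure.

```python
from typing import Callable, Mapping

# Hypothetical input format: gene symbol -> expression value.
Expression = Mapping[str, float]


def two_layer_call(
    sample: Expression,
    layer1_score: Callable[[Expression], float],
    layer2_score: Callable[[Expression], float],
    threshold: float = 0.5,  # assumed cutoff, for illustration only
) -> str:
    """Apply layer 1, then apply layer 2 only if layer 1 is positive."""
    # Layer 1: SARS-CoV-2 positive vs. SARS-CoV-2 negative cases.
    if layer1_score(sample) < threshold:
        return "SARS-CoV-2 negative"
    # Layer 2: SARS-CoV-2 vs. influenza or seasonal coronavirus.
    if layer2_score(sample) < threshold:
        return "other respiratory virus"
    return "SARS-CoV-2 positive"
```

  • Passing the two scoring functions in as arguments keeps the decision flow separate from whatever model produces the scores.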
  • FIGS. 4A-C show performance characteristics of the two-layered (combined layer 1 and layer 2) classifier for full, medium, and small gene panels. Training set ROC curve (left) and test set violin plot (middle) and confusion matrix (right) for the two-layer classifier, using either the full gene panel (FIG. 4A), the medium gene panel (FIG. 4B), or the small gene panel (FIG. 4C).
  • FIGS. 5A-B show Venn diagrams of differentially expressed genes (DEGs).
  • FIG. 5A shows a comparison of DEGs in nasopharyngeal swabs of COVID-19 outpatients to whole blood samples of hospitalized (hospitalized, non-ICU, and ICU) COVID-19 patients.
  • FIG. 5B shows a comparison of DEGs in hospitalized (hospitalized, non-ICU, and ICU) COVID-19 patients nasopharyngeal swabs and whole blood.
  • the plurality of genes does not comprise one or more of the genes shown in the box of FIG. 5A and/or FIG. 5B.
  • FIGS. 6A-F provide assessments of COVID-19 classifier test performance.
  • FIGS. 6A-B show violin plots for layer 1 and layer 2 of the full gene panel, respectively.
  • FIGS. 6C-D show violin plots for layer 1 and layer 2 of the medium gene panel, respectively.
  • FIGS. 6E-F show violin plots for layer 1 and layer 2 of the small gene panel, respectively.
  • Coronavirus Disease-19 (COVID-19) has emerged as the cause of a global pandemic.
  • RNA sequencing was used to analyze 286 nasopharyngeal (NP) swab and 53 whole blood (WB) samples from 333 COVID-19 patients and controls, including patients with other viral and bacterial infections.
  • Comparative COVID-19 host responses between NP swabs and WB were distinct, with minimal overlap in DEGs. Both hospitalized patients and outpatients exhibited upregulation of interferon-associated pathways, although heightened and more robust inflammatory and immune responses were observed in hospitalized patients with more clinically severe disease.
  • A two-layer machine learning-based classifier, run on an independent test set of 94 NP swab samples, was able to discriminate between COVID-19 and non-COVID-19 infectious or non-infectious acute respiratory illness using complete (>1,000 genes), medium (<100 genes), and small (<20 genes) gene biomarker panels with 85.1%-86.5% accuracy.
  • ciliated epithelial cells appear to be major contributors to the host transcriptome in NP swab samples, versus white blood cells in WB. Differing cell types and proportions may thus explain the lack of overlap in shared DEGs and pathways between NP swabs and WB. Strikingly, there are no IFN-associated DEGs or pathways shared between NP swabs and WB from COVID-19 patients. In contrast, activation of IFN-associated pathways in both the upper airway and blood of patients with influenza suggests a global, more systemic host response relative to COVID-19.
  • Although ACE2 has been shown to be the cellular receptor for entry for SARS-CoV-2 and has been described as an interferon stimulating gene (35, 36), ACE2 was not found to be upregulated in COVID-19 patients, whether from NP swab or WB samples.
  • The findings described in Example 1 of a distinct host response biosignature in COVID-19 patients and an augmented response in the setting of more severe illness underscore the potential diagnostic utility of host response-based classifiers for SARS-CoV-2 infection.
  • The 19-gene diagnostic classifier described herein has >85% overall accuracy (~80% sensitivity and ~90% specificity).
  • the size of the classifier is compatible with implementation on existing multiplex diagnostic platforms (37, 38).
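  • For readers who want to see how overall accuracy relates to sensitivity and specificity, the following minimal sketch computes all three from confusion-matrix counts. The counts in the usage lines are hypothetical numbers chosen only to make the arithmetic easy to follow; they are not data from the study.

```python
def classifier_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute sensitivity, specificity, and accuracy from confusion-matrix counts."""
    return {
        # Fraction of truly positive samples that were called positive.
        "sensitivity": tp / (tp + fn),
        # Fraction of truly negative samples that were called negative.
        "specificity": tn / (tn + fp),
        # Fraction of all samples that were called correctly.
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts (not study data): 50 positives and 50 negatives.
metrics = classifier_metrics(tp=40, fp=5, tn=45, fn=10)
print(metrics)
```

  • With equal numbers of positive and negative samples, accuracy is the mean of sensitivity and specificity; with unequal numbers it is weighted toward the larger group.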
  • a host response-based test may be particularly useful as a complementary diagnostic tool for SARS-CoV-2 infection, especially for PCR-negative hospitalized patients with residual clinical suspicion for COVID-19 disease.
  • The NP swab from the one asymptomatic patient in the current study was classified as having a SARS-CoV-2-associated host response with confidence of 85.7-99.2%, suggesting that a host response-based test can be used to screen for asymptomatic or even pre-symptomatic SARS-CoV-2 infection.
  • a panel of DEGs associated with more severe COVID-19 was defined. No correlation was generally observed between viral load and severity of disease (39, 40), and a robust biomarker for disease severity was not heretofore clinically available.
  • a polynucleotide includes one or more polynucleotides.
  • the term “plurality” refers to two or more objects, preferably three or more objects.
  • a plurality of genes refers to two or more genes, preferably 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, or 50 or more genes.
  • the term “reduced” as used herein refers to a measurably lower level of a value for a parameter as compared to a control or other reference value for the parameter.
  • the term “reduced” when used in connection with a level of RNA expression of a gene refers to a negative logFC level of expression of the gene.
  • the term “elevated” as used herein refers to a measurably higher level of a value for a parameter as compared to a control or other reference value for the parameter.
  • the term “elevated” when used in connection with a level of RNA expression of a gene refers to a positive logFC level of expression of the gene.
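  • The logFC-based definitions of “reduced” and “elevated” above can be illustrated with a short sketch. The use of base 2 and of a pseudocount of 1 to avoid taking the logarithm of zero are assumptions made here for illustration; the text above specifies only the sign of the logFC.

```python
import math


def log2_fold_change(test_level: float, reference_level: float,
                     pseudocount: float = 1.0) -> float:
    """Log2 ratio of test to reference expression (pseudocount is an assumption)."""
    return math.log2((test_level + pseudocount) / (reference_level + pseudocount))


def expression_call(log_fc: float) -> str:
    """Map the sign of a logFC value onto the terms defined above."""
    if log_fc > 0:
        return "elevated"
    if log_fc < 0:
        return "reduced"
    return "unchanged"


# Hypothetical expression levels, for illustration only.
print(expression_call(log2_fold_change(7.0, 1.0)))
print(expression_call(log2_fold_change(1.0, 7.0)))
```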
  • a subject suspected of having an acute respiratory illness is a subject that meets one or more of the following criteria: has COVID-19-like symptoms (e.g., fever, chills, cough, shortness of breath or difficulty breathing, fatigue, muscle or body aches, headache, new loss of taste or smell, sore throat, congestion or runny nose, nausea or vomiting, and/or diarrhea); may have been in contact with an individual with a SARS-CoV-2 infection; and/or has visited a region in which SARS-CoV-2 infections are prevalent.
  • treating or “treatment” of a disease or an infection refer to executing a protocol, which may include administering one or more pharmaceutical compositions to an individual (human or other mammal), in an effort to alleviate signs or symptoms of the disease.
  • treating does not require complete alleviation of signs or symptoms, does not require a cure, and specifically includes protocols that have only a palliative effect on the individual.
  • treatment is an approach for obtaining beneficial or desired results, including clinical results.
  • Beneficial or desired clinical results include, but are not limited to, alleviation or amelioration of one or more symptoms, diminishment of extent of disease, stabilized (i.e., not worsening) state of disease, preventing spread of disease, delay or slowing of disease progression, amelioration or palliation of the disease state, and remission (whether partial or total).
  • Certain aspects of the present disclosure relate to methods for measuring gene expression, which may be used to assist in the diagnosis of SARS-CoV-2 infection or severe SARS-CoV-2 infection.
  • the methods include one or more techniques selected from the group consisting of sequence analysis, hybridization, and amplification.
  • the methods may include, without limitation, next generation sequencing, RT-qPCR, Luminex, Nanostring, and/or microarray. Exemplary methods are set forth below, but the skilled artisan will appreciate that various methods for measurement of gene expression that are known in the art can be employed without departing from the scope of the present disclosure.
  • the present disclosure provides methods for measuring gene expression, comprising the steps of: (a) measuring levels of RNA expression of a plurality of genes of cells from a nasopharyngeal test sample obtained from a human subject suspected of having an acute respiratory illness, wherein the plurality of genes comprises: (i) at least one gene selected from the group consisting of RSAD2, IFI6, IFI44L, EPSTI1, SERPING1, ATP5G1, COX20, and TCN1; and (ii) at least one gene selected from the group consisting of HEPACAM2, TMEM229A, PLD4, PFKFB4, ADRA2A, PIFO, SERTAD2, HSPA14, MTRNR2L6, SLC16A8, and BATF3; and (b) identifying the test sample as having a gene expression profile of a SARS-CoV-2 infection when: (i) the level of RNA expression of RSAD2, IFI6, IFI44L, EPSTI1, and/or SERPING1 is
  • identification of the gene expression profile is indicative of the human subject having a SARS-CoV-2 infection.
  • the at least one gene of (a)(i) comprises 3, 4, 5, 6, 7 or all 8 genes of the group consisting of RSAD2, IFI6, IFI44L, EPSTI1, SERPING1, ATP5G1, COX20, and TCN1; and/or wherein the at least one gene of (a)(ii) comprises 3, 4, 5, 6, 7, 8, 9, 10 or all 11 genes of the group consisting of HEPACAM2, TMEM229A, PLD4, PFKFB4, ADRA2A, PIFO, SERTAD2, HSPA14, MTRNR2L6, SLC16A8, and BATF3.
  • the human subject exhibits symptoms of COVID-19 disease.
  • the present disclosure provides methods for measuring gene expression, comprising the steps of (a) measuring levels of RNA expression of a plurality of genes of cells from a nasopharyngeal test sample obtained from a human subject suspected of having an acute respiratory illness, wherein the plurality of genes comprises: (i) at least one gene selected from the group consisting of RSAD2, SLC6A4, SHISA3, IFI6, IFI44L, FAM155B, SEMA7A, KIAA1614, EPSTI1, TECTA, CXCL9, SERPING1, HRASLS2, RGS1, IRF8, FAM71F2, C1QC, SARS2, ATP5G1, AZGP1, WFDC6, SDHAF4, FCGBP, COX20, BPIFB1, TCN1, PROS1, SCGB1A1, and CST1; and (ii) at least one gene selected from the group consisting of ZC4H2, HEPACAM2, HMX1, TMEM229A, PLD4, PFK
  • identification of the gene expression profile is indicative of the human subject having a SARS-CoV-2 infection.
  • the at least one gene of (a)(i) comprises a plurality of 3, 4, 5, 6, 7, 9, 10, or all 29 genes of the group consisting of RSAD2, SLC6A4, SHISA3, IFI6, IFI44L, FAM155B, SEMA7A, KIAA1614, EPSTI1, TECTA, CXCL9, SERPING1, HRASLS2, RGS1, IRF8, FAM71F2, C1QC, SARS2, ATP5G1, AZGP1, WFDC6, SDHAF4, FCGBP, COX20, BPIFB1, TCN1, PROS1, SCGB1A1, and CST1; and/or wherein the at least one gene of (a)(ii) comprises a plurality of 3, 4, 5, 6, 7, 8, 9, 10 or all 37 genes of the group consisting of ZC4H2, HEPACAM2, HMX1, TMEM229A, PLD4, PFK
  • the present disclosure provides methods for identifying whether a human subject has a SARS-CoV-2 infection or a non-viral acute respiratory illness, comprising the steps of (a) measuring levels of RNA expression of a plurality of genes of cells from a nasopharyngeal test sample obtained from a human subject suspected of having an acute respiratory illness, wherein the plurality of genes comprises three or more genes selected from the group consisting of RSAD2, IFI6, IFI44L, EPSTI1, SERPING1, ATP5G1, COX20, and TCN1; and (b) identifying the human subject as having a SARS-CoV-2 infection when the level of RNA expression of RSAD2, IFI6, IFI44L, EPSTI1, and/or SERPING1 is elevated, and/or RNA expression of ATP5G1, COX20, and/or TCN1 is reduced in the test sample in comparison with respective reference values; and/or identifying the human subject as having a non-viral acute respiratory illness when
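  • The comparison in step (b) for the eight-gene group can be sketched as a direction check of each measured gene against its reference value. The gene lists and expected directions are taken from the text above; the dictionary-based data layout, the function name, and the example values are hypothetical, and the sketch does not attempt to reproduce how a trained classifier would weight the genes.

```python
# Expected directions in SARS-CoV-2 infection, as stated in the text above.
ELEVATED = ("RSAD2", "IFI6", "IFI44L", "EPSTI1", "SERPING1")
REDUCED = ("ATP5G1", "COX20", "TCN1")


def concordant_genes(test: dict, reference: dict) -> list:
    """Return measured genes whose change vs. reference matches the expected direction."""
    hits = []
    for gene in ELEVATED:
        if gene in test and gene in reference and test[gene] > reference[gene]:
            hits.append(gene)
    for gene in REDUCED:
        if gene in test and gene in reference and test[gene] < reference[gene]:
            hits.append(gene)
    return hits


# Hypothetical expression values for three measured genes.
test_sample = {"RSAD2": 9.1, "IFI6": 8.4, "TCN1": 2.0}
reference_values = {"RSAD2": 5.0, "IFI6": 8.9, "TCN1": 4.5}
print(concordant_genes(test_sample, reference_values))
```

  • Genes that were not measured, or that have no reference value, are simply skipped, which mirrors the "three or more genes" language rather than requiring the full panel.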
  • the present disclosure provides methods for identifying whether a subject has a SARS-CoV-2 infection or another viral acute respiratory illness, comprising the steps of (a) measuring levels of RNA expression of a plurality of genes of cells from a nasopharyngeal test sample obtained from a human subject suspected of having an acute respiratory illness, wherein the plurality of genes comprises three or more genes selected from the group consisting of HEPACAM2, TMEM229A, PLD4, PFKFB4, ADRA2A, PIFO, SERTAD2, HSPA14, MTRNR2L6, SLC16A8, and BATF3; and (b) identifying the human subject as having a SARS-CoV-2 infection when the level of RNA expression of HEPACAM2, TMEM229A, PLD4, PFKFB4, ADRA2A, PIFO, SERTAD2, and/or HSPA14 is elevated, and/or RNA expression of MTRNR2L6, SLC16A8, and/or BATF3
  • the present disclosure provides methods for measuring gene expression, comprising the steps of: (a) measuring levels of RNA expression of a plurality of genes of cells from a nasopharyngeal test sample obtained from a human subject having or suspected of having a SARS-CoV-2 infection, wherein the plurality of genes comprises three or more genes selected from the group consisting of CXCL8, PHACTR1, GRIN2C, CXCL2, G0S2, PLAUR, CXCR4, KRT6A, FBXL7, CARD16, ZNF267, GPR65, PPIF, CSF1, LCP2, LPCAT1, SOD2, FCER1G, CD93, ZNF438, C5AR1, FTH1, IER3, CREM, NINJ1, CSGALNACT2, AGPS, IVNS1ABP, CDC42EP3, GK, ZEB2, HSPA1A, CXCL1, GPX3, IFNGR1, POLDIP2, DNAJB1, PI3, NEDD9,
  • the identification of the severe COVID-19 gene expression profile is indicative of the human subject having a SARS-CoV-2 infection, and having or developing severe COVID-19, optionally wherein severe COVID-19 is associated with hospitalization, optionally wherein hospitalization comprises a stay within a hospital intensive care unit and/or mechanical ventilation.
  • the plurality of genes comprises 4, 5, 6, 7, 8, 9, 10, or all 58 genes of the group consisting of CXCL8, PHACTR1, GRIN2C, CXCL2, G0S2, PLAUR, CXCR4, KRT6A, FBXL7, CARD16, ZNF267, GPR65, PPIF, CSF1, LCP2, LPCAT1, SOD2, FCER1G, CD93, ZNF438, C5AR1, FTH1, IER3, CREM, NINJ1, CSGALNACT2, AGPS, IVNS1ABP, CDC42EP3, GK, ZEB2, HSPA1A, CXCL1, GPX3, IFNGR1, POLDIP2, DNAJB1, PI3, NEDD9, VEGFA, IL1R1, ATG2A, DOCK4, THBS1, ZFYVE16, SAT1, PNPLA8, H3F3B, IL6R, RAB20, HSPA1B, MIR22HG, PPAP2B, TBC1
  • the methods further include: extracting RNA from the cells of the test sample prior to step (a).
  • the method used to extract RNA may include, without limitation, Zymo Direct-zol™, TRIzol® (reagents for isolating biological material marketed by Molecular Research Center, Inc.), phenol/chloroform, etc.
  • RNA extraction may also include treating the RNA with DNAse to remove DNA contamination, which may occur during the extraction process (e.g., in an RNA extraction kit including an on-column DNAse step) or after the extraction process (e.g., DNAse treatment of extracted RNA).
  • RNA concentration may be measured using a method such as Qubit fluorometric quantitation.
  • RNA expression is measured using a next-generation sequencing method.
  • In sequencing by synthesis, single-stranded DNA is sequenced using DNA polymerase to create a complementary second strand one base at a time.
  • Most next generation (high-throughput) sequencing methods use a sequencing by synthesis approach, which is often combined with optical detection. High-throughput methods are advantageous in that many thousand (e.g., 10⁶-10⁹) sequences may be determined in parallel.
  • high-throughput sequencing methods that may be used to measure gene expression in connection with the present disclosure are briefly described below.
  • Illumina (Solexa) sequencing is a high-throughput method that uses reversible terminator bases for sequencing by synthesis (see, e.g., Bentley et al., Nature, 456:53-59, 2008; and Meyer and Kircher, “Illumina Sequencing Library Preparation for Highly Multiplexed Target Capture and Sequencing,” Cold Spring Harbor Protocols, 2010, doi: 10.1101/pdb.prot5448).
  • DNA molecules are attached to a slide and amplified to generate local clusters of the same DNA sequence.
  • Pyrosequencing is another type of sequencing by synthesis method that detects the release of pyrophosphate (PPi) during DNA synthesis (see, e.g., Ronaghi et al., Science, 281:363-365, 1998).
  • ATP sulfurylase, firefly luciferase, and luciferin are used to generate a visible light signal from PPi.
  • Light is produced when a nucleotide has been incorporated into the complementary strand of DNA by DNA polymerase, and the intensity of the light emitted is used to determine how many nucleotides have been incorporated. Each of the four nucleotides is added in turn until the sequence is complete.
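  • The relationship between light intensity and the number of incorporated nucleotides can be illustrated with a toy decoder. This is a simplified sketch: the flow order, the normalization of one incorporation to a signal of 1.0, and the rounding rule are assumptions made for illustration, and real instruments apply more elaborate signal correction.

```python
def decode_flowgram(flow_order: str, intensities: list,
                    unit_signal: float = 1.0) -> str:
    """Convert per-flow light intensities into the synthesized base sequence.

    flow_order: nucleotides added in turn, repeated cyclically (e.g. "TACG").
    intensities: one light reading per nucleotide flow.
    unit_signal: assumed intensity produced by a single incorporation.
    """
    bases = []
    for i, intensity in enumerate(intensities):
        nucleotide = flow_order[i % len(flow_order)]
        # Intensity scales with the number of identical bases incorporated.
        count = max(0, round(intensity / unit_signal))
        bases.append(nucleotide * count)
    return "".join(bases)


# Hypothetical readings for two cycles of T, A, C, G flows.
print(decode_flowgram("TACG", [1.02, 0.05, 2.1, 0.0, 0.0, 0.97, 0.0, 1.0]))
```

  • The decoded string is the newly synthesized complementary strand, so the template sequence would be its reverse complement.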
  • High-throughput pyrosequencing, also known as 454 pyrosequencing (Roche Diagnostics), uses an initial step of emulsion PCR to generate oil droplets containing a cluster of single DNA sequences attached to a bead via primers. These droplets are then added to a plate with picoliter-volume wells such that each well contains a single bead as well as the enzymes needed for pyrosequencing.
  • Ion semiconductor sequencing is a further type of sequencing by synthesis method that uses the hydrogen ions released during DNA polymerization for sequencing (see, e.g., US Patent No. 7,948,015).
  • a single strand of template DNA is placed into a microwell.
  • the microwell is flooded with one type of nucleotide. If the nucleotide is complementary, it is incorporated into the secondary strand, and a hydrogen ion is released.
  • the release of the hydrogen ion triggers a hypersensitive ion sensor; if multiple nucleotides are incorporated, multiple hydrogen ions are released, and the resulting electronic signal is higher.
  • Sequencing by ligation uses the mismatch sensitivity of DNA ligase in combination with a pool of fluorescently labeled oligonucleotides (probes) for sequencing (see, e.g., WO 2006084132).
  • DNA molecules are amplified using emulsion PCR, which results in individual oil droplets containing one bead and a cluster of the same DNA sequence. Then, the beads are deposited on a glass slide. The probes are added to the slide along with a universal sequencing primer. If the probe is complementary, the DNA ligase joins it to the primer, fluorescence is measured, and then the fluorescent label is cleaved off. This leaves the 5’ end of the probe available for the next round of ligation.
  • Third-generation or long-read sequencing methods are high-throughput sequencing methods that sequence single molecules. These methods do not require initial PCR amplification steps.
  • Single-molecule real-time sequencing (Pacific Biosciences) is a long-read sequencing by synthesis method that employs zero-mode waveguides (ZMWs), which are small wells with capturing tools located at the bottom (see, e.g., Levene et al., Science, 299:682-686, 2003; and Eid et al., Science, 323:133-138, 2009).
  • one DNA polymerase enzyme is attached to the bottom of a ZMW, and a single molecule of single-stranded DNA is present as a template.
  • Nanopore sequencing (Oxford Nanopore) is a sequencing method that sequences a single DNA or RNA molecule without any form of label.
  • the principle of nanopore sequencing is that DNA passing through a nanopore changes the ion current of the nanopore in a manner dependent on the type of nucleotide.
  • the nanopore itself contains a detection region able to recognize different nucleotides.
  • Current nanopore sequencing methods in development are either solid state methods employing metal or metal alloys (see, e.g., Soni et al., Rev Sci Instrum, 81(1): 014301, 2010) or biological methods employing proteins (see, e.g., Stoddart et al., Proc Natl Acad Sci USA, 106:7702-7707, 2009).
  • Further large-scale sequencing techniques for use in measuring gene expression in connection with methods of the present disclosure include but are not limited to microscopy-based techniques (e.g., using atomic force microscopy or transmission electron microscopy), tunneling currents DNA sequencing, sequencing by hybridization (e.g., using microarrays), sequencing with mass spectrometry (e.g., using matrix-assisted laser desorption ionization time-of-flight mass spectrometry, or MALDI-TOF MS), microfluidic Sanger sequencing, RNA polymerase (RNAP) sequencing (e.g., using polystyrene beads), and in vitro virus high-throughput sequencing.
  • Serial analysis of gene expression is a method that allows quantitative measurement of gene expression profiles that can be compared between samples (Velculescu et al., Science, 270: 484-7, 1995).
  • cDNA is synthesized from an RNA sample.
  • tags are concatenated, amplified using bacteria, isolated, and finally sequenced using high-throughput sequencing techniques.
  • SAGE can be used to measure gene expression changes of multiple genes at once, for example in response to infection.
  • Methods that may be used to measure gene expression in connection with the present disclosure may include an amplification step.
  • measuring RNA expression of a plurality of genes includes a quantitative polymerase chain reaction (qPCR).
  • some methods include performing reverse transcriptase-quantitative polymerase chain reaction (RT-qPCR) on RNA extracted from the PBMCs.
  • Quantitative reverse transcription polymerase chain reaction is an amplification method that uses fluorescence to quantitatively measure gene expression (see, e.g., Heid et al., Genome Res 6:986-994, 1996).
  • the first step of qRT-PCR is to produce complementary DNA (cDNA) by reverse transcribing mRNA.
  • the cDNA is used as the template in the PCR reaction.
  • Gene-specific primers, a buffer (and other reagents for stability), a DNA polymerase, nucleotides, and a fluorophore are added to the PCR reaction.
  • the reaction is then placed in a thermocycler that is able to both cycle through the different temperatures required for the standard PCR steps (e.g., separating the two strands of DNA, primer binding, and DNA polymerization) and illuminate the reaction with light at a particular wavelength to excite the fluorophore. Over the course of the reaction, the level of fluorescence is detected, and this level is subsequently used to quantify the amount of gene expression.
  • the use of fluorescence in qRT-PCR can be done in two different ways.
  • the first way uses a dye in the reaction mixture that fluoresces when it binds to double stranded DNA.
  • the intensity of the fluorescence increases as the amount of double stranded DNA increases, but the dye is not specific for a particular sequence.
  • the second way uses sequence-specific probes labeled with a fluorescent reporter. The intensity of the fluorescence increases as the amount of the particular sequence increases.
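  • The text above does not specify how fluorescence readouts are converted into expression values. One widely used approach, shown here purely as an assumption, is the comparative threshold-cycle (2^-ΔΔCt) calculation, which normalizes a target gene to a reference gene and then to a control sample. The function and parameter names are hypothetical, and the sketch assumes roughly 100% amplification efficiency for both genes.

```python
def relative_expression(ct_target_test: float, ct_ref_test: float,
                        ct_target_ctrl: float, ct_ref_ctrl: float) -> float:
    """Fold change of a target gene in a test sample relative to a control sample.

    Each argument is a threshold cycle (Ct): the PCR cycle at which
    fluorescence crosses a fixed detection threshold. Lower Ct means
    more starting template.
    """
    # Normalize the target gene to the reference gene within each sample.
    delta_test = ct_target_test - ct_ref_test
    delta_ctrl = ct_target_ctrl - ct_ref_ctrl
    # Compare the normalized values; each cycle is assumed to double the product.
    return 2.0 ** -(delta_test - delta_ctrl)


# Hypothetical Ct values: the target crosses threshold 3 cycles earlier
# (after normalization) in the test sample than in the control.
print(relative_expression(22.0, 18.0, 25.0, 18.0))
```

  • A result above 1 corresponds to elevated expression and a result below 1 to reduced expression, which lines up with the sign of the logFC after taking a base-2 logarithm.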
  • Methods that may be used to measure gene expression in connection with the present disclosure may include a hybridization step.
  • the methods include use of a DNA microarray.
  • DNA microarrays employ a plurality of specific DNA sequences (e.g., probes, reporters, oligos) attached to a slide or chip.
  • cDNA from a sample is labeled with a fluorophore, silver, or a chemiluminescent molecule.
  • the labeled sample is hybridized to the DNA microarray under specific conditions, and hybridization is subsequently detected and quantified.
  • Other methods of measuring gene expression through hybridization include, but are not limited to, Northern blot analysis and in situ hybridization.
  • Certain aspects of the present disclosure relate to methods for treating a SARS-CoV-2-infected human subject, identified by use of any of the methods disclosed herein for measuring levels of RNA expression of a plurality of genes in cells from a nasopharyngeal test sample obtained from the human subject, by administering an effective amount of a COVID-19 therapeutic agent.
  • the COVID-19 therapeutic agent comprises an antiviral agent.
  • the antiviral agent comprises one or more of lopinavir, ritonavir, remdesivir, ribavirin, umifenovir, favipiravir, darunavir, and oseltamivir.
  • the antiviral agent comprises remdesivir.
  • the COVID-19 therapeutic agent comprises an immunotherapeutic agent.
  • the immunotherapeutic agent comprises one or more of an interferon, convalescent plasma, hyperimmune plasma, and an anti-SARS-CoV2 monoclonal antibody or SARS-CoV2-binding fragment thereof.
  • kits for measuring gene expression and diagnosis of SARS-CoV-2 infection, and optionally prognosis of severe COVID-19, are also provided.
  • the kits comprise a plurality of oligonucleotides and instructions for use thereof.
  • the plurality of oligonucleotides of the kit are attached to a slide or a chip.
  • the plurality of oligonucleotides of the kit each comprise a label for ease in detection.
  • the plurality of oligonucleotides comprise a pair of oligonucleotides for each of the plurality of genes.
  • kits include (a) a plurality of oligonucleotides which hybridize to a plurality of genes comprising at least 3, 4, 5, 6, 7 or all 8 genes of the group consisting of RSAD2, IFI6, IFI44L, EPSTI1, SERPING1, ATP5G1, COX20, and TCN1; and (b) instructions for: (i) use of the oligonucleotides for measuring RNA expression of the plurality of genes; and (ii) identifying whether a human subject has a SARS-CoV-2 infection or a non-viral acute respiratory illness based on the levels of RNA expression of the plurality of genes in cells from a nasopharyngeal test sample obtained from the human subject, wherein the human subject is suspected of having an acute respiratory illness.
  • the plurality of genes comprises the genes of Table 1-8 or the genes of Table 1-6.
  • kits include (a) a plurality of oligonucleotides which hybridize to a plurality of genes comprising at least 3, 4, 5, 6, 7, 8, 9, 10 or all 11 genes of the group consisting of HEPACAM2, TMEM229A, PLD4, PFKFB4, ADRA2A, PIFO, SERTAD2, HSPA14, MTRNR2L6, SLC16A8, and BATF3; and (b) instructions for: (i) use of the oligonucleotides for measuring RNA expression of the plurality of genes; and (ii) identifying whether a human subject has a SARS-CoV-2 infection or another viral acute respiratory illness based on the levels of RNA expression of the plurality of genes in cells from a nasopharyngeal test sample obtained from the human subject, wherein the human subject is suspected of having an acute respiratory illness.
  • the plurality of genes comprises the genes of Table 1-9 or the genes of Table 1-7.
  • kits include (a) a plurality of oligonucleotides which hybridize to a plurality of genes comprising at least 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18 or all 19 genes of the group consisting of RSAD2, IFI6, IFI44L, EPSTI1, SERPING1, ATP5G1, COX20, TCN1, HEPACAM2, TMEM229A, PLD4, PFKFB4, ADRA2A, PIFO, SERTAD2, HSPA14, MTRNR2L6, SLC16A8, and BATF3; and (b) instructions for: (i) use of the oligonucleotides for measuring RNA expression of the plurality of genes; and (ii) identifying whether the test sample has a gene expression profile of a SARS-CoV-2 infection based on the levels of RNA expression of the plurality of genes in cells from a nasopharyngeal test sample obtained from a human subject suspected of having an
  • kits include (a) a plurality of oligonucleotides which hybridize to a plurality of genes comprising at least 3, 4, 5, 6, 7, 8, 9, 10, 11 or all 58 genes of the group consisting of CXCL8, PHACTR1, GRIN2C, CXCL2, G0S2, PLAUR, CXCR4, KRT6A, FBXL7, CARD16, ZNF267, GPR65, PPIF, CSF1, LCP2, LPCAT1, SOD2, FCER1G, CD93, ZNF438, C5AR1, FTH1, IER3, CREM, NINJ1, CSGALNACT2, AGPS, IVNS1ABP, CDC42EP3, GK, ZEB2, HSPA1A, CXCL1, GPX3, IFNGR1, POLDIP2, DNAJB1, PI3, NEDD9, VEGFA, IL1R1, ATG2A, DOCK4, THBS1, ZFYVE16, SAT1, PNPLA8,
  • a method for measuring gene expression comprising the steps of:
  • test sample (b) identifying the test sample as having a gene expression profile of a SARS-CoV-2 infection when:
  • RNA expression of RSAD2, IFI6, IFI44L, EPSTI1, and/or SERPING1 is elevated, and/or RNA expression of ATP5G1, COX20, and/or TCN1 is reduced in the test sample in comparison with respective reference values;
  • RNA expression of HEPACAM2, TMEM229A, PLD4, PFKFB4, ADRA2A, PIFO, SERTAD2, and/or HSPA14 is elevated, and/or RNA expression of MTRNR2L6, SLC16A8, and/or BATF3 is reduced in the test sample in comparison with respective reference values, optionally wherein identification of the gene expression profile is indicative of the human subject having a SARS-CoV-2 infection.
  • the at least one gene of (a)(i) comprises 3, 4, 5, 6, 7 or all 8 genes of the group consisting of RSAD2, IFI6, IFI44L, EPSTI1, SERPING1, ATP5G1, COX20, and TCN1; and/or wherein the at least one gene of (a)(ii) comprises 3, 4, 5, 6, 7, 8, 9, 10 or all 11 genes of the group consisting of HEPACAM2, TMEM229A, PLD4, PFKFB4, ADRA2A, PIFO, SERTAD2, HSPA14, MTRNR2L6, SLC16A8, and BATF3.
  • a method for measuring gene expression comprising the steps of:
  • test sample (b) identifying the test sample as having a gene expression profile of a SARS-CoV-2 infection when:
  • RNA expression of RSAD2, SLC6A4, SHISA3, IFI6, IFI44L, FAM155B, SEMA7A, KIAA1614, EPSTI1, TECTA, CXCL9, SERPING1, HRASLS2, RGS1, IRF8, FAM71F2, C1QC, and/or SARS2 is elevated, and/or RNA expression of ATP5G1, AZGP1, WFDC6, SDHAF4, FCGBP, COX20, BPIFB1, TCN1, PROS1, SCGB1A1, and/or CST1 is reduced in the test sample in comparison with respective reference values; and
  • RNA expression of ZC4H2, HEPACAM2, HMX1, TMEM229A, PLD4, PFKFB4, POSTN, BORA, NUP35, DHFR, AMBP, ADRA2A, ZNF92, CYP2F1, PIFO, SNTN, ZNF469, ADH1C, FAM3D, SERTAD2, HSPA14, ILVBL, PERP, UBE2I, and/or EHF is elevated, and/or RNA expression of MLKL, PTMS, MTRNR2L6, ADGRL1, PLEKHA4, CD300E, ECSCR, SLC16A8, BATF3, TRPV5, GUK1, and/or TUBG2 is reduced in the test sample in comparison with respective reference values, optionally wherein identification of the gene expression profile is indicative of the human subject having a SARS-CoV-2 infection.
  • the at least one gene of (a)(i) comprises a plurality of 3, 4, 5, 6, 7, 9, 10, or all 29 genes of the group consisting of RSAD2, SLC6A4, SHISA3, IFI6, IFI44L, FAM155B, SEMA7A, KIAA1614, EPSTI1, TECTA, CXCL9, SERPING1, HRASLS2, RGS1, IRF8, FAM71F2, C1QC, SARS2, ATP5G1, AZGP1, WFDC6, SDHAF4, FCGBP, COX20, BPIFB1, TCN1, PROS1, SCGB1A1, and CST1; and/or wherein the at least one gene of (a)(ii) comprises a plurality of 3, 4, 5, 6, 7, 8, 9, 10 or 37 genes of the group consisting of ZC4H2, HEPACAM2, HMX1, TMEM229A, PLD4, PFKFB4, POSTN, BORA, NUP35, DHFR, AMBP, ADRA2A,
  • a method for identifying whether a human subject has a SARS-CoV-2 infection or a non-viral acute respiratory illness comprising the steps of:
  • RNA expression of a plurality of genes of cells from a nasopharyngeal test sample obtained from a human subject suspected of having an acute respiratory illness, wherein the plurality of genes comprises three or more genes selected from the group consisting of RSAD2, IFI6, IFI44L, EPSTI1, SERPING1, ATP5G1, COX20, and TCN1; and
  • identifying the human subject as having a SARS-CoV-2 infection when the level of RNA expression of RSAD2, IFI6, IFI44L, EPSTI1, and/or SERPING1 is elevated, and/or RNA expression of ATP5G1, COX20, and/or TCN1 is reduced in the test sample in comparison with respective reference values; and/or identifying the human subject as having a non-viral acute respiratory illness when the level of RNA expression of RSAD2, IFI6, IFI44L, EPSTI1, and/or SERPING1 is reduced, and/or RNA expression of ATP5G1, COX20, and/or TCN1 is elevated in the test sample in comparison with the respective reference values.
  • a method for identifying whether a human subject has a SARS-CoV-2 infection or another viral acute respiratory illness comprising the steps of:
  • RNA expression of a plurality of genes of cells from a nasopharyngeal test sample obtained from a human subject suspected of having an acute respiratory illness, wherein the plurality of genes comprises three or more genes selected from the group consisting of HEPACAM2, TMEM229A, PLD4, PFKFB4, ADRA2A, PIFO, SERTAD2, HSPA14, MTRNR2L6, SLC16A8, and BATF3; and
  • identifying the human subject as having a SARS-CoV-2 infection when the level of RNA expression of HEPACAM2, TMEM229A, PLD4, PFKFB4, ADRA2A, PIFO, SERTAD2, and/or HSPA14 is elevated, and/or RNA expression of MTRNR2L6, SLC16A8, and/or BATF3 is reduced in the test sample in comparison with respective reference values; and/or identifying the human subject as having another viral acute respiratory illness when the level of RNA expression of HEPACAM2, TMEM229A, PLD4, PFKFB4, ADRA2A, PIFO, SERTAD2, and/or HSPA14 is reduced, and/or RNA expression of MTRNR2L6, SLC16A8, and/or BATF3 is elevated in the test sample in comparison with the respective reference values.
  • the respective reference values are determined from a nasopharyngeal control sample from a healthy human subject without symptoms of a respiratory illness, optionally wherein the respective reference values are average values determined from a plurality of nasopharyngeal control samples obtained from a plurality of healthy human subjects, optionally wherein the healthy human subject or subjects do not have an acute SARS-CoV-2 infection.
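  • The comparison against reference values described in the preceding embodiments can be sketched as follows. This is a minimal illustration, assuming that each reference value is the arithmetic mean across healthy control samples and that any difference from the reference counts as elevated or reduced; a practical assay would presumably require a fold-change margin or a statistical test, which the text does not specify. The function names and all numbers are hypothetical.

```python
from statistics import mean

def reference_values(control_samples):
    """Average each gene's expression level across healthy control samples."""
    genes = control_samples[0].keys()
    return {gene: mean(sample[gene] for sample in control_samples) for gene in genes}

def direction_calls(test_sample, reference):
    """Label each gene as elevated, reduced, or unchanged versus its reference."""
    calls = {}
    for gene, ref in reference.items():
        level = test_sample[gene]
        if level > ref:
            calls[gene] = "elevated"
        elif level < ref:
            calls[gene] = "reduced"
        else:
            calls[gene] = "unchanged"
    return calls

# Illustrative numbers only; they are not data from this disclosure.
controls = [{"IFI6": 10.0, "TCN1": 50.0}, {"IFI6": 14.0, "TCN1": 30.0}]
reference = reference_values(controls)
calls = direction_calls({"IFI6": 30.0, "TCN1": 20.0}, reference)
```

  • In this made-up example IFI6 is called elevated and TCN1 reduced, which is the direction the embodiments associate with a SARS-CoV-2 gene expression profile.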
  • a method for measuring gene expression comprising the steps of:
  • RNA expression of a plurality of genes of cells from a nasopharyngeal test sample obtained from a human subject having or suspected of having a SARS-CoV-2 infection wherein the plurality of genes comprises three or more genes selected from the group consisting of CXCL8, PHACTR1, GRIN2C, CXCL2, G0S2, PLAUR, CXCR4, KRT6A, FBXL7, CARD16, ZNF267, GPR65, PPIF, CSF1, LCP2, LPCAT1, SOD2, FCER1G, CD93, ZNF438, C5AR1, FTH1, IER3, CREM, NINJ1, CSGALNACT2, AGPS, IVNS1ABP, CDC42EP3, GK, ZEB2, HSPA1A, CXCL1, GPX3, IFNGR1, POLDIP2, DNAJB1, PI3, NEDD9, VEGFA, IL1R1, ATG2A, DOCK4, THBS1, ZFY
  • the plurality of genes comprises 4, 5, 6, 7, 8, 9, 10, or all 58 genes of the group consisting of CXCL8, PHACTR1, GRIN2C, CXCL2, G0S2, PLAUR, CXCR4, KRT6A, FBXL7, CARD16, ZNF267, GPR65, PPIF, CSF1, LCP2, LPCAT1, SOD2, FCER1G, CD93, ZNF438, C5AR1, FTH1, IER3, CREM, NINJ1, CSGALNACT2, AGPS, IVNS1ABP, CDC42EP3, GK, ZEB2, HSPA1A, CXCL1, GPX3, IFNGR1, POLDIP2, DNAJB1, PI3, NEDD9, VEGFA, IL1R1, ATG2A, DOCK4, THBS1, ZFYVE16, SAT1, PNPLA8, H3F3B, IL6R, RAB20, HSPA1B, MIR22HG, PPAP2B
  • step (c) treating the SARS-CoV-2-infected subject identified in step (b) by administering an effective amount of a COVID-19 therapeutic agent.
  • COVID-19 therapeutic agent comprises one or both of an antiviral agent and an immunotherapeutic agent.
  • the antiviral agent comprises remdesivir
  • the antiviral agent comprises molnupiravir
  • the antiviral agent comprises paxlovid (PF-07321332 and ritonavir).
  • the immunotherapeutic agent comprises one or more of the group consisting of an interferon, convalescent plasma, hyperimmune plasma, and an anti-SARS-CoV2 monoclonal antibody or SARS-CoV2-binding fragment thereof, optionally wherein the anti-SARS-CoV-2 monoclonal antibody comprises:
  • step (iii) casirivimab and imdevimab.
  • step (a) comprises one or more of the group consisting of sequence analysis, hybridization, and amplification.
  • step (a) comprises: performing reverse transcriptase-quantitative polymerase chain reaction (RT-qPCR) on RNA extracted from the test sample.
  • step (a) comprises: hybridizing RNA extracted from the test sample to a microarray.
  • step (a) comprises: performing serial analysis of gene expression (SAGE) on RNA extracted from the test sample.
  • step (a) comprises targeted RNA expression resequencing comprising:
  • step (iii) generating a read count for RNA expression of the plurality of genes by normalization to the sequence of the at least 50,000 members of step (ii).
  • step (a) comprises whole transcriptome shotgun sequencing (WTSS) comprising:
  • step (iii) generating a read count for RNA expression of the plurality of genes by normalization to the sequence of the at least 50,000 members of step (ii).
  • a kit comprising:
  • kits of embodiment 30, wherein the plurality of genes further comprises one or more genes selected from the group consisting of SLC6A4, SHISA3, FAM155B, SEMA7A, KIAA1614, TECTA, CXCL9, HRASLS2, RGS1, IRF8, FAM71F2, C1QC, SARS2, AZGP1, WFDC6, SDHAF4, FCGBP, BPIFB1, PROS1, SCGB1A1, CST1, and combinations thereof
  • kits of embodiment 30, wherein the plurality of genes consists of RSAD2, IFI6, IFI44L, EPSTI1, SERPING1, ATP5G1, COX20, and TCN1.
  • kits of embodiment 31, wherein the plurality of genes consists of RSAD2, SLC6A4, SHISA3, IFI6, IFI44L, FAM155B, SEMA7A, KIAA1614, EPSTI1, TECTA, CXCL9, SERPING1, HRASLS2, RGS1, IRF8, FAM71F2, C1QC, SARS2, ATP5G1, AZGP1, WFDC6, SDHAF4, FCGBP, COX20, BPIFB1, TCN1, PROS1, SCGB1A1, and CST1.
  • a kit comprising:
  • a plurality of oligonucleotides which hybridize to a plurality of genes comprising at least 3, 4, 5, 6, 7, 8, 9, 10 or all 11 genes of the group consisting of HEPACAM2, TMEM229A, PLD4, PFKFB4, ADRA2A, PIFO, SERTAD2, HSPA14, MTRNR2L6, SLC16A8, and BATF3; and
  • kits of embodiment 34 wherein the plurality of genes further comprises one or more genes selected from the group consisting of ZC4H2, HMX1, POSTN, BORA, NUP35, DHFR, AMBP, ZNF92, CYP2F1, SNTN, ZNF469, ADH1C, FAM3D, ILVBL, PERP, UBE2I, EHF, MLKL, PTMS, ADGRL1, PLEKHA4, CD300E, ECSCR, TRPV5, GUK1, and TUBG2, and combinations thereof.
  • kits of embodiment 34 wherein the plurality of genes consists of HEPACAM2, TMEM229A, PLD4, PFKFB4, ADRA2A, PIFO, SERTAD2, HSPA14, MTRNR2L6, SLC16A8, and BATF3.
  • kits of embodiment 35, wherein the plurality of genes consists of ZC4H2, HEPACAM2, HMX1, TMEM229A, PLD4, PFKFB4, POSTN, BORA, NUP35, DHFR, AMBP, ADRA2A, ZNF92, CYP2F1, PIFO, SNTN, ZNF469, ADH1C, FAM3D, SERTAD2, HSPA14, ILVBL, PERP, UBE2I, EHF, MLKL, PTMS, MTRNR2L6, ADGRL1, PLEKHA4, CD300E, ECSCR, SLC16A8, BATF3, TRPV5, GUK1, and TUBG2.
  • a plurality of oligonucleotides which hybridize to a plurality of genes comprising at least 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18 or all 19 genes of the group consisting of RSAD2, IFI6, IFI44L, EPSTI1, SERPING1, ATP5G1, COX20, TCN1, HEPACAM2, TMEM229A, PLD4, PFKFB4, ADRA2A, PIFO, SERTAD2, HSPA14, MTRNR2L6, SLC16A8, and BATF3; and
  • kits of embodiment 38 wherein the plurality of genes further comprises one or more genes selected from the group consisting of SLC6A4, SHISA3, FAM155B, SEMA7A, KIAA1614, TECTA, CXCL9, HRASLS2, RGS1, IRF8, FAM71F2, C1QC, SARS2, AZGP1, WFDC6, SDHAF4, FCGBP, BPIFB1, PROS1, SCGB1A1, CST1, ZC4H2, HMX1, POSTN, BORA, NUP35, DHFR, AMBP, ZNF92, CYP2F1, SNTN, ZNF469, ADH1C, FAM3D, ILVBL, PERP, UBE2I, EHF, MLKL, PTMS, ADGRL1, PLEKHA4, CD300E, ECSCR, TRPV5, GUK1, and TUBG2, and combinations thereof.
  • kits of embodiment 38, wherein the plurality of genes consists of RSAD2, IFI6, IFI44L, EPSTI1, SERPING1, ATP5G1, COX20, TCN1, HEPACAM2, TMEM229A, PLD4, PFKFB4, ADRA2A, PIFO, SERTAD2, HSPA14, MTRNR2L6, SLC16A8, and BATF3.
  • kits of embodiment 39 wherein the plurality of genes consists of RSAD2, IFI6, IFI44L, EPSTI1, SERPING1, ATP5G1, COX20, TCN1, HEPACAM2, TMEM229A, PLD4, PFKFB4, ADRA2A, PIFO, SERTAD2, HSPA14, MTRNR2L6, SLC16A8, BATF3, SLC6A4, SHISA3, FAM155B, SEMA7A, KIAA1614, TECTA, CXCL9, HRASLS2, RGS1, IRF8, FAM71F2, C1QC, SARS2, AZGP1, WFDC6, SDHAF4, FCGBP, BPIFB1, PROS1, SCGB1A1, CST1, ZC4H2, HMX1, POSTN, BORA, NUP35, DHFR, AMBP, ZNF92, CYP2F1, SNTN, ZNF469, ADH1C, FAM3D, ILVBL, PERP, UBE2I, E
  • a kit comprising:
  • a plurality of oligonucleotides which hybridize to a plurality of genes comprising at least 3, 4, 5, 6, 7, 8, 9, 10, 11 or all 58 genes of the group consisting of CXCL8, PHACTR1, GRIN2C, CXCL2, G0S2, PLAUR, CXCR4, KRT6A, FBXL7, CARD16, ZNF267, GPR65, PPIF, CSF1, LCP2, LPCAT1, SOD2, FCER1G, CD93, ZNF438, C5AR1, FTH1, IER3, CREM, NINJ1, CSGALNACT2, AGPS, IVNS1ABP, CDC42EP3, GK, ZEB2, HSPA1A, CXCL1, GPX3, IFNGR1, POLDIP2, DNAJB1, PI3, NEDD9, VEGFA, IL1R1, ATG2A, DOCK4, THBS1, ZFYVE16, SAT1, PNPLA8, H3F3B, IL6R, RAB
  • kits of embodiment 42 wherein the plurality of genes consists of CXCL8, PHACTR1, GRIN2C, CXCL2, G0S2, PLAUR, CXCR4, KRT6A, FBXL7, CARD16, ZNF267, GPR65, PPIF, CSF1, LCP2, LPCAT1, SOD2, FCER1G, CD93, ZNF438, C5AR1, FTH1, IER3, CREM, NINJ1, CSGALNACT2, AGPS, IVNS1ABP, CDC42EP3, GK, ZEB2, HSPA1A, CXCL1, GPX3, IFNGR1, POLDIP2, DNAJB1, PI3, NEDD9, VEGFA, IL1R1, ATG2A, DOCK4, THBS1, ZFYVE16, SAT1, PNPLA8, H3F3B, IL6R, RAB20, HSPA1B, MIR22HG, PPAP2B, TBC1D2, SRPK1, FGD4, RAB
  • step (a) further comprises: iii) measuring levels of RNA expression of at least one control gene, wherein the at least one control gene is selected from the group consisting of PMM1, RAC1, RPP30, ACTB, and HSPD1; and iv) normalizing levels of RNA expression of the plurality of genes of (i) and (ii) to levels of RNA expression of the at least one control gene.
  • kit of any one of embodiments 30-42, wherein the kit further comprises: at least one control oligonucleotide which hybridizes to at least one control gene, wherein the at least one control gene is selected from the group consisting of PMM1, RAC1, RPP30, ACTB, and HSPD1; and instructions for use of the at least one control oligonucleotide to normalize levels of RNA expression of the plurality of genes.
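  • The control-gene normalization recited above can be sketched as follows. The text does not say how the control-gene levels are combined, so this illustration assumes that each target level is divided by the geometric mean of whichever listed control genes were measured. The function name and the expression values are hypothetical.

```python
from math import exp, log

# Control genes named in the embodiments above.
CONTROL_GENES = ("PMM1", "RAC1", "RPP30", "ACTB", "HSPD1")

def normalize_to_controls(expression, control_genes=CONTROL_GENES):
    """Scale target-gene levels by the geometric mean of measured control genes."""
    present = [gene for gene in control_genes if gene in expression]
    if not present:
        raise ValueError("no control gene was measured")
    # Geometric mean of the control-gene levels (an assumed choice).
    scale = exp(sum(log(expression[gene]) for gene in present) / len(present))
    return {gene: level / scale
            for gene, level in expression.items()
            if gene not in control_genes}

# Illustrative numbers only; they are not data from this disclosure.
normalized = normalize_to_controls({"RSAD2": 200.0, "ACTB": 100.0, "RPP30": 25.0})
```

  • With these made-up values the geometric mean of ACTB and RPP30 is 50, so the normalized RSAD2 level is about 4.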
  • ARI acute respiratory illness
  • AUC area under the curve
  • COVID-19 Coronavirus Disease-19
  • CDPH California Department of Public Health
  • DEG differentially expressed gene
  • DGE differential gene expression
  • ICU intensive care unit
  • IFN interferon
  • ISG interferon-stimulated gene
  • mNGS metatranscriptomic next-generation sequencing
  • NP nasopharyngeal
  • ROC receiver operating characteristic
  • RT-PCR real-time reverse-transcription polymerase chain reaction
  • SARS-CoV-2 Severe Acute Respiratory Syndrome Coronavirus 2
  • UCSF University of California, San Francisco
  • UTM universal transport media
  • WB whole blood.
  • RNA-Seq was used to characterize the host response to SARS-CoV-2 infection, and a diagnostic two-layer host response classifier was developed based on the host gene expression patterns to discriminate SARS-CoV-2 infection from other viral and non-viral acute respiratory illnesses.
  • RNA ribonucleic acid
  • NP swab samples obtained at UCSF were pre-treated with a 1:1 ratio of DNA/RNA Shield (Zymo Research) prior to extraction. An input volume of 200 µl of NP swab sample was used for all extraction methods performed at UCSF and eluted in 100 µl.
  • NP swab samples obtained from the CDPH were extracted using the easyMag instrument (bioMerieux) according to the manufacturer’s instructions with an input volume of 300 µl and elution volume of 110 µl.
  • of the NP swab samples collected at UCSF, 217 were extracted using the Mag-Bind Viral DNA/RNA 96 kit (Omega Bio-Tek) on the KingFisher Flex (Thermo Fisher Scientific), and 34 were extracted using the EZ1 Advanced XL (Qiagen), according to the manufacturer’s instructions.
  • RNA from NP swab samples (25 µl) was treated with a nuclease cocktail of TURBO DNase (Thermo Fisher Scientific) and Baseline Zero DNase (Ambion) for 30 min at 37°C and purified using Ampure XP beads (Beckman-Coulter) on the EpMotion 5075 (Eppendorf).
  • Purified RNA (7 µl) was used for library preparation using the SMART-Seq Stranded kit (Takara Bio) and purified using Ampure XP beads (Beckman-Coulter) on the EpMotion 5073 (Eppendorf). Libraries were quantified using the Qubit dsDNA HS Assay (Thermo Fisher Scientific) on the Qubit Flex (Thermo Fisher Scientific).
  • WB sample libraries were prepared using 9 µl of total RNA and TruSeq Total RNA with Ribo-Zero Globin (Illumina), and spiked with 1 µl of ERCC RNA Spike-In Mix (Thermo Fisher Scientific). Libraries were purified using Ampure XP beads (Beckman-Coulter) and quantified using the Qubit dsDNA HS Assay (Thermo Fisher Scientific) on the Qubit Flex (Thermo Fisher Scientific).
  • NP swab and WB sample libraries were sequenced on the NovaSeq 6000 (Illumina) using 150 bp paired-end sequencing at the UCSF Center for Advanced Technology (CAT). Included in each sequencing run were negative controls (nuclease-free water) to monitor for laboratory and reagent contamination and a Human Reference RNA Standard (Agilent) to monitor for sequencing efficiency.
  • Metatranscriptomic Analysis. Metatranscriptomic next-generation sequencing (mNGS) data from all samples were analyzed for viral nucleic acids using SURPI+ (v1.0.7-build.4), a bioinformatics pipeline for pathogen detection and discovery from metatranscriptomic data, modified to incorporate enhanced filtering and classification algorithms (41, 42).
  • the SNAP nucleotide aligner was run using an edit distance of 16 against the National Center for Biotechnology Information (NCBI) nucleotide (NT) database filtered to contain the viral, bacterial, fungal, and parasitic reads of GenBank (March 2019, with inclusion of the SARS-CoV-2 Wuhan-Hu-1 genome, accession number NC_045512), enabling the detection of reads with >90% identity to reference sequences in the database.
  • the pre-established criterion for viral detection by SNAP was the presence of reads mapping to at least three non-overlapping regions of the viral genome (41).
  • Diversity metrics, including the Chao Richness Score and Shannon Diversity Index, were calculated in R (version 4.0.0) (43) using the vegan package (version 2.5.3), and figures were produced using the ggplot2 package (44).
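  • For readers unfamiliar with the two diversity metrics named above, the following sketch shows their textbook definitions in Python rather than R. It is an illustration only: the Chao estimate uses one common bias-corrected form (Chao1), and whether that matches the exact computation performed with the vegan package is not established here. The function names and counts are hypothetical.

```python
from math import log

def shannon_index(counts):
    """Shannon diversity H = -sum(p * ln p) over taxa with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * log(c / total) for c in counts if c > 0)

def chao1(counts):
    """Bias-corrected Chao1 richness: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1))."""
    observed = sum(1 for c in counts if c > 0)      # taxa seen at least once
    singletons = sum(1 for c in counts if c == 1)   # taxa seen exactly once (F1)
    doubletons = sum(1 for c in counts if c == 2)   # taxa seen exactly twice (F2)
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))

# Hypothetical read counts per taxon for a single sample.
h = shannon_index([10, 10])
richness = chao1([5, 1, 1, 2])
```

  • Two equally abundant taxa give H = ln 2, and the second example adds 0.5 to the four observed taxa because two of them were seen only once.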
  • Transcriptome Analysis. Following sequencing of sample libraries, quality control was performed on the fastq files to ensure the sequencing reads met pre-established cutoffs for number (i.e., at least 5 million read counts per sample) and quality using FastQC (version 0.11.8) (45) and MultiQC (version 1.8) (46). Quality filtering and adapter trimming were performed using BBduk tools (version 38.76). Reads were aligned to the ENSEMBL GRCh38 human reference genome assembly (Release 33) using STAR (version 2.7.0f) (47).
  • Hierarchical clustering of DEGs was performed in R (version 4.0.0) using the ComplexHeatmap and pheatmap packages (43), and figures were produced using the ggplot2 package (44).
  • Clustering was performed based on Euclidean distance with complete linkage, after exclusion of non-coding genes.
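  • The clustering rule just described (Euclidean distance with complete linkage) can be illustrated with a small pure-Python sketch. The analysis itself was run in R, so this is only a demonstration of the linkage criterion on toy data; the function name, the stopping rule by cluster count, and the points are hypothetical.

```python
from math import dist

def complete_linkage(points, n_clusters):
    """Agglomerative clustering using Euclidean distance and complete linkage.

    Returns a list of clusters, each a sorted list of indices into points.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > max(n_clusters, 1):
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete linkage: distance between the two farthest members.
                d = max(dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        # Merge the closest pair of clusters and drop the absorbed one.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(cluster) for cluster in clusters]

# Toy expression profiles (two genes per sample); not data from this disclosure.
groups = complete_linkage([(0, 0), (0, 1), (5, 5), (5, 6)], n_clusters=2)
```

  • Complete linkage tends to produce compact clusters, because two groups are merged only when even their most distant members are close.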
  • IPA Ingenuity Pathway Analysis (Qiagen)
  • the molecule activity predictor tool of IPA was used to predict gene upregulation or downregulation and pathway activation or inhibition.
  • the enrichment score p-value was used to evaluate the significance of the overlap between predicted and observed genes, while the z-score was used to assess the match between observed and predicted upregulation or downregulation.
  • Classifiers were developed using scikit-learn (version 1.2.2) (51) in Python. Several different classifier models were evaluated in parallel and the one with optimal performance on the training data was selected. These candidate classifier models included a Linear Support Vector Machine, Linear Discriminant Analysis, and a Deep Neural Network, all within the scikit-learn package. Reduced, small gene panels were selected using Lasso (52) and a forward customized reverse search across the resulting feature set. This search iteratively removed the remaining gene with the lowest significance as measured by its Lasso coefficient, performed classifier training, and reported sensitivity, specificity, and accuracy across the training set. These results were then manually reviewed to balance each of them with a priority placed on specificity and number of genes.
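  • The reverse search described above can be outlined as follows. This is a sketch under stated assumptions, not the implementation used: significance is taken to be the absolute value of a gene's Lasso coefficient, labels are 1 for SARS-CoV-2 positive and 0 for negative with both classes present, and `train_and_predict` is a hypothetical callback that trains a classifier on the given genes and returns its training-set predictions.

```python
def confusion_metrics(y_true, y_pred):
    """Sensitivity, specificity, and accuracy for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {"sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
            "accuracy": (tp + tn) / len(pairs)}

def reverse_search(lasso_coef, train_and_predict, y_true, min_genes=1):
    """Evaluate shrinking gene panels, dropping the weakest gene each round."""
    # Strongest genes first, ranked by absolute Lasso coefficient.
    genes = sorted(lasso_coef, key=lambda g: abs(lasso_coef[g]), reverse=True)
    report = []
    while genes and len(genes) >= min_genes:
        y_pred = train_and_predict(genes)  # hypothetical training callback
        report.append((list(genes), confusion_metrics(y_true, y_pred)))
        genes = genes[:-1]  # remove the gene with the smallest |coefficient|
    return report

# Toy stand-in for classifier training: perfect only while gene "B" is kept.
labels = [1, 1, 0, 0]
toy_model = lambda genes: [1, 1, 0, 0] if "B" in genes else [1, 0, 0, 1]
report = reverse_search({"A": 0.9, "B": -0.5, "C": 0.1}, toy_model, labels)
```

  • Each entry of `report` pairs a candidate gene panel with its metrics, which is the kind of table that could then be reviewed manually to trade off specificity against panel size.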
  • Receiver operating characteristic (ROC) curves were generated using the pROC package in R (53).
  • Statistical Analysis. To identify potentially important clinical predictors for COVID-19 score among RT-PCR positive patients, linear regression models were used to check the association of each clinical variable with the transformed COVID-19 score while controlling for demographics (age, gender, and race/ethnicity). A stepwise procedure was then used to determine what clinical variables would be selected when all of the variables were included in the model while controlling for demographics. Variables with a p-value less than 0.15 from those models were further examined for their association with transformed COVID-19 score in one model together while controlling for demographics. In this exploratory analysis, p-values were not adjusted for multiple comparisons, in order to avoid missing potentially important variables.
  • Ct values were categorized as low (Ct ≤18), moderate (Ct >18 and ≤25), and high (Ct >25).
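  • The Ct binning above can be written as a small helper. The comparison operators at the boundaries are not fully legible in the text as extracted, so this sketch assumes that 18 and 25 are inclusive upper bounds of the low and moderate categories; the function name is hypothetical.

```python
def ct_category(ct: float) -> str:
    """Bin a cycle threshold value as 'low', 'moderate', or 'high'."""
    if ct <= 18:   # assumed inclusive upper bound
        return "low"
    if ct <= 25:   # assumed inclusive upper bound
        return "moderate"
    return "high"

categories = [ct_category(value) for value in (15.2, 18.0, 21.7, 25.0, 31.4)]
```

  • Note that the labels describe the Ct value itself; a low Ct corresponds to more viral RNA in the sample, not less.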
  • the association of demographics and clinical variables with RT-PCR result (positive versus negative), diagnosis (COVID-19, influenza, or bacterial sepsis), and viral load (low, medium, high) was examined by Fisher’s exact test (values ≤5) or chi-squared test (values >5) for categorical variables, and by two-sample t test or ANOVA for age, respectively.
  • the association of demographics and clinical variables with Ct values was assessed with the Wilcoxon rank sum test for variables with two categories or the Kruskal-Wallis test for variables with more than two categories.
  • the tetrachoric or polychoric correlation was estimated for the correlation between binary RT-PCR and binary or ordinal symptoms and outcome.
  • the point-biserial correlation was estimated for the correlation between binary symptoms and continuous Ct values.
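  • As a worked illustration of the point-biserial correlation mentioned above, the sketch below uses the standard formula r = (M1 - M0) / s * sqrt(n1 * n0 / n^2), where s is the population standard deviation of the continuous variable. It assumes the binary variable is coded 0/1 and both groups are non-empty; the function name and the values are hypothetical and are not study data.

```python
from math import sqrt
from statistics import mean, pstdev

def point_biserial(binary, continuous):
    """Point-biserial correlation between a 0/1 variable and a continuous one."""
    ones = [x for b, x in zip(binary, continuous) if b == 1]
    zeros = [x for b, x in zip(binary, continuous) if b == 0]
    n1, n0 = len(ones), len(zeros)
    n = n1 + n0
    # Difference in group means, scaled by the overall spread and group sizes.
    return (mean(ones) - mean(zeros)) / pstdev(continuous) * sqrt(n1 * n0 / n**2)

# Hypothetical example: symptom present (1) or absent (0) versus Ct value.
r = point_biserial([1, 1, 0, 0], [20, 22, 30, 32])
```

  • The result is numerically identical to the Pearson correlation computed with the binary variable treated as numeric; here it is strongly negative because the symptomatic group has the lower Ct values.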
  • for COVID-19 patients, there was a median of 5 ± 11 days (range 0-65 days) between symptom onset and NP sample collection, and a median of 9 ± 29 days (range 6-72 days) between symptom onset and whole blood sample collection.
  • Six COVID-19 patients had paired NP swabs and WB available for comparison.
  • COVID-19 patients were also stratified according to the highest level of care received (i.e., outpatient, hospitalized but not requiring intensive care, and ICU admission).
  • Ct cycle threshold
  • T2DM type 2 diabetes mellitus
  • CKD chronic kidney disease
  • CAD coronary artery disease
  • CHF congestive heart failure
  • COPD chronic obstructive pulmonary disease
  • HIV human immunodeficiency virus
  • ACE inhibitors angiotensin-converting enzyme inhibitors
  • ICU intensive care unit.
  • a total of 23.2 billion and 3.4 billion raw reads were sequenced from 380 NP swab and 53 WB samples, respectively.
  • the median transcriptome coverage achieved was 52.4% ± 17.8% (range 0.69-84.7%), generated from a median 30.3 ± 84.0 million reads (range 0.061 to 604 million reads) for each sample.
  • 286 were used to evaluate the host response and metatranscriptome, from 19 billion raw sequencing reads, with a median transcriptome coverage of 58.5% ± 15.1% (range 4.4-84.7%), generated from a median 28.8 ± 96.1 million reads (range 0.45 to 604 million reads).
  • the median coverage achieved was 37.5% ± 16.2% (range 20.8-89.2%), generated from a median 30.8 ± 41.7 million reads (range 16.5 to 182 million reads).
  • Viral Co-infections in SARS-CoV-2 Patients were SARS-CoV-2 positive, and 108 (37.8%) were negative for any respiratory virus (including 11 donor controls).
  • a respiratory virus was identified by metatranscriptomic analysis in 41 cases (14.3%) including 27 patients with previously confirmed influenza or seasonal coronavirus infection by RT-PCR testing. These respiratory viruses included seasonal coronavirus, influenza virus, human rhinovirus, human parainfluenza virus, and human metapneumovirus.
  • Co-infections were identified in 10 of 137 (7.3%) SARS-CoV-2 positive and 4 of 41 (9.76%) SARS-CoV-2 negative individuals; 2 of 137 SARS-CoV-2 positive (1.5%) and 2 of 41 SARS-CoV-2 negative (4.88%) individuals were infected by 3 viruses (Table 1-4). Triply-infected individuals had additional infections from human rhinovirus (multiple genotypes) and human metapneumovirus.
  • influenza and other viral respiratory infections shared IFN signaling activation pathways in common with COVID-19 (FIG. 2A).
  • other immune response pathways that were activated by influenza and other viral infections such as acute phase, B-cell receptor, and Toll-like receptor signaling (including genes IRAK1, MAPK12, MAP2K7), and chemokine signaling (including IL-6 and IL-8) were inhibited in COVID-19.
  • Patients infected with SARS-CoV2 or a seasonal coronavirus showed similar levels of activation of the glycoprotein VI (GP6) pathway, and inhibition of dendritic cell maturation and acute phase response signaling pathways (IRAK1 and MAPK12).
  • group L genes related to cell signaling, cellular metabolism, immune signaling, and innate immunity
  • group M cellular metabolism, immune signaling, and innate immunity
  • group N cellular metabolism and transport
  • Genes from all three groups had increased overall expression in hospitalized patients relative to outpatients.
  • Upregulated pathways in COVID-19 were primarily related to cell signaling (ERK/MAPK and GP6 signaling), tissue development, cellular function and proliferation, and organismal injury, and included only a few immune pathways, such as PI3K signaling in B lymphocytes, CXCR4 signaling, and IL-15 production.
  • bacterial sepsis was characterized by generalized upregulation of immune-mediated pathways as well as multiple additional pathways associated with hematological development and other cellular functions.
  • Hierarchical clustering of DEGs among patients with COVID-19, influenza, or bacterial sepsis based on comparisons to donor controls revealed 6 distinct groups.
  • Comparison of COVID-19 Host Responses in NP Swabs and WB.
  • COVID-19 host responses in NP swabs and WB shared common pathways related to antiviral response, innate immunity, ISG signaling (e.g., IL-6 and IL-8), and dendritic cell maturation.
  • host responses were discordant between NP swabs and WB for multiple additional immune-related pathways, including acute phase response signaling (z-score of -1.30 for NP swabs versus 0.33 for WB), IL-15 signaling (z-score of 0 versus 1.89), CXCR4 signaling (z-score 0 versus 1.63), natural killer cell signaling (z-score 0 versus -1.63), Th1 pathway (z-score 0 versus -2.24), and B-cell receptor signaling (z-score 2.11 versus -0.5). Very few DEGs ( ⁇ 3%) were shared between NP swabs and WB from COVID-19 patients (FIG. 2B, FIGS.
  • Classifier. As transcriptome analysis had revealed distinct patterns of gene expression in COVID-19 patients (FIG. 2A), it was hypothesized that a classifier could be constructed that accurately discriminates between SARS-CoV-2 infection and other viral or non-viral ARIs from NP swabs. After randomly partitioning 30% of samples into an independent test cohort, a two-layer classifier was developed that first differentiates between SARS-CoV-2 positive cases and SARS-CoV-2 negative cases for which no pathogen was identified (layer 1), followed by a second layer that differentiates SARS-CoV-2 from microbiologically confirmed viral acute respiratory illnesses, including influenza and seasonal coronavirus infections, among others (layer 2) (FIG. 3).
  • the initial set of DEGs was selected using a Bonferroni-corrected p value of <0.001 for both layers. Only samples assigned to SARS-CoV-2 by both binary classifiers were designated positive for SARS-CoV-2 infection.
  • the cutoff for the prediction score of each classifier was determined by generating receiver operating characteristic (ROC) curves for the training data, and comparing Youden’s index, an arbitrary 0.5 cutoff, and a manually selected threshold that prioritized specificity (“high-specificity threshold”). After review of the training set results, the high-specificity threshold was selected.
  • ROC receiver operating characteristic
  • the layer 1 classifier, generated using a training set of 110 SARS-CoV-2 positive and 93 non-viral ARI samples, contained 748 DEGs, consisting of genes associated with both cell processes and immune signaling. This classifier had a sensitivity of 97.3%, specificity of 97.3%, and area under the receiver operating characteristic curve (AUC) of 0.993 at a threshold of 0.4515.
  • the layer 2 classifier, generated using a training set of the same 110 SARS-CoV-2 positive and 93 viral ARI samples, contained 266 DEGs with a smaller proportion of immune signaling genes than in the layer 1 classifier.
  • This classifier had a sensitivity of 95.5%, specificity of 98.9% and AUC of 0.999 at a threshold of 0.6066.
  • the full 1,014-gene two-layer classifier had an overall sensitivity of 95.5%, specificity of 98.2%, and AUC of 0.999 (FIG. 4A).
  • the performance of the two-layer classifier was then evaluated using an independent test set that included NP swab samples from 28 SARS-CoV-2 positive, 19 non-viral ARI and 27 viral ARI patients (FIG. 3).
  • the layer 1 classifier had 82.1% sensitivity, 89.5% specificity (FIG. 6A), and AUC of 0.944, while the layer 2 classifier yielded 92.9% sensitivity, 96.3% specificity (FIG. 6D), and AUC of 0.991.
  • the full 1,014-gene two-layer classifier had an overall sensitivity of 75.0% (95% CI: 55.0-89.0%), specificity of 93.5% (95% CI: 82.1-98.6%), and AUC of 0.933 (range 0.879-0.987), yielding an overall accuracy of 86.5% (FIG. 4A).
  • a lasso regression analysis was used to find an optimal set of genes for a medium two-layer classifier with an a priori specification of no more than 100 genes.
  • the medium classifier consisted of 29 genes for layer 1 and 38 genes for layer 2 (Tables 1-6 and 1-7). Based on the training set, the medium 67-gene 2-layer classifier had a sensitivity of 88.2%, specificity of 97.6%, and AUC of 0.997.
  • When applied to the test set, the medium 2-layer classifier had a sensitivity of 71.4% (95% CI: 51.3-86.8%), specificity of 93.5% (95% CI: 82.1-98.6%), AUC of 0.922 (range 0.863-0.982), and 85.1% overall accuracy (FIG. 4B).
  • the number of genes was then narrowed to <20 total by iteratively removing one gene at a time from the 29 genes for layer 1 and 37 genes for layer 2.
  • Maximum performance was identified for a small two-layer classifier consisting of 19 genes, 8 genes for layer 1 and 11 genes for layer 2 (Tables 1-8 and 1-9). Based on the training set, the small 19-gene 2-layer classifier had a sensitivity of 94.6%, specificity of 94.6%, and AUC of 0.984 for layer 1.
  • When applied to the test set, the small 2-layer classifier had a sensitivity of 78.6% (95% CI: 76.5-99.1%), specificity of 89.1% (95% CI: 59.1-91.7%), AUC of 0.906 (range 0.837-0.974), and 85.1% accuracy (FIG. 4C).
  • a classifier was constructed to discriminate between severe COVID-19 and mild COVID-19.
  • severity-associated genes were identified by comparing expression of genes in NP swabs obtained from outpatients with mild COVID-19 and hospitalized patients with severe COVID-19, including intensive care unit patients requiring mechanical ventilation.
  • the severity classifier consisted of the genes provided in Table 1-10.
  • RNA-Seq was used to characterize the differential host responses to SARS-CoV-2 infection in 286 NP swab and 53 whole blood samples from 333 individuals. Both NP swabs and WB from COVID-19 patients showed distinct patterns of activation or inhibition relative to other infections (influenza, seasonal coronaviruses, and bacterial sepsis) and to each other. SARS-CoV-2 infection was found to activate interferon-mediated antiviral pathways and paradoxically inhibit multiple additional immune and inflammatory pathways, resulting in an overall dysregulated immune response. Host responses were similar between outpatients and hospitalized patients with COVID-19, but the magnitude of host response was found to increase with clinical severity of disease.
  • diagnostic two-layer host response classifiers were developed based on RNA-Seq data that can discriminate SARS-CoV-2 infection from other viral and non-viral ARIs from NP swab samples with an accuracy of 85.1-86.5%. Finally, a classifier to discriminate the severity of SARS-CoV-2 infection was developed.
  • SARS-CoV-2 Receptor ACE2 Is an Interferon-Stimulated Gene in Human Airway Epithelial Cells and Is Detected in Specific Cell Subsets across Tissues. Cell 181, 1016-1035 el019 (2020).
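The cutoff comparison described in the classifier results above (Youden's index versus a fixed 0.5 cutoff) can be sketched as follows. This is a minimal illustration in Python, not the procedure actually used in Example 1; the function name `youden_threshold` and the toy scores and labels are hypothetical.

```python
def youden_threshold(scores, labels):
    """Return (threshold, J) maximizing Youden's J = sensitivity + specificity - 1.

    scores: prediction scores; labels: 1 = SARS-CoV-2, 0 = other.
    A sample is called positive when its score is >= threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best_threshold, best_j = None, -1.0
    # Every observed score is a candidate cutoff.
    for threshold in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
        j = tp / positives + tn / negatives - 1.0
        if j > best_j:
            best_threshold, best_j = threshold, j
    return best_threshold, best_j


# Hypothetical training scores and labels (not data from the study).
toy_scores = [0.95, 0.80, 0.62, 0.55, 0.40, 0.30, 0.20, 0.10]
toy_labels = [1, 1, 1, 1, 0, 1, 0, 0]
threshold, j = youden_threshold(toy_scores, toy_labels)
print(threshold, round(j, 3))
```

On these toy values the Youden-optimal cutoff differs from 0.5, which is why the text compares several candidate thresholds rather than fixing one in advance.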

Abstract

The present disclosure relates to methods of characterizing gene expression of a mammalian host suspected of having an acute respiratory illness. The methods are suitable for determining whether the host is infected with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).

Description

ANALYSIS OF HOST GENE EXPRESSION FOR DIAGNOSIS OF
SEVERE ACUTE RESPIRATORY SYNDROME CORONAVIRUS 2 INFECTION
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims benefit of U.S. Provisional Application No. 63/123,389, filed December 9, 2020, the disclosure of which is hereby incorporated by reference in its entirety.
STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT
[0002] This invention was made with government support under Grant Nos. R01 HL 105704 and R33 AH29077 awarded by the National Institutes of Health. The government has certain rights in the invention.
TECHNICAL FIELD
[0003] The present disclosure relates to methods of characterizing gene expression of a mammalian host suspected of having an acute respiratory illness. The methods are suitable for determining whether the host is infected with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).
BACKGROUND
[0004] Coronavirus disease 2019 (COVID-19) is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), a virus that spreads from person to person through droplet and contact transmission. Since it was first detected in December 2019, SARS-CoV-2 has spread worldwide, resulting in a devastating global pandemic. As of mid-October 2020, there have been over 39 million confirmed cases of COVID-19, and over 1 million confirmed deaths worldwide, according to the World Health Organization (WHO Weekly Operational Update on COVID-19). In many parts of the world, the spread of SARS-CoV-2 is still uncontrolled. This means that millions more are at risk of contracting COVID-19.
[0005] Controlling the spread of COVID-19 depends in large part on the ability to identify SARS-CoV-2 infected individuals. Two types of tests are generally employed for detecting active SARS-CoV-2 infections (Vandenberg et al., Nat Rev Microbiol. 2020 Oct 14;1-13. doi: 10.1038/s41579-020-00461-z). The first detects the presence of SARS-CoV-2 nucleic acids in a sample, for example, by specifically amplifying a region of the viral genome using a polymerase chain reaction. The second detects the presence of viral antigens, for example, using an antibody that specifically binds to a SARS-CoV-2 protein. Both types of tests may produce erroneous results, namely false positives and false negatives. False negatives are particularly a problem when viral titers in a biological sample are low.
[0006] Thus, what is needed in the art are further types of tests for detecting active SARS-CoV-2 infections. Tests that do not rely on the measurement of viral nucleic acids and proteins in a sample are desirable. Such tests would improve our ability to gather accurate information on the prevalence of SARS-CoV-2 infections, and thereby improve our ability to contain the pandemic.
BRIEF SUMMARY
[0007] The present disclosure relates to methods of characterizing gene expression of a mammalian host suspected of having an acute respiratory illness. The methods are suitable for determining whether the host is infected with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).
BRIEF DESCRIPTION OF THE DRAWINGS
[0008] FIGS. 1A-1H provide an overview of sample collection and metatranscriptomic analysis. FIG. 1A shows a flow chart of nasopharyngeal (NP) swab and whole blood (WB) sample collection for metatranscriptomic next-generation sequencing (NGS). FIG. 1B shows box-and-whiskers plots of RT-PCR cycle threshold (Ct) values of SARS-CoV-2 positive individuals who were outpatients (n=55, left plot), compared to those who were hospitalized, non-ICU (n=17, center plot), or in the ICU (n=7, right plot). There was no difference in viral load, which is inversely related to the Ct value, regardless of disease severity (p=0.89 by ANOVA). FIGS. 1C-1E show analyses of the viral and bacterial metatranscriptome as determined by NGS. FIG. 1C shows box-and-whiskers plots of abundance, FIG. 1D shows box-and-whiskers plots of Chao Richness Scores, and FIG. 1E shows box-and-whiskers plots of Shannon Diversity Indices of the virome in patients with SARS-CoV-2 (COVID-19) (n=137), patients with other respiratory viruses (“Other Virus”) (n=41), and patients without respiratory viruses (“No Virus”) (n=108). Patients were stratified by the inclusion (“Including Respiratory Viral Reads”) or exclusion (“Exclusion Respiratory Viral Reads”) of respiratory viral reads, as indicated above the plots. FIGS. 1F-1H show analyses of the nasopharyngeal metatranscriptome as determined by NGS. FIG. 1F shows box-and-whiskers plots of abundance, FIG. 1G shows box-and-whiskers plots of Chao Richness Score, and FIG. 1H shows box-and-whiskers plots of Shannon Diversity Indices of the microbiome in patients with SARS-CoV-2 (COVID-19) (n=137), patients with other respiratory viruses (“Other Virus”) (n=41), and patients without respiratory viruses (“No Virus”) (n=108). For box-and-whiskers plots, the median is represented by a dotted line, boxes represent the first to third quartiles, whiskers represent the minimum and maximum values, and jitters represent the distribution of the population. For FIGS. 1C-1H, statistical analysis was conducted by Kruskal-Wallis test, followed by the Nemenyi test for post-hoc analysis.
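For readers unfamiliar with the two diversity measures plotted in these figures, the sketch below computes them from a vector of per-taxon read counts. It assumes that the richness score is the Chao1 estimator and that the Shannon index uses the natural logarithm; neither detail is stated in this description, and the counts are hypothetical.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over taxa with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def chao1_richness(counts):
    """Chao1 estimate: observed taxa plus a correction from rare taxa."""
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    if doubletons > 0:
        return observed + singletons ** 2 / (2 * doubletons)
    # Bias-corrected form, used when no doubletons are present.
    return observed + singletons * (singletons - 1) / 2


# Hypothetical read counts per taxon (not data from the study).
toy_counts = [50, 30, 10, 2, 2, 1, 1, 1, 0]
print(chao1_richness(toy_counts), round(shannon_index(toy_counts), 3))
```

Richness estimates how many taxa are present, including unseen rare ones, while the Shannon index also reflects how evenly reads are spread across taxa, which is why the figures report both.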
[0009] FIGS. 2A-2D show Venn diagrams of differentially expressed genes (DEGs). FIG. 2A shows a comparison of NP swab DEGs in SARS-CoV-2 (left) and influenza (right) patients. FIG. 2B shows a comparison of NP swab (left) and WB (right) DEGs in SARS-CoV-2 patients. FIG. 2C shows a comparison of NP swab DEGs in COVID-19 hospitalized patients (left) and outpatients (right). FIG. 2D shows a comparison of NP swab (left) and WB (right) DEGs in influenza patients. In FIGS. 2A-2D, the DEGs are calculated relative to the donor controls, and the shared DEGs are listed in boxes below each diagram. In some embodiments of the present disclosure, the plurality of genes does not comprise one or more of the genes shown in the box of FIG. 2A, FIG. 2B, FIG. 2C, and/or FIG. 2D.
[0010] FIG. 3 shows an overview of the design and distribution of samples for training and test sets in layer 1 and layer 2 of a diagnostic classifier for COVID-19. Layer 1 differentiates between SARS-CoV-2 positive and SARS-CoV-2 negative cases (excluding influenza and seasonal coronavirus infections). Layer 2 differentiates SARS-CoV-2 from influenza and seasonal coronavirus infections.
[0011] FIGS. 4A-C show performance characteristics of the two-layered (combined layer 1 and layer 2) classifier for full, medium, and small gene panels. Training set ROC curve (left) and test set violin plot (middle) and confusion matrix (right) are shown for the two-layer classifier, using either the full gene panel (FIG. 4A), the medium gene panel (FIG. 4B), or the small gene panel (FIG. 4C).
[0012] FIGS. 5A-B show Venn diagrams of differentially expressed genes (DEGs). FIG. 5A shows a comparison of DEGs in nasopharyngeal swabs of COVID-19 outpatients to whole blood samples of hospitalized (hospitalized, non-ICU, and ICU) COVID-19 patients. FIG. 5B shows a comparison of DEGs in nasopharyngeal swabs and whole blood of hospitalized (hospitalized, non-ICU, and ICU) COVID-19 patients. In some embodiments of the present disclosure, the plurality of genes does not comprise one or more of the genes shown in the box of FIG. 5A and/or FIG. 5B.
[0013] FIGS. 6A-F provide assessments of COVID-19 classifier test performance. FIGS. 6A-B show violin plots for layer 1 and layer 2 of the full gene panel, respectively. FIGS. 6C-D show violin plots for layer 1 and layer 2 of the medium gene panel, respectively. FIGS. 6E-F show violin plots for layer 1 and layer 2 of the small gene panel, respectively.
DETAILED DESCRIPTION
[0014] Coronavirus Disease-19 (COVID-19) has emerged as the cause of a global pandemic. There is an urgent need to better understand the pathophysiology of and develop new diagnostic tests for SARS-CoV-2 infection. To elucidate key pathways in the host transcriptome, RNA sequencing (RNA-Seq) was used to analyze 286 nasopharyngeal (NP) swab and 53 whole blood (WB) samples from 333 COVID-19 patients and controls, including patients with other viral and bacterial infections. Analyses of differentially expressed genes (DEGs) and pathways revealed a more muted innate immune response in COVID-19 relative to other infections (e.g. influenza, other seasonal coronaviruses, bacterial sepsis) in both NP swabs and WB, with paradoxical downregulation of several key immune signaling pathways. Comparative COVID-19 host responses between NP swabs and WB were distinct, with minimal overlap in DEGs. Both hospitalized patients and outpatients exhibited upregulation of interferon-associated pathways, although heightened and more robust inflammatory and immune responses were observed in hospitalized patients with more clinically severe disease. A two-layer machine learning-based classifier, run on an independent test set of 94 NP swab samples, was able to discriminate between COVID-19 and non-COVID-19 infectious or non-infectious acute respiratory illness using complete (>1,000 genes), medium (<100) and small (<20) gene biomarker panels with 85.1%-86.5% accuracy. These findings demonstrate that SARS-CoV-2 infection has a distinct biosignature that differs between NP swabs and WB and can be leveraged for differential diagnosis of COVID-19 disease.
[0015] Of note, ciliated epithelial cells appear to be major contributors to the host transcriptome in NP swab samples, versus white blood cells in WB. Differing cell types and proportions may thus explain the lack of overlap in shared DEGs and pathways between NP swabs and WB. Strikingly, there are no IFN-associated DEGs or pathways shared between NP swabs and WB from COVID-19 patients. In contrast, activation of IFN-associated pathways in both the upper airway and blood of patients with influenza suggests a global, more systemic host response relative to COVID-19. Although ACE2 has been shown to be the cellular receptor for entry for SARS-CoV-2 and has been described as an interferon-stimulated gene (35, 36), ACE2 was not found to be upregulated in COVID-19 patients, whether from NP swab or WB samples.
[0016] The findings described in Example 1 of a distinct host response biosignature in COVID-19 patients and an augmented response in the setting of more severe illness underscore the potential diagnostic utility of host response-based classifiers for SARS-CoV-2 infection. The 19-gene diagnostic classifier described herein has >85% overall accuracy (~80% sensitivity and ~90% specificity). The size of the classifier is compatible with implementation on existing multiplex diagnostic platforms (37, 38). A host response-based test may be particularly useful as a complementary diagnostic tool for SARS-CoV-2 infection, especially for PCR-negative hospitalized patients with residual clinical suspicion for COVID-19 disease. In addition, the NP swab from the one asymptomatic patient in the current study was classified as having a SARS-CoV-2-associated host response with confidence of 85.7-99.2%, suggesting that a host response-based test can be used to screen for asymptomatic or even pre-symptomatic SARS-CoV-2 infection. Additionally, a panel of DEGs associated with more severe COVID-19 was defined. No correlation was generally observed between viral load and severity of disease (39, 40), and a robust biomarker for disease severity was not heretofore clinically available.
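As a consistency check on these figures, overall accuracy follows from sensitivity, specificity, and the class counts of the test set. The sketch below uses the test-set composition reported for Example 1 (28 SARS-CoV-2 positive samples, with 19 non-viral and 27 viral ARI samples as negatives) together with the reported small-panel test-set sensitivity (78.6%) and specificity (89.1%). The function name is hypothetical, and pooling all 46 non-SARS-CoV-2 samples into a single negative class is an assumption.

```python
def overall_accuracy(sensitivity, specificity, n_positive, n_negative):
    """Accuracy = (true positives + true negatives) / all samples."""
    true_positives = sensitivity * n_positive
    true_negatives = specificity * n_negative
    return (true_positives + true_negatives) / (n_positive + n_negative)


n_positive = 28        # SARS-CoV-2 positive test samples
n_negative = 19 + 27   # non-viral ARI + viral ARI test samples (pooled: assumption)

small_panel_accuracy = overall_accuracy(0.786, 0.891, n_positive, n_negative)
print(f"{small_panel_accuracy:.1%}")  # prints 85.1%
```

Because the negative class is larger than the positive class, specificity contributes more than sensitivity to the overall accuracy.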
[0017] The present disclosure relates to methods of characterizing gene expression of a mammalian host suspected of having an acute respiratory illness. The methods are suitable for determining whether the host is infected with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).
I. Definitions
[0018] As used herein and in the appended claims, the singular forms “a,” “an” and “the” include plural referents unless otherwise indicated or clear from context. For example, “a polynucleotide” includes one or more polynucleotides.
[0019] It is understood that aspects and embodiments described herein as “comprising” include “consisting of” and “consisting essentially of” embodiments.
[0020] The term “plurality” as used herein in reference to an object refers to two or more objects, preferably three or more objects. For instance, “a plurality of genes” refers to two or more genes, preferably 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, or 50 or more genes.
[0021] The term “reduced” as used herein refers to a measurably lower level of a value for a parameter as compared to a control or other reference value for the parameter. For instance, the term “reduced” when used in connection with a level of RNA expression of a gene, refers to a negative logFC level of expression of the gene. On the other hand, the term “elevated” as used herein refers to a measurably higher level of a value for a parameter as compared to a control or other reference value for the parameter. For instance, the term “elevated” when used in connection with a level of RNA expression of a gene, refers to a positive logFC level of expression of the gene.
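To illustrate this sign convention, the sketch below computes a log fold change (logFC) from a test value and a reference value and maps its sign to the terms defined above. The base-2 logarithm, the pseudocount, and the function names are assumptions made for illustration; the definition itself does not specify them.

```python
import math


def log_fold_change(test_level, reference_level, pseudocount=1.0):
    """log2((test + pseudocount) / (reference + pseudocount)).

    The pseudocount (an illustrative assumption) avoids division by zero
    for genes with no detected expression.
    """
    return math.log2((test_level + pseudocount) / (reference_level + pseudocount))


def expression_status(log_fc):
    """Map the sign of logFC to the terms used in the definition."""
    if log_fc > 0:
        return "elevated"
    if log_fc < 0:
        return "reduced"
    return "unchanged"


print(expression_status(log_fold_change(399.0, 99.0)))  # log2(400/100) = 2
print(expression_status(log_fold_change(24.0, 99.0)))   # log2(25/100) = -2
```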
[0022] As used herein, “a subject suspected of having an acute respiratory illness” is a subject that meets one or more of the following criteria: has COVID-19-like symptoms (e.g., fever, chills, cough, shortness of breath or difficulty breathing, fatigue, muscle or body aches, headache, new loss of taste or smell, sore throat, congestion or runny nose, nausea or vomiting, and/or diarrhea); may have been in contact with an individual with a SARS-CoV-2 infection; and/or has visited a region in which SARS-CoV-2 infections are prevalent.
[0023] The terms “treating” or “treatment” of a disease or an infection refer to executing a protocol, which may include administering one or more pharmaceutical compositions to an individual (human or other mammal), in an effort to alleviate signs or symptoms of the disease. Thus, “treating” or “treatment” does not require complete alleviation of signs or symptoms, does not require a cure, and specifically includes protocols that have only a palliative effect on the individual. As used herein, and as well-understood in the art, “treatment” is an approach for obtaining beneficial or desired results, including clinical results. Beneficial or desired clinical results include, but are not limited to, alleviation or amelioration of one or more symptoms, diminishment of extent of disease, stabilized (i.e., not worsening) state of disease, preventing spread of disease, delay or slowing of disease progression, amelioration or palliation of the disease state, and remission (whether partial or total).
II. Methods For Measuring Gene Expression and Diagnosis of SARS-CoV-2 Infection
[0024] Certain aspects of the present disclosure relate to methods for measuring gene expression, which may be used to assist in the diagnosis of SARS-CoV-2 infection or severe SARS-CoV-2 infection. In some embodiments, the methods include one or more techniques selected from the group consisting of sequence analysis, hybridization, and amplification. For example, in some embodiments, the methods may include, without limitation, next generation sequencing, RT-qPCR, Luminex, Nanostring, and/or microarray. Exemplary methods are set forth below, but the skilled artisan will appreciate that various methods for measurement of gene expression that are known in the art can be employed without departing from the scope of the present disclosure.
[0025] In one aspect, the present disclosure provides methods for measuring gene expression, comprising the steps of: (a) measuring levels of RNA expression of a plurality of genes of cells from a nasopharyngeal test sample obtained from a human subject suspected of having an acute respiratory illness, wherein the plurality of genes comprises: (i) at least one gene selected from the group consisting of RSAD2, IFI6, IFI44L, EPSTI1, SERPING1, ATP5G1, COX20, and TCN1; and (ii) at least one gene selected from the group consisting of HEPACAM2, TMEM229A, PLD4, PFKFB4, ADRA2A, PIFO, SERTAD2, HSPA14, MTRNR2L6, SLC16A8, and BATF3; and (b) identifying the test sample as having a gene expression profile of a SARS-CoV-2 infection when: (i) the level of RNA expression of RSAD2, IFI6, IFI44L, EPSTI1, and/or SERPING1 is elevated, and/or RNA expression of ATP5G1, COX20, and/or TCN1 is reduced in the test sample in comparison with respective reference values; and (ii) the level of RNA expression of HEPACAM2, TMEM229A, PLD4, PFKFB4, ADRA2A, PIFO, SERTAD2, and/or HSPA14 is elevated, and/or RNA expression of MTRNR2L6, SLC16A8, and/or BATF3 is reduced in the test sample in comparison with respective reference values. In some embodiments, identification of the gene expression profile is indicative of the human subject having a SARS-CoV-2 infection. In some embodiments, the at least one gene of (a)(i) comprises 3, 4, 5, 6, 7 or all 8 genes of the group consisting of RSAD2, IFI6, IFI44L, EPSTI1, SERPING1, ATP5G1, COX20, and TCN1; and/or wherein the at least one gene of (a)(ii) comprises 3, 4, 5, 6, 7, 8, 9, 10 or all 11 genes of the group consisting of HEPACAM2, TMEM229A, PLD4, PFKFB4, ADRA2A, PIFO, SERTAD2, HSPA14, MTRNR2L6, SLC16A8, and BATF3. In some embodiments, the human subject exhibits symptoms of COVID-19 disease.
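The two-part identification in step (b) can be sketched as a pair of checks that must both point toward SARS-CoV-2. The scoring rule below (the mean of logFC values, each signed by its expected direction, compared against zero) is a hypothetical simplification chosen for illustration and is not the trained classifier of the disclosure. Gene symbols and directions are taken from the paragraph above; the sample values are invented.

```python
# Expected direction in SARS-CoV-2 infection: +1 = elevated, -1 = reduced.
LAYER1 = {"RSAD2": 1, "IFI6": 1, "IFI44L": 1, "EPSTI1": 1, "SERPING1": 1,
          "ATP5G1": -1, "COX20": -1, "TCN1": -1}
LAYER2 = {"HEPACAM2": 1, "TMEM229A": 1, "PLD4": 1, "PFKFB4": 1, "ADRA2A": 1,
          "PIFO": 1, "SERTAD2": 1, "HSPA14": 1,
          "MTRNR2L6": -1, "SLC16A8": -1, "BATF3": -1}


def layer_score(log_fc, panel):
    """Mean direction-signed logFC over the panel genes that were measured."""
    signed = [log_fc[gene] * direction
              for gene, direction in panel.items() if gene in log_fc]
    if not signed:
        raise ValueError("no panel genes measured")
    return sum(signed) / len(signed)


def has_sars_cov_2_profile(log_fc):
    """Positive only when both gene groups point toward SARS-CoV-2."""
    return layer_score(log_fc, LAYER1) > 0 and layer_score(log_fc, LAYER2) > 0


# Hypothetical logFC values relative to reference values (not study data).
sample = {"RSAD2": 2.1, "IFI6": 1.4, "TCN1": -0.9,
          "HEPACAM2": 0.8, "PLD4": 0.5, "BATF3": -1.2}
print(has_sars_cov_2_profile(sample))
```

Requiring agreement from both gene groups trades some sensitivity for specificity, since a sample that resembles SARS-CoV-2 on only one group is not called positive.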
[0026] In another aspect, the present disclosure provides methods for measuring gene expression, comprising the steps of (a) measuring levels of RNA expression of a plurality of genes of cells from a nasopharyngeal test sample obtained from a human subject suspected of having an acute respiratory illness, wherein the plurality of genes comprises: (i) at least one gene selected from the group consisting of RSAD2, SLC6A4, SHISA3, IFI6, IFI44L, FAM155B, SEMA7A, KIAA1614, EPSTI1, TECTA, CXCL9, SERPING1, HRASLS2, RGS1, IRF8, FAM71F2, C1QC, SARS2, ATP5G1, AZGP1, WFDC6, SDHAF4, FCGBP, COX20, BPIFB1, TCN1, PROS1, SCGB1A1, and CST1; and (ii) at least one gene selected from the group consisting of ZC4H2, HEPACAM2, HMX1, TMEM229A, PLD4, PFKFB4, POSTN, BORA, NUP35, DHFR, AMBP, ADRA2A, ZNF92, CYP2F1, PIFO, SNTN, ZNF469, ADH1C, FAM3D, SERTAD2, HSPA14, ILVBL, PERP, UBE2I, EHF, MLKL, PTMS, MTRNR2L6, ADGRL1, PLEKHA4, CD300E, ECSCR, SLC16A8, BATF3, TRPV5, GUK1, and TUBG2; and (b) identifying the test sample as having a gene expression profile of a SARS-CoV-2 infection when: (i) the level of RNA expression of RSAD2, SLC6A4, SHISA3, IFI6, IFI44L, FAM155B, SEMA7A, KIAA1614, EPSTI1, TECTA, CXCL9, SERPING1, HRASLS2, RGS1, IRF8, FAM71F2, C1QC, and/or SARS2 is elevated, and/or RNA expression of ATP5G1, AZGP1, WFDC6, SDHAF4, FCGBP, COX20, BPIFB1, TCN1, PROS1, SCGB1A1, and/or CST1 is reduced in the test sample in comparison with respective reference values; and (ii) the level of RNA expression of ZC4H2, HEPACAM2, HMX1, TMEM229A, PLD4, PFKFB4, POSTN, BORA, NUP35, DHFR, AMBP, ADRA2A, ZNF92, CYP2F1, PIFO, SNTN, ZNF469, ADH1C, FAM3D, SERTAD2, HSPA14, ILVBL, PERP, UBE2I, and/or EHF is elevated, and/or RNA expression of MLKL, PTMS, MTRNR2L6, ADGRL1, PLEKHA4, CD300E, ECSCR, SLC16A8, BATF3, TRPV5, GUK1, and/or TUBG2 is reduced in the test sample in comparison with respective reference values.
In some embodiments, identification of the gene expression profile is indicative of the human subject having a SARS-CoV-2 infection. In some embodiments, the at least one gene of (a)(i) comprises a plurality of 3, 4, 5, 6, 7, 9, 10, or all 29 genes of the group consisting of RSAD2, SLC6A4, SHISA3, IFI6, IFI44L, FAM155B, SEMA7A, KIAA1614, EPSTI1, TECTA, CXCL9, SERPING1, HRASLS2, RGS1, IRF8, FAM71F2, C1QC, SARS2, ATP5G1, AZGP1, WFDC6, SDHAF4, FCGBP, COX20, BPIFB1, TCN1, PROS1, SCGB1A1, and CST1; and/or wherein the at least one gene of (a)(ii) comprises a plurality of 3, 4, 5, 6, 7, 8, 9, 10 or all 37 genes of the group consisting of ZC4H2, HEPACAM2, HMX1, TMEM229A, PLD4, PFKFB4, POSTN, BORA, NUP35, DHFR, AMBP, ADRA2A, ZNF92, CYP2F1, PIFO, SNTN, ZNF469, ADH1C, FAM3D, SERTAD2, HSPA14, ILVBL, PERP, UBE2I, EHF, MLKL, PTMS, MTRNR2L6, ADGRL1, PLEKHA4, CD300E, ECSCR, SLC16A8, BATF3, TRPV5, GUK1, and TUBG2.
[0027] In a further aspect, the present disclosure provides methods for identifying whether a human subject has a SARS-CoV-2 infection or a non-viral acute respiratory illness, comprising the steps of (a) measuring levels of RNA expression of a plurality of genes of cells from a nasopharyngeal test sample obtained from a human subject suspected of having an acute respiratory illness, wherein the plurality of genes comprises three or more genes selected from the group consisting of RSAD2, IFI6, IFI44L, EPSTI1, SERPING1, ATP5G1, COX20, and TCN1; and (b) identifying the human subject as having a SARS-CoV-2 infection when the level of RNA expression of RSAD2, IFI6, IFI44L, EPSTI1, and/or SERPING1 is elevated, and/or RNA expression of ATP5G1, COX20, and/or TCN1 is reduced in the test sample in comparison with respective reference values; and/or identifying the human subject as having a non-viral acute respiratory illness when the level of RNA expression of RSAD2, IFI6, IFI44L, EPSTI1, and/or SERPING1 is reduced, and/or RNA expression of ATP5G1, COX20, and/or TCN1 is elevated in the test sample in comparison with respective reference values.
[0028] In a still further aspect, the present disclosure provides methods for identifying whether a subject has a SARS-CoV-2 infection or another viral acute respiratory illness, comprising the steps of (a) measuring levels of RNA expression of a plurality of genes of cells from a nasopharyngeal test sample obtained from a human subject suspected of having an acute respiratory illness, wherein the plurality of genes comprises three or more genes selected from the group consisting of HEPACAM2, TMEM229A, PLD4, PFKFB4, ADRA2A, PIFO, SERTAD2, HSPA14, MTRNR2L6, SLC16A8, and BATF3; and (b) identifying the human subject as having a SARS-CoV-2 infection when the level of RNA expression of HEPACAM2, TMEM229A, PLD4, PFKFB4, ADRA2A, PIFO, SERTAD2, and/or HSPA14 is elevated, and/or RNA expression of MTRNR2L6, SLC16A8, and/or BATF3 is reduced in the test sample in comparison with respective reference values; and/or identifying the human subject as having another viral acute respiratory illness when the level of RNA expression of HEPACAM2, TMEM229A, PLD4, PFKFB4, ADRA2A, PIFO, SERTAD2, and/or HSPA14 is reduced, and/or RNA expression of MTRNR2L6, SLC16A8, and/or BATF3 is elevated in the test sample in comparison with respective reference values. In some embodiments, the other viral acute respiratory illness is associated with an infection with a virus selected from the group consisting of an influenza virus, a seasonal coronavirus, a rhinovirus, a metapneumovirus, and a parainfluenza virus.
[0029] In an additional aspect, the present disclosure provides methods for measuring gene expression, comprising the steps of: (a) measuring levels of RNA expression of a plurality of genes of cells from a nasopharyngeal test sample obtained from a human subject having or suspected of having a SARS-CoV-2 infection, wherein the plurality of genes comprises three or more genes selected from the group consisting of CXCL8, PHACTR1, GRIN2C, CXCL2, G0S2, PLAUR, CXCR4, KRT6A, FBXL7, CARD16, ZNF267, GPR65, PPIF, CSF1, LCP2, LPCAT1, SOD2, FCER1G, CD93, ZNF438, C5AR1, FTH1, IER3, CREM, NINJ1, CSGALNACT2, AGPS, IVNS1ABP, CDC42EP3, GK, ZEB2, HSPA1A, CXCL1, GPX3, IFNGR1, POLDIP2, DNAJB1, PI3, NEDD9, VEGFA, IL1R1, ATG2A, DOCK4, THBS1, ZFYVE16, SAT1, PNPLA8, H3F3B, IL6R, RAB20, HSPA1B, MIR22HG, PPAP2B, TBC1D2, SRPK1, FGD4, RAB21, and CPD; and (b) identifying the test sample as having a severe COVID-19 gene expression profile when the level of RNA expression of the plurality of genes is elevated in the test sample in comparison with respective reference values; and/or identifying the test sample as having a mild COVID-19 gene expression profile when the level of RNA expression of the plurality of genes is not elevated in the test sample in comparison with the respective reference values. In some embodiments, the identification of the severe COVID-19 gene expression profile is indicative of the human subject having a SARS-CoV-2 infection, and having or developing severe COVID-19, optionally wherein severe COVID-19 is associated with hospitalization, optionally wherein hospitalization comprises a stay within a hospital intensive care unit and/or mechanical ventilation.
In some embodiments, the plurality of genes comprises 4, 5, 6, 7, 8, 9, 10, or all 58 genes of the group consisting of CXCL8, PHACTR1, GRIN2C, CXCL2, G0S2, PLAUR, CXCR4, KRT6A, FBXL7, CARD16, ZNF267, GPR65, PPIF, CSF1, LCP2, LPCAT1, SOD2, FCER1G, CD93, ZNF438, C5AR1, FTH1, IER3, CREM, NINJ1, CSGALNACT2, AGPS, IVNS1ABP, CDC42EP3, GK, ZEB2, HSPA1A, CXCL1, GPX3, IFNGR1, POLDIP2, DNAJB1, PI3, NEDD9, VEGFA, IL1R1, ATG2A, DOCK4, THBS1, ZFYVE16, SAT1, PNPLA8, H3F3B, IL6R, RAB20, HSPA1B, MIR22HG, PPAP2B, TBC1D2, SRPK1, FGD4, RAB21, and CPD.
[0030] In some embodiments, the methods further include: extracting RNA from the cells of the test sample prior to step (a). For example, in some embodiments, the method used to extract RNA may include, without limitation, Zymo Direct-zol™, TRIzol® (reagents for isolating biological material marketed by Molecular Research Center, Inc.), phenol/chloroform, etc. RNA extraction may also include treating the RNA with DNAse to remove DNA contamination, which may occur during the extraction process (e.g., in an RNA extraction kit including an on-column DNAse step) or after the extraction process (e.g., DNAse treatment of extracted RNA).
Subsequent to extraction, RNA concentration may be measured using a method such as Qubit fluorometric quantitation.
A. Next Generation Sequencing Methods
[0031] Provided herein are methods involving measuring levels of RNA expression of a plurality of genes. In some embodiments, RNA expression is measured using a next-generation sequencing method.
[0032] In sequencing by synthesis, single-stranded DNA is sequenced using DNA polymerase to create a complementary second strand one base at a time. Most next generation (high-throughput) sequencing methods use a sequencing by synthesis approach, which is often combined with optical detection. High-throughput methods are advantageous in that many thousand (e.g., 10^6-10^9) sequences may be determined in parallel. Various high-throughput sequencing methods that may be used to measure gene expression in connection with the present disclosure are briefly described below.
[0033] Illumina (Solexa) sequencing is a high-throughput method that uses reversible terminator bases for sequencing by synthesis (see, e.g., Bentley et al., Nature, 456:53-59, 2008; and Meyer and Kircher, "Illumina Sequencing Library Preparation for Highly Multiplexed Target Capture and Sequencing". Cold Spring Harbor Protocols 2010: doi: 10.1101/pdb.prot5448). First, DNA molecules are attached to a slide and amplified to generate local clusters of the same DNA sequence. Then, four types of fluorescently labeled nucleotides with reversible 3’ blockers (reversible terminator bases or RT-bases) are added to the chip, the excess is washed away, and the chip is imaged. After imaging, the dye and the 3’ blocker are removed from the nucleotide, and the next round of RT-bases is added to the chip and imaged.
[0034] Pyrosequencing is another type of sequencing by synthesis method that detects the release of pyrophosphate (PPi) during DNA synthesis (see, e.g., Ronaghi et al., Science, 281:363-365, 1998). In order to detect PPi, ATP sulfurylase, firefly luciferase, and luciferin are used, which together act to generate a visible light signal from PPi. Light is produced when a nucleotide has been incorporated into the complementary strand of DNA by DNA polymerase, and the intensity of the light emitted is used to determine how many nucleotides have been incorporated. Each of the four nucleotides is added in turn until the sequence is complete. High-throughput pyrosequencing, also known as 454 pyrosequencing (Roche Diagnostics), uses an initial step of emulsion PCR to generate oil droplets containing a cluster of single DNA sequences attached to a bead via primers. These droplets are then added to a plate with picoliter-volume wells such that each well contains a single bead as well as the enzymes needed for pyrosequencing.
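The relationship between light intensity and the number of incorporated nucleotides can be sketched in Python as follows. This is an illustrative simplification rather than any vendor's base-calling algorithm: the function name, the flow order "TACG", the per-nucleotide unit signal of 1.0, and the example intensities are assumptions chosen for the example.

```python
from typing import Sequence


def decode_flowgram(flow_order: str,
                    intensities: Sequence[float],
                    unit_signal: float = 1.0) -> str:
    """Convert per-flow light intensities into the synthesized strand.

    Nucleotides are flowed cyclically in `flow_order`. For each flow, the
    intensity divided by `unit_signal` (the assumed signal of a single
    incorporation) is rounded to the nearest whole number of bases added.
    """
    bases = []
    for i, intensity in enumerate(intensities):
        base = flow_order[i % len(flow_order)]
        count = max(round(intensity / unit_signal), 0)  # noise can dip below zero
        bases.append(base * count)
    return "".join(bases)


# Made-up signal for two cycles of T, A, C, G flows.
signal = [1.02, 0.05, 2.10, 0.00, 0.00, 0.97, 0.00, 3.05]
print(decode_flowgram("TACG", signal))  # prints "TCCAGGG"
```

The example also shows why homopolymer runs are the hard case for this chemistry: a run of three G bases is read from a single flow, so the call depends on distinguishing an intensity near 3.0 from one near 2.0 or 4.0.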
[0035] Ion semiconductor sequencing (Ion Torrent, now Life Technologies) is a further type of sequencing by synthesis method that uses the hydrogen ions released during DNA polymerization for sequencing (see, e.g., US Patent No. 7,948,015). First, a single strand of template DNA is placed into a microwell. Then, the microwell is flooded with one type of nucleotide. If the nucleotide is complementary, it is incorporated into the secondary strand, and a hydrogen ion is released. The release of the hydrogen ion triggers a hypersensitive ion sensor; if multiple nucleotides are incorporated, multiple hydrogen ions are released, and the resulting electronic signal is higher.
[0036] Sequencing by ligation (SOLiD sequencing marketed by Applied Biosystems) uses the mismatch sensitivity of DNA ligase in combination with a pool of fluorescently labeled oligonucleotides (probes) for sequencing (see, e.g., WO 2006084132). First, DNA molecules are amplified using emulsion PCR, which results in individual oil droplets containing one bead and a cluster of the same DNA sequence. Then, the beads are deposited on a glass slide. The probes are added to the slide along with a universal sequencing primer. If the probe is complementary, the DNA ligase joins it to the primer, fluorescence is measured, and then the fluorescent label is cleaved off. This leaves the 5’ end of the probe available for the next round of ligation.
[0037] Third-generation or long-read sequencing methods are high-throughput sequencing methods that sequence single molecules. These methods do not require initial PCR amplification steps. Single-molecule real-time sequencing (Pacific Biosciences) is a sequencing by synthesis long-read sequencing method, which employs zero-mode waveguides (ZMWs), which are small wells with capturing tools located at the bottom (see, e.g., Levene, Science, 299:682-686, 2003; and Eid et al., Science, 323:133-138, 2009). In brief, one DNA polymerase enzyme is attached to the bottom of a ZMW, and a single molecule of single-stranded DNA is present as a template. Four types of fluorescently-labelled nucleotides are present in a solution added to the ZMWs. When a nucleotide is incorporated into the second strand by the DNA polymerase in a ZMW, the fluorescence is detected by the capturing tools at the bottom of the ZMW. Then, the fluorescent label is cleaved off and diffuses away from the capturing tools at the bottom of the ZMW so it is no longer detectable and the remaining DNA strand in the ZMW is free of labels.
[0038] Nanopore sequencing (Oxford Nanopore) is a sequencing method that sequences a single DNA or RNA molecule without any form of label. The principle of nanopore sequencing is that DNA passing through a nanopore changes the ion current of the nanopore in a manner dependent on the type of nucleotide. The nanopore itself contains a detection region able to recognize different nucleotides. Current nanopore sequencing methods in development are either solid state methods employing metal or metal alloys (see, e.g., Soni et al., Rev Sci Instrum, 81(1): 014301, 2010) or biological methods employing proteins (see, e.g., Stoddart et al., Proc Natl Acad Sci USA, 106:7702-7707, 2009).
[0039] Further large-scale sequencing techniques for use in measuring gene expression in connection with methods of the present disclosure include but are not limited to microscopy-based techniques (e.g., using atomic force microscopy or transmission electron microscopy), tunneling currents DNA sequencing, sequencing by hybridization (e.g., using microarrays), sequencing with mass spectrometry (e.g., using matrix-assisted laser desorption ionization time-of-flight mass spectrometry, or MALDI-TOF MS), microfluidic Sanger sequencing, RNA polymerase (RNAP) sequencing (e.g., using polystyrene beads), and in vitro virus high-throughput sequencing.
[0040] Serial analysis of gene expression (SAGE) is a method that allows quantitative measurement of gene expression profiles that can be compared between samples (Velculescu et al., Science, 270: 484-7, 1995). First, cDNA is synthesized from an RNA sample. Then, through multiple steps involving bead binding, cleavage, and adapters, short cDNA fragments (tags) are produced. These tags are concatenated, amplified using bacteria, isolated, and finally sequenced using high-throughput sequencing techniques. SAGE can be used to measure gene expression changes of multiple genes at once, for example in response to infection.
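Once SAGE tags have been sequenced, expression is typically summarized by counting tags per gene. The following Python sketch shows one way to do this under stated assumptions: the tag-to-gene lookup table, the 10-base tag sequences, and the tags-per-million scaling are hypothetical and are not taken from the disclosure.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def tag_counts_per_million(tags: Iterable[str],
                           tag_to_gene: Mapping[str, str]) -> Dict[str, float]:
    """Tally SAGE tags per gene and scale to tags per million sequenced tags.

    Tags missing from `tag_to_gene` are not assigned to a gene, but they still
    count toward the total so that samples of different depth stay comparable.
    """
    tag_list = list(tags)
    total = len(tag_list)
    if total == 0:
        return {}
    counts = Counter(tag_to_gene[t] for t in tag_list if t in tag_to_gene)
    return {gene: n * 1_000_000 / total for gene, n in counts.items()}


# Hypothetical lookup table and tags, for illustration only.
lookup = {"AAACCCGGGT": "CXCL8", "TTTGGGCCCA": "SOD2"}
observed = ["AAACCCGGGT", "AAACCCGGGT", "TTTGGGCCCA", "ACGTACGTAC"]
print(tag_counts_per_million(observed, lookup))
```

Scaling to a common total is what allows the profiles to be compared between samples, as the paragraph above describes.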
B. Amplification Methods for Measuring Gene Expression
[0041] Methods that may be used to measure gene expression in connection with the present disclosure may include an amplification step. In some embodiments of the present disclosure, measuring RNA expression of a plurality of genes includes a quantitative polymerase chain reaction (qPCR). For instance, some methods include performing reverse transcriptase-quantitative polymerase chain reaction (RT-qPCR) on RNA extracted from the PBMCs. Quantitative reverse transcription polymerase chain reaction (qRT-PCR) is an amplification method that uses fluorescence to quantitatively measure gene expression (see, e.g., Heid et al., Genome Res 6:986-994, 1996). The first step of qRT-PCR is to produce complementary DNA (cDNA) by reverse transcribing mRNA. The cDNA is used as the template in the PCR reaction. In addition to the template, gene-specific primers, a buffer (and other reagents for stability), a DNA polymerase, nucleotides, and a fluorophore are added to the PCR reaction. The reaction is then placed in a thermocycler that is able to both cycle through the different temperatures required for the standard PCR steps (e.g., separating the two strands of DNA, primer binding, and DNA polymerization) and illuminate the reaction with light at a particular wavelength to excite the fluorophore. Over the course of the reaction, the level of fluorescence is detected, and this level is subsequently used to quantify the amount of gene expression.
[0042] The use of fluorescence in qRT-PCR can be done in two different ways. The first way uses a dye in the reaction mixture that fluoresces when it binds to double stranded DNA. The intensity of the fluorescence increases as the amount of double stranded DNA increases, but the dye is not specific for a particular sequence. The second way uses sequence-specific probes labeled with a fluorescent reporter. The intensity of the fluorescence increases as the amount of the particular sequence increases.
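One widely used way to turn qRT-PCR fluorescence data into a relative expression value is the comparative threshold-cycle (2^-ΔΔCt) calculation. The disclosure does not state which quantification scheme is used, so the Python sketch below is an assumption offered for illustration; it further assumes roughly 100% amplification efficiency and a stably expressed normalizer gene, and the function name and Ct values are hypothetical.

```python
def relative_expression(ct_target_test: float,
                        ct_normalizer_test: float,
                        ct_target_control: float,
                        ct_normalizer_control: float) -> float:
    """Fold change of a target gene in a test sample versus a control sample.

    Each Ct is the cycle at which fluorescence crosses the detection
    threshold; a lower Ct means more starting template. The target Ct is
    first normalized to the normalizer gene within each sample (delta Ct),
    then the test sample is compared with the control (delta-delta Ct).
    """
    delta_ct_test = ct_target_test - ct_normalizer_test
    delta_ct_control = ct_target_control - ct_normalizer_control
    delta_delta_ct = delta_ct_test - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


# Made-up Ct values: the target crosses threshold 3 cycles earlier in the
# test sample after normalization, giving an 8-fold increase.
print(relative_expression(22.0, 18.0, 25.0, 18.0))  # prints 8.0
```

A fold change computed this way could serve as the "level of RNA expression ... in comparison with respective reference values" for a given gene, with the control sample supplying the reference.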
C. Hybridization Methods for Measuring Gene Expression
[0043] Methods that may be used to measure gene expression in connection with the present disclosure may include a hybridization step. In some preferred embodiments, the methods include use of a DNA microarray. DNA microarrays employ a plurality of specific DNA sequences (e.g., probes, reporters, oligos) attached to a slide or chip. First, cDNA from a sample is labeled with a fluorophore, silver, or a chemiluminescent molecule. Then, the labeled sample is hybridized to the DNA microarray under specific conditions, and hybridization is subsequently detected and quantified. Other methods of measuring gene expression through hybridization include but are not limited to Northern blot analysis, and in situ hybridization.
III. Methods for Treating SARS-CoV-2 Infection
[0044] Certain aspects of the present disclosure relate to methods for treating a SARS-CoV-2-infected human subject, identified by use of any of the methods disclosed herein for measuring levels of RNA expression of a plurality of genes in cells from a nasopharyngeal test sample obtained from the human subject, by administering an effective amount of a COVID-19 therapeutic agent. In some embodiments, the COVID-19 therapeutic agent comprises an antiviral agent. In some embodiments, the antiviral agent comprises one or more of lopinavir, ritonavir, remdesivir, ribavirin, umifenovir, favipiravir, darunavir, and oseltamivir. In some embodiments, the antiviral agent comprises remdesivir. In some embodiments, the COVID-19 therapeutic agent comprises an immunotherapeutic agent. In some embodiments, the immunotherapeutic agent comprises one or more of an interferon, convalescent plasma, hyperimmune plasma, and an anti-SARS-CoV2 monoclonal antibody or SARS-CoV2-binding fragment thereof.
IV. Kits for Measuring Gene Expression and Diagnosis of SARS-CoV-2 Infection
[0045] Certain aspects of the present disclosure relate to kits for measuring gene expression and diagnosis of SARS-CoV-2 infection, and optionally prognosis of severe COVID-19. In some embodiments, the kits comprise a plurality of oligonucleotides and instructions for use thereof. In some embodiments, the plurality of oligonucleotides of the kit are attached to a slide or a chip. In some embodiments, the plurality of oligonucleotides of the kit each comprise a label for ease in detection. In some embodiments, the plurality of oligonucleotides comprise a pair of oligonucleotides for each of the plurality of genes.
[0046] In some preferred embodiments, the kits include (a) a plurality of oligonucleotides which hybridize to a plurality of genes comprising at least 3, 4, 5, 6, 7 or all 8 genes of the group consisting of RSAD2, IFI6, IFI44L, EPSTI1, SERPING1, ATP5G1, COX20, and TCN1; and (b) instructions for: (i) use of the oligonucleotides for measuring RNA expression of the plurality of genes; and (ii) identifying whether a human subject has a SARS-CoV-2 infection or a non-viral acute respiratory illness based on the levels of RNA expression of the plurality of genes in cells from a nasopharyngeal test sample obtained from the human subject, wherein the human subject is suspected of having an acute respiratory illness. In related embodiments, the plurality of genes comprises the genes of Table 1-8 or the genes of Table 1-6.
[0047] In some preferred embodiments, the kits include (a) a plurality of oligonucleotides which hybridize to a plurality of genes comprising at least 3, 4, 5, 6, 7, 8, 9, 10 or all 11 genes of the group consisting of HEPACAM2, TMEM229A, PLD4, PFKFB4, ADRA2A, PIFO, SERTAD2, HSPA14, MTRNR2L6, SLC16A8, and BATF3; and (b) instructions for: (i) use of the oligonucleotides for measuring RNA expression of the plurality of genes; and (ii) identifying whether a human subject has a SARS-CoV-2 infection or another viral acute respiratory illness based on the levels of RNA expression of the plurality of genes in cells from a nasopharyngeal test sample obtained from the human subject, wherein the human subject is suspected of having an acute respiratory illness. In related embodiments, the plurality of genes comprises the genes of Table 1-9 or the genes of Table 1-7.
[0048] In some preferred embodiments, the kits include (a) a plurality of oligonucleotides which hybridize to a plurality of genes comprising at least 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18 or all 19 genes of the group consisting of RSAD2, IFI6, IFI44L, EPSTI1, SERPING1, ATP5G1, COX20, TCN1, HEPACAM2, TMEM229A, PLD4, PFKFB4, ADRA2A, PIFO, SERTAD2, HSPA14, MTRNR2L6, SLC16A8, and BATF3; and (b) instructions for: (i) use of the oligonucleotides for measuring RNA expression of the plurality of genes; and (ii) identifying whether the test sample has a gene expression profile of a SARS-CoV-2 infection based on the levels of RNA expression of the plurality of genes in cells from a nasopharyngeal test sample obtained from a human subject suspected of having an acute respiratory illness, optionally wherein identification of the gene expression profile is indicative of the human subject having a SARS-CoV-2 infection. In related embodiments, the plurality of genes comprises the genes of Table 1-8 and Table 1-9 or the genes of Table 1-6 and Table 1-7.
[0049] In some preferred embodiments, the kits include (a) a plurality of oligonucleotides which hybridize to a plurality of genes comprising at least 3, 4, 5, 6, 7, 8, 9, 10, 11 or all 58 genes of the group consisting of CXCL8, PHACTR1, GRIN2C, CXCL2, G0S2, PLAUR, CXCR4, KRT6A, FBXL7, CARD16, ZNF267, GPR65, PPIF, CSF1, LCP2, LPCAT1, SOD2, FCER1G, CD93, ZNF438, C5AR1, FTH1, IER3, CREM, NINJ1, CSGALNACT2, AGPS, IVNS1ABP, CDC42EP3, GK, ZEB2, HSPA1A, CXCL1, GPX3, IFNGR1, POLDIP2, DNAJB1, PI3, NEDD9, VEGFA, IL1R1, ATG2A, DOCK4, THBS1, ZFYVE16, SAT1, PNPLA8, H3F3B, IL6R, RAB20, HSPA1B, MIR22HG, PPAP2B, TBC1D2, SRPK1, FGD4, RAB21, and CPD; and (b) instructions for: (i) use of the oligonucleotides for measuring RNA expression of the plurality of genes; and (ii) identifying whether the test sample has a severe COVID-19 gene expression profile based on the levels of RNA expression of the plurality of genes in cells from a nasopharyngeal test sample obtained from a human subject having or suspected of having a SARS-CoV-2 infection, optionally wherein identification of the severe COVID-19 gene expression profile is indicative of the human subject having a SARS-CoV-2 infection, and having or developing severe COVID-19, optionally wherein severe COVID-19 is associated with hospitalization, optionally wherein hospitalization comprises a stay within a hospital intensive care unit and/or mechanical ventilation. In related embodiments, the plurality of genes comprises the genes of Table 1-10.
ENUMERATED EMBODIMENTS
1. A method for measuring gene expression, comprising the steps of:
(a) measuring levels of RNA expression of a plurality of genes of cells from a nasopharyngeal test sample obtained from a human subject suspected of having an acute respiratory illness, wherein the plurality of genes comprises:
(i) at least one gene selected from the group consisting of RSAD2, IFI6, IFI44L, EPSTI1, SERPING1, ATP5G1, COX20, and TCN1; and
(ii) at least one gene selected from the group consisting of HEPACAM2, TMEM229A, PLD4, PFKFB4, ADRA2A, PIFO, SERTAD2, HSPA14, MTRNR2L6, SLC16A8, and BATF3; and
(b) identifying the test sample as having a gene expression profile of a SARS-CoV-2 infection when:
(i) the level of RNA expression of RSAD2, IFI6, IFI44L, EPSTI1, and/or SERPING1 is elevated, and/or RNA expression of ATP5G1, COX20, and/or TCN1 is reduced in the test sample in comparison with respective reference values; and
(ii) the level of RNA expression of HEPACAM2, TMEM229A, PLD4, PFKFB4, ADRA2A, PIFO, SERTAD2, and/or HSPA14 is elevated, and/or RNA expression of MTRNR2L6, SLC16A8, and/or BATF3 is reduced in the test sample in comparison with respective reference values, optionally wherein identification of the gene expression profile is indicative of the human subject having a SARS-CoV-2 infection.
2. The method of embodiment 1, wherein the at least one gene of (a)(i) comprises 3, 4, 5, 6, 7 or all 8 genes of the group consisting of RSAD2, IFI6, IFI44L, EPSTI1, SERPING1, ATP5G1, COX20, and TCN1; and/or wherein the at least one gene of (a)(ii) comprises 3, 4, 5, 6, 7, 8, 9, 10 or all 11 genes of the group consisting of HEPACAM2, TMEM229A, PLD4, PFKFB4, ADRA2A, PIFO, SERTAD2, HSPA14, MTRNR2L6, SLC16A8, and BATF3.
3. A method for measuring gene expression, comprising the steps of:
(a) measuring levels of RNA expression of a plurality of genes of cells from a nasopharyngeal test sample obtained from a human subject suspected of having an acute respiratory illness, wherein the plurality of genes comprises:
(i) at least one gene selected from the group consisting of RSAD2, SLC6A4, SHISA3, IFI6, IFI44L, FAM155B, SEMA7A, KIAA1614, EPSTI1, TECTA, CXCL9, SERPING1, HRASLS2, RGS1, IRF8, FAM71F2, C1QC, SARS2, ATP5G1, AZGP1, WFDC6, SDHAF4, FCGBP, COX20, BPIFB1, TCN1, PROS1, SCGB1A1, and CST1; and
(ii) at least one gene selected from the group consisting of ZC4H2, HEPACAM2, HMX1, TMEM229A, PLD4, PFKFB4, POSTN, BORA, NUP35, DHFR, AMBP, ADRA2A, ZNF92, CYP2F1, PIFO, SNTN, ZNF469, ADH1C, FAM3D, SERTAD2, HSPA14, ILVBL, PERP, UBE2I, EHF, MLKL, PTMS, MTRNR2L6, ADGRL1, PLEKHA4, CD300E, ECSCR, SLC16A8, BATF3, TRPV5, GUK1, and TUBG2; and
(b) identifying the test sample as having a gene expression profile of a SARS-CoV-2 infection when:
(i) the level of RNA expression of RSAD2, SLC6A4, SHISA3, IFI6, IFI44L, FAM155B, SEMA7A, KIAA1614, EPSTI1, TECTA, CXCL9, SERPING1, HRASLS2, RGS1, IRF8, FAM71F2, C1QC, and/or SARS2 is elevated, and/or RNA expression of ATP5G1, AZGP1, WFDC6, SDHAF4, FCGBP, COX20, BPIFB1, TCN1, PROS1, SCGB1A1, and/or CST1 is reduced in the test sample in comparison with respective reference values; and
(ii) the level of RNA expression of ZC4H2, HEPACAM2, HMX1, TMEM229A, PLD4, PFKFB4, POSTN, BORA, NUP35, DHFR, AMBP, ADRA2A, ZNF92, CYP2F1, PIFO, SNTN, ZNF469, ADH1C, FAM3D, SERTAD2, HSPA14, ILVBL, PERP, UBE2I, and/or EHF is elevated, and/or RNA expression of MLKL, PTMS, MTRNR2L6, ADGRL1, PLEKHA4, CD300E, ECSCR, SLC16A8, BATF3, TRPV5, GUK1, and/or TUBG2 is reduced in the test sample in comparison with respective reference values, optionally wherein identification of the gene expression profile is indicative of the human subject having a SARS-CoV-2 infection.
4. The method of embodiment 3, wherein the at least one gene of (a)(i) comprises a plurality of 3, 4, 5, 6, 7, 9, 10, or all 29 genes of the group consisting of RSAD2, SLC6A4, SHISA3, IFI6, IFI44L, FAM155B, SEMA7A, KIAA1614, EPSTI1, TECTA, CXCL9, SERPING1, HRASLS2, RGS1, IRF8, FAM71F2, C1QC, SARS2, ATP5G1, AZGP1, WFDC6, SDHAF4, FCGBP, COX20, BPIFB1, TCN1, PROS1, SCGB1A1, and CST1; and/or wherein the at least one gene of (a)(ii) comprises a plurality of 3, 4, 5, 6, 7, 8, 9, 10 or all 37 genes of the group consisting of ZC4H2, HEPACAM2, HMX1, TMEM229A, PLD4, PFKFB4, POSTN, BORA, NUP35, DHFR, AMBP, ADRA2A, ZNF92, CYP2F1, PIFO, SNTN, ZNF469, ADH1C, FAM3D, SERTAD2, HSPA14, ILVBL, PERP, UBE2I, EHF, MLKL, PTMS, MTRNR2L6, ADGRL1, PLEKHA4, CD300E, ECSCR, SLC16A8, BATF3, TRPV5, GUK1, and TUBG2.
5. A method for identifying whether a human subject has a SARS-CoV-2 infection or a non-viral acute respiratory illness, comprising the steps of:
(a) measuring levels of RNA expression of a plurality of genes of cells from a nasopharyngeal test sample obtained from a human subject suspected of having an acute respiratory illness, wherein the plurality of genes comprises three or more genes selected from the group consisting of RSAD2, IFI6, IFI44L, EPSTI1, SERPING1, ATP5G1, COX20, and TCN1; and
(b) identifying the human subject as having a SARS-CoV-2 infection when the level of RNA expression of RSAD2, IFI6, IFI44L, EPSTI1, and/or SERPING1 is elevated, and/or RNA expression of ATP5G1, COX20, and/or TCN1 is reduced in the test sample in comparison with respective reference values; and/or identifying the human subject as having a non-viral acute respiratory illness when the level of RNA expression of RSAD2, IFI6, IFI44L, EPSTI1, and/or SERPING1 is reduced, and/or RNA expression of ATP5G1, COX20, and/or TCN1 is elevated in the test sample in comparison with the respective reference values.
6. A method for identifying whether a human subject has a SARS-CoV-2 infection or another viral acute respiratory illness, comprising the steps of:
(a) measuring levels of RNA expression of a plurality of genes of cells from a nasopharyngeal test sample obtained from a human subject suspected of having an acute respiratory illness, wherein the plurality of genes comprises three or more genes selected from the group consisting of HEPACAM2, TMEM229A, PLD4, PFKFB4, ADRA2A, PIFO, SERTAD2, HSPA14, MTRNR2L6, SLC16A8, and BATF3; and
(b) identifying the human subject as having a SARS-CoV-2 infection when the level of RNA expression of HEPACAM2, TMEM229A, PLD4, PFKFB4, ADRA2A, PIFO, SERTAD2, and/or HSPA14 is elevated, and/or RNA expression of MTRNR2L6, SLC16A8, and/or BATF3 is reduced in the test sample in comparison with respective reference values; and/or identifying the human subject as having another viral acute respiratory illness when the level of RNA expression of HEPACAM2, TMEM229A, PLD4, PFKFB4, ADRA2A, PIFO, SERTAD2, and/or HSPA14 is reduced, and/or RNA expression of MTRNR2L6, SLC16A8, and/or BATF3 is elevated in the test sample in comparison with the respective reference values.
7. The method of embodiment 6, wherein the other viral acute respiratory illness is associated with an infection with a virus selected from the group consisting of an influenza virus, a seasonal coronavirus, a rhinovirus, a metapneumovirus, and a parainfluenza virus.
8. The method of any one of embodiments 1-7, wherein the human subject has symptoms of an acute respiratory illness when the nasopharyngeal test sample was obtained.
9. The method of any one of embodiments 1-7, wherein the human subject does not have symptoms of an acute respiratory illness when the nasopharyngeal test sample was obtained.
10. The method of any one of embodiments 1-9, wherein the respective reference values are determined from a nasopharyngeal control sample from a healthy human subject without symptoms of a respiratory illness, optionally wherein the respective reference values are average values determined from a plurality of nasopharyngeal control samples obtained from a plurality of healthy human subjects, optionally wherein the healthy human subject or subjects do not have an acute SARS-CoV-2 infection.
11. A method for measuring gene expression, comprising the steps of:
(a) measuring levels of RNA expression of a plurality of genes of cells from a nasopharyngeal test sample obtained from a human subject having or suspected of having a SARS-CoV-2 infection, wherein the plurality of genes comprises three or more genes selected from the group consisting of CXCL8, PHACTR1, GRIN2C, CXCL2, G0S2, PLAUR, CXCR4, KRT6A, FBXL7, CARD16, ZNF267, GPR65, PPIF, CSF1, LCP2, LPCAT1, SOD2, FCER1G, CD93, ZNF438, C5AR1, FTH1, IER3, CREM, NINJ1, CSGALNACT2, AGPS, IVNS1ABP, CDC42EP3, GK, ZEB2, HSPA1A, CXCL1, GPX3, IFNGR1, POLDIP2, DNAJB1, PI3, NEDD9, VEGFA, IL1R1, ATG2A, DOCK4, THBS1, ZFYVE16, SAT1, PNPLA8, H3F3B, IL6R, RAB20, HSPA1B, MIR22HG, PPAP2B, TBC1D2, SRPK1, FGD4, RAB21, and CPD; and
(b) identifying the test sample as having a severe COVID-19 gene expression profile when the level of RNA expression of the plurality of genes is elevated in the test sample in comparison with respective reference values; and/or identifying the test sample as having a mild COVID-19 gene expression profile when the level of RNA expression of the plurality of genes is not elevated in the test sample in comparison with the respective reference values, optionally wherein identification of the severe COVID-19 infection gene expression profile is indicative of the human subject having a SARS-CoV-2 infection and having or developing severe COVID-19, optionally wherein severe COVID-19 is associated with hospitalization, optionally wherein hospitalization comprises a stay within a hospital intensive care unit and/or mechanical ventilation.
12. The method of embodiment 11, wherein the plurality of genes comprises 4, 5, 6, 7, 8, 9, 10, or all 58 genes of the group consisting of CXCL8, PHACTR1, GRIN2C, CXCL2, G0S2, PLAUR, CXCR4, KRT6A, FBXL7, CARD16, ZNF267, GPR65, PPIF, CSF1, LCP2, LPCAT1, SOD2, FCER1G, CD93, ZNF438, C5AR1, FTH1, IER3, CREM, NINJ1, CSGALNACT2, AGPS, IVNS1ABP, CDC42EP3, GK, ZEB2, HSPA1A, CXCL1, GPX3, IFNGR1, POLDIP2, DNAJB1, PI3, NEDD9, VEGFA, IL1R1, ATG2A, DOCK4, THBS1, ZFYVE16, SAT1, PNPLA8, H3F3B, IL6R, RAB20, HSPA1B, MIR22HG, PPAP2B, TBC1D2, SRPK1, FGD4, RAB21, and CPD.
13. The method of any one of embodiments 1-12, wherein the human subject has been exposed to a SARS-CoV-2-infected individual within 1-2 weeks of the nasopharyngeal test sample being obtained.
14. The method of any one of embodiments 1-13, further comprising: obtaining the test sample prior to step (a).
15. The method of any one of embodiments 1-14, further comprising: extracting RNA from the test sample prior to step (a).
16. The method of any one of embodiments 1-15, further comprising performing or having performed a SARS-CoV-2 reverse transcriptase-polymerase chain reaction (RT-PCR) test on the nasopharyngeal test sample.
17. The method of any one of embodiments 1-16, further comprising performing or having performed a SARS-CoV-2 antibody test on a blood sample obtained from the human subject.
18. The method of any one of embodiments 1-17, further comprising step (c) treating the SARS-CoV-2-infected subject identified in step (b) by administering an effective amount of a COVID-19 therapeutic agent.
19. The method of embodiment 18, wherein the COVID-19 therapeutic agent comprises one or both of an antiviral agent and an immunotherapeutic agent.
20. The method of embodiment 19, wherein the COVID-19 therapeutic agent comprises an antiviral agent.
21. The method of embodiment 20, wherein the antiviral agent comprises:
(i) one or more of the group consisting of lopinavir, ritonavir, remdesivir, ribavirin, umifenovir, favipiravir, darunavir, and oseltamivir;
(ii) remdesivir;
(iii) molnupiravir; and/or
(iv) paxlovid (PF-07321332 and ritonavir), optionally wherein the antiviral agent comprises remdesivir, optionally wherein the antiviral agent comprises molnupiravir, optionally wherein the antiviral agent comprises paxlovid (PF-07321332 and ritonavir).
22. The method of embodiment 19, wherein the COVID-19 therapeutic agent comprises an immunotherapeutic agent.
23. The method of embodiment 22, wherein the immunotherapeutic agent comprises one or more of the group consisting of an interferon, convalescent plasma, hyperimmune plasma, and an anti-SARS-CoV2 monoclonal antibody or SARS-CoV2-binding fragment thereof, optionally wherein the anti-SARS-CoV-2 monoclonal antibody comprises:
(i) sotrovimab;
(ii) bamlanivimab and etesevimab; and/or
(iii) casirivimab and imdevimab.
24. The method of any one of embodiments 1-23, wherein step (a) comprises one or more of the group consisting of sequence analysis, hybridization, and amplification.
25. The method of any one of embodiments 1-23, wherein step (a) comprises: performing reverse transcriptase-quantitative polymerase chain reaction (RT-qPCR) on RNA extracted from the test sample.
26. The method of any one of embodiments 1-23, wherein step (a) comprises: hybridizing RNA extracted from the test sample to a microarray.
27. The method of any one of embodiments 1-23, wherein step (a) comprises: performing serial analysis of gene expression (SAGE) on RNA extracted from the test sample.
28. The method of any one of embodiments 1-23, wherein step (a) comprises targeted RNA expression resequencing comprising:
(i) preparing an RNA expression library for the plurality of genes from RNA extracted from the test sample;
(ii) sequencing a portion of at least 50,000 members of the library;
(iii) generating a read count for RNA expression of the plurality of genes by normalization to the sequence of the at least 50,000 members of step (ii).
29. The method of any one of embodiments 1-23, wherein step (a) comprises whole transcriptome shotgun sequencing (WTSS) comprising:
(i) preparing an RNA expression library for the plurality of genes from RNA extracted from the test sample;
(ii) sequencing a portion of at least 50,000 members of the library;
(iii) generating a read count for RNA expression of the plurality of genes by normalization to the sequence of the at least 50,000 members of step (ii).
30. A kit comprising:
(a) a plurality of oligonucleotides which hybridize to a plurality of genes comprising at least 3, 4, 5, 6, 7 or all 8 genes of the group consisting of RSAD2, IFI6, IFI44L, EPSTI1, SERPING1, ATP5G1, COX20, and TCN1; and
(b) instructions for: (i) use of the oligonucleotides for measuring RNA expression of the plurality of genes; and (ii) identifying whether a human subject has a SARS-CoV-2 infection or a non-viral acute respiratory illness based on the levels of RNA expression of the plurality of genes in cells from a nasopharyngeal test sample obtained from the human subject, wherein the human subject is suspected of having an acute respiratory illness.
31. The kit of embodiment 30, wherein the plurality of genes further comprises one or more genes selected from the group consisting of SLC6A4, SHISA3, FAM155B, SEMA7A, KIAA1614, TECTA, CXCL9, HRASLS2, RGS1, IRF8, FAM71F2, C1QC, SARS2, AZGP1, WFDC6, SDHAF4, FCGBP, BPIFB1, PROS1, SCGB1A1, CST1, and combinations thereof.
32. The kit of embodiment 30, wherein the plurality of genes consists of RSAD2, IFI6, IFI44L, EPSTI1, SERPING1, ATP5G1, COX20, and TCN1.
33. The kit of embodiment 31, wherein the plurality of genes consists of RSAD2, SLC6A4, SHISA3, IFI6, IFI44L, FAM155B, SEMA7A, KIAA1614, EPSTI1, TECTA, CXCL9, SERPING1, HRASLS2, RGS1, IRF8, FAM71F2, C1QC, SARS2, ATP5G1, AZGP1, WFDC6, SDHAF4, FCGBP, COX20, BPIFB1, TCN1, PROS1, SCGB1A1, and CST1.
34. A kit comprising:
(a) a plurality of oligonucleotides which hybridize to a plurality of genes comprising at least 3, 4, 5, 6, 7, 8, 9, 10 or all 11 genes of the group consisting of HEPACAM2, TMEM229A, PLD4, PFKFB4, ADRA2A, PIFO, SERTAD2, HSPA14, MTRNR2L6, SLC16A8, and BATF3; and
(b) instructions for: (i) use of the oligonucleotides for measuring RNA expression of the plurality of genes; and (ii) identifying whether a human subject has a SARS-CoV-2 infection or another viral acute respiratory illness based on the levels of RNA expression of the plurality of genes in cells from a nasopharyngeal test sample obtained from the human subject, wherein the human subject is suspected of having an acute respiratory illness.
35. The kit of embodiment 34, wherein the plurality of genes further comprises one or more genes selected from the group consisting of ZC4H2, HMX1, POSTN, BORA, NUP35, DHFR, AMBP, ZNF92, CYP2F1, SNTN, ZNF469, ADH1C, FAM3D, ILVBL, PERP, UBE2I, EHF, MLKL, PTMS, ADGRL1, PLEKHA4, CD300E, ECSCR, TRPV5, GUK1, and TUBG2, and combinations thereof.
36. The kit of embodiment 34, wherein the plurality of genes consists of HEPACAM2, TMEM229A, PLD4, PFKFB4, ADRA2A, PIFO, SERTAD2, HSPA14, MTRNR2L6, SLC16A8, and BATF3.
37. The kit of embodiment 35, wherein the plurality of genes consists of ZC4H2, HEPACAM2, HMX1, TMEM229A, PLD4, PFKFB4, POSTN, BORA, NUP35, DHFR, AMBP, ADRA2A, ZNF92, CYP2F1, PIFO, SNTN, ZNF469, ADH1C, FAM3D, SERTAD2, HSPA14, ILVBL, PERP, UBE2I, EHF, MLKL, PTMS, MTRNR2L6, ADGRL1, PLEKHA4, CD300E, ECSCR, SLC16A8, BATF3, TRPV5, GUK1, and TUBG2.
38. A kit comprising:
(a) a plurality of oligonucleotides which hybridize to a plurality of genes comprising at least 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18 or all 19 genes of the group consisting of RSAD2, IFI6, IFI44L, EPSTI1, SERPING1, ATP5G1, COX20, TCN1, HEPACAM2, TMEM229A, PLD4, PFKFB4, ADRA2A, PIFO, SERTAD2, HSPA14, MTRNR2L6, SLC16A8, and BATF3; and
(b) instructions for: (i) use of the oligonucleotides for measuring RNA expression of the plurality of genes; and (ii) identifying whether the test sample has a gene expression profile of a SARS-CoV-2 infection based on the levels of RNA expression of the plurality of genes in cells from a nasopharyngeal test sample obtained from a human subject suspected of having an acute respiratory illness, optionally wherein identification of the gene expression profile is indicative of the human subject having a SARS-CoV-2 infection.
39. The kit of embodiment 38, wherein the plurality of genes further comprises one or more genes selected from the group consisting of SLC6A4, SHISA3, FAM155B, SEMA7A, KIAA1614, TECTA, CXCL9, HRASLS2, RGS1, IRF8, FAM71F2, C1QC, SARS2, AZGP1, WFDC6, SDHAF4, FCGBP, BPIFB1, PROS1, SCGB1A1, CST1, ZC4H2, HMX1, POSTN, BORA, NUP35, DHFR, AMBP, ZNF92, CYP2F1, SNTN, ZNF469, ADH1C, FAM3D, ILVBL, PERP, UBE2I, EHF, MLKL, PTMS, ADGRL1, PLEKHA4, CD300E, ECSCR, TRPV5, GUK1, and TUBG2, and combinations thereof.
40. The kit of embodiment 38, wherein the plurality of genes consists of RSAD2, IFI6, IFI44L, EPSTI1, SERPING1, ATP5G1, COX20, TCN1, HEPACAM2, TMEM229A, PLD4, PFKFB4, ADRA2A, PIFO, SERTAD2, HSPA14, MTRNR2L6, SLC16A8, and BATF3.
41. The kit of embodiment 39, wherein the plurality of genes consists of RSAD2, IFI6, IFI44L, EPSTI1, SERPING1, ATP5G1, COX20, TCN1, HEPACAM2, TMEM229A, PLD4, PFKFB4, ADRA2A, PIFO, SERTAD2, HSPA14, MTRNR2L6, SLC16A8, BATF3, SLC6A4, SHISA3, FAM155B, SEMA7A, KIAA1614, TECTA, CXCL9, HRASLS2, RGS1, IRF8, FAM71F2, C1QC, SARS2, AZGP1, WFDC6, SDHAF4, FCGBP, BPIFB1, PROS1, SCGB1A1, CST1, ZC4H2, HMX1, POSTN, BORA, NUP35, DHFR, AMBP, ZNF92, CYP2F1, SNTN, ZNF469, ADH1C, FAM3D, ILVBL, PERP, UBE2I, EHF, MLKL, PTMS, ADGRL1, PLEKHA4, CD300E, ECSCR, TRPV5, GUK1, and TUBG2.
42. A kit comprising:
(a) a plurality of oligonucleotides which hybridize to a plurality of genes comprising at least 3, 4, 5, 6, 7, 8, 9, 10, 11 or all 58 genes of the group consisting of CXCL8, PHACTR1, GRIN2C, CXCL2, G0S2, PLAUR, CXCR4, KRT6A, FBXL7, CARD16, ZNF267, GPR65, PPIF, CSF1, LCP2, LPCAT1, SOD2, FCER1G, CD93, ZNF438, C5AR1, FTH1, IER3, CREM, NINJ1, CSGALNACT2, AGPS, IVNS1ABP, CDC42EP3, GK, ZEB2, HSPA1A, CXCL1, GPX3, IFNGR1, POLDIP2, DNAJB1, PI3, NEDD9, VEGFA, IL1R1, ATG2A, DOCK4, THBS1, ZFYVE16, SAT1, PNPLA8, H3F3B, IL6R, RAB20, HSPA1B, MIR22HG, PPAP2B, TBC1D2, SRPK1, FGD4, RAB21, and CPD; and
(b) instructions for: (i) use of the oligonucleotides for measuring RNA expression of the plurality of genes; and (ii) identifying whether the test sample has a severe COVID-19 gene expression profile based on the levels of RNA expression of the plurality of genes in cells from a nasopharyngeal test sample obtained from a human subject having or suspected of having a SARS-CoV-2 infection, optionally wherein identification of the severe COVID-19 gene expression profile is indicative of the human subject having a severe SARS-CoV-2 infection and having or developing severe COVID-19, optionally wherein severe COVID-19 is associated with hospitalization, optionally wherein hospitalization comprises a stay within a hospital intensive care unit and/or mechanical ventilation.
43. The kit of embodiment 42, wherein the plurality of genes consists of CXCL8, PHACTR1, GRIN2C, CXCL2, G0S2, PLAUR, CXCR4, KRT6A, FBXL7, CARD16, ZNF267, GPR65, PPIF, CSF1, LCP2, LPCAT1, SOD2, FCER1G, CD93, ZNF438, C5AR1, FTH1, IER3, CREM, NINJ1, CSGALNACT2, AGPS, IVNS1ABP, CDC42EP3, GK, ZEB2, HSPA1A, CXCL1, GPX3, IFNGR1, POLDIP2, DNAJB1, PI3, NEDD9, VEGFA, IL1R1, ATG2A, DOCK4, THBS1, ZFYVE16, SAT1, PNPLA8, H3F3B, IL6R, RAB20, HSPA1B, MIR22HG, PPAP2B, TBC1D2, SRPK1, FGD4, RAB21, and CPD.
44. The method of any one of embodiments 1-29, wherein step (a) further comprises: iii) measuring levels of RNA expression of at least one control gene, wherein the at least one control gene is selected from the group consisting of PMM1, RAC1, RPP30, ACTB, and HSPD1; and iv) normalizing levels of RNA expression of the plurality of genes of (i) and (ii) to levels of RNA expression of the at least one control gene.
45. The kit of any one of embodiments 30-42, wherein the kit further comprises: at least one control oligonucleotide which hybridizes to at least one control gene, wherein the at least one control gene is selected from the group consisting of PMM1, RAC1, RPP30, ACTB, and HSPD1; and instructions for use of the at least one control oligonucleotide to normalize levels of RNA expression of the plurality of genes.
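The control-gene normalization recited in embodiments 44 and 45 can be illustrated with a minimal sketch. The embodiments do not specify a formula, so the sketch assumes that each target gene's expression level is divided by the geometric mean of the measured control genes; the function name and the example values are hypothetical and serve only as illustration.

```python
import math

# Control genes named in embodiments 44 and 45.
CONTROL_GENES = ("PMM1", "RAC1", "RPP30", "ACTB", "HSPD1")


def normalize_to_controls(expression, controls=CONTROL_GENES):
    """Divide each non-control gene by the geometric mean of the measured controls.

    `expression` maps gene symbol -> positive expression level.
    Assumption: ratio to the geometric mean; the source does not fix a formula.
    """
    present = [g for g in controls if g in expression]
    if not present:
        raise ValueError("at least one control gene must be measured")
    log_mean = sum(math.log(expression[g]) for g in present) / len(present)
    reference = math.exp(log_mean)
    return {g: v / reference for g, v in expression.items() if g not in controls}


# Hypothetical expression levels, for illustration only.
sample = {"RSAD2": 400.0, "IFI6": 100.0, "ACTB": 200.0, "RPP30": 50.0}
normalized = normalize_to_controls(sample)
```

A geometric mean is used here because it is less sensitive than an arithmetic mean to a single highly expressed control gene; other normalization schemes would fit the same embodiments.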
EXAMPLES
[0050] In the experimental disclosure which follows, the following abbreviations apply: ARI (acute respiratory illness); AUC (area under the curve); COVID-19 (Coronavirus Disease-19); CDPH (California Department of Public Health); DEG (differentially expressed gene); DGE (differential gene expression); ICU (intensive care unit); IFN (interferon); ISG (interferon stimulating gene); mNGS (metatranscriptomic next-generation sequencing); NP (nasopharyngeal); ROC (receiver operating characteristic); RT-PCR (real-time reverse-transcription polymerase chain reaction); SARS-CoV-2 (Severe Acute Respiratory Syndrome Coronavirus 2); UCSF (University of California, San Francisco); UTM (universal transport media); and WB (whole blood).
[0051] Although the present disclosure has been described in some detail by way of illustration and example for purposes of clarity and understanding, it will be apparent to those skilled in the art that certain changes and modifications may be practiced. Therefore, the following examples should not be construed as limiting the scope of the present disclosure, which is delineated by the appended claims.
EXAMPLE 1
Host Response RNA Profiles of Nasal Swabs and Blood from COVID-19 Patients Are Distinct and Allow For Identification of a Diagnostic Biosignature
[0052] The following example describes analyses of the host transcriptome of COVID-19 patients. RNA-Seq was used to characterize the host response to SARS-CoV-2 infection, and a diagnostic two-layer host response classifier was developed based on the host gene expression patterns to discriminate SARS-CoV-2 infection from other viral and non-viral acute respiratory illnesses.
Materials and Methods
[0053] Nasopharyngeal Swab Sample Collection. The study population consisted of patients with available remnant nasopharyngeal samples collected in universal transport media (UTM) or DNA/RNA Shield (Zymo Research) from the clinical laboratories at University of California, San Francisco (UCSF) (n=352). Samples were from patients who were positive or negative by SARS-CoV-2 real-time reverse-transcription polymerase chain reaction (RT-PCR) testing or were positive by respiratory virus panel PCR on nasopharyngeal swabs collected from September 20, 2014 to April 30, 2020 (FIG. 1A). Patients who tested negative by SARS-CoV-2 RT-PCR were selected randomly (n=100). In addition, ribonucleic acid (RNA) extracts from patients who had tested positive by SARS-CoV-2 RT-PCR (n=4), and UTM from patients with seasonal coronavirus or influenza were provided by the California Department of Public Health (CDPH; Richmond, CA) (n=20). Nasopharyngeal swabs from donor controls were obtained from asymptomatic volunteers at UCSF (n=11).
[0054] Whole Blood Sample Collection. Remnant whole blood from patients with COVID-19 was collected from the clinical laboratories at UCSF from March 8, 2020 to April 13, 2020 (n=7) (FIG. 1A). Remnant whole blood from patients with influenza (n=20) and sepsis (n=6) was collected from March 7, 2018 to November 15, 2018. Additional donor controls were obtained from volunteers at UCSF (n=20).
[0055] Nucleic Acid Extraction. All NP swab samples obtained at UCSF were pre-treated with a 1:1 ratio of DNA/RNA Shield (Zymo Research) prior to extraction. An input volume of 200 µl of NP swab sample was used for all extraction methods performed at UCSF and eluted in 100 µl. NP swab samples obtained from the CDPH were extracted using the easyMag instrument (bioMerieux) according to the manufacturer’s instructions with an input volume of 300 µl and elution volume of 110 µl. For NP swab samples collected at UCSF, 217 were extracted using the Mag-Bind Viral DNA/RNA 96 kit (Omega Bio-Tek) on the KingFisher Flex (Thermo Fisher Scientific), and 34 samples using the EZ1 Advanced XL (Qiagen), according to the manufacturer’s instructions.
[0056] All WB samples (300 µl) were pre-treated with a 2:1 ratio of DNA/RNA Shield (Zymo Research) and extracted using Direct-zol RNA Mini-Prep kit (Zymo Research) according to the manufacturer’s instructions. Samples were on-column DNase-treated with DNase-I (Zymo Research) and eluted in 30 µl. Extracted material was stored at -80°C.
[0057] Library Preparation and Sequencing. Extracted RNA from NP swab samples (25 µl) was treated with a nuclease cocktail of TURBO DNase (Thermo Fisher Scientific) and Baseline Zero DNase (Ambion) for 30 min at 37°C and purified using Ampure XP beads (Beckman-Coulter) on the EpMotion 5075 (Eppendorf). Purified RNA (7 µl) was used for library preparation using the SMART-Seq Stranded kit (Takara Bio) and purified using Ampure XP beads (Beckman-Coulter) on the EpMotion 5073 (Eppendorf). Libraries were quantified using the Qubit dsDNA HS Assay (Thermo Fisher Scientific) on the Qubit Flex (Thermo Fisher Scientific).
[0058] WB sample libraries were prepared using 9 µl of total RNA and TruSeq Total RNA with Ribo-Zero Globin (Illumina), and spiked with 1 µl of ERCC RNA Spike-In Mix (Thermo Fisher Scientific). Libraries were purified using Ampure XP beads (Beckman-Coulter) and quantified using the Qubit dsDNA HS Assay (Thermo Fisher Scientific) on the Qubit Flex (Thermo Fisher Scientific).
[0059] NP swab and WB sample libraries were sequenced on the NovaSeq 6000 (Illumina) using 150 bp paired-end sequencing at the UCSF Center for Advanced Technology (CAT). Included in each sequencing run were negative controls (nuclease-free water) to monitor for laboratory and reagent contamination and a Human Reference RNA Standard (Agilent) to monitor for sequencing efficiency.
[0060] Metatranscriptomic Analysis. Metatranscriptomic next-generation sequencing (mNGS) data from all samples were analyzed for viral nucleic acids using SURPI+ (v1.0.7-build.4), a bioinformatics pipeline for pathogen detection and discovery from metatranscriptomic data, modified to incorporate enhanced filtering and classification algorithms (41, 42). The SNAP nucleotide aligner was run using an edit distance of 16 against the National Center for Biotechnology Information (NCBI) nucleotide (NT) database filtered to contain the viral, bacterial, fungal, and parasitic reads of GenBank (March 2019, with inclusion of the SARS-CoV-2 Wuhan-Hu-1 genome accession number NC_045512), enabling the detection of reads with >90% identity to reference sequences in the database. The pre-established criterion for viral detection by SNAP was the presence of reads mapping to at least three non-overlapping regions of the viral genome (41). Diversity metrics, including the Chao Richness Score and Shannon Diversity Index, were calculated in R (version 4.0.0) (43) using the vegan package (version 2.5.3), and figures were produced using the ggplot2 package (44).
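The two diversity metrics named above can be sketched in a few lines. The analysis itself was run in R with the vegan package; the Python functions below are only an illustration of the underlying formulas, and they assume the natural-log Shannon index and the bias-corrected form of the Chao1 estimator. The function names and the example counts are hypothetical.

```python
import math


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def chao1_richness(counts):
    """Bias-corrected Chao1 estimate: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)).

    F1 and F2 are the numbers of taxa observed exactly once and exactly twice.
    """
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))


# Hypothetical read counts per viral taxon in one sample.
taxon_counts = [10, 10, 1, 1, 2, 0]
h = shannon_index(taxon_counts)
chao = chao1_richness(taxon_counts)
```

Richness counts how many taxa are present (with a correction for rare, possibly unseen taxa), whereas the Shannon index also weighs how evenly reads are spread across them, which is why the two metrics can move in different directions between patient groups.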
[0061] Transcriptome Analysis. Following sequencing of sample libraries, quality control was performed on the fastq files to ensure the sequencing reads met pre-established cutoffs for number (i.e., at least 5 million read counts per sample) and quality using FastQC (version 0.11.8) (45) and MultiQC (version 1.8) (46). Quality filtering and adapter trimming were performed using BBduk tools (version 38.76). Reads were aligned to the ENSEMBL GRCh38 human reference genome assembly (Release 33) using STAR (version 2.7.0f) (47). Remaining reads were then aligned to the ENSEMBL GRCh38 human reference genome assembly (Release 33) using STAR (version 2.6.1a) (47), and gene frequencies were counted using featureCounts (version 2.0.0) within the Subread package (48). Comparative analysis of DGEs was performed using a generalized linear model (GLM) implemented in the edgeR Bioconductor package (version 3.30.3) (49), using a Benjamini-Hochberg corrected p-value of <0.01.
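The Benjamini-Hochberg correction applied to the differential expression p-values can be illustrated with a short sketch. In the analysis described above the adjustment was handled within the R/edgeR workflow; the standalone Python function below is an assumption-labeled illustration of the procedure, and the raw p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for five genes.
raw = [0.001, 0.008, 0.039, 0.041, 0.6]
adj = benjamini_hochberg(raw)
# Apply the <0.01 cutoff stated in the text to the adjusted values.
significant = [p < 0.01 for p in adj]
```

With these inputs only the first gene passes the 0.01 cutoff after adjustment, even though two raw p-values are below 0.01, which shows how the correction tightens the threshold when many genes are tested at once.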
[0062] Hierarchical clustering of DEGs was performed in R (version 4.0.0) using the ComplexHeatmap and pheatmap packages (43), and figures were produced using the ggplot2 package (44). The top 100 DEGs with a Bonferroni corrected p value of <0.1 and an absolute log2 fold change (logFC) > 0.58, which corresponds to an approximately 1.5-fold change in expression, were included. Clustering was performed based on Euclidean distance with complete linkage, after exclusion of non-coding genes.
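Clustering by Euclidean distance with complete linkage can be sketched as follows. The analysis itself used R heatmap packages; this pure-Python version is a simplified illustration that stops at a requested number of clusters instead of building a full dendrogram. The function names, the gene labels, and the expression values are hypothetical.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def complete_linkage(profiles, n_clusters):
    """Agglomerative clustering; cluster distance is the largest member-to-member distance."""
    clusters = [[name] for name in profiles]
    while len(clusters) > n_clusters:
        def linkage(pair):
            left, right = pair
            return max(euclidean(profiles[a], profiles[b]) for a in left for b in right)

        # Merge the two clusters whose farthest members are closest together.
        left, right = min(combinations(clusters, 2), key=linkage)
        clusters.remove(left)
        clusters.remove(right)
        clusters.append(left + right)
    return sorted(sorted(c) for c in clusters)


# Hypothetical expression profiles (one value per sample group).
profiles = {
    "gene_a": [2.0, 2.1, 0.1],
    "gene_b": [1.9, 2.0, 0.0],
    "gene_c": [0.0, 0.1, 1.5],
    "gene_d": [0.1, 0.0, 1.6],
}
groups = complete_linkage(profiles, n_clusters=2)
```

Complete linkage tends to produce compact groups because two clusters merge only when even their most dissimilar members are close, which suits the identification of tightly co-regulated gene groups.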
[0063] Signaling pathway analyses and heatmaps were generated using Ingenuity Pathway Analysis (IPA) software (Qiagen) (50). The molecule activity predictor tool of IPA was used to predict gene upregulation or downregulation and pathway activation or inhibition. The enrichment score p-value was used to evaluate the significance of the overlap between predicted and observed genes, while the z-score was used to assess the match between observed and predicted upregulation or downregulation.
[0064] Classifiers were developed using scikit-learn (version 1.2.2) (51) in Python. Several different classifier models were evaluated in parallel and the one with optimal performance on the training data was selected. These candidate classifier models included a Linear Support Vector Machine, Linear Discriminant Analysis, and a Deep Neural Network, all within the scikit-learn package. Reduced, small gene panels were selected using Lasso (52) and a customized reverse search across the resulting feature set. This search iteratively removed the remaining gene with the lowest significance as measured by its Lasso coefficient, performed classifier training, and reported sensitivity, specificity, and accuracy across the training set. These results were then manually reviewed to balance each of them with a priority placed on specificity and number of genes. Receiver operating characteristic (ROC) curves were generated using the pROC package in R (53).
[0065] Statistical Analysis. To identify potentially important clinical predictors for COVID-19 score among RT-PCR positive patients, linear regression models were used to check the association of each clinical variable with the transformed COVID-19 score while controlling for demographics (age, gender, and race/ethnicity). A stepwise procedure was then used to determine what clinical variables would be selected when all of the variables were included in the model while controlling for demographics. Variables with a p-value less than 0.15 from those models were further examined for their association with transformed COVID-19 score in one model together while controlling for demographics. In this exploratory analysis, p-values were not adjusted for multiple comparisons, in order to avoid missing potentially important variables.
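The reverse search used to select reduced gene panels (paragraph [0064]) can be sketched as a loop that drops the gene with the smallest absolute Lasso coefficient, re-scores the panel, and records sensitivity, specificity, and accuracy at each step. This is a minimal sketch under stated assumptions, not the customized search itself: the Lasso coefficients are taken as precomputed inputs, classifier training is represented by a caller-supplied `fit_predict` function, and all names and toy values are hypothetical.

```python
def confusion_metrics(y_true, y_pred):
    """Sensitivity, specificity, and accuracy for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "accuracy": (tp + tn) / len(y_true),
    }


def reverse_search(coefficients, fit_predict, y_true, min_genes=1):
    """Score the panel, drop the gene with the smallest |coefficient|, repeat.

    coefficients: gene -> Lasso coefficient (assumed to be precomputed).
    fit_predict: callable taking a gene list and returning predicted labels
                 for the training samples (stands in for classifier training).
    """
    genes = sorted(coefficients, key=lambda g: abs(coefficients[g]), reverse=True)
    history = []
    while len(genes) >= max(min_genes, 1):
        history.append((list(genes), confusion_metrics(y_true, fit_predict(genes))))
        genes = genes[:-1]  # remove the least significant remaining gene
    return history


# Hypothetical toy data: one list of values per gene, one entry per sample.
data = {"g1": [3.0, 2.5, 0.2, 0.1], "g2": [1.0, 0.0, 1.0, 0.0], "g3": [0.5, 0.4, 0.6, 0.5]}
labels = [1, 1, 0, 0]
coefs = {"g1": 0.9, "g2": -0.2, "g3": 0.05}


def toy_fit_predict(genes):
    # Stand-in for a trained classifier: call a sample positive when the mean
    # of the selected genes exceeds 1.0 (an arbitrary illustrative rule).
    return [int(sum(data[g][i] for g in genes) / len(genes) > 1.0) for i in range(len(labels))]


history = reverse_search(coefs, toy_fit_predict, labels)
```

Each entry of `history` pairs a candidate panel with its training-set metrics, which is the kind of table that could then be reviewed manually to trade off specificity against panel size.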
[0066] Ct values were categorized as low (Ct<18), moderate (Ct >18 and <25), and high (Ct>25). The association of demographics and clinical variables with RT-PCR (positive versus negative), diagnosis (COVID-19, influenza or bacterial sepsis), viral load (low, medium, high) were examined by Fisher’s exact test (values <5) or chi-squared test (values>5) for categorical variables and two-sample t test or ANOVA for age, respectively. The association of demographics and clinical variables with Ct values were assessed with Wilcoxon rank sum test for variables with two categories or Kruskal-Wallis test for variables with more than two categories. The tetrachoric or polychoric correlation was estimated for the correlation between binary RT-PCR and binary or ordinal symptoms and outcome. The point-biserial correlation was estimated for the correlation between binary symptoms and continuous Ct values.
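The Ct binning described above can be expressed as a small helper. The text does not say which bin the boundary values 18 and 25 fall into, so this sketch assumes that both are treated as moderate; the function name is hypothetical.

```python
def categorize_ct(ct):
    """Bin a cycle threshold (Ct) value as low, moderate, or high.

    Assumption: the boundary values 18 and 25 are assigned to "moderate".
    """
    if ct < 18:
        return "low"
    if ct <= 25:
        return "moderate"
    return "high"


# Hypothetical Ct values for illustration.
categories = [categorize_ct(ct) for ct in (15.2, 18.0, 22.4, 31.0)]
```

Because Ct is inversely related to the amount of template, a low Ct category corresponds to a high viral load.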
[0067] For mNGS analysis, comparisons of virome or bacterial metatranscriptome abundance, richness, and alpha diversity between groups were analyzed using the Kruskal-Wallis test, followed by the Nemenyi test for post hoc analysis.
[0068] Comparisons of diagnosis and disease severity for cell types were conducted using the Kruskal-Wallis test, followed by Dunn’s test for post hoc analysis. All statistical tests were calculated as two-sided at the 0.05 significance level.
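The Kruskal-Wallis test used for the group comparisons above is rank based: all observations are pooled and ranked, and the statistic measures how far the per-group rank sums depart from what equal distributions would give. The sketch below computes only the H statistic, without tie correction and without the p-value (which would come from a chi-squared distribution with one fewer degrees of freedom than the number of groups). The function name and the example values are hypothetical.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction) for a list of samples."""
    pooled = sorted((value, gi) for gi, group in enumerate(groups) for value in group)
    n_total = len(pooled)
    rank_sums = [0.0] * len(groups)
    i = 0
    while i < n_total:
        j = i
        # Tied values share the average of the ranks they span.
        while j + 1 < n_total and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            rank_sums[pooled[k][1]] += average_rank
        i = j + 1
    spread = sum(r * r / len(g) for r, g in zip(rank_sums, groups))
    return 12 / (n_total * (n_total + 1)) * spread - 3 * (n_total + 1)


# Hypothetical richness scores for three fully separated groups.
h_stat = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```

A significant overall result only indicates that at least one group differs, which is why the text pairs the test with a post hoc procedure (Nemenyi or Dunn) to locate the differing pairs.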
Results
[0069] Population Characteristics and Sequence Metrics. A total of 380 remnant NP swab samples from 380 individuals (163 SARS-CoV-2 positive patients, 217 SARS-CoV-2 negative patients, including 88 with documented influenza or seasonal coronavirus infection, and 11 control donors) and 53 WB samples from 53 individuals (7 SARS-CoV-2 positive patients, 26 SARS-CoV-2 negative patients with influenza or bacterial sepsis, and 20 control donors) were collected for RNA-Seq analysis (FIG. 1A). Of the 380 NP samples, 286 remnant NP swab samples from 286 individuals (137 SARS-CoV-2 positive patients, 149 SARS-CoV-2 negative patients) and all of the WB samples were used to evaluate the host response. The remaining 94 NP swab samples from 94 patients were reserved for independent assessment of a two-layer classifier. Clinical history was available from 177 of 340 (52.1%) patients with NP swabs and all 33 of 33 (100%) patients with whole blood analyzed by RNA-Seq (Tables 1-1 and 1-2), as well as for control donors (Table 1-3). Among COVID-19 patients, there was a median of 5 ± 11 days (range 0-65 days) between symptom onset and NP sample collection, and a median of 9 ± 29 days (range 6-72 days) between symptom onset and whole blood sample collection. Six COVID-19 patients had paired NP swabs and WB available for comparison. As a surrogate indicator for disease severity, COVID-19 patients were also stratified according to the highest level of care received (i.e., outpatient, hospitalized but not requiring intensive care, and ICU admission). The median age for COVID-19 patients was 49 versus 44 years old for non-COVID-19 patients, with proportionally fewer women in the COVID-19 group (p=0.0021) (Table 1-1). COVID-19 patients were more likely to have fever (p<0.0001), chills (p=0.003), malaise (p=0.0009), and anosmia (p=0.0002) than non-COVID-19 patients with ARI (Table 1-1).
Hypertension and hyperlipidemia were significantly associated with COVID-19 patients (p=0.0406 and p=0.0128). The presence of fever (p=0.004) and cough (p=0.0008) appeared to correlate with high viral loads as indicated by low cycle threshold (Ct) values by PCR (<18). In contrast, viral loads in more severely ill hospitalized patients, including patients in the intensive care unit (ICU), were not significantly different from those in outpatients (p=0.72) (FIG. 1B).
Table 1-1. Patient Demographics and Characteristics of Nasopharyngeal Swab Samples
Figure imgf000036_0001
Figure imgf000037_0001
Figure imgf000038_0001
*Other immune compromised conditions include autoimmune diseases and solid organ transplants.
** Ambulatory care includes outpatient as well as patients seen in the emergency department and not admitted.
T2DM, type 2 diabetes mellitus; CKD, chronic kidney disease; CAD, coronary artery disease; CHF, congestive heart failure; COPD, chronic obstructive pulmonary disease; HIV, human immunodeficiency virus; ACE inhibitors, angiotensin-converting enzyme inhibitors; ICU, intensive care unit.
Table 1-2. Patient Demographics and Clinical Characteristics of Whole Blood Samples
Figure imgf000038_0002
Figure imgf000039_0001
Figure imgf000040_0001
^SARS-CoV-2 positive
^^Bacterial sepsis
*Other immune compromised conditions include autoimmune diseases and solid organ transplants.
** Ambulatory care includes outpatient as well as patients seen in ED and not admitted.
T2DM, type 2 diabetes mellitus; CKD, chronic kidney disease; CAD, coronary artery disease; CHF, congestive heart failure; COPD, chronic obstructive pulmonary disease; HIV, human immunodeficiency virus; ACE inhibitors, angiotensin-converting enzyme inhibitors; ICU, intensive care unit.
Table 1-3. Demographics of Nasopharyngeal Swab and Whole Blood Donor Controls
Figure imgf000041_0001
[0070] A total of 23.2 billion and 3.4 billion raw reads were sequenced from 380 NP swab and 53 WB samples, respectively. For the NP swab samples, the median transcriptome coverage achieved was 52.4% ± 17.8% (range 0.69-84.7%), generated from a median 30.3 ± 84.0 million reads (range 0.061 to 604 million reads) for each sample. Of these 380 NP swab samples, 286 were used to evaluate the host response and metatranscriptome, from 19 billion raw sequencing reads, with a median transcriptome coverage of 58.5% ± 15.1% (range 4.4-84.7%), generated from a median 28.8 ± 96.1 million reads (range 0.45 to 604 million reads). For the WB samples, the median coverage achieved was 37.5% ± 16.2% (range 20.8-89.2%), generated from a median 30.8 ± 41.7 million reads (range 16.5 to 182 million reads).
[0071] Viral Co-infections in SARS-CoV-2 Patients. Of 286 NP swab samples tested, 137 (47.9%) were SARS-CoV-2 positive, and 108 (37.8%) were negative for any respiratory virus (including 11 donor controls). A respiratory virus was identified by metatranscriptomic analysis in 41 cases (14.3%) including 27 patients with previously confirmed influenza or seasonal coronavirus infection by RT-PCR testing. These respiratory viruses included seasonal coronavirus, influenza virus, human rhinovirus, human parainfluenza virus, and human metapneumovirus. Co-infections were identified in 10 of 137 (7.3%) SARS-CoV-2 positive and 4 of 41 (9.76%) SARS-CoV-2 negative individuals; 2 of 137 SARS-CoV-2 positive (1.5%) and 2 of 41 SARS-CoV-2 negative (4.88%) individuals were infected by 3 viruses (Table 1-4). Triply-infected individuals had additional infections from human rhinovirus (multiple genotypes) and human metapneumovirus.
Table 1-4. Distribution of Co-Infections in SARS-CoV-2 Positive and Negative Individuals
Figure imgf000042_0001
[0072] Metatranscriptomic analysis of WB samples identified anelloviruses and human herpesvirus 6B in SARS-CoV-2 positive individuals (but no SARS-CoV-2 reads), and hepatitis B virus, human immunodeficiency virus, and anelloviruses in patients with influenza (Table 1-5). The absence of SARS-CoV-2 viremia is consistent with results from other published studies showing that viremia is rare in acutely infected individuals (12).
Table 1-5. Viruses Detected in Whole Blood
Figure imgf000043_0001
[0073] Impact of SARS-CoV-2 Infection on the Nasopharyngeal Metatranscriptome. The effect of SARS-CoV-2 infection on the nasopharyngeal viral and bacterial metatranscriptome was investigated. The viral metatranscriptome of SARS-CoV-2 positive individuals (COVID, n=137) was compared to SARS-CoV-2 negative individuals either with another respiratory virus (seasonal coronavirus, influenza, human rhinovirus, human metapneumovirus) detected by sequencing (“Other Virus”, n=41), or with no virus detected (“No Virus”, n=108) (FIGS. 1C-1E). Additional detected respiratory viruses included all four seasonal coronaviruses (229E, HKU1, NL63, and OC43), influenza virus, human rhinovirus, human parainfluenzavirus 2, and human metapneumovirus. Relative abundance (p<0.001) and richness (Chao Richness Score, p<0.001) were higher in SARS-CoV-2 patients and in patients infected with other respiratory viruses than in patients without respiratory viral infection. In comparison to patients with another respiratory virus, SARS-CoV-2 patients had no difference in abundance (p=0.26) and a decrease in richness (p=0.02) (FIGS. 1C-1D, “Including Respiratory Viral Reads”). There was no difference in diversity in any population (p=0.06) (FIG. 1E, “Including Respiratory Viral Reads”). If respiratory viral reads are excluded (FIGS. 1C-1E, “Excluding Respiratory Viral Reads”), patients with SARS-CoV-2 infection showed no difference in abundance (p=0.06) or diversity (p=0.08), but revealed an increase in richness (p<0.001) relative to individuals without a respiratory virus. In comparison to patients infected with another respiratory virus, patients with SARS-CoV-2 had decreased abundance (p=0.04) and diversity (p=0.008), but increased richness (p<0.001).
[0074] There was no difference in abundance, richness, or alpha diversity of the bacterial metatranscriptome in SARS-CoV-2 positive individuals compared to those without a virus or with another respiratory virus (FIGS. 1F-1H). Furthermore, infections from SARS-CoV-2 or other respiratory viruses did not appear to affect the overall distribution of families in the bacterial metatranscriptome. Based on the relative distribution of viral families found in the nasopharynx, patients with SARS-CoV-2 had an increase in the proportion of Siphoviridae (95%) compared to those infected with another respiratory virus (90%) or without a respiratory virus identified (86%). These findings are consistent with a study evaluating the microbiome using NP swabs in patients with SARS-CoV-2 infections (21).
[0075] Comparison of Cell Types and Proportions Between SARS-CoV-2 and Other Infections. Cell type and proportion analyses of NP swabs and WB using the MUSIC deconvolution algorithm (22) were performed. SARS-CoV-2 positive patients had increased ciliated epithelial cells relative to influenza (p=0.03) and seasonal coronavirus (p=0.02), increased neutrophils relative to non-viral ARIs (p<0.0001), and increased eosinophils relative to donor samples (p=0.008) and non-viral ARIs (p<0.0001). SARS-CoV-2 positive patients had decreased fibroblasts relative to influenza (p=0.008) and seasonal coronaviruses (p=0.01), and decreased macrophages relative to influenza (p=0.02) and other respiratory viruses (p=0.04). Endothelial cells and other cells (mast, myeloid, basal, plasma, and glandular epithelial cells) were also lower in SARS-CoV-2 relative to influenza (p=0.02), and other viruses (p=0.04). Influenza had increased fibroblasts (p<0.0001), macrophages (p<0.03), neutrophils (p<0.0001), but decreased ciliated epithelial cells (p=0.02), endothelial cells (p=0.03) and other cells (p=0.03) relative to non-viral ARIs. Seasonal coronaviruses had increased neutrophils (p<0.0001), fibroblasts (p<0.0001), and other cells (p=0.04), but decreased ciliated epithelial cells (p<0.0001) compared to non-viral ARIs. There was no difference in the proportion of cell types among different levels of severity of SARS-CoV-2 infection.
[0076] When looking at cell proportions in WB, there was an increase in basophils and smooth muscle cells in SARS-CoV-2 relative to influenza (p=0.007 and p=0.003, respectively), sepsis (p=0.008 and p=0.008, respectively), and donor controls (p=0.0002 and p=0.001, respectively). There were also increased bone progenitor cells (p=0.002) and platelets (p=0.004) in SARS-CoV-2 relative to influenza and decreased CD8+ T cells (p=0.004) and erythrocytes (p=0.004) relative to donor controls. Compared to sepsis, SARS-CoV-2 had decreased neutrophils (p=0.03) and increased platelets (p=0.002).
[0077] Nasopharyngeal Swab Transcriptome Analysis. Pathway analysis of DEGs in NP swabs from COVID-19 patients relative to uninfected donor controls showed prominent activation of genes related to interferon (IFN) signaling and interferon stimulating genes (ISGs) (including IFI6, IFIT1-3, and ISG15), but inhibition of IL-6 and IL-8 signaling genes (including IRAK1 and MAP2K7). Patterns of activation and inhibition associated with COVID-19 were markedly different from those associated with influenza or other viral infections. In particular, COVID-19 patients showed activation of pathways involved primarily in cell death and survival, and both activation and inhibition of pathways associated with organismal injury and survival and inflammatory response. Relative to donor controls, influenza and other viral respiratory infections shared IFN signaling activation pathways in common with COVID-19 (FIG. 2A). However, other immune response pathways that were activated by influenza and other viral infections, such as acute phase, B-cell receptor, and Toll-like receptor signaling (including genes IRAK1, MAPK12, MAP2K7), and chemokine signaling (including IL-6 and IL-8) were inhibited in COVID-19. Patients infected with SARS-CoV-2 or a seasonal coronavirus showed similar levels of activation of the glycoprotein VI (GP6) pathway, and inhibition of dendritic cell maturation and acute phase response signaling pathways (IRAK1 and MAPK12).
[0078] Hierarchical clustering of NP swab transcriptome DEGs in patients with SARS-CoV-2 infection relative to individuals without SARS-CoV-2 infection, including donor controls, revealed 3 distinct gene groups. Group A (n=35, including IFIT2, IFI6, and OAS2) was enriched in immune signaling genes and was upregulated in SARS-CoV-2 infections but not other viral and non-viral ARIs. Group B consisted mostly of genes related to cell metabolism, signaling, and transport, as well as many uncharacterized genes, (n=41, including SOX3, CLCN1, and CCL2) and was increased in viral infections other than SARS-CoV-2, particularly influenza and seasonal coronavirus, compared to non-viral ARIs. Group C (n=24, including COX15, FLI-1, and POLD1) was enriched in immune signaling, cell signaling, and cellular metabolism genes and was increased in viral infection, including from SARS-CoV-2.
[0079] Differential nasopharyngeal host responses in COVID-19 hospitalized patients versus outpatients. Hospitalized patients with COVID-19, including those requiring intensive care, had overlapping but heightened inflammatory responses compared to outpatients, with upregulation of DEGs implicated in innate antiviral immunity, such as TREM1 signaling and proinflammatory cytokines related to IL-6 and IL-8 signaling, including CXCL2, CXCL8, and IL6R relative to uninfected donor controls. There was also increased activation of pathways involved in hematological development and function, cellular movement, immune cell trafficking, inflammatory responses, and cell-to-cell signaling.
[0080] Hierarchical clustering of DEGs based on pairwise comparison between outpatients versus hospitalized patients with COVID-19 revealed 3 distinct groups. The groups consisted of genes related to cell signaling, cellular metabolism, immune signaling, and innate immunity (group L) (n=52, including IL1R1, IL6R, and CXCL2); cellular metabolism, immune signaling, and innate immunity (group M) (n=13, including CXCL1, CXCL8, and VEGFA); and cellular metabolism and transport (group N) (n=2, including SAT1 and FTH1). Genes from all three groups had increased overall expression in hospitalized patients relative to outpatients. Relative to donor controls, 26% (44/171) of DEGs were shared between outpatients and hospitalized patients (FIG. 2C), of which 21 of 44 (48%) were related to IFN signaling and innate immunity, including IFIT1, IFIT3, ISG15, EIF2AK2, and MAP2K7.
[0081] Whole Blood Transcriptome Analysis. Pathway analysis of WB from COVID-19 patients, all of whom were hospitalized, compared to patients with influenza or bacterial sepsis showed striking inhibition of genes in multiple pathways associated with immune cell signaling and antiviral IFN responses, particularly genes in the NF-kB and TREM1 signaling pathways (IL-1B, TLR1, TLR4, and TLR6), as well as natural killer cell signaling pathways (FCGR2A, FCGR3A, and FCGR3B). Upregulated pathways in COVID-19 were primarily related to cell signaling (ERK/MAPK and GP6 signaling), tissue development, cellular function and proliferation, and organismal injury, and included only a few immune pathways, such as PI3K signaling in B-lymphocytes, CXCR4 signaling, and IL-15 production. In contrast, bacterial sepsis was characterized by generalized upregulation of immune-mediated pathways as well as multiple additional pathways associated with hematological development and other cellular functions. Hierarchical clustering of DEGs among patients with COVID-19, influenza, or bacterial sepsis based on comparisons to donor controls revealed 6 distinct groups. Groups D (n=20) and E (n=36) were upregulated in COVID-19 and were primarily composed of genes related to cell death, cell metabolism, cell signaling, and multiple additional pathways, including DUSP8, CCR3, STX1A, and HBEGF. Groups H (n=13) and I (n=12) were upregulated in bacterial sepsis and were enriched in genes related to innate immunity, immune signaling, cell signaling, and cell metabolism, including TLR8, DDIT4, IFIT1, and MMP9. Influenza showed mild upregulation of all pathways.
[0082] Comparison of COVID-19 Host Responses in NP Swabs and WB. COVID-19 host responses in NP swabs and WB shared common pathways related to antiviral response, innate immunity, ISG signaling (e.g., IL-6 and IL-8), and dendritic cell maturation. However, the directionality of signaling was discordant between NP swabs and WB for multiple additional immune-related pathways, including acute phase response signaling (z-score of -1.30 for NP swabs versus 0.33 for WB), IL-15 signaling (z-score of 0 versus 1.89), CXCR4 signaling (z-score 0 versus 1.63), natural killer cell signaling (z-score 0 versus -1.63), Th1 pathway (z-score 0 versus -2.24), and B-cell receptor signaling (z-score 2.11 versus -0.5). Very few DEGs (<3%) were shared between NP swabs and WB from COVID-19 patients (FIG. 2B, FIGS. 5A-5B), suggesting that the host response was localized and body-site specific. In contrast, heightened IFN responses in both NP swabs and WB were observed for influenza, consistent with a systemic immune and inflammatory response. Notably, among the 16 DEGs shared between NP swabs and WB from influenza patients, the majority of those genes (11 of 16, 69%) were related to innate immunity and IFN signaling (FIG. 2D).
[0083] Classifier. As transcriptome analysis had revealed distinct patterns of gene expression in COVID-19 patients (FIG. 2A), it was hypothesized that a classifier could be constructed that accurately discriminates between SARS-CoV-2 infection and other viral or non-viral ARIs from NP swabs. After randomly partitioning 30% of samples into an independent test cohort, a two-layer classifier was developed that first differentiates between SARS-CoV-2 positive cases and SARS-CoV-2 negative cases for which no pathogen was identified (layer 1), followed by a second layer that differentiates SARS-CoV-2 from microbiologically confirmed viral acute respiratory illnesses, including influenza and seasonal coronavirus infections, among others (layer 2) (FIG. 3). The initial set of DEGs was selected using a Bonferroni-corrected p value of <0.001 for both layers. Only samples assigned to SARS-CoV-2 by both binary classifiers were designated positive for SARS-CoV-2 infection. The cutoff for the prediction score of each classifier was determined by generating receiver operating characteristic (ROC) curves for the training data, and comparing Youden’s index, an arbitrary 0.5 cutoff, and a manually selected threshold that prioritized specificity (“high-specificity threshold”). After review of the training set results, the high-specificity threshold was selected.
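The two-layer decision rule and the Youden's index cutoff described in this paragraph can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function names, the toy scores, and the convention that a prediction score at or above the cutoff counts as a SARS-CoV-2 call are hypothetical and are not taken from the source. The example cutoffs 0.4515 and 0.6066 are the thresholds reported for the full classifier in paragraph [0084].

```python
def youden_threshold(scores, labels):
    """Return (cutoff, J) maximizing Youden's index J = sensitivity + specificity - 1.

    Assumption: a sample is called positive when its score is >= the cutoff.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(scores)):
        true_pos = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
        true_neg = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
        j = true_pos / positives + true_neg / negatives - 1.0
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


def two_layer_call(score_layer1, score_layer2, cutoff1, cutoff2):
    """Designate a sample SARS-CoV-2 positive only if both binary layers agree."""
    return score_layer1 >= cutoff1 and score_layer2 >= cutoff2


# Toy training scores (hypothetical), with 1 = SARS-CoV-2 positive.
toy_scores = [0.1, 0.2, 0.35, 0.6, 0.8, 0.9]
toy_labels = [0, 0, 0, 1, 1, 1]
print(youden_threshold(toy_scores, toy_labels))

# A sample passing layer 1 but failing layer 2 is not designated positive.
print(two_layer_call(0.9, 0.7, cutoff1=0.4515, cutoff2=0.6066))
print(two_layer_call(0.9, 0.5, cutoff1=0.4515, cutoff2=0.6066))
```

Requiring agreement of both layers trades sensitivity for specificity, which is consistent with the stated preference for a high-specificity threshold.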
[0084] The layer 1 classifier, generated using a training set of 110 SARS-CoV-2 positive and 93 non-viral ARI samples, contained 748 DEGs, consisting of genes associated with both cell processes and immune signaling. This classifier had a sensitivity of 97.3%, specificity of 97.3%, and area under the receiver operating characteristic curve (AUC) of 0.993 at a threshold of 0.4515. The layer 2 classifier, generated using a training set of the same 110 SARS-CoV-2 positive and 93 viral ARI samples, contained 266 DEGs with a smaller proportion of immune signaling genes than in the layer 1 classifier. This classifier had a sensitivity of 95.5%, specificity of 98.9%, and AUC of 0.999 at a threshold of 0.6066. Based on training set data, the full 1,014-gene two-layer classifier had an overall sensitivity of 95.5%, specificity of 98.2%, and AUC of 0.999 (FIG. 4A).
[0085] The performance of the two-layer classifier was then evaluated using an independent test set that included NP swab samples from 28 SARS-CoV-2 positive, 19 non-viral ARI, and 27 viral ARI patients (FIG. 3). The layer 1 classifier had 82.1% sensitivity, 89.5% specificity (FIG. 6A), and AUC of 0.944, while the layer 2 classifier yielded 92.9% sensitivity, 96.3% specificity (FIG. 6D), and AUC of 0.991. Based on test set data, the full 1,014-gene two-layer classifier had an overall sensitivity of 75.0% (95% CI: 55.0-89.0%), specificity of 93.5% (95% CI: 82.1-98.6%), and AUC of 0.933 (range 0.879-0.987), yielding an overall accuracy of 86.5% (FIG. 4A).
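The overall test-set percentages reported for the full classifier can be checked with simple arithmetic. The Python sketch below assumes 21 true positives among the 28 SARS-CoV-2 positive samples and 43 true negatives among the 46 SARS-CoV-2 negative samples (19 non-viral plus 27 viral ARI). These counts are back-calculated from the reported percentages and are not stated in the source; the function name is likewise hypothetical.

```python
def diagnostic_metrics(true_pos, false_neg, true_neg, false_pos):
    """Return (sensitivity, specificity, accuracy) as fractions."""
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    accuracy = (true_pos + true_neg) / (true_pos + false_neg + true_neg + false_pos)
    return sensitivity, specificity, accuracy


# Assumed counts (back-calculated, not given in the source):
# 28 positives -> 21 detected, 7 missed; 46 negatives -> 43 correct, 3 false positives.
sens, spec, acc = diagnostic_metrics(true_pos=21, false_neg=7, true_neg=43, false_pos=3)
print(round(100 * sens, 1), round(100 * spec, 1), round(100 * acc, 1))
```

Under these assumed counts the rounded values are 75.0, 93.5, and 86.5, matching the reported sensitivity, specificity, and accuracy.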
[0086] Because panels containing a smaller number of genes would be more practical to translate into a clinical assay, a lasso regression analysis was used to find an optimal set of genes for a medium two-layer classifier with an a priori specification of no more than 100 genes. The medium classifier consisted of 29 genes for layer 1 and 38 genes for layer 2 (Tables 1-6 and 1-7). Based on the training set, the medium 67-gene 2-layer classifier had a sensitivity of 88.2%, specificity of 97.6%, and AUC of 0.997. When applied to the test set, the medium 2-layer classifier had a sensitivity of 71.4% (95% CI: 51.3-86.8%), specificity of 93.5% (95% CI: 82.1-98.6%), AUC of 0.922 (range 0.863-0.982), and 85.1% overall accuracy (FIG. 4B).
Table 1-6. Medium Gene Set Layer 1
Table 1-7. Medium Gene Set Layer 2
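How a lasso penalty reduces a gene panel can be sketched in a few lines of Python. The example below is a simplified illustration, not the procedure used in the source: it applies a squared-error lasso solved by cyclic coordinate descent to a tiny synthetic matrix, whereas the source does not state the loss, solver, or penalty strength. The gene names (gene_a, gene_b, gene_c), the data values, and the penalty of 0.5 are all hypothetical.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero; values within [-penalty, penalty] become exactly 0."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, penalty, n_iter=100):
    """Minimize (1/(2n)) * ||y - Xw||^2 + penalty * ||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    weights = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0   # inner product of feature j with the partial residual
            norm = 0.0  # squared norm of feature j
            for i in range(n):
                fit_without_j = sum(X[i][k] * weights[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - fit_without_j)
                norm += X[i][j] ** 2
            weights[j] = soft_threshold(rho / n, penalty) / (norm / n) if norm else 0.0
    return weights


# Synthetic, centered expression values: 4 samples x 3 hypothetical genes.
gene_names = ["gene_a", "gene_b", "gene_c"]
X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
y = [2.1, 1.9, -1.9, -2.1]  # outcome driven mostly by gene_a

weights = lasso_coordinate_descent(X, y, penalty=0.5)
selected = [name for name, w in zip(gene_names, weights) if w != 0.0]
print(selected)
```

Genes whose coefficients are shrunk exactly to zero drop out of the panel, so increasing the penalty yields progressively smaller gene sets.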
[0087] The number of genes was then narrowed to <20 total by iteratively removing one gene at a time from the 29 genes for layer 1 and 37 genes for layer 2. Maximum performance was identified for a small two-layer classifier consisting of 19 genes, 8 genes for layer 1 and 11 genes for layer 2 (Tables 1-8 and 1-9). Based on the training set, the small 19-gene 2-layer classifier had a sensitivity of 94.6%, specificity of 94.6%, and AUC of 0.984 for layer 1. When applied to the test set, the small 2-layer classifier had a sensitivity of 78.6% (95% CI: 76.5-99.1%), specificity of 89.1% (95% CI: 59.1-91.7%), AUC of 0.906 (range 0.837-0.974), and 85.1% accuracy (FIG. 4C).
Table 1-8. Small Gene Set Layer 1
Table 1-9. Small Gene Set Layer 2
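The one-gene-at-a-time reduction described in paragraph [0087] resembles greedy backward elimination, which can be sketched in Python as follows. This is an illustration under assumptions, not the source's procedure: the source does not state the performance metric or the rule for choosing which gene to remove, and the gene names, per-gene contributions, and toy scoring function here are hypothetical stand-ins for a real measure such as cross-validated AUC.

```python
def backward_eliminate(genes, score_fn, min_size=1):
    """Greedy backward elimination: drop one gene per round, keep the best panel seen."""
    current = list(genes)
    best_panel, best_score = tuple(current), score_fn(current)
    while len(current) > min_size:
        # Evaluate every panel obtained by removing exactly one gene.
        candidates = [[g for g in current if g != dropped] for dropped in current]
        current = max(candidates, key=score_fn)
        score = score_fn(current)
        if score > best_score:
            best_panel, best_score = tuple(current), score
    return best_panel, best_score


# Hypothetical per-gene contributions to a performance score (integers for clarity).
toy_contribution = {"gene_a": 30, "gene_b": 25, "gene_c": 20, "gene_d": -5, "gene_e": -10}


def toy_score(panel):
    """Toy stand-in for classifier performance on a given gene panel."""
    return sum(toy_contribution[g] for g in panel)


panel, score = backward_eliminate(list(toy_contribution), toy_score)
print(panel, score)
```

In this toy run the uninformative genes are removed first, and the best-scoring panel is reached before the panel shrinks to its minimum size, mirroring the idea of a maximum-performance subset.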
[0088] Notably, for the NP swab sample from the one asymptomatic COVID-19 patient in the study, the 3 classifiers (full, medium, and small) predicted a SARS-CoV-2 host response with 85.7-99.2% confidence. There was >50% overlap in the misclassified patients across all 3 classifiers, suggesting internal consistency between them. No obvious clinical factors, including days between symptom onset and sample collection, appeared to be associated with classifier performance.
[0089] Further, a classifier was constructed to discriminate between severe COVID-19 and mild COVID-19. In brief, severity-associated genes were identified by comparing expression of genes in NP swabs obtained from outpatients with mild COVID-19 and hospitalized patients with severe COVID-19, including intensive care unit patients requiring mechanical ventilation. The severity classifier consisted of the genes provided in Table 1-10.
Table 1-10. Severity Associated Gene Set
[0090] Here RNA-Seq was used to characterize the differential host responses to SARS-CoV-2 infection in 286 NP swab and 53 whole blood samples from 333 individuals. Both NP swabs and WB from COVID-19 patients showed distinct patterns of activation or inhibition relative to other infections (influenza, seasonal coronaviruses, and bacterial sepsis) and to each other. SARS-CoV-2 infection was found to activate interferon-mediated antiviral pathways and paradoxically inhibit multiple additional immune and inflammatory pathways, resulting in an overall dysregulated immune response. Host responses were similar between outpatients and hospitalized patients with COVID-19, but the magnitude of host response was found to increase with clinical severity of disease. Further, diagnostic two-layer host response classifiers were developed based on RNA-Seq data that can discriminate SARS-CoV-2 infection from other viral and non-viral ARIs from NP swab samples with an accuracy of 85.7-86.5%. Finally, a classifier to discriminate the severity of SARS-CoV-2 infection was developed.

Claims

CLAIMS We claim:
1. A method for measuring gene expression, comprising the steps of:
(a) measuring levels of RNA expression of a plurality of genes of cells from a nasopharyngeal test sample obtained from a human subject suspected of having an acute respiratory illness, wherein the plurality of genes comprises:
(i) at least one gene selected from the group consisting of RSAD2, IFI6, IFI44L, EPSTI1, SERPING1, ATP5G1, COX20, and TCN1; and
(ii) at least one gene selected from the group consisting of HEPACAM2, TMEM229A, PLD4, PFKFB4, ADRA2A, PIFO, SERTAD2, HSPA14, MTRNR2L6, SLC16A8, and BATF3; and
(b) identifying the test sample as having a gene expression profile of a SARS-CoV-2 infection when:
(i) the level of RNA expression of RSAD2, IFI6, IFI44L, EPSTI1, and/or SERPING1 is elevated, and/or RNA expression of ATP5G1, COX20, and/or TCN1 is reduced in the test sample in comparison with respective reference values; and
(ii) the level of RNA expression of HEPACAM2, TMEM229A, PLD4, PFKFB4, ADRA2A, PIFO, SERTAD2, and/or HSPA14 is elevated, and/or RNA expression of MTRNR2L6, SLC16A8, and/or BATF3 is reduced in the test sample in comparison with respective reference values, wherein identification of the gene expression profile is indicative of the human subject having a SARS-CoV-2 infection.
2. The method of claim 1, wherein the at least one gene of (a)(i) comprises 3, 4, 5, 6, 7 or all 8 genes of the group consisting of RSAD2, IFI6, IFI44L, EPSTI1, SERPING1, ATP5G1, COX20, and TCN1; and/or wherein the at least one gene of (a)(ii) comprises 3, 4, 5, 6, 7, 8, 9, 10 or all 11 genes of the group consisting of HEPACAM2, TMEM229A, PLD4, PFKFB4, ADRA2A, PIFO, SERTAD2, HSPA14, MTRNR2L6, SLC16A8, and BATF3.
3. A method for measuring gene expression, comprising the steps of:
(a) measuring levels of RNA expression of a plurality of genes of cells from a nasopharyngeal test sample obtained from a human subject suspected of having an acute respiratory illness, wherein the plurality of genes comprises:
(i) at least one gene selected from the group consisting of RSAD2, SLC6A4, SHISA3, IFI6, IFI44L, FAM155B, SEMA7A, KIAA1614, EPSTI1, TECTA, CXCL9, SERPING1, HRASLS2, RGS1, IRF8, FAM71F2, C1QC, SARS2, ATP5G1, AZGP1, WFDC6, SDHAF4, FCGBP, COX20, BPIFB1, TCN1, PROS1, SCGB1A1, and CST1; and
(ii) at least one gene selected from the group consisting of ZC4H2, HEPACAM2, HMX1, TMEM229A, PLD4, PFKFB4, POSTN, BORA, NUP35, DHFR, AMBP, ADRA2A, ZNF92, CYP2F1, PIFO, SNTN, ZNF469, ADH1C, FAM3D, SERTAD2, HSPA14, ILVBL, PERP, UBE2I, EHF, MLKL, PTMS, MTRNR2L6, ADGRL1, PLEKHA4, CD300E, ECSCR, SLC16A8, BATF3, TRPV5, GUK1, and TUBG2; and
(b) identifying the test sample as having a gene expression profile of a SARS-CoV-2 infection when:
(i) the level of RNA expression of RSAD2, SLC6A4, SHISA3, IFI6, IFI44L, FAM155B, SEMA7A, KIAA1614, EPSTI1, TECTA, CXCL9, SERPING1, HRASLS2, RGS1, IRF8, FAM71F2, C1QC, and/or SARS2 is elevated, and/or RNA expression of ATP5G1, AZGP1, WFDC6, SDHAF4, FCGBP, COX20, BPIFB1, TCN1, PROS1, SCGB1A1, and/or CST1 is reduced in the test sample in comparison with respective reference values; and
(ii) the level of RNA expression of ZC4H2, HEPACAM2, HMX1, TMEM229A, PLD4, PFKFB4, POSTN, BORA, NUP35, DHFR, AMBP, ADRA2A, ZNF92, CYP2F1, PIFO, SNTN, ZNF469, ADH1C, FAM3D, SERTAD2, HSPA14, ILVBL, PERP, UBE2I, and/or EHF is elevated, and/or RNA expression of MLKL, PTMS, MTRNR2L6, ADGRL1, PLEKHA4, CD300E, ECSCR, SLC16A8, BATF3, TRPV5, GUK1, and/or TUBG2 is reduced in the test sample in comparison with respective reference values, wherein identification of the gene expression profile is indicative of the human subject having a SARS-CoV-2 infection.
4. The method of claim 3, wherein the at least one gene of (a)(i) comprises a plurality of 3, 4, 5, 6, 7, 9, 10, or all 29 genes of the group consisting of RSAD2, SLC6A4, SHISA3, IFI6, IFI44L, FAM155B, SEMA7A, KIAA1614, EPSTI1, TECTA, CXCL9, SERPING1, HRASLS2, RGS1, IRF8, FAM71F2, C1QC, SARS2, ATP5G1, AZGP1, WFDC6, SDHAF4, FCGBP, COX20, BPIFB1, TCN1, PROS1, SCGB1A1, and CST1; and/or wherein the at least one gene of (a)(ii) comprises a plurality of 3, 4, 5, 6, 7, 8, 9, 10 or 37 genes of the group consisting of ZC4H2, HEPACAM2, HMX1, TMEM229A, PLD4, PFKFB4, POSTN, BORA, NUP35, DHFR, AMBP, ADRA2A, ZNF92, CYP2F1, PIFO, SNTN, ZNF469, ADH1C, FAM3D, SERTAD2, HSPA14, ILVBL, PERP, UBE2I, EHF, MLKL, PTMS, MTRNR2L6, ADGRL1, PLEKHA4, CD300E, ECSCR, SLC16A8, BATF3, TRPV5, GUK1, and TUBG2.
5. A method for identifying whether a human subject has a SARS-CoV-2 infection or a non-viral acute respiratory illness, comprising the steps of:
(a) measuring levels of RNA expression of a plurality of genes of cells from a nasopharyngeal test sample obtained from a human subject suspected of having an acute respiratory illness, wherein the plurality of genes comprises three or more genes selected from the group consisting of RSAD2, IFI6, IFI44L, EPSTI1, SERPING1, ATP5G1, COX20, and TCN1; and
(b) identifying the human subject as having a SARS-CoV-2 infection when the level of RNA expression of RSAD2, IFI6, IFI44L, EPSTI1, and/or SERPING1 is elevated, and/or RNA expression of ATP5G1, COX20, and/or TCN1 is reduced in the test sample in comparison with respective reference values; and/or identifying the human subject as having a non-viral acute respiratory illness when the level of RNA expression of RSAD2, IFI6, IFI44L, EPSTI1, and/or SERPING1 is reduced, and/or RNA expression of ATP5G1, COX20, and/or TCN1 is elevated in the test sample in comparison with the respective reference values.
6. A method for identifying whether a human subject has a SARS-CoV-2 infection or another viral acute respiratory illness, comprising the steps of:
(a) measuring levels of RNA expression of a plurality of genes of cells from a nasopharyngeal test sample obtained from a human subject suspected of having an acute respiratory illness, wherein the plurality of genes comprises three or more genes selected from the group consisting of HEPACAM2, TMEM229A, PLD4, PFKFB4, ADRA2A, PIFO, SERTAD2, HSPA14, MTRNR2L6, SLC16A8, and BATF3; and
(b) identifying the human subject as having a SARS-CoV-2 infection when the level of RNA expression of HEPACAM2, TMEM229A, PLD4, PFKFB4, ADRA2A, PIFO, SERTAD2, and/or HSPA14 is elevated, and/or RNA expression of MTRNR2L6, SLC16A8, and/or BATF3 is reduced in the test sample in comparison with respective reference values; and/or identifying the human subject as having another viral acute respiratory illness when the level of RNA expression of HEPACAM2, TMEM229A, PLD4, PFKFB4, ADRA2A, PIFO, SERTAD2, and/or HSPA14 is reduced, and/or RNA expression of MTRNR2L6, SLC16A8, and/or BATF3 is elevated in the test sample in comparison with the respective reference values.
7. The method of claim 6, wherein the other viral acute respiratory illness is associated with an infection with a virus selected from the group consisting of an influenza virus, a seasonal coronavirus, a rhinovirus, a metapneumovirus, and a parainfluenza virus.
8. The method of claim 7, wherein the human subject has symptoms of an acute respiratory illness when the nasopharyngeal test sample was obtained.
9. The method of claim 7, wherein the human subject does not have symptoms of an acute respiratory illness when the nasopharyngeal test sample was obtained.
10. The method of claim 1, wherein the respective reference values are determined from a nasopharyngeal control sample from a healthy human subject without symptoms of a respiratory illness, or wherein the respective reference values are average values determined from a plurality of nasopharyngeal control samples obtained from a plurality of healthy human subjects, and wherein the healthy human subject or subjects do not have an acute SARS-CoV-2 infection.
11. A method for measuring gene expression, comprising the steps of:
(a) measuring levels of RNA expression of a plurality of genes of cells from a nasopharyngeal test sample obtained from a human subject having or suspected of having a SARS-CoV-2 infection, wherein the plurality of genes comprises three or more genes selected from the group consisting of CXCL8, PHACTR1, GRIN2C, CXCL2, G0S2, PLAUR, CXCR4, KRT6A, FBXL7, CARD16, ZNF267, GPR65, PPIF, CSF1, LCP2, LPCAT1, SOD2, FCER1G, CD93, ZNF438, C5AR1, FTH1, IER3, CREM, NINJ1, CSGALNACT2, AGPS, IVNS1ABP, CDC42EP3, GK, ZEB2, HSPA1A, CXCL1, GPX3, IFNGR1, POLDIP2, DNAJB1, PI3, NEDD9, VEGFA, IL1R1, ATG2A, DOCK4, THBS1, ZFYVE16, SAT1, PNPLA8, H3F3B, IL6R, RAB20, HSPA1B, MIR22HG, PPAP2B, TBC1D2, SRPK1, FGD4, RAB21, and CPD; and
(b) identifying the test sample as having a severe COVID-19 gene expression profile when the level of RNA expression of the plurality of genes is elevated in the test sample in comparison with respective reference values; and/or identifying the test sample as having a mild COVID-19 gene expression profile when the level of RNA expression of the plurality of genes is not elevated in the test sample in comparison with the respective reference values, wherein identification of the severe COVID-19 infection gene expression profile is indicative of the human subject having a SARS-CoV-2 infection and having or developing severe COVID-19, wherein severe COVID-19 is associated with hospitalization, and wherein hospitalization comprises a stay within a hospital intensive care unit and/or mechanical ventilation.
12. The method of claim 11, wherein the plurality of genes comprises 4, 5, 6, 7, 8, 9, 10, or all 58 genes of the group consisting of CXCL8, PHACTR1, GRIN2C, CXCL2, G0S2, PLAUR, CXCR4, KRT6A, FBXL7, CARD16, ZNF267, GPR65, PPIF, CSF1, LCP2, LPCAT1, SOD2, FCER1G, CD93, ZNF438, C5AR1, FTH1, IER3, CREM, NINJ1, CSGALNACT2, AGPS, IVNS1ABP, CDC42EP3, GK, ZEB2, HSPA1A, CXCL1, GPX3, IFNGR1, POLDIP2, DNAJB1, PI3, NEDD9, VEGFA, IL1R1, ATG2A, DOCK4, THBS1, ZFYVE16, SAT1, PNPLA8, H3F3B, IL6R, RAB20, HSPA1B, MIR22HG, PPAP2B, TBC1D2, SRPK1, FGD4, RAB21, and CPD.
13. The method of claim 1, wherein the human subject has been exposed to a SARS-CoV-2-infected individual within 1-2 weeks of the nasopharyngeal test sample being obtained.
14. The method of claim 1, further comprising: obtaining the test sample prior to step (a).
15. The method of claim 1, further comprising: extracting RNA from the test sample prior to step (a).
16. The method of claim 1, further comprising performing or having performed a SARS-CoV-2 reverse transcriptase-polymerase chain reaction (RT-PCR) test on the nasopharyngeal test sample.
17. The method of claim 1, further comprising performing or having performed a SARS-CoV-2 antibody test on a blood sample obtained from the human subject.
18. The method of claim 1, further comprising step (c) treating the SARS-CoV-2-infected subject identified in step (b) by administering an effective amount of a COVID-19 therapeutic agent.
19. The method of claim 18, wherein the COVID-19 therapeutic agent comprises one or both of an antiviral agent and an immunotherapeutic agent.
20. The method of claim 19, wherein the COVID-19 therapeutic agent comprises an antiviral agent.
21. The method of claim 20, wherein the antiviral agent comprises remdesivir, molnupiravir, and/or paxlovid (PF-07321332 and ritonavir).
22. The method of claim 19, wherein the COVID-19 therapeutic agent comprises an immunotherapeutic agent.
23. The method of claim 22, wherein the immunotherapeutic agent comprises one or more of the group consisting of an interferon, convalescent plasma, hyperimmune plasma, and an anti-SARS-CoV-2 monoclonal antibody or SARS-CoV-2-binding fragment thereof.
24. The method of any one of claims 1-23, wherein step (a) comprises one or more of the group consisting of sequence analysis, hybridization, and amplification.
25. The method of any one of claims 1-23, wherein step (a) comprises: performing reverse transcriptase-quantitative polymerase chain reaction (RT-qPCR) on RNA extracted from the test sample.
26. The method of any one of claims 1-23, wherein step (a) comprises: hybridizing RNA extracted from the test sample to a microarray.
27. The method of any one of claims 1-23, wherein step (a) comprises: performing serial amplification of gene expression (SAGE) on RNA extracted from the test sample.
28. The method of any one of claims 1-23, wherein step (a) comprises targeted RNA expression resequencing comprising:
(i) preparing an RNA expression library for the plurality of genes from RNA extracted from the test sample;
(ii) sequencing a portion of at least 50,000 members of the library;
(iii) generating a read count for RNA expression of the plurality of genes by normalization to the sequence of the at least 50,000 members of step (ii).
29. The method of any one of claims 1-23, wherein step (a) comprises whole transcriptome shotgun sequencing (WTSS) comprising:
(i) preparing an RNA expression library for the plurality of genes from RNA extracted from the test sample;
(ii) sequencing a portion of at least 50,000 members of the library;
(iii) generating a read count for RNA expression of the plurality of genes by normalization to the sequence of the at least 50,000 members of step (ii).
30. A kit comprising:
(a) a plurality of oligonucleotides which hybridize to a plurality of genes comprising at least 3, 4, 5, 6, 7 or all 8 genes of the group consisting of RSAD2, IFI6, IFI44L, EPSTI1, SERPING1, ATP5G1, COX20, and TCN1; and
(b) instructions for: (i) use of the oligonucleotides for measuring RNA expression of the plurality of genes; and (ii) identifying whether a human subject has a SARS-CoV-2 infection or a non-viral acute respiratory illness based on the levels of RNA expression of the plurality of genes in cells from a nasopharyngeal test sample obtained from the human subject, wherein the human subject is suspected of having an acute respiratory illness.
31. The kit of claim 30, wherein the plurality of genes further comprises one or more genes selected from the group consisting of SLC6A4, SHISA3, FAM155B, SEMA7A, KIAA1614, TECTA, CXCL9, HRASLS2, RGS1, IRF8, FAM71F2, C1QC, SARS2, AZGP1, WFDC6, SDHAF4, FCGBP, BPIFB1, PROS1, SCGB1A1, CST1, and combinations thereof.
32. The kit of claim 30, wherein the plurality of genes consists of RSAD2, IFI6, IFI44L, EPSTI1, SERPING1, ATP5G1, COX20, and TCN1.
33. The kit of claim 31, wherein the plurality of genes consists of RSAD2, SLC6A4, SHISA3, IFI6, IFI44L, FAM155B, SEMA7A, KIAA1614, EPSTI1, TECTA, CXCL9, SERPING1, HRASLS2, RGS1, IRF8, FAM71F2, C1QC, SARS2, ATP5G1, AZGP1, WFDC6, SDHAF4, FCGBP, COX20, BPIFB1, TCN1, PROS1, SCGB1A1, and CST1.
34. A kit comprising:
(a) a plurality of oligonucleotides which hybridize to a plurality of genes comprising at least 3, 4, 5, 6, 7, 8, 9, 10 or all 11 genes of the group consisting of HEPACAM2, TMEM229A, PLD4, PFKFB4, ADRA2A, PIFO, SERTAD2, HSPA14, MTRNR2L6, SLC16A8, and BATF3; and
(b) instructions for: (i) use of the oligonucleotides for measuring RNA expression of the plurality of genes; and (ii) identifying whether a human subject has a SARS-CoV-2 infection or another viral acute respiratory illness based on the levels of RNA expression of the plurality of genes in cells from a nasopharyngeal test sample obtained from the human subject, wherein the human subject is suspected of having an acute respiratory illness.
35. The kit of claim 34, wherein the plurality of genes further comprises one or more genes selected from the group consisting of ZC4H2, HMX1, POSTN, BORA, NUP35, DHFR, AMBP, ZNF92, CYP2F1, SNTN, ZNF469, ADH1C, FAM3D, ILVBL, PERP, UBE2I, EHF, MLKL, PTMS, ADGRL1, PLEKHA4, CD300E, ECSCR, TRPV5, GUK1, and TUBG2, and combinations thereof.
36. The kit of claim 34, wherein the plurality of genes consists of HEPACAM2, TMEM229A, PLD4, PFKFB4, ADRA2A, PIFO, SERTAD2, HSPA14, MTRNR2L6, SLC16A8, and BATF3.
37. The kit of claim 35, wherein the plurality of genes consists of ZC4H2, HEPACAM2, HMX1, TMEM229A, PLD4, PFKFB4, POSTN, BORA, NUP35, DHFR, AMBP, ADRA2A, ZNF92, CYP2F1, PIFO, SNTN, ZNF469, ADH1C, FAM3D, SERTAD2, HSPA14, ILVBL, PERP, UBE2I, EHF, MLKL, PTMS, MTRNR2L6, ADGRL1, PLEKHA4, CD300E, ECSCR, SLC16A8, BATF3, TRPV5, GUK1, and TUBG2.
38. A kit comprising:
(a) a plurality of oligonucleotides which hybridize to a plurality of genes comprising at least 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18 or all 19 genes of the group consisting of RSAD2, IFI6, IFI44L, EPSTI1, SERPING1, ATP5G1, COX20, TCN1, HEPACAM2, TMEM229A, PLD4, PFKFB4, ADRA2A, PIFO, SERTAD2, HSPA14, MTRNR2L6, SLC16A8, and BATF3; and
(b) instructions for: (i) use of the oligonucleotides for measuring RNA expression of the plurality of genes; and (ii) identifying whether the test sample has a gene expression profile of a SARS-CoV-2 infection based on the levels of RNA expression of the plurality of genes in cells from a nasopharyngeal test sample obtained from a human subject suspected of having an acute respiratory illness, wherein identification of the gene expression profile is indicative of the human subject having a SARS-CoV-2 infection.
39. The kit of claim 38, wherein the plurality of genes further comprises one or more genes selected from the group consisting of SLC6A4, SHISA3, FAM155B, SEMA7A, KIAA1614, TECTA, CXCL9, HRASLS2, RGS1, IRF8, FAM71F2, C1QC, SARS2, AZGP1, WFDC6, SDHAF4, FCGBP, BPIFB1, PROS1, SCGB1A1, CST1, ZC4H2, HMX1, POSTN, BORA, NUP35, DHFR, AMBP, ZNF92, CYP2F1, SNTN, ZNF469, ADH1C, FAM3D, ILVBL, PERP, UBE2I, EHF, MLKL, PTMS, ADGRL1, PLEKHA4, CD300E, ECSCR, TRPV5, GUK1, and TUBG2, and combinations thereof.
40. The kit of claim 38, wherein the plurality of genes consists of RSAD2, IFI6, IFI44L, EPSTI1, SERPING1, ATP5G1, COX20, TCN1, HEPACAM2, TMEM229A, PLD4, PFKFB4, ADRA2A, PIFO, SERTAD2, HSPA14, MTRNR2L6, SLC16A8, and BATF3.
41. The kit of claim 39, wherein the plurality of genes consists of RSAD2, IFI6, IFI44L, EPSTI1, SERPING1, ATP5G1, COX20, TCN1, HEPACAM2, TMEM229A, PLD4, PFKFB4, ADRA2A, PIFO, SERTAD2, HSPA14, MTRNR2L6, SLC16A8, BATF3, SLC6A4, SHISA3, FAM155B, SEMA7A, KIAA1614, TECTA, CXCL9, HRASLS2, RGS1, IRF8, FAM71F2, C1QC, SARS2, AZGP1, WFDC6, SDHAF4, FCGBP, BPIFB1, PROS1, SCGB1A1, CST1, ZC4H2, HMX1, POSTN, BORA, NUP35, DHFR, AMBP, ZNF92, CYP2F1, SNTN, ZNF469, ADH1C, FAM3D, ILVBL, PERP, UBE2I, EHF, MLKL, PTMS, ADGRL1, PLEKHA4, CD300E, ECSCR, TRPV5, GUK1, and TUBG2.
42. A kit comprising:
(a) a plurality of oligonucleotides which hybridize to a plurality of genes comprising at least 3, 4, 5, 6, 7, 8, 9, 10, 11 or all 58 genes of the group consisting of CXCL8, PHACTR1, GRIN2C, CXCL2, G0S2, PLAUR, CXCR4, KRT6A, FBXL7, CARD16, ZNF267, GPR65, PPIF, CSF1, LCP2, LPCAT1, SOD2, FCER1G, CD93, ZNF438, C5AR1, FTH1, IER3, CREM, NINJ1, CSGALNACT2, AGPS, IVNS1ABP, CDC42EP3, GK, ZEB2, HSPA1A, CXCL1, GPX3, IFNGR1, POLDIP2, DNAJB1, PI3, NEDD9, VEGFA, IL1R1, ATG2A, DOCK4, THBS1, ZFYVE16, SAT1, PNPLA8, H3F3B, IL6R, RAB20, HSPA1B, MIR22HG, PPAP2B, TBC1D2, SRPK1, FGD4, RAB21, and CPD; and
(b) instructions for: (i) use of the oligonucleotides for measuring RNA expression of the plurality of genes; and (ii) identifying whether the test sample has a severe COVID-19 gene expression profile based on the levels of RNA expression of the plurality of genes in cells from a nasopharyngeal test sample obtained from a human subject having or suspected of having a SARS-CoV-2 infection, wherein identification of the severe COVID-19 gene expression profile is indicative of the human subject having a severe SARS-CoV-2 infection and having or developing severe COVID-19, wherein severe COVID-19 is associated with hospitalization, and wherein hospitalization comprises a stay within a hospital intensive care unit and/or mechanical ventilation.
43. The kit of claim 42, wherein the plurality of genes consists of CXCL8, PHACTR1, GRIN2C, CXCL2, G0S2, PLAUR, CXCR4, KRT6A, FBXL7, CARD16, ZNF267, GPR65, PPIF, CSF1, LCP2, LPCAT1, SOD2, FCER1G, CD93, ZNF438, C5AR1, FTH1, IER3, CREM, NINJ1, CSGALNACT2, AGPS, IVNS1ABP, CDC42EP3, GK, ZEB2, HSPA1A, CXCL1, GPX3, IFNGR1, POLDIP2, DNAJB1, PI3, NEDD9, VEGFA, IL1R1, ATG2A, DOCK4, THBS1, ZFYVE16, SAT1, PNPLA8, H3F3B, IL6R, RAB20, HSPA1B, MIR22HG, PPAP2B, TBC1D2, SRPK1, FGD4, RAB21, and CPD.
44. The method of any one of claims 1-23, wherein step (a) further comprises: iii) measuring levels of RNA expression of at least one control gene, wherein the at least one control gene is selected from the group consisting of PMM1, RAC1, RPP30, ACTB, and HSPD1; and iv) normalizing levels of RNA expression of the plurality of genes of (i) and (ii) to levels of RNA expression of the at least one control gene.
45. The kit of any one of claims 30-42, wherein the kit further comprises: at least one control oligonucleotide which hybridizes to at least one control gene, wherein the at least one control gene is selected from the group consisting of PMM1, RAC1, RPP30, ACTB, and HSPD1; and instructions for use of the at least one control oligonucleotide to normalize levels of RNA expression of the plurality of genes.
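The control-gene normalization recited in claims 44 and 45 can be sketched as follows. The claims do not fix a formula, so this minimal sketch makes several assumptions: expression is supplied as raw read counts per gene, the reference level is the geometric mean of whichever control genes were measured, and a pseudocount of 1 avoids taking the logarithm of zero. The function name `normalize_to_controls` and its parameters are hypothetical and chosen for illustration only.

```python
import math

# Control genes named in claim 44.
CONTROL_GENES = ("PMM1", "RAC1", "RPP30", "ACTB", "HSPD1")


def normalize_to_controls(counts, controls=CONTROL_GENES, pseudocount=1.0):
    """Return log2(target / control reference) for every non-control gene.

    counts: dict mapping gene symbol -> raw expression value (e.g. read count).
    The reference is the geometric mean of the measured control genes, which
    is an assumption of this sketch rather than a requirement of the claims.
    """
    measured = [gene for gene in controls if gene in counts]
    if not measured:
        raise ValueError("at least one control gene must be measured")

    # The mean of log2 values equals the log2 of the geometric mean.
    log_reference = sum(
        math.log2(counts[gene] + pseudocount) for gene in measured
    ) / len(measured)

    return {
        gene: math.log2(value + pseudocount) - log_reference
        for gene, value in counts.items()
        if gene not in controls
    }


if __name__ == "__main__":
    sample = {"ACTB": 7, "RPP30": 7, "IFI6": 31, "RSAD2": 7}
    for gene, score in sorted(normalize_to_controls(sample).items()):
        print(f"{gene}\t{score:.2f}")
```

In this toy sample, a normalized value of 2.0 for IFI6 corresponds to roughly four-fold higher expression than the control reference, and 0.0 for RSAD2 corresponds to expression equal to it. Any classification threshold applied to such values would have to come from the specification, not from this sketch.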
PCT/US2021/062474 2020-12-09 2021-12-08 Analysis of host gene expression for diagnosis of severe acute respiratory syndrome coronavirus 2 infection WO2022125702A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202063123389P 2020-12-09 2020-12-09
US63/123,389 2020-12-09

Publications (1)

Publication Number Publication Date
WO2022125702A1 WO2022125702A1 (en) 2022-06-16

Family

ID=81972773

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2021/062474 WO2022125702A1 (en) 2020-12-09 2021-12-08 Analysis of host gene expression for diagnosis of severe acute respiratory syndrome coronavirus 2 infection

Country Status (1)

Country Link
WO (1) WO2022125702A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180305760A1 (en) * 2015-09-30 2018-10-25 Immunexpress Pty Ltd Pathogen biomarkers and uses therefor
WO2019108549A1 (en) * 2017-11-28 2019-06-06 The Regents Of The University Of California Assays for detection of acute lyme disease
WO2019236768A1 (en) * 2018-06-05 2019-12-12 Washington University Nasal genes used to identify, characterize, and diagnose viral respiratory infections

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ADRIANUS CM BOON; ROBERT W WILLIAMS; DAVID S SINASAC; RICHARD J WEBBY: "A novel genetic locus linked to pro-inflammatory cytokines after virulent H5N1 virus infection in mice", BMC GENOMICS, BIOMED CENTRAL LTD, LONDON, UK, vol. 15, no. 1, 24 November 2014 (2014-11-24), pages 1017, XP021204554, ISSN: 1471-2164, DOI: 10.1186/1471-2164-15-1017 *

Similar Documents

Publication Publication Date Title
US11466331B2 (en) RNA determinants for distinguishing between bacterial and viral infections
EP3356558B1 (en) Sirs pathogen biomarkers and uses therefor
JP2023138990A (en) Methods for diagnosis of sepsis
US11041206B2 (en) Biomarkers for inflammatory bowel disease
US20200255898A1 (en) Diagnostic assay for source of inflammation
US20180245154A1 (en) Methods to diagnose and treat acute respiratory infections
EP2931923A1 (en) Blood transcriptional signatures of active pulmonary tuberculosis and sarcoidosis
JP2008538007A (en) Diagnosis of sepsis
US20110312521A1 (en) Genomic Transcriptional Analysis as a Tool for Identification of Pathogenic Diseases
WO2015048098A1 (en) Diagnostic methods for infectious disease using endogenous gene expression
US20190194728A1 (en) Systemic inflammatory and pathogen biomarkers and uses therefor
US20230399698A1 (en) Assays for detection of acute lyme disease
Zerbib et al. Pathway mapping of leukocyte transcriptome in influenza patients reveals distinct pathogenic mechanisms associated with progression to severe infection
WO2022125702A1 (en) Analysis of host gene expression for diagnosis of severe acute respiratory syndrome coronavirus 2 infection
Rodriguez et al. Genomic, Metagenomic and Transcriptomic Characterization of the Clinical Forms of COVID-19: A Comparative Cross-Sectional Study
Yang et al. Evaluation of IFIT3 and ORM1 as biomarkers for discriminating active tuberculosis from latent infection
O’Neill et al. Basal Expression of Interferon-Stimulated Genes Drives Population Differences in Monocyte Susceptibility to Influenza Infection
Goh An integrated metagenomic approach to investigating disease heterogeneity in sepsis due to community-acquired pneumonia
US10793909B2 (en) Methods for predicting the survival time of patients with decompensated alcoholic cirrhosis
Taghizadeh et al. COVID-19; History, Taxonomy, and Diagnostic Molecular and Immunological Techniques
WO2024015879A1 (en) Gene expression-based identification of early lyme disease
WO2024054572A1 (en) Methods of detecting Sjögren's syndrome using salivary exosomes
IL285031A (en) Diagnosing inflammatory bowel diseases
NZ750396B2 (en) Biomarkers for inflammatory bowel disease
Cathomas et al. Two distinct immunopathological profiles in autopsy lungs of COVID-19

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21904348

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21904348

Country of ref document: EP

Kind code of ref document: A1