WO2021243149A2 - Protein covariance networks reveal interactions important to the emergence of sars coronaviruses as human pathogens - Google Patents

Info

Publication number
WO2021243149A2
Authority
WO
WIPO (PCT)
Prior art keywords
amino acid
seq
acid sequence
identity
domain comprises
Prior art date
Application number
PCT/US2021/034753
Other languages
French (fr)
Other versions
WO2021243149A3 (en)
Inventor
John J. Mekalanos
William P. ROBINS
Original Assignee
President And Fellows Of Harvard College
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by President And Fellows Of Harvard College
Publication of WO2021243149A2
Publication of WO2021243149A3

Classifications

    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K14/00 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C07K14/005 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from viruses
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61K PREPARATIONS FOR MEDICAL, DENTAL OR TOILETRY PURPOSES
    • A61K39/00 Medicinal preparations containing antigens or antibodies
    • A61K39/12 Viral antigens
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K2319/00 Fusion polypeptide
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K2319/00 Fusion polypeptide
    • C07K2319/50 Fusion polypeptide containing protease site
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2770/00 MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA ssRNA viruses positive-sense
    • C12N2770/00011 Details
    • C12N2770/20011 Coronaviridae
    • C12N2770/20022 New viral proteins or individual genes, new structural or functional aspects of known viral proteins or genes
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2770/00 MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA ssRNA viruses positive-sense
    • C12N2770/00011 Details
    • C12N2770/20011 Coronaviridae
    • C12N2770/20034 Use of virus or viral component as vaccine, e.g. live-attenuated or inactivated virus, VLP, viral protein

Definitions

  • fusion polypeptides, polynucleotides, vaccine compositions, and methods provided herein are based, in part, on the discovery of covariant residues within these six core viral proteins that are responsible for the evolution and infection of human hosts by the pathogenic SARS-CoV2 virus.
  • the method of selecting antigens based on the covariance analysis provided herein can identify and generate fusion polypeptides as preventive therapeutics for pathogenic infections.
  • a fusion polypeptide comprising: a first domain comprising a first viral polypeptide or a fragment thereof expressed by a first virus, wherein the first viral polypeptide or fragment thereof comprises at least two or more covarying amino acid positions.
  • a fusion polypeptide comprising: (a) a first domain comprising a first viral polypeptide or a fragment thereof expressed by a first virus; and (b) a second domain comprising a second viral polypeptide or a fragment thereof expressed by the first virus, and wherein the first viral polypeptide or a fragment thereof comprises at least one amino acid position that covaries with at least one amino acid position in the second viral polypeptide or a fragment thereof.
  • a composition comprising a fusion polypeptide or a polynucleotide provided herein.
  • the composition is a vaccine composition.
  • the fusion polypeptides described herein are immunogenic and can induce an antigen specific immune response. Accordingly, in another aspect, provided herein is a method of inducing an immune response in a subject using the fusion polypeptide described herein or a polynucleotide encoding the same.
  • the method comprises administering to the subject a fusion polypeptide provided herein or a polynucleotide encoding the same in an amount effective to produce an antigen specific immune response.
  • a cell comprising a fusion polypeptide or a polynucleotide provided herein.
  • FIGS. 2A-2C show predicted covariant residues in different CoVs by protein.
  • FIG. 2A Mapped covariant residues in 1a/1b polyproteins, S, 3a, E, M, and N proteins from the bat CoVs found in SARS-CoV-2. Conserved residues found in other groups are mapped as a reference.
  • FIG. 2B Graph showing a progression of Clusters and residue content more abundant in Bat CoVs to those more restricted in SARS-CoV2 and closely related viruses.
  • FIG. 2C Residues found to covary that are restricted to categories. Those that overlap and are conserved are shown as “Conserved A”.
  • FIG. 3 shows phylogeny of the 850 CoVs. Maximum Likelihood Tree using the aligned amino acids from 1a/1b, S, 3a, M, E, and N proteins from bat, pangolin, civet, and human CoVs.
  • FIGS. 4A-4D show covariants in S with respect to the position of recognized subdomains and motifs.
  • FIG. 4A Recognized subdomains and motifs in the SARS-CoV-2 Spike (S) protein.
  • FIG. 4B Covarying residues predicted to be interacting in the S protein. Dominant B and T epitopes and residues predicted to interact within the Spike structure are indicated (FIG.
  • FIG.4C discloses SEQ ID NOS 69, 87, 88, 70, 70, 71, 89, and 72, respectively, in order of appearance.
  • FIG. 4D Shows the alignment of covariants with the S receptor-binding domain. Aligned SARS-CoV and SARS-CoV2 sequences in the Spike RBD showing sequence conservation (blue), covariant residues (red arrows), contacts with ACE-2, and contacts between Ab R80 and SARS-CoV.
  • FIG. 4D discloses SEQ ID NOS 73 and 74, respectively, in order of appearance.
  • FIGS. 5A-5B show plotting of predicted covariant and interacting residues in clinical strains and highlighted residues in S.
  • FIG. 5A 796 covarying residue pairs detected in the clinical isolates. Abundance is indicated by hue. High abundance means that both residues vary in at least 4% of the population, mid abundance means that at least one residue varies in at least 4%, and low abundance means that neither residue varies in at least 4%. Overlap pairs are those covariant residues detected in both the 850 CoVs and the clinical strains.
  • FIG.5B Covarying and interacting residues detected in clinical isolates and within 3a and S protein. Clinical residues are colored blue and those predicted to interact in the S protein from the PDB (6VXX) are orange.
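The abundance categories described for FIG. 5A amount to a simple rule on how often each residue of a pair varies in the population. The following is a minimal sketch of that rule, assuming the per-residue variation frequencies are already available as fractions; the function name, argument names, and example frequencies are hypothetical and do not come from the source.

```python
def classify_pair_abundance(freq_a: float, freq_b: float, threshold: float = 0.04) -> str:
    """Classify a covarying residue pair as high, mid, or low abundance.

    freq_a and freq_b are the fractions of isolates in which each residue
    of the pair varies (hypothetical inputs). The 4% threshold is the value
    stated for FIG. 5A.
    """
    # Count how many residues of the pair reach the threshold.
    varied = sum(freq >= threshold for freq in (freq_a, freq_b))
    return {2: "high", 1: "mid", 0: "low"}[varied]


if __name__ == "__main__":
    # Illustrative frequencies only, not data from the source.
    for pair in [(0.10, 0.05), (0.10, 0.01), (0.01, 0.02)]:
        print(pair, classify_pair_abundance(*pair))
```

Here a pair is treated as mid abundance when exactly one residue reaches the threshold, since the case where both do is already classified as high abundance.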
  • FIGS. 6A-6D show covariant residues in 3a domains and observed variance in the 3a ectodomain.
  • FIG. 6A Diagram showing the positions of extracellular, cytoplasmic, and transmembrane domains (TMD).
  • FIG. 6B Comparison of SARS-CoV-2 and SARS-CoV residues in the 3a N-terminal ectodomain and arrows from (A) showing covariance. Distinct residues are colored blue.
  • FIG. 6B discloses SEQ ID NOS 75 and 76, respectively, in order of appearance.
  • FIG. 6C Consensus sequence of 3a (SEQ ID NO: 77) derived from the alignment of CoVs in this work and all different substitutions observed at each position. Variable residues are colored red.
  • FIG. 6D BepiPred-2.0 epitope prediction using residues in the 3a ectodomain.
  • FIG.6D discloses SEQ ID NOS 78 and 79, respectively, in order of appearance.
  • FIGS. 7A-7C show covariants that emerged during SARS-CoV and SARS-CoV2.
  • FIG. 7A Gephi Force map inset showing Cluster 126 and highlighted genomes that possess the Cluster 126 residues. The ML phylogenic tree showing the relative relationship of circled strains is included as a reference.
  • FIG. 7B Alignment of strains to both SARS-CoV and SARS-CoV2 showing the order and diversity of these 8 residues in S protein.
  • FIG. 7B discloses SEQ ID NOS 80, 81, 90, 82, 83, 84, 85, 86, and 91, respectively, in order of appearance.
  • FIG. 7C Structures of both SARS-CoV2 (PDB 6VXX) and SARS-CoV (6ACC) showing highlighted residues belonging to both. Q23 is missing from the SARS-CoV2 structure and the closest residue A27 was labeled to provide an approximate location.
  • FIG.8 shows a summary of the method of identifying covariance of SARS-CoV2.
  • FIG. 9 shows a model of 3a N-terminal domain and S N-terminal domain and furin cleavage domain fusion.
  • FIG. 10 shows a model of two distinct 3a-S fusions incorporating 3a N-terminal domain and S N-terminal domain and furin cleavage domain. Both constructs possess either a G or D amino acid residue at the SARS-CoV-2 614 position in the S protein cleavage domain (not shown).
  • FIG.11 shows structure model of two distinct 3a-S fusions incorporating 3a N-terminal domain (NTD) and S NTD and furin cleavage domain.
  • FIG. 12 shows a schematic representation of the strategy to display 3a-S fusion epitopes to generate different classes of independent antibodies. Two distinct 3a NTD fusion sites are shown.
  • FIGS. 13A-13C show predicted covariant residues in the 3a and Spike (S) proteins.
  • FIG. 13A Diagram showing the subdomains with regard to predicted cellular location and the location of covariant residues in 3a.
  • FIG.13B Shows a predicted map of covariant relationships between residues in 3a and Spike (S). Dominant epitopes in Spike and residues that are predicted to interact within the Spike trimer structure are indicated for reference.
  • FIG. 13C Shows an alignment of the identities and patterns of covariant residues and amino acid charge and polarity information in 3a and Spike using the positions in SARS-CoV-2.
  • FIGS. 14A and 14B show a comparison of the initial pan-sarbecovirus covariant pair analysis, using ~60 distinct β-CoVs, with the covariant analysis of ~75,000 clinical SARS-CoV-2 genomes, completed for Spike and 3a.
  • FIG. 14A All 11,394 covariant pair residues are mapped and colored by independent occurrence of distinct amino acid changes. Linkages showing the most highly enriched and varied independent covariant pairs are indicated by darker blue lines.
  • FIG. 14B Covariant residues with at least two distinct changes are shown.
  • FIGS.15A-15D show covariant pairs in Spike and 3a.
  • FIG. 15A All covariant pairs in Spike and 3a identified using the ~75,000 clinical SARS-CoV-2 genomes.
  • FIG. 15B The top 50 most abundant covariant pairs are shown and linkages are colored red for those within the Spike NTD and blue for those within the CTD.
  • FIG. 15C All covariant pairs are shown and those interacting with residues within the same domains of Spike (NTD, RBD, and CTD) and those within 3a are colored.
  • FIGS. 16A-16C show the overlapping covariant pair linkages between residues in Spike and 3a.
  • FIG.16A The overlapping covariant pair linkages between residues in Spike and 3a are shown as darker lines when the pan-sarbecovirus and clinical strains are compared. Mutations identified in dominant SARS-CoV-2 variants are indicated and labeled by arrows.
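The overlapping linkages described for FIG. 16A (and the "overlap pairs" of FIG. 5A) are pairs found in both the pan-sarbecovirus and the clinical analyses, which is a set intersection once pair order is ignored. A minimal sketch under that reading follows; the (protein, position) tuples are placeholders chosen for illustration, not residues reported in the source.

```python
def normalize_pairs(pairs):
    """Convert (residue_a, residue_b) tuples to unordered pairs.

    frozenset makes (a, b) and (b, a) compare equal, so the order in which
    each analysis reported a pair does not matter.
    """
    return {frozenset(pair) for pair in pairs}


def overlapping_pairs(pan_pairs, clinical_pairs):
    """Return the covariant pairs detected in both analyses."""
    return normalize_pairs(pan_pairs) & normalize_pairs(clinical_pairs)


if __name__ == "__main__":
    # Placeholder residues written as (protein, position); purely illustrative.
    pan = [(("S", 10), ("3a", 20)), (("S", 30), ("S", 40))]
    clinical = [(("3a", 20), ("S", 10)), (("S", 50), ("3a", 60))]
    for pair in overlapping_pairs(pan, clinical):
        print(sorted(pair))
```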
  • FIGS. 17A-17D show the most abundant covariant residues in Spike and 3a.
  • FIG. 17C discloses SEQ ID NOS 75 and 75, respectively, in order of appearance.
  • FIG. 17D Variable amino acid residues in related SARS-CoVs identified in the 3a NTD found in the pan-sarbecovirus analysis are shown in red and all amino acid variations are shown below the sequence. As a reference, a comparison of the amino acid residue sequence of the 3a NTD between SARS-CoV-2 and SARS-CoV is also shown. All covariant residues identified in the pan-sarbecovirus analysis are colored blue.
  • FIG. 17D discloses SEQ ID NOS 75 and 75, respectively, in order of appearance
  • FIG. 18A is a schematic representation showing covariant residue-enriched regions of 3a and Spike labeled as Part A, Part B, and Part C.
  • FIG.18B is a schematic representation showing various covariant interactions between and within Part A, Part B, and Part C (FIG. 18A) based on both pan-sarbecovirus and clinical covariance.
  • FIG. 18C is a schematic representation of some exemplary antigens for producing antibodies that disrupt interactions between Parts A, B, and C in FIG.18B.
  • FIG. 18D is a schematic representation of exemplary monomeric and trimeric antigen (fusion proteins) that are either purified or instead expressed or encoded in DNA or mRNA.
  • Viruses are small infectious agents which generally contain a nucleic acid core and a protein coat, but are not independently living organisms. Viruses can also take the form of infectious nucleic acids lacking a protein. A virus cannot replicate in the absence of a living host cell. Viruses enter specific living cells either by endocytosis or direct injection of DNA and multiply, causing disease. The multiplied virus can then be released and infect additional cells. Some viruses are DNA-containing viruses and others are RNA-containing viruses.
  • viruses that have been found to infect humans include but are not limited to: Coronaviridae (e.g., coronaviruses); Retroviridae (e.g., human immunodeficiency viruses, such as HIV-1 (also referred to as HTLV-III, LAV or HTLV-III/LAV, or HIV-III; and other isolates, such as HIV-LP); Picornaviridae (e.g., polio viruses, hepatitis A virus; enteroviruses, human Coxsackie viruses, rhinoviruses, echoviruses); Caliciviridae (e.g., strains that cause gastroenteritis); Togaviridae (e.g., equine encephalitis viruses, rubella viruses); Flaviviridae (e.g., dengue viruses, encephalitis viruses, yellow fever viruses); Hepaciviruses (hepatitis C viruses); Rhabdoviridae (e.g., ve
  • compositions and methods provided herein are not limited to provoking an immune response to a virus but can also be used to provoke an immune response to other microorganisms (e.g., bacteria, fungi, parasites, etc.) as well.
  • Other medically relevant microorganisms have been described extensively in the literature, e.g., see Murray et al. Medical Microbiology, 9th ed., published March 10, 2020, (eBook ISBN: 9780323674515); Tortora et al. Microbiology: An Introduction, Pearson; 13th edition (January 8, 2018); Topley & Wilson’s Microbiology and Microbial Infections, 10th edition, John Wiley & Sons, Ltd.
  • “infection” or “infection of a host” or “infectious disease” or “microbial infection” refers to the growth, proliferation, spread, and/or presence of a microorganism in a subject. In some cases, the infection can elicit an immune response by the host that leads to symptoms associated with a disease. The infection can be transmitted from one subject to another by contact, contact with aerosolized liquid droplets (coughing, sneezing, etc.), contaminated needles, contaminated bodily fluids, or via sexual transmission.
  • the infection can be characterized by at least one symptom of a disease, such as pain, increased mucosal secretions, coughing, headaches, abnormalities of the skin, fever, sore throat, swollen lymph nodes, hair loss, muscle aches, sores, or any other symptom associated with an infection.
  • Exemplary infections or infectious diseases include but are not limited to: SARS, COVID19, coronavirus infections, acquired immune deficiency syndrome (AIDS), hepatitis, candidiasis, human papillomavirus (HPV) infection, herpes, influenza, pneumonia, ear infections, the common cold, chicken pox, cat scratch disease, rabies, adenovirus, bronchiolitis, croup, encephalitis, fifth disease, hand foot and mouth disease, impetigo, botulism, listeria infection, MRSA infection, measles, meningitis, mumps, polio, Rocky Mountain Spotted Fever, shingles, sinusitis, staph infections, tetanus, toxic shock syndrome, urinary tract infections, warts, whooping cough, Zika virus infections, or any other infection caused by a microorganism known in the art.
  • “coronavirus” refers to RNA viruses that are distinguished from other RNA viruses by an intracellular budding site. Coronaviruses are also characterized by their petal-shaped spikes. The spikes are oligomers of the 180–200 kDa S glycoprotein, which binds to receptor glycoproteins and induces fusion of the viral envelope with cell membranes and, sometimes, cell–cell fusion with a host cell (e.g., a human cell).
  • The basic structure and genome organization of coronaviruses are known in the art and described, e.g., by Payne S. Family Coronaviridae. Viruses. 2017;149-158. doi:10.1016/B978-0-12-803109-4.00017-9, which is incorporated herein by reference in its entirety. Several medications for treating an infection (e.g., a viral infection) or preventing a severe infection have been developed.
  • Treatments for infections can include (i) vaccines comprising inactivated virus or bacterial cells, (ii) a live attenuated vaccine containing genetically manipulated viruses, (iii) fusion polypeptides, (iv) vaccine compositions comprising nucleic acids that promote an immune response to a pathogen, and (v) antibiotics and antiviral medications administered following infection.
  • the compositions and methods provided herein can be used to prevent a viral infection in a subject.
  • the subject can be administered the fusion polypeptide or vaccine compositions provided herein to provoke an immune response in the subject.
  • Fusion polypeptide compositions: provided herein are fusion polypeptide compositions for use in treating and preventing an infection.
  • a polypeptide (e.g., a fusion polypeptide) comprising: a first domain comprising a first viral polypeptide or a fragment thereof expressed by a first virus.
  • the first viral polypeptide or fragment thereof comprises at least two or more covarying amino acid positions.
  • the first domain comprises at least two amino acid residue positions that covary with each other.
  • a fusion polypeptide comprising: (a) a first domain comprising a first viral polypeptide or a fragment thereof expressed by a first virus; and (b) a second domain comprising a second viral polypeptide or a fragment thereof expressed by the first virus.
  • the first viral polypeptide or a fragment thereof comprises at least one amino acid position that covaries with at least one amino acid position in the second viral polypeptide or a fragment thereof.
  • the first and second domain each comprises at least one amino acid position that covaries with at least one amino acid position in the other domain.
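To illustrate what it means for two amino acid positions to covary, the sketch below scores a pair of alignment columns by mutual information. This is a generic, widely used stand-in and is not the tandem model referred to in this document, which is not specified in this section. The function names and the toy alignment are hypothetical.

```python
from collections import Counter
from math import log2


def column(alignment, index):
    """Return the residues found at one 0-based alignment position."""
    return [sequence[index] for sequence in alignment]


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns.

    Values near 0 mean the residues at the two positions vary independently;
    larger values mean that knowing one residue helps predict the other.
    """
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    score = 0.0
    for (a, b), joint in count_ab.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts.
        score += (joint / n) * log2(joint * n / (count_a[a] * count_b[b]))
    return score


if __name__ == "__main__":
    # Toy alignment: positions 0 and 1 change together, position 2 is constant.
    toy_alignment = ["AKC", "AKC", "GRC", "GRC"]
    print(mutual_information(column(toy_alignment, 0), column(toy_alignment, 1)))
    print(mutual_information(column(toy_alignment, 0), column(toy_alignment, 2)))
```

The same scoring applies whether the two columns come from one polypeptide (positions within a single domain) or from two different polypeptides of the same virus, provided the rows of the two alignments refer to the same genomes.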
  • a fusion polypeptide comprising: at least one viral polypeptide or a fragment thereof that is derived from a first pathogenic virus that infects a human subject, wherein the viral polypeptide or fragment thereof comprises at least two or more covarying amino acid sites when compared to a viral polypeptide expressed by one or more of a virus that infects a non-human subject, and/or a viral polypeptide expressed by a different pathogenic virus that infects a human subject.
  • the fusion polypeptide provided herein further comprises a second domain comprising a second viral polypeptide or a fragment thereof expressed by the first virus, optionally, the second viral polypeptide or a fragment thereof comprises at least one amino acid position that covaries with at least one amino acid position in the first viral polypeptide or a fragment thereof.
  • the second viral polypeptide or fragment thereof comprises at least two or more amino acid positions that covary with each other.
  • the fusion polypeptide further comprises a third domain comprising a third viral polypeptide or a fragment thereof expressed by the first virus.
  • the third viral polypeptide or a fragment thereof comprises at least one amino acid position that covaries with at least one amino acid position in the first and/or second viral polypeptide or a fragment thereof.
  • the third viral polypeptide or fragment thereof comprises at least two or more amino acid positions that covary with each other.
  • the fusion polypeptide further comprises a fourth domain comprising an amino acid sequence of the first, second or third domain.
  • the covarying amino acid positions are determined using a correlating tandem model; optionally, the tandem model purity threshold is a level greater than or equal to 0.80. In some embodiments, the tandem model purity threshold is 0.80 or more, 0.85 or more, 0.90 or more, 0.95 or more, or 0.99 or more. In some embodiments of any of the aspects, the covarying amino acid positions are relative to a viral polypeptide expressed by a second virus. In some embodiments of any of the aspects, the first virus is capable of infecting a human host and the second virus is capable of infecting a non-human host or the second virus is a different virus capable of infecting a human host.
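The purity thresholds listed above act as a cutoff on candidate covarying pairs. The sketch below shows only that filtering step, assuming each candidate pair already has a purity score; how the tandem model computes purity is not described in this section, so the scores, pair labels, and function name here are hypothetical.

```python
# Threshold levels named in the text.
PURITY_THRESHOLDS = (0.80, 0.85, 0.90, 0.95, 0.99)


def filter_by_purity(pair_purity, threshold=0.80):
    """Keep candidate pairs whose purity score is at or above the threshold.

    pair_purity maps a pair label to a precomputed purity score
    (hypothetical input; the scoring itself is outside this sketch).
    """
    return {pair: score for pair, score in pair_purity.items() if score >= threshold}


if __name__ == "__main__":
    # Made-up scores for illustration only.
    candidates = {"pair_1": 0.97, "pair_2": 0.83, "pair_3": 0.61}
    for level in PURITY_THRESHOLDS:
        print(level, sorted(filter_by_purity(candidates, level)))
```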
  • the first and/or second virus are from the same virus family. Accordingly, in some embodiments of any one of the aspects, the first and second virus independently are from a virus family selected from the group consisting of abyssoviridae, ackermannviridae, adenoviridae, alloherpesviridae, alphaflexiviridae, alphasatellitidae, alphatetraviridae, alvernaviridae, amalgaviridae, amnoonviridae, ampullaviridae, anelloviridae, arenaviridae, arteriviridae, artoviridae, ascoviridae, asfarviridae, aspiviridae, astroviridae, autographiviridae, avsunviroidae, bacilladnaviridae, baculoviridae, barnaviridae
  • the first and/or second virus are from the coronaviridae family.
  • the first and/or second virus are selected from the group consisting of hepadnaviruses, coronaviruses, avian influenza viruses, adenoviruses, herpesviruses, human papillomaviruses, parvoviruses, reoviruses, picornaviruses, flaviviruses, togaviruses, orthomyxoviruses, bunyaviruses, rhabdoviruses, and paramyxoviruses.
  • the first and second virus are from the same virus genus.
  • the first and/or second viruses are independently from the genus selected from the group consisting of alphacoronavirus, betacoronavirus, gammacoronavirus and deltacoronavirus.
  • the first and/or second virus are from the genus betacoronavirus.
  • the first virus is from a first genus and the second virus is from a different genus of the same virus family.
  • the first virus is from the genus betacoronavirus and the second virus is from the genus alphacoronavirus.
  • the first and second virus are from the same virus species.
  • the first and/or second virus are independently selected from the group consisting of Severe acute respiratory syndrome-related coronavirus (SARS-CoV, SARS-CoV-2), Middle East respiratory syndrome-related coronavirus (MERS), Human coronavirus HKU1, Human coronavirus OC43, Bovine Coronavirus, Hedgehog coronavirus 1, Murine coronavirus, Pipistrellus bat coronavirus HKU5, Rousettus bat coronavirus HKU9, and Tylonycteris bat coronavirus HKU4.
  • the first and/or second virus independently are SARS-CoV, SARS-CoV-2, or MERS.
  • the first and/or second virus are SARS-CoV or SARS-CoV2.
  • the first and/or second viruses are independently selected from the group consisting of adeno-associated virus; Aichi virus; astrovirus; Australian bat lyssavirus; BK polyomavirus; Banna virus; Barmah forest virus; Bunyamwera virus; Bunyavirus La Crosse; Bunyavirus snowshoe hare; Cercopithecine herpesvirus; Chandipura virus; Chikungunya virus; Cosavirus A; Cowpox virus; Coxsackie A virus; Coxsackie B virus; Crimean-Congo hemorrhagic fever virus; Dengue virus; Dhori virus; Dugbe virus; Duvenhage virus; Eastern equine encephalitis virus; Ebola
  • louis encephalitis virus Tick-borne powassan virus; Torque teno virus; Toscana virus; Uukuniemi virus; Vaccinia virus; Varicella-zoster virus; Variola virus; Venezuelan equine encephalitis virus; Vesicular stomatitis virus; Western equine encephalitis virus; WU polyomavirus; West Nile virus; Yaba monkey tumor virus; Yaba-like disease virus; Yellow fever.
  • the first and second virus are capable of infecting the same host species.
  • the first and second virus are capable of infecting humans.
  • the first and second virus infect different host species.
  • the first virus infects a human host and the second virus infects a non-human host.
  • the non-human host can be selected from the group consisting of: a bat, a pangolin, a civet, an insect, a non-human primate, a rodent, a bovine, a bird, and an alpaca.
  • the first and second viruses are different isolates of the same virus species.
  • the viral polypeptide or fragment thereof is selected from Table 1.
  • SARS-COV2 antigens and fragments thereof comprising at least two covarying amino acid sites with a coronavirus that infects a nonhuman organism.
  • the virus that infects a non-human subject or the different pathogenic virus that infects a human subject is selected from the group consisting of: Table 2.
  • the first viral polypeptide or fragment thereof is a coronavirus polypeptide or a fragment thereof.
  • the first viral polypeptide or fragment thereof is selected from the group consisting of: the viroporin 3a protein, a non-structural protein, a 1a/1b polyprotein, a viroporin E, membrane protein (M), and spike (S) protein.
  • the first domain comprises an amino acid sequence of a viroporin 3a protein, a non-structural protein, a 1a/1b polyprotein, a viroporin E, membrane protein (M), spike (S) protein, or a fragment thereof from a coronavirus.
  • the first domain comprises an amino acid sequence of a viroporin 3a protein or fragment thereof.
  • the first domain comprises an amino acid sequence having at least 85% identity to the amino acid sequence of amino acids 1-44 of viroporin 3a protein, e.g., amino acids 1-44 of SEQ ID NO: 2.
  • the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to the amino acid sequence of amino acids 1-44 of viroporin 3a protein, e.g., amino acids 1-44 of SEQ ID NO: 2.
  • the first domain comprises an amino acid sequence having 100% identity to the amino acid sequence of amino acids 1-44 of viroporin 3a protein, e.g., amino acids 1-44 of SEQ ID NO: 2.
  • the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to the amino acid sequence SEQ ID NO: 2.
  • the first domain comprises an amino acid sequence comprising a substitution, mutation or deletion at position 13, 14, 15, 16, 17, 18, 19, 20, 22, 23, 24, 25, 26, 27, 28, 32, 37, 38, 41, 42, 43 or 44 of SEQ ID NO: 2.
  • the first domain comprises an amino acid sequence comprising a substitution, mutation or deletion at position 15, 16, 18, 20, 24, 25, 26, 28, or 38 of SEQ ID NO: 2.
  • the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to an amino acid sequence selected from SEQ ID NO: 20.
  • the first domain comprises an amino acid sequence comprising a substitution, mutation or deletion at position 13, 14, 15, 16, 17, 18, 19, 20, 22, 23, 24, 25, 26, 27, 28, 32, 37, 38, 41, 42, 43 or 44 of SEQ ID NO: 20.
  • the first domain comprises an amino acid sequence comprising a substitution, mutation or deletion at position 15, 16, 18, 20, 24, 25, 26, 28, or 38 of SEQ ID NO: 20.
  • the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to an amino acid sequence selected from SEQ ID NOs: 20-40: 3a_Seq1 WT 3a sequence MDLFMRIFTIGTVTLKQGEIKDATPSDFVRATATIPIQASLPFG (SEQ ID NO: 20), 3a_Seq2 T32F Covariant selected and Removes Glycosylation-site MDLFMRIFTIGTVTLKQGEIKDATPSDFVRAFATIPIQASLPFG (SEQ ID NO: 21), 3a_Seq3 V13I Covariant selected MDLFMRIFTIGTITLKQGEIKDATP
  • the first domain comprises an amino acid sequence having 100% identity to an amino acid sequence selected from SEQ ID NOs: 20-40.
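The substitution labels (e.g., T32F, V13I) and the percent-identity thresholds used above can be checked with a few lines of code. The sketch below applies a single-residue substitution to the wild-type 3a sequence listed as SEQ ID NO: 20 and computes percent identity between two sequences. It assumes equal-length, gap-free sequences compared position by position; identity between sequences of different length would normally be computed after an alignment step, which is not shown. The function names are hypothetical.

```python
# Wild-type 3a sequence as listed for SEQ ID NO: 20 (3a_Seq1).
WT_3A = "MDLFMRIFTIGTVTLKQGEIKDATPSDFVRATATIPIQASLPFG"


def apply_substitution(sequence, mutation):
    """Apply a substitution written as <wild-type><1-based position><new>, e.g. 'T32F'."""
    wild_type, position, replacement = mutation[0], int(mutation[1:-1]), mutation[-1]
    if sequence[position - 1] != wild_type:
        raise ValueError(f"expected {wild_type} at position {position}, "
                         f"found {sequence[position - 1]}")
    return sequence[:position - 1] + replacement + sequence[position:]


def percent_identity(seq_a, seq_b):
    """Percent of positions with identical residues (equal-length sequences only)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length; align them first")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


if __name__ == "__main__":
    t32f = apply_substitution(WT_3A, "T32F")
    print(t32f)
    identity = percent_identity(WT_3A, t32f)
    print(f"{identity:.2f}% identity; meets 85% threshold: {identity >= 85.0}")
```

A single substitution in a 44-residue sequence leaves 43 of 44 positions identical, so such a variant falls within every "at least N% identity" level listed above up to 97%.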
  • the first domain comprises an amino acid sequence having at least 85% identity to the amino acid sequence of amino acids 12-339 of S protein, e.g., amino acids 12-339 of SEQ ID NO: 1.
  • the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to the amino acid sequence of amino acids 12-320 of S protein, e.g., amino acids 12-339 of SEQ ID NO: 1.
  • the second domain comprises an amino acid sequence having 100% identity to the amino acid sequence of amino acids 12-339 of S protein, e.g., amino acids 12-339 of SEQ ID NO: 1.
  • the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to an amino acid sequence of amino acids 1-309 or 1-328 of SEQ ID NO: 41:
  • the first domain comprises an amino acid sequence having a mutation or deletion at position 4, 6, 7, 9, 10, 13, 15, 16, 19, 20, 23, 49, 54, 56, 58, 59, 60, 61, 62, 126, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 150, 163, 165, 172, 191, 229, 230, 231, 232, 233, 243, 244, 246 or 2
  • the first domain comprises an amino acid sequence of amino acids 1-309 of SEQ ID NO: 41 having a mutation or deletion at position 4, 6, 7, 9, 10, 13, 15, 16, 19, 20, 23, 49, 54, 56, 58, 59, 60, 61, 62, 126, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 150, 163, 165, 172, 191, 229, 230, 231, 232, 233, 243, 244, 246 or 248.
  • the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to an amino acid sequence of amino acids 1-310 or 1-329 of SEQ ID NO: 45:
  • the first domain comprises an amino acid sequence having a mutation or deletion at position 5, 7, 8, 10, 11, 14, 16, 17, 20, 21, 24, 50, 55, 57, 59, 60, 61, 62, 63, 127, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 151, 164, 166, 173, 192, 230, 231, 232, 233, 234, 244, 245, 247 and 249 of SEQ ID NO: 45.
  • the first domain comprises an amino acid sequence of amino acids 1-310 of SEQ ID NO: 45 having a mutation or deletion at position 5, 7, 8, 10, 11, 14, 16, 17, 20, 21, 24, 50, 55, 57, 59, 60, 61, 62, 63, 127, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 151, 164, 166, 173, 192, 230, 231, 232, 233, 234, 244, 245, 247 and 249.
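The position list given for SEQ ID NO: 45 appears to be the SEQ ID NO: 41 list shifted by one residue, just as the ranges 1-309 and 1-328 become 1-310 and 1-329. The sketch below checks only that arithmetic relationship. The source does not state the reason for the offset, so no biological explanation is assumed here, and the variable names are illustrative.

```python
# Positions listed for SEQ ID NO: 41 (amino acids 1-309).
POSITIONS_SEQ_41 = [
    4, 6, 7, 9, 10, 13, 15, 16, 19, 20, 23, 49, 54, 56, 58, 59, 60, 61, 62,
    126, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 150, 163, 165,
    172, 191, 229, 230, 231, 232, 233, 243, 244, 246, 248,
]

# Positions listed for SEQ ID NO: 45 (amino acids 1-310).
POSITIONS_SEQ_45 = [
    5, 7, 8, 10, 11, 14, 16, 17, 20, 21, 24, 50, 55, 57, 59, 60, 61, 62, 63,
    127, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 151, 164, 166,
    173, 192, 230, 231, 232, 233, 234, 244, 245, 247, 249,
]

# Shift every SEQ ID NO: 41 position by one and compare with the second list.
shifted = [p + 1 for p in POSITIONS_SEQ_41]
offset_is_one = shifted == POSITIONS_SEQ_45
```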
  • Second domain
  • the second viral polypeptide or fragment thereof is a corona virus polypeptide or a fragment thereof.
  • the second viral polypeptide or fragment thereof is selected from the group consisting of: a viroporin 3a protein, a non-structural protein, a 1a/1b polyprotein, a viroporin E, membrane protein (M), and spike (S) protein.
  • the second domain comprises an amino acid sequence of a viroporin 3a protein, a non-structural protein, a 1a/1b polyprotein, a viroporin E, membrane protein (M), spike (S) protein, or a fragment thereof from a corona virus.
  • the second domain comprises an amino acid sequence of S protein or fragment thereof.
  • the second domain comprises an amino acid sequence having at least 85% identity to the amino acid sequence of amino acids 12-339 of S protein, e.g., amino acids 12-339 of SEQ ID NO: 1.
  • the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to the amino acid sequence of amino acids 12-320 of S protein, e.g., amino acids 12-339 of SEQ ID NO: 1.
  • the second domain comprises an amino acid sequence having 100% identity to the amino acid sequence of amino acids 12-339 of S protein, e.g., amino acids 12-339 of SEQ ID NO: 1.
  • the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to an amino acid sequence of amino acids 1-309 or 1-328 of SEQ ID NO: 41:
  • the second domain comprises an amino acid sequence having a mutation or deletion at position 4, 6, 7, 9, 10, 13, 15, 16, 19, 20, 23, 49, 54, 56, 58, 59, 60, 61, 62, 126, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 150, 163, 165, 172, 191, 229, 230, 231, 232, 233, 243, 244, 246 or 248 of SEQ ID NO: 41.
  • the second domain comprises an amino acid sequence of amino acids 1-309 of SEQ ID NO: 41 having a mutation or deletion at position 4, 6, 7, 9, 10, 13, 15, 16, 19, 20, 23, 49, 54, 56, 58, 59, 60, 61, 62, 126, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 150, 163, 165, 172, 191, 229, 230, 231, 232, 233, 243, 244, 246 or 248.
  • the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to an amino acid sequence of amino acids 1-310 or 1-329 of SEQ ID NO: 45:
  • the first domain comprises an amino acid sequence having a mutation or deletion at position 5, 7, 8, 10, 11, 14, 16, 17, 20, 21, 24, 50, 55, 57, 59, 60, 61, 62, 63, 127, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 151, 164, 166, 173, 192, 230, 231, 232, 233, 234, 244, 245, 247 and 249 of SEQ ID NO: 45.
  • the first domain comprises an amino acid sequence of amino acids 1-310 of SEQ ID NO: 45 having a mutation or deletion at position 5, 7, 8, 10, 11, 14, 16, 17, 20, 21, 24, 50, 55, 57, 59, 60, 61, 62, 63, 127, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 151, 164, 166, 173, 192, 230, 231, 232, 233, 234, 244, 245, 247 and 249.
  • the second domain comprises an amino acid sequence having at least 85% identity to the amino acid sequence of amino acids 591-700, 514-714, 514-794 or 514-890 of S protein, e.g., SEQ ID NO: 1.
  • the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to the amino acid sequence of amino acids 591-700, 514-714, 514-794 or 514-890 of S protein, e.g., SEQ ID NO: 1.
  • the second domain comprises an amino acid sequence having 100% identity to the amino acid sequence of amino acids 591-700, 514-714, 514-794 or 514-890 of S protein, e.g., SEQ ID NO: 1.
  • the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to an amino acid sequence of amino acids 1-377, 1-281, 1-201 or 78-187 of SEQ ID NO: 42, 43 and 60-65:
  • the second domain comprises an amino acid sequence having at least one mutation or deletion at position 19, 41, 127, 164 or 175 of SEQ ID NO: 42, 43 or 60-65.
  • the second domain comprises the amino acid sequence of amino acids 1-281 of SEQ ID NO: 42, 43 or 60-65 having at least one mutation or deletion at position 19, 41, 127, 164 or 175.
  • the second domain comprises the amino acid sequence of amino acids 1-201 of SEQ ID NO: 42, 43 or 60-65 having at least one mutation or deletion at position 19, 41, 127, 164 or 175.
  • the second domain comprises the amino acid sequence of amino acids 78-187 of SEQ ID NO: 42, 43 or 60-65 having at least one mutation or deletion at position 19, 41, 127, 164 or 175.
  • the second domain comprises an amino acid sequence having 100% identity to an amino acid sequence of amino acids 1-377, 1-281, 1-201 or 78-187 of SEQ ID NO: 42, 43 or 60-65.
  • Third domain
  • the third viral polypeptide or fragment thereof is a corona virus polypeptide or a fragment thereof.
  • the third viral polypeptide or fragment thereof is selected from the group consisting of: a viroporin 3a protein, a non-structural protein, a 1a/1b polyprotein, a viroporin E, membrane protein (M), and spike (S) protein.
  • the third domain comprises an amino acid sequence of a viroporin 3a protein, a non-structural protein, a 1a/1b polyprotein, a viroporin E, membrane protein (M), spike (S) protein, or a fragment thereof from a corona virus.
  • the third domain comprises an amino acid sequence of S protein or fragment thereof.
  • the third domain comprises an amino acid sequence having at least 85% identity to the amino acid sequence of amino acids 591-700, 514-714, 514-794 or 514-890 of S protein, e.g., amino acids 591-700, 514-714, 514-794 or 514-890 of SEQ ID NO: 1.
  • the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to the amino acid sequence of amino acids 591-700, 514-714, 514-794 or 514-890 of S protein, e.g., amino acids 591-700, 514-714, 514-794 or 514-890 of SEQ ID NO: 1.
  • the third domain comprises an amino acid sequence having 100% identity to the amino acid sequence of amino acids 591-700, 514-714, 514-794 or 514-890 of S protein, e.g., amino acids 591-700, 514-714, 514-794 or 514-890 of SEQ ID NO: 1.
  • the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to an amino acid sequence of amino acids 1-377, 1-281, 1-201 or 78-187 of SEQ ID NO: 42, 43 or 60-65.
  • the third domain comprises an amino acid sequence having at least one mutation or deletion at position 19, 41, 127, 164 or 175 of SEQ ID NO: 42, 43 or 60-65.
  • the third domain comprises the amino acid sequence of amino acids 1-281 of SEQ ID NO: 42, 43 or 60-65 having at least one mutation or deletion at position 19, 41, 127, 164 or 175.
  • the third domain comprises the amino acid sequence of amino acids 1-201 of SEQ ID NO: 42, 43 or 60-65 having at least one mutation or deletion at position 19, 41, 127, 164 or 175.
  • the third domain comprises the amino acid sequence of amino acids 78-187 of SEQ ID NO: 42, 43 or 60-65 having at least one mutation or deletion at position 19, 41, 127, 164 or 175.
  • the third domain comprises an amino acid sequence having 100% identity to an amino acid sequence of amino acids 1-377, 1-281, 1-201 or 78-187 of SEQ ID NO: 42, 43 or 60-65.
  • the third domain comprises an amino acid sequence having at least 85% identity to the amino acid sequence of amino acids 1-44 of viroporin 3a protein, e.g., amino acids 1-44 of SEQ ID NO: 2.
  • the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to the amino acid sequence of amino acids 1-44 of viroporin 3a protein, e.g., amino acids 1-44 of SEQ ID NO: 2.
  • the third domain comprises an amino acid sequence having 100% identity to the amino acid sequence of amino acids 1-44 of viroporin 3a protein, e.g., amino acids 1-44 of SEQ ID NO: 2.
  • the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to the amino acid sequence SEQ ID NO: 2.
  • the third domain comprises an amino acid sequence comprising a substitution, mutation or deletion at position 13, 14, 15, 16, 17, 18, 19, 20, 22, 23, 24, 25, 26, 27, 28, 32, 37, 38, 41, 42, 43 or 44 of SEQ ID NO: 2.
  • the third domain comprises an amino acid sequence comprising a substitution, mutation or deletion at position 15, 16, 18, 20, 24, 25, 26, 28, or 38 of SEQ ID NO: 2.
  • the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to an amino acid sequence selected from SEQ ID NO: 20.
  • the third domain comprises an amino acid sequence having 100% identity to SEQ ID NO: 20.
  • the third domain comprises an amino acid sequence comprising a substitution, mutation or deletion at position 13, 14, 15, 16, 17, 18, 19, 20, 22, 23, 24, 25, 26, 27, 28, 32, 37, 38, 41, 42, 43 or 44 of SEQ ID NO: 20.
  • the third domain comprises an amino acid sequence comprising a substitution, mutation or deletion at position 15, 16, 18, 20, 24, 25, 26, 28, or 38 of SEQ ID NO: 20.
  • the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to an amino acid sequence selected from SEQ ID NOs: 20-40.
  • the third domain comprises an amino acid sequence having 100% identity to an amino acid sequence selected from SEQ ID NOs: 20-40.
  • the fusion polypeptide comprises a fourth domain.
  • the fourth domain comprises an amino acid sequence of the first, second or third domain.
  • the domains at the N-terminus and C-terminus comprise amino acid sequences having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to each other.
  • the domains at the N-terminus and C-terminus comprise amino acid sequences having 100% identity to each other.
  • the fourth domain comprises an amino acid sequence having at least 85% identity to the amino acid sequence of amino acids 1-44 of viroporin 3a protein, e.g., amino acids 1-44 of SEQ ID NO: 2.
  • the fourth domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to the amino acid sequence of amino acids 1-44 of viroporin 3a protein, e.g., amino acids 1-44 of SEQ ID NO: 2.
  • the fourth domain comprises an amino acid sequence having 100% identity to the amino acid sequence of amino acids 1-44 of viroporin 3a protein, e.g., amino acids 1-44 of SEQ ID NO: 2.
  • the fourth domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to an amino acid sequence selected from SEQ ID NOs: 20-43 and 60-65.
  • the fourth domain comprises an amino acid sequence having 100% identity to an amino acid sequence selected from SEQ ID NOs: 20-43 and 60-65.
  • the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 20
  • the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 41
  • the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 42.
  • the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 41
  • the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 42
  • the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 20.
  • the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 20
  • the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 41
  • the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 43.
  • the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 41
  • the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 43
  • the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 20.
  • the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 20
  • the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 41
  • the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 60.
  • the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 41
  • the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 60
  • the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 20.
  • the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 20
  • the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 41
  • the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 61.
  • the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 41
  • the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 61
  • the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 20.
  • the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 20
  • the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 41
  • the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 62.
  • the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 41
  • the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 62
  • the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 20.
  • the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 20
  • the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 41
  • the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 63.
  • the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 41
  • the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 63
  • the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 20.
  • the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 20
  • the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 41
  • the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 64.
  • the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 41
  • the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 64
  • the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 20.
  • the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 20
  • the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 41
  • the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 65.
  • the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 41
  • the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 65
  • the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 20.
  • the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 20
  • the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 45
  • the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 42.
  • the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 45
  • the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 42
  • the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 20.
  • the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 20
  • the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 45
  • the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 43.
  • the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 45
  • the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 43
  • the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 20.
  • the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 20
  • the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 45
  • the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 60.
  • the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 45
  • the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 60
  • the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 20.
  • the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 20
  • the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 45
  • the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 61.
  • the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 45
  • the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 61
  • the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 20.
  • the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 20
  • the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 45
  • the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 62.
  • the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 45
  • the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 62
  • the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 20.
  • the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 20
  • the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 45
  • the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 63.
  • the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 45
  • the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 63
  • the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 20.
  • the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 20
  • the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 45
  • the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 64.
  • the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 45
  • the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 64
  • the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 20.
  • the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 20
  • the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 45
  • the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 65.
  • the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 45
  • the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 65
  • the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 20.
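Read in groups of three (first, second, third domain), the preceding combinations follow a regular pattern that can be enumerated in a short sketch. It assumes the variable set is SEQ ID NOs: 42, 43 and 60-65, and that each group of three bullets describes one fusion polypeptide. The names `FIXED_ID`, `PAIRED_IDS` and `VARIABLE_IDS` are illustrative, not from the source.

```python
# SEQ ID NO used either as the first or as the third domain.
FIXED_ID = 20
# SEQ ID NOs paired with the fixed sequence.
PAIRED_IDS = (41, 45)
# SEQ ID NOs cycled through in the remaining domain (assumed: 42, 43, 60-65).
VARIABLE_IDS = (42, 43, 60, 61, 62, 63, 64, 65)


def domain_arrangements():
    """Return (first, second, third) SEQ ID NO triples in the order listed."""
    triples = []
    for paired in PAIRED_IDS:
        for variable in VARIABLE_IDS:
            # Fixed sequence at the N-terminal end.
            triples.append((FIXED_ID, paired, variable))
            # Same components with the fixed sequence at the C-terminal end.
            triples.append((paired, variable, FIXED_ID))
    return triples


arrangements = domain_arrangements()
```

Each triple only names the reference sequence for a domain. The identity threshold (at least 85% through 99%) applies to each domain independently.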
  • an amino acid sequence described herein comprises at least one amino acid substitution.
  • the term “conservative substitution,” or “substitution” or “substituted” when describing a polypeptide refers to a change in the amino acid composition of the polypeptide that does not substantially alter the polypeptide's activity. For example, a conservative substitution refers to substituting an amino acid residue for a different amino acid residue that has similar chemical properties.
  • Conservative amino acid substitutions include replacement of a leucine with an isoleucine or valine, an aspartate with a glutamate, or a threonine with a serine.
  • “Conservative amino acid substitutions” result from replacing one amino acid with another having similar structural and/or chemical properties, such as the replacement of a leucine with an isoleucine or valine, an aspartate with a glutamate, or a threonine with a serine.
  • a “conservative substitution” of a particular amino acid sequence refers to substitution of those amino acids that are not critical for polypeptide activity or substitution of amino acids with other amino acids having similar properties (e.g., acidic, basic, positively or negatively charged, polar or non-polar, etc.) such that the substitution of even critical amino acids does not substantially alter activity.
  • Conservative substitution tables providing functionally similar amino acids are well known in the art.
  • the following six groups each contain amino acids that are conservative substitutions for one another: 1) Alanine (A), Serine (S), Threonine (T); 2) Aspartic acid (D), Glutamic acid (E); 3) Asparagine (N), Glutamine (Q); 4) Arginine (R), Lysine (K); 5) Isoleucine (I), Leucine (L), Methionine (M), Valine (V); and 6) Phenylalanine (F), Tyrosine (Y), Tryptophan (W). (See also Creighton, Proteins, W. H.
  • Naturally occurring residues can be divided into groups based on common side-chain properties: (1) hydrophobic: Norleucine, Met, Ala, Val, Leu, Ile; (2) neutral hydrophilic: Cys, Ser, Thr, Asn, Gln; (3) acidic: Asp, Glu; (4) basic: His, Lys, Arg; (5) residues that influence chain orientation: Gly, Pro; (6) aromatic: Trp, Tyr, Phe.
  • Non-conservative substitutions will entail exchanging a member of one of these classes for another class.
  • Particular conservative substitutions include, for example: Ala into Gly or into Ser; Arg into Lys; Asn into Gln or into His; Asp into Glu; Cys into Ser; Gln into Asn; Glu into Asp; Gly into Ala or into Pro; His into Asn or into Gln; Ile into Leu or into Val; Leu into Ile or into Val; Lys into Arg, into Gln or into Glu; Met into Leu, into Tyr or into Ile; Phe into Met, into Leu or into Tyr; Ser into Thr; Thr into Ser; Trp into Tyr; Tyr into Trp; and/or Phe into Val, into Ile or into Leu.
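The six-group table given earlier (A/S/T; D/E; N/Q; R/K; I/L/M/V; F/Y/W) can be encoded as a simple lookup. This is a minimal sketch and the function name `is_conservative` is hypothetical. It covers only that six-group table, so pairs that appear only in the longer pairwise list (for example Ala into Gly) are reported as non-conservative here.

```python
# The six groups of mutually conservative residues, in one-letter code.
CONSERVATIVE_GROUPS = [
    set("AST"),   # Alanine, Serine, Threonine
    set("DE"),    # Aspartic acid, Glutamic acid
    set("NQ"),    # Asparagine, Glutamine
    set("RK"),    # Arginine, Lysine
    set("ILMV"),  # Isoleucine, Leucine, Methionine, Valine
    set("FYW"),   # Phenylalanine, Tyrosine, Tryptophan
]


def is_conservative(original: str, replacement: str) -> bool:
    """True if both residues fall in the same group of the six-group table."""
    a, b = original.upper(), replacement.upper()
    if a == b:
        return False  # identical residues are not a substitution
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


v13i_is_conservative = is_conservative("V", "I")
t32f_is_conservative = is_conservative("T", "F")
```

By this table, V13I stays within one group, whereas T32F crosses groups.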
  • a fusion polypeptide or fragment thereof as described herein can be a variant of a polypeptide provided herein.
  • the variant is a conservatively modified variant.
  • Conservative substitution variants can be obtained by mutations of native nucleotide sequences, for example.
  • a “variant,” as referred to herein, is a polypeptide substantially homologous to a native or reference polypeptide, but which has an amino acid sequence different from that of the native or reference polypeptide because of one or a plurality of deletions, insertions or substitutions.
  • Variant polypeptide-encoding DNA sequences encompass sequences that comprise one or more additions, deletions, or substitutions of nucleotides when compared to a native or reference DNA sequence, but that encode a variant protein or fragment thereof that retains activity of the non-variant polypeptide.
  • a wide variety of PCR-based site-specific mutagenesis approaches are known in the art and can be applied by the ordinarily skilled artisan.
  • a variant amino acid or nucleic acid sequence can be at least 80%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, identical to a native or reference sequence (e.g.
  • the degree of homology (percent identity) between a native and a mutant sequence can be determined, for example, by comparing the two sequences using freely available computer programs commonly employed for this purpose on the world wide web (e.g. BLASTp or BLASTn with default settings).
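As an illustration of what a percent identity figure means, the following minimal sketch computes identity over a pairwise alignment that has already been produced (gaps written as `-`). This is an assumption-laden simplification: it is not how BLASTp or BLASTn compute their reported values, and tools differ in the denominator they use. The function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns holding the same residue in both sequences.

    Assumes the inputs are two rows of one pairwise alignment (equal length,
    '-' for gaps). Columns that are gaps in both rows are ignored; a gap
    opposite a residue counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must be the same length")
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b) if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def meets_identity_threshold(aligned_a: str, aligned_b: str, threshold: float = 80.0) -> bool:
    """True if the pair is at least `threshold` percent identical."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

With this definition, a ten-residue variant carrying a single substitution scores 90% and would satisfy an "at least 90%" criterion but not "at least 91%".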
  • Alterations of the native amino acid sequence can be accomplished by any of a number of techniques known in the art. Mutations can be introduced, for example, at particular loci by synthesizing oligonucleotides containing a mutant sequence, flanked by restriction sites permitting ligation to fragments of the native sequence. Following ligation, the resulting reconstructed sequence encodes an analog having the desired amino acid insertion, substitution, or deletion.
  • oligonucleotide-directed site-specific mutagenesis procedures can be employed to provide an altered nucleotide sequence having particular codons altered according to the substitution, deletion, or insertion required.
  • Techniques for making such alterations are well established and include, for example, those disclosed by Walder et al. (Gene 42:133, 1986); Bauer et al. (Gene 37:73, 1985); Craik (BioTechniques, January 1985, 12-19); Smith et al. (Genetic Engineering: Principles and Methods, Plenum Press, 1981); and U.S. Pat. Nos. 4,518,584 and 4,737,462, which are incorporated herein by reference in their entireties.
  • any cysteine residue not involved in maintaining the proper conformation of a polypeptide also can be substituted, generally with serine, to improve the oxidative stability of the molecule and prevent aberrant crosslinking.
  • cysteine bond(s) can be added to a polypeptide to improve its stability or facilitate oligomerization.
  • Linkers
  • the domains of the fusion polypeptide provided herein are linked by a linker.
  • the first and second domains are linked by a linker
  • the second and third domain are linked by a linker
  • the third and fourth domain are linked by a linker.
  • linker means a molecular moiety that connects two parts of a composition.
  • the linker can be a chemical linker, a single peptide bond (e.g., linked directly to each other) or a peptide linker containing one or more amino acid residues (e.g., with an intervening amino acid or amino acid sequence between the two domains that are linked to each other).
  • the linker is a flexible linker.
  • a “flexible linker” is a linker which does not have a fixed structure (secondary or tertiary structure) in solution and is therefore free to adopt a variety of conformations.
  • a flexible linker has a plurality of freely rotating bonds along its backbone.
  • a rigid linker is a linker which adopts a relatively well-defined conformation when in solution.
  • Rigid linkers are therefore those which have a particular secondary and/or tertiary structure in solution.
  • two domains that are linked together can be separated from each other by any desired distance.
  • the linker can be of any desired length.
  • certain linker lengths are relatively better for using the fusion protein in methods for detecting target nucleic acids. Accordingly, in some embodiments of the various aspects described herein, the linker is from about 10 Å to about 140 Å in length.
  • the linker can be from about 10 Å to about 130 Å in length, from about 15 Å to about 125 Å in length, from about 20 Å to about 120 Å in length, from about 25 Å to about 115 Å in length, from about 30 Å to about 110 Å in length, from about 35 Å to about 105 Å in length, from about 40 Å to about 100 Å in length, from about 45 Å to about 95 Å in length, from about 50 Å to about 90 Å in length, or from about 55 Å to about 85 Å in length.
  • the linker can be from about 60 Å to about 80 Å in length or from about 65 Å to about 75 Å in length.
  • the linker is about 70 Å in length.
  • at least two of the domains are linked via a peptide linker.
  • peptide linker denotes a peptide with amino acid sequences, which is in some embodiments of synthetic origin. It is noted that peptide linkers may affect folding of a given fusion protein, and may also react/bind with other proteins, and these properties can be screened for by known techniques.
  • a peptide linker can comprise 1 amino acid or more, 5 amino acids or more, 10 amino acids or more, 15 amino acids or more, 20 amino acids or more, 25 amino acids or more, 30 amino acids or more, 35 amino acids or more, 40 amino acids or more, 45 amino acids or more, 50 amino acids or more and beyond.
  • a peptide linker can comprise less than 50 amino acids, less than 45 amino acids, less than 40 amino acids, less than 35 amino acids, less than 30 amino acids, less than 25 amino acids, less than 20 amino acids, less than 15 amino acids or less than 10 amino acids.
  • the peptide linker comprises from about 5 amino acids to about 50 amino acids.
  • the peptide linker can comprise from about 5 amino acids to about 45 amino acids, from about 5 amino acids to about 40 amino acids, from about 5 amino acids to about 35 amino acids, from about 10 amino acids to 30 amino acids, or from about 15 amino acids to about 25 amino acids.
  • the linker comprises 15, 16, 17, 18, 19, 20, 21, 22, 23, 24 or 25 amino acids.
  • the linker comprises 17, 18, 19, 20, 21, 22 or 23 amino acids.
  • the linker comprises 18, 19, 20, 21 or 22 amino acids. More preferably, the linker comprises 19, 20 or 21 amino acids.
  • the linker comprises 20 amino acids.
  • Exemplary peptide linkers include those that consist of glycine and serine residues, the so-called Gly-Ser polypeptide linkers.
  • Gly-Ser polypeptide linker refers to a peptide that consists of glycine and serine residues.
  • the peptide linker comprises the amino acid sequence (GlyxSer)n (SEQ ID NO: 66), where x is 2, 3, 4 or 5, and n is 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10. In some embodiments of the various aspects described herein, x is 3 and n is 3, 4, 5 or 6.
  • x is 3 and n is 4 or 5. In some embodiments of the various aspects described herein, x is 4 and n is 3, 4, 5 or 6. In some embodiments of the various aspects described herein, x is 4 and n is 4 or 5. In some embodiments of the various aspects described herein, x is 3 and n is 2. In some embodiments of the various aspects described herein, x is 3 or 4 and n is 1. Peptide linkers may affect folding of a given fusion protein, and may also react/bind with other proteins, and these properties can be screened for by known techniques.
  • Exemplary linkers include a string of histidine residues, e.g., His6 (SEQ ID NO: 67); sequences made up of Ala and Pro, varying the number of Ala-Pro pairs to modulate the flexibility of the linker; and sequences made up of charged amino acid residues, e.g., mixing Glu and Lys. Flexibility can be controlled by the types and numbers of residues in the linker. See, e.g., Perham et al., Biochem. 30:8501 (1991) and Wriggers et al., Biopolymers 80:736 (2005).
  • the linker is (GGGS)n (SEQ ID NO: 68), where n is greater than 2.
  • n is 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20.
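The (GlyxSer)n pattern described above can be written out mechanically. The following is a minimal sketch, assuming one-letter codes; `gly_ser_linker` and `join_domains` are hypothetical helper names, and the domain strings in the usage note are placeholders rather than sequences from the disclosure.

```python
def gly_ser_linker(x: int, n: int) -> str:
    """Build (Gly_x Ser)_n, restricted to the ranges given in the text:
    x is 2, 3, 4 or 5 and n is 1 through 10."""
    if x not in (2, 3, 4, 5):
        raise ValueError("x must be 2, 3, 4 or 5")
    if not 1 <= n <= 10:
        raise ValueError("n must be between 1 and 10")
    return ("G" * x + "S") * n


def join_domains(domains, linker: str) -> str:
    """Concatenate domains N-terminus to C-terminus with the same linker
    between each adjacent pair (an empty linker gives a direct fusion)."""
    return linker.join(domains)
```

For instance, `gly_ser_linker(4, 4)` and `gly_ser_linker(3, 5)` both give 20-residue linkers, and `join_domains(["AAA", "CCC"], gly_ser_linker(3, 1))` yields `"AAAGGGSCCC"` for two placeholder domains.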
  • the linker can be a chemical linker.
  • Chemical linkers can comprise a direct bond or an atom such as oxygen or sulfur, a unit such as NH, C(O), C(O)NH, SO, SO2, SO2NH, or a chain of atoms, such as substituted or unsubstituted C1-C6 alkyl, substituted or unsubstituted C2-C6 alkenyl, substituted or unsubstituted C2-C6 alkynyl, substituted or unsubstituted C6-C12 aryl, substituted or unsubstituted C5-C12 heteroaryl, substituted or unsubstituted C5-C12 heterocyclyl, substituted or unsubstituted C3-C12 cycloalkyl, where one or more methylenes can be interrupted or terminated by O, S, S(O), SO2, NH, or C(O).
  • the domains can be linked to each other in any desired orientation.
  • the first domain can be linked to N-terminus of the second domain.
  • the first domain can be linked to C-terminus of the second domain.
  • the second and the third domains can be linked to each other in any desired orientation.
  • the third domain can be linked to N-terminus of the second domain.
  • the third domain can be linked to C-terminus of the second domain.
  • the third and the fourth domains can be linked to each other in any desired orientation.
  • the fourth domain can be linked to N-terminus of the third domain.
  • the fourth domain can be linked to C-terminus of the third domain.
  • the first domain is linked to the N-terminus of the second domain and the third domain is linked to the C-terminus of the second domain.
  • the fusion polypeptide comprises from N-terminus to C-terminus: first domain, second domain and third domain.
  • a linker can be present between the first and second domain, and/or between the second and third domain.
  • the second domain is linked to the N-terminus of the third domain and the first domain is linked to the C-terminus of the third domain.
  • the fusion polypeptide comprises from N-terminus to C-terminus: second domain, third domain and first domain.
  • a linker can be present between the second and third domain, and/or between the third and first domain.
  • the C-terminal region of the N-terminal domain of the viroporin 3a protein is linked to the N-terminal region of the N-terminal domain of the S protein.
  • the N-terminal domain of the viroporin is separated from the N-terminal domain of the S protein by a peptide linker.
  • the fusion polypeptide comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to an amino acid sequence selected from SEQ ID NOs: 42-55 and 60-63:
  • A. 3aNTD-S fusion to SARS-CoV-2 S NTD (S 614 residue is D) (SEQ ID NOS 44 and 42, respectively);
  • B. 3aNTD-S fusion to Furin Cleavage Domain (S 614 residue is D) (890)([linkage]3a) (SEQ ID NOS 45 and 42, respectively);
  • C. 3aNTD-S fusion to SARS-CoV-2 S NTD (S 614 residue is G) (SEQ ID NOS 46 and 43, respectively);
  • D. 3aNTD-S fusion to Furin Cleavage Domain (S 614 residue is G) (890)([linkage]3a) (SEQ ID NOS 47 and 43, respectively);
  • E. 3aNTD-S fusion to SARS-CoV-2 S NTD;
  • F. 3aNTD-S fusion to Furin Cleavage Domain (S 614 residue is D) I (794)([linkage]3a) (SEQ ID NOS 49 and 60, respectively);
  • G. 3aNTD-S fusion to SARS-CoV-2 S NTD (S 614 residue is G) (794) (SEQ ID NOS 50 and 61, respectively);
  • H. 3aNTD-S fusion to Furin Cleavage Domain (S 614 residue is G) I (794)([linkage]3a) (SEQ ID NOS 51 and 61, respectively);
  • I. 3aNTD-S fusion to SARS-CoV-2 S NTD (S 614 residue is D) (714) (SEQ ID NOS 52 and 62, respectively);
  • J. 3aNTD-S fusion to Furin Cleavage Domain (S 614 residue is D) (714)([linkage]3a) (SEQ ID NOS 53 and 62, respectively);
  • K. 3aNTD-S fusion to SARS-CoV-2 S NTD (S 614 residue is G) (714) (SEQ ID NOS 54 and 63, respectively); and
  • L. 3aNTD-S fusion to Furin Cleavage Domain (S 614 residue is G) (714)([linkage]3a) (SEQ ID NOS 55 and 63, respectively).
  • 3a refers to amino acid 1-44 of a 3a protein, e.g., amino acids 1-44 of SEQ ID NO: 2; [linkage] means a linker, e.g., a linker described herein; and the number in parentheses, e.g., (514), (591), (714), (790) and (890) refers to the start position (e.g., (514) or (591)) or the end position (e.g., (714), (790) or (890)) of the sequence flanked by the numbers in the parentheses in the full length wild-type S protein, e.g., SEQ ID NO: 1.
  • the fusion polypeptide comprises an amino acid sequence having 100% identity to an amino acid sequence selected from SEQ ID NOs: 45-65.
  • the disclosure also provides a polynucleotide encoding a fusion polypeptide described herein.
  • a nucleic acid encoding a fusion polypeptide described herein is comprised in a vector.
  • a nucleic acid sequence encoding a fusion polypeptide is operably linked to a vector.
  • vector refers to a nucleic acid construct designed for delivery to a host cell or for transfer between different host cells.
  • a vector can be viral or non-viral.
  • vector encompasses any genetic element that is capable of replication when associated with the proper control elements and that can transfer gene sequences to cells.
  • a vector can include, but is not limited to, a cloning vector, an expression vector, a plasmid, phage, transposon, cosmid, chromosome, virus, virion, etc.
  • the vector is recombinant, e.g., it comprises sequences originating from at least two different sources.
  • the vector comprises sequences originating from at least two different species. In some embodiments of any of the aspects, the vector comprises sequences originating from at least two different genes, e.g., it comprises a fusion protein or a nucleic acid encoding an expression product which is operably linked to at least one non-native (e.g., heterologous) genetic control element (e.g., a promoter, suppressor, activator, enhancer, response element, or the like).
  • the vector or nucleic acid described herein is codon-optimized, e.g., the native or wild-type sequence of the nucleic acid sequence has been altered or engineered to include alternative codons such that altered or engineered nucleic acid encodes the same polypeptide expression product as the native/wild-type sequence, but will be transcribed and/or translated at an improved efficiency in a desired expression system.
  • the expression system is an organism other than the source of the native/wild-type sequence (or a cell obtained from such organism).
  • the vector and/or nucleic acid sequence described herein is codon-optimized for expression in a mammal or mammalian cell, e.g., a mouse, a murine cell, or a human cell. In some embodiments, the vector and/or nucleic acid sequence described herein is codon-optimized for expression in a human cell. In some embodiments, the vector and/or nucleic acid sequence described herein is codon-optimized for expression in a yeast or yeast cell. In some embodiments, the vector and/or nucleic acid sequence described herein is codon-optimized for expression in a bacterial cell.
  • the vector and/or nucleic acid sequence described herein is codon-optimized for expression in an E. coli cell.
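A codon-optimized sequence must still encode the same polypeptide as the native sequence. The following minimal sketch checks that property by translating both sequences with the standard genetic code; it does not perform the optimization itself, and the function names and example DNA strings are hypothetical.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate an in-frame coding sequence; '*' marks a stop codon."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def encodes_same_polypeptide(native: str, optimized: str) -> bool:
    """True if both coding sequences translate to the same amino acid sequence."""
    return translate(native) == translate(optimized)
```

For example, `ATGGGTTCT` and `ATGGGCAGC` both translate to Met-Gly-Ser, so a swap between them changes codon usage without changing the expression product.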
  • expression vector refers to a vector that directs expression of an RNA or polypeptide from sequences linked to transcriptional regulatory sequences on the vector. The sequences expressed will often, but not necessarily, be heterologous to the cell.
  • An expression vector may comprise additional elements, for example, the expression vector may have two replication systems, thus allowing it to be maintained in two organisms, for example in human cells for expression and in a prokaryotic host for cloning and amplification.
  • the term “viral vector” refers to a nucleic acid vector construct that includes at least one element of viral origin and has the capacity to be packaged into a viral vector particle.
  • the viral vector can contain the nucleic acid encoding a polypeptide as described herein in place of non-essential viral genes.
  • the vector and/or particle may be utilized for the purpose of transferring any nucleic acids into cells either in vitro or in vivo. Numerous forms of viral vectors are known in the art.
  • the vectors described herein can, in some embodiments, be combined with other suitable compositions and therapies. In some embodiments, the vector is episomal.
  • the constructs can be comprised by a superstructure, e.g., nanoparticles, liposomes, vectors, cells, scaffolds, or the like.
  • the disclosure also provides a cell comprising a fusion polypeptide described herein or a polynucleotide encoding the same.
  • the term “cell” refers to a single cell as well as to a population of (i.e., more than one) cells.
  • the cell can be a prokaryotic cell or a eukaryotic cell comprising a polypeptide or polynucleotide described herein.
  • Exemplary cells include, but are not limited to, bacterial cells, yeast cells, plant cells, animal (including insect) cells or human cells.
  • Methods of selecting a viral polypeptide for a fusion polypeptide composition
  • Provided herein is a method of selecting a viral polypeptide or fragment thereof for a therapeutic fusion polypeptide composition that can be administered to a subject prior to exposure to a pathogenic microorganism, during or following an infection. The fusion polypeptide can be administered to promote an immune response in a subject.
  • the method provided herein comprises a covariance analysis that compares the amino acid residues of polypeptides expressed by a first infectious pathogenic microorganism that infects a human subject (e.g., SARS-CoV-2) to those of microorganisms that infect other species (e.g., bats) and/or a different pathogenic microorganism that infects a human subject (e.g., SARS-CoV).
  • MSA Multiple sequence alignment
  • the variables are specific positions within a polypeptide amino acid sequence, for example, two amino acid residues.
  • Analysis of covariance can determine how a genome, translation products, or microorganisms evolve independently or whether they coevolve together. Sequence covariation can be used to detect protein–protein interactions, ligand-receptor bindings, and the folding structure of single proteins. Coevolving amino acid sites are also functionally and structurally important for identifying functional dependency between proteins. See, e.g., Fares MA, Travers SA. A novel method for detecting intramolecular coevolution: adding a further dimension to selective constraints analyses.
  • Covariance analysis is generally based on the following formula: The length (l) of the aligned sequences and number of pairs (N) is denoted. A threshold or purity value for the calculated correlation value (Px(m)y(n)) was set to be exceedingly stringent based on the apparent evolutionary relatedness of the selected set of lineage B betacoronaviruses.
  • “covarying amino acid positions” refers to amino acid residue positions whose residues normally occur together, e.g., an amino acid position of a polypeptide that has evolved, changed, or mutated when aligned with one or more reference sequences.
  • Covariance between two or more amino acid positions is observed when the type of amino acid found at a first amino acid position is dependent on the type of amino acid found at another amino acid position. That is, when one particular amino acid is found at a first position, a second particular amino acid is usually found at the second position.
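The dependence between two alignment positions described above can be illustrated with a simple co-occurrence score. The following sketch is a stand-in, not the formula of the disclosure (which is not reproduced in this text): for each residue pair seen at columns i and j of a multiple sequence alignment, it reports the fraction of sequences carrying either residue that carry both. The function name `pair_purity` and the toy alignment are hypothetical.

```python
from collections import Counter


def pair_purity(msa, i: int, j: int):
    """Score residue pairs at alignment columns i and j (0-based).

    For residue a at column i and residue b at column j the score is
    n(a, b) / (n(a) + n(b) - n(a, b)), which equals 1.0 only when a and b
    always occur together. This is an illustrative measure, assumed here
    for the sketch.
    """
    if len({len(seq) for seq in msa}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    col_i = [seq[i] for seq in msa]
    col_j = [seq[j] for seq in msa]
    pair_counts = Counter(zip(col_i, col_j))
    count_i, count_j = Counter(col_i), Counter(col_j)
    return {
        (a, b): n / (count_i[a] + count_j[b] - n)
        for (a, b), n in pair_counts.items()
    }
```

In the toy alignment `["AKT", "AKS", "GRT", "GRS"]`, columns 0 and 1 covary perfectly (A always with K, G always with R), whereas columns 0 and 2 vary independently and score well below any stringent threshold.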
  • One of skill in the art can determine the optimum assemblage of reference sequences and shared genes among a group of related genomes that can be used for the method of covariance analysis. As the degree of taxonomic relatedness within a group of related genomes can be arbitrary and distinct regions of an individual genome may have been horizontally acquired or lost independent of the measured evolutionary divergence measured in other conserved genes, it is necessary to select an appropriate threshold of relatedness of analyzed genomes and also the selection of conserved genes.
  • Non-limiting examples of reference sequences that can be used include those listed in Table 1 or sequences derived from the microorganisms in Table 2.
  • Once covariant amino acid residues are identified, they can be further binned into pairs and groups (referred to herein as ‘Clusters’) of covarying amino acid residues. These clusters can be organized using a correlating tandem model set at different purity thresholds between 0.8 and 1.0. For example, a stringent purity threshold is 0.96 and will reduce noise based on the sampling size (see, e.g., Table 2).
  • the correlating tandem model refers to a mathematical algorithm that evaluates the degree of association of each element in a covariance analysis (e.g., pairs of amino acid residues) to confirm a reliable pattern of covariance. See, e.g., Shen, W., Li, Y. A novel algorithm for detecting multiple covariance and clustering of biological sequences. Sci Rep 6, 30425 (2016), which is incorporated herein by reference in its entirety.
  • the correlating tandem model provided herein is based on the following formula: Based on the set correlation threshold values, groups of more than three covarying amino acid pairs were binned and transformed into a matrix.
  • mt is the degree of association correlation threshold and anything below a value threshold is removed from the correlation matrix when generating a pattern of covarying residues. This was repeated for all covarying residues found in each row and column in the matrix as to keep binned groups of residues that meet the set criteria. In this entire analysis, we chose a stringent 0.96 purity value (shown as P).
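The thresholding and binning step can be pictured as filtering a table of pairwise scores and grouping what remains. The following sketch swaps in plain connected-components grouping for that purpose; it is not the correlating tandem model itself, and the node labels, scores, and the function name `cluster_covarying` are hypothetical. Only the 0.96 default comes from the text.

```python
def cluster_covarying(scores, threshold: float = 0.96):
    """Group residues linked by pair scores at or above `threshold`.

    `scores` maps a pair of nodes (node_u, node_v) to a correlation value;
    each node is any hashable label, e.g. (protein, position, residue).
    Pairs below the threshold are dropped, and the surviving pairs are
    merged into clusters by depth-first search over the resulting graph.
    """
    adjacency = {}
    for (u, v), score in scores.items():
        if score >= threshold:
            adjacency.setdefault(u, set()).add(v)
            adjacency.setdefault(v, set()).add(u)

    clusters, seen = [], set()
    for start in adjacency:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        clusters.append(component)
    return clusters
```

Lowering the threshold admits weaker pairs and generally yields more or larger clusters, which is why a stringent value reduces noise at the cost of discarding some true covariation.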
  • An applied force-directed mapping algorithm can be used to visualize the relationships between these clusters of covarying residues and the isolates from which they were derived (See, e.g., Jacomy et al., “ForceAtlas2, a Continuous Graph Layout Algorithm for Handy Network Visualization Designed for the Gephi Software.” PloS ONE.
  • the inventors discovered that this analysis readily organized coronavirus isolates into groups that were consistent with phylogenetic analysis (FIGS. 1 and 3 in the working example).
  • the clusters can then be binned to compare covarying residues that are restricted to isolates from a given host species (e.g., humans or bats) or a given group of viruses.
  • clusters that are restricted to various combinations of bat, civet, pangolin, and human viral isolates can be identified. Those classified as ‘restricted’ are found in clusters linked to specific groups and absent in other groups.
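Identifying restricted clusters amounts to a set comparison between the hosts in which a cluster is observed and a host group of interest. The following is a minimal sketch under that reading; the cluster identifiers, host labels, and the function name `restricted_clusters` are hypothetical, and a cluster is treated as restricted to a group when it is observed in at least one host of the group and in no host outside it.

```python
def restricted_clusters(cluster_hosts, allowed_hosts):
    """Return sorted ids of clusters seen only in hosts from `allowed_hosts`.

    `cluster_hosts` maps a cluster id to the set of host species whose viral
    isolates carry that cluster. Clusters with no recorded host are skipped.
    """
    allowed = set(allowed_hosts)
    return sorted(
        cluster_id
        for cluster_id, hosts in cluster_hosts.items()
        if hosts and set(hosts) <= allowed
    )
```

Given `{"c1": {"human"}, "c2": {"bat", "human"}, "c3": {"bat", "pangolin"}}`, only `c1` is restricted to human isolates, while `c1` and `c2` are both restricted to the bat and human combination.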
  • this annotated comprehensive dataset identifies all distinct residue identities that strongly covary in each viral protein and the distribution of these in microorganisms that include but are not limited to the human pathogens SARS-CoV and SARS-CoV-2.
  • Enriched covariant residues within the network include those that facilitate transmissibility into humans and would be selected as optimal candidates for a fusion polypeptide and/or vaccine composition provided herein.
  • the method described herein further comprises cloning the selected polypeptide or a fragment thereof into an expression vector.
  • the selected polypeptide comprises one or more covarying amino acid residues identified by the methods described above.
  • the nucleic acid encoding the candidate antigen or fragment thereof described herein can be cloned into a plasmid (e.g., pET17b).
  • the plasmid can have an antibiotic-resistance cassette (e.g., ampicillin or kanamycin).
  • the plasmid is then transformed into a bacterium (e.g., E. coli BL21 (DE3)) for recombinant expression.
  • Transformed bacterial cultures can then be inoculated in a medium with the appropriate antibiotics followed by induction (e.g., by isopropyl β-D-1-thiogalactopyranoside (IPTG)) to evaluate the polypeptide or polypeptide fragment expression.
  • Other methods of making, expressing, delivering, or preparing a fusion polypeptide or fragment thereof can also be used.
  • mRNA vaccine compositions can be used. See US Pat Nos. 9,192,651 B2 and 10,022,435B2 which have been incorporated by reference herein in their entirety. See also, polypeptide-antigen conjugates in US 2018/0333484 A1 and 2019/0192645 A1, which have been incorporated by reference herein in their entirety.
  • the method further comprises expressing and isolating the candidate polypeptide or fragment thereof by methods known in the art.
  • bacteria can be collected by centrifugation and proteins can then be purified by methods known in the art (e.g., column purification). Dot blots can be used to verify the antigen-positive fractions using an antibody and detection reagents. Protein concentration can then be quantified by methods known in the art.
  • the resulting fusion polypeptide or fragments thereof can be tested and verified by immunizing an animal model or a subject with purified recombinant proteins and measuring an immune response to the antigen.
  • the fusion polypeptides, vaccine compositions, and methods provided herein can provoke an immune response, e.g., an immune response which is protective against infections by one or more microorganisms.
  • a method of provoking an immune response to a microorganism e.g., a pathogenic virus or a coronavirus
  • the method comprises administering to a subject a fusion polypeptide or vaccine composition as described herein.
  • the vaccine composition provokes an immune response that is protective against a pathogenic virus.
  • the vaccine composition provokes an immune response that is protective against a coronavirus (e.g., SARS-CoV and SARS-CoV2).
  • An immune response can be characterized as any stimulation of any immune cell, such as release of antibodies, cytokines, proliferation of an immune cell, phagocytosis, or any known function of an immune cell known in the art.
  • Provoking an immune response as described herein can include the presence of an antibody or an increase in antibody production by B cells wherein the antibody can bind to an antigen expressing or an infecting pathogen following administration of the polypeptides, fragments thereof, or vaccine compositions described herein and thereby target the microorganism for killing or inactivation.
  • An immune response can be determined experimentally by evaluating the levels of immune molecules or cells in a biological sample from a subject exposed to the polypeptide, antigen, or fragment thereof.
  • Methods of detecting an immune response include but are not limited to antibody ELISA (e.g., IgG antibody), cytokine ELISA (e.g., measuring the presence of cytokines such as IL-4, IL-12, IL-6, IFN-γ, or TNF-α), flow cytometry, viral titer, or a serum bactericidal assay (SBA).
  • antibody broadly refers to any immunoglobulin (Ig) molecule and immunologically active portions of immunoglobulin molecules (i.e., molecules that contain an antigen binding site that immunospecifically bind an antigen) comprised of four polypeptide chains, two heavy (H) chains and two light (L) chains, or any functional fragment, mutant, variant, or derivation thereof, which retains the essential epitope binding features of an Ig molecule.
  • the antibody or immunoglobulin molecules can be of any type (e.g., IgG, IgE, IgM, IgD, IgA and IgY), class (e.g., IgG1, IgG2, IgG3, IgG4, IgA1 and IgA2) or subclass of immunoglobulin molecule, as is understood by one of skill in the art.
  • the presence or an increase in antibody production in a subject compared with a reference level can be measured by any method known in the art including enzyme-linked immunosorbent assay (ELISA).
  • the presence or an increase in cytokine production by immune cells compared with a reference level can be measured by any method known in the art including an ELISPOT assay.
  • the fusion polypeptide provokes an immune response in a human subject.
  • the fusion polypeptides provided herein can be formulated with a pharmaceutically acceptable carrier for administration that results in effective treatment of a subject or as a pharmaceutical composition or a vaccine composition.
  • the vaccine composition comprises one or more fusion polypeptides or fragments thereof provided herein.
  • the vaccine composition comprises multiple fusion polypeptides provided herein or fragments thereof.
  • the vaccine composition comprises two or more, three or more, four or more, five or more, six or more, seven or more, eight or more, nine or more, ten or more, or twenty or more fusion polypeptides or fragments thereof provided herein (e.g., SEQ ID NOs: 14-37).
  • Acceptable carriers, excipients, or stabilizers are nontoxic to recipients at the dosages and concentrations employed, and include buffers such as phosphate, citrate, and other organic acids; antioxidants including ascorbic acid and methionine; preservatives (such as octadecyldimethylbenzyl ammonium chloride; hexamethonium chloride; benzalkonium chloride, benzethonium chloride; phenol, butyl or benzyl alcohol; alkyl parabens such as methyl or propyl paraben; catechol; resorcinol; cyclohexanol; 3-pentanol; and m-cresol); low molecular weight (less than about 10 residues) polypeptides; proteins, such as serum albumin, gelatin, or immunoglobulins; hydrophilic polymers such as polyvinylpyrrolidone; amino acids such as glycine, glutamine, asparagine, hist
  • the formulations comprising the compositions provided herein contain a pharmaceutically acceptable salt, typically, e.g., sodium chloride, and preferably at about physiological concentrations.
  • the formulations of the vaccine compositions described herein can contain a pharmaceutically acceptable preservative.
  • Suitable preservatives include those known in the pharmaceutical arts, e.g., benzyl alcohol, phenol, m-cresol, methylparaben, and propylparaben are examples of preservatives.
  • the formulations of the vaccine compositions described herein can include a pharmaceutically acceptable surfactant, e.g., at a concentration of about 0.005 to 0.02%.
  • the therapeutic formulations of the pharmaceutical compositions comprising the fusion proteins provided herein can also contain more than one active compound as necessary for the particular indication being treated (e.g., COVID19), preferably those with complementary activities that do not adversely affect each other. Such molecules are suitably present in combination in amounts that are effective for the purpose intended.
  • the active ingredients of the pharmaceutical compositions comprising a fusion protein provided herein can also be entrapped in microcapsules prepared, for example, by coacervation techniques or by interfacial polymerization, for example, hydroxymethylcellulose or gelatin-microcapsules and poly-(methylmethacrylate) microcapsules, respectively, in colloidal drug delivery systems (for example, liposomes, albumin microspheres, microemulsions, nano-particles and nanocapsules) or in macroemulsions.
  • the pharmaceutical or vaccine composition provided herein can be formulated, dosed, and administered in a fashion consistent with good medical practice.
  • Factors for consideration in this context include the particular disorder being treated, the particular subject being treated, the clinical condition of the individual subject, the cause of the disorder, the site of delivery of the vaccine composition, the method of administration, the scheduling of administration, and other factors known to medical practitioners.
  • the “therapeutically effective amount” or “amount effective” of the fusion protein or fragment thereof or vaccine composition to be administered is governed by such considerations, and refers to the minimum amount necessary to ameliorate, treat, or stabilize an infection; to increase the time until progression (duration of progression-free survival); or to treat or prevent the occurrence or recurrence of an infection.
  • the vaccine composition can be optionally formulated, in some embodiments, with one or more additional therapeutic agents currently used to prevent or treat the infection, for example.
  • the effective amount of such other agents depends on the amount of fusion protein or fragment thereof present in the formulation, the type of disorder or treatment, and other factors discussed above. These are generally used in the same dosages and with the same administration routes as used hereinbefore, or at about 1 to 99% of the heretofore employed dosages.
  • Effective amounts, toxicity, and therapeutic efficacy can be determined by standard pharmaceutical procedures in cell cultures or experimental animals, e.g., for determining the LD50 (the dose lethal to 50% of the population) and the ED50 (the dose therapeutically effective in 50% of the population). The dosage can vary depending upon the dosage form employed and the route of administration utilized.
  • the dose ratio between toxic and therapeutic effects is the therapeutic index and can be expressed as the ratio LD50/ED50.
  • Compositions and methods that exhibit large therapeutic indices are preferred.
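  • As a minimal illustrative sketch of the ratio described above, the therapeutic index can be computed as LD50/ED50 once both values are expressed in the same units. The function name `therapeutic_index` and the numeric values are hypothetical and chosen only for illustration; they are not data from this disclosure.

```python
def therapeutic_index(ld50: float, ed50: float) -> float:
    """Return the therapeutic index LD50/ED50.

    Both doses must be in the same units (e.g., mg/kg).
    """
    if ld50 <= 0 or ed50 <= 0:
        raise ValueError("LD50 and ED50 must be positive")
    return ld50 / ed50


# Hypothetical example values (mg/kg), for illustration only.
example_index = therapeutic_index(ld50=200.0, ed50=10.0)
print(f"therapeutic index = {example_index}")
```

  • A larger value indicates a wider margin between the dose that is therapeutically effective and the dose that is lethal in 50% of the population.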
  • a therapeutically effective dose can be estimated initially from cell culture assays.
  • a dose can be formulated in animal models to achieve a circulating plasma concentration range that includes the IC50 (i.e., the concentration of the polypeptide or fragment thereof which achieves a half-maximal inhibition of symptoms) as determined in cell culture, or in an appropriate animal model.
  • Levels in plasma can be measured, for example, by high performance liquid chromatography.
  • the effects of any particular dosage can be monitored by a suitable bioassay. The dosage can be determined by a physician and adjusted, as necessary, to suit observed effects of the treatment.
  • the dosage ranges for the vaccine composition depend upon the potency, and encompass amounts large enough to produce the desired effect.
  • the dosage should not be so large as to cause unacceptable adverse side effects.
  • the dosage will vary with the age, condition, and sex of the patient and can be determined by one of skill in the art.
  • the dosage can also be adjusted by the individual physician in the event of any complication.
  • the dosage ranges from 0.001 mg/kg body weight to 100 mg/kg body weight.
  • the dose range is from 5 µg/kg body weight to 100 µg/kg body weight.
  • the dose range can be titrated to maintain serum levels between 1 µg/mL and 1000 µg/mL.
  • subjects can be administered a therapeutic amount, such as, e.g., 0.1 mg/kg, 0.5 mg/kg, 1.0 mg/kg, 2.0 mg/kg, 2.5 mg/kg, 5 mg/kg, 7.5 mg/kg, 10 mg/kg, 15 mg/kg, 20 mg/kg, 25 mg/kg, 30 mg/kg, 40 mg/kg, 50 mg/kg, or more.
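  • The weight-based amounts above translate into a total amount per administration by simple multiplication. The following is a minimal sketch of that arithmetic only; the function names and the 70 kg body weight are hypothetical, and the range bounds are the 0.001 mg/kg and 100 mg/kg values recited above. It is not dosing guidance, which remains a matter for the practitioner.

```python
# Broadest weight-based range recited in the text above (mg per kg body weight).
MIN_DOSE_MG_PER_KG = 0.001
MAX_DOSE_MG_PER_KG = 100.0


def total_dose_mg(dose_mg_per_kg: float, body_weight_kg: float) -> float:
    """Convert a per-kilogram dose into a total dose in milligrams."""
    if dose_mg_per_kg < 0 or body_weight_kg <= 0:
        raise ValueError("dose must be non-negative and body weight positive")
    return dose_mg_per_kg * body_weight_kg


def within_recited_range(dose_mg_per_kg: float) -> bool:
    """Check whether a per-kilogram dose lies inside the recited range."""
    return MIN_DOSE_MG_PER_KG <= dose_mg_per_kg <= MAX_DOSE_MG_PER_KG


# Hypothetical subject weighing 70 kg receiving 2.5 mg/kg.
print(total_dose_mg(2.5, 70.0), within_recited_range(2.5))
```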
  • doses can be administered by one or more separate administrations, or by continuous infusion.
  • the treatment is sustained until, for example, the infection is treated, as measured by the methods described above or known in the art.
  • other dosage regimens can be useful.
  • the pharmaceutical or vaccine composition provided herein is suitably administered to the subject at one time or over a series of treatments.
  • the composition provided herein and the one or more additional therapeutic agents described herein are administered in a therapeutically effective or synergistic amount.
  • a therapeutically effective amount is such that co-administration of a fusion protein and one or more other therapeutic agents, or administration of a composition described herein, results in reduction or inhibition or prevention of a disease or disorder as described herein.
  • a therapeutically synergistic amount is that amount of a fusion protein and one or more other therapeutic agents necessary to synergistically or significantly reduce, prevent, or eliminate conditions or symptoms associated with a particular disease.
  • the fusion protein provided herein can be co-administered with one or more additional therapeutically effective agents to give an additive effect resulting in a significant reduction, prevention, or elimination of conditions or symptoms associated with a particular disease, but with a much reduced toxicity profile due to lower dosages of one or more of the additional therapeutically effective agents.
  • the pharmaceutical or vaccine compositions described herein can be administered to a subject in need of vaccination, immunization, and/or stimulation of an immune response.
  • the methods described herein comprise administering an effective amount of the vaccine compositions described herein to a subject in order to stimulate an immune response or provide protection against the relevant pathogen or microorganism (e.g., SARS-CoV2) from which the polypeptide or antigen was derived.
  • Providing protection against the relevant pathogen is stimulating the immune system such that later exposure to the antigen, polypeptide, or fragment thereof (e.g., on or in a live pathogen) triggers a more effective immune response than if the subject was naïve to the antigen. Protection can include faster clearance of the pathogen, reduced severity and/or time of symptoms, and/or lack of development of disease or symptoms.
  • compositions described herein can include, but are not limited to oral, parenteral, intravenous, intramuscular, subcutaneous, transdermal, airway (aerosol), pulmonary, cutaneous, injection, or topical, administration. Administration can be local or systemic.
  • the fusion polypeptide or fragment thereof or vaccine compositions as provided herein can be administered to a subject in need thereof by any appropriate route which results in an effective treatment in the subject.
  • the terms “administering” and “introducing” are used interchangeably and refer to the placement of a vaccine composition, fusion polypeptide or fragment thereof into a subject by a method or route which results in at least partial localization of such compositions at a desired site, such as a site of infection, such that a desired effect(s) is produced.
  • a fusion polypeptide or fragment thereof or vaccine composition can be administered to a subject by any mode of administration that delivers the vaccine composition systemically or to a desired surface or target, and can include, but is not limited to, injection, infusion, instillation, and inhalation administration.
  • oral administration forms are also contemplated.
  • “Injection” includes, without limitation, intravenous, intramuscular, intra-arterial, intrathecal, intraventricular, intracapsular, intraorbital, intracardiac, intradermal, intraperitoneal, transtracheal, subcutaneous, subcuticular, intraarticular, subcapsular, subarachnoid, intraspinal, intracerebrospinal, and intrasternal injection and infusion.
  • The duration of a therapy using the compositions described herein will continue for as long as medically indicated or until a desired therapeutic effect (e.g., those described herein) is achieved.
  • the administration of the vaccine composition described herein is continued for 1 month, 2 months, 4 months, 6 months, 8 months, 10 months, 1 year, 2 years, 3 years, 4 years, 5 years, 10 years, 20 years, or for a period of years up to the lifetime of the subject.
  • appropriate dosing regimens for a given vaccine composition can comprise a single administration/immunization or multiple ones. Subsequent doses may be given repeatedly at time periods, for example, about two weeks or greater up through the entirety of a subject's life, e.g., to provide a sustained preventative effect.
  • Subsequent doses can be spaced, for example, about two weeks, about three weeks, about four weeks, about one month, about two months, about three months, about four months, about five months, about six months, about seven months, about eight months, about nine months, about ten months, about eleven months, or about one year after a primary immunization.
  • the precise dose to be employed in the formulation will also depend on the route of administration and should be decided according to the judgment of the practitioner and each patient's circumstances. Ultimately, the practitioner or physician will decide the amount of fusion protein provided herein or vaccine composition to administer to particular subjects.
  • a vaccine composition as described herein can be used, for example, to protect or treat a subject against disease.
  • Embodiment 1 A fusion polypeptide comprising: a first domain comprising a first viral polypeptide or a fragment thereof expressed by a first virus, wherein the first viral polypeptide or fragment thereof comprises at least two or more covarying amino acid positions.
  • Embodiment 2 The fusion polypeptide of Embodiment 1, further comprising a second domain comprising a second viral polypeptide or a fragment thereof expressed by the first virus, optionally, the second viral polypeptide or a fragment thereof comprises at least one amino acid position that covaries with at least one amino acid position in the first viral polypeptide or a fragment thereof.
  • Embodiment 3 The fusion polypeptide of Embodiment 1 or 2, wherein the second viral polypeptide or fragment thereof comprises at least two or more amino acid positions that covary with each other.
  • Embodiment 4 A fusion polypeptide comprising: (a) a first domain comprising a first viral polypeptide or a fragment thereof expressed by a first virus; and (b) a second domain comprising a second viral polypeptide or a fragment thereof expressed by the first virus, and wherein the first viral polypeptide or a fragment thereof comprises at least one amino acid position that covaries with at least one amino acid position in the second viral polypeptide or a fragment thereof.
  • Embodiment 5 The fusion polypeptide of Embodiment 4, wherein the first viral polypeptide or a fragment thereof comprises at least two or more amino acid positions that covary with each other.
  • Embodiment 6 The fusion polypeptide of any one of Embodiments 2-5, wherein the second viral polypeptide or a fragment thereof comprises at least two or more amino acid positions that covary with each other.
  • Embodiment 7 The fusion polypeptide of any one of Embodiments 2-6, wherein the first and second domain are linked by a linker.
  • Embodiment 8 The fusion polypeptide of any one of Embodiments 2-7, wherein the first and second domain are linked by a flexible linker.
  • Embodiment 9 The fusion polypeptide of any one of Embodiments 2-8, further comprising a third domain comprising a third viral polypeptide or a fragment thereof expressed by the first virus, optionally the third viral polypeptide or a fragment thereof comprises at least one amino acid position that covaries with at least one amino acid position in the first or second viral polypeptide or a fragment thereof.
  • Embodiment 10 The fusion polypeptide of Embodiment 9, wherein the third viral polypeptide or fragment thereof comprises at least two or more amino acid positions that covary with each other.
  • Embodiment 11 The fusion polypeptide of Embodiment 9 or 10, wherein the first and second domain are linked by a linker.
  • Embodiment 12 The fusion polypeptide of any one of Embodiments 9-11, wherein the second and third domain are linked by a linker.
  • Embodiment 13 The fusion polypeptide of any one of Embodiments 9-12, further comprising a fourth domain comprising an amino acid sequence of the first, second or third domain.
  • Embodiment 14 The fusion polypeptide of Embodiment 13, wherein the third and the fourth domains are linked by a linker.
  • Embodiment 15 The fusion polypeptide of any one of Embodiments 1-14, wherein the covarying amino acid positions are determined using a correlating tandem model, optionally, the tandem model purity threshold is a level greater than or equal to 0.80.
  • Embodiment 16 The fusion polypeptide of any one of Embodiments 1-15, wherein the covarying amino acid positions are relative to a viral polypeptide expressed by a second virus.
  • Embodiment 17 The fusion polypeptide of any one of Embodiments 2-16, wherein the first virus is capable of infecting a human host and the second virus is capable of infecting a non-human host or the second virus is a different virus capable of infecting a human host.
  • Embodiment 18 The fusion polypeptide of Embodiment 17, wherein the first and second virus are from the same family.
  • Embodiment 19 The fusion polypeptide of Embodiment 17 or 18, wherein the first and second virus are from the same genus.
  • Embodiment 20 The fusion polypeptide of any one of Embodiments 17-19, wherein the first and second virus are from the same species.
  • Embodiment 21 The fusion polypeptide of any one of Embodiments 17-20, wherein the first and second virus are capable of infecting the same host species.
  • Embodiment 22 The fusion polypeptide of any one of Embodiments 1-21, wherein the first virus is a corona virus.
  • Embodiment 23 The fusion polypeptide of Embodiment 22, wherein the first virus is SARS-CoV or SARS-CoV2.
  • Embodiment 24 The fusion polypeptide of any one of Embodiments 9-23, wherein the third viral polypeptide or fragment thereof is a corona virus polypeptide or fragment thereof selected from the group consisting of: the viroporin 3a protein, a non-structural protein, a 1a/1b polyprotein, a viroporin E, membrane protein (M), and spike (S) protein.
  • Embodiment 25 The fusion protein of any one of Embodiments 9-24, wherein the third domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 42, 43, 60, 61, 62, 63, 64 or 65.
  • Embodiment 26 The fusion polypeptide of Embodiment 25, wherein the third domain comprises an amino acid sequence comprising a substitution or deletion at position 19, 41, 127, 164 or 175 of SEQ ID NO: 42, 43, 60, 61, 62, 63, 64 or 65.
  • Embodiment 27 The fusion polypeptide of any one of Embodiments 2-26, wherein the second viral polypeptide or fragment thereof is a corona virus polypeptide or fragment thereof selected from the group consisting of: the viroporin 3a protein, a non-structural protein, a 1a/1b polyprotein, a viroporin E, membrane protein (M), and spike (S) protein.
  • Embodiment 28 The fusion polypeptide of any one of Embodiments 2-27, wherein the second domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 41 or 45.
  • Embodiment 29 The fusion polypeptide of Embodiment 28, wherein the second domain comprises an amino acid sequence comprising a substitution or deletion at position 4, 6, 7, 9, 10, 13, 15, 16, 19, 20, 23, 49, 54, 56, 58, 59, 60, 61, 62, 126, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 150, 163, 165, 172, 191, 229, 230, 231, 232, 233, 243, 244, 246 or 248 of SEQ ID NO: 41 or at position 5, 7, 8, 10, 11, 14, 16, 17, 20, 21, 24, 50, 55, 57, 59, 60, 61, 62, 63, 127, 130, 131, 132, 133, 134, 135,
  • Embodiment 30 The fusion polypeptide of any one of Embodiments 1-29, wherein the first viral polypeptide or fragment thereof is a corona virus polypeptide or a fragment thereof selected from the group consisting of: the viroporin 3a protein, a non-structural protein, a 1a/1b polyprotein, a viroporin E, membrane protein (M), and spike (S) protein.
  • Embodiment 31 The fusion protein of any one of Embodiments 1-30, wherein the first domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 20.
  • Embodiment 32 The fusion protein of Embodiment 31, wherein the first domain comprises an amino acid sequence comprising a substitution or deletion at position 13, 14, 15, 16, 17, 18, 19, 20, 22, 23, 24, 25, 26, 27, 28, 32, 37, 38, 41, 42, 43 or 44 of SEQ ID NO: 20.
  • Embodiment 33 The fusion protein of Embodiment 32, wherein the first domain comprises an amino acid sequence comprising a substitution or deletion at position 15, 16, 18, 20, 24, 25, 26, 28, or 38 of SEQ ID NO: 20.
  • Embodiment 34 The fusion protein of any one of Embodiments 1-31, wherein the first domain comprises an amino acid sequence selected from the group consisting of SEQ ID NOs: 21-40.
  • Embodiment 35 The fusion polypeptide of any one of Embodiments 1-31, wherein: (i) the first domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 20, the second domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 41 and the third domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 42; (ii) the first domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 41, the second domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 42 and the third domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 20; (iii) the first domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 20, the second domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 41 and the third domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 43; (iv) the first domain comprises an amino acid sequence having at least 85% identity to SEQ
  • Embodiment 36 The fusion polypeptide of any one of Embodiments 1-35, wherein the fusion polypeptide induces an antigen specific immune response when administered to a subject.
  • Embodiment 37 A polynucleotide encoding an amino acid sequence of a fusion polypeptide of any one of Embodiments 1-36.
  • Embodiment 38 A vaccine composition comprising a fusion polypeptide of any one of Embodiments 1-36 or a polynucleotide of Embodiment 37.
  • Embodiment 39 The vaccine composition of Embodiment 38 further comprising an adjuvant.
  • Embodiment 40 The vaccine composition of Embodiment 37 or 38, further comprising a pharmaceutical carrier.
  • Embodiment 41 A cell comprising a fusion polypeptide of any one of Embodiments 1-36 or a polynucleotide of Embodiment 37.
  • Embodiment 42 A kit comprising a fusion polypeptide of any one of Embodiments 1-36 or a polynucleotide of Embodiment 37.
  • Embodiment 43 A method of inducing an immune response in a subject, the method comprising: administering to the subject a fusion polypeptide of any one of Embodiments 1-36 or a polynucleotide of Embodiment 37 in an amount effective to produce an antigen specific immune response.
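  • Several of the embodiments above recite a domain having “at least 85% identity” to a reference sequence. The passage does not state how identity is computed, so the following is only a sketch under stated assumptions: the two sequences are already aligned to equal length with `-` marking gaps, and identity is the number of matching columns divided by the number of aligned columns. Other conventions for the denominator exist. The function names and the toy sequences are hypothetical and do not correspond to any SEQ ID NO.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap).

    Assumption: identity = identical columns / aligned columns, where
    columns that are gaps in both sequences are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b)
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 85.0) -> bool:
    """Return True if percent identity is at least the threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Toy sequences for illustration only: one substitution in ten positions.
print(percent_identity("MKTAYIAKQR", "MKTAYIAKQK"))
print(meets_threshold("MKTAYIAKQR", "MKTAYIAKQK"))
```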
  • a “subject” means a human or animal. Usually the animal is a vertebrate such as a primate, rodent, domestic animal or game animal. Primates include chimpanzees, cynomolgus monkeys, spider monkeys, and macaques, e.g., Rhesus. Rodents include mice, rats, woodchucks, ferrets, rabbits and hamsters.
  • domestic and game animals include cows, horses, pigs, deer, bison, buffalo, feline species, e.g., domestic cat, canine species, e.g., dog, fox, wolf, avian species, e.g., chicken, emu, ostrich, and fish, e.g., trout, catfish and salmon.
  • the subject is a mammal, e.g., a primate, e.g., a human.
  • the terms “individual,” “patient” and “subject” are used interchangeably herein.
  • the subject is a mammal.
  • the mammal can be a human, non-human primate, mouse, rat, dog, cat, horse, or cow, but is not limited to these examples. Mammals other than humans can be advantageously used as subjects that represent animal models of immunization and immune response.
  • a subject can be male or female.
  • a subject can be one who has been previously diagnosed with or identified as suffering from or having a condition (e.g., has been diagnosed with an infection) or one or more complications related to such a condition, and optionally, have already undergone treatment for the condition or the one or more complications related to the condition. Alternatively, a subject can also be one who has not been previously diagnosed as having the condition or one or more complications related to the condition.
  • a subject can be one who exhibits one or more risk factors for the condition or one or more complications related to the condition or a subject who does not exhibit risk factors.
  • a “subject in need” of treatment for a particular condition can be a subject having that condition, diagnosed as having that condition, or at risk of developing that condition (e.g., an infection).
  • a “host subject” is a subject that has been infected with a pathogen, microorganism, or bacteria.
  • the host subject can be symptomatic or asymptomatic.
  • the host subject can also be a carrier for the microorganism. In some embodiments of any of the aspects, the host subject is a mammal.
  • an “immune response” refers to a response by a cell of the immune system, such as a B cell, T cell (CD4 or CD8), regulatory T cell, antigen-presenting cell, dendritic cell, monocyte, macrophage, NKT cell, NK cell, basophil, eosinophil, or neutrophil, to a stimulus (e.g., to a vaccine composition, fusion protein, antigen or fragment thereof).
  • the response is specific for at least one particular antigen (e.g., an antigen of SARS-CoV2 virus), and refers to a response by a CD4 T cell, CD8 T cell, or B cell via their antigen-specific receptor.
  • Such responses by these cells can include, for example, cytotoxicity, proliferation, antibody, cytokine, or chemokine production, trafficking, or phagocytosis, and can be dependent on the nature of the immune cell undergoing the response.
  • the term “provoking an immune response” refers to stimulation of an immune response, an induction, or increase in the immune response to a pathogenic microorganism.
  • provoking an immune response can mean any one or more of the following: (i) the prevention of infection or re-infection, as in a traditional vaccine, (ii) the reduction in the severity of, or, in the elimination of symptoms, and (iii) the substantial or complete elimination of the pathogenic microorganism or disorder in question.
  • provoking an immune response may be effected prophylactically (prior to infection) or therapeutically (following infection).
  • prophylactic treatment is the preferred mode.
  • the vaccine compositions and methods described herein treat, including prophylactically and/or therapeutically immunize, a host animal against a microbial infection (e.g., a viral infection).
  • the methods of the present technology are useful for conferring prophylactic and/or therapeutic immunity to a subject.
  • the methods described herein can also be practiced on subjects for biomedical research applications.
  • the term “vaccine composition” used herein is defined as a composition used to provoke or stimulate an immune response against an antigen or fragment thereof or a nucleic acid encoding such antigen or fragment thereof within the composition in order to protect or treat an organism against disease.
  • the vaccine composition is a suspension of attenuated or killed microorganisms (e.g., bacteria, viruses, or fungi) or of antigenic proteins or nucleic acids derived from them, administered for prevention, amelioration, or treatment of infectious diseases.
  • the term “antigen” refers to a molecule that is derived from a pathogenic microorganism (e.g., a virus). Typically, antigens are bound by the host subject’s antibody ligands and are capable of raising or causing an antibody immune response in vivo by the host subject.
  • An antigen can be a polypeptide, fusion polypeptide, protein, nucleic acid, or other molecule.
  • the term “antigenic determinant” refers to an epitope on the antigen recognized by an antigen-binding molecule (e.g., an antibody, antibody reagent, or a polypeptide fragment thereof), and more particularly, by the antigen-binding site of said molecule.
  • the term “fragment” refers to one or more portions of a polypeptide that retain the ability to provoke an immune response. The fragment can be a nucleic acid encoding a portion of the antigen or a polypeptide.
  • a fragment can be a polypeptide or a nucleic acid encoding a polypeptide comprising 5 amino acids or more, 10 amino acids or more, 15 amino acids or more, 20 amino acids or more, 25 amino acids or more, 30 amino acids or more, or 35 amino acids or more.
  • the terms “protein,” “polypeptide,” and “encoded polypeptide” are used interchangeably herein to designate a series of amino acid residues, connected to each other by peptide bonds between the alpha-amino and carboxy groups of adjacent residues.
  • the term “protein” refers to a polymer of amino acids, including modified amino acids (e.g., phosphorylated, glycated, glycosylated, etc.) and amino acid analogs, regardless of its size or function.
  • “Protein” and “polypeptide” are often used in reference to relatively large polypeptides, whereas the term “peptide” is often used in reference to small polypeptides, but usage of these terms in the art overlaps.
  • the terms “protein” and “polypeptide” are used interchangeably herein when referring to a gene product and fragments thereof.
  • exemplary polypeptides or proteins include gene products, naturally occurring proteins, homologs, orthologs, paralogs, fragments, and other equivalents, variants, and analogs of the foregoing.
  • the term “fusion polypeptide” refers to an engineered polypeptide that comprises or consists of the domains provided herein.
  • a domain comprises a viral polypeptide or fragment thereof.
  • the term “viral polypeptide” refers to a polypeptide expressed by a virus.
  • a viral polypeptide is a surface protein expressed by a virus.
  • the term “nucleic acid” or “nucleic acid sequence” refers to any molecule, preferably a polymeric molecule, incorporating units of ribonucleic acid, deoxyribonucleic acid or an analog thereof.
  • the nucleic acid can be either single-stranded or double-stranded.
  • a single-stranded nucleic acid can be one nucleic acid strand of a denatured double-stranded DNA. Alternatively, it can be a single-stranded nucleic acid not derived from any double-stranded DNA.
  • the nucleic acid can be DNA.
  • the nucleic acid can be RNA.
  • Suitable DNA can include, e.g., genomic DNA or cDNA.
  • Suitable RNA can include, e.g., mRNA.
  • the term “vector”, as used herein, refers to a nucleic acid construct designed for delivery to a host cell or for transfer between different host cells.
  • a vector can be viral or non-viral.
  • the term “vector” encompasses any genetic element that is capable of replication when associated with the proper control elements and that can transfer gene sequences to cells.
  • a vector can include, but is not limited to, a cloning vector, an expression vector, a plasmid, phage, transposon, cosmid, chromosome, virus, virion, etc.
  • a polypeptide or nucleic acid as described herein can be engineered.
  • the term “engineered” refers to the aspect of having been manipulated by the hand of man.
  • a polypeptide is considered to be “engineered” when at least one aspect of the polypeptide, e.g., its sequence, has been manipulated by the hand of man to differ from the aspect as it exists in nature.
  • progeny of an engineered cell are also typically still referred to as “engineered” even though the actual manipulation was performed on a prior entity.
  • the term “derived from” refers to the aspect of a molecule or substance (e.g., a polypeptide, nucleic acid, sugar, lipid, etc.) as being from a parent substance (e.g., a cell or membrane) or organism (e.g., a microorganism).
  • the term “derived from” encompasses an antigen or fragment thereof that is expressed by, purified, or isolated from a microorganism as described herein.
  • the antigen, viroporin 3a, is derived from SARS-CoV2, the virus responsible for the worldwide COVID19 pandemic.
  • the term “pharmaceutical composition” refers to the fusion polypeptide, composition, antigen or fragment thereof as described herein in combination with a pharmaceutically acceptable carrier, e.g., a carrier commonly used in the pharmaceutical industry.
  • the phrase “pharmaceutically acceptable” is employed herein to refer to those polypeptides, antigens, nucleic acids encoding said polypeptides or antigens, compounds, materials, compositions, and/or dosage forms which are, within the scope of sound medical judgment, suitable for use in contact with the tissues of human beings and animals without excessive toxicity, irritation, allergic response, or other problem or complication, commensurate with a reasonable benefit/risk ratio.
  • a pharmaceutically acceptable carrier can be a carrier other than water.
  • a pharmaceutically acceptable carrier can be a cream, emulsion, gel, liposome, nanoparticle, and/or ointment.
  • a pharmaceutically acceptable carrier can be an artificial or engineered carrier, e.g., a carrier in which the active ingredient would not be found to occur in nature.
  • the term “administering” refers to the placement of a vaccine composition, fusion polypeptide, antigen, or antigen fragment thereof as described herein into a subject by a method or route which results in at least partial delivery of the vaccine composition at a desired site.
  • compositions comprising the antigens or fragments of antigens described herein can be administered by any appropriate route which results in an effective treatment in the subject.
  • the phrases “parenteral administration” and “administered parenterally,” as used herein, refer to modes of administration other than enteral and topical administration, usually by injection.
  • the phrase “systemic administration” refers to the administration of a therapeutic composition other than directly into a target site, tissue, or organ, such as a site of infection, such that it enters the subject’s circulatory system and, thus, is subject to metabolism and other like processes.
  • the fusion polypeptide or fragment thereof is administered locally, e.g., by direct injections, when the disorder or location of the infection permits, and the injections can be repeated periodically.
  • the term “multiple” refers to a number of at least two, more than two, or greater than two. In the context of strains of viruses, multiple refers to two or more strains known in the art for that given virus (e.g., coronavirus).
  • the terms “treat,” “treatment,” “treating,” or “amelioration” refer to therapeutic treatments, wherein the object is to reverse, alleviate, ameliorate, inhibit, slow down or stop the progression or severity of a condition associated with a disease or disorder.
  • the term “treating” includes reducing or alleviating at least one adverse effect or symptom of a condition, disease or disorder associated with an infection.
  • Treatment is generally “effective” if one or more symptoms or clinical markers are reduced.
  • treatment is “effective” if the progression of a disease is reduced or halted. That is, “treatment” includes not just the improvement of symptoms or markers, but also a cessation or at least slowing of progress or worsening of symptoms that would be expected in absence of treatment.
  • Beneficial or desired clinical results include, but are not limited to, alleviation of one or more symptom(s), diminishment of extent of disease, stabilized (i.e., not worsening) state of disease, delay or slowing of disease progression, amelioration or palliation of the disease state, and remission (whether partial or total), whether detectable or undetectable.
  • treatment also includes providing relief from the symptoms or side-effects of the disease (including palliative treatment).
  • preventing or “prevention” refers to any methodology where the disease state does not occur due to the actions of the methodology (such as, for example, administration of an antigen or fragment thereof or vaccine composition as described herein).
  • prevention can also mean that the disease is not established to the extent that occurs in untreated controls. Accordingly, prevention of a disease encompasses a reduction in the likelihood that a subject can develop the disease, relative to an untreated subject (e.g. a subject who is not treated with the methods or compositions described herein).
  • the terms “decreased”, “reduced”, “reduction”, or “inhibit” are all used herein to mean a decrease by a statistically significant amount. In some embodiments of any of the aspects, “reduce,” “reduction” or “decreased” or “inhibit” typically means a decrease by at least 10% as compared to a reference level (e.g.
  • “reduction” or “inhibition” does not encompass a complete inhibition or reduction as compared to a reference level.
  • “Complete inhibition” is a 100% inhibition as compared to a reference level.
  • a decrease can be preferably down to a level accepted as within the range of normal for an individual without a given disorder.
  • the terms “increased”, “increase”, “enhance”, or “activate” are all used herein to mean an increase by a statistically significant amount.
  • the terms “increased”, “increase”, “enhance”, or “activate” can mean an increase of at least 10% as compared to a reference level, for example an increase of at least about 20%, or at least about 30%, or at least about 40%, or at least about 50%, or at least about 60%, or at least about 70%, or at least about 80%, or at least about 90% or up to and including a 100% increase or any increase between 10-100% as compared to a reference level, or at least about a 2-fold, or at least about a 3-fold, or at least about a 4-fold, or at least about a 5-fold or at least about a 10-fold increase, or any increase between 2-fold and 10-fold or greater as compared to a reference level.
  • an “increase” is a statistically significant increase in such level.
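The percentage and fold thresholds in the definitions above are simple ratios against a reference level. The following is a minimal sketch of that arithmetic; the function names and example values are illustrative assumptions and do not come from the specification.

```python
def percent_increase(value, reference):
    """Percent increase of `value` over `reference` (a 20% increase returns 20.0)."""
    return 100.0 * (value - reference) / reference


def fold_change(value, reference):
    """Ratio of `value` to `reference` (a 3-fold increase returns 3.0)."""
    return value / reference


def meets_increase_threshold(value, reference, min_percent=10.0):
    """True if the increase is at least `min_percent` of the reference level.

    The 10% default mirrors the "at least 10%" wording above; it is a
    parameter here so that the other listed thresholds can be checked too.
    """
    return percent_increase(value, reference) >= min_percent


# Hypothetical measurements, for illustration only.
reference_level = 100.0
measured_level = 120.0
print(percent_increase(measured_level, reference_level))
print(meets_increase_threshold(measured_level, reference_level))
```

A decrease can be checked the same way by testing whether the percent increase is at or below a negative threshold.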
  • the term “modulates” or “modulation” refers to an effect including increasing or decreasing a given parameter as those terms are defined herein.
  • a “reference level” refers to a level of a given parameter measured or detected in a normal, otherwise unaffected, or untreated population of microorganisms (e.g., viruses or portions thereof cultured in vitro, virus or portions thereof obtained from a healthy subject, or virus or portions thereof obtained from a subject at a prior time point) or in a subject (e.g., a subject that does not have an infection).
  • a reference level refers to the level of a polypeptide, an antigen or fragment thereof, or a nucleic acid encoding an antigen or fragment thereof expressed by a microorganism (e.g., virus) which is not present in a subject (i.e., a microorganism which is not in vivo), not genetically modified, and is grown in culture in vitro.
  • the microorganism can be a commercially available strain that was not cultured directly from a host subject, or a strain originally obtained from a host subject but cultured in vitro at the time the reference level was determined.
  • the term “statistically significant” or “significantly” refers to statistical significance and generally means a two standard deviation (2SD) or greater difference.
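The two-standard-deviation criterion above can be illustrated with a short check. This is a sketch under stated assumptions: the reference values are made up, and the use of the sample standard deviation of the reference measurements is a choice made for this example, not something the specification prescribes.

```python
from statistics import mean, stdev


def is_significant(value, reference_values, n_sd=2.0):
    """True if `value` differs from the reference mean by at least n_sd SDs.

    Assumption: the spread is estimated with the sample standard deviation
    of the reference measurements.
    """
    ref_mean = mean(reference_values)
    ref_sd = stdev(reference_values)
    return abs(value - ref_mean) >= n_sd * ref_sd


# Hypothetical reference measurements, for illustration only.
reference = [10, 12, 11, 9, 13]
print(is_significant(15, reference))
print(is_significant(13, reference))
```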
  • all numbers expressing quantities of ingredients or reaction conditions used herein should be understood as modified in all instances by the term “about.”
  • the term “about” when used in connection with percentages can mean ±1%.
  • the term “comprising” means that other elements can also be present in addition to the defined elements presented.
  • SARS-CoV-2 is one of three recognized coronaviruses (CoVs) that have caused epidemics or pandemics in the 21st century and that have likely emerged from animal reservoirs based on genomic similarities to bat and civet viruses.
  • the methods and compositions provided herein, relate in part, to the discovery of conserved interactions between amino acid residues in all proteins encoded by SARS-CoV-related viruses.
  • pairs and networks of residue variants that exhibited statistically high frequencies of covariance with each other can be used as a new computational approach (Covariance-based Phylogeny Analysis) for understanding viral evolution and adaptation.
  • Provided herein is evidence that the evolutionary processes that converted a bat virus into a human pathogen occurred through recombination with other viruses in combination with new adaptive mutations important for entry into human cells.
  • the methods and compositions provided herein are related, in part, to the identification of variations in protein sequence of SARS-CoV-2 to understand CoV evolution and the key functional interactions that drive adaptation to new hosts or that influence transmission and pathogenicity.
  • the inventors selected the conserved CoV proteins called 1a/1b, Spike(S), 3a, E, M, and N from a set of 847 viral genomes provided herein (TABLE 2, above). The alignment resulted in a 9639 amino acid consensus sequence with only 2% of sites being gaps with low coverage and 2% with low sequence conservation. Because there are regions of diverged nucleotide identity in these viral genomes, the goal was to use amino acid identity to initially estimate phylogeny and then integrate that analysis with the identification of covariant residues within these six core viral proteins.
  • SARS-CoV and SARS-CoV-2 are represented by a large collection of independent isolates from previous and current epidemics as well as from variants selected during passage in various laboratories (TABLE 2, above) (Elbe and Buckland-Merrett, 2017). Similar to nucleic acid sequence-based analyses, our constructed phylogeny (FIG. 3) shows that SARS-CoV is closely related to CoVs found in civets and groups of bat CoVs that were previously suggested as likely ancestors (Li, 2005; Song et al., 2005).
  • SARS-CoV-2 is a close relative to bat CoV RATG13 and also more similar to two bat CoVs, SZXC21 and SZC45, than other bat CoVs (Paraskevis et al., 2020).
  • Five recently isolated pangolin CoVs identified as closely related to SARS-CoV-2 are also related to these three bat CoVs (Zhang et al., 2020) and our protein based analysis supports this conclusion.
  • amino acid sequence-based analysis supports the conclusion that both bat and pangolin CoVs share or represent probable common ancestors of SARS-CoV-2 (Xia, 2020).
  • To survey the frequency of covariance among CoVs, we extracted pairs and groups (here designated ‘Clusters’) of covarying amino acid residues and organized these using a correlating tandem model that was set at different purity thresholds between 0.8 and 1.0. A stringent purity threshold (0.96) was selected to reduce noise based on our sampling size. Then force-directed mapping algorithms were applied to visualize the relationships between these clusters of covarying residues and the CoV isolates from which they were derived (Jacomy et al., 2014).
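The extraction of covarying residue pairs at a purity threshold can be sketched as follows. The text does not define how purity is computed in the correlating tandem model, so this example assumes a stand-in: the number of sequences carrying both residue variants divided by the number carrying either one. The function name and the toy alignment are likewise hypothetical.

```python
from itertools import combinations


def covarying_pairs(alignment, threshold=0.96):
    """Return ((col, res), (col, res), purity) for variant pairs at or above threshold.

    Assumed purity: |sequences with both variants| / |sequences with either|.
    This is an illustrative stand-in, not the model used in the text.
    """
    # Map each (column, residue) variant to the set of sequences that carry it.
    carriers = {}
    for idx, seq in enumerate(alignment):
        for col, res in enumerate(seq):
            carriers.setdefault((col, res), set()).add(idx)
    # Variants present in every sequence are invariant and cannot covary.
    variable = {k: v for k, v in carriers.items() if len(v) < len(alignment)}
    hits = []
    for k1, k2 in combinations(sorted(variable), 2):
        if k1[0] == k2[0]:
            continue  # two residues in the same column are alternatives, not a pair
        s1, s2 = variable[k1], variable[k2]
        purity = len(s1 & s2) / len(s1 | s2)
        if purity >= threshold:
            hits.append((k1, k2, purity))
    return hits


# Toy alignment of five equal-length sequences (not real CoV data).
toy_alignment = ["ACDE", "ACDE", "GCHE", "GCHE", "GCHF"]
for first, second, purity in covarying_pairs(toy_alignment):
    print(first, second, purity)
```

Lowering `threshold` toward 0.8 admits noisier pairs, which is the trade-off the stringent 0.96 setting is meant to avoid.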
  • This graphing technique simulates repulsion between nodes (residues, clusters, and genomes) as well as attraction between edges (links between nodes) and then plots them to minimize both the complexity (fewer crossed edges) and to reduce edge lengths in two dimensions.
  • Nodes are force-directed according to hierarchy, which orients them based on sequential linkage (edges) to clusters and then residues.
  • force-directed mapping of the linkage of covariance clusters readily organized all CoV isolates into groups that were consistent with phylogenic analysis (FIG.1 and FIG.3).
  • related genomes are forced into groupings in the middle of the graph surrounded by related clusters that are then flanked by networks of covarying residues.
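The node and edge hierarchy that a force-directed layout consumes (genomes linked to clusters, clusters linked to residues) can be sketched as a small graph. This example does not perform the layout itself; it only builds the edges and groups nodes by connectivity, which is what pulls related genomes together in such a plot. All node labels and memberships below are hypothetical.

```python
from collections import defaultdict, deque


def build_edges(genome_clusters, cluster_residues):
    """Edges follow the hierarchy genome -> cluster -> residue."""
    edges = []
    for genome, clusters in genome_clusters.items():
        edges.extend((genome, cluster) for cluster in clusters)
    for cluster, residues in cluster_residues.items():
        edges.extend((cluster, residue) for residue in residues)
    return edges


def connected_groups(edges):
    """Group nodes that are linked directly or indirectly (breadth-first search)."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen, groups = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        seen.add(start)
        group, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    group.add(neighbor)
                    queue.append(neighbor)
        groups.append(group)
    return groups


# Hypothetical memberships, for illustration only.
genome_clusters = {
    "genome:bat_1": ["cluster:A"],
    "genome:bat_2": ["cluster:A"],
    "genome:human_1": ["cluster:B"],
}
cluster_residues = {
    "cluster:A": ["residue:x1", "residue:x2"],
    "cluster:B": ["residue:y1"],
}
for group in connected_groups(build_edges(genome_clusters, cluster_residues)):
    print(sorted(group))
```

The edge list produced here is the kind of input that a force-directed tool would then position in two dimensions.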
  • CoVariance based Phylogeny Analysis CoVPA
  • Some covariant clusters are ubiquitous to the entire grouping while others are exclusive to only one strain. We binned these clusters to compare covarying residues that are restricted to isolates from a given host species or a given group of CoVs.
  • this annotated comprehensive dataset identifies all distinct residue identities that strongly covary in each CoV protein and the distribution of these in CoVs that include the human pathogens SARS-CoV and SARS-CoV-2.
  • this provides a novel dataset that can be accessed through an interactive website available on the world-wide web at <https://sarscov2-9d60e.web.app/>, which allows one to explore the relatedness of CoV isolates based on conserved amino acid interactions that contribute to structure, protein-protein interactions, or biological function that multiple proteins drive interactively.
  • Covariant residues are highly represented in gene 1a encoding non-structural proteins (nsp1-nsp4), as well as the Spike protein and viroporin 3a (TABLE 3 and FIG 2A, 2C, 4A, 13B). It was contemplated herein that these enriched covariant residues include those that facilitated transmissibility into humans. Covariant residues are least represented in the RNA replication proteins, viroporin E, and membrane protein M. No clusters were found to be exclusive to all SARS-CoV-2 strains apart from those emerging in clinical isolates (Clusters 5 & 23); instead, common overlapping clusters of covariant residues were found.
  • SARS-CoV-2 covariant residues are found to overlap with RATG13 (Cluster 183), or with RATG13 and pangolin CoVs (Cluster 194), or with the three bat viruses RATG13/COVCZ45/COVXC21 (Cluster 190), or with these three bat CoVs with pangolin CoVs (Cluster 197).
  • Because these CoVs share significant nucleotide similarity with one another, this is not surprising; however, our covariance analysis also reveals key diverging features for the SARS-CoV-2 group that likely have biological significance.
  • S protein SARS-CoV-2 Spike
  • RBD receptor-binding domain
  • NTD amino-terminal domain
  • CTD C-terminal domain
  • bat and pangolin-restricted covariants are represented in distinct regions of the genome; for example, bat restricted residues are primarily located in non-structural proteins encoded in 1b (2’O-Mtase, NendoU, ExoN) and enriched in 3a and N but not S or M.
  • SARS-CoV-2 covariant residues belong to clusters linked to bat CoVs, but residues were also found that are linked to human CoVs and are absent in other restricted categories. These covariant residues are most highly correlated with the pandemic emergence of SARS-CoV-2 (see below).
  • a large section in gene 1a (~3000 AA, nsp1-nsp4) is unique to SARS-CoV-2 and distinct from the rest of the genome based on our covariance analysis. The context of this difference would not be readily apparent using standard nucleic acid-based single nucleotide polymorphism analysis. The identity and network of covariant residues in this region are unique and do not match those found in bat viruses, including those most similar such as RATG13, suggesting that SARS-CoV-2 experienced different selective pressures and evolutionary history (FIG. 2A and 2B).
  • the Enriched Conserved A group contains residues that align in more than one independent category (FIG. 2A and 2C); these enriched covariant groups include residues that are part of large networks of covarying residues (for example, in the nsp1-nsp4 region).
  • This particular group was found to be also densely enriched in the NTD and RBD of Spike, in much of 3a, and in both the CTD and NTD domains in N.
  • the networks of covariant residues in the S protein may provide plasticity in protein domains important for receptor recognition and possibly escape from antibody responses given that the same regions are rich in dominant neutralizing epitopes (He et al., 2005; Qiu et al., 2005).
  • Covariant residue pairs most highly represented in clinical isolates were identified and were discovered to form small networks (FIG. 5A). These networks include some of the most abundant polymorphisms previously identified, such as D614G in the Spike protein and L336P in RdRp protein (Pachetti et al., 2020). SARS-CoV-2 Spike residues 604, 606, 607, 619, and 622 within the region of this D614 residue are predicted to be covariant in our 847 strains with a number of residues in the viroporin 3a ectodomain as well as other residues in the S protein located between its RBD and furin cleavage site (FIG. 5B).
  • the 3a and E viroporins of SARS-CoV are thought to be K+ and Ca++ efflux channels, and it is interesting that furin is activated by these cations (Izidoro et al., 2010; Molloy et al., 1992). Furin cleavage of the S protein is strongly implicated in its activation for membrane fusion events (Coutard et al., 2020). In this regard, it was determined that Spike D614 maps near the furin cleavage site in the Spike trimer structure, suggesting that 3a may be interacting with S during the process that leads to S cleavage and activation of membrane fusion.
  • the 3a protein of SARS-CoV is predicted to assemble as both a homodimer and a tetramer that form a membrane channel (viroporin) in the cell and possibly the viral particle membrane.
  • the subdomains vary in conservation and it was noted that covariance is in distinct separate domains of 3a.
  • There are covariant residues within a more variable NTD region (the first 30 residues of 3a) which is predicted to be surface-exposed as well as in other predicted extracellular and cytoplasmic loops (FIG. 6A).
  • Antibodies to 3a and its amino-terminal ectodomain from convalescent-phase SARS-CoV patients are commonly found (Qiu et al., 2005; Zhong, 2006) and the covariant residues that map here overlap with a hypervariable region predicted to be more antigenic (FIG. 6A-6D).
  • a cluster of covariable residues in the CTD between residues 168 and 189 was also identified; there is no known role for this subdomain, though it partially overlaps with residues 160 to 173, which contain putative intracellular protein sorting and trafficking motifs (Huang et al., 2006; Tan et al., 2004).
  • the 3a viroporin of SARS-CoV protein is known to cause activation of the NLRP3 inflammasome, NF-kB activation, and chemokine induction as well as efficient viral release from infected cells (Castaño-Rodriguez et al., 2018; Chen et al., 2019; Freundt et al., 2010; Lu et al., 2006; Padhan et al., 2007).
  • identifying 3a as a hotspot for covariance and mutational selection may have relevance to the severe inflammation seen in COVID-19 disease and suggests that the 3a protein is a potential new target to consider for SARS-CoV-2 immunoprophylaxis.
  • covariance networks with numerous residues in SARS-CoV-2.
  • clusters that can reveal host-adaptation such as those that differentiate Civet CoVs and SARS-CoV.
  • Cluster 62 identifies 5 residues that emerged in SARS-CoV during its further passage and adaptation in the mouse model.
  • the relative position of two other covariant residues L54 and M153 in the NTD is difficult to determine in that they reside within disordered and exposed regions in the cryo-EM S trimer reconstructions of SARS-CoV and SARS-CoV-2. There are no known interactions or roles for these protruding NTD regions of the S trimers. Because the NTD of the S protein is especially enriched in covariant residues, host- or immune-driven selective pressure has likely occurred within this subdomain. In combination, covariant residues in S protein appear to have emerged independently and are enriched in viruses that caused two independent pandemics suggesting they are particularly important to human infection.
  • Previously, covariant network analysis applied to other RNA and DNA viruses demonstrated that observed covariance in viruses is often not an artifact of chance and is likely influenced by imposed pressure, structure, and selection, including those provided by therapeutics (Aurora et al., 2009; Donlin et al., 2012; Sruthi and Prakash, 2019). Coevolving residues for large orthologous groups of proteins have also been useful in predicting structure (de Juan et al., 2013) and protein-protein interactions (Kamisetty et al., 2013) that are targeted for drug interference. In the present study, networks of co-varying amino acid residues were identified in proteins encoded by nearly 13,600 CoV isolates (FIG. 8).
  • Genome and protein sequences were sourced from available NCBI and GISAID public databases. Protein sequences for genes in 1a, 1b, Spike, 3a, E, M, and N were concatenated and aligned using CLC Bio Workbench (v8) for the 847 genomes using a gap open cost of 2.0 and a gap extension cost of 1.0 in very accurate mode, and with MAFFT for the clinical SARS-CoV2 strains using the default FFT-NS-2 setting (Nakamura et al., 2018). Only the 13,611 clinical strains with significant contiguous coverage over the reference genes (>95%) were kept in the alignment.
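The coverage filter described above can be sketched as a simple pass over aligned sequences. The text specifies contiguous coverage of more than 95% over the reference genes but not how it was measured, so this example assumes a simplification: coverage is the fraction of alignment columns holding a called residue (not a gap `-` or unknown `X`). The record names and sequences are made up.

```python
def coverage(aligned_seq, missing="-X"):
    """Fraction of alignment positions with a called residue.

    Assumption: gaps ('-') and unknown residues ('X') count as uncovered.
    """
    called = sum(1 for ch in aligned_seq if ch not in missing)
    return called / len(aligned_seq)


def filter_by_coverage(records, min_coverage=0.95):
    """Keep only records whose coverage exceeds `min_coverage`."""
    return {name: seq for name, seq in records.items()
            if coverage(seq) > min_coverage}


# Toy aligned records of length 100 (not real isolates).
records = {
    "isolate_full": "M" * 100,
    "isolate_gappy": "M" * 90 + "-" * 10,
    "isolate_few_unknown": "M" * 96 + "X" * 4,
}
print(sorted(filter_by_coverage(records)))
```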
  • Severe Acute Respiratory Syndrome Coronavirus Viroporins E, 3a, and 8a in Replication and Pathogenesis. mBio 9, e02325-17. https://doi.org/10.1128/mBio.02325-17 4) Chen, I.-Y., Moriyama, M., Chang, M.-F., Ichinohe, T., 2019. Severe Acute Respiratory Syndrome Coronavirus Viroporin 3a Activates the NLRP3 Inflammasome. Front. Microbiol.10.
  • the spike glycoprotein of the new coronavirus 2019-nCoV contains a furin-like cleavage site absent in CoV of the same clade.
  • Genome-Wide Networks of Amino Acid Covariances are Common among Viruses. J. Virol. 86, 3050– 3063. https://doi.org/10.1128/JVI.06857-11 9) Elbe, S., Buckland-Merrett, G., 2017. Data, disease and diplomacy: GISAID’s innovative contribution to global health. Glob. Chall. Hoboken NJ 1, 33–46.
  • https://doi.org/10.1126/science.1118391 24) Lu, G., Wang, Q., Gao, G.F., 2015. Bat-to-human: spike features determining ‘host jump’ of coronaviruses SARS-CoV, MERS-CoV, and beyond. Trends Microbiol. 23, 468–478. https://doi.org/10.1016/j.tim.2015.06.003 25) Lu, W., Zheng, B.-J., Xu, K., Schwarz, W., Du, L., Wong, C.K.L., Chen, J., Duan, S., Deubel, V., Sun, B., 2006.
  • Severe acute respiratory syndrome-associated coronavirus 3a protein forms an ion channel and modulates virus release.
  • Human furin is a calcium-dependent serine endoprotease that recognizes the sequence Arg-X-X- Arg and efficiently cleaves anthrax toxin protective antigen. J. Biol. Chem. 267, 16396– 16402.
  • Emerging SARS-CoV-2 mutation hot spots include a novel RNA-dependent-RNA polymerase variant. J. Transl. Med.18, 179. https://doi.org/10.1186/s12967-020-02344-6 29) Padhan, K., Tanwar, C., Hussain, A., Hui, P.Y., Lee, M.Y., Cheung, C.Y., Peiris, J.S.M., Jameel, S., 2007. Severe acute respiratory syndrome coronavirus Orf3a protein interacts with caveolin. J. Gen. Virol.88, 3067–3077.
  • Severe Acute Respiratory Syndrome (SARS)-coronavirus 3a protein may function as a modulator of the trafficking properties of the spike protein. Virol. J. 2, 5–5.
  • Tan, Y.-J., Teng, E., Shen, S., Tan, T.H.P., Goh, P.-Y., Fielding, B.C., Ooi, E.-E., Tan, H.-C., Lim, S.G., Hong, W., 2004.
  • EXAMPLE 2 Vaccine design using the 3a N-terminal domain and S N-terminal domain of SARS-CoV-2 [00311]
  • the covariant residues identified above provide strong evidence of an evolutionary relationship between 3a and S proteins in coronaviruses. Furthermore, an enrichment of covariant residues in the first 44 amino acids in the 3a N-terminal domain was discovered, including those that correlated with changes in S. Therefore, it is contemplated that immune pressure and host adaptation drive these key changes in both S and 3a proteins in coronaviruses that in turn drive selection of compensatory mutations linked in larger covariant networks. Mutations at such residues are driven by an immune response and in many cases are accompanied by other mutations.
  • Both these domains can be modified with covariant residues, for example the 3a NTD antigen will include mutations listed in table S1 based on covariant analysis.
  • the S-NTD-FCD includes amino acids found to be highly covariant in the first 300 AA (NTD) and a variable residue at S position 614 (G/D) which is found to be a common polymorphism that emerged during the SARS-CoV-2 pandemic and that covaries with residues in S protein NTD and also with residues in 3a.
  • N-terminal domain constructs with residue substitutions are outlined below based on covariance.
  • 3a_Seq1, WT 3a sequence: MDLFMRIFTIGTVTLKQGEIKDATPSDFVRATATIPIQASLPFG (SEQ ID NO: 20)
  • 3a_Seq2, T32F, covariant selected and removes glycosylation site: MDLFMRIFTIGTVTLKQGEIKDATPSDFVRAFATIPIQASLPFG (SEQ ID NO: 21)
  • 3a_Seq3, V13I, covariant selected: MDLFMRIFTIGTITLKQGEIKDATPSDFVRATATIPIQASLPFG (SEQ ID NO: 22)
  • 3a_Seq4, T14I, covariant selected: MDLFMRIFTIGTVILKQGEIKDATPSDFVRATATIPIQASLPFG (SEQ ID NO: 23)
  • 3a_Seq5, L15F, covariant selected: MDLFMRIFTIGTVTFKQGEIKDATPSDFVRATATIPIQASLPFG (SEQ ID NO: 24)
  • 3a_Seq6, Q17H, covariant selected: MDLFMRI
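Each listed construct differs from the wild-type 3a N-terminal sequence by a single residue substitution, which can be checked mechanically. The sketch below applies a substitution written as reference residue, 1-based position, and new residue to the wild-type sequence quoted above (SEQ ID NO: 20); the helper name `apply_substitution` is an illustrative assumption, not part of the specification.

```python
import re

# Wild-type 3a N-terminal sequence as listed above (SEQ ID NO: 20).
WT_3A_NTD = "MDLFMRIFTIGTVTLKQGEIKDATPSDFVRATATIPIQASLPFG"


def apply_substitution(seq, mutation):
    """Apply a point substitution such as 'T32F' (1-based position)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation format: {mutation!r}")
    ref, pos, alt = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= pos <= len(seq):
        raise ValueError(f"position {pos} is outside the sequence")
    if seq[pos - 1] != ref:
        raise ValueError(f"expected {ref} at position {pos}, found {seq[pos - 1]}")
    return seq[:pos - 1] + alt + seq[pos:]


for name in ("T32F", "V13I", "T14I", "L15F"):
    print(name, apply_substitution(WT_3A_NTD, name))
```

The reference-residue check guards against off-by-one numbering errors when a substitution list is copied between alignments.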
  • the 3a NTD was fused to the S-NTD-FCD at both the amino and carboxyl terminus (FIG. 10 and FIG. 11) to provide an antigen that may generate antibodies that target two domains in two different proteins found to co-vary. This provides epitopes in three domains (3a NTD, S NTD, and S furin cleavage domain) that possess covariant residues found to link between these three domains. It is contemplated herein that antibodies that target these regions would be difficult for the virus to escape with single mutations based on the relationships.
  • the S protein is an additional target in coronavirus vaccine strategies, but the 3a protein’s role in the virion is unclear.
  • the 3a NTD construct removes the channel forming domains and regions in the CTD that may be needed to stimulate the inflammasome and the innate immune response.
  • the vaccine composition is designed to target this protein to reduce inflammation while also targeting predicted partner residues in S as an attempt to neutralize the virus (FIG.12).
  • Exemplary sequences of fusion proteins are provided below. [00317] The sequence of S protein fusion. SARS-CoV-2 S residue positions indicated (parenthesis). Linkage and 3a fusions indicated.
  • Amino Acid 614 residue in S protein varies in clinical strains and has covariance relationships with S residues and 3a based on the above Example.
  • A.3aNTD-S fusion to SARS-CoV-2 S NTD (S 614 residue is D)
  • B.3aNTD-S fusion to Furin Cleavage Domain (S 614 residue is D)
  • C.3aNTD-S fusion to SARS-CoV-2 S NTD (S 614 residue is G)
  • the inventors analyzed approximately 75,000 genome sequences from the GISAID database that were identified between February 2020 and 2021 using clinical SARS-CoV-2 samples. They mapped covariant pairs between 3a and Spike. This is referred to as the “clinical SARS-CoV-2” data herein and summarized in FIGS.15A-15C.
  • the inventors compared the pan-sarbecovirus and clinical SARS-CoV-2 data to identify common covariant residues that overlap. Using a +1 and -1 AA sliding window in the amino acid alignment to accommodate slight deviation between the pan-sarbecovirus Spike-3a alignment and clinical SARS-CoV-2, common covariant residue linkages were identified (FIG. 15D). [00321] The inventors found that some overlapping covariant regions between the pan-sarbecovirus and clinical SARS-CoV-2 data aligned perfectly with deletions and mutations identified in dominant, well-studied SARS-CoV-2 variants, including the UK, Brazilian, and South African variants that emerged as dominant in late 2020 and early 2021 during the pandemic.
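The +1/-1 sliding-window comparison can be sketched as a tolerance match between two sets of covariant pairs. This example assumes each pair is stored as a (Spike position, 3a position) tuple and that a match requires both positions to agree within the window; the representation, the function name, and the positions are hypothetical and are not taken from the figures.

```python
def overlapping_pairs(pan_pairs, clinical_pairs, window=1):
    """Match covariant pairs whose two positions each agree within +/- window."""
    matches = []
    for pan in sorted(pan_pairs):
        for clin in sorted(clinical_pairs):
            if abs(pan[0] - clin[0]) <= window and abs(pan[1] - clin[1]) <= window:
                matches.append((pan, clin))
    return matches


# Made-up (Spike position, 3a position) pairs, for illustration only.
pan_sarbecovirus = {(100, 10), (200, 20), (300, 30)}
clinical = {(101, 10), (199, 21), (310, 30)}
for pan, clin in overlapping_pairs(pan_sarbecovirus, clinical):
    print(pan, "matches", clin)
```

Setting `window=0` requires exact positional agreement, which would miss linkages shifted by a single alignment column.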
  • the covariance and variation in 3a NTD indicate that this is under pressure to mutate by either covariance with Spike residues and/or by immune selection in the host (FIGS 17A-17D).
  • the enriched covariance internal to and in between Spike NTD and CTD and the 3a NTD apparent in the evolutionary record (pan-sarbecovirus) and the additional independent confirmed covariance in SARS-CoV-2 clinical strains during approximately one year of the pandemic were considered for selecting key antigens in both proteins for vaccine design.
  • Spike NTD, CTD, and 3a NTD domains either together, separate, or in full protein form (i.e., Full-length Spike and full-length 3a) can be used to generate antigens that are wildtype and also possess mutations identified in the covariant residue analyses (via both pan-sarbecovirus and clinical) described herein.
  • the general concept leverages covariance identified or described herein to predict interactions within and between Spike and 3a.
  • the pan-sarbecovirus covariance can be predictive for some of the key mutations identified in emerging variants isolated during the current SARS-CoV-2 pandemic.
  • the antigen can comprise one or more mutations that are either identical or closely resemble the deletions at amino acid residues 69-70, 142-147, and 242-244. These mutations overlap with covariance and also those found in SARS-CoV2 (FIGS. 18A-18C).
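Introducing the listed deletions into an antigen sequence amounts to removing 1-based, inclusive residue ranges. The sketch below shows that bookkeeping on a synthetic placeholder sequence; the placeholder is not a Spike sequence, and the helper name is an assumption made for this example.

```python
# Residue ranges named above (1-based, inclusive).
DELETIONS = [(69, 70), (142, 147), (242, 244)]


def apply_deletions(seq, ranges):
    """Remove 1-based inclusive residue ranges, numbered on the input sequence."""
    drop = set()
    for start, end in ranges:
        drop.update(range(start, end + 1))
    return "".join(res for pos, res in enumerate(seq, start=1) if pos not in drop)


# Synthetic 300-residue placeholder (not a real protein sequence).
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
placeholder = "".join(ALPHABET[i % len(ALPHABET)] for i in range(300))

deleted = apply_deletions(placeholder, DELETIONS)
print(len(placeholder), len(deleted))
```

All ranges are resolved against the original numbering before any residue is removed, so earlier deletions do not shift the coordinates of later ones.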
  • the antigen can be generated into protein, mRNA, or other methods that result in the presentation of a protein antigen in the host and produce a targeted immune response against these key residues and subdomains.
  • Exemplary antigens can be expressed as monomers, fusions, or anchored to a trimerization domain (FIG.18D).

Abstract

The methods and compositions provided herein, relate in part, to the discovery of conserved interactions between amino acid residues in all proteins encoded by SARS-CoV-related viruses. Accordingly, pairs and networks of residue variants that exhibited statistically high frequencies of covariance with each other can be used as a new computational approach (Covariance-based Phylogeny Analysis) for understanding viral evolution and adaptation. Provided herein is evidence that the evolutionary processes that converted a bat virus into a human pathogen occurred through recombination with other viruses in combination with new adaptive mutations important for entry into human cells.

Description

PROTEIN COVARIANCE NETWORKS REVEAL INTERACTIONS IMPORTANT TO THE EMERGENCE OF SARS CORONAVIRUSES AS HUMAN PATHOGENS

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application claims benefit under 35 U.S.C. § 119(e) of U.S. Provisional Application No. 63/031,306 filed May 28, 2020, No. 63/032,925 filed June 1, 2020 and No. 63/129,029 filed December 22, 2020, the content of each of which is incorporated herein by reference in its entirety.

SEQUENCE LISTING

[0002] The instant application contains a Sequence Listing which has been submitted electronically in ASCII format and is hereby incorporated by reference in its entirety. Said ASCII copy, created on May 26, 2021, is named 002806-097770WOPT_SL.txt and is 171,241 bytes in size.

GOVERNMENT SUPPORT

[0003] This invention was made with government support under AI018045 awarded by the National Institutes of Health. The government has certain rights in the invention.

TECHNICAL FIELD

[0004] The technology described herein relates to methods and compositions for identifying antigens, preventing and treating a microbial infection and uses thereof.

BACKGROUND

[0005] The emergence of SARS-CoV and MERS-CoV as human pathogens is attributed to zoonotic infections in bats that transferred to civets and camels, respectively, while SARS-CoV-2 is similar to viruses isolated from both bats and pangolins, implicating these species in the emergence of this historic virus. The ~30kb genome size of all SARS-related CoVs renders sequence alignment and pairwise distance methods effective for phylogenic studies and predicting the likely source of such coronaviruses (CoVs). While nucleic acid sequence-based phylogenies are informative, they still have limitations. Thus, methods of identifying variation in protein sequences of microbe evolution and the key functional interactions that drive adaptation of microbes into a new host are needed for identifying novel antigens for vaccine compositions and treatments for infections.
SUMMARY

[0006] The fusion polypeptides, polynucleotides, vaccine compositions, and methods provided herein are based, in part, on the discovery of covariant residues within these six core viral proteins that are responsible for the evolution and infection of human hosts by the pathogenic SARS-CoV2 virus. The method of selecting antigens based on the covariance analysis provided herein can identify and generate fusion polypeptides as preventive therapeutics for pathogenic infections.

[0007] In one aspect, provided herein is a fusion polypeptide comprising: a first domain comprising a first viral polypeptide or a fragment thereof expressed by a first virus, wherein the first viral polypeptide or fragment thereof comprises at least two or more covarying amino acid positions.

[0008] In another aspect, provided herein is a fusion polypeptide comprising: (a) a first domain comprising a first viral polypeptide or a fragment thereof expressed by a first virus; and (b) a second domain comprising a second viral polypeptide or a fragment thereof expressed by the first virus, and wherein the first viral polypeptide or a fragment thereof comprises at least one amino acid position that covaries with at least one amino acid position in the second viral polypeptide or a fragment thereof.

[0009] In another aspect, provided herein is a polynucleotide encoding an amino acid sequence of a fusion polypeptide provided herein.

[0010] In another aspect, provided herein is a composition comprising a fusion polypeptide or a polynucleotide provided herein. In some embodiments, the composition is a vaccine composition.

[0011] The fusion polypeptides described herein are immunogenic and can induce an antigen specific immune response. Accordingly, in another aspect, provided herein is a method of inducing an immune response in a subject using the fusion polypeptide described herein or a polynucleotide encoding the same.
Generally, the method comprises administering to the subject a fusion polypeptide provided herein or a polynucleotide encoding the same in an amount effective to produce an antigen-specific immune response.
[0012] Also provided herein is a cell comprising a fusion polypeptide or a polynucleotide provided herein.
BRIEF DESCRIPTION OF THE DRAWINGS
[0013] FIG. 1 shows a force mapping graph showing the relationship between 847 genomes and 197 clusters. Pairwise and multiple residue covariances were identified, scored for significance, and binned into clusters and respective strains for force mapping as described in Methods. A library of these results can also be viewed on an interactive website available on the world-wide web at sarscov2-9d60e.web.app.
[0014] FIGS. 2A-2C show predicted covariant residues in different CoVs by protein. (FIG. 2A) Mapped covariant residues in 1a/1b polyproteins, S, 3a, E, M, and N proteins from the bat CoVs found in SARS-CoV-2. Conserved residues found in other groups are mapped as a reference. (FIG. 2B) Graph showing a progression of Clusters and residue content from those more abundant in bat CoVs to those more restricted to SARS-CoV-2 and closely related viruses. (FIG. 2C) Residues found to covary that are restricted to categories. Those that overlap and are conserved are shown as “Conserved A”.
[0015] FIG. 3 shows the phylogeny of the 850 CoVs. Maximum Likelihood Tree using the aligned amino acids from 1a/1b, S, 3a, M, E, and N proteins from bat, pangolin, civet, and human CoVs.
[0016] FIGS. 4A-4D show covariants in S with respect to the position of recognized subdomains and motifs. (FIG. 4A) Recognized subdomains and motifs in the SARS-CoV-2 Spike (S) protein. (FIG. 4B) Covarying residues predicted to be interacting in the S protein. Dominant B and T epitopes and residues predicted to interact within the Spike structure are indicated. (FIG. 4C) Sequence motif of the SARS-related covariant conserved (SRCC) identified in this work.
FIG. 4C discloses SEQ ID NOS 69, 87, 88, 70, 70, 71, 89, and 72, respectively, in order of appearance. (FIG. 4D) Shows the alignment of covariants with the S receptor-binding domain. Aligned SARS-CoV and SARS-CoV-2 sequences in the Spike RBD showing sequence conservation (blue), covariant residues (red arrows), contacts with ACE-2, and contacts between Ab R80 and SARS-CoV. FIG. 4D discloses SEQ ID NOS 73 and 74, respectively, in order of appearance.
[0017] FIGS. 5A-5B show plotting of predicted covariant and interacting residues in clinical strains and highlighted residues in S. (FIG. 5A) 796 covarying residue pairs detected in the clinical isolates. Abundance is indicated by hue. High abundance means that both residues vary in at least 4% of the population, mid-abundance means that at least one residue varies in at least 4%, and low abundance means that neither residue varies in at least 4%. Overlap pairs are those covariant residues detected in both the 850 CoVs and the clinical strains. (FIG. 5B) Covarying and interacting residues detected in clinical isolates and within 3a and S protein. Clinical residues are colored blue and those predicted to interact in the S protein from the PDB (6VXX) are orange. Linkages between 614 and 3a residues in 850 CoVs are black. Residue linkages with S 614 and S and 3a proteins in the nonclinical SARS-CoV-2 clusters are indicated in purple. Key domains in 3a protein with enriched covariance in our data are circled in blue. Residues identified as emerging in both SARS-CoV and SARS-CoV-2 are circled in black.
[0018] FIGS. 6A-6D show covariant residues in 3a domains and observed variance in the 3a ectodomain. (FIG. 6A) Diagram showing the positions of extracellular, cytoplasmic, and transmembrane domains (TMD). Red arrows indicate covariant residues from 850 CoVs and blue and purple arrows are those from clinical strains as shown in Supplemental S3. (FIG.
6B) Comparison of SARS-CoV-2 and SARS-CoV residues in the 3a N-terminal ectodomain and arrows from FIG. 6A showing covariance. Distinct residues are colored blue. FIG. 6B discloses SEQ ID NOS 75 and 76, respectively, in order of appearance. (FIG. 6C) Consensus sequence of 3a (SEQ ID NO: 77) derived from the alignment of CoVs in this work and all different substitutions observed at each position. Variable residues are colored red. (FIG. 6D) BepiPred-2.0 epitope prediction using residues in the 3a ectodomain. FIG. 6D discloses SEQ ID NOS 78 and 79, respectively, in order of appearance.
[0019] FIGS. 7A-7C show covariants that emerged in SARS-CoV and SARS-CoV-2. (FIG. 7A) Gephi force map inset showing Cluster 126 and highlighted genomes that possess the Cluster 126 residues. The ML phylogenetic tree showing the relative relationship of circled strains is included as a reference. (FIG. 7B) Alignment of strains to both SARS-CoV and SARS-CoV-2 showing the order and diversity of these 8 residues in S protein. FIG. 7B discloses SEQ ID NOS 80, 81, 90, 82, 83, 84, 85, 86, and 91, respectively, in order of appearance. (FIG. 7C) Structures of both SARS-CoV-2 (PDB 6VXX) and SARS-CoV (PDB 6ACC) showing highlighted residues belonging to both. Q23 is missing from the SARS-CoV-2 structure, and the closest residue, A27, was labeled to provide an approximate location.
[0020] FIG. 8 shows a summary of the method of identifying covariance of SARS-CoV-2.
[0021] FIG. 9 shows a model of 3a N-terminal domain and S N-terminal domain and furin cleavage domain fusion.
[0022] FIG. 10 shows a model of two distinct 3a-S fusions incorporating 3a N-terminal domain and S N-terminal domain and furin cleavage domain. Both constructs possess either a G or D amino acid residue at the SARS-CoV-2 614 position in the S protein cleavage domain (not shown).
[0023] FIG. 11 shows a structure model of two distinct 3a-S fusions incorporating 3a N-terminal domain (NTD) and S NTD and furin cleavage domain.
[0024] FIG.
12 shows a schematic representation of the strategy to display 3a-S fusion epitopes to generate different classes of independent antibodies. Two distinct 3a NTD fusion sites are shown.
[0025] FIGS. 13A-13C show predicted covariant residues in the 3a and Spike (S) proteins. (FIG. 13A) Diagram showing the subdomains with regard to predicted cellular location and the location of covariant residues in 3a. (FIG. 13B) Shows a predicted map of covariant relationships between residues in 3a and Spike (S). Dominant epitopes in Spike and residues that are predicted to interact within the Spike trimer structure are indicated for reference. (FIG. 13C) Shows an alignment of the identities and patterns of covariant residues and amino acid charge and polarity information in 3a and Spike using the positions in SARS-CoV-2. The differences between human, pangolin, and bat betacoronaviruses are shown in the alignment and the reference genome name is indicated for each.
[0026] FIGS. 14A and 14B show a comparison of the initial pan-sarbecovirus covariant pair analysis using ~60 distinct β-CoVs with the covariant analysis of ~75,000 clinical SARS-CoV-2 genomes, completed for Spike and 3a. (FIG. 14A) All 11,394 covariant pair residues are mapped and colored by independent occurrence of distinct amino acid changes. Linkages showing the most highly enriched and varied independent covariant pairs are indicated by lines of a darker blue color. (FIG. 14B) Covariant residues with at least two distinct changes are shown.
[0027] FIGS. 15A-15D show covariant pairs in Spike and 3a. (FIG. 15A) All covariant pairs in Spike and 3a identified using the ~75,000 clinical SARS-CoV-2 genomes. (FIG. 15B) The top 50 most abundant covariant pairs are shown and linkages are colored red for those within the Spike NTD and blue for those within the CTD. (FIG. 15C) All covariant pairs are shown and those interacting with residues within the same domains of Spike (NTD, RBD, and CTD) and those within 3a are colored.
(FIG. 15D) All clinical SARS-CoV-2 covariant residues are shown, and those that overlap with residues also found in the pan-sarbecovirus analysis (+/- 1 AA residue sliding window for each) are indicated as darker lines.
[0028] FIGS. 16A-16C show the overlapping covariant pair linkages between residues in Spike and 3a. (FIG. 16A) The overlapping covariant pair linkages between residues in Spike and 3a are shown as darker lines when the pan-sarbecovirus and clinical strains are compared. Mutations identified in dominant SARS-CoV-2 variants are indicated and labeled by arrows. Deletions identified in the Spike NTD of SARS-CoV when the amino acid residue sequence of wildtype Spike SARS-CoV-2 is aligned to SARS-CoV are circled in red (FIGS. 16A and 16B). (FIG. 16C) The alignment of the NTD and RBD of SARS-CoV-2 and SARS-CoV demonstrates these are key deletions apparent in these domains of Spike when residues are compared.
[0029] FIGS. 17A-17D show the most abundant covariant residues in Spike and 3a. (FIG. 17A) Most abundant covariant residues (top 50) in Spike and 3a identified in the clinical SARS-CoV-2 analysis are shown and colored by interactions between domains (Spike NTD-NTD and NTD-RBD are red, CTD-CTD are blue, and CTD-NTD or RBD are purple). All covariant residues identified in the pan-sarbecovirus are colored blue (FIG. 17A). (FIG. 17B) All covariant interactions between 3a and Spike identified in the pan-sarbecovirus SARS-CoV-2 analysis are shown in grey. (FIG. 17C) Identity of 3a NTD covariant linkages in clinical and pan-sarbecovirus analyses are compared. Red arrows link to Spike amino acid residues and black arrows link to 3a residues. FIG. 17C discloses SEQ ID NOS 75 and 75, respectively, in order of appearance. (FIG. 17D) Variable amino acid residues in related SARS-CoVs identified in the 3a NTD found in the pan-sarbecovirus analysis are shown in red and all amino acid variations are shown below the sequence.
As a reference, a comparison of the amino acid residue sequence of 3a NTD between SARS-CoV-2 and SARS-CoV is also shown. All covariant residues identified in the pan-sarbecovirus are colored blue. FIG. 17D discloses the consensus sequence of 3a (SEQ ID NO: 77) derived from the alignment of CoVs in this work and all different substitutions observed at each position, and SEQ ID NOS 75 and 76, respectively, in order of appearance.
[0030] FIG. 18A is a schematic representation showing covariant residue-enriched regions of 3a and Spike labeled as Part A, Part B, and Part C.
[0031] FIG. 18B is a schematic representation showing various covariant interactions between and within Part A, Part B, and Part C (FIG. 18A) based on both pan-sarbecovirus and clinical covariance.
[0032] FIG. 18C is a schematic representation of some exemplary antigens for producing antibodies that disrupt interactions between Parts A, B, and C in FIG. 18B.
[0033] FIG. 18D is a schematic representation of exemplary monomeric and trimeric antigens (fusion proteins) that are either purified or instead expressed or encoded in DNA or mRNA.
DETAILED DESCRIPTION
[0034] The fusion polypeptides, polynucleotides, vaccine compositions, and methods provided herein are based, in part, on the discovery of covariant residues within six core viral proteins (1a/1b, S, 3a, E, M, and N) that are responsible for the evolution and infection of human hosts by the pathogenic SARS-CoV-2 virus. The method of selecting antigens based on the covariance analysis provided herein can identify and generate fusion polypeptides as preventive therapeutics for pathogenic infections.
Pathogenic infections
[0035] Viruses are small infectious agents which generally contain a nucleic acid core and a protein coat, but are not independently living organisms. Viruses can also take the form of infectious nucleic acids lacking a protein. A virus cannot replicate in the absence of a living host cell.
Viruses enter specific living cells either by endocytosis or direct injection of DNA and multiply, causing disease. The multiplied virus can then be released and infect additional cells. Some viruses are DNA-containing viruses and others are RNA-containing viruses.
[0036] Specific examples of viruses that have been found to infect humans include but are not limited to: Coronaviridae (e.g., coronaviruses); Retroviridae (e.g., human immunodeficiency viruses, such as HIV-1 (also referred to as HTLV-III, LAV or HTLV-III/LAV, or HIV-III) and other isolates, such as HIV-LP); Picornaviridae (e.g., polio viruses, hepatitis A virus, enteroviruses, human Coxsackie viruses, rhinoviruses, echoviruses); Caliciviridae (e.g., strains that cause gastroenteritis); Togaviridae (e.g., equine encephalitis viruses, rubella viruses); Flaviviridae (e.g., dengue viruses, encephalitis viruses, yellow fever viruses); Hepaciviruses (hepatitis C viruses); Rhabdoviridae (e.g., vesicular stomatitis viruses, rabies viruses); Filoviridae (e.g., ebola viruses); Paramyxoviridae (e.g., parainfluenza viruses, mumps virus, measles virus, respiratory syncytial virus); Orthomyxoviridae (e.g., influenza viruses); Bunyaviridae (e.g., Hantaan viruses, bunya viruses, phleboviruses and Nairo viruses); Arenaviridae (hemorrhagic fever viruses); Reoviridae (e.g., reoviruses, orbiviruses and rotaviruses); Birnaviridae; Hepadnaviridae (Hepatitis B virus); Parvoviridae (parvoviruses); Papovaviridae (papillomaviruses, polyoma viruses); Adenoviridae (most adenoviruses); Herpesviridae (herpes simplex virus (HSV) 1 and 2, varicella zoster virus, cytomegalovirus (CMV)); Poxviridae (variola viruses, vaccinia viruses, pox viruses); Iridoviridae (e.g., African swine fever virus); and unclassified viruses (e.g., the etiological agents of spongiform encephalopathies, the agent of delta hepatitis (thought to be a defective satellite of hepatitis B virus), the agents of non-A, non-B hepatitis (class 1=internally transmitted;
class 2=parenterally transmitted); Norwalk and related viruses, and astroviruses. [0037] It is contemplated herein that the compositions and methods provided herein are not limited to provoking an immune response to a virus but can also be used to provoke an immune response to other microorganisms (e.g., bacteria, fungi, parasites, etc.) as well. Other medically relevant microorganisms have been described extensively in the literature, e.g., see Murray et al. Medical Microbiology, 9th ed., published March 10, 2020, (eBook ISBN: 9780323674515); Tortora et al. Microbiology: An Introduction, Pearson; 13th edition (January 8, 2018); Topley & Wilson’s Microbiology and Microbial Infections, 10th edition, John Wiley & Sons, Ltd. published 15 March 2010, (Online ISBN: 9780470688618); the entire contents of each of which is hereby incorporated by reference. [0038] As used herein, the terms “infection” or “infection of a host” or “infectious disease” or “microbial infection” refers to the growth, proliferation, spread, and/or presence of a microorganism in a subject. In some cases, the infection can elicit an immune response by the host that leads to symptoms associated with a disease. The infection can be transmitted from one subject to another by contact, contact with aerosolized liquid droplets (coughing, sneezing, etc.), contaminated needles, contaminated bodily fluids, or via sexual transmission. The infection can be characterized by at least one symptom of a disease, such as pain, increased mucosal secretions, coughing, headaches, abnormalities of the skin, fever, sore throat, swollen lymph nodes, hair loss, muscle aches, sores, or any other symptom associated with an infection. 
Exemplary infections or infectious diseases include but are not limited to: SARS, COVID-19, coronavirus infections, acquired immune deficiency syndrome (AIDS), hepatitis, candidiasis, human papillomavirus (HPV) infection, herpes, influenza, pneumonia, ear infections, the common cold, chicken pox, cat scratch disease, rabies, adenovirus, bronchiolitis, croup, encephalitis, fifth disease, hand foot and mouth disease, impetigo, botulism, listeria infection, MRSA infection, measles, meningitis, mumps, polio, Rocky Mountain Spotted Fever, shingles, sinusitis, staph infections, tetanus, toxic shock syndrome, urinary tract infections, warts, whooping cough, Zika virus infections, or any other infection caused by a microorganism known in the art.
[0039] Provided herein is a method of provoking an immune response to a coronavirus (Coronaviridae). In one aspect, the fusion polypeptides or vaccine compositions provided herein are administered to a subject at risk of becoming infected with a virus (e.g., a coronavirus).
[0040] Coronaviruses are RNA viruses that are distinguished from other RNA viruses by an intracellular budding site. Coronaviruses are also characterized by their petal-shaped spikes. The spikes are oligomers of the 180–200 kDa S glycoprotein that bind to receptor glycoproteins and induce fusion of the viral envelope with cell membranes and, sometimes, cell–cell fusion with a host cell (e.g., a human cell). The basic structure and genome organization of coronaviruses are known in the art and described, e.g., by Payne S. Family Coronaviridae. Viruses. 2017;149-158. doi:10.1016/B978-0-12-803109-4.00017-9, which is incorporated herein by reference in its entirety.
[0041] Several medications have been developed for the treatment of an infection (e.g., a viral infection) or for the prevention of a severe infection.
Treatments for infections can include (i) vaccines comprising inactivated virus or bacterial cells, (ii) live attenuated vaccines containing genetically manipulated viruses, (iii) fusion polypeptides, (iv) vaccine compositions comprising nucleic acids that promote an immune response to a pathogen, and (v) antibiotics and antiviral medications administered following infection.
[0042] The compositions and methods provided herein can be used to prevent a viral infection in a subject. The subject can be administered the fusion polypeptide or vaccine compositions provided herein to provoke an immune response in the subject.
Fusion polypeptide compositions
[0043] Provided herein are fusion polypeptide compositions for use in treating and preventing an infection.
[0044] In one aspect, provided herein is a polypeptide, e.g., a fusion polypeptide comprising: a first domain comprising a first viral polypeptide or a fragment thereof expressed by a first virus. Generally, the first viral polypeptide or fragment thereof comprises at least two or more covarying amino acid positions. In other words, the first domain comprises at least two amino acid residue positions that covary with each other.
[0045] In another aspect, provided herein is a fusion polypeptide comprising: (a) a first domain comprising a first viral polypeptide or a fragment thereof expressed by a first virus; and (b) a second domain comprising a second viral polypeptide or a fragment thereof expressed by the first virus. The first viral polypeptide or a fragment thereof comprises at least one amino acid position that covaries with at least one amino acid position in the second viral polypeptide or a fragment thereof. In other words, the first and second domains each comprise at least one amino acid position that covaries with at least one amino acid position in the other domain.
[0046] In another aspect, provided herein is a fusion polypeptide comprising: at least one viral polypeptide or a fragment thereof that is derived from a first pathogenic virus that infects a human subject, wherein the viral polypeptide or fragment thereof comprises at least two or more covarying amino acid sites when compared to a viral polypeptide expressed by one or more of a virus that infects a non-human subject, and/or a viral polypeptide expressed by a different pathogenic virus that infects a human subject. [0047] In some embodiments of any of the aspects, the fusion polypeptide provided herein further comprises a second domain comprising a second viral polypeptide or a fragment thereof expressed by the first virus, optionally, the second viral polypeptide or a fragment thereof comprises at least one amino acid position that covaries with at least one amino acid position in the first viral polypeptide or a fragment thereof. [0048] In some embodiments of any of the aspects, the second viral polypeptide or fragment thereof comprises at least two or more amino acid positions that covary with each other. [0049] In some embodiments of any of the aspects, the second viral polypeptide or a fragment thereof comprises at least two or more amino acid positions that covary with each other. [0050] In some embodiments of any of the aspects, the fusion polypeptide further comprises a third domain comprising a third viral polypeptide or a fragment thereof expressed by the first virus. In some embodiments, the third viral polypeptide or a fragment thereof comprises at least one amino acid position that covaries with at least one amino acid position in the first and/or second viral polypeptide or a fragment thereof. [0051] In some embodiments of any of the aspects, the third viral polypeptide or fragment thereof comprises at least two or more amino acid positions that covary with each other. 
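As a rough illustration of what it means for two amino acid positions to covary across aligned viral sequences, the following Python sketch scores every pair of alignment columns by mutual information. This is an assumed, generic covariation measure chosen for illustration only; it is not the correlating tandem model or purity threshold recited in this description. The function names, the toy alignment, and the 0.9-bit cutoff are all hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_a)
    freq_a = Counter(col_a)
    freq_b = Counter(col_b)
    freq_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), count in freq_ab.items():
        p_ab = count / n
        mi += p_ab * log2(p_ab / ((freq_a[a] / n) * (freq_b[b] / n)))
    return mi


def covarying_pairs(alignment, threshold):
    """Return (pos_i, pos_j, score) for column pairs scoring >= threshold.

    Positions are reported 1-based, as residue positions usually are.
    The threshold is an arbitrary illustrative cutoff (assumption).
    """
    length = len(alignment[0])
    columns = [[seq[i] for seq in alignment] for i in range(length)]
    pairs = []
    for i in range(length):
        for j in range(i + 1, length):
            score = mutual_information(columns[i], columns[j])
            if score >= threshold:
                pairs.append((i + 1, j + 1, score))
    return pairs


# Hypothetical toy alignment: positions 1 and 2 always change together,
# while position 3 changes independently of them.
toy_alignment = ["AKT", "AKT", "GRT", "GRS"]
pairs = covarying_pairs(toy_alignment, threshold=0.9)
print(pairs)
```

In this toy input only positions 1 and 2 pass the cutoff, because every A at position 1 is paired with K at position 2 and every G with R. A real analysis would need many more sequences and a significance test, which this sketch omits.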
[0052] In some embodiments of any of the aspects, the fusion polypeptide further comprises a fourth domain comprising an amino acid sequence of the first, second or third domain.
[0053] In some embodiments of any of the aspects, the covarying amino acid positions are determined using a correlating tandem model; optionally, the tandem model purity threshold is a level greater than or equal to 0.80. In some embodiments, the tandem model purity threshold is 0.80 or more, 0.85 or more, 0.90 or more, 0.95 or more, or 0.99 or more.
[0054] In some embodiments of any of the aspects, the covarying amino acid positions are relative to a viral polypeptide expressed by a second virus. In some embodiments of any of the aspects, the first virus is capable of infecting a human host and the second virus is capable of infecting a non-human host or the second virus is a different virus capable of infecting a human host.
[0055] In some embodiments of any of the aspects, the first and/or second virus are from the same virus family.
Accordingly, in some embodiments of any one of the aspects, the first and second virus independently are from a virus family selected from the group consisting of abyssoviridae, ackermannviridae, adenoviridae, alloherpesviridae, alphaflexiviridae, alphasatellitidae, alphatetraviridae, alvernaviridae, amalgaviridae, amnoonviridae, ampullaviridae, anelloviridae, arenaviridae, arteriviridae, artoviridae, ascoviridae, asfarviridae, aspiviridae, astroviridae, autographiviridae, avsunviroidae, bacilladnaviridae, baculoviridae, barnaviridae, belpaoviridae, benyviridae, betaflexiviridae, bicaudaviridae, bidnaviridae, birnaviridae, bornaviridae, botourmiaviridae, bromoviridae, caliciviridae, carmotetraviridae, caulimoviridae, chaseviridae, chrysoviridae, chuviridae, circoviridae, clavaviridae, closteroviridae, coronaviridae, corticoviridae, cremegaviridae, cruliviridae, cystoviridae, deltaflexiviridae, demerecviridae, dicistroviridae, drexlerviridae, endornaviridae, euroniviridae, filoviridae, fimoviridae, finnlakeviridae, flaviviridae, fuselloviridae, gammaflexiviridae, geminiviridae, genomoviridae, globuloviridae, gresnaviridae, guttaviridae, halspiviridae, hantaviridae, hepadnaviridae, hepeviridae, herelleviridae, herpesviridae, hypoviridae, hytrosaviridae, iflaviridae, inoviridae, iridoviridae, kitaviridae, lavidaviridae, leishbuviridae, leviviridae, lipothrixviridae, lispiviridae, luteoviridae, malacoherpesviridae, marnaviridae, marseilleviridae, matonaviridae, mayoviridae, medioniviridae, megabirnaviridae, mesoniviridae, metaviridae, microviridae, mimiviridae, mitoviridae, mononiviridae, mymonaviridae, myoviridae, mypoviridae, nairoviridae, nanghoshaviridae, nanhypoviridae, nanoviridae, narnaviridae, nimaviridae, nodaviridae, nudiviridae, nyamiviridae, olifoviridae, orthomyxoviridae, ovaliviridae, papillomaviridae, paramyxoviridae, partitiviridae, parvoviridae, peribunyaviridae, permutotetraviridae, phasmaviridae, phenuiviridae, phycodnaviridae, picobirnaviridae, 
picornaviridae, plasmaviridae, plectroviridae, pleolipoviridae, pneumoviridae, podoviridae, polycipiviridae, polydnaviridae, polymycoviridae, polyomaviridae, portogloboviridae, pospiviroidae, potyviridae, poxviridae, pseudoviridae, qinviridae, quadriviridae, redondoviridae, reoviridae, retroviridae, rhabdoviridae, roniviridae, rudiviridae, sarthroviridae, secoviridae, sinhaliviridae, siphoviridae, smacoviridae, solemoviridae, solinviviridae, sphaerolipoviridae, spiraviridae, sunviridae, tectiviridae, thaspiviridae, tobaniviridae, togaviridae, tolecusatellitidae, tombusviridae, tospoviridae, totiviridae, tristromaviridae, turriviridae, tymoviridae, virgaviridae, wupedeviridae, xinmoviridae, and yueviridae. In some preferred embodiments, the first and/or second virus are from the coronaviridae family. [0056] In some embodiments of any one of the aspects, the first and/or second virus are selected from the group consisting of hepadnaviruses, coronaviruses, avian influenza viruses, adenoviruses, herpesviruses, human papillomaviruses, parvoviruses, reoviruses, picornaviruses, flaviviruses, togaviruses, orthomyxoviruses, bunyaviruses, rhabdoviruses, and paramyxoviruses. [0057] In some embodiments of any of the aspects, the first and second virus are from the same virus genus. For example, the first and/or second viruses are independently from the genus selected from the group consisting of alphacoronavirus, betacoronavirus, gammacoronavirus and deltacoronavirus. Preferably, the first and/or second virus are from the genus betacoronavirus. [0058] In some embodiments of any one of the aspects, the first virus is from a first genus and the second virus is from a different genus of the same virus family. For example, the first virus is from the genus betacoronavirus and the second virus is from the genus alphacoronavirus. [0059] In some embodiments of any of the aspects, the first and second virus are from the same virus species. 
For example, the first and/or second virus are independently selected from the group consisting of Severe acute respiratory syndrome-related coronavirus (SARS-CoV, SARS-CoV-2), Middle East respiratory syndrome-related coronavirus (MERS), Human coronavirus HKU1, Human coronavirus OC43, Bovine Coronavirus, Hedgehog coronavirus 1, Murine coronavirus, Pipistrellus bat coronavirus HKU5, Rousettus bat coronavirus HKU9, and Tylonycteris bat coronavirus HKU4. In some embodiments, the first and/or second virus independently are SARS- CoV, SARS-CoV-2, or MERS. Preferably, the first and/or second virus are SARS-CoV or SARS- CoV2. [0060] In some embodiments of any one of the aspects, the first and/or second viruses are independently selected from the group consisting of adeno-associated virus; Aichi virus; astrovirus; Australian bat lyssavirus; BK polyomavirus; Banna virus; Barmah forest virus; Bunyamwera virus; Bunyavirus La Crosse; Bunyavirus snowshoe hare; Cercopithecine herpesvirus; Chandipura virus; Chikungunya virus; Cosavirus A; Cowpox virus; Coxsackie A virus; Coxsackie B virus; Crimean- Congo hemorrhagic fever virus; Dengue virus; Dhori virus; Dugbe virus; Duvenhage virus; Eastern equine encephalitis virus; Ebolavirus; Echovirus; Encephalomyocarditis virus; Epstein-Barr virus; European bat lyssavirus; GB virus C/Hepatitis G virus; Hantaan virus; Hendra virus; Hepatitis A virus; Hepatitis B virus; Hepatitis C virus; Hepatitis D virus; Hepatitis E virus; Hepatitis delta virus; Horsepox virus; Human adenovirus; Human astrovirus; Human coronavirus; Human cytomegalovirus; Human enterovirus 68, 70; Human herpesvirus 1; Human herpesvirus 2; Human herpesvirus 6; Human herpesvirus 7; Human herpesvirus 8; Human immunodeficiency virus; Human papillomavirus 1; Human papillomavirus 2; Human papillomavirus 16,18; Human parainfluenza; Human parvovirus B19; Human respiratory syncytial virus; Human rhinovirus; Human SARS coronavirus; Human spumaretrovirus; Human T-lymphotropic virus; 
Human torovirus; Influenza A virus; Influenza B virus; Influenza C virus; Isfahan virus; JC polyomavirus; Japanese encephalitis virus; Junin arenavirus; KI Polyomavirus; Kunjin virus; Lagos bat virus; Lake Victoria marburgvirus; Langat virus; Lassa virus; Lordsdale virus; Louping ill virus; Lymphocytic choriomeningitis virus; Machupo virus; Mayaro virus; MERS coronavirus; Measles virus; Mengo encephalomyocarditis virus; Merkel cell polyomavirus; Mokola virus; Molluscum contagiosum virus; Monkeypox virus; Mumps virus; Murray valley encephalitis virus; New York virus; Nipah virus; norovirus; Norwalk virus; O’nyong-nyong virus; Orf virus; Oropouche virus; Pichinde virus; Poliovirus; Punta toro phlebovirus; Puumala virus; Rabies virus; Rift valley fever virus; Rosavirus A; Ross river virus; Rotavirus A; Rotavirus B; Rotavirus C; Rubella virus; Sagiyama virus; Salivirus A; Sandfly fever sicilian virus; Sapporo virus; SARS coronavirus 2; Semliki forest virus; Seoul virus; Simian foamy virus; Simian virus 5; Sindbis virus; Southampton virus; St. louis encephalitis virus; Tick-borne powassan virus; Torque teno virus; Toscana virus; Uukuniemi virus; Vaccinia virus; Varicella-zoster virus; Variola virus; Venezuelan equine encephalitis virus; Vesicular stomatitis virus; Western equine encephalitis virus; WU polyomavirus; West Nile virus; Yaba monkey tumor virus; Yaba-like disease virus; Yellow fever. [0061] In some embodiments of any of the aspects, the first and second virus are capable of infecting the same host species. For example, the first and second virus are capable of infecting humans. [0062] In some embodiments of any of the aspects, the first and second virus infect different host species. For example, the first virus infects a human host and the second virus infects a non- human host. 
In some embodiments of any of the aspects, the non-human host can be selected from the group consisting of: a bat, a pangolin, a civet, an insect, a non-human primate, a rodent, a bovine, a bird, and an alpaca. [0063] In some embodiments of any one of the aspects, the first and second viruses are different isolates of the same virus species. [0064] In some embodiments of any of the aspects, the viral polypeptide or fragment thereof is selected from Table 1.
TABLE 1: SARS-COV2 antigens and fragments thereof comprising at least two covarying amino acid sites with a coronavirus that infects a nonhuman organism.
Figure imgf000015_0001
Figure imgf000016_0001
Figure imgf000017_0001
Figure imgf000018_0001
Figure imgf000019_0001
[0065] In some embodiments of any of the aspects, the virus that infects a non-human subject or the different pathogenic virus that infects a human subject is selected from the group consisting of: Table 2.
Table 2: Coronaviruses
Figure imgf000019_0002
Figure imgf000020_0001
Figure imgf000021_0001
Figure imgf000022_0001
Figure imgf000023_0001
Figure imgf000024_0001
Figure imgf000025_0001
Figure imgf000026_0001
Figure imgf000027_0001
Figure imgf000028_0001
Figure imgf000029_0001
Figure imgf000030_0001
Figure imgf000031_0001
Figure imgf000032_0001
Figure imgf000033_0001
Figure imgf000034_0001
Figure imgf000035_0001
Figure imgf000036_0001
Figure imgf000037_0001
Figure imgf000038_0001
Figure imgf000039_0001
Figure imgf000040_0001
Figure imgf000041_0001
Figure imgf000042_0001
Figure imgf000043_0001
Figure imgf000044_0001
Figure imgf000045_0001
Figure imgf000046_0001
Figure imgf000047_0001
Figure imgf000048_0001
Figure imgf000049_0001
Figure imgf000050_0001
Figure imgf000051_0001
Figure imgf000052_0001
Figure imgf000053_0001
Figure imgf000054_0001
Figure imgf000055_0001
Figure imgf000056_0001
Figure imgf000057_0001
Figure imgf000058_0001
First domain
[0066] In some embodiments of any one of the aspects, the first viral polypeptide or fragment thereof is a coronavirus polypeptide or a fragment thereof. For example, the first viral polypeptide or fragment thereof is selected from the group consisting of: the viroporin 3a protein, a non-structural protein, a 1a/1b polyprotein, a viroporin E, membrane protein (M), and spike (S) protein. In other words, the first domain comprises an amino acid sequence of a viroporin 3a protein, a non-structural protein, a 1a/1b polyprotein, a viroporin E, membrane protein (M), spike (S) protein, or a fragment thereof from a coronavirus. Preferably the first domain comprises an amino acid sequence of a viroporin 3a protein or fragment thereof. In some embodiments, the first domain comprises an amino acid sequence having at least 85% identity to the amino acid sequence of amino acids 1-44 of viroporin 3a protein, e.g., amino acids 1-44 of SEQ ID NO: 2. For example, the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to the amino acid sequence of amino acids 1-44 of viroporin 3a protein, e.g., amino acids 1-44 of SEQ ID NO: 2. In some embodiments, the first domain comprises an amino acid sequence having 100% identity to the amino acid sequence of amino acids 1-44 of viroporin 3a protein, e.g., amino acids 1-44 of SEQ ID NO: 2.
[0067] In some embodiments, the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to the amino acid sequence SEQ ID NO: 2.
[0068] In some embodiments, the first domain comprises an amino acid sequence comprising a substitution, mutation or deletion at position 13, 14, 15, 16, 17, 18, 19, 20, 22, 23, 24, 25, 26, 27, 28, 32, 37, 38, 41, 42, 43 or 44 of SEQ ID NO: 2.
In some embodiments, the first domain comprises an amino acid sequence comprising a substitution, mutation or deletion at position 15, 16, 18, 20, 24, 25, 26, 28, or 38 of SEQ ID NO: 2. [0069] In some embodiments, the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to the amino acid sequence of SEQ ID NO: 20. [0070] In some embodiments, the first domain comprises an amino acid sequence comprising a substitution, mutation or deletion at position 13, 14, 15, 16, 17, 18, 19, 20, 22, 23, 24, 25, 26, 27, 28, 32, 37, 38, 41, 42, 43 or 44 of SEQ ID NO: 20. In some embodiments, the first domain comprises an amino acid sequence comprising a substitution, mutation or deletion at position 15, 16, 18, 20, 24, 25, 26, 28, or 38 of SEQ ID NO: 20. [0071] In some embodiments, the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to an amino acid sequence selected from SEQ ID NOs: 20-40:
3a_Seq1 WT 3a sequence MDLFMRIFTIGTVTLKQGEIKDATPSDFVRATATIPIQASLPFG (SEQ ID NO: 20),
3a_Seq2 T32F Covariant selected and removes glycosylation site MDLFMRIFTIGTVTLKQGEIKDATPSDFVRAFATIPIQASLPFG (SEQ ID NO: 21),
3a_Seq3 V13I Covariant selected MDLFMRIFTIGTITLKQGEIKDATPSDFVRATATIPIQASLPFG (SEQ ID NO: 22),
3a_Seq4 T14I Covariant selected MDLFMRIFTIGTVILKQGEIKDATPSDFVRATATIPIQASLPFG (SEQ ID NO: 23),
3a_Seq5 L15F Covariant selected MDLFMRIFTIGTVTFKQGEIKDATPSDFVRATATIPIQASLPFG (SEQ ID NO: 24),
3a_Seq6 Q17H Covariant selected MDLFMRIFTIGTVTLKHGEIKDATPSDFVRATATIPIQASLPFG (SEQ ID NO: 25),
3a_Seq7 E19S Covariant selected MDLFMRIFTIGTVTLKQGSIKDATPSDFVRATATIPIQASLPFG (SEQ ID NO: 26),
3a_Seq8 I20S Covariant selected MDLFMRIFTIGTVTLKQGESKDATPSDFVRATATIPIQASLPFG (SEQ ID NO: 27),
3a_Seq9 I20G Covariant selected MDLFMRIFTIGTVTLKQGEGKDATPSDFVRATATIPIQASLPFG (SEQ ID NO: 28),
3a_Seq10 I20V Covariant selected MDLFMRIFTIGTVTLKQGEVKDATPSDFVRATATIPIQASLPFG (SEQ ID NO: 29),
3a_Seq11 D22N Covariant selected MDLFMRIFTIGTVTLKQGEIKNATPSDFVRATATIPIQASLPFG (SEQ ID NO: 30),
3a_Seq12 A23S Covariant selected MDLFMRIFTIGTVTLKQGEIKDSTPSDFVRATATIPIQASLPFG (SEQ ID NO: 31),
3a_Seq13 P25L Covariant selected MDLFMRIFTIGTVTLKQGEIKDATLSDFVRATATIPIQASLPFG (SEQ ID NO: 32),
3a_Seq14 S26L Covariant selected MDLFMRIFTIGTVTLKQGEIKDATPLDFVRATATIPIQASLPFG (SEQ ID NO: 33),
3a_Seq15 D27Y Covariant selected MDLFMRIFTIGTVTLKQGEIKDATPSYFVRATATIPIQASLPFG (SEQ ID NO: 34),
3a_Seq16 I37L Covariant selected MDLFMRIFTIGTVTLKQGEIKDATPSDFVRATATIPLQASLPFG (SEQ ID NO: 35),
3a_Seq17 Q38P Covariant selected MDLFMRIFTIGTVTLKQGEIKDATPSDFVRATATIPIPASLPFG (SEQ ID NO: 36),
3a_Seq18 L41F Covariant selected MDLFMRIFTIGTVTLKQGEIKDATPSDFVRATATIPIQASFPFG (SEQ ID NO: 37),
3a_Seq19 P42S Covariant selected MDLFMRIFTIGTVTLKQGEIKDATPSDFVRATATIPIQASLSFG (SEQ ID NO: 38),
3a_Seq20 F43I Covariant selected MDLFMRIFTIGTVTLKQGEIKDATPSDFVRATATIPIQASLPIG (SEQ ID NO: 39), and
3a_Seq21 G44V Covariant selected MDLFMRIFTIGTVTLKQGEIKDATPSDFVRATATIPIQASLPFV (SEQ ID NO: 40).
[0072] In some embodiments, the first domain comprises an amino acid sequence having 100% identity to an amino acid sequence selected from SEQ ID NOs: 20-40. [0073] In some embodiments, the first domain comprises an amino acid sequence having at least 85% identity to the amino acid sequence of amino acids 12-339 of S protein, e.g., amino acids 12-339 of SEQ ID NO: 1. In some embodiments, the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to the amino acid sequence of amino acids 12-320 of S protein, e.g., amino acids 12-339 of SEQ ID NO: 1.
For example, the second domain comprises an amino acid sequence having 100% identity to the amino acid sequence of amino acids 12-339 of S protein, e.g., amino acids 12-339 of SEQ ID NO: 1. [0074] In some embodiments, the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to an amino acid sequence of amino acids 1-309 or 1-328 of SEQ ID NO: 41:
Figure imgf000061_0001
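The identity thresholds and the 3a point variants listed for SEQ ID NOs: 20-40 can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the helper names `apply_substitution` and `percent_identity` are hypothetical, and identity is computed here as a simple ungapped, position-by-position comparison of equal-length sequences, which may differ from the alignment-based definition of identity used elsewhere in the specification.

```python
import re

# Wild-type 3a fragment, copied from SEQ ID NO: 20 in the text above.
WT_3A = "MDLFMRIFTIGTVTLKQGEIKDATPSDFVRATATIPIQASLPFG"


def apply_substitution(seq: str, label: str) -> str:
    """Apply a point substitution written like 'T32F' (1-based position).

    Hypothetical helper: checks that the reference residue named in the
    label is actually present at that position before replacing it.
    """
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"unrecognised substitution label: {label}")
    ref, pos, alt = match.group(1), int(match.group(2)), match.group(3)
    if seq[pos - 1] != ref:
        raise ValueError(f"expected {ref} at position {pos}, found {seq[pos - 1]}")
    return seq[: pos - 1] + alt + seq[pos:]


def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity for two equal-length sequences.

    Assumption: no alignment step; real identity calculations normally
    align the sequences first.
    """
    if len(a) != len(b):
        raise ValueError("this sketch only handles equal-length sequences")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


variant = apply_substitution(WT_3A, "T32F")
print(variant)                                       # matches SEQ ID NO: 21 as listed
print(round(percent_identity(WT_3A, variant), 2))    # 43 of 44 residues match
```

Under this simplified measure, a single substitution in the 44-residue fragment leaves 43 of 44 residues identical, so each listed point variant sits well above the 85% floor.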
[0075] In some embodiments, the first domain comprises an amino acid sequence having a mutation or deletion at position 4, 6, 7, 9, 10, 13, 15, 16, 19, 20, 23, 49, 54, 56, 58, 59, 60, 61, 62, 126, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 150, 163, 165, 172, 191, 229, 230, 231, 232, 233, 243, 244, 246 or 248 of SEQ ID NO: 41. [0076] In some embodiments, the first domain comprises an amino acid sequence of amino acids 1-309 of SEQ ID NO: 41 having a mutation or deletion at position 4, 6, 7, 9, 10, 13, 15, 16, 19, 20, 23, 49, 54, 56, 58, 59, 60, 61, 62, 126, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 150, 163, 165, 172, 191, 229, 230, 231, 232, 233, 243, 244, 246 or 248. [0077] In some embodiments, the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to an amino acid sequence of amino acids 1-310 or 1-329 of SEQ ID NO: 45:
Figure imgf000062_0001
[0078] In some embodiments, the first domain comprises an amino acid sequence having a mutation or deletion at position 5, 7, 8, 10, 11, 14, 16, 17, 20, 21, 24, 50, 55, 57, 59, 60, 61, 62, 63, 127, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 151, 164, 166, 173, 192, 230, 231, 232, 233, 234, 244, 245, 247 and 249 of SEQ ID NO: 45. [0079] In some embodiments, the first domain comprises an amino acid sequence of amino acids 1-310 of SEQ ID NO: 45 having a mutation or deletion at position 5, 7, 8, 10, 11, 14, 16, 17, 20, 21, 24, 50, 55, 57, 59, 60, 61, 62, 63, 127, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 151, 164, 166, 173, 192, 230, 231, 232, 233, 234, 244, 245, 247 and 249.
Second domain
[0080] In some embodiments of any one of the aspects, the second viral polypeptide or fragment thereof is a coronavirus polypeptide or a fragment thereof. For example, the second viral polypeptide or fragment thereof is selected from the group consisting of: a viroporin 3a protein, a non-structural protein, a 1a/1b polyprotein, a viroporin E, membrane protein (M), and spike (S) protein. In other words, the second domain comprises an amino acid sequence of a viroporin 3a protein, a non-structural protein, a 1a/1b polyprotein, a viroporin E, membrane protein (M), spike (S) protein, or a fragment thereof from a coronavirus. Preferably the second domain comprises an amino acid sequence of S protein or fragment thereof. For example, the second domain comprises an amino acid sequence having at least 85% identity to the amino acid sequence of amino acids 12-339 of S protein, e.g., amino acids 12-339 of SEQ ID NO: 1. In some embodiments, the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to the amino acid sequence of amino acids 12-320 of S protein, e.g., amino acids 12-339 of SEQ ID NO: 1.
For example, the second domain comprises an amino acid sequence having 100% identity to the amino acid sequence of amino acids 12-339 of S protein, e.g., amino acids 12-339 of SEQ ID NO: 1. [0081] In some embodiments, the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to an amino acid sequence of amino acids 1-309 or 1-328 of SEQ ID NO: 41:
Figure imgf000063_0001
[0082] In some embodiments, the second domain comprises an amino acid sequence having a mutation or deletion at position 4, 6, 7, 9, 10, 13, 15, 16, 19, 20, 23, 49, 54, 56, 58, 59, 60, 61, 62, 126, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 150, 163, 165, 172, 191, 229, 230, 231, 232, 233, 243, 244, 246 or 248 of SEQ ID NO: 41. [0083] In some embodiments, the second domain comprises an amino acid sequence of amino acids 1-309 of SEQ ID NO: 41 having a mutation or deletion at position 4, 6, 7, 9, 10, 13, 15, 16, 19, 20, 23, 49, 54, 56, 58, 59, 60, 61, 62, 126, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 150, 163, 165, 172, 191, 229, 230, 231, 232, 233, 243, 244, 246 or 248. [0084] In some embodiments, the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to an amino acid sequence of amino acids 1-310 or 1-329 of SEQ ID NO: 45:
Figure imgf000063_0002
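The position lists recited for SEQ ID NO: 41 and SEQ ID NO: 45 appear to differ by a constant offset of one residue, which is also consistent with the ranges 1-309/1-328 versus 1-310/1-329. The Python sketch below simply checks that observation against the numbers copied from the text. The text does not state why the numbering is shifted, so the offset is treated as an empirical observation only, and the helper name `shift_positions` is hypothetical.

```python
# Positions recited for SEQ ID NO: 41, copied from the text.
POSITIONS_SEQ41 = [
    4, 6, 7, 9, 10, 13, 15, 16, 19, 20, 23, 49, 54, 56, 58, 59, 60, 61, 62,
    126, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 150, 163, 165,
    172, 191, 229, 230, 231, 232, 233, 243, 244, 246, 248,
]

# Positions recited for SEQ ID NO: 45, copied from the text.
POSITIONS_SEQ45 = [
    5, 7, 8, 10, 11, 14, 16, 17, 20, 21, 24, 50, 55, 57, 59, 60, 61, 62, 63,
    127, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 151, 164, 166,
    173, 192, 230, 231, 232, 233, 234, 244, 245, 247, 249,
]


def shift_positions(positions, offset):
    """Return the positions renumbered by a constant offset (hypothetical helper)."""
    return [p + offset for p in positions]


# Collect the pairwise differences; a single value means a uniform shift.
offsets = {b - a for a, b in zip(POSITIONS_SEQ41, POSITIONS_SEQ45)}
print(len(POSITIONS_SEQ41), len(POSITIONS_SEQ45), offsets)
```

A check of this kind can help when transcribing position lists between the two reference sequences, since a single mistyped number would show up as a second value in `offsets`.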
[0085] In some embodiments, the first domain comprises an amino acid sequence having a mutation or deletion at position 5, 7, 8, 10, 11, 14, 16, 17, 20, 21, 24, 50, 55, 57, 59, 60, 61, 62, 63, 127, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 151, 164, 166, 173, 192, 230, 231, 232, 233, 234, 244, 245, 247 and 249 of SEQ ID NO: 45. [0086] In some embodiments, the first domain comprises an amino acid sequence of amino acids 1-310 of SEQ ID NO: 45 having a mutation or deletion at position 5, 7, 8, 10, 11, 14, 16, 17, 20, 21, 24, 50, 55, 57, 59, 60, 61, 62, 63, 127, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 151, 164, 166, 173, 192, 230, 231, 232, 233, 234, 244, 245, 247 and 249. [0087] In some embodiments, the second domain comprises an amino acid sequence having at least 85% identity to the amino acid sequence of amino acids 591-700, 514-714, 514-794 or 514-890 of S protein, e.g., SEQ ID NO: 1. For example, the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to the amino acid sequence of amino acids 591-700, 514-714, 514-794 or 514-890 of S protein, e.g., SEQ ID NO: 1. In some embodiments, the second domain comprises an amino acid sequence having 100% identity to the amino acid sequence of amino acids 591-700, 514-714, 514-794 or 514-890 of S protein, e.g., SEQ ID NO: 1. [0088] In some embodiments, the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to an amino acid sequence of amino acids 1-377, 1-281, 1-201 or 78-187 of SEQ ID NO: 42, 43 or 60-65:
Figure imgf000064_0001
Figure imgf000064_0002
Figure imgf000065_0001
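The S protein ranges and the fragment ranges recited above line up arithmetically if each fragment sequence is numbered from its own first residue. The Python sketch below shows that conversion. It rests on an assumption not stated explicitly in the text: that the fragments of SEQ ID NO: 42, 43 and 60-65 begin at S protein residue 514, and that the fragments of SEQ ID NO: 41 begin at S protein residue 12. The function name `to_fragment_coords` is hypothetical.

```python
def to_fragment_coords(start: int, end: int, fragment_start: int) -> tuple:
    """Convert a 1-based inclusive range in full-length numbering to
    1-based numbering within a fragment that begins at `fragment_start`.
    """
    if start < fragment_start or end < start:
        raise ValueError("range must lie at or after the fragment start")
    return (start - fragment_start + 1, end - fragment_start + 1)


# S protein ranges recited in the text, paired with the assumed fragment start.
S_RANGES_FROM_514 = [(514, 890), (514, 794), (514, 714), (591, 700)]
S_RANGES_FROM_12 = [(12, 339), (12, 320)]

for start, end in S_RANGES_FROM_514:
    print((start, end), "->", to_fragment_coords(start, end, 514))

for start, end in S_RANGES_FROM_12:
    print((start, end), "->", to_fragment_coords(start, end, 12))
```

Under these assumptions the converted ranges are 1-377, 1-281, 1-201 and 78-187 for the first group and 1-328 and 1-309 for the second, which are the fragment ranges recited in the text.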
[0089] In some embodiments, the second domain comprises an amino acid sequence having at least one mutation or deletion at position 19, 41, 127, 164 or 175 of SEQ ID NO: 42, 43 or 60-65. [0090] In some embodiments, the second domain comprises the amino acid sequence of amino acids 1-281 of SEQ ID NO: 42, 43 or 60-65 having at least one mutation or deletion at position 19, 41, 127, 164 or 175. [0091] In some embodiments, the second domain comprises the amino acid sequence of amino acids 1-201 of SEQ ID NO: 42, 43 or 60-65 having at least one mutation or deletion at position 19, 41, 127, 164 or 175. [0092] In some embodiments, the second domain comprises the amino acid sequence of amino acids 78-187 of SEQ ID NO: 42, 43 or 60-65 having at least one mutation or deletion at position 19, 41, 127, 164 or 175. [0093] In some embodiments, the second domain comprises an amino acid sequence having 100% identity to an amino acid sequence of amino acids 1-377, 1-281, 1-201 or 78-187 of SEQ ID NO: 42, 43 or 60-65.
Third domain
[0094] In some embodiments of any one of the aspects, the third viral polypeptide or fragment thereof is a coronavirus polypeptide or a fragment thereof. For example, the third viral polypeptide or fragment thereof is selected from the group consisting of: a viroporin 3a protein, a non-structural protein, a 1a/1b polyprotein, a viroporin E, membrane protein (M), and spike (S) protein. In other words, the third domain comprises an amino acid sequence of a viroporin 3a protein, a non-structural protein, a 1a/1b polyprotein, a viroporin E, membrane protein (M), spike (S) protein, or a fragment thereof from a coronavirus. Preferably the third domain comprises an amino acid sequence of S protein or fragment thereof.
For example, the third domain comprises an amino acid sequence having at least 85% identity to the amino acid sequence of amino acids 591-700, 514-714, 514-794 or 514-890 of S protein, e.g., amino acids 591-700, 514-714, 514-794 or 514-890 of SEQ ID NO: 1. For example, the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to the amino acid sequence of amino acids 591-700, 514-714, 514-794 or 514-890 of S protein, e.g., amino acids 591-700, 514-714, 514-794 or 514-890 of SEQ ID NO: 1. In some embodiments, the third domain comprises an amino acid sequence having 100% identity to the amino acid sequence of amino acids 591-700, 514-714, 514-794 or 514-890 of S protein, e.g., amino acids 591-700, 514-714, 514-794 or 514-890 of SEQ ID NO: 1. [0095] In some embodiments, the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to an amino acid sequence of amino acids 1-377, 1-281, 1-201 or 78-187 of SEQ ID NO: 42, 43 or 60-65. [0096] In some embodiments, the third domain comprises an amino acid sequence having at least one mutation or deletion at position 19, 41, 127, 164 or 175 of SEQ ID NO: 42, 43 or 60-65. [0097] In some embodiments, the third domain comprises the amino acid sequence of amino acids 1-281 of SEQ ID NO: 42, 43 or 60-65 having at least one mutation or deletion at position 19, 41, 127, 164 or 175. [0098] In some embodiments, the third domain comprises the amino acid sequence of amino acids 1-201 of SEQ ID NO: 42, 43 or 60-65 having at least one mutation or deletion at position 19, 41, 127, 164 or 175. [0099] In some embodiments, the third domain comprises the amino acid sequence of amino acids 78-187 of SEQ ID NO: 42, 43 or 60-65 having at least one mutation or deletion at position 19, 41, 127, 164 or 175.
[00100] In some embodiments, the third domain comprises an amino acid sequence having 100% identity to an amino acid sequence of amino acids 1-377, 1-281, 1-201 or 78-187 of SEQ ID NO: 42, 43 or 60-65. [00101] In some embodiments, the third domain comprises an amino acid sequence having at least 85% identity to the amino acid sequence of amino acids 1-44 of viroporin 3a protein, e.g., amino acids 1-44 of SEQ ID NO: 2. For example, the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to the amino acid sequence of amino acids 1-44 of viroporin 3a protein, e.g., amino acids 1-44 of SEQ ID NO: 2. In some embodiments, the third domain comprises an amino acid sequence having 100% identity to the amino acid sequence of amino acids 1-44 of viroporin 3a protein, e.g., amino acids 1-44 of SEQ ID NO: 2. [00102] In some embodiments, the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to the amino acid sequence of SEQ ID NO: 2. [00103] In some embodiments, the third domain comprises an amino acid sequence comprising a substitution, mutation or deletion at position 13, 14, 15, 16, 17, 18, 19, 20, 22, 23, 24, 25, 26, 27, 28, 32, 37, 38, 41, 42, 43 or 44 of SEQ ID NO: 2. In some embodiments, the third domain comprises an amino acid sequence comprising a substitution, mutation or deletion at position 15, 16, 18, 20, 24, 25, 26, 28, or 38 of SEQ ID NO: 2. [00104] In some embodiments, the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to the amino acid sequence of SEQ ID NO: 20. For example, the third domain comprises an amino acid sequence having 100% identity to SEQ ID NO: 20.
[00105] In some embodiments, the third domain comprises an amino acid sequence comprising a substitution, mutation or deletion at position 13, 14, 15, 16, 17, 18, 19, 20, 22, 23, 24, 25, 26, 27, 28, 32, 37, 38, 41, 42, 43 or 44 of SEQ ID NO: 20. In some embodiments, the third domain comprises an amino acid sequence comprising a substitution, mutation or deletion at position 15, 16, 18, 20, 24, 25, 26, 28, or 38 of SEQ ID NO: 20. [00106] In some embodiments, the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to an amino acid sequence selected from SEQ ID NOs: 20-40. For example, the third domain comprises an amino acid sequence having 100% identity to an amino acid sequence selected from SEQ ID NOs: 20-40. [00107] As described herein, in some embodiments, the fusion polypeptide comprises a fourth domain. Generally, the fourth domain comprises an amino acid sequence of the first, second or third domain. Generally, when the fusion polypeptide comprises a fourth domain, the domains at the N-terminus and C-terminus comprise amino acid sequences having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to each other. For example, the domains at the N-terminus and C-terminus comprise amino acid sequences having 100% identity to each other. [00108] In some embodiments, the fourth domain comprises an amino acid sequence having at least 85% identity to the amino acid sequence of amino acids 1-44 of viroporin 3a protein, e.g., amino acids 1-44 of SEQ ID NO: 2. For example, the fourth domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to the amino acid sequence of amino acids 1-44 of viroporin 3a protein, e.g., amino acids 1-44 of SEQ ID NO: 2.
In some embodiments, the fourth domain comprises an amino acid sequence having 100% identity to the amino acid sequence of amino acids 1-44 of viroporin 3a protein, e.g., amino acids 1-44 of SEQ ID NO: 2. [00109] In some embodiments, the fourth domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to an amino acid sequence selected from SEQ ID NOs: 20-43 and 60-65. For example, the fourth domain comprises an amino acid sequence having 100% identity to an amino acid sequence selected from SEQ ID NOs: 20-43 and 60-65. [00110] In some embodiments, the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 20, the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 41 and the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 42. [00111] In some embodiments, the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 41, the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 42 and the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 20. 
[00112] In some embodiments, the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 20, the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 41 and the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 43. [00113] In some embodiments, the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 41, the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 43 and the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 20. [00114] In some embodiments, the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 20, the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 41 and the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 60. 
[00115] In some embodiments, the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 41, the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 60 and the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 20. [00116] In some embodiments, the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 20, the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 41 and the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 61. [00117] In some embodiments, the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 41, the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 61 and the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 20.
[00118] In some embodiments, the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 20, the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 41 and the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 62. [00119] In some embodiments, the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 41, the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 62 and the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 20. [00120] In some embodiments, the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 20, the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 41 and the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 63.
[00121] In some embodiments, the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 41, the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 63 and the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 20. [00122] In some embodiments, the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 20, the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 41 and the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 64. [00123] In some embodiments, the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 41, the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 64 and the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 20. 
[00124] In some embodiments, the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 20, the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 41 and the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 65. [00125] In some embodiments, the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 41, the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 65 and the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 20. [00126] In some embodiments, the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 20, the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 45 and the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 42. 
[00127] In some embodiments, the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 45, the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 42 and the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 20. [00128] In some embodiments, the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 20, the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 45 and the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 43. [00129] In some embodiments, the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 45, the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 43 and the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 20. 
[00130] In some embodiments, the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 20, the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 45 and the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 60. [00131] In some embodiments, the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 45, the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 60 and the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 20. [00132] In some embodiments, the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 20, the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 45 and the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 61.
[00133] In some embodiments, the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 45, the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 61 and the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 20. [00134] In some embodiments, the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 20, the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 45 and the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 62. [00135] In some embodiments, the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 45, the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 62 and the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 20.
[00136] In some embodiments, the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 20, the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 45 and the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 63. [00137] In some embodiments, the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 45, the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 63 and the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 20. [00138] In some embodiments, the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 20, the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 45 and the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 64. 
[00139] In some embodiments, the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 45, the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 64 and the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 20. [00140] In some embodiments, the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 20, the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 45 and the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 65. [00141] In some embodiments, the first domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 45, the second domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 65 and the third domain comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to SEQ ID NO: 20. [00142] In some embodiments of any of the aspects, an amino acid sequence described herein comprises at least one amino acid substitution.
The term “conservative substitution,” or “substitution” or “substituted,” when describing a polypeptide, refers to a change in the amino acid composition of the polypeptide that does not substantially alter the polypeptide's activity. For example, a conservative substitution refers to substituting an amino acid residue for a different amino acid residue that has similar chemical properties. “Conservative amino acid substitutions” result from replacing one amino acid with another having similar structural and/or chemical properties, such as the replacement of a leucine with an isoleucine or valine, an aspartate with a glutamate, or a threonine with a serine. Thus, a “conservative substitution” of a particular amino acid sequence refers to substitution of those amino acids that are not critical for polypeptide activity or substitution of amino acids with other amino acids having similar properties (e.g., acidic, basic, positively or negatively charged, polar or non-polar, etc.) such that the substitution of even critical amino acids does not substantially alter activity. Conservative substitution tables providing functionally similar amino acids are well known in the art. For example, the following six groups each contain amino acids that are conservative substitutions for one another: 1) Alanine (A), Serine (S), Threonine (T); 2) Aspartic acid (D), Glutamic acid (E); 3) Asparagine (N), Glutamine (Q); 4) Arginine (R), Lysine (K); 5) Isoleucine (I), Leucine (L), Methionine (M), Valine (V); and 6) Phenylalanine (F), Tyrosine (Y), Tryptophan (W). (See also Creighton, Proteins, W. H. Freeman and Company (1984).)
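By way of illustration only, the six conservative-substitution groups listed above can be expressed as a simple lookup. The following is a minimal sketch in Python; the names `CONSERVATIVE_GROUPS` and `is_conservative` are hypothetical and are not part of the disclosure, and the sketch covers only the six groups recited above, not the alternative groupings described elsewhere herein.

```python
# Illustrative sketch only: the six groups recited in the text above,
# written with one-letter amino acid codes.
CONSERVATIVE_GROUPS = [
    set("AST"),   # Alanine, Serine, Threonine
    set("DE"),    # Aspartic acid, Glutamic acid
    set("NQ"),    # Asparagine, Glutamine
    set("RK"),    # Arginine, Lysine
    set("ILMV"),  # Isoleucine, Leucine, Methionine, Valine
    set("FYW"),   # Phenylalanine, Tyrosine, Tryptophan
]


def is_conservative(original: str, replacement: str) -> bool:
    """Return True if the two residues differ and share one of the six groups."""
    if original == replacement:
        return False  # identical residues are not a substitution
    return any(original in group and replacement in group
               for group in CONSERVATIVE_GROUPS)
```

Under these assumptions, `is_conservative("L", "I")` is true (both residues are in group 5), whereas `is_conservative("D", "K")` is false (an acidic residue replaced by a basic residue).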
In addition, individual substitutions, deletions or additions that alter, add or delete a single amino acid or a small percentage of amino acids in an encoded sequence are also “conservative substitutions.” Insertions or deletions are typically in the range of about 1 to 5 amino acids. [00143] Amino acids can be grouped according to similarities in the properties of their side chains (A. L. Lehninger, Biochemistry, second ed., pp. 73-75, Worth Publishers, New York (1975)): (1) non-polar: Ala (A), Val (V), Leu (L), Ile (I), Pro (P), Phe (F), Trp (W), Met (M); (2) uncharged polar: Gly (G), Ser (S), Thr (T), Cys (C), Tyr (Y), Asn (N), Gln (Q); (3) acidic: Asp (D), Glu (E); (4) basic: Lys (K), Arg (R), His (H). Alternatively, naturally occurring residues can be divided into groups based on common side-chain properties: (1) hydrophobic: Norleucine, Met, Ala, Val, Leu, Ile; (2) neutral hydrophilic: Cys, Ser, Thr, Asn, Gln; (3) acidic: Asp, Glu; (4) basic: His, Lys, Arg; (5) residues that influence chain orientation: Gly, Pro; (6) aromatic: Trp, Tyr, Phe. Non-conservative substitutions will entail exchanging a member of one of these classes for a member of another class. Particular conservative substitutions include, for example: Ala into Gly or into Ser; Arg into Lys; Asn into Gln or into His; Asp into Glu; Cys into Ser; Gln into Asn; Glu into Asp; Gly into Ala or into Pro; His into Asn or into Gln; Ile into Leu or into Val; Leu into Ile or into Val; Lys into Arg, into Gln or into Glu; Met into Leu, into Tyr or into Ile; Phe into Met, into Leu or into Tyr; Ser into Thr; Thr into Ser; Trp into Tyr; Tyr into Trp; and/or Phe into Val, into Ile or into Leu. [00144] In some embodiments of any of the aspects, a fusion polypeptide or fragment thereof as described herein can be a variant of a polypeptide provided herein. In some embodiments, the variant is a conservatively modified variant.
Conservative substitution variants can be obtained by mutations of native nucleotide sequences, for example. A “variant,” as referred to herein, is a polypeptide substantially homologous to a native or reference polypeptide, but which has an amino acid sequence different from that of the native or reference polypeptide because of one or a plurality of deletions, insertions or substitutions. Variant polypeptide-encoding DNA sequences encompass sequences that comprise one or more additions, deletions, or substitutions of nucleotides when compared to a native or reference DNA sequence, but that encode a variant protein or fragment thereof that retains activity of the non-variant polypeptide. A wide variety of PCR-based site-specific mutagenesis approaches are known in the art and can be applied by the ordinarily skilled artisan. [00145] A variant amino acid or nucleic acid sequence can be at least 80%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to a native or reference sequence (e.g., SEQ ID NOs: 1-65). The degree of homology (percent identity) between a native and a mutant sequence can be determined, for example, by comparing the two sequences using freely available computer programs commonly employed for this purpose on the world wide web (e.g., BLASTp or BLASTn with default settings). [00146] Alterations of the native amino acid sequence can be accomplished by any of a number of techniques known in the art. Mutations can be introduced, for example, at particular loci by synthesizing oligonucleotides containing a mutant sequence, flanked by restriction sites permitting ligation to fragments of the native sequence. Following ligation, the resulting reconstructed sequence encodes an analog having the desired amino acid insertion, substitution, or deletion.
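By way of illustration only, the percent identity comparison referred to in paragraph [00145] can be sketched for two sequences that have already been aligned to the same length, with gaps written as "-". This is a simplified stand-in and not the BLASTp or BLASTn calculation itself, which performs its own local alignment; the function names `percent_identity` and `meets_identity_threshold` are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns holding the same residue in both sequences.

    Assumes the inputs are already aligned (equal length, '-' for gaps).
    Columns that are gaps in both sequences are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no comparable columns")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def meets_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold: float = 85.0) -> bool:
    """True if the two aligned sequences share at least `threshold` percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

For example, two aligned ten-residue sequences that differ at a single position give 90 percent identity under this definition and would therefore satisfy an "at least 85%" threshold but not an "at least 95%" threshold.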
Alternatively, oligonucleotide-directed site-specific mutagenesis procedures can be employed to provide an altered nucleotide sequence having particular codons altered according to the substitution, deletion, or insertion required. Techniques for making such alterations are well established and include, for example, those disclosed by Walder et al. (Gene 42:133, 1986); Bauer et al. (Gene 37:73, 1985); Craik (BioTechniques, January 1985, 12-19); Smith et al. (Genetic Engineering: Principles and Methods, Plenum Press, 1981); and U.S. Pat. Nos. 4,518,584 and 4,737,462, which are incorporated herein by reference in their entireties. Any cysteine residue not involved in maintaining the proper conformation of a polypeptide also can be substituted, generally with serine, to improve the oxidative stability of the molecule and prevent aberrant crosslinking. Conversely, cysteine bond(s) can be added to a polypeptide to improve its stability or facilitate oligomerization.
Linkers
[00147] In some embodiments, the domains of the fusion polypeptide provided herein are linked by a linker. For example, the first and second domains are linked by a linker, the second and third domains are linked by a linker and/or the third and fourth domains are linked by a linker. [00148] As used herein, the term “linker” means a molecular moiety that connects two parts of a composition. The linker can be a chemical linker, a single peptide bond (e.g., linked directly to each other) or a peptide linker containing one or more amino acid residues (e.g., with an intervening amino acid or amino acid sequence between the two domains that are linked to each other). [00149] Preferably, the linker is a flexible linker. As used herein, a “flexible linker” is a linker which does not have a fixed structure (secondary or tertiary structure) in solution and is therefore free to adopt a variety of conformations. Generally, a flexible linker has a plurality of freely rotating bonds along its backbone.
In contrast, a rigid linker is a linker which adopts a relatively well-defined conformation when in solution. Rigid linkers are therefore those which have a particular secondary and/or tertiary structure in solution. [00150] It is noted that two domains that are linked together can be separated from each other by any desired distance. In other words, the linker can be of any desired length. Inventors have discovered inter alia that certain linker lengths are relatively better for using the fusion protein in methods for detecting target nucleic acids. Accordingly, in some embodiments of the various aspects described herein, the linker is from about 10 Å to about 140 Å in length. For example, the linker can be from about 10 Å to about 130 Å in length, from about 15 Å to about 125 Å in length, from about 20 Å to about 120 Å in length, from about 25 Å to about 115 Å in length, from about 30 Å to about 110 Å in length, from about 35 Å to about 105 Å in length, from about 40 Å to about 100 Å in length, from about 45 Å to about 95 Å in length, from about 50 Å to about 90 Å in length, or from about 55 Å to about 85 Å in length. In some embodiments of the various aspects described herein, the linker can be from about 60 Å to about 80 Å in length or from about 65 Å to about 75 Å in length. In some embodiments of the various aspects described herein, the linker is about 70 Å in length. [00151] In some embodiments of the various aspects described herein, at least two of the domains are linked via a peptide linker. The term “peptide linker” as used herein denotes a peptide with amino acid sequences, which is in some embodiments of synthetic origin. It is noted that peptide linkers may affect folding of a given fusion protein, and may also react/bind with other proteins, and these properties can be screened for by known techniques. 
A peptide linker can comprise 1 amino acid or more, 5 amino acids or more, 10 amino acids or more, 15 amino acids or more, 20 amino acids or more, 25 amino acids or more, 30 amino acids or more, 35 amino acids or more, 40 amino acids or more, 45 amino acids or more, or 50 amino acids or more. Conversely, a peptide linker can comprise less than 50 amino acids, less than 45 amino acids, less than 40 amino acids, less than 35 amino acids, less than 30 amino acids, less than 25 amino acids, less than 20 amino acids, less than 15 amino acids or less than 10 amino acids. [00152] In some embodiments of the various aspects described herein, the peptide linker comprises from about 5 amino acids to about 50 amino acids. For example, the peptide linker can comprise from about 5 amino acids to about 45 amino acids, from about 5 amino acids to about 40 amino acids, from about 5 amino acids to about 35 amino acids, from about 10 amino acids to about 30 amino acids, or from about 15 amino acids to about 25 amino acids. [00153] In some embodiments of the various aspects described herein, the linker comprises 15, 16, 17, 18, 19, 20, 21, 22, 23, 24 or 25 amino acids. For example, the linker comprises 17, 18, 19, 20, 21, 22 or 23 amino acids. Preferably, the linker comprises 18, 19, 20, 21 or 22 amino acids. More preferably, the linker comprises 19, 20 or 21 amino acids. In some embodiments of the various aspects described herein, the linker comprises 20 amino acids. [00154] Exemplary peptide linkers include those that consist of glycine and serine residues, the so-called Gly-Ser polypeptide linkers. As used herein, the term “Gly-Ser polypeptide linker” refers to a peptide that consists of glycine and serine residues. In some embodiments of the various aspects described herein, the peptide linker comprises the amino acid sequence (GlyxSer)n (SEQ ID NO: 66), where x is 2, 3, 4 or 5, and n is 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10.
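By way of illustration only, the (GlyxSer)n pattern defined above can be generated programmatically. The following is a minimal sketch; the function names are hypothetical, and the figure of 3.5 Å per residue is an assumption (a commonly quoted approximate rise per residue for an extended peptide chain) that is not taken from this disclosure.

```python
# Assumed approximate rise per residue of an extended peptide chain.
# This value is an illustrative assumption, not a figure from the disclosure.
ANGSTROM_PER_RESIDUE = 3.5


def gly_ser_linker(x: int, n: int) -> str:
    """Build (Gly_x Ser)_n in one-letter code, for x in 2..5 and n in 1..10."""
    if x not in (2, 3, 4, 5):
        raise ValueError("x must be 2, 3, 4 or 5")
    if not 1 <= n <= 10:
        raise ValueError("n must be from 1 to 10")
    return ("G" * x + "S") * n


def approx_length_angstrom(linker: str) -> float:
    """Rough end-to-end length of a fully extended linker, in angstroms."""
    return len(linker) * ANGSTROM_PER_RESIDUE
```

Under these assumptions, `gly_ser_linker(4, 4)` yields the 20-residue sequence GGGGSGGGGSGGGGSGGGGS, whose estimated extended length is about 70 Å; a flexible linker in solution would generally span a shorter distance than this fully extended estimate.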
In some embodiments of the various aspects described herein, x is 3 and n is 3, 4, 5 or 6. In some embodiments of the various aspects described herein, x is 3 and n is 4 or 5. In some embodiments of the various aspects described herein, x is 4 and n is 3, 4, 5 or 6. In some embodiments of the various aspects described herein, x is 4 and n is 4 or 5. In some embodiments of the various aspects described herein, x is 3 and n is 2. In some embodiments of the various aspects described herein, x is 3 or 4 and n is 1. [00155] Peptide linkers may affect folding of a given fusion protein, and may also react/bind with other proteins, and these properties can be screened for by known techniques. Exemplary linkers, in addition to those described herein, include a string of histidine residues, e.g., His6 (SEQ ID NO: 67); sequences made up of Ala and Pro, varying the number of Ala-Pro pairs to modulate the flexibility of the linker; and sequences made up of charged amino acid residues, e.g., mixing Glu and Lys. Flexibility can be controlled by the types and numbers of residues in the linker. See, e.g., Perham et al., Biochem. (1991) 30:8501 and Wriggers et al., Biopolymers (2005) 80:736. [00156] In some embodiments of any one of the aspects, the linker is (GGGS)n (SEQ ID NO: 68), where n is greater than 2. For example, n is 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20. [00157] In some embodiments of the various aspects described herein, the linker can be a chemical linker.
Chemical linkers can comprise a direct bond or an atom such as oxygen or sulfur, a unit such as NH, C(O), C(O)NH, SO, SO2, SO2NH, or a chain of atoms, such as substituted or unsubstituted C1-C6 alkyl, substituted or unsubstituted C2-C6 alkenyl, substituted or unsubstituted C2-C6 alkynyl, substituted or unsubstituted C6-C12 aryl, substituted or unsubstituted C5-C12 heteroaryl, substituted or unsubstituted C5-C12 heterocyclyl, substituted or unsubstituted C3-C12 cycloalkyl, where one or more methylenes can be interrupted or terminated by O, S, S(O), SO2, NH, or C(O). [00158] It is noted that the domains can be linked to each other in any desired orientation. For example, the first domain can be linked to N-terminus of the second domain. In some other embodiments, the first domain can be linked to C-terminus of the second domain. [00159] Similarly, the second and the third domains can be linked to each other in any desired orientation. For example, the third domain can be linked to N-terminus of the second domain. Alternatively, the third domain can be linked to C-terminus of the second domain. [00160] The third and the fourth domains can be linked to each other in any desired orientation. For example, the fourth domain can be linked to N-terminus of the third domain. Alternatively, the fourth domain can be linked to C-terminus of the third domain. [00161] In some embodiments, the first domain is linked to the N-terminus of the second domain and the third domain is linked to the C-terminus of the second domain. For example, the fusion polypeptide comprises from N-terminus to C-terminus: first domain, second domain and third domain. A linker can be present between the first and second domain, and/or between the second and third domain. [00162] In some other embodiments, second domain is linked to the N-terminus of the third domain and the first domain is linked to the C-terminus of the third domain. 
For example, the fusion polypeptide comprises from N-terminus to C-terminus: second domain, third domain and first domain. A linker can be present between the second and third domain, and/or between the third and first domain. [00163] In some embodiments, the C-terminal region of the N-terminal domain of the viroporin 3a protein is linked to the N-terminal region of the N-terminal domain of the S protein. In some embodiments, the N-terminal domain of the viroporin is separated from the N-terminal domain of the S protein by a peptide linker. [00164] In some embodiments of any of the aspects, the fusion polypeptide comprises an amino acid sequence having at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to an amino acid sequence selected from SEQ ID NOs: 42-55 and 60-63:
A. 3aNTD-S fusion to SARS-CoV-2 S NTD (S 614 residue is D)
[Sequence image: Figure imgf000080_0001] (SEQ ID NOS 44 and 42, respectively);
B. 3aNTD-S fusion to Furin Cleavage Domain (S 614 residue is D)
[Sequence images: Figure imgf000080_0002, Figure imgf000081_0001] (890)([linkage]3a) (SEQ ID NOS 45 and 42, respectively);
C. 3aNTD-S fusion to SARS-CoV-2 S NTD (S 614 residue is G)
[Sequence image: Figure imgf000081_0002] (SEQ ID NOS 46 and 43, respectively); and
D. 3aNTD-S fusion to Furin Cleavage Domain (S 614 residue is G)
[Sequence images: Figure imgf000081_0003, Figure imgf000082_0001] (890)([linkage]3a) (SEQ ID NOS 47 and 43, respectively);
E. 3aNTD-S fusion to SARS-CoV-2 S NTD (S 614 residue is D)
[Sequence image: Figure imgf000082_0002]
F. 3aNTD-S fusion to Furin Cleavage Domain (S 614 residue is D)
[Sequence image: Figure imgf000082_0003] I (794)([linkage]3a) (SEQ ID NOS 49 and 60, respectively);
G. 3aNTD-S fusion to SARS-CoV-2 S NTD (S 614 residue is G)
[Sequence images: Figure imgf000082_0004, Figure imgf000083_0001] (794) (SEQ ID NOS 50 and 61, respectively); [Sequence image: Figure imgf000083_0002]
H. 3aNTD-S fusion to Furin Cleavage Domain (S 614 residue is G)
[Sequence image: Figure imgf000083_0003] I (794)([linkage]3a) (SEQ ID NOS 51 and 61, respectively);
I. 3aNTD-S fusion to SARS-CoV-2 S NTD (S 614 residue is D)
[Sequence image: Figure imgf000083_0004] (714) [Sequence image: Figure imgf000083_0005] (SEQ ID NOS 52 and 62, respectively);
J. 3aNTD-S fusion to Furin Cleavage Domain (S 614 residue is D)
[Sequence images: Figure imgf000083_0006, Figure imgf000084_0001] (714)([linkage]3a) [Sequence image: Figure imgf000084_0002] (SEQ ID NOS 53 and 62, respectively);
K. 3aNTD-S fusion to SARS-CoV-2 S NTD (S 614 residue is G)
[Sequence image: Figure imgf000084_0003] (714) [Sequence image: Figure imgf000084_0004] (SEQ ID NOS 54 and 63, respectively); and
L. 3aNTD-S fusion to Furin Cleavage Domain (S 614 residue is G)
[Sequence image: Figure imgf000084_0005] (714)([linkage]3a) [Sequence image: Figure imgf000084_0006] (SEQ ID NOS 55 and 63, respectively);
M. 3aNTD-S fusion to SARS-CoV-2 S NTD (S 614 residue is D)
[Sequence images: Figure imgf000084_0007, Figure imgf000085_0001]
N. 3aNTD-S fusion to Furin Cleavage Domain (S 614 residue is D)
[Sequence image: Figure imgf000085_0002]
O. 3aNTD-S fusion to SARS-CoV-2 S NTD (S 614 residue is G)
[Sequence images: Figure imgf000085_0003, Figure imgf000086_0001]
P. 3aNTD-S fusion to Furin Cleavage Domain (S 614 residue is G)
[Sequence image: Figure imgf000086_0002]
[00165] In the above sequences, 3a refers to amino acid 1-44 of a 3a protein, e.g., amino acids 1-44 of SEQ ID NO: 2; [linkage] means a linker, e.g., a linker described herein; and the number in parentheses, e.g., (514), (591), (714), (790) and (890) refers to the start position (e.g., (514) or (591)) or the end position (e.g., (714), (790) or (890)) of the sequence flanked by the numbers in the parentheses in the full length wild-type S protein, e.g., SEQ ID NO: 1. [00166] For example, the fusion polypeptide comprises an amino acid sequence having 100% identity to an amino acid sequence selected from SEQ ID NOs: 45-65. [00167] The disclosure also provides a polynucleotide encoding a fusion polypeptide described herein. The skilled person will understand that, due to the degeneracy of the genetic code, a given fusion polypeptide can be encoded by different polynucleotides. These “variants” are encompassed herein. [00168] In some embodiments, a nucleic acid encoding a fusion polypeptide described herein is comprised in a vector. In some embodiments, a nucleic acid sequence encoding a fusion polypeptide is operably linked to a vector. The term "vector", as used herein, refers to a nucleic acid construct designed for delivery to a host cell or for transfer between different host cells. As used herein, a vector can be viral or non-viral. The term “vector” encompasses any genetic element that is capable of replication when associated with the proper control elements and that can transfer gene sequences to cells. A vector can include, but is not limited to, a cloning vector, an expression vector, a plasmid, phage, transposon, cosmid, chromosome, virus, virion, etc. [00169] In some embodiments, the vector is recombinant, e.g., it comprises sequences originating from at least two different sources. In some embodiments of any of the aspects, the vector comprises sequences originating from at least two different species. 
In some embodiments of any of the aspects, the vector comprises sequences originating from at least two different genes, e.g., it comprises a fusion protein or a nucleic acid encoding an expression product which is operably linked to at least one non-native (e.g., heterologous) genetic control element (e.g., a promoter, suppressor, activator, enhancer, response element, or the like). [00170] In some embodiments, the vector or nucleic acid described herein is codon-optimized, e.g., the native or wild-type sequence of the nucleic acid sequence has been altered or engineered to include alternative codons such that altered or engineered nucleic acid encodes the same polypeptide expression product as the native/wild-type sequence, but will be transcribed and/or translated at an improved efficiency in a desired expression system. In some embodiments, the expression system is an organism other than the source of the native/wild-type sequence (or a cell obtained from such organism). In some embodiments, the vector and/or nucleic acid sequence described herein is codon-optimized for expression in a mammal or mammalian cell, e.g., a mouse, a murine cell, or a human cell. In some embodiments, the vector and/or nucleic acid sequence described herein is codon-optimized for expression in a human cell. In some embodiments, the vector and/or nucleic acid sequence described herein is codon-optimized for expression in a yeast or yeast cell. In some embodiments, the vector and/or nucleic acid sequence described herein is codon-optimized for expression in a bacterial cell. In some embodiments, the vector and/or nucleic acid sequence described herein is codon-optimized for expression in an E. coli cell. [00171] As used herein, the term "expression vector" refers to a vector that directs expression of an RNA or polypeptide from sequences linked to transcriptional regulatory sequences on the vector. The sequences expressed will often, but not necessarily, be heterologous to the cell. 
An expression vector may comprise additional elements; for example, the expression vector may have two replication systems, thus allowing it to be maintained in two organisms, for example in human cells for expression and in a prokaryotic host for cloning and amplification. [00172] As used herein, the term “viral vector” refers to a nucleic acid vector construct that includes at least one element of viral origin and has the capacity to be packaged into a viral vector particle. The viral vector can contain the nucleic acid encoding a polypeptide as described herein in place of non-essential viral genes. The vector and/or particle may be utilized for the purpose of transferring any nucleic acids into cells either in vitro or in vivo. Numerous forms of viral vectors are known in the art. [00173] It should be understood that the vectors described herein can, in some embodiments, be combined with other suitable compositions and therapies. In some embodiments, the vector is episomal. The use of a suitable episomal vector provides a means of maintaining the nucleotide of interest in the subject in high copy number extrachromosomal DNA, thereby eliminating potential effects of chromosomal integration. [00174] In some embodiments of any of the aspects described herein, the constructs can be comprised by a superstructure, e.g., nanoparticles, liposomes, vectors, cells, scaffolds, or the like. [00175] The disclosure also provides a cell comprising a fusion polypeptide described herein or a polynucleotide encoding the same. As used herein, the term “cell” refers to a single cell as well as to a population of (i.e., more than one) cells. For example, the cell can be a prokaryotic cell or a eukaryotic cell comprising a polypeptide or polynucleotide described herein. Exemplary cells include, but are not limited to, bacterial cells, yeast cells, plant cells, animal (including insect) cells, and human cells.
Methods of selecting a viral polypeptide for a fusion polypeptide composition
[00176] Provided herein is a method of selecting a viral polypeptide or fragment thereof for a therapeutic fusion polypeptide composition that can be administered to a subject prior to exposure to a pathogenic microorganism, or during or following an infection. The fusion polypeptide can be administered to promote an immune response in a subject. [00177] In order to select a fusion polypeptide, the method provided herein comprises a covariance analysis that compares the amino acid residues of polypeptides expressed by a first infectious pathogenic microorganism that infects a human subject (e.g., SARS-CoV-2) to those of microorganisms that infect other species (e.g., bats) and/or a different pathogenic microorganism that infects a human subject (e.g., SARS-CoV). One of skill in the art can determine which microorganisms to use as a reference. [00178] Multiple sequence alignment (MSA) is often used to determine the degree of conservation of a given amino acid sequence of a polypeptide between organisms, e.g., orthologues and paralogues. Identifying coordinated changes of amino acid residues over time or between different species of organisms has emerged as an important predictor of coevolution on a molecular scale. See, e.g., Yeang CH, Haussler D. Detecting coevolution in and among protein domains. PLoS Comput Biol. 2007 Nov;3(11):e211, which is incorporated herein by reference in its entirety. [00179] Covariance is defined as a measure of how much two variables change together. In the context of examining the interaction between two polypeptides, the variables are specific positions within a polypeptide amino acid sequence, for example, two amino acid residues. Analysis of covariance can determine whether a genome, translation products, or microorganisms evolve independently or coevolve together.
Sequence covariation can be used to detect protein–protein interactions, ligand-receptor binding, and the folding structure of single proteins. Coevolving amino acid sites are also functionally and structurally important for identifying functional dependency between proteins. See, e.g., Fares MA, Travers SA. A novel method for detecting intramolecular coevolution: adding a further dimension to selective constraints analyses. Genetics. 2006 May;173(1):9-23, which is incorporated by reference in its entirety. [00180] Covariance analysis is generally based on the following formula:
Figure imgf000089_0001
In this formula, l denotes the length of the aligned sequences and N denotes the number of pairs. A threshold or purity value for the calculated correlation value (Px(m)y(n)) was set to be exceedingly stringent based on the apparent evolutionary relatedness of the selected set of lineage B betacoronaviruses. [00181] As used herein, the term “covarying amino acid positions” refers to amino acid residue positions that normally occur together, e.g., an amino acid position of a polypeptide that has evolved, changed, or mutated when aligned with one or more reference sequences. Covariance between two or more amino acid positions is observed when the type of amino acid found at a first amino acid position is dependent on the type of amino acid found at another amino acid position. That is, when one particular amino acid is found at a first position, a second particular amino acid is usually found at the second position. One of skill in the art can determine the optimum assemblage of reference sequences and shared genes among a group of related genomes that can be used for the method of covariance analysis. As the degree of taxonomic relatedness within a group of related genomes can be arbitrary, and distinct regions of an individual genome may have been horizontally acquired or lost independent of the evolutionary divergence measured in other conserved genes, it is necessary to select an appropriate threshold of relatedness for the analyzed genomes and an appropriate set of conserved genes. Non-limiting examples of reference sequences that can be used include those listed in Table 1 or sequences derived from the microorganisms in Table 2. [00182] Once covariant amino acid residues are identified, they can be further binned into pairs and groups (referred to herein as ‘Clusters’) of covarying amino acid residues. These clusters can be organized using a correlating tandem model set at different purity thresholds between 0.8 and 1.0.
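By way of illustration only, the notion of covarying amino acid positions can be sketched on a small multiple sequence alignment. The score used below is an assumption: it is a simple Jaccard-style co-occurrence ratio chosen for illustration, and it is not the covariance formula or the correlating tandem model formula referred to herein. The function names and the toy alignment are likewise hypothetical; only the 0.96 threshold value is taken from the text.

```python
from itertools import combinations


def cooccurrence_score(alignment, i, a, j, b):
    """Fraction of sequences with residue a at column i AND residue b at column j,
    out of those with a at i OR b at j. Equals 1.0 when the two always occur together.
    NOTE: an illustrative stand-in, not the formula of the disclosure."""
    both = sum(1 for s in alignment if s[i] == a and s[j] == b)
    either = sum(1 for s in alignment if s[i] == a or s[j] == b)
    return both / either if either else 0.0


def covarying_pairs(alignment, threshold=0.96):
    """Return (col_i, res_a, col_j, res_b, score) tuples scoring at or above threshold."""
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("all aligned sequences must have the same length")
    # Only columns that vary across the alignment can covary.
    variable = [k for k in range(length) if len({s[k] for s in alignment}) > 1]
    hits = []
    for i, j in combinations(variable, 2):
        for a, b in sorted({(s[i], s[j]) for s in alignment}):
            score = cooccurrence_score(alignment, i, a, j, b)
            if score >= threshold:
                hits.append((i, a, j, b, score))
    return hits
```

For the toy alignment `["ACDA", "ACDA", "GCEA", "GCEA"]`, columns 0 and 2 vary in lockstep (A with D, G with E), so both residue pairs score 1.0 and pass the 0.96 threshold, while the invariant columns 1 and 3 are not considered.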
For example, a stringent purity threshold is 0.96 and will reduce noise based on the sampling size (See, e.g., Table 2). [00183] The correlating tandem model refers to a mathematical algorithm that evaluates the degree of association of each element in a covariance analysis (e.g., pairs of amino acid residues) to confirm a reliable pattern of covariance. See, e.g., Shen, W., Li, Y. A novel algorithm for detecting multiple covariance and clustering of biological sequences. Sci Rep 6, 30425 (2016), which is incorporated herein by reference in its entirety. The correlating tandem model provided herein is based on the following formula:
Figure imgf000090_0001
Based on the set correlation threshold values, groups of more than three covarying amino acid pairs were binned and transformed into a matrix. mt is the degree-of-association correlation threshold, and anything below this threshold value is removed from the correlation matrix when generating a pattern of covarying residues. This was repeated for all covarying residues found in each row and column in the matrix so as to keep binned groups of residues that meet the set criteria. In this entire analysis, we chose a stringent 0.96 purity value (shown as P). [00184] An applied force-directed mapping algorithm can be used to visualize the relationships between these clusters of covarying residues and the isolates from which they were derived (See, e.g., Jacomy et al., “ForceAtlas2, a Continuous Graph Layout Algorithm for Handy Network Visualization Designed for the Gephi Software.” PloS ONE. 2014; and Bastian M, et al. “Gephi: an open source software for exploring and manipulating networks. In: International AAAI Conference on Weblogs and Social Media.” Association for the Advancement of Artificial Intelligence, 2009, which are incorporated herein by reference in their entirety). This graphing technique simulates repulsion between nodes (residues, clusters, and genomes) as well as attraction between edges (links between nodes) and then plots them to minimize both the complexity (fewer crossed edges) and to reduce edge lengths in two dimensions. Nodes are force-directed according to hierarchy, which orients them based on sequential linkage (edges) to clusters and amino acid residues. [00185] The inventors discovered that this analysis readily organized coronavirus isolates into groups that were consistent with phylogenetic analysis (FIGS. 1 and 3 in the working example). The clusters can then be binned to compare covarying residues that are restricted to isolates from a given host species (e.g., humans or bats) or a given group of viruses. 
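By way of non-limiting illustration only, the thresholding and host-restriction steps described above can be sketched in Python. This sketch does not reproduce the correlating tandem model formula referenced above. The `pair_purity` score used here (the smaller of the two conditional co-occurrence fractions for a residue pair) is an assumed stand-in, and the function names, the toy alignment, and the host labels are all hypothetical. Positions are 0-based column indices of a pre-aligned set of equal-length sequences.

```python
from itertools import combinations


def pair_purity(col_m, col_n, x, y):
    """Assumed purity score (NOT the formula of the specification):
    the smaller of P(y at n | x at m) and P(x at m | y at n)."""
    both = sum(1 for a, b in zip(col_m, col_n) if a == x and b == y)
    n_x = col_m.count(x)
    n_y = col_n.count(y)
    if n_x == 0 or n_y == 0:
        return 0.0
    return min(both / n_x, both / n_y)


def covarying_pairs(alignment, threshold=0.96):
    """Return {(m, x, n, y): purity} for residue pairs at or above threshold.

    Invariant columns are skipped because they cannot covary.
    """
    length = len(alignment[0])
    columns = ["".join(seq[i] for seq in alignment) for i in range(length)]
    hits = {}
    for m, n in combinations(range(length), 2):
        if len(set(columns[m])) < 2 or len(set(columns[n])) < 2:
            continue
        # Only residue pairs actually observed together need scoring.
        for x, y in set(zip(columns[m], columns[n])):
            purity = pair_purity(columns[m], columns[n], x, y)
            if purity >= threshold:
                hits[(m, x, n, y)] = purity
    return hits


def restricted_to(alignment, hosts, hit, host):
    """True if every isolate carrying the residue pair comes from `host`."""
    m, x, n, y = hit
    carriers = {h for seq, h in zip(alignment, hosts) if seq[m] == x and seq[n] == y}
    return carriers == {host}


# Hypothetical toy data: five aligned four-residue sequences with host labels.
alignment = ["AKLT", "AKLT", "ARLS", "ARLS", "ARLS"]
hosts = ["human", "human", "bat", "bat", "bat"]
hits = covarying_pairs(alignment, threshold=0.96)
human_only = [h for h in hits if restricted_to(alignment, hosts, h, "human")]
```

In this toy alignment only columns 1 and 3 vary, so the two residue pairs found there are the only candidates, and `human_only` keeps the pair carried exclusively by the isolates labeled as human. A real analysis would substitute the purity calculation of the correlating tandem model and the reference sequences of Table 1.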
By comparing networks of clusters of covarying residues between distinct groupings, clusters that are restricted to various combinations of bat, civet, pangolin, and human viral isolates can be identified. Those classified as ‘restricted’ are found in clusters linked to specific groups and absent in other groups. Thus, this annotated comprehensive dataset identifies all distinct residue identities that strongly covary in each viral protein and the distribution of these in microorganisms that include but are not limited to the human pathogens SARS-CoV and SARS-CoV-2. [00186] Enriched covariant residues within the network include those that facilitate transmissibility into humans and would be selected as optimal candidates for a fusion polypeptide and/or vaccine composition provided herein. [00187] In some embodiments of any of the aspects, the method described herein further comprises cloning the selected polypeptide or a fragment thereof into an expression vector. In some embodiments of any of the aspects, the selected polypeptide comprises one or more covarying amino acid residues identified by the methods described above. [00188] Methods of cloning are well known in the art. By way of example only, the nucleic acid encoding the candidate antigen or fragment thereof described herein can be cloned into a plasmid (e.g., pET17b). The plasmid can have an antibiotic-resistance cassette (e.g., ampicillin or kanamycin). The plasmid is then transformed into a bacterium (e.g., E. coli BL21 (DE3)) for recombinant expression. Transformed bacterial cultures can then be inoculated in a medium with the appropriate antibiotics followed by induction (e.g., by isopropyl β-D-1-thiogalactopyranoside (IPTG)) to evaluate the polypeptide or polypeptide fragment expression. [00189] Other methods of making, expressing, delivering, or preparing a fusion polypeptide or fragment thereof can also be used. For example, mRNA vaccine compositions can be used. See US Pat Nos. 
9,192,651 B2 and 10,022,435B2 which have been incorporated by reference herein in their entirety. See also, polypeptide-antigen conjugates in US 2018/0333484 A1 and 2019/0192645 A1, which have been incorporated by reference herein in their entirety. [00190] In some embodiments of any of the aspects, the method further comprises expressing and isolating the candidate polypeptide or fragment thereof by methods known in the art. By way of example only, after induction, bacteria can be collected by centrifugation and proteins can then be purified by methods known in the art (e.g., column purification). Dot blots can be used to verify the antigen-positive fractions using an antibody and detection reagents. Protein concentration can then be quantified by methods known in the art. [00191] The resulting fusion polypeptide or fragments thereof can be tested and verified by immunizing an animal model or a subject with purified recombinant proteins and measuring an immune response to the antigen. Provoking an immune response in a subject [00192] The fusion polypeptides, vaccine compositions, and methods provided herein can provoke an immune response, e.g., an immune response which is protective against infections by one or more microorganisms. In one aspect, provided herein is a method of provoking an immune response to a microorganism (e.g., a pathogenic virus or a coronavirus) in a subject, wherein the method comprises administering to a subject a fusion polypeptide or vaccine composition as described herein. In some embodiments of any of the aspects, the vaccine composition provokes an immune response that is protective against a pathogenic virus. In some embodiments of any of the aspects, the vaccine composition provokes an immune response that is protective against a coronavirus (e.g., SARS-CoV and SARS-CoV2). 
[0089] An immune response can be characterized as any stimulation of any immune cell, such as release of antibodies or cytokines, proliferation of an immune cell, phagocytosis, or any function of an immune cell known in the art. Provoking an immune response as described herein can include the presence of an antibody or an increase in antibody production by B cells wherein the antibody can bind to an antigen-expressing or infecting pathogen following administration of the polypeptides, fragments thereof, or vaccine compositions described herein and thereby target the microorganism for killing or inactivation. [00193] Methods of measuring an immune response are known in the art. An immune response can be determined experimentally by evaluating the levels of immune molecules or cells in a biological sample from a subject exposed to the polypeptide, antigen, or fragment thereof. Methods of detecting an immune response include but are not limited to antibody ELISA (e.g., IgG antibody), cytokine ELISA (e.g., measuring the presence of cytokines such as IL-4, IL-12, IL-6, IFN-γ, or TNF-α), flow cytometry, viral titer, or a serum bactericidal assay (SBA). [0090] As known to those of skill in the art, the term “antibody” broadly refers to any immunoglobulin (Ig) molecule and immunologically active portions of immunoglobulin molecules (i.e., molecules that contain an antigen binding site that immunospecifically binds an antigen) comprised of four polypeptide chains, two heavy (H) chains and two light (L) chains, or any functional fragment, mutant, variant, or derivation thereof, which retains the essential epitope binding features of an Ig molecule. The antibody or immunoglobulin molecules can be of any type (e.g., IgG, IgE, IgM, IgD, IgA and IgY), class (e.g., IgG1, IgG2, IgG3, IgG4, IgA1 and IgA2) or subclass of immunoglobulin molecule, as is understood by one of skill in the art. 
[0091] The presence or an increase in antibody production in a subject compared with a reference level can be measured by any method known in the art including enzyme-linked immunosorbent assay (ELISA). In addition, the presence or an increase in cytokine production by immune cells compared with a reference level can be measured by any method known in the art including an ELISPOT assay. [00194] Mammals are diagnosed as having an infection according to any standard method known in the art and described, for example, in U.S. Pat. Nos. 6,368,832, 6,579,854, and 6,808,710 and U.S. Patent Application Publication Nos. 20040137577, 20030232323, 20030166531, 20030064380, 20030044768, 20030039653, 20020164600, 20020160000, 20020110836, 20020107363, and 20020106730, all of which are hereby incorporated by reference in their entireties. [00195] In some embodiments of any of the aspects, the fusion polypeptide provokes an immune response in a human subject. Pharmaceutical compositions and formulations [00196] The fusion polypeptides provided herein can be formulated with a pharmaceutically acceptable carrier for administration that results in effective treatment of a subject or as a pharmaceutical composition or a vaccine composition. In some embodiments of any of the aspects, the vaccine composition comprises one or more fusion polypeptides or fragments thereof provided herein. In some embodiments, the vaccine composition comprises multiple fusion polypeptides provided herein or fragments thereof. In some embodiments the vaccine composition comprises two or more, three or more, four or more, five or more, six or more, seven or more, eight or more, nine or more, ten or more, or twenty or more fusion polypeptides or fragments thereof provided herein (e.g., SEQ ID NOs: 14-37). 
[00197] Acceptable carriers, excipients, or stabilizers are nontoxic to recipients at the dosages and concentrations employed, and include buffers such as phosphate, citrate, and other organic acids; antioxidants including ascorbic acid and methionine; preservatives (such as octadecyldimethylbenzyl ammonium chloride; hexamethonium chloride; benzalkonium chloride, benzethonium chloride; phenol, butyl or benzyl alcohol; alkyl parabens such as methyl or propyl paraben; catechol; resorcinol; cyclohexanol; 3-pentanol; and m-cresol); low molecular weight (less than about 10 residues) polypeptides; proteins, such as serum albumin, gelatin, or immunoglobulins; hydrophilic polymers such as polyvinylpyrrolidone; amino acids such as glycine, glutamine, asparagine, histidine, arginine, or lysine; monosaccharides, disaccharides, and other carbohydrates including glucose, mannose, or dextrins; chelating agents such as EDTA; sugars such as sucrose, mannitol, trehalose or sorbitol; salt-forming counter-ions such as sodium; metal complexes (e.g. Zn-protein complexes); and/or non-ionic surfactants such as TWEEN™, PLURONICS™ or polyethylene glycol (PEG). Exemplary lyophilized fusion protein formulations are described in WO 97/04801, expressly incorporated herein by reference. [00198] Optionally, but preferably, the formulations comprising the compositions provided herein contain a pharmaceutically acceptable salt, typically, e.g., sodium chloride, and preferably at about physiological concentrations. Optionally, the formulations of the vaccine compositions described herein can contain a pharmaceutically acceptable preservative. Suitable preservatives include those known in the pharmaceutical arts, e.g., benzyl alcohol, phenol, m-cresol, methylparaben, and propylparaben are examples of preservatives. Optionally, the formulations of the vaccine compositions described herein can include a pharmaceutically acceptable surfactant, e.g., at a concentration of about 0.005 to 0.02%. 
[00199] The therapeutic formulations of the pharmaceutical compositions comprising the fusion proteins provided herein can also contain more than one active compound as necessary for the particular indication being treated (e.g., COVID19), preferably those with complementary activities that do not adversely affect each other. Such molecules are suitably present in combination in amounts that are effective for the purpose intended. [00200] The active ingredients of the pharmaceutical compositions comprising a fusion protein provided herein can also be entrapped in microcapsules prepared, for example, by coacervation techniques or by interfacial polymerization, for example, hydroxymethylcellulose or gelatin-microcapsules and poly-(methylmethacrylate) microcapsules, respectively, in colloidal drug delivery systems (for example, liposomes, albumin microspheres, microemulsions, nano-particles and nanocapsules) or in macroemulsions. Such techniques are disclosed in Remington's Pharmaceutical Sciences 16th edition, Osol, A. Ed. (1980), which is incorporated herein by reference in its entirety. [00201] The pharmaceutical or vaccine composition provided herein can be formulated, dosed, and administered in a fashion consistent with good medical practice. Factors for consideration in this context include the particular disorder being treated, the particular subject being treated, the clinical condition of the individual subject, the cause of the disorder, the site of delivery of the vaccine composition, the method of administration, the scheduling of administration, and other factors known to medical practitioners. 
The “therapeutically effective amount” or “amount effective” of the fusion protein or fragment thereof or vaccine composition to be administered is governed by such considerations, and refers to the minimum amount necessary to ameliorate, treat, or stabilize an infection; to increase the time until progression (duration of progression free survival); or to treat or prevent the occurrence or recurrence of an infection. The vaccine composition can be optionally formulated, in some embodiments, with one or more additional therapeutic agents currently used to prevent or treat the infection, for example. The effective amount of such other agents depends on the amount of fusion protein or fragment thereof present in the formulation, the type of disorder or treatment, and other factors discussed above. These are generally used in the same dosages and with administration routes as used hereinbefore, or at about 1 to 99% of the heretofore employed dosages. [00202] Effective amounts, toxicity, and therapeutic efficacy can be determined by standard pharmaceutical procedures in cell cultures or experimental animals, e.g., for determining the LD50 (the dose lethal to 50% of the population) and the ED50 (the dose therapeutically effective in 50% of the population). The dosage can vary depending upon the dosage form employed and the route of administration utilized. The dose ratio between toxic and therapeutic effects is the therapeutic index and can be expressed as the ratio LD50/ED50. Compositions and methods that exhibit large therapeutic indices are preferred. A therapeutically effective dose can be estimated initially from cell culture assays. Also, a dose can be formulated in animal models to achieve a circulating plasma concentration range that includes the IC50 (i.e., the concentration of the polypeptide or fragment thereof which achieves a half-maximal inhibition of symptoms) as determined in cell culture, or in an appropriate animal model. 
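By way of non-limiting illustration only, the therapeutic index defined above as the ratio LD50/ED50 can be computed as in the following Python sketch. The function name and the numeric values are hypothetical and are chosen solely to show the arithmetic. They are not measured data for any composition described herein.

```python
def therapeutic_index(ld50, ed50):
    """Return the therapeutic index LD50/ED50.

    Both doses must be positive and expressed in the same units
    (e.g., mg/kg body weight).
    """
    if ld50 <= 0 or ed50 <= 0:
        raise ValueError("LD50 and ED50 must be positive")
    return ld50 / ed50


# Hypothetical values for illustration only (not experimental results).
example_index = therapeutic_index(ld50=200.0, ed50=10.0)  # 200 / 10 = 20.0
```

A larger ratio indicates a wider margin between the therapeutically effective dose and the lethal dose, which is why large therapeutic indices are preferred.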
Levels in plasma can be measured, for example, by high performance liquid chromatography. The effects of any particular dosage can be monitored by a suitable bioassay. The dosage can be determined by a physician and adjusted, as necessary, to suit observed effects of the treatment. [00203] The dosage ranges for the vaccine composition depend upon the potency, and encompass amounts large enough to produce the desired effect. The dosage should not be so large as to cause unacceptable adverse side effects. Generally, the dosage will vary with the age, condition, and sex of the patient and can be determined by one of skill in the art. The dosage can also be adjusted by the individual physician in the event of any complication. In some embodiments, the dosage ranges from 0.001 mg/kg body weight to 100 mg/kg body weight. In some embodiments, the dose range is from 5 μg/kg body weight to 100 μg/kg body weight. Alternatively, the dose range can be titrated to maintain serum levels between 1 μg/mL and 1000 μg/mL. For systemic administration, subjects can be administered a therapeutic amount, such as, e.g., 0.1 mg/kg, 0.5 mg/kg, 1.0 mg/kg, 2.0 mg/kg, 2.5 mg/kg, 5 mg/kg, 7.5 mg/kg, 10 mg/kg, 15 mg/kg, 20 mg/kg, 25 mg/kg, 30 mg/kg, 40 mg/kg, 50 mg/kg, or more. These doses can be administered by one or more separate administrations, or by continuous infusion. For repeated administrations over several days or longer, depending on the condition, the treatment is sustained until, for example, the infection is treated, as measured by the methods described above or known in the art. However, other dosage regimens can be useful. [00204] The pharmaceutical or vaccine composition provided herein is suitably administered to the subject at one time or over a series of treatments. In a combination therapy regimen, the composition provided herein and the one or more additional therapeutic agents described herein are administered in a therapeutically effective or synergistic amount. 
As used herein, a therapeutically effective amount is such that co-administration of a fusion protein and one or more other therapeutic agents, or administration of a composition described herein, results in reduction or inhibition or prevention of a disease or disorder as described herein. A therapeutically synergistic amount is that amount of a fusion protein and one or more other therapeutic agents necessary to synergistically or significantly reduce, prevent, or eliminate conditions or symptoms associated with a particular disease. In some cases, the fusion protein provided herein can be co-administered with one or more additional therapeutically effective agents to give an additive effect resulting in a significant reduction, prevention, or elimination of conditions or symptoms associated with a particular disease, but with a much reduced toxicity profile due to lower dosages of one or more of the additional therapeutically effective agents. [00205] The pharmaceutical or vaccine compositions described herein can be administered to a subject in need of vaccination, immunization, and/or stimulation of an immune response. In some embodiments of any of the aspects, the methods described herein comprise administering an effective amount of vaccine compositions described herein, e.g., to a subject in order to stimulate an immune response or provide protection against the relevant pathogen or microorganism (e.g., SARS-CoV2) from which the polypeptide or antigen was derived. Providing protection against the relevant pathogen is stimulating the immune system such that later exposure to the antigen, polypeptide, or fragment thereof (e.g., on or in a live pathogen) triggers a more effective immune response than if the subject was naïve to the antigen. Protection can include faster clearance of the pathogen, reduced severity and/or time of symptoms, and/or lack of development of disease or symptoms. 
As compared with an equivalent untreated control, such reduction is by at least 5%, 10%, 20%, 40%, 50%, 60%, 80%, 90%, 95%, 99% or more as measured by any standard technique. A variety of means for administering the compositions described herein to subjects are known to those of skill in the art. Such methods can include, but are not limited to, oral, parenteral, intravenous, intramuscular, subcutaneous, transdermal, airway (aerosol), pulmonary, cutaneous, injection, or topical administration. Administration can be local or systemic. [00206] The fusion polypeptide or fragment thereof or vaccine compositions as provided herein can be administered to a subject in need thereof by any appropriate route which results in an effective treatment in the subject. As used herein, the terms “administering” and “introducing” are used interchangeably and refer to the placement of a vaccine composition, fusion polypeptide or fragment thereof into a subject by a method or route which results in at least partial localization of such compositions at a desired site, such as a site of infection, such that a desired effect(s) is produced. A fusion polypeptide or fragment thereof or vaccine composition can be administered to a subject by any mode of administration that delivers the vaccine composition systemically or to a desired surface or target, and can include, but is not limited to, injection, infusion, instillation, and inhalation administration. To the extent that the fusion polypeptide or fragment thereof or vaccine composition can be protected from inactivation in the gut, oral administration forms are also contemplated. “Injection” includes, without limitation, intravenous, intramuscular, intra-arterial, intrathecal, intraventricular, intracapsular, intraorbital, intracardiac, intradermal, intraperitoneal, transtracheal, subcutaneous, subcuticular, intraarticular, subcapsular, subarachnoid, intraspinal, intracerebrospinal, and intrasternal injection and infusion. 
[00207] The duration of a therapy using the compositions described herein will continue for as long as medically indicated or until a desired therapeutic effect (e.g., those described herein) is achieved. In certain embodiments, the administration of the vaccine composition described herein is continued for 1 month, 2 months, 4 months, 6 months, 8 months, 10 months, 1 year, 2 years, 3 years, 4 years, 5 years, 10 years, 20 years, or for a period of years up to the lifetime of the subject. [00208] As will be appreciated by one of skill in the art, appropriate dosing regimens for a given vaccine composition can comprise a single administration/immunization or multiple ones. Subsequent doses may be given repeatedly at time periods, for example, about two weeks or greater up through the entirety of a subject's life, e.g., to provide a sustained preventative effect. Subsequent doses can be spaced, for example, about two weeks, about three weeks, about four weeks, about one month, about two months, about three months, about four months, about five months, about six months, about seven months, about eight months, about nine months, about ten months, about eleven months, or about one year after a primary immunization. [00209] The precise dose to be employed in the formulation will also depend on the route of administration and should be decided according to the judgment of the practitioner and each patient's circumstances. Ultimately, the practitioner or physician will decide the amount of fusion protein provided herein or vaccine composition to administer to particular subjects. [00210] A vaccine composition as described herein can be used, for example, to protect or treat a subject against disease. The terms “immunize” and “vaccinate” tend to be used interchangeably in the field. 
However, in reference to the administration of the vaccine compositions as described herein to provide protection against disease, e.g., infectious disease caused by a pathogen that expresses the polypeptide or antigen, it should be understood that the term “immunize” refers to the passive protection conferred by the administered vaccine composition. [00211] Exemplary embodiments of the various aspects described herein can be described by the following numbered embodiments: [00212] Embodiment 1: A fusion polypeptide comprising: a first domain comprising a first viral polypeptide or a fragment thereof expressed by a first virus, wherein the first viral polypeptide or fragment thereof comprises at least two or more covarying amino acid positions. [00213] Embodiment 2: The fusion polypeptide of Embodiment 1, further comprising a second domain comprising a second viral polypeptide or a fragment thereof expressed by the first virus, optionally, the second viral polypeptide or a fragment thereof comprises at least one amino acid position that covaries with at least one amino acid position in the first viral polypeptide or a fragment thereof. [00214] Embodiment 3: The fusion polypeptide of Embodiment 1 or 2, wherein the second viral polypeptide or fragment thereof comprises at least two or more amino acid positions that covary with each other. [00215] Embodiment 4: A fusion polypeptide comprising: (a) a first domain comprising a first viral polypeptide or a fragment thereof expressed by a first virus; and (b) a second domain comprising a second viral polypeptide or a fragment thereof expressed by the first virus, and wherein the first viral polypeptide or a fragment thereof comprises at least one amino acid position that covaries with at least one amino acid position in the second viral polypeptide or a fragment thereof. 
[00216] Embodiment 5: The fusion polypeptide of Embodiment 4, wherein the first viral polypeptide or a fragment thereof comprises at least two or more amino acid positions that covary with each other. [00217] Embodiment 6: The fusion polypeptide of any one of Embodiments 2-5, wherein the second viral polypeptide or a fragment thereof comprises at least two or more amino acid positions that covary with each other. [00218] Embodiment 7: The fusion polypeptide of any one of Embodiments 2-6, wherein the first and second domain are linked by a linker. [00219] Embodiment 8: The fusion polypeptide of any one of Embodiments 2-7, wherein the first and second domain are linked by a flexible linker. [00220] Embodiment 9: The fusion polypeptide of any one of Embodiments 2-8, further comprising a third domain comprising a third viral polypeptide or a fragment thereof expressed by the first virus, optionally the third viral polypeptide or a fragment thereof comprises at least one amino acid position that covaries with at least one amino acid position in the first or second viral polypeptide or a fragment thereof. [00221] Embodiment 10: The fusion polypeptide of Embodiment 9, wherein the third viral polypeptide or fragment thereof comprises at least two or more amino acid positions that covary with each other. [00222] Embodiment 11: The fusion polypeptide of Embodiment 9 or 10, wherein the first and second domain are linked by a linker. [00223] Embodiment 12: The fusion polypeptide of any one of Embodiments 9-11, wherein the second and third domain are linked by a linker. [00224] Embodiment 13: The fusion polypeptide of any one of Embodiments 9-12, further comprising a fourth domain comprising an amino acid sequence of the first, second or third domain. [00225] Embodiment 14: The fusion polypeptide of Embodiment 13, wherein the third and the fourth domains are linked by a linker. 
[00226] Embodiment 15: The fusion polypeptide of any one of Embodiments 1-14, wherein the covarying amino acid positions are determined using a correlating tandem model, optionally, the tandem model purity threshold is a level greater than or equal to 0.80. [00227] Embodiment 16: The fusion polypeptide of any one of Embodiments 1-15, wherein the covarying amino acid positions are relative to a viral polypeptide expressed by a second virus. [00228] Embodiment 17: The fusion polypeptide of any one of Embodiments 2-16, wherein the first virus is capable of infecting a human host and the second virus is capable of infecting a non-human host or the second virus is a different virus capable of infecting a human host. [00229] Embodiment 18: The fusion polypeptide of Embodiment 17, wherein the first and second virus are from the same family. [00230] Embodiment 19: The fusion polypeptide of Embodiment 17 or 18, wherein the first and second virus are from the same genus. [00231] Embodiment 20: The fusion polypeptide of any one of Embodiments 17-19, wherein the first and second virus are from the same species. [00232] Embodiment 21: The fusion polypeptide of any one of Embodiments 17-20, wherein the first and second virus are capable of infecting the same host species. [00233] Embodiment 22: The fusion polypeptide of any one of Embodiments 1-21, wherein the first virus is a corona virus. [00234] Embodiment 23: The fusion polypeptide of Embodiment 22, wherein the first virus is SARS-CoV or SARS-CoV2. [00235] Embodiment 24: The fusion polypeptide of any one of Embodiments 9-23, wherein the third viral polypeptide or fragment thereof is a corona virus polypeptide or fragment thereof selected from the group consisting of: the viroporin 3a protein, a non-structural protein, a 1a/1b polyprotein, a viroporin E, membrane protein (M), and spike (S) protein. 
[00236] Embodiment 25: The fusion protein of any one of Embodiments 9-24, wherein the third domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 42, 43, 60, 61, 62, 63, 64 or 65. [00237] Embodiment 26: The fusion polypeptide of Embodiment 25, wherein the third domain comprises an amino acid sequence comprising a substitution or deletion at position 19, 41, 127, 164 or 175 of SEQ ID NO: 42, 43, 60, 61, 62, 63, 64 or 65. [00238] Embodiment 27: The fusion polypeptide of any one of Embodiments 2-26, wherein the second viral polypeptide or fragment thereof is a corona virus polypeptide or fragment thereof selected from the group consisting of: the viroporin 3a protein, a non-structural protein, a 1a/1b polyprotein, a viroporin E, membrane protein (M), and spike (S) protein. [00239] Embodiment 28: The fusion polypeptide of any one of Embodiments 2-27, wherein the second domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 41 or 45. [00240] Embodiment 29: The fusion polypeptide of Embodiment 28, wherein the second domain comprises an amino acid sequence comprising a substitution or deletion at position 4, 6, 7, 9, 10, 13, 15, 16, 19, 20, 23, 49, 54, 56, 58, 59, 60, 61, 62, 126, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 150, 163, 165, 172, 191, 229, 230, 231, 232, 233, 243, 244, 246 or 248 of SEQ ID NO: 41 or at position 5, 7, 8, 10, 11, 14, 16, 17, 20, 21, 24, 50, 55, 57, 59, 60, 61, 62, 63, 127, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 151, 164, 166, 173, 192, 230, 231, 232, 233, 234, 244, 245, 247 and 249 of SEQ ID NO: 45. [00241] Embodiment 30: The fusion polypeptide of any one of Embodiments 1-29, wherein the first viral polypeptide or fragment thereof is a corona virus polypeptide or a fragment thereof selected from the group consisting of: the viroporin 3a protein, a non-structural protein, a 1a/1b polyprotein, a viroporin E, membrane protein (M), and spike (S) protein. 
[00242] Embodiment 31: The fusion protein of any one of Embodiments 1-30, wherein the first domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 20. [00243] Embodiment 32: The fusion protein of Embodiment 31, wherein the first domain comprises an amino acid sequence comprising a substitution or deletion at position 13, 14, 15, 16, 17, 18, 19, 20, 22, 23, 24, 25, 26, 27, 28, 32, 37, 38, 41, 42, 43 or 44 of SEQ ID NO: 20. [00244] Embodiment 33: The fusion protein of Embodiment 32, wherein the first domain comprises an amino acid sequence comprising a substitution or deletion at position 15, 16, 18, 20, 24, 25, 26, 28, or 38 of SEQ ID NO: 20. [00245] Embodiment 34: The fusion protein of any one of Embodiments 1-31, wherein the first domain comprises an amino acid sequence selected from the group consisting of SEQ ID NOs: 21-40. [00246] Embodiment 35: The fusion polypeptide of any one of Embodiments 1-31, wherein: (i) the first domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 20, the second domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 41 and the third domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 42; (ii) the first domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 41, the second domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 42 and the third domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 20; (iii) the first domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 20, the second domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 41 and the third domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 43; (iv) the first domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 41, the second domain comprises an amino acid 
sequence having at least 85% identity to SEQ ID NO: 43 and the third domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 20; (v) the first domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 20, the second domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 41 and the third domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 60; (vi) the first domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 41, the second domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 60 and the third domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 20; (vii) The first domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 20, the second domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 41 and the third domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 61; (viii) the first domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 41, the second domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 61 and the third domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 20; (ix) the first domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 20, the second domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 41 and the third domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 62; (x) the first domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 41, the second domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 62 and the third domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 20; (xi) the first domain comprises an amino 
acid sequence having at least 85% identity to SEQ ID NO: 20, the second domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 41 and the third domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 63; (xii) the first domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 41, the second domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 63 and the third domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 20; (xiii) the first domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 20, the second domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 41 and the third domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 64; (xiv) the first domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 41, the second domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 64 and the third domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 20; (xv) the first domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 20, the second domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 41 and the third domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 65; (xvi) the first domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 41, the second domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 65 and the third domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 20; (xvii) the first domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 20, the second domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 45 and the third domain comprises an 
amino acid sequence having at least 85% identity to SEQ ID NO: 42; (xviii) the first domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 45, the second domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 42 and the third domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 20; (xix) the first domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 20, the second domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 45 and the third domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 43; (xx) the first domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 45, the second domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 43 and the third domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 20; (xxi) the first domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 20, the second domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 45 and the third domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 60; (xxii) the first domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 45, the second domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 60 and the third domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 20; (xxiii) The first domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 20, the second domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 45 and the third domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 61; (xxiv) the first domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 45, the second domain 
comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 61 and the third domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 20; (xxv) the first domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 20, the second domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 45 and the third domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 62; (xxvi) the first domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 45, the second domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 62 and the third domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 20; (xxvii) the first domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 20, the second domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 45 and the third domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 63; (xxviii) the first domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 45, the second domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 63 and the third domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 20; (xxix) the first domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 20, the second domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 45 and the third domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 64; (xxx) the first domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 45, the second domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 64 and the third domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 20; (xxxi) 
the first domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 20, the second domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 45 and the third domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 65; or (xxxii) the first domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 45, the second domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 65 and the third domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 20. [00247] Embodiment 36: The fusion polypeptide of any one of Embodiments 1-35, wherein the fusion polypeptide induces an antigen specific immune response when administered to a subject. [00248] Embodiment 37: A polynucleotide encoding an amino acid sequence of a fusion polypeptide of any one of Embodiments 1-36. [00249] Embodiment 38: A vaccine composition comprising a fusion polypeptide of any one of Embodiments 1-36 or a polynucleotide of Embodiment 37. [00250] Embodiment 39: The vaccine composition of Embodiment 38 further comprising an adjuvant. [00251] Embodiment 40: The vaccine composition of Embodiment 37 or 38, further comprising a pharmaceutical carrier. [00252] Embodiment 41: A cell comprising a fusion polypeptide of any one of Embodiments 1-36 or a polynucleotide of Embodiment 37. [00253] Embodiment 42: A kit comprising a fusion polypeptide of any one of Embodiments 1-36 or a polynucleotide of Embodiment 37. [00254] Embodiment 43: A method of inducing an immune response in a subject, the method comprising: administering to the subject a fusion polypeptide of any one of Embodiments 1-36 or a polynucleotide of Embodiment 37 in an amount effective to produce an antigen specific immune response. Some Selected Definitions [00255] For convenience, the meanings of some terms and phrases used in the specification, examples, and appended claims are provided below. 
Unless stated otherwise, or implicit from context, the following terms and phrases include the meanings provided below. The definitions are provided to aid in describing particular embodiments, and are not intended to limit the claimed invention, because the scope of the invention is limited only by the claims. Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. If there is an apparent discrepancy between the usage of a term in the art and its definition provided herein, the definition provided within the specification shall prevail. [00256] As used herein, a “subject” means a human or animal. Usually the animal is a vertebrate such as a primate, rodent, domestic animal or game animal. Primates include chimpanzees, cynomolgus monkeys, spider monkeys, and macaques, e.g., Rhesus. Rodents include mice, rats, woodchucks, ferrets, rabbits and hamsters. Domestic and game animals include cows, horses, pigs, deer, bison, buffalo, feline species, e.g., domestic cat, canine species, e.g., dog, fox, wolf, avian species, e.g., chicken, emu, ostrich, and fish, e.g., trout, catfish and salmon. In some embodiments of any of the aspects, the subject is a mammal, e.g., a primate, e.g., a human. The terms, “individual,” “patient” and “subject” are used interchangeably herein. [00257] Preferably, the subject is a mammal. The mammal can be a human, non-human primate, mouse, rat, dog, cat, horse, or cow, but is not limited to these examples. Mammals other than humans can be advantageously used as subjects that represent animal models of immunization and immune response. A subject can be male or female. 
[00258] A subject can be one who has been previously diagnosed with or identified as suffering from or having a condition (e.g., has been diagnosed with an infection) or one or more complications related to such a condition, and optionally, have already undergone treatment for the condition or the one or more complications related to the condition. Alternatively, a subject can also be one who has not been previously diagnosed as having the condition or one or more complications related to the condition. For example, a subject can be one who exhibits one or more risk factors for the condition or one or more complications related to the condition or a subject who does not exhibit risk factors. [00259] A “subject in need” of treatment for a particular condition can be a subject having that condition, diagnosed as having that condition, or at risk of developing that condition (e.g., an infection). [00260] As used herein, a “host subject” is a subject that has been infected with a pathogen, microorganism, or bacteria. The host subject can be symptomatic or asymptomatic. The host subject can also be a carrier for the microorganism. In some embodiments of any of the aspects, the host subject is a mammal. In some embodiments of any of the aspects, the host subject is a human. [00261] As used herein, an “immune response” refers to a response by a cell of the immune system, such as a B cell, T cell (CD4 or CD8), regulatory T cell, antigen-presenting cell, dendritic cell, monocyte, macrophage, NKT cell, NK cell, basophil, eosinophil, or neutrophil, to a stimulus (e.g., to a vaccine composition, fusion protein, antigen or fragment thereof). In some embodiments of the aspects described herein, the response is specific for at least one particular antigen (e.g., an antigen of SARS-CoV2 virus), and refers to a response by a CD4 T cell, CD8 T cell, or B cell via their antigen-specific receptor. 
Such responses by these cells can include, for example, cytotoxicity, proliferation, antibody, cytokine, or chemokine production, trafficking, or phagocytosis, and can be dependent on the nature of the immune cell undergoing the response. [00262] As used herein, the term “provoking an immune response” refers to stimulation of an immune response, an induction, or increase in the immune response to a pathogenic microorganism. The term “provoking an immune response” as used herein can mean any one or more of the following: (i) the prevention of infection or re-infection, as in a traditional vaccine, (ii) the reduction in the severity of, or, in the elimination of symptoms, and (iii) the substantial or complete elimination of the pathogenic microorganism or disorder in question. Hence, provoking an immune response may be effected prophylactically (prior to infection) or therapeutically (following infection). In the present methods described herein, prophylactic treatment is the preferred mode. According to a particular embodiment, the vaccine compositions and methods described herein treat, including prophylactically and/or therapeutically immunize, a host animal against a microbial infection (e.g., a viral infection). The methods of the present technology are useful for conferring prophylactic and/or therapeutic immunity to a subject. The methods described herein can also be practiced on subjects for biomedical research applications. [00263] The term “vaccine composition” used herein is defined as a composition used to provoke or stimulate an immune response against an antigen or fragment thereof or a nucleic acid encoding such antigen or fragment thereof within the composition in order to protect or treat an organism against disease. 
In some embodiments of any of the aspects, the vaccine composition is a suspension of attenuated or killed microorganisms (e.g., bacteria, viruses, or fungi) or of antigenic proteins or nucleic acids derived from them, administered for prevention, amelioration, or treatment of infectious diseases. The terms “vaccine composition” and “vaccine” are used interchangeably. [00264] As used herein, the term “antigen” refers to a molecule that is derived from a pathogenic microorganism (e.g., a virus). Typically, antigens are bound by the host subject’s antibody ligands and are capable of raising or causing an antibody immune response in vivo by the host subject. An antigen can be a polypeptide, fusion polypeptide, protein, nucleic acid, or other molecule. The term “antigenic determinant” refers to an epitope on the antigen recognized by an antigen-binding molecule (e.g., an antibody, antibody reagent, or a polypeptide fragment thereof), and more particularly, by the antigen-binding site of said molecule. [00265] As used herein, the term “antigen specific immune response” refers to stimulation of an immune response, an induction, or increase in the immune response to an antigen or fusion polypeptide provided herein. [00266] As used herein, the term “fragment” refers to one or more portions of a polypeptide that retains the ability to provoke an immune response. The fragment can be a nucleic acid encoding a portion of the antigen or a polypeptide. A fragment can be a polypeptide or a nucleic acid encoding a polypeptide comprising 5 amino acids or more, 10 amino acids or more, 15 amino acids or more, 20 amino acids or more, 25 amino acids or more, 30 amino acids or more, or 35 amino acids or more. [00267] As used herein, the terms “protein” and “polypeptide” and “encoded polypeptide” are used interchangeably herein to designate a series of amino acid residues, connected to each other by peptide bonds between the alpha-amino and carboxy groups of adjacent residues. 
The terms “protein” and “polypeptide” refer to a polymer of amino acids, including modified amino acids (e.g., phosphorylated, glycated, glycosylated, etc.) and amino acid analogs, regardless of its size or function. “Protein” and “polypeptide” are often used in reference to relatively large polypeptides, whereas the term “peptide” is often used in reference to small polypeptides, but usage of these terms in the art overlaps. The terms “protein” and “polypeptide” are used interchangeably herein when referring to a gene product and fragments thereof. Thus, exemplary polypeptides or proteins include gene products, naturally occurring proteins, homologs, orthologs, paralogs, fragments and other equivalents, variants, fragments, and analogs of the foregoing. [00268] As used herein, the term “fusion polypeptide” refers to an engineered polypeptide that comprises or consists of the domains provided herein. In some embodiments, a domain comprises a viral polypeptide or fragment thereof. [00269] As used herein, the term “viral polypeptide” refers to a polypeptide expressed by a virus. In some embodiments, a viral polypeptide is a surface protein expressed by a virus. [00270] As used herein, the term “nucleic acid” or “nucleic acid sequence” refers to any molecule, preferably a polymeric molecule, incorporating units of ribonucleic acid, deoxyribonucleic acid or an analog thereof. The nucleic acid can be either single-stranded or double-stranded. A single-stranded nucleic acid can be one nucleic acid strand of a denatured double-stranded DNA. Alternatively, it can be a single-stranded nucleic acid not derived from any double-stranded DNA. In one aspect, the nucleic acid can be DNA. In another aspect, the nucleic acid can be RNA. Suitable DNA can include, e.g., genomic DNA or cDNA. Suitable RNA can include, e.g., mRNA. 
[00271] The term “vector”, as used herein, refers to a nucleic acid construct designed for delivery to a host cell or for transfer between different host cells. As used herein, a vector can be viral or non-viral. The term “vector” encompasses any genetic element that is capable of replication when associated with the proper control elements and that can transfer gene sequences to cells. A vector can include, but is not limited to, a cloning vector, an expression vector, a plasmid, phage, transposon, cosmid, chromosome, virus, virion, etc. [00272] In some embodiments of any of the aspects, a polypeptide or nucleic acid as described herein can be engineered. As used herein, “engineered” refers to the aspect of having been manipulated by the hand of man. For example, a polypeptide is considered to be “engineered” when at least one aspect of the polypeptide, e.g., its sequence, has been manipulated by the hand of man to differ from the aspect as it exists in nature. As is common practice and is understood by those in the art, progeny of an engineered cell are also typically still referred to as “engineered” even though the actual manipulation was performed on a prior entity. [00273] As used herein, the term “derived from” refers to the aspect of a molecule, substance, polypeptide, nucleic acid, sugar, lipid, etc. as being from a parent substance (e.g., a cell or membrane) or organism (e.g., a microorganism). In the context of antigens and fragments thereof, the term “derived from” encompasses an antigen or fragment thereof that is expressed by, purified, or isolated from a microorganism as described herein. By way of example only, the antigen viroporin 3a is derived from SARS-CoV2, the virus responsible for the worldwide COVID19 pandemic. [00274] As used herein, the term “pharmaceutical composition” refers to the fusion polypeptide, composition, antigen or fragment thereof as described herein in combination with a pharmaceutically acceptable carrier, e.g., 
a carrier commonly used in the pharmaceutical industry. The phrase “pharmaceutically acceptable” is employed herein to refer to those polypeptides, antigens, nucleic acids encoding said polypeptides or antigens, compounds, materials, compositions, and/or dosage forms which are, within the scope of sound medical judgment, suitable for use in contact with the tissues of human beings and animals without excessive toxicity, irritation, allergic response, or other problem or complication, commensurate with a reasonable benefit/risk ratio. In some embodiments of any of the aspects, a pharmaceutically acceptable carrier can be a carrier other than water. In some embodiments of any of the aspects, a pharmaceutically acceptable carrier can be a cream, emulsion, gel, liposome, nanoparticle, and/or ointment. In some embodiments of any of the aspects, a pharmaceutically acceptable carrier can be an artificial or engineered carrier, e.g., a carrier that the active ingredient would not be found to occur in in nature. [00275] As used herein, the term “administering,” refers to the placement of a vaccine composition, fusion polypeptide, antigen, or antigen fragment thereof as described herein into a subject by a method or route which results in at least partial delivery of the vaccine composition at a desired site. Pharmaceutical and vaccine compositions comprising the antigens or fragments of antigens described herein can be administered by any appropriate route which results in an effective treatment in the subject. [00276] The phrases “parenteral administration” and “administered parenterally” as used herein, refer to modes of administration other than enteral and topical administration, usually by injection. 
The phrases “systemic administration,” “administered systemically”, “peripheral administration” and “administered peripherally” as used herein refer to the administration of a therapeutic composition other than directly into a target site, tissue, or organ, such as a site of infection, such that it enters the subject’s circulatory system and, thus, is subject to metabolism and other like processes. In other embodiments, the fusion polypeptide or fragment thereof is administered locally, e.g., by direct injections, when the disorder or location of the infection permits, and the injections can be repeated periodically. [00277] As used herein, the term “multiple” refers to a number of at least two, more than two, or greater than two. In the context of strains of viruses, multiple refers to two or more strains known in the art for that given virus (e.g., coronavirus). [00278] As used herein, the terms “treat,” “treatment,” “treating,” or “amelioration” refer to therapeutic treatments, wherein the object is to reverse, alleviate, ameliorate, inhibit, slow down or stop the progression or severity of a condition associated with a disease or disorder. The term “treating” includes reducing or alleviating at least one adverse effect or symptom of a condition, disease or disorder associated with an infection. Treatment is generally “effective” if one or more symptoms or clinical markers are reduced. Alternatively, treatment is “effective” if the progression of a disease is reduced or halted. That is, “treatment” includes not just the improvement of symptoms or markers, but also a cessation or at least slowing of progress or worsening of symptoms that would be expected in absence of treatment. 
Beneficial or desired clinical results include, but are not limited to, alleviation of one or more symptom(s), diminishment of extent of disease, stabilized (i.e., not worsening) state of disease, delay or slowing of disease progression, amelioration or palliation of the disease state, and remission (whether partial or total), whether detectable or undetectable. The term “treatment” of a disease also includes providing relief from the symptoms or side-effects of the disease (including palliative treatment). [00279] As used herein “preventing” or “prevention” refers to any methodology where the disease state does not occur due to the actions of the methodology (such as, for example, administration of an antigen or fragment thereof or vaccine composition as described herein). In one aspect, it is understood that prevention can also mean that the disease is not established to the extent that occurs in untreated controls. Accordingly, prevention of a disease encompasses a reduction in the likelihood that a subject can develop the disease, relative to an untreated subject (e.g. a subject who is not treated with the methods or compositions described herein). [00280] The terms “decreased”, “reduced”, “reduction”, or “inhibit” are all used herein to mean a decrease by a statistically significant amount. In some embodiments of any of the aspects, “reduce,” “reduction” or “decreased” or “inhibit” typically means a decrease by at least 10% as compared to a reference level (e.g. the absence of a given treatment or vaccine composition) and can include, for example, a decrease by at least about 10%, at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, at least about 99% , or more. 
As used herein, “reduction” or “inhibition” does not encompass a complete inhibition or reduction as compared to a reference level. “Complete inhibition” is a 100% inhibition as compared to a reference level. A decrease can be preferably down to a level accepted as within the range of normal for an individual without a given disorder. [00281] The terms “increased”, “increase”, “enhance”, or “activate” are all used herein to mean an increase by a statistically significant amount. In some embodiments of any of the aspects, the terms “increased”, “increase”, “enhance”, or “activate” can mean an increase of at least 10% as compared to a reference level, for example an increase of at least about 20%, or at least about 30%, or at least about 40%, or at least about 50%, or at least about 60%, or at least about 70%, or at least about 80%, or at least about 90% or up to and including a 100% increase or any increase between 10-100% as compared to a reference level, or at least about a 2-fold, or at least about a 3-fold, or at least about a 4-fold, or at least about a 5-fold or at least about a 10-fold increase, or any increase between 2-fold and 10-fold or greater as compared to a reference level. In the context of a marker or symptom, an “increase” is a statistically significant increase in such level. [00282] As used herein, the term “modulates” or “modulation” refers to an effect including increasing or decreasing a given parameter as those terms are defined herein. [00283] As used herein a “reference level” refers to a level of a given parameter measured or detected in a normal, otherwise unaffected, or untreated population of microorganisms (e.g., viruses or portions thereof cultured in vitro, virus or portions thereof obtained from a healthy subject, or virus or portions thereof obtained from a subject at a prior time point) or in a subject (e.g., a subject that does not have an infection). 
One of skill in the art would be able to choose the appropriate reference level for the desired experiment or test. For example, the microorganisms (e.g., virus) cultured in vitro can be cultured and isolated from host cells in a chemically defined medium with or without supplementation. [00284] Generally, a reference level refers to the level of a polypeptide, an antigen or fragment thereof, or a nucleic acid encoding an antigen or fragment thereof expressed by a microorganism (e.g., virus) which is not present in a subject (i.e., a microorganism which is not in vivo), is not genetically modified, and is grown in culture in vitro. For example, the microorganism can be a commercially available strain that was not cultured directly from a host subject, or a strain originally obtained from a host subject but cultured in vitro at the time the reference level was determined. [00285] The term “statistically significant” or “significantly” refers to statistical significance and generally means a two standard deviation (2SD) or greater difference. [00286] Other than in the operating examples, or where otherwise indicated, all numbers expressing quantities of ingredients or reaction conditions used herein should be understood as modified in all instances by the term “about.” The term “about” when used in connection with percentages can mean ±1%. [00287] As used herein, the term “comprising” means that other elements can also be present in addition to the defined elements presented. The use of “comprising” indicates inclusion rather than limitation. [00288] The term “consisting of” refers to compositions, methods, and respective components thereof as described herein, which are exclusive of any element not recited in that description of the embodiment. [00289] As used herein the term “consisting essentially of” refers to those elements required for a given embodiment. 
The term permits the presence of additional elements that do not materially affect the basic and novel or functional characteristic(s) of that embodiment of the invention. [00290] It is understood that the foregoing description and the following examples are illustrative only and are not to be taken as limitations upon the scope of the invention. Various changes and modifications to the disclosed embodiments, which will be apparent to those of skill in the art, may be made without departing from the spirit and scope of the present invention. Further, all patents, patent applications, and publications identified are expressly incorporated herein by reference for the purpose of describing and disclosing, for example, the methodologies described in such publications that might be used in connection with the present invention. These publications are provided solely for their disclosure prior to the filing date of the present application. Nothing in this regard should be construed as an admission that the inventors are not entitled to antedate such disclosure by virtue of prior invention or for any other reason. All statements as to the date or representation as to the contents of these documents are based on the information available to the applicants and do not constitute any admission as to the correctness of the dates or contents of these documents. EXAMPLE EXAMPLE 1: Protein Covariance Networks Reveal Interactions Important to the Emergence of SARS Coronaviruses as Human Pathogens [00291] SARS-CoV-2 is one of three recognized coronaviruses (CoVs) that have caused epidemics or pandemics in the 21st century and that have likely emerged from animal reservoirs based on genomic similarities to bat and civet viruses. The methods and compositions provided herein, relate in part, to the discovery of conserved interactions between amino acid residues in all proteins encoded by SARS-CoV-related viruses. 
Accordingly, pairs and networks of residue variants that exhibited statistically high frequencies of covariance with each other can be used as a new computational approach (Covariance-based Phylogeny Analysis) for understanding viral evolution and adaptation. Provided herein is evidence that the evolutionary processes that converted a bat virus into a human pathogen occurred through recombination with other viruses in combination with new adaptive mutations important for entry into human cells.
Results
[00292] The methods and compositions provided herein are related, in part, to the identification of variations in the protein sequences of SARS-CoV-2 to understand CoV evolution and the key functional interactions that drive adaptation to new hosts or that influence transmission and pathogenicity.
[00293] The inventors selected the conserved CoV proteins called 1a/1b, Spike (S), 3a, E, M, and N from a set of 847 viral genomes provided herein (TABLE 2, above). The alignment resulted in a 9639 amino acid consensus sequence with only 2% of sites being gaps with low coverage and 2% with low sequence conservation. Because there are regions of diverged nucleotide identity in these viral genomes, the goal was to use amino acid identity to initially estimate phylogeny and then integrate that analysis with the identification of covariant residues within these six core viral proteins. Both SARS-CoV and SARS-CoV-2 are represented by a large collection of independent isolates from previous and current epidemics as well as from variants selected during passage in various laboratories (TABLE 2, above) (Elbe and Buckland-Merrett, 2017). Similar to nucleic acid sequence-based analyses, our constructed phylogeny (FIG. 3) shows that SARS-CoV is closely related to CoVs found in civets and groups of bat CoVs that were previously suggested as likely ancestors (Li, 2005; Song et al., 2005).
In contrast, SARS-CoV-2 is a close relative of bat CoV RATG13 and is also more similar to two bat CoVs, SZXC21 and SZC45, than to other bat CoVs (Paraskevis et al., 2020). Five recently isolated pangolin CoVs identified as closely related to SARS-CoV-2 are also related to these three bat CoVs (Zhang et al., 2020), and our protein-based analysis supports this conclusion. Thus, amino acid sequence-based analysis supports the conclusion that both bat and pangolin CoVs share or represent probable common ancestors of SARS-CoV-2 (Xia, 2020).
[00294] To survey the frequency of covariance among CoVs, we extracted pairs and groups (here designated ‘Clusters’) of covarying amino acid residues and organized these using a correlating tandem model that was set at different purity thresholds between 0.8 and 1.0. A stringent purity threshold (0.96) was selected to reduce noise based on our sampling size. Then force-directed mapping algorithms were applied to visualize the relationships between these clusters of covarying residues and the CoV isolates from which they were derived (Jacomy et al., 2014). This graphing technique simulates repulsion between nodes (residues, clusters, and genomes) as well as attraction along edges (links between nodes) and then plots them to both minimize complexity (fewer crossed edges) and reduce edge lengths in two dimensions. Nodes are force-directed according to hierarchy, which orients them based on sequential linkage (edges) to clusters and then residues. Remarkably, we find that force-directed mapping of the linkage of covariance clusters readily organized all CoV isolates into groups that were consistent with phylogenetic analysis (FIG. 1 and FIG. 3). Thus, related genomes are forced into groupings in the middle of the graph surrounded by related clusters that are then flanked by networks of covarying residues.
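By way of illustration only, the pairwise screen described in paragraph [00294] can be sketched in Python. The sketch is not FastCov and does not reproduce its purity score; as a plainly labeled stand-in, it scores a pair of residue identities by the overlap (intersection over union) of the sets of sequences that carry them and keeps pairs at or above a cutoff. The function name covarying_pairs, the min_support parameter, and the toy alignment are hypothetical and are not taken from the analysis described herein.

```python
from collections import defaultdict
from itertools import combinations


def covarying_pairs(alignment, threshold=0.96, min_support=2):
    """Return ((col, residue), (col, residue), score) for co-occurring residues.

    The score is an illustrative stand-in (intersection over union of the
    sequences carrying each residue identity), NOT FastCov's purity score.
    """
    n_seqs = len(alignment)
    n_cols = len(alignment[0])
    if any(len(seq) != n_cols for seq in alignment):
        raise ValueError("all sequences must share the same aligned length")

    # carriers[(column, residue)] = indices of sequences with that residue
    carriers = defaultdict(set)
    for idx, seq in enumerate(alignment):
        for col, residue in enumerate(seq):
            if residue != "-":  # ignore alignment gaps
                carriers[(col, residue)].add(idx)

    # Skip invariant identities and identities seen in too few sequences.
    variable = sorted(
        key for key, seqs in carriers.items()
        if min_support <= len(seqs) < n_seqs
    )

    hits = []
    for first, second in combinations(variable, 2):
        if first[0] == second[0]:
            continue  # same column: alternatives, not a covarying pair
        shared = carriers[first] & carriers[second]
        union = carriers[first] | carriers[second]
        score = len(shared) / len(union)
        if score >= threshold:
            hits.append((first, second, score))
    return hits


# Toy alignment (hypothetical): columns 1 and 2 change together.
toy = ["MKAL", "MKAL", "MRGL", "MRGL", "MKAL"]
for first, second, score in covarying_pairs(toy):
    print(first, second, score)
```

In this toy case the K/A and R/G identities at columns 1 and 2 always travel together and are reported, while the invariant columns are ignored. Lowering the threshold toward 0.8 would admit noisier pairs, which is the trade-off that the stringent 0.96 cutoff is intended to avoid.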
Furthermore, we provide every cluster and all residue identities by gene, genome, and host in a table and Gephi force map file as an interactive website available on the world-wide web at https <sarscov2-9d60e.web.app/>. The collective approach to understanding viral evolution is referred to herein as ‘CoVariance based Phylogeny Analysis (CoVPA)’ in that it provides novel insights into the evolutionary origins of viral genomes.
[00295] Some covariant clusters are ubiquitous to the entire grouping while others are exclusive to only one strain. We binned these clusters to compare covarying residues that are restricted to isolates from a given host species or a given group of CoVs. By comparing networks of clusters of covarying residues between distinct groupings, we could identify clusters that are restricted to various combinations of bat, civet, pangolin, and human CoV isolates. Thus, this annotated comprehensive dataset identifies all distinct residue identities that strongly covary in each CoV protein and the distribution of these in CoVs that include the human pathogens SARS-CoV and SARS-CoV-2. In its entirety, this provides a novel dataset that can be accessed through an interactive website available on the world-wide web at https <sarscov2-9d60e.web.app/>, which allows one to explore the relatedness of CoV isolates based on conserved amino acid interactions that contribute to structure, protein-protein interactions, or biological function that multiple proteins drive interactively.
[00296] Both the civet and bat CoVs were previously identified as the closest relatives of SARS-CoV (Hu et al., 2017; Song et al., 2005) and these are positioned immediately adjacent to SARS-CoV in our force-directed network. There are a few residues that are different between civet CoVs (Cluster 26) and SARS-CoV (Cluster 65), but 103 residues are uniquely shared by both (Cluster 182).
This supports previous findings that few genomic changes occurred between civet and the human-adapted SARS-CoV and that the majority of these polymorphisms likely arose during the initial transition from bats to civets (Song et al., 2005). We conclude that these three clusters represent residue identities that separate closely related bat CoVs from those enriched by passage and adaptation through other host species. The majority of these residues are in 1a/1b polyproteins, S, and 3a while covariance in E, M, and N is virtually absent.
[00297] Analogous to the civet and SARS-CoV relationship, three bat CoVs including RATG13 and five pangolin CoVs are all closely related to SARS-CoV-2 and thus positioned in proximity in the network mapping because these share a significant number of covariant residues and clusters. In total, 10 clusters contain 1231 uniquely covarying residues specific to SARS-CoV-2 and the three most closely related bat CoVs. Covariant residues are highly represented in gene 1a encoding non-structural proteins (nsp1-nsp4), as well as the Spike protein and viroporin 3a (TABLE 3 and FIG. 2A, 2C, 4A, 13B). It was contemplated herein that these enriched covariant residues include those that facilitated transmissibility into humans. Covariant residues are least represented in the RNA replication proteins, viroporin E, and membrane protein M. No clusters were found to be exclusive to all SARS-CoV-2 strains apart from those emerging in clinical isolates (Clusters 5 & 23); instead, common overlapping clusters of covariant residues were found. For example, SARS-CoV-2 covariant residues are found to overlap with RATG13 (Cluster 183), or with RATG13 and pangolin CoVs (Cluster 194), or with the three bat viruses RATG13/COVCZ45/COVXC21 (Cluster 190), or with these three bat CoVs with pangolin CoVs (Cluster 197).
As these CoVs share significant nucleotide similarity with one another, this is not surprising; however, our covariance analysis also reveals key diverging features for the SARS-CoV-2 group that likely have biological significance. For example, there are covariant residues in the SARS-CoV-2 Spike (S protein) that reside in its receptor-binding domain (RBD), its amino-terminal domain (NTD), and in a structural region proximal to the furin cleavage site in its C-terminal domain (CTD). These SARS-CoV-2 covariant residues are likely key to the adaptation of this virus to new host species (see below).
[00298] Next, the inventors determined the covarying residues that allowed SARS-CoV-2 to emerge as a human pathogen by passage through intermediate hosts. To demonstrate the results of this analysis, the stepwise progression and location of SARS-CoV-2 covariant clusters that overlap with those found in 48 bat CoVs and its related bat isolate, RATG13, are shown (FIG. 2A, 2B, TABLE 2, above). All covariant residue positions across multiple groups of CoV genomes were plotted to map their locations at the gene product level. The largest density of covarying residues in this progression to SARS-CoV-2 maps to both the NTD and RBD subdomains of S protein and the 3a viroporin protein. In conclusion, enriched covariance networks between S and 3a allowed certain CoV isolates to adapt to growth in intermediary hosts and ultimately humans.
[00299] The role of covariant residues distinctly enriched in a specific host may be important in the context of adaptation and infection. To separate such ‘restricted’ variants that are unique to CoVs from different species, we binned groups of covariants and then compared their mapped locations on the genome (FIG. 2B). Those classified as ‘restricted’ are found in clusters linked to specific groups and absent in other groups.
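As an illustration of the binning just described, a cluster can be treated as ‘restricted’ when every isolate linked to it comes from a single host group. The following Python sketch assumes a hypothetical mapping from cluster identifiers to the host groups of their linked isolates; the identifiers and host labels are placeholders and are not values taken from TABLE 2 or the interactive dataset.

```python
def restricted_clusters(cluster_hosts):
    """Group clusters by the single host group they are restricted to.

    cluster_hosts maps a cluster identifier to the set of host groups whose
    isolates are linked to that cluster. Clusters linked to more than one
    host group are shared, not restricted, and are left out of the result.
    """
    bins = {}
    for cluster, hosts in cluster_hosts.items():
        if len(hosts) == 1:
            (host,) = hosts  # unpack the only member of the set
            bins.setdefault(host, []).append(cluster)
    return bins


# Hypothetical example data for illustration only.
example = {
    "cluster_a": {"bat"},
    "cluster_b": {"bat", "human"},
    "cluster_c": {"pangolin"},
    "cluster_d": {"bat"},
}
print(restricted_clusters(example))
```

Here cluster_b is excluded because it is linked to both bat and human isolates, which corresponds to a shared rather than a restricted cluster.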
Surprisingly, both bat- and pangolin-restricted covariants are represented in distinct regions of the genome; for example, bat-restricted residues are primarily located in non-structural proteins encoded in 1b (2’O-Mtase, NendoU, ExoN) and enriched in 3a and N but not S or M. Clearly, SARS-CoV-2 covariant residues belong to clusters linked to bat CoVs, but residues were also found that are linked to human CoVs and are absent in other restricted categories. These covariant residues are most highly correlated with the pandemic emergence of SARS-CoV-2 (see below).
[00300] A large section in gene 1a (~3000 AA, nsp1-nsp4) is unique to SARS-CoV-2 and distinct from the rest of the genome based on our covariance analysis. The context of this difference would not be readily apparent using standard nucleic acid-based single nucleotide polymorphism analysis. The identity and network of covariant residues in this region are unique and do not match those found in bat viruses, including those most similar such as RATG13, suggesting that SARS-CoV-2 experienced different selective pressures and evolutionary history (FIG. 2A and 2B). The 3’-boundary of this region is aligned to a recombinational hotspot (Tagliamonte et al., 2020) and this suggests that hybridization of different CoV isolates likely led to the emergence of SARS-CoV-2. Though this region of SARS-CoV-2 is most similar to RATG13, the divergence of covariant residues between bat and pangolin CoVs here provides strong evidence that SARS-CoV-2 emerged from an independent distinct source (FIG. 2C).
[00301] In contrast to host-restricted covariant clusters, some residues are conserved as covarying residues in other CoVs. For example, the Enriched Conserved A group contains residues that align in more than one independent category (FIG. 2A and 2C); these enriched covariant groups include residues that are part of large networks of covarying residues (for example, in the nsp1-nsp4 region).
This particular group was also found to be densely enriched in the NTD and RBD of Spike, in much of 3a, and in both the CTD and NTD domains of N. The networks of covariant residues in the S protein may provide plasticity in protein domains important for receptor recognition and possibly escape from antibody responses, given that the same regions are rich in dominant neutralizing epitopes (He et al., 2005; Qiu et al., 2005).
[00302] The distribution and roles of the SARS-CoV-2 residues were determined using the consensus sequences and by mapping these to domains in the S protein using the structural information present in the recently solved SARS-CoV-2 S trimer PDB file (Walls et al., 2020). In line with bat CoVs, covariant residues in the S protein of SARS-CoV-2 are densely represented in both its NTD (residues 1-330) and its RBD (residues 333-527) (TABLE 3). This includes residues between 438 and 506 that are part of the RBD interface with the ACE-2 receptor and also those between 333 and 526 that are thought to help stabilize the β-sheet and short connecting helices and loops that form the core of the RBD (Lan et al., 2020). Between 438 and 510, 7 of the 16 residues that were shown to interact directly with ACE-2 (Lan et al., 2020) are indeed covariant (FIGS. 4A-4D). As covariance analysis detects residues that change with others, nearly all (6 of 7) are residues that differ between SARS-CoV and SARS-CoV-2. When aligned to known SARS-CoV RBD contacts with 80R, a potent neutralizing monoclonal antibody that has been co-crystallized with the S trimer (Hwang et al., 2006), 13 of 19 residues contacted by 80R are covariants that differ between SARS-CoV and SARS-CoV-2 (Lan et al., 2020). Notably, though much of S is highly conserved in structure and many CoVs share ACE-2 as a common receptor, it has been suggested that residues in the RBD-ACE2 interface between SARS-CoV and SARS-CoV-2 are distinct enough to have arisen by convergent evolution (Lan et al., 2020).
Our analysis suggests that covariant residues in the NTD likely influenced the receptor binding specificity of the S protein and perhaps also its ability to escape neutralizing antibodies.
[00303] To examine how the observed covariance compares to enriched mutations in the current epidemic, 13,611 clinical strains sequenced since the emergence of the human epidemic were examined from the GISAID database (Elbe and Buckland-Merrett, 2017). Covariant residue pairs most highly represented in clinical isolates were identified and were discovered to form small networks (FIG. 5A). These networks include some of the most abundant polymorphisms previously identified, such as D614G in the Spike protein and L336P in the RdRp protein (Pachetti et al., 2020). SARS-CoV-2 Spike residues 604, 606, 607, 619, and 622 within the region of this D614 residue are predicted to be covariant in our 847 strains with a number of residues in the viroporin 3a ectodomain as well as other residues in the S protein located between its RBD and furin cleavage site (FIG. 5B). The 3a and E viroporins of SARS-CoV are thought to be K+ and Ca++ efflux channels and it is interesting that furin is activated by these cations (Izidoro et al., 2010; Molloy et al., 1992). Furin cleavage of the S protein is strongly implicated in its activation for membrane fusion events (Coutard et al., 2020). In this regard, it was determined that Spike D614 maps near the furin cleavage site in the Spike trimer structure, suggesting that 3a may be interacting with S during the process that leads to S cleavage and activation of membrane fusion. The 3a and S proteins have been found to form a complex but the nature of the interaction is unknown at the structural level (Shen et al., 2005; Tan, 2005; Tan et al., 2004).
Curiously, these same residues in the region near D614 are found to also covary as pairs with residues in the NTD of the S protein, suggesting there is a relationship between the 3a protein, S domains proximal to the furin cleavage site, and the S NTD. When the analysis is expanded to map patterns of covarying residues in a broad sampling of betacoronaviruses, an expanded putative evolutionary relationship suggesting interactions between residues 280-325 in the S NTD, residues 657-794 in the S CTD, and residues in 3a becomes apparent (FIG. 13B, FIG. 13C). The majority of clustered residues in 3a map to the NTD ectodomain and within a predicted intracellular loop. A single residue (3a 117) is located in the cytoplasmic domain near the membrane (FIG. 13A).
[00304] The observed covariance in the 3a viroporin was mapped to predicted functional domains of this protein (FIG. 5B, FIG. 6A). The 3a protein of SARS-CoV is predicted to assemble as both a homodimer and a tetramer that form a membrane channel (viroporin) in the cell and possibly the viral particle membrane. The subdomains vary in conservation and it was noted that covariance is in distinct separate domains of 3a. There are covariant residues within a more variable NTD region (the first 30 residues of 3a), which is predicted to be surface-exposed, as well as in other predicted extracellular and cytoplasmic loops (FIG. 6A). Antibodies to 3a and its amino-terminal ectodomain are commonly found in convalescent-phase SARS-CoV patients (Qiu et al., 2005; Zhong, 2006) and the covariant residues that map here overlap with a hypervariable region predicted to be more antigenic (FIG. 6A-6D). A cluster of covarying residues in the CTD between 168 and 189 was also identified; there is no known role for this subdomain, though it partially overlaps with residues 160 to 173, which contain putative intracellular protein sorting and trafficking motifs (Huang et al., 2006; Tan et al., 2004).
In addition to a role in intracellular vesicle formation through interactions with Caveolin-1, the 3a viroporin protein of SARS-CoV is known to cause activation of the NLRP3 inflammasome, NF-κB activation, and chemokine induction as well as efficient viral release from infected cells (Castaño-Rodriguez et al., 2018; Chen et al., 2019; Freundt et al., 2010; Lu et al., 2006; Padhan et al., 2007). Thus, identifying 3a as a hotspot for covariance and mutational selection may have relevance to the severe inflammation seen in COVID-19 disease and suggests that the 3a protein is a potential new target to consider for SARS-CoV-2 immunoprophylaxis.
[00305] Provided herein are covariance networks with numerous residues in SARS-CoV-2. In addition, there are a few small clusters that can reveal host adaptation, such as those that differentiate civet CoVs and SARS-CoV. For example, Cluster 62 identifies 5 residues that emerged in SARS-CoV during its further passage and adaptation in the mouse model. In the context of human CoV diseases, the inventors identified one covariance network of eight residues in the S protein (Cluster 126) found only in SARS-CoV and SARS-CoV-2 and the most closely related civet, pangolin, and bat CoVs (FIG. 7A-7B). Though some of these residues are missing from the available models of the Spike trimers from both SARS-CoV and SARS-CoV-2, residues Q23, H66, and V214 are proximal to one another when comparing the tertiary NTD structures of both (FIG. 7B-7C). The relative position of two other covariant residues, L54 and M153, in the NTD is difficult to determine in that they reside within disordered and exposed regions in the cryo-EM S trimer reconstructions of SARS-CoV and SARS-CoV-2. There are no known interactions or roles for these protruding NTD regions of the S trimers. Because the NTD of the S protein is especially enriched in covariant residues, host- or immune-driven selective pressure has likely occurred within this subdomain.
In combination, covariant residues in S protein appear to have emerged independently and are enriched in viruses that caused two independent pandemics, suggesting they are particularly important to human infection.
[00306] Previously, covariant network analysis applied to other RNA and DNA viruses demonstrated that observed covariance in viruses is often not an artifact of chance and is likely influenced by imposed pressure, structure, and selection, including those provided by therapeutics (Aurora et al., 2009; Donlin et al., 2012; Sruthi and Prakash, 2019). Coevolving residues for large orthologous groups of proteins have also been useful in predicting structure (de Juan et al., 2013) and protein-protein interactions (Kamisetty et al., 2013) that are targeted for drug interference. In the present study, networks of co-varying amino acid residues were identified in proteins encoded by nearly 13,600 CoV isolates (FIG. 8). Because these interactions are critical to the CoV family of viruses based on their conservation in the evolution and adaptation of these viruses to humans, it is contemplated that they can define novel antigen and drug targets for CoV-related viruses. These targets are uniquely valuable because they are likely under strong selective pressure at two or more amino acid sites in one or more proteins; this property will make it less likely for a virus to mutate to escape immune responses or anti-viral drugs directed at protein sites enriched in such covariant-related targets. Finally, these data also provide evidence for new adaptive recombination events and mutations that likely drove the emergence of SARS-CoV-related viruses into the human host. These insights inform preparedness through therapeutic and vaccine development, as well as public health policy.
Methods
[00307] Genome and protein sequences were sourced from available NCBI and GISAID public databases.
Protein sequences for genes 1a, 1b, Spike, 3a, E, M, and N were concatenated and aligned using CLC Bio Workbench (v8) for the 847 genomes (gap open cost of 2.0 and gap extension cost of 1.0, in very accurate mode) and using MAFFT for the clinical SARS-CoV-2 strains (default FFT-NS-2 setting) (Nakamura et al., 2018). Only the 13,611 clinical strains with significant contiguous coverage over the reference genes (>95%) were kept in the alignment. For Maximum Likelihood Phylogeny, we applied a WAG protein substitution model to our alignment and performed bootstrap analysis using 1000 replicates.
[00308] Pairwise and multiple-residue covariance and scores were predicted using FastCov (Shen and Li, 2016). We used the calculated purity score for stringency cutoffs, but for clarity a raw table of predicted covariants is provided as Table 2. This allowed binning of clusters and respective strains for Force Mapping in Gephi and comparison of covariant residues based on clusters and strains (Jacomy et al., 2014). All clusters, strains, and residues were tabulated.
[00309] Clusters and residues were mapped in Gephi using the MultiGravity ForceAtlas 2 algorithm and this file is provided as an interactive website available on the world-wide web at sarscov2-9d60e.web.app.
[00310] Residues in Spike were mapped onto the PDB structures for Spike (6VXX.pdb and 6ACC.pdb) using PyMol (v.2.3.4) (Song et al., 2018; Walls et al., 2020). Arpeggio was used to calculate interacting residues in the PDB file (Jubb et al., 2017). Circular graphing of key collections of residues was done using Circos (Krzywinski et al., 2009).
Bibliography:
1) Al-Omari, A., Rabaan, A.A., Salih, S., Al-Tawfiq, J.A., Memish, Z.A., 2019. MERS coronavirus outbreak: Implications for emerging viral infections. Diagn. Microbiol. Infect. Dis. 93, 265–285. https://doi.org/10.1016/j.diagmicrobio.2018.10.011
2) Aurora, R., Donlin, M.J., Cannon, N.A., Tavis, J.E., Group, the V.-C.S., 2009.
Genome-wide hepatitis C virus amino acid covariance networks can predict response to antiviral therapy in humans. J. Clin. Invest. 119, 225. https://doi.org/10.1172/JCI37085
3) Castaño-Rodriguez, C., Honrubia, J.M., Gutiérrez-Álvarez, J., DeDiego, M.L., Nieto-Torres, J.L., Jimenez-Guardeño, J.M., Regla-Nava, J.A., Fernandez-Delgado, R., Verdia-Báguena, C., Queralt-Martín, M., Kochan, G., Perlman, S., Aguilella, V.M., Sola, I., Enjuanes, L., 2018. Role of Severe Acute Respiratory Syndrome Coronavirus Viroporins E, 3a, and 8a in Replication and Pathogenesis. mBio 9, e02325-17. https://doi.org/10.1128/mBio.02325-17
4) Chen, I.-Y., Moriyama, M., Chang, M.-F., Ichinohe, T., 2019. Severe Acute Respiratory Syndrome Coronavirus Viroporin 3a Activates the NLRP3 Inflammasome. Front. Microbiol. 10. https://doi.org/10.3389/fmicb.2019.00050
5) Corman, V.M., Ithete, N.L., Richards, L.R., Schoeman, M.C., Preiser, W., Drosten, C., Drexler, J.F., 2014. Rooting the Phylogenetic Tree of Middle East Respiratory Syndrome Coronavirus by Characterization of a Conspecific Virus from an African Bat. J. Virol. 88, 11297. https://doi.org/10.1128/JVI.01498-14
6) Coutard, B., Valle, C., de Lamballerie, X., Canard, B., Seidah, N.G., Decroly, E., 2020. The spike glycoprotein of the new coronavirus 2019-nCoV contains a furin-like cleavage site absent in CoV of the same clade. Antiviral Res. 176, 104742. https://doi.org/10.1016/j.antiviral.2020.104742
7) de Juan, D., Pazos, F., Valencia, A., 2013. Emerging methods in protein co-evolution. Nat. Rev. Genet. 14, 249–261. https://doi.org/10.1038/nrg3414
8) Donlin, M.J., Szeto, B., Gohara, D.W., Aurora, R., Tavis, J.E., 2012. Genome-Wide Networks of Amino Acid Covariances Are Common among Viruses. J. Virol. 86, 3050–3063. https://doi.org/10.1128/JVI.06857-11
9) Elbe, S., Buckland-Merrett, G., 2017. Data, disease and diplomacy: GISAID’s innovative contribution to global health. Glob. Chall. Hoboken NJ 1, 33–46.
https://doi.org/10.1002/gch2.1018
10) Freundt, E.C., Yu, L., Goldsmith, C.S., Welsh, S., Cheng, A., Yount, B., Liu, W., Frieman, M.B., Buchholz, U.J., Screaton, G.R., Lippincott-Schwartz, J., Zaki, S.R., Xu, X.-N., Baric, R.S., Subbarao, K., Lenardo, M.J., 2010. The Open Reading Frame 3a Protein of Severe Acute Respiratory Syndrome-Associated Coronavirus Promotes Membrane Rearrangement and Cell Death. J. Virol. 84, 1097. https://doi.org/10.1128/JVI.01662-09
11) Graham, R.L., Baric, R.S., 2010. Recombination, Reservoirs, and the Modular Spike: Mechanisms of Coronavirus Cross-Species Transmission. J. Virol. 84, 3134. https://doi.org/10.1128/JVI.01394-09
12) He, Y., Lu, H., Siddiqui, P., Zhou, Y., Jiang, S., 2005. Receptor-Binding Domain of Severe Acute Respiratory Syndrome Coronavirus Spike Protein Contains Multiple Conformation-Dependent Epitopes that Induce Highly Potent Neutralizing Antibodies. J. Immunol. 174, 4908. https://doi.org/10.4049/jimmunol.174.8.4908
13) Hoffmann, M., Kleine-Weber, H., Schroeder, S., Krüger, N., Herrler, T., Erichsen, S., Schiergens, T.S., Herrler, G., Wu, N.-H., Nitsche, A., Müller, M.A., Drosten, C., Pöhlmann, S., 2020. SARS-CoV-2 Cell Entry Depends on ACE2 and TMPRSS2 and Is Blocked by a Clinically Proven Protease Inhibitor. Cell 181, 271-280.e8. https://doi.org/10.1016/j.cell.2020.02.052
14) Hu, B., Zeng, L.-P., Yang, X.-L., Ge, X.-Y., Zhang, W., Li, B., Xie, J.-Z., Shen, X.-R., Zhang, Y.-Z., Wang, N., Luo, D.-S., Zheng, X.-S., Wang, M.-N., Daszak, P., Wang, L.-F., Cui, J., Shi, Z.-L., 2017. Discovery of a rich gene pool of bat SARS-related coronaviruses provides new insights into the origin of SARS coronavirus. PLOS Pathog. 13, e1006698. https://doi.org/10.1371/journal.ppat.1006698
15) Huang, C., Narayanan, K., Ito, N., Peters, C.J., Makino, S., 2006. Severe Acute Respiratory Syndrome Coronavirus 3a Protein Is Released in Membranous Structures from 3a Protein-Expressing Cells and Infected Cells. J. Virol. 80, 210.
https://doi.org/10.1128/JVI.80.1.210-217.2006
16) Hwang, W.C., Lin, Y., Santelli, E., Sui, J., Jaroszewski, L., Stec, B., Farzan, M., Marasco, W.A., Liddington, R.C., 2006. Structural Basis of Neutralization by a Human Anti-severe Acute Respiratory Syndrome Spike Protein Antibody, 80R. J. Biol. Chem. 281, 34610–34616. https://doi.org/10.1074/jbc.M603275200
17) Izidoro, M.A., Assis, D.M., Oliveira, V., Santos, J.A.N., Juliano, M.A., Lindberg, I., Juliano, L., 2010. Effects of magnesium ions on recombinant human furin: selective activation of hydrolytic activity upon substrates derived from virus envelope glycoprotein. Biol. Chem. 391, 1105–1112. https://doi.org/10.1515/bc.2010.114
18) Jacomy, M., Venturini, T., Heymann, S., Bastian, M., 2014. ForceAtlas2, a Continuous Graph Layout Algorithm for Handy Network Visualization Designed for the Gephi Software. PLoS ONE 9, e98679. https://doi.org/10.1371/journal.pone.0098679
19) Jubb, H.C., Higueruelo, A.P., Ochoa-Montaño, B., Pitt, W.R., Ascher, D.B., Blundell, T.L., 2017. Arpeggio: A Web Server for Calculating and Visualising Interatomic Interactions in Protein Structures. Comput. Resour. Mol. Biol. 429, 365–371. https://doi.org/10.1016/j.jmb.2016.12.004
20) Kamisetty, H., Ovchinnikov, S., Baker, D., 2013. Assessing the utility of coevolution-based residue–residue contact predictions in a sequence- and structure-rich era. Proc. Natl. Acad. Sci. 110, 15674. https://doi.org/10.1073/pnas.1314045110
21) Krzywinski, M.I., Schein, J.E., Birol, I., Connors, J., Gascoyne, R., Horsman, D., Jones, S.J., Marra, M.A., 2009. Circos: An information aesthetic for comparative genomics. Genome Res. https://doi.org/10.1101/gr.092759.109
22) Lan, J., Ge, J., Yu, J., Shan, S., Zhou, H., Fan, S., Zhang, Q., Shi, X., Wang, Q., Zhang, L., Wang, X., 2020. Structure of the SARS-CoV-2 spike receptor-binding domain bound to the ACE2 receptor. Nature. https://doi.org/10.1038/s41586-020-2180-5
23) Li, W., 2005.
Bats Are Natural Reservoirs of SARS-Like Coronaviruses. Science 310, 676–679. https://doi.org/10.1126/science.1118391
24) Lu, G., Wang, Q., Gao, G.F., 2015. Bat-to-human: spike features determining ‘host jump’ of coronaviruses SARS-CoV, MERS-CoV, and beyond. Trends Microbiol. 23, 468–478. https://doi.org/10.1016/j.tim.2015.06.003
25) Lu, W., Zheng, B.-J., Xu, K., Schwarz, W., Du, L., Wong, C.K.L., Chen, J., Duan, S., Deubel, V., Sun, B., 2006. Severe acute respiratory syndrome-associated coronavirus 3a protein forms an ion channel and modulates virus release. Proc. Natl. Acad. Sci. 103, 12540. https://doi.org/10.1073/pnas.0605402103
26) Molloy, S.S., Bresnahan, P.A., Leppla, S.H., Klimpel, K.R., Thomas, G., 1992. Human furin is a calcium-dependent serine endoprotease that recognizes the sequence Arg-X-X-Arg and efficiently cleaves anthrax toxin protective antigen. J. Biol. Chem. 267, 16396–16402.
27) Nakamura, T., Yamada, K.D., Tomii, K., Katoh, K., 2018. Parallelization of MAFFT for large-scale multiple sequence alignments. Bioinformatics 34, 2490–2492. https://doi.org/10.1093/bioinformatics/bty121
28) Pachetti, M., Marini, B., Benedetti, F., Giudici, F., Mauro, E., Storici, P., Masciovecchio, C., Angeletti, S., Ciccozzi, M., Gallo, R.C., Zella, D., Ippodrino, R., 2020. Emerging SARS-CoV-2 mutation hot spots include a novel RNA-dependent-RNA polymerase variant. J. Transl. Med. 18, 179. https://doi.org/10.1186/s12967-020-02344-6
29) Padhan, K., Tanwar, C., Hussain, A., Hui, P.Y., Lee, M.Y., Cheung, C.Y., Peiris, J.S.M., Jameel, S., 2007. Severe acute respiratory syndrome coronavirus Orf3a protein interacts with caveolin. J. Gen. Virol. 88, 3067–3077. https://doi.org/10.1099/vir.0.82856-0
30) Paraskevis, D., Kostaki, E.G., Magiorkinis, G., Panayiotakopoulos, G., Sourvinos, G., Tsiodras, S., 2020. Full-genome evolutionary analysis of the novel corona virus (2019-nCoV) rejects the hypothesis of emergence as a result of a recent recombination event. Infect. Genet.
Evol. 79, 104212. https://doi.org/10.1016/j.meegid.2020.104212
31) Qiu, M., Shi, Y., Guo, Z., Chen, Z., He, R., Chen, R., Zhou, D., Dai, E., Wang, X., Si, B., Song, Y., Li, J., Yang, L., Wang, Jin, Wang, H., Pang, X., Zhai, J., Du, Z., Liu, Y., Zhang, Y., Li, L., Wang, Jian, Sun, B., Yang, R., 2005. Antibody responses to individual proteins of SARS coronavirus and their neutralization activities. Microbes Infect. 7, 882–889. https://doi.org/10.1016/j.micinf.2005.02.006
32) Shen, S., Lin, P.-S., Chao, Y.-C., Zhang, A., Yang, X., Lim, S.G., Hong, W., Tan, Y.-J., 2005. The severe acute respiratory syndrome coronavirus 3a is a novel structural protein. Biochem. Biophys. Res. Commun. 330, 286. https://doi.org/10.1016/j.bbrc.2005.02.153
33) Simmons, G., Gosalia, D.N., Rennekamp, A.J., Reeves, J.D., Diamond, S.L., Bates, P., 2005. Inhibitors of cathepsin L prevent severe acute respiratory syndrome coronavirus entry. Proc. Natl. Acad. Sci. U. S. A. 102, 11876. https://doi.org/10.1073/pnas.0505577102
34) Song, H.-D., Tu, C.-C., Zhang, G.-W., Wang, S.-Y., Zheng, K., Lei, L.-C., Chen, Q.-X., Gao, Y.-W., Zhou, H.-Q., Xiang, H., Zheng, H.-J., Chern, S.-W.W., Cheng, F., Pan, C.-M., Xuan, H., Chen, S.-J., Luo, H.-M., Zhou, D.-H., Liu, Y.-F., He, J.-F., Qin, P.-Z., Li, L.-H., Ren, Y.-Q., Liang, W.-J., Yu, Y.-D., Anderson, L., Wang, M., Xu, R.-H., Wu, X.-W., Zheng, H.-Y., Chen, J.-D., Liang, G., Gao, Y., Liao, M., Fang, L., Jiang, L.-Y., Li, H., Chen, F., Di, B., He, L.-J., Lin, J.-Y., Tong, S., Kong, X., Du, L., Hao, P., Tang, H., Bernini, A., Yu, X.-J., Spiga, O., Guo, Z.-M., Pan, H.-Y., He, W.-Z., Manuguerra, J.-C., Fontanet, A., Danchin, A., Niccolai, N., Li, Y.-X., Wu, C.-I., Zhao, G.-P., 2005. Cross-host evolution of severe acute respiratory syndrome coronavirus in palm civet and human. Proc. Natl. Acad. Sci. 102, 2430–2435. https://doi.org/10.1073/pnas.0409608102
35) Song, W., Gui, M., Wang, X., Xiang, Y., 2018.
Cryo-EM structure of the SARS coronavirus spike glycoprotein in complex with its host cell receptor ACE2. PLOS Pathog. 14, e1007236. https://doi.org/10.1371/journal.ppat.1007236 36) Sruthi, C.K., Prakash, M.K., 2019. Statistical characteristics of amino acid covariance as possible descriptors of viral genomic complexity. Sci. Rep. 9, 18410. https://doi.org/10.1038/s41598-019-54720-y 37) Tagliamonte, M.S., Abid, N., Chillemi, G., Salemi, M., Mavian, C., 2020. Re-insights into origin and adaptation of SARS-CoV-2. bioRxiv 2020.03.30.015685. https://doi.org/10.1101/2020.03.30.015685 38) Tan, Y.-J., 2005. The Severe Acute Respiratory Syndrome (SARS)-coronavirus 3a protein may function as a modulator of the trafficking properties of the spike protein. Virol. J. 2, 5–5. https://doi.org/10.1186/1743-422X-2-5 39) Tan, Y.-J., Teng, E., Shen, S., Tan, T.H.P., Goh, P.-Y., Fielding, B.C., Ooi, E.-E., Tan, H.- C., Lim, S.G., Hong, W., 2004. A Novel Severe Acute Respiratory Syndrome Coronavirus Protein, U274, Is Transported to the Cell Surface and Undergoes Endocytosis. J. Virol.78, 6723–6734. https://doi.org/10.1128/JVI.78.13.6723-6734.2004 40) Walls, A.C., Park, Y.-J., Tortorici, M.A., Wall, A., McGuire, A.T., Veesler, D., 2020. Structure, Function, and Antigenicity of the SARS-CoV-2 Spike Glycoprotein. Cell 181, 281-292.e6. https://doi.org/10.1016/j.cell.2020.02.058 41) Wu, F., Zhao, S., Yu, B., Chen, Y.-M., Wang, W., Song, Z.-G., Hu, Y., Tao, Z.-W., Tian, J.-H., Pei, Y.-Y., Yuan, M.-L., Zhang, Y.-L., Dai, F.-H., Liu, Y., Wang, Q.-M., Zheng, J.- J., Xu, L., Holmes, E.C., Zhang, Y.-Z., 2020. A new coronavirus associated with human respiratory disease in China. Nature 579, 265–269. https://doi.org/10.1038/s41586-020- 2008-3 42) Wu, K., Peng, G., Wilken, M., Geraghty, R.J., Li, F., 2012. Mechanisms of Host Receptor Adaptation by Severe Acute Respiratory Syndrome Coronavirus. J. Biol. Chem.287, 8904. https://doi.org/10.1074/jbc.M111.325803 43) Xia, X., 2020. 
Extreme Genomic CpG Deficiency in SARS-CoV-2 and Evasion of Host Antiviral Defense. Mol. Biol. Evol. https://doi.org/10.1093/molbev/msaa094 44) Zhang, T., Wu, Q., Zhang, Z., 2020. Probable Pangolin Origin of SARS-CoV-2 Associated with the COVID-19 Outbreak. Curr. Biol. 30, 1346-1351.e2. https://doi.org/10.1016/j.cub.2020.03.022 45) Zhong, X., 2006. Amino terminus of the SARS coronavirus protein 3a elicits strong, potentially protective humoral responses in infected patients. J. Gen. Virol. 87, 369–373. https://doi.org/10.1099/vir.0.81078-0. EXAMPLE 2: Vaccine design using the 3a N-terminal domain and S N-terminal domain of SARS-CoV-2 [00311] The covariant residues identified above provide strong evidence of an evolutionary relationship between 3a and S proteins in coronaviruses. Furthermore, an enrichment of covariant residues in the first 44 amino acids in 3a N-terminal domain was discovered including those that correlated with changes in S. Therefore, it is contemplated that immune pressure and host adaptation drive these key changes in both S and 3a proteins in coronaviruses that in turn drive selection of compensatory mutations linked in larger covariant networks. Mutations at such residues are driven by an immune response that in many cases are accompanied by others. Some co-varying residues are essential for virus growth or fitness and these mutations have additional intrinsic value in designed immunotherapeutics as this virus must mutate at least two residues. By selecting domains in 3a and S enriched in these mutations and engrafting covariant residues into antigens, a vaccine strategy based on our covariant analysis was performed. [00312] Enriched mutations at covariant residues in the first 44 amino acids of 3a were selected (below). This is based on the analysis of bat, pangolin, civet, and SAR-CoV and SARS-CoV-2 viruses and an additional 13,611 clinical isolates from the SARS-CoV-2 epidemic. 
Because we find a correlation between 3a NTD residues and both the N-terminal domain of the S protein and residues proximal to the furin cleavage site, we have constructed an antigen for the 3a protein N-terminal domain (3aNTD) and an antigen that is a fusion between the S protein N-terminal domain (S NTD) and the furin cleavage domain (S-NTD-FCD) (FIG. 9). The fusion between the S NTD and the furin cleavage domain bypasses the S protein receptor binding domain: it removes amino acids 321 to 590 of the SARS-CoV-2 S protein and adds a glycine/serine linker between two predicted β-sheets that are proximal in the structure. Both of these domains can be modified with covariant residues; for example, the 3a NTD antigen will include mutations listed in Table S1 based on the covariance analysis. The S-NTD-FCD includes amino acids found to be highly covariant in the first 300 amino acids (NTD) and a variable residue at S position 614 (G/D), a common polymorphism that emerged during the SARS-CoV-2 pandemic and that covaries with residues in the S protein NTD and also with residues in 3a.
[00313] N-terminal domain constructs with residue substitutions selected on the basis of covariance are outlined below.
3a_Seq1 WT 3a sequence MDLFMRIFTIGTVTLKQGEIKDATPSDFVRATATIPIQASLPFG (SEQ ID NO: 20),
3a_Seq2 T32F Covariant selected and removes glycosylation site MDLFMRIFTIGTVTLKQGEIKDATPSDFVRAFATIPIQASLPFG (SEQ ID NO: 21),
3a_Seq3 V13I Covariant selected MDLFMRIFTIGTITLKQGEIKDATPSDFVRATATIPIQASLPFG (SEQ ID NO: 22),
3a_Seq4 T14I Covariant selected MDLFMRIFTIGTVILKQGEIKDATPSDFVRATATIPIQASLPFG (SEQ ID NO: 23),
3a_Seq5 L15F Covariant selected MDLFMRIFTIGTVTFKQGEIKDATPSDFVRATATIPIQASLPFG (SEQ ID NO: 24),
3a_Seq6 Q17H Covariant selected MDLFMRIFTIGTVTLKHGEIKDATPSDFVRATATIPIQASLPFG (SEQ ID NO: 25),
3a_Seq7 E19S Covariant selected MDLFMRIFTIGTVTLKQGSIKDATPSDFVRATATIPIQASLPFG (SEQ ID NO: 26),
3a_Seq8 I20S Covariant selected MDLFMRIFTIGTVTLKQGESKDATPSDFVRATATIPIQASLPFG (SEQ ID NO: 27),
3a_Seq9 I20G Covariant selected MDLFMRIFTIGTVTLKQGEGKDATPSDFVRATATIPIQASLPFG (SEQ ID NO: 28),
3a_Seq10 I20V Covariant selected MDLFMRIFTIGTVTLKQGEVKDATPSDFVRATATIPIQASLPFG (SEQ ID NO: 29),
3a_Seq11 D22N Covariant selected MDLFMRIFTIGTVTLKQGEIKNATPSDFVRATATIPIQASLPFG (SEQ ID NO: 30),
3a_Seq12 A23S Covariant selected MDLFMRIFTIGTVTLKQGEIKDSTPSDFVRATATIPIQASLPFG (SEQ ID NO: 31),
3a_Seq13 P25L Covariant selected MDLFMRIFTIGTVTLKQGEIKDATLSDFVRATATIPIQASLPFG (SEQ ID NO: 32),
3a_Seq14 S26L Covariant selected MDLFMRIFTIGTVTLKQGEIKDATPLDFVRATATIPIQASLPFG (SEQ ID NO: 33),
3a_Seq15 D27Y Covariant selected MDLFMRIFTIGTVTLKQGEIKDATPSYFVRATATIPIQASLPFG (SEQ ID NO: 34),
3a_Seq16 I37L Covariant selected MDLFMRIFTIGTVTLKQGEIKDATPSDFVRATATIPLQASLPFG (SEQ ID NO: 35),
3a_Seq17 Q38P Covariant selected MDLFMRIFTIGTVTLKQGEIKDATPSDFVRATATIPIPASLPFG (SEQ ID NO: 36),
3a_Seq18 L41F Covariant selected MDLFMRIFTIGTVTLKQGEIKDATPSDFVRATATIPIQASFPFG (SEQ ID NO: 37),
3a_Seq19 P42S Covariant selected MDLFMRIFTIGTVTLKQGEIKDATPSDFVRATATIPIQASLSFG (SEQ ID NO: 38),
3a_Seq20 F43I Covariant selected MDLFMRIFTIGTVTLKQGEIKDATPSDFVRATATIPIQASLPIG (SEQ ID NO: 39), and
3a_Seq21 G44V Covariant selected MDLFMRIFTIGTVTLKQGEIKDATPSDFVRATATIPIQASLPFV (SEQ ID NO: 40).
[00314] The 3a NTD was fused to the S-NTD-FCD at both the amino and carboxyl termini (FIG. 10 and FIG. 11) to provide an antigen that may generate antibodies that target two domains in two different proteins found to co-vary. This provides epitopes in three domains (3a NTD, S NTD, and S furin cleavage domain) that possess covariant residues found to link these three domains. It is contemplated herein that, based on these relationships, antibodies that target these regions would be difficult for the virus to escape with single mutations.
[00315] The S protein is an additional target in coronavirus vaccine strategies, but the 3a protein’s role in the virion is unclear. It is found to associate with S and virions, but the nature of the interaction is poorly understood. Therefore, the 3a NTD construct removes the channel-forming domains and regions in the CTD that may be needed to stimulate the inflammasome and the innate immune response. By targeting the extracellular ectodomain of 3a and these covariant residues, the vaccine composition is designed to target this protein to reduce inflammation while also targeting predicted partner residues in S in an attempt to neutralize the virus (FIG. 12).
[00316] Exemplary sequences of fusion proteins are provided below.
[00317] The sequence of the S protein fusion. SARS-CoV-2 S residue positions are indicated in parentheses. Linkage and 3a fusions are indicated. The residue at amino acid position 614 of the S protein (G/D) varies in clinical strains and has covariance relationships with S residues and 3a based on the above Example.
A. 3aNTD-S fusion to SARS-CoV-2 S NTD (S 614 residue is D)
Figure imgf000124_0001
B. 3aNTD-S fusion to Furin Cleavage Domain (S 614 residue is D)
Figure imgf000124_0002
C. 3aNTD-S fusion to SARS-CoV-2 S NTD (S 614 residue is G)
Figure imgf000125_0002
D. 3aNTD-S fusion to Furin Cleavage Domain (S 614 residue is G)
Figure imgf000125_0001
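By way of illustration only, the single-residue substitutions named in paragraph [00313] (for example T32F or G44V) can be checked against the wild-type 3a NTD sequence (SEQ ID NO: 20) with a short script. The sketch below assumes the conventional notation of reference residue, 1-based position, and replacement residue; the function name apply_substitution is hypothetical and is not part of the disclosure.

```python
import re

# Wild-type 3a N-terminal domain, first 44 amino acids (SEQ ID NO: 20).
WT_3A_NTD = "MDLFMRIFTIGTVTLKQGEIKDATPSDFVRATATIPIQASLPFG"


def apply_substitution(sequence: str, mutation: str) -> str:
    """Apply a substitution written as <ref><1-based position><alt>, e.g. 'T32F'.

    Hypothetical helper for illustration; it raises ValueError if the
    notation is malformed, the position is out of range, or the reference
    residue does not match the sequence.
    """
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation notation: {mutation!r}")
    ref, pos, alt = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= pos <= len(sequence):
        raise ValueError(f"position {pos} is outside the sequence")
    if sequence[pos - 1] != ref:
        raise ValueError(
            f"expected {ref} at position {pos}, found {sequence[pos - 1]}"
        )
    # Python strings are 0-based, so position N is index N - 1.
    return sequence[: pos - 1] + alt + sequence[pos:]


if __name__ == "__main__":
    # T32F should reproduce the sequence listed as 3a_Seq2 (SEQ ID NO: 21).
    print(apply_substitution(WT_3A_NTD, "T32F"))
```

Checking the reference residue before substituting guards against off-by-one errors when positions are transcribed from a table.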
EXAMPLE 3:
[00318] For this study, the inventors analyzed an additional 75,000 clinical isolates for covariance. The inventors identified covariant residues, and domains enriched in covariance, that are shared by both the pan-sarbecovirus and the clinical SARS-CoV-2 datasets described below. This overlap also notably identified some of the recently emerged SARS-CoV-2 variants. Some key findings from this study are summarized below.
[00319] The inventors analyzed covariant pairs and networks of residues in Polyproteins 1A/1B, Spike, 3a, E, M, and N proteins using a stringent threshold (0.96). This revealed the most apparent residue covariance in Spike and 3a. This is referred to as the “pan-sarbecovirus” data herein. Focusing on predicted covariant relationships within and between 3a and Spike, based on the observation that covariance is enriched in subdomains of 3a and Spike including those outside the Spike receptor binding domain, the inventors identified the most independent enriched covariant pairs and their locations (FIGS. 14A and 14B).
[00320] The inventors analyzed approximately 75,000 genome sequences from the GISAID database that were identified between February 2020 and 2021 using clinical SARS-CoV-2 samples. They mapped covariant pairs between 3a and Spike. This is referred to as the “clinical SARS-CoV-2” data herein and is summarized in FIGS. 15A-15C. The inventors compared the pan-sarbecovirus and clinical SARS-CoV-2 data to identify covariant residues common to both. Using a sliding window of +1 and -1 amino acid in the amino acid alignment to accommodate slight deviation between the pan-sarbecovirus Spike-3a alignment and the clinical SARS-CoV-2 alignment, common covariant residue linkages were identified (FIG. 15D).
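The +1/-1 sliding-window comparison described in paragraph [00320] can be sketched as follows. This is a minimal illustration under stated assumptions, not the inventors' actual procedure: each covariant linkage is represented as a (3a position, Spike position) tuple, a linkage is treated as shared when both positions agree within the window, the function name overlapping_linkages is hypothetical, and the positions in the usage example are invented for demonstration rather than taken from the figures.

```python
from typing import Iterable, List, Tuple

Pair = Tuple[int, int]  # (3a residue position, Spike residue position)


def overlapping_linkages(
    pan_pairs: Iterable[Pair],
    clinical_pairs: Iterable[Pair],
    window: int = 1,
) -> List[Tuple[Pair, Pair]]:
    """Return (pan pair, clinical pair) matches whose positions agree within +/- window.

    Assumption for illustration: both positions of a linkage must fall
    inside the window for the linkage to count as shared.
    """
    clinical_sorted = sorted(clinical_pairs)
    matches = []
    for pan_3a, pan_s in sorted(pan_pairs):
        for clin_3a, clin_s in clinical_sorted:
            if abs(pan_3a - clin_3a) <= window and abs(pan_s - clin_s) <= window:
                matches.append(((pan_3a, pan_s), (clin_3a, clin_s)))
    return matches


if __name__ == "__main__":
    # Invented example positions, not data from the disclosure.
    pan = {(15, 69), (26, 142), (38, 500)}
    clinical = {(16, 70), (26, 144), (38, 501)}
    for pan_pair, clinical_pair in overlapping_linkages(pan, clinical):
        print(pan_pair, "matches", clinical_pair)
```

In this invented example the pair (26, 142) is not matched to (26, 144) because the Spike positions differ by two residues, which exceeds the one-residue window.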
[00321] The inventors found that some overlapping covariant regions between the pan-sarbecovirus and clinical SARS-CoV-2 data aligned perfectly with deletions and mutations identified in dominant, well-studied SARS-CoV-2 variants, including the UK, Brazilian, and South African variants that emerged as dominant in late 2020 and early 2021 during the pandemic. They also found that these same regions align perfectly with deletions in the Spike amino-terminal domain (NTD) that are apparent.
[00322] The data from aligning the SARS-CoV and SARS-CoV-2 Spike proteins identified SARS-CoV-2 Spike NTD positions near amino acid residues 69-70, 142-145, and 241-243 (FIGS. 16A-16C).
[00323] The inventors identified covariance in the 3a NTD region in both the pan-sarbecovirus and clinical SARS-CoV-2 data as evidence that this region may interact with the Spike NTD and carboxyl-terminal domain (CTD). The covariance and variation in the 3a NTD indicate that this region is under pressure to mutate through covariance with Spike residues and/or through immune selection in the host (FIGS. 17A-17D).
[00324] The enriched covariance internal to and in between the Spike NTD and CTD and the 3a NTD apparent in the evolutionary record (pan-sarbecovirus), and the additional, independently confirmed covariance in SARS-CoV-2 clinical strains during approximately one year of the pandemic, were considered for selecting key antigens in both proteins for vaccine design. In one exemplary vaccine design, Spike NTD, CTD, and 3a NTD domains, either together, separate, or in full protein form (i.e., full-length Spike and full-length 3a), can be used to generate antigens that are wild-type as well as antigens that possess mutations identified in the covariant residue analyses (via both pan-sarbecovirus and clinical data) described herein. The general concept leverages covariance identified or described herein to predict interactions within and between Spike and 3a.
Notably, the pan-sarbecovirus covariance can be predictive for some of the key mutations identified in emerging variants isolated during the current SARS-CoV-2 pandemic. The antigen can comprise one or more mutations that are either identical to or closely resemble the deletions at amino acid residues 69-70, 142-147, and 242-244. These mutations overlap with covariance and also with those found in SARS-CoV-2 (FIGS. 18A-18C). Without limitation, the antigen can be generated as protein, as mRNA, or by other methods that result in the presentation of a protein antigen in the host and produce a targeted immune response against these key residues and subdomains. Exemplary antigens can be expressed as monomers, fusions, or anchored to a trimerization domain (FIG. 18D).
[00325] All patents and other publications identified are expressly incorporated herein by reference for the purpose of describing and disclosing, for example, the methodologies described in such publications that could be used in connection with the present invention. These publications are provided solely for their disclosure prior to the filing date of the present application. Nothing in this regard should be construed as an admission that the inventors are not entitled to antedate such disclosure by virtue of prior invention or for any other reason. All statements as to the date or representations as to the contents of these documents are based on the information available to the applicants and do not constitute any admission as to the correctness of the dates or contents of these documents.
Figure imgf000128_0001
Figure imgf000129_0001
Figure imgf000130_0001
Figure imgf000131_0001
Figure imgf000132_0001
Figure imgf000133_0001
Attorney Docket No.002806-097770WOPT
Figure imgf000134_0001
Figure imgf000135_0001
Figure imgf000136_0001
Figure imgf000137_0001
Figure imgf000138_0001
Figure imgf000139_0001
Figure imgf000140_0001
Figure imgf000141_0001
Figure imgf000142_0001
Figure imgf000143_0001

Claims

CLAIMS
What is claimed is:
1. A fusion polypeptide comprising: a first domain comprising a first viral polypeptide or a fragment thereof expressed by a first virus, wherein the first viral polypeptide or fragment thereof comprises at least two or more covarying amino acid positions.
2. The fusion polypeptide of claim 1, further comprising a second domain comprising a second viral polypeptide or a fragment thereof expressed by the first virus, optionally, the second viral polypeptide or a fragment thereof comprises at least one amino acid position that covaries with at least one amino acid position in the first viral polypeptide or a fragment thereof.
3. The fusion polypeptide of claim 2, wherein the second viral polypeptide or fragment thereof comprises at least two or more amino acid positions that covary with each other.
4. A fusion polypeptide comprising: (a) a first domain comprising a first viral polypeptide or a fragment thereof expressed by a first virus; and (b) a second domain comprising a second viral polypeptide or a fragment thereof expressed by the first virus, and wherein the first viral polypeptide or a fragment thereof comprises at least one amino acid position that covaries with at least one amino acid position in the second viral polypeptide or a fragment thereof.
5. The fusion polypeptide of claim 4, wherein the first viral polypeptide or a fragment thereof comprises at least two or more amino acid positions that covary with each other.
6. The fusion polypeptide of claim 4, wherein the second viral polypeptide or a fragment thereof comprises at least two or more amino acid positions that covary with each other.
7. The fusion polypeptide of claim 2 or 4, wherein the first and second domain are linked by a linker.
8. The fusion polypeptide of claim 2 or 4, wherein the first and second domain are linked by a flexible linker.
9. The fusion polypeptide of claim 2 or 4, further comprising a third domain comprising a third viral polypeptide or a fragment thereof expressed by the first virus, optionally the third viral polypeptide or a fragment thereof comprises at least one amino acid position that covaries with at least one amino acid position in the first or second viral polypeptide or a fragment thereof.
10. The fusion polypeptide of claim 9, wherein the third viral polypeptide or fragment thereof comprises at least two or more amino acid positions that covary with each other.
11. The fusion polypeptide of claim 9, wherein the first and second domain are linked by a linker.
12. The fusion polypeptide of claim 9, wherein the second and third domain are linked by a linker.
13. The fusion polypeptide of claim 9, further comprising a fourth domain comprising an amino acid sequence of the first, second or third domain.
14. The fusion polypeptide of claim 13, wherein the third and the fourth domains are linked by a linker.
15. The fusion polypeptide of claim 1 or 4, wherein the covarying amino acid positions are determined using a correlating tandem model, optionally, the tandem model purity threshold is a level greater than or equal to 0.80.
16. The fusion polypeptide of claim 1 or 4, wherein the covarying amino acid positions are relative to a viral polypeptide expressed by a second virus.
17. The fusion polypeptide of claim 2 or 4, wherein the first virus is capable of infecting a human host and the second virus is capable of infecting a non-human host or the second virus is a different virus capable of infecting a human host.
18. The fusion polypeptide of claim 17, wherein the first and second virus are from the same family.
19. The fusion polypeptide of claim 18, wherein the first and second virus are from the same genus.
20. The fusion polypeptide of claim 19, wherein the first and second virus are from the same species.
21. The fusion polypeptide of claim 20, wherein the first and second virus are capable of infecting the same host species.
22. The fusion polypeptide of claim 1 or 4, wherein the first virus is a corona virus.
23. The fusion polypeptide of claim 22, wherein the first virus is SARS-CoV or SARS-CoV2.
24. The fusion polypeptide of claim 9, wherein the third viral polypeptide or fragment thereof is a corona virus polypeptide or fragment thereof selected from the group consisting of: the viroporin 3a protein, a non-structural protein, a 1a/1b polyprotein, a viroporin E, membrane protein (M), and spike (S) protein.
25. The fusion protein of claim 9, wherein the third domain comprises an amino acid sequence having at least 85% identity to the amino acid sequence:
Figure imgf000145_0001
Figure imgf000146_0001
Figure imgf000147_0001
(SEQ ID NO: 65).
26. The fusion polypeptide of claim 25, wherein the third domain comprises an amino acid sequence comprising a substitution or deletion at position 19, 41, 127, 164 or 175 of SEQ ID NO: 42, 43, 60, 61, 62, 63, 64 or 65.
27. The fusion polypeptide of claim 2 or 4, wherein the second viral polypeptide or fragment thereof is a corona virus polypeptide or fragment thereof selected from the group consisting of: the viroporin 3a protein, a non-structural protein, a 1a/1b polyprotein, a viroporin E, membrane protein (M), and spike (S) protein.
28. The fusion polypeptide of claim 2 or 4, wherein the second domain comprises an amino acid sequence having at least 85% identity to the amino acid sequence:
Figure imgf000147_0002
29. The fusion polypeptide of claim 28, wherein the second domain comprises an amino acid sequence comprising a substitution or deletion at position 4, 6, 7, 9, 10, 13, 15, 16, 19, 20, 23, 49, 54, 56, 58, 59, 60, 61, 62, 126, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 150, 163, 165, 172, 191, 229, 230, 231, 232, 233, 243, 244, 246 or 248 of SEQ ID NO: 41 or at position 5, 7, 8, 10, 11, 14, 16, 17, 20, 21, 24, 50, 55, 57, 59, 60, 61, 62, 63, 127, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 151, 164, 166, 173, 192, 230, 231, 232, 233, 234, 244, 245, 247 and 249 of SEQ ID NO: 45.
30. The fusion polypeptide of claim 1 or 4, wherein the first viral polypeptide or fragment thereof is a corona virus polypeptide or a fragment thereof selected from the group consisting of: the viroporin 3a protein, a non-structural protein, a 1a/1b polyprotein, a viroporin E, membrane protein (M), and spike (S) protein.
31. The fusion protein of claim 1 or 4, wherein the first domain comprises an amino acid sequence having at least 85% identity to the amino acid sequence MDLFMRIFTIGTVTLKQGEIKDATPSDFVRATATIPIQASLPFG (SEQ ID NO: 20).
32. The fusion protein of claim 31, wherein the first domain comprises an amino acid sequence comprising a substitution or deletion at position 13, 14, 15, 16, 17, 18, 19, 20, 22, 23, 24, 25, 26, 27, 28, 32, 37, 38, 41, 42, 43 or 44 of SEQ ID NO: 20.
33. The fusion protein of claim 32, wherein the first domain comprises an amino acid sequence comprising a substitution or deletion at position 15, 16, 18, 20, 24, 25, 26, 28, or 38 of SEQ ID NO: 20.
34. The fusion protein of claim 31, wherein the first domain comprises an amino acid sequence selected from the group consisting of:
Figure imgf000148_0001
MDLFMRIFTIGTVTLKQGEIKDATPSDFVRATATIPIQASLSFG (SEQ ID NO: 38), MDLFMRIFTIGTVTLKQGEIKDATPSDFVRATATIPIQASLPIG (SEQ ID NO: 39), and MDLFMRIFTIGTVTLKQGEIKDATPSDFVRATATIPIQASLPFV (SEQ ID NO: 40).
35. The fusion polypeptide of claim 9, wherein:
a. the first domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 20, the second domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 41 and the third domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 42;
b. the first domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 41, the second domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 42 and the third domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 20;
c. the first domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 20, the second domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 41 and the third domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 43;
d. the first domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 41, the second domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 43 and the third domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 20;
e. the first domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 20, the second domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 41 and the third domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 60;
f. the first domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 41, the second domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 60 and the third domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 20;
g. the first domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 20, the second domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 41 and the third domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 61;
h. the first domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 41, the second domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 61 and the third domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 20;
i. the first domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 20, the second domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 41 and the third domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 62;
j. the first domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 41, the second domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 62 and the third domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 20;
k. the first domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 20, the second domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 41 and the third domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 63;
l. the first domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 41, the second domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 63 and the third domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 20;
m. the first domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 20, the second domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 41 and the third domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 64;
n. the first domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 41, the second domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 64 and the third domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 20;
o. the first domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 20, the second domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 41 and the third domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 65;
p. the first domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 41, the second domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 65 and the third domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 20;
q. the first domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 20, the second domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 45 and the third domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 42;
r. the first domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 45, the second domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 42 and the third domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 20;
s. the first domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 20, the second domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 45 and the third domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 43;
t. the first domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 45, the second domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 43 and the third domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 20;
u. the first domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 20, the second domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 45 and the third domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 60;
v. the first domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 45, the second domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 60 and the third domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 20;
w. the first domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 20, the second domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 45 and the third domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 61;
x. the first domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 45, the second domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 61 and the third domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 20;
y. the first domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 20, the second domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 45 and the third domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 62;
z. the first domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 45, the second domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 62 and the third domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 20;
aa. the first domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 20, the second domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 45 and the third domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 63;
bb. the first domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 45, the second domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 63 and the third domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 20;
cc. the first domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 20, the second domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 45 and the third domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 64;
dd. the first domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 45, the second domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 64 and the third domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 20;
ee. the first domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 20, the second domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 45 and the third domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 65; or
ff. the first domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 45, the second domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 65 and the third domain comprises an amino acid sequence having at least 85% identity to SEQ ID NO: 20.
36. The fusion polypeptide of claim 9, wherein the fusion polypeptide comprises an amino acid sequence selected from the group consisting of:
A. 3aNTD-S fusion to SARS-CoV-2 S NTD (S 614 residue is D)
Figure imgf000153_0001
B. 3aNTD-S fusion to Furin Cleavage Domain (S 614 residue is D)
Figure imgf000153_0002
C. 3aNTD-S fusion to SARS-CoV-2 S NTD (S 614 residue is G)
Figure imgf000153_0003
Figure imgf000154_0001
D. 3aNTD-S fusion to Furin Cleavage Domain (S 614 residue is G)
Figure imgf000154_0002
E. 3aNTD-S fusion to SARS-CoV-2 S NTD (S 614 residue is D)
Figure imgf000154_0003
Figure imgf000155_0001
F. 3aNTD-S fusion to Furin Cleavage Domain (S 614 residue is D)
Figure imgf000155_0002
G. 3aNTD-S fusion to SARS-CoV-2 S NTD (S 614 residue is G)
Figure imgf000155_0003
H. 3aNTD-S fusion to Furin Cleavage Domain (S 614 residue is G)
Figure imgf000155_0004
Figure imgf000156_0001
I. 3aNTD-S fusion to SARS-CoV-2 S NTD (S 614 residue is D)
Figure imgf000156_0002
J. 3aNTD-S fusion to Furin Cleavage Domain (S 614 residue is D)
Figure imgf000156_0003
K. 3aNTD-S fusion to SARS-CoV-2 S NTD (S 614 residue is G)
Figure imgf000156_0004
Figure imgf000157_0001
L. 3aNTD-S fusion to Furin Cleavage Domain (S 614 residue is G)
Figure imgf000157_0002
M. 3aNTD-S fusion to SARS-CoV-2 S NTD (S 614 residue is D)
Figure imgf000157_0003
N. 3aNTD-S fusion to Furin Cleavage Domain (S 614 residue is D)
Figure imgf000157_0004
Figure imgf000158_0001
O. 3aNTD-S fusion to SARS-CoV-2 S NTD (S 614 residue is G)
Figure imgf000158_0002
P. 3aNTD-S fusion to Furin Cleavage Domain (S 614 residue is G)
Figure imgf000158_0003
Figure imgf000159_0001
37. The fusion polypeptide of any one of claims 1-36, wherein the fusion polypeptide induces an antigen specific immune response when administered to a subject.
38. A polynucleotide encoding an amino acid sequence of a fusion polypeptide of any one of claims 1-37.
39. A vaccine composition comprising a fusion polypeptide of any one of claims 1-36 or a polynucleotide of claim 38.
40. The vaccine composition of claim 39 further comprising an adjuvant.
41. The vaccine composition of claim 38 or 39, further comprising a pharmaceutical carrier.
42. A cell comprising a fusion polypeptide of any one of claims 1-37 or a polynucleotide of claim 38.
43. A kit comprising a fusion polypeptide of any one of claims 1-37 or a polynucleotide of claim 38.
44. A method of inducing an immune response in a subject, the method comprising: administering to the subject a fusion polypeptide of any one of claims 1-37 or a polynucleotide of claim 38 in an amount effective to produce an antigen specific immune response.
PCT/US2021/034753 2020-05-28 2021-05-28 Protein covariance networks reveal interactions important to the emergence of sars coronaviruses as human pathogens WO2021243149A2 (en)

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
US202063031306P 2020-05-28 2020-05-28
US63/031,306 2020-05-28
US202063032925P 2020-06-01 2020-06-01
US63/032,925 2020-06-01
US202063129029P 2020-12-22 2020-12-22
US63/129,029 2020-12-22

Publications (2)

Publication Number Publication Date
WO2021243149A2 (en) 2021-12-02
WO2021243149A3 (en) 2021-12-30

Family

ID=78722878

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2021/034753 WO2021243149A2 (en) 2020-05-28 2021-05-28 Protein covariance networks reveal interactions important to the emergence of sars coronaviruses as human pathogens

Country Status (1)

Country Link
WO (1) WO2021243149A2 (en)

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106928326B (en) * 2015-12-31 2019-12-24 中国科学院微生物研究所 Coronavirus vaccine based on dimerized receptor binding domain subunit
GB201705765D0 (en) * 2017-04-10 2017-05-24 Univ Oxford Innovation Ltd HBV vaccine

Also Published As

Publication number Publication date
WO2021243149A3 (en) 2021-12-30

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21812447

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21812447

Country of ref document: EP

Kind code of ref document: A2