CA3220934A1 - Biomarkers and methods for classifying subjects following viral exposure

Info

Publication number: CA3220934A1
Application number: CA3220934A
Authority: CA
Inventors: Alexander James MANN; Gareth GUENIGAULT; Aruna BASAL
Original assignee: Poolbeg Pharma UK Ltd
Current assignee: Poolbeg Pharma UK Ltd
Priority date: 2021-06-02
Filing date: 2022-06-01
Publication date: 2022-12-08
Also published as: GB202107883D0; JP2024526045A; CN117480262A; WO2022254221A1; EP4347891A1

Abstract

Methods of predicting whether a subject will develop acute symptoms of disease after exposure, or possible exposure, to a respiratory virus, which comprise analysing a biological sample obtained from the subject for a biomarker and comparing the biomarker to a reference for the biomarker, wherein the biomarker comprises or is derived from expression levels of one or more genes selected from a gene panel comprising PHF20, ABCA1, APBA2, MORC2, SNU13, DCUN1D2, MAX, NOL9, MPRIP, HP, BST1, TM9SF2, HOMER3, NSUN6, EPHA4 and BMP2K. Also disclosed are related predictive methods, methods of conducting a clinical trial or field study, computer programs, classification algorithms, computer readable mediums and computer-implemented methods.

Description

BIOMARKERS AND METHODS FOR CLASSIFYING SUBJECTS FOLLOWING VIRAL EXPOSURE
Field of the Invention

[0001] The present invention relates to biomarkers for predicting whether a subject will develop acute symptoms or signs of disease following exposure, or possible exposure, to a respiratory virus such, for example, as an influenza virus. The present invention provides methods for predicting whether a subject will develop a severe or complicated form of disease. As disclosed herein, the invention includes methods of conducting clinical trials or field studies comprising analysing the biomarkers, but more generally, the biomarkers of the invention may be used in any healthcare or non-healthcare setting; for example to triage patients infected with a respiratory virus to identify those who are susceptible to developing acute signs or symptoms and may therefore require medical intervention. Subjects may have been administered a medicinal product for treatment or prevention of respiratory disease, and the biomarkers of the invention may therefore be used as a companion analytical product to predict the likely efficacy of the medicinal product. The present invention further provides computer programmes, computer readable media, computer-implemented methods and classification algorithms that generate or utilise the biomarkers of the invention.
Background to the Invention

[0002] Acute upper and lower respiratory infections are a major public health problem and a leading cause of morbidity and mortality worldwide. Viruses are the predominant cause of respiratory tract illnesses and include RNA viruses such as respiratory syncytial virus (RSV), influenza virus, parainfluenza virus, metapneumovirus, rhinovirus (HRV) and coronavirus (Hodinka, "Respiratory RNA Viruses", Microbiol Spectr., 2016 Aug; 4(4)).

[0003] The CDC estimates that in the 2015-2016 period in the US there were 25 million influenza illnesses, 11 million influenza-associated medical visits, 310,000 influenza-related hospitalizations, and 12,000 pneumonia and influenza deaths (Rolfes et al 2016). In 2003 the annual economic burden of influenza in the US alone was estimated to be around 87 billion dollars (Molinari et al 2007). The costs of influenza are clearly substantial and any method to treat or diagnose influenza would be of enormous value.

[0004] Influenza infects all age groups and causes a range of outcomes from asymptomatic infection and mild respiratory disease through to severe respiratory disease and even death. As such, different subjects exposed to the same influenza virus may be asymptomatic, mildly symptomatic, subclinical, exhibit acute symptoms, or require medical attention, or even urgent hospitalization (Cox et al 1999). Further, the proportion of infections that are asymptomatic or subclinical, and the degree to which these are contagious, as well as the proportion of shedding which occurs prior to onset of symptoms, affect the potential impact of control measures and decisions regarding treatment and the administration of medicaments (Lau et al., 2010).

[0005] Current trial designs within the human challenge model for assessing investigational treatment drugs and medicaments for influenza, RSV, or HRV rely on either:
a. "Universal dosing" - Universally treating all subjects inoculated with virus on a given day post inoculation (e.g. 24 hrs or 28 hrs post inoculation), irrespective of whether the subjects become infected or not;
b. "Triggered dosing" - Treating only those subjects when they have either one or both of the following:
i. their first (or confirmed) PCR positive respiratory sample (i.e. treating only those that are expected to be infected post inoculation);
ii. initial respiratory symptoms that are indicative of onset of viral infection;
c. "Triggered dosing + Universal dosing" (DeVincenzo et al, NEJM 2014; DeVincenzo et al, NEJM 2015) - This uses the principles of triggered dosing for the primary endpoint; however, at a certain day post inoculation, e.g. Day 5, any subjects that still do not have a positive viral sample (or symptoms) are subsequently given the drug anyway. The subjects that are universally given the drug in this scenario may be included for analysis in two sub-analysis approaches:
i. On their own as a sub-group;
ii. Combined with the triggered sub-group.

[0006] In research models such as the human challenge model, knowing in advance who will develop significant symptoms would allow dosing of an investigational medicament to be triggered only in subjects who would otherwise go on to develop acute symptoms of an influenza-like disease. A method capable of predicting who will develop acute symptoms of an influenza-like disease would allow the identification of subjects appropriate for administration of the medicament.
The benefits of this volunteer selection method for dosing include:
a. Improved ability to detect a clinically relevant reduction in disease by only evaluating the medicament effects in those that would have gone on to present with acute symptoms of an influenza-like disease. This contrasts with trial designs where triggering of treatment might be based on presence of viral shedding/symptoms or administration of the medicament to all inoculated people. Selecting appropriate subjects for the trial in advance avoids the problems associated with assessing the efficacy of the medicament in populations where the ability to detect a difference is more difficult (i.e. uninfected, asymptomatic infected or people who only have a mild infection with minimal viral loads).
b. Fewer people will be exposed to the medicament unnecessarily, thereby:
i. reducing medicament requirements, with resulting manufacturing and cost benefits;
ii. providing a treatment regime with an improved benefit: risk profile by selecting to provide treatment only to those that will develop acute symptoms;
iii. providing an improved benefit: risk profile for both the medicament and the study by requiring fewer people to be exposed to an investigational medicament.

[0007] Therefore, there remains a need for methods of predicting whether a subject will develop acute symptoms of an influenza-like disease to enable informed treatment decisions, to administer the correct level of care, and/or to improve the trial design for investigative medicaments to treat influenza-like disease.

[0008] Woods et al., "A Host Transcriptional Signature for Presymptomatic Detection of Infection in Humans Exposed to Influenza H1N1 or H3N2", PLoS ONE, January 2013; 8(1): e52198, describe the generation of a viral gene signature (or factor) for symptomatic influenza that is capable of detecting 94% of infected cases. The gene signature is detectable as early as 29 hours post-exposure and is reported to achieve maximum accuracy on average 43 hours (p = 0.003, H1N1) and 38 hours (p = 0.005, H3N2) before peak clinical symptoms.

[0009] While Woods et al. disclose methods for identifying a subject infected with a respiratory virus prior to presentation of symptoms, such methods do not predict whether or not an individual will develop acute symptoms of an influenza-like disease.
Summary of the invention

[00010] The present invention provides a biomarker for predicting whether a subject will develop acute symptoms of disease after exposure, or possible exposure, to a respiratory virus, wherein the biomarker comprises or is derived from expression levels of one or more genes selected from a gene panel comprising PHF20, ABCA1, APBA2, MORC2, SNU13, DCUN1D2, MAX, NOL9, MPRIP, HP, BST1, TM9SF2, HOMER3, NSUN6, EPHA4 and BMP2K measured in a biological sample obtained from the subject after exposure, or possible exposure, to a respiratory virus.

[00011] A biomarker in the context of the present invention is a measurable indicator of biological state or condition, in particular, an output to predict whether a subject will develop acute symptoms of disease.
The output may be a numerical output. In some embodiments, the biomarker of the invention may be a composite biomarker comprising expression levels of two or more genes of the gene panel or the expression level of at least one gene of the gene panel in combination with at least one other factor as described herein.

[00012] The subject may be a human or non-human mammal.
[00013] The acute symptoms of disease may consist of symptoms of an influenza-like or other respiratory disease, as disclosed herein.
[00014] Acute symptoms of an influenza-like or other respiratory disease means that the subject experiences four or more of the following symptoms and these symptoms, either individually or combined, interfere with daily activities. The symptoms include runny nose, stuffy nose, sore throat, sneezing, earache, cough, shortness of breath, wheezing, chest tightness, headache, malaise, myalgia, muscle and/or joint aches, elevated temperature, chilliness, and feverishness. The elevated temperature may be a temperature of 38 °C or more, optionally experienced together with a cough, optionally with onset within the last 8-12 (e.g. 10) days. During an average human challenge study involving inoculation with a respiratory virus in which the symptoms of influenza-like or other respiratory disease are evaluated, a subset of subjects will have acute symptoms; these subjects' symptoms will score in the 85th percentile, for example the subject's total VAS or CAT score will be in the 85th percentile. For example, a total VAS score of greater than or equal to 25 units, or a CAT score greater than or equal to 10 units, equates to acute symptoms.
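The score thresholds in paragraph [00014] can be illustrated with a minimal Python sketch. The function name `has_acute_symptoms` and its parameters are hypothetical and chosen only for illustration; the sketch covers the score-based criterion alone and does not model the "four or more symptoms interfering with daily activities" criterion.

```python
from typing import Optional

# Thresholds taken from the text: total VAS >= 25 units, total CAT >= 10 units.
VAS_ACUTE_THRESHOLD = 25
CAT_ACUTE_THRESHOLD = 10


def has_acute_symptoms(total_vas: Optional[float] = None,
                       total_cat: Optional[float] = None) -> bool:
    """Return True if either available total score meets its threshold.

    Hypothetical helper: a score that was not recorded is passed as None.
    """
    if total_vas is None and total_cat is None:
        raise ValueError("at least one of total_vas or total_cat is required")
    vas_acute = total_vas is not None and total_vas >= VAS_ACUTE_THRESHOLD
    cat_acute = total_cat is not None and total_cat >= CAT_ACUTE_THRESHOLD
    return vas_acute or cat_acute
```

For example, a subject with a total VAS score of 30 units would be flagged as acute under this sketch, whereas a subject with a total CAT score of 9 units and no VAS score would not.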
[00015] Thus in some embodiments, a subject may be identified as susceptible to progression to a complicated form of an influenza-like or other respiratory disease. A complicated respiratory disease such as influenza is defined as disease requiring hospital admission and/or with symptoms and signs of lower respiratory tract infection (hypoxaemia, dyspnoea, lung infiltrate), central nervous system involvement and/or a significant exacerbation of an underlying medical condition.
[00016] If a subject is predicted to develop acute symptoms of an influenza-like or other respiratory disease, such as complicated flu or other respiratory disease, then in accordance with the present invention, it is predicted that the subject will go on to exhibit acute symptoms as defined above. It may be predicted that a subject will develop acute symptoms of an influenza-like or other respiratory disease, but the subject may self-resolve, or action may be taken to prevent the subject developing acute symptoms; for example a medicament may be administered. Thus, a subject predicted to develop acute symptoms of an influenza-like or other respiratory disease does not inevitably develop acute symptoms of an influenza-like or other respiratory disease.
[00017] Respiratory virus includes all viral infections of the respiratory tract including respiratory syncytial virus (RSV), parainfluenza virus (HPIV), metapneumovirus (HMPV), rhinovirus (HRV), coronavirus, adenovirus (HAdV), enterovirus (EV), bocavirus (HBoV), parechovirus (HPeV), influenza including influenza A and influenza B.
[00018] A gene panel in the context of the present invention is a set of genes the expression levels of which can be analysed and used to predict the progression and/or outcome of an influenza-like or other respiratory disease in a subject. A gene sub-panel is a set of genes selected from a gene panel which may be used to predict at a certain stage in the progression of the disease, for example early, mid or late stage progression towards possibly developing acute symptoms of an influenza-like or other respiratory disease, or at a certain time point following inoculation with, or exposure to, a respiratory virus, for example up to 25 hours (for example 13-25 hours), or 37-49 hours, or 49-61 hours. In some embodiments these time frames may be referred to as early, mid or late stage respectively. In some circumstances it may not be possible to determine when a subject was exposed to a respiratory virus. In addition, some subjects develop symptoms quicker than other subjects, and so disease progression and stage of disease progression may be estimated based on evaluated symptoms.
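Where the time of exposure is known, the example time frames in paragraph [00018] could be mapped to stages as in the following minimal Python sketch. The function name `progression_stage` is hypothetical. Two assumptions are made here that the text does not settle: the shared 49-hour boundary is assigned to the late stage, and times falling outside the stated example windows (for example 25-37 hours) return `None`.

```python
from typing import Optional


def progression_stage(hours_since_exposure: float) -> Optional[str]:
    """Map hours since exposure to the example stages named in the text."""
    if hours_since_exposure < 0:
        raise ValueError("hours_since_exposure must be non-negative")
    if hours_since_exposure <= 25:
        return "early"   # up to 25 hours (for example 13-25 hours)
    if 37 <= hours_since_exposure < 49:
        return "mid"     # 37-49 hours
    if 49 <= hours_since_exposure <= 61:
        return "late"    # 49-61 hours; 49 h assigned here by assumption
    return None          # outside the example windows given in the text
```

The stage could then be used to choose which gene sub-panel to analyse for a sample taken at that time point.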
[00019] In particular the gene panel of the present invention may include one or more genes (including two genes, three genes, four genes, five genes, six genes, etc.) selected from PHF20, ABCA1, APBA2, MORC2, SNU13, DCUN1D2, MAX, NOL9, MPRIP, HP, BST1, TM9SF2, HOMER3, NSUN6, and BMP2K.
[00020] In some embodiments, the gene panel may consist of up to 16 genes (typically up to 10 genes, and more typically up to 6 genes) including one or more genes (including two genes, three genes, four genes, five genes, six genes, etc.) selected from PHF20, ABCA1, APBA2, MORC2, SNU13, DCUN1D2, MAX, NOL9, MPRIP, HP, BST1, TM9SF2, HOMER3, NSUN6, EPHA4 and BMP2K.
[00021] In some embodiments, the gene panel may comprise PHF20. In some embodiments, the gene panel may comprise NOL9. In some embodiments, the gene panel may comprise both PHF20 and NOL9.
[00022] The gene panel of the present invention may consist of one, two, three, four, five or six genes selected from PHF20, ABCA1, APBA2, MORC2, SNU13, DCUN1D2, MAX, NOL9, MPRIP, HP, BST1, TM9SF2, HOMER3, NSUN6, EPHA4 and BMP2K. Thus, for example, the gene panel of the present invention may consist of one, two, three, four, five or six genes, including PHF20 and optionally NOL9.
[00023] Unless indicated otherwise, the present invention does not exclude the possibility of including within the gene panel one or more further genes not specifically disclosed herein, which may be found to improve further the accuracy, sensitivity or specificity of the methods of the invention.

[00024] A first gene sub-panel may comprise PHF20, ABCA1, APBA2, MORC2, SNU13 and DCUN1D2.
[00025] Thus, the first gene sub-panel may comprise the expression level of PHF20. In addition to the expression level of PHF20, the first gene sub-panel may comprise the expression level of one or both of APBA2 and ABCA1. Where the first gene sub-panel comprises the expression level of PHF20 and the expression level of one or both of APBA2 and ABCA1, the first gene sub-panel may additionally comprise the expression level of one, two or three of MORC2, SNU13 and DCUN1D2.
[00026] The first gene sub-panel may consist of one, two, three, four, five, or six of PHF20, ABCA1, APBA2, MORC2, SNU13 and DCUN1D2.
[00027] The first gene sub-panel may consist of one gene, which is PHF20. The first gene sub-panel may consist of two genes, one of which is PHF20. The first gene sub-panel may consist of three genes, one of which is PHF20. The first gene sub-panel may consist of four genes, one of which is PHF20. The first gene sub-panel may consist of five genes, one of which is PHF20. The first gene sub-panel may consist of six genes, one of which is PHF20.
[00028] The first gene sub-panel may consist of two genes, one of which is PHF20, and the other of which is APBA2 or ABCA1. The first gene sub-panel may consist of three genes, including PHF20, accompanied by one or both of APBA2 and ABCA1. The first gene sub-panel may consist of four genes, including PHF20, accompanied by one or both of APBA2 and ABCA1. The first gene sub-panel may consist of five genes, including PHF20, accompanied by one or both of APBA2 and ABCA1. The first gene sub-panel may consist of six genes, including PHF20, accompanied by one or both of APBA2 and ABCA1.
[00029] A second gene sub-panel may comprise MAX, NOL9, MPRIP, HP, BST1 and TM9SF2.
[00030] Thus, the second gene sub-panel may comprise the expression level of one or more of NOL9, HP and MAX (particularly NOL9). In addition to the expression level of one or more of HP, MAX and NOL9 (particularly NOL9) the second gene sub-panel may comprise the expression level of one or both of BST1 and MPRIP. Where the second gene sub-panel comprises the expression level of one or more of HP, MAX and NOL9 (particularly NOL9), and the expression level of one or both of BST1 and MPRIP, the second gene sub-panel may additionally comprise the expression level of TM9SF2.
[00031] The second gene sub-panel may consist of one, two, three, four, five, or six of MAX, NOL9, MPRIP, HP, BST1 and TM9SF2.
[00032] The second gene sub-panel may consist of one gene, which is NOL9, HP or MAX (particularly NOL9). The second gene sub-panel may consist of two genes, one or both of which are selected from NOL9, HP and MAX (particularly NOL9). The second gene sub-panel may consist of three genes, one, two or all of which are selected from NOL9, HP and MAX (particularly NOL9). The second gene sub-panel may consist of four genes, one, two or three of which are selected from NOL9, HP and MAX (particularly NOL9). The second gene sub-panel may consist of five genes, one, two or three of which are selected from NOL9, HP and MAX (particularly NOL9). The second gene sub-panel may consist of six genes, one, two or three of which are selected from NOL9, HP and MAX (particularly NOL9).
[00033] The second gene sub-panel may consist of two genes, one of which is NOL9, HP or MAX (particularly NOL9) and the other of which is BST1 or MPRIP. The second gene sub-panel may consist of three genes, including one or more of NOL9, HP or MAX (particularly NOL9), accompanied by one or both of BST1 and MPRIP. The second gene sub-panel may consist of four genes, including one or more of NOL9, HP or MAX (particularly NOL9), accompanied by one or both of BST1 and MPRIP. The second gene sub-panel may consist of five genes, including one or more of NOL9, HP or MAX (particularly NOL9), accompanied by one or both of BST1 and MPRIP. The second gene sub-panel may consist of six genes, including one or more of NOL9, HP or MAX (particularly NOL9), accompanied by one or both of BST1 and MPRIP.
[00034] A third gene sub-panel may comprise HOMER3, NSUN6, HP, EPHA4 and BMP2K.
[00035] Thus, the third gene sub-panel may comprise the expression level of one or both of HP and HOMER3. In addition to the expression level of one or both of HP and HOMER3, the third gene sub-panel may comprise the expression level of one or both of EPHA4 and BMP2K. Where the third gene sub-panel comprises the expression level of one or both of HP and HOMER3, and the expression level of one or both of EPHA4 and BMP2K, the third gene sub-panel may additionally comprise the expression level of NSUN6.
[00036] The third gene sub-panel may consist of one, two, three, four or five of HOMER3, NSUN6, HP, EPHA4 and BMP2K.
[00037] The third gene sub-panel may consist of one gene, which is HP or HOMER3. The third gene sub-panel may consist of two genes, one or both of which are selected from HP and HOMER3. The third gene sub-panel may consist of three genes, one or two of which are selected from HP and HOMER3. The third gene sub-panel may consist of four genes, one or two of which are selected from HP and HOMER3. The third gene sub-panel may consist of five genes, one or two of which are selected from HP and HOMER3.
[00038] The third gene sub-panel may consist of two genes, one of which is HP or HOMER3, and the other of which is EPHA4 or BMP2K. The third gene sub-panel may consist of three genes, including one or both of HP and HOMER3, accompanied by one or both of EPHA4 and BMP2K. The third gene sub-panel may consist of four genes, including one or both of HP and HOMER3, accompanied by one or both of EPHA4 and BMP2K. The third gene sub-panel may consist of five genes, including one or both of HP and HOMER3, accompanied by one or both of EPHA4 and BMP2K.
[00039] In some embodiments, any of the aforementioned gene panels may further comprise 1 to 2 genes in addition to those listed above, without departing from the essential character of the panels of the present disclosure.
[00040] As shown in the Examples below, the genes that have been identified as being predictive of a subject developing acute symptoms of an influenza-like or other respiratory disease, in accordance with the present invention, exhibit altered expression levels following inoculation with a virus in subjects who then go on to exhibit acute symptoms relative to those who do not develop acute symptoms, as defined above. This indicates the potential of the genes to identify subjects who are more likely to develop acute symptoms of an influenza-like or other respiratory disease. Since symptoms of viral infection develop sooner in some subjects than in others, altered expression of the one or more genes according to the present invention may be predictive of acute symptoms, before a subject shows any symptoms of infection, or an early diagnostic indicator of acute symptoms at about the same time as the subject starts to show the first symptoms of infection.
[00041] The expression levels of the one or more genes in the biological sample may be measured using any suitable method known in the art for quantifying the expression level of a gene, particularly a mammalian gene. In some embodiments, the expression level of the one or more genes may be measured by quantifying mRNA transcripts of the one or more genes according to the invention in the biological sample.
[00042] Preferably, a PCR-based method may be used such, for example, as RT-qPCR. Examples of RT-qPCR-based methods are disclosed by United States patent no. 7,101,663, the contents of which are incorporated herein by reference. An advantage of real-time PCR is its relative ease and convenience of use.
[00043] Alternatively, a gene expression microarray may be used of the kind disclosed in, for example, United States patent no. 6,040,138, the contents of which are incorporated herein by reference, in which a pool of labelled target cRNA molecules, which are obtained by transcribing double-stranded cDNA derived from the mRNA transcripts that are isolated from the biological sample and fragmenting the resulting cRNA transcripts, are hybridised to oligonucleotide probes having specific sequences that are immobilised at specific addresses on a solid support. After incubating the cRNA targets with the surface-bound probes, the arrays are washed and the labels on the targets may be used to quantify how much target is bound to any given feature on the array. The amount of a given surface-bound target cRNA is proportional to the expression level of the corresponding gene.
[00044] Alternatively, RNA-seq may be used to quantify, discover and profile RNAs. This uses next-generation sequencing on cDNA converted from RNA (Wang et al 2009).
[00045] Suitably, the biological sample may be a blood or a respiratory sample. In particular, the sample may be a sample containing immune cells.
[00046] In some embodiments, the expression level of each of the one or more genes may be compared with a respective reference level. The reference level may be a threshold expression level that indicates acute symptoms of an influenza-like or other respiratory disease or a prediction of acute symptoms developing. Alternatively, the reference level may be a baseline level of expression which indicates that the subject is unlikely to develop acute symptoms of an influenza-like or other respiratory disease. Significantly altered expression (increased expression or decreased expression) of the one or more genes relative to their respective baseline levels, for instance by at least 1.1x, preferably at least 1.5x or 2x, or 3x, or 4x, or 5x, etc. up to 100x, may be indicative of acute symptoms of an influenza-like or other respiratory disease or predictive that a subject will develop acute symptoms of an influenza-like or other respiratory disease.
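The comparison against a baseline level in paragraph [00046] can be sketched in Python as below. The function names `fold_change` and `is_altered` are hypothetical. The sketch assumes linear-scale (not log-transformed) expression values and treats increases and decreases symmetrically, so that a halving and a doubling both count as a 2x change; the text does not specify how decreased expression is quantified.

```python
def fold_change(measured: float, baseline: float) -> float:
    """Symmetric fold change between a measured level and its baseline.

    Assumes positive, linear-scale expression values. A value of 2.0 means
    expression either doubled or halved relative to baseline.
    """
    if measured <= 0 or baseline <= 0:
        raise ValueError("expression levels must be positive")
    ratio = measured / baseline
    return ratio if ratio >= 1 else 1 / ratio


def is_altered(measured: float, baseline: float, min_fold: float = 1.5) -> bool:
    """True if expression changed by at least min_fold in either direction."""
    return fold_change(measured, baseline) >= min_fold
```

The default of 1.5x is one of the example thresholds listed in the text; any of the other stated values (1.1x, 2x, 3x and so on) could be passed as `min_fold` instead.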
[00047] In some embodiments, the method may involve an individual reference level for each gene. Altered expression of at least one of the genes, preferably two or more of the genes, relative to their respective reference levels may indicate acute symptoms of an influenza-like or other respiratory disease or predict developing acute symptoms of an influenza-like or other respiratory disease in accordance with the present invention.
[00048] In some embodiments, the reference level for the, or each, gene may be a previously measured expression level for the gene in the same subject. In particular, the reference level for the, or each, gene may comprise a baseline expression level of the gene for the subject which is measured at a time when the subject is known not to be infected with a respiratory virus such, for example, as influenza. Where previous expression levels for the one or more genes, measured on more than one previous occasion, are available for a subject, the reference level for each gene may comprise an average of multiple previous levels.
[00049] Thus, in some circumstances, a subject may be tested once to obtain baseline levels for the one or more genes, which form reference levels that may be used subsequently in case of suspected viral infection or a routine check, for comparison with contemporaneous expression levels to predict whether or not the subject is likely to develop acute symptoms of an influenza-like or other respiratory disease.
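The subject-specific reference levels described in paragraphs [00048] and [00049] might be derived as in the following Python sketch, which averages each gene's level over all previous measurements. The function name `reference_levels` and the dictionary layout are hypothetical; the arithmetic mean is used here as one reading of "an average of multiple previous levels".

```python
from statistics import mean
from typing import Dict, List


def reference_levels(previous: List[Dict[str, float]]) -> Dict[str, float]:
    """Average each gene's expression level over earlier measurements.

    `previous` holds one dict (gene symbol -> expression level) per earlier
    occasion on which the subject was known not to be infected. Only genes
    measured on every occasion are kept.
    """
    if not previous:
        raise ValueError("at least one previous measurement is required")
    genes = set(previous[0])
    for measurement in previous[1:]:
        genes &= set(measurement)
    return {gene: mean(m[gene] for m in previous) for gene in sorted(genes)}
```

A single baseline test, as in paragraph [00049], is the special case where `previous` contains one measurement and the reference level equals that measurement.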
[00050] Exposure to a respiratory virus, or possible exposure to a respiratory virus, includes any contact or possible contact with a respiratory virus, including exposure to a community-acquired respiratory viral infection, and exposure to a respiratory virus at home or within a care home, hospital or military setting. Exposure also includes inoculation of subjects during a human challenge model and/or clinical trial.
[00051] As explained above, it is not always possible to ascertain when a subject has been exposed to a respiratory virus. Furthermore, different subjects exhibit symptoms at different time points, with some subjects exhibiting symptoms earlier than other subjects. Therefore the progression of an influenza-like or other respiratory disease may be measured on a relative scale and may be referred to as early, mid or late stage progression towards a possible presentation of acute symptoms of an influenza-like or other respiratory disease. In other circumstances, for example following inoculation in a human challenge model, the precise timing of the exposure to a respiratory virus is known and it is possible to measure time from exposure in hours, for example up to 25 hours (for example 13-25 hours) after exposure.
[00052] As explained in the Figures and the Examples below, a single gene panel, containing one or more genes, may be analysed using a first algorithm. Alternatively, a combination of gene panels and gene-sub panels may be analysed. The different gene panels and gene sub-panels may be analysed simultaneously, for example using the same biological sample or samples obtained within a similar, or the same, time frame. The different gene panels and gene sub-panels may be analysed sequentially with a first gene panel or gene sub-panel analysed in a sample taken at a first time point using a first algorithm, and a second gene panel or gene sub-panel analysed in a sample taken at a second time point using a second algorithm, and a third gene panel or gene sub-panel analysed in a sample taken at a third time point using a third algorithm etc.
[00053] As mentioned above, the biomarker of the invention may be based on a number of input variables or factors, including gene expression levels, for example the age of the subject, or other underlying conditions that a subject may suffer e.g. asthma, may be included in the variables used to calculate the biomarker. The biomarker may therefore be a composite biomarker. The output of the biomarker may be a numerical value. The numerical value may be determined using a threshold, reference level, or baseline level, for example a numerical value above a certain reference level predicts that a subject will develop acute symptoms of an influenza-like or other respiratory disease.
[00054] The biomarker may be computer-generated and comprises an output variable of a classification algorithm that uses as input variables the expression levels of one or more genes in the gene panel; or one or more genes in the first gene sub-panel; or one or more genes in the second gene sub-panel; or one or more genes in the third gene sub-panel.
[00055] The classification algorithm may be configured to prioritise accuracy such that the algorithm produces the greatest number of correct predictions. Thus, the classification algorithm may be configured to prioritise Negative Predictive Value (NPV), the proportion of negative test results that are true negatives, the aim being to minimise the number of subjects predicted not to develop acute symptoms of influenza-like disease who in fact go on to develop acute symptoms of an influenza-like or other respiratory disease.
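The two quantities named in paragraph [00055] can be computed from confusion-matrix counts as in this short Python sketch. The function names are hypothetical; "positive" here means predicted (or observed) to develop acute symptoms.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Proportion of all predictions that are correct."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("no predictions to evaluate")
    return (tp + tn) / total


def negative_predictive_value(tn: int, fn: int) -> float:
    """Proportion of negative predictions that are true negatives."""
    if tn + fn == 0:
        raise ValueError("no negative predictions to evaluate")
    return tn / (tn + fn)
```

A false negative in this setting is a subject predicted not to develop acute symptoms who in fact does, so prioritising NPV corresponds to keeping `fn` small relative to `tn`.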
[00056] The classification algorithm may be derived by machine-learning from a training data-set that uses as input variables expression levels of one or more genes from the gene panel measured from a biological sample obtained from a group of subjects at a predetermined time after exposure to the respiratory virus, wherein the group of subjects is divided into two classes according to whether or not they developed acute symptoms of an influenza-like or other respiratory disease after exposure to the respiratory virus, and wherein the classification algorithm operates on the expression levels to produce an output variable that differentiates between the classes.
[00057] Numerous classification algorithms are available to those skilled in the art for classifying subjects into two or more classes based on their symptoms scores. Similarly, numerous machine learning techniques are available for using a training dataset comprising the two or more classes and their respective expression levels for the one or more genes to derive a classification algorithm that is able to classify a new subject based on their expression levels of the one or more genes. The performance of a classification algorithm built using a machine learning process may be validated using one or more known validation methods, e.g. cross-validation, and calculating statistical parameters (e.g. accuracy, sensitivity, specificity) so that the person skilled in the art can obtain a classification algorithm that is best suited for classifying subjects based on their expression levels of the one or more genes.
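As a purely illustrative Python sketch of the training and validation workflow in paragraphs [00056] and [00057], the following uses a nearest-centroid classifier with leave-one-out cross-validation. The nearest-centroid rule is a deliberately simple stand-in chosen for this sketch and is not stated in the text to be the algorithm used; all function names are hypothetical. Each sample is a list of expression levels for the chosen genes, and each label is `True` for subjects who developed acute symptoms.

```python
from statistics import mean
from typing import Dict, List


def train_centroids(samples: List[List[float]], labels: List[bool]) -> Dict[bool, List[float]]:
    """Compute the mean expression vector of each class."""
    centroids = {}
    for cls in sorted(set(labels)):
        rows = [s for s, y in zip(samples, labels) if y == cls]
        centroids[cls] = [mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids: Dict[bool, List[float]], sample: List[float]) -> bool:
    """Assign the class whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(cls: bool) -> float:
        return sum((a - b) ** 2 for a, b in zip(sample, centroids[cls]))
    return min(centroids, key=sq_dist)


def leave_one_out(samples: List[List[float]], labels: List[bool]) -> Dict[str, float]:
    """Hold out each subject in turn, train on the rest, and score the result."""
    tp = tn = fp = fn = 0
    for i, (held_out, truth) in enumerate(zip(samples, labels)):
        model = train_centroids(samples[:i] + samples[i + 1:],
                                labels[:i] + labels[i + 1:])
        guess = predict(model, held_out)
        if truth and guess:
            tp += 1
        elif truth:
            fn += 1
        elif guess:
            fp += 1
        else:
            tn += 1
    return {
        "accuracy": (tp + tn) / len(samples),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }
```

The same validation loop could wrap any other classifier; only `train_centroids` and `predict` would need to be replaced.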
[00058] The acute symptoms of an influenza-like or other respiratory disease in the subjects of the group in the training data set may be assessed by evaluating one or more symptoms of influenza-like or other respiratory disease at a series of pre-set times after exposure to the respiratory virus. The one or more symptoms are evaluated by the subjects using diary cards, optionally visual analogue scale symptom diary cards (VAS), or optionally categorical symptoms (CAT) are recorded using a modified standardized symptom score, for example the modified Jackson Score. The symptoms evaluated may include runny nose, stuffy nose, sore throat, sneezing, earache, cough, shortness of breath, headache, malaise, myalgia, muscle and/or joint aches, chilliness, and feverishness.
[00059] The first class of subjects may record a total VAS of greater than or equal to 25 units and/or a total CAT score of 10 units or greater, or may show one or more of: greatest variance in total VAS or CAT up to the peak of symptoms; greatest variance in total VAS or CAT over the duration of quarantine; or steepest gradient (slope of regression line) of total VAS or CAT up to the peak of symptoms.
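The class criteria above can be sketched in a few lines. This is a minimal illustration only: the function names, the use of sample variance, and the ordinary least-squares slope are assumptions, and the source does not state how variance or gradient were actually computed.

```python
from statistics import variance


def trajectory_features(times, totals):
    """Summarise one subject's total symptom score series (VAS or CAT).

    `times` are hours after exposure, `totals` the total score at each time.
    Sample variance and a least-squares slope are assumed here.
    """
    peak_idx = max(range(len(totals)), key=totals.__getitem__)  # first peak
    t, y = times[: peak_idx + 1], totals[: peak_idx + 1]
    n = len(t)
    slope, var_to_peak = 0.0, 0.0
    if n >= 2:
        t_mean, y_mean = sum(t) / n, sum(y) / n
        sxx = sum((ti - t_mean) ** 2 for ti in t)
        sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
        slope = sxy / sxx if sxx else 0.0
        var_to_peak = variance(y)
    return {
        "peak": totals[peak_idx],
        "variance_to_peak": var_to_peak,
        "variance_overall": variance(totals) if len(totals) >= 2 else 0.0,
        "slope_to_peak": slope,
    }


def in_first_class(peak_vas=None, peak_cat=None, vas_cut=25, cat_cut=10):
    """Apply the 'total VAS >= 25 and/or total CAT >= 10' criterion."""
    vas_hit = peak_vas is not None and peak_vas >= vas_cut
    cat_hit = peak_cat is not None and peak_cat >= cat_cut
    return vas_hit or cat_hit


# Made-up series for illustration; not data from the studies.
features = trajectory_features([0, 12, 24, 36], [0, 4, 30, 10])
print(features["peak"], features["slope_to_peak"])
```

The thresholds are keyword arguments so that a different cut-off can be tried without changing the function.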
[00060] Typically, machine learning processes and the resulting classification algorithms may be carried out using a computer.
[00061] The gene panels and gene sub-panels may be selected by i) analysing expression levels in biological samples obtained from the group of subjects in the training data set across the whole series of pre-set times after exposure to the virus; ii) identifying genes that show a nominal association with acute symptoms of an influenza-like or other respiratory disease; and iii) using a variable selection process to select panels of the identified genes whose expression levels at a predetermined time after exposure to the virus exhibit maximal predictive value for developing acute symptoms of an influenza-like or other respiratory disease.
[00062] The variable selection process may comprise subjecting the expression levels of the identified genes at the predetermined time after exposure to the respiratory virus to a repeated gradient boosting process and selecting a set of 1, 2, 3, 4, 5 or 6 genes that are selected most frequently by the gradient boosting process. A variable selection process is illustrated in FIG. 6 of the accompanying drawings and is performed by a gradient boosting machine (GBM; Friedman 2001; Friedman 2002). In the context of the present invention, differential expression analysis of genes between subjects that developed acute symptoms of influenza-like or other respiratory disease and subjects that did not develop acute symptoms of influenza-like or other respiratory disease was performed by application of a cubic p-spline model.
Nominal associations arising from the cubic spline analysis were input into a variable selection process comprising a gradient boosting machine, and iterative searches were conducted using fifty starting points (seeds), to determine the best gene predictors of developing acute symptoms of influenza-like or other respiratory disease.
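The "selected most frequently across seeds" step can be sketched as a simple tally. In this hedged illustration, `most_frequent_genes` and `toy_selector` are hypothetical names, and `toy_selector` is a random stand-in for one boosting run, not a gradient boosting machine; a real analysis would pass in a function that fits a GBM with the given seed and returns the genes it retains.

```python
from collections import Counter
import random


def most_frequent_genes(candidates, select_once, n_seeds=50, panel_size=6):
    """Run a per-seed selector n_seeds times and rank genes by how often
    they were selected. Ties are broken alphabetically."""
    counts = Counter()
    for seed in range(n_seeds):
        counts.update(select_once(candidates, seed))
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:panel_size]


def toy_selector(candidates, seed):
    """Stand-in for one seeded boosting run (NOT a GBM): always keeps the
    first two candidates and adds two others chosen at random."""
    rng = random.Random(seed)
    return candidates[:2] + rng.sample(candidates[2:], 2)


genes = ["G1", "G2", "G3", "G4", "G5", "G6", "G7", "G8"]  # placeholder names
print(most_frequent_genes(genes, toy_selector, n_seeds=50, panel_size=2))
```

Passing the selector in as a callable keeps the tally independent of whichever boosting library is used.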
[00063] The biomarker may be used to allocate subjects to groups in a clinical trial or to make treatment decisions. Subjects allocated to one subgroup are administered a medicament while those allocated to another subgroup are not administered a medicament or do not receive a medicament until later in the trial or study. The biomarker may also be used to monitor the efficacy of a medicament by assessing whether a subject to whom the medicament has been administered is likely to develop acute symptoms of disease after exposure, or possible exposure, to a respiratory virus. The medicament may comprise a therapeutic agent or a preventative agent such, for example, as a vaccine.
[00064] Thus, the present invention provides a method of predicting whether a subject will develop acute or complicated symptoms of disease after exposure, or possible exposure, to a respiratory virus which comprises analysing a biomarker according to the invention and comparing the biomarker to a reference for the biomarker.
[00065] The present invention also provides a method of conducting a clinical trial or field study in which a group of subjects are exposed to a respiratory virus, the method comprising analysing a biomarker according to the invention for each subject and comparing the biomarker to a reference for the biomarker to predict whether the subject is likely to develop acute symptoms of disease, and including subjects who are predicted to develop acute symptoms of disease in a first subgroup of the clinical trial or field study and including subjects who are predicted not to develop acute symptoms of disease in a second subgroup.
[00066] As explained above, the methods of the invention may include comparing the biomarker to a reference for the biomarker or to a baseline for the biomarker. The baseline for the biomarker may be determined at a time when the subject is known not to be infected with a respiratory virus. The disease may be an influenza-like or other respiratory disease.
[00067] A medicament includes all substances used for medical treatment and includes vaccines, drugs, placebos and investigational medicaments, for example investigational medicaments that are the subject of a clinical trial. Therefore medicament includes licensed, unlicensed and investigational medicaments.
Medicament also includes products that already have a marketing authorisation but that are being tested for a different use, or for efficacy when assembled in a different way, or tested to gain further information about the authorised use.
[00068] The invention also provides a computer program for predicting whether a subject will develop acute symptoms of disease such, for example, as an influenza-like disease after exposure, or possible exposure, to a respiratory virus, which comprises instructions which, when the program is executed by a computer, cause the computer to generate a biomarker according to the invention.
[00069] The invention further provides a classification algorithm for predicting whether a subject will develop acute symptoms of disease such, for example, as an influenza-like or other respiratory disease after exposure, or possible exposure, to a respiratory virus, wherein the classification algorithm is derived by analysing expression levels of one or more genes in subjects who have developed acute symptoms of disease and comparing with the expression levels in subjects who do not develop acute symptoms of disease, wherein the one or more genes are PHF20, ABCA1, APBA2, MORC2, SNU13, DCUN1D2, MAX, NOL9, MPRIP, HP, BST1, TM9SF2, HOMER3, NSUN6, EPHA4 and BMP2K.
[00070] The invention further provides a computer readable medium and/or computer program comprising instructions which, when executed by a computer, cause the computer to carry out the classification algorithm according to the invention.
[00071] The invention also provides a computer-implemented method for predicting whether a subject will develop acute symptoms of disease such, for example, as an influenza-like disease, wherein a biomarker is generated by analysing expression levels of one or more genes in subjects who have developed acute symptoms of disease following inoculation with a respiratory virus and comparing with the expression levels in subjects who do not develop acute symptoms of disease following inoculation with a respiratory virus, wherein the one or more genes are PHF20, ABCA1, APBA2, MORC2, SNU13, DCUN1D2, MAX, NOL9, MPRIP, HP, BST1, TM9SF2, HOMER3, NSUN6, EPHA4 and BMP2K.
[00072] The invention also provides a computer-implemented method, wherein the method comprises a graphical user interface which displays the biomarker to the user. It is also contemplated that the computer-implemented aspects of the invention may be carried out by more than one computer, e.g. two or more computers operating in different locations. Two or more computers may communicate via a data channel, for example the internet.
[00073] The invention also provides a method of predicting whether a subject will develop acute symptoms of disease after exposure, or possible exposure, to a respiratory virus, which comprises estimating time elapsed after the exposure, or possible exposure, to the respiratory virus by analysing expression levels of one or more genes selected from PHF20, ABCA1, APBA2, MORC2, SNU13, DCUN1D2, MAX, NOL9, MPRIP, HP, BST1, TM9SF2, HOMER3, NSUN6, EPHA4, in a biological sample obtained from the subject; selecting a biomarker described herein, which at said time exhibits maximal predictive value for developing acute symptoms of disease; and comparing the biomarker to a reference for the biomarker.
[00074] For example, in some embodiments, the time elapsed may be estimated to be about a day (i.e. about 23-26 hours, e.g. 25 hours) after exposure, or possible exposure, to the respiratory virus. If so, the selected biomarker may comprise expression levels of one or more genes from the first gene sub-panel described herein.
[00075] The time elapsed may be estimated to be about 1.5-2 days (i.e. about 37-49 hours) after exposure, or possible exposure, to the respiratory virus. If so, the selected biomarker may comprise expression levels of one or more genes from the second gene sub-panel described herein.
[00076] The time elapsed may be estimated to be about 2-2.5 days (i.e. about 49-61 hours) after exposure, or possible exposure, to the respiratory virus. If so, the selected biomarker may comprise expression levels of one or more genes from the third gene sub-panel described herein.
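The mapping from estimated elapsed time to gene sub-panel described in the three preceding paragraphs can be sketched as a lookup. The window bounds are the "about" figures quoted above; the function name and the decision to return every matching sub-panel (so that the shared 49-hour boundary stays visible rather than being silently resolved) are assumptions of this sketch.

```python
# (sub-panel label, lower bound in hours, upper bound in hours),
# taken from the approximate windows quoted in the text.
SUBPANEL_WINDOWS = [
    ("first", 23, 26),
    ("second", 37, 49),
    ("third", 49, 61),
]


def subpanels_for_elapsed_hours(hours):
    """Return the labels of all sub-panels whose window contains `hours`.

    An empty list means the estimate falls outside every quoted window.
    """
    return [name for name, lo, hi in SUBPANEL_WINDOWS if lo <= hours <= hi]


print(subpanels_for_elapsed_hours(25))
print(subpanels_for_elapsed_hours(49))
```

Because the windows are only approximate in the text, a caller would probably want to treat an empty result as "no sub-panel recommended" rather than as an error.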

[00077] The invention also provides a kit for use in a method according to the invention. The kit may comprise one or more reagents that allow detection, optionally quantitation, of one or more nucleotides, or one or more peptides, corresponding to one or more genes from the gene panel or the first, second or third gene sub-panel described herein. The kit may be for detection of one or more analytes in, or extracted from, a biological sample, such as, but not limited to, a blood, serum, plasma, urine, saliva, tissue biopsy, stool, sputum, skin, nose or throat sample. The kit may comprise a device for conducting an assay, such as a lateral flow assay. The device may be configured for autonomous use by a patient (without assistance from a physician), for example in the home (as opposed to a hospital or other medical facility). The device may comprise a strip of porous material, which is capable of supporting capillary flow, wherein there is a zone for receiving a sample; a zone comprising a reagent for detection of an analyte; a detection zone; and a control zone. In use, a reaction between the reagent and the analyte may be detected in the detection zone, for example by a change in colour of material in the detection zone. In use, the control zone may serve as a reference against which to benchmark the reaction detected in the detection zone. The device may comprise more than one test strip, for example the device may comprise two, three, four, five or six test strips, in communication with the same or separate receiving zones.
[00078] Literature references to and sequence listings for the above-mentioned genes are included at the end of this description. It will be understood that the reference and sequence listings necessarily disclose specific alleles and are included by way of example only. The invention is not limited to the use of such specific alleles, but may also be implemented using products of expression of different variants of the one or more genes.
Brief description of the drawings [00079] Following is a description by way of example only with reference to the accompanying drawings of embodiments of the present invention.
[00080] In the drawings:
[00081] FIG. 1 is a chart showing how a subject is processed through a typical clinical study as described in Example 1 below.
[00082] FIG. 2 is a graph showing peak level of Total VAS for infected (left) and non-infected (right) subjects.
[00083] FIG. 3 is a graph showing change in maximum variance of VAS scores demonstrating that the four individuals with acute symptoms of influenza-like disease at peak all experienced a change in variance of Total VAS greater than 30 units.

[00084] FIG. 4 is a graph showing peak level of peak categorical scores for infected (left) and non-infected (right) subjects across three studies (hVIVO-1, Duke-1, Duke-2).
[00085] FIG. 5 is a principal component analysis to show greater homogeneity after imposing adjustment for study.
[00086] FIG. 6 is a flow chart to demonstrate the variable selection process performed by gradient boosting machines.
[00087] FIG. 7 is a flow diagram to demonstrate scenarios in which 1 to 3 of the algorithms are run in parallel to assign subjects to groups for actioning (e.g. dosing with a medicament, additional clinical assessments). The scenarios include: within the human viral challenge model, field study, or in the community, wherein subjects are exposed to a respiratory virus.
[00088] FIG. 8 is a flow diagram to demonstrate scenarios in which each gene algorithm is used sequentially to assign subjects to groups for actioning (e.g. dosing with a medicament, additional clinical assessments) within the human viral challenge model.
Examples Example 1 [00089] As described below, subjects from three separate studies were determined to include a subset of subjects that exhibited acute symptoms of an influenza-like disease based on their peak symptom score across the quarantine.
Methods [00090] Affymetrix™ HG-U133 Plus 2.0 microarrays, run on samples taken across the whole quarantine post inoculation, were used to perform transcriptomics analysis. Differential expression analysis between subjects developing acute symptoms of an influenza-like disease and subjects that did not develop acute symptoms of an influenza-like disease was performed by application of a cubic p-spline model.
Nominal associations arising from the cubic spline analysis were input into a variable selection process to determine the best six gene predictors of acute symptoms of influenza-like disease at Day 1 morning, Day 2 morning, and Day 2 evening after inoculation. These genes could be used to distinguish between subjects that developed acute symptoms of an influenza-like disease and subjects that did not develop acute symptoms of an influenza-like disease within the model at different times after exposure to virus.
[00091] Data from three separate studies were combined for this analysis: the largest was run by hVIVO (hVIVO-1), and the other two were publicly available studies also run in the hVIVO challenge model with Duke University Medical Centre (Duke-1 (Zaas et al 2009) and Duke-2 (Woods et al 2013); indexed in GEO as GSE52428):
Table 1 - For each study the following are shown: virus used, the number of subjects, the microarray platform used, the time points at which PAXGene blood samples were taken, the methods used in diary cards to measure symptoms (VAS = Visual analogue scale, CAT = modified Jackson score/categorical), the number of diary cards taken a day and the method used to confirm influenza infection.
Study | hVIVO-1 | Duke-1 | Duke-2
Virus | H3N2 (A/Perth/9/2009) | H3N2 (A/Wisconsin/67/2005) | H1N1 (A/Brisbane/59/2007)
Volunteers | 27 | 17 | 24
Transcriptomics | Affymetrix U133A 2.0 Array | Affymetrix U133A 2.0 Array | Affymetrix U133A 2.0 Array
Transcriptomics time points | -24h, pre-inoc., every 12 hours | -24h, pre-inoc., every 8 hours | -24h, pre-inoc., every 12 hours
Clinical data | VAS + CAT, 12 symptoms | CAT, 10 symptoms | CAT, 10 symptoms
Clinical data time points | Twice per day | Twice per day | Twice per day
Infection status | PCR + TCID50 | TCID50 | TCID50
[00092] For hVIVO-1, 60 healthy volunteers were inoculated intranasally with influenza A H3N2 Perth/16/2009, 27 of whom were used for this analysis. For Duke-1, 17 healthy volunteers were inoculated intranasally with influenza A H3N2 A/Wisconsin/67/2005. For Duke-2, 24 healthy volunteers were inoculated intranasally with influenza A H1N1 A/Brisbane/59/2007. All volunteers provided informed consent and underwent extensive pre-enrolment health screening (FIG. 1) and any with significant baseline antibodies to the strain of influenza utilised were excluded. After approximately 48 hours in quarantine (approximately mid-day on study day 0), a predetermined dose of influenza A was instilled into bilateral nares of subjects using standard pipetting methods.
The volunteers had clinical measurements and samples taken until discharged from quarantine and then at each follow up visit.
[00093] In hVIVO-1, 33 of the 60 subjects became infected after inoculation (evidenced by confirmed viral shedding), 25 were identified as not infected and 2 inconclusive. An interim analysis was performed after the first 27 were inoculated and all samples for each subject were sent for gene microarray assays.
One of the 27 subjects did not complete the quarantine and so was excluded from analysis. Of the 26 subjects with viable microarray data, 13 were confirmed as infected, 11 as not infected, and 2 were inconclusive.
[00094] In Duke-1, 9 of the 17 subjects became infected after inoculation (evidenced by confirmed viral shedding) and 8 were identified as not infected. Four dilutions were used (6.41 TCID50/mL, 5.25 TCID50/mL, 4.41 TCID50/mL and 3.08 TCID50/mL) with four to five subjects receiving each dose.
[00095] In Duke-2, 9 of the 24 subjects became infected after inoculation (evidenced by confirmed viral shedding) and 15 were identified as not infected. Four dilutions were used (2.35 TCID50/mL, 1.8 TCID50/mL, 1.25 TCID50/mL and 1.4 TCID50/mL) with four to six subjects receiving each dose. One subject was excluded due to a secondary infection.
[00096] Subjects had influenza infection confirmed based on qualitative viral culture and quantitative influenza RT-PCR data from epithelial lining fluid for the hVIVO study.
Epithelial lining fluid was collected from nasopharyngeal FLOQ swabs twice daily (starting on Day 1 morning, first sample approximately 20 hrs post inoculation). Nasal collection continued throughout the duration of the quarantine. For the Duke studies, infection status of the subjects was obtained from Woods et al 2013.
[00097] Subjects self-assessed their symptoms three times daily throughout quarantine on both categorical and continuous (Visual Analogue Scale, VAS) symptom diary cards. Categorical symptoms were recorded using a modified standardized symptom score. The modified Jackson Score requires subjects to rank 10 symptoms consisting of: upper respiratory tract symptoms (runny nose, stuffy nose, sore throat, sneezing, and earache), lower respiratory symptoms (cough and shortness of breath) and systemic symptoms (headache, myalgia, and muscle and/or joint aches) on a scale of 0-3 of "no symptoms", "just noticeable", "bothersome but can still do activities" and "bothersome and cannot do daily activities".
hVIVO-1 included wheeze and chilliness/feverishness in addition to the 10 symptoms. Additionally, shortness of breath at rest and wheeze at rest were also recorded using an additional grade for these symptoms only (grade 4 = symptoms at rest). VAS symptoms were measured along a 10 cm line; measurements were made in mm.
[00098] To determine which subjects were considered to have significant symptoms, subjects were initially identified in hVIVO-1 using the VAS data, which was recorded alongside the categorical score.
Total VAS score was calculated for each time-point during quarantine for all fifty-eight evaluable participants from hVIVO-1. Peak VAS score was determined for each participant, and it was observed that the range in non-infected individuals was 0-20 units. Amongst the infected set, however, four individuals experienced Total VAS > 25 units (FIG. 2). The four individuals were distinguishable from other participants in additional ways: greatest variance in Total VAS up to the peak of symptoms, greatest variance in Total VAS over the duration of quarantine, and steepest gradient (slope of regression line) of Total VAS up to the peak of symptoms.
[00099] In order to address the question of whether the four could have been identified early in quarantine, piece-wise estimates of variance and change in variance were calculated for the period of quarantine.
[000100] It was observed that the four individuals with most severe symptoms at peak all experienced a change in variance of Total VAS greater than 30 units (FIG. 3).
Remaining individuals had maximum change in variance less than 25 units. Furthermore, the change was observed consistently at Day 2, time-point 2, which in three of the four instances, was in advance of symptom peak (Table 2).
Table 2
Individual | Change in Variance at Day 2; TPT 2 | Peak Time-point
1 | 35.64 | Day 3; TPT 1
2 | 88.14 | Day 2; TPT 2
3 | 48.83 | Day 2; TPT 3
4 | 70.57 | Day 2; TPT 3
[000101] Upon the inclusion of Duke-1 and Duke-2 in the analysis, the severity score had to be adapted for use with the categorical symptom score because VAS was not available in Duke-1 and Duke-2. An analysis similar to that conducted for VAS (above) was performed for categorical symptom score, for the three studies. It was observed that across the three studies, 11 individuals had peak total categorical score greater than or equal to 10 units (10 subjects with microarray data). These included the four members of hVIVO-1 already discussed. Four individuals from the H1N1 study, and three individuals from the H3N2 study also passed this threshold (FIG. 4). As with VAS, sudden changes in the variance of categorical score were associated with acute symptoms of an influenza-like disease.
[000102] At predetermined intervals, blood was collected into RNA PAXGene™ collection tubes. This occurred once on Day -1, in the morning on the day of inoculation (approximately 5 hours before inoculation), followed by every 12 hours for hVIVO-1 and Duke-2 and every 8 hours for Duke-1 for the remainder of the quarantine.
[000103] All three studies used GeneChip® Human Genome U133A 2.0 Array (Affymetrix, Santa Clara, CA) for the microarray. Microarray data for both Duke studies was obtained from Liu et al 2016, which covers both studies. The public data comprised 22,277 probe-sets, a subset of the 54,675 probe-sets available for hVIVO-1.
[000104] Principal Components Analysis (PCA) showed that the three transcriptomics data sets differed systematically (FIG. 5). It was concluded that direct pooling would lead to spurious results. An adjustment for study was therefore applied to the transcriptomics measurements and the PCA repeated (FIG. 5).
[000105] In order to exclude non-informative probe-sets, two groups namely, severe and non-severe were considered. For each molecule in each group, filter ratios were calculated reflecting variability over time, and variability across individuals. Under recommended thresholds, 13,806 probe-sets were informative in at least one group, and were explored further.
[000106] Differential expression analysis between subjects that developed acute symptoms of an influenza-like disease (n=10) and those that did not develop acute symptoms of an influenza-like disease (n=56) was performed by application of a cubic p-spline model (Straube et al).
A test was applied for group x time interaction, and a total of 1052 transcripts had q<0.05 after adjustment for False Discovery Rate (FDR) (Benjamini et al 1995).
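The False Discovery Rate adjustment cited above (Benjamini et al 1995) is commonly computed with the Benjamini-Hochberg step-up procedure. The sketch below is a plain-Python illustration of that standard procedure under the assumption that it is the adjustment meant; the function name and the example p-values are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values).

    q for the p-value of rank r (1 = smallest) among m tests is
    min over ranks >= r of p * m / rank, capped at 1.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the running minimum
    # enforces monotonic q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q


# Illustrative p-values only.
p = [0.01, 0.04, 0.03, 0.005]
q = benjamini_hochberg(p)
significant = [i for i, qi in enumerate(q) if qi < 0.05]
print(q, significant)
```

Transcripts would then be retained when their q-value is below the chosen threshold (0.05 in the text).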
[000107] To develop a molecular signature, nominal associations arising from cubic p-spline analysis were input into a variable selection process to determine the best predictors of developing acute symptoms of an influenza-like disease at three time-points post inoculation: Day 1, Day 2 morning, and Day 2 evening (approximately 13 to 25, 37 to 49 and 49 to 61 hours post inoculation respectively).
[000108] Variable selection was performed by gradient boosting machines (GBM; Friedman 2001; Friedman 2002), and the number of molecules to be selected was limited to six for best results. FIG. 6 shows the process that was followed.
[000109] Logistic regression was applied to produce prediction models (signatures or gene panels or gene sub-panels) for the variables selected. Signature performance (sensitivity, specificity, positive and negative predictive value) was determined at the time-point on which the model was based, and at all other time-points considered.
Sensitivity: the proportion of cases that receive a positive test result.
Specificity: the proportion of non-cases that receive a negative test result.
Positive predictive value (PPV): the proportion of positive test results that are true positives.
Negative predictive value (NPV): the proportion of negative test results that are true negatives.
The AUC (area under the receiver operating characteristic (ROC) curve) was also determined.
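These definitions translate directly into arithmetic on a 2x2 confusion matrix. The sketch below is illustrative: the function name is hypothetical, and the example counts are back-calculated by the editor so that they reproduce the rounded Day1AM-on-Day1AM row of Table 5 (9 cases, N = 62); the counts themselves are not reported in the source.

```python
def performance_metrics(tp, fp, tn, fn):
    """Compute test performance from confusion-matrix counts.

    tp/fn: cases testing positive/negative;
    fp/tn: non-cases testing positive/negative.
    PPV or NPV is None when no test result of that sign exists.
    """
    n = tp + fp + tn + fn
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp) if (tp + fp) else None,
        "npv": tn / (tn + fn) if (tn + fn) else None,
        "accuracy": (tp + tn) / n,
        "propn_pos": (tp + fp) / n,
    }


# Back-calculated illustration: 9 cases and 53 non-cases.
m = performance_metrics(tp=8, fp=5, tn=48, fn=1)
print({k: round(v, 2) for k, v in m.items()})
```

Returning None for an undefined PPV mirrors the "NA" entries that appear in the performance tables when no subject tests positive.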
Day 1 (AM) Signature [000110] Table 3 shows the variables selected by gradient boosting, Table 4 shows the signature arising from logistic regression and lastly, Table 5 shows the test performance characteristics at all time-points considered. It can be seen that the signature, or gene sub-panel, performs well (AUC > 0.80) on Day 1 data.
[000111] This signature, or gene sub-panel, includes the genes PHF20, ABCA1, APBA2, MORC2, SNU13, and DCUN1D2.
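A logistic-regression signature of this kind turns six probe-set values into a probability through a linear predictor. The sketch below uses the Table 4 estimates as coefficients; the function names are hypothetical, the scale of the input values (for example study-adjusted log expression) is not stated in the source, and treating the "Cut" column of Table 5 as a probability threshold is an assumption.

```python
import math

# Estimates from Table 4 (Day 1 (AM) signature); gene symbols from Table 3.
DAY1_AM_COEFFICIENTS = {
    "(Intercept)": -2.03,
    "203504_s_at": -5.02,  # ABCA1
    "209422_at": -7.49,    # PHF20
    "216863_s_at": -1.48,  # MORC2
    "201076_at": -5.42,    # SNU13
    "219116_s_at": -6.21,  # DCUN1D2
    "209870_s_at": -6.88,  # APBA2
}


def signature_probability(expression, coefficients=DAY1_AM_COEFFICIENTS):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    eta = coefficients["(Intercept)"]
    for probe, beta in coefficients.items():
        if probe != "(Intercept)":
            eta += beta * expression[probe]
    return 1.0 / (1.0 + math.exp(-eta))


def predicted_positive(expression, cut):
    """Call a subject positive when the probability reaches the cut-off."""
    return signature_probability(expression) >= cut


# Hypothetical input: all probe-sets at zero except a lowered ABCA1 value.
example = {probe: 0.0 for probe in DAY1_AM_COEFFICIENTS if probe != "(Intercept)"}
example["203504_s_at"] = -0.5
print(signature_probability(example), predicted_positive(example, cut=0.24))
```

All six slope estimates are negative, so under this model lower values of each probe-set push the predicted probability upwards.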
Table 3: Variables selected at Day 1 (AM)
Variable | Symbol | Entrez Gene Name | Rel. Inf. | Times Selected
209422_at | PHF20 | PHD finger protein 20 | 23.6 | 49
203504_s_at | ABCA1 | ATP binding cassette subfamily A member 1 | 19.03 |
209870_s_at | APBA2 | amyloid beta precursor protein binding family A member 2 | 17.34 | 19
216863_s_at | MORC2 | MORC family CW-type zinc finger 2 | 16.53 |
201076_at | SNU13 | small nuclear ribonucleoprotein 13 | 14.25 | 48
219116_s_at | DCUN1D2 | defective in cullin neddylation 1 domain containing 2 | 9.26 | 37
Table 4: Day 1 (AM) signature
Variable | Estimate | Std. Error | z value | Pr(>|z|)
(Intercept) | -2.03 | 0.65 | -3.14 | 0.001679
X203504_s_at | -5.02 | 2.17 | -2.32 | 0.020531
X209422_at | -7.49 | 6.43 | -1.16 | 0.244544
X216863_s_at | -1.48 | 6.33 | -0.23 | 0.815441
X201076_at | -5.42 | 9.01 | -0.60 | 0.547252
X219116_s_at | -6.21 | 4.78 | -1.30 | 0.193459
X209870_s_at | -6.88 | 4.98 | -1.38 | 0.167406
Table 5: Test performance of Day 1 (AM) signature
Train | Test | Cases | N | AUC | Optimise | Cut | Propn.Pos | Sensitivity | Specificity | Accuracy | PPV | NPV
Day1AM | Baseline | 10 | 63 | 0.71 | Accuracy | NA | NA | 0.00 | 1.00 | 0.84 | NA | 0.84
 | | | | | NPV | 0.02 | 0.68 | 0.90 | 0.36 | 0.44 | 0.21 | 0.95
Day1AM | Day1AM | 9 | 62 | 0.92 | Accuracy | 0.24 | 0.21 | 0.89 | 0.91 | 0.90 | 0.62 | 0.98
 | | | | | NPV | 0.24 | 0.21 | 0.89 | 0.91 | 0.90 | 0.62 | 0.98
Day1AM | Day1PM | 10 | 62 | 0.63 | Accuracy | 0.64 | 0.08 | 0.30 | 0.96 | 0.85 | 0.60 | 0.88
 | | | | | NPV | 0.03 | 0.77 | 0.90 | 0.25 | 0.35 | 0.19 | 0.93
Day1AM | Day2AM | 10 | 65 | 0.81 | Accuracy | 1.00 | 0.02 | 0.10 | 1.00 | 0.86 | 1.00 | 0.86
 | | | | | NPV | 0.04 | 0.69 | 0.90 | 0.35 | 0.43 | 0.20 | 0.95
Day1AM | Day2PM | 10 | 66 | 0.89 | Accuracy | 0.81 | 0.21 | 0.80 | 0.89 | 0.88 | 0.57 | 0.96
 | | | | | NPV | 0.81 | 0.21 | 0.80 | 0.89 | 0.88 | 0.57 | 0.96
Day 2 (AM) Signature [000112] Table 6 shows the variables selected by gradient boosting, Table 7 shows the signature arising from logistic regression and lastly, Table 8 shows the test performance characteristics at all time-points considered. The signature performs well at Day 2 (AM).

[000113] This signature, or gene sub-panel, includes the genes MAX, NOL9, MPRIP, HP, BST1, TM9SF2.
Table 6: Variables selected at Day 2 (AM)
Variable | Symbol | Entrez Gene Name | Rel. Inf. | Times Selected
214108_at | MAX | MYC associated factor X | 37.82 | 50
218754_at | NOL9 | nucleolar protein 9 | 19.05 | 50
212197_x_at | MPRIP | myosin phosphatase Rho interacting protein | 14.28 | 50
208470_s_at | HP | haptoglobin | 11.48 |
205715_at | BST1 | bone marrow stromal cell antigen 1 | 9.39 | 48
201078_at | TM9SF2 | transmembrane 9 superfamily member 2 | 7.97 | 30
Table 7: Day 2 (AM) signature
Variable | Estimate | Std. Error | z value | Pr(>|z|)
(Intercept) | -3.28 | 0.90 | -3.63 | 0.000280
X212197_x_at | 1.07 | 7.98 | 0.13 | 0.893670
X214108_at | 6.88 | 3.87 | 1.78 | 0.075326
X218754_at | -0.01 | 7.01 | 0.00 | 0.999142
X205715_at | -2.08 | 4.32 | -0.48 | 0.629615
X208470_s_at | 4.75 | 2.81 | 1.69 | 0.090620
X201078_at | 7.39 | 10.83 | 0.68 | 0.494829
Table 8: Test performance of Day 2 (AM) signature
Train | Test | Cases | N | AUC | Optimise | Cut | Propn.Pos | Sensitivity | Specificity | Accuracy | PPV | NPV
Day2AM | Baseline | 10 | 63 | 0.58 | Accuracy | 0.48 | 0.02 | 0.10 | 1.00 | 0.86 | 1.00 | 0.85
 | | | | | NPV | 0.01 | 0.68 | 0.90 | 0.36 | 0.44 | 0.21 | 0.95
Day2AM | Day1AM | 9 | 62 | 0.58 | Accuracy | NA | NA | 0.00 | 1.00 | 0.85 | NA | 0.85
 | | | | | NPV | 0.01 | 0.74 | 0.89 | 0.28 | 0.37 | 0.17 | 0.94
Day2AM | Day1PM | 10 | 62 | 0.71 | Accuracy | NA | NA | 0.00 | 1.00 | 0.84 | NA | 0.84
 | | | | | NPV | 0.02 | 0.68 | 0.90 | 0.37 | 0.45 | 0.21 | 0.95
Day2AM | Day2AM | 10 | 65 | 0.91 | Accuracy | 0.64 | 0.08 | 0.50 | 1.00 | 0.92 | 1.00 | 0.92
 | | | | | NPV | 0.28 | 0.18 | 0.70 | 0.91 | 0.88 | 0.58 | 0.94
Day2AM | Day2PM | 10 | 66 | 0.87 | Accuracy | 0.64 | 0.11 | 0.50 | 0.96 | 0.89 | 0.71 | 0.92
 | | | | | NPV | 0.43 | 0.18 | 0.70 | 0.91 | 0.88 | 0.58 | 0.94
Day 2 (PM) Signature [000114] Table 9 shows the variables selected by gradient boosting, Table 10 shows the signature arising from logistic regression and lastly, Table 11 shows the test performance characteristics at all time-points considered.
[000115] This signature, or gene sub-panel, includes the genes HOMER3, NSUN6, HP, EPHA4, BMP2K.

Table 9: Variables selected at Day 2 (PM)
Variable | Symbol | Entrez Gene Name | Rel. Inf. | Times Selected
215489_x_at | HOMER3 | homer scaffolding protein 3 | 30.73 |
214541_s_at | QKI | QKI, KH domain containing RNA binding | 23.43 |
222128_at | NSUN6 | NOP2/Sun RNA methyltransferase family member 6 | 14.75 | 50
208470_s_at | HP | haptoglobin | 12.64 |
206114_at | EPHA4 | EPH receptor A4 | 12.05 |
59644_at | BMP2K | BMP2 inducible kinase | 6.4 |
Table 10: Day 2 (PM) signature
Variable | Estimate | Std. Error | z value | Pr(>|z|)
(Intercept) | -19.21 | 18.67 | -1.03 | 0.303403
X206114_at | -13.07 | 19.53 | -0.67 | 0.50355
X208470_s_at | 16.03 | 16.49 | 0.97 | 0.33089
X215489_x_at | 57.62 | 66.78 | 0.86 | 0.388274
X222128_at | -17.21 | 15.54 | -1.11 | 0.268238
X59644_at | 55.37 | 62.33 | 0.89 | 0.374308
Table 11: Test performance of Day 2 (PM) signature
Train | Test | Cases | N | AUC | Optimise | Cut | Propn.Pos | Sensitivity | Specificity | Accuracy | PPV | NPV
Day2PM | Baseline | 10 | 63 | 0.58 | Accuracy | 0.02 | 0.99 | 0.10 | 1.00 | 0.86 | 1.00 | 0.85
 | | | | | NPV | 0.63 | 0.00 | 0.80 | 0.40 | 0.46 | 0.20 | 0.91
Day2PM | Day1AM | 9 | 62 | 0.62 | Accuracy | NA | NA | 0.00 | 1.00 | 0.85 | NA | 0.85
 | | | | | NPV | 0.58 | 0.00 | 0.78 | 0.45 | 0.50 | 0.19 | 0.92
Day2PM | Day1PM | 10 | 62 | 0.65 | Accuracy | NA | NA | 0.00 | 1.00 | 0.84 | NA | 0.84
 | | | | | NPV | 0.68 | 0.00 | 0.90 | 0.37 | 0.45 | 0.21 | 0.95
Day2PM | Day2AM | 10 | 65 | 0.83 | Accuracy | 0.03 | 1.00 | 0.20 | 1.00 | 0.88 | 1.00 | 0.87
 | | | | | NPV | 0.38 | 0.00 | 0.80 | 0.69 | 0.71 | 0.32 | 0.95
Day2PM | Day2PM | 10 | 66 | 1.00 | Accuracy | 0.17 | 0.45 | 1.00 | 0.98 | 0.98 | 0.91 | 1.00
 | | | | | NPV | 0.17 | 0.45 | 1.00 | 0.98 | 0.98 | 0.91 | 1.00
[000116] Once the gene sub-panels were identified, it was possible to determine the minimum number of genes required to provide a good prediction of whether a subject would go on to develop acute symptoms of an influenza-like disease. An AUC value of 0.8 or higher provides a good prediction that a subject will develop acute symptoms of an influenza-like disease. The parameter with lowest relative influence was dropped from consideration and an updated signature was derived. Model performance (sensitivity, specificity, PPV and NPV) was derived for the updated model. A ROC curve was drawn and AUC was tabulated. Further model parameters were sequentially dropped in order of increasing relative influence. In this way, model performance was determined for signatures based upon 5, 4, 3, 2 and 1 genes at different time points (Tables 12 to 14). It can be seen that any of 1, 2, 3, 4, 5, or 6 genes can be predictive of developing acute symptoms of an influenza-like disease.
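The sequential dropping of genes in order of increasing relative influence yields a nested series of panels. The sketch below shows only that ordering step, using the Day 1 (AM) relative influence values from Table 3; the function name is hypothetical, and the refitting of the logistic model and the AUC calculation for each reduced panel are deliberately left out.

```python
def nested_panels(relative_influence):
    """Return gene panels of decreasing size, dropping the gene with the
    lowest relative influence at each step."""
    ranked = sorted(relative_influence, key=relative_influence.get, reverse=True)
    return [ranked[:k] for k in range(len(ranked), 0, -1)]


# Relative influence values for the Day 1 (AM) variables (Table 3).
DAY1_AM_REL_INF = {
    "PHF20": 23.6,
    "ABCA1": 19.03,
    "APBA2": 17.34,
    "MORC2": 16.53,
    "SNU13": 14.25,
    "DCUN1D2": 9.26,
}

for panel in nested_panels(DAY1_AM_REL_INF):
    print(len(panel), panel)
```

Each printed panel would then be refitted and evaluated at every test time-point to fill one column of an AUC table.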
Table 12 - Day 1 am algorithm performance: AUC values by number of genes analysed
Test set | 6 genes | 5 genes | 4 genes | 3 genes | 2 genes | 1 gene
Genes | PHF20, ABCA1, APBA2, MORC2, SNU13, DCUN1D2 | PHF20, ABCA1, APBA2, MORC2, SNU13 | PHF20, ABCA1, APBA2, MORC2 | PHF20, ABCA1, APBA2 | PHF20, ABCA1 | PHF20
Baseline | 0.71 | 0.70 | 0.69 | 0.66 | 0.54 | 0.52
Day 1 am | 0.92 | 0.90 | 0.90 | 0.90 | 0.79 | 0.74
Day 1 pm | 0.63 | 0.68 | 0.70 | 0.72 | 0.63 | 0.62
Day 2 am | 0.81 | 0.81 | 0.80 | 0.81 | 0.75 | 0.81
Day 2 pm | 0.89 | 0.90 | 0.91 | 0.91 | 0.80 | 0.86
Table 13 - Day 2 am algorithm performance: AUC values by number of genes analysed
Test set | 6 genes | 5 genes | 4 genes | 3 genes | 2 genes | 1 gene
Genes | MAX, NOL9, MPRIP, HP, BST1, TM9SF2 | MAX, NOL9, MPRIP, HP, BST1 | MAX, NOL9, MPRIP, HP | MAX, NOL9, MPRIP | MAX, NOL9 | MAX
Baseline | 0.58 | 0.59 | 0.57 | 0.60 | 0.62 | 0.58
Day 1 am | 0.58 | 0.57 | 0.56 | 0.65 | 0.67 | 0.65
Day 1 pm | 0.71 | 0.71 | 0.71 | 0.52 | 0.53 | 0.56
Day 2 am | 0.91 | 0.90 | 0.90 | 0.91 | 0.91 | 0.90
Day 2 pm | 0.87 | 0.86 | 0.85 | 0.84 | 0.85 | 0.83
Table 14 - Day 2 pm algorithm performance: AUC values by number of genes analysed
Test set | 5 genes | 4 genes | 3 genes | 2 genes | 1 gene
Genes | HOMER3, NSUN6, HP, EPHA4, BMP2K | HOMER3, NSUN6, HP, EPHA4 | HOMER3, NSUN6, HP | HOMER3, NSUN6 | HOMER3
Baseline | 0.58 | 0.54 | 0.62 | 0.59 | 0.58
Day 1 am | 0.62 | 0.70 | 0.71 | 0.75 | 0.68
Day 1 pm | 0.65 | 0.65 | 0.66 | 0.66 | 0.61
Day 2 am | 0.83 | 0.92 | 0.89 | 0.87 | 0.81
Day 2 pm | 1.00 | 0.98 | 0.97 | 0.97 | 0.95
Example 2 [000117] A group of individuals are recruited into a human viral challenge model and inoculated with a respiratory virus. It is beneficial to identify, in advance, subjects who will progress to develop acute symptoms of an influenza-like disease, allowing selective dosing of these subjects with an investigational or licensed medicament (drug/vaccine/placebo) at the earliest opportunity. This will improve the ability to detect a clinically relevant reduction in disease in response to the medicament by only evaluating the medicament effects in individuals that will/would have developed acute symptoms of an influenza-like disease. This will also reduce unnecessary exposure of subjects to an investigational medicament. This will also reduce the amount of medicament required.
[000118] Volunteers are screened for eligibility for the evaluation of efficacy of an investigational medicament in a human challenge study with a respiratory virus, in particular with influenza.
[000119] Eligible volunteers arrive at the clinic and baseline samples and clinical measures are taken, before they are exposed to virus (e.g. inoculation). Baseline values are obtained pre-inoculation using one or more blood samples over varying time-points.
[000120] Blood samples are taken regularly before and after virus exposure (e.g. PAXgene RNA samples twice a day, three times a day, or more often) alongside clinical measures of disease.
[000121] Expression levels of specific gene panels and sub-panels are measured in real time from the PAXgene blood samples. As in Example 1, gene expression in blood is assessed using Affymetrix HG-U133 Plus 2.0 microarray chips to measure transcript expression. Microarray data are pre-processed using RMA background correction and quantile normalization. One can use the absolute value of each gene at a given time point or, where a baseline gene level has been obtained and is available, the post-exposure gene levels can be baseline normalised for each subject (i.e. compared to the subject's baseline expression level for that gene, gene panel or gene sub-panel).
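The baseline normalisation described above can be sketched as follows. This is an illustrative assumption, not the procedure specified in the text: it treats expression values as log2-scale (as produced by RMA) and subtracts the mean of a subject's pre-exposure values for each gene. The function name, the dictionary layout and the example values are hypothetical.

```python
def baseline_normalise(post_exposure, baseline):
    """Express post-exposure values relative to the subject's own baseline.

    post_exposure: dict mapping gene symbol -> log2 expression at one time point.
    baseline: dict mapping gene symbol -> list of pre-exposure log2 values.
    Returns a dict mapping gene symbol -> difference from the mean baseline
    (assumed form of normalisation; the source does not fix the formula).
    """
    normalised = {}
    for gene, value in post_exposure.items():
        pre_values = baseline.get(gene)
        if not pre_values:
            raise ValueError(f"no baseline value available for {gene}")
        normalised[gene] = value - sum(pre_values) / len(pre_values)
    return normalised


# Hypothetical values for one subject (illustration only).
example = baseline_normalise(
    {"PHF20": 8.5, "ABCA1": 6.0},
    {"PHF20": [8.0, 8.2], "ABCA1": [6.5]},
)
```

Where no baseline sample exists for a subject, the absolute post-exposure values would be used instead, as the paragraph above allows.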
[000122] Three separate gene sub-panels can be used (i.e. three algorithms) to identify which individuals will develop acute symptoms of an influenza-like disease.
a. The three gene sub-panels are:
i. Subpanel A: PHF20, ABCA1, APBA2, MORC2, SNU13 and DCUN1D2
ii. Subpanel B: MAX, NOL9, MPRIP, HP, BST1 and TM9SF2
iii. Subpanel C: HOMER3, NSUN6, HP, EPHA4 and BMP2K
[000123] In one instance, the different gene sub-panels are used at different time points (FIG. 8: Subpanel A, followed by Subpanel B, followed by Subpanel C). Alternatively, instead of using the sub-panels sequentially, one, two or three gene sub-panels are used at the same time (FIG. 7), and testing may be repeated at several time points following inoculation.
a. For both approaches, a positive result for any test immediately triggers dosing of the subject with the investigational medicament (drug/vaccine/placebo).
b. Alternatively, two positive results, or three positive results, are required to trigger dosing.
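The triggering rules in items a and b can be sketched as a simple count over test results taken in chronological order. The function name and the boolean representation of a test result are assumptions made for illustration only.

```python
def first_trigger(results_in_order, positives_required=1):
    """Return the 0-based index of the test that triggers dosing, or None.

    results_in_order: booleans, one per sub-panel test, in the order performed.
    positives_required: 1 for the 'any positive' rule; 2 or 3 for the
    stricter alternatives.
    """
    if positives_required < 1:
        raise ValueError("positives_required must be at least 1")
    count = 0
    for index, positive in enumerate(results_in_order):
        count += bool(positive)
        if count >= positives_required:
            return index
    return None


# Hypothetical sequence: Subpanel A negative, Subpanel B and C positive.
results = [False, True, True]
any_positive = first_trigger(results, positives_required=1)
two_positives = first_trigger(results, positives_required=2)
three_positives = first_trigger(results, positives_required=3)
```

Requiring more positives delays or prevents dosing for the same sequence of results, which is the trade-off between early intervention and avoiding false positives.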
[000124] For each gene sub-panel the stringency can be varied by modifying the threshold at which a positive result is obtained. For example, the threshold for gene sub-panel 1 could be set to be more stringent avoiding false positives. Gene sub-panel 2 is then set with lower stringency and gene sub-panel 3 with even lower stringency, thus increasing the chances of identifying and dosing all subjects who will develop acute symptoms of influenza-like disease as early as possible.
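A minimal sketch of graded stringency, assuming each algorithm outputs a score between 0 and 1 and that a higher score indicates a higher likelihood of developing acute symptoms. The threshold values below are hypothetical placeholders chosen only to show decreasing stringency from sub-panel 1 to sub-panel 3; they are not values from the specification.

```python
# Hypothetical thresholds: sub-panel 1 most stringent, sub-panel 3 least.
THRESHOLDS = {1: 0.8, 2: 0.6, 3: 0.4}


def is_positive(sub_panel, score, thresholds=THRESHOLDS):
    """Call a test positive when the score meets the sub-panel's threshold."""
    return score >= thresholds[sub_panel]


# The same hypothetical score is negative under the strictest threshold
# but positive under the two more permissive ones.
calls = {panel: is_positive(panel, 0.7) for panel in (1, 2, 3)}
```

Lowering a threshold raises sensitivity at the cost of more false positives, so the later, less stringent sub-panels act as a safety net for subjects missed by the first.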
[000125] In addition to the use of the gene sub-panels, the results may be combined with a diagnostic test that confirms the subject has the respiratory viral infection relevant to the trial (e.g. a viral test).
[000126] In addition to the use of the gene sub-panels, the results may be combined with measurements of the change in variance/gradient of symptoms.
[000127] Other actions that can be triggered alongside dosing with a medicament include increasing the observations/samples/measurements in those predicted to develop acute symptoms of an influenza-like disease or reducing observations/samples/measurements in those who are predicted not to develop acute symptoms of an influenza-like disease.
[000128] By using this invention as part of the trial design decision making, only those subjects most likely to develop acute symptoms of influenza-like disease are included in the statistical analysis of efficacy for the investigational medicament, which has the benefits as previously described.
[000129] In some instances, where the algorithms do not report a positive test result, the subjects may be dosed at a predetermined time point post exposure or inoculation (e.g. Day 4). These subjects form a further subgroup for analysis.
Example 3
[000130] A group of individuals are recruited into an efficacy field study and become infected in the community with a respiratory virus, in particular influenza. Following exposure, it would be beneficial to identify, in advance, subjects who will develop acute symptoms of an influenza-like disease, allowing selective dosing of these subjects with an investigational or licensed medicament (drug/vaccine/placebo) at the earliest opportunity. This would improve the ability to detect a clinically relevant reduction in disease by only evaluating the medicament effects in individuals who would have gone on to present with acute symptoms of an influenza-like disease. This will also reduce unnecessary exposure of subjects to an investigational medicament and reduce the amount of medicament required.
[000131] In this example, volunteers have been screened for eligibility for the evaluation of the efficacy of an investigational medicament in a clinical field trial against a respiratory virus, in particular influenza.
[000132] Eligible volunteers arrive at the clinic and baseline samples and clinical measures are taken as they are enrolled in the study. Baseline values would be obtained using one or more blood samples over varying time-points post enrolment and prior to contracting a respiratory virus infection in the community.
[000133] Blood samples (e.g. PAXgene RNA samples twice a day, three times a day, or more often) are taken after virus exposure from a household contact that has a respiratory infection, or after the subject shows initial symptoms of respiratory disease (the trial subjects may or may not be using a study questionnaire/symptom diary card that captures these symptoms).
[000134] Specific gene sub-panels are measured in real time in the PAXgene blood samples using the methods described in Examples 1 and 2. One can use the absolute value of each gene at a given time point or, where a baseline gene level has been obtained and is available, the post-exposure gene levels can be baseline normalised for each subject (i.e. compared to baseline).
[000135] Three separate gene sub-panels can be used (i.e. three algorithms) to identify which individuals will progress to have acute symptoms of an influenza-like disease.
a. The three sub-panels of genes are:
i. Subpanel A: PHF20, ABCA1, APBA2, MORC2, SNU13 and DCUN1D2
ii. Subpanel B: MAX, NOL9, MPRIP, HP, BST1 and TM9SF2
iii. Subpanel C: HOMER3, NSUN6, HP, EPHA4 and BMP2K
[000136] In one instance, one, two or three gene sub-panels are used at the same time (FIG. 7), and testing may be repeated at several points over time.
a. A positive result for any test immediately triggers dosing of the subject with the investigational medicament (drug/vaccine/placebo).
b. Alternatively, two positive results or three positive results are required to trigger dosing.
[000137] For each gene sub-panel the stringency can vary by modifying the threshold at which a positive result is obtained. For example, the threshold for gene sub-panel 1 could be set to be more stringent avoiding false positives. Gene sub-panel 2 is then set with lower stringency and gene sub-panel 3 with even lower stringency, thus increasing the chances of identifying and dosing all subjects who will develop acute symptoms of an influenza-like disease as early as possible.
[000138] In addition to the use of the gene sub-panels, the results may be combined with a diagnostic test that confirms the subject has the respiratory viral infection relevant to the trial (e.g. viral test).
[000139] In addition to the use of the gene sub-panels, where a symptom diary card is used the results may be combined with measurements of the change in variance/gradient of symptoms.
[000140] Other actions that can be triggered alongside dosing with a medicament include increasing the observations/samples/measurements in those predicted to go on to develop acute symptoms of an influenza-like disease or reducing observations/samples/measurements in those predicted not to develop acute symptoms of an influenza-like disease.
[000141] By using this invention as part of the trial design decision making, only those most likely to develop acute symptoms of influenza-like disease are included in the statistical analysis of efficacy for the investigational medicament, which has the benefits as previously described.
[000142] In some instances, where the algorithms do not report a positive result, the subjects may be dosed at predetermined time points post exposure (e.g. Day 4, Day 5), with these subjects forming a secondary subgroup for analysis.
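The fallback dosing rule can be sketched as below. The function name, the subgroup labels and the assumption that a trigger occurring after the fallback day is ignored (because the subject has already been dosed) are all hypothetical choices made for illustration; the specification only states that subjects without a positive result may be dosed at a predetermined time and analysed as a separate subgroup.

```python
def assign_dosing(trigger_day, fallback_day=4):
    """Decide the dosing day and analysis subgroup for one subject.

    trigger_day: study day of the first qualifying positive result, or None
    if the algorithms never reported a positive.
    fallback_day: predetermined dosing day (e.g. Day 4 or Day 5).
    """
    if trigger_day is not None and trigger_day <= fallback_day:
        # Subject was predicted to develop acute symptoms and dosed early.
        return {"dose_day": trigger_day, "subgroup": "predicted_symptomatic"}
    # No positive by the fallback day: dose on schedule, analyse separately.
    return {"dose_day": fallback_day, "subgroup": "secondary"}


# Hypothetical subjects (illustration only).
early = assign_dosing(2)
untriggered = assign_dosing(None)
later_fallback = assign_dosing(None, fallback_day=5)
```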
Example 4
[000143] Subjects may become infected with a respiratory virus in the community or be exposed to an infected person for an extended period in the community. Community settings can include at home (family member, household contact), at work, in transit (e.g. on a train, coach, plane, ship), within a care home (fellow resident, family visitor, carer), as an inpatient in hospital (fellow inpatient, healthcare worker, visitor), or within a military setting (fellow personnel). Following exposure, it would be beneficial to identify, in advance, people who will progress to have acute symptoms of an influenza-like disease, allowing an intervention at the earliest opportunity. Interventions include assisting referral to healthcare professionals, enabling earlier treatment with an antiviral than would otherwise be possible (for example Tamiflu), administration of an immunomodulator drug or a combination of antiviral and immunomodulator, separation from others (quarantine), inclusion in a study (IMP trial, transmission study), and initiation of sampling for disease/biomarker monitoring.
[000144] A subject becomes infected with a respiratory virus in the community or is exposed to an infected person for a prolonged period in the community.
[000145] Post viral exposure blood samples may be collected and gene levels quantified following one or more triggers:
a. a positive diagnostic test (e.g. based on viral replication, diagnostic biomarker)
b. initial symptoms of respiratory viral disease
c. prolonged exposure to an infected contact
[000146] Specific gene panels or gene sub-panels are measured in real time in the blood sample (e.g. using a Point of Care test). The absolute value of each gene at a given time point can be used or, where a baseline gene level has been obtained or is available, the post-exposure gene levels can be baseline normalised for each subject (i.e. compared to baseline).
[000147] In one scenario, three separate gene sub-panels are used (i.e. three algorithms) to identify which individuals are likely to develop acute symptoms of an influenza-like disease.
a. The three sub-panels of genes are:
i. Subpanel A: PHF20, ABCA1, APBA2, MORC2, SNU13 and DCUN1D2
ii. Subpanel B: MAX, NOL9, MPRIP, HP, BST1 and TM9SF2
iii. Subpanel C: HOMER3, NSUN6, HP, EPHA4 and BMP2K
[000148] In another scenario, one, two or three gene sub-panels are used at the same time (FIG. 7), and testing may be repeated at several time points.
a. A positive result for any test may trigger one or more of the following:
i. assisting referral to healthcare professionals,
ii. enabling earlier treatment with an antiviral than would otherwise be possible (e.g. Tamiflu),
iii. administration of an immunomodulator drug or combination of antiviral and immunomodulator,
iv. separation from others (quarantine, using barriers to transmission e.g. masks),
v. inclusion in a study (e.g. IMP trial, disease study, transmission study),
vi. initiating sampling for disease/biomarker monitoring.
b. Alternatively, two positive results or three positive results are required to trigger the actions listed above.
[000149] For each gene sub-panel the stringency can vary by modifying the threshold at which a positive result is obtained. For example, the threshold for gene sub-panel 1 could be set to be more stringent avoiding false positives. Gene sub-panel 2 is then set with lower stringency and gene sub-panel 3 with even lower stringency, thus increasing the chances of identifying and intervening as early as possible.
[000150] In addition to the use of the gene sub-panels, the results may be combined with a diagnostic test that confirms the subject has the respiratory viral infection (if not already performed).
Gene sequences
[000151] Included below for each gene identified are the probe name specific to the GeneChip® Human Genome U133A 2.0 Array (Affymetrix, Santa Clara, CA), the gene ID and the full sequence of the entire gene. It will be understood that the present invention is not limited to use of the GeneChip Human Genome U133A 2.0 Array; the expression levels of the one or more genes disclosed herein may be measured using any suitable method known to those skilled in the art, employing any suitable probe or probes which are capable of binding uniquely to respective nucleic acid sequences that represent the one or more genes.
Probe Name: 209422_at Gene ID: PHF20 (SEQ ID No: 1) Full gene sequence:

MTKHPPNRRG ISFEVGAQLE ARDRIKNWYP AHIEDIDYEE GKVIIHFKRW

NHRYDEWFCW DSPYLRPLEK IQLRIKEGLHE EDGSSEFQIN EQVLACWSDC

RFYPAKVTAV NKDGTYTVKF YDGVVQTVKH IHVKAFSKDQ NIVGNARPKE

TDHKSLSSSP DKREKFKEQR KLTVNVKKDK EDKPLKTEKR PKQPDKEGKL

ICSEKGKVSE KSLPKNEKED KENISENDRE YSGDAQVDKK PENDIVKSPQ

ENLREPKRKR GRPPSIAPTA VDSNSQTLQP ITLELRRRKI SKGCEVPLKR

PRLDKNSSQE KSKNYSENTD KDLSRRRSSR LSTNGTHEIL DPDLVVSDLV

DTDPLQDTLS STKESEEGQL ESALEAGOVS SALTCHSFGD GSGAAGLELN

CPSMGENTMK TEPTSPLVEL QEISTVEVTN TFKKTDDFGS SNAPAVDLDH

KFRCKVVDCL KFFRKAKLLH YHMKYFHGME KSLEPEESPG KREVQTRGPS

ASDKPSQETL TRKRVSASSP TTKDKEKNKE KKEKEFVRVK PKKKKKKKKK

TKPECPCSEE ISDTSQEPSP PKAFAVTRCG SSHKPGVHMS PQLHGPESGH

HKGKVKALEE DNLSESSSES FLWSDDEYGQ DVDVTTNPDE ELDGDDRYDF

EVVRCICEVQ EENDFMIQCE ECQCWQHGVC MGLLEENVPE KYTCYVCQDP

PGQRPGFKYW YDKEWLSRGH MHGLAFLEEN YSHQNAKKIV ATHQLLGDVQ

RVIEVIHGLQ LKMSILQSRE HPDLPLWCQP WKQHSGEGRS HFRNIPVTDT

RSKEEAPSYR TLNGAVEKPR PLAIPLPRSV EESYITSEHC YQKPRAYYPA

VEQKLVVETR GSALDDAVNP LHENGDDSLS PRLGWPLDQD RSKGDSDPKP

GSPKVKEYVS KKALPEEAPA RKLLDRGGEG LLSSQHQWQF NLLTHVESLQ

DEVTHRMDSI EKELDVLESW LDYTGELEPP EPLARLPQLK HCIKQLLMDL

GKVQQIALCC ST
Probe Name: 203504_s_at Gene ID: ABCA1 (SEQ ID No: 2) Full gene sequence:

MACWPQLRLL LWKNLTFRRR QTCQLLLEVA WPLFIFLILI SVRLSYPPYE

QHECHFPNKA MPSAGTLPWV QGIICNANNP CFRYPTPGEA PGVVGNFNKS

TVARLFSDAR RLLLYSQKDT SMKDMRKVLR TLQQIKKSSS NLKLQDFLVD

NETFSGFLYH NLSLPKSTVD KMLRADVILH KVFLQGYQLH LTSLCNGSKS

SKELAEATKT LLHSLGTLAQ ELFSMRSWSD MRQEVMFLTN VNSSSSSTQI

YQAVSRIVCG HPEGGGLKIK SLNWYEDNNY KALFGGNGTE EDAETFYDNS

TTPYCNDLMK NLESSPLSRI IWKALKPLLV GKILYTPDTP ATRQVMAEVN

KTFULAVFH DLEGMWEELS PKIWTFMENS QEMDLVRMLL DSRDNDHFWE

QQLDGLDWTA QDIVAFLAKH PEDVQSSNGS VYTWREAFNE TNQAIRTISR

FMECVNLRKL EPIATEVWLI NKSMELLDER KFWAGIVFTG ITPGSIELPH

HVKYKIRMDI DNVERTNRIK DGYWDPGPRA DPFEDMRYVW GGFAYLQDVV

EQAIIRVLTG TEKKTGVYMQ QMPYPCYVDD IFLRVMSRSM PLFMTLAWIY

SVAVIIKGIV YEKEARLKET MRIMGLDNSI LWFSWFISSL IPLLVSAGLL

VVILKLGNLL PYSDPSVVFV FLSVFAVVTI LQCFLISTLF SRANLAAACG

GIIYFTLYLP YVLCVAWQDY VGFTLKIFAS LLSPVAFGFG CEYFALFEEQ

GIGVQWDNLF ESPVEEDGFN LTTSVSMMLF DTFLYGVMTW YIEAVFPGQY

GIPRPWYFPC TKSYWFGEES DEKSHPCSNQ KRISEICMEE EPTHLKLGVS

IQNLVKVYRD GMKVAVDGLA LNFYEGQITS FLGHNGAGKT TTMSILTGLF

PPTSGTAYIL GKDIRSEMST IRQNLGVCPQ HNVLFDMLTV EEHIWFYARL

KGLSENHVKA EMEQMAIDVG LPSSKLKSKT SQLSGGMQRK LSVALAFVGG

SKVVILDEPT AGVDPYSRRG IWELLLKYRQ GRTIILSTHH MDEADVLGDR

IAIISHGKLC CVGSSLFLKN QLGTGYYLTL VKKDVESSLS SCRNSSSTVS

YLKKEDSVSQ SSSDAGLGSD HESDTLTIDV SAISNLIRKH VSEARLVEDI

GHELTYVLPY EAAKEGAFVE LFHEIDDRLS DLGISSYGIS ETTLEEIFLK

VAEESGVDAE TSDGTLPARR NRRAFGDKQS CLRPFTEDDA ADPNDSDIDP

ESRETDLLSC MDCKGSYQVK CWKLTQQQFV ALLWKRLLIA RRSRKGFFAQ

IVLPAVFVCI ALVFSLIVPP FGKYPSLELQ PWMYNEQYTF VSNDAPEDTG

TLELLNALTK DPGFGTRCME GNPIPDTPCQ AGEEEWTTAP VPQTIMDLFQ

NGNWTMQNPS PACQCSSDKI KKMLPVCPPG AGGLPPPQRK QNTADILQDL

TGRNISDYLV KTYVQIIAKS LKNKIWVNEF RYGGFSLGVS NTQALPPSQE

VNDAIKQMKK HLKLAKDSSA DRFLNSLGRF MTGLDTKNNV KVWFNNKGWH

AISSFLNVIN NAILRANLQK GENPSHYGIT ATNHPLNLTK QQLSEVALMT

TSVDVLVSIC VIFAMSFVPA SFVVFLIQER VSKAKHLQFI SGVKPVIYWL

SNFVWDMCNY VVPATLVIII FICFQQKSYV SSTNLPVLAI LLLLYGWSIT

PLMYPASFVF KIPSTAYVVL TSVNLFIGIN GSVATFVLEL FTDNKLNNIN

DILKSVFLIF PHFCLGRGLI DMVKNQAMAD ALERFGENRF VSPLSWDLVG

RNLFAMAVEG VVFFLITVLI QYRFFIRPRP VNAKLSPLND EDEDVRRERQ

RILDGGGQND ILEIKELTKI YRRKRKPAVD RICVGIPPGE CFGLLGVNGA

GKSSTFKMLT GDTTVTRGDA FLNKNSILSN IHEVHQNMGY CPQFDAITEL

LTGREHVEFF ALLRGVPEKE VGKVGEWAIR KLGLVKYGEK YAGNYSGGNK

RKLSTAMAII GGPPVVFLDE PTTGMDPKAR RFLWNCALSV VKEGRSVVLT

SHSMEECEAL CTRMAIMVNG RFRCLGSVQH LKNRFGDGYT IVVRIAGSNP

DLKPVQDFFG LAFPGSVLKE KHRNMLQYQL PSSLSSLARI FSILSQSKKR

LHIEDYSVSQ TTLDQVFVNF AKDQSDDDHL KDLSLHKNQT VVDVAVLTSF

LQDEKVKESY V
Probe Name: 209870_s_at Gene ID: APBA2 (SEQ ID No: 3) Full gene sequence:

MAHRKLESVG SGMLDHRVRP GPVPHSQEPE SEDMELPLEG YVPEGLELAA

LRPESPAPEE QECHNHSPDG DSSSDYVNNT SEEEDYDEGL PEEEEGITYY

IRYCPEDDSY LEGMDCNGEE YLAHSAHPVD TDECQEAVEE WTDSAGPHPH

GHEAEGSQDY PDGQLPIPED EPSVIEAHDQ EEDGHYCASK EGYQDYYPEE

ANGNTGASPY RLRRGDGDLE DQEEDIDQIV AEIKMSLSMT SITSASEASP

EHGPEPGPED SVEACPPIKA SCSPSRHEAR PKSLNLLPEA KHPGDPQRGF

KPKTRTPEER LKWPHEQVCN GLEQPRKQQR SDLNGYVDNN NIPETKKVAS

FPSFVAVPGP CEPEDLIDGI IFAANYLGST QLLSERNPSK NIRMMQAQEA

VSRVKRMQKA AKIKKKANSE GDAQTLTEVD LFISTQRIKV LNADTQETMM

DHALRTISYI ADIGNIVVLM ARRRMPRSAS QDCIETTPGA QEGKEQYKMI

EMYNDDLIHF SNSENCKELQ LEKHKGEILG VVVVESGWGS ILPTVILANM

MNGGPAARSG KLSIGDQIMS INGTSLVGLP LATCQGIIKG LKNQTQVKLN

IVSCPPVTTV LIKRPDLKYQ LGFSVQNGII CSLMRGGIAE RGGVRVGHRI

IEINGQSVVA TAHEKIVQAL SNSVGEIHMK TMPAAMERLL TGQETPLYI
Probe Name: 216863_at Gene ID: MORC2 (SEQ ID No: 4) Full gene sequence:

MAFTNYSSLN RAQLTFEYLH TNSTTHEFLF GALAELVDNA RDADATRIDI

YAERREDLRG GFMLCFLDDG AGMDPSDAAS VIQFGKSAKR TPESTQIGQY

GNGLKSGSMR IGKDFILFTK KEDTMTCLFL SRTFHEEEGI DEVIVPLPTW

NARTPEPVTD NVEKFAIETE LIYKYSPFRT EEEVMTQFMK IPGDSCTLVT

RIKEAKQRAL KEPKELNFVF GVNIEHRDLD GMFIYNCSRL IKMYEKVGPQ

LEGGMACGGV VGVVDVPYLV LEPTHNKQDF ADAKEYRHLL RAMGEHLAQY

WKDIAIAQRG IIKEWDEFGY LSANWNQPPS SELRYKRRRA MEIPTLIQCD

LCLKWRTLPF QLSSVEKDYP DTWVCSMNPD PEQDRCEASE QKQKVPLGTF

PSTEEPVRRP QRPRSPPLPA VIRNAPSRPP SLPTPRPASQ PRKAPVISST

PKLPALAARE EASTSRLLQP PEAPRKPANT LVKTASRPAP LVQQLSPSLL

EAERRKERCK RGRFVVKEEK KDSNELSDSA GEEDSADLKR AQKDKGLHVE

VRVNREWYTG RVTAVEVGKH VVRWKVKFDY VPTDTTPRDR WVEKGSEDVR

LMKPPSPEHQ SLDTQQEGGE EEVGPVAQQA IAVAEPSTSE CLRIEPDTTA

LSTNHETIDL LVQILRNCLR YFLPPSFPIS KKQLSAMNSD ELISFPLKEY

FKQYEVGLQN LCNSYQSRAD SRAKASEESL RTSERKLRET EEKLQKLRTN

IVALLQKVQE DIDINTDDEL DAYIEDLITK GD
Probe Name: 201076_at Gene ID: SNU13 (SEQ ID No: 5) Full gene sequence:

MTEADVNPKA YPLADAHLTK KLLDLVQQSC NYKQLRKGAN EATKTLNRGI

SEFIVMAADA EPLEIILHLP LLCEDKNVPY VFVRSKQALG RACGVSRPVI

ACSVTIKEGS QLKQQIQSIQ QSIERLLV
Probe Name: 219116_s_at Gene ID: DCUN1D2 (SEQ ID No: 6) Full gene sequence:

MNKLESSQKD KVRQFMACTQ AGERTATYCL TQNEWRLDEA TDSFFQNPDS

LHRESMRNAV DKKKLERLYG RYKDPQDENK IGVDGIQQFC DDLSLDPASI

SVLVIAWKFR AATQCEFSRK EFLDGMTELG CDSMEKLEAL LPRIEQELKD

TAKFKDFYQF TFTFAKNPGQ KGLDIEMAVA YWKLVLSGRF KFLDLWNTFL

MEHHKRSIPR DTWNLLLDFG NMIADDMSNY DEEGAWPVLI DDFVEYARPV
VTGGKRSLF
Probe Name: 214108_at Gene ID: MAX (SEQ ID No: 7) Full gene sequence:

MSDNDDIEVE SDEEQPREQS AADKRAHHNA LERKRRDHIK DSFHSLRDSV

PSLQGEKASR AQILDKATEY IQYMRRKNHT HQQDIDDLKR QNALLEQQVR

ALEKARSSAQ LQTNYPSSDN SLYTNAKGST ISAFDGGSDS SSESEPEEPQ

SRKKLRMEAS
Probe Name: 210754_at Gene ID: NOL9 (SEQ ID No: 8) Full gene sequence:

MADSGLLLKR GSCRSTWLRV RKARPQLILS RRPRRRLGSL RWCGRPRLRW

RLLQAQASGV DWREGARQVS RAAAARRPNT ATPSPIPSPT PASEPESEPE

LFSASSCHRP LLTRPVPPVG PGRATLLLPV EQGFTFSGTC RVTCLYGQVQ

VEGFTISQGQ PAQDIFSVYT HSCLSIHALH YSQPEKSKKE LKREARNLLK

SHLNLDDRRW SMQNFSPQCS IVLLEHLKTA TVNFITSYPG SOYIEVQESP

TPQIKPEYLA LRSVGIPREK KRKGLQLTES TLSALEELVN VSCEEVDGCP

LNTTEPVLGP PFTHLPTPOK MVYYGKPSCK NNYENYIDIV KYVFSAYKRE

SPLIVNTMGW VSDQGLLLLI DLIRLLSPSH VVQFRSDHSK YMPDLTPQYV

DDMDGLYTKS KTKMRNPRFR LAAFADALEF ADEEKESPVE FTGBKLIGVY

TDFAFRITPR NRESHNKILR DLSIISYLSQ LQPPMPKPLS PLHSLTPYQV

PFNAVALRIT HSDVAPTHIL YAVNASWVGL CKIQDDVRGY TNGPILLAQT

PICDCLGFGI CRGTDMEKRL YHTLTPVPPE ELRTVNCLLV GATAIPHCVL

KCQRGIEGTV PYVTTDYNFK LPGASEKIGA REPEEAHKEK PYRRPKFCRK
MK
Probe Name: 212197_x_at Gene ID: MPRIP (SEQ ID No: 9) Full gene sequence:

MSAAKENPCR KFQANIFNKS KCQNCFKPRE SHLLNDEDLT QAKPIYGGWL

LLAPDGTDFD NPVHRSRKWQ RRFFILYEHG LLRYALDEMP TTLPQGTINM

TNKQNQKKKR KVEPPTPQEP GPAKVAVTSS SSSSSSSSSI PSAEKVPTTK

STLWQEEMRT KDQPDGSSLS PAQSPSQSQP PAASSLREPG LESKEEESAM

SSDRMDCGRK VRVESGYFSL EKTKQDLKAE EQQLPPPLSP PSPSTPNHRR

SQVIEKFEAL DIEKAEHMET NAVGPSPSSD TROGRSEKRA FPRKRDFTNE

APPAPLPDAS ASPLSPHRRA ESLDRRSTEP SVTPDLLNFK KGWLTKQYED

GQWKKHWFVL ADQSLRYYRD SVAEEAADLD GEIDLSACYD VTEYPVQPNY

GFQIHTKEGE FTLSAMTSGI RRNWIQTIMK HVHPTTAPDV TSSLPEEKNK

SSCSFETCPR PTEKQEAELG EPDPEQKRSR ARERRREGRS KTFDWAEFRP

IQQALAQERV GGVGPADTHE PLRPEAEPGE LERERARRRE ERRKREGMLD

EKQVPIAPVH LSSEDGGDRL STHEITSLLE KELEQSQKEA SDLLEQNPLL

QDQLRVALGR EQSAREGYVL QATCERGFAA MEETHQKKIE DLQRQHQREL

EKLREEKDRL LAEETAATIS AIEAMKNAHR EEMERELEKS QRSQISSVNS

DVEALRRQYL EELQSVQREL EVLSEQYSQK CLENAHLAQA LEAERQALRQ

CQRENQELNA HNQELNNRLA AEITRLRTLL TGDGGGEATG SPLAQGKDAY

ELEVLLRVKE SEIQYLKQEI SSLKDELQTA LRDKKYASDK YKDIYTELSI

AKAKADCDIS RLKEQLKAAT EALGEKSPDS ATVSGYDIMK SKSNPDFLKK

DRSCVTRQLR NIRSKSVIEQ VSWDT
Probe Name: 208470_s_at Gene ID: HP (SEQ ID No: 10) Full gene sequence:

MSALGAVIAL LLWGQLFAVD SGNDVTDIAD DGCPKPPEIA HGYVEHSVRY

QCKNYYKLRT EGDGVYTLND RKQWINKAVG DKLPECEADD GCPKPPEIAH

KPKNPANPVQ RILGGHLDAK GSFPWQAKMV SHHNLTTGAT LINEQWLLTT

AKNLFLNHSE NATAKDIAPT LTLYVGKKQL VEIEKVVLHP NYSQVDIGLI

KLKQKVSVNE RVMPICLPSK DYAEVGRVGY VSGWGRNANF KFTDHLKYVM

CYGDAGSARA VHDLEEDTWY ATGIISFDKS CAVAEYGVYV KVTSIQDWVQ
KTIAEN
Probe Name: 205715_at Gene ID: BST1 (SEQ ID No: 11) Full gene sequence:

MAAQGCAASR LLQLLLQLLL LLLLLAAGGA RARWRGEGTS AHLRDIFLGR

CAEYRALLSP EQRNKNCTAI WEAFKVALDK DPCSVLPSDY DLFINLSRHS

IPRDKSLFWE NSHLLVNSFA DNTRRFMPLS DVLYGRVADF LSWCRQKNDS

GLDYQSCPTS EDCENNPVDS FWKRASIQYS KDSSGVIHVM LNGSEPTGAY

PIKGFFADYE IPNLQKEKIT RIEIWVMHEI GGPNVESCGE GSMKVLEKRL

KDMGFQYSCI NDYPPVKLLQ CVDHSTHPDC ALKSAAAATQ RKAPSLYTEQ

RAGLIIPLFL VLASRTQL
Probe Name: 201076_at Gene ID: TM9SF2 (SEQ ID No: 12) Full gene sequence:

MSARLPVLSP PRWPRLLLLS LLLLGAVPGP RRSGAFYLPG LAPVNFCDEE

KKSDECKAEI ELFVNRLDSV ESVLPYEYTA FDFCQASEGK RPSENLGQVL

FGERIEPSPY KFTFNKKETC KLVCTKTYHT EKAEDKQKLE FLKKSMLLNY

QHHWIVDNMP VTWCYDVEDG QRFCNPGFPI GCYITDKGHA KDACVISSDF

HERDTFYIFN HVDIKIYYHV VETGSMGARL VAAKLEPKSF KHTHIDKPDC

WFSIMNSLVI VLFLSGMVAM IMLRTLHKDI ARYNQMDSTE DAQEEFGWKL

VHGDIFRPPR KGMLLSVFLG SGTQILIMTF VTLFFACLGF LSPANRGALM

TCAVVIWVIL GTPAGYVAAR FYKSFGGEKW KTNVILTSFL CPGIVFADFF

MYYMFGFLFL VFIILVITCS EATILLCYFH LCAEDYHWQW RSFLTSGFTA

VYFLIYAVHY FFSKLQITGT ASTILYFGYT MIMVLIFFLF TGTIGFFACF

WFVTKIYSVV KVD
Probe Name: 215489_x_at Gene ID: HOMER3 (SEQ ID No: 13) Full gene sequence:

MSTAREQPIF STRAHVFQID PATKRNWIPA GEHALTVSYF YDATRNVYRI

ISIGGAKAII NSTVTPNMTF TKTSQKFGQW ADSRANTVYG LGFASEQHLT

QFAEKFQEVK EAARLAREKS QDGGELTSPA LGLASHQVPP SPLVSANGPG

EEKLFRSQSA DAPGPTERER LKKMLSEGSV GEVQWEAEFF ALQDSNNKLA

GALREANAAA AQWRQQLEAQ RAEAERLRQR VAELEAQAAS EVTPTGEKEG

LGQGQSLEQL EALVQTKDQE IQTLKSQTGG PREALEAAER EETQQKVQDL

ETRNAELEHQ LRAMERSLEE ARAERERARA EVGRAAQLLD VSLFELSELR

EGLARLAEAA P
Probe Name: 222128_at Gene ID: NSUN6 (SEQ ID No: 15) Full gene sequence:

MSIFPKISLR PEVENYLKEG FMNKEIVTAL GKQEAERKFE TLLKHLSHPP

SETTVRVNTH LASVQHVKNL LLDELQKQFN GLSVPILQHP DLQDVLLIPV

IGPRKFIEKQ QCEAIVGAQC GNAVLRGAHV YAPGIVSASQ FMEAGDVISV

YSDIEGKCKK GAKEFDGTKV FLGNGISELS RKEIFSGLPE LKGMGIRMTE

PVYLSPSFDS VLPRYLFLQN LPSALVSHVL NPQPGEKILD LCAAPGGETT

HIAALMHDQG EVIALDEIFN KVEKIKONAL LLGLWSIRAF CFDGTKAVKL

DMVEDTEGEP PFLPESFDRI LLDAPCSGMG QRPNMACTWS VKEVASYQPL

QRKLETAAVQ LLKPEGVLVY STCTITLAEN EEQVAWALTK FPCLQLQPQE

PQIGGEGMRG AGLSCEQLKQ LQRFDPSAVP LPDTDMDSLR EARREDMIRL

ANKDSIGFFI AKFVKCKST
Probe Name: 206114_at Gene ID: EPHA4 (SEQ ID No: 16) Full gene sequence:

MAGIFYFALF SCLFGICDAV TGSRVYPANE VTLLDSRSVQ GELGWIASPL

EGGWEEVSIM DEKNTPIRTY QVCNVMEPSQ NNWLRTDWIT REGAQRVYIE

IKFTIRDCNS LPGVMGTCKE TFNLYYYESD NDKERFIREN QFVKIDTIAA

DESFTQVDIG DRIMKLNTEI RDVGPLSKKG FYLAFQDVGA CIALVSVRVF

YKKCPLTVRN LAQFPDTITG ADTSSLVEVR GSCVNNSEEK DVPKMYCGAD

GEWLVPIGNC LCNAGHEERS GECQACKIGY YKALSTDATC AKCPPHSYSV

WEGATSCTCD RGFFRADNDA ASMPCTRPPS APLNLISNVN ETSVNLEWSS

PQNTGGRQDI SYNVVCKKCG AGDPSKCRPC GSGVHYTPQQ NGLETTKVSI

TDLLAHTNYT FEIWAVNGVS KYNPNPDQSV SVTVTTNQAA PSSIALVQAK

EVTRYSVALA WLEPDRPNGV ILEYEVKYYE KDQNERSYRI VRTAARNTDI

EGLNPLTSYV FHVRARTAAG YGDFSEPLEV TTNTVPSRII GDGANSTVLL

VSVSGSVVLV VILIAAFVIS RRRSKYSKAK QEADEEKHLN QGVRTYVDPF

TYEDPNQAVR EFAKEIDASC IKIEKVIGVG EFGEVCSGRL KVPGKRETCV

AIKTLKAGYT DKQRRDELSE ASIMGQFDHP NIIHLEGVVT KCKPVMIITE

YMENGSLDAF LRENDGRFTV IQLVGMLRGI GSGMKYLSDM SYVHRDLAAR

NILVNSNIVC KVSDFGMSRV LEDDPEAAYT TRGGKIPIRW TAPEAIAYRK

FTSASDVWSY GIVMWEVMSY GERPYWDMSN QDVIKAIEEG YRLPPPMDCP

ALLDPSSPEF SAVVSVGDWL QAIKMDRYKD NFTAAGYTTL EAVVHVNQED

LARIGITAIT HQNKILSSVQ AMRTQMQQMH GRMVPV
Probe Name: 59644_at Gene ID: BMP2K (SEQ ID No: 17) Full gene sequence:

MKKFSRMPKS EGGSGGGAAG GGAGGAGAGA GCGSGGSSVG VRVFAVGRHQ

VTLEESLAEG GESTVELVRT HGGIRCALKR MYVNNMPDLN VCKREITIME

ELSGHNNIVG YLDCAVNSIS DNVWEVLILM EYCRAGQVVN QMNKELQTGF

TEPEVIQIFC DTCEAVARLH QCKTPIIHRD LKVENILLND GGNYVLCDFG

SATNKFLNPQ KDGVNVVEEE IKKYTTLSYR APEMINLYGG KPITTKADIW

ALGCLLYKLC FFTLPFGESQ VAICDGNFTI PDNSRYSRNI HCLIRFMLEP

DPEHRPDIFQ VSYFAFKFAK KDCPVSNINN SSIPSALPEP MTASEAAARK

SQIKARITDT IGPTETSIAP RQRPKANSAT TATPSVITIQ SSATPVKVLA

PGEFGNHRPK GALRPCNGPE ILLGQCPPQQ PPQQHRVLQQ LQQGDWRLQQ

LHLQHRHPHQ QQQQQQQQQQ QQQQQQQQQQ QQQQQQHHHH HHHHLLQDAY

MQQYQHATQQ QQMLQQQFLM HSVYQPQPSA SQYPTMMPQY QQAFFQQQML

AQHQPSQQQA SPEYLTSPQE FSPALVSYTS SLPAQVGTIM DSSYSANRSV

ADKEAIANFT NQKNISNPPD MSGWNPFGED NFSKLTEEEL LDREFDLLRS

NRLEERASSD KNVDSLSAPH NHPPEDPFGS VPFISHSGSP EKKAEHSSIN

QENGTANPIK NGKTSPASKD QRTGKKTSVQ GQVQKGNDES ESDFESDPPS

PKSSEEEEQD DEEVLQGEQG DFNDDDTEPE NLGHRPLLMD SEDEEEEEKH

SSDSDYEQAK AKYSDMSSVY RDRSGSGPTQ DLNTILLTSA QLSSDVAVET

PKQEFDVFGA VPFFAVRAQQ PQQEKNEKYL PQHRFPAAGL EQEEFDVETK

APFSKKVNVQ ECHAVGPEAH TIPGYPKSVD VEGSTPFQPF LTSTSKSESN

EDLFGLVPFD EITGSQQQKV KQRSLQKLSS RQRRTKQDMS KSNGKRHHGT

PTSTKKTLKP TYRTPERARR FIKKVGRRDSQ SSNEFLTISD SKENISVALT

DGKDRGNVIQ PEESLLDPFG AKPFHSPDLS WHPVHQGLSD IRADHNTVLP

GRPRQNSLHG SFHSADVLKM DDFGAVPFTE LVVQSITPHQ SQQSQPVELD

PFGAAPFPSK
References
Benjamini, Yoav; Hochberg, Yosef (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B. 57 (1):
Cox NJ, Subbarao K. Influenza. Lancet 1999;354:1277-82.
DeVincenzo et al 2014, Oral GS-5806 activity in a respiratory syncytial virus challenge study. N Engl J Med. 2014 Aug 21;371(8):711-22.
DeVincenzo et al 2015, Activity of Oral ALS-008176 in a Respiratory Syncytial Virus Challenge Study. N Engl J Med. 2015 Nov 19;373(21):2048-58.
Friedman JH (2001). Greedy function approximation: a gradient boosting machine. Ann Statist 29(5): 1189-1232.
Friedman JH (2002). Stochastic gradient boosting. Comput Stat Data Anal 38(4): 367-378.
Hodinka, "Respiratory RNA Viruses", Microbiol Spectr., 2016 Aug; 4(4).
Lau, L. L. H., Cowling, B. J., Fang, V. J., Chan, K.-H., Lau, E. H. Y., Lipsitch, M., ... Leung, G. M. (2010). Viral shedding and clinical illness in naturally acquired influenza virus infections. The Journal of Infectious Diseases, 201(10), 1509-1516.
Liu, T.Y., et al., An individualized predictor of health and disease using paired reference and target samples. BMC Bioinformatics, 2016. 17: p. 47.
Molinari, N. M., et al. (2007), The annual impact of seasonal influenza in the US: Measuring disease burden and costs, Volume 25 Issue 27, 28 June 2007, Pages 5086-5096.
Rolfes MA, Foppa IM, Garg S, Flannery B, Brammer L, Singleton JA, et al. Estimated Influenza Illnesses, Medical Visits, Hospitalizations, and Deaths Averted by Vaccination in the United States. 2016 Dec 9.
Straube J, Gorse A-D, PROOF Centre of Excellence Team et al (2015). A linear mixed model spline framework for analysing time course 'omics' data. PLoS ONE 10(8): e0134540.
Wang, Z., Gerstein, M., & Snyder, M. (2009). RNA-Seq: a revolutionary tool for transcriptomics. Nature Reviews. Genetics, 10(1), 57-63.
Woods CW, McClain MT, Chen M, Zaas AK et al (2013). A host transcriptional signature for presymptomatic detection of infection in humans exposed to influenza H1N1 or H3N2. PLoS ONE 8(1): e52198.
Zaas, A. K., et al., Gene expression signatures diagnose influenza and other symptomatic respiratory viral infections in humans. Cell Host Microbe, 2009. 6(3): p. 207-17.

Claims

1. A method of predicting whether a subject will develop acute symptoms of disease after exposure, or possible exposure, to a respiratory virus, which comprises analysing a biological sample obtained from the subject for a biomarker and comparing the biomarker to a reference for the biomarker, wherein the biomarker comprises or is derived from expression levels of one or more genes selected from a gene panel comprising PHF20, ABCA1, APBA2, MORC2, SNU13, DCUN1D2, MAX, NOL9, MPRIP, HP, BST1, TM9SF2, HOMER3, NSUN6, EPHA4 and BMP2K.

2. A method according to claim 1, wherein the gene panel consists of one, two, three, four, five, or six genes selected from PHF20, ABCA1, APBA2, MORC2, SNU13, DCUN1D2, MAX, NOL9, MPRIP, HP, BST1, TM9SF2, HOMER3, NSUN6, EPHA4 and BMP2K.

3. A method according to claim 1 or claim 2, wherein the biomarker comprises expression levels of one or more genes selected from a first gene sub-panel comprising PHF20, ABCA1, APBA2, MORC2, SNU13 and DCUN1D2.

4. A method according to claim 3, wherein the first gene sub-panel comprises the expression level of PHF20.

5. A method according to claim 4, wherein the first gene sub-panel further comprises the expression level of one or both of APBA2 and ABCA1.

6. A method according to claim 5, wherein the first gene sub-panel further comprises the expression level of one, two or three of MORC2, SNU13 and DCUN1D2.

7. A method according to any one of claims 3 to 6, wherein the first gene sub-panel consists of one, two, three, four, five, or six of PHF20, ABCA1, APBA2, MORC2, SNU13 and DCUN1D2.

8. A method according to claim 1 or claim 2, wherein the biomarker comprises expression levels of one or more genes selected from a second gene sub-panel comprising MAX, NOL9, MPRIP, HP, BST1 and TM9SF2.

9. A method according to claim 8, wherein the second gene sub-panel comprises the expression level of one or more of NOL9, HP and MAX.

10. A method according to claim 9, wherein the second gene sub-panel further comprises the expression level of one or both of BST1 and MPRIP.

11. A method according to claim 10, wherein the second gene sub-panel further comprises the expression level of TM9SF2.

12. A method according to any one of claims 8 to 11, wherein the second gene sub-panel consists of one, two, three, four, five, or six of MAX, NOL9, MPRIP, HP, BST1 and TM9SF2.

13. A method according to claim 1 or claim 2, wherein the biomarker comprises expression levels of one or more genes selected from a third gene sub-panel comprising HOMER3, NSUN6, HP, EPHA4 and BMP2K.

14. A method according to claim 13, wherein the third gene sub-panel comprises the expression level of one or both of HP and HOMER3.

15. A method according to claim 14, wherein the third gene sub-panel further comprises the expression level of one or both of EPHA4 and BMP2K.

16. A method according to claim 15, wherein the third gene sub-panel further comprises the expression level of NSUN6.

17. A method according to any one of claims 13 to 16, wherein the third gene sub-panel consists of one, two, three, four or five of HOMER3, NSUN6, HP, EPHA4 and BMP2K.

18. A method according to any one of claims 3 to 17, wherein the biomarker is associated with the relative time course progression towards developing acute symptoms of disease, such that the first gene sub-panel is associated with an early stage during progression towards developing acute symptoms of disease, the second gene sub-panel is associated with a middle stage during progression towards developing acute symptoms of disease, and the third gene sub-panel is associated with a later stage during progression towards developing acute symptoms of disease.

19. A method according to any one of the preceding claims, wherein the biological sample is obtained from the subject up to about 25 hours after exposure, or possible exposure, to the respiratory virus.

20. A method according to claim 19, wherein the biomarker comprises expression levels of one, two, three, four, five, six or more genes selected from the first gene sub-panel defined in any one of claims 3 to 7.

21. A method according to any one of claims 1 to 18, wherein the biological sample is obtained from the subject about 37-49 hours after exposure, or possible exposure, to the respiratory virus.

22. A method according to claim 21, wherein the biomarker comprises expression levels of one, two, three, four, five, six or more genes selected from the second gene sub-panel defined in any one of claims 8 to 12.

23. A method according to any one of claims 1 to 18, wherein the biological sample is obtained from the subject about 49-61 hours after exposure, or possible exposure, to the respiratory virus.

24. A method according to claim 23, wherein the biomarker comprises expression levels of one, two, three, four, five or more genes selected from the third gene sub-panel defined in any one of claims 13 to 17.
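
The sampling windows of claims 19, 21 and 23 and the sub-panels of claims 20, 22 and 24 can be read as a lookup from elapsed time to sub-panel. The following is a minimal sketch of that reading, not a definitive implementation. The function name `sub_panel_for_time`, the hard window edges (the claims say "about"), and the assignment of the shared 49-hour boundary to the third sub-panel are all assumptions made for illustration.

```python
from typing import Optional


def sub_panel_for_time(hours_after_exposure: float) -> Optional[str]:
    """Return which gene sub-panel the claims pair with a sampling time.

    Assumptions (not stated in the claims): window edges are treated as
    exact, and a sample taken at exactly 49 hours goes to the third window.
    Times outside every window return None.
    """
    h = hours_after_exposure
    if 0 <= h <= 25:    # claim 19: up to about 25 hours
        return "first"
    if 37 <= h < 49:    # claim 21: about 37-49 hours
        return "second"
    if 49 <= h <= 61:   # claim 23: about 49-61 hours
        return "third"
    return None


if __name__ == "__main__":
    for hours in (12, 30, 40, 55):
        print(hours, sub_panel_for_time(hours))
```

Times between 25 and 37 hours fall in none of the claimed windows, so the sketch returns `None` for them rather than guessing.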

25. A method according to any one of the preceding claims, wherein the biomarker is computer-generated and comprises an output variable of a classification algorithm that uses as input variables the expression levels of one or more genes in the gene panel; or one or more genes in the first gene sub-panel; or one or more genes in the second gene sub-panel; or one or more genes in the third gene sub-panel.

26. A method according to claim 25, wherein the output variable includes a numerical value.

27. A method according to claim 25 or claim 26, wherein the classification algorithm is derived by machine-learning from a training data-set that uses as input variables expression levels of one or more genes from the gene panel measured from a biological sample obtained from a group of subjects at a predetermined time after exposure to the respiratory virus, wherein the group of subjects is divided into two classes according to whether or not they developed acute symptoms of disease after exposure to the respiratory virus, and wherein the classification algorithm operates on the expression levels to produce an output variable that differentiates between the classes.

28. A method according to claim 27, wherein the classification algorithm comprises a generalised regression-based algorithm or decision tree.
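
As an illustration of a regression-based classifier that takes expression levels as input variables and returns a numerical output variable, the sketch below uses a logistic form. The logistic form is one possible choice and is an assumption; the claims do not specify it. The function name `regression_score` and every coefficient value shown are hypothetical and are not taken from the specification.

```python
import math
from typing import Mapping


def regression_score(
    expression: Mapping[str, float],
    coefficients: Mapping[str, float],
    intercept: float = 0.0,
) -> float:
    """Map gene expression levels to a score between 0 and 1.

    The weights would come from fitting to a training data set; here they
    are supplied by the caller and are purely illustrative.
    """
    linear = intercept + sum(
        weight * expression[gene] for gene, weight in coefficients.items()
    )
    return 1.0 / (1.0 + math.exp(-linear))


if __name__ == "__main__":
    # Hypothetical weights for three genes named in claim 9.
    weights = {"NOL9": 0.8, "HP": -0.5, "MAX": 0.3}
    sample = {"NOL9": 1.2, "HP": 0.4, "MAX": 0.9}
    print(regression_score(sample, weights))
```

The score would then be compared to a reference value for the biomarker to produce the prediction.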

29. A method according to claim 28, wherein the classification algorithm is configured to prioritise accuracy.

30. A method according to claim 28, wherein the classification algorithm is configured to prioritise negative predictive value (NPV).
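
Accuracy and negative predictive value, the two quantities that claims 29 and 30 prioritise, have standard definitions in terms of confusion-matrix counts. The sketch below computes both. The function names and the example counts are illustrative assumptions, not results from the specification.

```python
def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + tn + fn)


def negative_predictive_value(tn: int, fn: int) -> float:
    """Fraction of negative predictions that are truly negative."""
    return tn / (tn + fn)


if __name__ == "__main__":
    # Made-up counts for illustration only.
    print(accuracy(tp=40, fp=10, tn=45, fn=5))
    print(negative_predictive_value(tn=45, fn=5))
```

A classifier tuned for NPV accepts more false positives in exchange for fewer missed cases among subjects predicted not to develop acute symptoms.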

31. A method according to any one of claims 27 to 30, wherein the acute symptoms of disease in the subjects of the group in the training data set are assessed by evaluating one or more symptoms of disease at a series of pre-set times after exposure to the respiratory virus.

32. A method according to claim 31, wherein the one or more symptoms are evaluated by the subjects using diary cards, optionally visual analogue score symptom diary cards (VAS), or optionally categorical symptoms (CAT) are recorded using a modified standardized symptom score, for example the modified Jackson Score.

33. A method according to claim 31 or claim 32, wherein the two classes of subjects in the training data set are differentiated by one or more parameters based on the evaluation of the one or more symptoms including runny nose, stuffy nose, sore throat, sneezing, earache, cough, shortness of breath, headache, malaise, myalgia, muscle and/or joint aches, chilliness, and feverishness.

34. A method according to claim 32 or claim 33, wherein the first class contains subjects which record a total VAS of greater than or equal to 25 units and/or a total CAT score of 10 units or greater.
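
The class boundary in claim 34 can be expressed as a simple threshold rule. The sketch below is one reading of it, with the "and/or" wording handled by a flag. The function name `in_first_class` and the `require_both` parameter are assumptions for illustration.

```python
VAS_THRESHOLD = 25  # total VAS units, from claim 34
CAT_THRESHOLD = 10  # total CAT units, from claim 34


def in_first_class(
    total_vas: float, total_cat: float, require_both: bool = False
) -> bool:
    """Decide whether a training subject belongs to the first class.

    require_both=False models the "or" reading of "and/or";
    require_both=True models the "and" reading.
    """
    vas_hit = total_vas >= VAS_THRESHOLD
    cat_hit = total_cat >= CAT_THRESHOLD
    if require_both:
        return vas_hit and cat_hit
    return vas_hit or cat_hit


if __name__ == "__main__":
    print(in_first_class(total_vas=30, total_cat=4))
    print(in_first_class(total_vas=30, total_cat=4, require_both=True))
```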

35. A method according to any one of claims 32 to 34, wherein the first class contains subjects that show one or more of: greatest variance in total VAS or CAT up to the peak of symptoms; greatest variance in total VAS or CAT over the duration of quarantine; or steepest gradient (slope of regression line) of total VAS or CAT up to the peak of symptoms.
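
Two of the per-subject quantities named in claim 35, the variance of the total score up to the peak of symptoms and the slope of its regression line up to the peak, can be computed as follows. This is a sketch under stated assumptions: the peak is taken as the first occurrence of the maximum score, sample variance is used, the slope is an ordinary least-squares fit, and the function names are hypothetical. Ranking subjects by these values to find the "greatest" or "steepest" is left to the caller.

```python
from statistics import variance
from typing import Dict, Sequence


def ols_slope(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Slope of the least-squares regression line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def features_up_to_peak(
    times: Sequence[float], scores: Sequence[float]
) -> Dict[str, float]:
    """Variance and slope of a symptom score series up to its peak.

    Needs at least two observations up to and including the peak;
    otherwise statistics.variance raises StatisticsError.
    """
    peak = list(scores).index(max(scores))  # first occurrence of the maximum
    t = list(times[: peak + 1])
    s = list(scores[: peak + 1])
    return {"variance": variance(s), "slope": ols_slope(t, s)}


if __name__ == "__main__":
    # Made-up diary-card totals at 12-hour intervals.
    print(features_up_to_peak([0, 12, 24, 36], [0, 2, 4, 3]))
```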

36. A method according to any one of claims 27 to 35, wherein the gene panels and gene sub-panels are selected by: i) analysing expression levels in biological samples obtained from the group of subjects in the data training set across the whole series of pre-set times after exposure to the virus; and ii) identifying genes that show a nominal association with acute symptoms of disease, and iii) using a variable selection process to select panels of the identified genes whose expression levels at a predetermined time after exposure to the virus exhibit maximal predictive value for developing acute symptoms of disease.

37. A method according to claim 36, wherein the variable selection process comprises subjecting the expression levels of the identified genes at the predetermined time after exposure to the respiratory virus to a repeated gradient boosting process and selecting a set of 1, 2, 3, 4, 5 or 6 genes that are selected most frequently by the gradient boosting process.
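
The final step of claim 37, choosing the genes that the repeated gradient boosting process selects most often, amounts to a frequency count over runs. The sketch below covers only that counting step and assumes each boosting run has already produced a set of selected gene names. The boosting itself is not implemented here. The function name and the deterministic tie-break by gene name are assumptions.

```python
from collections import Counter
from typing import Iterable, List, Set


def most_frequent_genes(selections: Iterable[Set[str]], k: int) -> List[str]:
    """Return the k genes selected in the largest number of runs.

    Ties are broken alphabetically so the result is reproducible;
    the claims do not say how ties are handled.
    """
    counts = Counter(gene for run in selections for gene in set(run))
    ranked = sorted(counts, key=lambda gene: (-counts[gene], gene))
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical outputs of three boosting runs.
    runs = [{"HP", "NOL9"}, {"HP", "MAX"}, {"HP", "NOL9", "BST1"}]
    print(most_frequent_genes(runs, k=2))
```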

38. A method according to any one of the preceding claims, wherein the biomarker is compared to a baseline for the biomarker, wherein the baseline for the biomarker is determined prior to exposure, or possible exposure, of the subject to the respiratory virus.

39. A method according to any one of the preceding claims, wherein the subject has been administered a medicinal product before or after exposure or possible exposure to the respiratory virus.

40. A method according to any one of the preceding claims, wherein the subject has had a positive diagnostic test for respiratory viral disease, presents with symptoms of respiratory viral disease, and/or has had prolonged exposure to at least one other person who is infected with a respiratory virus.

41. A method according to any one of the preceding claims wherein the subject is tested two or more times for the same or a different biomarker as defined in any one of claims 1 to 31 and the subject is indicated as predicted to develop acute symptoms if the result of at least one or more than one of the tests are positive.

42. A method according to claim 41, wherein the thresholds at which a positive result is obtained are different for the two or more tests, the threshold for at least one test being configured to minimise false positives, and the threshold for at least another test being configured to have fewer false negatives than the one test.
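
The combination rule of claims 41 and 42 can be sketched as several tests, each with its own threshold, followed by a count of positives. The sketch assumes that each test yields a numerical score and that a higher score indicates a higher likelihood of acute symptoms. Neither assumption is stated in the claims, and the threshold values shown are hypothetical.

```python
from typing import Sequence


def predict_acute(
    scores: Sequence[float],
    thresholds: Sequence[float],
    min_positive: int = 1,
) -> bool:
    """Combine several biomarker tests into one prediction.

    A test is positive when its score meets its own threshold. The subject
    is predicted to develop acute symptoms when at least min_positive
    tests are positive.
    """
    if len(scores) != len(thresholds):
        raise ValueError("one threshold is required per test")
    positives = sum(s >= t for s, t in zip(scores, thresholds))
    return positives >= min_positive


if __name__ == "__main__":
    # Hypothetical thresholds: a strict one to limit false positives and
    # a lenient one to limit false negatives.
    print(predict_acute(scores=[0.5, 0.5], thresholds=[0.8, 0.4]))
```

Setting `min_positive` to 2 models the "more than one" alternative in claim 41.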

43. A method according to any one of the preceding claims, further comprising administering a therapeutic or prophylactic treatment to the subject if they are predicted to develop acute symptoms.

44. A method according to claim 43, wherein the treatment comprises administration of an antiviral or immunomodulatory agent.

45. A method of predicting whether a subject will develop acute symptoms of disease after exposure, or possible exposure, to a respiratory virus, which comprises estimating time elapsed after the exposure, or possible exposure, to the respiratory virus by analysing expression levels of one or more genes selected from PHF20, ABCA1, APBA2, MORC2, SNU13, DCUN1D2, MAX, NOL9, MPRIP, HP, BST1, TM9SF2, HOMER3, NSUN6, EPHA4, in a biological sample obtained from the subject;
selecting a biomarker as defined in any one of claims 1 to 37, which at said time exhibits maximal predictive value for developing acute symptoms of disease; and comparing the biomarker to a reference for the biomarker.

46. A method of conducting a clinical trial or field study in which a group of subjects are exposed to a respiratory virus, the method comprising analysing a biomarker as defined in any one of claims 1 to 37 for each subject and comparing the biomarker to a reference for the biomarker to predict whether the subject is likely to develop acute symptoms of disease, and including subjects who are predicted to develop acute symptoms of disease in a first subgroup of the clinical trial or field study.

47. A method according to claim 46, wherein the biomarker is compared to a baseline for the biomarker, wherein the baseline for the biomarker is determined prior to exposure, or possible exposure, of the subjects to the respiratory virus.

48. A method according to claim 46 or claim 47, wherein subjects in the first subgroup are administered a medicament after being predicted to develop acute symptoms of disease.

49. A method according to any one of claims 46 to 48, further comprising including subjects who are predicted not to develop acute symptoms of influenza-like disease in a second subgroup.

50. A method according to claim 49, wherein subjects in the second subgroup are not administered a medicament during the trial or study, or are administered a medicament at a predetermined time after commencing the trial or study.

51. A method according to any one of the preceding claims, wherein the respiratory virus is respiratory syncytial virus (RSV), parainfluenza virus (HPIV), metapneumovirus (HMPV), rhinovirus (HRV), coronavirus, adenovirus (HAdV), enterovirus (EV), bocavirus (HBoV), parechovirus (HPeV) or an influenza virus.

52. A method according to any one of the preceding claims, wherein the biological sample is a blood or respiratory sample.

53. A method according to any one of the preceding claims, wherein the expression level of the one or more genes is measured by quantifying mRNA transcripts of the one or more genes in the biological sample.

54. A method according to any one of the preceding claims, wherein the mRNA transcripts in a biological sample are quantified by one or more of: PCR-based methods such as RT-qPCR; gene expression microarray; or RNA-seq.

55. A computer program for predicting whether a subject will develop acute symptoms of disease after exposure, or possible exposure, to a respiratory virus, which comprises instructions which, when the program is executed by a computer, cause the computer to generate a biomarker as defined in any one of claims 1 to 31.

56. A computer program according to claim 55, wherein the computer program compares the biomarker to a reference for the biomarker.

57. A computer program according to claim 55 or claim 56, wherein the computer program compares the biomarker to a baseline for the biomarker.

58. A classification algorithm for predicting whether a subject will develop acute symptoms of disease after exposure, or possible exposure, to a respiratory virus, wherein the classification algorithm is derived by analysing expression levels of one or more genes in subjects who have developed acute symptoms of disease and comparing with the expression levels in subjects who do not develop acute symptoms of disease, wherein the one or more genes are PHF20, ABCA1, APBA2, MORC2, SNU13, DCUN1D2, MAX, NOL9, MPRIP, HP, BST1, TM9SF2, HOMER3, NSUN6, EPHA4 and BMP2K.

59. A classification algorithm according to claim 58, wherein the acute symptoms of disease in a subject are assessed by evaluating one or more symptoms of disease.

60. A classification algorithm according to claim 58 or 59, wherein the classification algorithm is computer-implemented and comprises receiving in a computer a data set comprising expression levels of the one or more genes from one or more subjects and executing on the computer software to predict whether the one or more subjects will develop acute symptoms of disease.

61. A computer readable medium and/or computer program comprising instructions which, when executed by a computer, cause the computer to carry out the classification algorithm according to any one of claims 58 to 60.

62. A computer-implemented method for predicting whether a subject will develop acute symptoms of disease wherein a biomarker is generated by analysing expression levels of one or more genes in subjects who have developed acute symptoms of disease following inoculation with a respiratory virus and comparing with the expression levels in subjects who do not develop acute symptoms of disease following inoculation with a respiratory virus, wherein the one or more genes are PHF20, ABCA1, APBA2, MORC2, SNU13, DCUN1D2, MAX, NOL9, MPRIP, HP, BST1, TM9SF2, HOMER3, NSUN6, EPHA4 and BMP2K.

63. A computer implemented method according to claim 62, wherein the method comprises a graphical user interface which displays the biomarker to the user.