WO2022025825A1 - Methods and kits for determining integrity of viral rna - Google Patents


Info

Publication number
WO2022025825A1
Authority
WO
WIPO (PCT)
Prior art keywords
virus
insert
viral
bps
biological sample
Application number
PCT/SG2021/050439
Other languages
French (fr)
Inventor
Yukti CHOUDHURY
Jing Shan LIM
Min-Han Tan
Original Assignee
Lucence Life Sciences Pte. Ltd
Application filed by Lucence Life Sciences Pte. Ltd
Publication of WO2022025825A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6806 Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
    • C12Q1/70 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving virus or bacteriophage
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A50/00 TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE in human health protection, e.g. against extreme weather
    • Y02A50/30 Against vector-borne diseases, e.g. mosquito-borne, fly-borne, tick-borne or waterborne diseases whose impact is exacerbated by climate change

Definitions

  • the present invention relates to molecular biology, in particular the detection of viruses using molecular biology techniques.
  • the present invention relates to the detection of viruses in a biological sample obtained from a patient to determine the integrity of the viral genome or the infectivity of the virus in the biological sample.
  • Although RT-PCR assays are widely regarded as the most sensitive diagnostic method, their inability to distinguish intact viral RNAs (which are packaged in virions and able to cause onward transmission) from degradation products of viral RNAs (which are ineffectual in causing further infection) has led to considerable debate over the dichotomous positive/negative interpretation of RT-PCR results. Positive detection of viral RNA does not prove the presence of infectious virus, as viral RNA has been reported to remain detectable in samples up to 35 days from symptom onset, and after resolution of clinical symptoms in both mild and severe disease.
  • the present disclosure refers to a method of determining integrity of a viral genome or infectivity of a virus in a biological sample, said method comprising: a) extracting viral nucleic acid molecules from the biological sample to generate a nucleic acid library; b) providing a plurality of primer pairs wherein,
  • each primer pair comprises a forward primer and a reverse primer, wherein the forward primer comprises, from the 5’ end to the 3’ end, a first adapter sequence (AS1), and a forward target-specific sequence (TS1); wherein the reverse primer comprises, from the 5’ end to the 3’ end, a second adapter sequence (AS2), and a reverse target-specific sequence (TS2);
  • each primer pair with a unique combination of forward and reverse target sequences is for amplifying a different target region of the viral genome; c) subjecting the nucleic acid library to multiplex PCR using the plurality of primer pairs to amplify a plurality of amplicons; d) subjecting the plurality of amplicons to amplification and sequencing, to obtain amplicon sequences; e) detecting and mapping the amplicon sequences to a reference genome of the viral genome; f) determining the insert size of each amplicon, wherein the insert size is the number of nucleotide bases or base pairs between the forward and reverse primer of each amplicon; g) categorizing the amplicons into groups based on insert size, wherein each group comprises a range of insert sizes; h) enumerating the number of amplicons in each group; i) obtaining one or more insert size ratios of the number of amplicons in one group to the number of amplicons in another group; j) determining
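Steps f) to i) can be illustrated with a short Python sketch. It is not the applicant's implementation: the 0-based, half-open coordinate convention, the function names (`insert_size`, `count_groups`, `insert_size_ratio`) and the amplicon coordinates are hypothetical, with the coordinates chosen only so that the resulting insert sizes match the 98, 103, 291, 303 and 464 bps examples given for Fig. 1B. The three group ranges follow the clusters described for Fig. 8.

```python
# Hypothetical sketch of steps f)-i); coordinates are 0-based, half-open.

# Step g) groups: illustrative inclusive ranges based on the clusters seen in Fig. 8.
GROUPS = {"70-150": (70, 150), "220-350": (220, 350), "400-520": (400, 520)}


def insert_size(fwd_primer_end: int, rev_primer_start: int) -> int:
    """Step f): bases between the 3' end of the forward primer and the reverse primer site."""
    return rev_primer_start - fwd_primer_end


def count_groups(sizes, groups=GROUPS):
    """Steps g)-h): place each insert size in a group and count amplicons per group.

    Sizes outside every range are not counted (a design assumption).
    """
    counts = {name: 0 for name in groups}
    for size in sizes:
        for name, (low, high) in groups.items():
            if low <= size <= high:
                counts[name] += 1
                break
    return counts


def insert_size_ratio(counts, numerator, denominator):
    """Step i): number of amplicons in one group divided by the number in another."""
    if counts[denominator] == 0:
        raise ZeroDivisionError(f"no amplicons in group {denominator!r}")
    return counts[numerator] / counts[denominator]


# Example: (forward primer end, reverse primer start) for five mapped amplicons.
amplicons = [(100, 198), (100, 391), (500, 603), (500, 803), (900, 1364)]
sizes = [insert_size(f, r) for f, r in amplicons]
counts = count_groups(sizes)
ratio = insert_size_ratio(counts, "220-350", "70-150")
print(sizes, counts, ratio)
```

How inserts falling between the ranges should be treated is not stated above; silently skipping them is only one possible choice.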
  • the present disclosure refers to a kit for determining the integrity of a viral genome or infectivity of a virus in a biological sample, comprising: a) a first pair of primers comprising a forward primer and a reverse primer designed for amplifying a first amplicon having a first insert, wherein the first insert has an insert size that is between 70-80 bps shorter than the range of insert sizes of the group with the range of insert sizes that is the smallest as determined by the method as disclosed herein; b) a second pair of primers comprising a forward primer and a reverse primer designed for amplifying a second amplicon having a second insert, wherein the second insert has an insert size that is between 70-80 bps shorter than the range of insert sizes of the group with the range of insert sizes that is the second smallest as determined by the method as disclosed herein; c) a first probe to detect the first amplicon; and d) a second probe to detect the second amplicon.
  • Fig. 1 shows the design of a highly multiplexed PCR-based NGS assay which allows capture of variable lengths of RNA fragments.
  • Fig. 1A illustrates the definitions of amplicon and insert in a polymerase chain reaction (PCR) and multiplex PCR-based next-generation sequencing (NGS) assay.
  • Fig. 1B shows the origin of inserts of variable lengths in amplicon-based sequencing. Consecutive primer pairs, from primer 51 to primer 61, are shown. Inserts of 98 bps and 103 bps are formed as a result of the expected interaction between 51-F and 51-R and between 57-F and 57-R, respectively.
  • PCR polymerase chain reaction
  • NGS next-generation sequencing
  • when the RNA template is intact, longer inserts can also be captured through the formation of products between, for example, 51-F and 53-R, and between 57-F and 59-R. These inserts would be 291 bps and 303 bps long, respectively. Potentially, primers that are even further apart (for example, 51-F and 55-R) can capture even longer inserts, of 464 bps.
  • Fig. 1C shows that in samples with a greater degree of fragmentation or a low amount of viral RNA, shorter inserts are formed preferentially over longer inserts, owing to the lack of long templates that the more distant primer pairs could amplify.
  • Fig. 2 shows histograms and dot plots demonstrating that SARS-CoV-2 fragments of differing lengths can be detected by NGS as short or long inserts with varying relative abundances.
  • Fig. 2A shows a histogram of sequencing read counts supporting insert lengths in the ranges 70-150 bps, 220-350 bps, and 400-520 bps. Insert length distributions in longitudinal saliva samples from a single case are shown as examples. The relative abundance of insert lengths shifts from a dominant 220-350 bps (long insert) distribution in a sample taken 3 days after symptom onset to a dominant 70-150 bps (short insert) distribution in a sample taken 5 days after symptom onset. Insert lengths are representative of viral RNA fragment lengths.
  • Fig. 2B is a dot plot showing that long and short fragment counts are each correlated with the Ct value of the sample measured by the CDC RT-PCR assay. Long fragments have a steeper decline in abundance, becoming differentially fewer compared to short fragments at Ct values greater than 25. URT (NP and nasal swab) and saliva samples are plotted together.
  • Fig. 2C is a dot plot showing that the viral fragmentation score (VFS) is moderately correlated with the Ct value of a sample in both URT and saliva samples. R² values of the correlations are shown. Half-filled circles and triangles represent asymptomatic cases.
  • VFS viral fragmentation score
  • Light gray circles represent synthetic RNA samples of known integrity, which give a fairly constant VFS (median: 0.384, IQR: 0.289-0.449) across Ct values of 24.4-33.45 over a 500-fold serial dilution of synthetic RNA. Numbers next to the gray circles are the synthetic RNA concentrations in copies/ml.
  • Fig. 3 shows dot plots of Ct values and viral fragmentation scores (VFS) for synthetic SARS-CoV-2 RNA samples (of known integrity, consisting of 5000-bp fragments) at different RNA copy numbers. Mean values of three replicate experiments are shown. Ct values for the same serially diluted samples, obtained with the CDC RT-PCR assay, ranged from 24.4 to 33.45, reflecting the 500-fold difference between the highest and lowest copy numbers used. Sequencing libraries prepared from dilutions of the synthetic RNA gave viral fragmentation scores that did not decrease with increasing Ct (median VFS: 0.384, IQR: 0.289-0.449). The results show that serial dilutions of synthetic SARS-CoV-2 RNA have the expected increase in Ct but a stable viral fragmentation score (VFS).
  • VFS viral fragmentation scores
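As a rough check on the statement that the Ct range reflects the 500-fold dilution: assuming ideal amplification efficiency, in which the amount of product doubles every cycle (an assumption not stated in the source), a 500-fold difference in starting copies corresponds to a Ct shift of about log2 500 cycles, close to the observed spread:

```latex
\Delta C_t \approx \log_2\!\left(\frac{N_{\max}}{N_{\min}}\right) = \log_2 500 \approx 8.97,
\qquad \text{observed: } 33.45 - 24.4 = 9.05
```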
  • Fig. 4 shows dot plots demonstrating that the viral fragmentation score is related to the clinical time-course of SARS-CoV-2 infection.
  • Figs. 4A and 4B show that, in symptomatic cases, the relative amount of long versus short fragments (represented by VFS) decreases with days since symptom onset in both URT and saliva samples. With the exception of one saliva sample (unfilled circle), VFS does not exceed the cut-off of 0.38 (dotted line), estimated for intact RNA from synthetic SARS-CoV-2 RNA, in samples collected more than 8 days after symptom onset (upper left quadrants). Of the samples collected ≤ 8 days after symptom onset, 15 (40.5%) of 37 URT samples and 16 (55%) of 29 saliva samples exceeded the cut-off (lower right quadrant).
  • Fig. 4C shows that most (96%) samples from asymptomatic cases had low VFS.
  • Fig. 4D shows that no significant difference was observed between VFS from samples collected on day 0 of diagnosis and later samples in asymptomatic cases (URT and saliva samples combined). Median and IQR lines are shown.
  • Fig. 4E shows VFS in matched longitudinal samples from 15 symptomatic cases (left) and 4 asymptomatic cases (right). Each case is shown as a line of a different color. Lines with gray ends are saliva samples, while the rest are URT samples.
  • Fig. 5 shows dot plots of subgenomic RNA (sgRNA) profiles in relation to the viral fragmentation score.
  • Fig. 5A shows that, across all sample types, sgRNA expression tends towards zero when VFS is low; a cut-off of 0.382 (dotted line) separates the 33 samples with VFS > 0.382, all of which have detectable sgRNA, from the 107 samples with VFS ≤ 0.382, of which 58 (54%) have no sgRNA.
  • Fig. 5B shows a significant difference in VFS between samples with zero and with any sgRNA detected (***Mann-Whitney test, p < 0.0001). The dotted line is the VFS cut-off of 0.382.
  • Fig. 5C shows that S gene and N gene sgRNA abundances are inversely related to VFS, whereas those of the other abundant sgRNAs, E gene and ORF3a, are not.
  • Fig. 6 shows dot plots demonstrating that the viral fragmentation score can be captured in a multiplex RT-PCR assay with long (240 bps) and short (70 bps) amplicons.
  • Fig. 6A shows the translation of the VFS obtained from NGS to an RT-PCR assay by calculating the difference in Ct value between the short and long amplicons (Ct_70bp - Ct_240bp) for 20 clinical samples. This deltaCt value is referred to as the viral fragmentation index (VFI).
  • the dots on the left represent samples in which the 240 bps amplicon was not detected; their Ct values are set to 40. The correlation of VFS and VFI for samples within the box is shown.
  • Fig. 6B shows the VFI from the RT-PCR assay correlating with days since diagnosis (the 7 samples in which the 240 bps amplicon was not detected are not plotted; their days from diagnosis were > 7).
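The VFI described for Figs. 6A and 6B reduces to a single subtraction with a substitution rule for undetected amplicons. The sketch below assumes that an undetected amplicon is passed in as `None`; the function name and the example Ct values are hypothetical. As a rough reading under this sign convention, a more negative VFI means the long amplicon crosses the threshold later relative to the short one, that is, fewer long templates relative to short fragments.

```python
from typing import Optional

UNDETECTED_CT = 40.0  # Ct assigned when an amplicon is not detected (as in Fig. 6A)


def viral_fragmentation_index(ct_70bp: Optional[float], ct_240bp: Optional[float]) -> float:
    """VFI = Ct_70bp - Ct_240bp, with a missing Ct (None) replaced by 40."""
    ct_short = UNDETECTED_CT if ct_70bp is None else ct_70bp
    ct_long = UNDETECTED_CT if ct_240bp is None else ct_240bp
    return ct_short - ct_long


print(viral_fragmentation_index(25.0, 28.5))  # -3.5
print(viral_fragmentation_index(30.0, None))  # -10.0 (240 bps amplicon not detected)
```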
  • Fig. 7 shows a visualization of inserts of two different lengths formed by the forward primer 153F.
  • the intended insert formed between primers 153F and 153R is shown in dark gray and is 113 bps long, while the longer insert formed between 153F and 157R is 274 bps long.
  • the alignment is shown with reference to the SARS-CoV-2 genome.
  • Fig. 8 shows histograms of the counts of inserts of different sizes.
  • the inserts are of predetermined sizes, and typically fall into one of three ranges visually classified as 70-150 bps, 220-350 bps, and 400-520 bps.
  • Fig. 8A shows the histogram for a sample with greater abundance of 70-150 bps inserts.
  • Fig. 8B shows the histogram for a sample with greater abundance of 220-350 bps inserts.
  • Figs. 8C and 8D show that the greatest variation in insert sizes is in the F_R and F_R+1 size ranges.
  • Fig. 9 shows histograms of the number of inserts of different sizes.
  • the calculated insert size ratios (the ratio between inserts in the range 151-350 bps and inserts in the range 0-150 bps) in clinical samples are also included.
  • Fig. 10 shows the relationship between NGS-derived insert size ratios and PCR-derived product size ratios in clinical samples.
  • Fig. 10A shows two PCR products: the longer 253 bps amplicon (corresponding to the 206 bps insert) represents the more intact template, and the shorter 66 bps amplicon (corresponding to the 18 bps insert) represents fragmentation products.
  • Each column (C1, G1, C2, etc.) represents a different sample.
  • The top panel shows the longer 253 bps product and the bottom panel shows the shorter 66 bps product, for the same samples matched by column.
  • Fig. 10B shows the correlation of the NGS-derived insert size ratio (log2 scale) with the PCR-derived ratio of the abundances of the 253 bps and 66 bps products.
  • Fig. 11 shows dot plots of the relationship between insert size ratios and the timepoint of sample collection for different clinical samples.
  • Timepoints 2 and 3 of sample collection correspond to 4 days and 7 days, respectively, from Timepoint 1 of sample collection.
  • Figs. 11A and 11B show the results for nasal swab samples and saliva samples.
  • Fig. 11C shows the results for individual cases with different disease courses (Case 1 was presymptomatic at timepoint 1; Cases 2 and 3 were asymptomatic throughout the disease course; Case 4 was symptomatic from timepoint 1).
  • the results show that the insert size ratio generally correlates with the timepoint of sample collection in multiple sample types, decreasing over time (Figs. 11A and 11B). However, in individual cases with differing disease courses, the insert size ratio changes distinctly over a similar sampling timeframe.
  • the inventors of the present disclosure have set out to provide an alternative method for detecting a virus in a biological sample.
  • the method provides information regarding the integrity of the viral genome and/or the infectivity of the virus in the biological sample, which is useful information for the purpose of controlling disease transmission.
  • the present invention provides a method of determining integrity of a viral genome or infectivity of a virus in a biological sample, said method comprising: a) extracting viral nucleic acid molecules from the biological sample to generate a nucleic acid library; b) providing a plurality of primer pairs wherein, (i) each primer pair comprises a forward primer and a reverse primer, wherein the forward primer comprises, from the 5’ end to the 3’ end, a first adapter sequence (AS1), and a forward target-specific sequence (TS1); wherein the reverse primer comprises, from the 5’ end to the 3’ end, a second adapter sequence (AS2), and a reverse target specific sequence (TS2); (ii) the first adapter sequence (AS1) of each forward primer is the same, the second adapter sequence (AS2) of each reverse primer is the same; (iii) each primer pair with a unique combination of forward and reverse target sequences (TS) is for amplifying a different target region of the viral genome;
  • integrity refers to the absence of nucleic acid damage that may hamper replication of the genome or its normal function.
  • nucleic acid damage includes, but is not limited to: chemical addition to, or disruption of, a base of the nucleic acid molecule (thus creating an abnormal nucleotide or nucleotide fragment), single-strand breaks and double-strand breaks.
  • the method as disclosed herein also allows the infectivity of the virus in the biological sample to be assessed.
  • infectivity refers to the capacity of the virus to enter the host cell and exploit the host cell's resources to replicate and produce infectious progeny viral particles, which may lead to infection and subsequent disease in the host. Damage to the viral genome will generally reduce the infectivity of the virus, as an intact viral genome is generally necessary for the virus to remain infectious.
  • the plurality of primer pairs comprises 50-700 primer pairs, 100-400 primer pairs, 100-300 primer pairs, 125-250 primer pairs, or 150-200 primer pairs.
  • adapter sequence refers to any nucleotide sequence which can be added to an oligonucleotide of interest to prepare said oligonucleotide of interest for various purposes.
  • an adapter sequence allows for the amplification of the oligonucleotide of interest by a universal primer.
  • an adapter sequence allows for the sequencing of the oligonucleotide of interest. Sequencing platform specific adapter sequences are known in the art, and include, for example, the Illumina P5/P7 adapter sequences.
  • the length of the first adapter sequence (AS1) and the length of the second adapter sequence (AS2) are the same, while in some other examples, the length of AS1 and the length of AS2 are different.
  • the length of AS1 or AS2 is from 10 to 30 nucleotides, or from 17 to 28 nucleotides, or from 19 to 26 nucleotides, or 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25 or 26 nucleotides.
  • the lengths of AS1 and AS2 are the same, both being 20 nucleotides.
  • AS1 comprises the sequence of 5’-ACACGACGCTCTTCCGATCT-3’ (SEQ ID NO: 1)
  • AS2 comprises the sequence of 5’-GACGTGTGCTCTTCCGATCT-3’ (SEQ ID NO: 2).
  • target-specific sequence as used herein is well known in the art, and refers to the part of the primer that binds or anneals to the target region to be amplified.
  • the target-specific sequence is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 99%, or 100% complementary to the target region to be amplified.
  • a forward target-specific sequence binds or anneals to the 5’ end of the target region to be amplified, and the reverse target-specific sequence binds or anneals to the 3’ end of the target region to be amplified.
  • the length of the forward and/or reverse target-specific sequence is from 15 to 35 nucleotides, or from 15 to 33 nucleotides, or from 15 to 31 nucleotides, or from 17 to 28 nucleotides, or from 19 to 26 nucleotides, or 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29 or 30 nucleotides.
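The primer structure described above (5’-AS1-TS1-3’ for the forward primer, 5’-AS2-TS2-3’ for the reverse primer) can be sketched as string concatenation. The helper `build_primer_pair`, the 20-nt target-specific length (one value within the ranges above) and the toy reference sequence are illustrative assumptions; practical primer design would also check melting temperature, secondary structure and cross-reactivity, none of which is shown.

```python
AS1 = "ACACGACGCTCTTCCGATCT"  # first adapter sequence, SEQ ID NO: 1
AS2 = "GACGTGTGCTCTTCCGATCT"  # second adapter sequence, SEQ ID NO: 2

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T/N only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def build_primer_pair(reference: str, start: int, end: int, ts_len: int = 20):
    """Hypothetical helper: primers for target region reference[start:end] (0-based, half-open).

    Forward primer: 5'-AS1-TS1-3', where TS1 is the first ts_len bases of the target.
    Reverse primer: 5'-AS2-TS2-3', where TS2 is the reverse complement of the last
    ts_len bases, so that it anneals at the 3' end of the target region.
    """
    target = reference[start:end].upper()
    if len(target) < 2 * ts_len:
        raise ValueError("target region is too short for two non-overlapping primers")
    ts1 = target[:ts_len]
    ts2 = reverse_complement(target[-ts_len:])
    return AS1 + ts1, AS2 + ts2


reference = "GATTACA" * 10  # toy 70-nt sequence, not a viral genome
forward, reverse = build_primer_pair(reference, 0, 60)
print(forward)
print(reverse)
```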
  • target region refers to a particular region of a nucleic acid molecule to be amplified.
  • the identity of the target region or a combination of target regions is unique for the virus to be detected, in order to guarantee the specificity of the detection of the virus.
  • the target region or combination of target regions for the detection of a particular virus should not be found in a genome or a transcriptome of another virus, and is further not found in a genome or a transcriptome of a human, and is further not found in a genome or a transcriptome of any microorganism.
  • in some examples, the target regions amplified by the plurality of primer pairs are of the same length. In some examples, the target region amplified by one primer pair is of a different length from the target region amplified by another primer pair. In some examples, the target region amplified by each primer pair has a length between 50-1000 nucleotides, or a length between 100-800 nucleotides, or a length between 250-650 nucleotides, or a length between 300-600 nucleotides.
  • multiplex polymerase chain reaction refers to the PCR that allows the simultaneous detection of multiple targets in a single reaction, with a different pair of primers for each target. This technique requires two or more probes that can be distinguished from each other and detected simultaneously. Methods and systems for carrying out multiplex PCR are well known in the art.
  • amplicon refers to a polynucleotide that is the source and/or product of amplification or replication events. It can be formed artificially, using various methods including polymerase chain reactions (PCR) or ligase chain reactions, or naturally through gene duplication.
  • PCR polymerase chain reactions
  • sequencing steps of the method of the invention can be carried out using any sequencing method known in the art, and/or on any standard sequencing platform, including but not limited to Illumina and Ion Torrent platforms.
  • reference genome also known as a reference assembly
  • reference genome refers to a digital nucleic acid sequence database, assembled as a representative example of the set of genes in one idealized individual organism of a species.
  • a reference genome of a virus provides a haploid mosaic of different nucleic acid sequences from each known variant of the virus.
  • Reference genomes of various organisms are publicly available.
  • the reference genome is a SARS-CoV-2 reference genome (NCBI Reference Sequence: NC_045512.2).
  • the biological sample is a body fluid sample.
  • body fluids include, but are not limited to, blood, plasma, serum, sputum, urine, feces, semen, mucus, lymph, saliva, and nasal lavage.
  • the biological sample is a nasopharyngeal swab sample, an oropharyngeal swab sample, a nasal swab sample, a saliva sample, a sputum sample, a viral culture sample, or mixtures thereof.
  • the biological sample is a nasopharyngeal swab sample, a nasal swab sample, or mixtures thereof.
  • the biological sample is an inactivated cultured viral isolate.
  • the method as described above further comprises the step of splitting the nucleic acid library from step a) into a first nucleic acid pool and a second nucleic acid pool.
  • splitting the nucleic acid library into the first nucleic acid pool and the second nucleic acid pool comprises randomly splitting the nucleic acid library.
  • the number of nucleic acid molecules in the first nucleic acid pool and the number of nucleic acid molecules in the second nucleic acid pool are the same.
  • the number of nucleic acid molecules in the first nucleic acid pool and the number of nucleic acid molecules in the second nucleic acid pool are different.
  • the first nucleic acid pool contains more nucleic acid molecules than the second nucleic acid pool, while in some other examples, the second nucleic acid pool contains more nucleic acid molecules than the first nucleic acid pool.
  • primers are designed for each of the two pools of nucleic acid molecules, and each of the two pools of nucleic acid molecules will be subjected to multiplex PCR. Doing so will prevent the formation of very short undesired PCR products between adjacent primer pairs, and allow reconstruction of the whole viral genome from sequencing.
  • the plurality of primer pairs comprises a first plurality of primer pairs and a second plurality of primer pairs, wherein each forward primer in the first and second plurality of primer pairs further comprises a barcode sequence (BS), and wherein the barcode sequence (BS) of each forward primer is different.
  • the first nucleic acid pool is subjected to multiplex PCR using the first plurality of primer pairs to amplify a first plurality of amplicons.
  • the second nucleic acid pool is subjected to multiplex PCR using the second plurality of primer pairs to amplify a second plurality of amplicons.
  • each of the first and/or second plurality of primer pairs comprises 50-700 primer pairs, 100-400 primer pairs, 100-300 primer pairs, 125-250 primer pairs, or 150-200 primer pairs.
  • the first and second pluralities of primer pairs comprise a total of 100-1400 primer pairs, 200-800 primer pairs, 250-600 primer pairs, 250-450 primer pairs, or 300-400 primer pairs.
  • the term “barcode sequence” refers to an encoded molecule or barcode that includes a variable amount of information within the nucleic acid sequence.
  • the barcode sequence is a tag that can be read out using any of a variety of sequence identification techniques, for example, nucleic acid sequencing, probe hybridization based assay, and the like.
  • in the method as described herein, the barcode sequence is appended to the different target-specific sequences, such that when a target-specific sequence anneals to its target region, each different target region has a unique barcode sequence attached to it, which is read out together with the sequence of the target region from that sample.
  • the barcode sequence allows the pooled analysis of multiple unique target regions, where the resulting sequence information from the pool can be later attributed back to each starting target region. That is, after the process of amplification, the barcode sequence is used to group amplicons to form a family of amplicons having the same barcode sequence. In some examples, the barcode sequence is an overhang that does not complement any sequence within the target region.
  • the barcode sequence is an oligonucleotide comprising 10 to 16 random nucleotides, or 10 to 15 random nucleotides, or 10 to 13 random nucleotides, or 10 random nucleotides, or 11 random nucleotides, or 12 random nucleotides, or 13 random nucleotides, or 14 random nucleotides, or 15 random nucleotides, or 16 random nucleotides.
  • the barcode sequence is an oligonucleotide comprising 10 random nucleotides, which can be represented as NNNNNNNNNN (SEQ ID NO: 3), wherein N is any nucleotide.
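The random N10 barcode and the grouping of amplicons into families that share a barcode can be sketched as follows. Where the barcode sits within a sequenced read is not specified in the text above, so the sketch assumes, hypothetically, that it occupies the first 10 bases of each adapter-trimmed read; `random_barcode` and `group_into_families` are illustrative names.

```python
import random
from collections import defaultdict
from typing import Optional


def random_barcode(length: int = 10, rng: Optional[random.Random] = None) -> str:
    """Random barcode: each of the `length` positions is independently A, C, G or T."""
    if rng is None:
        rng = random.Random()
    return "".join(rng.choice("ACGT") for _ in range(length))


def group_into_families(reads, barcode_len: int = 10):
    """Group reads into amplicon families sharing the same barcode.

    Hypothetical layout: the barcode is taken to be the first barcode_len bases
    of each read; the remainder of the read is stored as the family member.
    """
    families = defaultdict(list)
    for read in reads:
        families[read[:barcode_len]].append(read[barcode_len:])
    return dict(families)


rng = random.Random(0)  # seeded only to make the example reproducible
print(random_barcode(rng=rng))
print(group_into_families(["AAAAAAAAAACGT", "AAAAAAAAAACGA", "CCCCCCCCCCTTT"]))
```

Barcodes drawn this way are not checked for uniqueness; a design requiring a different barcode on each forward primer would need an explicit collision check.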
  • the target regions to be amplified by the first plurality of primer pairs are non-overlapping and are separated by gaps of nucleotides, and the target regions to be amplified by the second plurality of primer pairs are non-overlapping, and each of the gaps of nucleotides is comprised within one of the target regions to be amplified by the second plurality of primer pairs.
  • the target regions to be amplified by the first and second plurality of primer pairs span the viral genome.
  • the target regions to be amplified by the first plurality of primer pairs have the same or different length
  • the target regions to be amplified by the second plurality of primer pairs have the same or different length.
  • the target region amplified by each primer pair has a length between 50-1000 nucleotides, or a length between 100-800 nucleotides, or a length between 250-650 nucleotides, or a length between 300-600 nucleotides.
  • the length of each of the gaps of nucleotides separating the target regions to be amplified by the first plurality of primer pairs is from 2 to 30 nucleotides, or from 4 to 28 nucleotides, or from 6 to 26 nucleotides, or from 8 to 24 nucleotides, or from 10 to 22 nucleotides, or from 12 to 20 nucleotides, or from 14 to 18 nucleotides, or 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29 or 30 nucleotides.
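The two-pool layout described in the preceding paragraphs (non-overlapping pool-1 targets separated by short gaps, each gap lying inside a pool-2 target) can be checked programmatically. In this sketch, target regions are hypothetical (start, end) pairs in 0-based, half-open coordinates, the function names are illustrative, and the 2-30 nucleotide gap limits are taken from the widest range given above.

```python
def gaps_between(regions):
    """Gaps (start, end) between consecutive regions sorted by start (0-based, half-open).

    Raises ValueError if two regions overlap; abutting regions give a zero-length gap.
    """
    ordered = sorted(regions)
    gaps = []
    for (s1, e1), (s2, e2) in zip(ordered, ordered[1:]):
        if s2 < e1:
            raise ValueError(f"regions {(s1, e1)} and {(s2, e2)} overlap")
        gaps.append((e1, s2))
    return gaps


def two_pool_tiling_ok(pool1, pool2, min_gap=2, max_gap=30):
    """True if pool-1 targets are separated by gaps of min_gap..max_gap nucleotides,
    pool-2 targets do not overlap, and every pool-1 gap lies inside one pool-2 target."""
    gaps = gaps_between(pool1)
    gaps_between(pool2)  # called only to validate that pool-2 targets do not overlap
    for g_start, g_end in gaps:
        if not min_gap <= g_end - g_start <= max_gap:
            return False
        if not any(s <= g_start and g_end <= e for s, e in pool2):
            return False
    return True


pool1 = [(0, 300), (320, 620), (640, 940)]  # hypothetical coordinates
pool2 = [(150, 470), (480, 790)]
print(two_pool_tiling_ok(pool1, pool2))                     # True
print(two_pool_tiling_ok(pool1, [(150, 310), (480, 790)]))  # False: gap 300-320 not covered
```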
  • the ranges of insert sizes of the one or more groups are determined based on the insert sizes observed in step (f), or expected insert sizes, or both.
  • the expected insert sizes are determined by: a) providing multiplex PCR sequencing data that has been obtained from a viral genome of the same virus species; b) processing the sequencing data to identify one or more pairs of predetermined primers, wherein each pair of predetermined primers comprises a forward primer and a reverse primer that flank an amplicon in the 5’ to 3’ direction; c) trimming primer-specific sequences from each amplicon to obtain the sequence of the insert; d) aligning the insert sequence to a reference genome; e) removing inserts with supplementary alignment from the sequencing data, wherein inserts with supplementary alignment are inserts that do not align in contiguity with the reference genome; f) determining the insert size of each remaining insert from step e); g) retaining inserts from step f) having an insert size of between 70 base pairs (bps) and 1000 bps to obtain the expected insert sizes.
  • the term “insert with supplementary alignment” refers to an insert that does not align contiguously with a reference genome.
  • reads for which a completely linear alignment is not possible are also described as chimeric reads.
  • Part of the read may align to a portion of the genome not contiguous with the region to which the other part is aligning. This may happen when a structural rearrangement brings ordinarily distant portions of a chromosome together.
  • the supplementary part of the alignment is the portion of the sequencing read that does not align in contiguity with the rest of the read in alignment with the reference genome.
  • insert sizes of reads with supplementary alignment are calculated from the starting and ending positions of the linear and supplementary parts of the reads (with respect to the reference genome), so these insert sizes can be inflated when the mapping positions of the linear and supplementary portions of the read are far apart. Such reads may be filtered out so as not to skew the true insert size distribution.
  • the insert size information may be obtained from sequence data using bioinformatics tools/programs.
  • the insert size data may be extracted from sequence data using Samtools, available at http://www.htslib.org.
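Steps e) to g) of the expected-insert-size procedure can be approximated on alignment records in SAM text form (for example, as printed by `samtools view`). The sketch assumes paired-end reads from primer-trimmed inserts, so that the absolute value of the SAM TLEN field approximates the insert size. It treats a read as having a supplementary alignment if any of its records carries FLAG bit 0x800 or an `SA:Z` tag, and then discards the whole read pair. These conventions, the function names and the example records are assumptions, not details from the source.

```python
FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100
FLAG_SUPPLEMENTARY = 0x800


def expected_insert_sizes(sam_lines, min_size=70, max_size=1000):
    """Drop chimeric reads, take |TLEN| once per read pair, keep min_size..max_size bps."""
    records, chimeric = [], set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        qname, flag, tlen = fields[0], int(fields[1]), int(fields[8])
        # A supplementary record, or an SA tag on the primary record, marks a chimeric read.
        if flag & FLAG_SUPPLEMENTARY or any(tag.startswith("SA:Z:") for tag in fields[11:]):
            chimeric.add(qname)
        records.append((qname, flag, tlen))

    sizes = {}
    for qname, flag, tlen in records:
        if qname in chimeric or flag & (FLAG_UNMAPPED | FLAG_SECONDARY | FLAG_SUPPLEMENTARY):
            continue
        sizes.setdefault(qname, abs(tlen))  # count each template (read pair) once
    return [size for size in sizes.values() if min_size <= size <= max_size]


def sam_record(qname, flag, tlen, *tags):
    """Build a minimal 11-column SAM line (hypothetical example data)."""
    return "\t".join([qname, str(flag), "NC_045512.2", "100", "60", "100M",
                      "=", "300", str(tlen), "*", "*", *tags])


example = [
    "@SQ\tSN:NC_045512.2\tLN:29903",
    sam_record("pair1", 99, 250), sam_record("pair1", 147, -250),  # kept
    sam_record("pair2", 99, 120), sam_record("pair2", 2145, 0),    # has a supplementary part
    sam_record("pair3", 99, 40),                                   # shorter than 70 bps
    sam_record("pair4", 99, 310, "SA:Z:NC_045512.2,5000,+,50M50S,60,0;"),  # chimeric
]
print(expected_insert_sizes(example))  # [250]
```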
  • the range of insert sizes of each group can be determined based on the results of observed insert size distribution, or expected insert size distribution, or both.
  • the insert size distribution results can be shown as a plot of (the number of insert counts) vs (insert length), as exemplified in Fig. 8. From the results of insert size distribution, it can be seen that ranges of insert sizes are clustered together. For example, in Fig. 8A, the insert sizes fall into three ranges visually classified as 70-150 bps, 220-350 bps, and 400-520 bps.
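Assigning inserts to the visually identified ranges amounts to simple binning. The sketch below is a minimal example, assuming the three Fig. 8A ranges are inclusive and non-overlapping; other samples may call for different ranges, and inserts outside every range are simply not counted. `FIG_8A_RANGES` and `count_by_range` are hypothetical names.

```python
from typing import Dict, Iterable, List, Tuple

# Ranges read off Fig. 8A (inclusive, in bps); assumed, not fixed by the method.
FIG_8A_RANGES: List[Tuple[int, int]] = [(70, 150), (220, 350), (400, 520)]


def count_by_range(sizes: Iterable[int],
                   ranges: List[Tuple[int, int]] = FIG_8A_RANGES) -> Dict[Tuple[int, int], int]:
    """Count inserts falling into each range; sizes outside all ranges are ignored."""
    counts = {r: 0 for r in ranges}
    for size in sizes:
        for lo, hi in ranges:
            if lo <= size <= hi:
                counts[(lo, hi)] += 1
                break  # ranges are assumed not to overlap
    return counts
```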
  • each group comprises amplicons formed between the forward and reverse primers of a primer pair.
  • each group comprises amplicons formed between the forward primer of a primer pair and the reverse primer of another primer pair.
  • amplicons can be formed between forward primer F and reverse primer R, or between forward primer F and reverse primer R+1, or between forward primer F and reverse primer R+2, or between forward primer F and reverse primer R+3, and so forth.
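If each insert has been labeled with the forward and reverse primer that produced it, the F_R, F_R+1, F_R+2 pattern can be tallied by the index offset between the two primers. This sketch assumes, hypothetically, that primers within a pool are numbered in genome order so that pair i consists of forward primer F_i and reverse primer R_i; the function name is illustrative only.

```python
from collections import Counter
from typing import Iterable, Tuple


def tally_by_offset(primer_pairs: Iterable[Tuple[int, int]]) -> Counter:
    """Count inserts by k = reverse index - forward index.

    k = 0 is the intended F_R product, k = 1 is F_R+1, k = 2 is F_R+2, and so on.
    Pairs with the reverse primer upstream of the forward primer are skipped.
    """
    return Counter(rev - fwd for fwd, rev in primer_pairs if rev >= fwd)
```

For example, labels `[(1, 1), (2, 2), (3, 4), (5, 7)]` give two F_R inserts, one F_R+1 insert and one F_R+2 insert.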
  • the range of insert sizes of the groups is about 50 to about 100 bps, or about 100 to about 150 bps, or about 150 to about 200 bps, or about 200 to about 250 bps, or about 250 to about 300 bps, or about 300 to about 350 bps, or about 350 to about 400 bps, or about 400 to about 450 bps, or about 450 to about 500 bps, or about 500 to about 550 bps, or about 550 to about 600 bps, or about 600 to about 650 bps, or about 650 to about 700 bps, or about 700 to about 750 bps, or about 750 to about 800 bps, or about 800 to about 850 bps, or about 850 to about 900 bps, or about 900 to about 950 bps, or about 950 to about 1000 bps.
  • the range of insert sizes of the groups is about 0 to about 150 bps, or about 70 to about 150 bps, or about 150 to about 350 bps, or about 220 to about 350 bps, or about 400
  • insert size ratio represents the ratio of the abundance of longer template molecules to shorter template molecules in a PCR reaction, in particular a multiplexed PCR reaction.
  • the extent to which longer amplicons (and hence inserts) form is a function of the abundance of longer template molecules available in the multiplex PCR reaction and the degree to which template molecules are degraded in a sample.
  • the “insert size ratio” indirectly correlates with infectivity of a virus since infectivity requires the presence of intact viral particles represented by larger, intact viral genomes.
  • VFS: Viral Fragmentation Score
  • VFS = (long viral nucleic acid fragment count) / (short viral nucleic acid fragment count).
  • VFS is positively correlated with viral load in the sample, intactness of the viral genome in the sample, and/or the insert size ratio of the sample as defined above.
  • a number of insert size ratios can be obtained from a multiplex PCR from a single biological sample.
  • where the amplicons are categorized into two groups, a single insert size ratio can be obtained, between the first group and the second group.
  • where the amplicons are categorized into four groups in step g) of the method as described above, six insert size ratios can be obtained, namely, the insert size ratios between the first group and the second group, between the first and the third group, between the first and the fourth group, between the second and the third group, between the second and the fourth group, and between the third and the fourth group.
  • more generally, where the number of groups is n (where n > 2), the number of insert size ratios is N = n(n - 1)/2, i.e., one ratio for each pair of groups.
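Enumerating every pair of groups reproduces the count of n(n - 1)/2 ratios given above. In this minimal sketch (hypothetical function name), `counts[i]` is the number of amplicons in group i + 1, with groups ordered from the smallest to the largest insert-size range, and each ratio is taken as longer group over shorter group. Returning infinity for an empty shorter group is an assumption, not something the text specifies.

```python
from itertools import combinations
from typing import Dict, Sequence, Tuple


def pairwise_ratios(counts: Sequence[int]) -> Dict[Tuple[int, int], float]:
    """Return the longer/shorter insert size ratio for every pair of groups (1-based keys)."""
    ratios: Dict[Tuple[int, int], float] = {}
    for i, j in combinations(range(len(counts)), 2):
        # j > i, so group j covers the longer insert range
        ratios[(i + 1, j + 1)] = counts[j] / counts[i] if counts[i] else float("inf")
    return ratios
```

With four groups this yields six entries, matching the four-group example.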
  • the insert size ratio that indicates the integrity of the viral genome and/or the infectivity of the virus is the ratio of the number of amplicons in the group with the second-smallest range of insert sizes to the number of amplicons in the group with the smallest range of insert sizes. In some other examples, the insert size ratio that indicates the integrity of the viral genome and/or the infectivity of the virus is the ratio of the number of amplicons in the group with the largest range of insert sizes to the number of amplicons in the group with the second-largest range of insert sizes.
  • the insert size ratio that indicates the integrity of the viral genome and/or the infectivity of the virus is the ratio of the number of amplicons in the group with the largest range of insert sizes to the number of amplicons in the group with the smallest range of insert sizes.
  • the range of insert sizes that is the largest is about 300 to about 350 bps, or about 350 to about 400 bps, or about 400 to about 450 bps, or about 450 to about 500 bps, or about 500 to about 550 bps, or about 550 to about 600 bps, or about 600 to about 650 bps, or about 650 to about 700 bps, or about 700 to about 750 bps, or about 750 to about 800 bps, or about 800 to about 850 bps, or about 850 to about 900 bps, or about 900 to about 950 bps, or about 950 to about 1000 bps.
  • the range of insert sizes that is the largest is about 400 to about 520 bps.
  • the range of insert sizes that is the second smallest is about 150 to about 200 bps, or about 200 to about 250 bps, or about 250 to about 300 bps, or about 300 to about 350 bps, or about 350 to about 400 bps, or about 400 to about 450 bps, or about 450 to about 500 bps. In one specific example, the range of insert sizes that is the second smallest is about 151 to 350 bps. In some examples, the range of insert sizes that is the smallest is about 0 to about 50 bps, or about 50 to about 100 bps, or about 100 to about 150 bps, or about 150 to about 200 bps, or about 200 to about 250 bps, or about 250 to about 300 bps.
  • the range of insert sizes that is the smallest is about 0 to 150 bps. In one specific example, the range of insert sizes that is the second smallest is about 151 to about 350 bps, and the range of insert sizes that is the smallest is about 0 to about 150 bps.
  • an insert size ratio of 15, 15.5, 16, 16.5, 17, 17.5, 18, 18.5, 19, 19.5, 20, 50, 100, 500, or more is regarded as a “large” insert size ratio, which indicates that the viral genome has high integrity, and/or that the virus has high infectivity.
  • an insert size ratio of more than about 1.5 is regarded as a “large” insert size ratio, which indicates that the viral genome has high integrity, and/or that the virus has high infectivity.
  • an insert size ratio of less than about 0.5 is regarded as a “small” insert size ratio, which indicates that the viral genome has low integrity, and/or that the virus has low infectivity.
  • an insert size ratio of about 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5 is regarded as a “medium” insert size ratio, which indicates that the viral genome has medium integrity, and/or that the virus has medium infectivity.
  • an insert size ratio of about 0.5 to 1.5 is regarded as a “medium” insert size ratio, which indicates that the viral genome has medium integrity, and/or that the virus has medium infectivity.
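The three categories can be expressed as a small classifier. This sketch treats the "about 0.5" and "about 1.5" cut-offs as hard boundaries, with both endpoints falling in "medium". That is an assumption, since the text only gives approximate values. The same cut-offs are also stated for the intactness ratio. The function name is hypothetical.

```python
def classify_ratio(ratio: float) -> str:
    """Map an insert size ratio to the small / medium / large categories."""
    if ratio < 0.5:
        return "small"   # low genome integrity and/or low infectivity
    if ratio <= 1.5:
        return "medium"  # medium integrity and/or medium infectivity
    return "large"       # high integrity and/or high infectivity
```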
  • the viral genome is a viral DNA genome, and the virus is a DNA virus.
  • the viral genome is a viral RNA genome, and the virus is an RNA virus.
  • step a) further comprises reverse-transcribing RNA molecules extracted from the biological sample to generate the nucleic acid library, and wherein the nucleic acid library is a cDNA (complementary DNA) library.
  • reverse transcription and its grammatical variants as used herein refers to the enzyme-mediated synthesis of a DNA molecule from an RNA template.
  • the resulting DNA is known as complementary DNA (cDNA).
  • Methods of reverse transcription, which typically involve the use of non-target-specific primers (random hexanucleotide primers, or hexamers for short), are well known in the art.
  • cDNA library refers to a combination of cloned cDNA fragments inserted into a collection of host cells, which constitute some portion of the transcriptome of the organism and are stored as a “library”. cDNA is produced from fully transcribed mRNA and therefore contains only the expressed genes of an organism.
  • the cDNA library obtained is purified using methods known in the art (including, for example, column purification and gel purification methods), in order to remove, for example, excess primers.
  • the cDNA library is purified to retain all amplicons.
  • the cDNA library is purified to retain amplicons that are more than about 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190 or 200 bps in size.
  • the cDNA library is purified to retain amplicons that are more than about 100 bps in size.
  • the viral RNA genome is from a virus selected from the group consisting of: Lymphocytic choriomeningitis virus, Coronavirus, human immunodeficiency virus (HIV), Human metapneumovirus, Poliovirus, Rhinovirus, Hepatitis A, Norwalk virus, Yellow fever virus, West Nile virus, Hepatitis C virus, Dengue fever virus, Enterovirus, Zika virus, Rubella virus, Ross River virus, Sindbis virus, Chikungunya virus, Borna disease virus, Ebola virus, Marburg virus, Measles virus, Mumps virus, Nipah virus, Hendra virus, Newcastle disease virus, Human respiratory syncytial virus, Rabies virus, Lassa virus, Hantavirus, Crimean-Congo hemorrhagic fever virus, Human parainfluenza viruses 1-4, Influenza virus, and Hepatitis D virus.
  • coronavirus examples include but are not limited to: Severe acute respiratory syndrome virus (SARS-CoV), Severe acute respiratory syndrome coronavirus 1 (SARS-CoV-1) and Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).
  • the RNA virus is selected from the group consisting of: Coronaviridae, Flaviviridae (such as hepacivirus and Zika virus) and Retroviridae (such as lentivirus).
  • the viral genome is from a virus that causes respiratory tract infection.
  • the viral genome is from a virus selected from the group consisting of: influenza virus, dengue virus, and coronavirus.
  • the viral genome is a viral DNA genome from a DNA virus.
  • examples of DNA viruses include but are not limited to: bocavirus, Epstein-Barr virus, Hepatitis B virus, smallpox virus, adenovirus and papillomavirus.
  • the method as described herein can also be used to compare the viral genome integrity and/or virus infectivity in different biological samples.
  • the insert size ratio(s) of the biological sample obtained in step i) of the method can be compared to the insert size ratio(s) obtained from a reference sample. Larger insert size ratio(s) relative to the reference sample indicate that the viral genome in the biological sample has a higher integrity, or that the virus in the biological sample has a higher infectivity, as compared to the reference sample; smaller insert size ratio(s) relative to the reference sample indicate that the viral genome in the biological sample has a lower integrity, or that the virus in the biological sample has a lower infectivity, as compared to the reference sample.
  • the insert size ratio(s) to be compared between the biological sample and the reference sample are calculated based on the same ranges of insert sizes.
  • the reference sample is a body fluid sample.
  • the biological sample is a nasopharyngeal swab sample, an oropharyngeal swab sample, a saliva sample, a sputum sample, a viral culture sample, or mixtures thereof.
  • the biological sample is a nasopharyngeal swab sample, a nasal swab sample, or mixtures thereof.
  • the biological sample is an inactivated cultured viral isolate.
  • the biological sample and the reference sample are the same type of samples. In some other examples, the biological sample and the reference sample are different types of samples. In some examples, the biological sample is obtained from a subject, and the reference sample is the same type of sample obtained from the same subject. In some examples, the biological sample is obtained from a subject, and the reference sample is the same type of sample obtained from a different subject. In some examples, the biological sample is obtained from a subject, and the reference sample is a different type of sample obtained from the same subject. In some examples, the biological sample is obtained from a subject, and the reference sample is a different type of sample obtained from a different subject.
  • the biological sample is obtained from a subject, and the reference sample is obtained from the same subject at one or more different time points.
  • the one or more different time points is selected from the group consisting of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 days after the original biological sample is obtained.
  • the reference sample is a viral culture sample.
  • the reference sample is an inactivated cultured viral isolate.
  • the insert size ratio of a biological sample obtained from a subject indicates whether the subject is asymptomatic, pre-symptomatic or symptomatic.
  • an insert size ratio of more than about 1.5 indicates that the subject is symptomatic. In other examples, an insert size ratio of about 0.45, 0.4, 0.35, 0.3, 0.25, 0.2, 0.15, 0.1, 0.05, 0.01, or less, indicates that the subject is asymptomatic. In one specific example, an insert size ratio of less than about 0.5 indicates that the subject is asymptomatic. In some examples, an insert size ratio of about 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5 indicates that the subject is pre-symptomatic. In some specific examples, an insert size ratio of about 0.5 to 1.5 indicates that the subject is pre-symptomatic.
  • the change in insert size ratios of biological samples obtained from a subject over two or more time points is indicative of the course of a viral infection in the subject.
  • an increase in insert size ratios of biological samples obtained from a subject over two or more time points indicates that the integrity of the viral genome is increasing and/or that the infectivity of the virus is increasing.
  • a decrease in insert size ratios of biological samples obtained from a subject over two or more time points indicates that the integrity of the viral genome is decreasing and/or that the infectivity of the virus is decreasing.
  • the method as described above uses next-generation sequencing (NGS) for the simultaneous analysis of the plurality of viral nucleic acid molecules in the biological sample.
  • the inventors of the present disclosure have also found that results from the NGS method can be validated using real-time PCR assays.
  • the method as described above further comprises the following steps: a) selecting a first amplicon having a first insert, selecting a second amplicon having a second insert, wherein the first insert has an insert size that is between 70-80 bps shorter than the range of insert sizes of the group with the range of insert sizes that is the smallest, and wherein the second insert has an insert size that is between 70-80 bps shorter than the range of insert sizes of the group with the range of insert sizes that is the second smallest; b) providing a first pair of primers comprising a forward primer and a reverse primer, wherein the forward primer and reverse primer are designed for amplifying the first insert; c) providing a second pair of primers comprising a forward primer and a reverse primer, wherein the forward primer and reverse primer are designed for amplifying the second insert; d) providing viral nucleic acid molecules extracted from the biological sample; e) subjecting the viral nucleic acid molecules to amplification using the first pair of primers and the second pair
  • real-time PCR is also known as quantitative polymerase chain reaction (qPCR). It monitors the amplification of a targeted nucleic acid molecule during the PCR in real time.
  • Real-time PCR is characterized by the point in time (or PCR cycle) at which target amplification is first detected. This value is usually referred to as the cycle threshold (Ct): the cycle at which fluorescence intensity exceeds background fluorescence. Consequently, the greater the quantity of target nucleic acid in the sample, the sooner a significant increase in fluorescent signal appears, yielding a lower Ct value.
  • enumerating the number of the first amplicons and the number of the second amplicons in step f) above comprises obtaining the Ct values for the first and second amplicons.
  • the Ct value for the first amplicon can be denoted as Ct1.
  • the Ct value for the second amplicon can be denoted as Ct2.
  • the term “intactness ratio” as used herein represents the inferred intactness of target template molecules in a real-time PCR assay and is based on the relative abundance of two amplicons with different insert sizes. The “intactness ratio” is therefore related to the “insert size ratio”. The “intactness ratio” can also be used to infer the integrity of a nucleic acid template or nucleic acid starting material, or a viral genome. Based on the values of Ct1 and Ct2, a person skilled in the art can calculate the intactness ratio as defined above using available knowledge in the art. In some examples, the intactness ratio of the number of the second amplicons to the number of the first amplicons is calculated using the following formula:
  • the “constant” is an optional scaling factor, a positive value, added to (Ct1 - Ct2) to convert all values to a positive scale.
  • Ct1 will typically be numerically smaller than Ct2, resulting in negative values most of the time (see, for example, Fig. 6 of VFI).
  • the “constant” is empirically determined and is added to the difference to scale all values to be positive.
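The formula itself is not reproduced in this text. Going only by the description of the constant, the hedged sketch below computes the shifted Ct difference (Ct1 - Ct2) + constant. It is not claimed to be the exact published formula, and the function name and the example constant of 5.0 are hypothetical. A more intact template yields relatively more of the long second amplicon, which lowers Ct2 and therefore raises the value.

```python
def shifted_ct_difference(ct1: float, ct2: float, constant: float) -> float:
    """Return (Ct1 - Ct2) + constant.

    ct1: Ct of the first (shorter) amplicon; ct2: Ct of the second (longer) amplicon.
    constant: empirically determined positive offset (value assumed by the caller).
    """
    if constant <= 0:
        raise ValueError("the constant is described as a positive value")
    # Ct1 is usually smaller than Ct2, so the raw difference is usually negative;
    # adding the positive constant shifts values onto a positive scale.
    return (ct1 - ct2) + constant
```

For example, with a hypothetical constant of 5.0, a sample with Ct1 = 24.0 and Ct2 = 27.5 scores higher than one with Ct1 = 24.0 and Ct2 = 29.0.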
  • an intactness ratio of more than about 1.5 is regarded as a “large” intactness ratio, which indicates that the viral genome has high integrity, and/or that the virus has high infectivity.
  • an intactness ratio of less than about 0.5 is regarded as a “small” intactness ratio, which indicates that the viral genome has low integrity, and/or that the virus has low infectivity.
  • an intactness ratio of about 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5 is regarded as a “medium” intactness ratio, which indicates that the viral genome has medium integrity, and/or that the virus has medium infectivity.
  • an intactness ratio of about 0.5 to 1.5 is regarded as a “medium” intactness ratio, which indicates that the viral genome has medium integrity, and/or that the virus has medium infectivity.
  • the first amplicon has a size of between 20 to 30 bps, or between 30 to 40 bps, or between 40 to 50 bps, or between 50 to 60 bps, or between 60 to 70 bps, or between 70 to 80 bps, or between 80 to 90 bps, or between 90 to 100 bps, or between 100 to 110 bps, or between 110 to 120 bps, or between 120 to 130 bps, or between 130 to 140 bps, or between 140 to 150 bps, or between 150 to 160 bps, or between 160 to 170 bps, or between 170 to 180 bps, or between 180 to 190 bps, or between 190 to 200 bps.
  • the first amplicon has a size of between 65 to 75 bps.
  • the second amplicon has a size of between 200 to 210 bps, or between 210 to 220 bps, or between 220 to 230 bps, or between 230 to 240 bps, or between 240 to 250 bps, or between 250 to 260 bps, or between 260 to 270 bps, or between 270 to 280 bps, or between 280 to 290 bps, or between 290 to 300 bps, or between 300 to 310 bps, or between 310 to 320 bps, or between 320 to 330 bps, or between 330 to 340 bps, or between 340 to 350 bps, or between 350 to 360 bps, or between 360 to 370 bps, or between 370 to 380 bps, or between 380 to 390 bps, or between 390 to 400 bps, or between 400 to 410 bps, or between 410 to 420 bps, or between 420 to 430 bps, or between 430 to 440 bps, or between 440 to 450 bps, or between 400 to 410
  • step d) above further comprises reverse-transcribing the RNA molecules extracted from the biological sample to generate cDNA prior to amplification.
  • the real-time PCR assay as mentioned above can also be used as a stand-alone assay to determine the integrity of a viral genome or infectivity of a virus in a biological sample.
  • the first and second amplicons to be used for the real time PCR assay will be selected based on the results of the next-generation sequencing (NGS) method of the first aspect.
  • the present disclosure provides a method of determining integrity of a viral genome or infectivity of a virus in a biological sample, said method comprises: a) extracting viral nucleic acid molecules from the biological sample; b) selecting a first amplicon having a first insert, selecting a second amplicon having a second insert, wherein the first insert has an insert size that is between 70-80 bps shorter than the range of insert sizes of the group with the range of insert sizes that is the smallest, and wherein the second insert has an insert size that is between 70-80 bps shorter than the range of insert sizes of the group with the range of insert sizes that is the second smallest; c) providing a first pair of primers comprising a forward primer and a reverse primer, wherein the forward primer and reverse primer are designed for amplifying the first insert; d) providing a second pair of primers comprising a forward primer and a reverse primer, wherein the forward primer and reverse primer are designed for amplifying the second insert; e) subject
  • kits for use in the detection method as described herein comprising: a) a first pair of primers comprising a forward primer and a reverse primer designed for amplifying a first amplicon having a first insert, wherein the first insert has an insert size that is between 70-80 bps shorter than the range of insert sizes of the group with the range of insert sizes that is the smallest as determined by the method as described herein; b) a second pair of primers comprising a forward primer and a reverse primer designed for amplifying a second amplicon having a second insert, wherein the second insert has an insert size that is between 70-80 bps shorter than the range of insert sizes of the group with the range of insert sizes that is the second smallest as determined by the method as described herein; c) a first probe to detect the first amplicon; and d
  • the first pair of primers comprises the sequences: GATAGGTTGATCACAGGCAGAC (S_gene_5d_fwd) (SEQ ID NO: 4) and AGTTCTTTTCTTGTGCAGGGAC (S_gene_5d_rev) (SEQ ID NO: 5); wherein the second pair of primers comprises the sequences: GCGCATTGGCATGGAAGTC (Set1_N_3_fwd) (SEQ ID NO: 6) and GTCATCCAATTTGATGGCACCTG (Set1_N_3_rev) (SEQ ID NO: 7); wherein the first probe comprises the sequence CCCTCAGTCAGCACCTCATGGTGT (S_gene_5_probe) (SEQ ID NO: 8); and wherein the second probe comprises the sequence CCTTCGGGAACGTGGTTGACCTAC (Set1_N_3_probe) (SEQ ID NO: 9).
  • the first primer pair is used for amplifying the longer
  • the kit further comprises reagents useful for reverse-transcription. In some specific examples, the kit further comprises reagents useful for polymerase chain reaction (PCR). In some specific examples, the kit further comprises reagents useful for real time polymerase chain reaction (PCR).
  • the kit further comprises instructions for use.
  • certain embodiments may be disclosed in a range format. It should be understood that the description in range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope of the disclosed ranges. Accordingly, the description of a range should be considered to have specifically disclosed all the possible sub-ranges as well as individual numerical values within that range. For example, description of a range such as from 1 to 6 should be considered to have specifically disclosed sub-ranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as well as individual numbers within that range, for example, 1, 2, 3, 4, 5, and 6. This applies regardless of the breadth of the range.
  • Samples used in this study were collected as part of a previous study conducted in a large cohort of migrant workers in Singapore, and evaluated and approved by the Director of Medical Services, Ministry of Health, under Singapore’s Infectious Disease Act.
  • the samples included 37 nasopharyngeal (NP) swabs collected in 3 mL of viral transport medium, 60 self-nasal swabs, and 58 oropharyngeal saliva samples collected in 2 mL of viral RNA stabilization fluid (SAFER™ VTM, Lucence, Singapore).
  • Viral nucleic acid is extracted from the specimens (200 or 400 µL) using Viral Nucleic Acid Extraction Kit II (Geneaid Biotech Ltd., Taiwan) or QIAsymphony DSP Virus/Pathogen Kit (QIAGEN).
  • RNA: extracted nucleic acid
  • RT-PCR: real-time reverse transcription-polymerase chain reaction
  • CDC: Centers for Disease Control and Prevention
  • Samples were processed in a College of American Pathologists (CAP) accredited and a Clinical Laboratory Improvement Amendments (CLIA) licensed laboratory (Lucence).
  • Lower limit of detection (sensitivity) of the NGS method was determined using synthetic SARS-CoV-2 RNA spiked into pooled clinical matrix that had tested negative for the virus by an RT-PCR method.
  • synthetic SARS-CoV-2 RNA genomes (Twist Bioscience, San Francisco, USA) of known concentrations were spiked into pooled negative clinical matrix (nasopharyngeal swab specimens) at known copy numbers prior to RNA extraction.
  • the synthetic controls corresponded to the following GenBank IDs of various published SARS-CoV-2 genome isolates:
  • MT007544.1 (SKU: 102019), MN908947.3 (SKU: 102024), LC528232.1 (SKU: 102860), MT106054.1 (SKU: 102862), MT188340.1 (SKU: 102917), MT118835.1 (SKU: 102918).
  • the threshold for detection of SARS-CoV-2 was determined by comparing the genome coverage (%) resulting from confirmed negative samples (by RT-PCR), no-template controls, and known positive samples (by RT-PCR).
  • a multiplex amplicon-based next-generation sequencing (NGS) platform was developed to sequence the entire SARS-CoV-2 genome.
  • Primers for 327 amplicons were designed to span the entire SARS-CoV-2 genome in tiled configuration, and alternately assigned to two separate primer pools (Pool 1 and Pool 2) to allow amplification of the whole genome while minimizing formation of short overlapping amplicons.
  • Five primer pairs designed to target five different human housekeeping genes (TBP, MYC, LRP1, ITGB7, and HMBS) were used as controls in both pools.
  • Each forward primer additionally includes, on the 5’ end, a random 10-nucleotide sequence to serve as a molecular barcode.
  • Each designed primer was checked against SARS-CoV-2 genomes published as of 15 April 2020, and base degeneracy was incorporated when required to achieve coverage of >99% of published Asian, USA, European and Chinese genomes.
  • RNA control comprises a nucleotide sequence which is not found in a genome or a transcriptome of the virus, and is further not found in a genome or a transcriptome of a human, and is further not found in a genome or a transcriptome of a microorganism.
  • Each cDNA was split into two reactions for target capture and enrichment of Pool 1 and Pool 2 using Platinum SuperFi II DNA Polymerase (Invitrogen, USA) under the following thermocycling conditions: denaturation at 98°C for 30 s, followed by 5 to 15 cycles (5 cycles for samples with Ct <15, 10 cycles for samples with Ct 15-30, 15 cycles for samples with Ct >30) of 98°C for 1 min, 60°C for 1 min, and 72°C for 1 min, with final extension at 72°C for 5 min.
  • excess primers were removed by purification two times with 1.5x AMPure XP beads (Beckman Coulter, USA).
  • purified products were subjected to a final PCR to amplify targets and to complete the library with indexed sequencing adaptors for sequencing on the Illumina platform.
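The Ct-dependent cycle count for the target enrichment PCR is a simple lookup. In this minimal sketch (hypothetical function name), Ct values of exactly 15 and exactly 30 fall in the 10-cycle band, following the wording "Ct 15-30".

```python
def target_enrichment_cycles(ct: float) -> int:
    """Number of enrichment PCR cycles chosen from the sample's Ct value."""
    if ct < 15:
        return 5    # high viral load
    if ct <= 30:
        return 10   # Ct 15-30
    return 15       # Ct > 30, low viral load
```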
  • purified product was amplified with indexed P5 adapter sequence and indexed P7 adapter sequence using KAPA HiFi HotStart ReadyMix under the following thermocycling conditions: Denaturation at 98°C for 45 s, followed by 18 cycles of 98 °C for 15 s, 60°C for 30 s, and 72°C for 30 s, with a final extension at 72°C for 1 min.
  • the amplified library was purified twice with 0.7x AMPure XP beads to remove non-specific products.
  • the quality and quantity of the sequencing library were assessed using the 4200 Tapestation system (Agilent Technologies, USA) and KAPA Library Quantification Kit for Illumina® Platforms (Kapa Biosystems Inc., USA) respectively. Paired-end sequencing (2x151 bps) of the final dual-indexed libraries was performed on the Illumina platform as per manufacturer’s instructions.
  • FASTQ files were processed using a custom pipeline. First, expected amplicons were identified and labeled in the FASTQ files based on the expected primer sequences in Read 1 and paired Read 2. Primer sequences and upstream molecular barcode sequences were trimmed using cutadapt, and the primer-trimmed sequences were mapped to the SARS-CoV-2 reference genome (NCBI Reference Sequence: NC_045512.2) using bwa-mem. Molecular tag (or barcode) sequences were included in the trimmed “primer” sequences of Read 1 and can be extracted given the unique structure of primer sequences in Read 1.
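Because Read 1 starts with the 10-nt random barcode followed by the forward primer, the barcode and the primer-trimmed insert can be separated by position. The sketch below is a simplification, not the cutadapt-based step actually used: it requires an exact primer match and ignores the degenerate bases and sequencing errors that a real trimmer must tolerate. The function name `split_read1` is hypothetical.

```python
from typing import Optional, Tuple

BARCODE_LEN = 10  # random 10-nt molecular barcode on the 5' end of each forward primer


def split_read1(seq: str, forward_primer: str) -> Optional[Tuple[str, str]]:
    """Return (barcode, primer-trimmed sequence) if the primer follows the barcode, else None."""
    primer_start = BARCODE_LEN
    primer_end = primer_start + len(forward_primer)
    if seq[primer_start:primer_end] != forward_primer:
        return None  # exact match only: no degeneracy or mismatch handling
    return seq[:BARCODE_LEN], seq[primer_end:]
```

For example, a read made of a 10-nt barcode, the S_gene_5d_fwd primer from the kit above and some insert bases splits into the barcode and the insert bases. Checking the read against a different primer returns `None`.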
  • Barcode sequences were clustered based on sequence and amplicon identity, and consensus calling was done for each molecular tag (or barcode) cluster, by first performing global alignment among all associated reads using MAFFT.
  • the consensus base in each aligned position was called by determining the majority representative base type, the percentage of which should be no less than an automatically determined threshold.
  • the threshold is a function of the total number of reads for that barcode sequence. If no representative base could be called, the position was assigned N (as opposed to one of A, C, T, G).
  • An overall quality score of either the 90th percentile of all the quality values from the representative base type in that position (if a consensus base is found), or the 10th percentile of all quality values in that position (if no consensus base is found) is assigned.
  • the consensus reads are then written to a new FASTQ file.
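The per-position consensus rule can be sketched as a majority vote over the aligned reads of one barcode cluster. Several details here are assumptions rather than the described pipeline: the threshold is passed in instead of being derived from the read count, quality lists are assumed to be aligned column-for-column with the reads, a majority gap drops the position, and the percentile uses the nearest-rank definition. All names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def percentile(values: Sequence[int], p: float) -> int:
    """Nearest-rank percentile (one of several common definitions)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def call_consensus(reads: Sequence[str], quals: Sequence[Sequence[int]],
                   threshold: float) -> Tuple[str, List[int]]:
    """Majority-vote consensus over equal-length aligned reads ('-' marks an alignment gap)."""
    bases: List[str] = []
    scores: List[int] = []
    for pos in range(len(reads[0])):
        column = [read[pos] for read in reads]
        column_q = [q[pos] for q in quals]
        base, count = Counter(column).most_common(1)[0]
        if count / len(column) >= threshold:
            if base == "-":
                continue  # majority gap: position omitted from the consensus
            # 90th percentile of qualities from reads supporting the consensus base
            supporting = [q for b, q in zip(column, column_q) if b == base]
            bases.append(base)
            scores.append(percentile(supporting, 90))
        else:
            # no consensus: emit N with the 10th percentile of all qualities here
            bases.append("N")
            scores.append(percentile(column_q, 10))
    return "".join(bases), scores
```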
  • Variant calling, subgenomic RNA analysis and phylogenetic analysis
  • consensus FASTQ files from Pool 1 and Pool 2 were merged to create a single FASTQ file.
  • Consensus FASTQ reads are mapped to the SARS-CoV-2 reference genome (NCBI Reference Sequence: NC_045512.2) using bwa-mem.
  • Samtools was used to calculate depth per base and coverage metrics.
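As an example of a coverage metric, genome coverage (%) can be computed from per-base depth such as that printed by `samtools depth -a` (tab-separated reference name, position and depth, including zero-depth positions). The minimum depth used to call a base "covered" is not given in the text. The default of 10 below merely mirrors the 10x variant-calling minimum and is an assumption, as is the function name.

```python
from typing import Iterable


def genome_coverage_pct(depth_lines: Iterable[str], min_depth: int = 10) -> float:
    """Percentage of reported positions with depth >= min_depth."""
    positions = covered = 0
    for line in depth_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip malformed or empty lines
        positions += 1
        if int(fields[2]) >= min_depth:
            covered += 1
    return 100.0 * covered / positions if positions else 0.0
```

Using `-a` matters here: without it zero-depth positions are omitted, and the denominator would no longer equal the genome length.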
  • Variant calling was performed on consensus BAM files using snippy and freebayes. For variants to be called, a minimum of 10x coverage is required. With molecular barcoding, the sequencing is error-free and can enable detection of quasi-species and increase confidence of variant calls due to the high quality of sequencing data.
  • sgRNA: subgenomic RNA
  • each sgRNA was quantified as the number of split reads supporting the sgRNA divided by the number of all reads supporting the location spanning the junction, 3’ to the junction.
  • the sgRNA split read count was normalized to the mean depth of coverage for the sample.
  • the sum of all sgRNA read counts was normalized to the mean depth of coverage for the sample.
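The three sgRNA quantities described above reduce to a few divisions. This is a minimal sketch under the assumption that split-read and junction-spanning read counts have already been tallied per sgRNA (the dictionary inputs and the function name are hypothetical). sgRNAs with no junction-spanning reads are left out of the ratio.

```python
from typing import Dict, Tuple


def sgrna_metrics(split_reads: Dict[str, int], junction_reads: Dict[str, int],
                  mean_depth: float) -> Tuple[Dict[str, float], Dict[str, float], float]:
    """Return (per-sgRNA ratio, per-sgRNA depth-normalized count, depth-normalized total)."""
    # split reads supporting each sgRNA / all reads spanning the location 3' to its junction
    ratios = {name: n / junction_reads[name]
              for name, n in split_reads.items() if junction_reads.get(name)}
    # each split-read count, and their sum, normalized to the sample's mean depth
    normalized = {name: n / mean_depth for name, n in split_reads.items()}
    total_normalized = sum(split_reads.values()) / mean_depth
    return ratios, normalized, total_normalized
```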
  • FASTA files of the assembled genomes are used to construct phylogenetic trees. The phylogenetic tree building process and parameters follow the Nextstrain team’s repository at https://github.com/nextstrain/ncov.
  • an insert size range of 570-700 bps can be discerned and is suggestive of a limited amount of still longer template molecules being available for long insert formation by distant (3-displaced) subsequent primers.
  • the three major ranges represent inserts formed from the generalizable F_R (intended), F_R+1, F_R+2 primer combinations. It is also noted that even as the total number of inserts captured changes (overall abundance of template molecules of all starting sizes), the two ranges that are most captured and subject to the greater variance are F_R and F_R+1. In other words, inserts in ranges of F_R+2, F_R+3 and upwards are of relatively diminishing abundance among total inserts formed (Fig. 8C and 8D).
  • the insert size information was further formally quantified to determine the patterns and degrees of intactness (conversely, degradation) of a viral RNA template in clinical samples.
  • The insert size ranges formed from the F_R and F_R+1 primer combinations dominate among insert sizes for two reasons: 1) ultra-long template molecules (capturable by F_R+2, F_R+3 and so on) are scarcer, and 2) competition for resources in PCR makes shorter inserts reliably more detectable in multiplex PCR-based NGS assays. Nonetheless, despite the insert size distribution anticipated from this rationale, not all clinical samples followed the expected insert size abundance pattern of F_R >> F_R+1 (see Fig. 8B).
  • The insert size ratio is defined as (number of longer inserts)/(number of shorter inserts), for example, (number of inserts of size 151-350 bps)/(number of inserts of size 0-150 bps). Examples of insert size ratio values are shown in Fig. 9.
  • The insert size ratio is correlated with infectivity, as insert size ratios generally show a decreasing trend over sampling timepoints across cases. Timepoint 1 is the first day of sampling, which for symptomatic cases is the first day of symptom onset; timepoints 2 and 3 correspond to 4 days and 7 days after timepoint 1.
  • the period of infectiousness/infectivity (defined as the time interval during which SARS-CoV-2 may be transferred from an infected person to another person), based on viable virus culture from clinical specimens, has been identified as six days before, to nine days after, the first evidence of typical symptoms (for asymptomatic cases this infectivity period is not defined).
  • The insert size ratio derived from a multiplexed PCR-based assay is identified as a measure of the integrity or intactness of the viral RNA template, which could reflect the infectivity potential of the virus in a particular sample.
  • a multiplex three-target RT-PCR assay was designed for the simultaneous detection of two short targets of 70 bps each in the N gene region (in two separate detection channels), and one long amplicon of 240 bps in the S gene of the SARS-CoV-2 genome.
  • An internal housekeeping control target was included as a fourth target in the multiplex reaction. Briefly, extracted RNA (5 µl) was mixed with the primers/probes for the multiplexed targets using the Luna® Universal Probe One-Step RT-qPCR Kit (New England Biolabs, USA). RT-PCR was then performed on a Bio-Rad CFX96 using the recommended protocol for the Luna master mix. Cycle threshold (Ct) values for each target were collected from the respective channels and compared.
  • RT-PCR assays: two separate one-step RT-PCR assays were designed for the SARS-CoV-2 genome in the S gene. One produces an amplicon of 253 bps (insert size 206 bps) and the other an amplicon of 66 bps (insert size 18 bps).
  • Each assay was run separately (not in multiplex), and the amplification products were analyzed by Agilent TapeStation. Product amounts were quantified by the region molarity of the expected product sizes. The ratio of product amounts (253 bps/66 bps) for each sample was calculated from the region molarities corresponding to each product formed in the sample. The results are shown in Fig. 10.
  • SARS-CoV-2 RNA fragments of differing lengths are captured by NGS
  • VFS (viral fragmentation score)
  • The synthetic RNA mimics the SARS-CoV-2 genome but is composed of six fragments of 5000 bps each that together represent the genome. It was considered relatively intact starting material.
  • The Ct values of the serial dilutions, measured by the same CDC RT-PCR assay, were on average 24.4, 27.97, 31.73 and 33.45, respectively. The RNA in all dilutions is similarly intact because it derives from the same stock of synthetic RNA. The increase in Ct values is therefore attributable only to the decrease in total copy numbers of RNA detectable by the RT-PCR assay.
  • The VFS values for the serial dilutions of synthetic RNA did not decrease in step with the increasing Ct values, and had a median of 0.384 (IQR: 0.289-0.449) across all 4 serial dilutions (Fig. 2C, Fig. 3).
  • This experiment shows that factors other than the total viral RNA amount measured by Ct value (with short PCR amplicons) determine the VFS. VFS depends on the presence of long fragments, which are more likely to be present in a sample with more intact viral RNA.
  • VFS was related to other clinical indications of infectivity. This is particularly relevant as it is well-recognised that RT-PCR positivity alone, particularly from a test designed for short targets, does not translate to viable virus with infection potential.
  • Fig. 2C: viral fragmentation score (VFS) versus Ct value
  • A saliva sample collected on the day of diagnosis had a VFS of 3.36. Matched URT samples (NP and nasal swabs) from the same case had VFS of 0.79 and 0.14, suggesting less fragmented SARS-CoV-2 RNA in saliva even at the same timepoint.
  • For cases with information on symptom onset, VFS differed significantly between samples collected ≤8 days from symptom onset and samples collected after 8 days (Fig. 4D, left). For asymptomatic cases, however, it was difficult to find a time measure that trended with VFS (Fig. 4D, right).
  • VFS values were tracked with respect to time from symptom onset. They were generally higher early, between days -1 and 6, and generally decreased with time, reaching levels similar to those seen in asymptomatic cases throughout the course of asymptomatic infection (Fig. 4E).
  • No linear correlation was observed between increasing VFS and total sgRNA abundance (Fig. 5A). However, there was a significant difference in VFS between samples with any sgRNA and those with none, showing a dichotomous distribution (p < 0.0001) (Fig. 5B). This dichotomy corresponded to the VFS cut-off of 0.382 determined earlier.
  • VFS can be translated into a simple multiplexed RT-PCR assay deliberately composed of a long amplicon of 240 bps and a short amplicon of 70 bps, to detect the long and short fragments observed in clinical samples using NGS.
  • VFI (viral fragmentation index)
  • The respiratory tract represents the major area of viral shedding, and reported shedding kinetics vary between studies. Some studies report 8-12 days of virus positivity after the clearance of symptoms, while another reports up to 20 days of virus persistence. One study reporting on multiple sample types (bronchoalveolar lavage, sputum, nasal swabs, pharyngeal swabs, feces and blood) also found that nasal swabs had the highest viral loads.
  • the viral load of SARS-CoV-2 in clinical samples as measured by the primary diagnostic tool of RT-PCR is an imperfect readout for infection potential.
  • Standard qRT-PCR methods are deliberately designed to detect the presence of very short SARS-CoV-2 RNA sequences, making them unsuitable to distinguish intact SARS-CoV-2 genomes from degraded viral RNA. Detection of degraded viral RNA also contributes to positive test results but cannot provide any insight into the infectiousness (or infectivity) of the virus.
  • Infectivity requires the presence of intact viral particles indirectly represented by larger, intact viral RNA genomes.
  • RNA virus genomes (30 kb in the case of SARS-CoV-2) are subject to a variety of degradative effects (both viral RNA-specific and nonspecific degradation processes) which are in competition with replacement of intact viral RNA genomes through active viral replication. The process of active viral replication is required for the generation of infectious virus particles and viral shedding.
  • Infectivity is determined by live virus isolation, performed by complex cell culture methods that require specialized equipment and take days to yield results.
  • The present study aims to address this gap in the measurement of infectivity in both symptomatic and asymptomatic settings by incorporating a measure of the integrity or intactness of the virus. Further, the study envisions that the integrity measure can be translated to a qRT-PCR test designed to detect both short inserts (≤70 bps) and long inserts (>200 bps) for SARS-CoV-2.
  • next-generation sequencing was used to characterize clinical samples related to the spread of SARS-CoV-2 infection in a cohort of migrant workers in Singapore, collected from different tissue sites and over serial timepoints.
  • the design of the NGS panel allowed virus detection, genomic variant and sgRNA analysis.
  • the inventors have shown for the first time that NGS can be used for the detection of differential fragment lengths of viral RNA in clinical samples, potentially related to the integrity and transmissibility or infectiousness of the virus.
  • the present disclosure presents a novel measure of fragmentation of viral RNA, which could be translated to a test for transmission potential of a current infection.
  • the integrity/intactness measure could more informatively allow the downstream management of positive SARS-CoV-2 results in symptomatic, pre-symptomatic, convalescent and in asymptomatic cases, according to the degree of integrity measured as a proxy of infectiousness.
  • NGS is a sensitive and specific method of detecting SARS-CoV-2, with 100% detection of virus in clinical samples known to be positive by RT-PCR. Virus fragments spanning ≤3% of the virus genome were additionally detected in two RT-PCR-negative samples, which highlights that NGS may be a more sensitive diagnostic method for low levels of SARS-CoV-2.
  • Other SARS-CoV-2 NGS methods have not reported a similar quantitative capacity.
  • the quantitative feature of the method allows counting of reads to correlate with other parameters as described further on.
  • Genomic variant analysis showed identical variants across multiple samples derived from a single case. In contrast to previous reports, this suggests no independent replication or generation of minor variants over the time course of infection.
  • Phylogenetic analysis was possible on assembled SARS-CoV-2 genomes from 28 cases and showed that all cases were infected with virus belonging to Clade O, lineage B.6, known to be circulating in Singapore and India.
  • The method provided novel insights into two aspects of SARS-CoV-2 infection that have become increasingly relevant for predicting the onward transmission potential of an infection. In this study, the inventors were able to obtain information on the presence of subgenomic RNA in clinical samples as well as on the relative degree of fragmentation of the viral RNA in a sample.
  • sgRNA detection outlasted successful virus culture and was either a poor predictor of successful virus culture or showed only moderate agreement with it. sgRNA was detected in 18 of 22 (81.8%) specimens collected ≤8 days after symptom onset and in 1 of 11 (9.1%) specimens collected >9 days after symptom onset. Because sgRNA remains detectable in clinical samples for up to 22 days after the start of illness, it may not be a marker of active replication; it may instead persist in clinical samples owing to its stability.
  • RNA shedding can persist even after resolution of clinical symptoms of both mild and severe disease. While more severely ill patients generally have longer detection of RNA, persistent PCR positivity is seen in patients with mild illness as well as in asymptomatic cases.
  • a better measure of infectiousness is the culture of live virus from clinical specimens, as successful virus growth is reliant on the presence of intact whole virions with complete RNA genomes. By this measure, samples taken more than 8 days since symptom onset in immunocompetent patients do not have infectious virus, as no live virus could be cultured from them.
  • NGS data can not only provide total viral RNA abundance information but can also characterize the viral RNA fragments by length. This was possible because of the combination of two features:
    - consensus read counting, which is related to the original copy numbers of viral RNA in a sample undergoing library preparation;
    - the tiled configuration of the primer panel, which could capture long fragments if present.
    In other words, the more intact or full-length RNA present in a sample, the more long fragments the method would capture.
  • The relative abundances of long and short fragments were converted to a viral fragmentation score (VFS), representing the relative integrity of the viral RNA in the sample. VFS was shown to be related to the viral load measured as Ct value by an RT-PCR assay.
  • the CDC RT-PCR assay targets particularly short amplicons of about 70 bps, which means it would accurately detect intact RNA when present, but would continue to give an abundant signal even when largely fragmented RNA was present in a sample.
  • The relative abundance of long fragments was as much as 100-185 times that of short fragments (very high viral fragmentation score), which could not have arisen simply from the specific sequencing method.
  • Oropharyngeal saliva contains secretions of the salivary glands mixed with sputum from the lower respiratory tract, and is hence more akin to lower respiratory tract samples.
  • the inventors observed higher VFS (more long fragments) in saliva samples on average compared to URT samples.
  • Ct value as a surrogate measure of infectivity is only variably correlated with success of viral culture. In addition, multiple RT-PCR assays with Ct scales that are not directly comparable are in clinical use, as evidenced by the dramatically different Ct cut-offs derived from culture positivity.
  • typical RT-PCR assays are designed to detect short target templates and will readily amplify small amounts of fragmented viral RNA, precluding any consideration of integrity of the viral RNA.
  • Duration since symptom onset has been shown to correlate with success of viral culture; however, this measure suffers from recall bias.
  • The viral culture test is labor-intensive and suffers from variations in accuracy and in the permissiveness of cell lines. It also requires Biosafety Level 3 facilities, which precludes it from being established in all diagnostic laboratories. Accurate measurement of infection potential becomes particularly relevant as vaccines against SARS-CoV-2 that immunize against COVID-19 disease become available, since the transmission capacity of an immunized individual remains unknown.
  • The viral fragmentation scores (VFS) determined in this study are relative values obtained after laboratory processing. No determination of the absolute fragment lengths contained in a sample has been attempted.
  • The viral culture assay is the closest surrogate for this measure. However, the basic premise that more intact virus would be reflected in more long fragments in a clinical sample would apply universally.
  • NGS is an enabling tool that provides not only the sequence information for which it is primarily designed, but also information on fragment size and length. On this basis, fragment length differences among clinical samples were identified that correlate with clinical features of SARS-CoV-2 infectiousness. Their quantification could be incorporated as a relevant and straightforward measure of infection potential.
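The coverage and variant-calling thresholds in the bullets above can be illustrated with a minimal sketch. This is not the inventors' pipeline. It assumes that per-base depth was exported with `samtools depth -a`, which writes one tab-separated line per position: reference name, 1-based position and depth. It reuses the 10x minimum stated for variant calling. The function names and example lines are hypothetical.

```python
from collections.abc import Iterable


def parse_samtools_depth(lines: Iterable[str]) -> list[tuple[int, int]]:
    """Parse `samtools depth -a` output into (1-based position, depth) pairs."""
    records = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        _ref, pos, depth = line.rstrip("\n").split("\t")
        records.append((int(pos), int(depth)))
    return records


def coverage_metrics(records: list[tuple[int, int]], min_depth: int = 10) -> dict[str, float]:
    """Mean depth and fraction of positions reaching the minimum depth for variant calling."""
    if not records:
        raise ValueError("no positions supplied")
    depths = [d for _, d in records]
    covered = sum(1 for d in depths if d >= min_depth)
    return {
        "mean_depth": sum(depths) / len(depths),
        "fraction_at_min_depth": covered / len(depths),
    }


def callable_positions(records: list[tuple[int, int]], min_depth: int = 10) -> list[int]:
    """Positions where a variant could be called under the 10x coverage rule."""
    return [pos for pos, d in records if d >= min_depth]


# Hypothetical four-position excerpt of `samtools depth -a` output
example = ["NC_045512.2\t1\t0", "NC_045512.2\t2\t5", "NC_045512.2\t3\t10", "NC_045512.2\t4\t20"]
records = parse_samtools_depth(example)
print(coverage_metrics(records))    # {'mean_depth': 8.75, 'fraction_at_min_depth': 0.5}
print(callable_positions(records))  # [3, 4]
```

Thresholding on depth before calling variants is only one part of what snippy and freebayes do internally. The sketch shows the coverage rule in isolation.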
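The sgRNA quantification steps listed above might be sketched as follows. Two assumptions apply. First, the split-read counts and the read counts 3’ of each junction are taken to come from an upstream junction-detection step that the text does not describe. Second, "normalized to the mean depth of coverage" is read as simple division by the sample's mean per-base depth. All function names and counts are hypothetical.

```python
from statistics import mean


def sgrna_fraction(split_reads: int, reads_3prime_of_junction: int) -> float:
    """Split reads supporting an sgRNA / all reads at the location 3' of its junction."""
    if reads_3prime_of_junction == 0:
        return 0.0  # assumption: no coverage at the junction means no measurable sgRNA
    return split_reads / reads_3prime_of_junction


def normalize_to_mean_depth(count: float, depths: list[int]) -> float:
    """Scale a read count by the sample's mean depth (assumed to be a simple division)."""
    mean_depth = mean(depths)
    return count / mean_depth if mean_depth > 0 else 0.0


def sgrna_profile(split_counts: dict[str, int], junction_depths: dict[str, int], depths: list[int]):
    """Per-sgRNA fractions, per-sgRNA normalized counts and the normalized total."""
    fractions = {g: sgrna_fraction(n, junction_depths[g]) for g, n in split_counts.items()}
    normalized = {g: normalize_to_mean_depth(n, depths) for g, n in split_counts.items()}
    total = normalize_to_mean_depth(sum(split_counts.values()), depths)
    return fractions, normalized, total


# Hypothetical counts for two sgRNAs in one sample with a uniform depth of 100
fractions, normalized, total = sgrna_profile({"N": 40, "S": 10}, {"N": 200, "S": 100}, [100, 100, 100, 100])
print(fractions)   # {'N': 0.2, 'S': 0.1}
print(normalized)  # {'N': 0.4, 'S': 0.1}
print(total)       # 0.5
```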

Abstract

Disclosed is a method of determining integrity of a viral genome or infectivity of a virus in a biological sample. The method comprises multiplex amplicon-based next generation sequencing, using primers designed to span the entire viral genome in a tiled configuration and alternately assigned to two separate pools. The method further comprises determining one or more insert size ratios (also termed viral fragmentation scores), each defined as the ratio of the number of amplicons with longer insert sizes to the number of amplicons with shorter insert sizes. Larger insert size ratio(s) relative to a reference sample indicate that the viral genome in the biological sample has a higher integrity, or that the virus in the biological sample has a higher infectivity, compared to the reference sample. In one embodiment, the virus is severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Also disclosed is a kit for determining the integrity of a viral genome or infectivity of a virus in a biological sample.

Description

METHODS AND KITS FOR DETERMINING INTEGRITY OF VIRAL RNA
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the benefit of priority of Singapore provisional application No. 10202007270P, filed 29 July 2020, and Singapore provisional application No. 10202011694Q, filed 24 November 2020, the contents of which are hereby incorporated by reference in their entirety for all purposes.
FIELD OF THE INVENTION
[0002] The present invention relates to molecular biology, in particular the detection of viruses using molecular biology techniques. In particular, the present invention relates to the detection of viruses in a biological sample obtained from a patient to determine the integrity of the viral genome or the infectivity of the virus in the biological sample.
BACKGROUND OF THE INVENTION
[0003] In March 2020, the World Health Organization (WHO) declared a pandemic of the coronavirus disease 2019 (COVID-19), an infectious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). By the end of 2020, 82 million infections and 1.79 million deaths had been reported worldwide. Sensitive methods for detecting SARS-CoV-2 are paramount in reducing viral transmission, as early identification and isolation of infected individuals and their close contacts allows effective control of the spread of the virus. In addition, releasing infected persons prematurely from isolation carries the risk of fueling transmission, while prolonging isolation unnecessarily can adversely affect resource allocation and cause undue disruptions.
[0004] RT-PCR assays are widely regarded as the most sensitive diagnostic method. However, they cannot distinguish intact viral RNAs, which are packaged in virions and able to cause onward transmission, from products of viral RNA degradation, which are ineffectual in causing further infection. This inability has led to considerable debate on the dichotomous interpretation of RT-PCR results as positive/negative. Positive detection of viral RNAs does not prove the presence of infectious virus: persistent detection of viral RNA in samples has been reported up to 35 days from symptom onset, and after resolution of clinical symptoms of both mild and severe disease.
[0005] The isolation of whole virions from cultured cells, in contrast to RNA fragments, provides evidence of the isolate’s replicative potential and more closely reflects the true infection potential of the virus. The duration of live virus detection is much shorter than that of viral shedding, and no live virus has been isolated from samples taken more than 8 days after symptom onset in immunocompetent patients. However, cell culture and isolation of whole virions are time-consuming, which hinders rapid detection of the virus on a large scale.
[0006] Prompted by the above limitations of the commonly used diagnostic methods for SARS-CoV-2 in informing contagiousness, there is a need for an alternative method for the rapid and accurate detection of SARS-CoV-2.
SUMMARY OF THE INVENTION
[0007] In one aspect, the present disclosure refers to a method of determining integrity of a viral genome or infectivity of a virus in a biological sample, said method comprising: a) extracting viral nucleic acid molecules from the biological sample to generate a nucleic acid library; b) providing a plurality of primer pairs wherein,
(i) each primer pair comprises a forward primer and a reverse primer, wherein the forward primer comprises, from the 5’ end to the 3’ end, a first adapter sequence (AS1), and a forward target-specific sequence (TS1); wherein the reverse primer comprises, from the 5’ end to the 3’ end, a second adapter sequence (AS2), and a reverse target specific sequence (TS2);
(ii) the first adapter sequence (AS1) of each forward primer is the same, the second adapter sequence (AS2) of each reverse primer is the same;
(iii) each primer pair with a unique combination of forward and reverse target sequences (TS) is for amplifying a different target region of the viral genome; c) subjecting the nucleic acid library to multiplex PCR using the plurality of primer pairs to amplify a plurality of amplicons; d) subjecting the plurality of amplicons to amplification and sequencing, to obtain amplicon sequences; e) detecting and mapping the amplicon sequences to a reference genome of the viral genome; f) determining the insert size of each amplicon, wherein the insert size is the number of nucleotide bases or base pairs between the forward and reverse primer of each amplicon; g) categorizing the amplicons into groups based on insert size, wherein each group comprises a range of insert sizes; h) enumerating the number of amplicons in each group; i) obtaining one or more insert size ratios of the number of amplicons in one group to the number of amplicons in another group; j) determining the integrity of the viral genome or infectivity of the virus based on the insert size ratio, wherein a large insert size ratio indicates that the viral genome in the biological sample has a high integrity or the virus in the biological sample has a high infectivity, or wherein a small insert size ratio indicates that the viral genome in the biological sample has a low integrity or the virus in the biological sample has a low infectivity.
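Steps f) to i) above can be illustrated with a short sketch. It assumes that each amplicon's mapped start and end coordinates (1-based, inclusive) and its two primer lengths are already known. The 0-150 bps and 151-350 bps groups are the example ranges used for the insert size ratio in Fig. 9. The function names and primer lengths are hypothetical, and this is one possible reading of the steps rather than the claimed implementation.

```python
def insert_size(amp_start: int, amp_end: int, fwd_primer_len: int, rev_primer_len: int) -> int:
    """Step f): number of bases between the forward and reverse primer of one amplicon."""
    size = (amp_end - amp_start + 1) - fwd_primer_len - rev_primer_len
    if size < 0:
        raise ValueError("primers are longer than the amplicon")
    return size


def count_groups(sizes: list[int], groups: dict[str, tuple[int, int]]) -> dict[str, int]:
    """Steps g)-h): count amplicons whose insert size falls in each inclusive range."""
    counts = {name: 0 for name in groups}
    for size in sizes:
        for name, (low, high) in groups.items():
            if low <= size <= high:
                counts[name] += 1
                break  # ranges are assumed not to overlap
    return counts


def insert_size_ratio(counts: dict[str, int], longer: str, shorter: str) -> float:
    """Step i): amplicons in the longer-insert group / amplicons in the shorter-insert group."""
    if counts[shorter] == 0:
        return float("inf")  # assumption: report an unbounded ratio when no short inserts exist
    return counts[longer] / counts[shorter]


GROUPS = {"short": (0, 150), "long": (151, 350)}

# A 253-bp amplicon with hypothetical 23-nt and 24-nt primers leaves a 206-bp insert
print(insert_size(1, 253, 23, 24))  # 206
sizes = [98, 103, 291, 303, 464]    # insert lengths given for Fig. 1B
counts = count_groups(sizes, GROUPS)
print(counts)                                      # {'short': 2, 'long': 2}
print(insert_size_ratio(counts, "long", "short"))  # 1.0
```

The 464 bps insert falls outside both example groups and is therefore not counted. Step j) would then compare the resulting ratio with that of a reference sample, which the sketch leaves out.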
[0008] In one aspect, the present disclosure refers to a kit for determining the integrity of a viral genome or infectivity of a virus in a biological sample, comprising: a) a first pair of primers comprising a forward primer and a reverse primer designed for amplifying a first amplicon having a first insert, wherein the first insert has an insert size that is between 70-80 bps shorter than the range of insert sizes of the group with the range of insert sizes that is the smallest as determined by the method as disclosed herein; b) a second pair of primers comprising a forward primer and a reverse primer designed for amplifying a second amplicon having a second insert, wherein the second insert has an insert size that is between 70-80 bps shorter than the range of insert sizes of the group with the range of insert sizes that is the second smallest as determined by the method as disclosed herein; c) a first probe to detect the first amplicon; and d) a second probe to detect the second amplicon.
BRIEF DESCRIPTION OF THE DRAWINGS
[0009] The invention will be better understood with reference to the detailed description when considered in conjunction with the non-limiting examples and the accompanying drawings, in which:
[0010] Fig. 1 shows the design of a highly multiplexed PCR-based NGS assay that allows capture of RNA fragments of variable lengths. Fig. 1A illustrates the definitions of amplicon and insert in a polymerase chain reaction (PCR) and multiplex PCR-based next-generation sequencing (NGS) assay. Fig. 1B shows the origin of inserts of variable lengths in amplicon-based sequencing. Successive primer pairs are shown from primer 51 to 61. Inserts of length 98 bps and 103 bps are formed as a result of the expected interaction between 51-F and 51-R and between 57-F and 57-R. If the RNA template is intact, longer inserts can also be captured by formation of products between, for example, 51-F and 53-R, and 57-F and 59-R. These inserts would have lengths of 291 bps and 303 bps respectively. Primers that are even further from one another in the sequence (for example, 51-F and 55-R) can potentially capture even longer inserts, of 464 bps. Fig. 1C shows that in samples with a greater degree of fragmentation or a low viral RNA amount, shorter inserts would be formed preferentially over longer inserts, due to the lack of long templates to be amplified by the more distant primer pairs.
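The primer combinations in Fig. 1B can be labelled with the F_R, F_R+1, F_R+2 notation used for insert size ranges. The sketch rests on an assumption consistent with Fig. 1B: primer pairs are numbered consecutively along the genome and alternately assigned to the two pools, so the next pair in the same pool is two numbers higher. Under that assumption, 51-F/53-R is an F_R+1 insert and 51-F/55-R is an F_R+2 insert. The function name is hypothetical.

```python
from collections import Counter


def primer_combination(fwd_index: int, rev_index: int) -> str:
    """Label an insert as F_R, F_R+1, F_R+2, ... from the primer numbers that formed it.

    Assumes consecutive numbering with alternate pool assignment, so the
    neighbouring primer pair in the same pool is +2 away.
    """
    step = rev_index - fwd_index
    if step < 0 or step % 2:
        raise ValueError("reverse primer must be downstream and in the same pool")
    k = step // 2
    return "F_R" if k == 0 else f"F_R+{k}"


pairs = [(51, 51), (57, 57), (51, 53), (57, 59), (51, 55)]  # combinations described for Fig. 1B
print(Counter(primer_combination(f, r) for f, r in pairs))
# Counter({'F_R': 2, 'F_R+1': 2, 'F_R+2': 1})
```

Tallying labels this way, rather than by length alone, would separate the F_R and F_R+k contributions even where their size ranges approach one another.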
[0011] Fig. 2 are histograms and dot plots showing that SARS-CoV-2 fragments of differing lengths can be detected by NGS as short or long inserts with varying relative abundances. Fig. 2A shows histograms of sequencing read counts supporting insert lengths of ranges 70-150 bps, 220-350 bps, and 400-520 bps. Insert length distributions in longitudinal saliva samples from a single case are shown as examples. The relative abundance of insert lengths changes from a dominant 220-350 bps (long inserts) distribution in a sample taken 3 days from symptom onset to a more abundant 70-150 bps (short inserts) distribution in a sample taken 5 days from symptom onset. Insert lengths are representative of viral RNA fragment lengths. Fig. 2B is a dot plot showing that long and short fragment counts are each correlated with the Ct value of the sample measured by the CDC RT-PCR assay. Long fragments have a steeper decline in abundance, becoming differentially fewer compared to short fragments at Ct values greater than 25. URT (NP and nasal swab) and saliva samples are plotted together. Fig. 2C is a dot plot showing that viral fragmentation score (VFS) is moderately correlated with the Ct value of a sample, in both URT and saliva samples. R² values of correlation are shown. Half-filled circles and triangles represent asymptomatic cases. Light gray circles represent synthetic RNA samples of known integrity. These generate a largely constant VFS (median: 0.384, IQR: 0.289-0.449) across Ct values of 24.4-33.45 over a 500-fold serial dilution of synthetic RNA. Numbers next to grey circles are the concentrations in copies/ml of synthetic RNA. Fig. 2D shows that viral fragmentation scores are significantly higher across saliva samples compared to URT samples. The horizontal lines represent median and interquartile ranges. **Mann-Whitney test p = 0.0035.
[0012] Fig. 3 shows dot plots of Ct values and viral fragmentation scores (VFS) for synthetic SARS-CoV-2 RNA samples with different numbers of copies of RNA. These samples have a known degree of integrity, consisting of 5000-bp fragments. Mean values of three replicate experiments are shown. Ct values for the same serially diluted samples obtained by the CDC RT-PCR assay ranged from 24.4 to 33.45, reflecting the 500-fold difference between the highest and lowest copy numbers used. Sequencing libraries of dilutions of the synthetic RNA showed viral fragmentation scores that did not decrease with increasing Ct (median VFS: 0.384, IQR: 0.289-0.449). The results show that synthetic SARS-CoV-2 RNA samples in serial dilutions have expectedly increasing Ct but stable viral fragmentation scores (VFS).
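As a rough consistency check on Fig. 3, the Ct shift expected purely from dilution can be computed. The sketch assumes ideal PCR efficiency, meaning exact doubling per cycle, which the document does not state. Under that assumption a 500-fold dilution should shift Ct by about 8.97 cycles. The reported spread of 24.4 to 33.45 is about 9.05 cycles, consistent with Ct tracking copy number while VFS stays stable. The function name is hypothetical.

```python
import math


def expected_delta_ct(fold_dilution: float, efficiency: float = 1.0) -> float:
    """Extra cycles needed after a dilution, assuming a (1 + efficiency)-fold gain per cycle."""
    return math.log(fold_dilution) / math.log(1 + efficiency)


observed = 33.45 - 24.4            # Ct range reported across the 500-fold dilution series
expected = expected_delta_ct(500)  # assumes 100% amplification efficiency
print(round(expected, 2), round(observed, 2))  # 8.97 9.05
```

With a real efficiency below 100% the expected shift grows, so the close match here should be read as indicative only.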
[0013] Fig. 4 are dot plots showing that viral fragmentation score is related to the clinical time-course of SARS-CoV-2 infection. Fig. 4A and 4B show that in symptomatic cases, the relative amount of long fragments versus short fragments (represented by VFS) decreases with days since symptom onset in both URT and saliva samples. With the exception of one saliva sample (unfilled circle), beyond 8 days since symptom onset VFS does not exceed the cut-off of 0.38 (dotted line) estimated for intact RNA based on synthetic SARS-CoV-2 RNA (upper left quadrants). For samples collected < 8 days since symptom onset, 15 (40.5%) of 37 URT samples and 16 (55%) of 29 saliva samples exceeded the cut-off (lower right quadrant). The fitted regression lines and 95% CI error are shown on the graphs. Fig. 4C shows that most (96%) samples from asymptomatic cases had low VFS. Fig. 4D (left) shows that VFS differs significantly in URT and saliva between samples collected < 8 days after symptom onset and later samples. ***Mann-Whitney test p<0.0001, **p=0.0017. Fig. 4D (right) shows that, in asymptomatic cases (URT and saliva samples combined), no significant difference in VFS was observed between samples collected on day 0 of diagnosis and later samples. Median and IQR lines are shown. Fig. 4E shows VFS in matched longitudinal samples from 15 symptomatic cases (left) and 4 asymptomatic cases (right). Each case is shown as a line of a different color. Lines marked by grey ends are saliva samples, while the rest are URT samples.
[0014] Fig. 5 are dot plots showing profiles of subgenomic RNA (sgRNA) in relation to viral fragmentation score. Fig. 5A shows that across all sample types, sgRNA expression tends toward zero when VFS is low. A cut-off of 0.382 (dotted line) separates 33 samples (all of which have VFS > 0.382 and sgRNA detected) from the 58 (54%) of 107 samples with VFS < 0.382 that have no sgRNA. Fig. 5B shows a significant difference in VFS between samples with zero and with any sgRNA detected. ***Mann-Whitney test p<0.0001. The dotted line is the VFS cut-off of 0.382. Fig. 5C shows that S gene and N gene abundances are inversely related to the VFS, but those of the other abundant sgRNAs, E gene and ORF3a, are not.
[0015] Fig. 6 are dot plots showing that viral fragmentation score can be captured in a multiplex RT-PCR assay for long (240 bps) and short (70 bps) amplicons. Fig. 6A shows translation of the VFS obtained from NGS to an RT-PCR assay, with calculation of the difference in Ct value of short to long amplicon (Ct_70bp-Ct_240bp) for 20 clinical samples. The deltaCt value is referred to as the viral fragmentation index (VFI). The dots on the left represent samples with no detection of the 240 bps amplicon, for which Ct values are assigned to 40. Correlation of VFS and VFI for samples within the box is shown. Fig. 6B is VFI from the RT-PCR assay showing a correlation with days since diagnosis (for the 7 samples not plotted due to non-detection of 240 bps, the days from diagnosis were > 7).
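The VFI defined in Fig. 6A can be written as a one-line calculation, including the rule that a non-detected 240 bps amplicon is assigned a Ct of 40. In the sketch, `None` marks a non-detected target. Raising an error when the short target itself is not detected is an added assumption, since the text does not cover that case. The Ct values are hypothetical.

```python
from typing import Optional

NON_DETECT_CT = 40.0  # Ct assigned when the 240 bps amplicon is not detected (Fig. 6A)


def viral_fragmentation_index(ct_70bp: Optional[float], ct_240bp: Optional[float]) -> float:
    """VFI = Ct_70bp - Ct_240bp, where None means the target was not detected."""
    if ct_70bp is None:
        # Assumption: without the short target there is no reference Ct to compare against
        raise ValueError("short (70 bps) target not detected")
    long_ct = NON_DETECT_CT if ct_240bp is None else ct_240bp
    return ct_70bp - long_ct


print(viral_fragmentation_index(25.0, 28.0))  # -3.0
print(viral_fragmentation_index(30.0, None))  # -10.0
```

A later Ct for the long amplicon makes the index more negative. From the definition alone, values closer to zero would therefore be expected when long templates are relatively plentiful.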
[0016] Fig. 7 shows visualization of inserts of two different lengths formed by the forward primer 153F. The intended insert formed between primers 153F-153R is in dark gray and is of 113 bps long, while the longer insert formed between 153F-157R is of 274 bps long. The alignment is shown with reference to the SARS-CoV-2 genome.
[0017] Fig. 8 shows histograms showing the number of count of inserts of different sizes. In a multiplex PCR assay, the inserts are of predetermined sizes, and typically fall into one of three ranges visually classified as 70-150 bps, 220-350 bps, and 400-520 bps. Fig. 8A shows the histogram for a sample with greater abundance of 70-150 bps inserts. Fig. 8B shows the histogram for sample with greater abundance of 220-350 bps inserts. Fig.8C and 8D show that the greatest variation in insert sizes are in the F_R and F_R+1 size ranges.
[0018] Fig. 9 shows histograms showing the number of inserts of different sizes. The calculated insert size ratios (ratio between inserts of range 151-350 bps and 0-150 bps) in clinical samples are also included.
[0019] Fig. 10 shows the relationship between NGS-derived insert size ratios and PCR- derived product size ratios in clinical samples. Fig. 10A shows that the longer 253 bps amplicon (corresponding to the 206 bps insert) representing the more intact template, and the shorter amplicon of 66 bps (corresponding to the 18 bps insert) representing fragmentation products. Each column Cl, Gl, C2, etc. represents a different sample. Top panel shows the longer 253 bps product and the bottom panel shows the shorter 66 bps product for the same samples matched by columns. Fig. 10B shows the correlation of NGS-derived insert size ratio (log2 scale) to the PCR-derived ratio abundances of 253 bps product and 66 bps product.
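The Fig. 10 comparison can be sketched in two steps. First, the PCR product ratio is computed from region molarities. Second, it is correlated with the NGS insert size ratio on a log2 scale. The figure does not name the correlation statistic, so Pearson's r is an assumption here. All values below are hypothetical, not data from Fig. 10.

```python
import math


def product_ratio(long_molarity: float, short_molarity: float) -> float:
    """Ratio of 253 bps to 66 bps product amounts from TapeStation region molarities."""
    if short_molarity <= 0:
        raise ValueError("short product not quantified")
    return long_molarity / short_molarity


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient (assumed statistic; the figure does not specify one)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values for three samples
ngs_ratios = [0.5, 1.0, 2.0]
pcr_ratios = [product_ratio(1.0, 4.0), product_ratio(2.0, 4.0), product_ratio(4.0, 4.0)]
log2_ngs = [math.log2(r) for r in ngs_ratios]  # NGS ratios on a log2 scale, as in Fig. 10B
print(pcr_ratios)                               # [0.25, 0.5, 1.0]
print(round(pearson_r(log2_ngs, pcr_ratios), 3))  # 0.982
```

A rank-based statistic such as Spearman's rho would be an equally reasonable choice if the relationship is monotonic but not linear.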
[0020] Fig. 11 shows dot plots of the relationship between insert size ratios and the timepoint of sample collection for different clinical samples. Timepoints 2 and 3 of sample collection correspond to 4 days and 7 days, respectively, after Timepoint 1. Figs. 11A and 11B show the results for nasal swab samples and saliva samples, respectively. Fig. 11C shows the results for individual cases with different disease courses (Case 1 was presymptomatic at Timepoint 1; Cases 2 and 3 were asymptomatic throughout the disease course; Case 4 was symptomatic from Timepoint 1). The results show that the insert size ratio generally correlates with the timepoint of sample collection across multiple sample types, decreasing with time (Figs. 11A and 11B). However, for individual cases with differing disease courses, the insert size ratio changes distinctly over a similar sampling timeframe.
DETAILED DESCRIPTION OF THE PRESENT INVENTION
[0021] The inventors of the present disclosure have set out to provide an alternative method for detecting a virus in a biological sample. Advantageously, the method provides information regarding the integrity of the viral genome and/or the infectivity of the virus in the biological sample, which is useful information for the purpose of controlling disease transmission.
[0022] Thus, in one aspect, the present invention provides a method of determining integrity of a viral genome or infectivity of a virus in a biological sample, said method comprising:
a) extracting viral nucleic acid molecules from the biological sample to generate a nucleic acid library;
b) providing a plurality of primer pairs, wherein (i) each primer pair comprises a forward primer and a reverse primer, wherein the forward primer comprises, from the 5’ end to the 3’ end, a first adapter sequence (AS1) and a forward target-specific sequence (TS1), and wherein the reverse primer comprises, from the 5’ end to the 3’ end, a second adapter sequence (AS2) and a reverse target-specific sequence (TS2); (ii) the first adapter sequence (AS1) of each forward primer is the same, and the second adapter sequence (AS2) of each reverse primer is the same; and (iii) each primer pair with a unique combination of forward and reverse target-specific sequences (TS) is for amplifying a different target region of the viral genome;
c) subjecting the nucleic acid library to multiplex PCR using the plurality of primer pairs to amplify a plurality of amplicons;
d) subjecting the plurality of amplicons to amplification and sequencing to obtain amplicon sequences;
e) detecting and mapping the amplicon sequences to a reference genome of the virus;
f) determining the insert size of each amplicon, wherein the insert size is the number of nucleotide bases or base pairs between the forward and reverse primers of each amplicon;
g) categorizing the amplicons into groups based on insert size, wherein each group comprises a range of insert sizes;
h) enumerating the number of amplicons in each group;
i) obtaining one or more insert size ratios of the number of amplicons in one group to the number of amplicons in another group; and
j) determining the integrity of the viral genome or infectivity of the virus based on the insert size ratio, wherein a large insert size ratio indicates that the viral genome in the biological sample has a high integrity or the virus in the biological sample has a high infectivity, or wherein a small insert size ratio indicates that the viral genome in the biological sample has a low integrity or the virus in the biological sample has a low infectivity.
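Steps f) to i) amount to simple counting once each amplicon has been mapped. The Python sketch below is a minimal illustration, not the disclosed implementation: it assumes each mapped amplicon is summarized by the hypothetical pair (forward primer end, reverse primer start) in 1-based reference coordinates, and it uses the specific-example groups of 0-150 bps and 151-350 bps. Amplicons outside every group are simply not counted.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical inclusive group boundaries, taken from the specific example
# ranges of 0-150 bps ("short") and 151-350 bps ("long").
GROUPS: Dict[str, Tuple[int, int]] = {"short": (0, 150), "long": (151, 350)}


def insert_size(fwd_primer_end: int, rev_primer_start: int) -> int:
    """Step f): bases strictly between the two primers (1-based reference coordinates)."""
    return rev_primer_start - fwd_primer_end - 1


def categorize(sizes: Iterable[int]) -> Counter:
    """Steps g) and h): count amplicons whose insert size falls in each group."""
    counts: Counter = Counter()
    for size in sizes:
        for name, (lo, hi) in GROUPS.items():
            if lo <= size <= hi:
                counts[name] += 1
                break  # the groups do not overlap
    return counts


def insert_size_ratio(counts: Counter, numerator: str = "long", denominator: str = "short") -> float:
    """Step i): ratio of amplicon counts in one group to those in another."""
    if counts[denominator] == 0:
        return float("inf")  # no amplicons in the denominator group
    return counts[numerator] / counts[denominator]


# Hypothetical (forward primer end, reverse primer start) positions of mapped amplicons.
mapped: List[Tuple[int, int]] = [(100, 214), (100, 375), (500, 614), (500, 700), (900, 960)]
sizes = [insert_size(f, r) for f, r in mapped]
counts = categorize(sizes)
print(sizes)  # [113, 274, 113, 199, 59]
print(round(insert_size_ratio(counts), 3))  # 0.667
```

Step j) then interprets the resulting ratio; the cut-offs for "large" and "small" ratios are described in paragraph [0056].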
[0023] The term “integrity” as used herein in the context of “integrity of the genome” refers to the absence of nucleic acid damage that may hamper replication of the genome or its normal functionality. Examples of such nucleic acid damage include, but are not limited to: chemical addition to or disruption of a base of the nucleic acid molecule (thus creating an abnormal nucleotide or nucleotide fragment), single-strand breaks, and double-strand breaks.
[0024] By allowing the integrity of a viral genome in a biological sample to be determined, the method as disclosed herein also allows the infectivity of the virus in the biological sample to be assessed. The term “infectivity” as used herein refers to the capacity of the virus to enter the host cell and exploit the resources of the host cell to replicate and produce infectious progeny viral particles, which may lead to infection and subsequent disease in the host. Damage to the viral genome will generally reduce the infectivity of the virus, as an intact viral genome is generally necessary for the virus to remain infectious.
[0025] In some examples, the plurality of primer pairs comprises 50-700 primer pairs, 100-400 primer pairs, 100-300 primer pairs, 125-250 primer pairs, or 150-200 primer pairs. In some examples, the first and second pluralities of primer pairs comprise a total of 100-1400 primer pairs, 200-800 primer pairs, 250-600 primer pairs, 250-450 primer pairs, or 300-400 primer pairs.
[0026] The term “adapter sequence” as used herein refers to any nucleotide sequence which can be added to an oligonucleotide of interest to prepare said oligonucleotide of interest for various purposes. In some examples, an adapter sequence allows for the amplification of the oligonucleotide of interest by a universal primer. In some examples, an adapter sequence allows for the sequencing of the oligonucleotide of interest. Sequencing platform specific adapter sequences are known in the art, and include, for example, the Illumina P5/P7 adapter sequences.
[0027] In some examples, the length of the first adapter sequence (AS1) and the length of the second adapter sequence (AS2) are the same, while in some other examples, the length of AS1 and the length of AS2 are different. In some examples, the length of AS1 or AS2 is from 10 to 30 nucleotides, or from 17 to 28 nucleotides, or from 19 to 26 nucleotides, or 15 nucleotides, or 16 nucleotides, or 17 nucleotides, or 18 nucleotides, or 19 nucleotides, or 20 nucleotides, or 21 nucleotides, or 22 nucleotides, or 23 nucleotides, or 24 nucleotides, or 25 nucleotides, or 26 nucleotides. In one specific example, the length of AS1 and the length of AS2 are the same, and each is 20 nucleotides long. In some specific examples, AS1 comprises the sequence of 5’-ACACGACGCTCTTCCGATCT-3’ (SEQ ID NO: 1), and AS2 comprises the sequence of 5’-GACGTGTGCTCTTCCGATCT-3’ (SEQ ID NO: 2).
[0028] The term “target-specific sequence” as used herein is well known in the art, and refers to the part of the primer that binds or anneals to the target region to be amplified. In general, the target-specific sequence is at least 80%, or at least 85%, or at least 90%, or at least 95%, or at least 99%, or 100% complementary to the target region to be amplified. A forward target-specific sequence binds or anneals to the 5’ end of the target region to be amplified, and a reverse target-specific sequence binds or anneals to the 3’ end of the target region to be amplified. In some examples, the length of the forward and/or reverse target-specific sequence is from 15 to 35 nucleotides, or from 15 to 33 nucleotides, or from 15 to 31 nucleotides, or from 17 to 28 nucleotides, or from 19 to 26 nucleotides, or 18 nucleotides, or 19 nucleotides, or 20 nucleotides, or 21 nucleotides, or 22 nucleotides, or 23 nucleotides, or 24 nucleotides, or 25 nucleotides, or 26 nucleotides, or 27 nucleotides, or 28 nucleotides, or 29 nucleotides, or 30 nucleotides.
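The primer architecture above (adapter followed by target-specific sequence) and the notion of percent complementarity can be illustrated with a minimal Python sketch. AS1 and AS2 are SEQ ID NO: 1 and 2; the 40-nt target region, the 20-nt target-specific sequences and all function names are hypothetical. Because a primer anneals to the strand opposite the one it copies, complementarity of TS1 is measured here as identity to the start of the given (top) strand, and complementarity of TS2 as identity of its reverse complement to the end of that strand.

```python
from typing import Tuple

AS1 = "ACACGACGCTCTTCCGATCT"  # SEQ ID NO: 1
AS2 = "GACGTGTGCTCTTCCGATCT"  # SEQ ID NO: 2

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def percent_identity(a: str, b: str) -> float:
    """Percentage of positions at which two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def primer_match(region: str, ts1: str, ts2: str) -> Tuple[float, float]:
    """Percent complementarity of TS1 and TS2 to a target region given as its top strand.

    TS1 anneals to the complementary strand at the 5' end of the region, so it
    should read like the start of `region`; TS2 anneals at the 3' end, so its
    reverse complement should read like the end of `region`.
    """
    fwd = percent_identity(ts1, region[: len(ts1)])
    rev = percent_identity(reverse_complement(ts2), region[-len(ts2):])
    return fwd, rev


# Hypothetical 40-nt target region and 20-nt target-specific sequences.
region = "ATGGCTAGCTAGGATCCGTAACGTTGCATGCAAGCTTGAC"
ts1 = region[:20]
ts2 = reverse_complement(region[-20:])
forward_primer = AS1 + ts1  # 5' AS1-TS1 3'
reverse_primer = AS2 + ts2  # 5' AS2-TS2 3'
print(len(forward_primer), len(reverse_primer))  # 40 40
print(primer_match(region, ts1, ts2))  # (100.0, 100.0)
```

A single mismatch in a 20-nt target-specific sequence gives 95% complementarity, which still satisfies the "at least 80%" lower bound mentioned above.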
[0029] The term “target region” as used herein refers to a particular region of a nucleic acid molecule to be amplified. The target region, or combination of target regions, is unique to the virus to be detected, in order to ensure the specificity of detection. For example, the target region or combination of target regions used to detect a particular virus should not be found in the genome or transcriptome of another virus, of a human, or of any microorganism.
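A very crude first-pass screen for the uniqueness requirement is to check that neither strand of a candidate target region occurs verbatim in any off-target sequence. The Python sketch below does exactly that and no more; the function name and sequences are hypothetical, and an exact-substring test ignores near matches that a real specificity check (for example, an alignment search against human and microbial sequences) would need to consider.

```python
from typing import Iterable

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_exact_match_specific(target: str, off_target_seqs: Iterable[str]) -> bool:
    """True if neither strand of `target` occurs verbatim in any off-target sequence.

    This exact-substring test is only a crude proxy: near matches, which could
    still anneal, are not detected.
    """
    rc = reverse_complement(target)
    return all(target not in seq and rc not in seq for seq in off_target_seqs)


# Hypothetical sequences, for illustration only.
off_targets = ["TTTTACGTTGCATT", "CCGGGTTTCC"]
print(is_exact_match_specific("ACGTTGCA", off_targets))  # False (forward strand found)
print(is_exact_match_specific("AAACCC", off_targets))    # False (reverse complement found)
print(is_exact_match_specific("GATTACAG", off_targets))  # True
```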
[0030] In some examples, the target regions amplified by the plurality of primer pairs are of the same length. In some examples, the target region amplified by one primer pair differs in length from the target region amplified by another primer pair. In some examples, the target region amplified by each primer pair has a length between 50-1000 nucleotides, or a length between 100-800 nucleotides, or a length between 250-650 nucleotides, or a length between 300-600 nucleotides.
[0031] The term “multiplex polymerase chain reaction” (“multiplex PCR” for short) as used herein refers to PCR that allows the simultaneous amplification of multiple targets in a single reaction, with a different primer pair for each target. When the products are detected in real time, this technique requires two or more probes that can be distinguished from each other and detected simultaneously. Methods and systems for carrying out multiplex PCR are well known in the art.
[0032] The term “amplicon” as used herein refers to a polynucleotide that is the source and/or product of amplification or replication events. It can be formed artificially, using various methods including polymerase chain reactions (PCR) or ligase chain reactions, or naturally through gene duplication.
[0033] The sequencing steps of the method of the invention can be carried out using any sequencing method known in the art, and/or on any standard sequencing platform, including but not limited to Illumina and Ion Torrent platforms.
[0034] The term “reference genome” (also known as a reference assembly) as used herein refers to a digital nucleic acid sequence database, assembled as a representative example of the set of genes in one idealized individual organism of a species. A reference genome of a virus provides a haploid mosaic of different nucleic acid sequences from each known variant of the virus. Reference genomes of various organisms are publicly available. In one specific example, the reference genome is a SARS-CoV-2 reference genome (NCBI Reference Sequence: NC_045512.2).
[0035] In some examples, the biological sample is a body fluid sample. The term “body fluid” includes, but is not limited to, blood, plasma, serum, sputum, urine, feces, semen, mucus, lymph, saliva, or nasal lavage. In some examples, the biological sample is a nasopharyngeal swab sample, an oropharyngeal swab sample, a nasal swab sample, a saliva sample, a sputum sample, a viral culture sample, or mixtures thereof. In some specific examples, the biological sample is a nasopharyngeal swab sample, a nasal swab sample, or mixtures thereof. In some examples, the biological sample is an inactivated cultured viral isolate.
[0036] In some examples, the method as described above further comprises the step of splitting the nucleic acid library from step a) into a first nucleic acid pool and a second nucleic acid pool. In some examples, splitting the nucleic acid library into the first nucleic acid pool and the second nucleic acid pool comprises randomly splitting the nucleic acid library. In some examples, the number of nucleic acid molecules in the first nucleic acid pool and the number of nucleic acid molecules in the second nucleic acid pool are the same. In some other examples, the number of nucleic acid molecules in the first nucleic acid pool and the number of nucleic acid molecules in the second nucleic acid pool are different. In some examples when the numbers are different, the first nucleic acid pool contains more nucleic acid molecules than the second nucleic acid pool, while in some other examples, the second nucleic acid pool contains more nucleic acid molecules than the first nucleic acid pool.
[0037] In some examples, when the nucleic acid library from step a) is split into a first nucleic acid pool and a second nucleic acid pool, primers are designed for each of the two pools of nucleic acid molecules, and each of the two pools is subjected to multiplex PCR. Doing so prevents the formation of very short, undesired PCR products between adjacent primer pairs and allows reconstruction of the whole viral genome from sequencing. Thus, in some examples, when the nucleic acid library from step a) is split into a first nucleic acid pool and a second nucleic acid pool, the plurality of primer pairs comprises a first plurality of primer pairs and a second plurality of primer pairs, wherein each forward primer in the first and second pluralities of primer pairs further comprises a barcode sequence (BS), and wherein the barcode sequence (BS) of each forward primer is different. In some examples, the first nucleic acid pool is subjected to multiplex PCR using the first plurality of primer pairs to amplify a first plurality of amplicons, and the second nucleic acid pool is subjected to multiplex PCR using the second plurality of primer pairs to amplify a second plurality of amplicons.
[0038] In some examples, each of the first and/or second plurality of primer pairs comprises 50-700 primer pairs, 100-400 primer pairs, 100-300 primer pairs, 125-250 primer pairs, or 150-200 primer pairs. In some examples, the first and second pluralities of primer pairs comprise a total of 100-1400 primer pairs, 200-800 primer pairs, 250-600 primer pairs, 250-450 primer pairs, or 300-400 primer pairs.
[0039] As used herein, the term “barcode sequence” refers to an encoded molecule or barcode that includes a variable amount of information within its nucleic acid sequence. For example, the barcode sequence is a tag that can be read out using any of a variety of sequence identification techniques, for example, nucleic acid sequencing, probe hybridization-based assays, and the like. In some examples, the barcode sequence is used in the method as described herein by appending it to different target-specific sequences, such that when a target-specific sequence anneals to its target region, each different target region has a unique barcode sequence attached to it, which is read out together with the sequence of the target region from that sample.
[0040] The barcode sequence allows the pooled analysis of multiple unique target regions, where the resulting sequence information from the pool can be later attributed back to each starting target region. That is, after the process of amplification, the barcode sequence is used to group amplicons to form a family of amplicons having the same barcode sequence. In some examples, the barcode sequence is an overhang that does not complement any sequence within the target region.
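Grouping amplicons into barcode families, as described above, is essentially a dictionary keyed by barcode. The minimal Python sketch below assumes, hypothetically, that each read begins with a 10-nt barcode (the length of the specific example in paragraph [0041]); the read layout, function name and sequences are illustrative only, since the actual position of the barcode within the primer is not specified here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_into_families(reads: Iterable[str], barcode_length: int = 10) -> Dict[str, List[str]]:
    """Group reads into families keyed by a leading barcode of `barcode_length` nt.

    Assumes (hypothetically) that each read begins with the barcode; the
    remainder of the read is stored under that barcode.
    """
    families: Dict[str, List[str]] = defaultdict(list)
    for read in reads:
        barcode, remainder = read[:barcode_length], read[barcode_length:]
        families[barcode].append(remainder)
    return dict(families)


# Hypothetical reads: two share a barcode, one carries a different barcode.
reads = ["ACGTACGTAA" + "TTTGGG", "ACGTACGTAA" + "TTTGGC", "GGGGCCCCAA" + "AAATTT"]
families = group_into_families(reads)
print(len(families))  # 2
print(families["ACGTACGTAA"])  # ['TTTGGG', 'TTTGGC']
```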
[0041] In some examples, the barcode sequence is an oligonucleotide comprising 10 to 16 random nucleotides, or 10 to 15 random nucleotides, or 10 to 13 random nucleotides, or 10 random nucleotides, or 11 random nucleotides, or 12 random nucleotides, or 13 random nucleotides, or 14 random nucleotides, or 15 random nucleotides, or 16 random nucleotides. In one specific example, the barcode sequence is an oligonucleotide comprising 10 random nucleotides which can be represented as NNNNNNNNNN (SEQ ID NO: 3), wherein N is any nucleotide.
[0042] In some examples, the target regions to be amplified by the first plurality of primer pairs are non-overlapping and are separated by gaps of nucleotides, and the target regions to be amplified by the second plurality of primer pairs are non-overlapping, and each of the gaps of nucleotides is comprised within one of the target regions to be amplified by the second plurality of primer pairs. In some examples, the target regions to be amplified by the first and second plurality of primer pairs span the viral genome.
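The two-pool tiling conditions of paragraph [0042] can be checked mechanically. The Python sketch below is a minimal illustration on a hypothetical 1,000-nt genome, with target regions written as 1-based inclusive (start, end) coordinates; it tests that each pool is non-overlapping, that every gap left by pool 1 lies inside a pool-2 region, and that the two pools together span the genome. All names and coordinates are hypothetical.

```python
from typing import List, Tuple

Region = Tuple[int, int]  # (start, end), 1-based and inclusive


def non_overlapping(regions: List[Region]) -> bool:
    """True if no two regions of one pool share a position."""
    ordered = sorted(regions)
    return all(nxt[0] > prev[1] for prev, nxt in zip(ordered, ordered[1:]))


def gaps(regions: List[Region]) -> List[Region]:
    """Uncovered stretches between consecutive regions of one pool."""
    ordered = sorted(regions)
    return [(prev[1] + 1, nxt[0] - 1)
            for prev, nxt in zip(ordered, ordered[1:])
            if nxt[0] > prev[1] + 1]


def gaps_covered(pool1: List[Region], pool2: List[Region]) -> bool:
    """True if each gap of pool 1 lies entirely within some pool-2 region."""
    return all(any(s <= g_start and g_end <= e for s, e in pool2)
               for g_start, g_end in gaps(pool1))


def spans_genome(regions: List[Region], genome_length: int) -> bool:
    """True if the union of the regions covers positions 1..genome_length."""
    next_uncovered = 1
    for start, end in sorted(regions):
        if start > next_uncovered:
            return False
        next_uncovered = max(next_uncovered, end + 1)
    return next_uncovered > genome_length


# Hypothetical 1,000-nt genome tiled by two pools; pool 1 leaves 20-nt gaps.
pool1 = [(1, 300), (321, 620), (641, 1000)]
pool2 = [(250, 400), (580, 700)]
print(non_overlapping(pool1), non_overlapping(pool2))  # True True
print(gaps(pool1))  # [(301, 320), (621, 640)]
print(gaps_covered(pool1, pool2), spans_genome(pool1 + pool2, 1000))  # True True
```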
[0043] In some examples, the target regions to be amplified by the first plurality of primer pairs have the same or different length, and the target regions to be amplified by the second plurality of primer pairs have the same or different length. In some examples, the target region amplified by each primer pair has a length between 50-1000 nucleotides, or a length between 100-800 nucleotides, or a length between 250-650 nucleotides, or a length between 300-600 nucleotides.
[0044] In some examples, the length of each of the gaps of nucleotides separating the target regions to be amplified by the first plurality of primer pairs is from 2 to 30 nucleotides, or from 4 to 28 nucleotides, or from 6 to 26 nucleotides, or from 8 to 24 nucleotides, or from 10 to 22 nucleotides, or from 12 to 20 nucleotides, or from 14 to 18 nucleotides, or 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29 or 30 nucleotides.
[0045] In some examples, in step g) of the method as described above, the ranges of insert sizes of the one or more groups are determined based on the insert sizes observed in step f), or expected insert sizes, or both.
[0046] In some examples, the expected insert sizes are determined by: a) providing multiplex PCR sequencing data that has been obtained from a viral genome of the same virus species; b) processing the sequencing data to identify one or more pairs of predetermined primers, wherein each pair of predetermined primers comprises a forward primer and a reverse primer that flank an amplicon in the 5’ to 3’ direction; c) trimming primer-specific sequences from each amplicon to obtain the sequence of the insert; d) aligning the insert sequence to a reference genome; e) removing inserts with supplementary alignment from the sequencing data, wherein inserts with supplementary alignment are inserts that do not align in contiguity with the reference genome; f) determining the insert size of each insert remaining after step e); and g) retaining inserts from step f) having an insert size of between 70 base pairs (bps) and 1000 bps to obtain the expected insert sizes.
[0047] The term “insert with supplementary alignment” refers to an insert that does not align contiguously with a reference genome. During alignment of reads to the intended reference genome, some reads may not align in a completely linear manner (such reads are also described as chimeric reads): part of the read may align to a portion of the genome that is not contiguous with the region to which the other part aligns. This may happen when a structural rearrangement brings ordinarily distant portions of a chromosome together. The supplementary part of the alignment is the portion of the sequencing read that does not align in contiguity with the rest of the read in alignment with the reference genome.
The insert sizes of reads with supplementary alignment are calculated from the start and end positions of the linear and supplementary parts of the read (with respect to the reference genome), so these insert sizes can be inflated when the linear and supplementary portions of the read map to distant positions. Such reads may be filtered out so as not to skew the true insert size distribution.
[0048] In some examples, the insert size information may be obtained from sequence data using bioinformatics tools/programs. For example, the insert size data may be extracted from sequence data using Samtools, available at http://www.htslib.org.
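Steps e) to g) of paragraph [0046] can be sketched directly on SAM-format alignment text. The minimal Python sketch below assumes primer-trimmed, paired-end reads have already been aligned, so that the SAM TLEN field approximates the insert size. It drops every pair with a supplementary alignment (SAM flag 0x800 or an SA:Z tag), counts each remaining pair once via read 1 (flag 0x40), and keeps sizes from 70 bps to 1000 bps. The example records are hypothetical, and the function names are illustrative only.

```python
from typing import Iterable, List

FLAG_UNMAPPED = 0x4
FLAG_FIRST_IN_PAIR = 0x40
FLAG_SECONDARY = 0x100
FLAG_SUPPLEMENTARY = 0x800


def expected_insert_sizes(sam_lines: Iterable[str], min_size: int = 70, max_size: int = 1000) -> List[int]:
    """Insert sizes from primer-trimmed, aligned read pairs in SAM text format."""
    records = []
    chimeric = set()
    for line in sam_lines:
        if not line or line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        qname, flag, tlen = fields[0], int(fields[1]), int(fields[8])
        # Step e): remember any template with a supplementary alignment.
        if flag & FLAG_SUPPLEMENTARY or any(tag.startswith("SA:Z:") for tag in fields[11:]):
            chimeric.add(qname)
        records.append((qname, flag, tlen))

    sizes = []
    for qname, flag, tlen in records:
        if qname in chimeric:
            continue
        if flag & (FLAG_UNMAPPED | FLAG_SECONDARY | FLAG_SUPPLEMENTARY):
            continue
        if not flag & FLAG_FIRST_IN_PAIR:
            continue  # count each pair once, via read 1
        size = abs(tlen)  # step f): after primer trimming, TLEN approximates the insert size
        if min_size <= size <= max_size:  # step g)
            sizes.append(size)
    return sizes


def _record(qname: str, flag: int, tlen: int, *tags: str) -> str:
    """Build a minimal hypothetical 11-column SAM line (plus optional tags)."""
    return "\t".join([qname, str(flag), "NC_045512.2", "100", "60", "50M", "=", "200", str(tlen), "*", "*", *tags])


example = [
    "@HD\tVN:1.6",
    _record("r1", 99, 113), _record("r1", 147, -113),
    _record("r2", 99, 274),
    _record("r3", 99, 60),  # shorter than 70 bps
    _record("r4", 97, 5000, "SA:Z:NC_045512.2,20000,+,30M20S,60,0;"),
    _record("r4", 2145, 5000),  # supplementary record of r4
]
print(expected_insert_sizes(example))  # [113, 274]
```

With real BAM files, the same fields are available through a library such as pysam (for example, the `flag` and `template_length` attributes of an aligned segment) rather than by parsing text.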
[0049] In some examples, the range of insert sizes of each group can be determined based on the observed insert size distribution, the expected insert size distribution, or both. The insert size distribution can be shown as a plot of the number of insert counts versus insert length, as exemplified in Fig. 8. Such distributions show that insert sizes cluster into distinct ranges. For example, in Fig. 8A, the insert sizes fall into three ranges visually classified as 70-150 bps, 220-350 bps, and 400-520 bps.
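The ranges in Fig. 8 are described as visually classified. One simple way to derive comparable ranges automatically is gap-based clustering of the observed sizes; this is a swapped-in illustrative technique, not the method stated in the document. The Python sketch below splits the sorted distinct sizes wherever adjacent values differ by more than a hypothetical `max_gap`, optionally ignoring sizes seen fewer than `min_count` times. The data and parameter values are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def cluster_ranges(sizes: Iterable[int], max_gap: int = 40, min_count: int = 1) -> List[Tuple[int, int]]:
    """Split observed insert sizes into ranges wherever adjacent sizes differ by > max_gap.

    Sizes seen fewer than `min_count` times are ignored, so that sporadic
    off-target products do not bridge two clusters.
    """
    counts = Counter(sizes)
    ordered = sorted(size for size, n in counts.items() if n >= min_count)
    if not ordered:
        return []
    ranges = []
    start = prev = ordered[0]
    for size in ordered[1:]:
        if size - prev > max_gap:
            ranges.append((start, prev))
            start = size
        prev = size
    ranges.append((start, prev))
    return ranges


# Hypothetical observed insert sizes.
observed = [70, 95, 113, 150, 220, 260, 274, 310, 350, 400, 440, 480, 520]
print(cluster_ranges(observed))  # [(70, 150), (220, 350), (400, 520)]
```

The `min_count` filter matters in practice: a single stray 185 bps insert would otherwise merge the first two clusters.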
[0050] In some examples, each group comprises amplicons formed between the forward and reverse primers of a primer pair. In some other examples, each group comprises amplicons formed between the forward primer of a primer pair and the reverse primer of another primer pair. For example, as illustrated in Fig. 8B, amplicons can be formed between forward primer F and reverse primer R, between forward primer F and reverse primer R+1, between forward primer F and reverse primer R+2, between forward primer F and reverse primer R+3, and so forth.
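Given primer coordinates, the expected sizes of the intended F_R products and the longer F_R+1, F_R+2 products follow directly. The Python sketch below is a minimal illustration using hypothetical coordinates for three adjacent amplicons of one pool, each written as (forward primer end, reverse primer start) in 1-based reference coordinates. The names and coordinates are hypothetical, although the first two sizes are chosen to mirror the 113 bps and 274 bps inserts of Fig. 7.

```python
from typing import Dict, List, Tuple


def cross_pair_insert_sizes(pairs: List[Tuple[int, int]], max_offset: int = 2) -> Dict[Tuple[int, int], int]:
    """Expected insert sizes for forward primer F_i paired with reverse primer R_(i+k).

    `pairs` holds (forward primer end, reverse primer start) for the primer
    pairs of one pool, in genome order (1-based coordinates). k = 0 gives the
    intended F_R product; k = 1, 2, ... give the longer F_R+1, F_R+2 products.
    """
    sizes = {}
    for i, (fwd_end, _) in enumerate(pairs):
        for k in range(max_offset + 1):
            j = i + k
            if j >= len(pairs):
                break
            sizes[(i, j)] = pairs[j][1] - fwd_end - 1
    return sizes


# Hypothetical primer coordinates for three adjacent amplicons.
pool = [(100, 214), (260, 375), (420, 560)]
print(cross_pair_insert_sizes(pool))
# {(0, 0): 113, (0, 1): 274, (0, 2): 459, (1, 1): 114, (1, 2): 299, (2, 2): 139}
```

In this hypothetical layout, the F_R, F_R+1 and F_R+2 sizes fall roughly into the three ranges seen in Fig. 8.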
[0051] In some examples, the range of insert sizes of the groups is about 50 to about 100 bps, or about 100 to about 150 bps, or about 150 to about 200 bps, or about 200 to about 250 bps, or about 250 to about 300 bps, or about 300 to about 350 bps, or about 350 to about 400 bps, or about 400 to about 450 bps, or about 450 to about 500 bps, or about 500 to about 550 bps, or about 550 to about 600 bps, or about 600 to about 650 bps, or about 650 to about 700 bps, or about 700 to about 750 bps, or about 750 to about 800 bps, or about 800 to about 850 bps, or about 850 to about 900 bps, or about 900 to about 950 bps, or about 950 to about 1000 bps. In some specific examples, the range of insert sizes of the groups is about 0 to about 150 bps, or about 70 to about 150 bps, or about 150 to about 350 bps, or about 220 to about 350 bps, or about 400 to about 520 bps.
[0052] The term “insert size ratio” as used herein represents the ratio of the abundance of longer template molecules to shorter template molecules in a PCR reaction, in particular a multiplexed PCR reaction. The extent to which longer amplicons (and hence longer inserts) form depends on the abundance of longer template molecules available in the multiplex PCR reaction, and hence on the degree to which template molecules are degraded in the sample. The insert size ratio therefore correlates indirectly with the infectivity of a virus, since infectivity requires intact viral particles, which are represented by larger, intact viral genomes.
[0053] The inventors of the present disclosure have also derived a Viral Fragmentation Score (VFS), wherein VFS = (long viral nucleic acid fragment count) / (short viral nucleic acid fragment count). In some examples, VFS is positively correlated with viral load in the sample, intactness of the viral genome in the sample, and/or the insert size ratio of the sample as defined above.
[0054] Depending on the number of groups into which the amplicons are categorized in step g) of the method as described above, a number of insert size ratios can be obtained from a multiplex PCR of a single biological sample. For example, if the amplicons are categorized into two groups, a single insert size ratio can be obtained between the first group and the second group. If the amplicons are categorized into four groups, six insert size ratios can be obtained, namely, the insert size ratios between the first and second groups, the first and third groups, the first and fourth groups, the second and third groups, the second and fourth groups, and the third and fourth groups. Where the number of groups is n (n ≥ 2) and the number of insert size ratios is N, the value of N can be calculated using the following formula:
$$N = \binom{n}{2} = \frac{n(n-1)}{2}$$
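The count N of pairwise insert size ratios is the number of unordered pairs of groups. The minimal Python sketch below enumerates those pairs with `itertools.combinations` and checks the count against n(n-1)/2. It assumes, as an illustrative convention not stated in the document, that each ratio is taken as the longer-insert group over the shorter-insert group, and that the hypothetical group counts are listed from the smallest to the largest insert-size range.

```python
from itertools import combinations
from math import comb
from typing import Dict, Tuple


def pairwise_insert_size_ratios(group_counts: Dict[str, int]) -> Dict[Tuple[str, str], float]:
    """All insert size ratios (longer-group count / shorter-group count).

    `group_counts` must list the groups from the smallest to the largest
    insert-size range (dict insertion order is preserved in Python 3.7+).
    """
    ratios = {}
    for shorter, longer in combinations(group_counts, 2):
        denom = group_counts[shorter]
        ratios[(longer, shorter)] = group_counts[longer] / denom if denom else float("inf")
    return ratios


# Hypothetical counts for four groups, ordered from shortest to longest inserts.
counts = {"g1": 40, "g2": 20, "g3": 10, "g4": 5}
ratios = pairwise_insert_size_ratios(counts)
print(len(ratios), comb(len(counts), 2))  # 6 6
print(ratios[("g2", "g1")], ratios[("g4", "g1")])  # 0.5 0.125
```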
[0055] In some examples, the insert size ratio that indicates the integrity of the viral genome and/or the infectivity of the virus is the number of amplicons in the group with a range of insert sizes that is the second smallest, to the number of amplicons in the group with a range of insert sizes that is the smallest. In some other examples, the insert size ratio that indicates the integrity of the viral genome and/or the infectivity of the virus is the number of amplicons in the group with a range of insert sizes that is the largest, to the number of amplicons in the group with a range of insert sizes that is the second largest. Yet in some other examples, the insert size ratio that indicates the integrity of the viral genome and/or the infectivity of the virus is the number of amplicons in the group with a range of insert sizes that is the largest, to the number of amplicons in the group with a range of insert sizes that is the smallest. In some examples, the range of insert sizes that is the largest is about 300 to about 350 bps, or about 350 to about 400 bps, or about 400 to about 450 bps, or about 450 to about 500 bps, or about 500 to about 550 bps, or about 550 to about 600 bps, or about 600 to about 650 bps, or about 650 to about 700 bps, or about 700 to about 750 bps, or about 750 to about 800 bps, or about 800 to about 850 bps, or about 850 to about 900 bps, or about 900 to about 950 bps, or about 950 to about 1000 bps. In one specific example, the range of insert sizes that is the largest is about 400 to about 520 bps. In some examples, the range of insert sizes that is the second smallest is about 150 to about 200 bps, or about 200 to about 250 bps, or about 250 to about 300 bps, or about 300 to about 350 bps, or about 350 to about 400 bps, or about 400 to about 450 bps, or about 450 to about 500 bps. In one specific example, the range of insert sizes that is the second smallest is about 151 to 350 bps. 
In some examples, the range of insert sizes that is the smallest is about 0 to about 50 bps, or about 50 to about 100 bps, or about 100 to about 150 bps, or about 150 to about 200 bps, or about 200 to about 250 bps, or about 250 to about 300 bps. In one specific example, the range of insert sizes that is the smallest is about 0 to 150 bps. In one specific example, the range of insert sizes that is the second smallest is about 151 to about 350 bps, and the range of insert sizes that is the smallest is about 0 to about 150 bps.
[0056] In some examples, an insert size ratio of about 1.6, 1.7, 1.8, 1.9, 2, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3, 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, 4, 4.1, 4.2, 4.3, 4.4, 4.5, 4.6, 4.7, 4.8, 4.9, 5, 5.5, 6, 6.5, 7, 7.5, 8, 8.5, 9, 9.5, 10, 10.5, 11, 11.5, 12, 12.5, 13, 13.5, 14, 14.5, 15, 15.5, 16, 16.5, 17, 17.5, 18, 18.5, 19, 19.5, 20, 50, 100, 500, or more is regarded as a “large” insert size ratio, which indicates that the viral genome has high integrity and/or that the virus has high infectivity. In one specific example, an insert size ratio of more than about 1.5 is regarded as a “large” insert size ratio, which indicates that the viral genome has high integrity and/or that the virus has high infectivity. In other examples, an insert size ratio of about 0.45, 0.4, 0.35, 0.3, 0.25, 0.2, 0.15, 0.1, 0.05, 0.01, or less is regarded as a “small” insert size ratio, which indicates that the viral genome has low integrity and/or that the virus has low infectivity. In one specific example, an insert size ratio of less than about 0.5 is regarded as a “small” insert size ratio, which indicates that the viral genome has low integrity and/or that the virus has low infectivity. In some examples, an insert size ratio of about 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, or 1.5 is regarded as a “medium” insert size ratio, which indicates that the viral genome has medium integrity and/or that the virus has medium infectivity. In some specific examples, an insert size ratio of about 0.5 to 1.5 is regarded as a “medium” insert size ratio, which indicates that the viral genome has medium integrity and/or that the virus has medium infectivity.
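The specific-example cut-offs above can be expressed as a three-way classification. The minimal Python sketch below uses sharp thresholds of 0.5 and 1.5, with both boundaries counted as "medium". Because the text says "about", exact boundary handling is an assumption, and the function name is hypothetical.

```python
def classify_insert_size_ratio(ratio: float, low: float = 0.5, high: float = 1.5) -> str:
    """Map an insert size ratio to the 'small' / 'medium' / 'large' categories.

    Uses the specific-example cut-offs: below `low` is small, above `high` is
    large, and `low` to `high` (inclusive) is medium.
    """
    if ratio > high:
        return "large"  # high integrity and/or high infectivity
    if ratio < low:
        return "small"  # low integrity and/or low infectivity
    return "medium"  # medium integrity and/or medium infectivity


for r in (0.3, 0.5, 1.0, 1.5, 2.4):
    print(r, classify_insert_size_ratio(r))
```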
[0057] In some examples, the viral genome is a viral DNA genome, and the virus is a DNA virus. In some other examples, the viral genome is a viral RNA genome, and the virus is an RNA virus. In some examples, when the viral genome is a viral RNA genome, step a) further comprises reverse-transcribing RNA molecules extracted from the biological sample to generate the nucleic acid library, wherein the nucleic acid library is a cDNA (complementary DNA) library.
[0058] The term “reverse transcription” and its grammatical variants as used herein refer to the enzyme-mediated synthesis of a DNA molecule from an RNA template. The resulting DNA, known as complementary DNA (cDNA), can be used as a template for PCR amplification. Methods of reverse transcription, which typically involve the use of non-target-specific primers (random hexanucleotide primers, or random hexamers for short), are well known in the art.
[0059] The term “cDNA library” as used herein is well known in the art, and refers to a combination of cloned cDNA fragments inserted into a collection of host cells, which constitute some portion of the transcriptome of the organism and are stored as a "library". cDNA is produced from fully transcribed mRNA and therefore contains only the expressed genes of an organism.
[0060] In some examples, after the reverse transcription step, the cDNA library obtained is purified using methods known in the art (including, for example, column purification and gel purification methods), in order to remove, for example, excess primers. In some examples, the cDNA library is purified to retain all amplicons. In some specific examples, the cDNA library is purified to retain amplicons that are more than about 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190 or 200 bps in size. In one specific example, the cDNA library is purified to retain amplicons that are more than about 100 bps in size.
[0061] In some examples, the viral RNA genome is from a virus selected from the group consisting of: Lymphocytic choriomeningitis virus, Coronavirus, human immunodeficiency virus (HIV), Human metapneumovirus, Poliovirus, Rhinovirus, Hepatitis A virus, Norwalk virus, Yellow fever virus, West Nile virus, Hepatitis C virus, Dengue fever virus, Enterovirus, Zika virus, Rubella virus, Ross River virus, Sindbis virus, Chikungunya virus, Borna disease virus, Ebola virus, Marburg virus, Measles virus, Mumps virus, Nipah virus, Hendra virus, Newcastle disease virus, Human respiratory syncytial virus, Rabies virus, Lassa virus, Hantavirus, Crimean-Congo hemorrhagic fever virus, Human parainfluenza viruses 1-4, Influenza virus, and Hepatitis D virus. Examples of coronaviruses include, but are not limited to: Severe acute respiratory syndrome virus (SARS-CoV), Severe acute respiratory syndrome coronavirus 1 (SARS-CoV-1) and Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).
[0062] In some examples, the RNA virus is a member of a family selected from the group consisting of: Coronaviridae, Flaviviridae (such as hepaciviruses and Zika virus) and Retroviridae (such as lentiviruses).
[0063] In some examples, the viral genome is from a virus that causes respiratory tract infection.
[0064] In some examples, the viral genome is from a virus selected from the group consisting of: influenza virus, dengue virus, and coronavirus.
[0065] In some examples, the viral genome is a viral DNA genome from a DNA virus. Examples of DNA viruses include, but are not limited to: bocavirus, Epstein-Barr virus, Hepatitis B virus, smallpox virus, adenovirus and papillomavirus.
[0066] The method as described herein can also be used to compare viral genome integrity and/or virus infectivity across different biological samples. For example, the insert size ratio(s) of the biological sample obtained in step i) of the method can be compared to the insert size ratio(s) obtained from a reference sample. Larger insert size ratio(s) relative to the reference sample indicate that the viral genome in the biological sample has a higher integrity, or that the virus in the biological sample has a higher infectivity, as compared to the reference sample, and/or smaller insert size ratio(s) relative to the reference sample indicate that the viral genome in the biological sample has a lower integrity, or that the virus in the biological sample has a lower infectivity, as compared to the reference sample. In some examples, the insert size ratio(s) to be compared between the biological sample and the reference sample are calculated based on the same ranges of insert sizes.
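One convenient way to express such a comparison is a log2 fold change, echoing the log2 scale used in Fig. 10B. The minimal Python sketch below is an illustrative choice rather than a stated part of the method: it returns log2(sample ratio / reference ratio), so positive values mean a larger insert size ratio than the reference and negative values a smaller one. Both ratios are assumed to be computed over the same insert-size ranges, and the values and function name are hypothetical.

```python
import math


def log2_ratio_vs_reference(sample_ratio: float, reference_ratio: float) -> float:
    """log2(sample / reference) for insert size ratios computed over the same ranges.

    Positive values: larger ratio than the reference (higher integrity or
    infectivity); negative values: smaller ratio (lower integrity or infectivity).
    """
    if sample_ratio <= 0 or reference_ratio <= 0:
        raise ValueError("insert size ratios must be positive")
    return math.log2(sample_ratio / reference_ratio)


# Hypothetical ratios, e.g. a later sample compared with an earlier sample from the same subject.
print(log2_ratio_vs_reference(0.25, 1.0))  # -2.0
```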
[0067] In some examples, the reference sample is a body fluid sample. In some examples, the biological sample is a nasopharyngeal swab sample, an oropharyngeal swab sample, a saliva sample, a sputum sample, a viral culture sample, or mixtures thereof. In some specific examples, the biological sample is a nasopharyngeal swab sample, a nasal swab sample, or mixtures thereof. In some examples, the biological sample is an inactivated cultured viral isolate.
[0068] In some examples, the biological sample and the reference sample are the same type of samples. In some other examples, the biological sample and the reference sample are different types of samples. In some examples, the biological sample is obtained from a subject, and the reference sample is the same type of sample obtained from the same subject. In some examples, the biological sample is obtained from a subject, and the reference sample is the same type of sample obtained from a different subject. In some examples, the biological sample is obtained from a subject, and the reference sample is a different type of sample obtained from the same subject. In some examples, the biological sample is obtained from a subject, and the reference sample is a different type of sample obtained from a different subject. In some examples, the biological sample is obtained from a subject, and the reference sample is obtained from the same subject at one or more different time points. In some examples, the one or more different time points is selected from the group consisting of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 days after the original biological sample is obtained. In some examples, the reference sample is a viral culture sample. In some examples, the reference sample is an inactivated cultured viral isolate.
[0069] In some examples, the insert size ratio of a biological sample obtained from a subject indicates whether the subject is asymptomatic, pre-symptomatic or symptomatic. In some examples, an insert size ratio of about 1.6, 1.7, 1.8, 1.9, 2, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3, 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, 4, 4.1, 4.2, 4.3, 4.4, 4.5, 4.6, 4.7, 4.8, 4.9, 5, 5.5, 6, 6.5, 7, 7.5, 8, 8.5, 9, 9.5, 10, 10.5, 11, 11.5, 12, 12.5, 13, 13.5, 14, 14.5, 15, 15.5, 16, 16.5, 17, 17.5, 18, 18.5, 19, 19.5, 20, 50, 100, 500, or more, indicates that the subject is symptomatic. In one specific example, an insert size ratio of more than about 1.5 indicates that the subject is symptomatic. In other examples, an insert size ratio of about 0.45, 0.4, 0.35, 0.3, 0.25, 0.2, 0.15, 0.1, 0.05, 0.01, or less, indicates that the subject is asymptomatic. In one specific example, an insert size ratio of less than about 0.5 indicates that the subject is asymptomatic. In some examples, an insert size ratio of about 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5 indicates that the subject is pre-symptomatic. In some specific examples, an insert size ratio of about 0.5 to 1.5 indicates that the subject is pre-symptomatic.
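The "specific example" cut-offs in the paragraph above (below about 0.5, about 0.5 to 1.5, above about 1.5) can be expressed as a small lookup. This is a minimal Python sketch, not part of the disclosure: the function name is hypothetical, and placing the boundary values 0.5 and 1.5 in the pre-symptomatic band is an assumption, since the text only says "about".

```python
def classify_insert_size_ratio(ratio: float, low: float = 0.5, high: float = 1.5) -> str:
    """Hypothetical helper mapping an insert size ratio to a symptom category.

    Below `low` -> asymptomatic; `low`..`high` (inclusive, an assumption) ->
    pre-symptomatic; above `high` -> symptomatic.
    """
    if ratio < 0:
        raise ValueError("insert size ratio cannot be negative")
    if ratio < low:
        return "asymptomatic"
    if ratio <= high:
        return "pre-symptomatic"
    return "symptomatic"


for r in (0.2, 0.5, 1.0, 1.5, 2.8):
    print(r, classify_insert_size_ratio(r))
```

The thresholds are parameters so that other "about" values listed in the paragraph can be tried without changing the function.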
[0070] In some examples, the change in insert size ratios of biological samples obtained from a subject over two or more time points is indicative of the course of a viral infection in the subject. In some examples, an increase in insert size ratios of biological samples obtained from a subject over two or more time points indicates that the integrity of the viral genome is increasing and/or that the infectivity of the virus is increasing. In contrast, in some other examples, a decrease in insert size ratios of biological samples obtained from a subject over two or more time points indicates that the integrity of the viral genome is decreasing and/or that the infectivity of the virus is decreasing.
[0071] In some examples, the method as described above uses next-generation sequencing (NGS) for the simultaneous analysis of the plurality of viral nucleic acid molecules in the biological sample. The inventors of the present disclosure have also found that results from the NGS method can be validated using real-time PCR assays.
[0072] Thus, in some examples, the method as described above further comprises the following steps: a) selecting a first amplicon having a first insert, selecting a second amplicon having a second insert, wherein the first insert has an insert size that is between 70-80 bps shorter than the range of insert sizes of the group with the range of insert sizes that is the smallest, and wherein the second insert has an insert size that is between 70-80 bps shorter than the range of insert sizes of the group with the range of insert sizes that is the second smallest; b) providing a first pair of primers comprising a forward primer and a reverse primer, wherein the forward primer and reverse primer are designed for amplifying the first insert; c) providing a second pair of primers comprising a forward primer and a reverse primer, wherein the forward primer and reverse primer are designed for amplifying the second insert; d) providing viral nucleic acid molecules extracted from the biological sample; e) subjecting the viral nucleic acid molecules to amplification using the first pair of primers and the second pair of primers; f) enumerating the number of the first amplicons and the number of the second amplicons; g) obtaining an intactness ratio of the number of the second amplicons to the number of the first amplicons; h) determining the intactness of the viral genome based on the intactness ratio, wherein a large intactness ratio indicates that the viral genome in the biological sample has a high intactness, and/or wherein a small intactness ratio indicates that the viral genome in the biological sample has a low intactness.
[0073] As is well known in the art, real-time polymerase chain reaction (real-time PCR) is also known as quantitative polymerase chain reaction (qPCR). It monitors the amplification of a targeted nucleic acid molecule during the PCR in real time. Real-time PCR is characterized by the point in time (or PCR cycle) at which amplification of the target is first detected. This value is usually referred to as the cycle threshold (Ct): the cycle at which the fluorescence intensity first rises above background fluorescence. Consequently, the greater the quantity of target nucleic acid in the sample, the sooner a significant increase in fluorescent signal will appear, yielding a lower Ct value.
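To make the Ct definition concrete, the sketch below finds the first cycle at which a fluorescence trace crosses a fixed threshold and interpolates linearly to a fractional cycle. It is an illustrative simplification written for this document: the function name and the toy traces are hypothetical, and real-time PCR instruments typically apply baseline correction and their own threshold-setting algorithms.

```python
from typing import Optional, Sequence


def cycle_threshold(fluorescence: Sequence[float], threshold: float) -> Optional[float]:
    """Hypothetical helper: fractional cycle at which fluorescence first exceeds `threshold`.

    fluorescence[i] is the background-corrected signal read after cycle i + 1.
    Returns None when the threshold is never crossed (no amplification detected).
    """
    for i, value in enumerate(fluorescence):
        if value > threshold:
            if i == 0:
                return 1.0  # already above threshold after the first cycle
            prev = fluorescence[i - 1]  # signal after cycle i, at or below threshold
            # Linear interpolation between cycle i and cycle i + 1.
            return i + (threshold - prev) / (value - prev)
    return None


# Toy traces: the sample with more starting template crosses earlier (lower Ct).
high_template = [0.25, 0.75, 1.0, 1.0, 1.0]
low_template = [0.0, 0.0, 0.25, 0.75, 1.0]
print(cycle_threshold(high_template, 0.5))  # 1.5
print(cycle_threshold(low_template, 0.5))   # 3.5
```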
[0074] In some examples, enumerating the number of the first amplicons and the number of the second amplicons in step f) above comprises obtaining the Ct values for the first and second amplicons. For example, the Ct value for the first amplicon can be denoted as Ct1, and the Ct value for the second amplicon can be denoted as Ct2.
[0075] The term “intactness ratio” as used herein represents the inferred intactness of target template molecules in a real-time PCR assay and is based on the relative abundance of two amplicons with different insert sizes. The “intactness ratio” is therefore related to the “insert size ratio”. The “intactness ratio” can also be used to infer the integrity of a nucleic acid template or nucleic acid starting material, or a viral genome. Based on the values of Ct1 and Ct2, a person skilled in the art can calculate the intactness ratio as defined above using available knowledge in the art. In some examples, the intactness ratio of the number of the second amplicons to the number of the first amplicons is calculated using the following formula:
$$\text{Intactness ratio} = (Ct_1 - Ct_2) + \text{constant}$$
The “constant” is an optional scaling factor, a positive value, added to (Ct1 - Ct2) to convert all values to a positive scale. Because the shorter first amplicon is generally supported by more template molecules than the longer second amplicon, Ct1 will typically be numerically smaller than Ct2, so the difference is negative most of the time (see for example, Fig. 6 of VFI). The “constant” is empirically determined and is added to the difference so that all values are positive.
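The formula can be written as a one-line function. In this sketch the function name, the Ct values and the constant of 5.0 are all hypothetical; the source states only that the constant is positive and empirically determined.

```python
def intactness_ratio(ct1: float, ct2: float, constant: float = 0.0) -> float:
    """Hypothetical helper: intactness ratio = (Ct1 - Ct2) + constant.

    ct1: Ct of the first (shorter) amplicon.
    ct2: Ct of the second (longer) amplicon.
    constant: empirically determined positive offset; no value is given in the
        source, so the 0.0 default simply returns the raw difference.
    """
    if constant < 0:
        raise ValueError("the constant is defined as a positive value")
    return (ct1 - ct2) + constant


CONSTANT = 5.0  # hypothetical scaling constant
print(intactness_ratio(22.0, 25.5, CONSTANT))  # 1.5: long amplicon lags by 3.5 cycles
print(intactness_ratio(22.0, 26.5, CONSTANT))  # 0.5: long amplicon lags by 4.5 cycles
```

As background (not a statement from the source): at close to 100% amplification efficiency each cycle corresponds to roughly a two-fold difference in starting template, so Ct1 - Ct2 behaves like a log2 ratio of long to short template, which is why a smaller lag of the long amplicon yields a larger intactness ratio.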
[0076] In some examples, an intactness ratio of about 1.6, 1.7, 1.8, 1.9, 2, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3, 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, 4, 4.1, 4.2, 4.3, 4.4, 4.5, 4.6, 4.7, 4.8, 4.9, 5, 5.5, 6, 6.5, 7, 7.5, 8, 8.5, 9, 9.5, 10, 10.5, 11, 11.5, 12, 12.5, 13, 13.5, 14, 14.5, 15, 15.5, 16, 16.5, 17, 17.5, 18, 18.5, 19, 19.5, 20, 50, 100, 500, or more, is regarded as a “large” intactness ratio, which indicates that the viral genome has high integrity, and/or that the virus has high infectivity. In one specific example, an intactness ratio of more than about 1.5 is regarded as a “large” intactness ratio, which indicates that the viral genome has high integrity, and/or that the virus has high infectivity. In other examples, an intactness ratio of about 0.45, 0.4, 0.35, 0.3, 0.25, 0.2, 0.15, 0.1, 0.05, 0.01, or less, is regarded as a “small” intactness ratio, which indicates that the viral genome has low integrity, and/or that the virus has low infectivity. In one specific example, an intactness ratio of less than about 0.5 is regarded as a “small” intactness ratio, which indicates that the viral genome has low integrity, and/or that the virus has low infectivity. In some examples, an intactness ratio of about 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5 is regarded as a “medium” intactness ratio, which indicates that the viral genome has medium integrity, and/or that the virus has medium infectivity. In some specific examples, an intactness ratio of about 0.5 to 1.5 is regarded as a “medium” intactness ratio, which indicates that the viral genome has medium integrity, and/or that the virus has medium infectivity.
[0077] In some examples, the first amplicon has a size of between 20 to 30 bps, or between 30 to 40 bps, or between 40 to 50 bps, or between 50 to 60 bps, or between 60 to 70 bps, or between 70 to 80 bps, or between 80 to 90 bps, or between 90 to 100 bps, or between 100 to 110 bps, or between 110 to 120 bps, or between 120 to 130 bps, or between 130 to 140 bps, or between 140 to 150 bps, or between 150 to 160 bps, or between 160 to 170 bps, or between 170 to 180 bps, or between 180 to 190 bps, or between 190 to 200 bps. In some specific examples, the first amplicon has a size of between 65 to 75 bps. In some examples, the second amplicon has a size of between 200 to 210 bps, or between 210 to 220 bps, or between 220 to 230 bps, or between 230 to 240 bps, or between 240 to 250 bps, or between 250 to 260 bps, or between 260 to 270 bps, or between 270 to 280 bps, or between 280 to 290 bps, or between 290 to 300 bps, or between 300 to 310 bps, or between 310 to 320 bps, or between 320 to 330 bps, or between 330 to 340 bps, or between 340 to 350 bps, or between 350 to 360 bps, or between 360 to 370 bps, or between 370 to 380 bps, or between 380 to 390 bps, or between 390 to 400 bps, or between 400 to 410 bps, or between 410 to 420 bps, or between 420 to 430 bps, or between 430 to 440 bps, or between 440 to 450 bps, or between 450 to 460 bps, or between 460 to 470 bps, or between 470 to 480 bps, or between 480 to 490 bps, or between 490 to 500 bps. In some specific examples, the second amplicon has a size of between 220 to 250 bps.
[0078] In some examples, when the viral genome is a viral RNA genome, step d) above further comprises reverse-transcribing the RNA molecules extracted from the biological sample to generate cDNA prior to amplification.
[0079] In some examples, the real-time PCR assay as mentioned above can also be used as a stand-alone assay to determine the integrity of a viral genome or infectivity of a virus in a biological sample. In such examples, the first and second amplicons to be used for the real-time PCR assay will be selected based on the results of the next-generation sequencing (NGS) method of the first aspect.
[0080] Thus, in one example, the present disclosure provides a method of determining integrity of a viral genome or infectivity of a virus in a biological sample, said method comprising: a) extracting viral nucleic acid molecules from the biological sample; b) selecting a first amplicon having a first insert, selecting a second amplicon having a second insert, wherein the first insert has an insert size that is between 70-80 bps shorter than the range of insert sizes of the group with the range of insert sizes that is the smallest, and wherein the second insert has an insert size that is between 70-80 bps shorter than the range of insert sizes of the group with the range of insert sizes that is the second smallest; c) providing a first pair of primers comprising a forward primer and a reverse primer, wherein the forward primer and reverse primer are designed for amplifying the first insert; d) providing a second pair of primers comprising a forward primer and a reverse primer, wherein the forward primer and reverse primer are designed for amplifying the second insert; e) subjecting the viral nucleic acid molecules obtained from a) to amplification using the first pair of primers and the second pair of primers; f) enumerating the number of the first amplicons and the number of the second amplicons; g) obtaining an intactness ratio of the number of the second amplicons to the number of the first amplicons; h) determining the intactness of the viral genome based on the intactness ratio, wherein a large intactness ratio indicates that the viral genome in the biological sample has a high intactness, and/or wherein a small intactness ratio indicates that the viral genome in the biological sample has a low intactness; wherein the range of insert sizes of the group with the range of insert sizes that is the smallest, and the range of insert sizes of the group with the range of insert sizes that is the second smallest, are determined using a reference sample, using the following method: i) extracting viral nucleic acid molecules from the reference sample to generate a nucleic acid library; ii) providing a plurality of primer pairs wherein, (1) each primer pair comprises a forward primer and a reverse primer, wherein the forward primer comprises, from the 5’ end to the 3’ end, a first adapter sequence (AS1), and a forward target-specific sequence (TS1); wherein the reverse primer comprises, from the 5’ end to the 3’ end, a second adapter sequence (AS2), and a reverse target-specific sequence (TS2); (2) the first adapter sequence (AS1) of each forward primer is the same, the second adapter sequence (AS2) of each reverse primer is the same; (3) each primer pair with a unique combination of forward and reverse target sequences (TS) is for amplifying a different target region of the viral genome; iii) subjecting the nucleic acid library in i) to multiplex PCR using the plurality of primer pairs to amplify a plurality of amplicons; iv) subjecting the plurality of amplicons to amplification and sequencing, to obtain amplicon sequences; v) detecting and mapping the amplicon sequences to a reference genome of the viral genome; vi) determining the insert size of each amplicon, wherein the insert size is the number of nucleotide bases or base pairs between the forward and reverse primer of each amplicon; vii) categorizing the amplicons into groups based on insert size, wherein each group comprises a range of insert sizes; and viii) determining the range of insert sizes of the group with the range of insert sizes that is the smallest, and the range of insert sizes of the group with the range of insert sizes that is the second smallest. In some examples, the reference sample and the biological sample contain the same virus, or viral genome from the same viral species.
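Steps vi) to viii) leave open how insert sizes are grouped into ranges. One simple option, assumed here purely for illustration, is to sort the observed insert sizes and start a new group wherever consecutive sizes jump by more than a gap parameter; the two lowest groups then give the smallest and second smallest ranges. The function names, the `max_gap` default of 50 bps and the example sizes are hypothetical.

```python
from typing import Iterable, List, Tuple

Range = Tuple[int, int]


def group_insert_sizes(sizes: Iterable[int], max_gap: int = 50) -> List[Range]:
    """Hypothetical helper: split sorted insert sizes into (min, max) ranges at gaps > max_gap."""
    ordered = sorted(sizes)
    if not ordered:
        return []
    groups: List[Range] = []
    start = prev = ordered[0]
    for size in ordered[1:]:
        if size - prev > max_gap:  # a large jump closes the current group
            groups.append((start, prev))
            start = size
        prev = size
    groups.append((start, prev))
    return groups  # already ordered from smallest to largest range


def two_smallest_ranges(sizes: Iterable[int], max_gap: int = 50) -> Tuple[Range, Range]:
    """Return the smallest and second smallest insert size ranges (step viii)."""
    groups = group_insert_sizes(sizes, max_gap)
    if len(groups) < 2:
        raise ValueError("fewer than two insert size groups were found")
    return groups[0], groups[1]


# Hypothetical insert sizes from a reference sample.
observed = [72, 95, 113, 140, 226, 274, 301, 350, 410, 450, 480, 515]
print(group_insert_sizes(observed))   # [(72, 140), (226, 350), (410, 515)]
print(two_smallest_ranges(observed))  # ((72, 140), (226, 350))
```

A gap-based split is only one choice; fixed bins derived from the primer design, as used in the Experimental Section, would serve the same purpose.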
[0081] The present disclosure also provides a kit for use in the detection method as described herein. Thus, in one aspect, there is provided a kit for determining the integrity of a viral genome or infectivity of a virus in a biological sample, the kit comprising: a) a first pair of primers comprising a forward primer and a reverse primer designed for amplifying a first amplicon having a first insert, wherein the first insert has an insert size that is between 70-80 bps shorter than the range of insert sizes of the group with the range of insert sizes that is the smallest as determined by the method as described herein; b) a second pair of primers comprising a forward primer and a reverse primer designed for amplifying a second amplicon having a second insert, wherein the second insert has an insert size that is between 70-80 bps shorter than the range of insert sizes of the group with the range of insert sizes that is the second smallest as determined by the method as described herein; c) a first probe to detect the first amplicon; and d) a second probe to detect the second amplicon.
[0082] In some examples, the first pair of primers comprises the sequences: GATAGGTTGATCACAGGCAGAC (S_gene_5d_fwd) (SEQ ID NO: 4) and AGTTCTTTTCTTGTGCAGGGAC (S_gene_5d_rev) (SEQ ID NO: 5); wherein the second pair of primers comprises the sequences: GCGCATTGGCATGGAAGTC (Set1_N_3_fwd) (SEQ ID NO: 6) and GTCATCCAATTTGATGGCACCTG (Set1_N_3_rev) (SEQ ID NO: 7); wherein the first probe comprises the sequence CCCTCAGTCAGCACCTCATGGTGT (S_gene_5_probe) (SEQ ID NO: 8); and wherein the second probe comprises the sequence CCTTCGGGAACGTGGTTGACCTAC (Set1_N_3_probe) (SEQ ID NO: 9). In one example, the first primer pair is used for amplifying the longer amplicon.
[0083] In some examples, the kit further comprises reagents useful for reverse-transcription. In some specific examples, the kit further comprises reagents useful for polymerase chain reaction (PCR). In some specific examples, the kit further comprises reagents useful for real time polymerase chain reaction (PCR).
[0084] In some examples, the kit further comprises instructions for use.
[0085] The invention illustratively described herein may suitably be practiced in the absence of any element or elements, limitation or limitations, not specifically disclosed herein. Thus, for example, the terms "comprising", "including", "containing", etc. shall be read expansively and without limitation. Additionally, the terms and expressions employed herein have been used as terms of description and not of limitation, and there is no intention in the use of such terms and expressions of excluding any equivalents of the features shown and described or portions thereof, but it is recognized that various modifications are possible within the scope of the invention claimed. Thus, it should be understood that although the present invention has been specifically disclosed by preferred embodiments and optional features, modification and variation of the inventions embodied therein herein disclosed may be resorted to by those skilled in the art, and that such modifications and variations are considered to be within the scope of this invention.
[0086] It will be understood by those skilled in the art that a wide variety of methods and techniques known in the art may be used in carrying out certain embodiments of the present invention.
[0087] Throughout this disclosure, certain embodiments may be disclosed in a range format. It should be understood that the description in range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope of the disclosed ranges. Accordingly, the description of a range should be considered to have specifically disclosed all the possible sub-ranges as well as individual numerical values within that range. For example, description of a range such as from 1 to 6 should be considered to have specifically disclosed sub-ranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as well as individual numbers within that range, for example, 1, 2, 3, 4, 5, and 6. This applies regardless of the breadth of the range.
[0088] The invention has been described broadly and generically herein. Each of the narrower species and subgeneric groupings falling within the generic disclosure also form part of the invention. This includes the generic description of the invention with a proviso or negative limitation removing any subject matter from the genus, regardless of whether or not the excised material is specifically recited herein.
[0089] Other embodiments are within the following claims and non-limiting examples. In addition, where features or aspects of the invention are described in terms of Markush groups, those skilled in the art will recognize that the invention is also thereby described in terms of any individual member or subgroup of members of the Markush group.
EXPERIMENTAL SECTION
[0090] Materials and Methods
[0091] Sample collection
[0092] Samples used in this study were collected as part of a previous study conducted in a large cohort of migrant workers in Singapore, which was evaluated and approved by the Director of Medical Services, Ministry of Health, under Singapore’s Infectious Diseases Act. A total of 155 samples, corresponding to 48 individuals, were analyzed by NGS. The samples included 37 nasopharyngeal (NP) swabs collected in 3 mL of viral transport medium, 60 self-collected nasal swabs, and 58 oropharyngeal saliva samples collected in 2 mL of viral RNA stabilization fluid (SAFER™ VTM, Lucence, Singapore). The majority of the cases (35 of 48, 72.9%) were symptomatic at the time of first sample collection, one case was presymptomatic at the time of first sampling, and 12 cases (25%) were asymptomatic throughout the study. For asymptomatic cases, the day of the first diagnostic test was used to determine days since diagnosis. For 17 cases (35.4%), only a single sample was available for sequencing, and for the remaining 31 cases (64.5%), between 2 and 9 samples were studied. For 5 cases (10.41%), samples from different sites (NP, nasal swab or saliva) were available but from the same timepoint (not longitudinal). All samples had previously been characterized for the presence of SARS-CoV-2 using an RT-PCR assay with primer and probe sequences from the CDC 2019-nCoV Real-Time RT-PCR Diagnostic Panel, henceforth referred to as the CDC RT-PCR assay. Samples were selected for NGS based on SARS-CoV-2 positivity at one or more of the 2 or 3 timepoints of sample collection and on relatively high viral loads as estimated by the RT-PCR assay. Archived extracted RNA from anonymized samples was used for this study.
[0093] Sample processing
[0094] Viral nucleic acid was extracted from the specimens (200 or 400 µl) using the Viral Nucleic Acid Extraction Kit II (Geneaid Biotech Ltd., Taiwan) or the QIAsymphony DSP Virus/Pathogen Kit (QIAGEN). For each sample, aliquots of extracted nucleic acid (RNA) were screened for the presence of SARS-CoV-2 RNA using real-time reverse transcription-polymerase chain reaction (RT-PCR) targeting the N1 and N2 markers as specified by the Centers for Disease Control and Prevention (CDC), and were also processed for library preparation for NGS. Samples were processed in a College of American Pathologists (CAP) accredited and Clinical Laboratory Improvement Amendments (CLIA) licensed laboratory (Lucence).
[0095] Analytical validation of SARS-CoV-2 NGS method
[0096] The lower limit of detection (sensitivity) of the NGS method was determined using synthetic SARS-CoV-2 RNA spiked into pooled clinical matrix that had tested negative for the virus by an RT-PCR method. For the analytical validation of NGS, synthetic SARS-CoV-2 RNA genomes (Twist Bioscience, San Francisco, USA) of known concentrations were spiked into pooled negative clinical matrix (nasopharyngeal swab specimens) at known copy numbers prior to RNA extraction. The synthetic controls corresponded to the following GenBank IDs of published SARS-CoV-2 genome isolates: MT007544.1 (SKU: 102019), MN908947.3 (SKU: 102024), LC528232.1 (SKU: 102860), MT106054.1 (SKU: 102862), MT188340.1 (SKU: 102917), MT118835.1 (SKU: 102918). The threshold for detection of SARS-CoV-2 was determined by comparing the genome coverage (%) obtained from confirmed negative samples (by RT-PCR), no-template controls, and known positive samples (by RT-PCR).
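The coverage comparison can be sketched as below. Both the definition of genome coverage (percentage of reference positions with at least `min_depth` reads, as per-base depth output from a tool such as `samtools depth -a` would allow) and the rule for picking the threshold (the highest coverage seen in any negative sample or no-template control, provided every known positive lies above it) are assumptions for illustration; the source states neither. Function names and numbers are hypothetical.

```python
from typing import Sequence


def genome_coverage_pct(depths: Sequence[int], min_depth: int = 1) -> float:
    """Hypothetical helper: percentage of reference positions covered by >= min_depth reads."""
    if not depths:
        raise ValueError("depth list is empty")
    covered = sum(1 for d in depths if d >= min_depth)
    return 100.0 * covered / len(depths)


def detection_threshold(negative_cov: Sequence[float], positive_cov: Sequence[float]) -> float:
    """Hypothetical rule: highest coverage among negatives/NTCs, if all positives exceed it."""
    ceiling = max(negative_cov)
    if min(positive_cov) <= ceiling:
        raise ValueError("negative and positive coverage ranges overlap")
    return ceiling


# Toy per-base depths for a 10-base "genome".
print(genome_coverage_pct([0, 0, 3, 12, 15, 9, 0, 20, 11, 10], min_depth=10))  # 50.0
print(detection_threshold([0.0, 1.2, 3.5], [62.0, 95.4, 99.8]))  # 3.5
```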
[0097] Design of multiplex PCR panel for SARS-CoV-2 genome
[0098] A multiplex amplicon-based next-generation sequencing (NGS) platform was developed to sequence the entire SARS-CoV-2 genome. Primers for 327 amplicons were designed to span the entire SARS-CoV-2 genome in a tiled configuration, and alternately assigned to two separate primer pools (Pool 1 and Pool 2) to allow amplification of the whole genome while minimizing formation of short overlapping amplicons. Five primer pairs designed to target five different human housekeeping genes (TBP, MYC, LRP1, ITGB7, and HMBS) are used as controls in both pools. Each forward primer additionally includes, on the 5’ end, a random 10-nucleotide sequence to serve as a molecular barcode. Each designed primer was checked against published SARS-CoV-2 genomes as of 15 April 2020, and base degeneracy was incorporated when required to achieve coverage of >99% of published Asian, USA, European and Chinese genomes.
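The alternating pool assignment can be illustrated with a short sketch. It assumes amplicons are given as hypothetical (name, start, end) tuples in 0-based, half-open genome coordinates. Ordering them by start position and sending alternate tiles to Pool 1 and Pool 2 means that, for a tiling in which only neighbouring tiles overlap, no two amplicons in the same pool overlap, so primers from overlapping tiles cannot pair up within a pool to form short products. The coordinates and function names are invented for illustration.

```python
from typing import List, Optional, Sequence, Tuple

Amplicon = Tuple[str, int, int]  # (name, start, end), 0-based half-open coordinates


def assign_pools(amplicons: Sequence[Amplicon]) -> Tuple[List[Amplicon], List[Amplicon]]:
    """Hypothetical helper: sort tiles by start and alternate them between two pools."""
    ordered = sorted(amplicons, key=lambda a: a[1])
    return ordered[0::2], ordered[1::2]


def has_internal_overlap(pool: Sequence[Amplicon]) -> bool:
    """True if any two amplicons in the same pool overlap on the genome."""
    max_end: Optional[int] = None
    for _, start, end in sorted(pool, key=lambda a: a[1]):
        if max_end is not None and start < max_end:
            return True
        max_end = end if max_end is None else max(max_end, end)
    return False


tiles = [("amp1", 0, 300), ("amp2", 250, 550), ("amp3", 500, 800), ("amp4", 750, 1050)]
pool1, pool2 = assign_pools(tiles)
print([name for name, _, _ in pool1])  # ['amp1', 'amp3']
print([name for name, _, _ in pool2])  # ['amp2', 'amp4']
print(has_internal_overlap(tiles))                               # True (one pool)
print(has_internal_overlap(pool1), has_internal_overlap(pool2))  # False False
```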
[0099] Preparation of library for NGS of SARS-CoV-2
[00100] Based on the CDC RT-PCR results, extracted nucleic acid was diluted 10-fold for samples with cycle threshold (Ct) <20 or left undiluted for samples with Ct >20, followed by reverse transcription using the High-Capacity cDNA Reverse Transcription Kit (Applied Biosystems, USA) to generate complementary DNA (cDNA). A synthetic spike-in RNA control was added to the sample just before cDNA synthesis. The RNA control comprises a nucleotide sequence that is not found in a genome or transcriptome of the virus, of a human, or of a microorganism. Each cDNA was split into two reactions for target capture and enrichment of Pool 1 and Pool 2 using Platinum SuperFi II DNA Polymerase (Invitrogen, USA) under the following thermocycling conditions: denaturation at 98°C for 30 s, followed by 5 to 15 cycles (5 cycles for samples with Ct <15, 10 cycles for samples with Ct 15-30, 15 cycles for samples with Ct >30) of 98°C for 1 min, 60°C for 1 min, and 72°C for 1 min, with a final extension at 72°C for 5 min. At the end of the reaction, excess primers were removed by purifying twice with 1.5x AMPure XP beads (Beckman Coulter, USA). Purified products were subjected to a final PCR to amplify targets and to complete the library with indexed sequencing adaptors for sequencing on the Illumina platform. Briefly, purified product was amplified with indexed P5 and indexed P7 adapter sequences using KAPA HiFi HotStart ReadyMix under the following thermocycling conditions: denaturation at 98°C for 45 s, followed by 18 cycles of 98°C for 15 s, 60°C for 30 s, and 72°C for 30 s, with a final extension at 72°C for 1 min. The amplified library was purified twice with 0.7x AMPure XP beads to remove non-specific products.
The quality and quantity of the sequencing library were assessed using the 4200 TapeStation system (Agilent Technologies, USA) and the KAPA Library Quantification Kit for Illumina® Platforms (Kapa Biosystems Inc., USA), respectively. Paired-end sequencing (2x151 bps) of the final dual-indexed libraries was performed on the Illumina platform as per the manufacturer’s instructions.
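The Ct-dependent choices in paragraph [00100] (dilution before reverse transcription and the number of target-enrichment cycles) amount to a small decision table. The sketch below encodes it; how to handle a Ct of exactly 20 is not specified in the text, and treating it as undiluted is an assumption, as is the function name.

```python
from typing import Tuple


def library_prep_plan(ct: float) -> Tuple[int, int]:
    """Hypothetical helper returning (dilution_factor, enrichment_cycles) for a CDC RT-PCR Ct.

    Dilution: 10-fold for Ct < 20, otherwise undiluted (Ct == 20 assumed undiluted).
    Cycles: 5 for Ct < 15, 10 for Ct 15-30, 15 for Ct > 30.
    """
    dilution = 10 if ct < 20 else 1
    if ct < 15:
        cycles = 5
    elif ct <= 30:
        cycles = 10
    else:
        cycles = 15
    return dilution, cycles


for ct in (12.0, 18.0, 25.0, 33.0):
    print(ct, library_prep_plan(ct))
# 12.0 (10, 5)
# 18.0 (10, 10)
# 25.0 (1, 10)
# 33.0 (1, 15)
```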
[00101] Primary sequencing data analysis
[00102] FASTQ files were processed using a custom pipeline. First, expected amplicons were identified and labeled in the FASTQ files based on the expected primer sequences in Read 1 and paired Read 2. Primer sequences and upstream molecular barcode sequences were trimmed using cutadapt, and primer-trimmed sequences were mapped to the SARS-CoV-2 reference genome (NCBI Reference Sequence: NC_045512.2) using bwa-mem. Molecular tag (or barcode) sequences were included in the trimmed “primer” sequences of Read 1, and can be extracted given the unique structure of primer sequences in Read 1. Barcode sequences were clustered based on sequence and amplicon identity, and consensus calling was done for each molecular tag (or barcode) cluster by first performing global alignment among all associated reads using MAFFT. The consensus base at each aligned position was called by determining the majority base type, whose percentage must be no less than an automatically determined threshold. The threshold is a function of the total number of reads for that barcode sequence. If no representative base could be called, the position was assigned N (as opposed to one of A, C, T, G). An overall quality score is then assigned: the 90th percentile of all quality values of the representative base type at that position (if a consensus base is found), or the 10th percentile of all quality values at that position (if no consensus base is found). The consensus reads are then written to a new FASTQ file.
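The per-position consensus rule described above can be sketched as follows. This is a simplified illustration, not the pipeline itself: it assumes the reads of one barcode cluster have already been aligned (for example by MAFFT) to equal length without gaps, and it replaces the automatically determined, read-count-dependent threshold with a fixed, hypothetical `min_fraction` parameter. Percentiles use the nearest-rank method, which is also an assumption; function names and example data are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def nearest_rank_percentile(values: Sequence[int], pct: float) -> int:
    """Nearest-rank percentile of a non-empty list (pct between 0 and 100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def call_consensus(reads: Sequence[str], quals: Sequence[Sequence[int]],
                   min_fraction: float = 0.6) -> Tuple[str, List[int]]:
    """Hypothetical majority-vote consensus for one barcode cluster of pre-aligned reads."""
    if not reads or len(reads) != len(quals):
        raise ValueError("need one quality list per read")
    length = len(reads[0])
    if any(len(r) != length for r in reads) or any(len(q) != length for q in quals):
        raise ValueError("reads and qualities must be aligned to equal length")
    bases: List[str] = []
    scores: List[int] = []
    for pos in range(length):
        column = [r[pos] for r in reads]
        column_q = [q[pos] for q in quals]
        base, count = Counter(column).most_common(1)[0]
        if count / len(column) >= min_fraction:
            # Consensus found: 90th percentile of qualities supporting that base.
            support = [q for b, q in zip(column, column_q) if b == base]
            bases.append(base)
            scores.append(nearest_rank_percentile(support, 90))
        else:
            # No consensus: emit N with the 10th percentile of all qualities.
            bases.append("N")
            scores.append(nearest_rank_percentile(column_q, 10))
    return "".join(bases), scores


reads = ["ACGT", "ACGT", "ACTT"]
quals = [[30, 30, 30, 30], [35, 35, 20, 35], [40, 40, 40, 40]]
print(call_consensus(reads, quals, min_fraction=0.6))  # ('ACGT', [40, 40, 30, 40])
print(call_consensus(reads, quals, min_fraction=0.7))  # ('ACNT', [40, 40, 20, 40])
```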
[00103] Variant calling, subgenomic RNA analysis and phylogenetic analysis
[00104] First, consensus FASTQ files from Pool 1 and Pool 2 were merged to create a single FASTQ file. Consensus FASTQ reads are mapped to the SARS-CoV-2 reference genome (NCBI Reference Sequence: NC_045512.2) using bwa-mem. Samtools was used to calculate per-base depth and coverage metrics. Variant calling was performed on consensus BAM files using snippy and freebayes. For variants to be called, a minimum of 10x coverage is required. With molecular barcoding, the consensus sequencing data are essentially error-free, which can enable detection of quasi-species and increases confidence in variant calls. Detection of subgenomic RNA (sgRNA) was done by analysis of split reads supported by multiple consensus reads. Chimeric reads in which non-contiguous regions of the genome are captured within a read are identified as split reads and have 1) an unexpected combination of primers giving rise to the amplicon, 2) junction sites at which fusion occurs, and 3) for coronaviruses, a junction core sequence within the read, exemplified by the sequence CTAAACGAAC (SEQ ID NO: 10). The detection of canonical sgRNA was defined by the presence of the 5’ end leader sequence of the genome and downstream ORFs at the 3’ end of the genome. Owing to the full genome coverage inherent in the primer panel design, all canonical sgRNAs can be detected. The frequency of each sgRNA was calculated as the number of split reads supporting the sgRNA divided by the number of all reads supporting the location spanning the junction, 3’ to the junction. For comparison across samples and across different species of sgRNA, the sgRNA split read count was normalized to the mean depth of coverage for the sample. For comparing total sgRNA, the sum of all sgRNA read counts was normalized to the mean depth of coverage for the sample.
FASTA files of the assembled genomes are used to construct phylogenetic trees. The phylogenetic tree building process and parameters follow the Nextstrain team’s repository at https://github.com/nextstrain/ncov.
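The sgRNA metrics defined in paragraph [00104] reduce to simple ratios. The sketch below assumes split-read counts and junction-spanning read counts have already been extracted from the alignments; the function names, the dictionary layout and the example counts are hypothetical, and the junction core check is only a plain substring search for CTAAACGAAC.

```python
from typing import Dict, Tuple

TRS_CORE = "CTAAACGAAC"  # coronavirus junction core sequence (SEQ ID NO: 10)


def has_junction_core(read_seq: str) -> bool:
    """True if the read contains the junction core sequence."""
    return TRS_CORE in read_seq.upper()


def sgrna_frequency(split_reads: int, spanning_reads: int) -> float:
    """Split reads supporting an sgRNA / all reads supporting the junction location."""
    if spanning_reads <= 0:
        raise ValueError("no reads support the junction location")
    return split_reads / spanning_reads


def normalize_to_depth(split_counts: Dict[str, int],
                       mean_depth: float) -> Tuple[Dict[str, float], float]:
    """Per-sgRNA and total split-read counts divided by the sample's mean depth."""
    if mean_depth <= 0:
        raise ValueError("mean depth must be positive")
    per_sgrna = {name: count / mean_depth for name, count in split_counts.items()}
    total = sum(split_counts.values()) / mean_depth
    return per_sgrna, total


counts = {"N": 120, "ORF7a": 30}  # hypothetical split-read counts
print(sgrna_frequency(120, 600))          # 0.2
print(normalize_to_depth(counts, 600.0))  # ({'N': 0.2, 'ORF7a': 0.05}, 0.25)
print(has_junction_core("ttgtagatctgttctctaaacgaacttt"))  # True
```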
[00105] Insert length analysis and calculation of viral fragmentation score
[00106] For each positive sample with sequencing data, SAMtools was used to capture insert sizes from alignment files with the following specifications: samtools view -f65 -F2048 SampleXYZ_consensus.bam | cut -f 1,4,9 > SampleXYZ_f65F2048.txt, where -f65 keeps only reads that are paired and first in pair (only the first read of each pair is needed, because the second read of the pair reports the same insert size with the opposite, "negative", sign); and -F2048 filters out reads with supplementary alignments (this removes most subgenomic RNA reads, parts of which have supplementary alignments, and removes chimeric reads). The cut command keeps columns 1, 4 and 9 of the SAM output, i.e. the read name, mapping position and template length (insert size). Finally, inserts of size 70 to 1000 are retained, to represent the expected insert sizes (no insert is expected to be shorter than 72 bps by design of the inserts) and to remove extremely long inserts.
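Downstream of the `samtools view ... | cut -f 1,4,9` command in this paragraph, the insert sizes can be read back and filtered. The sketch assumes the tab-separated three-column output (read name, position, TLEN) written to SampleXYZ_f65F2048.txt; taking the absolute value of TLEN, so that first reads lying to the right of their mates are not lost, is an assumption about how negative values were handled, and the function name is hypothetical.

```python
from typing import Iterable, List


def read_insert_sizes(lines: Iterable[str], min_size: int = 70,
                      max_size: int = 1000) -> List[int]:
    """Hypothetical parser: QNAME, POS, TLEN lines -> |TLEN| values within [min_size, max_size]."""
    sizes: List[int] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        _qname, _pos, tlen = line.split("\t")
        size = abs(int(tlen))  # TLEN is negative when the read is the rightmost of its pair
        if min_size <= size <= max_size:
            sizes.append(size)
    return sizes


example = ["r1\t100\t113\n", "r2\t250\t-274\n", "r3\t10\t40\n", "r4\t5\t1500\n", "r5\t7\t0\n"]
print(read_insert_sizes(example))  # [113, 274]
# On real data: with open("SampleXYZ_f65F2048.txt") as fh: sizes = read_insert_sizes(fh)
```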
[00107] Binning of inserts was done based on the expected and observed insert sizes (see Fig. 7 and Fig. 8) into 0-150 bps (short) and 151-350 bps (long) ranges. Short and long insert counts were determined for each sample, representing short and long viral fragments, respectively. The viral fragmentation score (i.e. insert size ratio) is then derived as (number of fragments of size 151-350 bps)/(number of fragments of size 0-150 bps). This represents the approximate abundance of “long” inserts relative to the “short” inserts.
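The viral fragmentation score follows directly from the binned counts. A minimal sketch, assuming a list of filtered insert sizes such as those retained by the SAMtools step in paragraph [00106]; inserts longer than 350 bps fall outside both bins and are ignored, and the function name and counts are hypothetical.

```python
from typing import Iterable, Tuple


def insert_size_ratio(sizes: Iterable[int],
                      short_bin: Tuple[int, int] = (0, 150),
                      long_bin: Tuple[int, int] = (151, 350)) -> float:
    """Hypothetical helper: (inserts in long_bin) / (inserts in short_bin), bins inclusive."""
    n_short = n_long = 0
    for size in sizes:
        if short_bin[0] <= size <= short_bin[1]:
            n_short += 1
        elif long_bin[0] <= size <= long_bin[1]:
            n_long += 1
    if n_short == 0:
        raise ValueError("no short inserts; ratio undefined")
    return n_long / n_short


sizes = [113] * 40 + [274] * 30 + [450] * 5  # hypothetical insert sizes
print(insert_size_ratio(sizes))  # 0.75
```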
[00108] In the present study, it was shown that preparation of sequencing libraries by highly multiplexed PCR routinely generated amplicons of variable lengths in a variety of clinical samples tested for SARS-CoV-2 RNA. Fig. 7 shows this in a visualization of sequencing read data aligned to the SARS-CoV-2 genome, in which inserts of 113 bps and inserts of 274 bps are formed by the same forward primer in combination with two distinct reverse primers.
[00109] This is further illustrated in Fig. 8 as a histogram of insert size counts obtained from two unrelated samples. The inserts fall into three main predetermined size ranges: 70-150 bps, 220-350 bps, and 400-520 bps. In Fig. 8B, an insert size range of 570-700 bps can also be discerned, suggesting that a limited amount of still longer template molecules is available for long insert formation by more distant (3-displaced) reverse primers. The three major ranges represent inserts formed from the generalizable F_R (intended), F_R+1 and F_R+2 primer combinations, where F_R denotes a forward primer paired with its intended reverse primer and F_R+n denotes the same forward primer paired with the reverse primer displaced n positions downstream. It is also noted that even as the total number of inserts captured changes (reflecting the overall abundance of template molecules of all starting sizes), the two most captured ranges, which are also subject to the greatest variance, are F_R and F_R+1. In other words, inserts in the F_R+2, F_R+3 and higher ranges are of diminishing relative abundance among all inserts formed (Fig. 8C and 8D).
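To show how one forward primer paired with successively displaced reverse primers yields the discrete insert size ranges described above, the following sketch computes insert lengths for every F_R+n combination. The primer coordinates are hypothetical, chosen so that the first forward primer reproduces the 113 bps and 274 bps inserts of Fig. 7; they are not the actual panel coordinates. Primers are assumed to be listed in genomic order within a single pool, and the insert is taken as the distance from the 3' end of the forward primer to the start of the reverse primer binding site, using half-open coordinates.

```python
from typing import Dict, List


def displaced_insert_lengths(fwd_ends: List[int], rev_starts: List[int], max_shift: int = 3) -> Dict[str, int]:
    """Insert length for forward primer i paired with reverse primer i+n (n = 0 is the intended pair)."""
    lengths: Dict[str, int] = {}
    for i, f_end in enumerate(fwd_ends):
        for n in range(max_shift + 1):
            j = i + n
            if j >= len(rev_starts):
                break  # no reverse primer that far downstream
            label = f"F{i}_R" if n == 0 else f"F{i}_R+{n}"
            lengths[label] = rev_starts[j] - f_end
    return lengths


# Hypothetical tiled layout with three primer pairs in one pool.
for label, size in displaced_insert_lengths([100, 300, 460], [213, 374, 530]).items():
    print(label, size)
```

With amplicons of broadly similar length, each additional displacement adds roughly one amplicon length, which is consistent with inserts clustering in separate ranges rather than forming a continuous distribution.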
[00110] In the present study, the insert size information was further formally quantified to determine the patterns and degrees of intactness (conversely, degradation) of viral RNA template in clinical samples. As described in the previous paragraphs, the insert size ranges formed from the F_R and F_R+1 primer combinations dominate among insert sizes because of 1) the scarcer availability of ultra-long template molecules (capturable by F_R+2, F_R+3, etc.), and 2) competition for resources in PCR, which makes shorter inserts reliably more detectable in multiplex PCR-based NGS assays. Nonetheless, despite the insert size distribution anticipated from this rationale, it was clear that not all clinical samples followed the expected pattern of insert size abundance, F_R >> F_R+1 (see Fig. 8B). Therefore, this was formally quantified as an "insert size ratio", defined as (number of longer inserts)/(number of shorter inserts), for example, (number of inserts of size 151-350 bps)/(number of inserts of size 0-150 bps). Examples of insert size ratio values are shown in Fig. 9.
[00111] In addition, the present study has also shown (see Fig. 11A and Fig. 11B) that insert size ratio is correlated with infectivity, as insert size ratios generally show a decreasing trend over sampling timepoints across cases (timepoint 1 is the first day of sampling, which for symptomatic cases is the first day of symptom onset; timepoints 2 and 3 correspond to 4 days and 7 days after timepoint 1). The period of infectiousness/infectivity (defined as the time interval during which SARS-CoV-2 may be transferred from an infected person to another person), based on viable virus culture from clinical specimens, has been identified as six days before to nine days after the first evidence of typical symptoms (for asymptomatic cases, this infectivity period is not defined). For symptomatic cases, there is evidence of a reduction in infectivity 7-10 days after symptom onset. For specific cases, insert size ratio showed distinct trends of increase, decrease or remaining stably low, depending on the disease course: presymptomatic, asymptomatic or symptomatic (see Fig. 11C).
[00112] Further, the data in Table 1 below show that insert size ratio ranges indicating high (>1.5), medium (0.5-1.5) and low (<0.5) integrity were associated with the time between symptom onset and sampling for the SARS-CoV-2 test, suggesting that the stage of symptoms (and therefore, infectivity) correlated with the intactness of RNA. Individuals with high or medium RNA integrity had a shorter time between symptom onset and sampling than those whose RNA was of low integrity at the time of testing.
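The three integrity ranges used for Table 1 amount to a simple threshold mapping. This is a sketch with a hypothetical function name; treating exactly 0.5 and exactly 1.5 as "medium" is an assumption, since the text gives the medium range as 0.5-1.5 without stating how the boundaries are handled.

```python
def integrity_category(insert_size_ratio: float) -> str:
    """Map an insert size ratio to the high / medium / low integrity ranges of Table 1."""
    if insert_size_ratio > 1.5:
        return "high"
    if insert_size_ratio >= 0.5:
        return "medium"  # 0.5-1.5, boundaries included (an assumption)
    return "low"


# Hypothetical ratios for illustration.
for ratio in (2.3, 1.0, 0.2):
    print(ratio, integrity_category(ratio))
```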
Table 1. (A) Insert size ratios and (B) the time between symptom onset and sampling for SARS-CoV-2 test
[Table 1, panels A and B: table images not reproduced]
[00113] In summary, in the present study, the insert size ratio derived from a multiplexed PCR-based assay is identified as a measure of the integrity or intactness of viral RNA template, which could reflect the infectivity potential of the virus in a particular sample.
[00114] Multiplex RT-PCR assay for simultaneous detection of fragments of different lengths
[00115] A multiplex three-target RT-PCR assay was designed for the simultaneous detection of two short targets of 70 bps each in the N gene region (in two separate detection channels) and one long amplicon of 240 bps in the S gene of the SARS-CoV-2 genome. An internal housekeeping control target was included as a fourth target in the multiplex reaction. Briefly, extracted RNA (5 µl) was mixed with the primers/probes for the multiplexed targets and the Luna® Universal Probe One-Step RT-qPCR Kit (New England Biolabs, USA), and RT-PCR was performed on a BioRad CFX96 using the recommended protocol for the Luna master mix. Cycle threshold (Ct) values for each target were collected from the respective channels and compared.
[00116] RT-PCR assay for long and short amplicon sizes
[00117] Two separate one-step RT-PCR assays were designed for the S gene of the SARS-CoV-2 genome, one for an amplicon of size 253 bps (insert size 206 bps) and another for an amplicon of size 66 bps (insert size 18 bps). The assays were run separately (not in multiplex), and the amplification products were analyzed by Agilent TapeStation. Product amounts were quantified by the region molarity of the expected product sizes. The ratio of product amounts (253 bps/66 bps) for each sample was calculated from the region molarities corresponding to each product formed in the sample. The results are shown in Fig. 10.
[00118] Statistical Methods
[00119] All datapoints are described individually, with median and interquartile ranges where appropriate. For comparison between groups, the unpaired, nonparametric Mann-Whitney test was used. Regression analysis and curve fitting were performed using Prism 8.0.1.
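The group comparisons used the unpaired Mann-Whitney test, run in Prism. For readers reproducing such comparisons outside Prism, the sketch below implements the two-sided test with the Python standard library only, using the normal approximation with a tie correction and no continuity correction. It is not Prism's implementation, and p-values may differ slightly from Prism or from scipy.stats.mannwhitneyu, which can apply a continuity correction or compute exact p-values for small samples.

```python
import math
from typing import List, Sequence, Tuple


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Mann-Whitney U test (normal approximation with tie correction).

    Returns (U for sample x, two-sided p-value). No continuity correction is applied.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    # Pool the observations, tagging each with its group (0 for x, 1 for y).
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = n1 + n2
    ranks: List[float] = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        # Find the run of tied values starting at position i.
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tied run
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        return u_x, 1.0  # all observations tied: no evidence of a difference
    z = (u_x - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return u_x, min(p, 1.0)


# Hypothetical VFS values for two sample types (not study data).
print(mann_whitney_u([0.27, 0.68, 1.2, 0.11], [0.07, 0.12, 0.25, 0.09]))
```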
[00120] Results
[00121] SARS-CoV-2 RNA fragments of differing lengths are captured by NGS
[00122] It was observed that one consequence of tiling multiple primer pairs (in succession) to capture the entire known target template of SARS-CoV-2 in a highly multiplexed PCR-based NGS method was the capture of longer inserts, in addition to the inserts expected from amplicons of length 130-178 bps for which the primer pairs were originally designed (Fig. 1A). The longer inserts originated from amplicons formed between a given forward primer and a reverse primer that is 1-, 2-, 3-, 4- (and so on) displaced in the succession of primer pairs in the primer pool (Fig. 1B). The inventors reasoned that the more degraded the viral RNA template, the more the capture of short inserts (representing short fragments) would be favoured, because intact RNA template would be unavailable for longer amplicons to form (Fig. 1C). Analysis of the distribution of insert lengths per sample in the sequencing data showed that 2 to 3 distinct ranges of insert lengths could be captured per sample, and that the number of sequencing reads supporting inserts in each insert length group could vary substantially between samples (Fig. 2A). Because inserts of either 70-150 bps or 220-350 bps dominated the sequencing data (rather than other, longer insert length groups), these two insert length groups were selected for further study and were referred to as "short" and "long" fragments, respectively. The cumulative numbers of short and long fragments (all reads mapping to SARS-CoV-2 from all potential primer combinations across the targeted genome) were determined for each sample. It is noted that the amplicons for the N1 and N2 targets of the CDC RT-PCR assay initially used to characterize the samples are 72 bps and 67 bps, allowing sensitive detection of even very small fragments of viral RNA.
It follows that such an RT-PCR assay would not be able to distinguish between long and short fragments that may exist in a given sample, but would give a cumulative readout comprising all fragments. In the 144 samples that tested positive for SARS-CoV-2 by the CDC RT-PCR assay, the sequencing read counts for long and short fragments were each correlated with Ct value as expected, diminishing as the Ct value increased (i.e., as presumed viral load decreased). However, the abundance of longer fragments changed more steeply with Ct value, with long fragments becoming disproportionately fewer relative to short fragments in samples with low viral load, as reflected by later Ct values (Fig. 2B). It was hypothesized that, for a given mass of viral RNA (a mix of intact and degraded viral RNA) measurable as a Ct value by an assay with very short amplicons such as the CDC RT-PCR assay, previously long fragments have been converted to short fragments by degradation of viral RNA at later Ct values. These converted fragments in turn contribute to the observed read counts for short fragments, which remain relatively high because of this contribution. Consequently, degradation of viral RNA leads to a more rapid decline in read counts for long fragments at late Ct values. This suggests that counting long fragments, or the relative distribution of long to short fragments, would provide a more accurate readout of the presence, if any, of intact viral RNA in samples.
[00123] A viral fragmentation score (VFS) was devised to capture this variability in the distribution of fragment lengths. VFS equals the long-to-short fragment count ratio. VFS showed a moderate correlation with Ct (R² = 0.65 in NP and nasal swabs, and R² = 0.74 in saliva), suggesting that the relative abundances of long and short fragments within a sample are not fully captured by its Ct value, even though detection of long fragments diminishes with increasing Ct values (Fig. 2C). For 5 samples, sequencing libraries were prepared again from the RNA extracts, and the VFS were within a reproducible range for the same sample (Table 2). To rule out a fully Ct value-dependent behaviour of VFS, intact synthetic SARS-CoV-2 RNA genomes were used to prepare sequencing libraries. Serial dilutions of synthetic RNA were spiked into negative clinical matrix and extracted via routine procedures. The synthetic RNA mimics the SARS-CoV-2 genome but is composed of 6 fragments of 5000 bps each representing the genome, and was considered a relatively intact starting material. For the amounts of synthetic RNA spiked in (250000, 25000, 2500 and 500 copies/ml), the Ct values measured by the same CDC RT-PCR assay averaged 24.4, 27.97, 31.73 and 33.45, respectively. Since the RNA in the serial dilutions is similarly intact, being derived from the same stock of synthetic RNA, the increase in Ct values is attributable only to the total copy number of RNA detectable by the RT-PCR assay. However, the VFS for the serial dilutions of synthetic RNA did not decrease to the same degree as the Ct values increased, with a median of 0.384 (IQR: 0.289-0.449) across all 4 serial dilutions (Fig. 2C, Fig. 3). This experiment shows that factors besides the total viral RNA amount measured by a Ct value (with short PCR amplicons) determine the VFS, which depends on the presence of long fragments, and long fragments are more likely to be present in a sample with more intact viral RNA.
This is supported by the wide variation in VFS observed in clinical samples, which was higher in saliva samples than in URT samples (p = 0.0035) and ranged from 0.0433 to 184.9 in saliva samples (n = 58, median: 0.267, IQR: 0.113-0.68) and from 0.02073 to 14.38 in URT samples (n = 81, median: 0.124, IQR: 0.074-0.246) (Fig. 4D). On the basis of VFS, saliva samples in this study had, on average, less fragmented SARS-CoV-2 RNA.
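The VFS summaries above are reported as medians with interquartile ranges. A minimal sketch of that summary with the Python standard library follows; the function name median_iqr and the example values are hypothetical. Quartile conventions differ between software packages, so values computed with the "inclusive" (linear interpolation) method used here may not match Prism exactly.

```python
import statistics
from typing import Sequence, Tuple


def median_iqr(values: Sequence[float]) -> Tuple[float, float, float]:
    """Return (median, Q1, Q3) using linear interpolation between order statistics."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return median, q1, q3


# Hypothetical VFS values for illustration only (not study data).
vfs_values = [0.12, 0.27, 0.38, 0.45, 1.9]
med, q1, q3 = median_iqr(vfs_values)
print(f"median: {med:.3f} (IQR: {q1:.3f}-{q3:.3f})")
```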
Table 2. Reproducibility of viral fragmentation score measured by NGS in 5 clinical samples.
[Table 2 image not reproduced]
[00124] Overall, these results support the presence of a wide range of fragmentation profiles in clinical samples which is readily detectable by NGS.
[00125] Correlation of viral fragmentation scores (VFS) with clinically relevant measures of infectiousness
[00126] As the NGS method described herein was able to quantify the relative abundances of short and long fragments, representing less and more intact virus, respectively, it was next determined whether VFS was related to other clinical indications of infectivity. This is particularly relevant as it is well recognised that RT-PCR positivity alone, particularly from a test designed for short targets, does not translate to viable virus with infection potential. The inventors have already shown that VFS is moderately correlated with Ct value (Fig. 2C), which is itself considered a proxy for the infectious potential of SARS-CoV-2 in clinical samples based on the culturability of virus isolates, although with widely differing Ct cut-offs. In multiple studies, no live virus has been culturable after eight days in non-immunocompromised patients. The typical clinical profile of the symptomatic cases in this study was immunocompetent, presenting with mild to moderate respiratory symptoms. Compared directly, VFS and days since symptom onset were not strongly correlated in either URT or saliva samples over the full period since symptom onset (day -1 to day 14) (Fig. 4A and Fig. 4B). However, by transposing a VFS derived from a sample with known integrity (i.e. synthetic RNA, see Fig. 2C) that was subject to the same extraction and sequencing procedure as the clinical samples, a cut-off of 0.382 was selected to separate samples with intact RNA from those without. With this cut-off value, no URT samples (total 69) collected >8 days since symptom onset (n = 36) qualified as having intact RNA (Fig. 4A, top left quadrant). Among saliva samples (total 46), only 1 of 19 samples (5.26%) collected >8 days since symptom onset had a VFS indicative of intact viral RNA (Fig. 4B). Of 37 URT samples collected <8 days, 15 (40.5%), and of 29 saliva samples collected <8 days, 16 (55%), had VFS greater than 0.382, suggesting intact RNA profiles.
These results suggest that about 50% of samples collected <8 days have relatively fragmented RNA, in accordance with previous studies in which about 29%-42% of samples with high viral load, inferred from low Ct or from collection <8 days, did not recover live virus. Among asymptomatic cases, 25 of 26 samples (96%) had low VFS below the 0.382 cut-off from 0 to 8 days from diagnosis, suggesting that viral RNA from most asymptomatic cases is fragmented (Fig. 4C). For one case, a saliva sample collected on the day of diagnosis had a VFS of 3.36, whereas matched URT samples (NP and nasal swabs) from the same case had VFS of 0.79 and 0.14, suggesting less fragmented SARS-CoV-2 RNA in saliva even at the same timepoint. Overall, VFS differed significantly between samples collected <8 days and >8 days from symptom onset for cases with information on symptom onset (Fig. 4D, left), but it was difficult to find a time measure that trended with VFS for asymptomatic cases (Fig. 4D, right). For cases with multiple longitudinal samples, VFS tracked with respect to time from symptom onset was generally higher early, between days -1 and 6, and generally decreased with time, reaching levels similar to those seen in asymptomatic cases throughout the course of asymptomatic infection (Fig. 4E).
[00127] No linear correlation was observed between increasing VFS and total sgRNA abundance (Fig. 5A), but VFS differed significantly between samples with any sgRNA and those with none, showing a dichotomous distribution (p<0.0001) (Fig. 5B). The dichotomy of the distribution was related to the VFS cut-off of 0.382 determined earlier: all 33 samples with VFS >0.382 had some sgRNA expression, whereas 58 of 107 samples (54%) with VFS <0.382 had no sgRNA (Fig. 5A and 5B).
When individual sgRNA species were considered separately, there was, interestingly, an inverse correlation with VFS for the S and N genes only, and not for the other most abundant sgRNA species, M and ORF3a (Fig. 5C). No other sgRNA species showed a correlation with VFS (not shown), suggesting a specific effect of intact RNA on the abundance of S and N sgRNA expression in clinical samples.
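The 0.382 cut-off used in paragraphs [00126] and [00127] amounts to a threshold test on VFS, with samples grouped by days since symptom onset. The sketch below is hypothetical: it takes (days since onset, VFS) pairs, counts samples with VFS above the cut-off in the <8 day and >8 day groups, and, as an assumption, leaves samples collected exactly on day 8 out of both groups because the text only distinguishes <8 and >8 days.

```python
from typing import Dict, Iterable, Tuple

VFS_CUTOFF = 0.382  # cut-off transposed from synthetic RNA in the text


def is_intact(vfs: float, cutoff: float = VFS_CUTOFF) -> bool:
    """A sample is classed as having intact RNA when its VFS exceeds the cut-off."""
    return vfs > cutoff


def intact_counts(samples: Iterable[Tuple[float, float]], day_cutoff: float = 8.0) -> Dict[str, Tuple[int, int]]:
    """Return {'early': (n_intact, n_total), 'late': (n_intact, n_total)} for (days, vfs) pairs.

    Samples collected exactly at day_cutoff are excluded from both groups (an assumption).
    """
    counts = {"early": [0, 0], "late": [0, 0]}
    for days, vfs in samples:
        if days < day_cutoff:
            group = "early"
        elif days > day_cutoff:
            group = "late"
        else:
            continue
        counts[group][1] += 1
        if is_intact(vfs):
            counts[group][0] += 1
    return {group: (c[0], c[1]) for group, c in counts.items()}


# Hypothetical (days since onset, VFS) pairs, not study data.
print(intact_counts([(2, 0.9), (3, 0.1), (10, 0.2), (12, 0.5), (8, 1.0)]))
```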
[00128] Finally, it was demonstrated that VFS can be translated into a simple multiplexed RT-PCR assay deliberately composed of a long amplicon of 240 bps and a short amplicon of 70 bps, to detect the long and short fragments observed in clinical samples using NGS. In a set of 20 clinical samples, a good correlation was seen between the VFS from NGS and the difference between the Ct signal of the short 70 bps amplicon and that of the long 240 bps amplicon, here referred to as the viral fragmentation index (VFI) (Fig. 6A). For 7 samples with low VFS, the 240 bps amplicon was not detectable by PCR, potentially reflecting the higher sensitivity of NGS, which interrogates the whole genome with multiple targets, but still in line with the expectation of continued detection of the short 70 bps amplicon in these samples. For 13 samples with detection of both the 70 and 240 bps amplicons by RT-PCR, the delta Ct (70 bps - 240 bps) VFI was correlated with the VFS (R² = 0.79, p<0.0001) (Fig. 6A, right). With increasing days since symptom onset, the VFI became more negative, indicating an increasingly greater abundance of the short 70 bps amplicon, likely due to increasing fragmentation of viral RNA over the clinical course of the disease (Fig. 6B).
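The viral fragmentation index is a difference of two Ct values. In the sketch below, the function name is hypothetical, and returning None when the 240 bps amplicon is not detected is a design choice that mirrors the 7 samples described above for which no VFI could be computed. Because a lower Ct means more template, a more negative VFI indicates relatively fewer long fragments.

```python
from typing import Optional


def viral_fragmentation_index(ct_short: float, ct_long: Optional[float]) -> Optional[float]:
    """VFI = Ct(70 bps amplicon) - Ct(240 bps amplicon); None if the long amplicon was not detected."""
    if ct_long is None:
        return None
    return ct_short - ct_long


# Hypothetical Ct values, not study data.
print(viral_fragmentation_index(22.0, 25.5))
print(viral_fragmentation_index(30.0, None))
```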
[00129] Discussion
[00130] There is a wide variance in reported SARS-CoV-2 detection rates with different diagnostic methods (typically qRT-PCR methods) and different sampling of respiratory specimens. The SARS-CoV-2 detection rate in individuals diagnosed with COVID-19 has been reported to range from 25% to >70% for nasopharyngeal swabs, 32% to 65% for oropharyngeal swabs, and 48% to >90% for sputum. Detection of the virus depends on the assay and on the biology and kinetics of viral replication, which in turn depend on the site of infection and active viral shedding. For SARS-CoV-2, the respiratory tract is the major site of viral shedding, and shedding kinetics have been reported to vary between studies, ranging from 8-12 days of virus positivity after the clearance of symptoms in some studies up to 20 days of virus persistence in another. One study comparing multiple sample types (bronchoalveolar lavage, sputum, nasal swabs, pharyngeal swabs, feces and blood) reported that nasal swabs had the highest viral loads.
[00131] The viral load of SARS-CoV-2 in clinical samples, as measured by the primary diagnostic tool of RT-PCR, is an imperfect readout of infection potential. Standard qRT-PCR methods are deliberately designed to detect the presence of very short SARS-CoV-2 RNA sequences, making them unsuitable for distinguishing intact SARS-CoV-2 genomes from degraded viral RNA. Degraded viral RNA also contributes to positive test results but cannot provide any insight into the infectiousness (or infectivity) of the virus.
[00132] Infectivity requires the presence of intact viral particles, indirectly represented by larger, intact viral RNA genomes. While intact and degraded RNA molecules are equally detectable by assays targeting a constant region of the target viral RNA genome (as long as the RNA molecule is amplifiable by the target assay), many copies of degradation products and a similar number of intact molecules cannot be construed to have equivalent infectivity potential. In fact, infectivity potential cannot be discerned with current RT-qPCR because of its simplistic binary readout of positive/negative for SARS-CoV-2.
[00133] Large RNA virus genomes (30 kb in the case of SARS-CoV-2) are subject to a variety of degradative effects (both viral RNA-specific and nonspecific degradation processes), which compete with the replacement of intact viral RNA genomes through active viral replication. Active viral replication is required for the generation of infectious virus particles and for viral shedding.
[00134] Typically, infectivity is determined by live virus isolation performed by complex cell culture methods requiring specialized equipment and days to results.
[00135] The present study aims to address this gap in the measurement of infectivity in both symptomatic and asymptomatic settings by incorporating a measure of the integrity or intactness of the virus. Further, the study envisions that the integrity measure can be translated to a qRT-PCR test designed to detect both short inserts (~70 bps) and long inserts (>200 bps) of SARS-CoV-2.
[00136] In this study, next-generation sequencing (NGS) was used to characterize clinical samples, collected from different tissue sites and over serial timepoints, related to the spread of SARS-CoV-2 infection in a cohort of migrant workers in Singapore. The design of the NGS panel allowed virus detection, genomic variant analysis and sgRNA analysis. By further exploiting the design of the NGS panel for whole-genome sequencing of the virus, the inventors have shown for the first time that NGS can be used to detect differential fragment lengths of viral RNA in clinical samples, potentially related to the integrity and transmissibility or infectiousness of the virus. The present disclosure presents a novel measure of fragmentation of viral RNA, which could be translated to a test for the transmission potential of a current infection. The integrity/intactness measure could allow more informed downstream management of positive SARS-CoV-2 results in symptomatic, pre-symptomatic, convalescent and asymptomatic cases, according to the degree of integrity measured as a proxy of infectiousness.
[00137] The present study shows that, relative to the gold-standard RT-PCR assay, NGS is a sensitive and specific method of detection of SARS-CoV-2, with 100% detection of virus in clinical samples known to be positive by RT-PCR. Additional detection of virus fragments spanning ~3% of the virus genome in two RT-PCR-negative samples highlights that NGS may be a more sensitive diagnostic method for low levels of SARS-CoV-2. This is attributable to NGS targeting the whole genome of the virus, whereas RT-PCR assays target specific loci that may not be present among the virus fragments in a sample with low levels of SARS-CoV-2. In agreement with other NGS-based detection methods, sensitivity was high, with an analytical limit of detection of 50 copies/ml.
The detection of SARS-CoV-2 was quantitative, as the genome coverage percentage and the mean depth of coverage of a sample were closely related to its Ct value from an orthogonal RT-PCR assay. As this method incorporates molecular barcodes in the primer sequences, the number of consensus reads corresponding to various regions of the genome is representative of the copy number of viral RNA fragments present during library preparation. Other SARS-CoV-2 NGS methods do not report a similar quantitative capacity. The quantitative feature of the method allows read counts to be correlated with other parameters, as described further on. Genomic variant analysis showed complete similarity of variants in multiple samples derived from a single case, suggesting no independent replication or generation of minor variants over the time course of infection, in contrast to previous reports. Phylogenetic analyses were possible on assembled SARS-CoV-2 genomes from 28 cases and showed that all cases were infected with virus belonging to Clade O, lineage B.6, known to be circulating in Singapore and India.
[00138] Besides sensitive detection of the virus and identification of genomic variants, the method provided novel insights into two aspects of SARS-CoV-2 infection that have become increasingly relevant for predicting the onward transmission potential of an infection. In this study, the inventors were able to obtain information on the presence of subgenomic RNA in clinical samples as well as on the relative degree of fragmentation of the viral RNA in a sample.
[00139] Detection of sgRNA has been used as evidence of viral replication in infected cells, and sgRNA declines over days 10-11 from infection. sgRNA detection has been reported to outlast successful virus culture and to be a poor predictor of, or in moderate agreement with, successful virus culture; sgRNA was detected in 18 of 22 (81.8%) specimens collected <8 days after symptom onset and in 1 of 11 (9.1%) specimens collected >9 days after symptom onset. Given the prolonged detection of sgRNA in clinical samples, up to 22 days after the start of illness, sgRNA may not be a marker of active replication but may instead remain detectable in clinical samples because of its stability.
[00140] In this study, abundant sgRNA was detected in clinical samples, including prolonged detection in a saliva sample taken 13 days after illness onset from a symptomatic case. In samples with high viral loads, sgRNA was consistently detected, but it became undetectable in ~75% of samples with low viral loads (Ct >28.67), suggesting that both viral load and sgRNA stability account for its detection. This study is the first to demonstrate direct sgRNA detection in samples from immunocompetent asymptomatic individuals, although at lower levels than in symptomatic cases, in line with the non-zero but reduced transmission potential of asymptomatic carriers.
[00141] It is increasingly apparent that positive detection of viral RNA by RT-PCR does not necessarily indicate the presence of infectious virus, as viral RNA shedding can persist even after resolution of the clinical symptoms of both mild and severe disease. While more severely ill patients generally have longer detection of RNA, persistent PCR positivity is seen in patients with mild illness as well as in asymptomatic cases. A better measure of infectiousness is the culture of live virus from clinical specimens, as successful virus growth relies on the presence of intact whole virions with complete RNA genomes. By this measure, samples taken more than 8 days after symptom onset in immunocompetent patients do not contain infectious virus, as no live virus could be cultured from them. In immunocompromised patients or those with severe illness, live virus could be isolated up to 14 days, 20 days, and much longer in severely immunocompromised patients. The ability to culture live virus has been shown to be similar in asymptomatic and symptomatic individuals. Some studies have correlated viral loads measured by Ct value with the ability to culture live virus, reporting no viral growth beyond a Ct value cut-off (varying between >24 and >35). Intriguingly, a number of studies have reported limited correlation between Ct value (or viral load) and success of virus isolation. Samples with low Ct values (Ct <23) constituted 28.6% of samples with unsuccessful viral culture and, conversely, samples with low viral loads (Ct >35) could harbor viable virus.
[00142] The observation that not all samples with high viral loads yield live virus suggests that factors other than viral genome copy number are important. Foremost among these would be viral integrity, whether the viral load is small or large. Only full-length, intact viral RNA would represent virus capable of infection. It is understood that processing of the sample in the laboratory (extraction, enzymatic reactions) would lead to some degree of fragmentation, and that the use of primer-limited amplicons would artificially limit fragment lengths. However, relative to a sample with highly fragmented viral RNA, a sample with intact viral RNA would contain more long fragments relative to short fragments.
[00143] In this work, the inventors have shown for the first time that NGS data can not only provide information on total viral RNA abundance but can also characterize viral RNA fragments by length. This was possible because of the combination of consensus read counting (related to the original copy numbers of viral RNA in a sample undergoing library preparation) and the tiled configuration of the primer panel, which could capture long fragments, if present. In other words, the more intact or full-length RNA is present in a sample, the more long fragments would be captured by the method. The relative abundances of long and short fragments were converted to a viral fragmentation score (VFS), representing the relative integrity of the viral RNA in the sample, which was shown to be related to the viral load measured as Ct value by an RT-PCR assay. It is important to highlight that the CDC RT-PCR assay targets particularly short amplicons of about 70 bps, which means it would accurately detect intact RNA when present but would continue to give an abundant signal even when largely fragmented RNA was present in a sample. At early Ct values (representing abundant viral load), long fragments were as much as 100-185 times more abundant than short fragments (a very high viral fragmentation score), which could not have resulted simply from the specific sequencing method.
[00144] To further address whether the observed relative abundances were simply a function of the copy number of template subject to sequencing, irrespective of its starting length, the inventors showed using synthetic SARS-CoV-2 RNA that, for relatively intact RNA template molecules, the VFS remains constant even over a 500-fold reduction in starting copy number. An empirically determined cut-off from synthetic RNA, applied to clinical samples, classified all URT samples and all but one saliva sample collected >8 days since illness onset as having predominantly fragmented RNA. This correlated very well with the failure to culture live virus from clinical samples collected after 8-9 days of illness in immunocompetent patients. Conversely, based on the cut-off, about 50% of samples collected <8 days from symptom onset had VFS <0.382 (hence less intact RNA). This fraction mirrors previous studies in which about 29%-42% of samples with high viral load, inferred from low Ct or from collection <8 days, did not recover live virus.
[00145] Another variable to consider is the sample type. Viral load and duration of positivity tend to be greater in lower respiratory tract samples (sputum) than in throat and nasal swabs. Accordingly, prolonged ability to culture virus has been reported for samples from severely ill patients, mostly lower respiratory tract samples, compared with the success rate in upper respiratory tract samples. In the present study, upper respiratory tract samples (nasopharyngeal and nasal swabs) and oropharyngeal saliva samples were compared. Oropharyngeal saliva contains secretions of the salivary glands mixed with sputum from the lower respiratory tract, and is hence more akin to lower respiratory tract samples. In line with this, the inventors observed higher VFS (more long fragments) in saliva samples on average compared with URT samples.
[00146] One study has looked at the potential of RT-PCR assays to determine the integrity of viral RNA and suggested that strong correlation between the levels of detection of multiple amplicons spanning the intact genome is indicative of viral integrity (and demonstrated this with live virus culture), as opposed to non-correlated levels. In the light of findings in this study and especially with the evidence that both high viral load samples and low viral load samples can produce unexpected live virus culture results, another direct measure of viral integrity is urgently needed.
[00147] Despite the large body of knowledge surrounding viral infectiousness, several gaps and hurdles remain. First, the Ct value, used as a surrogate measure of infectivity, correlates variably with the success of viral culture, and multiple RT-PCR assays whose Ct scales are not directly comparable are in clinical use (as evidenced by the markedly different Ct cut-offs derived from culture positivity). Second, typical RT-PCR assays are designed to detect short target templates and will readily amplify small amounts of fragmented viral RNA, precluding any assessment of the integrity of the viral RNA. Third, time since symptom onset has been shown to correlate with the success of viral culture; however, all such measures suffer from recall bias. Fourth, the viral culture test is labor-intensive and requires Biosafety Level 3 containment, which prevents it from being established in every diagnostic laboratory, and it suffers from variation in accuracy and in the permissiveness of the cell lines used. Accurate measurement of infection potential becomes particularly relevant as vaccines against SARS-CoV-2 that protect against COVID-19 become available, while the transmission capacity of immunized individuals remains unknown.
[00148] It should be noted that the viral fragmentation scores (VFS) determined in this study are relative values obtained after laboratory processing; no attempt has been made to determine the absolute fragment lengths contained in a sample. The viral culture assay is the closest surrogate for this measure. However, the basic premise that a more intact virus would be reflected in a greater proportion of longer fragments in a clinical sample would apply universally.
[00149] In conclusion, the inventors have applied NGS to comprehensively characterize longitudinal samples collected from different sampling sites. Beyond the sequence information for which it is primarily designed, NGS also provides information on fragment size and length. On this basis, fragment length differences among clinical samples were identified that correlate with clinical features of SARS-CoV-2 infectiousness; quantifying these differences could provide a relevant and straightforward measure of infection potential.

Claims

1. A method of determining integrity of a viral genome or infectivity of a virus in a biological sample, said method comprising: a) extracting viral nucleic acid molecules from the biological sample to generate a nucleic acid library; b) providing a plurality of primer pairs wherein,
(i) each primer pair comprises a forward primer and a reverse primer, wherein the forward primer comprises, from the 5’ end to the 3’ end, a first adapter sequence (AS1), and a forward target-specific sequence (TS1); wherein the reverse primer comprises, from the 5’ end to the 3’ end, a second adapter sequence (AS2), and a reverse target-specific sequence (TS2);
(ii) the first adapter sequence (AS1) of each forward primer is the same, the second adapter sequence (AS2) of each reverse primer is the same;
(iii) each primer pair with a unique combination of forward and reverse target sequences (TS) is for amplifying a different target region of the viral genome; c) subjecting the nucleic acid library to multiplex PCR using the plurality of primer pairs to amplify a plurality of amplicons; d) subjecting the plurality of amplicons to amplification and sequencing, to obtain amplicon sequences; e) detecting and mapping the amplicon sequences to a reference genome of the viral genome; f) determining the insert size of each amplicon, wherein the insert size is the number of nucleotide bases or base pairs between the forward and reverse primer of each amplicon; g) categorizing the amplicons into groups based on insert size, wherein each group comprises a range of insert sizes; h) enumerating the number of amplicons in each group; i) obtaining one or more insert size ratios of the number of amplicons in one group to the number of amplicons in another group; j) determining the integrity of the viral genome or infectivity of the virus based on the insert size ratio, wherein a large insert size ratio indicates that the viral genome in the biological sample has a high integrity or the virus in the biological sample has a high infectivity, or wherein a small insert size ratio indicates that the viral genome in the biological sample has a low integrity or the virus in the biological sample has a low infectivity.
2. The method of claim 1, wherein an insert size ratio of more than about 1.5 indicates that the viral genome has high integrity or high infectivity, or wherein an insert size ratio of about 0.5-1.5 indicates that the viral genome has medium integrity or medium infectivity, or wherein an insert size ratio of less than about 0.5 indicates that the viral genome has low integrity or low infectivity.
3. The method of any one of the preceding claims, wherein in step (g) the range of insert sizes of each group is determined based on the insert sizes observed in step (f), or expected insert sizes, or both.
4. The method of claim 3, wherein the expected insert sizes are determined by: a) providing multiplex PCR sequencing data that has been obtained from a viral genome of the same virus species; b) processing the sequencing data to identify one or more pairs of predetermined primers, wherein each pair of predetermined primers comprises a forward primer and a reverse primer that flank an amplicon in the 5’ to 3’ direction; c) trimming primer-specific sequences from each amplicon to obtain the sequence of the insert; d) aligning the insert sequence to a reference genome; e) removing inserts with supplementary alignment from the sequencing data, wherein inserts with supplementary alignment are inserts that do not align in contiguity with the reference genome; f) determining the insert size of each remaining insert from step e); g) retaining inserts from step f) having an insert size of between 70 base pairs (bps) and 1000 bps to obtain the expected insert sizes.
5. The method of any one of the preceding claims, wherein the insert size ratio is the number of amplicons in the group with a range of insert sizes that is the second smallest, to the number of amplicons in the group with a range of insert sizes that is the smallest.
6. The method of any one of the preceding claims, wherein each group comprises amplicons formed between the forward and reverse primers of a primer pair, and/or between the forward primer of a primer pair and the reverse primer of another primer pair.
7. The method of claim 5 or 6, wherein the range of insert sizes that is the second smallest is 151-350 bps, and the range of insert sizes that is the smallest is 0-150 bps.
8. The method of any one of the preceding claims, wherein the viral genome is a viral RNA genome, and the viral nucleic acid molecules in step a) are viral RNA molecules, wherein step a) further comprises reverse-transcribing the viral RNA molecules extracted from the biological sample to generate the nucleic acid library, and wherein the nucleic acid library is a cDNA (complementary DNA) library; wherein optionally, the cDNA library is purified to retain all amplicons, including those that are more than 100 bps in size.
9. The method of claim 8, wherein the viral RNA genome is from a virus selected from the group consisting of: Lymphocytic choriomeningitis virus, Coronavirus, human immunodeficiency virus (HIV), Human metapneumovirus, Poliovirus, Rhinovirus, Hepatitis A, Norwalk virus, Yellow fever virus, West Nile virus, Hepatitis C virus, Dengue fever virus, Enterovirus, Zika virus, Rubella virus, Ross River virus, Sindbis virus, Chikungunya virus, Borna disease virus, Ebola virus, Marburg virus, Measles virus, Mumps virus, Nipah virus, Hendra virus, Newcastle disease virus, Human respiratory syncytial virus, Rabies virus, Lassa virus, Hantavirus, Crimean-Congo hemorrhagic fever virus, Human parainfluenza viruses 1-4, Influenza virus, and Hepatitis D virus; wherein optionally, the coronavirus is selected from the group consisting of Severe acute respiratory syndrome coronavirus (SARS-CoV), Severe acute respiratory syndrome coronavirus 1 (SARS-CoV-1) and Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).
10. The method of any one of the preceding claims, wherein the biological sample is a body fluid sample, or wherein the biological sample is selected from the group consisting of a nasopharyngeal swab sample, an oropharyngeal swab sample, a saliva sample, a sputum sample, a viral culture sample, and an inactivated cultured viral isolate.
11. The method of any one of the preceding claims, wherein the insert size ratio is compared to an insert size ratio obtained from a reference sample, wherein a larger insert size ratio relative to the reference sample indicates that the viral genome in the biological sample has a higher integrity or infectivity compared to the reference sample; and/or wherein a smaller insert size ratio relative to the reference sample indicates that the viral genome in the biological sample has a lower integrity or infectivity compared to the reference sample.
12. The method of claim 11, wherein the biological sample is obtained from a subject and the reference sample is a biological sample selected from the group consisting of a biological sample obtained from a different subject, a biological sample obtained from the same subject at one or more different time points, a different type of biological sample obtained from the same subject and a viral culture.
13. The method of any one of the preceding claims, wherein the insert size ratio of the viral genome in a biological sample obtained from a subject indicates whether the subject is asymptomatic, pre-symptomatic or symptomatic.
14. The method of any one of the preceding claims, wherein the change in insert size ratios of biological samples obtained from a subject over two or more time points is indicative of the course of a viral infection in the subject.
15. The method of any one of the preceding claims, further comprising: a) selecting a first amplicon having a first insert, selecting a second amplicon having a second insert, wherein the first insert has an insert size that is between 70-80 bps shorter than the range of insert sizes of the group with the range of insert sizes that is the smallest, and wherein the second insert has an insert size that is between 70-80 bps shorter than the range of insert sizes of the group with the range of insert sizes that is the second smallest; b) providing a first pair of primers comprising a forward primer and a reverse primer, wherein the forward primer and reverse primer are designed for amplifying the first insert; c) providing a second pair of primers comprising a forward primer and a reverse primer, wherein the forward primer and reverse primer are designed for amplifying the second insert; d) providing viral nucleic acid molecules extracted from the biological sample; e) subjecting the viral nucleic acid molecules to amplification using the first pair of primers and the second pair of primers; f) enumerating the number of the first amplicons and the number of the second amplicons; g) obtaining an intactness ratio of the number of the second amplicons to the number of the first amplicons; h) determining the intactness of the viral genome based on the intactness ratio, wherein a large intactness ratio indicates that the viral genome in the biological sample has a high intactness, and/or wherein a small intactness ratio indicates that the viral genome in the biological sample has a low intactness.
16. The method of claim 15, wherein an intactness ratio of more than about 1.5 indicates that the viral genome has high intactness, or wherein an intactness ratio of about 0.5-1.5 indicates that the viral genome has medium intactness, or wherein an intactness ratio of less than about 0.5 indicates that the viral genome has low intactness.
17. The method of any one of claims 15-16, wherein the first amplicon has a size of between 65-75 bps, and the second amplicon has a size of between 220-250 bps.
18. A kit for determining the integrity of a viral genome or infectivity of a virus in a biological sample, comprising: a) a first pair of primers comprising a forward primer and a reverse primer designed for amplifying a first amplicon having a first insert, wherein the first insert has an insert size that is between 70-80 bps shorter than the range of insert sizes of the group with the range of insert sizes that is the smallest as determined by the method of any one of claims 1 to 15; b) a second pair of primers comprising a forward primer and a reverse primer designed for amplifying a second amplicon having a second insert, wherein the second insert has an insert size that is between 70-80 bps shorter than the range of insert sizes of the group with the range of insert sizes that is the second smallest as determined by the method of any one of claims 1 to 15; c) a first probe to detect the first amplicon; and d) a second probe to detect the second amplicon.
19. The kit of claim 18, wherein the viral RNA genome is from a virus selected from the group consisting of: Lymphocytic choriomeningitis virus, Coronavirus, human immunodeficiency virus (HIV), Human metapneumovirus, Poliovirus, Rhinovirus, Hepatitis A, Norwalk virus, Yellow fever virus, West Nile virus, Hepatitis C virus, Dengue fever virus, Enterovirus, Zika virus, Rubella virus, Ross River virus, Sindbis virus, Chikungunya virus, Borna disease virus, Ebola virus, Marburg virus, Measles virus, Mumps virus, Nipah virus, Hendra virus, Newcastle disease virus, Human respiratory syncytial virus, Rabies virus, Lassa virus, Hantavirus, Crimean-Congo hemorrhagic fever virus, Human parainfluenza viruses 1-4, Influenza virus, and Hepatitis D virus; wherein the coronavirus is selected from the group consisting of Severe acute respiratory syndrome coronavirus (SARS-CoV), Severe acute respiratory syndrome coronavirus 1 (SARS-CoV-1) and Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).
20. The kit of claim 19, wherein the first pair of primers comprises the sequences GATAGGTTGATCACAGGCAGAC (S_gene_5d_fwd) (SEQ ID NO: 4) and AGTTCTTTTCTTGTGCAGGGAC (S_gene_5d_rev) (SEQ ID NO: 5); wherein the second pair of primers comprises the sequences GCGCATTGGCATGGAAGTC (Setl_N_3_fwd) (SEQ ID NO: 6) and GTCATCCAATTTGATGGCACCTG (Setl_N_3_rev) (SEQ ID NO: 7); wherein the first probe comprises the sequence CCCTCAGTCAGCACCTCATGGTGT (S_gene_5_probe) (SEQ ID NO: 8); and wherein the second probe comprises the sequence CCTTCGGGAACGTGGTTGACCTAC (Setl_N_3_probe) (SEQ ID NO: 9).
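The counting logic of claims 1, 2, 4, 5 and 7 can be sketched in Python. This is a minimal, hypothetical illustration: the `AlignedInsert` record, its 0-based end-exclusive coordinates, and all function names are assumptions rather than part of the disclosure, and read mapping and primer trimming are assumed to have been done upstream. The handling of the exact 0.5 and 1.5 boundaries, which the claims state only as "about", is also an assumption.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List

# Insert-size groups from claim 7: smallest = 0-150 bps, second smallest = 151-350 bps.
SMALLEST = (0, 150)
SECOND_SMALLEST = (151, 350)


@dataclass
class AlignedInsert:
    """Hypothetical record for one primer-trimmed insert mapped to the reference."""
    ref_start: int          # 0-based start of the insert on the reference genome
    ref_end: int            # 0-based, end-exclusive position of the insert
    is_supplementary: bool  # True if the insert does not align contiguously

    @property
    def insert_size(self) -> int:
        # Number of bases between the forward and reverse primers.
        return self.ref_end - self.ref_start


def expected_insert_sizes(inserts: Iterable[AlignedInsert],
                          lo: int = 70, hi: int = 1000) -> List[int]:
    """Claim 4 steps e)-g): drop supplementary alignments, keep sizes in [lo, hi]."""
    return [i.insert_size for i in inserts
            if not i.is_supplementary and lo <= i.insert_size <= hi]


def group_counts(sizes: Iterable[int]) -> Counter:
    """Claim 1 steps g)-h): count amplicons in each insert-size group; others are ignored."""
    counts = Counter()
    for size in sizes:
        for name, (lo, hi) in (("smallest", SMALLEST), ("second_smallest", SECOND_SMALLEST)):
            if lo <= size <= hi:
                counts[name] += 1
    return counts


def insert_size_ratio(sizes: Iterable[int]) -> float:
    """Claim 5: second-smallest group count divided by smallest group count."""
    counts = group_counts(sizes)
    if counts["smallest"] == 0:
        raise ValueError("no amplicons in the smallest insert-size group")
    return counts["second_smallest"] / counts["smallest"]


def classify(ratio: float) -> str:
    """Claim 2 thresholds: >1.5 high, 0.5-1.5 medium, <0.5 low."""
    if ratio > 1.5:
        return "high"
    if ratio >= 0.5:
        return "medium"
    return "low"


if __name__ == "__main__":
    sizes = [120, 95, 140, 200, 310, 260, 180, 420]  # made-up insert sizes
    ratio = insert_size_ratio(sizes)
    print(ratio, classify(ratio))
```

The same `classify` thresholds could be applied to the intactness ratio of claims 15-16, with the two counts taken from the first and second amplicons of the targeted amplification rather than from sequencing reads.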
PCT/SG2021/050439 2020-07-29 2021-07-29 Methods and kits for determining integrity of viral rna WO2022025825A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
SG10202007270P 2020-07-29
SG10202007270P 2020-07-29
SG10202011694Q 2020-11-24
SG10202011694Q 2020-11-24

Publications (1)

Publication Number Publication Date
WO2022025825A1 (en) 2022-02-03

Family

ID=80038136

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/SG2021/050439 WO2022025825A1 (en) 2020-07-29 2021-07-29 Methods and kits for determining integrity of viral rna

Country Status (1)

Country Link
WO (1) WO2022025825A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP4001435A1 (en) * 2020-11-19 2022-05-25 Clear Labs Inc. Systems and processes for distinguishing sequences from microorganisms
CN116287479A (en) * 2023-05-16 2023-06-23 北京百奥益康医药科技有限公司 Primer combination for detecting respiratory viruses and application thereof
RU2816270C2 (en) * 2022-09-08 2024-03-28 Федеральное бюджетное учреждение науки "Санкт-Петербургский научно-исследовательский институт эпидемиологии и микробиологии им. Пастера Федеральной службы по надзору в сфере защиты прав потребителей и благополучия человека" (ФБУН НИИ эпидемиологии и микробиологии имени Пастера) Method of detecting nipah virus using real-time rt-pcr

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
ANONYMOUS: "Comparison of National RT-PCR Primers, Probes, and Protocols for SARS-CoV-2 Diagnostics", JOHNS HOPKINS CENTER FOR HEALTH SECURITY, 13 April 2020 (2020-04-13), XP055904574, Retrieved from the Internet <URL:https://www.centerforhealthsecurity.org/resources/COVID-19/COVID-19-fact-sheets/200410-RT-PCR.pdf> *
ATKINSON B. ET AL.: "SARS-CoV-2 shedding and infectivity", THE LANCET, vol. 395, no. 10233, 15 April 2020 (2020-04-15), pages 1339 - 1340, XP086146469, [retrieved on 20210923], DOI: 10.1016/S0140-6736(20)30868-0 *
CHOUDHURY YUKTI, CHER CHAE YIN, WAN ZI YI, XIE CHAO, LIM JING SHAN, VIRK RAMANDEEP KAUR, TAN MIN HAN, JING TEO ALVIN KUO, HSU LI Y: "A Viral Fragmentation Signature for SARS-CoV-2 in Clinical Samples Correlating with Contagiousness", MEDRXIV, 15 January 2021 (2021-01-15), XP055904578, Retrieved from the Internet <URL:https://www.medrxiv.org/content/10.1101/2021.01.11.21249265v1.full.pdf> DOI: 10.1101/2021.01.11.21249265 *
HUANG CHUNG-GUEI, LEE KUO-MING, HSIAO MEI-JEN, YANG SHU-LI, HUANG PENG-NIEN, GONG YU-NONG, HSIEH TZU-HSUAN, HUANG PO-WEI, LIN YA-J: "Culture-Based Virus Isolation To Evaluate Potential Infectivity of Clinical Specimens Tested for COVID-19", JOURNAL OF CLINICAL MICROBIOLOGY, AMERICAN SOCIETY FOR MICROBIOLOGY, US, vol. 58, no. 8, 23 July 2020 (2020-07-23), US , XP055904576, ISSN: 0095-1137, DOI: 10.1128/JCM.01068-20 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP4001435A1 (en) * 2020-11-19 2022-05-25 Clear Labs Inc. Systems and processes for distinguishing sequences from microorganisms
GB2605477A (en) * 2020-11-19 2022-10-05 Clear Labs Inc Systems and processes for distinguishing sequences from microorganisms
RU2816270C2 (en) * 2022-09-08 2024-03-28 Федеральное бюджетное учреждение науки "Санкт-Петербургский научно-исследовательский институт эпидемиологии и микробиологии им. Пастера Федеральной службы по надзору в сфере защиты прав потребителей и благополучия человека" (ФБУН НИИ эпидемиологии и микробиологии имени Пастера) Method of detecting nipah virus using real-time rt-pcr
CN116287479A (en) * 2023-05-16 2023-06-23 北京百奥益康医药科技有限公司 Primer combination for detecting respiratory viruses and application thereof
CN116287479B (en) * 2023-05-16 2023-07-28 北京百奥益康医药科技有限公司 Primer combination for detecting respiratory viruses and application thereof

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 21851238; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
122 Ep: PCT application non-entry in European phase. Ref document number: 21851238; Country of ref document: EP; Kind code of ref document: A1