WO2020250068A1 - Materials and methods for assessing virome and microbiome matter - Google Patents

Materials and methods for assessing virome and microbiome matter

Info

Publication number
WO2020250068A1
Authority
WO
WIPO (PCT)
Prior art keywords
viral
clusters
subject
ibd
dataset
Prior art date
Application number
PCT/IB2020/055047
Other languages
French (fr)
Inventor
Scott Plevy
Colin Hill
Andrey SHKOPOROV
Adam CLOONEY
Thomas Sutton
Original Assignee
University College Cork – National University Of Ireland, Cork
Janssen Biotech, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University College Cork – National University Of Ireland, Cork and Janssen Biotech, Inc.
Publication of WO2020250068A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/70 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving virus or bacteriophage
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61K PREPARATIONS FOR MEDICAL, DENTAL OR TOILETRY PURPOSES
    • A61K31/00 Medicinal preparations containing organic active ingredients
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61K PREPARATIONS FOR MEDICAL, DENTAL OR TOILETRY PURPOSES
    • A61K35/00 Medicinal preparations containing materials or reaction products thereof with undetermined constitution
    • A61K35/66 Microorganisms or materials therefrom
    • A61K35/74 Bacteria
    • A61K35/741 Probiotics
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61K PREPARATIONS FOR MEDICAL, DENTAL OR TOILETRY PURPOSES
    • A61K35/00 Medicinal preparations containing materials or reaction products thereof with undetermined constitution
    • A61K35/66 Microorganisms or materials therefrom
    • A61K35/74 Bacteria
    • A61K35/741 Probiotics
    • A61K35/742 Spore-forming bacteria, e.g. Bacillus coagulans, Bacillus subtilis, clostridium or Lactobacillus sporogenes
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61K PREPARATIONS FOR MEDICAL, DENTAL OR TOILETRY PURPOSES
    • A61K35/00 Medicinal preparations containing materials or reaction products thereof with undetermined constitution
    • A61K35/66 Microorganisms or materials therefrom
    • A61K35/74 Bacteria
    • A61K35/741 Probiotics
    • A61K35/744 Lactic acid bacteria, e.g. enterococci, pediococci, lactococci, streptococci or leuconostocs
    • A61K35/745 Bifidobacteria
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/68 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving proteins, peptides or amino acids
    • G01N33/6893 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving proteins, peptides or amino acids related to diseases not provided for elsewhere
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2535/00 Reactions characterised by the assay type for determining the identity of a nucleotide base or a sequence of oligonucleotides
    • C12Q2535/122 Massive parallel sequencing
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N2800/00 Detection or diagnosis of diseases
    • G01N2800/06 Gastro-intestinal diseases
    • G01N2800/065 Bowel diseases, e.g. Crohn, ulcerative colitis, IBS

Definitions

  • the present invention, in some aspects, relates to a method of analyzing the microbiome, e.g., virome
  • the present invention also relates to methods of diagnosing and treating dysbiosis of the microbiome, and various disorders that include infections and inflammatory disorders.
  • the virome is likely to be one of the major forces shaping the human gut microbiome, but is perhaps its least understood component.
  • the virome is dominated by phages, such as bacteriophages, which play vital roles in many microbial communities by driving diversity, facilitating nutrient turnover (Weitz et al., 2015. ISME J, 9, 1352-64) and facilitating horizontal gene transfer (Canchaya et al, 2003. Current opinion in microbiology, 6, 417-424).
  • High throughput sequencing has revealed the enormous diversity of the viral fraction of microbial ecosystems.
  • IBD inflammatory bowel disease
  • a method for identifying a plurality of viral marker clusters for determining the presence of inflammatory bowel disease (IBD) using viral genome sequences comprising:
  • GI gastrointestinal
  • creating a first plurality of viral clusters by using protein clustering to group like proteins derived from the first dataset and by using protein homology to group viral genome sequences of the first dataset, each viral cluster in the first plurality of viral clusters comprising one or more viral genome sequences derived from the healthy cohort;
  • each viral cluster in the second plurality of viral clusters comprising one or more viral genome sequences derived from the cohort diagnosed with IBD;
  • identifying a plurality of marker clusters by comparing the first plurality of viral clusters to the second plurality of viral clusters.
  • At least a portion of the first plurality of viral genome sequences are unassociated with a viral taxonomic category derived from a viral genome database, and at least a portion of the second plurality of viral genome sequences are unassociated with a viral taxonomic category derived from a viral genome database.
  • a totality of the first plurality and second plurality of viral genome sequences are each unassociated with a viral taxonomic category derived from a viral genome database.
  • the step of identifying the plurality of marker clusters by comparing the first plurality of viral clusters to the second plurality of viral clusters comprises using machine learning to identify the plurality of marker clusters.
  • the step of identifying the plurality of marker clusters by comparing the first plurality of viral clusters to the second plurality of viral clusters comprises identifying the plurality of marker clusters unassociated with a known taxon.
  • each of the viral clusters in the plurality of marker clusters respectively represent an unidentified taxon of higher rank than a strain and of lower rank than a family.
  • the step of identifying the plurality of marker clusters by comparing the first plurality of viral clusters to the second plurality of viral clusters comprises performing beta diversity analysis on the first plurality of viral clusters and the second plurality of viral clusters.
  • performing the beta diversity analysis comprises performing a scaling and ordination technique selected from a group consisting of principal coordinates analysis (PCoA), principal components analysis (PCA), non-metric multidimensional scaling (NMDS), canonical correspondence analysis (CCA), and redundancy analysis (RDA).
  • PCoA principal coordinates analysis
  • PCA principal components analysis
  • NMDS non-metric multidimensional scaling
  • CCA canonical correspondence analysis
  • RDA redundancy analysis
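The beta diversity analysis above starts from a sample-by-sample distance matrix, and the figure descriptions in this record mention Spearman distances. As a minimal sketch, assuming the distance is defined as 1 minus the Spearman correlation of two samples' viral-cluster abundance vectors (the record does not give the exact definition), the matrix could be computed as follows. The sample names and abundance values are hypothetical, and an ordination step such as PCoA would then be applied to the resulting matrix.

```python
from itertools import combinations


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if a vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman_distance(x, y):
    """Assumed definition: 1 - Spearman rho (0 = same ordering, 2 = reversed)."""
    return 1.0 - pearson(average_ranks(x), average_ranks(y))


def distance_matrix(samples):
    """samples: dict of sample name -> abundance vector over the same clusters."""
    names = list(samples)
    dist = {(a, a): 0.0 for a in names}
    for a, b in combinations(names, 2):
        dist[(a, b)] = dist[(b, a)] = spearman_distance(samples[a], samples[b])
    return dist


# Hypothetical abundances of four viral clusters in three samples.
samples = {
    "healthy_1": [10, 20, 30, 40],
    "healthy_2": [12, 18, 33, 41],
    "ibd_1": [40, 30, 20, 10],
}
dist = distance_matrix(samples)
```

In this toy input the two healthy samples rank the clusters identically, so their distance is 0, while the IBD sample reverses the ranking and sits at the maximum distance of 2.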
  • the step of identifying the plurality of marker clusters by comparing the first plurality of viral clusters to the second plurality of viral clusters comprises calculating differential abundance of viral clusters in the first plurality of viral clusters and the second plurality of viral clusters.
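The figure descriptions in this record attribute the differential abundance results to DESeq2. The sketch below is not that method: it swaps in a much simpler log2 fold change of mean relative abundance, only to show the shape of the comparison between the two cohorts. The cluster names `vcA` and `vcB`, the counts, and the pseudocount are all hypothetical.

```python
import math


def relative_abundance(counts):
    """Convert raw read counts per viral cluster into proportions."""
    total = sum(counts.values())
    return {vc: c / total for vc, c in counts.items()}


def mean_abundance(cohort, cluster):
    """Mean proportion of one cluster across a cohort (absent = 0)."""
    return sum(sample.get(cluster, 0.0) for sample in cohort) / len(cohort)


def log2_fold_changes(healthy, ibd, pseudocount=1e-6):
    """log2(IBD mean / healthy mean) per cluster.

    The pseudocount (an assumption) avoids division by zero for clusters
    that are missing from one cohort.
    """
    clusters = set().union(*healthy, *ibd)
    return {
        vc: math.log2(
            (mean_abundance(ibd, vc) + pseudocount)
            / (mean_abundance(healthy, vc) + pseudocount)
        )
        for vc in sorted(clusters)
    }


# Hypothetical one-sample cohorts.
healthy = [relative_abundance({"vcA": 80, "vcB": 20})]
ibd = [relative_abundance({"vcA": 20, "vcB": 80})]
lfc = log2_fold_changes(healthy, ibd)
```

A positive value marks a cluster that is more abundant in the IBD cohort and a negative value one that is depleted. A real analysis would also need a significance test and multiple-testing correction, which this sketch omits.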
  • the healthy cohort and the cohort diagnosed with IBD are each human cohorts.
  • the methods described above further comprise:
  • the above methods further comprise:
  • the first dataset further represents a first plurality of identified viral genome sequences derived from the healthy cohort
  • the second dataset further represents a second plurality of identified viral genome sequences derived from the cohort diagnosed with IBD.
  • the method further comprises:
  • step of identifying the plurality of marker clusters by comparing the first plurality of viral clusters to the second plurality of viral clusters further comprises identifying the plurality of marker clusters by comparing a combination of the first plurality of viral clusters and the first plurality of reference viral clusters to a combination of the second plurality of viral clusters and the second plurality of reference viral clusters.
  • the first plurality of identified viral genome sequences are associated with a viral taxonomic category present in a viral genome database
  • the second plurality of identified viral genome sequences are associated with a viral taxonomic category present in a viral genome database.
  • a method for determining the presence of inflammatory bowel disease (IBD) in a subject comprising:
  • each viral cluster in the plurality of subject viral clusters comprising one or more viral genome sequences derived from the subject;
  • At least a portion of the plurality of viral genome sequences are unassociated with a viral taxonomic category derived from a viral genome database. In some embodiments, a totality of the plurality of viral genome sequences are each unassociated with a viral taxonomic category derived from a viral genome database. In some embodiments, at least a portion of the plurality of marker clusters are unassociated with a viral taxonomic category derived from a viral genome database.
  • the above methods further comprise determining the presence of IBD in the subject based at least in part on the comparison of the plurality of subject viral clusters to the plurality of marker clusters.
  • the marker clusters comprise one or more viral clusters from taxa Siphoviridae, Myoviridae, Podoviridae, CrAss-like, or Microviridae.
  • the plurality of marker clusters comprises one or more viral clusters selected from vc2, vc6, vc7, vc13, vc14, vc15, vc17, vc19, vc21, vc22, vc23, vc24, vc25, vc28, vc29, vc36, vc37, vc38, vc39, vc40, vc42, vc45, vc48, vc53, vc55, vc56, vc58, vc60, vc61, vc62, vc64, vc66, vc68, vc69, vc70, vc71, vc73, vc74, vc75, vc76, vc77, vc78, vc79, vc80, vc82, vc84, vc85,
  • an increased abundance of one or more viral clusters selected from vc2, vc13, vc14, vc15, vc17, vc21, vc22, vc36, vc40, vc48, vc53, vc66, vc68, vc69, vc70, vc71, vc73, vc74, vc77, vc78, vc79, vc80, vc85, vc88, vc89, vc91, vc94, vc95, vc97, vc102, vc108, vc113, vc115, vc117, vc118, vc122, vc123, vc130, vc132, vc142, vc152, vc155, vc160, vc161, vc175, vc178,
  • CD Crohn’s Disease
  • an increased abundance of one or more viral clusters selected from vc28 in the subject sample as compared to a healthy control is indicative of the presence of Crohn’s Disease (CD) in the subject.
  • an increased abundance of one or more viral clusters selected from vc2, vc17, vc21, vc22, vc53, vc70, vc74, vc85, vc88, vc89, vc115, vc122, vc123, vc130, vc152, vc161, vc175, vc181, vc205, vc218, vc263, and vc413 in the plurality of subject viral clusters as compared to a healthy control is indicative of the presence of ulcerative colitis (UC) in the subject.
  • UC ulcerative colitis
  • an increased abundance of viral cluster vc2 in the subject sample as compared to a healthy control is indicative of the presence of ulcerative colitis (UC) in the subject.
  • an increased abundance of one or more viral clusters selected from vc38, vc46, vc48, vc54, vc57, vc62, vc64, vc69, vc71, vc108, vc111, vc114, vc115, vc128, vc159, vc162, vc215, vc220, vc242, vc340, vc374, and vc392 in the subject sample as compared to a patient with ulcerative colitis (UC) in remission is indicative of the presence of a flare-up of UC in the subject.
  • an increased abundance of one or more viral clusters selected from vc16, vc119, and vc163 in the subject sample as compared to a patient with a flare-up of ulcerative colitis (UC) is indicative of the presence of UC in remission in the subject.
  • a decreased abundance of one or more viral clusters selected from vc6, vc7, vc19, vc23, vc24, vc25, vc29, vc37, vc38, vc39, vc42, vc45, vc55, vc56, vc58, vc60, vc61, vc62, vc64, vc75, vc76, vc82, vc84, vc86, vc89, vc92, vc96, vc98, vc99, vc101, vc103, vc104, vc109, vc112, vc124, vc136, vc138, vc143, vc154, vc190, vc193, vc209, vc216, vc225, vc284, vc320,
  • a decreased abundance of one or more viral clusters selected from vc6, vc7, vc19, vc25, vc29, vc37, vc42, vc45, vc56, vc58, vc60, vc61, vc64, vc82, vc86, vc89, vc92, vc99, vc104, vc109, vc124, vc136, vc154, vc190, and vc284 in the plurality of subject viral clusters as compared to a healthy control is indicative of the presence of Crohn’s Disease (CD) in the subject.
  • a decreased abundance of one or more viral clusters selected from vc7, vc25, vc47, and vc64 in the subject sample as compared to a healthy control is indicative of the presence of Crohn’s Disease (CD) in the subject.
  • a decreased abundance of vc98 and/or vc103 viral cluster in the plurality of subject viral clusters as compared to a healthy control is indicative of the presence of ulcerative colitis (UC) in the subject.
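The abundance statements above can be read as a lookup: each marker cluster has an expected direction of change and an associated condition. The sketch below encodes only the single-cluster and short-list statements from this record (vc28 and vc2 increased; vc7, vc25, vc47, vc64, vc98 and vc103 decreased). The 2-fold threshold `min_fold`, the function name, and the abundance values are assumptions for illustration, since the record does not state how large a change counts as increased or decreased. It is not a diagnostic procedure.

```python
# Direction and associated condition, taken from the statements above.
INCREASED_MARKERS = {"vc28": "CD", "vc2": "UC"}
DECREASED_MARKERS = {
    "vc7": "CD", "vc25": "CD", "vc47": "CD", "vc64": "CD",
    "vc98": "UC", "vc103": "UC",
}


def flag_markers(subject, control, min_fold=2.0):
    """Return (cluster, direction, condition) for markers that changed.

    subject, control: dict of cluster id -> relative abundance.
    min_fold is a hypothetical threshold, not a value from the source.
    """
    flags = []
    for vc, condition in INCREASED_MARKERS.items():
        s, c = subject.get(vc, 0.0), control.get(vc, 0.0)
        if s > 0 and s >= min_fold * c:
            flags.append((vc, "increased", condition))
    for vc, condition in DECREASED_MARKERS.items():
        s, c = subject.get(vc, 0.0), control.get(vc, 0.0)
        if c > 0 and s * min_fold <= c:
            flags.append((vc, "decreased", condition))
    return flags


# Hypothetical relative abundances.
subject = {"vc28": 0.10, "vc98": 0.05, "vc7": 0.02}
control = {"vc28": 0.01, "vc98": 0.04, "vc7": 0.08}
flags = flag_markers(subject, control)
```

With these made-up values, vc28 is flagged as increased and vc7 as decreased, both pointing toward CD, while vc98 is not flagged because it did not fall relative to the control.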
  • obtaining the dataset(s) is performed by sequencing VLP DNA isolated from GI microbiota sample(s).
  • the method further comprises:
  • the method further comprises determining the presence of IBD in the subject based at least in part on the comparison of the individual bacteriome dataset to at least one of a healthy control and a control diagnosed with IBD.
  • the bacterial taxa associated with IBD comprise one or more bacterial genera selected from Clostridium XIVa, Blautia, Veillonella, Clostridium sensu stricto, Megasphaera, Fusobacterium, Flavonifractor, Catenibacterium, Ruminococcus, Coprococcus, Methanobrevibacter, Clostridium IV, Faecalibacterium, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Howardella, Bifidobacterium, Oscillibacter, Parabacteroides, Dorea, Roseburia, Odoribacter, and Akkermansia or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said one or more bacterial genera.
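The "at least 90% sequence identity" criterion can be illustrated with a small percent-identity calculation. This sketch assumes the two 16S sequences (or V regions) have already been aligned to equal length with `-` as the gap character, and that identity is counted over all alignment columns except those where both sequences have a gap. The record does not specify the alignment method or the identity definition, so both are assumptions, and the sequences shown are made up.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns with identical, non-gap residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    # Drop columns where both sequences carry a gap.
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no comparable alignment columns")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def is_closely_related(aligned_a, aligned_b, threshold=90.0):
    """Apply the 90% identity cut-off stated in the text."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Hypothetical 10-base aligned fragments differing at one position.
identity = percent_identity("ACGTACGTAC", "ACGTACGTAT")
```

A gap in only one sequence counts as a mismatch under this definition, which is one of several common conventions.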
  • an increased abundance of one or more bacterial genera selected from Clostridium XIVa, Blautia, Veillonella, Clostridium sensu stricto, Megasphaera, Fusobacterium, and Flavonifractor in the subject sample as compared to a healthy control is indicative of the presence of IBD in the subject.
  • an increased abundance of one or more bacterial genera selected from Clostridium XIVa, Blautia, Megasphaera, and Fusobacterium in the subject sample as compared to a healthy control is indicative of the presence of Crohn’s Disease (CD) in the subject.
  • an increased abundance of one or more bacterial species selected from Bacteroides fragilis and Ruminococcus gnavus in the subject sample as compared to a healthy control is indicative of the presence of ulcerative colitis (UC) in the subject.
  • an increased abundance of Ruminococcus gnavus in the subject sample as compared to a control sample from a patient with ulcerative colitis (UC) in remission is indicative of the presence of a flare-up of UC in the subject.
  • an increased abundance of Faecalibacterium prausnitzii, Dorea longicatena or Coprococcus comes in the subject sample as compared to a control sample from a patient with a flare-up of ulcerative colitis (UC) is indicative of the presence of UC in remission in the subject.
  • an increased abundance of bacterial genus Flavonifractor in the subject sample as compared to a healthy control is indicative of the presence of ulcerative colitis (UC) in the subject.
  • a decreased abundance of one or more bacterial genera selected from Ruminococcus, Methanobrevibacter, Clostridium IV, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Bifidobacterium, Oscillibacter, Flavonifractor, Dorea, Roseburia, and Odoribacter in the subject sample as compared to a healthy control is indicative of the presence of Crohn’s Disease (CD) in the subject.
  • a decreased abundance of bacterial genus Akkermansia in the subject sample as compared to a healthy control is indicative of the presence of ulcerative colitis (UC) in the subject.
  • obtaining the individual bacteriome dataset is performed by sequencing 16S rDNA or a V region of 16S rDNA in the GI microbiota sample.
  • the V region is V4 region.
  • the GI microbiota sample is a fecal sample.
  • the subject is human.
  • the method further comprises administering an IBD treatment to the subject.
  • the method further comprises administering to the subject additional diagnostic tests for IBD, CD and/or UC.
  • the method further comprises enrolling the subject in a clinical trial.
  • comparing the plurality of subject viral clusters to the plurality of marker clusters comprises:
  • kits for determining the presence of inflammatory bowel disease (IBD) in a subject, the kit comprising:
  • receive a second dataset representing a plurality of viral genome IBD marker clusters; create a plurality of viral clusters by using protein clustering to group like proteins derived from the first dataset and by using protein homology to group unidentified viral genome sequences of the plurality of unidentified viral genome sequences, each viral cluster in the plurality of viral clusters comprising one or more unidentified viral genome sequences of the plurality of unidentified genome sequences;
  • the device is further configured to:
  • the GI microbiota sample is one or more of the group consisting of a fecal sample, a cecal sample, an ileal sample, and a colonic microbiota sample.
  • the IBD is ulcerative colitis (UC).
  • the IBD is Crohn's disease (CD).
  • the subject is human.
  • one or more processors
  • a memory in communication with the one or more processors and storing instructions thereon that, when executed by the one or more processors, are configured to cause the system to: receive a first dataset representing a first plurality of viral genome sequences derived from a healthy cohort;
  • create a first plurality of viral clusters by using protein clustering to group like proteins derived from the first dataset and by using protein homology to group viral genome sequences of the first dataset, each viral cluster in the first plurality of viral clusters comprising one or more viral genome sequences derived from the healthy cohort;
  • each viral cluster in the second plurality of viral clusters comprising one or more viral genome sequences derived from the cohort diagnosed with IBD;
  • a method for preventing and/or treating inflammatory bowel disease (IBD) in a subject in need thereof comprising administering to the subject an effective amount of a virus from a viral cluster selected from vc6, vc7, vc19, vc23, vc24, vc25, vc29, vc37, vc38, vc39, vc42, vc45, vc55, vc56, vc58, vc60, vc61, vc62, vc64, vc75, vc76, vc82, vc84, vc86, vc89, vc92, vc96, vc98, vc99, vc101, vc103, vc104, vc109,
  • the method further comprises administering to the subject an effective amount of a probiotic or a prebiotic composition or a combination thereof, wherein said composition(s) stimulates growth and/or activity in the GI microbiota of the subject of one or more bacterial genera selected from Catenibacterium, Ruminococcus, Coprococcus, Methanobrevibacter, Clostridium IV, Faecalibacterium, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Howardella, Bifidobacterium, Oscillibacter, Parabacteroides, Flavonifractor, Blautia, Dorea, Roseburia, Odoribacter, Catenibacterium, and Akkermansia or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said one or more bacterial genera.
  • the probiotic composition comprises one or more bacterial strains from the genus selected from Catenibacterium, Ruminococcus, Coprococcus, Methanobrevibacter, Clostridium IV, Faecalibacterium, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Howardella, Bifidobacterium, Oscillibacter, Parabacteroides, Flavonifractor, Blautia, Dorea, Roseburia, Odoribacter, Catenibacterium, and Akkermansia.
  • a method for preventing and/or treating IBD in a subject in need thereof comprising administering to the subject an effective amount of a virus from a viral cluster selected from vc10, vc23, and vc39.
  • a method for preventing and/or treating Crohn's disease (CD) in a subject in need thereof comprising administering to the subject an effective amount of a virus from a viral cluster selected from vc6, vc7, vc19, vc25, vc29, vc37, vc42, vc45, vc56, vc58, vc60, vc61, vc64, vc82, vc86, vc89, vc92, vc99, vc104, vc109, vc124, vc136, vc154, vc190, and vc284.
  • a virus from a viral cluster selected from vc6, vc7, vc19, vc25, vc29, vc37, vc42, vc45, vc56, vc58, vc60, vc61, vc64, vc82, v
  • the method further comprises administering to the subject an effective amount of a probiotic or a prebiotic composition or a combination thereof, wherein said composition(s) stimulates growth and/or activity in the GI microbiota of the subject of one or more bacterial genera selected from Ruminococcus, Methanobrevibacter, Clostridium IV, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Bifidobacterium, Oscillibacter, Flavonifractor, Dorea, Roseburia, and Odoribacter or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said one or more bacterial genera.
  • the probiotic composition comprises one or more bacterial strains from the genus selected from Ruminococcus, Methanobrevibacter, Clostridium IV, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Bifidobacterium, Oscillibacter, Flavonifractor, Dorea, Roseburia, and Odoribacter.
  • a method for preventing and/or treating CD in a subject in need thereof comprising administering to the subject an effective amount of a virus from a viral cluster selected from vc10, vc23, and vc39.
  • a method for preventing and/or treating ulcerative colitis (UC) in a subject in need thereof comprising administering to the subject an effective amount of a virus from a viral cluster vc98 and/or vc103.
  • the method further comprises administering to the subject an effective amount of a probiotic or a prebiotic composition or a combination thereof, wherein said composition(s) stimulates growth and/or activity in the GI microbiota of the subject of the bacterial genus Akkermansia or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said bacterial genus.
  • the probiotic composition comprises one or more bacterial strains from the genus Akkermansia.
  • a method for preventing and/or treating UC in a subject in need thereof comprising administering to the subject an effective amount of a virus from a viral cluster selected from vc10, vc23, and vc39.
  • a method for preventing and/or treating IBD in a subject in need thereof comprising administering to the subject an effective amount of a probiotic or a prebiotic composition or a combination thereof, wherein said composition(s) stimulates growth and/or activity of one or more bacterial genera selected from Catenibacterium, Ruminococcus, Coprococcus, Methanobrevibacter, Clostridium IV, Faecalibacterium,
  • the probiotic composition comprises one or more bacterial strains from the genus selected from Catenibacterium, Ruminococcus, Coprococcus, Methanobrevibacter, Clostridium IV, Faecalibacterium, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Howardella, Bifidobacterium, Oscillibacter, Parabacteroides, Flavonifractor, Blautia, Dorea, Roseburia, Odoribacter, Catenibacterium, and Akkermansia.
  • a method for preventing and/or treating CD in a subject in need thereof comprising administering to the subject an effective amount of a probiotic or a prebiotic composition or a combination thereof, wherein said composition(s) stimulates growth and/or activity of one or more bacterial genera selected from Ruminococcus, Methanobrevibacter, Clostridium IV, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Bifidobacterium, Oscillibacter, Flavonifractor, Dorea, Roseburia, and Odoribacter or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said one or more bacterial genera.
  • the probiotic composition comprises one or more bacterial strains from the genus selected from Ruminococcus, Methanobrevibacter, Clostridium IV, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Bifidobacterium, Oscillibacter, Flavonifractor, Dorea, Roseburia, and Odoribacter.
  • a method for preventing and/or treating UC in a subject in need thereof comprising administering to the subject an effective amount of a probiotic or a prebiotic composition or a combination thereof, wherein said composition(s) stimulates growth and/or activity of the genus Akkermansia or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said bacterial genus.
  • the probiotic composition comprises one or more bacterial strains from the genus Akkermansia.
  • the V region is V4 region.
  • a method for preventing and/or treating UC in a subject in need thereof comprising administering to the subject an effective amount of a probiotic or a prebiotic composition or a combination thereof, wherein said composition(s) stimulates growth and/or activity of Faecalibacterium prausnitzii, Dorea longicatena or Coprococcus comes.
  • a method for preventing and/or treating UC in a subject in need thereof comprising administering to the subject an effective amount of a probiotic comprising one or more of Faecalibacterium prausnitzii, Dorea longicatena or Coprococcus comes.
  • the subject is human.
  • Figures 1A-1D demonstrate a comparison of commonality pre and post clustering of viral contigs.
  • PCoA of Spearman distances using pre-clustering viral contigs (Figure 1A) and post-clustering viral clusters (VCs; Figure 1B).
  • Figure 1C shows the relative abundance of viral contigs (top) and VCs (bottom) for control subjects at varying thresholds of commonality across subjects.
  • Figure 1D depicts the number of viral contigs/VCs shared between 30%, 50% and 70% of subjects in each cohort.
  • Figures 2A-2D show the virome composition comparison of the IBD cohorts to controls.
  • Figure 2A depicts PCoA using Spearman distances.
  • Figure 2B depicts alpha diversity (observed VCs) with p-values from Wilcoxon tests.
  • Figure 2C shows volcano plots of differential abundance results from DESeq2 between controls and CD.
  • Figure 2D shows volcano plots of differential abundance results from DESeq2 between controls and UC. All points above the dotted line are significant.
  • Figures 3A-3D show the bacterial compositional comparison of the IBD cohorts and controls.
  • Figure 3A depicts PCoA using unweighted UniFrac distances.
  • Figure 3B is a plot showing alpha diversity (Chao1 diversity) with p-values from Wilcoxon tests.
  • Figure 3C shows differential abundance results from DESeq2 between controls and CD.
  • Figure 3D shows differential abundance results from DESeq2 between controls and UC. All points above the dotted line are significant.
  • Figures 4A-4B show the drivers of PCoA separation for the virome (Spearman distances; Figure 4A) and 16S (unweighted UniFrac; Figure 4B).
  • VC and RSV abundances were correlated, using Spearman correlations, with PC axes 1 and 2. Only significant correlations with a rho greater than 0.35 or less than -0.35 were graphed for the virome, or ±0.5 for the 16S (or a maximum of the top 6 for each quadrant).
  • Grey arrows indicate unclassified VCs/RSVs. The length of the arrow represents the degree of correlation to the PC axes.
  • Figures 5A-5F demonstrate the investigation of differences in viromes and 16S between subjects in UC flare and UC remission.
  • Beta diversity for viromes (Spearman distances; Figure 5A) and 16S (unweighted UniFrac; Figure 5B) are shown.
  • VC and RSV abundances were correlated with PC axes 1 and 2. Only significant correlations with a rho beyond ±0.35 were graphed for the virome, or ±0.5 (or the top 6 for each quadrant) for the 16S.
  • Grey arrows indicate unclassified VCs/RSVs. The length of the arrow represents the degree of correlation to the PC axes.
  • Figures 6A-6D show the classification between healthy controls and patients with IBD using VC and 16S composition. The top 20 importance factors are shown for each model: VCs (Figure 6A), 16S (Figure 6B), and VCs and 16S combined (Figure 6C). The shades of grey of the bars correspond to differential abundance between groups; the text to the right of each bar gives the classification and/or the bacterial annotation to CRISPR protospacers.
  • Figure 6D shows the ROC curve analysis for each of the 3 models including the % accuracy.
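A ROC curve is commonly summarized by its area under the curve (AUC), which equals the probability that a randomly chosen IBD sample receives a higher classifier score than a randomly chosen control. The sketch below computes AUC from that pairwise definition. The labels and scores are hypothetical, and the record does not say how its ROC analysis was implemented.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for the positive class (e.g. IBD), 0 for controls.
    scores: classifier scores, higher meaning more likely positive.
    Ties between a positive and a negative count as half a win.
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical classifier output for two IBD samples and two controls.
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

Here three of the four positive-negative pairs are ordered correctly, giving an AUC of 0.75. The quadratic pairwise loop is fine for cohort-sized data; a rank-based formula would scale better.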
  • Figure 7A depicts a VC PCoA using Spearman distances comparing the 3 cohorts CD, UC and controls.
  • Figure 7B shows distances between points in each cohort for the VC Spearman PCoA.
  • Figure 7C shows 16S PCoA using unweighted UniFrac distances comparing the 3 cohorts.
  • Figure 7D is a boxplot showing distances between points in each cohort for the 16S unweighted UniFrac PCoA. P-values for boxplots are from Wilcoxon tests.
  • Figures 8A-8F show the alpha diversity of patients with IBD versus healthy controls. Shown are Observed VCs (Figure 8A), Shannon diversity of VCs (Figure 8B), Chao1 diversity of 16S counts (Figure 8C), and Shannon diversity of 16S counts (Figure 8D). P-values for boxplots are from Wilcoxon tests.
  • Figure 8E shows Spearman correlations between observed VC counts and observed bacterial species counts.
  • Figure 8F shows Shannon diversity of VCs and 16S counts.
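  • The alpha diversity measures named for Figures 8A-8F (observed counts, Shannon diversity, and Chao1) can be sketched as follows. This is a minimal illustration assuming a plain list of per-cluster (or per-taxon) counts for one sample; the Chao1 function uses the common bias-corrected form, which is an assumption, since the text does not state which variant was used.

```python
import math


def observed(counts):
    """Number of clusters or taxa with a nonzero count."""
    return sum(1 for c in counts if c > 0)


def shannon(counts):
    """Shannon diversity H = -sum(p * ln p) over nonzero proportions."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def chao1(counts):
    """Bias-corrected Chao1 richness estimate (assumed variant).

    f1 and f2 are the numbers of singletons and doubletons.
    """
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    return observed(counts) + f1 * (f1 - 1) / (2 * (f2 + 1))
```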
  • Figures 9A-9B show the alpha diversity of observed VLPs for any VCs classified as Caudovirales tested for disease groups and controls (Figure 9A) and disease groups/states and controls (Figure 9B). P-values for boxplots are from Wilcoxon tests.
  • Figures 10A-10B show the read alignment for samples in each cohort to VCs classified as lysogenic (Figure 10A) and non-lysogenic (Figure 10B). P-values for boxplots are from Wilcoxon tests.
  • Figure 11 depicts a Procrustes plot of the Virome PCoA using Spearman distances and the 16S PCoA with unweighted UniFrac. Lines connect samples from the same subject.
  • Figure 12 depicts a Procrustes plot of the Virome PCoA using Spearman distances and the 16S PCoA with unweighted UniFrac. Lines connect samples from the same subject.
  • Figure 13A shows the Spearman correlation between estimated viral load and observed VCs.
  • Figure 13B shows viral load plotted per subject with points colored using various intensities of grey by disease status.
  • Figure 14 depicts a network plot of CRISPR protospacers to the 20 most relevant VCs (10 key and additional important VCs from machine learning). Clusters and CRISPR protospacers are colored using various intensities of grey according to differential abundance using DESeq2.
  • Figures 15A-15J show images of the 10 key drivers in the separation of IBD and controls. Annotations are using pVOGs.
  • Figure 16 is a block diagram illustrating a system or device for identifying virome marker clusters according to aspects of the present invention.
  • Figure 17 is a block diagram illustrating a system or device for detecting health or disease in a subject based at least in part on virome marker clusters according to aspects of the present invention.
  • bacteria encompasses both prokaryotic organisms and archaea present in mammalian microbiota.
  • microbiota is used herein to refer to microorganisms (e.g., bacteria, archaea, fungi, protozoa) and viruses (e.g., phages and eukaryotic viruses) present in a host animal or human (e.g., in the gastrointestinal tract, skin, oral cavity, vagina, etc.). Microbiota exerts a significant influence on health and well-being of the host. Viruses present in microbiota are separately described as “virobiota”.
  • microbiome refers to the collective genes of all organisms comprising the microbiota.
  • the term “virome” is used herein to include viruses, virus-like particles (VLPs), and molecules that closely resemble viruses but may or may not be infectious and may or may not include viral genetic material.
  • The “virome” can include the “virobiota” but is not limited to the “virobiota”.
  • By a “microbiota sample” is meant a sample that contains microbiota from a particular source.
  • A “GI microbiota sample” is from the gastro-intestinal tract, and may include a fecal microbiota sample.
  • Microbiota samples may comprise all of the components present in the microbiota.
  • GI microbiota refers to microorganisms (e.g., bacteria, fungi, unicellular parasites) and viruses (e.g., phages and eukaryotic viruses) in the digestive tract.
  • the term “dysbiosis” refers to a microbial imbalance on or inside the body. Dysbiosis can result from, e.g., antibiotic exposure as well as other causes, e.g., infections with pathogens including viruses, bacteria and eukaryotic parasites. Dysbiosis can also result from unknown causes, or causes that are not yet known.
  • the term “consequences of dysbiosis” refers to various disorders associated with dysbiosis.
  • dysbiosis in the GI tract has been reported to be associated with a wide variety of illnesses, such as, e.g., irritable bowel syndrome (IBS), inflammatory bowel disease (IBD), chronic fatigue syndrome, obesity, rheumatoid arthritis, ankylosing spondylitis, bacterial vaginosis, colitis, small intestinal cancer, colorectal cancer, metabolic syndrome, cardiovascular disease, Crohn's disease, infectious gastroenteritis, non-infectious gastroenteritis, food allergy, Celiac disease, gastrointestinal graft versus host disease, pouchitis, intestinal failure, short bowel syndrome, antibiotics-associated diarrhea, etc.
  • restoring normal microbiota is used herein to refer to restoring microbiota of a subject to the level of bioactivity and diversity of corresponding microbiota of a healthy subject. This may also be considered as normalizing the microbiota, populating the microbiota, populating normal microbiota, preventing the onset of dysbiosis, or augmenting the growth of at least one type of virus in a subject.
  • qPCR quantitative PCR
  • qPCR high-throughput sequencing methods which detect over- and under-represented genes in the total bacterial population
  • 454- sequencing for community analysis screening of microbial 16S ribosomal RNAs (16S rRNA), etc.
  • transcriptomic or proteomic studies that identify lost or gained microbial transcripts or proteins within total bacterial populations. See, e.g., U.S. Patent Publication No.
  • Various exemplary ways of amplifying and sequencing nucleic acids from microbiota samples include, but are not limited to: solid-phase PCR involving bridge amplification of DNA fragments of the biological samples on a substrate with oligo adapters, wherein amplification involves primers having a forward index sequence (e.g., Illumina forward index for MiSeq/NextSeq/HiSeq platforms) or a reverse index sequence (e.g., Illumina reverse index for MiSeq/NextSeq/HiSeq platforms), a forward barcode sequence or a reverse barcode sequence, a transposase sequence (e.g., corresponding to a transposase binding site for MiSeq/NextSeq/HiSeq platforms), a linker, an additional random base, and a sequence for targeting a specific target region (e.g., 16S region, 18S region, ITS region).
  • Illumina sequencing (e.g., with a HiSeq platform, with a MiSeq platform, with a NextSeq platform, etc.) may be used as part of a sequencing-by-synthesis technique.
  • “a microbiota disease” and “disease of a microbiota” refer to a change in the composition of a microbiota, including without limitation very small changes in a relative abundance of one or more organisms within the microbiota as compared to a healthy control.
  • Microbiota diseases can result from, e.g., infections with pathogens including viruses, bacteria and eukaryotic parasites, antibiotic exposure as well as other causes.
  • Exemplary microbiota diseases in the GI tract include, but are not limited to, irritable bowel syndrome (IBS), inflammatory bowel disease (IBD), chronic fatigue syndrome, obesity, rheumatoid arthritis, ankylosing spondylitis, colitis, small intestinal cancer, colorectal cancer, metabolic syndrome, cardiovascular disease, Crohn's disease, gastroenteritis, food allergy, Celiac disease, gastrointestinal graft versus host disease, pouchitis, intestinal failure, short bowel syndrome, diarrhea, etc.
  • the term “probiotic” refers to a substantially pure bacteria (i.e., a single isolate, of, e.g., live bacterial cells, conditionally lethal bacterial cells, inactivated bacterial cells, killed bacterial cells, spores, recombinant carrier strains), or a mixture of desired bacteria, bacteria components or bacterial extract, or bacterially-derived products (natural or synthetic bacterially-derived products such as, e.g., bacterial antigens or metabolic products) and may also include any additional components that can be administered to a mammal. Such compositions are also referred to herein as a “bacterial inoculant.”
  • prebiotic refers to an agent that increases the number and/or activity of one or more desired bacteria, enhancing their growth.
  • prebiotics useful in the methods of the present disclosure include fructooligosaccharides (e.g., oligofructose, inulin, inulin-type fructans), galactooligosaccharides, human milk oligosaccharides (HMO), xylo-oligosaccharides (XOS), arabinoxylan-oligosaccharides (AXOS), N-acetylglucosamine, N-acetylgalactosamine, glucose, other five- and six-carbon sugars (such as arabinose, maltose, lactose, sucrose, cellobiose, etc.), amino acids, alcohols, resistant starch (RS), and mixtures thereof.
  • the prebiotic may be effective to fully, or partially, restore normal microbiota.
  • VC virtual cluster
  • the term “stimulate” when used in connection with growth and/or activity of bacteria encompasses the term “enhance”.
  • the terms “treat” or “treatment” of a state, disorder or condition include: (1) preventing, delaying, or reducing the incidence and/or likelihood of the appearance of at least one clinical or sub-clinical symptom of the state, disorder or condition developing in a subject that may be afflicted with or predisposed to the state, disorder or condition but does not yet experience or display clinical or sub-clinical symptoms of the state, disorder or condition; or (2) inhibiting the state, disorder or condition, i.e., arresting, reducing or delaying the development of the disease or a relapse thereof (in case of maintenance treatment) or at least one clinical or sub-clinical symptom thereof; or (3) relieving the disease, i.e., causing regression of the state, disorder or condition or at least one of its clinical or sub-clinical symptoms.
  • the benefit to a subject to be treated is either statistically significant or at least
  • the terms “patient”, “individual”, “subject”, “mammal”, and “animal” are used interchangeably herein and refer to mammals, including, without limitation, human and veterinary animals (e.g., cats, dogs, cows, horses, sheep, pigs, etc.) and experimental animal models.
  • the subject is a human.
  • the term “therapeutically effective amount” refers to the amount of a compound, composition, particle, organism (e.g., a probiotic or a microbiota transplant), etc. that, when administered to a subject for treating (e.g., preventing or ameliorating) a state, disorder or condition, is sufficient to effect such treatment.
  • The “therapeutically effective amount” will vary depending, e.g., on the agent being administered as well as the disease severity, age, weight, and physical conditions and responsiveness of the subject to be treated.
  • the term “acceptable” with reference to excipients, diluents, and carriers refers to molecular entities and compositions that are generally regarded as
  • carrier refers to a diluent, adjuvant, excipient, or vehicle with which the compound is administered.
  • Such pharmaceutical carriers can be sterile liquids, such as water and oils, including those of petroleum, animal, vegetable or synthetic origin, such as peanut oil, soybean oil, mineral oil, sesame oil and the like. Water or aqueous saline solutions and aqueous dextrose and glycerol solutions are preferably employed as carriers, particularly for injectable solutions.
  • the carrier can be a solid dosage form carrier, including but not limited to one or more of a binder (for compressed pills), a glidant, an encapsulating agent, a flavorant, and a colorant. Suitable pharmaceutical carriers are described in “Remington’s Pharmaceutical Sciences” by E.W. Martin.
  • the term “about” or “approximately” means within a statistically meaningful range of a value. Such a range can be within an order of magnitude, preferably within 50%, more preferably within 20%, still more preferably within 10%, and even more preferably within 5% of a given value or range.
  • a computing system is intended to include stand-alone machines or devices and/or a combination of machines, components, modules, systems, servers, processors, memory, detectors, user interfaces, computing device interfaces, network interfaces, hardware elements, software elements, firmware elements, and other computer-related units.
  • a computing system can include one or more of a general-purpose computer, a special-purpose computer, a processor, a portable electronic device, a portable electronic medical instrument, a stationary or semi-stationary electronic medical instrument, or other electronic data processing apparatus.
  • data in the database can include numerical values, textual values, computational representations of physical objects (including living, non-living, organic, non-organic objects, and combinations thereof), computational representations of physical phenomena, and categorical classifications.
  • data in the database can be linked together or otherwise indexed.
  • data in the database can be represented as an indexed matrix.
  • dataset as referred to herein is intended to include information that can be provided to a computing system in a computer readable format.
  • a component may be, but is not limited to being, a process running on a processor, an object, an executable, a thread of execution, a program, and/or a computer.
  • an application running on a computing device and the computing device can be a component.
  • One or more components can reside within a process and/or thread of execution and a component may be localized on one computer and/or distributed between two or more computers.
  • these components can execute from various computer readable media having various data structures stored thereon.
  • the components may communicate by way of local and/or remote processes such as in accordance with a signal having one or more data packets, such as data from one component interacting with another component in a local system, distributed system, and/or across a network such as the Internet with other systems by way of the signal.
  • These computer-executable program instructions may be loaded onto a computing system such as a general-purpose computer, a special-purpose computer, a processor, or other programmable data processing apparatus to produce a machine, such that the instructions that execute on the computer, processor, or other programmable data processing apparatus create means for implementing one or more functions specified in the flow diagram block or blocks.
  • These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement one or more functions specified in the flow diagram block or blocks.
  • embodiments or implementations of the disclosed technology may provide for a computer program product, including a computer-usable medium having a computer-readable program code or program instructions embodied therein, said computer-readable program code adapted to be executed to implement one or more functions specified in the flow diagram block or blocks.
  • the computer program instructions may be loaded onto a computer or other programmable data processing apparatus to cause a series of operational elements or steps to be performed on the computer or other programmable apparatus to produce a computer-implemented process such that the instructions that execute on the computer or other programmable apparatus provide elements or steps for implementing the functions specified in the flow diagram block or blocks.
  • blocks of the block diagrams and flow diagrams support combinations of means for performing the specified functions, combinations of elements or steps for performing the specified functions, and program instruction means for performing the specified functions. It will also be understood that each block of the block diagrams and flow diagrams, and combinations of blocks in the block diagrams and flow diagrams, can be implemented by special-purpose, hardware-based computer systems that perform the specified functions, elements or steps, or combinations of special-purpose hardware and computer instructions.
  • IBD markers can be identified using unidentified viral genome sequences derived from a cohort of healthy subjects and a cohort of subjects diagnosed with IBD. Individuals in each cohort can be human.
  • the viral genome sequences can be unidentified in that they are not taxonomically classified in a viral genome database.
  • the viral genome sequences can be unidentified in that they are considered “viral dark matter” as described herein and would otherwise be understood by a person of ordinary skill in the art.
  • the viral genome sequences can be unidentified at the order level, at the family level, at the strain level, or at any intervening level.
  • the viral genome sequences can be unidentified in that, although they are classified taxonomically at some level in a viral genome database, the viral genome sequences have not been compared to the classification database.
  • Viral genome sequences can include sequenced VLPs, molecules that closely resemble viruses but are non-infectious because they contain no viral genetic material.
  • Viral genome sequences can be derived from gastrointestinal (GI) microbiota samples provided from individuals in each cohort.
  • Metagenomic assembly can be performed on the samples using short reads to resolve viral genomes.
  • the reads can subsequently be aligned to determine abundance, or count of members in each viral genome.
  • the resolved viral genomes can include unidentified viral genome sequences.
  • the IBD markers can also be identified using the identified viral genomes.
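  • The read-alignment step used to determine abundance can be illustrated with the minimal sketch below. It assumes the alignments have already been reduced to (read, contig) pairs by an external aligner; that input layout, the function names, and the optional length normalization are assumptions for illustration and are not specified in the text.

```python
from collections import Counter


def contig_read_counts(alignments):
    """Count aligned reads per viral contig.

    alignments: hypothetical iterable of (read_id, contig_id) pairs,
    one pair per aligned read.
    """
    return dict(Counter(contig for _read, contig in alignments))


def reads_per_kilobase(counts, contig_lengths):
    """Normalize read counts by contig length in kilobases (assumed step)."""
    return {c: n / (contig_lengths[c] / 1000) for c, n in counts.items()}
```

  • Length normalization is shown only because longer contigs attract more reads; whether and how counts are normalized is a design choice outside what is stated here.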
  • IBD markers can be identified using unidentified viral genome sequences derived from a cohort of healthy subjects and a cohort of subjects diagnosed with IBD. Protein clustering and protein homology can be performed on the whole virome, including the unidentified viral genome sequences, from each cohort, resulting in viral clusters.
  • each viral cluster can include one or more unidentified viral genome sequences.
  • the viral clusters can each respectively be associated with the cohort of healthy subjects, the cohort of subjects diagnosed with IBD, or both cohorts.
  • IBD markers can be identified by comparing viral clusters associated with the healthy cohort to viral clusters associated with the cohort diagnosed with IBD. The IBD markers can thereby be identified without relying on categorization of viral genome sequences in a database.
  • viral clusters associated with IBD can further be associated with one or both of a sub-cohort diagnosed with Crohn’s disease (CD) and a sub-cohort diagnosed with ulcerative colitis (UC).
  • the viral genome sequences can be represented as datasets that are readable by a computational device or system.
  • the viral genome sequences can be represented as viral contigs.
  • Each viral genome sequence can be represented in whole or in part.
  • Each viral genome sequence can be represented with resolution at the strain level.
  • Each dataset can be associated with a cohort and/or sub-cohort.
  • the datasets collectively can include a significant number of viral genome sequence reads within the GI microbiota samples provided from the individuals.
  • the dataset is generated by sequencing VLP DNA isolated from GI microbiota sample(s).
  • the VLP DNA may be isolated from GI microbiota samples and prepared by any of the various methods of preparing DNA known in the art, such as those described in Thurber R.V. et al., 2009,
  • the datasets collectively can include a number of viral genome sequence reads within the GI microbiota samples.
  • the reads per sample can include the ranges of 15% to 97%, 25% to 97%, 50% to 97%, 60% to 97%, 70% to 97%, 80% to 97%, and 90% to 97%.
  • the viral genome sequences can be represented as protein sequences.
  • the viral genome sequences can be represented as a sequence from which protein sequences or protein content can be derived (e.g. genetic sequence).
  • Protein clustering and protein homology can be performed on the whole virome, including the unidentified viral genome sequences, from each cohort, resulting in viral clusters. To the extent that the whole virome includes identified viral genome sequences, the identified viral genome sequences can be included in the protein clustering and protein homology analysis. Proteins can be derived from each dataset based on the viral genome sequences.
  • the proteins can be organized into protein clusters (PCs) using Markov cluster (MCL)-based protein families, transitive clustering (TransClust), spectral clustering of protein sequences (SCPS), High-Fidelity clustering of protein sequences (HiFix) or other appropriate technique. Additional clustering techniques are described in Bernardes et al., BMC Bioinformatics (2015) 16:34“Evaluation and improvements of clustering algorithms for detecting remote homologous protein families”, incorporated by reference herein.
  • viral genome sequences or protein sequences derived therefrom can be evaluated pairwise such that each pair is given a similarity score based on the shared protein content between the sequences within the pair.
  • Viral clusters can be determined based on the similarity scores.
  • a viral cluster can include one or more unidentified viral genome sequences.
  • a viral cluster can be completely populated by unidentified viral genome sequences.
  • a viral cluster can be unassociated with a known taxon.
  • a viral cluster can represent an unidentified taxon of higher rank than a strain and of lower rank than a family.
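  • The pairwise scoring of shared protein content and the grouping of sequences into viral clusters can be sketched as follows. The Jaccard index as the similarity score, the 0.5 cutoff, and the single-linkage grouping are all assumptions chosen for a compact illustration; the text does not fix a particular score or clustering rule.

```python
from itertools import combinations


def jaccard(a, b):
    """Shared protein content as |A ∩ B| / |A ∪ B| (assumed score)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def viral_clusters(protein_content, min_similarity=0.5):
    """Group contigs whose pairwise similarity meets the cutoff.

    protein_content: hypothetical dict mapping contig id -> set of
    protein-cluster ids found on that contig.
    Returns a sorted list of sorted contig-id lists, one per viral cluster.
    """
    parent = {c: c for c in protein_content}

    def find(c):
        # Union-find lookup with path halving.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for a, b in combinations(protein_content, 2):
        if jaccard(protein_content[a], protein_content[b]) >= min_similarity:
            parent[find(a)] = find(b)

    groups = {}
    for c in protein_content:
        groups.setdefault(find(c), set()).add(c)
    return sorted(sorted(g) for g in groups.values())
```

  • Because the grouping never consults a taxonomy, a cluster formed this way can consist entirely of unidentified sequences, consistent with the description above.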
  • the viral clusters can each respectively be associated with the cohort of healthy subjects, the cohort of subjects diagnosed with IBD, or both cohorts. Or, said another way, a collection of viral clusters associated with the healthy cohort can be created such that each viral cluster in the collection includes at least one viral genome derived from the healthy cohort, and another collection of viral clusters associated with the cohort diagnosed with IBD can be created such that this collection of viral clusters includes at least one viral genome derived from the cohort diagnosed with IBD.
  • IBD markers can be identified by comparing viral clusters associated with the healthy cohort to viral clusters associated with the cohort diagnosed with IBD. The IBD markers can thereby be identified without relying on categorization of viral genome sequences in a database.
  • Viral clusters associated with IBD can further be associated with one or both of a sub-cohort diagnosed with Crohn’s disease (CD) and a sub-cohort diagnosed with ulcerative colitis (UC).
  • An IBD marker can be defined as a viral cluster that is prevalent in at least one cohort and/or sub-cohort and minimal or absent in at least one other cohort or sub-cohort.
  • the IBD markers can include viral clusters that are found predominantly in the healthy cohort and not in the IBD cohort and viral clusters that are found predominantly in the IBD cohort and not in the healthy cohort.
  • the IBD markers can include viral clusters that are found predominantly in the CD cohort and not the UC cohort and vice-versa, regardless of whether the same viral clusters are predominant in both the healthy and IBD cohorts.
  • IBD marker clusters can be identified by comparing the viral clusters associated with the CD sub-cohort to those associated with the UC sub-cohort.
  • the IBD marker clusters can include a first subset of IBD marker clusters that are viral clusters more prevalently found in subjects diagnosed with CD compared to UC and a second subset of IBD marker clusters that are viral clusters more prevalently found in subjects diagnosed with UC compared to CD.
  • the IBD markers can include viral clusters that contain no identified viral sequences.
  • An IBD marker can be unassociated with a known taxon.
  • An IBD marker cluster can represent an unidentified taxon of higher rank than a strain and of lower rank than a family.
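  • The definition of a marker as a cluster prevalent in one cohort and minimal or absent in another can be illustrated with the sketch below. The prevalence measure (fraction of samples with a nonzero count) and the 0.6 and 0.1 cutoffs are hypothetical values chosen for illustration only.

```python
def prevalence(samples, cluster):
    """Fraction of samples in which the cluster has a nonzero count."""
    if not samples:
        return 0.0
    return sum(1 for s in samples if s.get(cluster, 0) > 0) / len(samples)


def marker_clusters(healthy, ibd, high=0.6, low=0.1):
    """Label clusters that are prevalent in one cohort and rare in the other.

    healthy, ibd: hypothetical lists of per-sample dicts (cluster -> count).
    high, low: assumed prevalence cutoffs, not taken from the text.
    """
    clusters = set().union(*healthy, *ibd)
    markers = {}
    for vc in sorted(clusters):
        p_healthy = prevalence(healthy, vc)
        p_ibd = prevalence(ibd, vc)
        if p_healthy >= high and p_ibd <= low:
            markers[vc] = "healthy"
        elif p_ibd >= high and p_healthy <= low:
            markers[vc] = "IBD"
    return markers
```

  • The same comparison could be applied to the CD and UC sub-cohorts by passing those sample lists in place of the healthy and IBD cohorts.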
  • the identified viral genome sequences can be included in the protein clustering and protein homology analysis.
  • a viral cluster including an identified viral genome sequence can represent an unidentified taxon of higher rank than a strain and of lower rank than a family.
  • a viral cluster including an identified viral genome sequence can be associated with one or more cohorts and/or sub-cohorts.
  • Identified viral genome sequences can be clustered by protein clustering and protein homology to create reference viral clusters.
  • Reference viral clusters can be associated with one or more cohorts and/or sub-cohorts.
  • Identification of IBD marker clusters can include comparing reference viral clusters associated with the healthy cohort to reference viral clusters associated with the cohort diagnosed with IBD.
  • Identification of IBD marker clusters can include comparing reference viral clusters associated with CD with reference viral clusters associated with UC.
  • the IBD markers can include viral clusters that contain at least one identified viral sequence.
  • An IBD marker cluster containing an identified viral sequence can include an unidentified grouping of viral sequences.
  • An IBD marker cluster can be an unidentified grouping of viral sequences, optionally comprising an identified viral sequence.
  • An IBD marker cluster containing an identified viral sequence can represent an identified taxon.
  • Identification of the IBD markers as described above can be performed on a computing system having one or more processors and a memory with instructions thereon that can be performed by the processor(s).
  • the computing system can receive datasets associated with each cohort and/or sub-cohort that each respectively include unidentified viral genome sequences.
  • the viral genome sequences can be represented as a viral contig or other suitable computer-readable format.
  • the computing system can create viral clusters for each dataset associated with each cohort and/or sub-cohort. Clustering can use a protein clustering algorithm to group like proteins and a protein homology algorithm to group viral genome sequences, including unidentified viral genome sequences, into viral clusters.
  • Viral clusters can be compared across cohorts and/or sub-cohorts to identify marker clusters.
  • Marker clusters can represent clusters that are highly represented in at least one cohort and/or sub-cohort and marginally represented in at least one other cohort and/or sub-cohort.
  • Identification of the marker clusters can be performed using machine learning.
  • the datasets can include an association for each viral cluster to a known variable, the known variable being the health state of the patient (healthy, IBD diagnosis, and optionally CD diagnosis and/or UC diagnosis).
  • the system can determine a correlated set of viral clusters from the total set of viral clusters. Viral clusters having a strong correlation to the presence or absence of a given health state can be identified as marker clusters.
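  • As a simple stand-in for the machine learning step, the sketch below scores each viral cluster by the correlation between its per-sample abundance and a binary health state, and keeps the strongly correlated clusters. A point-biserial (Pearson) correlation and the 0.7 cutoff are assumptions for illustration; this is not the model used to produce the reported results.

```python
def point_biserial(values, labels):
    """Pearson correlation between abundances and 0/1 health-state labels."""
    n = len(values)
    mv, ml = sum(values) / n, sum(labels) / n
    cov = sum((v - mv) * (l - ml) for v, l in zip(values, labels))
    sv = sum((v - mv) ** 2 for v in values) ** 0.5
    sl = sum((l - ml) ** 2 for l in labels) ** 0.5
    if sv == 0 or sl == 0:
        return 0.0
    return cov / (sv * sl)


def correlated_clusters(table, labels, min_abs_r=0.7):
    """Keep clusters whose |r| with the health state meets the cutoff.

    table: hypothetical dict mapping cluster -> per-sample abundances.
    labels: per-sample labels, e.g. 0 for healthy and 1 for IBD.
    """
    kept = {}
    for vc, values in table.items():
        r = point_biserial(values, labels)
        if abs(r) >= min_abs_r:
            kept[vc] = r
    return kept
```

  • A positive value indicates a cluster more abundant in the samples labeled 1, and a negative value a cluster more abundant in the samples labeled 0.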
  • Identification of the marker clusters can be performed using a beta diversity analysis on the viral clusters.
  • a count table can be created by summing the counts of the viral genomic sequences (potentially represented as viral contigs) in each viral cluster.
  • the count table can be subjected to an ordination method to determine beta diversity.
  • the beta diversity analysis can be performed through principal coordinates analysis (PCoA), principal components analysis (PCA), non-metric multidimensional scaling (NMDS), canonical correspondence analysis (CCA), redundancy analysis (RDA), and/or other suitable technique.
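  • The count table that feeds the ordination can be built as in the minimal sketch below, which sums contig counts within each viral cluster for every sample. The nested-dict input layout and the function name are assumptions; the ordination itself (e.g., PCoA) would typically be delegated to a dedicated statistics package and is not shown.

```python
from collections import defaultdict


def cluster_count_table(contig_counts, contig_to_cluster):
    """Sum per-contig counts into per-cluster counts for each sample.

    contig_counts: hypothetical dict mapping sample -> {contig: count}.
    contig_to_cluster: hypothetical dict mapping contig -> viral cluster id.
    Contigs without a cluster assignment are skipped.
    """
    table = {}
    for sample, counts in contig_counts.items():
        row = defaultdict(int)
        for contig, n in counts.items():
            cluster = contig_to_cluster.get(contig)
            if cluster is not None:
                row[cluster] += n
        table[sample] = dict(row)
    return table
```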
  • Identification of the marker clusters can be performed using a calculation of differential abundance of viral clusters across cohorts and/or sub-cohorts.
  • the calculation can be executed using a test or software package such as DESeq2, a t-test, the Wilcoxon rank-sum test, the edgeR package, the metagenomeSeq package, the ANCOM package, and/or other suitable technique, algorithm, or software package.
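  • To make the differential abundance comparison concrete, the sketch below computes two simple quantities for one viral cluster across two cohorts: the Mann-Whitney U statistic that underlies the Wilcoxon rank-sum test, and a log2 fold change of mean abundance. The function names and the pseudocount are assumptions; packages such as DESeq2 use more elaborate models, and no p-value is computed here.

```python
import math


def mann_whitney_u(a, b):
    """U statistic for group a versus group b; ties count as one half."""
    return sum(
        1.0 if x > y else 0.5 if x == y else 0.0
        for x in a
        for y in b
    )


def log2_fold_change(a, b, pseudocount=1.0):
    """log2 ratio of mean abundances, with an assumed pseudocount."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return math.log2((mean_a + pseudocount) / (mean_b + pseudocount))
```

  • A U value near len(a) * len(b) or near zero indicates strong separation between cohorts; a value near half of that product indicates little difference.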
  • Figure 16 is a block diagram illustrating an example system 100 for identifying IBD marker clusters.
  • the system 100 can include a non-transient memory 120 with executable instructions thereon to perform methods for identifying IBD marker clusters as described herein, a processor 130 in communication with the memory 120 capable of receiving and executing the instructions from the memory 120 to identify IBD marker clusters, and an output interface 140 capable of outputting a representation of the IBD marker clusters identified by the processor 130.
  • the system can be in communication with a data store 110 on which cohort datasets are stored.
  • the processor 130 can be configured to receive the datasets from the data store 110, receive instructions from the memory 120, compute IBD marker clusters by performing operations on the datasets according to the executable instructions, and provide a representation of the IBD marker clusters to the output interface 140.
  • the representation of the IBD marker clusters can be a computer-readable representation and/or a human user interface.
  • output interface 140 can provide a means for conveying a computer readable representation of the IBD marker clusters to a digital storage medium such that the IBD marker clusters can be accessed by an IBD diagnosis device such as an example IBD diagnosis device as described herein.
  • the system 100 can be contained within a singular device, potentially even a singular semiconductor chip (e.g.
  • the data store 110 can be provided by a data server at a location remote to the processor 130 via a network (e.g. internet), and the processor 130 can be located on a computing device remote from the memory 120 and the executable instructions can be transmitted through a network (e.g. internet) to the processor.
  • a subject can be diagnosed with IBD by analyzing
  • Viral genome sequences can be obtained from the subject through a fecal sample or other means.
  • the viral genome sequences can be derived from a GI microbiota sample obtained from the subject.
  • the viral genome sequences can include unidentified viral genome sequences.
  • the viral genome sequences can be represented as a subject dataset.
  • the subject dataset can be in a computer readable format.
  • the analysis can include clustering the viral genome sequences from the subject, including the unidentified viral genome sequences obtained from the patient. Clustering of the subject’s viral genome sequences can be carried out similar to as described above.
  • the collection of viral clusters created based on the subject’s viral genome sequences can be compared to IBD markers.
  • the IBD markers can be identified through analysis of a healthy cohort and a cohort diagnosed with IBD, in a manner similar to that described above.
  • the subject can be diagnosed with IBD based on analysis and comparison of viral genome sequences alone.
  • bacteria derived from the subject can be analyzed for the purpose of IBD diagnosis of the subject and analysis of the viral genome sequences can be performed in conjunction such that the combination of bacterial and viral analysis can be used to diagnose the subject with IBD.
  • the marker clusters may comprise one or more viral clusters from taxa Siphoviridae, Myoviridae, Podoviridae, CrAss-like, or Microviridae.
  • the marker clusters may comprise viral clusters from Siphoviridae.
  • the marker clusters may comprise viral clusters from Myoviridae.
  • the marker clusters may comprise viral clusters from Podoviridae.
  • the marker clusters may comprise CrAss-like viral clusters.
  • the marker clusters may comprise viral clusters from Microviridae.
  • the marker clusters may comprise one or more of the following exemplary viral clusters: vc2, vc6, vc7, vc13, vc14, vc15, vc17, vc19, vc21, vc22, vc23, vc24, vc25, vc28, vc29, vc36, vc37, vc38, vc39, vc40, vc42, vc45, vc48, vc53, vc55, vc56, vc58, vc60, vc61, vc62, vc64, vc66, vc68, vc69, vc70, vc71, vc73, vc74, vc75, vc76, vc77, vc78, vc79, vc80, vc82, vc84, vc85,
  • an increased abundance of one or more of the following marker clusters in the subject sample is indicative of the presence of IBD in the subject: vc2, vc13, vc14, vc15, vc17, vc21, vc22, vc36, vc40, vc48, vc53, vc66, vc68, vc69, vc70, vc71, vc73, vc74, vc77, vc78, vc79, vc80, vc85, vc88, vc89, vc91, vc94, vc95, vc97, vc102, vc108, vc113, vc115, vc117, vc118, vc122, vc123, vc130,
  • the abundance may be increased by 10-99%, 10-20%, 20-30%, 30-50%, 40-60%, 50-70%, 60-80%, 70-90%, 80-100%, 90-110%, 100-150%, 120-170%, 150-200%, by two-fold to 1,000-fold, about two-fold, by about three-fold, by about four-fold, by about five-fold, by about 10-fold, by about 20-fold, by about 50-fold, by about 100-fold, by about 200-fold, by about 500-fold, or by about 1,000-fold.
  • an increased abundance of one or more of the following marker clusters in the subject sample is indicative of the presence of Crohn’s Disease (CD) in the subject: vc15, vc66, vc71, vc73, vc77, vc78, vc79, vc80, vc91, vc94, vc108, vc113, vc117, vc118, vc132, vc142, vc155, vc160, vc178, vc232, vc264, vc281, vc298, and vc420.
  • an increased abundance of one or more viral clusters selected from vc28 in the subject sample as compared to a healthy control is indicative of the presence of CD in the subject.
  • the abundance may be increased by 10-99%, 10-20%, 20-30%, 30-50%, 40-60%, 50-70%, 60-80%, 70-90%, 80-100%, 90-110%, 100-150%, 120-170%, 150-200%, by two-fold to 1,000-fold, about two-fold, by about three-fold, by about four-fold, by about five-fold, by about 10-fold, by about 20-fold, by about 50-fold, by about 100-fold, by about 200-fold, by about 500-fold, or by about 1,000-fold.
  • an increased abundance of one or more of the following marker clusters in the subject sample, as compared to that of a sample from a healthy patient or control, is indicative of the presence of ulcerative colitis (UC) in the subject: vc2, vc17, vc21, vc22, vc53, vc70, vc74, vc85, vc88, vc89, vc115, vc122, vc123, vc130, vc152, vc161, vc175, vc181, vc205, vc218, vc263, and vc413.
  • an increased abundance of viral cluster vc2 in the subject sample as compared to a healthy control is indicative of the presence of UC in the subject.
  • an increased abundance of one or more viral clusters selected from vc38, vc46, vc48, vc54, vc57, vc62, vc64, vc69, vc71, vc108, vc111, vc114, vc115, vc128, vc159, vc162, vc215, vc220, vc242, vc340, vc374, and vc392 in the subject sample as compared to a patient with ulcerative colitis (UC) in remission is indicative of the presence of a flare-up of UC in the subject.
  • an increased abundance of one or more viral clusters selected from vc16, vc119, and vc163 in the subject sample as compared to a patient with a flare-up of ulcerative colitis (UC) is indicative of the presence of UC in remission in the subject.
  • the abundance may be increased by 10-99%, 10-20%, 20-30%, 30-50%, 40-60%, 50-70%, 60-80%, 70-90%, 80-100%, 90-110%, 100-150%, 120-170%, 150-200%, by two-fold to 1,000-fold, about two-fold, by about three-fold, by about four-fold, by about five-fold, by about 10-fold, by about 20-fold, by about 50-fold, by about 100-fold, by about 200-fold, by about 500-fold, or by about 1,000-fold.
  • a decreased abundance of one or more of the following marker clusters in the subject sample is indicative of the presence of IBD in the subject: vc6, vc7, vc19, vc23, vc24, vc25, vc29, vc37, vc38, vc39, vc42, vc45, vc55, vc56, vc58, vc60, vc61, vc62, vc64, vc75, vc76, vc82, vc84, vc86, vc89, vc92, vc96, vc98, vc99, vc101, vc103, vc104, vc109, vc112, vc124, vc136, vc138, vc143, vc154, v
  • a decreased abundance of one or more viral clusters selected from vc7, vc25, vc47, and vc64 in the subject sample as compared to a healthy control is indicative of the presence of Crohn’s Disease (CD) in the subject.
  • the abundance may be decreased by 10-99%, 10-20%, 20-30%, 30-50%, 40-60%, 50-70%, 60-80%, 70-90%, 80-100%, 90-110%, 100-150%, 120-170%, 150-200%, by two-fold to 1,000-fold, about two-fold, by about three-fold, by about four-fold, by about five-fold, by about 10-fold, by about 20-fold, by about 50-fold, by about 100-fold, by about 200-fold, by about 500-fold, or by about 1,000-fold.
  • a decreased abundance of one or more of the following marker clusters in the subject sample is indicative of the presence of Crohn’s Disease (CD) in the subject: vc6, vc7, vc19, vc25, vc29, vc37, vc42, vc45, vc56, vc58, vc60, vc61, vc64, vc82, vc86, vc89, vc92, vc99, vc104, vc109, vc124, vc136, vc154, vc190, and vc284.
  • the abundance may be decreased by 10-99%, 10-20%, 20-30%, 30-50%, 40-60%, 50-70%, 60-80%, 70-90%, 80-100%, 90-110%, 100-150%, 120-170%, 150-200%, by two-fold to 1,000-fold, about two-fold, by about three-fold, by about four-fold, by about five-fold, by about 10-fold, by about 20-fold, by about 50-fold, by about 100-fold, by about 200-fold, by about 500-fold, or by about 1,000-fold.
  • an increased abundance of vc98 and/or vc103 in the subject sample is indicative of the presence of ulcerative colitis (UC) in the subject.
  • the abundance may be increased by 10-99%, 10-20%, 20-30%, 30-50%, 40-60%, 50-70%, 60-80%, 70-90%, 80-100%, 90-110%, 100-150%, 120-170%, 150-200%, by two-fold to 1,000-fold, about two-fold, by about three-fold, by about four-fold, by about five-fold, by about 10-fold, by about 20-fold, by about 50-fold, by about 100-fold, by about 200-fold, by about 500-fold, or by about 1,000-fold.
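The percentage and fold ranges recited above can be related to one another with a small calculation. The following is a minimal sketch only, not the method of the disclosure: the function names, the example abundances, and the two-fold default threshold are hypothetical choices made for illustration.

```python
def abundance_change(subject_abundance: float, reference_abundance: float) -> dict:
    """Express a subject's abundance of one cluster relative to a reference.

    Both inputs are relative abundances (fractions of the sample).
    Returns the fold change and the equivalent percent change.
    """
    if reference_abundance <= 0:
        raise ValueError("reference abundance must be positive")
    fold = subject_abundance / reference_abundance
    percent = (subject_abundance - reference_abundance) / reference_abundance * 100.0
    return {"fold": fold, "percent": percent}


def is_increased(change: dict, min_fold: float = 2.0) -> bool:
    """True if the fold change meets a (hypothetical) minimum threshold."""
    return change["fold"] >= min_fold


# Illustrative numbers only: the cluster makes up 50% of the subject sample
# and 12.5% of the healthy-control reference.
change = abundance_change(0.5, 0.125)
print(change)                # {'fold': 4.0, 'percent': 300.0}
print(is_increased(change))  # True
```

Note that a four-fold abundance corresponds to a 300% increase, so the percentage ranges and the fold ranges are not interchangeable labels for the same thresholds.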
  • the dataset may be prepared by sequencing VLP DNA isolated from GI microbiota sample(s).
  • a fourth dataset may be obtained that represents bacterial sequences derived from the GI microbiota sample obtained from the subject, with the fourth dataset analyzed for the presence of bacterial taxa associated with IBD.
  • the fourth dataset may be obtained by sequencing 16S rDNA or a V region (e.g., V4 region) of 16S rDNA in the GI microbiota sample.
  • the presence of IBD in the subject may be determined based at least in part on the comparison of the fourth dataset to at least one of a healthy control and a control diagnosed with IBD.
  • the GI microbiota sample is a fecal sample. In various embodiments, the GI microbiota sample is a cecal sample. In various embodiments, the GI microbiota sample is an ileal sample. In various embodiments, the GI microbiota sample is a colonic microbiota sample. In various embodiments, microbiota from other sites can be used, such as oral microbiota samples, nasal microbiota samples, skin microbiota samples, and vaginal microbiota samples.
  • the subject is human.
  • the methods may further comprise administering an IBD treatment to the subject.
  • IBD treatments include conventional treatments such as mesalamine, steroids, immunomodulators, and dietary modification.
  • IBD treatments may also comprise administration of compositions comprising viruses and bacteria, as described below.
  • a method for preventing and/or treating IBD in a subject: an effective amount of one or more viruses from any of the following viral clusters is administered: vc6, vc7, vc19, vc23, vc24, vc25, vc29, vc37, vc38, vc39, vc42, vc45, vc55, vc56, vc58, vc60, vc61, vc62, vc64, vc75, vc76, vc82, vc84, vc86, vc89, vc92, vc96, vc98, vc99, vc101, vc103, vc104, vc109, vc112, vc124, vc136, vc138, vc143, vc154, vc190, vc193, vc2
  • the method may further comprise administering to the subject an effective amount of a probiotic or a prebiotic composition or a combination thereof, wherein said composition(s) stimulates growth and/or activity in the GI microbiota of the subject of one or more bacterial genera selected from Catenibacterium, Ruminococcus, Coprococcus, Methanobrevibacter, Clostridium IV, Faecalibacterium, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Howardella, Bifidobacterium, Oscillibacter, Parabacteroides, Flavonifractor, Blautia, Dorea, Roseburia, Odoribacter, Catenibacterium, and Akkermansia or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said one or more bacterial genera.
  • the probiotic composition comprises one or more bacterial strains from the genus selected from Catenibacterium, Ruminococcus, Coprococcus, Methanobrevibacter, Clostridium IV, Faecalibacterium, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Howardella, Bifidobacterium, Oscillibacter, Parabacteroides, Flavonifractor, Blautia, Dorea, Roseburia, Odoribacter, Catenibacterium, and Akkermansia.
  • the prebiotic or probiotic composition may be effective to fully, or partially, restore normal microbiota.
  • a method for preventing and/or treating IBD in a subject in need thereof comprising administering to the subject an effective amount of a virus from a viral cluster selected from vc10, vc23, and vc39.
  • the method may further comprise administering to the subject an effective amount of a probiotic or a prebiotic composition or a combination thereof, wherein said composition(s) stimulates growth and/or activity in the GI microbiota of the subject of one or more bacterial genera selected from Catenibacterium, Ruminococcus, Coprococcus, Methanobrevibacter, Clostridium IV, Faecalibacterium,
  • the probiotic composition comprises one or more bacterial strains from the genus selected from
  • the prebiotic or probiotic composition may be effective to fully, or partially, restore normal microbiota.
  • An effective amount of one or more viruses from any of the following viral clusters is administered: vc6, vc7, vc19, vc25, vc29, vc37, vc42, vc45, vc56, vc58, vc60, vc61, vc64, vc82, vc86, vc89, vc92, vc99, vc104, vc109, vc124, vc136, vc154, vc190, and vc284.
  • a method for preventing and/or treating CD in a subject in need thereof: an effective amount of one or more viruses from any of the following viral clusters is administered: vc10, vc23, and vc39.
  • the method may further comprise administering to the subject an effective amount of a probiotic or a prebiotic composition or a combination thereof, wherein said composition(s) stimulates growth and/or activity in the GI microbiota of the subject of one or more bacterial genera selected from Ruminococcus, Methanobrevibacter, Clostridium IV, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Bifidobacterium,
  • the probiotic composition comprises one or more bacterial strains from the genus selected from Ruminococcus, Methanobrevibacter, Clostridium IV, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Bifidobacterium, Oscillibacter, Flavonifractor, Dorea, Roseburia, and Odoribacter.
  • the prebiotic or probiotic composition may be effective to fully, or partially, restore normal microbiota.
  • a method for preventing and/or treating UC in a subject in need thereof: an effective amount of one or more viruses from any of the following viral clusters is administered: vc10, vc23, and vc39.
  • the method may further comprise administering to the subject an effective amount of a probiotic or a prebiotic composition or a combination thereof, wherein said composition(s) stimulates growth and/or activity in the GI microbiota of the subject of one or more bacterial genera selected from Ruminococcus, Methanobrevibacter, Clostridium IV, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Bifidobacterium,
  • the probiotic composition comprises one or more bacterial strains from the genus selected from Ruminococcus, Methanobrevibacter, Clostridium IV, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Bifidobacterium, Oscillibacter, Flavonifractor, Dorea, Roseburia, and Odoribacter.
  • the prebiotic or probiotic composition may be effective to fully, or partially, restore normal microbiota.
  • a method for preventing and/or treating UC in a subject: an effective amount of a virus from viral cluster vc98 and/or vc103 is administered.
  • the method may further comprise administering to the subject an effective amount of a probiotic or a prebiotic composition or a combination thereof.
  • the composition stimulates growth and/or activity in the GI microbiota of the subject of the bacterial genus Akkermansia or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said bacterial genus.
  • the probiotic composition comprises one or more bacterial strains from the genus Akkermansia.
  • the prebiotic or probiotic composition may be effective to fully, or partially, restore normal microbiota.
  • a method for preventing and/or treating IBD in a subject in need thereof: an effective amount of a probiotic or a prebiotic composition or a combination thereof is administered to the subject.
  • the composition stimulates growth and/or activity of one or more bacterial genera selected from Catenibacterium, Ruminococcus, Coprococcus, Methanobrevibacter, Clostridium IV, Faecalibacterium, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Howardella, Bifidobacterium, Oscillibacter, Parabacteroides, Flavonifractor, Blautia, Dorea, Roseburia, Odoribacter, Catenibacterium, and Akkermansia or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said one or more bacterial genera.
  • the probiotic composition comprises one or more bacterial strains from the genus selected from Catenibacterium, Ruminococcus, Coprococcus, Methanobrevibacter, Clostridium IV, Faecalibacterium, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Howardella, Bifidobacterium, Oscillibacter, Parabacteroides, Flavonifractor, Blautia, Dorea, Roseburia, Odoribacter, Catenibacterium, and Akkermansia.
  • the prebiotic or probiotic composition may be effective to fully, or partially, restore normal microbiota.
  • a method for preventing and/or treating CD in a subject in need thereof: an effective amount of a probiotic or a prebiotic composition or a combination thereof is administered to the subject.
  • the composition stimulates growth and/or activity of one or more bacterial genera selected from Ruminococcus, Methanobrevibacter, Clostridium IV, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Bifidobacterium, Oscillibacter,
  • the probiotic composition comprises one or more bacterial strains from the genus selected from Ruminococcus, Methanobrevibacter, Clostridium IV, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Bifidobacterium, Oscillibacter, Flavonifractor, Dorea, Roseburia, and Odoribacter.
  • the prebiotic or probiotic composition may be effective to fully, or partially, restore normal microbiota.
  • a method for preventing and/or treating UC in a subject in need thereof: an effective amount of a probiotic or a prebiotic composition or a combination thereof is administered to the subject.
  • the composition stimulates growth and/or activity of the genus Akkermansia or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said bacterial genus.
  • the probiotic composition comprises one or more bacterial strains from the genus Akkermansia.
  • the prebiotic or probiotic composition may be effective to fully, or partially, restore normal microbiota.
  • Any of the above methods may further comprise administering to the subject additional diagnostic tests for IBD, CD and/or UC.
  • Any of the above methods may further comprise enrolling the subject in a clinical trial.
  • Bacterial taxa associated with IBD may comprise one or more of the following bacterial genera: Clostridium XlVa, Blautia, Veillonella, Clostridium sensu stricto, Megasphaera, Fusobacterium, Flavonifractor, Catenibacterium, Ruminococcus, Coprococcus,
  • Bacterial taxa associated with IBD may comprise a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said one or more bacterial genera.
  • An increased abundance of one or more of the following bacterial genera in the subject sample as compared to a healthy control may be indicative of the presence of IBD in the subject: Clostridium XlVa, Blautia, Veillonella, Clostridium sensu stricto, Megasphaera, Fusobacterium, and Flavonifractor.
  • the abundance may be increased by 10-99%, 10-20%, 20-30%, 30-50%, 40-60%, 50-70%, 60-80%, 70-90%, 80-100%, 90-110%, 100-150%, 120-170%, 150-200%, by two-fold to 1,000-fold, about two-fold, by about three-fold, by about four-fold, by about five-fold, by about 10-fold, by about 20-fold, by about 50-fold, by about 100-fold, by about 200-fold, by about 500-fold, or by about 1,000-fold.
  • An increased abundance of one or more of the following bacterial genera in the subject sample as compared to a healthy control may be indicative of the presence of Crohn’s Disease (CD) in the subject: Clostridium XlVa, Blautia, Megasphaera, and Fusobacterium.
  • the abundance may be increased by 10-99%, 10-20%, 20-30%, 30-50%, 40-60%, 50-70%, 60-80%, 70-90%, 80-100%, 90-110%, 100-150%, 120-170%, 150-200%, by two-fold to 1,000-fold, about two-fold, by about three-fold, by about four-fold, by about five-fold, by about 10-fold, by about 20-fold, by about 50-fold, by about 100-fold, by about 200-fold, by about 500-fold, or by about 1,000-fold.
  • An increased abundance of the bacterial genus Flavonifractor in the subject sample as compared to a healthy control may be indicative of the presence of ulcerative colitis (UC) in the subject.
  • An increased abundance of one or more bacterial species selected from Bacteroides fragilis and Ruminococcus gnavus in the subject sample as compared to a healthy control may be indicative of the presence of UC in the subject.
  • An increased abundance of Ruminococcus gnavus in the subject sample as compared to a control sample from a patient with UC in remission may be indicative of the presence of a flare-up of UC in the subject.
  • An increased abundance of Dorea longicatena or Coprococcus comes in the subject sample as compared to a control sample from a patient with a flare-up of UC may be indicative of the presence of UC in remission in the subject.
  • the abundance may be increased by 10-99%, 10-20%, 20-30%, 30-50%, 40-60%, 50-70%, 60-80%, 70-90%, 80-100%, 90-110%, 100-150%, 120-170%, 150-200%, by two-fold to 1,000-fold, about two-fold, by about three-fold, by about four-fold, by about five-fold, by about 10-fold, by about 20-fold, by about 50-fold, by about 100-fold, by about 200-fold, by about 500-fold, or by about 1,000-fold.
  • a decreased abundance of one or more of the following bacterial genera in the subject sample as compared to a healthy control may be indicative of the presence of IBD in the subject: Catenibacterium, Ruminococcus, Coprococcus, Methanobrevibacter, Clostridium IV, Faecalibacterium, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Howardella, Bifidobacterium, Oscillibacter, Parabacteroides, Flavonifractor, Blautia, Dorea, Roseburia, Odoribacter, Catenibacterium, and Akkermansia.
  • the abundance may be decreased by 10-99%, 10-20%, 20-30%, 30-50%, 40-60%, 50-70%, 60-80%, 70-90%, 80-100%, 90-110%, 100-150%, 120-170%, 150-200%, by two-fold to 1,000-fold, about two-fold, by about three-fold, by about four-fold, by about five-fold, by about 10-fold, by about 20-fold, by about 50-fold, by about 100-fold, by about 200-fold, by about 500-fold, or by about 1,000-fold.
  • a decreased abundance of one or more of the following bacterial genera in the subject sample as compared to a healthy control may be indicative of the presence of Crohn’s Disease (CD) in the subject: Ruminococcus, Methanobrevibacter, Clostridium IV, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Bifidobacterium, Oscillibacter, Flavonifractor, Dorea, Roseburia, and Odoribacter.
  • the abundance may be decreased by 10-99%, 10-20%, 20-30%, 30-50%, 40-60%, 50-70%, 60-80%, 70-90%, 80-100%, 90-110%, 100-150%, 120-170%, 150-200%, by two-fold to 1,000-fold, about two-fold, by about three-fold, by about four-fold, by about five-fold, by about 10-fold, by about 20-fold, by about 50-fold, by about 100-fold, by about 200-fold, by about 500-fold, or by about 1,000-fold.
  • a decreased abundance (e.g., by 10-99%, 10-20%, 20-30%, 30-50%, 40-60%, 50-70%, 60-80%, 70-90%, 80-100%, 90-110%, 100-150%, 120-170%, 150-200%, by two-fold to 1,000-fold, about two-fold, by about three-fold, by about four-fold, by about five-fold, by about 10-fold, by about 20-fold, by about 50-fold, by about 100-fold, by about 200-fold, by about 500-fold, or by about 1,000-fold) of the bacterial genus Akkermansia in the subject sample as compared to a healthy control may be indicative of the presence of ulcerative colitis (UC) in the subject.
  • the analysis of the collection of the viral clusters created based on the subject’s viral genome sequences can include identifying common clusters present both in the collection of viral clusters associated with the subject and in the collection of marker clusters. For each common cluster, a relative abundance of members within that cluster found in the subject’s GI microbiota sample can be determined, and a correlation value can be associated with that cluster in the collection of marker clusters. The comparison of the viral clusters derived from the subject to the marker clusters can include comparing the relative abundance of members within each common cluster associated with the subject to the correlation value of each common cluster in the collection of marker clusters.
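The comparison described above (find the common clusters, compute each one's relative abundance in the subject, and weigh it against the marker cluster's correlation value) can be sketched as follows. This is an illustrative assumption rather than the scoring used in the disclosure: the sign convention (positive correlation values for clusters associated with IBD, negative for clusters associated with health), the abundance-weighted sum, the read counts, the correlation values, and the non-marker cluster name `vc999` are all hypothetical.

```python
from typing import Dict, List, Tuple


def relative_abundance(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert per-cluster read counts into fractions of the whole sample."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("subject dataset contains no reads")
    return {cluster: n / total for cluster, n in counts.items()}


def compare_to_markers(
    subject_counts: Dict[str, int],
    marker_correlations: Dict[str, float],
) -> Tuple[List[str], Dict[str, float], float]:
    """Return the common clusters, their weighted contributions, and a total score."""
    rel = relative_abundance(subject_counts)
    # Common clusters: present in the subject AND in the marker collection.
    common = sorted(set(rel) & set(marker_correlations))
    # Assumed comparison: relative abundance weighted by the correlation value.
    contributions = {c: rel[c] * marker_correlations[c] for c in common}
    return common, contributions, sum(contributions.values())


# Hypothetical subject read counts; "vc999" stands for a non-marker cluster.
subject_counts = {"vc2": 50, "vc7": 25, "vc999": 25}
# Hypothetical correlation values for three marker clusters.
marker_correlations = {"vc2": 0.5, "vc7": -0.5, "vc13": 0.25}

common, contributions, score = compare_to_markers(subject_counts, marker_correlations)
print(common)         # ['vc2', 'vc7']
print(contributions)  # {'vc2': 0.25, 'vc7': -0.125}
print(score)          # 0.125
```

Under this assumed convention a positive score leans toward disease and a negative one toward health; any decision threshold would have to be calibrated on cohort data.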
  • the subject can be diagnosed with Crohn’s disease if there is a decrease in the abundance of a virus of a viral taxon listed in Table 13 in the subject as compared to a reference amount of the abundance of the virus in one or more healthy subjects, e.g., by 10-99%, 10-20%, 20-30%, 30-50%, 40-60%, 50-70%, 60-80%, 70-90%, 80-100%, 90-110%, 100-150%, 120-170%, 150-200%, by two-fold to 1,000-fold, about two-fold, by about three-fold, by about four-fold, by about five-fold, by about 10-fold, by about 20-fold, by about 50-fold, by about 100-fold, by about 200-fold, by about 500-fold, or by about 1,000-fold.
  • the subject can be diagnosed with ulcerative colitis if there is a decrease in the abundance of a virus of a viral taxon listed in Table 14 in the subject as compared to a reference amount of the abundance of the virus in one or more healthy subjects, e.g., by 10-99%, 10-20%, 20-30%, 30-50%, 40-60%, 50-70%, 60-80%, 70-90%, 80-100%, 90-110%, 100-150%, 120-170%, 150-200%, by two-fold to 1,000-fold, about two-fold, by about three-fold, by about four-fold, by about five-fold, by about 10-fold, by about 20-fold, by about 50-fold, by about 100-fold, by about 200-fold, by about 500-fold, or by about 1,000-fold.
  • the subject can be diagnosed with Crohn’s disease if there is an increase in the abundance of a virus of a viral taxon listed in Table 15 in the subject as compared to a reference amount of the abundance of the virus in one or more healthy subjects, e.g., by 10-99%, 10-20%, 20-30%, 30-50%, 40-60%, 50-70%, 60-80%, 70-90%, 80-100%, 90-110%, 100-150%, 120-170%, 150-200%, by two-fold to 1,000-fold, about two-fold, by about three-fold, by about four-fold, by about five-fold, by about 10-fold, by about 20-fold, by about 50-fold, by about 100-fold, by about 200-fold, by about 500-fold, or by about 1,000-fold.
  • the subject can be diagnosed with ulcerative colitis if there is an increase in the abundance of a virus of a viral taxon listed in Table 16 in the subject as compared to a reference amount of the abundance of the virus in one or more healthy subjects, e.g., by 10-99%, 10-20%, 20-30%, 30-50%, 40-60%, 50-70%, 60-80%, 70-90%, 80-100%, 90-110%, 100-150%, 120-170%, 150-200%, by two-fold to 1,000-fold, about two-fold, by about three-fold, by about four-fold, by about five-fold, by about 10-fold, by about 20-fold, by about 50-fold, by about 100-fold, by about 200-fold, by about 500-fold, or by about 1,000-fold.
  • the subject can be diagnosed with Crohn’s disease if there is an increase in the abundance of bacteria of a bacterial taxon listed in Table 15 in the subject as compared to a reference amount of the abundance of the bacteria in one or more healthy subjects, e.g., by 10-99%, 10-20%, 20-30%, 30-50%, 40-60%, 50-70%, 60-80%, 70-90%, 80-100%, 90-110%, 100-150%, 120-170%, 150-200%, by two-fold to 1,000-fold, about two-fold, by about three-fold, by about four-fold, by about five-fold, by about 10-fold, by about 20-fold, by about 50-fold, by about 100-fold, by about 200-fold, by about 500-fold, or by about 1,000-fold.
  • the subject can be diagnosed with ulcerative colitis if there is an increase in the abundance of bacteria of a bacterial taxon listed in Table 16 in the subject as compared to a reference amount of the abundance of the bacteria in one or more healthy subjects, e.g., by 10-99%, 10-20%, 20-30%, 30-50%, 40-60%, 50-70%, 60-80%, 70-90%, 80-100%, 90-110%, 100-150%, 120-170%, 150-200%, by two-fold to 1,000-fold, about two-fold, by about three-fold, by about four-fold, by about five-fold, by about 10-fold, by about 20-fold, by about 50-fold, by about 100-fold, by about 200-fold, by about 500-fold, or by about 1,000-fold.
  • the subject can be diagnosed with Crohn’s disease if there is an increase in the abundance of bacteria of a bacterial taxon listed in Table 17 in the subject as compared to a reference amount of the abundance of the bacteria in one or more healthy subjects, e.g., by 10-99%, 10-20%, 20-30%, 30-50%, 40-60%, 50-70%, 60-80%, 70-90%, 80-100%, 90-110%, 100-150%, 120-170%, 150-200%, by two-fold to 1,000-fold, about two-fold, by about three-fold, by about four-fold, by about five-fold, by about 10-fold, by about 20-fold, by about 50-fold, by about 100-fold, by about 200-fold, by about 500-fold, or by about 1,000-fold.
  • the subject can be diagnosed with ulcerative colitis if there is an increase in the abundance of bacteria of a bacterial taxon listed in Table 18 in the subject as compared to a reference amount of the abundance of the bacteria in one or more healthy subjects, e.g., by 10-99%, 10-20%, 20-30%, 30-50%, 40-60%, 50-70%, 60-80%, 70-90%, 80-100%, 90-110%, 100-150%, 120-170%, 150-200%, by two-fold to 1,000-fold, about two-fold, by about three-fold, by about four-fold, by about five-fold, by about 10-fold, by about 20-fold, by about 50-fold, by about 100-fold, by about 200-fold, by about 500-fold, or by about 1,000-fold.
  • the subject can be diagnosed with IBD (e.g., Crohn’s disease or ulcerative colitis) if in the subject the abundance of one or more viruses in vc23 is reduced as compared to the abundance of the same one or more viruses in one or more healthy subjects, e.g., by 10-99%, 10-20%, 20-30%, 30-50%, 40-60%, 50-70%, 60-80%, 70-90%, 80-100%, 90-110%, 100-150%, 120-170%, 150-200%, by two-fold to 1,000-fold, about two-fold, by about three-fold, by about four-fold, by about five-fold, by about 10-fold, by about 20-fold, by about 50-fold, by about 100-fold, by about 200-fold, by about 500-fold, or by about 1,000-fold.
  • the subject can be diagnosed with IBD (e.g., Crohn’s disease or ulcerative colitis) if in the subject the abundance of one or more viruses in vc39 is reduced as compared to the abundance of the same one or more viruses in one or more healthy subjects, e.g., by 10-99%, 10-20%, 20-30%, 30-50%, 40-60%, 50-70%, 60-80%, 70-90%, 80-100%, 90-110%, 100-150%, 120-170%, 150-200%, by two-fold to 1,000-fold, about two-fold, by about three-fold, by about four-fold, by about five-fold, by about 10-fold, by about 20-fold, by about 50-fold, by about 100-fold, by about 200-fold, by about 500-fold, or by about 1,000-fold.
  • the subject can be diagnosed with IBD (e.g., Crohn’s disease or ulcerative colitis) if in the subject the abundance of one or more viruses in vc10 is reduced as compared to the abundance of the same one or more viruses in one or more healthy subjects.
  • a kit for determining the presence of IBD in a subject can include a device to receive viral genome sequences, including unidentified viral genome sequences, from an individual subject and diagnose the subject for IBD based at least in part on the virome marker clusters.
  • the viral genome sequences can be derived from a GI microbiota sample provided by the subject.
  • the GI microbiota sample can be a fecal sample.
  • the GI microbiota sample can be a cecal sample.
  • the GI microbiota sample can be an ileal sample.
  • the GI microbiota sample can be a colonic microbiota sample.
  • microbiota from other sites can be used, such as oral microbiota samples, nasal microbiota samples, skin microbiota samples, and vaginal microbiota samples.
  • Diagnosis can include clustering the received viral genome sequences and comparing the subject’s viral genome clusters to the marker clusters.
  • the viral genome sequences can be clustered by protein clustering and protein homology as described herein.
  • the device can also analyse bacteria from the subject for the purpose of IBD diagnosis.
  • the bacteria can be derived from the same GI microbiota sample provided by the subject used to obtain the viral genome sequences or a separate GI microbiota sample.
  • the subject can be diagnosed for IBD based on the analysis of the bacteria and/or the analysis of the viral genome sequences.
  • the IBD diagnosis can include a diagnosis for ulcerative colitis and/or Crohn’s disease.
  • FIG. 17 is a block diagram illustration of a system or device 200 (referred to herein for simplicity as “device”) that can be used as part of a kit for detecting IBD or health in a subject.
  • the device 200 can include a dataset input module 210 configured to receive a dataset derived from a GI microbiota sample of a subject, a clustering module 220 configured to determine viral genome sequences within the subject’s dataset and cluster the viral genome sequences into viral clusters, a marker cluster input module 230 configured to receive an input that is based on IBD marker clusters, a cluster comparison module 240 that is configured to compare the subject’s viral clusters to the input representation of the marker clusters, and an output interface 250 configured to provide an indication of health or disease based on the comparison of the subject’s viral clusters to the representation of the marker clusters.
  • the modules 210, 220, 230, 240 can be implemented by a computing system in hardware and/or software according to the principles described herein and as would be appreciated and understood by a person of
  • the dataset input module when implemented at least in part by hardware, can include a wired or wireless receiver capable of receiving an electronic signal representative of the subject’s dataset.
  • the clustering module 220, when implemented at least in part by hardware, can include a processor in communication with a memory with instructions thereon to create viral clusters based on and/or associated with the subject’s dataset according to the principles described herein.
  • the marker cluster input module 230 when implemented at least in part by hardware, can include a wired or wireless receiver capable of receiving an electronic signal representative of IBD marker clusters. Additionally, or alternatively, the marker cluster input module 230 can include a memory store with a representation of the IBD marker clusters stored thereon.
  • the cluster comparison module 240, when implemented at least in part by hardware, can include memory with instructions thereon to compare the subject’s viral clusters to the IBD marker clusters and provide as an output an indication of health or disease.
  • the output interface 250, when implemented at least in part by hardware, can include a wired or wireless transmitter configured to transmit an electronic signal representative of the indication of health or disease. Additionally, or alternatively, the output interface 250 can include a user interface configured to provide an auditory, visual, or other sensory indication to a user that can be interpreted as an indication of health or disease.
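The module layout of device 200 can be illustrated with a small software sketch. This is only a hypothetical analogue: the class name, the toy clustering function, and the decision rule (report "disease" when at least `min_shared` disease-associated marker clusters are found in the subject) are assumptions made for illustration and are not specified in the disclosure.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass
class Device:
    """Hypothetical software analogue of device 200."""
    cluster_fn: Callable[[List[str]], Set[str]]  # clustering module 220
    marker_clusters: Set[str]                    # marker cluster input module 230
    min_shared: int = 1                          # assumed decision threshold

    def run(self, dataset: List[str]) -> str:
        """Receive a dataset (module 210) and return an indication (interface 250)."""
        subject_clusters = self.cluster_fn(dataset)
        # Cluster comparison module 240: overlap with the marker clusters.
        shared = subject_clusters & self.marker_clusters
        return "disease" if len(shared) >= self.min_shared else "health"


# Toy stand-in for real clustering: a fixed contig-to-cluster lookup (hypothetical IDs).
toy_assignment: Dict[str, str] = {"contig_a": "vcA", "contig_b": "vcB", "contig_c": "vcC"}


def toy_cluster_fn(dataset: List[str]) -> Set[str]:
    return {toy_assignment[s] for s in dataset if s in toy_assignment}


device = Device(cluster_fn=toy_cluster_fn, marker_clusters={"vcB"})
```

Keeping the clustering function and the marker set as separate inputs mirrors the description, in which the marker clusters can either be received as a signal or read from a memory store.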
  • the disclosure provides a method for treating dysbiosis in the gastrointestinal tract of a subject (e.g., human) in need thereof, said method comprising administering to said subject a therapeutically effective amount of any of the viruses described herein.
  • the virus is from any viral taxon listed in Table 13. In some embodiments, the virus is from any viral taxon listed in Table 14.
  • the disclosure provides a method for treating dysbiosis in the gastrointestinal tract of a subject (e.g., human) in need thereof, said method comprising administering to said subject a therapeutically effective amount of an inhibitor of, or an agent that specifically targets, any of the viruses described herein.
  • the virus is from any viral taxon listed in Table 11. In some embodiments, the virus is from any viral taxon listed in Table 12.
  • the disclosure provides a method for treating dysbiosis in the gastrointestinal tract of a subject (e.g., human) in need thereof, said method comprising administering to said subject a therapeutically effective amount of any of the bacteria described herein.
  • the bacteria is from any bacterial taxon listed in Table 17. In some embodiments, the bacteria is from any bacterial taxon listed in Table 18.
  • the disclosure provides a method for treating dysbiosis in the gastrointestinal tract of a subject (e.g., human) in need thereof, said method comprising administering to said subject a therapeutically effective amount of an inhibitor of, or agent that specifically targets, any of the bacteria described herein.
  • the bacteria is from any bacterial taxon listed in Table 15. In some embodiments, the bacteria is from any bacterial taxon listed in Table 16.
  • the disclosure provides a method for treating a gastrointestinal (GI) disorder in a subject (e.g., human) in need thereof, said method comprising administering to said subject a therapeutically effective amount of any of the virus compositions described herein.
  • GI disorders include, e.g., inflammatory bowel disease (IBD), ulcerative colitis, Crohn's disease, irritable bowel syndrome (IBS), infectious gastroenteritis, non-infectious gastroenteritis, food allergy, and gastrointestinal graft versus host disease.
  • the disclosure also provides pharmaceutical compositions comprising the viruses and/or bacteria of the disclosure.
  • compositions disclosed herein can be formulated into a variety of forms and administered by a number of different means.
  • useful routes of delivery include oral, topical, rectal, mucosal, sublingual, nasal, intravenous, subcutaneous, and via naso/oro-gastric gavage.
  • the active agent may be systemic after administration or may be localized by the use of regional administration, intramural administration, or use of an implant that acts to retain the active dose at the site of implantation.
  • the active agent, vector, virus, bacteriophage, particle, or a bacterial inoculant can be mixed with a carrier and (for easier delivery to the digestive tract) applied to liquid or solid food, or feed or to drinking water.
  • the carrier material should be non-toxic to the
  • Non-limiting examples of formulations useful in the methods of the present disclosure include oral capsules and saline suspensions for use in feeding tubes, transmission via nasogastric tube, or enema. If live virus, bacteriophage or bacteria are used, the carrier should preferably contain an ingredient that promotes viability of the virus/bacteriophage/bacteria during storage.
  • the formulation can include added ingredients to improve palatability, improve shelf-life, impart nutritional benefits, and the like. If a reproducible and measured dose is desired, the formulation can be administered by a rumen cannula.
  • the formulation used in the methods of the disclosure further comprises a buffering agent. Examples of useful buffering agents include saline, sodium bicarbonate, milk, yogurt, infant formula, and other dairy products.
  • Bacteria-containing formulations may also comprise one or more prebiotics which promote growth and/or immunomodulatory activity of the bacteria in the formulation. While it is possible to use a compound, vector, virus, bacteriophage, particle, or a bacterial inoculant of the present disclosure for therapy as is, it may be preferable to administer it in a pharmaceutical formulation, e.g., in admixture with a suitable pharmaceutical excipient, diluent or carrier selected with regard to the intended route of administration and standard pharmaceutical practice. The excipient, diluent and/or carrier must be“acceptable” in the sense of being compatible with the other ingredients of the formulation and not deleterious to the recipient thereof.
  • Acceptable excipients, diluents, and carriers for therapeutic use are well known in the pharmaceutical art, and are described, for example, in Remington: The Science and Practice of Pharmacy. Lippincott Williams & Wilkins (A.R. Gennaro edit. 2005).
  • the choice of pharmaceutical excipient, diluent, and carrier can be selected with regard to the intended route of administration and standard pharmaceutical practice.
  • oral delivery is preferred for delivery to the digestive tract because of its ease and convenience, and because oral formulations readily accommodate additional mixtures, such as milk, yogurt, and infant formula.
  • Oral delivery may also include the use of nanoparticles that can be targeted, e.g., to the GI tract of the subject, such as those described in Yun et al., Adv Drug Deliv Rev. 2013, 65(6):822-832 (e.g., mucoadhesive nanoparticles, negatively charged carboxylate- or sulfate- modified particles, etc.).
  • Non-limiting examples of other methods of targeting delivery of compositions to the GI tract are discussed in U.S. Pat. Appl. Pub. No.
  • pH sensitive compositions, such as, e.g., enteric polymers which release their contents when the pH becomes alkaline after the enteric polymers pass through the stomach
  • compositions for delaying the release, e.g., compositions which use hydrogel as a shell or a material which coats the active substance with, e.g., in vivo degradable polymers, gradually hydrolyzable polymers, gradually water-soluble polymers, and/or enzyme degradable polymers
  • bioadhesive compositions which specifically adhere to the colonic mucosal membrane, compositions into which a protease inhibitor is incorporated, and a carrier system being specifically decomposed by an enzyme present in the colon.
  • the active ingredient(s) can be administered in solid dosage forms, such as capsules, tablets, and powders, or in liquid dosage forms, such as elixirs, syrups, and suspensions.
  • a capsule typically comprises a core material comprising a bacterial composition and a shell wall that encapsulates the core material.
  • the core material comprises at least one of a solid, a liquid, and an emulsion.
  • the shell wall material comprises at least one of a soft gelatin, a hard gelatin, and a polymer.
  • Suitable polymers include, but are not limited to: cellulosic polymers such as hydroxypropyl cellulose, hydroxyethyl cellulose, hydroxypropyl methyl cellulose (HPMC), methyl cellulose, ethyl cellulose, cellulose acetate, cellulose acetate phthalate, cellulose acetate trimellitate,
  • acrylic acid polymers and copolymers such as those formed from acrylic acid, methacrylic acid, methyl acrylate, ammonio methylacrylate, ethyl acrylate, methyl methacrylate and/or ethyl methacrylate (e.g., those copolymers sold under the trade name “Eudragit”); vinyl polymers and copolymers such as polyvinyl pyrrolidone, polyvinyl acetate, polyvinylacetate phthalate, vinylacetate crotonic acid copolymer, and ethylene-vinyl acetate copolymers; and shellac (purified lac).
  • at least one polymer functions as a taste-masking agent.
  • the active component(s) can be encapsulated in gelatin capsules together with inactive ingredients and powdered carriers, such as glucose, lactose, sucrose, mannitol, starch, cellulose or cellulose derivatives, magnesium stearate, stearic acid, sodium saccharin, talcum, and magnesium carbonate.
  • additional inactive ingredients that may be added to provide desirable color, taste, stability, buffering capacity, dispersion or other known desirable features are red iron oxide, silica gel, sodium lauryl sulfate, titanium dioxide, and edible white ink.
  • Similar diluents can be used to make compressed tablets. Both tablets and capsules can be manufactured as sustained release products to provide for continuous release of medication over a period of hours. Compressed tablets can be
  • Formulations suitable for parenteral administration include aqueous and nonaqueous, isotonic sterile injection solutions, which can contain antioxidants, buffers, bacteriostats, and solutes that render the formulation isotonic with the blood of the intended recipient, and aqueous and nonaqueous sterile suspensions that can include suspending agents, solubilizers, thickening agents, stabilizers, and preservatives.
  • powders or granules embodying the bacterial and viral compositions disclosed herein can be incorporated into a food product.
  • the food product is a drink for oral administration.
  • suitable drinks include fruit juice, a fruit drink, an artificially flavored drink, an artificially sweetened drink, a carbonated beverage, a sports drink, a liquid dairy product, a shake, an alcoholic beverage, a caffeinated beverage, infant formula and so forth.
  • suitable means for oral administration include aqueous and nonaqueous solutions, emulsions, suspensions and solutions and/or suspensions reconstituted from non-effervescent granules, containing at least one of suitable solvents, preservatives, emulsifying agents, suspending agents, diluents, sweeteners, coloring agents, and flavoring agents.
  • the food product can be a solid foodstuff. Suitable examples of a solid foodstuff include without limitation a food bar, a snack bar, a cookie, a brownie, a muffin, a cracker, an ice cream bar, a frozen yogurt bar, and the like.
  • the bacterial and viral compositions disclosed herein are incorporated into a therapeutic food.
  • the therapeutic food is a ready-to-use food that optionally contains some or all essential macronutrients and micronutrients.
  • the compositions disclosed herein are incorporated into a supplementary food that is designed to be blended into an existing meal.
  • the supplemental food contains some or all essential macronutrients and micronutrients.
  • the bacterial compositions disclosed herein are blended with or added to an existing food to fortify the food's protein nutrition. Examples include food staples (grain, salt, sugar, cooking oil, margarine), beverages (coffee, tea, soda, beer, liquor, sports drinks), snacks, sweets and other foods.
  • compositions and formulations of the disclosure will vary widely, depending upon the nature of the disease, the patient’s medical history, the frequency of administration, the manner of administration, the clearance of the agent from the host, and the like.
  • the initial dose may be larger, followed by smaller maintenance doses.
  • the dose may be administered as infrequently as weekly or biweekly, or fractionated into smaller doses and administered daily, semi-weekly, etc., to maintain an effective dosage level.
  • a method for identifying a plurality of viral marker clusters for determining the presence of inflammatory bowel disease (IBD) using viral genome sequences comprising: obtaining a first dataset representing a first plurality of viral genome sequences derived from gastrointestinal (GI) microbiota samples of a healthy cohort;
  • creating a first plurality of viral clusters by using protein clustering to group like proteins derived from the first dataset and by using protein homology to group viral genome sequences of the first dataset, each viral cluster in the first plurality of viral clusters comprising one or more viral genome sequences derived from the healthy cohort; creating a second plurality of viral clusters by using protein clustering to group like proteins derived from the second dataset and by using protein homology to group viral genome sequences of the second dataset, each viral cluster in the second plurality of viral clusters comprising one or more viral genome sequences derived from the cohort diagnosed with IBD; and
  • identifying a plurality of marker clusters by comparing the first plurality of viral clusters to the second plurality of viral clusters.
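The two-level grouping described in this method (proteins grouped into protein clusters, then genomes grouped by the protein clusters they share) can be sketched in simplified form. The sketch below assumes each genome has already been reduced to a set of protein-cluster identifiers, links two genomes when the Jaccard similarity of their sets reaches a threshold, and reports connected components as viral clusters. The Jaccard rule, the 0.5 threshold, and all identifiers are illustrative assumptions, not the procedure used in the disclosure.

```python
from itertools import combinations


def jaccard(a, b):
    """Shared fraction of two sets of protein-cluster identifiers."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def viral_clusters(genome_proteins, min_similarity=0.5):
    """Group genomes whose protein-cluster sets overlap; returns a list of sets."""
    parent = {g: g for g in genome_proteins}

    def find(g):
        # Union-find lookup with path halving.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for g1, g2 in combinations(genome_proteins, 2):
        if jaccard(genome_proteins[g1], genome_proteins[g2]) >= min_similarity:
            parent[find(g1)] = find(g2)

    groups = {}
    for g in genome_proteins:
        groups.setdefault(find(g), set()).add(g)
    return sorted(groups.values(), key=lambda members: sorted(members))


# Hypothetical genomes described by the protein clusters they encode.
genomes = {
    "g1": {"pc1", "pc2", "pc3"},
    "g2": {"pc2", "pc3", "pc4"},
    "g3": {"pc9"},
}
clusters = viral_clusters(genomes)
```

Here `g1` and `g2` share two of four distinct protein clusters and fall into one viral cluster, while `g3` forms its own. Running the same grouping on the healthy and IBD datasets would give the two pluralities of viral clusters that the final step compares.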
  • step of identifying the plurality of marker clusters by comparing the first plurality of viral clusters to the second plurality of viral clusters comprises using machine learning to identify the plurality of marker clusters.
  • step of identifying the plurality of marker clusters by comparing the first plurality of viral clusters to the second plurality of viral clusters comprises identifying the plurality of marker clusters unassociated with a known taxon.
  • each of the viral clusters in the plurality of marker clusters respectively represent an unidentified taxon of higher rank than a strain and of lower rank than a family.
  • step of identifying the plurality of marker clusters by comparing the first plurality of viral clusters to the second plurality of viral clusters comprises performing beta diversity analysis on the first plurality of viral clusters and the second plurality of viral clusters.
  • performing the beta diversity analysis comprises performing a scaling and ordination technique selected from a group consisting of principal coordinates analysis (PCoA), principal components analysis (PCA), non-metric multidimensional scaling (NMDS), canonical correspondence analysis (CCA), and redundancy analysis (RDA).
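Distance-based ordination methods such as PCoA and NMDS start from a matrix of pairwise dissimilarities between samples. As a minimal sketch of that first step, the code below computes Bray-Curtis dissimilarity between per-sample viral cluster abundance vectors. Bray-Curtis is chosen here only as a common example; the passage above does not name a specific distance measure, and the sample values are hypothetical.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative abundance vectors."""
    if len(u) != len(v):
        raise ValueError("abundance vectors must have the same length")
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator if denominator else 0.0


def dissimilarity_matrix(samples):
    """Square matrix of pairwise dissimilarities, one row per sample."""
    return [[bray_curtis(a, b) for b in samples] for a in samples]


# Rows are samples, columns are viral clusters (hypothetical read counts).
samples = [[3, 1], [1, 1], [0, 4]]
matrix = dissimilarity_matrix(samples)
```

The resulting matrix is symmetric with zeros on the diagonal and can be passed to an ordination routine from a numerical library.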
  • step of identifying the plurality of marker clusters by comparing the first plurality of viral clusters to the second plurality of viral clusters comprises calculating differential abundance of viral clusters in the first plurality of viral clusters and the second plurality of viral clusters.
  • the first dataset further represents a first plurality of identified viral genome sequences derived from the healthy cohort
  • the second dataset further represents a second plurality of identified viral genome sequences derived from the cohort diagnosed with IBD
  • step of identifying the plurality of marker clusters by comparing the first plurality of viral clusters to the second plurality of viral clusters further comprises identifying the plurality of marker clusters by comparing a combination of the first plurality of viral clusters and the first plurality of reference viral clusters to a combination of the second plurality of viral clusters and the second plurality of reference viral clusters.
  • the first plurality of identified viral genome sequences are associated with a viral taxonomic category present in a viral genome database
  • a method for determining the presence of inflammatory bowel disease (IBD) in a subject comprising:
  • each viral cluster in the plurality of subject viral clusters comprising one or more viral genome sequences derived from the subject;
  • the marker clusters comprise one or more viral clusters from taxa Siphoviridae, Myoviridae, Podoviridae, CrAss-like, or Microviridae.
  • the plurality of marker clusters comprises one or more viral clusters selected from vc2, vc6, vc7, vc13, vc14, vc15, vc17, vc19, vc21, vc22, vc23, vc24, vc25, vc28, vc29, vc36, vc37, vc38, vc39, vc40, vc42, vc45, vc48, vc53, vc55, vc56, vc58, vc60, vc61, vc62, vc64, vc66, vc68, vc69, vc70, vc71, vc73, vc74, vc75, vc76, vc77, vc78, vc79, vc80, vc82, vc84
  • bacterial taxa associated with IBD comprise one or more bacterial genera selected from Clostridium XIVa, Blautia, Veillonella, Clostridium sensu stricto, Megasphaera, Fusobacterium, Flavonifractor, Catenibacterium, Ruminococcus, Coprococcus, Methanobrevibacter, Clostridium IV, Faecalibacterium, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Howardella, Bifidobacterium, Oscillibacter, Parabacteroides, Dorea, Roseburia, Odoribacter, and Akkermansia or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said one or more bacterial genera.
  • GI microbiota sample is a fecal sample, a cecal sample, an ileal sample, or a colonic microbiota sample.
  • kits for determining the presence of inflammatory bowel disease (IBD) in a subject comprising:
  • each viral cluster in the plurality of viral clusters comprising one or more unidentified viral genome sequences of the plurality of unidentified genome sequences;
  • kits of embodiment 54 or 55, wherein the GI microbiota sample is one or more of the group consisting of a fecal sample, a cecal sample, an ileal sample, and a colonic microbiota sample.
  • a system comprising:
  • one or more processors
  • a memory in communication with the one or more processors and storing instructions thereon that, when executed by the one or more processors, are configured to cause the system to:
  • first plurality of viral clusters by using protein clustering to group like proteins derived from the first dataset and by using protein homology to group viral genome sequences of the first dataset, each viral cluster in the first plurality of viral clusters comprising one or more viral genome sequences derived from the healthy cohort;
  • each viral cluster in the second plurality of viral clusters comprising one or more viral genome sequences derived from the cohort diagnosed with IBD;
  • a method for preventing and/or treating inflammatory bowel disease (IBD) in a subject in need thereof comprising administering to the subject an effective amount of a virus from a viral cluster selected from vc6, vc7, vc19, vc23, vc24, vc25, vc29, vc37, vc38, vc39, vc42, vc45, vc55, vc56, vc58, vc60, vc61, vc62, vc64, vc75, vc76, vc82, vc84, vc86, vc89, vc92, vc96, vc98, vc99, vc101, vc103, vc104, vc109, vc112, vc124, vc136, vc138, vc143, vc154, vc190, v
  • a method for preventing and/or treating IBD in a subject in need thereof comprising administering to the subject an effective amount of a virus from a viral cluster selected from vc10, vc23, and vc39.
  • a method for preventing and/or treating Crohn's disease (CD) in a subject in need thereof comprising administering to the subject an effective amount of a virus from a viral cluster selected from vc6, vc7, vc19, vc25, vc29, vc37, vc42, vc45, vc56, vc58, vc60, vc61, vc64, vc82, vc86, vc89, vc92, vc99, vc104, vc109, vc124, vc136, vc154, vc190, and vc284.
  • a method for preventing and/or treating CD in a subject in need thereof comprising administering to the subject an effective amount of a virus from a viral cluster selected from vc10, vc23, and vc39.
  • a method for preventing and/or treating ulcerative colitis (UC) in a subject in need thereof comprising administering to the subject an effective amount of a virus from a viral cluster vc98 and/or vc103.
  • a method for preventing and/or treating CD in a subject in need thereof comprising administering to the subject an effective amount of a virus from a viral cluster selected from vc10, vc23, and vc39.
  • invention 65 further comprising administering to the subject an effective amount of a probiotic or a prebiotic composition or a combination thereof, wherein said composition(s) stimulates growth and/or activity in the GI microbiota of the subject of the bacterial genus Akkermansia or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said bacterial genus.
  • a method for preventing and/or treating IBD in a subject in need thereof comprising administering to the subject an effective amount of a probiotic or a prebiotic composition or a combination thereof, wherein said composition(s) stimulates growth and/or activity of one or more bacterial genera selected from Catenibacterium, Ruminococcus, Coprococcus, Methanobrevibacter, Clostridium IV, Faecalibacterium, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Howardella, Bifidobacterium, Oscillibacter, Parabacteroides, Flavonifractor, Blautia, Dorea, Roseburia, Odoribacter, Catenibacterium, and Akkermansia or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said one or more bacterial genera.
  • a method for preventing and/or treating CD in a subject in need thereof comprising administering to the subject an effective amount of a probiotic or a prebiotic composition or a combination thereof, wherein said composition(s) stimulates growth and/or activity of one or more bacterial genera selected from Ruminococcus, Methanobrevibacter, Clostridium IV, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Bifidobacterium, Oscillibacter, Flavonifractor, Dorea, Roseburia, and Odoribacter or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said one or more bacterial genera.
  • a method for preventing and/or treating UC in a subject in need thereof comprising administering to the subject an effective amount of a probiotic or a prebiotic composition or a combination thereof, wherein said composition(s) stimulates growth and/or activity of the genus Akkermansia or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said bacterial genus.
  • probiotic composition comprises one or more bacterial strains from the genus selected from Catenibacterium, Ruminococcus, Coprococcus, Methanobrevibacter, Clostridium IV, Faecalibacterium, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Howardella, Bifidobacterium, Oscillibacter, Parabacteroides, Flavonifractor, Blautia, Dorea, Roseburia, Odoribacter, Catenibacterium, and Akkermansia.
  • probiotic composition comprises one or more bacterial strains from the genus selected from Ruminococcus, Methanobrevibacter, Clostridium IV, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Bifidobacterium, Oscillibacter, Flavonifractor, Dorea, Roseburia, and Odoribacter.
  • probiotic composition comprises one or more bacterial strains from the genus Akkermansia.
  • a method for preventing and/or treating UC in a subject in need thereof comprising administering to the subject an effective amount of a probiotic or a prebiotic composition or a combination thereof, wherein said composition(s) stimulates growth and/or activity of Faecalibacterium prausnitzii, Dorea longicatena or Coprococcus comes.
  • a method for preventing and/or treating UC in a subject in need thereof comprising administering to the subject an effective amount of a probiotic comprising one or more of Faecalibacterium prausnitzii, Dorea longicatena or Coprococcus comes.
  • the described approach provides insight into the viral dark matter in human health and disease.
  • the methods also allow cohort comparisons and overcome problems associated with the high level of inter-individual variation.
  • this approach provides a framework for identifying novel virome biomarkers and targets for further wet-lab
  • vOTUs (viral operational taxonomic units): viral contigs made non-redundant at 90% identity over 90% of the length
  • vConTACT: an iVirus tool to classify double-stranded DNA viruses that infect Archaea and Bacteria. PeerJ, 5, e3243
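As an illustration of making contigs non-redundant at 90% identity over 90% of the length, the sketch below applies a greedy rule to precomputed pairwise alignment results: contigs are visited from longest to shortest, and a contig is dropped when it aligns to an already retained representative at or above both thresholds. Measuring coverage against the shorter contig, the greedy ordering, and the input format are assumptions made for this example; the source does not state how the dereplication was implemented.

```python
def dereplicate(lengths, hits, min_identity=0.90, min_coverage=0.90):
    """Return representative contigs.

    lengths: {contig_id: length in bases}
    hits: {(contig_a, contig_b): (fractional identity, aligned bases)}
    """
    def redundant(a, b):
        hit = hits.get((a, b)) or hits.get((b, a))
        if hit is None:
            return False
        identity, aligned_bases = hit
        shorter = min(lengths[a], lengths[b])
        # Assumption: coverage is measured relative to the shorter contig.
        return identity >= min_identity and aligned_bases / shorter >= min_coverage

    representatives = []
    # Longest contigs first; ties broken by identifier for reproducibility.
    for contig in sorted(lengths, key=lambda c: (-lengths[c], c)):
        if not any(redundant(contig, rep) for rep in representatives):
            representatives.append(contig)
    return representatives


# Hypothetical contigs and alignment results.
lengths = {"c1": 10000, "c2": 9500, "c3": 4000}
hits = {("c1", "c2"): (0.95, 9200), ("c1", "c3"): (0.99, 2000)}
votus = dereplicate(lengths, hits)
```

In this toy input, `c2` is merged into `c1` (95% identity over about 97% of its length), while `c3` is retained because only half of it aligns.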
  • the method can allow for improved comparisons of cohorts for a multitude of disease conditions and body sites by enabling the analysis of the whole virome with a reduction of the level of uniqueness at strain level; differently abundant VCs across cohorts can themselves be viral marker clusters or can lead to the identification of viral marker clusters that can be used to classify individual subject samples as indicative of health or disease.
  • Gastroenterology, 139, 1844-1854.e1 which correlated towards the shift in beta diversity as previously found in subjects with IBD.
  • the trends observed in both alpha and beta diversity are also in agreement with previous reports (Halfvarson et al., 2017. Nat Microbiol, 2, 17004; Manichanh et al., 2006. Gut, 55, 205-11; Dicksved et al., 2008. ISME J, 2, 716-27; Pascal et al., 2017. Gut, 66, 813-822) and the previous analysis of this dataset (Norman et al., 2015. Cell, 160, 447-60), thus providing validity to the cohort tested and the methods.
  • a common trend observed in the present virome analysis is the increased severity in the virome alteration of CD patients in comparison to UC, a finding replicated in the 16S.
  • CD is located further from the healthy controls while subjects with CD are also the least stable cohort.
  • Subjects with CD also had more differentially abundant RSVs than UC versus controls in the bacteriome, together with being located furthest from controls on the PCoA.
  • This CD cohort had the least beta stability which may also be linked to having the lowest diversity.
  • CD had a significantly higher diversity of Caudovirales and an increased number of reads aligned to lysogenic VCs when compared to healthy controls.
  • Microbiome, 6, 119 can be undertaken. More modern methods such as the accel-NGS prep kit may remove the need for amplification and may provide for more reliable indication of diversity (Roux et al, 2016. Towards quantitative viromics for both double-stranded and single-stranded DNA viruses. PeerJ, 4, e2777).
  • This study provides a detailed analysis of whole virome composition comparing CD/UC and healthy controls to date. It also represents a detailed study of the unidentified majority of the virome in human disease and provides insights, paving the way for better understanding of the human virome as a whole. This analysis shows that analysis of the dark matter can be used to detect accurate profiles of the human gut virome. Although it is not yet possible to conclude if the bacteriome shapes the virome or vice-versa, they do correlate with each other, as shown by Procrustes analysis, and can assist in the classification of subjects with IBD from healthy controls. This analysis provides a method for the comparison of whole viromes across cohorts in diseases other than IBD, which will give further insights into how a fuller understanding of the role of the microbiome in health and disease can be beneficial.
  • a publicly available dataset which was generated on human gut virome composition associated with IBD was utilized.
  • the dataset was analyzed with a novel whole-virome analysis protocol that provided novel insights into compositional changes of the virome, and any potential role of such changes in IBD.
  • the dataset (Norman dataset) comprised 165 virome samples from 130 subjects, more specifically 61 healthy controls, 27 subjects with Crohn’s disease (CD), and 42 subjects with ulcerative colitis (UC). Of these, six samples were known to be collected during CD flare, eight in CD remission, 13 in UC flare, and 20 in UC remission.
  • a second dataset (Simponi dataset) was generated that consisted of longitudinal samples from 40 subjects with UC.
  • Protein-based clustering can overcome virome individuality and allow cohort comparisons
  • DeSeq2 analysis revealed a number of classifiable VCs, including two crAss-like phages and two Microviridae, at significantly increased abundances in healthy controls compared to CD (Figure 2C) and UC (Figure 2D). vc19 and vc320 (crAss-like phages) were absent from all subjects with CD, and only vc320 was present in one subject with UC, but other clusters classified as crAss-like phages were present. Conversely, VCs classified as Siphoviridae (nine for CD, eight for UC) and Myoviridae (one for CD, two for UC) were increased in CD and UC versus controls.
  • the bacteriome also differs between patients with IBD and controls
  • control bacteriome contained the largest variation amongst samples with CD having the smallest distances between points ( Figures 7C-7D).
  • In quadrant 3 (bottom left), one Myoviridae and one unclassified VC were significantly correlated towards subjects with IBD.
  • VCs classed as Microviridae and crAss-like phages were significantly correlated towards the healthy controls (quadrant 4, bottom right), while there were also two unclassified VCs.
  • RSVs were significantly correlated including Ruminococcus gnavus and Flavonifractor plautii (Table 6).
  • RSVs towards the shift in UC remission included Faecalibacterium prausnitzii, Dorea longicatena and Coprococcus comes.
  • An RSV classified as Ruminococcus gnavus was the only RSV which correlated towards UC flare.
  • the virome and 16S were correlated using Procrustes analysis and there was a significant positive correlation, in agreement with previous results, with an observed correlation coefficient of 0.906 (p-value of 0.001) ( Figure 12).
  • Virome composition aids the classification between Health and Disease
  • ROC curve analysis was performed as a second measure of accuracy of each model (Figure 6D).
  • the AUC (area under the curve) of the virome alone was 78.31%, a decrease compared to 16S AUC which yielded an AUC of 89.72%.
  • the virome and 16S combined had the largest AUC with 94.79%, predicting all 16 patients with IBD as IBD and only misclassifying five controls as IBD.
  • vc23 although unclassified, contained CRISPR protospacers to Parabacteroides, while vc39, also unclassified, had hits to undefined Lachnospiraceae.
  • CIO a crAss-like phage, did not feature any CRISPR protospacer alignments.
  • vc13, vc15, vc17, all classified as Siphoviridae, had CRISPR protospacer hits to a number of genera of the Firmicutes, including Blautia, Coprobacillus, Peptoniphilus, Ruminococcus, Enterococcus, Lactobacillus, Streptococcus and Clostridium.
  • vc5, vc9, vc22, classified as Myoviridae, contained CRISPR protospacers to Firmicutes genera Clostridium, Coprobacillus, Enterococcus, Lactobacillus, Johnsonella, Roseburia, Ruminococcus, Veillonella and Flavonifractor along with the Proteobacteria Parasutterella (Figure 14). Finally, vc101, a Microviridae, did not have any CRISPR protospacer alignments.
  • the key VCs were shown to be effective as marker clusters for classifying individual subject GI microbiota datasets within the larger dataset as either diseased or healthy.
  • the Simponi cohort consisting of longitudinal samples from 40 subjects with UC, including 82 samples from periods of flare and 31 samples from periods of remission, was processed and analyzed.
  • the processing included extraction of fecal VLP DNA, library preparation and sequencing.
  • Processing also included extraction of fecal DNA, library preparation and 16S sequencing. Q33 spiking was also performed.
  • Genome Res, 27, 824-834) was utilized to assemble the reads into contigs per sample accurately (Sutton et al, 2019. Microbiome, 7, 12) which were subsequently pooled and retained if longer than 1 kb. Redundancy was removed with 90% identity over 90% of the length (of the shorter) retaining the longest contig in each case. Bacterial contamination was removed by using an extensive set of inclusion criteria to select viral sequences only. Briefly, contigs were required to be: 1) VirSorter (Roux et al, 2015a. VirSorter: mining viral signal from microbial genomic data.
  • Protein sequences were predicted using Prodigal (Hyatt et al., 2010. BMC
  • vConTACT an iVirus tool to classify double-stranded DNA viruses that infect Archaea and Bacteria. PeerJ, 5, e3243) using a pc-inflation and vc-inflation of 1.5 with all other parameters set to default. This resulted in 472 viral clusters of >2 members and 2,382 singletons, hereby referred to as a viral cluster (VC) with one member.
  • a cluster count table was generated by summing all the counts from the previous table in each cluster.
  • Taxonomic classification was assigned to a cluster using vContact2 and a custom database of viral genomes formed from the concatenation of the taxonomically classified portion of the NCBI's Viral RefSeq (v.89) and the JGI's IMG-VR (downloaded 9 January 2019).
  • the resulting clusters were classified to family level based on the presence of reference genomes within. Clusters containing genomes from multiple families were termed "heterogeneous", and may arise from disagreement between protein-based phylogeny and current taxonomic classification, discussed further by Bolduc et al.
  • CRISPR protospacers were predicted from the human microbiome project bacterial reference genomes (VERSION/REF) using PILER-CR (Edgar, 2007).
  • a VC was deemed lysogenic if it contained VLS with alignments to PVOGs featuring annotated integrase genes or site specific recombinase genes.
  • DeSeq2 refers to software for estimating variance-mean dependence in count data from high-throughput sequencing assays, and for testing differential expression based on a model using the negative binomial distribution. See Love MI, Huber W, Anders S (2014). “Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2.” Genome Biology, 15, 550.
  • “lfcSE” refers to the log fold change standard error calculation performed by DeSeq2.
  • The “p-value” ranges from zero to one and indicates the probability of observing values at least this extreme under a given null (H0) hypothesis.
  • The “padj” value is the p-value adjusted for multiple testing using the Benjamini-Hochberg method.
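
The Benjamini-Hochberg adjustment named in the last definition can be illustrated with a minimal sketch. This is plain Python written for illustration only; the function name `benjamini_hochberg` and the example p-values are assumptions, and DeSeq2 itself applies additional steps (such as independent filtering) that are not reproduced here.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = pvalues[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four viral clusters.
pvals = [0.01, 0.04, 0.03, 0.20]
padj = benjamini_hochberg(pvals)
print(padj)
```

Each raw p-value is scaled by the number of tests divided by its rank, and the running minimum keeps a smaller raw p-value from receiving a larger adjusted value than the one ranked above it.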

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Microbiology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Mycology (AREA)
  • Medicinal Chemistry (AREA)
  • Engineering & Computer Science (AREA)
  • Immunology (AREA)
  • Public Health (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Epidemiology (AREA)
  • Animal Behavior & Ethology (AREA)
  • Veterinary Medicine (AREA)
  • Zoology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Organic Chemistry (AREA)
  • Urology & Nephrology (AREA)
  • Analytical Chemistry (AREA)
  • Physics & Mathematics (AREA)
  • Wood Science & Technology (AREA)
  • Biotechnology (AREA)
  • Hematology (AREA)
  • Biochemistry (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Genetics & Genomics (AREA)
  • General Engineering & Computer Science (AREA)
  • Cell Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Virology (AREA)
  • Food Science & Technology (AREA)
  • General Physics & Mathematics (AREA)
  • Pathology (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

A method is described of analyzing the microbiome, including the virome, of a patient. Viral marker clusters for diagnosing inflammatory bowel disease, Crohn's disease and ulcerative colitis are identified from such analysis. Methods of diagnosis and treatment of dysbiosis and various disorders, such as inflammatory bowel disease, Crohn's disease and ulcerative colitis, are also included.

Description

MATERIALS AND METHODS FOR ASSESSING VIROME AND MICROBIOME MATTER
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] This application claims priority to United States Provisional Application Serial Number 62/861,807, filed on 14 June 2019, United States Provisional Application Serial Number 62/861,818, filed on 14 June 2019, United States Provisional Application Serial Number 62/861,776, filed on 14 June 2019, and United States Provisional Application Serial Number 62/861,746, filed on 14 June 2019. The disclosure of each of the aforementioned applications is incorporated herein by reference in its entirety.
FIELD OF THE INVENTION
[0002] The present invention, in some aspects, relates to a method of analyzing the microbiome, e.g., virome, of a patient. The present invention, in some aspects, also relates to methods of diagnosing and treating dysbiosis of the microbiome, and various disorders that include infections and inflammatory disorders.
BACKGROUND
[0003] The virome is likely to be one of the major forces shaping the human gut microbiome, but is perhaps its least understood component. The virome is dominated by phages, such as bacteriophages, which play vital roles in many microbial communities by driving diversity, facilitating nutrient turnover (Weitz et al., 2015. ISME J, 9, 1352-64) and facilitating horizontal gene transfer (Canchaya et al, 2003. Current opinion in microbiology, 6, 417-424). High throughput sequencing has revealed the enormous diversity of the viral fraction of microbial ecosystems. Understanding the role of bacteriophages in microbial community structure can provide for more understanding and/or control of the alterations in human gut microbiome composition and diversity associated with many diseases, including Inflammatory Bowel Disease (IBD) (Gevers et al, 2014. Cell Host Microbe, 15, 382-392; Halfvarson et al., 2017. Nat Microbiol, 2, 17004), obesity (Le Chatelier et al., 2013. Nature, 500, 541-6) and diabetes (Forslund et al., 2015. Nature, 528, 262-266).
[0004] Many gut bacteria (and potential phage hosts) remain difficult to culture (Forster et al., 2019. Nat Biotechnol, 37, 186-192). This places a heavy reliance on metagenomic sequencing and bioinformatic approaches. However, a lack of universal marker genes (similar to 16S rRNA for the bacteriome) and a subsequent lack of taxonomic information due to poorly populated databases (Krishnamurthy and Wang, 2017. Virus Res, 239, 136-142) means that database-independent analysis of the virome must be carried out at the level of metagenomic assembly or individual viral genome. Early sequencing studies using 454 technology first described the novelty and diversity of the human gut virome (Minot et al., 2011. Genome Res, 21, 1616-25), but were only able to identify 2% of reads and, with limits in sequencing depth, the true diversity and composition was not revealed. Improvements in sequencing technologies have allowed the virome to be analyzed in unprecedented detail with studies sequencing up to 50 million reads per sample (Zuo et al, 2019. Gut mucosal virome alterations in ulcerative colitis. Gut) and have confirmed that the virome is incredibly diverse, that the majority of sequences do not align to known sequences in databases (viral dark matter) (Roux et al, 2015b. Viral dark matter and virus-host interactions resolved from publicly available microbial genomes. Elife, 4), and that composition is highly unique to individuals (Reyes et al, 2010, Nature, 466, 334-8).
[0005] Inflammatory Bowel Disease, including Crohn’s disease (CD) and ulcerative colitis (UC), is a chronic disorder of the intestinal tract resulting in periods of flare (active) and remission (inactive) disease. IBD has been associated with alterations in the human gut microbiome which include decreased diversity and reduced abundance of the Firmicutes and Bacteroides. There is tentative evidence that the gut virome plays a role in IBD (Norman et al., 2015. Cell, 160, 447-60; Zuo et al., 2019. Gut; Fernandes et al., 2019. J Pediatr Gastroenterol Nutr, 68, 30-36) where IBD is associated with a decreased overall virome diversity and abundance and an increased abundance of the order Caudovirales. Because only a small fraction of the gut virome is classified at the family level, let alone classified genomically, nearly all of this research has been conducted on a fraction of the virome, with a current benchmark study using about 15% of the data (Norman et al., 2015. Cell, 160, 447-60). This hampers the identification of virome disease biomarkers and means that any link between virome, bacteriome and disease status remains elusive.
[0006] Analysis of the whole gut virome using metagenomic assembly is also challenging. At this level of resolution, the virome exhibits enormous diversity and interpersonal variation, obscuring patterns in the virome across individuals and cohorts.
[0007] Accordingly, there is a need for improved systems and methods to identify markers for IBD in the gut virome.
SUMMARY OF THE INVENTION
[0008] In one aspect, a method is provided for identifying a plurality of viral marker clusters for determining the presence of inflammatory bowel disease (IBD) using viral genome sequences, the method comprising:
obtaining a first dataset representing a first plurality of viral genome sequences derived from gastrointestinal (GI) microbiota samples of a healthy cohort;
obtaining a second dataset representing a second plurality of viral genome sequences derived from GI microbiota samples of a cohort diagnosed with IBD;
creating a first plurality of viral clusters by using protein clustering to group like proteins derived from the first dataset and by using protein homology to group viral genome sequences of the first dataset, each viral cluster in the first plurality of viral clusters comprising one or more viral genome sequences derived from the healthy cohort;
creating a second plurality of viral clusters by using protein clustering to group like proteins derived from the second dataset and by using protein homology to group viral genome sequences of the second dataset, each viral cluster in the second plurality of viral clusters comprising one or more viral genome sequences derived from the cohort diagnosed with IBD; and
identifying a plurality of marker clusters by comparing the first plurality of viral clusters to the second plurality of viral clusters.
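
The clustering steps above group viral genome sequences by the proteins they share. A toy sketch of that idea follows, assuming each contig has already been reduced to a set of protein-cluster identifiers. It uses a simple Jaccard threshold with single linkage, which is a stand-in chosen for illustration and not the method of any named tool; the function name `group_contigs`, the threshold `min_jaccard`, and all contig and protein-cluster names are hypothetical.

```python
from itertools import combinations


def group_contigs(contig_proteins, min_jaccard=0.5):
    """Group contigs whose protein-cluster sets overlap (single linkage).

    contig_proteins maps a contig name to the set of protein-cluster IDs
    found on it. min_jaccard is an illustrative threshold, not a value
    taken from the text.
    """
    parent = {name: name for name in contig_proteins}

    def find(x):
        # Follow parent links to the representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(contig_proteins, 2):
        pa, pb = contig_proteins[a], contig_proteins[b]
        union = pa | pb
        if union and len(pa & pb) / len(union) >= min_jaccard:
            parent[find(a)] = find(b)  # merge the two groups

    clusters = {}
    for name in contig_proteins:
        clusters.setdefault(find(name), set()).add(name)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical contigs described by the protein clusters they encode.
contigs = {
    "contig_A": {"pc1", "pc2", "pc3"},
    "contig_B": {"pc1", "pc2", "pc4"},
    "contig_C": {"pc7", "pc8"},
}
viral_clusters = group_contigs(contigs)
print(viral_clusters)
```

Contigs A and B share two of four distinct protein clusters and fall into one viral cluster, while contig C remains a singleton. Running the same grouping on each cohort's dataset yields the two pluralities of viral clusters that the final step compares.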
[0009] In some embodiments, at least a portion of the first plurality of viral genome sequences are unassociated with a viral taxonomic category derived from a viral genome database, and at least a portion of the second plurality of viral genome sequences are unassociated with a viral taxonomic category derived from a viral genome database.
[0010] In some embodiments, a totality of the first plurality and second plurality of viral genome sequences are each unassociated with a viral taxonomic category derived from a viral genome database.
[0011] In some embodiments, the step of identifying the plurality of marker clusters by comparing the first plurality of viral clusters to the second plurality of viral clusters comprises using machine learning to identify the plurality of marker clusters.
[0012] In some embodiments, the step of identifying the plurality of marker clusters by comparing the first plurality of viral clusters to the second plurality of viral clusters comprises identifying the plurality of marker clusters unassociated with a known taxon.
[0013] In some embodiments, each of the viral clusters in the plurality of marker clusters respectively represent an unidentified taxon of higher rank than a strain and of lower rank than a family.
[0014] In some embodiments, the step of identifying the plurality of marker clusters by comparing the first plurality of viral clusters to the second plurality of viral clusters comprises performing beta diversity analysis on the first plurality of viral clusters and the second plurality of viral clusters.
[0015] In some embodiments, performing the beta diversity analysis comprises performing a scaling and ordination technique selected from a group consisting of principal coordinates analysis (PCoA), principal components analysis (PCA), non-metric multidimensional scaling (NMDS), canonical correspondence analysis (CCA), and redundancy analysis (RDA).
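
Ordination techniques such as PCoA and NMDS start from a matrix of pairwise dissimilarities between samples. The sketch below computes such a matrix from per-sample viral cluster counts. The Bray-Curtis measure is an assumption chosen for illustration, since this passage does not name a dissimilarity, and the sample names and counts are hypothetical.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative count vectors."""
    total = sum(u) + sum(v)
    if total == 0:
        return 0.0  # two empty samples are treated as identical
    return sum(abs(a - b) for a, b in zip(u, v)) / total


def distance_matrix(samples):
    """Return sample names (sorted) and their pairwise dissimilarity matrix."""
    names = sorted(samples)
    matrix = [[bray_curtis(samples[i], samples[j]) for j in names] for i in names]
    return names, matrix


# Hypothetical counts for three viral clusters in three samples.
samples = {
    "healthy_1": [10, 0, 5],
    "healthy_2": [8, 2, 5],
    "ibd_1": [0, 12, 3],
}
names, dist = distance_matrix(samples)
for name, row in zip(names, dist):
    print(name, [round(d, 3) for d in row])
```

The resulting symmetric matrix is the input an ordination routine would take; samples from the same cohort are expected to sit closer together than samples from different cohorts when cluster composition differs between cohorts.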
[0016] In some embodiments, the step of identifying the plurality of marker clusters by comparing the first plurality of viral clusters to the second plurality of viral clusters comprises calculating differential abundance of viral clusters in the first plurality of viral clusters and the second plurality of viral clusters.
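
As a rough illustration of differential abundance, the sketch below computes a log2 fold change of mean relative abundance per viral cluster between two cohorts. It is a deliberate simplification and not a negative binomial model; the function names, the `pseudocount` parameter, and the cluster names `vc_a` and `vc_b` are hypothetical.

```python
import math


def relative_abundance(counts):
    """Convert one sample's raw cluster counts to proportions."""
    total = sum(counts.values())
    return {vc: (n / total if total else 0.0) for vc, n in counts.items()}


def log2_fold_changes(healthy_samples, ibd_samples, pseudocount=1e-6):
    """log2(mean IBD abundance / mean healthy abundance) for each cluster."""
    healthy = [relative_abundance(s) for s in healthy_samples]
    ibd = [relative_abundance(s) for s in ibd_samples]
    clusters = sorted({vc for s in healthy + ibd for vc in s})
    result = {}
    for vc in clusters:
        mean_h = sum(s.get(vc, 0.0) for s in healthy) / len(healthy)
        mean_i = sum(s.get(vc, 0.0) for s in ibd) / len(ibd)
        # The pseudocount avoids division by zero for absent clusters.
        result[vc] = math.log2((mean_i + pseudocount) / (mean_h + pseudocount))
    return result


# Hypothetical read counts per viral cluster, two samples per cohort.
healthy_cohort = [{"vc_a": 80, "vc_b": 20}, {"vc_a": 60, "vc_b": 40}]
ibd_cohort = [{"vc_a": 20, "vc_b": 80}, {"vc_a": 40, "vc_b": 60}]
lfc = log2_fold_changes(healthy_cohort, ibd_cohort)
print(lfc)
```

A positive value marks a cluster that is more abundant in the IBD cohort and a negative value one that is more abundant in healthy controls. A real analysis would also need a significance test and multiple-testing correction before a cluster is treated as a marker.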
[0017] In some embodiments, the healthy cohort and the cohort diagnosed with IBD are each human cohorts.
[0018] In some embodiments, the methods described above further comprise:
associating a first data subset of the second dataset with a first sub-cohort diagnosed with IBD and Crohn's disease (CD);
associating a second data subset of the second dataset with a second sub-cohort diagnosed with IBD and ulcerative colitis (UC);
associating a first subset of viral clusters of the second plurality of viral clusters with the first sub-cohort;
associating a second subset of viral clusters of the second plurality of viral clusters with the second sub-cohort; and
identifying a first subset of marker clusters of the plurality of marker clusters and a second subset of marker clusters of the plurality of marker clusters by comparing the first subset of viral clusters to the second subset of viral clusters.
[0019] In some embodiments, the above methods further comprise:
representing the viral genome sequences in the first dataset each respectively as a first viral contig of a protein sequence; and
representing the viral genome sequences in the second dataset each respectively as a second viral contig of a protein sequence.
[0020] In some embodiments of the above methods,
the first dataset further represents a first plurality of identified viral genome sequences derived from the healthy cohort,
the second dataset further represents a second plurality of identified viral genome sequences derived from the cohort diagnosed with IBD, and
the method further comprises:
creating a first plurality of reference viral clusters using protein clustering to group like proteins and protein homology to group identified viral genome sequences of the first plurality of identified viral genome sequences;
creating a second plurality of reference viral clusters using protein clustering to group like proteins and protein homology to group identified viral genome sequences of the second plurality of identified viral genome sequences; and
wherein the step of identifying the plurality of marker clusters by comparing the first plurality of viral clusters to the second plurality of viral clusters further comprises identifying the plurality of marker clusters by comparing a combination of the first plurality of viral clusters and the first plurality of reference viral clusters to a combination of the second plurality of viral clusters and the second plurality of reference viral clusters.
[0021] In some embodiments of the above methods,
the first plurality of identified viral genome sequences are associated with a viral taxonomic category present in a viral genome database, and
the second plurality of identified viral genome sequences are associated with a viral taxonomic category present in a viral genome database.
[0022] In one aspect, a method for determining the presence of inflammatory bowel disease (IBD) in a subject is provided, the method comprising:
obtaining an individual viral dataset representing a plurality of viral genome sequences derived from a GI microbiota sample obtained from the subject;
creating a plurality of subject viral clusters using protein clustering to group like proteins derived from the individual viral dataset and by using protein homology to group unidentified viral genome sequences of the individual viral dataset, each viral cluster in the plurality of subject viral clusters comprising one or more viral genome sequences derived from the subject;
obtaining a plurality of marker clusters indicative of the presence or absence of IBD; and
comparing the plurality of subject viral clusters to the plurality of marker clusters.
[0023] In some embodiments, at least a portion of the plurality of viral genome sequences are unassociated with a viral taxonomic category derived from a viral genome database. In some embodiments, a totality of the plurality of viral genome sequences are each unassociated with a viral taxonomic category derived from a viral genome database. In some embodiments, at least a portion of the plurality of marker clusters are unassociated with a viral taxonomic category derived from a viral genome database.
[0024] In some embodiments, the above methods further comprise determining the presence of IBD in the subject based at least in part on the comparison of the plurality of subject viral clusters to the plurality of marker clusters. In some embodiments, the marker clusters comprise one or more viral clusters from taxa Siphoviridae, Myoviridae, Podoviridae, CrAss-like, or Microviridae. In some embodiments, the plurality of marker clusters comprises one or more viral clusters selected from vc2, vc6, vc7, vc13, vc14, vc15, vc17, vc19, vc21, vc22, vc23, vc24, vc25, vc28, vc29, vc36, vc37, vc38, vc39, vc40, vc42, vc45, vc48, vc53, vc55, vc56, vc58, vc60, vc61, vc62, vc64, vc66, vc68, vc69, vc70, vc71, vc73, vc74, vc75, vc76, vc77, vc78, vc79, vc80, vc82, vc84, vc85, vc86, vc88, vc89, vc91, vc92, vc94, vc95, vc96, vc97, vc98, vc99, vc101, vc102, vc103, vc104, vc108, vc109, vc112, vc113, vc115, vc117, vc118, vc122, vc123, vc124, vc130, vc132, vc136, vc138, vc142, vc143, vc152, vc154, vc155, vc160, vc161, vc175, vc178, vc181, vc190, vc193, vc205, vc209, vc216, vc218, vc225, vc232, vc263, vc264, vc281, vc284, vc298, vc320, vc411, vc413, vc420, vc456, and vc467. In some embodiments, an increased abundance of one or more viral clusters selected from vc2, vc13, vc14, vc15, vc17, vc21, vc22, vc36, vc40, vc48, vc53, vc66, vc68, vc69, vc70, vc71, vc73, vc74, vc77, vc78, vc79, vc80, vc85, vc88, vc89, vc91, vc94, vc95, vc97, vc102, vc108, vc113, vc115, vc117, vc118, vc122, vc123, vc130, vc132, vc142, vc152, vc155, vc160, vc161, vc175, vc178, vc181, vc205, vc218, vc232, vc263, vc264, vc281, vc298, vc413, and vc420 in the plurality of subject viral clusters as compared to a healthy control is indicative of the presence of IBD in the subject. In some embodiments, an increased abundance of one or more viral clusters selected from vc15, vc66, vc71, vc73, vc77, vc78, vc79, vc80, vc91, vc94, vc108, vc113, vc117, vc118, vc132, vc142, vc155, vc160, vc178, vc232, vc264, vc281, vc298, and vc420 in the plurality of subject viral clusters as compared to a healthy control is indicative of the presence of Crohn’s Disease (CD) in the subject.
[0025] In some embodiments, an increased abundance of viral cluster vc28 in the subject sample as compared to a healthy control is indicative of the presence of Crohn’s Disease (CD) in the subject. In some embodiments, an increased abundance of one or more viral clusters selected from vc2, vc17, vc21, vc22, vc53, vc70, vc74, vc85, vc88, vc89, vc115, vc122, vc123, vc130, vc152, vc161, vc175, vc181, vc205, vc218, vc263, and vc413 in the plurality of subject viral clusters as compared to a healthy control is indicative of the presence of ulcerative colitis (UC) in the subject. In some embodiments, an increased abundance of viral cluster vc2 in the subject sample as compared to a healthy control is indicative of the presence of ulcerative colitis (UC) in the subject. In some embodiments, an increased abundance of one or more viral clusters selected from vc38, vc46, vc48, vc54, vc57, vc62, vc64, vc69, vc71, vc108, vc111, vc114, vc115, vc128, vc159, vc162, vc215, vc220, vc242, vc340, vc374, and vc392 in the subject sample as compared to a patient with ulcerative colitis (UC) in remission is indicative of the presence of a flare-up of UC in the subject. In some embodiments, an increased abundance of one or more viral clusters selected from vc16, vc119, and vc163 in the subject sample as compared to a patient with a flare-up of ulcerative colitis (UC) is indicative of the presence of UC in remission in the subject.
[0026] In some embodiments, a decreased abundance of one or more viral clusters selected from vc6, vc7, vc19, vc23, vc24, vc25, vc29, vc37, vc38, vc39, vc42, vc45, vc55, vc56, vc58, vc60, vc61, vc62, vc64, vc75, vc76, vc82, vc84, vc86, vc89, vc92, vc96, vc98, vc99, vc101, vc103, vc104, vc109, vc112, vc124, vc136, vc138, vc143, vc154, vc190, vc193, vc209, vc216, vc225, vc284, vc320, vc411, vc456, and vc467 in the plurality of subject viral clusters as compared to a healthy control is indicative of the presence of IBD in the subject. In some embodiments, a decreased abundance of one or more viral clusters selected from vc6, vc7, vc19, vc25, vc29, vc37, vc42, vc45, vc56, vc58, vc60, vc61, vc64, vc82, vc86, vc89, vc92, vc99, vc104, vc109, vc124, vc136, vc154, vc190, and vc284 in the plurality of subject viral clusters as compared to a healthy control is indicative of the presence of Crohn’s Disease (CD) in the subject. In some embodiments, a decreased abundance of one or more viral clusters selected from vc7, vc25, vc47, and vc64 in the subject sample as compared to a healthy control is indicative of the presence of Crohn’s Disease (CD) in the subject. In some embodiments, a decreased abundance of vc98 and/or vc103 viral cluster in the plurality of subject viral clusters as compared to a healthy control is indicative of the presence of ulcerative colitis (UC) in the subject.
[0027] In some embodiments, obtaining the dataset(s) is performed by sequencing VLP DNA isolated from GI microbiota sample(s).
[0028] In some embodiments, the method further comprises:
obtaining an individual bacteriome dataset representing bacterial sequences derived from the GI microbiota sample obtained from the subject; and
evaluating the individual bacteriome dataset for the presence of bacterial taxa associated with IBD.
[0029] In a specific embodiment, the method further comprises determining the presence of IBD in the subject based at least in part on the comparison of the individual bacteriome dataset to at least one of a healthy control and a control diagnosed with IBD.
[0030] In some embodiments, the bacterial taxa associated with IBD comprise one or more bacterial genera selected from Clostridium XIVa, Blautia, Veillonella, Clostridium sensu stricto, Megasphaera, Fusobacterium, Flavonifractor, Catenibacterium, Ruminococcus, Coprococcus, Methanobrevibacter, Clostridium IV, Faecalibacterium, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Howardella, Bifidobacterium, Oscillibacter, Parabacteroides, Dorea, Roseburia, Odoribacter, and Akkermansia or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said one or more bacterial genera. In a specific embodiment, an increased abundance of one or more bacterial genera selected from Clostridium XIVa, Blautia, Veillonella, Clostridium sensu stricto, Megasphaera, Fusobacterium, and Flavonifractor in the subject sample as compared to a healthy control is indicative of the presence of IBD in the subject. In a specific embodiment, an increased abundance of one or more bacterial genera selected from Clostridium XIVa, Blautia, Megasphaera, and Fusobacterium in the subject sample as compared to a healthy control is indicative of the presence of Crohn’s Disease (CD) in the subject.
[0031] In some embodiments, an increased abundance of one or more bacterial species selected from Bacteroides fragilis and Ruminococcus gnavus in the subject sample as compared to a healthy control is indicative of the presence of ulcerative colitis (UC) in the subject. In some embodiments, an increased abundance of Ruminococcus gnavus in the subject sample as compared to a control sample from a patient with ulcerative colitis (UC) in remission is indicative of the presence of a flare-up of UC in the subject. In some embodiments, an increased abundance of Faecalibacterium prausnitzii, Dorea longicatena or Coprococcus comes in the subject sample as compared to a control sample from a patient with a flare-up of ulcerative colitis (UC) is indicative of the presence of UC in remission in the subject. In some embodiments, an increased abundance of bacterial genus Flavonifractor in the subject sample as compared to a healthy control is indicative of the presence of ulcerative colitis (UC) in the subject.
[0032] In some embodiments, a decreased abundance of one or more bacterial genera selected from Catenibacterium, Ruminococcus, Coprococcus, Methanobrevibacter, Clostridium IV, Faecalibacterium, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Howardella, Bifidobacterium, Oscillibacter, Parabacteroides, Flavonifractor, Blautia, Dorea, Roseburia, Odoribacter, Catenibacterium, and Akkermansia in the subject sample as compared to a healthy control is indicative of the presence of IBD in the subject. In some embodiments, a decreased abundance of one or more bacterial genera selected from Ruminococcus, Methanobrevibacter, Clostridium IV, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Bifidobacterium, Oscillibacter, Flavonifractor, Dorea, Roseburia, and Odoribacter in the subject sample as compared to a healthy control is indicative of the presence of Crohn’s Disease (CD) in the subject. In some embodiments, a decreased abundance of bacterial genus Akkermansia in the subject sample as compared to a healthy control is indicative of the presence of ulcerative colitis (UC) in the subject. In some embodiments, obtaining the individual bacteriome dataset is performed by sequencing 16S rDNA or a V region of 16S rDNA in the GI microbiota sample.
In a specific embodiment, the V region is the V4 region.
[0033] In some embodiments, the GI microbiota sample is a fecal sample. In some embodiments, the subject is human.
[0034] In various embodiments of the above methods, the method further comprises administering an IBD treatment to the subject. In various embodiments, the method further comprises administering to the subject additional diagnostic tests for IBD, CD and/or UC. In various embodiments, the method further comprises enrolling the subject in a clinical trial.
[0035] In various embodiments of the above methods, comparing the plurality of subject viral clusters to the plurality of marker clusters comprises:
identifying common clusters present in the plurality of subject viral clusters and the plurality of marker clusters;
determining relative abundance of members within each common cluster in the plurality of subject viral clusters;
associating a correlation value with each common cluster in the plurality of marker clusters; and
comparing the relative abundance of members within each common cluster in the plurality of subject viral clusters to the correlation value of each common cluster in the plurality of marker clusters.
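
The four comparison steps above can be sketched as follows. This is one possible reading, assuming each marker cluster carries a signed correlation value (positive toward IBD, negative toward healthy) and that the comparison is summarized as an abundance-weighted sum. The scoring rule, the function name `score_subject`, and the cluster names and values are all hypothetical and are not taken from the text.

```python
def score_subject(subject_counts, marker_correlations):
    """Weight each common cluster's relative abundance by its marker correlation.

    subject_counts maps cluster name to read count in the subject sample.
    marker_correlations maps marker cluster name to a signed correlation
    value (assumed here: positive = IBD-associated).
    """
    total = sum(subject_counts.values())
    # Step 1: clusters present in both the subject and the marker set.
    common = sorted(set(subject_counts) & set(marker_correlations))
    score = 0.0
    for vc in common:
        # Step 2: relative abundance of the cluster in the subject.
        rel = subject_counts[vc] / total if total else 0.0
        # Steps 3 and 4: combine abundance with the marker's correlation value.
        score += rel * marker_correlations[vc]
    return common, score


# Hypothetical marker clusters and subject profile.
markers = {"vc_x": 0.8, "vc_y": -0.6}
subject = {"vc_x": 30, "vc_y": 10, "vc_z": 60}
common, score = score_subject(subject, markers)
print(common, round(score, 3))
```

Under these assumptions a positive score leans toward the IBD-associated markers. Any decision threshold would have to be calibrated on labeled cohorts, which this sketch does not attempt.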
[0036] In one aspect a kit is provided for determining the presence of inflammatory bowel disease (IBD) in a subject, the kit comprising:
a device to:
receive a first dataset representing a plurality of unidentified viral genome sequences derived from a GI microbiota sample obtained from the subject;
receive a second dataset representing a plurality of viral genome IBD marker clusters;
create a plurality of viral clusters by using protein clustering to group like proteins derived from the first dataset and by using protein homology to group unidentified viral genome sequences of the plurality of unidentified viral genome sequences, each viral cluster in the plurality of viral clusters comprising one or more unidentified viral genome sequences of the plurality of unidentified viral genome sequences;
compare the plurality of viral clusters to the second dataset; and
determine the presence of IBD based at least in part on the comparison of the plurality of viral clusters to the second dataset.
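By way of illustration and not limitation, the grouping of viral genome sequences by protein homology can be sketched as follows. The sketch assumes that protein clustering has already been performed upstream, so that each genome sequence is represented by a set of protein-cluster identifiers. The Jaccard similarity measure, the single-linkage grouping, and the 0.5 threshold are assumptions chosen for illustration and are not stated in this disclosure.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two sets of protein-cluster IDs."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_genomes(profiles, threshold=0.5):
    """Single-linkage grouping of genomes by shared protein clusters.

    profiles: {genome_id: set of protein-cluster IDs} (hypothetical input).
    threshold: minimum Jaccard similarity to link two genomes (arbitrary).
    Returns a sorted list of viral clusters, each a sorted list of genome IDs.
    """
    parent = {g: g for g in profiles}

    def find(g):
        # Follow parent links to the representative, compressing the path.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    # Link every pair of genomes whose protein profiles are similar enough.
    for g1, g2 in combinations(profiles, 2):
        if jaccard(profiles[g1], profiles[g2]) >= threshold:
            parent[find(g1)] = find(g2)

    clusters = {}
    for g in profiles:
        clusters.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in clusters.values())


# Illustrative input only.
profiles = {
    "contig_a": {"p1", "p2", "p3"},
    "contig_b": {"p1", "p2", "p4"},
    "contig_c": {"p9"},
}
viral_clusters = group_genomes(profiles)
```

Because genomes are grouped by shared protein content rather than by name, sequences with no taxonomic identification can still be assigned to a cluster.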
[0037] In some embodiments, the device is further configured to:
receive a third dataset representing bacteria from the GI microbiota sample obtained from the subject;
evaluate the third dataset for the purpose of IBD diagnosis; and
determine the presence of IBD based at least in part on the evaluation of the third dataset.
[0038] In some embodiments of the above kits, the GI microbiota sample is one or more of the group consisting of a fecal sample, a cecal sample, an ileal sample, and a colonic microbiota sample. In some embodiments, the IBD is ulcerative colitis (UC). In some embodiments, the IBD is Crohn's disease (CD). In some embodiments, the subject is human.
[0039] In another aspect is provided a system comprising:
one or more processors;
a memory in communication with the one or more processors and storing instructions thereon that, when executed by the one or more processors, are configured to cause the system to:
receive a first dataset representing a first plurality of viral genome sequences derived from a healthy cohort;
receive a second dataset representing a second plurality of viral genome sequences derived from a cohort diagnosed with IBD;
create a first plurality of viral clusters by using protein clustering to group like proteins derived from the first dataset and by using protein homology to group viral genome sequences of the first dataset, each viral cluster in the first plurality of viral clusters comprising one or more viral genome sequences derived from the healthy cohort;
create a second plurality of viral clusters by using protein clustering to group like proteins derived from the second dataset and by using protein homology to group viral genome sequences of the second dataset, each viral cluster in the second plurality of viral clusters comprising one or more viral genome sequences derived from the cohort diagnosed with IBD; and
identify a plurality of marker clusters by comparing the first plurality of viral clusters to the second plurality of viral clusters.
[0040] In one aspect a method is provided for preventing and/or treating inflammatory bowel disease (IBD) in a subject in need thereof, said method comprising administering to the subject an effective amount of a virus from a viral cluster selected from vc6, vc7, vc19, vc23, vc24, vc25, vc29, vc37, vc38, vc39, vc42, vc45, vc55, vc56, vc58, vc60, vc61, vc62, vc64, vc75, vc76, vc82, vc84, vc86, vc89, vc92, vc96, vc98, vc99, vc101, vc103, vc104, vc109, vc112, vc124, vc136, vc138, vc143, vc154, vc190, vc193, vc209, vc216, vc225, vc284, vc320, vc411, vc456, and vc467. In some embodiments, the method further comprises administering to the subject an effective amount of a probiotic or a prebiotic composition or a combination thereof, wherein said composition(s) stimulates growth and/or activity in the GI microbiota of the subject of one or more bacterial genera selected from Catenibacterium, Ruminococcus, Coprococcus, Methanobrevibacter, Clostridium IV, Faecalibacterium, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Howardella, Bifidobacterium, Oscillibacter, Parabacteroides, Flavonifractor, Blautia, Dorea, Roseburia, Odoribacter, Catenibacterium, and Akkermansia or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said one or more bacterial genera. In a specific embodiment, the probiotic composition comprises one or more bacterial strains from the genus selected from Catenibacterium, Ruminococcus, Coprococcus, Methanobrevibacter, Clostridium IV, Faecalibacterium, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Howardella, Bifidobacterium, Oscillibacter, Parabacteroides, Flavonifractor, Blautia, Dorea, Roseburia, Odoribacter, Catenibacterium, and Akkermansia.
[0041] In another aspect a method is provided for preventing and/or treating IBD in a subject in need thereof, said method comprising administering to the subject an effective amount of a virus from a viral cluster selected from vc10, vc23, and vc39.
[0042] In another aspect is provided a method for preventing and/or treating Crohn's disease (CD) in a subject in need thereof, said method comprising administering to the subject an effective amount of a virus from a viral cluster selected from vc6, vc7, vc19, vc25, vc29, vc37, vc42, vc45, vc56, vc58, vc60, vc61, vc64, vc82, vc86, vc89, vc92, vc99, vc104, vc109, vc124, vc136, vc154, vc190, and vc284. In some embodiments, the method further comprises administering to the subject an effective amount of a probiotic or a prebiotic composition or a combination thereof, wherein said composition(s) stimulates growth and/or activity in the GI microbiota of the subject of one or more bacterial genera selected from Ruminococcus, Methanobrevibacter, Clostridium IV, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Bifidobacterium, Oscillibacter, Flavonifractor, Dorea, Roseburia, and Odoribacter or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said one or more bacterial genera. In a specific embodiment, the probiotic composition comprises one or more bacterial strains from the genus selected from Ruminococcus, Methanobrevibacter, Clostridium IV, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Bifidobacterium, Oscillibacter, Flavonifractor, Dorea, Roseburia, and Odoribacter.
[0043] In another aspect is provided a method for preventing and/or treating CD in a subject in need thereof, said method comprising administering to the subject an effective amount of a virus from a viral cluster selected from vc10, vc23, and vc39.
[0044] In another aspect is provided a method for preventing and/or treating ulcerative colitis (UC) in a subject in need thereof, said method comprising administering to the subject an effective amount of a virus from a viral cluster vc98 and/or vc103. In some embodiments, the method further comprises administering to the subject an effective amount of a probiotic or a prebiotic composition or a combination thereof, wherein said composition(s) stimulates growth and/or activity in the GI microbiota of the subject of the bacterial genus Akkermansia or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said bacterial genus. In a specific embodiment, the probiotic composition comprises one or more bacterial strains from the genus Akkermansia.
[0045] In another aspect is provided a method for preventing and/or treating UC in a subject in need thereof, said method comprising administering to the subject an effective amount of a virus from a viral cluster selected from vc10, vc23, and vc39.
[0046] In another aspect is provided a method for preventing and/or treating IBD in a subject in need thereof, said method comprising administering to the subject an effective amount of a probiotic or a prebiotic composition or a combination thereof, wherein said composition(s) stimulates growth and/or activity of one or more bacterial genera selected from Catenibacterium, Ruminococcus, Coprococcus, Methanobrevibacter, Clostridium IV, Faecalibacterium, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Howardella, Bifidobacterium, Oscillibacter, Parabacteroides, Flavonifractor, Blautia, Dorea, Roseburia, Odoribacter, Catenibacterium, and Akkermansia or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said one or more bacterial genera. In a specific embodiment, the probiotic composition comprises one or more bacterial strains from the genus selected from Catenibacterium, Ruminococcus, Coprococcus, Methanobrevibacter, Clostridium IV, Faecalibacterium, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Howardella, Bifidobacterium, Oscillibacter, Parabacteroides, Flavonifractor, Blautia, Dorea, Roseburia, Odoribacter, Catenibacterium, and Akkermansia.
[0047] In another aspect is provided a method for preventing and/or treating CD in a subject in need thereof, said method comprising administering to the subject an effective amount of a probiotic or a prebiotic composition or a combination thereof, wherein said composition(s) stimulates growth and/or activity of one or more bacterial genera selected from Ruminococcus, Methanobrevibacter, Clostridium IV, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Bifidobacterium, Oscillibacter, Flavonifractor, Dorea, Roseburia, and Odoribacter or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said one or more bacterial genera. In a specific embodiment, the probiotic composition comprises one or more bacterial strains from the genus selected from Ruminococcus, Methanobrevibacter, Clostridium IV, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Bifidobacterium, Oscillibacter, Flavonifractor, Dorea, Roseburia, and Odoribacter.
[0048] In another aspect, a method is provided for preventing and/or treating UC in a subject in need thereof, said method comprising administering to the subject an effective amount of a probiotic or a prebiotic composition or a combination thereof, wherein said composition(s) stimulates growth and/or activity of the genus Akkermansia or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said bacterial genus. In a specific embodiment, the probiotic composition comprises one or more bacterial strains from the genus Akkermansia.
[0049] In some embodiments of the above aspects, the V region is the V4 region.
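By way of illustration and not limitation, the "at least 90% sequence identity" criterion recited in the above aspects can be sketched as follows. The sketch assumes the two 16S rRNA sequences (or V regions) have already been aligned to equal length by an external tool, with gaps written as "-". The choice of denominator (all alignment columns except those where both sequences have a gap) is an assumption; other conventions for computing identity exist and this disclosure does not select one.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity between two pre-aligned sequences of equal length.

    Gaps are written as '-'. Columns where both sequences have a gap are
    ignored; every other column counts toward the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = 0
    columns = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue
        columns += 1
        if x == y:
            matches += 1
    return 100.0 * matches / columns if columns else 0.0


def is_closely_related(aligned_a, aligned_b, cutoff=90.0):
    """Apply an 'at least 90% sequence identity' style cutoff."""
    return percent_identity(aligned_a, aligned_b) >= cutoff


# Toy sequences for illustration only; real V4 regions are roughly 250 bases.
identity = percent_identity("ACGTACGTAC", "ACGTACGTAA")
```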
[0050] In another aspect is provided a method for preventing and/or treating UC in a subject in need thereof, said method comprising administering to the subject an effective amount of a probiotic or a prebiotic composition or a combination thereof, wherein said composition(s) stimulates growth and/or activity of Faecalibacterium prausnitzii, Dorea longicatena or Coprococcus comes.
[0051] In another aspect a method is provided for preventing and/or treating UC in a subject in need thereof, said method comprising administering to the subject an effective amount of a probiotic comprising one or more of Faecalibacterium prausnitzii, Dorea longicatena or Coprococcus comes.
[0052] In various embodiments of the above aspects, the subject is human.
BRIEF DESCRIPTION OF THE DRAWINGS
[0053] Figures 1A-1D demonstrate a comparison of commonality pre and post clustering of viral contigs. PCoA of Spearman distances using pre-clustering (viral contigs) (Figure 1A) and post-clustering viral cluster (VC) count tables (Figure 1B). Figure 1C shows the relative abundance of viral contigs (top) and VCs (bottom) for control subjects at varying thresholds of commonality across subjects. Figure 1D depicts the number of viral contigs/VCs shared between 30%, 50% and 70% of subjects in each cohort.
[0054] Figures 2A-2D show the virome composition comparison of the IBD cohorts to controls. Figure 2A depicts PCoA using Spearman distances. Figure 2B depicts alpha diversity (observed VCs) with p-values from Wilcoxon tests. Figure 2C shows volcano plots of differential abundance results from DESeq2 between controls and CD. Figure 2D shows volcano plots of differential abundance results from DESeq2 between controls and UC. All points above the dotted line are significant.
[0055] Figures 3A-3D show the bacterial compositional comparison of the IBD cohorts and controls. Figure 3A depicts PCoA using unweighted UniFrac distances. Figure 3B is a plot showing alpha diversity (Chao1 diversity) with p-values from Wilcoxon tests. Figure 3C shows differential abundance results from DESeq2 between controls and CD. Figure 3D shows differential abundance results from DESeq2 between controls and UC. All points above the dotted line are significant.
[0056] Figures 4A-4B show the drivers of PCoA separation for the virome (Spearman distances; Figure 4A) and 16S (unweighted UniFrac; Figure 4B). VC and RSV abundances were correlated, using Spearman correlations, with PC axes 1 and 2. Only significant correlations with a rho greater than 0.35 or less than -0.35 were graphed for the virome, or ± 0.5 for the 16S (or a maximum of the top 6 for each quadrant). Grey arrows indicate unclassified VCs/RSVs. The length of the arrow represents the degree of correlation to the PC axes.
[0057] Figures 5A-5F demonstrate the investigation of differences in viromes and 16S between subjects in UC flare and UC remission. Beta diversity for viromes (using Spearman distances; Figure 5A) and 16S (unweighted UniFrac; Figure 5B) is shown. VC and RSV abundances were correlated with PC axes 1 and 2. Only significant correlations with a rho greater than 0.35 or less than -0.35 were graphed for the virome, or ± 0.5 (or the top 6 for each quadrant) for the 16S. Grey arrows indicate unclassified VCs/RSVs. The length of the arrow represents the degree of correlation to the PC axes. Alpha diversity is shown in Figure 5C for VCs (Observed VCs and Shannon) and in Figure 5D for 16S (Chao1 and Shannon diversity). Differential abundance results using DESeq2 between UC flare and UC remission are shown for VCs (Figure 5E) and 16S (Figure 5F). All points above the dotted line are significant.
[0058] Figures 6A-6D show the classification between healthy controls and patients with IBD using VC and 16S composition. The top 20 importance factors are shown for each model: VCs (Figure 6A), 16S (Figure 6B), and VCs and 16S combined (Figure 6C). The shades of grey of the bars correspond to differential abundance between groups; the text to the right of each bar gives the classifications and/or the bacterial annotation to CRISPR protospacers. Figure 6D shows the ROC curve analysis for each of the 3 models, including the % accuracy.
[0059] Figure 7A depicts a VC PCoA using Spearman distances comparing the 3 cohorts CD, UC and controls. Figure 7B shows distances between points in each cohort for the VC Spearman PCoA. Figure 7C shows 16S PCoA using unweighted UniFrac distances comparing the 3 cohorts. Figure 7D is a boxplot showing distances between points in each cohort for the 16S unweighted UniFrac PCoA. P-values for boxplots are from Wilcoxon tests.
[0060] Figures 8A-8F show the alpha diversity of patients with IBD versus healthy controls. Shown are Observed VCs (Figure 8A), Shannon diversity of VCs (Figure 8B), Chao1 diversity of 16S counts (Figure 8C), and Shannon diversity of 16S counts (Figure 8D). P-values for boxplots are from Wilcoxon tests. Figure 8E shows Spearman correlations between observed VC counts and observed bacterial species counts. Figure 8F shows Shannon diversity of VCs and 16S counts.
[0061] Figures 9A-9B show the alpha diversity of observed VLPs for any VCs classified as Caudovirales tested for disease groups and controls (Figure 9A) and disease groups/states and controls (Figure 9B). P-values for boxplots are from Wilcoxon tests.
[0062] Figures 10A-10B show the read alignment for samples in each cohort to VCs classified as lysogenic (Figure 10A) and non-lysogenic (Figure 10B). P-values for boxplots are from Wilcoxon tests.
[0063] Figure 11 depicts a Procrustes plot of the Virome PCoA using Spearman distances and the 16S PCoA with unweighted UniFrac. Lines connect samples from the same subject.
[0064] Figure 12 depicts a Procrustes plot of the Virome PCoA using Spearman distances and the 16S PCoA with unweighted UniFrac. Lines connect samples from the same subject.
[0065] Figure 13A shows the Spearman correlation between estimated viral load and observed VCs. Figure 13B shows viral load plotted per subject with points colored using various intensities of grey by disease status.
[0066] Figure 14 depicts a network plot of CRISPR protospacers to the 20 most relevant VCs (10 key and additional important VCs from machine learning). Clusters and CRISPR protospacers are colored using various intensities of grey according to differential abundance using DESeq2.
[0067] Figures 15A-15J show images of the 10 key drivers in the separation of IBD and controls. Annotations are using pVOGs.
[0068] Figure 16 is a block diagram illustrating a system or device for identifying virome marker clusters according to aspects of the present invention.
[0069] Figure 17 is a block diagram illustrating a system or device for detecting health or disease in a subject based at least in part on virome marker clusters according to aspects of the present invention.
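By way of illustration and not limitation, the alpha diversity measures named in the figure descriptions above (observed counts, Shannon diversity, and Chao1) can be computed from a per-sample vector of counts as sketched below. The function names are hypothetical, and the use of the bias-corrected form of Chao1 is an assumption; this disclosure does not state which form was used.

```python
import math
from collections import Counter


def observed(counts):
    """Number of taxa (e.g. VCs) with a non-zero count in the sample."""
    return sum(1 for c in counts if c > 0)


def shannon(counts):
    """Shannon diversity H = -sum(p * ln p) over non-zero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def chao1(counts):
    """Bias-corrected Chao1 richness: S_obs + F1*(F1-1) / (2*(F2+1))."""
    freq = Counter(c for c in counts if c > 0)
    f1, f2 = freq[1], freq[2]  # numbers of singletons and doubletons
    return observed(counts) + f1 * (f1 - 1) / (2 * (f2 + 1))


# Illustrative counts only; these are not data from the disclosure.
sample = [1, 1, 2, 5, 0]
richness = observed(sample)
estimated_richness = chao1(sample)
```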
DETAILED DESCRIPTION
[0070] It is an object of the present invention to meet the above-stated needs. Generally, this disclosure provides a framework for analyzing viromes across cohorts and demonstrates the presence of significant IBD signals in the virome, which could have value in the development of biomarkers and therapeutics into the future.
[0071] Unless defined otherwise, technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
[0072] As used herein, the term “bacteria” encompasses both prokaryotic organisms and archaea present in mammalian microbiota.
[0073] The term “microbiota” is used herein to refer to microorganisms (e.g., bacteria, archaea, fungi, protozoa) and viruses (e.g., phages and eukaryotic viruses) present in a host animal or human (e.g., in the gastrointestinal tract, skin, oral cavity, vagina, etc.). Microbiota exerts a significant influence on health and well-being of the host. Viruses present in microbiota are separately described as “virobiota”. The term “microbiome” refers to the collective genes of all organisms comprising the microbiota.
[0074] The term “virome” is used herein to include viruses, virus-like particles (VLPs), and molecules that closely resemble viruses but may or may not be infectious and may or may not include viral genetic material. The “virome” can include the “virobiota” but is not limited to the “virobiota”.
[0075] By a “microbiota sample” it is meant a sample that contains a microbiota from a particular source. A “GI microbiota sample” is from the gastrointestinal tract, and may include a fecal microbiota sample. Microbiota samples may comprise all of the components present in the microbiota.
[0076] The term “gastrointestinal (GI) microbiota” is used to refer to microorganisms (e.g., bacteria, fungi, unicellular parasites) and viruses (e.g., phages and eukaryotic viruses) in the digestive tract.
[0077] As used herein, the term “dysbiosis” refers to a microbial imbalance on or inside the body. Dysbiosis can result from, e.g., antibiotic exposure as well as other causes, e.g., infections with pathogens including viruses, bacteria and eukaryotic parasites. Dysbiosis can also result from unknown causes, or causes that are not yet known. The term “consequences of dysbiosis” refers to various disorders associated with dysbiosis. For example, dysbiosis in the GI tract has been reported to be associated with a wide variety of illnesses, such as, e.g., irritable bowel syndrome (IBS), inflammatory bowel disease (IBD), chronic fatigue syndrome, obesity, rheumatoid arthritis, ankylosing spondylitis, bacterial vaginosis, colitis, small intestinal cancer, colorectal cancer, metabolic syndrome, cardiovascular disease, Crohn's disease, infectious gastroenteritis, non-infectious gastroenteritis, food allergy, Celiac disease, gastrointestinal graft versus host disease, pouchitis, intestinal failure, short bowel syndrome, antibiotics-associated diarrhea, etc.
[0078] The term “restoring normal microbiota” is used herein to refer to restoring microbiota of a subject to the level of bioactivity and diversity of corresponding microbiota of a healthy subject. This may also be considered as normalizing the microbiota, populating the microbiota, populating normal microbiota, preventing the onset of dysbiosis, or augmenting the growth of at least one type of virus in a subject.
[0079] Specific changes in microbiota discussed herein can be detected using various methods, including without limitation quantitative PCR (qPCR) or high-throughput sequencing methods which detect over- and under-represented genes in the total bacterial population (e.g., 454-sequencing for community analysis; screening of microbial 16S ribosomal RNAs (16S rRNA), etc.), or transcriptomic or proteomic studies that identify lost or gained microbial transcripts or proteins within total bacterial populations. See, e.g., U.S. Patent Publication No. 2010/0074872; Eckburg et al., Science, 2005, 308: 1635-8; Costello et al., Science, 2009, 326: 1694-7; Grice et al., Science, 2009, 324: 1190-2; Li et al., Nature, 2010, 464: 59-65; Bjursell et al., Journal of Biological Chemistry, 2006, 281: 36269-36279; Mahowald et al., PNAS, 2009, 14:5859-5864; Wikoff et al., PNAS, 2009, 10:3698-3703.
[0080] Various exemplary ways of amplifying and sequencing nucleic acids from microbiota samples include, but are not limited to: solid-phase PCR involving bridge amplification of DNA fragments of the biological samples on a substrate with oligo adapters, wherein amplification involves primers having a forward index sequence (e.g., Illumina forward index for MiSeq/NextSeq/HiSeq platforms) or a reverse index sequence (e.g., Illumina reverse index for MiSeq/NextSeq/HiSeq platforms), a forward barcode sequence or a reverse barcode sequence, a transposase sequence (e.g., corresponding to a transposase binding site for MiSeq/NextSeq/HiSeq platforms), a linker, an additional random base, and a sequence for targeting a specific target region (e.g., 16S region, 18S region, ITS region). Illumina sequencing (e.g., with a HiSeq platform, with a MiSeq platform, with a NextSeq platform, etc.) may be used as part of a sequencing-by-synthesis technique.
[0081] As used herein, the terms “a microbiota disease” and “disease of a microbiota” refer to a change in the composition of a microbiota, including without limitation very small changes in a relative abundance of one or more organisms within the microbiota as compared to a healthy control. Microbiota diseases can result from, e.g., infections with pathogens including viruses, bacteria and eukaryotic parasites, antibiotic exposure as well as other causes. Exemplary microbiota diseases in the GI tract include, but are not limited to, irritable bowel syndrome (IBS), inflammatory bowel disease (IBD), chronic fatigue syndrome, obesity, rheumatoid arthritis, ankylosing spondylitis, colitis, small intestinal cancer, colorectal cancer, metabolic syndrome, cardiovascular disease, Crohn's disease, gastroenteritis, food allergy, Celiac disease, gastrointestinal graft versus host disease, pouchitis, intestinal failure, short bowel syndrome, diarrhea, etc.
[0082] As used herein, the term “probiotic” refers to a substantially pure bacteria (i.e., a single isolate, of, e.g., live bacterial cells, conditionally lethal bacterial cells, inactivated bacterial cells, killed bacterial cells, spores, recombinant carrier strains), or a mixture of desired bacteria, bacteria components or bacterial extract, or bacterially-derived products (natural or synthetic bacterially-derived products such as, e.g., bacterial antigens or metabolic products) and may also include any additional components that can be administered to a mammal. Such compositions are also referred to herein as a “bacterial inoculant.”
[0083] As used herein, the term “prebiotic” refers to an agent that increases the number and/or activity of one or more desired bacteria, enhancing their growth. Non-limiting examples of prebiotics useful in the methods of the present disclosure include fructooligosaccharides (e.g., oligofructose, inulin, inulin-type fructans), galactooligosaccharides, human milk oligosaccharides (HMO), Lacto-N-neotetraose, D-Tagatose, xylo-oligosaccharides (XOS), arabinoxylan-oligosaccharides (AXOS), N-acetylglucosamine, N-acetylgalactosamine, glucose, other five- and six-carbon sugars (such as arabinose, maltose, lactose, sucrose, cellobiose, etc.), amino acids, alcohols, resistant starch (RS), and mixtures thereof. See, e.g., Ramirez-Farias et al., Br J Nutr (2008) 4: 1-10; Pool-Zobel and Sauer, J Nutr (2007), 137:2580S-2584S. The prebiotic may be effective to fully, or partially, restore normal microbiota.
[0084] As used herein, the term “viral cluster” or “VC” refers to a set of contigs that fit certain criteria described herein, which are in turn grouped together based on protein homology profiles.
[0085] As used herein, the term “stimulate” when used in connection with growth and/or activity of bacteria encompasses the term “enhance”.
[0086] The terms “treat” or “treatment” of a state, disorder or condition include: (1) preventing, delaying, or reducing the incidence and/or likelihood of the appearance of at least one clinical or sub-clinical symptom of the state, disorder or condition developing in a subject that may be afflicted with or predisposed to the state, disorder or condition but does not yet experience or display clinical or subclinical symptoms of the state, disorder or condition; or (2) inhibiting the state, disorder or condition, i.e., arresting, reducing or delaying the development of the disease or a relapse thereof (in case of maintenance treatment) or at least one clinical or sub-clinical symptom thereof; or (3) relieving the disease, i.e., causing regression of the state, disorder or condition or at least one of its clinical or sub-clinical symptoms. The benefit to a subject to be treated is either statistically significant or at least perceptible to the patient or to the physician.
[0087] The terms “patient”, “individual”, “subject”, “mammal”, and “animal” are used interchangeably herein and refer to mammals, including, without limitation, human and veterinary animals (e.g., cats, dogs, cows, horses, sheep, pigs, etc.) and experimental animal models. In a preferred embodiment, the subject is a human.
[0088] As used herein, the term “therapeutically effective amount” refers to the amount of a compound, composition, particle, organism (e.g., a probiotic or a microbiota transplant), etc. that, when administered to a subject for treating (e.g., preventing or ameliorating) a state, disorder or condition, is sufficient to effect such treatment. The “therapeutically effective amount” will vary depending, e.g., on the agent being administered as well as the disease severity, age, weight, and physical conditions and responsiveness of the subject to be treated. The terms “therapeutically effective amount” and “effective amount” are used interchangeably.
[0089] As used herein, the term “acceptable” with reference to excipients, diluents, and carriers refers to molecular entities and compositions that are generally regarded as physiologically tolerable.
[0090] The term “carrier” refers to a diluent, adjuvant, excipient, or vehicle with which the compound is administered. Such pharmaceutical carriers can be sterile liquids, such as water and oils, including those of petroleum, animal, vegetable or synthetic origin, such as peanut oil, soybean oil, mineral oil, sesame oil and the like. Water or aqueous saline solutions and aqueous dextrose and glycerol solutions are preferably employed as carriers, particularly for injectable solutions. Alternatively, the carrier can be a solid dosage form carrier, including but not limited to one or more of a binder (for compressed pills), a glidant, an encapsulating agent, a flavorant, and a colorant. Suitable pharmaceutical carriers are described in “Remington’s Pharmaceutical Sciences” by E.W. Martin.
[0091] The term “about” or “approximately” means within a statistically meaningful range of a value. Such a range can be within an order of magnitude, preferably within 50%, more preferably within 20%, still more preferably within 10%, and even more preferably within 5% of a given value or range. The allowable variation encompassed by the term “about” or “approximately” depends on the particular system under study, and can be readily appreciated by one of ordinary skill in the art.
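By way of illustration and not limitation, the percentage ranges recited above can be expressed as a relative tolerance check. The function name and the default tolerance of 10% are assumptions chosen for this sketch; as noted above, the allowable variation depends on the system under study.

```python
def is_about(value, reference, tolerance=0.10):
    """True if value lies within a relative tolerance of a reference value.

    tolerance is a fraction: 0.50, 0.20, 0.10 and 0.05 correspond to the
    50%, 20%, 10% and 5% ranges named in the text.
    """
    return abs(value - reference) <= tolerance * abs(reference)


# Illustrative values only.
within_10_percent = is_about(109, 100)        # 9 <= 10
within_50_percent = is_about(140, 100, 0.50)  # 40 <= 50
```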
[0092] The terms “a,” “an,” and “the” do not denote a limitation of quantity, but rather denote the presence of “at least one” of the referenced item.
[0093] The practice of the present invention employs, unless otherwise indicated, conventional techniques of statistical analysis, molecular biology (including recombinant techniques), microbiology, cell biology, and biochemistry, which are within the skill of the art. Such tools and techniques are described in detail in e.g., Sambrook et al. (2001) Molecular Cloning: A Laboratory Manual. 3rd ed. Cold Spring Harbor Laboratory Press: Cold Spring Harbor, New York; Ausubel et al. eds. (2005) Current Protocols in Molecular Biology. John Wiley and Sons, Inc.: Hoboken, NJ; Bonifacino et al. eds. (2005) Current Protocols in Cell Biology. John Wiley and Sons, Inc.: Hoboken, NJ; Coligan et al. eds. (2005) Current Protocols in Immunology, John Wiley and Sons, Inc.: Hoboken, NJ; Coico et al. eds. (2005) Current Protocols in Microbiology, John Wiley and Sons, Inc.: Hoboken, NJ; Coligan et al. eds. (2005) Current Protocols in Protein Science, John Wiley and Sons, Inc.: Hoboken, NJ; and Enna et al. eds. (2005) Current Protocols in Pharmacology, John Wiley and Sons, Inc.: Hoboken, NJ. Additional techniques are explained, e.g., in U.S. Patent No. 7,912,698 and U.S. Patent Appl. Pub. Nos. 2011/0202322 and 2011/0307437.
[0094] The term “computing system” is intended to include stand-alone machines or devices and/or a combination of machines, components, modules, systems, servers, processors, memory, detectors, user interfaces, computing device interfaces, network interfaces, hardware elements, software elements, firmware elements, and other computer-related units. By way of example, but not limitation, a computing system can include one or more of a general-purpose computer, a special-purpose computer, a processor, a portable electronic device, a portable electronic medical instrument, a stationary or semi-stationary electronic medical instrument, or other electronic data processing apparatus.
[0095] The term “database” as referred to herein is intended to include a collection of indexed data stored on a computer readable medium. By way of example and not limitation, data in the database can include numerical values, textual values, computational representations of physical objects (including living, non-living, organic, non-organic objects, and combinations thereof), computational representations of physical phenomena, and categorical classifications. Various data can be linked together or otherwise indexed. By way of example and not limitation, data in the database can be represented as an indexed matrix.
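By way of illustration and not limitation, an indexed matrix of the kind mentioned above can be sketched as a table of counts addressed by row and column labels. The class name, the method names, and the choice of viral clusters as rows and samples as columns are assumptions made for this sketch.

```python
class IndexedMatrix:
    """A count table indexed by row label (e.g. viral cluster) and column label (e.g. sample)."""

    def __init__(self, rows, cols):
        # Map each label to its position so values can be addressed by name.
        self.row_index = {r: i for i, r in enumerate(rows)}
        self.col_index = {c: j for j, c in enumerate(cols)}
        self.values = [[0] * len(cols) for _ in rows]

    def set(self, row, col, value):
        self.values[self.row_index[row]][self.col_index[col]] = value

    def get(self, row, col):
        return self.values[self.row_index[row]][self.col_index[col]]

    def column(self, col):
        """Return all row values for one column as {row label: value}."""
        j = self.col_index[col]
        return {r: self.values[i][j] for r, i in self.row_index.items()}


# Illustrative labels and counts only.
table = IndexedMatrix(rows=["vc6", "vc98"], cols=["subject_1", "subject_2"])
table.set("vc6", "subject_1", 30)
table.set("vc98", "subject_2", 12)
profile = table.column("subject_1")
```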
[0096] The term “dataset” as referred to herein is intended to include information that can be provided to a computing system in a computer readable format.
[0097] The terms “component,” “module,” “system,” “server,” “processor,” “memory,” and the like are intended to include one or more computer-related units, such as but not limited to hardware, firmware, a combination of hardware and software, software, or software in execution. For example, a component may be, but is not limited to being, a process running on a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a computing device and the computing device can be a component. One or more components can reside within a process and/or thread of execution and a component may be localized on one computer and/or distributed between two or more computers. In addition, these components can execute from various computer readable media having various data structures stored thereon. The components may communicate by way of local and/or remote processes such as in accordance with a signal having one or more data packets, such as data from one component interacting with another component in a local system, distributed system, and/or across a network such as the Internet with other systems by way of the signal.
[0098] Some implementations of the disclosed technology will be described more fully with reference to the accompanying drawings. This disclosed technology may, however, be embodied in many different forms and should not be construed as limited to the implementations set forth herein. The components described hereinafter as making up various elements of the disclosed technology are intended to be illustrative and not restrictive.
[0099] It is also to be understood that the mention of one or more method steps does not preclude the presence of additional method steps or intervening method steps between those steps expressly identified. Similarly, it is also to be understood that the mention of one or more components in a device or system does not preclude the presence of additional components or intervening components between those components expressly identified.
[00100] Certain embodiments and implementations of the disclosed technology are described above with reference to block and flow diagrams of systems and methods and/or computer program products according to example embodiments or implementations of the disclosed technology. It will be understood that one or more blocks of the block diagrams and flow diagrams, and combinations of blocks in the block diagrams and flow diagrams, respectively, can be implemented by computer-executable program instructions. Likewise, some blocks of the block diagrams and flow diagrams may not necessarily need to be performed in the order presented, may be repeated, or may not necessarily need to be performed at all, according to some embodiments or implementations of the disclosed technology.
[00101] These computer-executable program instructions may be loaded onto a computing system such as a general-purpose computer, a special-purpose computer, a processor, or other programmable data processing apparatus to produce a machine, such that the instructions that execute on the computer, processor, or other programmable data processing apparatus create means for implementing one or more functions specified in the flow diagram block or blocks. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement one or more functions specified in the flow diagram block or blocks.
[00102] As an example, embodiments or implementations of the disclosed technology may provide for a computer program product, including a computer-usable medium having a computer-readable program code or program instructions embodied therein, said computer-readable program code adapted to be executed to implement one or more functions specified in the flow diagram block or blocks. Likewise, the computer program instructions may be loaded onto a computer or other programmable data processing apparatus to cause a series of operational elements or steps to be performed on the computer or other programmable apparatus to produce a computer-implemented process such that the instructions that execute on the computer or other programmable apparatus provide elements or steps for implementing the functions specified in the flow diagram block or blocks.
[00103] Accordingly, blocks of the block diagrams and flow diagrams support combinations of means for performing the specified functions, combinations of elements or steps for performing the specified functions, and program instruction means for performing the specified functions. It will also be understood that each block of the block diagrams and flow diagrams, and combinations of blocks in the block diagrams and flow diagrams, can be implemented by special-purpose, hardware-based computer systems that perform the specified functions, elements or steps, or combinations of special-purpose hardware and computer instructions.
[00104] In some embodiments presented herein, IBD markers can be identified using unidentified viral genome sequences derived from a cohort of healthy subjects and a cohort of subjects diagnosed with IBD. Individuals in each cohort can be human. The viral genome sequences can be unidentified in that they are not taxonomically classified in a viral genome database. The viral genome sequences can be unidentified in that they are considered “viral dark matter” as described herein and as would otherwise be understood by a person of ordinary skill in the art. The viral genome sequences can be unidentified at the order level, at the family level, at the strain level, or at any intervening level. The viral genome sequences can be unidentified in that they are classified taxonomically, at some level, in a viral genome database, but have not been compared to the classification database. Viral genome sequences can include sequenced VLPs, molecules that closely resemble viruses, but are non-infectious because they contain no viral genetic material. Viral genome sequences can be derived from gastrointestinal (GI) microbiota samples provided from individuals in each cohort. Metagenomic assembly can be performed on the samples using short reads to resolve viral genomes. The reads can subsequently be aligned to determine the abundance, or count of members, of each viral genome. The resolved viral genomes can include unidentified viral genome sequences. To the extent that resolved viral genomes can be identified, the IBD markers can also be identified using the identified viral genomes.
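By way of illustration and not limitation, the determination of abundance from aligned reads could be sketched as follows. The input format (one `(read_id, contig_id)` pair per aligned read), the function name `contig_abundance`, and the normalization to relative abundance are assumptions made for this sketch and are not specified in the disclosure.

```python
from collections import Counter


def contig_abundance(alignments):
    """Count aligned reads per contig and convert counts to relative abundance.

    alignments: iterable of (read_id, contig_id) pairs, one per aligned read
    (hypothetical input format chosen for this sketch).
    """
    counts = Counter(contig for _, contig in alignments)
    total = sum(counts.values())
    # Relative abundance is the share of all aligned reads assigned to a contig.
    relative = {c: n / total for c, n in counts.items()} if total else {}
    return dict(counts), relative


# Hypothetical alignments of four reads to two assembled viral contigs.
example_alignments = [("r1", "c1"), ("r2", "c1"), ("r3", "c2"), ("r4", "c1")]
counts, relative = contig_abundance(example_alignments)
```

In practice, counts may additionally be normalized for contig length or sequencing depth; that choice is left open here.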
[00105] In some embodiments of the various methods presented herein, IBD markers can be identified using unidentified viral genome sequences derived from a cohort of healthy subjects and a cohort of subjects diagnosed with IBD. Protein clustering and protein homology can be performed on the whole virome, including the unidentified viral genome sequences, from each cohort, resulting in viral clusters. Each viral cluster can include one or more unidentified viral genome sequences. The viral clusters can each respectively be associated with the cohort of healthy subjects, the cohort of subjects diagnosed with IBD, or both cohorts. IBD markers can be identified by comparing viral clusters associated with the healthy cohort to viral clusters associated with the cohort diagnosed with IBD. The IBD markers can thereby be identified without relying on categorization of viral genome sequences in a database. In some embodiments, viral clusters associated with IBD can further be associated with one or both of a sub-cohort diagnosed with Crohn’s disease (CD) and a sub-cohort diagnosed with ulcerative colitis (UC).
[00106] The viral genome sequences can be represented as datasets that are readable by a computational device or system. For instance, the viral genome sequences can be represented as viral contigs. Each viral genome sequence can be represented in whole or in part. Each viral genome sequence can be represented with resolution at the strain level.
[00107] Each dataset can be associated with a cohort and/or sub-cohort. The datasets collectively can include a significant number of viral genome sequence reads within the GI microbiota samples provided from the individuals. In some embodiments, the dataset is prepared by sequencing VLP DNA isolated from GI microbiota sample(s). The VLP DNA may be isolated from GI microbiota samples and prepared by any of the various methods of preparing DNA known in the art, such as those described in Thurber R.V. et al., 2009, Laboratory procedures to generate viral metagenomes. Nat Protoc 4:470-483; Reyes A. et al., 2010, Viruses in the faecal microbiota of monozygotic twins and their mothers. Nature 466:334-338; and Minot S. et al., 2011, The human gut virome: inter-individual variation and dynamic response to diet. Genome Res 21:1616-1625, each of which is incorporated by reference herein in its entirety. The datasets collectively can include a number of viral genome sequence reads within the GI microbiota samples. The reads per sample can include the ranges of 15% to 97%, 25% to 97%, 50% to 97%, 60% to 97%, 70% to 97%, 80% to 97%, and 90% to 97%.
[00108] The viral genome sequences can be represented as protein sequences. Alternatively, the viral genome sequences can be represented as a sequence from which protein sequences or protein content can be derived (e.g. genetic sequence).
[00109] Protein clustering and protein homology can be performed on the whole virome, including the unidentified viral genome sequences, from each cohort, resulting in viral clusters. To the extent that the whole virome includes identified viral genome sequences, the identified viral genome sequences can be included in the protein clustering and protein homology analysis. Proteins can be derived from each dataset based on the viral genome sequences. The proteins can be organized into protein clusters (PCs) using Markov cluster (MCL)-based protein families, transitive clustering (TransClust), spectral clustering of protein sequences (SCPS), High-Fidelity clustering of protein sequences (HiFix), or other appropriate technique. Additional clustering techniques are described in Bernardes et al., BMC Bioinformatics (2015) 16:34 “Evaluation and improvements of clustering algorithms for detecting remote homologous protein families”, incorporated by reference herein.
[00110] To determine protein homology, viral genome sequences, or protein sequences derived therefrom, can be evaluated pairwise such that each pair is given a similarity score based on the shared protein content between the sequences within the pair. Viral clusters can be determined based on the similarity scores.
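By way of illustration and not limitation, the pairwise scoring and grouping described above could be sketched as follows. The disclosure does not fix a particular similarity score or grouping rule; this sketch assumes a Jaccard index over shared protein clusters, a threshold of 0.5, and single-linkage grouping, and the names `jaccard` and `viral_clusters` are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Shared protein-cluster content of two genomes as a Jaccard index."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def viral_clusters(genome_pcs, threshold=0.5):
    """Group genomes whose pairwise similarity meets the threshold.

    genome_pcs maps a genome (contig) id to the set of protein-cluster ids
    it encodes. Grouping is single-linkage via union-find; both the score
    and the threshold are illustrative assumptions.
    """
    parent = {g: g for g in genome_pcs}

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g

    for g1, g2 in combinations(genome_pcs, 2):
        if jaccard(genome_pcs[g1], genome_pcs[g2]) >= threshold:
            parent[find(g1)] = find(g2)

    groups = {}
    for g in genome_pcs:
        groups.setdefault(find(g), set()).add(g)
    return sorted(groups.values(), key=lambda s: sorted(s))


# Hypothetical genomes described by the protein clusters (PCs) they encode.
example = {
    "contig_a": {"PC1", "PC2", "PC3"},
    "contig_b": {"PC2", "PC3", "PC4"},
    "contig_c": {"PC9"},
}
clusters = viral_clusters(example)
```

Here `contig_a` and `contig_b` share two of four distinct protein clusters and fall into one viral cluster, while `contig_c` forms a cluster of its own. No taxonomic database is consulted at any point.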
[00111] A viral cluster can include one or more unidentified viral genome sequences. A viral cluster can be completely populated by unidentified viral genome sequences. A viral cluster can be unassociated with a known taxon. A viral cluster can represent an unidentified taxon of higher rank than a strain and of lower rank than a family.
[00112] The viral clusters can each respectively be associated with the cohort of healthy subjects, the cohort of subjects diagnosed with IBD, or both cohorts. Or, said another way, a collection of viral clusters associated with the healthy cohort can be created such that each viral cluster in the collection includes at least one viral genome derived from the healthy cohort, and another collection of viral clusters associated with the cohort diagnosed with IBD can be created such that this collection of viral clusters includes at least one viral genome derived from the cohort diagnosed with IBD.
[00113] IBD markers can be identified by comparing viral clusters associated with the healthy cohort to viral clusters associated with the cohort diagnosed with IBD. The IBD markers can thereby be identified without relying on categorization of viral genome sequences in a database.
[00114] Viral clusters associated with IBD can further be associated with one or both of a sub-cohort diagnosed with Crohn’s disease (CD) and a sub-cohort diagnosed with ulcerative colitis (UC).
[00115] An IBD marker can be defined as a viral cluster that is prevalent in at least one cohort and/or sub-cohort and minimal or absent in at least one other cohort or sub-cohort. In other words, the IBD markers can include viral clusters that are found predominantly in the healthy cohort and not in the IBD cohort and viral clusters that are found predominantly in the IBD cohort and not in the healthy cohort.
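By way of illustration and not limitation, the marker definition above could be sketched as a prevalence comparison across cohorts. The cutoffs for "prevalent" (`high=0.6`) and "minimal" (`low=0.1`), the function names, and the cluster and subject identifiers are assumptions made for this sketch; the disclosure does not specify numerical cutoffs.

```python
def prevalence(cluster_presence, cohort_of):
    """Fraction of subjects in each cohort in which each viral cluster is found.

    cluster_presence: {cluster_id: set of subject ids in which it was detected}
    cohort_of: {subject_id: cohort label}
    """
    sizes = {}
    for cohort in cohort_of.values():
        sizes[cohort] = sizes.get(cohort, 0) + 1
    result = {}
    for vc, subjects in cluster_presence.items():
        counts = dict.fromkeys(sizes, 0)
        for s in subjects:
            counts[cohort_of[s]] += 1
        result[vc] = {c: counts[c] / sizes[c] for c in sizes}
    return result


def marker_clusters(prev, high=0.6, low=0.1):
    """Clusters prevalent (>= high) in one cohort and minimal (<= low) in another."""
    markers = {}
    for vc, by_cohort in prev.items():
        enriched = [c for c, p in by_cohort.items() if p >= high]
        depleted = [c for c, p in by_cohort.items() if p <= low]
        if enriched and depleted:
            markers[vc] = {"prevalent_in": enriched, "minimal_in": depleted}
    return markers


# Hypothetical subjects and clusters (identifiers are not those of the disclosure).
cohort_of = {"h1": "healthy", "h2": "healthy", "p1": "IBD", "p2": "IBD"}
presence = {
    "vc_x": {"h1", "h2"},  # found only in healthy subjects
    "vc_y": {"h1", "p1"},  # found equally in both cohorts
    "vc_z": {"p1", "p2"},  # found only in subjects diagnosed with IBD
}
markers = marker_clusters(prevalence(presence, cohort_of))
```

The same functions apply unchanged when the cohort labels are sub-cohorts such as CD and UC.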
[00116] When the IBD cohort is further sub-divided into CD and UC, the IBD markers can include viral clusters that are found predominantly in the CD cohort and not the UC cohort and vice-versa, regardless of whether the same viral clusters are predominant in both the healthy and IBD cohorts. IBD marker clusters can be identified by comparing the viral clusters associated with the CD sub-cohort to the viral clusters associated with the UC sub-cohort. Within the total collection of IBD marker clusters, the IBD marker clusters can include a first subset of IBD marker clusters that are viral clusters more prevalently found in subjects diagnosed with CD compared to UC and a second subset of IBD marker clusters that are viral clusters more prevalently found in subjects diagnosed with UC compared to CD.
[00117] The IBD markers can include viral clusters that contain no identified viral sequences. An IBD marker can be unassociated with a known taxon. An IBD marker cluster can represent an unidentified taxon of higher rank than a strain and of lower rank than a family.
[00118] To the extent that the whole virome includes identified viral genome sequences, the identified viral genome sequences can be included in the protein clustering and protein homology analysis. A viral cluster including an identified viral genome sequence can represent an unidentified taxon of higher rank than a strain and of lower rank than a family. A viral cluster including an identified viral genome sequence can be associated with one or more cohorts and/or sub-cohorts. Identified viral genome sequences can be clustered by protein clustering and protein homology to create reference viral clusters. Reference viral clusters can be associated with one or more cohorts and/or sub-cohorts. Identification of IBD marker clusters can include comparing reference viral clusters associated with the healthy cohort to reference viral clusters associated with the cohort diagnosed with IBD. Similarly, identification of IBD marker clusters can include comparing reference viral clusters associated with CD with reference viral clusters associated with UC.
[00119] To the extent that the whole virome includes identified viral genome sequences, the IBD markers can include viral clusters that contain at least one identified viral sequence. An IBD marker cluster containing an identified viral sequence can include an unidentified grouping of viral sequences. An IBD marker cluster can be an unidentified grouping of viral sequences, optionally comprising an identified viral sequence. An IBD marker cluster containing an identified viral sequence can represent an identified taxon.
[00120] Identification of the IBD markers as described above can be performed on a computing system having one or more processors and a memory with instructions thereon that can be performed by the processor(s). The computing system can receive datasets associated with each cohort and/or sub-cohort that each respectively include unidentified viral genome sequences. The viral genome sequences can be represented as a viral contig or other suitable computer-readable format. The computing system can create viral clusters for each dataset associated with each cohort and/or sub-cohort. Clustering can use a protein clustering algorithm to group like proteins and a protein homology algorithm to group viral genome sequences, including unidentified viral genome sequences, into viral clusters. Viral clusters can be compared across cohorts and/or sub-cohorts to identify marker clusters. Marker clusters can represent clusters that are highly represented in at least one cohort and/or sub-cohort and marginally represented in at least one other cohort and/or sub-cohort.
[00121] Identification of the marker clusters can be performed using machine learning. The datasets can include an association for each viral cluster to a known variable, the known variable being the health state of the patient (healthy, IBD diagnosis, and optionally CD diagnosis and/or UC diagnosis). For each health state, the system can determine a correlated set of viral clusters from the total set of viral clusters. Viral clusters having a strong correlation to the presence or absence of a given health state can be identified as marker clusters.
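By way of illustration and not limitation, the correlation of viral clusters with a health state could be sketched as follows. This sketch substitutes a plain Pearson (point-biserial) correlation between cluster abundance and a binary health-state label for a trained machine-learning model; the cutoff of 0.8, the function names, and the example values are assumptions and do not come from the disclosure.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input has no variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def correlated_clusters(abundance, has_state, cutoff=0.8):
    """Return {cluster: r} for clusters with |r| >= cutoff against a health state.

    abundance: {cluster_id: [value per subject]}
    has_state: [1 if the subject has the health state, else 0], same subject order
    """
    scores = {vc: pearson(vals, has_state) for vc, vals in abundance.items()}
    return {vc: r for vc, r in scores.items() if abs(r) >= cutoff}


# Hypothetical abundances for four subjects; the last two are diagnosed with IBD.
has_ibd = [0, 0, 1, 1]
abundance = {
    "vc_x": [0, 1, 9, 10],  # rises with IBD
    "vc_y": [5, 5, 5, 5],   # uninformative
    "vc_z": [8, 9, 1, 0],   # falls with IBD
}
selected = correlated_clusters(abundance, has_ibd)
```

A positive coefficient marks a cluster associated with the presence of the health state and a negative coefficient marks one associated with its absence.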
[00122] Identification of the marker clusters can be performed using a beta diversity analysis on the viral clusters. A count table can be created by summing the counts of the viral genomic sequences (potentially represented as viral contigs) in each viral cluster. The count table can be subjected to an ordination method to determine beta diversity. The beta diversity analysis can be performed through principal coordinates analysis (PCoA), principal components analysis (PCA), non-metric multidimensional scaling (NMDS), canonical correspondence analysis (CCA), redundancy analysis (RDA), and/or other suitable technique.
[00123] Identification of the marker clusters can be performed using a calculation of differential abundance of viral clusters across cohorts and/or sub-cohorts. The calculation can be executed using a test or software package such as available through DESeq2, t-test, Wilcoxon rank-sum test, edgeR package, metagenomeSeq package, ANCOM package, and/or other suitable technique, algorithm, or software package.
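By way of illustration and not limitation, the construction of a count table by summing contig counts within each viral cluster, and a pairwise dissimilarity of the kind an ordination method such as PCoA takes as input, could be sketched as follows. The choice of Bray-Curtis dissimilarity is an assumption of this sketch, since the disclosure does not name a dissimilarity measure; the function names and example values are likewise hypothetical.

```python
def count_table(contig_counts, contig_to_cluster):
    """Sum per-sample contig counts into per-sample viral-cluster counts."""
    table = {}
    for sample, counts in contig_counts.items():
        row = table.setdefault(sample, {})
        for contig, n in counts.items():
            vc = contig_to_cluster[contig]
            row[vc] = row.get(vc, 0) + n
    return table


def bray_curtis(row_a, row_b):
    """Bray-Curtis dissimilarity between two samples (0 = identical, 1 = disjoint)."""
    clusters = set(row_a) | set(row_b)
    diff = sum(abs(row_a.get(vc, 0) - row_b.get(vc, 0)) for vc in clusters)
    total = sum(row_a.values()) + sum(row_b.values())
    return diff / total if total else 0.0


# Hypothetical contig-to-cluster assignment and per-sample contig counts.
contig_to_cluster = {"c1": "vc_x", "c2": "vc_x", "c3": "vc_y"}
contig_counts = {
    "s1": {"c1": 3, "c2": 2, "c3": 5},
    "s2": {"c1": 1, "c3": 9},
}
table = count_table(contig_counts, contig_to_cluster)
dissimilarity = bray_curtis(table["s1"], table["s2"])
```

The full matrix of pairwise dissimilarities between samples would then be passed to the chosen ordination method.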
[00124] Figure 16 is a block diagram illustrating an example system 100 for identifying IBD marker clusters. The system 100 can include a non-transient memory 120 with executable instructions thereon to perform methods for identifying IBD marker clusters as described herein, a processor 130 in communication with the memory 120 capable of receiving and executing the instructions from the memory 120 to identify IBD marker clusters, and an output interface 140 capable of outputting a representation of the IBD marker clusters identified by the processor 130. The system can be in communication with a data store 110 on which cohort datasets are stored. The processor 130 can be configured to receive the datasets from the data store 110, receive instructions from the memory 120, compute IBD marker clusters by performing operations on the datasets according to the executable instructions, and provide a representation of the IBD marker clusters to the output interface 140. The representation of the IBD marker clusters can be a computer-readable representation and/or a human user interface. Preferably, the output interface 140 can provide a means for conveying a computer-readable representation of the IBD marker clusters to a digital storage medium such that the IBD marker clusters can be accessed by an IBD diagnosis device such as an example IBD diagnosis device as described herein. The system 100 can be contained within a singular device, potentially even a singular semiconductor chip (e.g. system on a chip), or can be distributed across multiple devices at multiple geographical locations as would be understood by a person of ordinary skill in the art. For instance, the data store 110 can be provided by a data server at a location remote to the processor 130 via a network (e.g. internet), and the processor 130 can be located on a computing device remote from the memory 120 and the executable instructions can be transmitted through a network (e.g. internet) to the processor.
[00125] In various embodiments, a subject can be diagnosed with IBD by analyzing unidentified viral genome sequences derived from the subject. Viral genome sequences can be obtained from the subject through a fecal sample or other means. The viral genome sequences can be derived from a GI microbiota sample obtained from the subject. The viral genome sequences can include unidentified viral genome sequences. The viral genome sequences can be represented as a subject dataset. The subject dataset can be in a computer readable format. The analysis can include clustering the viral genome sequences from the subject, including the unidentified viral genome sequences obtained from the subject. Clustering of the subject’s viral genome sequences can be carried out in a manner similar to that described above. The collection of viral clusters created based on the subject’s viral genome sequences can be compared to IBD markers. The IBD markers can be identified through analysis of a healthy cohort and a cohort diagnosed with IBD in a manner similar to that described above. The subject can be diagnosed with IBD based on analysis and comparison of viral genome sequences alone. Alternatively, bacteria derived from the subject can be analyzed for the purpose of IBD diagnosis of the subject and analysis of the viral genome sequences can be performed in conjunction such that the combination of bacterial and viral analysis can be used to diagnose the subject with IBD.
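By way of illustration and not limitation, the comparison of a subject's viral clusters to IBD markers could be sketched as a count of markers whose abundance in the subject departs from a control in the direction associated with IBD. The scoring rule, the function name `concordant_markers`, and the cluster identifiers are hypothetical and are offered only to make the comparison concrete; the sketch is not a diagnostic criterion.

```python
def concordant_markers(subject, control, up_in_ibd, down_in_ibd):
    """Count marker clusters whose change in the subject matches the IBD pattern.

    subject, control: {cluster_id: abundance}; missing clusters count as 0.
    up_in_ibd / down_in_ibd: marker clusters expected to be increased / decreased.
    Returns (number of concordant markers, number of markers evaluated).
    """
    hits = 0
    for vc in up_in_ibd:
        if subject.get(vc, 0) > control.get(vc, 0):
            hits += 1
    for vc in down_in_ibd:
        if subject.get(vc, 0) < control.get(vc, 0):
            hits += 1
    return hits, len(up_in_ibd) + len(down_in_ibd)


# Hypothetical marker sets and abundances.
up = {"vc_a", "vc_b"}
down = {"vc_c"}
subject = {"vc_a": 10, "vc_b": 1, "vc_c": 0}
control = {"vc_a": 2, "vc_b": 3, "vc_c": 4}
hits, evaluated = concordant_markers(subject, control, up, down)
```

In this example two of the three markers (`vc_a` increased, `vc_c` decreased) change in the direction associated with IBD.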
[00126] The marker clusters may comprise one or more viral clusters from taxa Siphoviridae, Myoviridae, Podoviridae, CrAss-like, or Microviridae. The marker clusters may comprise viral clusters from Siphoviridae. The marker clusters may comprise viral clusters from Myoviridae. The marker clusters may comprise viral clusters from Podoviridae. The marker clusters may comprise CrAss-like viral clusters. The marker clusters may comprise viral clusters from Microviridae.
[00127] The marker clusters may comprise one or more of the following exemplary viral clusters: vc2, vc6, vc7, vc13, vc14, vc15, vc17, vc19, vc21, vc22, vc23, vc24, vc25, vc28, vc29, vc36, vc37, vc38, vc39, vc40, vc42, vc45, vc48, vc53, vc55, vc56, vc58, vc60, vc61, vc62, vc64, vc66, vc68, vc69, vc70, vc71, vc73, vc74, vc75, vc76, vc77, vc78, vc79, vc80, vc82, vc84, vc85, vc86, vc88, vc89, vc91, vc92, vc94, vc95, vc96, vc97, vc98, vc99, vc101, vc102, vc103, vc104, vc108, vc109, vc112, vc113, vc115, vc117, vc118, vc122, vc123, vc124, vc130, vc132, vc136, vc138, vc142, vc143, vc152, vc154, vc155, vc160, vc161, vc175, vc178, vc181, vc190, vc193, vc205, vc209, vc216, vc218, vc225, vc232, vc263, vc264, vc281, vc284, vc298, vc320, vc411, vc413, vc420, vc456, and vc467. Alternatively, vc5, vc9, and vc10 may be used as marker clusters.
[00128] In various embodiments, an increased abundance of one or more of the following marker clusters in the subject sample, as compared to that of a sample from a healthy patient or control, is indicative of the presence of IBD in the subject: vc2, vc13, vc14, vc15, vc17, vc21, vc22, vc36, vc40, vc48, vc53, vc66, vc68, vc69, vc70, vc71, vc73, vc74, vc77, vc78, vc79, vc80, vc85, vc88, vc89, vc91, vc94, vc95, vc97, vc102, vc108, vc113, vc115, vc117, vc118, vc122, vc123, vc130, vc132, vc142, vc152, vc155, vc160, vc161, vc175, vc178, vc181, vc205, vc218, vc232, vc263, vc264, vc281, vc298, vc413, and vc420. The abundance may be increased by 10-99%, 10-20%, 20-30%, 30-50%, 40-60%, 50-70%, 60-80%, 70-90%, 80-100%, 90-110%, 100-150%, 120-170%, 150-200%, by two-fold to 1,000-fold, by about two-fold, by about three-fold, by about four-fold, by about five-fold, by about 10-fold, by about 20-fold, by about 50-fold, by about 100-fold, by about 200-fold, by about 500-fold, or by about 1,000-fold.
[00129] In various embodiments, an increased abundance of one or more of the following marker clusters in the subject sample, as compared to that of a sample from a healthy patient or control, is indicative of the presence of Crohn’s Disease (CD) in the subject: vc15, vc66, vc71, vc73, vc77, vc78, vc79, vc80, vc91, vc94, vc108, vc113, vc117, vc118, vc132, vc142, vc155, vc160, vc178, vc232, vc264, vc281, vc298, and vc420. In some embodiments, an increased abundance of viral cluster vc28 in the subject sample as compared to a healthy control is indicative of the presence of CD in the subject. In these embodiments, the abundance may be increased by 10-99%, 10-20%, 20-30%, 30-50%, 40-60%, 50-70%, 60-80%, 70-90%, 80-100%, 90-110%, 100-150%, 120-170%, 150-200%, by two-fold to 1,000-fold, by about two-fold, by about three-fold, by about four-fold, by about five-fold, by about 10-fold, by about 20-fold, by about 50-fold, by about 100-fold, by about 200-fold, by about 500-fold, or by about 1,000-fold.
[00130] In various embodiments, an increased abundance of one or more of the following marker clusters in the subject sample, as compared to that of a sample from a healthy patient or control, is indicative of the presence of ulcerative colitis (UC) in the subject: vc2, vc17, vc21, vc22, vc53, vc70, vc74, vc85, vc88, vc89, vc115, vc122, vc123, vc130, vc152, vc161, vc175, vc181, vc205, vc218, vc263, and vc413. In some embodiments, an increased abundance of viral cluster vc2 in the subject sample as compared to a healthy control is indicative of the presence of UC in the subject. In some embodiments, an increased abundance of one or more viral clusters selected from vc38, vc46, vc48, vc54, vc57, vc62, vc64, vc69, vc71, vc108, vc111, vc114, vc115, vc128, vc159, vc162, vc215, vc220, vc242, vc340, vc374, and vc392 in the subject sample as compared to a patient with ulcerative colitis (UC) in remission is indicative of the presence of a flare-up of UC in the subject. In some embodiments, an increased abundance of one or more viral clusters selected from vc16, vc119, and vc163 in the subject sample as compared to a patient with a flare-up of ulcerative colitis (UC) is indicative of the presence of UC in remission in the subject. In these embodiments, the abundance may be increased by 10-99%, 10-20%, 20-30%, 30-50%, 40-60%, 50-70%, 60-80%, 70-90%, 80-100%, 90-110%, 100-150%, 120-170%, 150-200%, by two-fold to 1,000-fold, by about two-fold, by about three-fold, by about four-fold, by about five-fold, by about 10-fold, by about 20-fold, by about 50-fold, by about 100-fold, by about 200-fold, by about 500-fold, or by about 1,000-fold.
[00131] In various embodiments, a decreased abundance of one or more of the following marker clusters in the subject sample, as compared to that of a sample from a healthy patient or control, is indicative of the presence of IBD in the subject: vc6, vc7, vc19, vc23, vc24, vc25, vc29, vc37, vc38, vc39, vc42, vc45, vc55, vc56, vc58, vc60, vc61, vc62, vc64, vc75, vc76, vc82, vc84, vc86, vc89, vc92, vc96, vc98, vc99, vc101, vc103, vc104, vc109, vc112, vc124, vc136, vc138, vc143, vc154, vc190, vc193, vc209, vc216, vc225, vc284, vc320, vc411, vc456, and vc467. In some embodiments, a decreased abundance of one or more viral clusters selected from vc7, vc25, vc47, and vc64 in the subject sample as compared to a healthy control is indicative of the presence of Crohn’s Disease (CD) in the subject. The abundance may be decreased by 10-99%, 10-20%, 20-30%, 30-50%, 40-60%, 50-70%, 60-80%, 70-90%, 80-100%, 90-110%, 100-150%, 120-170%, 150-200%, by two-fold to 1,000-fold, by about two-fold, by about three-fold, by about four-fold, by about five-fold, by about 10-fold, by about 20-fold, by about 50-fold, by about 100-fold, by about 200-fold, by about 500-fold, or by about 1,000-fold.
[00132] In various embodiments, a decreased abundance of one or more of the following marker clusters in the subject sample, as compared to that of a sample from a healthy patient or control, is indicative of the presence of Crohn’s Disease (CD) in the subject: vc6, vc7, vc19, vc25, vc29, vc37, vc42, vc45, vc56, vc58, vc60, vc61, vc64, vc82, vc86, vc89, vc92, vc99, vc104, vc109, vc124, vc136, vc154, vc190, and vc284. The abundance may be decreased by 10-99%, 10-20%, 20-30%, 30-50%, 40-60%, 50-70%, 60-80%, 70-90%, 80-100%, 90-110%, 100-150%, 120-170%, 150-200%, by two-fold to 1,000-fold, by about two-fold, by about three-fold, by about four-fold, by about five-fold, by about 10-fold, by about 20-fold, by about 50-fold, by about 100-fold, by about 200-fold, by about 500-fold, or by about 1,000-fold.
[00133] In various embodiments, an increased abundance of vc98 and/or vc103 in the subject sample, as compared to that of a sample from a healthy patient or control, is indicative of the presence of ulcerative colitis (UC) in the subject. The abundance may be increased by 10-99%, 10-20%, 20-30%, 30-50%, 40-60%, 50-70%, 60-80%, 70-90%, 80-100%, 90-110%, 100-150%, 120-170%, 150-200%, by two-fold to 1,000-fold, by about two-fold, by about three-fold, by about four-fold, by about five-fold, by about 10-fold, by about 20-fold, by about 50-fold, by about 100-fold, by about 200-fold, by about 500-fold, or by about 1,000-fold.
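By way of illustration and not limitation, the percentage and fold expressions of a change in abundance used in the preceding paragraphs are related as sketched below for an increase relative to a control. The function names and the requirement of a positive control abundance are assumptions of this sketch; a real analysis would also need a convention for clusters absent from the control.

```python
def fold_change(subject_abundance, control_abundance):
    """Ratio of subject abundance to control abundance."""
    if control_abundance <= 0:
        raise ValueError("control abundance must be positive")
    return subject_abundance / control_abundance


def percent_increase(subject_abundance, control_abundance):
    """Increase over the control expressed as a percentage of the control."""
    return (fold_change(subject_abundance, control_abundance) - 1.0) * 100.0


# A cluster at 30 units in the subject and 10 units in the control is
# three-fold the control value, i.e. increased by 200%.
fold = fold_change(30, 10)
percent = percent_increase(30, 10)
```

Under this convention an abundance that is two-fold the control value corresponds to an increase of 100%.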
[00134] The dataset may be prepared by sequencing VLP DNA isolated from GI microbiota sample(s). A fourth dataset may be obtained that represents bacterial sequences derived from the GI microbiota sample obtained from the subject, with the fourth dataset analyzed for the presence of bacterial taxa associated with IBD. The fourth dataset may be obtained by sequencing 16S rDNA or a V region (e.g., V4 region) of 16S rDNA in the GI microbiota sample. The presence of IBD in the subject may be determined based at least in part on the comparison of the fourth dataset to at least one of a healthy control and a control diagnosed with IBD.
[00135] In various embodiments, the GI microbiota sample is a fecal sample. In various embodiments, the GI microbiota sample is a cecal sample. In various embodiments, the GI microbiota sample is an ileal sample. In various embodiments, the GI microbiota sample is a colonic microbiota sample. In various embodiments, microbiota from other sites can be used, such as oral microbiota samples, nasal microbiota samples, skin microbiota samples, and vaginal microbiota samples.
[00136] In various embodiments, the subject is human.
[00137] The methods may further comprise administering an IBD treatment to the subject. IBD treatments include conventional treatments such as mesalamine, steroids, immunomodulators, and dietary modification. IBD treatments may also comprise administration of compositions comprising viruses and bacteria, as described below.
[00138] Also provided is a method for preventing and/or treating IBD in a subject. An effective amount of one or more viruses from any of the following viral clusters is administered: vc6, vc7, vc19, vc23, vc24, vc25, vc29, vc37, vc38, vc39, vc42, vc45, vc55, vc56, vc58, vc60, vc61, vc62, vc64, vc75, vc76, vc82, vc84, vc86, vc89, vc92, vc96, vc98, vc99, vc101, vc103, vc104, vc109, vc112, vc124, vc136, vc138, vc143, vc154, vc190, vc193, vc209, vc216, vc225, vc284, vc320, vc411, vc456, and vc467. The method may further comprise administering to the subject an effective amount of a probiotic or a prebiotic composition or a combination thereof, wherein said composition(s) stimulates growth and/or activity in the GI microbiota of the subject of one or more bacterial genera selected from Catenibacterium, Ruminococcus, Coprococcus, Methanobrevibacter, Clostridium IV, Faecalibacterium, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Howardella, Bifidobacterium, Oscillibacter, Parabacteroides, Flavonifractor, Blautia, Dorea, Roseburia, Odoribacter, Catenibacterium, and Akkermansia or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said one or more bacterial genera. In some embodiments, the probiotic composition comprises one or more bacterial strains from the genus selected from Catenibacterium, Ruminococcus, Coprococcus, Methanobrevibacter, Clostridium IV, Faecalibacterium, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Howardella, Bifidobacterium, Oscillibacter, Parabacteroides, Flavonifractor, Blautia, Dorea, Roseburia, Odoribacter, Catenibacterium, and Akkermansia. The prebiotic or probiotic composition may be effective to fully, or partially, restore normal microbiota.
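By way of illustration and not limitation, the 90% sequence identity criterion for a closely related OTU could be checked as sketched below. The sketch assumes that the two 16S rRNA sequences have already been aligned to equal length with `-` as the gap character, and it counts gapped columns as mismatches; definitions of percent identity vary, and the function name and example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over the columns of a pairwise alignment ('-' = gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns in which both sequences carry a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Hypothetical aligned fragments differing at one of ten positions.
identity = percent_identity("ACGTACGTAC", "ACGTACGTAA")
is_closely_related = identity >= 90.0
```

The same function could be applied either to an alignment over the entire length of the 16S rRNA or to an alignment restricted to a single V region.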
[00139] Also provided is a method for preventing and/or treating IBD in a subject in need thereof, said method comprising administering to the subject an effective amount of a virus from a viral cluster selected from vc1O, vc23, and vc39. The method may further comprise administering to the subject an effective amount of a probiotic or a prebiotic composition or a combination thereof, wherein said composition(s) stimulates growth and/or activity in the GI microbiota of the subject of one or more bacterial genera selected from Catenibacterium, Ruminococcus, Coprococcus, Methanobrevibacter, Clostridium IV, Faecalibacterium,
Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Howardella, Bifidobacterium, Oscillibacter, Parabacteroides, Flavonifractor, Blautia, Dorea, Roseburia, Odoribacter, Catenibacterium, and Akkermansia or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said one or more bacterial genera. In some embodiments, the probiotic composition comprises one or more bacterial strains from the genus selected from
Catenibacterium, Ruminococcus, Coprococcus, Methanobrevibacter, Clostridium IV,
Faecalibacterium, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Howardella, Bifidobacterium, Oscillibacter, Parabacteroides, Flavonifractor, Blautia, Dorea, Roseburia, Odoribacter, Catenibacterium, and Akkermansia. The prebiotic or probiotic composition may be effective to fully, or partially, restore normal microbiota.
[00140] Also provided is a method for preventing and/or treating CD in a subject. An effective amount of one or more viruses from any of the following viral clusters is administered: vc6, vc7, vc19, vc25, vc29, vc37, vc42, vc45, vc56, vc58, vc60, vc61, vc64, vc82, vc86, vc89, vc92, vc99, vc104, vc109, vc124, vc136, vc154, vc190, and vc284.
[00141] Also provided is a method for preventing and/or treating CD in a subject in need thereof. An effective amount of one or more viruses from any of the following viral clusters is administered: vc10, vc23, and vc39. The method may further comprise administering to the subject an effective amount of a probiotic or a prebiotic composition or a combination thereof, wherein said composition(s) stimulates growth and/or activity in the GI microbiota of the subject of one or more bacterial genera selected from Ruminococcus, Methanobrevibacter, Clostridium IV, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Bifidobacterium,
Oscillibacter, Flavonifractor, Dorea, Roseburia, and Odoribacter or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said one or more bacterial genera. In some embodiments, the probiotic composition comprises one or more bacterial strains from the genus selected from Ruminococcus, Methanobrevibacter, Clostridium IV, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Bifidobacterium, Oscillibacter, Flavonifractor, Dorea, Roseburia, and Odoribacter. The prebiotic or probiotic composition may be effective to fully, or partially, restore normal microbiota.
[00142] Also provided is a method for preventing and/or treating UC in a subject in need thereof. An effective amount of one or more viruses from any of the following viral clusters is administered: vc10, vc23, and vc39. The method may further comprise administering to the subject an effective amount of a probiotic or a prebiotic composition or a combination thereof, wherein said composition(s) stimulates growth and/or activity in the GI microbiota of the subject of one or more bacterial genera selected from Ruminococcus, Methanobrevibacter, Clostridium IV, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Bifidobacterium,
Oscillibacter, Flavonifractor, Dorea, Roseburia, and Odoribacter or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said one or more bacterial genera. In some embodiments, the probiotic composition comprises one or more bacterial strains from the genus selected from Ruminococcus, Methanobrevibacter, Clostridium IV, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Bifidobacterium, Oscillibacter, Flavonifractor, Dorea, Roseburia, and Odoribacter. The prebiotic or probiotic composition may be effective to fully, or partially, restore normal microbiota.
[00143] Also provided is a method for preventing and/or treating UC in a subject. An effective amount of a virus from a viral cluster vc98 and/or vc103 is administered. The method may further comprise administering to the subject an effective amount of a probiotic or a prebiotic composition or a combination thereof. The composition stimulates growth and/or activity in the GI microbiota of the subject of the bacterial genus Akkermansia or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said bacterial genus. In some embodiments, the probiotic composition comprises one or more bacterial strains from the genus Akkermansia. The prebiotic or probiotic composition may be effective to fully, or partially, restore normal microbiota.
[00144] Also provided is a method for preventing and/or treating IBD in a subject in need thereof. An effective amount of a probiotic or a prebiotic composition or a combination thereof is administered to the subject. The composition stimulates growth and/or activity of one or more bacterial genera selected from Catenibacterium, Ruminococcus, Coprococcus,
Methanobrevibacter, Clostridium IV, Faecalibacterium, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Howardella, Bifidobacterium, Oscillibacter, Parabacteroides,
Flavonifractor, Blautia, Dorea, Roseburia, Odoribacter, Catenibacterium, and Akkermansia or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said one or more bacterial genera. In some embodiments, the probiotic composition comprises one or more bacterial strains from the genus selected from Catenibacterium, Ruminococcus, Coprococcus, Methanobrevibacter, Clostridium IV, Faecalibacterium, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Howardella, Bifidobacterium, Oscillibacter, Parabacteroides,
Flavonifractor, Blautia, Dorea, Roseburia, Odoribacter, Catenibacterium, and Akkermansia.
The prebiotic or probiotic composition may be effective to fully, or partially, restore normal microbiota.
[00145] Also provided is a method for preventing and/or treating CD in a subject in need thereof. An effective amount of a probiotic or a prebiotic composition or a combination thereof is administered to the subject. The composition stimulates growth and/or activity of one or more bacterial genera selected from Ruminococcus, Methanobrevibacter, Clostridium IV, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Bifidobacterium, Oscillibacter,
Flavonifractor, Dorea, Roseburia, and Odoribacter or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said one or more bacterial genera. In some embodiments, the probiotic composition comprises one or more bacterial strains from the genus selected from Ruminococcus, Methanobrevibacter, Clostridium IV, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Bifidobacterium, Oscillibacter, Flavonifractor, Dorea, Roseburia, and Odoribacter. The prebiotic or probiotic composition may be effective to fully, or partially, restore normal microbiota.
[00146] Also provided is a method for preventing and/or treating UC in a subject in need thereof. An effective amount of a probiotic or a prebiotic composition or a combination thereof is administered to the subject. The composition stimulates growth and/or activity of the genus Akkermansia or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said bacterial genus. In some embodiments, the probiotic composition comprises one or more bacterial strains from the genus Akkermansia. The prebiotic or probiotic composition may be effective to fully, or partially, restore normal microbiota.
[00147] Any of the above methods may further comprise administering to the subject additional diagnostic tests for IBD, CD and/or UC.
[00148] Any of the above methods may further comprise enrolling the subject in a clinical trial.
[00149] Bacterial taxa associated with IBD may comprise one or more of the following bacterial genera: Clostridium XlVa, Blautia, Veillonella, Clostridium sensu stricto, Megasphaera, Fusobacterium, Flavonifractor, Catenibacterium, Ruminococcus, Coprococcus,
Methanobrevibacter, Clostridium IV, Faecalibacterium, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Howardella, Bifidobacterium, Oscillibacter, Parabacteroides, Dorea, Roseburia, Odoribacter, and Akkermansia. Bacterial taxa associated with IBD may comprise a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said one or more bacterial genera.
[00150] An increased abundance of one or more of the following bacterial genera in the subject sample as compared to a healthy control may be indicative of the presence of IBD in the subject: Clostridium XlVa, Blautia, Veillonella, Clostridium sensu stricto, Megasphaera, Fusobacterium, and Flavonifractor. The abundance may be increased by 10-99%, 10-20%, 20-30%, 30-50%, 40-60%, 50-70%, 60-80%, 70-90%, 80-100%, 90-110%, 100-150%, 120-170%, 150-200%, by two-fold to 1,000-fold, about two-fold, by about three-fold, by about four-fold, by about five-fold, by about 10-fold, by about 20-fold, by about 50-fold, by about 100-fold, by about 200-fold, by about 500-fold, or by about 1,000-fold.
[00151] An increased abundance of one or more of the following bacterial genera in the subject sample as compared to a healthy control may be indicative of the presence of Crohn’s Disease (CD) in the subject: Clostridium XlVa, Blautia, Megasphaera, and Fusobacterium. The abundance may be increased by 10-99%, 10-20%, 20-30%, 30-50%, 40-60%, 50-70%, 60-80%, 70-90%, 80-100%, 90-110%, 100-150%, 120-170%, 150-200%, by two-fold to 1,000-fold, about two-fold, by about three-fold, by about four-fold, by about five-fold, by about 10-fold, by about 20-fold, by about 50-fold, by about 100-fold, by about 200-fold, by about 500-fold, or by about 1,000-fold.
An increased abundance of the bacterial genus Flavonifractor in the subject sample as compared to a healthy control may be indicative of the presence of ulcerative colitis (UC) in the subject.
An increased abundance of one or more bacterial species selected from Bacteroides fragilis and Ruminococcus gnavus in the subject sample as compared to a healthy control may be indicative of the presence of UC in the subject. An increased abundance of Ruminococcus gnavus in the subject sample as compared to a control sample from a patient with UC in remission may be indicative of the presence of a flare-up of UC in the subject. An increased abundance of
Faecalibacterium prausnitzii, Dorea longicatena or Coprococcus comes in the subject sample as compared to a control sample from a patient with a flare-up of UC in remission may be indicative of the presence of UC in remission in the subject. The abundance may be increased by 10-99%, 10-20%, 20-30%, 30-50%, 40-60%, 50-70%, 60-80%, 70-90%, 80-100%, 90-110%, 100-150%, 120-170%, 150-200%, by two-fold to 1,000-fold, about two-fold, by about three-fold, by about four-fold, by about five-fold, by about 10-fold, by about 20-fold, by about 50-fold, by about 100-fold, by about 200-fold, by about 500-fold, or by about 1,000-fold.
[00152] A decreased abundance of one or more of the following bacterial genera in the subject sample as compared to a healthy control may be indicative of the presence of IBD in the subject: Catenibacterium, Ruminococcus, Coprococcus, Methanobrevibacter, Clostridium IV,
Faecalibacterium, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Howardella, Bifidobacterium, Oscillibacter, Parabacteroides, Flavonifractor, Blautia, Dorea, Roseburia, Odoribacter, Catenibacterium, and Akkermansia. The abundance may be decreased by 10-99%, 10-20%, 20-30%, 30-50%, 40-60%, 50-70%, 60-80%, 70-90%, 80-100%, 90-110%, 100-150%, 120-170%, 150-200%, by two-fold to 1,000-fold, about two-fold, by about three-fold, by about four-fold, by about five-fold, by about 10-fold, by about 20-fold, by about 50-fold, by about 100-fold, by about 200-fold, by about 500-fold, or by about 1,000-fold.
[00153] A decreased abundance of one or more of the following bacterial genera in the subject sample as compared to a healthy control may be indicative of the presence of Crohn’s Disease (CD) in the subject: Ruminococcus, Methanobrevibacter, Clostridium IV, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Bifidobacterium, Oscillibacter, Flavonifractor, Dorea, Roseburia, and Odoribacter. The abundance may be decreased by 10-99%, 10-20%, 20-30%, 30-50%, 40-60%, 50-70%, 60-80%, 70-90%, 80-100%, 90-110%, 100-150%, 120-170%, 150-200%, by two-fold to 1,000-fold, about two-fold, by about three-fold, by about four-fold, by about five-fold, by about 10-fold, by about 20-fold, by about 50-fold, by about 100-fold, by about 200-fold, by about 500-fold, or by about 1,000-fold.
[00154] A decreased abundance, e.g., by 10-99%, 10-20%, 20-30%, 30-50%, 40-60%, 50-70%, 60-80%, 70-90%, 80-100%, 90-110%, 100-150%, 120-170%, 150-200%, by two-fold to 1,000-fold, about two-fold, by about three-fold, by about four-fold, by about five-fold, by about 10-fold, by about 20-fold, by about 50-fold, by about 100-fold, by about 200-fold, by about 500-fold, or by about 1,000-fold, of the bacterial genus Akkermansia in the subject sample as compared to a healthy control may be indicative of the presence of ulcerative colitis (UC) in the subject.
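The percentage and fold figures recited above can be related to one another with simple arithmetic. The following is a minimal sketch only, assuming that abundances are positive relative abundances measured on the same scale for the subject sample and the healthy control; the function name `abundance_change` and the returned field names are hypothetical and are not taken from the disclosure.

```python
def abundance_change(subject: float, control: float) -> dict:
    """Describe a subject abundance relative to a control abundance.

    Both values are assumed to be positive and on the same scale
    (for example, relative abundance of one genus in each sample).
    """
    if subject <= 0 or control <= 0:
        raise ValueError("abundances must be positive")

    ratio = subject / control
    if ratio >= 1:
        direction = "increased"
        fold = ratio          # e.g. 3.0 means three-fold higher than control
    else:
        direction = "decreased"
        fold = 1 / ratio      # e.g. 4.0 means four-fold lower than control

    # Percent change is expressed relative to the control value.
    percent = abs(subject - control) / control * 100
    return {"direction": direction, "percent": percent, "fold": fold}


# Illustrative values only; they are not measurements from the disclosure.
print(abundance_change(2.0, 8.0))
print(abundance_change(30.0, 10.0))
```

Under these assumptions a four-fold decrease corresponds to a 75% reduction relative to the control, while a three-fold increase corresponds to a 200% increase.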
[00155] The analysis of the collection of the viral clusters created based on the subject’s viral genome sequences can include identifying common clusters present in both the collection of viral clusters associated with the subject and present in the collection of marker clusters. For each common cluster, a relative abundance of members within that cluster found in the subject’s GI microbiota sample can be determined. For each common cluster, a correlation value can be associated with each common cluster in the collection of marker clusters. The comparison of the viral clusters derived from the subject to the marker clusters can include comparing the relative abundance of members within each common cluster associated with the patient to the correlation value of each common cluster in the collection of marker clusters.
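The comparison described in this paragraph can be sketched in code. This is an illustrative sketch under stated assumptions, not the disclosed implementation: the per-cluster member counts, the correlation values, the function names, and in particular the weighted-sum score are hypothetical, since the text does not specify how relative abundance and correlation value are combined.

```python
from typing import Dict, List, Tuple


def relative_abundance(counts: Dict[str, float]) -> Dict[str, float]:
    """Convert per-cluster member counts into fractions of the sample total."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("sample contains no cluster members")
    return {vc: n / total for vc, n in counts.items()}


def compare_to_markers(
    subject_counts: Dict[str, float],
    marker_correlations: Dict[str, float],
) -> Tuple[List[Tuple[str, float, float]], float]:
    """Pair each common cluster's relative abundance with its marker correlation.

    Returns the per-cluster pairs and a hypothetical summary score
    (sum of relative abundance times correlation value).
    """
    rel = relative_abundance(subject_counts)
    # Common clusters: present in both the subject's clusters and the marker set.
    common = sorted(set(rel) & set(marker_correlations))
    pairs = [(vc, rel[vc], marker_correlations[vc]) for vc in common]
    score = sum(abundance * corr for _, abundance, corr in pairs)
    return pairs, score


# Hypothetical inputs for illustration only.
subject = {"vc23": 30, "vc39": 10, "vc999": 60}
markers = {"vc23": -0.5, "vc39": -0.25, "vc10": -0.4}
pairs, score = compare_to_markers(subject, markers)
for vc, abundance, corr in pairs:
    print(vc, round(abundance, 3), corr)
```

Clusters found only in the subject (here `vc999`) or only in the marker set (here `vc10`) are excluded from the pairing, which mirrors the restriction to common clusters in the text.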
[00156] In some embodiments, the subject can be diagnosed with Crohn’s disease if there is a decrease in the abundance of a virus of a viral taxon listed in Table 13 in the subject as compared to a reference amount of the abundance of the virus in one or more healthy subjects, e.g., by 10-99%, 10-20%, 20-30%, 30-50%, 40-60%, 50-70%, 60-80%, 70-90%, 80-100%, 90-110%, 100-150%, 120-170%, 150-200%, by two-fold to 1,000-fold, about two-fold, by about three-fold, by about four-fold, by about five-fold, by about 10-fold, by about 20-fold, by about 50-fold, by about 100-fold, by about 200-fold, by about 500-fold, or by about 1,000-fold.
[00157] In some embodiments, the subject can be diagnosed with ulcerative colitis if there is a decrease in the abundance of a virus of a viral taxon listed in Table 14 in the subject as compared to a reference amount of the abundance of the virus in one or more healthy subjects, e.g., by 10-99%, 10-20%, 20-30%, 30-50%, 40-60%, 50-70%, 60-80%, 70-90%, 80-100%, 90-110%, 100-150%, 120-170%, 150-200%, by two-fold to 1,000-fold, about two-fold, by about three-fold, by about four-fold, by about five-fold, by about 10-fold, by about 20-fold, by about 50-fold, by about 100-fold, by about 200-fold, by about 500-fold, or by about 1,000-fold.
[00158] In some embodiments, the subject can be diagnosed with Crohn’s disease if there is an increase in the abundance of a virus of a viral taxon listed in Table 15 in the subject as compared to a reference amount of the abundance of the virus in one or more healthy subjects, e.g., by 10-99%, 10-20%, 20-30%, 30-50%, 40-60%, 50-70%, 60-80%, 70-90%, 80-100%, 90-110%, 100-150%, 120-170%, 150-200%, by two-fold to 1,000-fold, about two-fold, by about three-fold, by about four-fold, by about five-fold, by about 10-fold, by about 20-fold, by about 50-fold, by about 100-fold, by about 200-fold, by about 500-fold, or by about 1,000-fold.
[00159] In some embodiments, the subject can be diagnosed with ulcerative colitis if there is an increase in the abundance of a virus of a viral taxon listed in Table 16 in the subject as compared to a reference amount of the abundance of the virus in one or more healthy subjects, e.g., by 10-99%, 10-20%, 20-30%, 30-50%, 40-60%, 50-70%, 60-80%, 70-90%, 80-100%, 90-110%, 100-150%, 120-170%, 150-200%, by two-fold to 1,000-fold, about two-fold, by about three-fold, by about four-fold, by about five-fold, by about 10-fold, by about 20-fold, by about 50-fold, by about 100-fold, by about 200-fold, by about 500-fold, or by about 1,000-fold.
[00160] In some embodiments, the subject can be diagnosed with Crohn’s disease if there is an increase in the abundance of bacteria of a bacterial taxon listed in Table 15 in the subject as compared to a reference amount of the abundance of the bacteria in one or more healthy subjects, e.g., by 10-99%, 10-20%, 20-30%, 30-50%, 40-60%, 50-70%, 60-80%, 70-90%, 80-100%, 90-110%, 100-150%, 120-170%, 150-200%, by two-fold to 1,000-fold, about two-fold, by about three-fold, by about four-fold, by about five-fold, by about 10-fold, by about 20-fold, by about 50-fold, by about 100-fold, by about 200-fold, by about 500-fold, or by about 1,000-fold.
[00161] In some embodiments, the subject can be diagnosed with ulcerative colitis if there is an increase in the abundance of bacteria of a bacterial taxon listed in Table 16 in the subject as compared to a reference amount of the abundance of the bacteria in one or more healthy subjects, e.g., by 10-99%, 10-20%, 20-30%, 30-50%, 40-60%, 50-70%, 60-80%, 70-90%, 80-100%, 90-110%, 100-150%, 120-170%, 150-200%, by two-fold to 1,000-fold, about two-fold, by about three-fold, by about four-fold, by about five-fold, by about 10-fold, by about 20-fold, by about 50-fold, by about 100-fold, by about 200-fold, by about 500-fold, or by about 1,000-fold.
[00162] In some embodiments, the subject can be diagnosed with Crohn’s disease if there is an increase in the abundance of bacteria of a bacterial taxon listed in Table 17 in the subject as compared to a reference amount of the abundance of the bacteria in one or more healthy subjects, e.g., by 10-99%, 10-20%, 20-30%, 30-50%, 40-60%, 50-70%, 60-80%, 70-90%, 80-100%, 90-110%, 100-150%, 120-170%, 150-200%, by two-fold to 1,000-fold, about two-fold, by about three-fold, by about four-fold, by about five-fold, by about 10-fold, by about 20-fold, by about 50-fold, by about 100-fold, by about 200-fold, by about 500-fold, or by about 1,000-fold.
[00163] In some embodiments, the subject can be diagnosed with ulcerative colitis if there is an increase in the abundance of bacteria of a bacterial taxon listed in Table 18 in the subject as compared to a reference amount of the abundance of the bacteria in one or more healthy subjects, e.g., by 10-99%, 10-20%, 20-30%, 30-50%, 40-60%, 50-70%, 60-80%, 70-90%, 80-100%, 90-110%, 100-150%, 120-170%, 150-200%, by two-fold to 1,000-fold, about two-fold, by about three-fold, by about four-fold, by about five-fold, by about 10-fold, by about 20-fold, by about 50-fold, by about 100-fold, by about 200-fold, by about 500-fold, or by about 1,000-fold.
[00164] In some embodiments, the subject can be diagnosed with IBD (e.g., Crohn’s disease or ulcerative colitis) if in the subject the abundance of one or more viruses in vc23 is reduced as compared to the abundance of the same one or more viruses in one or more healthy subjects, e.g., by 10-99%, 10-20%, 20-30%, 30-50%, 40-60%, 50-70%, 60-80%, 70-90%, 80-100%, 90-110%, 100-150%, 120-170%, 150-200%, by two-fold to 1,000-fold, about two-fold, by about three-fold, by about four-fold, by about five-fold, by about 10-fold, by about 20-fold, by about 50-fold, by about 100-fold, by about 200-fold, by about 500-fold, or by about 1,000-fold.
[00165] In some embodiments, the subject can be diagnosed with IBD (e.g., Crohn’s disease or ulcerative colitis) if in the subject the abundance of one or more viruses in vc39 is reduced as compared to the abundance of the same one or more viruses in one or more healthy subjects, e.g., by 10-99%, 10-20%, 20-30%, 30-50%, 40-60%, 50-70%, 60-80%, 70-90%, 80-100%, 90-110%, 100-150%, 120-170%, 150-200%, by two-fold to 1,000-fold, about two-fold, by about three-fold, by about four-fold, by about five-fold, by about 10-fold, by about 20-fold, by about 50-fold, by about 100-fold, by about 200-fold, by about 500-fold, or by about 1,000-fold.
[00166] In some embodiments, the subject can be diagnosed with IBD (e.g., Crohn’s disease or ulcerative colitis) if in the subject the abundance of one or more viruses in vc10 is reduced as compared to the abundance of the same one or more viruses in one or more healthy subjects.
[00167] In various embodiments, a kit for determining the presence of IBD in a subject can include a device to receive viral genome sequences, including unidentified viral genome sequences, from an individual subject and diagnose the subject for IBD based at least in part on the virome marker clusters. The viral genome sequences can be derived from a GI microbiota sample provided by the subject. The GI microbiota sample can be a fecal sample. The GI microbiota sample can be a cecal sample. The GI microbiota sample can be an ileal sample. The GI microbiota sample can be a colonic microbiota sample. In various embodiments, microbiota from other sites can be used, such as oral microbiota samples, nasal microbiota samples, skin microbiota samples, and vaginal microbiota samples.
[00168] Diagnosis can include clustering the received viral genome sequences and comparing the subject’s viral genome clusters to the marker clusters. The viral genome sequences can be clustered by protein clustering and protein homology as described herein. The device can also analyse bacteria from the subject for the purpose of IBD diagnosis. The bacteria can be derived from the same GI microbiota sample provided by the subject used to obtain the viral genome sequences or a separate GI microbiota sample. The subject can be diagnosed for IBD based on the analysis of the bacteria and/or the analysis of the viral genome sequences. The IBD diagnosis can include a diagnosis for ulcerative colitis and/or Crohn’s disease.
[00169] Figure 17 is a block diagram illustration of a system or device 200 (referred to herein for simplicity as “device”) that can be used as part of a kit for detecting IBD or health in a subject. The device 200 can include a dataset input module 210 configured to receive a dataset derived from a GI microbiota sample of a subject, a clustering module 220 configured to determine viral genome sequences within the subject’s dataset and cluster the viral genome sequences into viral clusters, a marker cluster input module 230 configured to receive an input that is based on IBD marker clusters, a cluster comparison module 240 that is configured to compare the subject’s viral clusters to the input representation of the marker clusters, and an output interface 250 configured to provide an indication of health or disease based on the comparison of the subject’s viral clusters to the representation of the marker clusters. The modules 210, 220, 230, 240 can be implemented by a computing system in hardware and/or software according to the principles described herein and as would be appreciated and understood by a person of ordinary skill in the art.
[00170] The dataset input module 210, when implemented at least in part by hardware, can include a wired or wireless receiver capable of receiving an electronic signal representative of the subject’s dataset. The clustering module 220, when implemented at least in part by hardware, can include a processor in communication with a memory with instructions thereon to create viral clusters based on and/or associated with the subject’s dataset according to the principles described herein. The marker cluster input module 230, when implemented at least in part by hardware, can include a wired or wireless receiver capable of receiving an electronic signal representative of IBD marker clusters. Additionally, or alternatively, the marker cluster input module 230 can include a memory store with a representation of the IBD marker clusters stored thereon. The cluster comparison module 240, when implemented at least in part by hardware, can include memory with instructions thereon to compare the subject’s viral clusters to the IBD marker clusters and provide as an output an indication of health or disease. The output interface 250, when implemented at least in part by hardware, can include a wired or wireless transmitter configured to transmit an electronic signal representative of the indication of health or disease. Additionally, or alternatively, the output interface 250 can include a user interface configured to provide an auditory, visual, or other sensory indication to a user that can be interpreted as an indication of health or disease.
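A software-only arrangement of modules 210 through 250 might be organized as in the following sketch. It is offered as one possible illustration rather than the disclosed implementation. The class name `Device`, the helper functions `toy_cluster` and `toy_compare`, the tab-separated record format, the reference abundances, and the majority decision rule are all hypothetical. In particular, `toy_cluster` merely counts pre-labelled records and stands in for the protein clustering and protein homology steps, which are not reproduced here.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List

Clusters = Dict[str, float]  # cluster id -> relative abundance


@dataclass
class Device:
    cluster: Callable[[List[str]], Clusters]      # clustering module 220
    marker_clusters: Clusters                     # marker cluster input module 230
    compare: Callable[[Clusters, Clusters], str]  # cluster comparison module 240

    def run(self, dataset: List[str]) -> str:
        """Accept a dataset (module 210) and return an indication (interface 250)."""
        subject_clusters = self.cluster(dataset)
        return self.compare(subject_clusters, self.marker_clusters)


def toy_cluster(records: List[str]) -> Clusters:
    """Placeholder clustering: each record is 'cluster_id<TAB>sequence'."""
    counts = Counter(record.split("\t", 1)[0] for record in records)
    total = sum(counts.values())
    return {vc: n / total for vc, n in counts.items()}


def toy_compare(subject: Clusters, markers: Clusters) -> str:
    """Hypothetical rule: 'disease' if most marker clusters fall below reference."""
    low = [vc for vc, ref in markers.items() if subject.get(vc, 0.0) < ref]
    return "disease" if len(low) * 2 > len(markers) else "health"


# Hypothetical reference abundances for two marker clusters.
device = Device(toy_cluster, {"vc23": 0.2, "vc39": 0.2}, toy_compare)
print(device.run(["vc23\tACGT"] + ["vcX\tGGCC"] * 9))
```

Passing the clustering and comparison steps in as callables keeps the sketch neutral as to whether each module is realized in hardware, in software, or in a combination of the two.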
[00171] In one aspect, the disclosure provides a method for treating dysbiosis in the gastrointestinal tract of a subject (e.g., human) in need thereof, said method comprising administering to said subject a therapeutically effective amount of any of the viruses described herein. In some embodiments, the virus is from any viral taxon listed in Table 13. In some embodiments, the virus is from any viral taxon listed in Table 14.
[00172] In one aspect, the disclosure provides a method for treating dysbiosis in the gastrointestinal tract of a subject (e.g., human) in need thereof, said method comprising administering to said subject a therapeutically effective amount of an inhibitor of, or an agent that specifically targets, any of the viruses described herein. In some embodiments, the virus is from any viral taxon listed in Table 11. In some embodiments, the virus is from any viral taxon listed in Table 12.
[00173] In one aspect, the disclosure provides a method for treating dysbiosis in the gastrointestinal tract of a subject (e.g., human) in need thereof, said method comprising administering to said subject a therapeutically effective amount of any of the bacteria described herein. In some embodiments, the bacteria is from any bacterial taxon listed in Table 17. In some embodiments, the bacteria is from any bacterial taxon listed in Table 18.
[00174] In one aspect, the disclosure provides a method for treating dysbiosis in the gastrointestinal tract of a subject (e.g., human) in need thereof, said method comprising administering to said subject a therapeutically effective amount of an inhibitor of, or agent that specifically targets, any of the bacteria described herein. In some embodiments, the bacteria is from any bacterial taxon listed in Table 15. In some embodiments, the bacteria is from any bacterial taxon listed in Table 16.
[00175] In another aspect, the disclosure provides a method for treating a gastrointestinal (GI) disorder in a subject (e.g., human) in need thereof, said method comprising administering to said subject a therapeutically effective amount of any of the virus compositions described herein. Non-limiting examples of encompassed GI disorders include, e.g., inflammatory bowel disease (IBD), ulcerative colitis, Crohn's disease, irritable bowel syndrome (IBS), infectious gastroenteritis, non-infectious gastroenteritis, food allergy, and gastrointestinal graft versus host disease.
[00176] The disclosure also provides pharmaceutical compositions comprising the viruses and/or bacteria of the disclosure. The compositions disclosed herein can be formulated into a variety of forms and administered by a number of different means. Non-limiting examples of useful routes of delivery include oral, topical, rectal, mucosal, sublingual, nasal, intravenous, subcutaneous, and via naso/oro-gastric gavage. The active agent may be systemic after administration or may be localized by the use of regional administration, intramural administration, or use of an implant that acts to retain the active dose at the site of implantation. The active agent, vector, virus, bacteriophage, particle, or a bacterial inoculant can be mixed with a carrier and (for easier delivery to the digestive tract) applied to liquid or solid food, or feed or to drinking water. The carrier material should be non-toxic to the
virus/bacteriophage/bacteria and the subject/patient. Non-limiting examples of formulations useful in the methods of the present disclosure include oral capsules and saline suspensions for use in feeding tubes, transmission via nasogastric tube, or enema. If live virus, bacteriophage or bacteria are used, the carrier should preferably contain an ingredient that promotes viability of the virus/bacteriophage/bacteria during storage. The formulation can include added ingredients to improve palatability, improve shelf-life, impart nutritional benefits, and the like. If a reproducible and measured dose is desired, the formulation can be administered by a rumen cannula. In certain embodiments, the formulation used in the methods of the disclosure further comprises a buffering agent. Examples of useful buffering agents include saline, sodium bicarbonate, milk, yogurt, infant formula, and other dairy products.
[00177] Bacteria-containing formulations may also comprise one or more prebiotics which promote growth and/or immunomodulatory activity of the bacteria in the formulation. While it is possible to use a compound, vector, virus, bacteriophage, particle, or a bacterial inoculant of the present disclosure for therapy as is, it may be preferable to administer it in a pharmaceutical formulation, e.g., in admixture with a suitable pharmaceutical excipient, diluent or carrier selected with regard to the intended route of administration and standard pharmaceutical practice. The excipient, diluent and/or carrier must be “acceptable” in the sense of being compatible with the other ingredients of the formulation and not deleterious to the recipient thereof. Acceptable excipients, diluents, and carriers for therapeutic use are well known in the pharmaceutical art, and are described, for example, in Remington: The Science and Practice of Pharmacy. Lippincott Williams & Wilkins (A.R. Gennaro edit. 2005). The choice of pharmaceutical excipient, diluent, and carrier can be selected with regard to the intended route of administration and standard pharmaceutical practice. Although there are no physical limitations to delivery of the formulations of the present disclosure, oral delivery is preferred for delivery to the digestive tract because of its ease and convenience, and because oral formulations readily accommodate additional mixtures, such as milk, yogurt, and infant formula.
[00178] Oral delivery may also include the use of nanoparticles that can be targeted, e.g., to the GI tract of the subject, such as those described in Yun et al., Adv Drug Deliv Rev. 2013, 65(6):822-832 (e.g., mucoadhesive nanoparticles, negatively charged carboxylate- or sulfate-modified particles, etc.). Non-limiting examples of other methods of targeting delivery of compositions to the GI tract are discussed in U.S. Pat. Appl. Pub. No. 2013/0149339 and references cited therein (e.g., pH sensitive compositions [such as, e.g., enteric polymers which release their contents when the pH becomes alkaline after the enteric polymers pass through the stomach], compositions for delaying the release [e.g., compositions which use hydrogel as a shell or a material which coats the active substance with, e.g., in vivo degradable polymers, gradually hydrolyzable polymers, gradually water-soluble polymers, and/or enzyme degradable polymers], bioadhesive compositions which specifically adhere to the colonic mucosal membrane, compositions into which a protease inhibitor is incorporated, a carrier system being specifically decomposed by an enzyme present in the colon).
[00179] For oral administration, the active ingredient(s) can be administered in solid dosage forms, such as capsules, tablets, and powders, or in liquid dosage forms, such as elixirs, syrups, and suspensions. A capsule typically comprises a core material comprising a bacterial composition and a shell wall that encapsulates the core material. In some embodiments, the core material comprises at least one of a solid, a liquid, and an emulsion. In other embodiments, the shell wall material comprises at least one of a soft gelatin, a hard gelatin, and a polymer. Suitable polymers include, but are not limited to: cellulosic polymers such as hydroxypropyl cellulose, hydroxyethyl cellulose, hydroxypropyl methyl cellulose (HPMC), methyl cellulose, ethyl cellulose, cellulose acetate, cellulose acetate phthalate, cellulose acetate trimellitate, hydroxypropylmethyl cellulose phthalate, hydroxypropylmethyl cellulose succinate and carboxymethylcellulose sodium; acrylic acid polymers and copolymers, such as those formed from acrylic acid, methacrylic acid, methyl acrylate, ammonio methylacrylate, ethyl acrylate, methyl methacrylate and/or ethyl methacrylate (e.g., those copolymers sold under the trade name “Eudragit”); vinyl polymers and copolymers such as polyvinyl pyrrolidone, polyvinyl acetate, polyvinylacetate phthalate, vinylacetate crotonic acid copolymer, and ethylene-vinyl acetate copolymers; and shellac (purified lac). In yet other embodiments, at least one polymer functions as a taste-masking agent.
[00180] The active component(s) can be encapsulated in gelatin capsules together with inactive ingredients and powdered carriers, such as glucose, lactose, sucrose, mannitol, starch, cellulose or cellulose derivatives, magnesium stearate, stearic acid, sodium saccharin, talcum, and magnesium carbonate. Examples of additional inactive ingredients that may be added to provide desirable color, taste, stability, buffering capacity, dispersion or other known desirable features are red iron oxide, silica gel, sodium lauryl sulfate, titanium dioxide, and edible white ink. Similar diluents can be used to make compressed tablets. Both tablets and capsules can be manufactured as sustained release products to provide for continuous release of medication over a period of hours. Compressed tablets can be sugar coated or film coated to mask any unpleasant taste and protect the tablet from the atmosphere, or enteric-coated for selective disintegration in the gastrointestinal tract. Liquid dosage forms for oral administration can contain coloring and flavoring to increase patient acceptance.
[00181] Formulations suitable for parenteral administration include aqueous and nonaqueous, isotonic sterile injection solutions, which can contain antioxidants, buffers, bacteriostats, and solutes that render the formulation isotonic with the blood of the intended recipient, and aqueous and nonaqueous sterile suspensions that can include suspending agents, solubilizers, thickening agents, stabilizers, and preservatives.
[00182] Alternatively, powders or granules embodying the bacterial and viral compositions disclosed herein can be incorporated into a food product. In some embodiments, the food product is a drink for oral administration. Non-limiting examples of a suitable drink include fruit juice, a fruit drink, an artificially flavored drink, an artificially sweetened drink, a carbonated beverage, a sports drink, a liquid dairy product, a shake, an alcoholic beverage, a caffeinated beverage, infant formula and so forth. Other suitable means for oral administration include aqueous and nonaqueous solutions, emulsions, suspensions and solutions and/or suspensions reconstituted from non-effervescent granules, containing at least one of suitable solvents, preservatives, emulsifying agents, suspending agents, diluents, sweeteners, coloring agents, and flavoring agents. The food product can be a solid foodstuff. Suitable examples of a solid foodstuff include without limitation a food bar, a snack bar, a cookie, a brownie, a muffin, a cracker, an ice cream bar, a frozen yogurt bar, and the like.
[00183] In other embodiments, the bacterial and viral compositions disclosed herein are incorporated into a therapeutic food. In some embodiments, the therapeutic food is a ready-to-use food that optionally contains some or all essential macronutrients and micronutrients. In another embodiment, the compositions disclosed herein are incorporated into a supplementary food that is designed to be blended into an existing meal. In one embodiment, the supplemental food contains some or all essential macronutrients and micronutrients. In another embodiment, the bacterial compositions disclosed herein are blended with or added to an existing food to fortify the food's protein nutrition. Examples include food staples (grain, salt, sugar, cooking oil, margarine), beverages (coffee, tea, soda, beer, liquor, sports drinks), snacks, sweets and other foods.
[00184] The useful dosages of the compositions and formulations of the disclosure will vary widely, depending upon the nature of the disease, the patient’s medical history, the frequency of administration, the manner of administration, the clearance of the agent from the host, and the like. The initial dose may be larger, followed by smaller maintenance doses. The dose may be administered as infrequently as weekly or biweekly, or fractionated into smaller doses and administered daily, semi-weekly, etc., to maintain an effective dosage level.
Additional Embodiments:
1. A method for identifying a plurality of viral marker clusters for determining the presence of inflammatory bowel disease (IBD) using viral genome sequences, the method comprising: obtaining a first dataset representing a first plurality of viral genome sequences derived from gastrointestinal (GI) microbiota samples of a healthy cohort;
obtaining a second dataset representing a second plurality of viral genome sequences derived from GI microbiota samples of a cohort diagnosed with IBD;
creating a first plurality of viral clusters by using protein clustering to group like proteins derived from the first dataset and by using protein homology to group viral genome sequences of the first dataset, each viral cluster in the first plurality of viral clusters comprising one or more viral genome sequences derived from the healthy cohort; creating a second plurality of viral clusters by using protein clustering to group like proteins derived from the second dataset and by using protein homology to group viral genome sequences of the second dataset, each viral cluster in the second plurality of viral clusters comprising one or more viral genome sequences derived from the cohort diagnosed with IBD; and
identifying a plurality of marker clusters by comparing the first plurality of viral clusters to the second plurality of viral clusters.
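By way of non-limiting illustration, the grouping of viral genome sequences by shared protein content recited in embodiment 1 may be sketched as follows. The sketch assumes each genome has already been annotated with protein-cluster identifiers, and it substitutes a simple Jaccard similarity with single-linkage grouping for whatever clustering tool is actually used. The function names, the 0.5 threshold, and the contig and protein-cluster identifiers are all hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two sets of protein-cluster IDs."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_genomes(genome_proteins, threshold=0.5):
    """Group genomes whose shared protein-cluster content meets the threshold.

    genome_proteins: dict mapping genome ID -> set of protein-cluster IDs.
    Returns a sorted list of sorted lists of genome IDs (single-linkage groups).
    The threshold is an illustrative assumption, not a value from the disclosure.
    """
    parent = {g: g for g in genome_proteins}

    def find(g):
        # Follow parent links to the group representative, compressing the path.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for g1, g2 in combinations(sorted(genome_proteins), 2):
        if jaccard(genome_proteins[g1], genome_proteins[g2]) >= threshold:
            parent[find(g1)] = find(g2)

    groups = {}
    for g in genome_proteins:
        groups.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical contigs annotated with hypothetical protein-cluster IDs.
example = {
    "contig_A": {"pc1", "pc2", "pc3"},
    "contig_B": {"pc1", "pc2", "pc4"},
    "contig_C": {"pc7", "pc8"},
}
viral_clusters = group_genomes(example, threshold=0.5)
```

In this toy input, contig_A and contig_B share two of four distinct protein clusters and are grouped together, while contig_C forms its own cluster. The same routine could be run separately on the healthy-cohort dataset and the IBD-cohort dataset before the two sets of clusters are compared.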
2. The method of embodiment 1,
wherein at least a portion of the first plurality of viral genome sequences are unassociated with a viral taxonomic category derived from a viral genome database, and
wherein at least a portion of the second plurality of viral genome sequences are unassociated with a viral taxonomic category derived from a viral genome database.
3. The method of embodiments 1 or 2, wherein a totality of the first plurality and second plurality of viral genome sequences are each unassociated with a viral taxonomic category derived from a viral genome database.
4. The method of any one of embodiments 1-3, wherein the step of identifying the plurality of marker clusters by comparing the first plurality of viral clusters to the second plurality of viral clusters comprises using machine learning to identify the plurality of marker clusters.
5. The method of any one of embodiments 1-4, wherein the step of identifying the plurality of marker clusters by comparing the first plurality of viral clusters to the second plurality of viral clusters comprises identifying the plurality of marker clusters unassociated with a known taxon.
6. The method of any one of embodiments 1-5, wherein each of the viral clusters in the plurality of marker clusters respectively represent an unidentified taxon of higher rank than a strain and of lower rank than a family.
7. The method of any one of embodiments 1-6, wherein the step of identifying the plurality of marker clusters by comparing the first plurality of viral clusters to the second plurality of viral clusters comprises performing beta diversity analysis on the first plurality of viral clusters and the second plurality of viral clusters.
8. The method of embodiment 7, wherein performing the beta diversity analysis comprises performing a scaling and ordination technique selected from a group consisting of principal coordinates analysis (PCoA), principal components analysis (PCA), non-metric multidimensional scaling (NMDS), canonical correspondence analysis (CCA), and redundancy analysis (RDA).
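As a non-limiting illustration of the beta diversity analysis of embodiments 7 and 8, the following sketch computes a pairwise Bray-Curtis dissimilarity matrix from per-sample viral cluster abundances. Bray-Curtis is chosen here only as a common example of a dissimilarity measure and is an assumption, as are the function names and the sample values. A matrix of this kind is the usual input to an ordination technique such as PCoA or NMDS, which is not implemented in the sketch.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance vectors of equal length."""
    total = sum(u) + sum(v)
    if total == 0:
        return 0.0
    return sum(abs(a - b) for a, b in zip(u, v)) / total


def dissimilarity_matrix(samples):
    """Pairwise Bray-Curtis matrix for a list of per-sample abundance vectors."""
    n = len(samples)
    return [[bray_curtis(samples[i], samples[j]) for j in range(n)] for i in range(n)]


# Hypothetical counts per viral cluster (columns) for three samples (rows).
samples = [
    [10, 0, 5],
    [10, 0, 5],
    [0, 15, 0],
]
matrix = dissimilarity_matrix(samples)
```

Identical samples yield a dissimilarity of 0, and samples that share no viral clusters yield a dissimilarity of 1.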
9. The method of any one of embodiments 1-8, wherein the step of identifying the plurality of marker clusters by comparing the first plurality of viral clusters to the second plurality of viral clusters comprises calculating differential abundance of viral clusters in the first plurality of viral clusters and the second plurality of viral clusters.
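As a non-limiting illustration of the differential abundance calculation of embodiment 9, the following sketch computes a log2 fold change of mean relative abundance per viral cluster between the two cohorts. The fold-change statistic, the pseudocount, the function names, and the cluster identifiers and values are assumptions made for illustration; a practical analysis would typically also apply a statistical test with correction for multiple comparisons.

```python
import math


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def log2_fold_changes(healthy, ibd, pseudocount=1e-6):
    """log2(mean IBD abundance / mean healthy abundance) per viral cluster.

    healthy, ibd: dicts mapping cluster ID -> list of relative abundances,
    one value per sample. Positive results mean higher abundance in IBD.
    The pseudocount avoids division by zero and is an illustrative choice.
    """
    result = {}
    for cluster in sorted(set(healthy) & set(ibd)):
        h = mean(healthy[cluster]) + pseudocount
        d = mean(ibd[cluster]) + pseudocount
        result[cluster] = math.log2(d / h)
    return result


# Hypothetical cluster IDs and relative abundances.
healthy = {"vc_x": [0.10, 0.10], "vc_y": [0.40, 0.40]}
ibd = {"vc_x": [0.40, 0.40], "vc_y": [0.10, 0.10]}
lfc = log2_fold_changes(healthy, ibd)
```

Clusters with a large positive value would be candidates for markers of increased abundance in the IBD cohort, and clusters with a large negative value would be candidates for markers of decreased abundance.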
10. The method of any one of embodiments 1-9, wherein the healthy cohort and the cohort diagnosed with IBD are each human cohorts.
11. The method of any one of embodiments 1-10, further comprising:
associating a first data subset of the second dataset with a first sub-cohort diagnosed with IBD and Crohn's disease (CD);
associating a second data subset of the second dataset with a second sub-cohort diagnosed with IBD and ulcerative colitis (UC);
associating a first subset of viral clusters of the second plurality of viral clusters with the first sub-cohort;
associating a second subset of viral clusters of the second plurality of viral clusters with the second sub-cohort; and
identifying a first subset of marker clusters of the plurality of marker clusters and a second subset of marker clusters of the plurality of marker clusters by comparing the first subset of viral clusters to the second subset of viral clusters.
12. The method of any one of embodiments 1-11, further comprising:
representing the viral genome sequences in the first dataset each respectively as a first viral contig of a protein sequence; and
representing the viral genome sequences in the second dataset each respectively as a second viral contig of a protein sequence.
13. The method of any one of embodiments 1-12,
wherein the first dataset further represents a first plurality of identified viral genome sequences derived from the healthy cohort,
wherein the second dataset further represents a second plurality of identified viral genome sequences derived from the cohort diagnosed with IBD, and
wherein the method further comprises:
creating a first plurality of reference viral clusters using protein clustering to group like proteins and protein homology to group identified viral genome sequences of the first plurality of identified viral genome sequences;
creating a second plurality of reference viral clusters using protein clustering to group like proteins and protein homology to group identified viral genome sequences of the second plurality of identified viral genome sequences; and
wherein the step of identifying the plurality of marker clusters by comparing the first plurality of viral clusters to the second plurality of viral clusters further comprises identifying the plurality of marker clusters by comparing a combination of the first plurality of viral clusters and the first plurality of reference viral clusters to a combination of the second plurality of viral clusters and the second plurality of reference viral clusters.
14. The method of embodiment 13,
wherein the first plurality of identified viral genome sequences are associated with a viral taxonomic category present in a viral genome database, and
wherein the second plurality of identified viral genome sequences are associated with a viral taxonomic category present in a viral genome database.
15. A method for determining the presence of inflammatory bowel disease (IBD) in a subject, the method comprising:
obtaining an individual viral dataset representing a plurality of viral genome sequences derived from a GI microbiota sample obtained from the subject;
creating a plurality of subject viral clusters using protein clustering to group like proteins derived from the individual viral dataset and by using protein homology to group unidentified viral genome sequences of the individual viral dataset, each viral cluster in the plurality of subject viral clusters comprising one or more viral genome sequences derived from the subject;
obtaining a plurality of marker clusters indicative of the presence or absence of IBD; and
comparing the plurality of subject viral clusters to the plurality of marker clusters.
16. The method of embodiment 15, wherein at least a portion of the plurality of viral genome sequences are unassociated with a viral taxonomic category derived from a viral genome database.
17. The method of embodiments 15 or 16, wherein a totality of the plurality of viral genome sequences are each unassociated with a viral taxonomic category derived from a viral genome database.
18. The method of any one of embodiments 15-17, wherein at least a portion of the plurality of marker clusters are unassociated with a viral taxonomic category derived from a viral genome database.
19. The method of any one of embodiments 15-18, further comprising determining the presence of IBD in the subject based at least in part on the comparison of the plurality of subject viral clusters to the plurality of marker clusters.
20. The method of any one of embodiments 15-19, wherein the marker clusters comprise one or more viral clusters from taxa Siphoviridae, Myoviridae, Podoviridae, CrAss-like, or Microviridae.
21. The method of any one of embodiments 15-20, wherein the plurality of marker clusters comprises one or more viral clusters selected from vc2, vc6, vc7, vc13, vc14, vc15, vc17, vc19, vc21, vc22, vc23, vc24, vc25, vc28, vc29, vc36, vc37, vc38, vc39, vc40, vc42, vc45, vc48, vc53, vc55, vc56, vc58, vc60, vc61, vc62, vc64, vc66, vc68, vc69, vc70, vc71, vc73, vc74, vc75, vc76, vc77, vc78, vc79, vc80, vc82, vc84, vc85, vc86, vc88, vc89, vc91, vc92, vc94, vc95, vc96, vc97, vc98, vc99, vc101, vc102, vc103, vc104, vc108, vc109, vc112, vc113, vc115, vc117, vc118, vc122, vc123, vc124, vc130, vc132, vc136, vc138, vc142, vc143, vc152, vc154, vc155, vc160, vc161, vc175, vc178, vc181, vc190, vc193, vc205, vc209, vc216, vc218, vc225, vc232, vc263, vc264, vc281, vc284, vc298, vc320, vc411, vc413, vc420, vc456, and vc467.
22. The method of embodiments 15-21, wherein an increased abundance of one or more viral clusters selected from vc2, vc13, vc14, vc15, vc17, vc21, vc22, vc36, vc40, vc48, vc53, vc66, vc68, vc69, vc70, vc71, vc73, vc74, vc77, vc78, vc79, vc80, vc85, vc88, vc89, vc91, vc94, vc95, vc97, vc102, vc108, vc113, vc115, vc117, vc118, vc122, vc123, vc130, vc132, vc142, vc152, vc155, vc160, vc161, vc175, vc178, vc181, vc205, vc218, vc232, vc263, vc264, vc281, vc298, vc413, and vc420 in the plurality of subject viral clusters as compared to a healthy control is indicative of the presence of IBD in the subject.
23. The method of embodiment 22, wherein an increased abundance of one or more viral clusters selected from vc15, vc66, vc71, vc73, vc77, vc78, vc79, vc80, vc91, vc94, vc108, vc113, vc117, vc118, vc132, vc142, vc155, vc160, vc178, vc232, vc264, vc281, vc298, and vc420 in the plurality of subject viral clusters as compared to a healthy control is indicative of the presence of Crohn’s Disease (CD) in the subject.
24. The method of embodiment 22, wherein an increased abundance of one or more viral clusters selected from vc28 in the subject sample as compared to a healthy control is indicative of the presence of Crohn’s Disease (CD) in the subject.
25. The method of embodiment 22, wherein an increased abundance of one or more viral clusters selected from vc2, vc17, vc21, vc22, vc53, vc70, vc74, vc85, vc88, vc89, vc115, vc122, vc123, vc130, vc152, vc161, vc175, vc181, vc205, vc218, vc263, and vc413 in the plurality of subject viral clusters as compared to a healthy control is indicative of the presence of ulcerative colitis (UC) in the subject.
26. The method of embodiment 22, wherein an increased abundance of viral cluster vc2 in the subject sample as compared to a healthy control is indicative of the presence of ulcerative colitis (UC) in the subject.
27. The method of any one of embodiments 15-21, wherein an increased abundance of one or more viral clusters selected from vc38, vc46, vc48, vc54, vc57, vc62, vc64, vc69, vc71, vc108, vc111, vc114, vc115, vc128, vc159, vc162, vc215, vc220, vc242, vc340, vc374, and vc392 in the subject sample as compared to a patient with ulcerative colitis (UC) in remission is indicative of the presence of a flare-up of UC in the subject.
28. The method of any one of embodiments 15-21, wherein an increased abundance of one or more viral clusters selected from vc16, vc119, and vc163 in the subject sample as compared to a patient with a flare-up of ulcerative colitis (UC) is indicative of the presence of UC in remission in the subject.
29. The method of any one of embodiments 15-21, wherein a decreased abundance of one or more viral clusters selected from vc6, vc7, vc19, vc23, vc24, vc25, vc29, vc37, vc38, vc39, vc42, vc45, vc55, vc56, vc58, vc60, vc61, vc62, vc64, vc75, vc76, vc82, vc84, vc86, vc89, vc92, vc96, vc98, vc99, vc101, vc103, vc104, vc109, vc112, vc124, vc136, vc138, vc143, vc154, vc190, vc193, vc209, vc216, vc225, vc284, vc320, vc411, vc456, and vc467 in the plurality of subject viral clusters as compared to a healthy control is indicative of the presence of IBD in the subject.
30. The method of embodiment 29, wherein a decreased abundance of one or more viral clusters selected from vc6, vc7, vc19, vc25, vc29, vc37, vc42, vc45, vc56, vc58, vc60, vc61, vc64, vc82, vc86, vc89, vc92, vc99, vc104, vc109, vc124, vc136, vc154, vc190, and vc284 in the plurality of subject viral clusters as compared to a healthy control is indicative of the presence of Crohn’s Disease (CD) in the subject.
31. The method of embodiment 29, wherein a decreased abundance of one or more viral clusters selected from vc7, vc25, vc47, and vc64 in the subject sample as compared to a healthy control is indicative of the presence of Crohn’s Disease (CD) in the subject.
32. The method of embodiment 29, wherein a decreased abundance of vc98 and/or vc103 viral cluster in the plurality of subject viral clusters as compared to a healthy control is indicative of the presence of ulcerative colitis (UC) in the subject.
33. The method of any one of embodiments 15-32, wherein obtaining the dataset(s) is performed by sequencing VLP DNA isolated from GI microbiota sample(s).
34. The method of any one of embodiments 15-33, further comprising:
obtaining an individual bacteriome dataset representing bacterial sequences derived from the GI microbiota sample obtained from the subject; and
evaluating the individual bacteriome dataset for the presence of bacterial taxa associated with IBD.
35. The method of embodiment 34, further comprising determining the presence of IBD in the subject based at least in part on the comparison of the individual bacteriome dataset to at least one of a healthy control and a control diagnosed with IBD.
36. The method of embodiment 34 or embodiment 35, wherein the bacterial taxa associated with IBD comprise one or more bacterial genera selected from Clostridium XIVa, Blautia, Veillonella, Clostridium sensu stricto, Megasphaera, Fusobacterium, Flavonifractor, Catenibacterium, Ruminococcus, Coprococcus, Methanobrevibacter, Clostridium IV, Faecalibacterium, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Howardella, Bifidobacterium, Oscillibacter, Parabacteroides, Dorea, Roseburia, Odoribacter, and Akkermansia or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said one or more bacterial genera.
37. The method of embodiment 36, wherein an increased abundance of one or more bacterial genera selected from Clostridium XIVa, Blautia, Veillonella, Clostridium sensu stricto, Megasphaera, Fusobacterium, and Flavonifractor in the subject sample as compared to a healthy control is indicative of the presence of IBD in the subject.
38. The method of embodiment 37, wherein an increased abundance of one or more bacterial genera selected from Clostridium XIVa, Blautia, Megasphaera, and Fusobacterium in the subject sample as compared to a healthy control is indicative of the presence of Crohn’s Disease (CD) in the subject.
39. The method of embodiment 34 or embodiment 35, wherein an increased abundance of one or more bacterial species selected from Bacteroides fragilis and Ruminococcus gnavus in the subject sample as compared to a healthy control is indicative of the presence of ulcerative colitis (UC) in the subject.
40. The method of embodiment 34 or embodiment 35, wherein an increased abundance of Ruminococcus gnavus in the subject sample as compared to a control sample from a patient with ulcerative colitis (UC) in remission is indicative of the presence of a flare-up of UC in the subject.
41. The method of embodiment 34 or embodiment 35, wherein an increased abundance of Faecalibacterium prausnitzii, Dorea longicatena or Coprococcus comes in the subject sample as compared to a control sample from a patient with a flare-up of ulcerative colitis (UC) is indicative of the presence of UC in remission in the subject.
42. The method of embodiment 37, wherein an increased abundance of bacterial genus Flavonifractor in the subject sample as compared to a healthy control is indicative of the presence of ulcerative colitis (UC) in the subject.
43. The method of embodiment 36, wherein a decreased abundance of one or more bacterial genera selected from Catenibacterium, Ruminococcus, Coprococcus, Methanobrevibacter, Clostridium IV, Faecalibacterium, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Howardella, Bifidobacterium, Oscillibacter, Parabacteroides, Flavonifractor, Blautia, Dorea, Roseburia, Odoribacter, Catenibacterium, and Akkermansia in the subject sample as compared to a healthy control is indicative of the presence of IBD in the subject.
44. The method of embodiment 43, wherein a decreased abundance of one or more bacterial genera selected from Ruminococcus, Methanobrevibacter, Clostridium IV, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Bifidobacterium, Oscillibacter, Flavonifractor, Dorea, Roseburia, and Odoribacter in the subject sample as compared to a healthy control is indicative of the presence of Crohn’s Disease (CD) in the subject.
45. The method of embodiment 43, wherein a decreased abundance of bacterial genus Akkermansia in the subject sample as compared to a healthy control is indicative of the presence of ulcerative colitis (UC) in the subject.
46. The method of any one of embodiments 34-45, wherein obtaining the individual bacteriome dataset is performed by sequencing 16S rDNA or a V region of 16S rDNA in the GI microbiota sample.
47. The method of embodiment 46, wherein the V region is V4 region.
48. The method of any one of embodiments 15-47, wherein the GI microbiota sample is a fecal sample, a cecal sample, an ileal sample, or a colonic microbiota sample.
49. The method of any one of embodiments 15-48, wherein the subject is human.
50. The method of any one of embodiments 15-49, further comprising administering an IBD treatment to the subject.
51. The method of any one of embodiments 15-50, further comprising administering to the subject additional diagnostic tests for IBD, CD and/or UC.
52. The method of any one of embodiments 15-51, further comprising enrolling the subject in a clinical trial.
53. The method of any one of embodiments 15-52, wherein comparing the plurality of subject viral clusters to the plurality of marker clusters comprises:
identifying common clusters present in the plurality of subject viral clusters and the plurality of marker clusters;
determining relative abundance of members within each common cluster in the plurality of subject viral clusters;
associating a correlation value with each common cluster in the plurality of marker clusters; and
comparing the relative abundance of members within each common cluster in the plurality of subject viral clusters to the correlation value of each common cluster in the plurality of marker clusters.
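As a non-limiting illustration of the comparison recited in embodiment 53, the following sketch identifies the clusters common to a subject and a marker set, computes relative abundances for the subject, and combines them with a per-cluster correlation value. Treating the correlation value as a signed weight and summing the products is an assumption made for illustration, as are the function names, cluster identifiers, counts, and weights. The disclosure does not specify this scoring rule.

```python
def relative_abundance(counts):
    """Convert raw per-cluster counts into fractions of the sample total."""
    total = sum(counts.values())
    return {k: (v / total if total else 0.0) for k, v in counts.items()}


def score_subject(subject_counts, marker_correlations):
    """Return the common clusters and a weighted abundance score.

    subject_counts: dict mapping cluster ID -> read count in the subject sample.
    marker_correlations: dict mapping marker cluster ID -> correlation value,
    assumed positive for IBD-associated and negative for health-associated.
    """
    rel = relative_abundance(subject_counts)
    common = sorted(set(rel) & set(marker_correlations))
    score = sum(rel[c] * marker_correlations[c] for c in common)
    return common, score


# Hypothetical subject counts and hypothetical marker correlation values.
subject_counts = {"vc_a": 30, "vc_b": 10, "vc_c": 60}
marker_correlations = {"vc_a": 1.0, "vc_b": -1.0, "vc_d": 1.0}
common, score = score_subject(subject_counts, marker_correlations)
```

Under these assumptions a higher score would point toward the presence of IBD and a lower score toward its absence; any decision threshold would need to be calibrated on cohort data.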
54. A kit for determining the presence of inflammatory bowel disease (IBD) in a subject comprising:
a device to:
receive a first dataset representing a plurality of unidentified viral genome sequences derived from a GI microbiota sample obtained from the subject;
receive a second dataset representing a plurality of viral genome IBD marker clusters;
create a plurality of viral clusters by using protein clustering to group like proteins derived from the first dataset and by using protein homology to group unidentified viral genome sequences of the plurality of unidentified viral genome sequences, each viral cluster in the plurality of viral clusters comprising one or more unidentified viral genome sequences of the plurality of unidentified genome sequences; and
compare the plurality of viral clusters to the second dataset; and
determine the presence of IBD based at least in part on the comparison of the plurality of viral clusters to the second dataset.
55. The kit of embodiment 54, wherein the device is further configured to:
receive a third dataset representing bacteria from the GI microbiota sample obtained from the subject;
evaluate the third dataset for the purpose of IBD diagnosis; and
determine the presence of IBD based at least in part on the evaluation of the third dataset.
56. The kit of embodiment 54 or 55, wherein the GI microbiota sample is one or more of the group consisting of a fecal sample, a cecal sample, an ileal sample, and a colonic microbiota sample.
57. The kit of any of embodiments 54-56, wherein the IBD is ulcerative colitis (UC).
58. The kit of any of embodiments 54-56, wherein the IBD is Crohn's disease (CD).
59. The kit of any one of embodiments 54-58, wherein the subject is human.
60. A system comprising:
one or more processors;
a memory in communication with the one or more processors and storing instructions thereon that, when executed by the one or more processors, are configured to cause the system to:
receive a first dataset representing a first plurality of viral genome sequences derived from a healthy cohort;
receive a second dataset representing a second plurality of viral genome sequences derived from a cohort diagnosed with IBD;
create a first plurality of viral clusters by using protein clustering to group like proteins derived from the first dataset and by using protein homology to group viral genome sequences of the first dataset, each viral cluster in the first plurality of viral clusters comprising one or more viral genome sequences derived from the healthy cohort;
create a second plurality of viral clusters by using protein clustering to group like proteins derived from the second dataset and by using protein homology to group viral genome sequences of the second dataset, each viral cluster in the second plurality of viral clusters comprising one or more viral genome sequences derived from the cohort diagnosed with IBD; and
identify a plurality of marker clusters by comparing the first plurality of viral clusters to the second plurality of viral clusters.
61. A method for preventing and/or treating inflammatory bowel disease (IBD) in a subject in need thereof, said method comprising administering to the subject an effective amount of a virus from a viral cluster selected from vc6, vc7, vc19, vc23, vc24, vc25, vc29, vc37, vc38, vc39, vc42, vc45, vc55, vc56, vc58, vc60, vc61, vc62, vc64, vc75, vc76, vc82, vc84, vc86, vc89, vc92, vc96, vc98, vc99, vc101, vc103, vc104, vc109, vc112, vc124, vc136, vc138, vc143, vc154, vc190, vc193, vc209, vc216, vc225, vc284, vc320, vc411, vc456, and vc467.
62. A method for preventing and/or treating IBD in a subject in need thereof, said method comprising administering to the subject an effective amount of a virus from a viral cluster selected from vc10, vc23, and vc39.
63. A method for preventing and/or treating Crohn's disease (CD) in a subject in need thereof, said method comprising administering to the subject an effective amount of a virus from a viral cluster selected from vc6, vc7, vc19, vc25, vc29, vc37, vc42, vc45, vc56, vc58, vc60, vc61, vc64, vc82, vc86, vc89, vc92, vc99, vc104, vc109, vc124, vc136, vc154, vc190, and vc284.
64. A method for preventing and/or treating CD in a subject in need thereof, said method comprising administering to the subject an effective amount of a virus from a viral cluster selected from vc10, vc23, and vc39.
65. A method for preventing and/or treating ulcerative colitis (UC) in a subject in need thereof, said method comprising administering to the subject an effective amount of a virus from a viral cluster vc98 and/or vc103.
66. A method for preventing and/or treating CD in a subject in need thereof, said method comprising administering to the subject an effective amount of a virus from a viral cluster selected from vc10, vc23, and vc39.
67. The method of embodiment 61, further comprising administering to the subject an effective amount of a probiotic or a prebiotic composition or a combination thereof, wherein said composition(s) stimulates growth and/or activity in the GI microbiota of the subject of one or more bacterial genera selected from Catenibacterium, Ruminococcus, Coprococcus, Methanobrevibacter, Clostridium IV, Faecalibacterium, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Howardella, Bifidobacterium, Oscillibacter, Parabacteroides, Flavonifractor, Blautia, Dorea, Roseburia, Odoribacter, Catenibacterium, and Akkermansia or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said one or more bacterial genera.
68. The method of embodiment 63, further comprising administering to the subject an effective amount of a probiotic or a prebiotic composition or a combination thereof, wherein said composition(s) stimulates growth and/or activity in the GI microbiota of the subject of one or more bacterial genera selected from Ruminococcus, Methanobrevibacter, Clostridium IV, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Bifidobacterium, Oscillibacter, Flavonifractor, Dorea, Roseburia, and Odoribacter or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said one or more bacterial genera.
69. The method of embodiment 65, further comprising administering to the subject an effective amount of a probiotic or a prebiotic composition or a combination thereof, wherein said composition(s) stimulates growth and/or activity in the GI microbiota of the subject of the bacterial genus Akkermansia or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said bacterial genus.
70. A method for preventing and/or treating IBD in a subject in need thereof, said method comprising administering to the subject an effective amount of a probiotic or a prebiotic composition or a combination thereof, wherein said composition(s) stimulates growth and/or activity of one or more bacterial genera selected from Catenibacterium, Ruminococcus, Coprococcus, Methanobrevibacter, Clostridium IV, Faecalibacterium, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Howardella, Bifidobacterium, Oscillibacter, Parabacteroides, Flavonifractor, Blautia, Dorea, Roseburia, Odoribacter, Catenibacterium, and Akkermansia or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said one or more bacterial genera.
71. A method for preventing and/or treating CD in a subject in need thereof, said method comprising administering to the subject an effective amount of a probiotic or a prebiotic composition or a combination thereof, wherein said composition(s) stimulates growth and/or activity of one or more bacterial genera selected from Ruminococcus, Methanobrevibacter, Clostridium IV, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Bifidobacterium, Oscillibacter, Flavonifractor, Dorea, Roseburia, and Odoribacter or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said one or more bacterial genera.
72. A method for preventing and/or treating UC in a subject in need thereof, said method comprising administering to the subject an effective amount of a probiotic or a prebiotic composition or a combination thereof, wherein said composition(s) stimulates growth and/or activity of the genus Akkermansia or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said bacterial genus.
73. The method of embodiment 67 or embodiment 70, wherein said probiotic composition comprises one or more bacterial strains from the genus selected from Catenibacterium, Ruminococcus, Coprococcus, Methanobrevibacter, Clostridium IV, Faecalibacterium, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Howardella, Bifidobacterium, Oscillibacter, Parabacteroides, Flavonifractor, Blautia, Dorea, Roseburia, Odoribacter, Catenibacterium, and Akkermansia.
74. The method of embodiment 68 or embodiment 71, wherein said probiotic composition comprises one or more bacterial strains from the genus selected from Ruminococcus, Methanobrevibacter, Clostridium IV, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Bifidobacterium, Oscillibacter, Flavonifractor, Dorea, Roseburia, and Odoribacter.
75. The method of embodiment 69 or embodiment 72, wherein said probiotic composition comprises one or more bacterial strains from the genus Akkermansia.
76. The method of any one of embodiments 67-75, wherein the V region is V4 region.
77. A method for preventing and/or treating UC in a subject in need thereof, said method comprising administering to the subject an effective amount of a probiotic or a prebiotic composition or a combination thereof, wherein said composition(s) stimulates growth and/or activity of Faecalibacterium prausnitzii, Dorea longicatena or Coprococcus comes.
78. A method for preventing and/or treating UC in a subject in need thereof, said method comprising administering to the subject an effective amount of a probiotic comprising one or more of Faecalibacterium prausnitzii, Dorea longicatena or Coprococcus comes.
79. The method of any one of embodiments 61-78, wherein the subject is human.
EXAMPLES
[00185] The present invention is also described and demonstrated by way of the following examples. However, the use of these and other examples anywhere in the specification is illustrative only and in no way limits the scope and meaning of the invention or of any exemplified term. Likewise, the invention is not limited to any particular preferred embodiments described here. Indeed, many modifications and variations of the invention may be apparent to those skilled in the art upon reading this specification, and such variations can be made without departing from the invention in spirit or in scope. The invention is therefore to be limited only by the terms of the appended claims along with the full scope of equivalents to which those claims are entitled.
[00186] As an illustration of the principles described above, an analysis of the whole virome of a published keystone IBD virome cohort is presented herein. Using protein-based clustering of viral sequences, example methods applied according to the present invention are demonstrated to overcome high levels of inter-individual variation in the gut virome and reveal compositional changes within the gut virome in subjects with IBD. Virome changes are shown to reflect alterations in bacterial composition. Viromes of individuals with Crohn’s disease can be characterized by increased numbers of temperate phage sequences. No substantial change is observed in viral alpha diversity across cohorts. Incorporating both the bacteriome and virome composition is demonstrated to offer more accurate classification of subjects as healthy or diseased compared to classification based on the bacteriome alone.
[00187] An analysis of a keystone dataset consisting of subjects with CD, UC and healthy controls is presented herein. This analysis overcomes strain-level resolution using protein homology and MCL to create higher taxonomic ranks and reveals hitherto unseen compositional patterns across the virome in health and disease. The analysis includes up to 97% of reads per sample (78.55 ± 18.79% (mean ± SD)), rather than the 15% used in a prior publication relying on the same keystone dataset (Norman et al, 2015, which is incorporated herein by reference in its entirety). As demonstrated in the presently illustrated analysis, it is possible to identify patterns across individuals and cohorts. Alterations in the virome were observed and potential disease biomarkers were identified for further characterization. This disclosure shows that virome alterations mimic those of the bacteriome and offers an improved method for classifying IBD patients from healthy subjects.
[00188] Without wishing to be bound by theory, the described approach provides insight into the viral dark matter in human health and disease. The methods also allow cohort comparisons and overcome problems associated with the high level of inter-individual variation. By identifying sequences which are associated with health and disease, this approach provides a framework for identifying novel virome biomarkers and targets for further wet-lab characterization.
[00189] It has been previously reported that the human gut virome exhibits high levels of inter-individual variation (Reyes et al, 2010. Nature, 466, 334-8), which is exacerbated by the need to analyze the virome at an assembly level, which leads to strain-level analysis. Unlike 16S analysis performed on bacteria, which is assessed at higher taxonomic ranks such as family and genus, viral and phage taxonomy does not have a similarly defined structure, which makes comparisons of cohorts very difficult. Strain-level resolution hampers cohort comparisons due to a lack of commonality amongst samples in the dataset, which masks compositional patterns occurring at higher taxonomic ranks. The analysis presented here originally utilized vOTUs (viral contigs made non-redundant at 90% identity over 90% of the length), but this level of resolution masked shared signals across cohorts. This was overcome by clustering viral genomes based on their protein content using vContact2 (Bolduc et al., 2017. vConTACT: an iVirus tool to classify double-stranded DNA viruses that infect Archaea and Bacteria. PeerJ, 5, e3243) to create a higher, protein-based, taxonomic rank. This revealed shared virome features while retaining relevant biological signals across the cohort (as seen by VCs across subjects), increased the variation explained in the beta diversity (increased eigenvalues) and decreased the abundance of unique VCs per subject across the dataset. Viral clustering also enabled the detection of a core virome in healthy subjects, consisting of eight VCs across 50% of the cohort. This proved to be a key differentiator between health and disease throughout the analysis. Many of these core VCs were differentially abundant in health and disease and were primary drivers of PCoA separation and machine learning predictions.
Without wishing to be bound by theory, the method can allow for improved comparisons of cohorts for a multitude of disease conditions and body sites by enabling the analysis of the whole virome with a reduction of the level of uniqueness at strain level; differentially abundant VCs across cohorts can themselves be viral marker clusters or can lead to the identification of viral marker clusters that can be used to classify individual subject samples as indicative of health or disease.
[00190] In beta diversity analysis, IBD subjects shifted significantly away from healthy controls, thus providing evidence of compositional differences in the gut viromes. Drivers of these separations were associated with Myoviridae and Siphoviridae in IBD and Microviridae and crAss-like phages in healthy subjects. This was also reflected in differential abundance analysis, with many VCs classified as Myoviridae and Siphoviridae significantly increased in subjects with IBD. No differences were found between the alpha diversity of the subjects with IBD and healthy controls, a finding contradictory to previous publications (Zuo et al, 2019. Gut), including previous analysis of this dataset (Norman et al, 2015. Cell, 160, 447-60). These findings may reflect the limited scope of database-dependent analysis methods and that changes in these subsets of the viral community are not reflected in the virome as a whole. The findings presented here also suggest that although the composition of the virome is altered in subjects with disease, the number of viruses remains consistent. The alpha diversity of VCs with a classification in the order Caudovirales was assessed to replicate the work of Norman et al. (2015. Cell, 160, 447-60). Results were in agreement in that an increase in Caudovirales diversity in the CD cohort was observed; however, no significance for UC versus controls was observed when analyzing the total virome. This could reflect a more global change with abundant lytic phage being replaced with temperate phage in IBD environments. It was speculated that a more stressful environment for the bacteria could result in lysogenic phage being released. This stress would also result in a decreased bacterial alpha diversity in IBD, which was also observed here. In terms of beta diversity, subjects with IBD had the smallest distances between points, which could also be attributed to the low diversity. The incorporation of ΦQ33, a lactococcal phage not native to the human gut, at known concentrations allowed for the quantification of the total bacteriophage loads in the fecal samples. Interestingly, viral load was inversely proportional to observed VCs, suggesting that dominance of particular viruses rather than an expansion of many could be responsible for a higher viral load.
[00191] This study provides new evidence of a correlation between alterations in the whole human gut virome and IBD. As previously reported, the genus Faecalibacterium (Lopez-Siles et al., 2015. Appl Environ Microbiol, 81, 7582-92; Lopez-Siles et al., 2018. Front Cell Infect Microbiol, 8, 281; Gevers et al, 2014. Cell Host Microbe, 15, 382-392; Pascal et al, 2017. Gut, 66, 813-822; Machiels et al, 2014. Gut, 63, 1275-83) was depleted in IBD cohorts along with Ruminococcaceae (Gevers et al, 2014). Furthermore, many differentially abundant taxa were found in agreement with the literature, including Fusobacterium (Pascal et al., 2017; Strauss et al., 2011. Inflamm Bowel Dis, 17, 1971-8; Gevers et al., 2014), Veillonella (Gevers et al., 2014) and Ruminococcus gnavus (Joossens et al., 2011. Gut, 60, 631-7; Willing et al., 2010. Gastroenterology, 139, 1844-1854.e1), which correlated towards the shift in beta diversity as previously found in subjects with IBD. The trends observed in both alpha and beta diversity are also in agreement with previous reports (Halfvarson et al, 2017. Nat Microbiol, 2, 17004; Manichanh et al., 2006. Gut, 55, 205-11; Dicksved et al., 2008. ISME J, 2, 716-27; Pascal et al., 2017. Gut, 66, 813-822) and the previous analysis of this dataset (Norman et al, 2015. Cell, 160, 447-60), thus providing validity to the cohort tested and the methods. Although it is not possible from a cross-sectional study to know whether the virome alters the bacteriome or vice-versa, the datasets complement each other. The beta diversity trends were observed in both datasets and this was confirmed through Procrustes analysis. Although 16S showed improved accuracy in classifying subjects with IBD from controls, the addition of the virome improved upon this classification to over 94% area under the curve and to over 85% accuracy. All subjects with IBD were correctly classified, providing further evidence that the gut bacteriome and virome are truly altered in IBD, such that the gut bacteriome and virome are potential avenues for any of diagnosis, therapeutics and correction.
[00192] A common trend observed in the present virome analysis is the increased severity in the virome alteration of CD patients in comparison to UC, a finding replicated in the 16S. In PCoA, CD is located further from the healthy controls while subjects with CD are also the least stable cohort. There were also a greater number of differentially abundant clusters between healthy and CD compared to healthy and UC. Subjects with CD also had more differentially abundant RSVs than UC versus controls in the bacteriome, together with being located furthest from controls on the PCoA. This CD cohort had the least beta stability which may also be linked to having the lowest diversity. Interestingly, CD had a significantly higher diversity of Caudovirales and an increased number of reads aligned to lysogenic VCs when compared to healthy controls.
Furthermore, when differentiating cohorts using machine learning, many of the top 20 importance factors were differentially abundant between CD and controls for both virome and 16S models.
[00193] In light of the findings of this study, two potential scenarios were proposed to describe the host-bacteriome-virome interaction in IBD and in particular CD. One hypothesis is that oxidative stress in the gut during inflammation (Rigottier-Gois, 2013. ISME J, 7, 1256-61) creates a more stressful living environment for the bacteria, which lyse and release lysogenic phage, also resulting in decreased bacterial diversity. These lysogenic phage are then detected in virome analysis as seen in this study and previously reported (Norman et al., 2015. Cell, 160, 447-60; Zuo et al., 2019. Gut mucosal virome alterations in ulcerative colitis. Gut). Secondly, although less probable, an event causing more lysogenic induction would result in increased bacterial lysis, therefore a decreased bacterial alpha diversity, and more lysogenic phage being detected. Currently it is not possible to pinpoint the precise mechanism of action; however, it is clear that there are specific compositional differences in the viromes of subjects with IBD.
[00194] Examining the interaction of virome composition and disease state in UC in greater detail revealed more subtle changes than those seen between health and disease. This finding, in conjunction with the overall comparison between UC, CD and healthy controls, suggests the virome is not only less perturbed between healthy and UC, but also between flare and remission. This may reflect the disease severity of UC relative to CD or that they may interact with the host in different ways. Differences in disease locations, severity and risk factors, such as the potential paradoxical relationship between CD and UC with smoking (Berkowitz et al., 2018. Front Immunol, 9, 74), have previously alluded to differences in disease etiologies. It is possible that the virome composition does not alter between disease states to the same extent as between disease and health. However, disease status was not tested in a CD cohort, which may be more fruitful as the differences in CD were more exaggerated.
[00195] Those of ordinary skill in the art can envision additional steps and methods for analysis. Increased sample sizes, particularly for disease state, may increase the ability to detect any potential alterations between flare and remission. Inclusion of one or more of food frequency questionnaires, medical history details, and medication history (Maier et al, 2018. Nature, 555, 623-628) may provide for improved analysis of the microbiota. Addition of metadata, such as that including household controls, can assist in the statistical analysis and can allow the exploration of environmental effects. Other DNA amplification steps besides MDA amplification, which has a known bias towards Microviridae (Parras-Molto et al, 2018. Microbiome, 6, 119), can be undertaken. More modern methods such as the accel-NGS prep kit may remove the need for amplification and may provide for more reliable indication of diversity (Roux et al, 2016. Towards quantitative viromics for both double-stranded and single-stranded DNA viruses. PeerJ, 4, e2777).
[00196] This study provides a detailed analysis of whole virome composition comparing CD/UC and healthy controls to date. It also represents a detailed study of the unidentified majority of the virome in human disease and provides insights, paving the way for better understanding of the human virome as a whole. This analysis shows that analysis of the dark matter can be used to detect accurate profiles of the human gut virome. Although it is not yet possible to conclude if the bacteriome shapes the virome or vice-versa, they do correlate with each other, as shown by Procrustes analysis, and can assist in the classification of subjects with IBD from healthy controls. This analysis provides a method for the comparison of whole viromes across cohorts in diseases other than IBD, which will give further insights into how a fuller understanding of the role of the microbiome in health and disease can be beneficial.
[00197]
Datasets
[00198] A publicly available dataset which was generated on human gut virome composition associated with IBD (Norman et al., 2015) was utilized. The dataset was analyzed with a novel whole-virome analysis protocol that provided novel insights into compositional changes of the virome, and any potential role of such changes in IBD. The dataset (Norman dataset) comprised 165 virome samples from 130 subjects, more specifically 61 healthy controls, 27 subjects with Crohn’s disease (CD), and 42 subjects with ulcerative colitis (UC). Of these, six samples were known to be collected during CD flare, eight in CD remission, 13 in UC flare, and 20 in UC remission. To build upon these findings, a second dataset (Simponi dataset) was generated that consisted of longitudinal samples from 40 subjects with UC. These samples included 82 samples from periods of flare and 31 samples from periods of remission, allowing for the investigation of the impact of disease status on gut virome composition. 16S rRNA gene sequencing data were also obtained for 149 (Norman dataset) and 109 (Simponi dataset) samples.
Protein-based clustering can overcome virome individuality and allow cohort comparisons
[00199] In order to compare the viromes of subjects with IBD to healthy controls, the overall composition was initially investigated at a viral contig level through PCoA of beta diversity (Figure 1A). The viral contigs represent individual viral genomes (whole or part), and therefore the resolution is at the strain level. This level of resolution was reflected in the extremely high levels of individuality amongst subjects at an assembly level. It was also observed in beta diversity, as individuals were the primary drivers of separation, with longitudinal samples grouping together. This individual specificity masked compositional differences of the virome. Each of the cohorts (control, CD and UC) showed little divergence with overlapping ellipses, while PC-axes 1 and 2 described very little of the variation in the dataset (4.85% and 3.59%).
[00200] To overcome this high interpersonal variation and to investigate disease-specific compositional changes in the virome, a lower taxonomic resolution (i.e. a higher taxonomic rank) was developed. This was achieved by clustering viral contigs based on protein similarities and MCL using vContact2 (see methods). The viral contigs clustered into 472 Viral Clusters (VCs) of >2 members with 2,382 singletons remaining. (Singletons are henceforward referred to as a VC with one member.) The resulting VCs formed a new count table and so a VC-based PCoA was produced (Figure 1B).
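The step in which the VCs "formed a new count table" can be illustrated with a minimal sketch. It assumes that cluster membership has already been produced by the clustering tool and exported as a simple contig-to-cluster mapping. The function name, the dictionary layout and the toy identifiers are hypothetical and are not taken from the analysis described here.

```python
from collections import defaultdict

def aggregate_to_clusters(contig_counts, contig_to_vc):
    """Sum per-sample contig read counts into viral cluster (VC) counts.

    contig_counts: {sample_id: {contig_id: read_count}}
    contig_to_vc:  {contig_id: vc_id}; a contig missing from this mapping
                   is treated as a singleton, i.e. a VC with one member.
    """
    vc_counts = {}
    for sample, counts in contig_counts.items():
        per_vc = defaultdict(int)
        for contig, n in counts.items():
            vc = contig_to_vc.get(contig, f"singleton_{contig}")
            per_vc[vc] += n
        vc_counts[sample] = dict(per_vc)
    return vc_counts

# Toy data (illustrative only): two samples share no contigs,
# yet both carry members of the same cluster "vc1".
contig_counts = {
    "s1": {"c1": 10, "c2": 5, "c9": 1},
    "s2": {"c3": 7, "c4": 2},
}
contig_to_vc = {"c1": "vc1", "c2": "vc1", "c3": "vc1", "c4": "vc2"}
vc_table = aggregate_to_clusters(contig_counts, contig_to_vc)
```

In the toy input the two samples have no contig in common, but after aggregation both contain vc1, which is the kind of increased commonality that the clustering step is intended to expose.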
[00201] Samples largely grouped per condition with noticeable increases in the eigenvalues to 10.36% and 5.58% variation explained for PC-axes 1 and 2 respectively. However, it should be noted that samples with true deviation from the main cohort (such as subjects N208 and N56) remained distinctive, suggesting that the clustering process retains true compositional differences. To further determine if the use of clustering decreased the masking effect of inter-individuality, the viral contig and VC relative abundances were plotted for control subjects and colored by the percent abundance of shared viral contigs/clusters across the remainder of the control cohort (Figure 1C). The relative abundance of viral contigs unique to each subject was 14% (± 8%) on average, while an abundance of only 1.7% (± 4%) on average per subject was shared across 30% of the healthy control cohort. There were no individual viral contigs shared across greater than 50% of the cohort. In contrast, clustering reduced the mean abundance of clusters unique to the subject to 1.3% (± 3%), while there was a mean abundance of 15% (± 6%) per subject shared across 30% of the cohort, 7.1% (± 6.6%) across 50% and 0.7% (± 1.4%) across 70%. A total of eight VCs were now shared across 30% of CD and UC cohorts where previously no viral contigs had been found (Figure 1D). The use of clustering therefore resulted in increased commonality across the dataset and allowed for the comparison of viromes between cohorts.
Analysis of viral clusters reveals IBD-specific alterations in the gut virome
[00202] PCoA beta diversity analysis using Spearman distances found CD relapse and remission located furthest from controls (p-value: 0.0023 and 0.0032, respectively) followed by UC relapse/remission (p-values: 0.002/0.0023) (Figure 2A). Small variations were observed in the disease state of each condition but were not significant, although this may be due to small sample sizes. PCoA without the division of disease status showed CD and UC beta diversity significantly differed from healthy controls (p-values: 0.0002 and 0.0002; Figure 7A). The healthy cohort was also the least diverse across subjects, followed by UC, having the smallest pairwise distances between points (Figure 7B). This level of commonality was also reflected when comparing the number of clusters shared across each cohort as previously addressed (Figure 1D). There was an observable core virome (defined as presence across >50% of subjects) in the healthy cohort with two VCs (vc2 and vc7) shared across >70% of subjects and six (vc1, vc10, vc23, vc25, vc32, vc39) across >50%. The majority of these VCs were unclassified (i.e. did not cluster with known viral genomes) with the exception of vc1, classified as Siphoviridae, and vc10, which is a crAss-like phage. In contrast, core VCs were not found across UC subjects and just one core VC (vc32, unclassified) was found across CD subjects.
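The core virome definition used above (presence across >50% of subjects, with a stricter >70% tier) can be sketched as a prevalence calculation over a VC count table. This is a minimal illustration that assumes one sample per subject and treats any non-zero count as presence. Both are assumptions, as are the function names and the toy counts.

```python
def prevalence(vc_table):
    """Fraction of subjects in which each VC has a non-zero count.

    vc_table: {subject_id: {vc_id: read_count}}
    """
    n_subjects = len(vc_table)
    seen = {}
    for counts in vc_table.values():
        for vc, c in counts.items():
            if c > 0:
                seen[vc] = seen.get(vc, 0) + 1
    return {vc: k / n_subjects for vc, k in seen.items()}

def core_clusters(vc_table, threshold=0.5):
    """VCs present in strictly more than `threshold` of subjects."""
    return sorted(vc for vc, p in prevalence(vc_table).items() if p > threshold)

# Toy healthy cohort (illustrative only).
healthy = {
    "h1": {"vcA": 12, "vcB": 3},
    "h2": {"vcA": 4, "vcB": 1, "vcC": 9},
    "h3": {"vcA": 1, "vcB": 2},
    "h4": {"vcA": 6, "vcB": 0},
    "h5": {"vcA": 2},
}
core_50 = core_clusters(healthy, 0.5)   # vcA (5/5) and vcB (3/5)
core_70 = core_clusters(healthy, 0.7)   # vcA only
```

With longitudinal data, samples would first need to be collapsed per subject so that repeated sampling of one individual does not inflate prevalence.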
[00203] To assess the impact of whole virome analysis, Caudovirales and overall viral alpha diversity were assessed as these had been found to be different across cohorts using database dependent methods in the original analysis. Alpha diversity was calculated for each sample using the VC count tables. There were no significant differences across the cohorts and disease states for each disease (Figure 2B). Differences between the IBD conditions and healthy controls were compared using an additional diversity measurement (Shannon diversity) but no significance was found (Figures 8A-8B). The alpha diversity of any VCs assigned to Caudovirales families was also compared (Figures 9A-9B) and significance was found for CD (increased) versus healthy only.
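For reference, the diversity measures named in this section can be written out as a short sketch: observed richness and Shannon diversity, together with Chao1, which the text applies to the 16S data. These are the textbook definitions applied to one sample's row of a count table. The specific software and settings used in the analysis are not stated here, so the functions below are illustrative rather than a reproduction.

```python
import math

def observed_richness(counts):
    """Number of features (e.g. VCs) with a non-zero count."""
    return sum(1 for c in counts if c > 0)

def shannon(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over non-zero features."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def chao1(counts):
    """Chao1 richness estimate from singleton (F1) and doubleton (F2) counts."""
    s_obs = observed_richness(counts)
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    # Bias-corrected form, used when there are no doubletons.
    return s_obs + f1 * (f1 - 1) / 2

sample = [10, 1, 1, 2, 0]   # toy counts for one sample
```

For the toy sample, four features are observed, two of them singletons and one a doubleton, so Chao1 evaluates to 4 + 2²/(2·1) = 6.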
[00204] DESeq2 analysis revealed a number of classifiable VCs, including two crAss-like phages and two Microviridae, at significantly increased abundances in healthy controls compared to CD (Figure 2C) and UC (Figure 2D). vc19 and vc320 (crAss-like phages) were absent from all CD subjects and only vc320 was present in one subject with UC, but other clusters classified as crAss-like phages were present. Conversely, VCs classified as Siphoviridae (nine for CD, eight for UC) and Myoviridae (one for CD, two for UC) were increased in CD and UC versus controls. However, many of the most differentially abundant clusters were taxonomically unassigned and were therefore classed as being a part of the “viral dark matter”. 49 VCs were increased in control samples relative to CD with 86 differentially abundant in total. 25 VCs were increased in control samples relative to UC with 59 differentially abundant in total. All DESeq2 results are found in Table 1 (control vs CD) and Table 2 (control vs UC). Interestingly, 30 of the 37 VCs (81%) increased in CD compared to controls and 28 of the 34 (82%) for UC versus controls were categorized as lysogenic. By contrast, just 32% and 24% of the VCs increased in controls relative to CD and UC, respectively, were categorized as lysogenic. Further investigation into the presence of lysogenic VC abundance in each cohort indicated that CD subjects have significantly more reads aligned to lysogenic VCs than healthy controls (Figures 10A-10B).
The bacteriome also differs between patients with IBD and controls
[00205] Bacteriome data was compared to previously published studies to verify this dataset is not an outlier. Beta diversity (unweighted UniFrac) showed CD (relapse/remission) samples grouping furthest from controls (p-value: 0.0065, 0.0332) followed by UC (relapse/remission) (p-value: 0.018, 0.001) (Figure 3A), which was reflected in the virome composition.
Interestingly, and in contrast to the virome, the control bacteriome contained the largest variation amongst samples with CD having the smallest distances between points (Figures 7C-7D).
[00206] Decreased alpha diversity was observed (Chao1 diversity) in the IBD cohorts versus healthy controls with the largest differences observed in CD flare (p-value: 0.012) and remission (p-value: 0.018) along with UC flare (p-value: 0.051) (Figure 3B). Due to the small sample sizes this analysis was also repeated without the division of disease status and using various metrics (Figures 8C-8D). For both Chao1 diversity and Shannon diversity, the healthy cohort was significantly higher than both IBD cohorts, while UC was also significantly increased when compared to CD (p-value: 0.001).
[00207] A large number of taxa were found differentially abundant between control and CD (Figure 3C) and control versus UC (Figure 3D). A total of 113 taxa were decreased in CD versus controls while just 17 were increased. Similarly, 69 were increased in control vs UC and only 21 significantly increased in UC. Many of the taxa increased in controls versus both IBD cohorts were of the phylum Firmicutes and included the genus Faecalibacterium and the families Ruminococcaceae and Clostridiales. The most differentially abundant RSVs increased in CD versus controls included Fusobacterium and Veillonella, while the most increased in UC versus controls were Clostridium sensu stricto and Lachnospiraceae (DESeq2 results are listed in Table 3 (control vs CD) and Table 4 (control vs UC)).
Correlations between PCoA and abundance counts reveal key drivers of gut microbiome composition
[00208] The drivers of significant shifts in beta diversity were assessed through correlations between PC coordinates and the VCs (for the virome) and RSVs (for the bacteriome). There were 25 VCs significantly correlated to PC-axes 1 and/or 2 (Figure 4A). Dependent upon the correlation coefficient, the associations could further be broken down into four quadrants. In quadrant 1 (top left), towards subjects with IBD, there were 18 significantly correlated VCs, comprising eight Siphoviridae, two Myoviridae, two Heterogeneous and six unclassified (Table 5). In quadrant 3 (bottom left), one Myoviridae and one unclassified VC were significantly correlated towards subjects with IBD. VCs classed as Microviridae and crAss-like phages were significantly correlated towards the healthy controls (quadrant 4, bottom right), while there were also two unclassified VCs.
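The quadrant assignment described above can be sketched as follows, assuming that a rank (Spearman) correlation is computed between each feature's abundance and the sample coordinates on PC-axes 1 and 2, and that the signs of the two coefficients place the feature in a quadrant. The text names quadrant 1 as top left, quadrant 3 as bottom left and quadrant 4 as bottom right. Labeling the top right as quadrant 2 is an assumption, as are the function names and the toy values. Significance testing is omitted.

```python
import math

def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def spearman(x, y):
    """Spearman rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def quadrant(rho_pc1, rho_pc2):
    """Map correlation signs to the quadrant labels used in the text."""
    if rho_pc1 < 0 and rho_pc2 >= 0:
        return 1  # top left
    if rho_pc1 >= 0 and rho_pc2 >= 0:
        return 2  # top right (label assumed, not named in the text)
    if rho_pc1 < 0:
        return 3  # bottom left
    return 4      # bottom right

# Toy data: five samples with PCoA coordinates and one VC's abundance.
pc1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
pc2 = [1.0, 2.0, 0.0, -1.0, -2.0]
abundance = [9, 7, 4, 2, 1]
rho1 = spearman(abundance, pc1)
rho2 = spearman(abundance, pc2)
q = quadrant(rho1, rho2)
```

Here the toy feature decreases along PC1 and increases along PC2, so it falls in quadrant 1.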
[00209] There were 76 RSVs significantly correlated towards controls (quadrant 1) for the beta diversity of 16S composition (Figure 4B). The correlations with the highest correlation coefficient (rho values) included RSVs with taxonomic assignments to Firmicutes, Ruminococcaceae and Alistipes. Quadrant 3 correlations, also towards controls, contained 46 RSVs including Alistipes indistinctus and Clostridiales. For quadrant 4, towards IBD subjects, four RSVs were significantly correlated including Ruminococcus gnavus and Flavonifractor plautii (Table 6).
[00210] The relationship between the virome and 16S composition was investigated through Procrustes analysis (Figures 10A and 10B). There was a significant positive correlation with an observed correlation coefficient of 0.7143 (p-value of 0.001). However, VC alpha diversity did not significantly correlate with observed bacterial species (Figure 8E), although there was significant weak correlation with Shannon diversity (p-value: 0.038, rho: 0.194) (Figure 8F).
Alterations in virome composition are less distinct between UC activity states
[00211] Differences in disease states (flare and remission) were investigated using a second cohort of 40 subjects with ulcerative colitis, sampled longitudinally resulting in 113 virome and 109 16S samples. Beta diversity analysis of virome composition using VCs (Figure 5A) did not show significant separation between flare and remission (p-value: 0.17). However, unclassified viral cluster vc40 was found to be significant (Table 7). In 16S analysis, the shift between flare and remission in beta diversity was not significant (p-value: 0.022) and there were 14 RSVs correlated to PC-axes 1 or 2 (Figure 5B, Table 8). RSVs towards the shift in UC remission (quadrant 1) included Faecalibacterium prausnitzii, Dorea longicatena and Coprococcus comes. An RSV classified as Ruminococcus gnavus was the only RSV which correlated towards UC flare. The virome and 16S were correlated using Procrustes analysis and there was a significant positive correlation, in agreement with previous results, with an observed correlation coefficient of 0.906 (p-value of 0.001) (Figure 12).
[00212] Although the median alpha diversity was higher in the virome for UC flare (Figure 5C) and UC remission for the 16S (Figure 5D), these values were not significant when assessed using both Chao1 and Shannon diversity, again in agreement with the previous analysis. Viral load was estimated through spiking with a known concentration of lactococcal ΦQ33 phage and was found to be negatively correlated with viral alpha diversity (rho: -0.415, p-value: 0.009) (Figure 13A). Viral diversity was also investigated over time and in relation to disease status (Figure 13B) and, although there were fluctuations in the time series, there was no observable trend with disease status and a comparison resulted in no significant differences (p-value: 0.383).
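The exact calculation used to convert the spike-in signal into a viral load is not given in this passage. One common way to do it, shown here purely as an illustrative assumption, is to scale the known amount of spiked phage by the ratio of non-spike to spike reads. The function name, argument names and numbers are hypothetical.

```python
def estimate_viral_load(total_viral_reads, spike_reads, spike_added):
    """Estimate native viral load from a spike-in of known quantity.

    Assumes read counts are proportional to particle numbers for both the
    spiked phage and the native viruses, which ignores genome-length and
    amplification biases. The result is in the same units as `spike_added`
    (for example particles per gram of feces).
    """
    if spike_reads <= 0:
        raise ValueError("spike-in phage not detected; load cannot be estimated")
    native_reads = total_viral_reads - spike_reads
    return spike_added * native_reads / spike_reads

# Toy numbers (illustrative only): 1% of reads map to the spiked phage.
load = estimate_viral_load(total_viral_reads=1_000_000,
                           spike_reads=10_000,
                           spike_added=1_000_000)
```

Under this model a sample in which the spike-in takes up a smaller share of the reads is assigned a higher native viral load, and the per-sample estimates can then be correlated with alpha diversity as reported above.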
[00213] Two crAss-like phages were increased in subjects in remission when compared to flare along with 2 Siphoviridae, 1 Microviridae and 7 unclassified phage (Figure 5E, Table 9).
Conversely there were 39 VCs increased in flare. These included two Anelloviridae, one Myoviridae, ten Siphoviridae and 24 unclassified. Bacteroides and Dialister were the only RSVs increased in remission while seven RSVs were increased in flare including Enterococcus, Prevotella and Streptococcus (Figure 5F, Table 10).
Virome composition aids the classification between Health and Disease
[00214] The ability of the virome and 16S composition to differentiate between patients with IBD and healthy controls was tested using machine learning. Sample sizes were increased by combining UC and CD samples to form a composite IBD cohort. The virome alone (Figure 6A) yielded an accuracy of 0.769 (p-value of 0.032), with four of the top five contributors (vc39, vc23, vc38 and vc45) being increased in controls versus both IBD states. All four of these clusters were unclassified, but two had CRISPR protospacer alignments to Lachnospiraceae and Parabacteroides while the remaining two had hits to Bacteroides. The 16S alone had a greater predictive power than the virome (accuracy: 0.824, p-value: 0.008), with an RSV classified as Ruminococcaceae contributing the largest gain, followed by a Clostridiales and Odoribacter splanchnicus (Figure 6B). The virome and the 16S were combined and the predictive power measured (Figure 6C). The accuracy increased to 0.853 (p-value: 0.0026), with the virome contributing five of the top 20 most important features. Of these, 4 had CRISPR protospacers to bacteria including the order Clostridiales, the family Lachnospiraceae, the genera Pseudoflavonifractor, Clostridium and Johnsonella, along with Fusobacterium and Bacteroides (Figure 14). Differences between CD and healthy proved to be the main predictors of disease, with 11 VCs/RSVs being decreased in CD and one increased when compared to controls.
[00215] ROC curve analysis was performed as a second measure of the accuracy of each model (Figure 6D). The AUC (area under the curve) of the virome alone was 78.31%, lower than the 16S AUC of 89.72%. However, the virome and 16S combined had the largest AUC, 94.79%, predicting all 16 patients with IBD as IBD and misclassifying only five controls as IBD.
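As an illustration of what the AUC measures, the following sketch computes it as the probability that a randomly chosen IBD sample receives a higher model score than a randomly chosen control, with ties counting one half. This pairwise formulation is equivalent to the area under the ROC curve. The labels and scores shown are hypothetical and are not data from the Examples.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    labels: 1 for the positive class (e.g. IBD), 0 for the negative class (control).
    scores: model scores, where higher means more likely positive.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Hypothetical labels and predicted probabilities.
    labels = [1, 0, 1, 0]
    scores = [0.9, 0.8, 0.3, 0.1]
    print(roc_auc(labels, scores))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes.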
Key VCs revealed by the analysis of IBD viromes
[00216] Through various approaches of virome analysis, ten key VCs consistently emerged (Figures 15A-15J). A key VC was defined as any which was core in one cohort and largely absent from another and/or significantly correlated in the PCoA axes and differentially abundant between the cohorts. vc23, vc39 and vc10 were present in the healthy core and largely absent from the subjects with CD (7, 14 and 26% respectively) and UC (12, 14 and 40% respectively). These three VCs were all in the top seven importance factors in the machine learning, while vc39 and vc23 were in the top two. vc23, although unclassified, contained CRISPR protospacers to Parabacteroides, while vc39, also unclassified, had hits to undefined Lachnospiraceae. vc10, a crAss-like phage, did not feature any CRISPR protospacer alignments.
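The "core in one cohort and largely absent from another" part of this definition can be sketched as a prevalence comparison. The sketch assumes prevalence means the fraction of subjects with a non-zero count for the VC; the thresholds `core_min` and `absent_max` are hypothetical placeholders, since the cut-offs used in the Examples are not restated here.

```python
def prevalence(counts):
    """Fraction of subjects in which the cluster has a non-zero count."""
    if not counts:
        raise ValueError("counts must not be empty")
    return sum(1 for c in counts if c > 0) / len(counts)


def core_in_a_absent_in_b(counts_a, counts_b, core_min=0.5, absent_max=0.4):
    """True if the cluster is prevalent in cohort A and rare in cohort B.

    core_min and absent_max are illustrative thresholds (assumptions).
    """
    return prevalence(counts_a) >= core_min and prevalence(counts_b) <= absent_max


if __name__ == "__main__":
    # Hypothetical per-subject counts of one VC in two cohorts.
    healthy = [3, 1, 5, 1]
    crohns = [0, 0, 0, 2]
    print(prevalence(healthy), prevalence(crohns), core_in_a_absent_in_b(healthy, crohns))
```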
[00217] The remaining 7 key VCs (vc17, vc13, vc5, vc15, vc9, vc22 and vc101) were all significantly correlated to the PC-axes and were at significantly increased abundance in UC and/or CD compared to healthy controls, with the exception of vc101, which was increased in control and UC versus CD. vc13, vc15 and vc17, all classified as Siphoviridae, had CRISPR protospacer hits to a number of genera of the Firmicutes, including Blautia, Coprobacillus, Peptoniphilus, Ruminococcus, Enterococcus, Lactobacillus, Streptococcus and Clostridium (Figure 14). vc5, vc9 and vc22, classified as Myoviridae, contained CRISPR protospacers to the Firmicutes genera Clostridium, Coprobacillus, Enterococcus, Lactobacillus, Johnsonella, Roseburia, Ruminococcus, Veillonella and Flavonifractor, along with the Proteobacteria genus Parasutterella (Figure 14). Finally, vc101, a Microviridae, did not have any CRISPR protospacer alignments.
[00218] The key VCs were shown to be effective as marker clusters for classifying individual subject GI microbiota datasets within the larger dataset as either diseased or healthy.
[00219] Below are the methods used in the Examples described above.
STAR Methods / Data Download
Original Norman et al. cohort
[00220] Raw sequencing reads (virome and 16S) for the Norman et al., 2015 cohort were downloaded using a link in the original publication (Norman et al., 2015. Cell, 160, 447-60).
Simponi cohort
[00221] To build upon these findings, the Simponi cohort, consisting of longitudinal samples from 40 subjects with UC, including 82 samples from periods of flare and 31 samples from periods of remission, was processed and analyzed. The processing included extraction of fecal VLP DNA, library preparation and sequencing. Processing also included extraction of fecal DNA, library preparation and 16S sequencing. Q33 spiking was also performed.
Bioinformatic viral processing
[00222] The samples described in Norman et al. were used. Raw sequence (2,199,754 ± 983,529) quality was assessed using FASTQC and filtered using Trimmomatic with the following parameters: SLIDINGWINDOW:4:20, MINLEN:60, HEADCROP:15, CROP:225. Human reads were removed using Kraken (v.0.10.5) (Wood and Salzberg, 2014. Genome Biol, 15, R46) and version 38 of the human genome, which resulted in a mean of 1,130,518 ± 436,424 sequences per sample. SPAdes meta (Nurk et al., 2017. Genome Res, 27, 824-834) was utilized to assemble the reads into contigs per sample accurately (Sutton et al., 2019. Microbiome, 7, 12); the contigs were subsequently pooled and retained if longer than 1 kb. Redundancy was removed at 90% identity over 90% of the length (of the shorter contig), retaining the longest contig in each case. Bacterial contamination was removed by using an extensive set of inclusion criteria to select viral sequences only. Briefly, contigs were required to be: 1) VirSorter (Roux et al., 2015a. VirSorter: mining viral signal from microbial genomic data. PeerJ, 3, e985) positive, 2) circular, 3) a minimum of 2 pVogs with at least 3 per 1 kb (Grazziotin et al., 2017. Nucleic Acids Res, 45, D491-D498), 4) alignment to an in-house crAssphage database (threshold: 1e-10) (Guerin et al., 2018. Cell Host Microbe, 24, 653-664 e6), 5) greater than 3 kb with no hits to the nt database (vx.y) (threshold: 1e-10), 6) hits to viral RefSeq database (threshold: 1e-10) (v.89), and less than 3 ribosomal proteins as predicted using the COG database (Tatusov et al., 2000. Nucleic Acids Res, 28, 33-6).
[00223] Quality reads were subsequently aligned to the reference set of viral sequences (n = 7,605) using bowtie2. Using SAMTools, a count table was generated, and finally a 75% breadth of coverage filter was employed to remove spurious bowtie2 alignments. The count for any viral sequence which did not feature a recruited read coverage of at least 1 over 75% of the total sequence length was set to 0. The final set comprised 7,582 viral contigs.
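The breadth of coverage filter can be sketched as follows. The sketch assumes per-base read depths for one contig in one sample are already available as a list of integers (for example, parsed from a depth report); the function names and the input layout are assumptions made for illustration.

```python
def breadth_of_coverage(depths, min_depth=1):
    """Fraction of positions in a contig covered by at least min_depth reads."""
    if not depths:
        return 0.0
    return sum(1 for d in depths if d >= min_depth) / len(depths)


def filtered_count(count, depths, min_breadth=0.75):
    """Keep the read count only if the contig passes the breadth threshold."""
    return count if breadth_of_coverage(depths) >= min_breadth else 0


if __name__ == "__main__":
    # Hypothetical per-base depths for two short contigs in one sample.
    well_covered = [1, 2, 0, 3]   # 3 of 4 positions covered
    patchy = [1, 0, 0, 3]         # 2 of 4 positions covered
    print(filtered_count(12, well_covered), filtered_count(12, patchy))
```

A contig that recruits many reads to a short conserved region would have a high count but low breadth, which is the situation the filter is meant to zero out.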
Clustering and Taxonomy
[00224] Protein sequences were predicted using Prodigal (Hyatt et al., 2010. BMC Bioinformatics, 11, 119) (n=121,021) and subsequently clustered using vContact2 (Bolduc et al., 2017. vConTACT: an iVirus tool to classify double-stranded DNA viruses that infect Archaea and Bacteria. PeerJ, 5, e3243) using a pc-inflation and vc-inflation of 1.5 with all other parameters set to default. This resulted in 472 viral clusters of ≥2 members and 2,382 singletons, each hereafter referred to as a viral cluster (VC) with one member. A cluster count table was generated by summing, for each cluster, the counts of all its members in the previous table. Taxonomic classification was assigned to a cluster using vContact2 and a custom database of viral genomes formed from the concatenation of the taxonomically classified portion of the NCBI's Viral RefSeq (v.89) and the JGI's IMG-VR (downloaded 9 January 2019). The resulting clusters were classified to family level based on the presence of reference genomes within. Clusters containing genomes from multiple families were termed "heterogeneous", and may arise from disagreement between protein-based phylogeny and current taxonomic classification, as discussed further by Bolduc et al. CRISPR protospacers were predicted from the human microbiome project bacterial reference genomes (VERSION/REF) using PILER-CR (Edgar, 2007). These were aligned to VLS using blastn (-task "blastn-short") and formatted with blastn formatter. The top alignment with an e-value < 1e-5 to each VLS was retained in each case. A VC was deemed lysogenic if it contained VLS with alignments to PVOGs featuring annotated integrase genes or site-specific recombinase genes.
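The construction of the cluster count table can be sketched as a sum of contig counts within each cluster, per sample. The data layout (nested dictionaries), the function name and the `singleton_` naming for unclustered contigs are assumptions for illustration, not the format produced by the clustering software.

```python
from collections import defaultdict


def cluster_count_table(contig_counts, contig_to_cluster):
    """Sum per-sample contig counts within each viral cluster (VC).

    contig_counts: {contig_id: {sample_id: count}}
    contig_to_cluster: {contig_id: cluster_id}; contigs missing from this
    mapping are treated as one-member clusters (hypothetical naming scheme).
    """
    table = defaultdict(lambda: defaultdict(int))
    for contig, per_sample in contig_counts.items():
        cluster = contig_to_cluster.get(contig, "singleton_" + contig)
        for sample, count in per_sample.items():
            table[cluster][sample] += count
    return {cluster: dict(samples) for cluster, samples in table.items()}


if __name__ == "__main__":
    # Hypothetical contig-level counts for two samples.
    counts = {
        "contig1": {"S1": 5, "S2": 0},
        "contig2": {"S1": 2, "S2": 7},
        "contig3": {"S1": 1, "S2": 1},
    }
    clusters = {"contig1": "vc1", "contig2": "vc1"}
    print(cluster_count_table(counts, clusters))
```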
Simponi
[00225] The same processing as described above was performed for the Simponi cohort, where 2,523,262 ± 1,289,619 raw reads were quality filtered (Trimmomatic: SLIDINGWINDOW:4:20, MINLEN:60, HEADCROP:15, CROP:135 (fwd), 120 (rev)) and assembled, yielding 8,089 viral contigs in the final count table, which led to 484 clusters of ≥2 members and 4,521 of one member.
Bioinformatic 16S processing
[00226] The samples described in Norman et al. were used. Read quality was assessed on the raw reads (68,146 ± 32,196) using FastQC before and after quality filtering using Trimmomatic (HEADCROP:15 CROP:235 SLIDINGWINDOW:4:20 MINLEN:30). The trimmed reads of the Norman et al. 16S dataset were then processed using DADA2 (Callahan et al., 2016. Nat Methods, 13, 581-3) (v1.10.1). To do this, reads were quality filtered further (truncLen=230, maxEE=1.4, truncQ=11), before dereplication and de novo chimera removal (method = "consensus"). 16S reads published in this study were processed using the same method (truncLen=c(180,100), maxEE=1.4, truncQ=2) and the resulting sequence tables of both datasets merged in DADA2. Chimeras were removed de novo from the combined datasets (method = "consensus"), followed by a round of reference-based chimera removal using UCHIME (Edgar et al., 2011. Bioinformatics, 27, 2194-200) (v4.2) against the ChimeraSlayer Gold database. Resulting non-chimeric RSVs were filtered by length, retaining all RSVs with a minimum length of 200 bp and a maximum of 260 bp. The final count table resulted in a mean of 41,060 ± 17,131 counts per sample. Classification of retained RSVs was achieved using mothur (Schloss et al., 2009. Appl Environ Microbiol, 75, 7537-41) (v1.38.0, bootstrap >=80), while SPINGO (Allard et al., 2015. BMC Bioinformatics, 16, 324) (v1.3, bootstrap >= 0.8, similarity >=0.5) was used for species level classification. The RDP v11.4 database was used in both instances.
Simponi
[00227] The same methods as above were employed to process the 16S raw data from the Simponi cohort. There were 382,602 ± 181,911 raw reads. The following Trimmomatic parameters were applied: HEADCROP:20 SLIDINGWINDOW:4:20 CROP:210 MINLEN:50, resulting in a mean of 76,619 ± 40,278 counts in the final count table per sample after being subjected to the bioinformatics pipeline.
Data Analysis and Statistics
[00228] All statistics and figure generation were performed in R (v.3.5.1). Alpha and beta diversity were calculated using phyloseq (v.1.26), while differential abundance was assessed with DeSeq2 (v.1.22.1). Correlations used the Spearman method. Adonis from the vegan library (v.2.5-3) was utilized to test for significance in the beta diversity, while Procrustes coordinates and significance were generated using procuste and procuste.randtest from the vegan library. Significance was defined as a p-value less than 0.05, and all adjustments (where required) used the Benjamini-Hochberg method. For all statistical tests, one sample was chosen at random per subject. All figures were generated using ggplot2 (v.3.1.0). Machine learning was carried out in R using the XGBoost package (v.0.71.2). In each case, the model was trained on 70% of the data, and results refer to the remaining 30% of the data, which was used to test the model. Parameters were optimized for each model. ROC curves and accuracy were computed using the R library ROCR (v.1.0-7). Contig figures were generated using GView (vx.y).
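Two of the sampling steps described above, choosing one sample at random per subject and holding out 30% of the data for testing, can be sketched as follows. The function names and the seeded random generators are assumptions added for reproducibility of the illustration; whether the original split was stratified by disease state is not stated, so this sketch uses a simple unstratified shuffle.

```python
import random


def one_sample_per_subject(sample_to_subject, rng):
    """Pick one sample at random for each subject.

    sample_to_subject: {sample_id: subject_id}
    Returns {subject_id: chosen_sample_id}.
    """
    by_subject = {}
    for sample, subject in sample_to_subject.items():
        by_subject.setdefault(subject, []).append(sample)
    return {
        subject: rng.choice(sorted(samples))
        for subject, samples in sorted(by_subject.items())
    }


def train_test_split(items, train_fraction=0.7, rng=None):
    """Shuffle items and split them into training and test portions."""
    rng = rng or random.Random()
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    # Hypothetical longitudinal samples from two subjects.
    samples = {"s1": "A", "s2": "A", "s3": "B"}
    print(one_sample_per_subject(samples, random.Random(0)))
    train, test = train_test_split(range(10), 0.7, random.Random(1))
    print(len(train), len(test))
```

Selecting a single sample per subject avoids treating repeated samples from the same individual as independent observations in the statistical tests.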
Data Availability
[00229] Accession numbers for downloading the original raw sequences used in Norman et al. can be found in Norman et al, 2015.
[00230] Below are the tables referenced in the Examples described above.
[00231] The abbreviations in the table are described as follows:
[00232] “DeSeq2” refers to software for estimating variance-mean dependence in count data from high-throughput sequencing assays, and for testing differential expression based on a model using the negative binomial distribution. See Love MI, Huber W, Anders S (2014). “Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2.” Genome Biology, 15, 550.
[00233] “lfcSE” refers to the log fold change standard error calculation performed by DeSeq2.
[00234] “stat” refers to the Wald statistic calculation performed by DeSeq2.
[00235] The “p-value” ranges from zero to one and indicates the probability of observing values at least as extreme under a given null hypothesis (H0).
[00236] The “padj” value is the p-value adjusted for multiple testing using the Benjamini-Hochberg method.
[00237] The remaining values, e.g., for “RSV”, “Control Mean”, “Control Median”, “Control Present”, “CD Mean”, “CD Median”, “CD Present”, “UC Mean”, “UC Median”, “UC Present”, “logfdr”, “Domain”, “Classification”, “pc1” and “pc2”, refer to the control, Crohn’s disease, and ulcerative colitis samples or are understood by those of ordinary skill who work with DeSeq2.
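The Benjamini-Hochberg adjustment behind the “padj” column can be illustrated with a minimal sketch. It implements the standard step-up procedure on a plain list of p-values; the function name is illustrative, and values reported by the analysis software may additionally reflect filtering steps not shown here.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Each p-value is scaled by m / rank (rank 1 = smallest), and a running
    minimum from the largest rank downwards enforces monotonicity.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical raw p-values for four features.
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```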
[00238] The present invention is not to be limited in scope by the specific embodiments described herein. Indeed, various modifications of the invention in addition to those described herein will become apparent to those skilled in the art from the foregoing description and the accompanying figures. Such modifications are intended to fall within the scope of the appended claims. It is further to be understood that all values are approximate and are provided for description.
[00239] Patents, patent applications, publications, product descriptions, and protocols are cited throughout this application, the disclosures of which are incorporated herein by reference in their entireties for all purposes.
Table 1. DeSeq2 results of VC counts comparing Healthy controls to CD
[Table 1 is provided as images in the original publication (imgf000084_0001 to imgf000101_0001).]
Table 2. DeSeq2 results of VC counts comparing Healthy controls to UC
[Table 2 is provided as images in the original publication (imgf000101_0002 to imgf000113_0001).]
Table 3. DeSeq2 results of 16S RSV counts comparing Healthy controls to CD
[Table 3 is provided as images in the original publication (imgf000113_0002 to imgf000178_0001).]
Table 4. DeSeq2 results of 16S RSV counts comparing Healthy controls to UC
[Table 4 is provided as images in the original publication (imgf000178_0002 to imgf000238_0001).]
Table 5. Spearman correlations between VC counts and PC-axes 1 and 2 from Spearman distances
[Table 5 is provided as images in the original publication (imgf000238_0002 to imgf000239_0001).]
Table 6. Spearman correlations between RSV counts and PC-axes 1 and 2 from unweighted UniFrac distances
[Table 6 is provided as images in the original publication (imgf000239_0002 to imgf000248_0001).]
Table 7. Spearman correlations between VC counts and PC-axes 1 and 2 from Spearman distances
[Table 7 is provided as images in the original publication (imgf000248_0002 to imgf000562_0001).]
Table 8. Spearman correlations between RSV counts and PC-axes 1 and 2 from unweighted UniFrac distances
[Table 8 is provided as images in the original publication (imgf000562_0002 to imgf000709_0001).]
Table 9. DeSeq2 results of VC counts comparing UC Flare to UC Remission
[Table 9 is provided as images in the original publication (imgf000709_0002 to imgf000720_0001).]
Table 10. DeSeq2 results of RSV counts comparing UC Flare to UC Remission
[Table 10 is provided as images in the original publication (imgf000720_0002 to imgf000866_0001).]
Table 11. Viral Clusters for which an Increase in Abundance is Associated with Crohn’s Disease
Figures imgf000866_0002 to imgf000869_0001
Table 12. Viral Clusters for which an Increase in Abundance is Associated with Ulcerative Colitis
Figures imgf000869_0002 to imgf000874_0001
Table 13. Viral Clusters for which a Decrease in Abundance is Associated with Crohn’s Disease
Figures imgf000874_0002 to imgf000880_0002
Table 14. Viral Clusters for which a Decrease in Abundance is Associated with Ulcerative Colitis
Figures imgf000880_0001 to imgf000883_0001
Table 15. Bacterial taxa for which an Increase in Abundance is Associated with Crohn’s Disease
Figures imgf000883_0002 to imgf000886_0001
Table 16. Bacterial taxa for which an Increase in Abundance is Associated with Ulcerative Colitis
Figures imgf000886_0002 and imgf000887_0001
Table 17. Bacterial taxa for which a Decrease in Abundance is Associated with Crohn’s Disease
Figures imgf000887_0002 to imgf000910_0001
Table 18. Bacterial taxa for which a Decrease in Abundance is Associated with Ulcerative Colitis
Figures imgf000910_0002 to imgf000915_0001

Claims
1. A method for identifying a plurality of viral marker clusters for determining the presence of inflammatory bowel disease (IBD) using viral genome sequences, the method comprising:
obtaining a first dataset representing a first plurality of viral genome sequences derived from gastrointestinal (GI) microbiota samples of a healthy cohort;
obtaining a second dataset representing a second plurality of viral genome sequences derived from GI microbiota samples of a cohort diagnosed with IBD;
creating a first plurality of viral clusters by using protein clustering to group like proteins derived from the first dataset and by using protein homology to group viral genome sequences of the first dataset, each viral cluster in the first plurality of viral clusters comprising one or more viral genome sequences derived from the healthy cohort;
creating a second plurality of viral clusters by using protein clustering to group like proteins derived from the second dataset and by using protein homology to group viral genome sequences of the second dataset, each viral cluster in the second plurality of viral clusters comprising one or more viral genome sequences derived from the cohort diagnosed with IBD; and
identifying a plurality of marker clusters by comparing the first plurality of viral clusters to the second plurality of viral clusters.
2. The method of claim 1,
wherein at least a portion of the first plurality of viral genome sequences are unassociated with a viral taxonomic category derived from a viral genome database, and
wherein at least a portion of the second plurality of viral genome sequences are unassociated with a viral taxonomic category derived from a viral genome database.
3. The method of claims 1 or 2, wherein a totality of the first plurality and second plurality of viral genome sequences are each unassociated with a viral taxonomic category derived from a viral genome database.
4. The method of any one of claims 1-3, wherein the step of identifying the plurality of marker clusters by comparing the first plurality of viral clusters to the second plurality of viral clusters comprises using machine learning to identify the plurality of marker clusters.
5. The method of any one of claims 1-4, wherein the step of identifying the plurality of marker clusters by comparing the first plurality of viral clusters to the second plurality of viral clusters comprises identifying the plurality of marker clusters unassociated with a known taxon.
6. The method of any one of claims 1-5, wherein each of the viral clusters in the plurality of marker clusters respectively represent an unidentified taxon of higher rank than a strain and of lower rank than a family.
7. The method of any one of claims 1-6, wherein the step of identifying the plurality of marker clusters by comparing the first plurality of viral clusters to the second plurality of viral clusters comprises performing beta diversity analysis on the first plurality of viral clusters and the second plurality of viral clusters.
8. The method of claim 7, wherein performing the beta diversity analysis comprises performing a scaling and ordination technique selected from a group consisting of principal coordinates analysis (PCoA), principal components analysis (PCA), non-metric multidimensional scaling (NMDS), canonical correspondence analysis (CCA), and redundancy analysis (RDA).
9. The method of any one of claims 1-8, wherein the step of identifying the plurality of marker clusters by comparing the first plurality of viral clusters to the second plurality of viral clusters comprises calculating differential abundance of viral clusters in the first plurality of viral clusters and the second plurality of viral clusters.
10. The method of any one of claims 1-9, wherein the healthy cohort and the cohort diagnosed with IBD are each human cohorts.
11. The method of any one of claims 1-10, further comprising:
associating a first data subset of the second dataset with a first sub-cohort diagnosed with IBD and Crohn's disease (CD);
associating a second data subset of the second dataset with a second sub-cohort diagnosed with IBD and ulcerative colitis (UC);
associating a first subset of viral clusters of the second plurality of viral clusters with the first sub-cohort;
associating a second subset of viral clusters of the second plurality of viral clusters with the second sub-cohort; and
identifying a first subset of marker clusters of the plurality of marker clusters and a second subset of marker clusters of the plurality of marker clusters by comparing the first subset of viral clusters to the second subset of viral clusters.
12. The method of any one of claims 1-11, further comprising:
representing the viral genome sequences in the first dataset each respectively as a first viral contig of a protein sequence; and
representing the viral genome sequences in the second dataset each respectively as a second viral contig of a protein sequence.
13. The method of any one of claims 1-12,
wherein the first dataset further represents a first plurality of identified viral genome sequences derived from the healthy cohort,
wherein the second dataset further represents a second plurality of identified viral genome sequences derived from the cohort diagnosed with IBD, and
wherein the method further comprises:
creating a first plurality of reference viral clusters using protein clustering to group like proteins and protein homology to group identified viral genome sequences of the first plurality of identified viral genome sequences;
creating a second plurality of reference viral clusters using protein clustering to group like proteins and protein homology to group identified viral genome sequences of the second plurality of identified viral genome sequences; and
wherein the step of identifying the plurality of marker clusters by comparing the first plurality of viral clusters to the second plurality of viral clusters further comprises identifying the plurality of marker clusters by comparing a combination of the first plurality of viral clusters and the first plurality of reference viral clusters to a combination of the second plurality of viral clusters and the second plurality of reference viral clusters.
14. The method of claim 13,
wherein the first plurality of identified viral genome sequences are associated with a viral taxonomic category present in a viral genome database, and
wherein the second plurality of identified viral genome sequences are associated with a viral taxonomic category present in a viral genome database.
15. A method for determining the presence of inflammatory bowel disease (IBD) in a subject, the method comprising:
obtaining an individual viral dataset representing a plurality of viral genome sequences derived from a GI microbiota sample obtained from the subject;
creating a plurality of subject viral clusters using protein clustering to group like proteins derived from the individual viral dataset and by using protein homology to group unidentified viral genome sequences of the individual viral dataset, each viral cluster in the plurality of subject viral clusters comprising one or more viral genome sequences derived from the subject;
obtaining a plurality of marker clusters indicative of the presence or absence of IBD; and
comparing the plurality of subject viral clusters to the plurality of marker clusters.
16. The method of claim 15, wherein at least a portion of the plurality of viral genome sequences are unassociated with a viral taxonomic category derived from a viral genome database.
17. The method of claims 15 or 16, wherein a totality of the plurality of viral genome sequences are each unassociated with a viral taxonomic category derived from a viral genome database.
18. The method of any one of claims 15-17, wherein at least a portion of the plurality of marker clusters are unassociated with a viral taxonomic category derived from a viral genome database.
19. The method of any one of claims 15-18, further comprising determining the presence of IBD in the subject based at least in part on the comparison of the plurality of subject viral clusters to the plurality of marker clusters.
20. The method of any one of claims 15-19, wherein the marker clusters comprise one or more viral clusters from taxa Siphoviridae, Myoviridae, Podoviridae, CrAss-like, or Microviridae.
21. The method of any one of claims 15-20, wherein the plurality of marker clusters comprises one or more viral clusters selected from vc2, vc6, vc7, vc13, vc14, vc15, vc17, vc19, vc21, vc22, vc23, vc24, vc25, vc28, vc29, vc36, vc37, vc38, vc39, vc40, vc42, vc45, vc48, vc53, vc55, vc56, vc58, vc60, vc61, vc62, vc64, vc66, vc68, vc69, vc70, vc71, vc73, vc74, vc75, vc76, vc77, vc78, vc79, vc80, vc82, vc84, vc85, vc86, vc88, vc89, vc91, vc92, vc94, vc95, vc96, vc97, vc98, vc99, vc101, vc102, vc103, vc104, vc108, vc109, vc112, vc113, vc115, vc117, vc118, vc122, vc123, vc124, vc130, vc132, vc136, vc138, vc142, vc143, vc152, vc154, vc155, vc160, vc161, vc175, vc178, vc181, vc190, vc193, vc205, vc209, vc216, vc218, vc225, vc232, vc263, vc264, vc281, vc284, vc298, vc320, vc411, vc413, vc420, vc456, and vc467.
22. The method of any one of claims 15-21, wherein an increased abundance of one or more viral clusters selected from vc2, vc13, vc14, vc15, vc17, vc21, vc22, vc36, vc40, vc48, vc53, vc66, vc68, vc69, vc70, vc71, vc73, vc74, vc77, vc78, vc79, vc80, vc85, vc88, vc89, vc91, vc94, vc95, vc97, vc102, vc108, vc113, vc115, vc117, vc118, vc122, vc123, vc130, vc132, vc142, vc152, vc155, vc160, vc161, vc175, vc178, vc181, vc205, vc218, vc232, vc263, vc264, vc281, vc298, vc413, and vc420 in the plurality of subject viral clusters as compared to a healthy control is indicative of the presence of IBD in the subject.
23. The method of claim 22, wherein an increased abundance of one or more viral clusters selected from vc15, vc66, vc71, vc73, vc77, vc78, vc79, vc80, vc91, vc94, vc108, vc113, vc117, vc118, vc132, vc142, vc155, vc160, vc178, vc232, vc264, vc281, vc298, and vc420 in the plurality of subject viral clusters as compared to a healthy control is indicative of the presence of Crohn’s Disease (CD) in the subject.
24. The method of claim 22, wherein an increased abundance of one or more viral clusters selected from vc28 in the subject sample as compared to a healthy control is indicative of the presence of Crohn’s Disease (CD) in the subject.
25. The method of claim 22, wherein an increased abundance of one or more viral clusters selected from vc2, vc17, vc21, vc22, vc53, vc70, vc74, vc85, vc88, vc89, vc115, vc122, vc123, vc130, vc152, vc161, vc175, vc181, vc205, vc218, vc263, and vc413 in the plurality of subject viral clusters as compared to a healthy control is indicative of the presence of ulcerative colitis (UC) in the subject.
26. The method of claim 22, wherein an increased abundance of viral cluster vc2 in the subject sample as compared to a healthy control is indicative of the presence of ulcerative colitis (UC) in the subject.
27. The method of any one of claims 15-21, wherein an increased abundance of one or more viral clusters selected from vc38, vc46, vc48, vc54, vc57, vc62, vc64, vc69, vc71, vc108, vc111, vc114, vc115, vc128, vc159, vc162, vc215, vc220, vc242, vc340, vc374, and vc392 in the subject sample as compared to a patient with ulcerative colitis (UC) in remission is indicative of the presence of a flare-up of UC in the subject.
28. The method of any one of claims 15-21, wherein an increased abundance of one or more viral clusters selected from vc16, vc119, and vc163 in the subject sample as compared to a patient with a flare-up of ulcerative colitis (UC) is indicative of the presence of UC in remission in the subject.
29. The method of any one of claims 15-21, wherein a decreased abundance of one or more viral clusters selected from vc6, vc7, vc19, vc23, vc24, vc25, vc29, vc37, vc38, vc39, vc42, vc45, vc55, vc56, vc58, vc60, vc61, vc62, vc64, vc75, vc76, vc82, vc84, vc86, vc89, vc92, vc96, vc98, vc99, vc101, vc103, vc104, vc109, vc112, vc124, vc136, vc138, vc143, vc154, vc190, vc193, vc209, vc216, vc225, vc284, vc320, vc411, vc456, and vc467 in the plurality of subject viral clusters as compared to a healthy control is indicative of the presence of IBD in the subject.
30. The method of claim 29, wherein a decreased abundance of one or more viral clusters selected from vc6, vc7, vc19, vc25, vc29, vc37, vc42, vc45, vc56, vc58, vc60, vc61, vc64, vc82, vc86, vc89, vc92, vc99, vc104, vc109, vc124, vc136, vc154, vc190, and vc284 in the plurality of subject viral clusters as compared to a healthy control is indicative of the presence of Crohn’s Disease (CD) in the subject.
31. The method of claim 29, wherein a decreased abundance of one or more viral clusters selected from vc7, vc25, vc47, and vc64 in the subject sample as compared to a healthy control is indicative of the presence of Crohn’s Disease (CD) in the subject.
32. The method of claim 29, wherein a decreased abundance of vc98 and/or vc103 viral cluster in the plurality of subject viral clusters as compared to a healthy control is indicative of the presence of ulcerative colitis (UC) in the subject.
33. The method of any one of claims 15-32, wherein obtaining the dataset(s) is performed by sequencing VLP DNA isolated from GI microbiota sample(s).
34. The method of any one of claims 15-33, further comprising:
obtaining an individual bacteriome dataset representing bacterial sequences derived from the GI microbiota sample obtained from the subject; and
evaluating the individual bacteriome dataset for the presence of bacterial taxa associated with IBD.
35. The method of claim 34, further comprising determining the presence of IBD in the subject based at least in part on the comparison of the individual bacteriome dataset to at least one of a healthy control and a control diagnosed with IBD.
36. The method of claim 34 or claim 35, wherein the bacterial taxa associated with IBD comprise one or more bacterial genera selected from Clostridium XlVa, Blautia, Veillonella, Clostridium sensu stricto, Megasphaera, Fusobacterium, Flavonifractor, Catenibacterium, Ruminococcus, Coprococcus, Methanobrevibacter, Clostridium IV, Faecalibacterium, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Howardella, Bifidobacterium, Oscillibacter, Parabacteroides, Dorea, Roseburia, Odoribacter, and Akkermansia or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said one or more bacterial genera.
37. The method of claim 36, wherein an increased abundance of one or more bacterial genera selected from Clostridium XlVa, Blautia, Veillonella, Clostridium sensu stricto, Megasphaera, Fusobacterium, and Flavonifractor in the subject sample as compared to a healthy control is indicative of the presence of IBD in the subject.
38. The method of claim 37, wherein an increased abundance of one or more bacterial genera selected from Clostridium XlVa, Blautia, Megasphaera, and Fusobacterium in the subject sample as compared to a healthy control is indicative of the presence of Crohn’s Disease (CD) in the subject.
39. The method of claim 34 or claim 35, wherein an increased abundance of one or more bacterial species selected from Bacteroides fragilis and Ruminococcus gnavus in the subject sample as compared to a healthy control is indicative of the presence of ulcerative colitis (UC) in the subject.
40. The method of claim 34 or claim 35, wherein an increased abundance of Ruminococcus gnavus in the subject sample as compared to a control sample from a patient with ulcerative colitis (UC) in remission is indicative of the presence of a flare-up of UC in the subject.
41. The method of claim 34 or claim 35, wherein an increased abundance of Faecalibacterium prausnitzii, Dorea longicatena or Coprococcus comes in the subject sample as compared to a control sample from a patient with a flare-up of ulcerative colitis (UC) in remission is indicative of the presence of UC in remission in the subject.
42. The method of claim 37, wherein an increased abundance of bacterial genus Flavonifractor in the subject sample as compared to a healthy control is indicative of the presence of ulcerative colitis (UC) in the subject.
43. The method of claim 36, wherein a decreased abundance of one or more bacterial genera selected from Catenibacterium, Ruminococcus, Coprococcus, Methanobrevibacter, Clostridium IV, Faecalibacterium, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Howardella, Bifidobacterium, Oscillibacter, Parabacteroides, Flavonifractor, Blautia, Dorea, Roseburia, Odoribacter, Catenibacterium, and Akkermansia in the subject sample as compared to a healthy control is indicative of the presence of IBD in the subject.
44. The method of claim 43, wherein a decreased abundance of one or more bacterial genera selected from Ruminococcus, Methanobrevibacter, Clostridium IV, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Bifidobacterium, Oscillibacter, Flavonifractor, Dorea, Roseburia, and Odoribacter in the subject sample as compared to a healthy control is indicative of the presence of Crohn’s Disease (CD) in the subject.
45. The method of claim 43, wherein a decreased abundance of bacterial genus Akkermansia in the subject sample as compared to a healthy control is indicative of the presence of ulcerative colitis (UC) in the subject.
46. The method of any one of claims 34-45, wherein obtaining the individual bacteriome dataset is performed by sequencing 16S rDNA or a V region of 16S rDNA in the GI microbiota sample.
47. The method of claim 46, wherein the V region is V4 region.
48. The method of any one of claims 15-47, wherein the GI microbiota sample is a fecal sample.
49. The method of any one of claims 15-48, wherein the subject is a human.
50. The method of any one of claims 15-49, further comprising administering an IBD treatment to the subject.
51. The method of any one of claims 15-50, further comprising administering to the subject additional diagnostic tests for IBD, CD and/or UC.
52. The method of any one of claims 15-51, further comprising enrolling the subject in a clinical trial.
53. The method of any one of claims 15-52, wherein comparing the plurality of subject viral clusters to the plurality of marker clusters comprises:
identifying common clusters present in the plurality of subject viral clusters and the plurality of marker clusters;
determining relative abundance of members within each common cluster in the plurality of subject viral clusters;
associating a correlation value with each common cluster in the plurality of marker clusters; and
comparing the relative abundance of members within each common cluster in the plurality of subject viral clusters to the correlation value of each common cluster in the plurality of marker clusters.
54. A kit for determining the presence of inflammatory bowel disease (IBD) in a subject comprising:
a device to:
receive a first dataset representing a plurality of unidentified viral genome sequences derived from a GI microbiota sample obtained from the subject;
receive a second dataset representing a plurality of viral genome IBD marker clusters;
create a plurality of viral clusters by using protein clustering to group like proteins derived from the first dataset and by using protein homology to group unidentified viral genome sequences of the plurality of unidentified viral genome sequences, each viral cluster in the plurality of viral clusters comprising one or more unidentified viral genome sequences of the plurality of unidentified genome sequences; and
compare the first plurality of viral clusters to the second dataset; and
determine the presence of IBD based at least in part on the comparison of the plurality of viral clusters to the second dataset.
55. The kit of claim 54, wherein the device is further configured to:
receive a third dataset representing bacteria from the GI microbiota sample obtained from the subject;
evaluate the third dataset for the purpose of IBD diagnosis; and
determine the presence of IBD based at least in part on the evaluation of the third dataset.
56. The kit of claim 54 or 55, wherein the GI microbiota sample is one or more of the group consisting of a fecal sample, a cecal sample, an ileal sample, and a colonic microbiota sample.
57. The kit of any of claims 54-56, wherein the IBD is ulcerative colitis (UC).
58. The kit of any of claims 54-56, wherein the IBD is Crohn's disease (CD).
59. The kit of any one of claims 54-58, wherein the subject is human.
60. A system comprising:
one or more processors;
a memory in communication with the one or more processors and storing instructions thereon that, when executed by the one or more processors, are configured to cause the system to:
receive a first dataset representing a first plurality of viral genome sequences derived from a healthy cohort;
receive a second dataset representing a second plurality of viral genome sequences derived from a cohort diagnosed with IBD;
create a first plurality of viral clusters by using protein clustering to group like proteins derived from the first dataset and by using protein homology to group viral genome sequences of the first dataset, each viral cluster in the first plurality of viral clusters comprising one or more viral genome sequences derived from the healthy cohort;
create a second plurality of viral clusters by using protein clustering to group like proteins derived from the second dataset and by using protein homology to group viral genome sequences of the second dataset, each viral cluster in the second plurality of viral clusters comprising one or more viral genome sequences derived from the cohort diagnosed with IBD; and
identify a plurality of marker clusters by comparing the first plurality of viral clusters to the second plurality of viral clusters.
61. A method for preventing and/or treating inflammatory bowel disease (IBD) in a subject in need thereof, said method comprising administering to the subject an effective amount of a virus from a viral cluster selected from vc6, vc7, vc19, vc23, vc24, vc25, vc29, vc37, vc38, vc39, vc42, vc45, vc55, vc56, vc58, vc60, vc61, vc62, vc64, vc75, vc76, vc82, vc84, vc86, vc89, vc92, vc96, vc98, vc99, vc101, vc103, vc104, vc109, vc112, vc124, vc136, vc138, vc143, vc154, vc190, vc193, vc209, vc216, vc225, vc284, vc320, vc411, vc456, and vc467.
62. A method for preventing and/or treating IBD in a subject in need thereof, said method comprising administering to the subject an effective amount of a virus from a viral cluster selected from vc10, vc23, and vc39.
63. A method for preventing and/or treating Crohn's disease (CD) in a subject in need thereof, said method comprising administering to the subject an effective amount of a virus from a viral cluster selected from vc6, vc7, vc19, vc25, vc29, vc37, vc42, vc45, vc56, vc58, vc60, vc61, vc64, vc82, vc86, vc89, vc92, vc99, vc104, vc109, vc124, vc136, vc154, vc190, and vc284.
64. A method for preventing and/or treating CD in a subject in need thereof, said method comprising administering to the subject an effective amount of a virus from a viral cluster selected from vc10, vc23, and vc39.
65. A method for preventing and/or treating ulcerative colitis (UC) in a subject in need thereof, said method comprising administering to the subject an effective amount of a virus from a viral cluster vc98 and/or vc103.
66. A method for preventing and/or treating UC in a subject in need thereof, said method comprising administering to the subject an effective amount of a virus from a viral cluster selected from vc10, vc23, and vc39.
67. The method of claim 1, further comprising administering to the subject an effective amount of a probiotic or a prebiotic composition or a combination thereof, wherein said composition(s) stimulates growth and/or activity in the GI microbiota of the subject of one or more bacterial genera selected from Catenibacterium, Ruminococcus, Coprococcus, Methanobrevibacter, Clostridium IV, Faecalibacterium, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Howardella, Bifidobacterium, Oscillibacter, Parabacteroides, Flavonifractor, Blautia, Dorea, Roseburia, Odoribacter, Catenibacterium, and Akkermansia or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said one or more bacterial genera.
68. The method of claim 63, further comprising administering to the subject an effective amount of a probiotic or a prebiotic composition or a combination thereof, wherein said composition(s) stimulates growth and/or activity in the GI microbiota of the subject of one or more bacterial genera selected from Ruminococcus, Methanobrevibacter, Clostridium IV, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Bifidobacterium, Oscillibacter, Flavonifractor, Dorea, Roseburia, and Odoribacter or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said one or more bacterial genera.
69. The method of claim 65, further comprising administering to the subject an effective amount of a probiotic or a prebiotic composition or a combination thereof, wherein said composition(s) stimulates growth and/or activity in the GI microbiota of the subject of the bacterial genus Akkermansia or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said bacterial genus.
70. A method for preventing and/or treating IBD in a subject in need thereof, said method comprising administering to the subject an effective amount of a probiotic or a prebiotic composition or a combination thereof, wherein said composition(s) stimulates growth and/or activity of one or more bacterial genera selected from Catenibacterium, Ruminococcus, Coprococcus, Methanobrevibacter, Clostridium IV, Faecalibacterium, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Howardella, Bifidobacterium, Oscillibacter, Parabacteroides, Flavonifractor, Blautia, Dorea, Roseburia, Odoribacter, Catenibacterium, and Akkermansia or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said one or more bacterial genera.
71. A method for preventing and/or treating CD in a subject in need thereof, said method comprising administering to the subject an effective amount of a probiotic or a prebiotic composition or a combination thereof, wherein said composition(s) stimulates growth and/or activity of one or more bacterial genera selected from Ruminococcus, Methanobrevibacter, Clostridium IV, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Bifidobacterium, Oscillibacter, Flavonifractor, Dorea, Roseburia, and Odoribacter or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said one or more bacterial genera.
72. A method for preventing and/or treating UC in a subject in need thereof, said method comprising administering to the subject an effective amount of a probiotic or a prebiotic composition or a combination thereof, wherein said composition(s) stimulates growth and/or activity of the genus Akkermansia or a closely related OTU which has at least 90% sequence identity to 16S rRNA over its entire length or has at least 90% sequence identity to any single V region of 16S rRNA of said bacterial genus.
73. The method of claim 67 or claim 70, wherein said probiotic composition comprises one or more bacterial strains from the genus selected from Catenibacterium, Ruminococcus, Coprococcus, Methanobrevibacter, Clostridium IV, Faecalibacterium, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Howardella, Bifidobacterium, Oscillibacter, Parabacteroides, Flavonifractor, Blautia, Dorea, Roseburia, Odoribacter, Catenibacterium, and Akkermansia.
74. The method of claim 68 or claim 71, wherein said probiotic composition comprises one or more bacterial strains from the genus selected from Ruminococcus, Methanobrevibacter, Clostridium IV, Barnesiella, Dialister, Ruminococcus2, Alistipes, Sporobacter, Bifidobacterium, Oscillibacter, Flavonifractor, Dorea, Roseburia, and Odoribacter.
75. The method of claim 69 or claim 72, wherein said probiotic composition comprises one or more bacterial strains from the genus Akkermansia.
76. The method of any one of claims 67-75, wherein the V region is V4 region.
77. A method for preventing and/or treating UC in a subject in need thereof, said method comprising administering to the subject an effective amount of a probiotic or a prebiotic composition or a combination thereof, wherein said composition(s) stimulates growth and/or activity of Faecalibacterium prausnitzii, Dorea longicatena or Coprococcus comes.
78. A method for preventing and/or treating UC in a subject in need thereof, said method comprising administering to the subject an effective amount of a probiotic comprising one or more of Faecalibacterium prausnitzii, Dorea longicatena or Coprococcus comes.
79. The method of any one of claims 61-78, wherein the subject is human.
PCT/IB2020/055047 2019-06-14 2020-05-27 Materials and methods for assessing virome and microbiome matter WO2020250068A1 (en)

Applications Claiming Priority (8)

Application Number Priority Date Filing Date Title
US201962861807P 2019-06-14 2019-06-14
US201962861776P 2019-06-14 2019-06-14
US201962861818P 2019-06-14 2019-06-14
US201962861746P 2019-06-14 2019-06-14
US62/861,746 2019-06-14
US62/861,776 2019-06-14
US62/861,818 2019-06-14
US62/861,807 2019-06-14

Publications (1)

Publication Number Publication Date
WO2020250068A1 (en) 2020-12-17

Family

ID=70922091

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2020/055047 WO2020250068A1 (en) 2019-06-14 2020-05-27 Materials and methods for assessing virome and microbiome matter

Country Status (1)

Country Link
WO (1) WO2020250068A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100074872A1 (en) 2008-09-25 2010-03-25 New York University Compositions and methods for characterizing and restoring gastrointestinal, skin, and nasal microbiota
US7912698B2 (en) 2005-08-26 2011-03-22 Alexander Statnikov Method and system for automated supervised data analysis
US20110202322A1 (en) 2009-01-19 2011-08-18 Alexander Statnikov Computer Implemented Method for Discovery of Markov Boundaries from Datasets with Hidden Variables
US20110307437A1 (en) 2009-02-04 2011-12-15 Aliferis Konstantinos Constantin F Local Causal and Markov Blanket Induction Method for Causal Discovery and Feature Selection from Data
US20130149339A1 (en) 2010-06-04 2013-06-13 The University Of Tokyo Composition for inducing proliferation or accumulation of regulatory t cells
US20180125900A1 (en) * 2016-10-17 2018-05-10 New York University Probiotic compositions for improving metabolism and immunity

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111808939A (en) * 2020-03-23 2020-10-23 昆明医科大学第一附属医院 Diagnostic marker for auxiliary diagnosis of ulcerative colitis
CN112750501A (en) * 2020-12-29 2021-05-04 上海派森诺生物科技股份有限公司 Optimized analysis method for macrovirome process
CN112750501B (en) * 2020-12-29 2024-04-02 上海派森诺生物科技股份有限公司 Optimized analysis method for macro virus group flow
CN114317674A (en) * 2021-12-31 2022-04-12 青岛锐翌精准医学检验有限公司 Marker microorganism for rheumatoid arthritis and application thereof
CN114317674B (en) * 2021-12-31 2024-04-12 青岛锐翌精准医学检验有限公司 Rheumatoid arthritis marker microorganism and application thereof

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 20729840; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 20729840; Country of ref document: EP; Kind code of ref document: A1)