WO2016000539A1 - Detecting bacterial taxa for predicting adverse pregnancy outcomes - Google Patents

Publication number
WO2016000539A1
Authority
WO
WIPO (PCT)
Prior art keywords
nucleotide sequence
bacterial
sample
bacteria
sequence
Application number
PCT/CN2015/082044
Other languages
French (fr)
Inventor
Stephen Siu-Chung CHIM
Chee-Yin CHEUNG
Wan-Chee CHEUNG
Meng MENG
Tak-Yeung LEUNG
Keun-Young Lee
Original Assignee
The Chinese University Of Hong Kong
Application filed by The Chinese University Of Hong Kong
Priority to US15/317,021 (US10683557B2)
Priority to EP15814937.7A (EP3161167B1)
Priority to KR1020167036942A (KR20170020382A)
Priority to ES15814937T (ES2767527T3)
Priority to JP2017500020A (JP6539721B2)
Priority to KR1020207002520A (KR102207858B1)
Publication of WO2016000539A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6888 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms
    • C12Q1/689 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms for bacteria
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/158 Expression markers
    • C12Q2600/16 Primer sets for multiplex assays

Definitions

  • In a normal uncomplicated pregnancy, the cervix remains long and closed until late in the third trimester, when it shortens and dilates around the time the fetus is fully developed and ready for birth. By contrast, in a pregnancy complicated by cervical shortening or advanced cervical dilation, the cervix shortens or dilates, respectively, well ahead of this normal schedule. Consequently, some women with these conditions give birth before 37 gestational weeks, when fetal development is incomplete, which can lead to neonatal mortality and morbidity. To prolong such a pregnancy, clinicians may place a cerclage or a cervical pessary to support the cervix. Based primarily on consensus and expert opinion, cerclage placement is considered potentially beneficial if intra-amniotic infection has been ruled out, and similar guidelines are suggested for pessary placement. Ruling out infection effectively requires a highly sensitive detection method.
  • the present disclosure is based, in part, on the discovery that the level of bacteria belonging to a specific group of bacterial taxa (e.g., bacterial species or genera) in a woman’s cervix is elevated in correlation with the likelihood of an adverse pregnancy outcome, such as premature birth at less than 34 or 37 weeks of gestational age, or an adverse neonatal condition, such as an Apgar score of less than 7 at 1 or 5 minutes, chorioamnionitis, respiratory distress syndrome, bronchopulmonary dysplasia, intraventricular hemorrhage, neonatal sepsis, or neonatal death within 7 days of birth.
  • the present invention provides a method for determining the risk of an adverse pregnancy or neonatal outcome for a subject, e.g., a pregnant woman or a non-pregnant woman.
  • the method includes the steps of (a) detecting, in a biological sample taken from the subject, the level of bacteria from at least one bacterial taxon selected from the group consisting of Megasphaera cerevisiae, Alloscardovia omnicolens, Ureaplasma urealyticum, Ureaplasma parvum, Atopobium vaginae, Parvibacter caecicola, Lactobacillus casei, Veillonella montpellierensis, Anaerococcus senegalensis, Bulleidia extructa, Mycoplasma hominis, Propionimicrobium lymphophilum, uncultured bacteria having a 16S rRNA nucleotide sequence with at least 92% sequence identity to the nucleotide sequence of GenBank Accession No.
  • JF295520.1 Howardella ureilytica, Actinobaculum schaalii, Peptoniphilus duerdenii, Fastidiosipila sanguinis, Sneathia sanguinegens, Parvimonas micra, Peptoniphilus lacrimalis, a bacterial taxon specified by a 16S rRNA nucleotide sequence in Tables 4A, 4C and 8, and a bacterial taxon specified by the nearest species classification based on BLAST nucleotide alignment in Tables 4E, 4G and 9; and (b) determining that the subject has an increased risk for an adverse pregnancy or neonatal outcome if the level of bacteria from the at least one bacterial taxon is greater than a standard control level.
  • the method includes detecting the level of Sneathia sanguinegens, Parvimonas micra, Ureaplasma urealyticum or Ureaplasma parvum, Atopobium vaginae, Peptoniphilus lacrimalis, Megasphaera cerevisiae, and Parvibacter caecicola in a biological sample taken from the subject, and determining that the subject has an increased risk of an adverse pregnancy outcome if the total level of these bacteria is increased compared to a standard control level.
  • the adverse pregnancy or neonatal outcome includes preterm birth at <34 weeks, preterm birth at <37 weeks, delivery within about 1-196 days after the biological sample is taken, delivery within about 1-196 days after a clinical intervention is performed, an Apgar score at 1 minute of <7, an Apgar score at 5 minutes of <7, chorioamnionitis, respiratory distress syndrome, bronchopulmonary dysplasia, intraventricular hemorrhage, neonatal death within 7 days after birth, or neonatal sepsis.
  • the method includes determining that the subject has a risk of having advanced cervical dilation or premature cervical shortening if the level of bacteria belonging to the at least one bacterial taxon is greater than a standard control level.
  • the method includes detecting the level of Parvimonas micra, Ureaplasma urealyticum or Ureaplasma parvum, Atopobium vaginae, Peptoniphilus lacrimalis, Megasphaera cerevisiae, and Parvibacter caecicola in a biological sample taken from the subject, and determining that the subject has an increased risk of an adverse pregnancy outcome, such as delivery within 7 days after cervical intervention (i.e., cerclage/pessary intervention), if the total level of these bacteria is increased compared to a standard control level.
  • the method also includes determining that the subject is at risk of having an infection in the amniotic cavity, uterine cavity, cervix or vagina.
  • One class of Megasphaera cerevisiae can be identified as having a 16S rRNA nucleotide sequence with at least 90%, e.g., at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to the 16S rRNA nucleotide sequence of GenBank Accession No. NR_113307.1.
  • bacteria of the taxon Megasphaera cerevisiae are detected as having a 16S rRNA nucleotide sequence with at least 93% or 94% sequence identity to the sequence of GenBank Accession No. NR_113307.1.
  • bacteria of the taxon Megasphaera cerevisiae are detected as having the 16S rRNA nucleotide sequence shown in SEQ ID NO: 5.
  • Alloscardovia omnicolens bacteria can be identified as having a 16S rRNA nucleotide sequence with at least 90%, e.g., at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to the 16S rRNA nucleotide sequence of GenBank Accession No. NR_042583.1.
  • bacteria of the taxon Alloscardovia omnicolens are detected as having a 16S rRNA genomic sequence with 100% sequence identity to the nucleotide sequence of GenBank Accession No. NR_042583.1.
  • Ureaplasma urealyticum bacteria can be identified as having a 16S rRNA nucleotide sequence with at least 90%, e.g., at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to the 16S rRNA genomic sequence of GenBank Accession No. NR_102836.1.
  • bacteria of the taxon Ureaplasma urealyticum are detected as having a 16S rRNA nucleotide sequence with 100% sequence identity to the nucleotide sequence of GenBank Accession No. NR_102836.1.
  • Ureaplasma parvum bacteria can be identified as having a 16S rRNA nucleotide sequence with at least 90%, e.g., at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to the 16S rRNA genomic sequence of GenBank Accession No. NR_074176.1.
  • bacteria of the taxon Ureaplasma parvum are detected as having a 16S rRNA nucleotide sequence with 100% sequence identity to the nucleotide sequence of GenBank Accession No. NR_074176.1.
  • bacteria of the taxon Ureaplasma parvum are detected as having the 16S rRNA nucleotide sequence shown in SEQ ID NO: 7.
  • Atopobium vaginae bacteria can be identified as having a 16S rRNA nucleotide sequence with at least 90%, e.g., at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to the 16S rRNA nucleotide sequence of GenBank Accession No. NR_117757.1.
  • bacteria of the taxon Atopobium vaginae are detected as having a 16S rRNA nucleotide sequence with 97% sequence identity to the nucleotide sequence of GenBank Accession No. NR_117757.1.
  • bacteria of the taxon Atopobium vaginae are detected as having the 16S rRNA nucleotide sequence shown in SEQ ID NO: 8.
  • Parvibacter caecicola bacteria can be identified as having a 16S rRNA nucleotide sequence with at least 90%, e.g., at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to the 16S rRNA nucleotide sequence of GenBank Accession No. NR_117374.1.
  • bacteria of the taxon Parvibacter caecicola are detected as having a 16S rRNA nucleotide sequence with at least 92% sequence identity to the nucleotide sequence of GenBank Accession No. NR_117374.1.
  • bacteria of the taxon Parvibacter caecicola are detected as having the 16S rRNA nucleotide sequence shown in SEQ ID NO: 9.
  • Lactobacillus casei bacteria can be identified as having a 16S rRNA nucleotide sequence with at least 90%, e.g., at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to the 16S rRNA nucleotide sequence of GenBank Accession No. NR_075032.1.
  • bacteria of the taxon Lactobacillus casei are detected as having a 16S rRNA nucleotide sequence with 100% sequence identity to the nucleotide sequence of GenBank Accession No. NR_075032.1.
  • Veillonella montpellierensis bacteria can be identified as having a 16S rRNA nucleotide sequence with at least 90%, e.g., at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to the 16S rRNA nucleotide sequence of GenBank Accession No. NR_028839.1.
  • bacteria of the taxon Veillonella montpellierensis are detected as having a 16S rRNA nucleotide sequence with 100% sequence identity to the nucleotide sequence of GenBank Accession No. NR_028839.1.
  • Anaerococcus senegalensis bacteria can be identified as having a 16S rRNA nucleotide sequence with at least 90%, e.g., at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to the 16S rRNA nucleotide sequence of GenBank Accession No. NR_118220.1.
  • bacteria of the taxon Anaerococcus senegalensis are detected as having a 16S rRNA nucleotide sequence with 100% sequence identity to the nucleotide sequence of GenBank Accession No. NR_118220.1.
  • Bulleidia extructa bacteria can be identified as having a 16S rRNA nucleotide sequence with at least 90%, e.g., at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to the 16S rRNA nucleotide sequence of GenBank Accession No. NR_028773.1.
  • bacteria of the taxon Bulleidia extructa are detected as having a 16S rRNA nucleotide sequence with 97% sequence identity to the nucleotide sequence of GenBank Accession No. NR_028773.1.
  • Mycoplasma hominis bacteria can be identified as having a 16S rRNA nucleotide sequence with at least 90%, e.g., at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to the 16S rRNA nucleotide sequence of GenBank Accession No. NR_113679.1.
  • bacteria of the taxon Mycoplasma hominis are detected as having a 16S rRNA genomic sequence with 100% sequence identity to the nucleotide sequence of GenBank Accession No. NR_113679.1.
  • Propionimicrobium lymphophilum bacteria can be identified as having a 16S rRNA nucleotide sequence with at least 90%, e.g., at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to the 16S rRNA nucleotide sequence of GenBank Accession No. NR_114337.1.
  • bacteria of the taxon Propionimicrobium lymphophilum are detected as having a 16S rRNA nucleotide sequence with 100% sequence identity to the nucleotide sequence of GenBank Accession No. NR_114337.1.
  • One class of uncultured bacteria relevant to the present disclosure can be identified as having a 16S rRNA nucleotide sequence with at least 90%, e.g., at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to the 16S rRNA nucleotide sequence of GenBank Accession No. JQ781443.1.
  • bacteria of this taxon have a 16S rRNA nucleotide sequence with 92% sequence identity to the nucleotide sequence of GenBank Accession No. JQ781443.1.
  • Corynebacterium pyruviciproducens bacteria can be identified as having a 16S rRNA nucleotide sequence with at least 90%, e.g., at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to the 16S rRNA nucleotide sequence of GenBank Accession No. NR_116569.1.
  • bacteria of the taxon Corynebacterium pyruviciproducens are detected as having a 16S rRNA nucleotide sequence with 100% sequence identity to the nucleotide sequence of GenBank Accession No. NR_116569.1.
  • Another class of Megasphaera cerevisiae can be identified as having a 16S rRNA nucleotide sequence with at least 90%, e.g., at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to the 16S rRNA nucleotide sequence of GenBank Accession No. NR_113307.1.
  • bacteria of the taxon Megasphaera cerevisiae are detected as having a 16S rRNA nucleotide sequence with at least 93% or 94% sequence identity to the sequence of GenBank Accession No. NR_113307.1.
  • bacteria of the taxon Megasphaera cerevisiae are detected as having the 16S rRNA nucleotide sequence shown in SEQ ID NO: 29.
  • Acidipila rosea bacteria can be identified as having a 16S rRNA nucleotide sequence with at least 90%, e.g., 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to the 16S rRNA nucleotide sequence of GenBank Accession No. NR_113179.1.
  • bacteria of the taxon Acidipila rosea are detected as having a 16S rRNA nucleotide sequence with 97% sequence identity to the nucleotide sequence of GenBank Accession No. NR_113179.1.
  • Murdochiella asaccharolytica bacteria can be identified as having a 16S rRNA nucleotide sequence with at least 90%, e.g., 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to the 16S rRNA nucleotide sequence of GenBank Accession No. NR_116331.1.
  • bacteria of the taxon Murdochiella asaccharolytica are detected as having a 16S rRNA nucleotide sequence with 99% sequence identity to the nucleotide sequence of GenBank Accession No. NR_116331.1.
  • Another class of uncultured bacteria relevant to the present disclosure can be identified as having a 16S rRNA nucleotide sequence with at least 90%, e.g., 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to the 16S rRNA nucleotide sequence of GenBank Accession No. JF295520.1.
  • bacteria of this taxon are detected as having a 16S rRNA nucleotide sequence with 90% sequence identity to the nucleotide sequence of GenBank Accession No. JF295520.1.
  • Howardella ureilytica bacteria can be identified as having a 16S rRNA nucleotide sequence with at least 90%, e.g., at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to the 16S rRNA nucleotide sequence of GenBank Accession No. NR_044022.2.
  • bacteria of the taxon Howardella ureilytica are detected as having a 16S rRNA nucleotide sequence with 93% sequence identity to the nucleotide sequence of GenBank Accession No. NR_044022.2.
  • Actinobaculum schaalii bacteria can be identified as having a 16S rRNA nucleotide sequence with at least 90%, e.g., at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to the 16S rRNA nucleotide sequence of GenBank Accession No. NR_116869.1.
  • bacteria of the taxon Actinobaculum schaalii are detected as having a 16S rRNA nucleotide sequence with 100% sequence identity to the nucleotide sequence of GenBank Accession No. NR_116869.1.
  • Peptoniphilus duerdenii bacteria can be identified as having a 16S rRNA nucleotide sequence with at least 90%, e.g., at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to the 16S rRNA nucleotide sequence of GenBank Accession No. NR_116346.1.
  • bacteria of the taxon Peptoniphilus duerdenii are detected as having a 16S rRNA nucleotide sequence with 100% sequence identity to the nucleotide sequence of GenBank Accession No. NR_116346.1.
  • Fastidiosipila sanguinis bacteria can be identified as having a 16S rRNA nucleotide sequence with at least 90%, e.g., at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to the 16S rRNA nucleotide sequence of GenBank Accession No. NR_042186.1.
  • bacteria of the taxon Fastidiosipila sanguinis are detected as having a 16S rRNA nucleotide sequence with 100% sequence identity to the nucleotide sequence of GenBank Accession No. NR_042186.1.
  • Sneathia sanguinegens can be identified as having a 16S rRNA nucleotide sequence with at least 90%, e.g., at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to the 16S rRNA nucleotide sequence of GenBank Accession No. AJ344093.1.
  • bacteria of the taxon Sneathia sanguinegens are detected as having a 16S rRNA nucleotide sequence with at least 93% or 94% sequence identity to the sequence of GenBank Accession No. AJ344093.1.
  • Parvimonas micra can be identified as having a 16S rRNA nucleotide sequence with at least 90%, e.g., at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to the 16S rRNA nucleotide sequence of GenBank Accession No. NR_114338.1.
  • bacteria of the taxon Parvimonas micra are detected as having a 16S rRNA nucleotide sequence with at least 93% (e.g., 93%, 94%, 95%, 96%, 97% or more) sequence identity to the sequence of GenBank Accession No. NR_114338.1.
  • bacteria of the taxon Parvimonas micra are detected as having the 16S rRNA nucleotide sequence shown in SEQ ID NO: 43.
  • Peptoniphilus lacrimalis can be identified as having a 16S rRNA nucleotide sequence with at least 90%, e.g., at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to the 16S rRNA nucleotide sequence of GenBank Accession Nos. AB971812.1 and NR_041938.1.
  • bacteria of the taxon Peptoniphilus lacrimalis are detected as having a 16S rRNA nucleotide sequence with 100% sequence identity to the sequence of GenBank Accession Nos. AB971812.1 and NR_041938.1.
  • bacteria of the taxon Peptoniphilus lacrimalis are detected as having the 16S rRNA nucleotide sequence shown in SEQ ID NO: 44.
  • the method includes detecting at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 different bacterial taxa.
  • the subject is a pregnant woman at about 13 weeks to about 37 weeks of gestation. In other embodiments, the pregnant woman is between about 13 and about 25 gestational weeks. In some cases, the pregnant woman has a prematurely opened cervix. The woman may be at risk of having an infection in the amniotic cavity, uterine cavity, cervix or vagina. In other embodiments, the subject is a non-pregnant woman. In some embodiments, the non-pregnant woman is planning a future pregnancy. In some cases, the subject has a history of preterm birth, stillbirth or miscarriage. In some cases, the subject is planning to receive a clinical intervention, such as cerclage intervention or progesterone supplementation, before or after pregnancy.
  • the method also includes extracting nucleic acids from the biological sample taken from the subject prior to performing step (a).
  • the sample is a cervical swab (including a swab sample of the external os), a vaginal swab (including a swab sample of the fornix), a urine sample, an amniotic fluid sample, a maternal blood sample (maternal whole blood sample), a maternal serum sample, a maternal plasma sample, or a cervical mucus sample.
  • the sample is a placental swab, an umbilical swab, or any sample taken directly or indirectly from the reproductive system.
  • the sample is taken directly or indirectly from the gastrointestinal system of the pregnant subject, including a buccal swab, a throat swab, an anal swab, a rectal swab or stool sample.
  • the step of detecting includes a polynucleotide amplification assay.
  • the amplification assay is a polymerase chain reaction (PCR) assay.
  • the PCR assay can be a quantitative PCR assay.
  • the step of detecting includes sequence-specific probe/primer hybridization, which can occur in the absence of polynucleotide amplification.
  • the step of detecting includes polynucleotide sequence determination, such as, but not limited to, massively parallel sequencing.
  • JF295520.1 Howardella ureilytica, Actinobaculum schaalii, Peptoniphilus duerdenii, Fastidiosipila sanguinis, Sneathia sanguinegens, Parvimonas micra, Peptoniphilus lacrimalis, a bacterial taxon specified by a 16S rRNA genomic sequence in Tables 4A, 4C and 8, a bacterial taxon specified by the nearest species classification based on BLAST nucleotide alignment in Tables 4E, 4G and 9, or any combination thereof is elevated compared to a standard control, the pregnant subject is at risk of having an adverse pregnancy or neonatal outcome.
  • the level of bacteria from one or more of the taxa such as Megasphaera cerevisiae, Alloscardovia omnicolens, Ureaplasma urealyticum or Ureaplasma parvum, Atopobium vaginae, Parvibacter caecicola, Lactobacillus casei, Veillonella montpellierensis, Anaerococcus senegalensis, Bulleidia extructa, Mycoplasma hominis, Propionimicrobium lymphophilum, uncultured bacteria having a 16S rRNA nucleotide sequence with at least 92% sequence identity to the nucleotide sequence of GenBank Accession No.
  • JF295520.1 Howardella ureilytica, Actinobaculum schaalii, Peptoniphilus duerdenii, Fastidiosipila sanguinis, Sneathia sanguinegens, Parvimonas micra, Peptoniphilus lacrimalis, a bacterial taxon specified by a 16S rRNA genomic sequence in Tables 4A, 4C and 8, and a bacterial taxon specified by the nearest species classification based on BLAST nucleotide alignment in Tables 4E, 4G and 9, in a cervical or vaginal swab sample from a pregnant woman is detected to be higher compared to the standard control, the woman has an increased likelihood of having an adverse pregnancy and/or neonatal outcome.
  • Possible adverse pregnancy or neonatal outcomes include, but are not limited to, preterm birth at <34 weeks, preterm birth at <37 weeks, an Apgar score at 1 minute of <7, an Apgar score at 5 minutes of <7, chorioamnionitis, respiratory distress syndrome, bronchopulmonary dysplasia, intraventricular hemorrhage, neonatal death within 7 days after birth, and/or neonatal sepsis.
  • the possible adverse pregnancy or neonatal outcomes further include delivery within a period of about 1-196 days (e.g., 1 day, 7 days, 14 days, 21 days, 28 days, 56 days, 84 days, 112 days, 140 days, 168 days, or 196 days) after the biological sample is taken, and delivery within a period of about 1-196 days (e.g., 1 day, 7 days, 14 days, 21 days, 28 days, 56 days, 84 days, 112 days, 140 days, 168 days, or 196 days) after the clinical intervention is performed.
  • the method provided herein can be used to determine that the pregnant woman has a risk of having advanced cervical dilation or advanced cervical shortening. For instance, if the level of bacteria from at least one bacterial taxon, such as Megasphaera cerevisiae, Alloscardovia omnicolens, Ureaplasma urealyticum or Ureaplasma parvum, Atopobium vaginae, Parvibacter caecicola, Lactobacillus casei, Veillonella montpellierensis, Anaerococcus senegalensis, Bulleidia extructa, Mycoplasma hominis, Propionimicrobium lymphophilum, uncultured bacteria having a 16S rRNA nucleotide sequence with at least 92% sequence identity to the nucleotide sequence of GenBank Accession No.
  • JF295520.1 Howardella ureilytica, Actinobaculum schaalii, Peptoniphilus duerdenii, Fastidiosipila sanguinis, Sneathia sanguinegens, Parvimonas micra, Peptoniphilus lacrimalis, a bacterial taxon specified by a 16S rRNA nucleotide sequence in Tables 4A, 4C and 8, and a bacterial taxon specified by the nearest species classification based on BLAST nucleotide alignment in Tables 4E, 4G and 9 is greater than that of a standard control level, the woman has an increased likelihood of having advanced cervical dilation or premature cervical shortening.
  • Such a subject is predicted to exhibit cervical dilation of, for example, ≥2 cm without active labor between 13 and 37 weeks of gestation, or cervical shortening prior to full-term labor and delivery.
  • the method can further include detecting in the biological sample the level of bacteria from at least one bacterial taxon selected from the group consisting of Jonquetella anthropi, Aerococcus urinae, a bacterial taxon specified by a 16S rRNA nucleotide sequence in Tables 4B and 4D, and a bacterial taxon specified by the nearest species classification based on BLAST nucleotide alignment in Tables 4F and 4H; and determining that the subject has an increased risk for an adverse pregnancy outcome if the level of bacteria from the at least one bacterial taxon is lower than that of a standard control level.
  • the method can include repeating steps (a) and (b) at a later time using a biological sample of the same type from the subject, wherein an increase in the level of bacteria from the at least one bacterial taxon at the later time, compared with the level determined in the original step (a), indicates an increased risk of having an adverse pregnancy or neonatal outcome.
  • a physician may provide treatment for the woman to minimize the risk of such an adverse outcome. For example, the woman may be closely monitored throughout pregnancy or transferred in a timely manner to a tertiary unit with neonatal intensive care.
  • the present invention provides a kit for determining the risk of having an adverse pregnancy or neonatal outcome in a subject.
  • the kit can include (a) a standard control that provides a biological sample taken from a pregnant female and containing bacteria belonging to at least one bacterial taxon selected from the group consisting of Megasphaera cerevisiae, Alloscardovia omnicolens, Ureaplasma urealyticum or Ureaplasma parvum, Atopobium vaginae, Parvibacter caecicola, Lactobacillus casei, Veillonella montpellierensis, Anaerococcus senegalensis, Bulleidia extructa, Mycoplasma hominis, Propionimicrobium lymphophilum, uncultured bacteria having a 16S rRNA nucleotide sequence with at least 92% sequence identity to the nucleotide sequence of GenBank Accession No.
  • JF295520.1 Howardella ureilytica, Actinobaculum schaalii, Peptoniphilus duerdenii, Fastidiosipila sanguinis, Sneathia sanguinegens, Parvimonas micra, Peptoniphilus lacrimalis, a bacterial taxon specified by a 16S rRNA nucleotide sequence in Tables 4A, 4C and 8, and a bacterial taxon specified by the nearest species classification based on BLAST nucleotide alignment in Tables 4E, 4G and 9.
  • the one or more agents may include one or more pairs of oligonucleotide primers that specifically hybridize to and amplify a polynucleotide of the at least one bacterial taxon in an amplification assay.
  • the one or more agents may further include a polynucleotide probe that specifically hybridizes to a polynucleotide sequence of the at least one bacterial taxon.
  • the kit includes an instruction manual.
  • the present invention provides a method for determining whether a pregnant subject has an increased risk of having advanced cervical dilation or premature cervical shortening.
  • the method includes the steps of (a) extracting nucleic acids from a biological sample taken from the subject; (b) detecting in the nucleic acids the level of at least one bacterial taxon selected from the group consisting of Megasphaera cerevisiae, Alloscardovia omnicolens, Ureaplasma urealyticum or Ureaplasma parvum, Atopobium vaginae, Parvibacter caecicola, Lactobacillus casei, Veillonella montpellierensis, Anaerococcus senegalensis, Bulleidia extructa, Mycoplasma hominis, Propionimicrobium lymphophilum, uncultured bacteria having a 16S rRNA nucleotide sequence with at least 92% sequence identity to the nucleotide sequence of GenBank Accession No.
  • JF295520.1 Howardella ureilytica, Actinobaculum schaalii, Peptoniphilus duerdenii, Fastidiosipila sanguinis, Sneathia sanguinegens, Parvimonas micra, Peptoniphilus lacrimalis, a bacterial taxon specified by a 16S rRNA nucleotide sequence in Tables 4A, 4C and 8, and a bacterial taxon specified by the nearest species classification based on BLAST nucleotide alignment in Tables 4E, 4G and 9; and (c) determining that the subject has an increased risk of having advanced cervical dilation or premature cervical shortening if the level of the at least one bacterial taxon is greater than that of a standard control level.
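A minimal sketch of the decision in step (c), assuming per-taxon levels have already been measured and that a standard control level is available for each taxon on the panel. Reading "the level of the at least one bacterial taxon is greater than that of a standard control level" as "any panel taxon exceeds its own control level" is an interpretation; the function names and the numbers are hypothetical.

```python
from typing import Mapping


def taxa_above_control(sample_levels: Mapping[str, float],
                       control_levels: Mapping[str, float]) -> list[str]:
    """List the panel taxa whose measured level exceeds the standard control level.

    A taxon that was not detected in the sample is treated as level 0.
    """
    return sorted(
        taxon
        for taxon, control in control_levels.items()
        if sample_levels.get(taxon, 0.0) > control
    )


def increased_risk(sample_levels: Mapping[str, float],
                   control_levels: Mapping[str, float]) -> bool:
    """Flag increased risk when at least one panel taxon is above its control level."""
    return bool(taxa_above_control(sample_levels, control_levels))


# Illustrative numbers only; they are not data from the source.
controls = {"Atopobium vaginae": 2.0, "Mycoplasma hominis": 1.0, "Sneathia sanguinegens": 0.5}
sample = {"Atopobium vaginae": 0.4, "Mycoplasma hominis": 3.5}
print(taxa_above_control(sample, controls))  # ['Mycoplasma hominis']
print(increased_risk(sample, controls))      # True
```

Returning the list of taxa, not just a yes/no flag, keeps the multi-taxon panels mentioned below (2 to 20 taxa) easy to inspect.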
  • a pregnant woman is predicted to be at risk of having a prematurely dilated cervix and/or a prematurely shortened cervix, if it is determined that she has an increased level of bacteria from one or more of the selected bacterial taxa in her cervix or vagina.
  • in particular, if a pregnant woman is determined to have an increased level of bacteria from the group such as Megasphaera cerevisiae, Alloscardovia omnicolens, Ureaplasma urealyticum or Ureaplasma parvum, Atopobium vaginae, Parvibacter caecicola, Lactobacillus casei, Veillonella montpellierensis, Anaerococcus senegalensis, Bulleidia extructa, Mycoplasma hominis, Propionimicrobium lymphophilum, uncultured bacteria having a 16S rRNA nucleotide sequence with at least 92% sequence identity to the nucleotide sequence of GenBank Accession No. JF295520.1, Howardella ureilytica, Actinobaculum schaalii, Peptoniphilus duerdenii, Fastidiosipila sanguinis, Sneathia sanguinegens, Parvimonas micra, Peptoniphilus lacrimalis, a bacterial taxon specified by a 16S rRNA nucleotide sequence in Tables 4A, 4C and 8, a bacterial taxon specified by the nearest species classification based on BLAST nucleotide alignment in Tables 4E, 4G and 9, or any combination thereof in her cervix or vagina at, for example, about 13 to about 25 gestational weeks, she is likely to have a dilated cervix or shortened cervix prior to about 34 to about 37 gestational weeks. At least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 different bacterial taxa can be detected in the sample.
  • the biological sample is a cervical swab, a vaginal swab, a urine sample, an amniotic fluid sample, a maternal blood sample, a maternal serum sample, a maternal plasma sample, a cervical mucus sample, a placental swab, an umbilical cord swab or any sample taken directly or indirectly from the reproductive system or the gastrointestinal system.
  • the detecting step can be a polynucleotide amplification assay, an assay involving polynucleotide sequence determination, or an assay involving sequence-specific probe/primer hybridization.
  • the amplification assay is a polymerase chain reaction (PCR) assay.
  • the PCR assay is a quantitative PCR assay or a reverse-transcriptase PCR assay.
  • FIGS. 1A-1E show the association between clinical outcomes, such as spontaneous preterm birth (sPTB) after intervention and term birth (TB) after intervention, and bacterial taxa.
  • FIG. 1A shows the clinical outcomes of 25 cervical insufficiency (CI) patients after intervention. Each column represents a patient (P1-P25) in ascending order of gestational age (GA) at delivery. Black rectangles: P1-P10, resulting in “spontaneous preterm birth (sPTB) <34 weeks after intervention”. White rectangles: P11-P25, resulting in “term birth (TB) ≥37 weeks after intervention”. Latency: the interval between treatment and delivery. RDS: respiratory distress syndrome. BPD: bronchopulmonary dysplasia.
  • FIGS. 1B and 1C show the 10 most abundant bacterial taxa in the “sPTB after intervention” and “TB after intervention” CI cervices, respectively. Shown values are the log10 of the abundance values (normalized sequencing read counts, the Cumulative Sum Scaling (CSS) method) of each bacterial taxon (row) in the cervical swab sample of each patient (column) . Normalized read counts are transformed into log (abundance) as noted in FIG. 1E.
  • Each row represents an operational taxonomic unit (Otu) formed by clustering sequences of ≥97% identity.
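The source clusters reads into Otus at ≥97% identity but does not name the clustering tool. As a rough illustration only, the sketch below performs greedy centroid clustering in the spirit of abundance-sorted tools: reads are visited from most to least abundant, and each joins the first existing centroid it matches at the threshold or else founds a new Otu. It assumes the reads are already aligned to a common length, which real pipelines do not require; the read IDs and sequences are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length aligned sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned to the same non-zero length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otu_clustering(reads, threshold=0.97):
    """Assign reads to Otus by greedy centroid clustering.

    reads: iterable of (read_id, aligned_sequence, abundance) tuples.
    Returns a dict mapping read_id -> Otu label.
    """
    centroids = []   # (otu_label, centroid_sequence), in creation order
    assignment = {}
    for read_id, seq, _abundance in sorted(reads, key=lambda r: r[2], reverse=True):
        for otu_label, centroid_seq in centroids:
            if percent_identity(seq, centroid_seq) >= threshold:
                assignment[read_id] = otu_label
                break
        else:  # no centroid was close enough: this read starts a new Otu
            otu_label = f"Otu{len(centroids) + 1}"
            centroids.append((otu_label, seq))
            assignment[read_id] = otu_label
    return assignment


# Toy 100-nt reads: r2 differs from r1 at 2 positions (98%), r3 at 5 positions (95%).
base = "ACGT" * 25
reads = [
    ("r1", base, 50),
    ("r2", base[:98] + "AA", 10),
    ("r3", "NNNNN" + base[5:], 5),
]
print(greedy_otu_clustering(reads))  # {'r1': 'Otu1', 'r2': 'Otu1', 'r3': 'Otu2'}
```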
  • Each Otu is taxonomically classified at the genus level using the Ribosomal Database Project (RDP) Bayesian rRNA Classifier (Version 2.9, September 2014, RDP 16S rRNA training set 10).
  • Otus classified as Lactobacillus are further matched against the 16S rRNA database (GenBank) using BLAST (highest score) and MOLE-BLAST (best multiple alignment of BLAST matches) to derive species information.
  • Seven taxa remain differentially abundant after adjustment for multiple testing by the False Discovery Rate (FDR) method (p<0.05 and q-value<0.05, i.e., FDR<5%; marked with an asterisk).
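The section states only that an FDR method was used. Assuming it is the Benjamini–Hochberg step-up procedure (the most common choice, but an assumption here), q-values can be computed from per-taxon p-values as follows; taxa with q<0.05 correspond to FDR<5%. The taxon names and p-values in the usage lines are hypothetical.

```python
def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Return Benjamini-Hochberg adjusted p-values (q-values) in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping q-values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


# Hypothetical per-taxon p-values from a differential-abundance test.
p = {"taxon_A": 0.01, "taxon_B": 0.04, "taxon_C": 0.03, "taxon_D": 0.20}
q = dict(zip(p, benjamini_hochberg(list(p.values()))))
significant = [t for t in p if p[t] < 0.05 and q[t] < 0.05]
print(significant)  # ['taxon_A']
```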
  • the latter 6 taxa with p<0.01 are further selected for calculating a summary score, namely the log (base 10) total abundance of the 6 selected taxa (LA6) value, for each cervical swab sample. Total abundance is the arithmetic sum of abundances of the selected taxa in common (linear) scale.
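The LA6 summary score is the log10 of the summed linear-scale abundances of the selected taxa, not the sum of their log values. A minimal sketch, assuming CSS-normalized abundances are already available per taxon; the six selected taxa are identified only in the figures, so the keys below are hypothetical placeholders. How a sample with zero total abundance was scored is not stated; here it returns negative infinity.

```python
import math


def la_score(abundances: dict[str, float], selected: list[str]) -> float:
    """log10 of the total (linear-scale) abundance of the selected taxa.

    Taxa absent from `abundances` contribute 0. Returns -inf when the total is 0
    (an assumption; the source does not say how such samples are handled).
    """
    total = sum(abundances.get(taxon, 0.0) for taxon in selected)
    return math.log10(total) if total > 0 else float("-inf")


def from_log10(log_abundances: dict[str, float]) -> dict[str, float]:
    """Convert log10(abundance) values, as plotted in FIGS. 1B-1C, back to linear scale."""
    return {taxon: 10 ** value for taxon, value in log_abundances.items()}


# Hypothetical keys standing in for the 6 selected taxa.
SELECTED_6 = [f"taxon_{k}" for k in range(1, 7)]
sample = {"taxon_1": 10.0, "taxon_2": 20.0, "taxon_5": 70.0, "other": 500.0}
print(la_score(sample, SELECTED_6))  # 2.0
```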
  • FIGS. 2A-2C provide statistical analysis and LA6 values for the two groups, i.e., patients with spontaneous preterm birth (sPTB) after intervention and patients with term birth (TB) after intervention.
  • FIG. 2A provides LA6 values (the total abundance of the 6 selected bacterial taxa in logarithmic (base 10) scale) in two groups of cervical insufficiency (CI) patients both receiving clinical intervention but resulting in different outcomes. Cervical swab samples for measuring the LA6 were collected from CI patients before cerclage/pessary intervention.
  • FIG. 2B shows the proportion of undelivered pregnancies at different gestational ages in CI patients with LA6>1.15 (LA6-positive) vs. those with LA6≤1.15 (LA6-negative).
  • LA6-positive patients delivered earlier after clinical intervention than the LA6-negative patients (median gestational age at delivery of 23.7 weeks vs. 38.4 weeks; 95% confidence interval, 20.6-25.4 weeks vs. 38.0-38.7 weeks; Chi-squared, 32.352; df, 1; Logrank test, p<0.0001; hazard ratio, 6.24; 95% confidence interval, 1.50 to 25.9).
  • the 95%confidence intervals of the percentages of undelivered pregnancies for LA6-positive and LA6-negative patients are shown as hairlines around the bold solid line and the bold dotted line, respectively.
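FIGS. 2B and 2C plot the proportion of pregnancies still undelivered over time in LA6-positive (LA6>1.15) and LA6-negative patients. Assuming these are Kaplan–Meier product-limit curves with delivery as the event (the source names the log-rank test but not the estimator), the sketch below stratifies patients by the 1.15 cutoff and computes each curve and its median. The patient records are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the proportion still undelivered.

    times: gestational age (or days after intervention) at delivery or censoring.
    events: 1 if delivery was observed at that time, 0 if censored.
    Returns [(time, proportion_undelivered)] at each time with >= 1 delivery.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surviving = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deliveries = removed = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            deliveries += int(data[i][1])
            removed += 1
            i += 1
        if deliveries:
            surviving *= 1.0 - deliveries / at_risk
            curve.append((t, surviving))
        at_risk -= removed
    return curve


def median_time(curve):
    """First time at which the proportion undelivered drops to 0.5 or below."""
    return next((t for t, s in curve if s <= 0.5), None)


def la6_positive(la6: float, cutoff: float = 1.15) -> bool:
    return la6 > cutoff


# Hypothetical records: (LA6, gestational age at delivery in weeks, delivered?)
patients = [(2.1, 22.0, 1), (1.6, 24.5, 1), (0.4, 38.0, 1), (0.9, 39.1, 1), (0.2, 37.5, 1)]
for label, positive in (("LA6-positive", True), ("LA6-negative", False)):
    group = [p for p in patients if la6_positive(p[0]) == positive]
    curve = kaplan_meier([p[1] for p in group], [p[2] for p in group])
    print(label, curve, "median:", median_time(curve))
```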
  • FIG. 2C shows the proportion of undelivered pregnancies at different days after treatment in LA6-positive vs. LA6-negative CI patients. LA6-positive patients remained undelivered for a shorter period after intervention than their LA6-negative counterparts (median number of days between intervention and delivery, 10 days vs. 126 days; 95% confidence interval, 8-32 days vs. 112-134 days; Chi-squared, 32.520; df, 1; Logrank test, p<0.00001; hazard ratio, 6.34; 95% confidence interval, 1.51 to 26.6).
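The comparisons above report log-rank chi-squared statistics with 1 degree of freedom. A minimal two-group log-rank test, written from the standard observed-minus-expected formulation rather than taken from the source, looks like this; the p-value uses the identity P(χ²₁ > c) = erfc(√(c/2)). The inputs in the usage lines are hypothetical.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test. Returns (chi_squared, p_value) with 1 df.

    events_*: 1 if delivery was observed at that time, 0 if censored.
    """
    event_times = sorted({t for t, e in zip(times_a, events_a) if e}
                         | {t for t, e in zip(times_b, events_b) if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)  # at risk just before t
        n_b = sum(1 for x in times_b if x >= t)
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero; groups cannot be compared")
    chi_squared = observed_minus_expected ** 2 / variance
    p_value = math.erfc(math.sqrt(chi_squared / 2))  # upper tail of chi-squared, 1 df
    return chi_squared, p_value


# Hypothetical days from intervention to delivery (all deliveries observed).
positive_days = [8, 10, 12, 30]
negative_days = [110, 120, 126, 133, 140]
chi2, p = logrank_test(positive_days, [1] * 4, negative_days, [1] * 5)
print(f"chi-squared = {chi2:.2f}, p = {p:.4f}")
```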
  • an adverse pregnancy outcome refers to a condition that reduces the chance of delivering/birthing a healthy baby.
  • Non-limiting examples of an adverse pregnancy outcome include multiple first-trimester miscarriages, a second-trimester pregnancy loss, preterm birth (e.g., spontaneous or indicated), preterm preeclampsia, preterm eclampsia, fetal growth restriction, placental abruption, fetal death/stillbirth, birth defects, an Apgar score of <7 at 1 minute, an Apgar score of <7 at 5 minutes, clinical chorioamnionitis, pathological chorioamnionitis, neonatal respiratory distress syndrome, neonatal bronchopulmonary dysplasia, neonatal sepsis, neonatal intraventricular hemorrhage, etc.
  • cervical insufficiency refers to a condition of the cervix, such as weakening or advanced dilation of the cervix that can lead to second-trimester pregnancy loss or birth. Cervical weakness, premature cervical shortening, premature or advanced cervical dilation, cervical trauma, a structural abnormality of the cervix, or any combination thereof can contribute to cervical insufficiency. Clinical interventions to manage cervical insufficiency include, but are not limited to, progesterone supplementation, cervical cerclage, and cervical pessary.
  • bacterial taxon refers to the taxonomy, i.e., the rank-based classification of bacteria.
  • the hierarchical biological classification includes life, domain, kingdom, phylum, class, order, family, genus and species.
  • biological sample includes sections of tissues such as biopsy and autopsy samples, and frozen sections taken for histologic purposes, or processed forms of any of such samples.
  • Biological samples include a cervical swab, a vaginal swab, a uterine swab, blood and blood fractions or products (e.g., serum, plasma, platelets, red blood cells, and the like) , sputum or saliva, lymph and tongue tissue, cultured cells, e.g., primary cultures, explants, and transformed cells, stool, urine, a biopsy tissue etc.
  • a biological sample is typically obtained from a eukaryotic organism, which may be a mammal, may be a primate and may be a human subject.
  • biopsy refers to the process of removing a tissue sample for diagnostic or prognostic evaluation, and to the tissue specimen itself. Any biopsy technique known in the art can be applied to the diagnostic and prognostic methods of the present invention. The biopsy technique applied will depend on the tissue type to be evaluated (e.g., cervix, vagina, tongue, colon, prostate, kidney, bladder, lymph node, liver, bone marrow, blood cell, stomach tissue, etc. ) among other factors. Representative biopsy techniques include, but are not limited to, a swab biopsy, excisional biopsy, incisional biopsy, needle biopsy, surgical biopsy, and bone marrow biopsy and may comprise colonoscopy. A wide range of biopsy techniques are well known to those skilled in the art who will choose between them and implement them with minimal experimentation.
  • isolated nucleic acid molecule means a nucleic acid molecule that is separated from other nucleic acid molecules that are usually associated with the isolated nucleic acid molecule.
  • an “isolated” nucleic acid molecule includes, without limitation, a nucleic acid molecule that is free of nucleotide sequences that naturally flank one or both ends of the nucleic acid in the genome of the organism from which the isolated nucleic acid is derived (e.g., a cDNA or genomic DNA fragment produced by PCR or restriction endonuclease digestion).
  • an isolated nucleic acid molecule can be introduced into a vector (e.g., a cloning vector or an expression vector) for convenience of manipulation or to generate a fusion nucleic acid molecule.
  • an isolated nucleic acid molecule can include an engineered nucleic acid molecule such as a recombinant or a synthetic nucleic acid molecule.
  • nucleic acid refers to deoxyribonucleic acids (DNA) or ribonucleic acids (RNA) and polymers thereof in either single-or double-stranded form. Unless specifically limited, the term encompasses nucleic acids containing known analogs of natural nucleotides that have similar binding properties as the reference nucleic acid and are metabolized in a manner similar to naturally occurring nucleotides.
  • nucleic acid sequence also implicitly encompasses conservatively modified variants thereof (e.g., degenerate codon substitutions) , alleles, orthologs, single nucleotide polymorphisms (SNPs) , and complementary sequences as well as the sequence explicitly indicated.
  • degenerate codon substitutions may be achieved by generating sequences in which the third position of one or more selected (or all) codons is substituted with mixed-base and/or deoxyinosine residues (Batzer et al., Nucleic Acid Res. 19: 5081 (1991) ; Ohtsuka et al., J. Biol. Chem.
  • nucleic acid is used interchangeably with gene, cDNA, and mRNA encoded by a gene.
  • “polypeptide,” “peptide,” and “protein” are used interchangeably herein to refer to a polymer of amino acid residues.
  • the terms apply to amino acid polymers in which one or more amino acid residue is an artificial chemical mimetic of a corresponding naturally occurring amino acid, as well as to naturally occurring amino acid polymers and non-naturally occurring amino acid polymers.
  • the terms encompass amino acid chains of any length, including full-length proteins (i.e., antigens) , wherein the amino acid residues are linked by covalent peptide bonds.
  • amino acid refers to naturally occurring and synthetic amino acids, as well as amino acid analogs and amino acid mimetics that function in a manner similar to the naturally occurring amino acids.
  • Naturally occurring amino acids are those encoded by the genetic code, as well as those amino acids that are later modified, e.g., hydroxyproline, γ-carboxyglutamate, and O-phosphoserine.
  • amino acid analogs refers to compounds that have the same basic chemical structure as a naturally occurring amino acid, i.e., an α carbon that is bound to a hydrogen, a carboxyl group, an amino group, and an R group, e.g., homoserine, norleucine, methionine sulfoxide, methionine methyl sulfonium. Such analogs have modified R groups (e.g., norleucine) or modified peptide backbones, but retain the same basic chemical structure as a naturally occurring amino acid.
  • amino acid mimetics refers to chemical compounds that have a structure that is different from the general chemical structure of an amino acid, but that function in a manner similar to a naturally occurring amino acid.
  • Amino acids may include those having non-naturally occurring D-chirality, as disclosed in WO01/12654, which may improve the stability (e.g., half-life) , bioavailability, and other characteristics of a polypeptide comprising one or more of such D-amino acids. In some cases, one or more, and potentially all of the amino acids of a therapeutic polypeptide have D-chirality.
  • Amino acids may be referred to herein by either the commonly known three letter symbols or by the one-letter symbols recommended by the IUPAC-IUB Biochemical Nomenclature Commission. Nucleotides, likewise, may be referred to by their commonly accepted single-letter codes.
  • the terms “identical” or percent “identity,” in the context of two or more nucleotide or amino acid sequences, refer to two or more sequences or subsequences that are the same or have a specified percentage of amino acid residues or nucleotides that are the same (for example, a variant of a bacterial protein of interest used in the method of this invention (e.g., for predicting adverse pregnancy outcomes) has at least 80% sequence identity, preferably 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity, to a reference sequence, e.g., a corresponding wild-type bacterial protein of interest), when compared and aligned for maximum correspondence over a comparison window, or designated region, as measured using one of the following sequence comparison algorithms or by manual alignment and visual inspection.
  • sequences are then said to be “substantially identical. ”
  • this definition also refers to the complement of a test sequence.
  • the identity exists over a region that is at least about 50 amino acids or nucleotides in length, or more preferably over a region that is 75-100 amino acids or nucleotides in length.
  • for sequence comparison, typically one sequence acts as a reference sequence, to which test sequences are compared.
  • test and reference sequences are entered into a computer, subsequence coordinates are designated, if necessary, and sequence algorithm program parameters are designated. Default program parameters can be used, or alternative parameters can be designated.
  • sequence comparison algorithm then calculates the percent sequence identities for the test sequences relative to the reference sequence, based on the program parameters. For sequence comparison of nucleic acids and proteins, the BLAST and BLAST 2.0 algorithms and the default parameters discussed below are used.
  • a “comparison window” includes reference to a segment of any one of the number of contiguous positions selected from the group consisting of from 20 to 600, usually about 50 to about 200, more usually about 100 to about 150 in which a sequence may be compared to a reference sequence of the same number of contiguous positions after the two sequences are optimally aligned.
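To make the comparison-window definition concrete: once two sequences have been optimally aligned, percent identity is computed over a block of contiguous aligned positions. The sketch below assumes the alignment is already given (gaps written as '-'), counts a position as identical only when both residues match and neither is a gap, and divides by the window length. Conventions for counting gaps differ between tools, so treat this as one illustrative choice; the sequences are hypothetical.

```python
def window_percent_identity(ref_aln: str, test_aln: str, start: int, length: int) -> float:
    """Percent identity of two aligned sequences within one comparison window.

    ref_aln, test_aln: rows of a pairwise alignment (same length, '-' = gap).
    start, length: 0-based start and size of the window, in alignment columns.
    """
    if len(ref_aln) != len(test_aln):
        raise ValueError("aligned sequences must have equal length")
    if not 20 <= length <= 600:
        raise ValueError("window length outside the 20-600 position range")
    if start < 0 or start + length > len(ref_aln):
        raise ValueError("window extends past the alignment")
    ref_win = ref_aln[start:start + length]
    test_win = test_aln[start:start + length]
    identical = sum(1 for r, t in zip(ref_win, test_win) if r == t and r != "-")
    return 100.0 * identical / length


reference = "ACGT" * 25                  # 100 aligned columns
test = "NNNNNNNN" + reference[8:]        # first 8 columns differ
print(window_percent_identity(reference, test, 0, 50))   # 84.0
print(window_percent_identity(reference, test, 50, 50))  # 100.0
```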
  • Methods of alignment of sequences for comparison are well-known in the art. Optimal alignment of sequences for comparison can be conducted, e.g., by the local homology algorithm of Smith & Waterman, Adv. Appl. Math. 2: 482 (1981), by the homology alignment algorithm of Needleman & Wunsch, J. Mol. Biol.
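The two alignment algorithms cited above differ mainly in their boundary conditions: Needleman–Wunsch scores a global alignment of the full sequences, while Smith–Waterman floors every cell at zero and keeps the best local score. A compact score-only dynamic-programming sketch, assuming a simple linear gap penalty and illustrative match/mismatch/gap values (not parameters from the source):

```python
def alignment_score(a: str, b: str, match: int = 1, mismatch: int = -1,
                    gap: int = -2, local: bool = False) -> int:
    """Optimal alignment score with a linear gap penalty.

    local=False: Needleman-Wunsch global alignment score.
    local=True:  Smith-Waterman local alignment score (cells floored at 0).
    Only two DP rows are kept, so memory is O(len(b)).
    """
    prev = [0 if local else j * gap for j in range(len(b) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0 if local else i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diagonal, prev[j] + gap, cur[j - 1] + gap)
            if local:
                score = max(score, 0)
                best = max(best, score)
            cur[j] = score
        prev = cur
    return best if local else prev[-1]


print(alignment_score("ACGT", "AGT"))                   # 1: three matches, one gap
print(alignment_score("TTTACGTTT", "ACG", local=True))  # 3: the shared ACG
```

Recovering the aligned strings themselves would additionally require a traceback matrix; the score alone is enough to rank candidate matches.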
  • HSPs high scoring sequence pairs
  • T is referred to as the neighborhood word score threshold (Altschul et al., supra) .
  • These initial neighborhood word hits act as seeds for initiating searches to find longer HSPs containing them.
  • the word hits are then extended in both directions along each sequence for as far as the cumulative alignment score can be increased.
  • Cumulative scores are calculated using, for nucleotide sequences, the parameters M (reward score for a pair of matching residues; always >0) and N (penalty score for mismatching residues; always <0).
  • Extension of the word hits in each direction is halted when: the cumulative alignment score falls off by the quantity X from its maximum achieved value; the cumulative score goes to zero or below, due to the accumulation of one or more negative-scoring residue alignments; or the end of either sequence is reached.
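The extension rules above (reward M, penalty N, stop when the running score drops X below its best or reaches zero, or at a sequence end) can be sketched for the ungapped case as follows. This is a simplified illustration, not BLAST's implementation: it extends a seed at known positions without gaps, and the defaults M = 1, N = -3, X = 5 are chosen for the example only.

```python
def extend_one_side(query, subject, qi, si, step, start_score, M, N, X):
    """Extend an ungapped hit from (qi, si) in one direction (step = +1 or -1).

    Returns (best cumulative score, number of positions added to reach it).
    """
    score = best = start_score
    steps = best_steps = 0
    while 0 <= qi < len(query) and 0 <= si < len(subject):  # stop at either sequence end
        score += M if query[qi] == subject[si] else N
        steps += 1
        if score > best:
            best, best_steps = score, steps
        if score <= 0 or best - score >= X:  # score hit zero, or fell X below its maximum
            break
        qi += step
        si += step
    return best, best_steps


def ungapped_hsp(query, subject, q_start, s_start, word_len, M=1, N=-3, X=5):
    """Grow a word hit into a high-scoring segment pair (HSP).

    Returns (score, query_begin, query_end_exclusive).
    """
    seed = sum(M if query[q_start + k] == subject[s_start + k] else N
               for k in range(word_len))
    right_score, right_len = extend_one_side(
        query, subject, q_start + word_len, s_start + word_len, +1, seed, M, N, X)
    total, left_len = extend_one_side(
        query, subject, q_start - 1, s_start - 1, -1, right_score, M, N, X)
    return total, q_start - left_len, q_start + word_len + right_len


query = "TTTTTTACGTACGT"
subject = "GGGGGGACGTACGT"
print(ungapped_hsp(query, subject, 6, 6, 4))  # (8, 6, 14)
```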
  • the BLAST algorithm parameters W, T, and X determine the sensitivity and speed of the alignment.
  • the BLASTP program uses as defaults a word size (W) of 3, an expectation (E) of 10, and the BLOSUM62 scoring matrix (see, e.g., Henikoff and Henikoff, Proc. Natl. Acad. Sci. USA 89: 10915 (1992)).
  • the BLAST algorithm also performs a statistical analysis of the similarity between two sequences (see, e.g., Karlin and Altschul, Proc. Nat'l. Acad. Sci. USA, 90: 5873-5877 (1993)).
  • One measure of similarity provided by the BLAST algorithm is the smallest sum probability (P(N) ) , which provides an indication of the probability by which a match between two nucleotide or amino acid sequences would occur by chance.
  • a nucleic acid is considered similar to a reference sequence if the smallest sum probability in a comparison of the test nucleic acid to the reference nucleic acid is less than about 0.2, more preferably less than about 0.01, and most preferably less than about 0.001.
  • an indication that two nucleic acid sequences or polypeptides are substantially identical is that the polypeptide encoded by the first nucleic acid is immunologically cross-reactive with the antibodies raised against the polypeptide encoded by the second nucleic acid, as described below.
  • a polypeptide is typically substantially identical to a second polypeptide, for example, where the two peptides differ only by conservative substitutions.
  • Another indication that two nucleic acid sequences are substantially identical is that the two molecules or their complements hybridize to each other under stringent conditions, as described below.
  • Yet another indication that two nucleic acid sequences are substantially identical is that the same primers can be used to amplify the sequence.
  • “stringent hybridization conditions” and “high stringency” refer to conditions under which a probe will hybridize to its target subsequence, typically in a complex mixture of nucleic acids, but to no other sequences. Stringent conditions are sequence-dependent and will be different in different circumstances. Longer sequences hybridize specifically at higher temperatures.
  • stringent conditions are selected to be about 5-10 °C lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH.
  • Tm is the temperature (under defined ionic strength, pH, and nucleic acid concentration) at which 50% of the probes complementary to the target hybridize to the target sequence at equilibrium (as the target sequences are present in excess, at Tm, 50% of the probes are occupied at equilibrium).
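As a worked example of the "about 5-10 °C below Tm" rule: for a short oligonucleotide probe, Tm is often approximated with the Wallace rule, Tm ≈ 2 °C × (A + T) + 4 °C × (G + C). That rule is a rough approximation for short oligos that ignores salt, formamide, and probe concentration, and it is swapped in here for illustration; it is not a formula given in the source. The probe sequence is hypothetical.

```python
def wallace_tm(oligo: str) -> float:
    """Rough Tm (°C) of a short DNA oligo: 2 per A/T plus 4 per G/C (Wallace rule)."""
    seq = oligo.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("expected a non-empty A/C/G/T sequence")
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2.0 * at + 4.0 * gc


def stringent_window(tm: float, low_offset: float = 10.0, high_offset: float = 5.0):
    """Hybridization temperature range about 5-10 °C below Tm."""
    return tm - low_offset, tm - high_offset


probe = "ACGTTGCAACGTTGCAACGT"    # hypothetical 20-mer, 10 A/T and 10 G/C
tm = wallace_tm(probe)
print(tm, stringent_window(tm))   # 60.0 (50.0, 55.0)
```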
  • Stringent conditions may also be achieved with the addition of destabilizing agents such as formamide.
  • a positive signal is at least two times background, preferably 10 times background hybridization.
  • Exemplary stringent hybridization conditions can be as follows: 50% formamide, 5x SSC, and 1% SDS, incubating at 42 °C; or 5x SSC and 1% SDS, incubating at 65 °C, with a wash in 0.2x SSC and 0.1% SDS at 65 °C.
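The buffer strengths above are usually prepared by diluting concentrated stocks, using C1V1 = C2V2. A small helper, assuming a 20x SSC stock and a 10% SDS stock (common laboratory stocks, not specified in the source):

```python
def stock_volume(stock_strength: float, final_strength: float, final_volume: float) -> float:
    """Volume of stock V such that stock_strength * V = final_strength * final_volume."""
    if not 0 < final_strength <= stock_strength:
        raise ValueError("final strength must be positive and no stronger than the stock")
    return final_strength * final_volume / stock_strength


# 100 mL of hybridization buffer at 5x SSC and 1% SDS
print(stock_volume(20, 5, 100))     # 25.0 mL of 20x SSC
print(stock_volume(10, 1, 100))     # 10.0 mL of 10% SDS
# 1 L of wash buffer at 0.2x SSC
print(stock_volume(20, 0.2, 1000))  # 10.0 mL of 20x SSC
```

The remaining volume is made up with water (and, for the formamide-containing buffer, 50 mL of formamide per 100 mL).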
  • Nucleic acids that do not hybridize to each other under stringent conditions are still substantially identical if the polypeptides which they encode are substantially identical. This occurs, for example, when a copy of a nucleic acid is created using the maximum codon degeneracy permitted by the genetic code. In such cases, the nucleic acids typically hybridize under moderately stringent hybridization conditions.
  • Exemplary "moderately stringent hybridization conditions" include a hybridization in a buffer of 40% formamide, 1 M NaCl, 1% SDS at 37 °C, and a wash in 1x SSC at 45 °C. A positive hybridization is at least twice background.
  • Alternative hybridization and wash conditions can be utilized to provide conditions of similar stringency. Additional guidelines for determining hybridization parameters are provided in numerous references, e.g., Current Protocols in Molecular Biology, ed. Ausubel, et al.
  • an “increase” or a “decrease” refers to a detectable positive or negative change in quantity from a comparison control, e.g., an established standard control (such as an average expression level of a bacterial mRNA or protein found in normal cervical or vaginal tissue from a pregnant control subject) .
  • An increase is a positive change that is typically at least 10%, or at least 20%, or 50%, or 100%, and can be as high as at least 2-fold or at least 5-fold or even 10-fold of the control value.
  • a decrease is a negative change that is typically at least 10%, or at least 20%, 30%, or 50%, or even as high as at least 80%or 90%of the control value.
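The increase and decrease definitions reduce to a percent change relative to the control value. A minimal sketch, assuming the smallest thresholds listed (10% in either direction) as defaults; the function names are illustrative.

```python
def percent_change(value: float, control: float) -> float:
    """Signed change relative to the control value, in percent."""
    if control <= 0:
        raise ValueError("control value must be positive")
    return 100.0 * (value - control) / control


def classify_change(value: float, control: float,
                    min_increase_pct: float = 10.0,
                    min_decrease_pct: float = 10.0) -> str:
    """Label a measurement relative to its control using percent thresholds."""
    change = percent_change(value, control)
    if change >= min_increase_pct:
        return "increase"
    if change <= -min_decrease_pct:
        return "decrease"
    return "no detectable change"


print(classify_change(2.0, 1.0))   # increase (+100%, i.e., 2-fold)
print(classify_change(0.5, 1.0))   # decrease (-50%)
print(classify_change(1.05, 1.0))  # no detectable change (+5%)
```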
  • a “polynucleotide hybridization method” as used herein refers to a method for detecting the presence and/or quantity of a pre-determined polynucleotide sequence based on its ability to form Watson-Crick base-pairing, under appropriate hybridization conditions, with a polynucleotide probe of a known sequence. Examples of such hybridization methods include Southern blot, Northern blot, and in situ hybridization.
  • Primers refer to oligonucleotides that can be used in an amplification method, such as a polymerase chain reaction (PCR) , to amplify a nucleotide sequence based on the polynucleotide sequence corresponding to a gene of interest, e.g., the cDNA or genomic sequence for a specific bacterial gene or a portion thereof.
  • at least one of the PCR primers for amplification of a polynucleotide sequence is sequence-specific for that polynucleotide sequence. The exact length of the primer will depend upon many factors, including temperature, source of the primer, and the method used.
  • the oligonucleotide primer typically contains at least 10, or 15, or 20, or 25 or more nucleotides, although it may contain fewer nucleotides or more nucleotides.
  • the factors involved in determining the appropriate length of primer are readily known to one of ordinary skill in the art.
  • primer pair means a pair of primers that hybridize to opposite strands of a target DNA molecule or to regions of the target DNA which flank a nucleotide sequence to be amplified.
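Because the two primers of a pair anneal to opposite strands, the reverse primer's binding site appears on the given (top) strand as its reverse complement. The sketch below uses that to predict the product of a primer pair on a template by exact matching only; real primers tolerate some mismatches, and the template and primers shown are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def predicted_amplicon(template: str, forward: str, reverse: str):
    """Return the region between the primer sites (primers included), or None.

    forward must occur on the template strand as written; reverse anneals to
    that strand, so its reverse complement must occur downstream of forward.
    """
    f_pos = template.find(forward)
    if f_pos == -1:
        return None
    r_site = reverse_complement(reverse)
    r_pos = template.find(r_site, f_pos + len(forward))
    if r_pos == -1:
        return None
    return template[f_pos:r_pos + len(r_site)]


template = "AAAA" + "ACGTTGCA" + "GGGGGGGG" + "TTCCAAGG" + "CCCC"
forward = "ACGTTGCA"
reverse = reverse_complement("TTCCAAGG")  # "CCTTGGAA"
print(predicted_amplicon(template, forward, reverse))  # ACGTTGCAGGGGGGGGTTCCAAGG
```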
  • primer site means the area of the target DNA or other nucleic acid to which a primer hybridizes.
  • “label,” “detectable label,” or “detectable moiety” is a composition detectable by spectroscopic, photochemical, biochemical, immunochemical, chemical, or other physical means.
  • useful labels include ³²P, fluorescent dyes, electron-dense reagents, enzymes (e.g., as commonly used in an ELISA), biotin, digoxigenin, or haptens and proteins that can be made detectable, e.g., by incorporating a radioactive component into the peptide, or used to detect antibodies specifically reactive with the peptide.
  • a detectable label is attached to a probe or a molecule with defined binding characteristics (e.g., a polypeptide with a known binding specificity or a polynucleotide) , so as to allow the presence of the probe (and therefore its binding target) to be readily detectable.
  • Standard control refers to a predetermined amount or concentration of bacteria belonging to a specific bacterial genus, a bacterial polynucleotide or a bacterial polypeptide that is present in an established normal tissue sample, e.g., a normal cervical tissue sample.
  • the standard control value is suitable for the use of a method of the present invention, to serve as a basis for comparing the amount of a specific bacterial genus, mRNA or protein that is present in a test sample.
  • An established sample serving as a standard control provides an average amount of the bacterial genus, mRNA or protein that is typical for a cervical tissue sample of an average, healthy pregnant human with, for example, a closed cervix or normal-length cervix, as conventionally defined.
  • a standard control value may vary depending on the nature of the sample, the manner of sample collection, as well as other factors such as the gender, age, ethnicity of the subjects (and in the case of pregnant women, gestational age) based on whom such a control value is established.
  • the selected group of pregnant humans generally have a similar gestational age to that of a subject whose cervical tissue sample is tested for indication of a risk of having an adverse pregnancy or neonatal outcome.
  • other factors such as age, the status of receiving the same or similar kind of intervention (e.g., pessary/cerclage intervention), ethnicity, and medical history are also considered and are preferably closely matched between the test subject and the selected group of individuals establishing the “average” value.
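One way to derive a matched standard control value, as described above, is to average control levels from pregnant women within a gestational-age window of the test subject and, optionally, with the same intervention status. The ±2-week window, the record fields, and the numbers below are hypothetical choices for illustration; the source does not specify them.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class ControlRecord:
    level: float        # bacterial taxon level measured in the control sample
    ga_weeks: float     # gestational age at sampling
    intervention: str   # e.g., "cerclage", "pessary", "none"


def matched_control_level(controls: list[ControlRecord], test_ga_weeks: float,
                          window_weeks: float = 2.0,
                          intervention: Optional[str] = None) -> float:
    """Average level among controls matched on gestational age (and intervention)."""
    matched = [c.level for c in controls
               if abs(c.ga_weeks - test_ga_weeks) <= window_weeks
               and (intervention is None or c.intervention == intervention)]
    if not matched:
        raise ValueError("no controls match the test subject's profile")
    return mean(matched)


controls = [
    ControlRecord(1.0, 20.0, "cerclage"),
    ControlRecord(3.0, 21.0, "cerclage"),
    ControlRecord(100.0, 30.0, "cerclage"),  # outside the gestational-age window
    ControlRecord(50.0, 20.0, "pessary"),
]
print(matched_control_level(controls, 20.5, intervention="cerclage"))  # 2.0
print(matched_control_level(controls, 20.5))                           # 18.0
```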
  • amount refers to the quantity of a bacterial taxon of interest, a bacterial polynucleotide of interest or a bacterial polypeptide of interest present in a sample. Such quantity may be expressed in the absolute terms, i.e., the total quantity of the bacterial taxon, polynucleotide or polypeptide in the sample, or in the relative terms, i.e., the concentration of the bacterial taxon, polynucleotide or polypeptide in the sample.
  • subject includes individuals who seek medical attention due to a potential risk of having an adverse pregnancy outcome or neonatal outcome, e.g., any pregnant individual. Subjects also include individuals who have had an adverse pregnancy or neonatal outcome during a prior pregnancy.
  • the invention is based, in part, on the discovery of differentially abundant bacterial taxa in the cervical swab samples of women with advanced cervical dilation/cervical shortening, compared with those in appropriately-controlled samples from appropriately-matched women without the corresponding condition.
  • bacterial taxa, e.g., Megasphaera cerevisiae, Alloscardovia omnicolens, Ureaplasma urealyticum or Ureaplasma parvum, Atopobium vaginae, Parvibacter caecicola, Lactobacillus casei, Veillonella montpellierensis, Anaerococcus senegalensis, Bulleidia extructa, Mycoplasma hominis, Propionimicrobium lymphophilum, uncultured bacteria having at least 92% sequence identity to the nucleotide sequence of GenBank Accession No.
  • bacterial taxa e.g., Megasphaera cerevisiae, Alloscardovia omnicolens, Ureaplasma urealyticum or Ureaplasma parvum, Atopobium vaginae, Parvibacter caecicola, Lactobacillus casei, Veillonella montpellierensis, Anaerococcus senegalensis, Bull
  • JQ781443.1 Corynebacterium pyruviciproducens, Megasphaera cerevisiae, Acidipila rosea, Murdochiella asaccharolytica, uncultured bacteria having at least 90%sequence identity to the nucleotide sequence of GenBank Accession No.
  • nucleic acid sizes are given in either kilobases (kb) or base pairs (bp). These are estimates derived from agarose or acrylamide gel electrophoresis, from sequenced nucleic acids, or from published DNA sequences.
  • protein sizes are given in kilodaltons (kDa) or amino acid residue numbers. Protein sizes are estimated from gel electrophoresis, from sequenced proteins, from derived amino acid sequences, or from published protein sequences.
  • Oligonucleotides that are not commercially available can be chemically synthesized, e.g., according to the solid phase phosphoramidite triester method first described by Beaucage and Caruthers, Tetrahedron Lett. 22: 1859-1862 (1981) , using an automated synthesizer, as described in Van Devanter et al., Nucleic Acids Res. 12: 6159-6168 (1984) . Purification of oligonucleotides is performed using any art-recognized strategy, e.g., native acrylamide gel electrophoresis or anion-exchange high performance liquid chromatography (HPLC) as described in Pearson and Reanier, J. Chrom. 255: 137-149 (1983) .
  • the present invention relates to measuring the amount of bacteria of a specific bacterial taxon found in a pregnant woman’s cervix or vagina, especially in a cervical swab or vaginal swab sample, as a means to assess the risk of having an adverse pregnancy outcome or neonatal outcome, such as preterm labor and preterm delivery.
  • the first step of practicing this invention is to obtain a cervical or vaginal tissue sample from a test subject, such that the nucleic acids, e.g., RNA or DNA, contained in the sample may be analyzed.
  • the amount of bacteria of a specific bacterial taxon found in a pregnant woman’s cervix or vagina can be represented by the amount of the specific bacteria in a biological sample that is not from the cervix or the vagina.
  • a biological sample such as cervical or vaginal tissue, cervical mucus, amniotic fluid or maternal blood is obtained from a person to be tested or monitored using a method of the present invention.
  • Collection of cervical or vaginal epithelial cells, cervical mucus, amniotic fluid or maternal blood (e.g., maternal whole blood, maternal serum and/or maternal plasma) from an individual is performed in accordance with the standard protocols that hospitals or clinics generally follow, such as during cervical screening.
  • An appropriate amount of cervical or vaginal epithelium, scraped cells, mucus, and/or biological fluid is collected and may be stored according to standard procedures prior to further preparation.
  • the analysis of the bacteria found in a pregnant patient's sample according to the present invention may be performed using, e.g., cells, tissue, mucosa, or fluids found in the sample.
  • the methods for preparing cell, tissue or fluid samples for nucleic acid extraction are well known among those of skill in the art.
  • a subject's cervical or vaginal mucosa sample can be treated such that bacterial DNA or RNA in the sample can be analyzed.
  • RNA contamination should be eliminated to avoid interference with DNA analysis.
  • Pretreatment of the biological sample with lysis buffer and enzymes, including mutanolysin and proteinase K, can also be used before the extraction.
  • Methods for detecting target DNA include PCR analysis, quantitative analysis with fluorescence labeling, and Southern blot analysis.
  • the target DNA can be the gene encoding the 16S ribosomal RNA (the 16S rRNA gene) , or other genes or genomic sequences of interest possessed by a specific bacterial taxon.
  • although PCR amplification is typically used in practicing the present invention, one of skill in the art will recognize that amplification of the relevant genomic sequence may be accomplished by any known method, such as the ligase chain reaction (LCR), transcription-mediated amplification, and self-sustained sequence replication or nucleic acid sequence-based amplification (NASBA), each of which provides sufficient amplification. More recently developed branched-DNA technology may also be used to quantitatively determine the amount of specific bacterial mRNA markers. For a detailed description of branched-DNA signal amplification for direct quantitation of nucleic acid sequences in clinical samples, see, for example, Nolte, Adv. Clin. Chem. 33: 201-235, 1998.
  • LCR ligase chain reaction
  • NASBA nucleic acid sequence-based amplification
  • Additional means suitable for detecting a polynucleotide sequence for practicing the methods of the present invention include but are not limited to mass spectrometry, primer extension, polynucleotide hybridization, real-time PCR, melting curve analysis, high resolution melting analysis, heteroduplex analysis, massively parallel sequencing (e.g., next-gen sequencing), and electrophoresis.
  • RNA preparation methods are described, e.g., by Sambrook and Russell, Molecular Cloning: A Laboratory Manual, 3rd ed., 2001.
  • various commercially available reagents or kits, such as Trizol reagent (Invitrogen, Carlsbad, CA), Oligotex Direct mRNA Kits (Qiagen, Valencia, CA), RNeasy Mini Kits (Qiagen, Hilden, Germany), and Poly Series 9600™ (Promega, Madison, WI), may also be used to obtain mRNA from a biological sample from a test subject. Combinations of more than one of these methods may also be used.
  • RNA transcripts of interest that are expressed by bacteria of a specific bacterial taxon may be quantified.
  • the amount of 16S ribosomal RNA (rRNA) for a particular bacterial taxon such as, but not limited to, Megasphaera cerevisiae, Alloscardovia omnicolens, Ureaplasma urealyticum or Ureaplasma parvum, Atopobium vaginae, Parvibacter caecicola, Lactobacillus casei, Veillonella montpellierensis, Anaerococcus senegalensis, Bulleidia extructa, Mycoplasma hominis, Propionimicrobium lymphophilum, uncultured bacteria having at least 92% sequence identity to the nucleotide sequence of GenBank Accession No. JQ781443.1, Corynebacterium pyruviciproducens, Megasphaera cerevisiae, Acidipila rosea, Murdochiella asaccharolytica, uncultured bacteria having at least 90% sequence identity to the nucleotide sequence of GenBank Accession No.
  • one method for determining the RNA transcript level is an amplification-based method, e.g., the polymerase chain reaction (PCR), especially reverse transcription-polymerase chain reaction (RT-PCR).
  • RT-PCR reverse transcription-polymerase chain reaction
  • Prior to the amplification step, a DNA copy (cDNA) of a bacterial RNA transcript of interest must be synthesized. This is achieved by reverse transcription, which can be carried out as a separate step or in a homogeneous reverse transcription-polymerase chain reaction (RT-PCR), a modification of the polymerase chain reaction for amplifying RNA.
  • PCR reagents and protocols are also available from commercial vendors, such as Roche Molecular Systems.
  • PCR is most usually carried out as an automated process with a thermostable enzyme. In this process, the temperature of the reaction mixture is cycled through a denaturing region, a primer annealing region, and an extension reaction region automatically. Machines specifically adapted for this purpose are commercially available.
  • PCR amplification of the target RNA is typically used in practicing the present invention.
  • amplification of these bacterial RNA species in the sample may be accomplished by any known method, such as ligase chain reaction (LCR) , transcription-mediated amplification, and self-sustained sequence replication or nucleic acid sequence-based amplification (NASBA) , each of which provides sufficient amplification.
  • More recently developed branched-DNA technology may also be used to quantitatively determine the amount of specific bacterial RNA markers.
  • the bacterial DNA or RNA transcripts of interest can also be detected using other standard techniques well known to those of skill in the art. Although the detection step is typically preceded by an amplification step, amplification is not required in the methods of the invention. For instance, the DNA or RNA may be identified by size fractionation (e.g., gel electrophoresis), whether or not preceded by an amplification step.
  • the presence of a band of the same size as the standard comparison is an indication of the presence of a target DNA or RNA, the amount of which may then be compared to the control based on the intensity of the band.
  • oligonucleotide probes specific to the DNA or RNA of interest can be used to detect the presence of such DNA or RNA species and indicate the amount of DNA or RNA in comparison to the standard comparison, based on the intensity of signal imparted by the probe.
  • Sequence-specific probe hybridization is a well-known method of detecting a particular nucleic acid in a mixture comprising other species of nucleic acids. Under sufficiently stringent hybridization conditions, the probes hybridize specifically only to substantially complementary sequences. The stringency of the hybridization conditions can be relaxed to tolerate varying amounts of sequence mismatch.
  • the hybridization may be performed in any of the hybridization formats well known in the art, including but not limited to solution-phase, solid-phase, or mixed-phase hybridization assays.
  • the following articles provide an overview of the various hybridization assay formats: Singer et al., Biotechniques, 4: 230, 1986; Haase et al., Methods in Virology, pp. 189-226, 1984; Wilkinson, In situ Hybridization, Wilkinson ed., IRL Press, Oxford University Press, Oxford; and Hames and Higgins eds., Nucleic Acid Hybridization: A Practical Approach, IRL Press, 1987.
  • the hybridization complexes are detected according to well-known techniques.
  • Nucleic acid probes capable of specifically hybridizing to a target nucleic acid i.e., the RNA or the amplified DNA
  • One common method of detection is autoradiography using probes labeled with ³H, ¹²⁵I, ³⁵S, ¹⁴C, ³²P, or the like.
  • the choice of radioactive isotope depends on research preferences due to ease of synthesis, stability, and half-lives of the selected isotopes.
  • other labels include compounds (e.g., biotin and digoxigenin) that bind to anti-ligands or antibodies labeled with fluorophores, chemiluminescent agents, or enzymes.
  • probes can be conjugated directly with labels such as fluorophores, chemiluminescent agents or enzymes. The choice of label depends on sensitivity required, ease of conjugation with the probe, stability requirements, and available instrumentation.
  • probes and primers necessary for practicing the present invention can be synthesized and labeled using well known techniques.
  • Oligonucleotides used as probes and primers may be chemically synthesized according to the solid phase phosphoramidite triester method first described by Beaucage and Caruthers, Tetrahedron Letts., 22: 1859-1862, 1981, using an automated synthesizer, as described in Needham-VanDevanter et al., Nucleic Acids Res. 12:6159-6168, 1984. Purification of oligonucleotides is by either native acrylamide gel electrophoresis or by anion-exchange HPLC as described in Pearson and Regnier, J. Chrom., 255: 137-149, 1983.
  • Additional means suitable for detecting a polynucleotide sequence for practicing the methods of the present invention include but are not limited to mass spectrometry, primer extension, polynucleotide hybridization, real-time PCR, melting curve analysis, high resolution melting analysis, heteroduplex analysis, massively parallel sequencing, and electrophoresis.
  • to establish a standard control, a group of healthy pregnant women, pregnant women who are not at risk of having an adverse pregnancy or neonatal outcome, or pregnant women who are later confirmed to deliver within the normal time frame of their pregnancy, as conventionally defined, is first selected.
  • the group may include a group of pregnant women who have had a full-term labor and delivery.
  • the group of pregnant women may have had a full-term labor and delivery after clinical intervention, such as cerclage or pessary.
  • These individuals are within the appropriate parameters, if applicable, for the purpose of screening for and/or monitoring risk of adverse pregnancy outcomes using the methods of the present invention.
  • the individuals may be of a similar gestational age and comparable health status.
  • the individuals are of similar age, or similar ethnic background.
  • the normal delivery time of the selected individuals will be confirmed later on, and any of the selected individuals who turn out to give birth sooner or later than the normal delivery time frame will be excluded from the group used to provide data as a “standard control.”
  • the health status of the selected individuals is confirmed by well-established, routinely employed methods, including but not limited to a general physical examination of the individuals and a general review of their medical history.
  • the selected group of healthy individuals must be of a reasonable size, such that the average amount/concentration of bacteria of one or more bacterial taxa in the cervical tissue sample obtained from the group can be reasonably regarded as representative of the normal or average level among the general population of healthy pregnant women.
  • the selected group comprises at least 10 pregnant human subjects.
  • once an average value for the bacteria of one or more taxa is established based on the individual values found in each subject of the selected healthy control group, this average, median, or other representative value or profile is considered a standard control. A standard deviation is also determined during the same process. In some cases, separate standard controls may be established for separately defined groups having distinct characteristics such as age, gestational age, or ethnic background.
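A standard control as described above reduces to simple summary statistics over the control group. The sketch below is a minimal illustration, assuming each healthy subject contributes one numeric abundance value; the function name `standard_control`, its `min_subjects` parameter, and the example values are hypothetical and are not study data.

```python
import statistics


def standard_control(control_values, min_subjects=10):
    """Summarize per-subject values from a healthy pregnant control group.

    control_values: one abundance value (or score) per control subject;
    this input format is a hypothetical choice for illustration.
    min_subjects: one embodiment uses a group of at least 10 subjects.
    """
    if len(control_values) < min_subjects:
        raise ValueError(f"need at least {min_subjects} control subjects")
    return {
        "mean": statistics.mean(control_values),
        "median": statistics.median(control_values),
        # Sample standard deviation (n - 1 denominator).
        "sd": statistics.stdev(control_values),
    }


# Illustrative values only; these are not data from the study.
controls = [0.4, 0.6, 0.5, 0.7, 0.3, 0.5, 0.6, 0.4, 0.5, 0.5]
print(standard_control(controls))
```

Separate standard controls for subgroups (e.g., by gestational age or ethnic background) could be obtained by calling the same function on each subgroup's values.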
  • the abundance levels of bacteria belonging to at least 1, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more bacterial taxa can be used to calculate a cervical microbe score.
  • the subject’s score can be compared to a cut-off value established by the scores of the standard control group, and used to determine if the pregnant subject is likely to have an adverse pregnancy outcome.
  • the bacteria belong to at least 3, e.g., 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more bacterial taxa.
  • the bacteria include Sneathia sanguinegens, Parvimonas micra, Ureaplasma urealyticum (or Ureaplasma parvum), Atopobium vaginae, Peptoniphilus lacrimalis, Megasphaera cerevisiae, and Parvibacter caecicola.
  • each log10(abundance) value for the selected bacterial taxa can be transformed to the linear scale and the values added together, after which the total is log-transformed and expressed as a cervical microbe score on the log10 scale.
  • a cervical microbe score of greater than 1.15 can indicate that the pregnant subject is at risk of having a spontaneous preterm birth.
  • the pregnant subject is at risk of having a spontaneous preterm birth after intervention, such as cerclage or pessary.
  • a cervical microbe score of greater than 1.15 can indicate that the pregnant subject has an increased likelihood of delivering at less than about 37.0 weeks, e.g., about 36.5 weeks, about 36.0 weeks, about 35.0 weeks, about 34.0 weeks, about 33.0 weeks, about 32.0 weeks, about 31.0 weeks, about 30.0 weeks, about 29.0 weeks, about 28.0 weeks, about 27.0 weeks, about 26.0 weeks, about 25.0 weeks, about 24.0 weeks, about 23.0 weeks, about 22.0 weeks, about 21.0 weeks, or less.
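The score calculation described above takes the log of a sum, not a sum of logs. A minimal Python sketch, assuming the per-taxon log10(abundance) values for one sample are already available and that undetected taxa are simply omitted; the function names and inputs are hypothetical, and the 1.15 cut-off is only meaningful on the same normalization scale used to derive it.

```python
import math

# Cut-off quoted in the text: a score above 1.15 can indicate sPTB risk.
SPTB_CUTOFF = 1.15


def cervical_microbe_score(log10_abundances):
    """Combine per-taxon log10(abundance) values into one log10-scale score.

    Undetected taxa have no finite log10 value; this sketch assumes they
    are left out of the input list.
    """
    if not log10_abundances:
        raise ValueError("at least one detected taxon is required")
    # Back-transform each value to the linear scale and add them up ...
    linear_total = sum(10 ** x for x in log10_abundances)
    # ... then log-transform the total to express the score on the log10 scale.
    return math.log10(linear_total)


def above_cutoff(score, cutoff=SPTB_CUTOFF):
    """True when the score is strictly greater than the cut-off."""
    return score > cutoff


# Illustrative inputs only, not study data.
high = cervical_microbe_score([1.0, 1.0])      # log10(10 + 10) = log10(20)
low = cervical_microbe_score([0.0, 0.0, 0.0])  # log10(1 + 1 + 1) = log10(3)
print(round(high, 3), above_cutoff(high))  # 1.301 True
print(round(low, 3), above_cutoff(low))    # 0.477 False
```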
  • the abundance level of a bacterial taxon is determined by a quantitative PCR assay or a sequence-specific hybridization assay targeting polynucleotide sequences specific to the taxon concerned. In other embodiments, for each biological sample, the abundance level of a bacterial taxon is determined by massively parallel sequencing of marker gene sequences that serve as a proxy for the respective taxon. In some instances, the abundance level of a bacterial taxon in a sample is the normalized read count of the 16S ribosomal RNA (rRNA) gene sequence representing the taxon concerned. In some cases, the normalized read counts are calculated from the raw read counts using established normalization methods to minimize technical variation between samples, such as different sequencing depths or library sizes per sample. In some cases, the abundance level of a bacterial taxon is expressed on the log10 scale.
  • the abundances of a selected group of bacterial taxa are combined by addition or multiplication to give the cervical microbe score, as illustrated by the LA6 value in Example 2.
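One normalization consistent with the text, the ratio of each taxon's reads to the sample's total read count (as used for Table 2C), can be sketched as follows. The `scale` multiplier, the dictionary input format, and the choice to drop zero-count taxa are assumptions for illustration; other established methods (e.g., random subsampling or cumulative sum scaling) would give different values.

```python
import math


def log10_normalized_abundance(raw_counts, scale=10_000):
    """Turn raw 16S read counts for one sample into log10 abundances.

    raw_counts: {taxon_id: read count} for a single sample (hypothetical format).
    scale: hypothetical multiplier applied to each taxon's share of the
    sample's total reads; the text does not specify one.
    Taxa with zero reads are omitted because log10(0) is undefined.
    """
    total = sum(raw_counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    result = {}
    for taxon, count in raw_counts.items():
        if count > 0:
            # Dividing by total reads removes differences in sequencing depth.
            result[taxon] = math.log10(scale * count / total)
    return result


# Illustrative counts only. Two samples with the same composition but
# different sequencing depths give identical normalized values.
shallow = {"Otu4": 900, "Otu16": 100, "Otu56": 0}
deep = {"Otu4": 9000, "Otu16": 1000, "Otu56": 0}
print(log10_normalized_abundance(shallow))
print(log10_normalized_abundance(deep) == log10_normalized_abundance(shallow))
```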
  • the score is calculated in 3 steps: (a) the abundance levels of one sub-selected group of bacterial taxa (e.g., taxa that are significantly more abundant in the dilated cervices than the closed cervices) are combined by addition or multiplication; (b) the abundance levels of another sub-selected group of taxa (e.g., taxa that are significantly more abundant in the closed cervices than the dilated cervices) are also combined by addition or multiplication; and (c) the cervical microbe score is calculated as the difference or the ratio between the sum or product of the first sub-selected group of taxa and the sum or product of the second sub-selected group of taxa.
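The three-step calculation can be sketched as below, assuming linear-scale abundances and combination by addition (the text also allows multiplication). The taxon names, the `contrast` parameter, and the example abundances are hypothetical.

```python
def two_group_score(abundance, increased_taxa, decreased_taxa, contrast="ratio"):
    """Three-step score from two sub-selected groups of taxa.

    abundance: {taxon_id: linear-scale abundance} for one sample
    (hypothetical format; taxa missing from the dict count as 0).
    increased_taxa: taxa more abundant in dilated than closed cervices.
    decreased_taxa: taxa more abundant in closed than dilated cervices.
    """
    # (a) Combine the taxa enriched in dilated cervices (by addition here).
    up = sum(abundance.get(t, 0.0) for t in increased_taxa)
    # (b) Combine the taxa enriched in closed cervices (by addition here).
    down = sum(abundance.get(t, 0.0) for t in decreased_taxa)
    # (c) Contrast the two combined values by subtraction or division.
    if contrast == "difference":
        return up - down
    if contrast == "ratio":
        if down == 0:
            raise ZeroDivisionError("second group has zero total abundance")
        return up / down
    raise ValueError(f"unknown contrast: {contrast!r}")


# Hypothetical taxa and abundances for illustration only.
sample = {"taxon_A": 30.0, "taxon_B": 10.0, "taxon_C": 5.0, "taxon_D": 15.0}
up_group, down_group = ["taxon_A", "taxon_B"], ["taxon_C", "taxon_D"]
print(two_group_score(sample, up_group, down_group))                # 2.0
print(two_group_score(sample, up_group, down_group, "difference"))  # 20.0
```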
  • the cervical microbe score is calculated based on the number of kinds of selected bacterial taxa that are present in a sample, as illustrated in the DIBT1 test, DIBT2 test and DIBT3 test in Example 1.
  • the cervical microbe score is the number of selected bacterial taxa present in the sample, and a score greater than or equal to 1 indicates an increased risk of an adverse pregnancy or neonatal outcome.
  • the cervical microbe score is calculated based on the ranking of selected bacterial taxa among all taxa detected in a sample.
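The presence-count score (as in the DIBT tests) amounts to counting the overlap between the taxa detected in a sample and a predefined panel. A minimal sketch, assuming presence or absence has already been called for each taxon (by sequencing or species-specific PCR); the taxon labels and function names are hypothetical.

```python
def dibt_score(detected_taxa, selected_taxa):
    """Count how many of the selected taxa were detected in a sample."""
    return len(set(selected_taxa) & set(detected_taxa))


def dibt_result(detected_taxa, selected_taxa):
    """'positive' if any selected taxon is present (score >= 1), else 'negative'."""
    return "positive" if dibt_score(detected_taxa, selected_taxa) >= 1 else "negative"


# Hypothetical labels; a real panel would use the taxa defined for
# DIBT1 (9 taxa), DIBT2 (15 taxa) or DIBT3 (17 taxa).
panel = {"taxon_1", "taxon_2", "taxon_3"}
print(dibt_result({"taxon_2", "taxon_X"}, panel))  # positive
print(dibt_result({"taxon_X"}, panel))             # negative
```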
  • the invention provides compositions and kits for practicing the methods described herein to assess the level of bacteria from one or more specific taxa in a pregnant subject, which can be used for various purposes such as determining the risk of having an adverse pregnancy or neonatal outcome.
  • Kits for carrying out assays for determining the RNA level of bacteria of a bacterial taxon of interest typically include at least one oligonucleotide useful for specific hybridization with at least one segment of a coding sequence of interest or its complementary sequence.
  • this oligonucleotide is labeled with a detectable moiety.
  • the kits may include at least two oligonucleotide primers that can be used in the amplification of at least one segment of a bacterial DNA or RNA transcript of interest by PCR, particularly by RT-PCR.
  • Kits for carrying out assays for determining the protein level of bacteria of a bacterial taxon of interest typically include at least one antibody useful for specific binding to the target protein amino acid sequence.
  • this antibody is labeled with a detectable moiety.
  • the antibody can be either a monoclonal antibody or a polyclonal antibody.
  • the kits may include at least two different antibodies, one for specific binding to the target protein (i.e., the primary antibody) and the other for detection of the primary antibody (i.e., the secondary antibody) , which is often attached to a detectable moiety.
  • kits also include an appropriate standard control.
  • the standard controls indicate the average value of a target protein or a target mRNA expressed by bacteria from a specific bacterial taxon in the cervical epithelium of healthy, pregnant subjects who are not at risk of having an adverse pregnancy or neonatal outcome.
  • standard control may be provided in the form of a set value.
  • the kits of this invention may provide instruction manuals to guide users in analyzing test samples and assessing the risk of having an adverse pregnancy event, such as preterm delivery, in a test subject.
  • Example 1 Differentially abundant bacterial taxa in the cervix of women with pregnancy-associated complications.
  • Ureaplasmas were also detected in pregnancies with normal, uncomplicated outcomes (Gray, Robinson et al., 1992; Gerber, Vial et al., 2003; Perni, Vardhana et al., 2004). It is possible that Ureaplasmas are simply part of the commensal bacteria, or normal flora, that also reside in the reproductive tract of women whose pregnancies result in a term birth and a normal neonate. This highlights the limitation of detecting only a few bacterial species, selected by a candidate approach, as in many previous studies.
  • the present study investigates a non-invasive sample type, the cervical swab, which is readily obtainable from any pregnant woman with or without complications. This facilitates appropriate matching with a normal control group and data comparison between the disease and control groups, unlike amniotic fluid, which requires invasive procedures that risk fetal loss and is not recommended in normal pregnancy unless there is an indication.
  • the present study investigates the cervical swab, which is readily obtainable from any pregnant woman at any gestational age. This facilitates not only matching with a control group but also possibly early sampling and hence early detection and treatment, unlike amniotic fluid or the placenta, which are usually obtained after 14 weeks or 37 weeks of gestation, respectively.
  • rRNA 16S ribosomal RNA
  • the purpose of the study described herein was to systematically profile all bacteria and identify a list of bacterial taxa and their partial genomic sequences that are differentially abundant in the cervix of women with advanced cervical dilation or cervical shortening.
  • the study was designed (i) to systematically profile the bacterial taxa in the cervix of women with advanced cervical dilation (“the dilated cervix”) and compare them with those in the cervix of appropriately matched women without this condition (“the closed cervix”); (ii) to systematically profile the bacterial taxa in the cervix of women with cervical shortening (“the shortened cervix”) and compare them with those in the cervix of appropriately matched women without this condition (“the normal-length cervix”); (iii) to systematically identify a list of differentially abundant bacterial taxa in the dilated cervix, relative to those in the closed cervix, using the data in (i)
  • To minimize the chance of contamination from the environment, the clinical staff, or other parts of the female reproductive tract, the cervical swab sample was collected using a Calgiswab Type III (Puritan, Guilford, Maine, USA) before any other procedure, immediately upon opening of the female reproductive tract with the speculum. To ensure that the same anatomical location was sampled and compared, each cervical swab sample was collected from a fixed position on the peripheral side (the 12 o'clock position facing the clinician) of the external os. To maintain consistency for fair comparison across all samples, a single clinician collected all samples from the dilated cervix and the closed cervix groups.
  • each swab was collected by rotating it 360 degrees once. To minimize any increased risk of infecting the participant or her fetus in the uterus, sterile swabs were used and the cervical mucus plug was not touched. During collection of the swab sample, extra care was also taken not to touch the labia or any part of the female reproductive tract other than the external os. To monitor for bacterial contamination from the operating room, the reagents, and the collection procedures, a negative control swab was collected in parallel with each cervical swab but without touching the patient.
  • the cervical swab and the negative control swabs were immersed in sterile and nuclease-free water and stored at -80°C until extraction.
  • the swabs were extracted for genomic DNA using an established method (Method 1 in Yuan, Cohen et al., 2012) , which would ensure fair representation of bacterial communities commonly found in the female reproductive tracts.
  • This method involved pretreatment of the sample with mutanolysin (Sigma-Aldrich) and a column-based DNA extraction method (QIAamp DNA Mini Kit, Qiagen). To minimize any batch variation, all samples were extracted on the same day.
  • PCR amplification and massively parallel sequencing: Since the cervical swab samples inevitably contain human genomic DNA in addition to bacterial genomic DNA, we specifically amplified the 16S rRNA gene, which is possessed by all bacteria but not by humans. To facilitate the amplification of genomic DNA sequences of essentially all bacteria, we chose a pair of PCR primers targeting the V4 and V5 regions, which are complementary to highly conserved regions of the 16S rRNA gene (Claesson, Wang et al. 2010). We have checked using the Ribosomal Database Project (RDP) (Wang, Garrity et al.
  • RDP Ribosomal Database Project
  • the sequences of the forward and reverse primers are 5’-[Primer A Key sequence][MID sequence]AYT GGG YDT AAA GNG-3’ (SEQ ID NO: 1) and 5’-[Primer B Key sequence]CCG TCA ATT YYT TTR AGT TT-3’ (SEQ ID NO: 2), respectively, where the Primer A Key sequence, Primer B Key sequence, and MID sequence are described in the "454 Sequencing System Guidelines for Amplicon Experimental Design July 2011" for the massively parallel sequencing platform GS-FLX 454 Titanium (Roche).
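The primer sequences use IUPAC degenerate nucleotide codes (Y = C or T, D = A, G or T, R = A or G, N = any base), so each primer is actually a mixture of sequences. The short sketch below enumerates the variants of the template-specific portions only (the 454 key and MID adaptor sequences are omitted); the helper name is hypothetical.

```python
from itertools import product

# Standard IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(primer):
    """Return every concrete sequence encoded by a degenerate primer."""
    seq = primer.replace(" ", "").upper()
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in seq))]


forward = "AYT GGG YDT AAA GNG"        # template-specific part of SEQ ID NO: 1
reverse = "CCG TCA ATT YYT TTR AGT TT"  # template-specific part of SEQ ID NO: 2
print(len(expand_degenerate(forward)), len(expand_degenerate(reverse)))  # 48 8
```

The forward primer has 2 × 2 × 3 × 4 = 48 variants and the reverse primer 2 × 2 × 2 = 8, which is how such primers can match the conserved regions across many bacterial taxa.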
  • PCR was performed as a 50-µL reaction with 2.5 units of FastStart Taq DNA polymerase (FastStart HiFi PCR System, dNTPack, Roche), 4 mM MgCl₂, 100 nM of each primer, and 200 µM dNTPs. All PCRs were run on a PTC-100 thermal cycler (Bio-Rad) using the following thermocycling conditions: 95°C for 2 minutes, followed by 33 cycles of 95°C for 30 seconds, 40°C for 30 seconds, and 72°C for 1 minute, with a final extension at 72°C for 5 minutes and 25°C for 5 minutes. We then subjected the PCR product to electrophoresis.
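As a quick arithmetic check of the cycling program, the hold times add up as follows. This sketch counts only the programmed hold times; the actual run on the thermal cycler takes longer because ramping between temperatures is not included. Variable and function names are illustrative.

```python
# Thermocycling profile as (temperature in °C, hold time in seconds).
INITIAL_DENATURATION = [(95, 120)]
CYCLE = [(95, 30), (40, 30), (72, 60)]  # denature, anneal, extend
N_CYCLES = 33
FINAL_STEPS = [(72, 300), (25, 300)]    # final extension, then 25 °C step


def total_hold_seconds(initial, cycle, n_cycles, final):
    """Sum the programmed hold times; ramp times are not included."""
    return (
        sum(sec for _, sec in initial)
        + n_cycles * sum(sec for _, sec in cycle)
        + sum(sec for _, sec in final)
    )


seconds = total_hold_seconds(INITIAL_DENATURATION, CYCLE, N_CYCLES, FINAL_STEPS)
print(seconds, seconds / 60)  # 4680 78.0
```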
  • raw sequencing data were denoised at the flowgram level, using an implementation of Pyronoise (Quince, Lanzen et al., 2011) on mothur (Schloss, Westcott et al., 2009) .
  • Raw reads were flowgram-denoised, quality- and length-filtered, chimera-removed, aligned, pre-clustered, and clustered into operational taxonomic units, which were then taxonomically classified based on the Ribosomal Database Project (RDP) training set (v9, 2012).
  • Metastats, which is specially designed for this type of sequencing data (White, Nagarajan et al. 2009).
  • Metastats features a non-parametric t-test and a heuristic that uses the Fisher exact test if a certain taxon appears at an average of less than 1 read per sample (the so-called sparse count problem, which poses a challenge for detecting significant changes in this type of data).
  • DIBT1 and DIBT2 may use massively parallel genomic sequencing data or, more preferably, data from species-specific PCR for detecting the presence of a given bacterial taxon. If any of these 9 or 15 differentially increased taxa was present in a sample, we defined it as DIBT1 positive or DIBT2 positive, respectively. Otherwise, if all of these 9 or 15 taxa were absent in a sample, we defined it as DIBT1 negative or DIBT2 negative, respectively.
  • To explore the association between DIBT1 or DIBT2 and adverse pregnancy or neonatal outcomes, we performed the Fisher exact test. To explore the potential of DIBT1 or DIBT2 in predicting key pregnancy and neonatal outcomes, we calculated the true positives, false positives, false negatives, and true negatives of DIBT1 or DIBT2 in predicting these outcomes (Tables 5A and 5B, respectively). We also calculated the sensitivity, specificity, and positive and negative predictive values of DIBT1 and DIBT2 in predicting these outcomes (Tables 5A and 5B, respectively).
  • DIBT1 intraventricular hemorrhage
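The 2x2 quantities referred to above can be computed directly. Below is a minimal, dependency-free sketch, assuming the table is arranged as test result (rows) by outcome (columns). The two-sided Fisher exact p-value sums the hypergeometric probabilities of all tables with the same margins that are no more probable than the observed one, which is the usual convention (e.g., in R's `fisher.test`). Function names and counts are hypothetical and are not taken from Tables 5A or 5B.

```python
from math import comb


def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from 2x2 counts."""
    return {
        "sensitivity": tp / (tp + fn),  # test-positives among those with the outcome
        "specificity": tn / (tn + fp),  # test-negatives among those without it
        "ppv": tp / (tp + fp),          # outcome rate among test-positives
        "npv": tn / (tn + fn),          # no-outcome rate among test-negatives
    }


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows: test positive / test negative; columns: outcome / no outcome.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def table_prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = table_prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables no more probable than the observed one; the small
    # relative tolerance guards against floating-point ties.
    p = sum(
        q for q in (table_prob(x) for x in range(lo, hi + 1))
        if q <= p_observed * (1 + 1e-7)
    )
    return min(1.0, p)


# Illustrative counts only; they are not the values in Tables 5A or 5B.
tp, fp, fn, tn = 8, 2, 1, 9
print(diagnostic_metrics(tp, fp, fn, tn))
print(fisher_exact_two_sided(tp, fp, fn, tn))
```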
  • 17 taxa (Table 7) were significantly increased and 7 taxa (not shown) were significantly decreased in the short cervix group, relative to the normal-length cervix group.
  • the 16S rRNA genomic sequences are listed in Table 8.
  • the nearest species classifications of these 17 genomic sequences, based on BLAST nucleotide alignment against the 16S ribosomal RNA database (performed using the NCBI BLAST website in June 2014), are listed in Table 9.
  • DIBT Differentially Increased Bacteria Test
  • DIBT3 may use massively parallel genomic sequencing data or, more preferably, data from species-specific PCR for detecting the presence of a given bacterial taxon. If any of these 17 differentially increased taxa was present in a sample, we defined it as DIBT3 positive. Otherwise, if all of these 17 taxa were absent in a sample, we defined it as DIBT3 negative. The association of DIBT3-positive results with adverse pregnancy or neonatal outcome is tabulated in Table 10.
  • DIBT3 premature cervical dilation
  • the bacterial markers provided herein were used to accurately predict adverse pregnancy outcomes and neonatal outcomes based on a molecular test performed as early as 13 weeks of gestation (Tables 5A, 5B and 10) .
  • the method described herein facilitates early intervention, such as close monitoring or timely transfer to a tertiary treatment unit with neonatal intensive care.
  • Example 2 Cervical microbiome signature for the identification of cervical insufficiency patients resulting in spontaneous preterm birth after clinical intervention
  • Cervical insufficiency (CI) is a risk factor for preterm birth (PTB), which is associated with neonatal morbidity and perinatal death. It manifests in affected women as a prematurely dilated cervix (cervical dilation, 1 cm to 5 cm) or a shortened cervix (cervical length <25 mm) in the second, instead of the third, trimester.
  • CI patients with intra-amniotic infection (IAI) had a 4-fold higher rate of PTB <34 weeks after cerclage intervention (Romero et al., Am J Obstet Gynecol 167, 1086-1091 (1992)), compared with those receiving the same intervention but without IAI.
  • Patients who received cerclage only if IAI had been ruled out had a lower rate of PTB <34 weeks than those who received the intervention without testing for IAI (Mays et al., Obstet Gynecol 95, 652-655 (2000)).
  • IAI is highly prevalent (38%-51%) in CI patients (Romero et al., Am J Obstet Gynecol 167, 1086-1091 (1992) ; Mays et al., Obstet Gynecol 95, 652-655 (2000) )
  • experts have suggested that clinicians consider ruling out IAI by pre-cerclage amniocentesis to detect microorganisms (Berghella et al., Am J Obstet Gynecol 209, 181-192 (2013); Airoldi et al., Am J Perinatol 26, 63-68 (2009)).
  • Massively parallel sequencing (MPS) has enabled a culture-independent, and hence more sensitive and comprehensive, view of the microorganisms colonizing different body sites.
  • the placenta, which is located inside the amniotic cavity, has been thought to be sterile.
  • an MPS-based microbiome and metagenomic study (Aagaard et al., Sci Transl Med 6, 237ra265 (2014)) has revealed that the placenta harbors a low-biomass microbiome that varies in association with a remote history of maternal antenatal infection and preterm birth.
  • the present study was performed to investigate the association between the antenatal cervical microbiome and the outcome of preterm birth from CI patients.
  • RDS respiratory distress syndrome
  • BPD bronchopulmonary dysplasia
  • IVH intraventricular haemorrhage
  • ROP retinopathy of prematurity
  • FIG. 1A perinatal mortality
  • Each cervical swab sample collected before the clinical intervention was subjected to extraction of bacterial DNA, PCR amplification of the 16S ribosomal RNA (rRNA) gene and massively parallel sequencing of the amplicon (Titanium, GS-FLX 454, Roche) .
  • the PCR primers targeting the V4 and V5 regions of the 16S rRNA gene could amplify over 9,600 well-established bacteria of known 16S rRNA sequences for analysis (Claesson et al., Nucleic Acids Res 38, e200 (2010) ) . This scope was wider than most of the published microbiome studies of the female reproductive tract.
  • the processed reads from all samples with at least 97% sequence identity were clustered into one operational taxonomic unit (Otu, i.e., a bacterial taxon). In total, 152 bacterial taxa were detected across all 25 cervices.
  • CSS Cumulative Sum Scaling
  • FIG. 1B shows the 10 most abundant bacterial taxa observed in the 10 “sPTB after intervention” cervices. In contrast to a healthy female reproductive tract dominated by Lactobacilli, a member of the Gardnerella genus (Otu 4) was identified as the most abundant bacterial taxon in the “sPTB after treatment” cervices [FIG. 1B, row #1, i.e., the taxon with the greatest total log(abundance) values in the 10 “sPTB after intervention” cervices].
  • LA6 value refers to the log10 (total abundance of the 6 differentially abundant taxa) .
  • Parvimonas Otu 16
  • Ureaplasma Otu 56
  • Atopobium Otu 42
  • Peptoniphilus Otu 28
  • Megasphaera Otu 47
  • Paraeggerthella Otu 40
  • the median values of LA6 were 2.61 and 0.78 in the "sPTB after treatment” and the “TB after treatment” groups, respectively (FIG. 2A) .
  • the median LA6 values are shown to be increased by 3.36-fold in the former group (Mann-Whitney, p < 0.0001).
  • ROC receiver-operating characteristics
  • LA6-positive patients delivered earlier after clinical intervention than the LA6-negative patients (FIG. 2B; median gestational age at delivery, 23.7 weeks vs. 38.4 weeks; 95% confidence interval, 20.6 to 25.4 weeks vs. 38.0 to 38.7 weeks; Chi-squared, 32.352; df, 1; logrank test, p < 0.0001; hazard ratio, 6.24; 95% confidence interval, 1.50 to 25.9; MedCalc, version 14.12).
  • LA6-positive patients delivered after a shorter interval following intervention than their LA6-negative counterparts (FIG. 2C; median number of days between intervention and delivery, 10 days vs. 126 days; 95% confidence interval, 8 to 32 days vs. 112 to 134 days; Chi-squared, 32.520; df, 1; logrank test, p < 0.00001; hazard ratio, 6.34; 95% confidence interval, 1.51 to 26.6).
  • The cervical microbiome signature identified in this study, namely LA6, has been shown to provide prognostic information after cerclage/pessary intervention.
  • CI patients who test positive for LA6 are at an increased risk of delivering at a significantly earlier gestational age after clinical intervention than those who test negative (23.7 weeks vs. 38.4 weeks; hazard ratio, 6.24; 95% confidence interval, 1.50 to 25.9).
  • LA6-positive patients are also at an increased risk of delivering much sooner after intervention than their LA6-negative counterparts (10 days vs. 126 days; hazard ratio, 6.34; 95% confidence interval, 1.51 to 26.6).
  • a cervical swab sample was collected from the CI patient by rotating a sterile Dacron swab 360° once on the external os.
  • the swab was obtained from the os at the 12 o’ clock position facing the clinician.
  • the swab sample was collected immediately upon opening up of the reproductive tract by speculum, and before any other procedures. Antiseptic techniques were applied. Special care was taken to avoid the swab to come into contact with any part of the reproductive tract (e.g., vagina, labia) , other than that designated location of the cervix.
  • Each OTU is taxonomically classified at the genus level using the Ribosomal Database Project (RDP) Bayesian rRNA Classifier (version 2.9, September 2014; RDP 16S rRNA training set 10). OTUs classified as Lactobacillus are further matched against the 16S rRNA database (GenBank) using BLAST (highest score) and MOLE-BLAST (best multiple alignment of BLAST matches) to derive species-level information.
  • RDP Ribosomal Database Project
  • Table 2A Sequencing data of the differentially abundant (significantly increased) bacterial taxa identified by comparing all 342 taxa in the 19 dilated vs. the 13 closed cervices (normalized by random subsampling).
  • Table 2B Sequencing data of the differentially abundant (significantly decreased) bacterial taxa identified by comparing all 342 taxa in the 19 dilated vs. the 13 closed cervices (normalized by random subsampling).
  • Table 2C Sequencing data of the differentially abundant (significantly increased) bacterial taxa identified by comparing all 349 taxa in the 10 dilated vs. the 10 closed cervices (1:1 matched by gestational age at sample collection to within 2 weeks; normalized as a ratio of the total read count for each sample).
  • Table 3A Genus-level classification of the differentially abundant (significantly increased) bacterial taxa identified by comparing all 342 taxa in the 19 dilated vs. the 13 closed cervices (normalized by random subsampling).
  • Table 3D Genus-level classification of the differentially abundant (significantly decreased) bacterial taxa identified by comparing all 349 taxa in the 10 dilated vs. the 10 closed cervices (1:1 matched by gestational age at sample collection to within 2 weeks; normalized as a ratio of the total read count for each sample).
  • Table 4A Sequences of the 16S rRNA gene of the differentially abundant (significantly increased) bacterial taxa identified by comparing all 342 taxa in the 19 dilated vs. the 13 closed cervices (normalized by random subsampling).
  • d-c-019 (SEQ ID NO: 3) ; d-c-030 (SEQ ID NO: 4) ; d-c-037 (SEQ ID NO: 5) ; d-c-040 (SEQ ID NO: 6) ; d-c-043 (SEQ ID NO: 7) ; d-c-045 (SEQ ID NO: 8) ; d-c-038 (SEQ ID NO: 9) ; d-c-047 (SEQ ID NO: 10) ; d-c-054 (SEQ ID NO: 11) .
  • Table 4B Sequences of the 16S rRNA gene of the differentially abundant (significantly decreased) bacterial taxa identified by comparing all 342 taxa in the 19 dilated vs. the 13 closed cervices.
  • Table 4C Sequences of the 16S rRNA gene of the differentially abundant (significantly increased) bacterial taxa identified by comparing all 349 taxa in the 10 dilated vs. the 10 closed cervices (1:1 matched by gestational age at sample collection to within 2 weeks; normalized as a ratio of the total read count for each sample).
  • d-c-012 (SEQ ID NO: 19) ; d-c-030 (SEQ ID NO: 20) ; d-c-040 (SEQ ID NO: 21) ; d-c-047 (SEQ ID NO: 22) ; d-c-050 (SEQ ID NO: 23) ; d-c-053 (SEQ ID NO: 24) ; d-c-068 (SEQ ID NO: 25) ; d-c-071 (SEQ ID NO: 26) ; d-c-072 (SEQ ID NO: 27) ; d-c-081 (SEQ ID NO: 28) ; d-c-015 (SEQ ID NO: 29) ; d-c-0837 (SEQ ID NO: 30) ; d-c-087 (SEQ ID NO: 31) ; d-c-088 (SEQ ID NO: 32) ; d-c-105 (SEQ ID NO: 33) .
  • Table 4D Sequences of the 16S rRNA gene of the differentially abundant (significantly decreased) bacterial taxa identified by comparing all 349 taxa in the 10 dilated vs. the 10 closed cervices (1:1 matched by gestational age at sample collection to within 2 weeks; normalized as a ratio of the total read count for each sample).
  • d-c-018 (SEQ ID NO: 34) ; d-c-052 (SEQ ID NO: 35) ; d-c-074 (SEQ ID NO: 36) ; d-c-076 (SEQ ID NO: 37) ; d-c-082 (SEQ ID NO: 38) .
  • Table 4E Species-level classification by BLAST alignment to the NCBI 16S rRNA database. Differentially abundant (significantly increased) bacterial taxa identified by comparing all 342 taxa in the 19 dilated vs. the 13 closed cervices (normalized by random subsampling) were aligned.
  • Table 4F Species-level classification by BLAST alignment to the NCBI 16S rRNA database. Differentially abundant (significantly decreased) bacterial taxa identified by comparing all 342 taxa in the 19 dilated vs. the 13 closed cervices were aligned.
  • Table 4G Species-level classification by BLAST alignment to the NCBI 16S rRNA database. Differentially abundant (significantly increased) bacterial taxa identified by comparing all 349 taxa in the 10 dilated vs. the 10 closed cervices (1:1 matched by gestational age at sample collection to within 2 weeks; normalized as a ratio of the total read count for each sample) were aligned.
  • Table 4H Species-level classification by BLAST alignment to the NCBI 16S rRNA database. Differentially abundant (significantly decreased) bacterial taxa identified by comparing all 349 taxa in the 10 dilated vs. the 10 closed cervices (1:1 matched by gestational age at sample collection to within 2 weeks; normalized as a ratio of the total read count for each sample) were aligned.
  • NA indicates no significant BLAST match in the 16S ribosomal RNA database; such taxa may represent novel, previously unreported species that are now identified by sequencing in this study.
  • An asterisk (*) marks a taxon with <97% identity; such taxa may represent novel, previously unreported species that are now identified by sequencing in this study.
  • Table 5A Association between the proposed test, DIBT1, and adverse pregnancy or neonatal outcomes. A sample containing one or more of the selected taxa in DIBT1 is considered test-positive.
  • Table 5B Association between the proposed test, DIBT2, and adverse pregnancy or neonatal outcomes. A sample containing one or more of the selected taxa in DIBT2 is considered test-positive.
  • s-n-006 (SEQ ID NO: 39) ; s-n-007 (SEQ ID NO: 40) ; s-n-008 (SEQ ID NO: 41) ; s-n-012 (SEQ ID NO: 42) ; s-n-014 (SEQ ID NO: 43) ; s-n-022 (SEQ ID NO: 44) ; s-n-024 (SEQ ID NO: 45) ; s-n-025 (SEQ ID NO: 46) ; s-n-027 (SEQ ID NO: 47) ; s-n-028 (SEQ ID NO: 48) ; s-n-029 (SEQ ID NO: 49) ; s-n-030 (SEQ ID NO: 50) ; s-n-046 (SEQ ID NO: 51) ; s-n-054 (SEQ ID NO: 52) ; s-n-063 (SEQ ID NO: 53) ; s
  • Table 9 Species-level classification by BLAST alignment to the 16S rRNA database at NCBI.
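The test-positivity rule of Tables 5A and 5B and the NA/asterisk footnotes of the BLAST tables are simple set and threshold checks. The sketch below is a minimal illustration under stated assumptions: the function names and the toy panel are hypothetical, and the actual DIBT1/DIBT2 panel membership is defined in those tables, which are not reproduced here.

```python
from typing import AbstractSet, Iterable, Optional


def dibt_positive(detected_taxa: Iterable[str], panel: AbstractSet[str]) -> bool:
    """Return True if the sample contains one or more taxa of the test panel.

    `panel` is a hypothetical stand-in for the DIBT1/DIBT2 taxon lists.
    """
    return any(taxon in panel for taxon in detected_taxa)


def blast_footnote(best_identity_pct: Optional[float]) -> str:
    """Footnote flag for a species-level BLAST call.

    'NA' = no significant BLAST match; '*' = best identity below 97%;
    '' = ordinary species-level match. Both flagged cases may be novel species.
    """
    if best_identity_pct is None:
        return "NA"
    if best_identity_pct < 97.0:
        return "*"
    return ""


if __name__ == "__main__":
    toy_panel = {"d-c-019", "d-c-030"}  # hypothetical panel, not DIBT1 itself
    print(dibt_positive(["d-c-030", "d-c-999"], toy_panel))  # True
    print(blast_footnote(92.0))                              # *
```

Using `any` over the detected taxa mirrors the "one or more selected taxa" wording directly: a single panel hit is enough to call the sample positive.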

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Organic Chemistry (AREA)
  • Analytical Chemistry (AREA)
  • Health & Medical Sciences (AREA)
  • Zoology (AREA)
  • Engineering & Computer Science (AREA)
  • Wood Science & Technology (AREA)
  • Genetics & Genomics (AREA)
  • Microbiology (AREA)
  • Molecular Biology (AREA)
  • Immunology (AREA)
  • Biotechnology (AREA)
  • Biophysics (AREA)
  • Biochemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Pathology (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention provides a method for predicting the risk of an adverse pregnancy or neonatal outcome for a pregnant subject by detecting an elevated level of bacteria from one or more selected bacterial taxa (e.g., genera or species). A kit useful for such a method is also provided. In addition, the invention provides a method for determining the risk of advanced cervical dilation and/or premature cervical shortening based on differentially abundant bacterial taxa.

Description

DETECTING BACTERIAL TAXA FOR PREDICTING ADVERSE PREGNANCY OUTCOMES
CROSS-REFERENCES TO RELATED APPLICATIONS
This application claims priority to U.S. Provisional Application No. 62/018,920, filed June 30, 2014, the disclosure of which is hereby incorporated by reference in its entirety for all purposes.
BACKGROUND OF THE INVENTION
In a normal, uncomplicated pregnancy, the cervix remains long and closed until the late third trimester, when it shortens and dilates around the time the fetus is fully developed and ready for birth. By contrast, in a pregnancy complicated by cervical shortening or advanced cervical dilation, the cervix shortens or dilates, respectively, well ahead of this normal schedule. Consequently, some women with these conditions give birth before 37 gestational weeks, when fetal development is incomplete, which may lead to neonatal mortality and morbidity. To prolong such a pregnancy, clinicians may place a cerclage or a cervical pessary to support the cervix. Based primarily on consensus and expert opinion, cerclage placement is considered potentially beneficial if intra-amniotic infection has been ruled out. Similar guidelines are suggested for pessary placement. Ruling out infection effectively requires a highly sensitive detection method.
Currently, the detection of bacteria in a sample relies on culture, microscopy and bacterial species-specific polymerase chain reaction (PCR) assays, which offer only low to moderate sensitivity. Sensitivity could be improved by specifically targeting the bacterial taxa that are differentially abundant in the abnormal (i.e., short or dilated) cervix, rather than those that are merely part of the "normal" flora residing in a "normal" cervix. To date, however, data on those taxa are limited, because the bacterial taxa in the abnormal cervix have not been systematically profiled and compared with those in the "normal" cervix of appropriately matched women. Given the prevalence and implications of premature birth, there is a need for new methods that more accurately detect an increased risk of an adverse pregnancy outcome in women, so that preventive measures can be taken in time to reduce or eliminate the chances of premature birth or neonatal complications. This invention fulfills this and other related needs.
BRIEF SUMMARY OF THE INVENTION
The present disclosure is based, in part, on the discovery that the level of bacteria belonging to a specific group of bacterial taxa (e.g., a bacterial species or genus) in a woman's cervix is increased in correlation with the likelihood of an adverse pregnancy outcome, such as a premature birth at less than 34 weeks or 37 weeks of gestational age, or an adverse neonatal condition, such as an Apgar score of less than 7 at 1 or 5 minutes, chorioamnionitis, respiratory distress syndrome, bronchopulmonary dysplasia, intraventricular hemorrhage, neonatal sepsis and neonatal death within 7 days of birth. Accordingly, in a first aspect, the present invention provides a method for determining the risk of an adverse pregnancy or neonatal outcome for a subject, e.g., a pregnant woman or a non-pregnant woman. The method includes the steps of (a) detecting, in a biological sample taken from the subject, the level of bacteria from at least one bacterial taxon selected from the group consisting of Megasphaera cerevisiae, Alloscardovia omnicolens, Ureaplasma urealyticum, Ureaplasma parvum, Atopobium vaginae, Parvibacter caecicola, Lactobacillus casei, Veillonella montpellierensis, Anaerococcus senegalensis, Bulleidia extructa, Mycoplasma hominis, Propionimicrobium lymphophilum, uncultured bacteria having a 16S rRNA nucleotide sequence with at least 92% sequence identity to the nucleotide sequence of GenBank Accession No. JQ781443.1, Corynebacterium pyruviciproducens, Megasphaera cerevisiae, Acidipila rosea, Murdochiella asaccharolytica, uncultured bacteria having a 16S rRNA nucleotide sequence with at least 90% sequence identity to the nucleotide sequence of GenBank Accession No. JF295520.1, Howardella ureilytica, Actinobaculum schaalii, Peptoniphilus duerdenii, Fastidiosipila sanguinis, Sneathia sanguinegens, Parvimonas micra, Peptoniphilus lacrimalis, a bacterial taxon specified by a 16S rRNA nucleotide sequence in Tables 4A, 4C and 8, and a bacterial taxon specified by the nearest species classification based on BLAST nucleotide alignment in Tables 4E, 4G and 9; and (b) determining that the subject has an increased risk for an adverse pregnancy or neonatal outcome if the level of bacteria from the at least one bacterial taxon is greater than a standard control level. In some embodiments, the method includes detecting the level of Sneathia sanguinegens, Parvimonas micra, Ureaplasma urealyticum or Ureaplasma parvum, Atopobium vaginae, Peptoniphilus lacrimalis, Megasphaera cerevisiae and Parvibacter caecicola in a biological sample taken from the subject, and determining that the subject has an increased risk of an adverse pregnancy outcome if the total level of these bacteria is increased compared to a standard control level.
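Step (b) amounts to comparing each measured level with its standard control level. A minimal sketch, assuming levels are expressed as relative abundances (fraction of reads) and that a per-taxon control value is available; the control numbers below are hypothetical placeholders, not values from this disclosure.

```python
from typing import Dict, List


def elevated_taxa(sample_levels: Dict[str, float],
                  control_levels: Dict[str, float]) -> List[str]:
    """Taxa whose sample level is greater than their standard control level.

    Taxa missing from the sample are treated as level 0.0 (not detected).
    """
    return sorted(
        taxon for taxon, control in control_levels.items()
        if sample_levels.get(taxon, 0.0) > control
    )


def increased_risk(sample_levels: Dict[str, float],
                   control_levels: Dict[str, float]) -> bool:
    """Step (b): increased risk if at least one selected taxon is elevated."""
    return bool(elevated_taxa(sample_levels, control_levels))


# Hypothetical control levels (relative abundance) for two of the listed taxa.
CONTROLS = {"Sneathia sanguinegens": 0.001, "Parvimonas micra": 0.001}

if __name__ == "__main__":
    sample = {"Sneathia sanguinegens": 0.02, "Lactobacillus crispatus": 0.90}
    print(elevated_taxa(sample, CONTROLS))   # ['Sneathia sanguinegens']
    print(increased_risk(sample, CONTROLS))  # True
```

Taxa that are not in the selected list (here, an unrelated Lactobacillus) are ignored, and a level equal to the control does not count as "greater than".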
In some embodiments, the adverse pregnancy or neonatal outcome includes preterm birth at <34 weeks, preterm birth at <37 weeks, delivery within about 1-196 days after the biological sample is taken, delivery within about 1-196 days after a clinical intervention is performed, an Apgar score at 1 minute of <7, an Apgar score at 5 minutes of <7, chorioamnionitis, respiratory distress syndrome, bronchopulmonary dysplasia, intraventricular hemorrhage, neonatal death within 7 days after birth or neonatal sepsis. In some cases, the method includes determining that the subject has a risk of having advanced cervical dilation or premature cervical shortening if the level of bacteria belonging to the at least one bacterial taxon is greater than a standard control level.
In other embodiments, the method includes detecting the level of Parvimonas micra, Ureaplasma urealyticum or Ureaplasma parvum, Atopobium vaginae, Peptoniphilus lacrimalis, Megasphaera cerevisiae and Parvibacter caecicola in a biological sample taken from the subject, and determining that the subject has an increased risk of an adverse pregnancy outcome, such as delivery within 7 days after cervical intervention (i.e., cerclage/pessary intervention), if the total level of these bacteria is increased compared to a standard control level. In some cases, the method also includes determining that the subject is at risk of having an infection in the amniotic cavity, uterine cavity, cervix or vagina.
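In this embodiment the comparison is made on the combined level of the panel rather than taxon by taxon. A minimal sketch, assuming relative-abundance inputs and a hypothetical control total; both Ureaplasma species are counted because the text names "Ureaplasma urealyticum or Ureaplasma parvum".

```python
from typing import Dict, Sequence

# Panel named in the embodiment above; both Ureaplasma species are included.
PANEL: Sequence[str] = (
    "Parvimonas micra",
    "Ureaplasma urealyticum",
    "Ureaplasma parvum",
    "Atopobium vaginae",
    "Peptoniphilus lacrimalis",
    "Megasphaera cerevisiae",
    "Parvibacter caecicola",
)


def panel_total(sample_levels: Dict[str, float],
                panel: Sequence[str] = PANEL) -> float:
    """Sum of the levels of all panel taxa (absent taxa contribute 0)."""
    return sum(sample_levels.get(taxon, 0.0) for taxon in panel)


def panel_positive(sample_levels: Dict[str, float], control_total: float,
                   panel: Sequence[str] = PANEL) -> bool:
    """Increased risk if the total panel level exceeds the control total."""
    return panel_total(sample_levels, panel) > control_total
```

For example, a sample with Parvimonas micra at 0.25 and Atopobium vaginae at 0.125 gives a panel total of 0.375, which is positive against a hypothetical control total of 0.05; any non-panel taxa in the sample are ignored.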
One class of Megasphaera cerevisiae can be identified as having a 16S rRNA nucleotide sequence with at least 90%, e.g., at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to the 16S rRNA nucleotide sequence of GenBank Accession No. NR_113307.1. In some embodiments, bacteria of the taxon Megasphaera cerevisiae are detected as having a 16S rRNA nucleotide sequence with at least 93% or 94% sequence identity to the sequence of GenBank Accession No. NR_113307.1. In some embodiments, bacteria of the taxon Megasphaera cerevisiae are detected as having the 16S rRNA nucleotide sequence shown in SEQ ID NO: 5.
Alloscardovia omnicolens bacteria can be identified as having a 16S rRNA nucleotide sequence with at least 90%, e.g., at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to the 16S rRNA nucleotide sequence of GenBank Accession No. NR_042583.1. In some embodiments, bacteria of the taxon Alloscardovia omnicolens are detected as having a 16S rRNA genomic sequence with 100% sequence identity to the nucleotide sequence of GenBank Accession No. NR_042583.1.
Ureaplasma urealyticum bacteria can be identified as having a 16S rRNA nucleotide sequence with at least 90%, e.g., at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to the 16S rRNA genomic sequence of GenBank Accession No. NR_102836.1. In some embodiments, bacteria of the taxon Ureaplasma urealyticum are detected as having a 16S rRNA nucleotide sequence with 100% sequence identity to the nucleotide sequence of GenBank Accession No. NR_102836.1. Ureaplasma parvum bacteria can be identified as having a 16S rRNA nucleotide sequence with at least 90%, e.g., at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to the 16S rRNA genomic sequence of GenBank Accession No. NR_074176.1. In some embodiments, bacteria of the taxon Ureaplasma parvum are detected as having a 16S rRNA nucleotide sequence with 100% sequence identity to the nucleotide sequence of GenBank Accession No. NR_074176.1. In some embodiments, bacteria of the taxon Ureaplasma parvum are detected as having the 16S rRNA nucleotide sequence shown in SEQ ID NO: 7.
Atopobium vaginae bacteria can be identified as having a 16S rRNA nucleotide sequence with at least 90%, e.g., at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to the 16S rRNA nucleotide sequence of GenBank Accession No. NR_117757.1. In some embodiments, bacteria of the taxon Atopobium vaginae are detected as having a 16S rRNA nucleotide sequence with 97% sequence identity to the nucleotide sequence of GenBank Accession No. NR_117757.1. In some embodiments, bacteria of the taxon Atopobium vaginae are detected as having the 16S rRNA nucleotide sequence shown in SEQ ID NO: 8.
Parvibacter caecicola bacteria can be identified as having a 16S rRNA nucleotide sequence with at least 90%, e.g., at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to the 16S rRNA nucleotide sequence of GenBank Accession No. NR_117374.1. In some embodiments, bacteria of the taxon Parvibacter caecicola are detected as having a 16S rRNA nucleotide sequence with at least 92% sequence identity to the nucleotide sequence of GenBank Accession No. NR_117374.1. In some embodiments, bacteria of the taxon Parvibacter caecicola are detected as having the 16S rRNA nucleotide sequence shown in SEQ ID NO: 9.
Lactobacillus casei bacteria can be identified as having a 16S rRNA nucleotide sequence with at least 90%, e.g., at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to the 16S rRNA nucleotide sequence of GenBank Accession No. NR_075032.1. In some embodiments, bacteria of the taxon Lactobacillus casei are detected as having a 16S rRNA nucleotide sequence with 100% sequence identity to the nucleotide sequence of GenBank Accession No. NR_075032.1.
Veillonella montpellierensis bacteria can be identified as having a 16S rRNA nucleotide sequence with at least 90%, e.g., at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to the 16S rRNA nucleotide sequence of GenBank Accession No. NR_028839.1. In some embodiments, bacteria of the taxon Veillonella montpellierensis are detected as having a 16S rRNA nucleotide sequence with 100% sequence identity to the nucleotide sequence of GenBank Accession No. NR_028839.1.
Anaerococcus senegalensis bacteria can be identified as having a 16S rRNA nucleotide sequence with at least 90%, e.g., at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to the 16S rRNA nucleotide sequence of GenBank Accession No. NR_118220.1. In some embodiments, bacteria of the taxon Anaerococcus senegalensis are detected as having a 16S rRNA nucleotide sequence with 100% sequence identity to the nucleotide sequence of GenBank Accession No. NR_118220.1.
Bulleidia extructa bacteria can be identified as having a 16S rRNA nucleotide sequence with at least 90%, e.g., at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to the 16S rRNA nucleotide sequence of GenBank Accession No. NR_028773.1. In some embodiments, bacteria of the taxon Bulleidia extructa are detected as having a 16S rRNA nucleotide sequence with 97% sequence identity to the nucleotide sequence of GenBank Accession No. NR_028773.1.
Mycoplasma hominis bacteria can be identified as having a 16S rRNA nucleotide sequence with at least 90%, e.g., at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to the 16S rRNA nucleotide sequence of GenBank Accession No. NR_113679.1. In some embodiments, bacteria of the taxon Mycoplasma hominis are detected as having a 16S rRNA genomic sequence with 100% sequence identity to the nucleotide sequence of GenBank Accession No. NR_113679.1.
Propionimicrobium lymphophilum bacteria can be identified as having a 16S rRNA nucleotide sequence with at least 90%, e.g., at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to the 16S rRNA nucleotide sequence of GenBank Accession No. NR_114337.1. In some embodiments, bacteria of the taxon Propionimicrobium lymphophilum are detected as having a 16S rRNA nucleotide sequence with 100% sequence identity to the nucleotide sequence of GenBank Accession No. NR_114337.1.
One class of uncultured bacteria relevant to the present disclosure can be identified as having a 16S rRNA nucleotide sequence with at least 90%, e.g., at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to the 16S rRNA nucleotide sequence of GenBank Accession No. JQ781443.1. In some embodiments, bacteria of this taxon have a 16S rRNA nucleotide sequence with 92% sequence identity to the nucleotide sequence of GenBank Accession No. JQ781443.1.
Corynebacterium pyruviciproducens bacteria can be identified as having a 16S rRNA nucleotide sequence with at least 90%, e.g., at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to the 16S rRNA nucleotide sequence of GenBank Accession No. NR_116569.1. In some embodiments, bacteria of the taxon Corynebacterium pyruviciproducens are detected as having a 16S rRNA nucleotide sequence with 100% sequence identity to the nucleotide sequence of GenBank Accession No. NR_116569.1.
Another class of Megasphaera cerevisiae can be identified as having a 16S rRNA nucleotide sequence with at least 90%, e.g., at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to the 16S rRNA nucleotide sequence of GenBank Accession No. NR_113307.1. In some embodiments, bacteria of the taxon Megasphaera cerevisiae are detected as having a 16S rRNA nucleotide sequence with at least 93% or 94% sequence identity to the sequence of GenBank Accession No. NR_113307.1. In some embodiments, bacteria of the taxon Megasphaera cerevisiae are detected as having the 16S rRNA nucleotide sequence shown in SEQ ID NO: 29.
Acidipila rosea bacteria can be identified as having a 16S rRNA nucleotide sequence with at least 90%, e.g., 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to the 16S rRNA nucleotide sequence of GenBank Accession No. NR_113179.1. In some embodiments, bacteria of the taxon Acidipila rosea are detected as having a 16S rRNA nucleotide sequence with 97% sequence identity to the nucleotide sequence of GenBank Accession No. NR_113179.1.
Murdochiella asaccharolytica bacteria can be identified as having a 16S rRNA nucleotide sequence with at least 90%, e.g., 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to the 16S rRNA nucleotide sequence of GenBank Accession No. NR_116331.1. In some embodiments, bacteria of the taxon Murdochiella asaccharolytica are detected as having a 16S rRNA nucleotide sequence with 99% sequence identity to the nucleotide sequence of GenBank Accession No. NR_116331.1.
Another class of uncultured bacteria relevant to the present disclosure can be identified as having a 16S rRNA nucleotide sequence with at least 90%, e.g., 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to the 16S rRNA nucleotide sequence of GenBank Accession No. JF295520.1. In some embodiments, bacteria of this taxon are detected as having a 16S rRNA nucleotide sequence with 90% sequence identity to the nucleotide sequence of GenBank Accession No. JF295520.1.
Howardella ureilytica bacteria can be identified as having a 16S rRNA nucleotide sequence with at least 90%, e.g., at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to the 16S rRNA nucleotide sequence of GenBank Accession No. NR_044022.2. In some embodiments, bacteria of the taxon Howardella ureilytica are detected as having a 16S rRNA nucleotide sequence with 93% sequence identity to the nucleotide sequence of GenBank Accession No. NR_044022.2.
Actinobaculum schaalii bacteria can be identified as having a 16S rRNA nucleotide sequence with at least 90%, e.g., at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to the 16S rRNA nucleotide sequence of GenBank Accession No. NR_116869.1. In some embodiments, bacteria of the taxon Actinobaculum schaalii are detected as having a 16S rRNA nucleotide sequence with 100% sequence identity to the nucleotide sequence of GenBank Accession No. NR_116869.1.
Peptoniphilus duerdenii bacteria can be identified as having a 16S rRNA nucleotide sequence with at least 90%, e.g., at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to the 16S rRNA nucleotide sequence of GenBank Accession No. NR_116346.1. In some embodiments, bacteria of the taxon Peptoniphilus duerdenii are detected as having a 16S rRNA nucleotide sequence with 100% sequence identity to the nucleotide sequence of GenBank Accession No. NR_116346.1.
Fastidiosipila sanguinis bacteria can be identified as having a 16S rRNA nucleotide sequence with at least 90%, e.g., at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to the 16S rRNA nucleotide sequence of GenBank Accession No. NR_042186.1. In some embodiments, bacteria of the taxon Fastidiosipila sanguinis are detected as having a 16S rRNA nucleotide sequence with 100% sequence identity to the nucleotide sequence of GenBank Accession No. NR_042186.1.
Sneathia sanguinegens bacteria can be identified as having a 16S rRNA nucleotide sequence with at least 90%, e.g., at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to the 16S rRNA nucleotide sequence of GenBank Accession No. AJ344093.1. In some embodiments, bacteria of the taxon Sneathia sanguinegens are detected as having a 16S rRNA nucleotide sequence with at least 93% or 94% sequence identity to the sequence of GenBank Accession No. AJ344093.1.
Parvimonas micra bacteria can be identified as having a 16S rRNA nucleotide sequence with at least 90%, e.g., at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to the 16S rRNA nucleotide sequence of GenBank Accession No. NR_114338.1. In some embodiments, bacteria of the taxon Parvimonas micra are detected as having a 16S rRNA nucleotide sequence with at least 93%, e.g., 93%, 94%, 95%, 96%, 97% or more, sequence identity to the sequence of GenBank Accession No. NR_114338.1. In some embodiments, bacteria of the taxon Parvimonas micra are detected as having the 16S rRNA nucleotide sequence shown in SEQ ID NO: 43.
Peptoniphilus lacrimalis bacteria can be identified as having a 16S rRNA nucleotide sequence with at least 90%, e.g., at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity to the 16S rRNA nucleotide sequence of GenBank Accession Nos. AB971812.1 and NR_041938.1. In some embodiments, bacteria of the taxon Peptoniphilus lacrimalis are detected as having a 16S rRNA nucleotide sequence with 100% sequence identity to the sequence of GenBank Accession Nos. AB971812.1 and NR_041938.1. In some embodiments, bacteria of the taxon Peptoniphilus lacrimalis are detected as having the 16S rRNA nucleotide sequence shown in SEQ ID NO: 44. In some embodiments, the method includes detecting at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 different bacterial taxa.
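Each identity threshold above (e.g., at least 90%, 93% or 97%) is a percent identity between a read or OTU sequence and the GenBank reference. In practice this figure comes from an aligner such as BLAST. The sketch below only scores an alignment that has already been produced, using the common convention of identical columns divided by alignment length, with gap columns ('-') counted as mismatches. The toy sequences are hypothetical and are not real 16S rRNA fragments.

```python
def percent_identity(aligned_query: str, aligned_ref: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Identical, non-gap columns are counted as matches; every column
    (including gap columns) counts toward the alignment length.
    """
    if len(aligned_query) != len(aligned_ref):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_query:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for q, r in zip(aligned_query.upper(), aligned_ref.upper())
        if q == r and q != "-"
    )
    return 100.0 * matches / len(aligned_query)


def meets_threshold(aligned_query: str, aligned_ref: str,
                    min_identity_pct: float = 90.0) -> bool:
    """True if identity is at least the threshold (90% is the broadest one above)."""
    return percent_identity(aligned_query, aligned_ref) >= min_identity_pct


if __name__ == "__main__":
    q, r = "ACGTACGTAC", "ACGTACGTAA"      # hypothetical 10-column alignment
    print(percent_identity(q, r))          # 90.0
    print(meets_threshold(q, r, 97.0))     # False
```

Because identity depends on the aligned region, the same read can pass at 90% and fail at 97% against one reference, which is why the embodiments above state the threshold explicitly for each taxon.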
In some embodiments, the subject is a pregnant woman at about 13 weeks to about 37 weeks of gestation. In other embodiments, the pregnant woman is between about 13 and about 25 gestational weeks. In some cases, the pregnant woman has a prematurely opened cervix. The woman may be at risk of having an infection in the amniotic cavity, uterine cavity, cervix or vagina. In other embodiments, the subject is a non-pregnant woman. In some embodiments, the non-pregnant woman is planning a future pregnancy. In some cases, the subject has a history of preterm birth, stillbirth or miscarriage. In some cases, the subject is planning to receive a clinical intervention, such as cerclage intervention or progesterone supplementation, before or after pregnancy.
In some embodiments, the method also includes extracting nucleic acids from the biological sample taken from the subject prior to performing step (a). In some embodiments, the sample is a cervical swab (including a swab sample of the external os), a vaginal swab (including a swab sample of the fornix), a urine sample, an amniotic fluid sample, a maternal blood sample (maternal whole blood sample), a maternal serum sample, a maternal plasma sample, or a cervical mucus sample. In some embodiments, the sample is a placental swab, an umbilical swab, or any sample taken directly or indirectly from the reproductive system. This includes any sampling from the surface of the female reproductive tract via scraping, cutting, flushing, douching, or applying a stream of gas, a liquid, a vacuum, a suction force, a form of energy (e.g., an electrostatic field or LASER) or a gradient of chemicals (e.g., chemoattractants to induce chemotaxis). In some embodiments, the sample is taken directly or indirectly from the gastrointestinal system of the pregnant subject, including a buccal swab, a throat swab, an anal swab, a rectal swab or a stool sample.
In some embodiments, the step of detecting includes a polynucleotide amplification assay. In some instances, the amplification assay is a polymerase chain reaction (PCR) assay. Optionally, the PCR assay can be a quantitative PCR assay. In some cases, the step of detecting includes sequence-specific probe/primer hybridization, which can occur in the absence of polynucleotide amplification. In other embodiments, the step of detecting includes polynucleotide sequence determination, such as, but not limited to, massively parallel sequencing.
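When detection is by massively parallel sequencing, read counts per taxon need to be normalized before levels can be compared between samples. The table captions in this document mention two normalizations, random subsampling and a ratio of the total read count per sample. The sketch below shows a minimal version of each, assuming per-sample counts keyed by OTU; it is illustrative only and does not reproduce the authors' pipeline.

```python
import random
from collections import Counter
from typing import Dict


def relative_abundance(counts: Dict[str, int]) -> Dict[str, float]:
    """Normalize each OTU count as a ratio of the sample's total read count."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {otu: n / total for otu, n in counts.items()}


def rarefy(counts: Dict[str, int], depth: int, seed: int = 0) -> Dict[str, int]:
    """Randomly subsample reads without replacement to a fixed depth."""
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("depth exceeds the sample's total read count")
    rng = random.Random(seed)  # seeded so the subsample is reproducible
    return dict(Counter(rng.sample(pool, depth)))
```

For example, `relative_abundance({"d-c-019": 3, "d-c-030": 1})` gives `{"d-c-019": 0.75, "d-c-030": 0.25}`. Rarefying every sample to the same depth (typically the smallest library size in the study) makes raw counts comparable, at the cost of discarding reads.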
For instance, if the level of bacteria belonging to a bacterial taxon such as Megasphaera cerevisiae, Alloscardovia omnicolens, Ureaplasma urealyticum or Ureaplasma parvum, Atopobium vaginae, Parvibacter caecicola, Lactobacillus casei, Veillonella montpellierensis, Anaerococcus senegalensis, Bulleidia extructa, Mycoplasma hominis, Propionimicrobium lymphophilum, uncultured bacteria having a 16S rRNA nucleotide sequence with at least 92% sequence identity to the nucleotide sequence of GenBank Accession No. JQ781443.1, Corynebacterium pyruviciproducens, Megasphaera cerevisiae, Acidipila rosea, Murdochiella asaccharolytica, uncultured bacteria having a 16S rRNA nucleotide sequence with at least 90% sequence identity to the nucleotide sequence of GenBank Accession No. JF295520.1, Howardella ureilytica, Actinobaculum schaalii, Peptoniphilus duerdenii, Fastidiosipila sanguinis, Sneathia sanguinegens, Parvimonas micra, Peptoniphilus lacrimalis, a bacterial taxon specified by a 16S rRNA genomic sequence in Tables 4A, 4C and 8, a bacterial taxon specified by the nearest species classification based on BLAST nucleotide alignment in Tables 4E, 4G and 9, or any combination thereof is elevated compared to a standard control, the pregnant subject is at risk of having an adverse pregnancy or neonatal outcome. An increase in the level of bacteria of one or more of the selected bacterial taxa compared to the standard control indicates that the woman has an increased risk of premature birth or of delivering a child with a neonatal complication.
For example, when the level of bacteria from one or more of the taxa, such as Megasphaera cerevisiae, Alloscardovia omnicolens, Ureaplasma urealyticum or Ureaplasma parvum, Atopobium vaginae, Parvibacter caecicola, Lactobacillus casei, Veillonella montpellierensis, Anaerococcus senegalensis, Bulleidia extructa, Mycoplasma hominis, Propionimicrobium lymphophilum, uncultured bacteria having a 16S rRNA nucleotide sequence with at least 92% sequence identity to the nucleotide sequence of GenBank Accession No. JQ781443.1, Corynebacterium pyruviciproducens, Megasphaera cerevisiae, Acidipila rosea, Murdochiella asaccharolytica, uncultured bacteria having a 16S rRNA nucleotide sequence with at least 90% sequence identity to the nucleotide sequence of GenBank Accession No. JF295520.1, Howardella ureilytica, Actinobaculum schaalii, Peptoniphilus duerdenii, Fastidiosipila sanguinis, Sneathia sanguinegens, Parvimonas micra, Peptoniphilus lacrimalis, a bacterial taxon specified by a 16S rRNA genomic sequence in Tables 4A, 4C and 8, and a bacterial taxon specified by the nearest species classification based on BLAST nucleotide alignment in Tables 4E, 4G and 9, in a cervical or vaginal swab sample from a pregnant woman is higher than the standard control, the woman has an increased likelihood of an adverse pregnancy and/or neonatal outcome. Possible adverse pregnancy or neonatal outcomes include, but are not limited to, preterm birth at <34 weeks, preterm birth at <37 weeks, an Apgar score at 1 minute of <7, an Apgar score at 5 minutes of <7, chorioamnionitis, respiratory distress syndrome, bronchopulmonary dysplasia, intraventricular hemorrhage, neonatal death within 7 days after birth, and/or neonatal sepsis.
In addition, if the subject is a pregnant woman, the possible adverse pregnancy or neonatal outcomes include delivery within a period of about 1-196 days (e.g., 1 day, 7 days, 14 days, 21 days, 28 days, 56 days, 84 days, 112 days, 140 days, 168 days, or 196 days) after the biological sample is taken, or within a period of about 1-196 days (e.g., 1 day, 7 days, 14 days, 21 days, 28 days, 56 days, 84 days, 112 days, 140 days, 168 days, or 196 days) after the clinical intervention is performed.
The method provided herein can be used to determine that the pregnant woman has a risk of having advanced cervical dilation or advanced cervical shortening. For instance, if the level of bacteria from at least one bacterial taxon, such as Megasphaera cerevisiae, Alloscardovia omnicolens, Ureaplasma urealyticum or Ureaplasma parvum, Atopobium vaginae, Parvibacter caecicola, Lactobacillus casei, Veillonella montpellierensis, Anaerococcus senegalensis, Bulleidia extructa, Mycoplasma hominis, Propionimicrobium lymphophilum, uncultured bacteria having a 16S rRNA nucleotide sequence with at least 92% sequence identity to the nucleotide sequence of GenBank Accession No. JQ781443.1, Corynebacterium pyruviciproducens, Megasphaera cerevisiae, Acidipila rosea, Murdochiella asaccharolytica, uncultured bacteria having a 16S rRNA nucleotide sequence with at least 90% sequence identity to the nucleotide sequence of GenBank Accession No. JF295520.1, Howardella ureilytica, Actinobaculum schaalii, Peptoniphilus duerdenii, Fastidiosipila sanguinis, Sneathia sanguinegens, Parvimonas micra, Peptoniphilus lacrimalis, a bacterial taxon specified by a 16S rRNA nucleotide sequence in Tables 4A, 4C and 8, and a bacterial taxon specified by the nearest species classification based on BLAST nucleotide alignment in Tables 4E, 4G and 9, is greater than that of a standard control level, the woman has an increased likelihood of having advanced cervical dilation or premature cervical shortening. Such a subject is predicted to exhibit cervical dilation of, for example, ≥2 cm without active labor between 13 and 37 weeks of gestation, or cervical shortening prior to full-term labor and delivery.
In some instances, the method can further include detecting in the biological sample the level of bacteria from at least one bacterial taxon selected from the group consisting of Jonquetella anthropi, Aerococcus urinae, a bacterial taxon specified by a 16S rRNA nucleotide sequence in Tables 4B and 4D, and a bacterial taxon specified by the nearest species classification based on BLAST nucleotide alignment in Tables 4F and 4H; and determining that the subject has an increased risk for an adverse pregnancy outcome if the level of bacteria from the at least one bacterial taxon is lower than that of a standard control level.
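The decision rule described above, which combines taxa whose elevation signals risk with taxa (such as Jonquetella anthropi or Aerococcus urinae) whose depletion signals risk, can be sketched as follows. This is a minimal illustration only: the function and parameter names are hypothetical, the taxon levels are assumed to be already quantified (for example, as normalized abundances), and "greater" or "lower" is taken as a plain numeric comparison rather than the threshold-based definition of an increase or decrease given in the Definitions.

```python
from typing import Iterable, Mapping


def is_at_increased_risk(
    levels: Mapping[str, float],
    controls: Mapping[str, float],
    elevated_markers: Iterable[str],
    depleted_markers: Iterable[str],
) -> bool:
    """Hypothetical rule: flag risk if any 'elevated' marker is above its control
    level, or any 'depleted' marker is below its control level."""
    for taxon in elevated_markers:
        # Taxa whose higher-than-control level indicates increased risk.
        if taxon in levels and levels[taxon] > controls[taxon]:
            return True
    for taxon in depleted_markers:
        # Taxa whose lower-than-control level indicates increased risk.
        if taxon in levels and levels[taxon] < controls[taxon]:
            return True
    return False


# Illustrative, made-up values (not data from this disclosure).
controls = {"Atopobium vaginae": 2.0, "Aerococcus urinae": 0.5}
sample = {"Atopobium vaginae": 5.0, "Aerococcus urinae": 1.0}
print(is_at_increased_risk(sample, controls, ["Atopobium vaginae"], ["Aerococcus urinae"]))  # True
```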
If the pregnant woman is predicted to have an adverse pregnancy or neonatal outcome, the method can include repeating steps (a) and (b) at a later time using the same type of biological sample from the subject, wherein an increase in the level of bacteria from the at least one bacterial taxon at the later time as compared to the level determined in the original step (a) indicates an increased risk of having an adverse pregnancy or neonatal outcome. Furthermore, once a pregnant woman is identified as having an increased risk of experiencing an adverse pregnancy or neonatal outcome, a physician may provide treatment for the woman to minimize the risk of such an adverse outcome. For example, the woman may be closely monitored throughout pregnancy or be transferred in a timely manner to a tertiary unit with neonatal intensive care.
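The repeat-testing step can be pictured as a comparison between the levels from the original step (a) and those measured later from the same sample type. The sketch below is an illustration under stated assumptions: the function name is hypothetical, levels are assumed to be on a comparable normalized scale at both time points, and any numeric rise is treated as an increase.

```python
from typing import List, Mapping


def taxa_with_rising_levels(
    baseline: Mapping[str, float], follow_up: Mapping[str, float]
) -> List[str]:
    """Return the taxa whose level at the later time exceeds the baseline level."""
    return sorted(
        taxon
        for taxon, first_level in baseline.items()
        if taxon in follow_up and follow_up[taxon] > first_level
    )


# Invented example levels for two time points.
baseline = {"Mycoplasma hominis": 1.2, "Sneathia sanguinegens": 0.8}
follow_up = {"Mycoplasma hominis": 2.5, "Sneathia sanguinegens": 0.6}
print(taxa_with_rising_levels(baseline, follow_up))  # ['Mycoplasma hominis']
```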
In another aspect, the present invention provides a kit for determining the risk of having an adverse pregnancy or neonatal outcome in a subject. The kit can include (a) a standard control that provides a biological sample taken from a pregnant female and containing bacteria belonging to at least one bacterial taxon selected from the group consisting of Megasphaera cerevisiae, Alloscardovia omnicolens, Ureaplasma urealyticum or Ureaplasma parvum, Atopobium vaginae, Parvibacter caecicola, Lactobacillus casei, Veillonella montpellierensis, Anaerococcus senegalensis, Bulleidia extructa, Mycoplasma hominis, Propionimicrobium lymphophilum, uncultured bacteria having a 16S rRNA nucleotide sequence with at least 92% sequence identity to the nucleotide sequence of GenBank Accession No. JQ781443.1, Corynebacterium pyruviciproducens, Megasphaera cerevisiae, Acidipila rosea, Murdochiella asaccharolytica, uncultured bacteria having a 16S rRNA nucleotide sequence with at least 90% sequence identity to the nucleotide sequence of GenBank Accession No. JF295520.1, Howardella ureilytica, Actinobaculum schaalii, Peptoniphilus duerdenii, Fastidiosipila sanguinis, Sneathia sanguinegens, Parvimonas micra, Peptoniphilus lacrimalis, a bacterial taxon specified by a 16S rRNA nucleotide sequence in Tables 4A, 4C and 8, and a bacterial taxon specified by the nearest species classification based on BLAST nucleotide alignment in Tables 4E, 4G and 9; and (b) one or more agents that specifically and quantitatively identify at least one bacterial taxon selected from the group consisting of Megasphaera cerevisiae, Alloscardovia omnicolens, Ureaplasma urealyticum or Ureaplasma parvum, Atopobium vaginae, Parvibacter caecicola, Lactobacillus casei, Veillonella montpellierensis, Anaerococcus senegalensis, Bulleidia extructa, Mycoplasma hominis, Propionimicrobium lymphophilum, uncultured bacteria having a 16S rRNA nucleotide sequence with at least 92% sequence identity to the nucleotide sequence of GenBank Accession No. JQ781443.1, Corynebacterium pyruviciproducens, Megasphaera cerevisiae, Acidipila rosea, Murdochiella asaccharolytica, uncultured bacteria having a 16S rRNA nucleotide sequence with at least 90% sequence identity to the nucleotide sequence of GenBank Accession No. JF295520.1, Howardella ureilytica, Actinobaculum schaalii, Peptoniphilus duerdenii, Fastidiosipila sanguinis, Sneathia sanguinegens, Parvimonas micra, Peptoniphilus lacrimalis, a bacterial taxon specified by a 16S rRNA nucleotide sequence in Tables 4A, 4C and 8, and a bacterial taxon specified by the nearest species classification based on BLAST nucleotide alignment in Tables 4E, 4G and 9. In some embodiments, the one or more agents may include one or more pairs of oligonucleotide primers that specifically hybridize to and amplify a polynucleotide of the at least one bacterial taxon in an amplification assay.
In some embodiments, the one or more agents may further include a polynucleotide probe that specifically hybridizes to a polynucleotide sequence of the at least one bacterial taxon. In some instances, the kit includes an instruction manual.
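To illustrate what it means for a primer pair to "specifically hybridize to and amplify" a target polynucleotide, the following sketch performs a naive in-silico PCR on a template string: it looks for an exact match of the forward primer and of the reverse complement of the reverse primer downstream of it, and returns the intervening amplicon. This is a deliberate simplification: real primer binding tolerates some mismatches and depends on reaction conditions, the function names are hypothetical, and the sequences below are made up rather than primers from this disclosure.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence containing only A, C, G, T."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def in_silico_amplicon(template: str, forward: str, reverse: str) -> Optional[str]:
    """Return the product spanned by an exact-match primer pair, or None."""
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    # The reverse primer anneals to the opposite strand, so on the given strand
    # we search for its reverse complement downstream of the forward primer.
    reverse_site = reverse_complement(reverse)
    end = template.find(reverse_site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


print(in_silico_amplicon("TTACCCGATATGGCCCTT", "ACCCG", "GGGCC"))  # ACCCGATATGGCCC
```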
In another aspect, the present invention provides a method for determining whether a pregnant subject has an increased risk of having advanced cervical dilation or premature cervical shortening. The method includes the steps of (a) extracting nucleic acids from a biological sample taken from the subject; (b) detecting in the nucleic acids the level of at least one bacterial taxon selected from the group consisting of Megasphaera cerevisiae, Alloscardovia omnicolens, Ureaplasma urealyticum or Ureaplasma parvum, Atopobium vaginae, Parvibacter caecicola, Lactobacillus casei, Veillonella montpellierensis, Anaerococcus senegalensis, Bulleidia extructa, Mycoplasma hominis, Propionimicrobium lymphophilum, uncultured bacteria having a 16S rRNA nucleotide sequence with at least 92% sequence identity to the nucleotide sequence of GenBank Accession No. JQ781443.1, Corynebacterium pyruviciproducens, Megasphaera cerevisiae, Acidipila rosea, Murdochiella asaccharolytica, uncultured bacteria having a 16S rRNA nucleotide sequence with at least 90% sequence identity to the nucleotide sequence of GenBank Accession No. JF295520.1, Howardella ureilytica, Actinobaculum schaalii, Peptoniphilus duerdenii, Fastidiosipila sanguinis, Sneathia sanguinegens, Parvimonas micra, Peptoniphilus lacrimalis, a bacterial taxon specified by a 16S rRNA nucleotide sequence in Tables 4A, 4C and 8, and a bacterial taxon specified by the nearest species classification based on BLAST nucleotide alignment in Tables 4E, 4G and 9; and (c) determining that the subject has an increased risk of having advanced cervical dilation or premature cervical shortening if the level of the at least one bacterial taxon is greater than that of a standard control level.
For example, a pregnant woman is predicted to be at risk of having a prematurely dilated cervix and/or a prematurely shortened cervix if it is determined that she has an increased level of bacteria from one or more of the selected bacterial taxa in her cervix or vagina. If she has more bacteria from the group, such as Megasphaera cerevisiae, Alloscardovia omnicolens, Ureaplasma urealyticum or Ureaplasma parvum, Atopobium vaginae, Parvibacter caecicola, Lactobacillus casei, Veillonella montpellierensis, Anaerococcus senegalensis, Bulleidia extructa, Mycoplasma hominis, Propionimicrobium lymphophilum, uncultured bacteria having a 16S rRNA nucleotide sequence with at least 92% sequence identity to the nucleotide sequence of GenBank Accession No. JQ781443.1, Corynebacterium pyruviciproducens, Megasphaera cerevisiae, Acidipila rosea, Murdochiella asaccharolytica, uncultured bacteria having a 16S rRNA nucleotide sequence with at least 90% sequence identity to the nucleotide sequence of GenBank Accession No. JF295520.1, Howardella ureilytica, Actinobaculum schaalii, Peptoniphilus duerdenii, Fastidiosipila sanguinis, Sneathia sanguinegens, Parvimonas micra, Peptoniphilus lacrimalis, a bacterial taxon specified by a 16S rRNA nucleotide sequence in Tables 4A, 4C and 8, a bacterial taxon specified by the nearest species classification based on BLAST nucleotide alignment in Tables 4E, 4G and 9, or any combination thereof in her cervix or vagina at, for example, about 13 to about 25 gestational weeks, she is likely to have a dilated cervix or shortened cervix prior to about 34 to about 37 gestational weeks. At least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 different bacterial taxa can be detected in the sample.
In some embodiments, the biological sample is a cervical swab, a vaginal swab, a urine sample, an amniotic fluid sample, a maternal blood sample, a maternal serum sample, a maternal plasma sample, a cervical mucus sample, a placental swab, an umbilical cord swab or any sample taken directly or indirectly from the reproductive system or the gastrointestinal system. The detecting step can be a polynucleotide amplification assay, an assay involving polynucleotide sequence determination, or an assay involving sequence-specific probe/primer hybridization. In some instances, the amplification assay is a polymerase chain reaction (PCR) assay. Optionally, the PCR assay is a quantitative PCR assay or a reverse-transcriptase PCR assay.
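For the quantitative PCR option, one common way to turn a threshold cycle (Ct) into a bacterial load is an absolute standard curve: Ct is regressed on log10 copy number for a dilution series of known standards, and unknown samples are read off the fitted line. The sketch below shows that calculation under stated assumptions: the dilution-series values are invented, the function names are hypothetical, and the disclosure does not specify which quantification scheme its PCR assays use.

```python
def fit_standard_curve(log10_copies, ct_values):
    """Ordinary least-squares fit of Ct = slope * log10(copies) + intercept."""
    n = len(log10_copies)
    mean_x = sum(log10_copies) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_copies)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log10_copies, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate the starting copy number."""
    return 10 ** ((ct - intercept) / slope)


def amplification_efficiency(slope):
    """Efficiency implied by the slope; a slope near -3.32 corresponds to about 100%."""
    return 10 ** (-1.0 / slope) - 1.0


# Invented dilution series: 10^2 to 10^6 copies per reaction.
standards = [2, 3, 4, 5, 6]
cts = [31.4, 28.1, 24.8, 21.5, 18.2]
slope, intercept = fit_standard_curve(standards, cts)
print(round(slope, 2), round(intercept, 2))  # -3.3 38.0
print(copies_from_ct(26.0, slope, intercept))
```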
BRIEF DESCRIPTION OF DRAWINGS
FIGS. 1A-1E show the association between clinical outcomes, such as spontaneous preterm birth (sPTB) after intervention and term birth (TB) after intervention, and bacterial taxa. FIG. 1A shows the clinical outcomes of 25 cervical insufficiency (CI) patients after intervention. Each column represents a patient (P1-P25) in ascending order of gestational age (GA) at delivery. Black rectangles: P1-P10, resulting in “spontaneous preterm birth (sPTB) <34 weeks after intervention”. White rectangles: P11-P25, resulting in “term birth (TB) ≥37 weeks after intervention”. Latency, the interval between treatment and delivery. RDS, respiratory distress syndrome. BPD, bronchopulmonary dysplasia. IVH, intraventricular haemorrhage. ROP, retinopathy of prematurity. Neonatal death, death within 28 days after delivery. Y, yes; n, no; -, not determined. FIGS. 1B and 1C show the 10 most abundant bacterial taxa in the cervices of the “sPTB after intervention” and “TB after intervention” CI patients, respectively. Shown values are the log10 of the abundance values (sequencing read counts normalized by the Cumulative Sum Scaling (CSS) method) of each bacterial taxon (row) in the cervical swab sample of each patient (column). Normalized read counts are transformed into log (abundance) as noted in FIG. 1E. Each row represents an operational taxonomic unit (Otu) formed by clustering sequences of ≥97% identity. Each Otu is taxonomically classified at the genus level using the Ribosomal Database Project (RDP) Naïve Bayesian rRNA Classifier (Version 2.9, September 2014, RDP 16S rRNA training set 10). Lactobacillus sequences are further matched against the 16S rRNA database (GenBank) using BLAST (highest score) and MOLE-BLAST (best multiple alignment of BLAST matches) to derive the species information. FIG. 1D shows the differentially abundant bacterial taxa between the “sPTB after intervention” (n=10) and the “TB after intervention” (n=15) groups (Mann-Whitney rank sum test). Seven taxa remain differentially abundant after adjustment for multiple testing by the False Discovery Rate (FDR) method (p<0.05 and q-value<0.05, i.e., FDR<5%, marked with an asterisk). The latter 6 taxa, with p<0.01, are further selected for calculating a summary score, namely the log (base 10) total abundance of the 6 selected taxa (LA6) value, for each cervical swab sample. Total abundance is the arithmetic sum of the abundances of the selected taxa on the common (linear) scale.
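The multiple-testing step above reports q-values under a False Discovery Rate criterion. The text does not name the exact procedure; the sketch below assumes the widely used Benjamini-Hochberg step-up adjustment applied to one p-value per taxon (for example, from per-taxon Mann-Whitney tests). Some software instead reports Storey's q-values, which can differ, and the p-values in the example are invented.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (often reported as q-values).

    Each p-value at ascending rank r out of m is scaled by m / r, and a running
    minimum taken from the largest rank downward keeps the adjusted values monotone.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


p = [0.01, 0.04, 0.03, 0.005]  # invented per-taxon p-values
q = benjamini_hochberg(p)
print([round(v, 3) for v in q])  # [0.02, 0.04, 0.04, 0.02]
```

Taxa whose adjusted value falls below the chosen FDR level (0.05 in the figure legend) would then be reported as differentially abundant.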
FIGS. 2A-2C provide statistical analysis and LA6 values for the two groups, i.e., patients with spontaneous preterm birth (sPTB) after intervention and patients with term birth (TB) after intervention. FIG. 2A provides LA6 values (the total abundance of the 6 selected bacterial taxa on a logarithmic (base 10) scale) in two groups of cervical insufficiency (CI) patients who all received clinical intervention but had different outcomes. Cervical swab samples for measuring the LA6 were collected from CI patients before cerclage/pessary intervention. After the treatment, 10 patients had spontaneous preterm birth <34 weeks (the “sPTB after intervention” group, circles), and 15 patients had term birth ≥37 weeks (the “TB after intervention” group, triangles). The 6 taxa were selected based on their significantly different abundances (p<0.01) between these two groups in the massively parallel sequencing data (FIG. 1D). The long and short horizontal lines of the error bars are drawn at the median and interquartile range, respectively. FIG. 2B shows the proportion of undelivered pregnancies at different gestational ages in CI patients with LA6>1.15 (LA6-positive) vs. those with LA6≤1.15 (LA6-negative). LA6-positive patients delivered earlier after clinical intervention than LA6-negative patients (median gestational age at delivery, 23.7 weeks vs. 38.4 weeks; 95% confidence interval, 20.6-25.4 weeks vs. 38.0-38.7 weeks; chi-squared, 32.352; df, 1; log-rank test, p<0.0001; hazard ratio, 6.24; 95% confidence interval, 1.50 to 25.9). The 95% confidence intervals of the percentages of undelivered pregnancies for LA6-positive and LA6-negative patients are shown as hairlines around the bold solid line and the bold dotted line, respectively. FIG. 2C shows the proportion of undelivered pregnancies at different days after treatment in LA6-positive vs. LA6-negative CI patients.
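The LA6 score and its 1.15 cut-off can be expressed compactly. As described for FIG. 1D, the abundances of the six selected taxa are summed on the linear scale and the total is then log10-transformed. This is a sketch only: it assumes the inputs are already CSS-normalized linear abundances of the six taxa, the function names are hypothetical, and returning negative infinity for an all-zero sample is an assumption, since the disclosure does not say how zero totals were handled.

```python
import math
from typing import Sequence

LA6_CUTOFF = 1.15  # LA6 > 1.15 is called "LA6-positive" in FIGS. 2B and 2C


def la6(normalized_abundances: Sequence[float]) -> float:
    """log10 of the total (linear-scale) abundance of the selected taxa."""
    total = sum(normalized_abundances)
    if total <= 0:
        return float("-inf")  # assumption: no reads from any selected taxon
    return math.log10(total)


def is_la6_positive(normalized_abundances: Sequence[float]) -> bool:
    return la6(normalized_abundances) > LA6_CUTOFF


print(la6([5, 3, 2, 0, 0, 0]))               # 1.0
print(is_la6_positive([10, 5, 0, 0, 0, 0]))  # True (log10(15) is about 1.18)
```

If per-taxon values are only available as log10 abundances, as displayed in the heat maps, each would first be converted back with `10 ** value` before summing, which is the point of summing on the linear scale.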
LA6-positive patients delivered sooner after intervention than their LA6-negative counterparts (median number of days between intervention and delivery, 10 days vs. 126 days; 95% confidence interval, 8-32 days vs. 112-134 days; chi-squared, 32.520; df, 1; log-rank test, p<0.00001; hazard ratio, 6.34; 95% confidence interval, 1.51 to 26.6).
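The "proportion of undelivered pregnancies" curves in FIGS. 2B and 2C are time-to-event curves. A minimal Kaplan-Meier (product-limit) sketch is shown below, assuming each patient contributes a time (gestational weeks, or days after intervention) and a flag for whether delivery was observed; the median is read as the first time the estimate falls to 0.5 or below. The log-rank test and hazard ratios reported above are not reproduced here, the function names are hypothetical, and the example times are invented.

```python
from typing import List, Optional, Sequence, Tuple


def kaplan_meier(times: Sequence[float], delivered: Sequence[bool]) -> List[Tuple[float, float]]:
    """Product-limit estimate of the proportion still undelivered.

    Returns (time, estimate) pairs at each time where at least one delivery occurs;
    patients with delivered=False are treated as censored at their time.
    """
    data = sorted(zip(times, delivered))
    at_risk = len(data)
    estimate = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = 0
        leaving = 0
        while i < len(data) and data[i][0] == t:
            events += int(data[i][1])
            leaving += 1
            i += 1
        if events:
            estimate *= 1.0 - events / at_risk
            curve.append((t, estimate))
        at_risk -= leaving
    return curve


def median_time(curve: List[Tuple[float, float]]) -> Optional[float]:
    """First time at which the estimate drops to 0.5 or below (None if never)."""
    for t, estimate in curve:
        if estimate <= 0.5:
            return t
    return None


curve = kaplan_meier([10, 10, 20, 30], [True, True, True, True])
print(curve)               # [(10, 0.5), (20, 0.25), (30, 0.0)]
print(median_time(curve))  # 10
```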
DEFINITIONS
In this disclosure the term “or” is generally employed in its sense including “and/or” unless the content clearly dictates otherwise.
The term “adverse pregnancy or neonatal outcome” refers to a condition that reduces the chance of delivering a healthy baby. Non-limiting examples of an adverse pregnancy outcome include multiple first-trimester miscarriages, a second-trimester pregnancy loss, preterm birth (e.g., spontaneous or indicated), preterm preeclampsia, preterm eclampsia, fetal growth restriction, placental abruption, fetal death/stillbirth, birth defects, an Apgar score at 1 minute of <7, an Apgar score at 5 minutes of <7, clinical chorioamnionitis, pathological chorioamnionitis, neonatal respiratory distress syndrome, neonatal bronchopulmonary dysplasia, neonatal sepsis, neonatal intraventricular hemorrhage, etc.
The term “cervical insufficiency” refers to a condition of the cervix, such as weakening or advanced dilation of the cervix, that can lead to second-trimester pregnancy loss or preterm birth. Cervical weakness, premature cervical shortening, premature or advanced cervical dilation, cervical trauma, a structural abnormality of the cervix, or any combination thereof can contribute to cervical insufficiency. Clinical interventions to manage cervical insufficiency include, but are not limited to, progesterone supplementation, cervical cerclage, and cervical pessary.
The term “bacterial taxon” refers to the taxonomy, i.e., the rank-based classification of bacteria. The hierarchical biological classification includes life, domain, kingdom, phylum, class, order, family, genus and species.
The term “biological sample” or “sample” includes sections of tissues such as biopsy and autopsy samples, and frozen sections taken for histologic purposes, or processed forms of any of such samples. Biological samples include a cervical swab, a vaginal swab, a uterine swab, blood and blood fractions or products (e.g., serum, plasma, platelets, red blood cells, and the like), sputum or saliva, lymph and tongue tissue, cultured cells (e.g., primary cultures, explants, and transformed cells), stool, urine, a biopsy tissue, etc. A biological sample is typically obtained from a eukaryotic organism, which may be a mammal, may be a primate and may be a human subject.
The term “biopsy” refers to the process of removing a tissue sample for diagnostic or prognostic evaluation, and to the tissue specimen itself. Any biopsy technique known in the art can be applied to the diagnostic and prognostic methods of the present invention. The biopsy technique applied will depend on the tissue type to be evaluated (e.g., cervix, vagina, tongue, colon, prostate, kidney, bladder, lymph node, liver, bone marrow, blood cell, stomach tissue, etc.) among other factors. Representative biopsy techniques include, but are not limited to, a swab biopsy, excisional biopsy, incisional biopsy, needle biopsy, surgical biopsy, and bone marrow biopsy and may comprise colonoscopy. A wide range of biopsy techniques are well known to those skilled in the art who will choose between them and implement them with minimal experimentation.
In this disclosure the term “isolated” nucleic acid molecule means a nucleic acid molecule that is separated from other nucleic acid molecules that are usually associated with the isolated nucleic acid molecule. Thus, an “isolated” nucleic acid molecule includes, without limitation, a nucleic acid molecule that is free of nucleotide sequences that naturally flank one or both ends of the nucleic acid in the genome of the organism from which the isolated nucleic acid is derived (e.g., a cDNA or genomic DNA fragment produced by PCR or restriction endonuclease digestion). Such an isolated nucleic acid molecule can be introduced into a vector (e.g., a cloning vector or an expression vector) for convenience of manipulation or to generate a fusion nucleic acid molecule. In addition, an isolated nucleic acid molecule can include an engineered nucleic acid molecule such as a recombinant or a synthetic nucleic acid molecule. A nucleic acid molecule existing among hundreds to millions of other nucleic acid molecules within, for example, a nucleic acid library (e.g., a cDNA or genomic library) or a gel (e.g., agarose or polyacrylamide) containing restriction-digested genomic DNA, is not an “isolated” nucleic acid.
The terms “nucleic acid,” “nucleotide,” and “polynucleotide” refer to deoxyribonucleic acids (DNA) or ribonucleic acids (RNA) and polymers thereof in either single- or double-stranded form. Unless specifically limited, the term encompasses nucleic acids containing known analogs of natural nucleotides that have similar binding properties as the reference nucleic acid and are metabolized in a manner similar to naturally occurring nucleotides. Unless otherwise indicated, a particular nucleic acid sequence also implicitly encompasses conservatively modified variants thereof (e.g., degenerate codon substitutions), alleles, orthologs, single nucleotide polymorphisms (SNPs), and complementary sequences as well as the sequence explicitly indicated. Specifically, degenerate codon substitutions may be achieved by generating sequences in which the third position of one or more selected (or all) codons is substituted with mixed-base and/or deoxyinosine residues (Batzer et al., Nucleic Acids Res. 19:5081 (1991); Ohtsuka et al., J. Biol. Chem. 260:2605-2608 (1985); and Rossolini et al., Mol. Cell. Probes 8:91-98 (1994)). The term nucleic acid is used interchangeably with gene, cDNA, and mRNA encoded by a gene.
The terms “polypeptide,” “peptide,” and “protein” are used interchangeably herein to refer to a polymer of amino acid residues. The terms apply to amino acid polymers in which one or more amino acid residues are artificial chemical mimetics of corresponding naturally occurring amino acids, as well as to naturally occurring amino acid polymers and non-naturally occurring amino acid polymers. As used herein, the terms encompass amino acid chains of any length, including full-length proteins (i.e., antigens), wherein the amino acid residues are linked by covalent peptide bonds.
The term “amino acid” refers to naturally occurring and synthetic amino acids, as well as amino acid analogs and amino acid mimetics that function in a manner similar to the naturally occurring amino acids. Naturally occurring amino acids are those encoded by the genetic code, as well as those amino acids that are later modified, e.g., hydroxyproline, γ-carboxyglutamate, and O-phosphoserine. For the purposes of this application, the term “amino acid analogs” refers to compounds that have the same basic chemical structure as a naturally occurring amino acid, i.e., an α carbon that is bound to a hydrogen, a carboxyl group, an amino group, and an R group, e.g., homoserine, norleucine, methionine sulfoxide, methionine methyl sulfonium. Such analogs have modified R groups (e.g., norleucine) or modified peptide backbones, but retain the same basic chemical structure as a naturally occurring amino acid. For the purposes of this application, the term “amino acid mimetics” refers to chemical compounds that have a structure that is different from the general chemical structure of an amino acid, but that function in a manner similar to a naturally occurring amino acid.
Amino acids may include those having non-naturally occurring D-chirality, as disclosed in WO01/12654, which may improve the stability (e.g., half-life), bioavailability, and other characteristics of a polypeptide comprising one or more of such D-amino acids. In some cases, one or more, and potentially all of the amino acids of a therapeutic polypeptide have D-chirality.
Amino acids may be referred to herein by either the commonly known three letter symbols or by the one-letter symbols recommended by the IUPAC-IUB Biochemical Nomenclature Commission. Nucleotides, likewise, may be referred to by their commonly accepted single-letter codes.
The terms “identical” or percent “identity,” in the context of describing two or more polynucleotide or amino acid sequences, refer to two or more sequences or subsequences that are the same or have a specified percentage of amino acid residues or nucleotides that are the same (for example, a variant of a bacterial protein of interest used in the method of this invention (e.g., for predicting adverse pregnancy outcomes) has at least 80% sequence identity, preferably 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identity, to a reference sequence, e.g., a corresponding wild-type bacterial protein of interest), when compared and aligned for maximum correspondence over a comparison window, or designated region as measured using one of the following sequence comparison algorithms or by manual alignment and visual inspection. Such sequences are then said to be “substantially identical.” With regard to polynucleotide sequences, this definition also refers to the complement of a test sequence. Preferably, the identity exists over a region that is at least about 50 amino acids or nucleotides in length, or more preferably over a region that is 75-100 amino acids or nucleotides in length.
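As a concrete reading of "percent identity" over an aligned comparison window, the sketch below counts identical, non-gap columns in two already-aligned sequences. Conventions differ between tools (the denominator may be the alignment length, the shorter sequence, or the reference length); this sketch assumes the alignment length excluding columns that are gaps in both sequences, and the function name is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of alignment columns holding the same residue in both sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == gap and b == gap)  # ignore columns that are gaps in both
    ]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


print(round(percent_identity("ACGT-A", "ACGTTA"), 2))  # 83.33
```

Under this convention, a threshold such as "at least 92% sequence identity" would be checked as `percent_identity(query, reference) >= 92.0` on an alignment produced by one of the algorithms described below.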
For sequence comparison, typically one sequence acts as a reference sequence, to which test sequences are compared. When using a sequence comparison algorithm, test and reference sequences are entered into a computer, subsequence coordinates are designated, if necessary, and sequence algorithm program parameters are designated. Default program parameters can be used, or alternative parameters can be designated. The sequence comparison algorithm then calculates the percent sequence identities for the test sequences relative to the reference sequence, based on the program parameters. For sequence comparison of nucleic acids and proteins, the BLAST and BLAST 2.0 algorithms and the default parameters discussed below are used.
A “comparison window”, as used herein, includes reference to a segment of any one of the number of contiguous positions selected from the group consisting of from 20 to 600, usually about 50 to about 200, more usually about 100 to about 150 in which a sequence may be compared to a reference sequence of the same number of contiguous positions after the two sequences are optimally aligned. Methods of alignment of sequences for comparison are well-known in the art. Optimal alignment of sequences for comparison can be conducted, e.g., by the local homology algorithm of Smith & Waterman, Adv. Appl. Math. 2:482 (1981), by the homology alignment algorithm of Needleman & Wunsch, J. Mol. Biol. 48:443 (1970), by the search for similarity method of Pearson & Lipman, Proc. Nat’l. Acad. Sci. USA 85:2444 (1988), by computerized implementations of these algorithms (GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group, 575 Science Dr., Madison, WI), or by manual alignment and visual inspection (see, e.g., Current Protocols in Molecular Biology (Ausubel et al., eds. 1995 supplement)).
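Of the alignment methods listed, the Needleman & Wunsch global alignment is compact enough to sketch. The version below uses a linear gap penalty and simple match/mismatch scores; the scoring values are arbitrary illustrative choices, not parameters from this disclosure or from the GCG programs, the function name is hypothetical, and ties during traceback are broken in a fixed order (diagonal, then a gap in the second sequence, then a gap in the first).

```python
from typing import Tuple


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> Tuple[int, str, str]:
    """Global alignment by dynamic programming; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # leading gaps in b
    for j in range(1, m + 1):
        score[0][j] = j * gap  # leading gaps in a

    def pair(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Traceback from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGTA", "ACTA"))  # (2, 'ACGTA', 'AC-TA')
```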
Examples of algorithms that are suitable for determining percent sequence identity and sequence similarity are the BLAST and BLAST 2.0 algorithms, which are described in Altschul et al. (1990) J. Mol. Biol. 215:403-410 and Altschul et al. (1997) Nucleic Acids Res. 25:3389-3402, respectively. Software for performing BLAST analyses is publicly available at the National Center for Biotechnology Information website, ncbi.nlm.nih.gov. The algorithm involves first identifying high-scoring sequence pairs (HSPs) by identifying short words of length W in the query sequence, which either match or satisfy some positive-valued threshold score T when aligned with a word of the same length in a database sequence. T is referred to as the neighborhood word score threshold (Altschul et al., supra). These initial neighborhood word hits act as seeds for initiating searches to find longer HSPs containing them. The word hits are then extended in both directions along each sequence for as far as the cumulative alignment score can be increased. Cumulative scores are calculated using, for nucleotide sequences, the parameters M (reward score for a pair of matching residues; always >0) and N (penalty score for mismatching residues; always <0). For amino acid sequences, a scoring matrix is used to calculate the cumulative score. Extension of the word hits in each direction is halted when: the cumulative alignment score falls off by the quantity X from its maximum achieved value; the cumulative score goes to zero or below, due to the accumulation of one or more negative-scoring residue alignments; or the end of either sequence is reached. The BLAST algorithm parameters W, T, and X determine the sensitivity and speed of the alignment. The BLASTN program (for nucleotide sequences) uses as defaults a word size (W) of 28, an expectation (E) of 10, M=1, N=-2, and a comparison of both strands.
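The seed-and-extend idea described above can be illustrated with a much-simplified, ungapped nucleotide version: exact words of length W shared by query and subject serve as seeds, and each seed is extended left and right with match reward M and mismatch penalty N (defaulting to the M=1, N=-2 quoted above), stopping once the running score drops more than X below the best score seen, falls to zero or below, or reaches a sequence end. This is not NCBI BLAST, which adds gapped extension and statistical evaluation; the tiny word size and X value below are chosen only so the example is readable, and the function names are hypothetical.

```python
from typing import Iterable, List, Tuple


def find_seeds(query: str, subject: str, w: int) -> List[Tuple[int, int]]:
    """All (query_pos, subject_pos) pairs where a length-w word matches exactly."""
    index = {}
    for j in range(len(subject) - w + 1):
        index.setdefault(subject[j:j + w], []).append(j)
    return [(i, j) for i in range(len(query) - w + 1)
            for j in index.get(query[i:i + w], [])]


def _extend(pairs: Iterable[Tuple[str, str]], start: int, m: int, n: int,
            x_drop: int) -> Tuple[int, int]:
    """Extend in one direction; return (best score, number of residues added)."""
    score = best = start
    best_len = 0
    for length, (q, s) in enumerate(pairs, start=1):
        score += m if q == s else n
        if score > best:
            best, best_len = score, length
        if score <= 0 or best - score > x_drop:
            break
    return best, best_len


def extend_seed(query: str, subject: str, i: int, j: int, w: int,
                m: int = 1, n: int = -2, x_drop: int = 3) -> Tuple[int, str]:
    """Ungapped X-drop extension of the exact word hit query[i:i+w] == subject[j:j+w]."""
    seed_score = w * m
    right = zip(query[i + w:], subject[j + w:])
    after_right, right_len = _extend(right, seed_score, m, n, x_drop)
    left = zip(reversed(query[:i]), reversed(subject[:j]))
    total, left_len = _extend(left, after_right, m, n, x_drop)
    return total, query[i - left_len:i + w + right_len]


query, subject = "CCGATTACAGG", "TTGATTACATT"
print(find_seeds(query, subject, 4))         # [(2, 2), (3, 3), (4, 4), (5, 5)]
print(extend_seed(query, subject, 2, 2, 4))  # (7, 'GATTACA')
```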
For amino acid sequences, the BLASTP program uses as defaults a word size (W) of 3, an expectation (E) of 10, and the BLOSUM62 scoring matrix (see, e.g., Henikoff and Henikoff, Proc. Natl. Acad. Sci. USA 89:10915 (1992)).
The BLAST algorithm also performs a statistical analysis of the similarity between two sequences (see, e.g., Karlin and Altschul, Proc. Nat’l. Acad. Sci. USA 90:5873-5877 (1993)). One measure of similarity provided by the BLAST algorithm is the smallest sum probability (P(N)), which provides an indication of the probability by which a match between two nucleotide or amino acid sequences would occur by chance. For example, a nucleic acid is considered similar to a reference sequence if the smallest sum probability in a comparison of the test nucleic acid to the reference nucleic acid is less than about 0.2, more preferably less than about 0.01, and most preferably less than about 0.001.
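For a single ungapped alignment, the Karlin-Altschul statistics cited above relate a score S to the expected number of chance hits, E = K·m·n·exp(-λS), for query length m and database length n, and to the Poisson probability of at least one such hit, P = 1 - exp(-E). The sketch below evaluates only those two formulas; the λ and K values are placeholders (real values depend on the scoring system and residue composition), the function names are hypothetical, and BLAST's sum probability P(N) for combining several HSPs is a more involved statistic not shown here.

```python
import math


def expected_chance_hits(score: float, m: int, n: int, lam: float, k: float) -> float:
    """Karlin-Altschul E-value for one ungapped alignment: E = K*m*n*exp(-lambda*S)."""
    return k * m * n * math.exp(-lam * score)


def probability_of_chance_hit(e_value: float) -> float:
    """Probability of at least one chance hit with that score: P = 1 - exp(-E)."""
    return 1.0 - math.exp(-e_value)


# Placeholder parameters, chosen only to make the arithmetic easy to follow.
lam, k = math.log(2), 0.1
e = expected_chance_hits(score=30, m=500, n=1_000_000, lam=lam, k=k)
print(e, probability_of_chance_hit(e))
```

For small E the two quantities are nearly equal, which is why an E-value cut-off and a probability cut-off such as 0.01 behave similarly in practice.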
An indication that two nucleic acid sequences or polypeptides are substantially identical is that the polypeptide encoded by the first nucleic acid is immunologically cross-reactive with the antibodies raised against the polypeptide encoded by the second nucleic acid, as described below. Thus, a polypeptide is typically substantially identical to a second polypeptide, for example, where the two peptides differ only by conservative substitutions. Another indication that two nucleic acid sequences are substantially identical is that the two molecules or their complements hybridize to each other under stringent conditions, as described below. Yet another indication that two nucleic acid sequences are substantially identical is that the same primers can be used to amplify the sequence.
The terms “stringent hybridization conditions” and “high stringency” refer to conditions under which a probe will hybridize to its target subsequence, typically in a complex mixture of nucleic acids, but to no other sequences. Stringent conditions are sequence-dependent and will be different in different circumstances. Longer sequences hybridize specifically at higher temperatures. An extensive guide to the hybridization of nucleic acids is found in Tijssen, Techniques in Biochemistry and Molecular Biology--Hybridization with Nucleic Probes, “Overview of principles of hybridization and the strategy of nucleic acid assays” (1993) and will be readily understood by those skilled in the art. Generally, stringent conditions are selected to be about 5-10 ℃ lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH. The Tm is the temperature (under defined ionic strength, pH, and nucleic acid concentration) at which 50% of the probes complementary to the target hybridize to the target sequence at equilibrium (as the target sequences are present in excess, at Tm, 50% of the probes are occupied at equilibrium). Stringent conditions may also be achieved with the addition of destabilizing agents such as formamide. For selective or specific hybridization, a positive signal is at least two times background, preferably 10 times background hybridization. Exemplary stringent hybridization conditions can be as follows: 50% formamide, 5x SSC, and 1% SDS, incubating at 42 ℃; or 5x SSC and 1% SDS, incubating at 65 ℃, with a wash in 0.2x SSC and 0.1% SDS at 65 ℃.
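As a rough numerical illustration of choosing a hybridization temperature about 5-10 ℃ below Tm, the sketch below estimates Tm for a short oligonucleotide probe with the Wallace rule, Tm ≈ 2·(A+T) + 4·(G+C) ℃, and returns the corresponding stringency window. The Wallace rule is a swapped-in approximation, not a method from this disclosure: it is only meaningful for short oligonucleotides and ignores the salt, formamide, and probe-concentration effects discussed above. The probe sequence and function names are hypothetical.

```python
from typing import Tuple


def wallace_tm(probe: str) -> float:
    """Wallace-rule Tm estimate (degrees C) for a short DNA oligonucleotide."""
    probe = probe.upper()
    at = probe.count("A") + probe.count("T")
    gc = probe.count("G") + probe.count("C")
    return 2.0 * at + 4.0 * gc


def stringent_window(tm: float, max_below: float = 10.0,
                     min_below: float = 5.0) -> Tuple[float, float]:
    """Temperature range roughly 5-10 degrees C below Tm."""
    return tm - max_below, tm - min_below


tm = wallace_tm("ACGTACGTACGTACGTAC")
print(tm, stringent_window(tm))  # 54.0 (44.0, 49.0)
```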
Nucleic acids that do not hybridize to each other under stringent conditions are still substantially identical if the polypeptides that they encode are substantially identical. This occurs, for example, when a copy of a nucleic acid is created using the maximum codon degeneracy permitted by the genetic code. In such cases, the nucleic acids typically hybridize under moderately stringent hybridization conditions. Exemplary "moderately stringent hybridization conditions" include a hybridization in a buffer of 40% formamide, 1 M NaCl, and 1% SDS at 37 ℃, and a wash in 1x SSC at 45 ℃. A positive hybridization is at least twice background. Those of ordinary skill will readily recognize that alternative hybridization and wash conditions can be utilized to provide conditions of similar stringency. Additional guidelines for determining hybridization parameters are provided in numerous references, e.g., Current Protocols in Molecular Biology, ed. Ausubel, et al.
The phrase “specifically binds” when used in the context of referring to a polynucleotide sequence forming a double-stranded complex with another polynucleotide sequence describes “polynucleotide hybridization” based on Watson-Crick base-pairing, as provided in the definition for the term “polynucleotide hybridization method.”
As used in this application, an “increase” or a “decrease” refers to a detectable positive or negative change in quantity from a comparison control, e.g., an established standard control (such as an average expression level of a bacterial mRNA or protein found in normal cervical or vaginal tissue from a pregnant control subject). An increase is a positive change that is typically at least 10%, or at least 20%, 50%, or 100%, and can be as high as at least 2-fold, at least 5-fold, or even 10-fold of the control value. Similarly, a decrease is a negative change that is typically at least 10%, or at least 20%, 30%, or 50%, or even as high as at least 80% or 90% of the control value. Other terms indicating quantitative changes or differences from a comparative basis, such as "more," "less," "higher," and "lower," are used in this application in the same fashion as described above. In contrast, the term “substantially the same” or “substantial lack of change” indicates little to no change in quantity from the standard control value, typically within ±10% of the standard control, or within ±5%, ±2%, or even less variation from the standard control.
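The percentage definitions above amount to simple arithmetic against the standard control value. The following Python sketch is one hedged way to apply them; the tolerance band `same_band` is a hypothetical parameter defaulting to the ±10% figure in the text, and because the text's "at least 10%" increase and "within ±10%" band share a boundary, the sketch arbitrarily treats exactly 10% as "substantially the same".

```python
def percent_change(test: float, control: float) -> float:
    """Signed change of a test value relative to a standard control, in percent."""
    if control == 0:
        raise ValueError("control value must be non-zero")
    return (test - control) / control * 100.0


def classify_change(test: float, control: float, same_band: float = 10.0) -> str:
    """Label a change as 'increase', 'decrease' or 'substantially the same'.

    same_band is a hypothetical tolerance in percent (the text gives
    +/-10%, +/-5% or +/-2% as typical bands).
    """
    change = percent_change(test, control)
    if abs(change) <= same_band:
        return "substantially the same"
    return "increase" if change > 0 else "decrease"


print(classify_change(150, 100))  # a 50% rise over the control value
print(classify_change(40, 100))   # a 60% fall from the control value
```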
A “polynucleotide hybridization method” as used herein refers to a method for detecting the presence and/or quantity of a pre-determined polynucleotide sequence based on its ability to form Watson-Crick base-pairing, under appropriate hybridization conditions, with a polynucleotide probe of a known sequence. Examples of such hybridization methods include Southern blot, Northern blot, and in situ hybridization.
“Primers” as used herein refer to oligonucleotides that can be used in an amplification method, such as a polymerase chain reaction (PCR), to amplify a nucleotide sequence based on the polynucleotide sequence corresponding to a gene of interest, e.g., the cDNA or genomic sequence for a specific bacterial gene or a portion thereof. Typically at least one of the PCR primers for amplification of a polynucleotide sequence is sequence-specific for that polynucleotide sequence. The exact length of the primer will depend upon many factors, including temperature, source of the primer, and the method used. For example, for diagnostic and prognostic applications, depending on the complexity of the target sequence, the oligonucleotide primer typically contains at least 10, 15, 20, or 25 nucleotides, although it may contain fewer or more nucleotides. The factors involved in determining the appropriate length of primer are readily known to one of ordinary skill in the art. In this disclosure the term "primer pair" means a pair of primers that hybridize to opposite strands of a target DNA molecule or to regions of the target DNA which flank a nucleotide sequence to be amplified. In this disclosure the term "primer site" means the area of the target DNA or other nucleic acid to which a primer hybridizes.
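The definition of a primer pair can be illustrated with a minimal in silico PCR sketch in Python. It assumes exact primer matches (no mismatches or degenerate bases) and a template supplied as its top strand; the function names and sequences are hypothetical and serve only to show how two primers on opposite strands flank the amplified region.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def in_silico_amplicon(template: str, forward: str, reverse: str) -> Optional[str]:
    """Return the region a primer pair would amplify, or None if it is not flanked.

    `template` is the top strand (5'->3'). The forward primer has the same
    sequence as the top strand and anneals to the bottom strand; the reverse
    primer anneals to the top strand, so its reverse complement appears in the
    top strand downstream of the forward primer site.
    """
    top = template.upper()
    start = top.find(forward.upper())
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse)
    end = top.find(reverse_site, start + len(forward))
    if end == -1:
        return None
    return top[start:end + len(reverse_site)]


template = "TTTTACGTACGGGCCCGTTCAAAAAA"  # hypothetical template
print(in_silico_amplicon(template, forward="ACGTAC", reverse="TTGAAC"))
```

The two primer sites in this toy template are the "primer sites" of the definition; the product spans from the 5' end of one to the 5' end of the other.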
A “label,” “detectable label,” or “detectable moiety” is a composition detectable by spectroscopic, photochemical, biochemical, immunochemical, chemical, or other physical means. For example, useful labels include ³²P, fluorescent dyes, electron-dense reagents, enzymes (e.g., as commonly used in an ELISA), biotin, digoxigenin, or haptens and proteins that can be made detectable, e.g., by incorporating a radioactive component into the peptide, or used to detect antibodies specifically reactive with the peptide. Typically a detectable label is attached to a probe or a molecule with defined binding characteristics (e.g., a polypeptide with a known binding specificity or a polynucleotide), so as to allow the presence of the probe (and therefore its binding target) to be readily detected.
“Standard control” as used herein refers to a predetermined amount or concentration of bacteria belonging to a specific bacterial genus, a bacterial polynucleotide, or a bacterial polypeptide that is present in an established normal tissue sample, e.g., a normal cervical tissue sample. The standard control value is suitable for use in a method of the present invention, serving as a basis for comparing the amount of a specific bacterial genus, mRNA, or protein that is present in a test sample. An established sample serving as a standard control provides an average amount of the bacterial genus, mRNA, or protein that is typical for a cervical tissue sample of an average, healthy pregnant human with, for example, a closed cervix or normal-length cervix, as conventionally defined. A standard control value may vary depending on the nature of the sample, the manner of sample collection, and other factors such as the gender, age, and ethnicity of the subjects (and, in the case of pregnant women, gestational age) on whom such a control value is based.
The term “average,” as used in the context of describing a human who is pregnant and not at risk of having an adverse pregnancy or neonatal outcome, as conventionally defined, refers to certain characteristics, especially the amount of bacteria of one or more specific bacterial taxa found in the person's cervix, that are representative of a randomly selected group of pregnant humans who are free of any risk of having an adverse pregnancy or neonatal outcome. This selected group should comprise a sufficient number of humans such that the average amount of bacteria of the specific taxa in the cervix among these individuals reflects, with reasonable accuracy, the corresponding amount in the general population of healthy, normal, pregnant humans. In addition, the selected group of pregnant humans generally has a gestational age similar to that of the subject whose cervical tissue sample is tested for indication of a risk of having an adverse pregnancy or neonatal outcome. Moreover, other factors such as age, receipt of the same or similar kind of intervention (e.g., pessary/cerclage intervention), ethnicity, and medical history are also considered and are preferably closely matched between the test subject and the selected group of individuals establishing the “average” value.
The term “amount” or “level” as used in this application refers to the quantity of a bacterial taxon of interest, a bacterial polynucleotide of interest, or a bacterial polypeptide of interest present in a sample. Such quantity may be expressed in absolute terms, i.e., the total quantity of the bacterial taxon, polynucleotide, or polypeptide in the sample, or in relative terms, i.e., the concentration of the bacterial taxon, polynucleotide, or polypeptide in the sample.
The term “subject” includes individuals who seek medical attention due to a potential risk of having an adverse pregnancy outcome or neonatal outcome, e.g., any pregnant individual. Subjects also include individuals who have had an adverse pregnancy or neonatal outcome during a prior pregnancy.
DETAILED DESCRIPTION OF THE INVENTION
I. Introduction
The invention is based, in part, on the discovery of differentially abundant bacterial taxa in the cervical swab samples of women with advanced cervical dilation/cervical shortening, compared with those in appropriately controlled samples from appropriately matched women without the corresponding condition. The increased level of bacteria from particular bacterial taxa (e.g., Megasphaera cerevisiae, Alloscardovia omnicolens, Ureaplasma urealyticum or Ureaplasma parvum, Atopobium vaginae, Parvibacter caecicola, Lactobacillus casei, Veillonella montpellierensis, Anaerococcus senegalensis, Bulleidia extructa, Mycoplasma hominis, Propionimicrobium lymphophilum, uncultured bacteria having at least 92% sequence identity to the nucleotide sequence of GenBank Accession No. JQ781443.1, Corynebacterium pyruviciproducens, Acidipila rosea, Murdochiella asaccharolytica, uncultured bacteria having at least 90% sequence identity to the nucleotide sequence of GenBank Accession No. JF295520.1, Howardella ureilytica, Actinobaculum schaalii, Peptoniphilus duerdenii, Fastidiosipila sanguinis, Sneathia sanguinegens, Parvimonas micra, Peptoniphilus lacrimalis, a bacterial taxon specified by a 16S rRNA nucleotide sequence in Tables 4A, 4C and 8, and a bacterial taxon specified by the nearest species classification based on BLAST nucleotide alignment in Tables 4E, 4G and 9) is also predictive of a risk of having an adverse pregnancy or neonatal outcome, such as preterm birth (e.g., spontaneous preterm birth) <34 weeks, preterm birth (e.g., spontaneous preterm birth) <37 weeks, delivery within a period after the biological sample is taken (e.g., 1 day, 7 days, 14 days, 21 days, 28 days, 56 days, 84 days, 112 days, 140 days, 168 days, 196 days), delivery within a period after the clinical intervention is performed (e.g., 1 day, 7 days, 14 days, 21 days, 28 days, 56 days, 84 days, 112 days, 140 days, 168 days, 196 days), an Apgar score at 1 minute <7, an Apgar score at 5 minutes <7, chorioamnionitis (e.g., clinical or pathological), respiratory distress syndrome, bronchopulmonary dysplasia, intraventricular hemorrhage, neonatal death within 7 days after birth, and neonatal sepsis.
II. General Methodology
Practicing this invention utilizes routine techniques in the field of molecular biology. Basic texts disclosing the general methods of use in this invention include Sambrook and Russell, Molecular Cloning, A Laboratory Manual (3rd ed. 2001); Kriegler, Gene Transfer and Expression: A Laboratory Manual (1990); and Current Protocols in Molecular Biology (Ausubel et al., eds., 1994).
For nucleic acids, sizes are given in either kilobases (kb) or base pairs (bp). These are estimates derived from agarose or acrylamide gel electrophoresis, from sequenced nucleic acids, or from published DNA sequences. For proteins, sizes are given in kilodaltons (kDa) or amino acid residue numbers. Protein sizes are estimated from gel electrophoresis, from sequenced proteins, from derived amino acid sequences, or from published protein sequences.
Oligonucleotides that are not commercially available can be chemically synthesized, e.g., according to the solid phase phosphoramidite triester method first described by Beaucage and Caruthers, Tetrahedron Lett. 22: 1859-1862 (1981), using an automated synthesizer, as described in Needham-VanDevanter et al., Nucleic Acids Res. 12: 6159-6168 (1984). Purification of oligonucleotides is performed using any art-recognized strategy, e.g., native acrylamide gel electrophoresis or anion-exchange high performance liquid chromatography (HPLC) as described in Pearson and Regnier, J. Chrom. 255: 137-149 (1983).
III. Acquisition of Tissue Samples and Analysis of Bacterial Taxa
The present invention relates to measuring the amount of bacteria of a specific bacterial taxon found in a pregnant woman’s cervix or vagina, especially in a cervical swab or vaginal swab sample, as a means to assess the risk of having an adverse pregnancy outcome or neonatal outcome, such as preterm labor and preterm delivery. Thus, the first step in practicing this invention is to obtain a cervical or vaginal tissue sample from a test subject, such that the nucleic acids, e.g., RNA or DNA, contained in the sample may be analyzed. In some embodiments, the amount of bacteria of a specific bacterial taxon found in a pregnant woman’s cervix or vagina can be represented by the amount of the specific bacteria in a biological sample that is not from the cervix or the vagina.
A. Acquisition and Preparation of Biological Samples
A biological sample, such as cervical or vaginal tissue, cervical mucus, amniotic fluid, or maternal blood, is obtained from a person to be tested or monitored using a method of the present invention. Collection of cervical or vaginal epithelial cells, cervical mucus, amniotic fluid, or maternal blood (e.g., maternal whole blood, maternal serum, and/or maternal plasma) from an individual is performed in accordance with the standard protocols that hospitals or clinics generally follow, such as during a cervical screening. An appropriate amount of cervical or vaginal epithelium, scraped cells, mucus, and/or biological fluid is collected and may be stored according to standard procedures prior to further preparation.
The analysis of the bacteria found in a pregnant patient's sample according to the present invention may be performed using, e.g., cells, tissue, mucosa, or fluids found in the sample. The methods for preparing cell, tissue, or fluid samples for nucleic acid extraction are well known among those of skill in the art. For example, a subject's cervical or vaginal mucosa sample can be treated such that bacterial DNA or RNA in the sample can be analyzed.
B. Extraction and Quantitation of DNA
There are numerous methods for extracting bacterial DNA from a biological sample. Methods for extracting DNA from a biological sample are well known and routinely practiced in the art of molecular biology; see, e.g., Sambrook and Russell, supra. RNA contamination should be eliminated to avoid interference with DNA analysis. The biological sample can also be pretreated with lysis buffer and enzymes, including mutanolysin and proteinase K, before the extraction. Methods for detecting target DNA include PCR analysis, quantitative analysis with fluorescence labeling, and Southern blot analysis. The target DNA can be the gene encoding the 16S ribosomal RNA (the 16S rRNA gene), or other genes or genomic sequences of interest possessed by a specific bacterial taxon.
A variety of polynucleotide amplification methods are well established and frequently used in research. For instance, the general methods of polymerase chain reaction (PCR) for polynucleotide sequence amplification are well known in the art and are thus not described in detail herein. For a review of PCR methods, protocols, and principles in designing primers, see, e.g., Innis, et al., PCR Protocols: A Guide to Methods and Applications, Academic Press, Inc. N.Y., 1990. PCR reagents and protocols are also available from commercial vendors, such as Roche Molecular Systems.
Although PCR amplification is typically used in practicing the present invention, one of skill in the art will recognize that amplification of the relevant genomic sequence may be accomplished by any known method, such as the ligase chain reaction (LCR), transcription-mediated amplification, and self-sustained sequence replication or nucleic acid sequence-based amplification (NASBA), each of which provides sufficient amplification. More recently developed branched-DNA technology may also be used to quantitatively determine the amount of specific bacterial mRNA markers. For a detailed description of branched-DNA signal amplification for direct quantitation of nucleic acid sequences in clinical samples, see, for example, Nolte, Adv. Clin. Chem. 33: 201-235, 1998.
Techniques for polynucleotide sequence determination are also well established and widely practiced in the relevant research field. For instance, the basic principles and general techniques for polynucleotide sequencing are described in various research reports and treatises on molecular biology and recombinant genetics, such as Wallace et al., supra; Sambrook and Russell, supra; and Ausubel et al., supra. DNA sequencing methods routinely practiced in research laboratories, either manual or automated, can be used for practicing the present invention. Additional means suitable for detecting a polynucleotide sequence for practicing the methods of the present invention include but are not limited to mass spectrometry, primer extension, polynucleotide hybridization, real-time PCR, melting curve analysis, high resolution melting analysis, heteroduplex analysis, massively parallel sequencing (e.g., next-generation sequencing), and electrophoresis.
C. Extraction and Quantitation of RNA
One skilled in the art recognizes that there are numerous methods for extracting bacterial RNA from a biological sample. The general methods of RNA preparation (e.g., described by Sambrook and Russell, Molecular Cloning: A Laboratory Manual 3d ed., 2001) can be followed; various commercially available reagents or kits, such as Trizol reagent (Invitrogen, Carlsbad, CA), Oligotex Direct mRNA Kits (Qiagen, Valencia, CA), RNeasy Mini Kits (Qiagen, Hilden, Germany), and PolyATtract® Series 9600™ (Promega, Madison, WI), may also be used to obtain mRNA from a biological sample from a test subject. Combinations of more than one of these methods may also be used.
It is essential that all contaminating DNA be eliminated from the RNA preparations. Thus, careful handling of the samples, thorough treatment with DNase, and proper negative controls in the amplification and quantification steps should be used.
D. PCR-Based Quantitative Determination of RNA Level
Once RNA is extracted from a sample, the amount of any RNA transcript of interest that is expressed by bacteria of a specific bacterial taxon may be quantified. For example, the amount of 16S ribosomal RNA (rRNA) for a particular bacterial taxon, such as, but not limited to, Megasphaera cerevisiae, Alloscardovia omnicolens, Ureaplasma urealyticum or Ureaplasma parvum, Atopobium vaginae, Parvibacter caecicola, Lactobacillus casei, Veillonella montpellierensis, Anaerococcus senegalensis, Bulleidia extructa, Mycoplasma hominis, Propionimicrobium lymphophilum, uncultured bacteria having at least 92% sequence identity to the nucleotide sequence of GenBank Accession No. JQ781443.1, Corynebacterium pyruviciproducens, Acidipila rosea, Murdochiella asaccharolytica, uncultured bacteria having at least 90% sequence identity to the nucleotide sequence of GenBank Accession No. JF295520.1, Howardella ureilytica, Actinobaculum schaalii, Peptoniphilus duerdenii, Fastidiosipila sanguinis, Sneathia sanguinegens, Parvimonas micra, Peptoniphilus lacrimalis, a bacterial taxon specified by a 16S rRNA nucleotide sequence in Tables 4A, 4C and 8, and a bacterial taxon specified by the nearest species classification based on BLAST nucleotide alignment in Tables 4E, 4G and 9, may be detected and measured. The preferred method for determining the RNA transcript level is an amplification-based method, e.g., polymerase chain reaction (PCR), especially reverse transcription-polymerase chain reaction (RT-PCR).
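The text names RT-PCR as the preferred quantitative method without fixing a quantification scheme. One common approach, assumed here rather than taken from the source, is an absolute standard curve: Ct values of a 10-fold dilution series of a known standard are fitted against log10(copies), and the fitted line is inverted to estimate the starting copies in an unknown. The function names and the dilution-series values below are hypothetical.

```python
def fit_standard_curve(log10_copies: list[float], ct_values: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of Ct = slope * log10(copies) + intercept."""
    n = len(log10_copies)
    if n < 2 or n != len(ct_values):
        raise ValueError("need at least two paired standards")
    mean_x = sum(log10_copies) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_copies)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log10_copies, ct_values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def copies_from_ct(ct: float, slope: float, intercept: float) -> float:
    """Invert the fitted line to estimate starting copies in an unknown sample."""
    return 10 ** ((ct - intercept) / slope)


def amplification_efficiency(slope: float) -> float:
    """E = 10^(-1/slope) - 1; a slope near -3.32 means roughly 100% efficiency."""
    return 10 ** (-1.0 / slope) - 1.0


standards_log10 = [2.0, 3.0, 4.0, 5.0, 6.0]    # hypothetical 10-fold dilution series
standards_ct = [33.1, 29.8, 26.5, 23.2, 19.9]  # hypothetical Ct values
slope, intercept = fit_standard_curve(standards_log10, standards_ct)
print(slope, amplification_efficiency(slope), copies_from_ct(27.0, slope, intercept))
```

Relative schemes, such as comparing each taxon against a reference measured in the same sample, are equally possible; the standard curve is shown only because it yields absolute copy numbers.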
Prior to the amplification step, a DNA copy (cDNA) of a bacterial RNA transcript of interest must be synthesized. This is achieved by reverse transcription, which can be carried out as a separate step, or in a homogeneous reverse transcription-polymerase chain reaction (RT-PCR), a modification of the polymerase chain reaction for amplifying RNA. Methods suitable for PCR amplification of ribonucleic acids are described by Romero and Rotbart in Diagnostic Molecular Biology: Principles and Applications, pp. 401-406 (Persing et al., eds., Mayo Foundation, Rochester, MN, 1993); Egger et al., J. Clin. Microbiol. 33: 1442-1447, 1995; and U.S. Patent No. 5,075,212.
The general methods of PCR are well known in the art and are thus not described in detail herein. For a review of PCR methods, protocols, and principles in designing primers, see, e.g., Innis, et al., PCR Protocols: A Guide to Methods and Applications, Academic Press, Inc. N.Y., 1990. PCR reagents and protocols are also available from commercial vendors, such as Roche Molecular Systems.
PCR is most usually carried out as an automated process with a thermostable enzyme. In this process, the temperature of the reaction mixture is cycled through a denaturing region, a primer annealing region, and an extension reaction region automatically. Machines specifically adapted for this purpose are commercially available.
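Each thermal cycle ideally doubles the target, which is why the cycle at which a signal crosses a detection threshold tracks the starting amount. A minimal sketch of this idealized exponential model, N = N0(1 + E)^n with amplification efficiency E, is given below; it ignores the plateau phase of real reactions, and the function names are illustrative.

```python
import math


def copies_after_cycles(n0: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized exponential PCR: N = N0 * (1 + E) ** cycles.

    efficiency = 1.0 means perfect doubling each cycle; real reactions
    plateau as reagents are consumed, which this model ignores.
    """
    return n0 * (1.0 + efficiency) ** cycles


def cycles_to_threshold(n0: float, threshold: float, efficiency: float = 1.0) -> float:
    """Number of cycles needed for n0 starting copies to reach a detection threshold."""
    return math.log(threshold / n0) / math.log(1.0 + efficiency)


print(copies_after_cycles(100, 30))  # idealized copies after 30 perfect cycles
# A 10-fold larger starting amount reaches the threshold about 3.32 cycles earlier.
print(cycles_to_threshold(1e3, 1e10) - cycles_to_threshold(1e4, 1e10))
```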
Although PCR amplification of the target RNA is typically used in practicing the present invention, one of skill in the art will recognize that amplification of these bacterial RNA species in the sample may be accomplished by any known method, such as the ligase chain reaction (LCR), transcription-mediated amplification, and self-sustained sequence replication or nucleic acid sequence-based amplification (NASBA), each of which provides sufficient amplification. More recently developed branched-DNA technology may also be used to quantitatively determine the amount of specific bacterial RNA markers. For a review of branched-DNA signal amplification for direct quantitation of nucleic acid sequences in clinical samples, see Nolte, Adv. Clin. Chem. 33: 201-235, 1998.
E. Other Quantitative Methods for DNA and RNA
The bacterial DNA or RNA transcripts of interest can also be detected using other standard techniques well known to those of skill in the art. Although the detection step is typically preceded by an amplification step, amplification is not required in the methods of the invention. For instance, the DNA or RNA may be identified by size fractionation (e.g., gel electrophoresis), whether or not preceded by an amplification step. After a sample is run in an agarose or polyacrylamide gel and stained with ethidium bromide according to well-known techniques (see, e.g., Sambrook and Russell, supra), the presence of a band of the same size as the standard comparison indicates the presence of a target DNA or RNA, the amount of which may then be compared to the control based on the intensity of the band. Alternatively, oligonucleotide probes specific to the DNA or RNA of interest can be used to detect the presence of such DNA or RNA species and to indicate the amount of DNA or RNA in comparison to the standard comparison, based on the intensity of the signal imparted by the probe.
Sequence-specific probe hybridization is a well-known method of detecting a particular nucleic acid in a sample comprising other species of nucleic acids. Under sufficiently stringent hybridization conditions, the probes hybridize specifically only to substantially complementary sequences. The stringency of the hybridization conditions can be relaxed to tolerate varying amounts of sequence mismatch.
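Sequence-specific probe matching can be mimicked in silico by scanning a target strand for sites complementary to a probe and counting mismatches, where a larger allowed mismatch count loosely stands in for relaxed stringency. This Python sketch is only an analogy: real hybridization depends on mismatch position, GC content, temperature, and salt, and the function names and sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def probe_binding_sites(target: str, probe: str, max_mismatches: int = 0) -> list[tuple[int, int]]:
    """Return (position, mismatches) for every window of the target strand to
    which the probe is complementary with at most max_mismatches mismatches.

    A probe pairs with the reverse complement of its own sequence, so the
    target is scanned for windows resembling reverse_complement(probe).
    """
    target = target.upper()
    site = reverse_complement(probe)
    k = len(site)
    hits = []
    for i in range(len(target) - k + 1):
        mismatches = sum(1 for a, b in zip(target[i:i + k], site) if a != b)
        if mismatches <= max_mismatches:
            hits.append((i, mismatches))
    return hits


print(probe_binding_sites("AAGTTGAAGG", "TTGAAC"))                    # "stringent": no hits
print(probe_binding_sites("AAGTTGAAGG", "TTGAAC", max_mismatches=1))  # "relaxed": one hit
```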
A number of hybridization formats are well known in the art, including but not limited to solution phase, solid phase, or mixed phase hybridization assays. The following articles provide an overview of the various hybridization assay formats: Singer et al., Biotechniques, 4: 230, 1986; Haase et al., Methods in Virology, pp. 189-226, 1984; Wilkinson, In situ Hybridization, Wilkinson ed., IRL Press, Oxford University Press, Oxford; and Hames and Higgins eds., Nucleic Acid Hybridization: A Practical Approach, IRL Press, 1987.
The hybridization complexes are detected according to well-known techniques. Nucleic acid probes capable of specifically hybridizing to a target nucleic acid, i.e., the RNA or the amplified DNA, can be labeled by any one of several methods typically used to detect the presence of hybridized nucleic acids. One common method of detection is autoradiography using probes labeled with ³H, ¹²⁵I, ³⁵S, ¹⁴C, ³²P, or the like. The choice of radioactive isotope depends on research preferences due to ease of synthesis, stability, and half-lives of the selected isotopes. Other labels include compounds (e.g., biotin and digoxigenin) that bind to anti-ligands or antibodies labeled with fluorophores, chemiluminescent agents, and enzymes. Alternatively, probes can be conjugated directly with labels such as fluorophores, chemiluminescent agents, or enzymes. The choice of label depends on the sensitivity required, ease of conjugation with the probe, stability requirements, and available instrumentation.
The probes and primers necessary for practicing the present invention can be synthesized and labeled using well known techniques. Oligonucleotides used as probes and primers may be chemically synthesized according to the solid phase phosphoramidite triester method first described by Beaucage and Caruthers, Tetrahedron Letts., 22: 1859-1862, 1981, using an automated synthesizer, as described in Needham-VanDevanter et al., Nucleic Acids Res. 12:6159-6168, 1984. Purification of oligonucleotides is by either native acrylamide gel electrophoresis or by anion-exchange HPLC as described in Pearson and Regnier, J. Chrom., 255: 137-149, 1983.
F. Amplification and Sequence Analysis
An amplification reaction may be performed prior to the sequence analysis. A variety of polynucleotide amplification methods are well established and frequently used in research. For instance, the general methods of polymerase chain reaction (PCR) for polynucleotide sequence amplification are well known in the art and are thus not described in detail herein. For a review of PCR methods, protocols, and principles in designing primers, see, e.g., Innis, et al., PCR Protocols: A Guide to Methods and Applications, Academic Press, Inc. N.Y., 1990. PCR reagents and protocols are also available from commercial vendors, such as Roche Molecular Systems.
Techniques for polynucleotide sequence determination are also well established and widely practiced in the relevant research field. For instance, the basic principles and general techniques for polynucleotide sequencing are described in various research reports and treatises on molecular biology and recombinant genetics, such as Wallace et al., supra; Sambrook and Russell, supra, and Ausubel et al., supra. DNA sequencing methods routinely practiced in research laboratories, either manual or automated, can be used for practicing the present invention. Additional means suitable for detecting a polynucleotide sequence for practicing the methods of the present invention include but are not limited to mass spectrometry, primer extension, polynucleotide hybridization, real-time PCR, melting curve analysis, high resolution melting analysis, heteroduplex analysis, massively parallel sequencing, and electrophoresis.
IV. Establishing a Standard Control
In order to establish a standard control for a particular sample type (e.g., cervical swab or vaginal swab) for practicing the method of this invention, a group of healthy pregnant women, pregnant women who are not at risk of having an adverse pregnancy outcome or neonatal outcome, or pregnant women who are later confirmed to deliver within the normal time frame of their pregnancy, as conventionally defined, is first selected. For example, the group may include pregnant women who have had a full-term labor and delivery. In some embodiments, the group of pregnant women may have had a full-term labor and delivery after clinical intervention, such as cerclage or pessary. These individuals are within the appropriate parameters, if applicable, for the purpose of screening for and/or monitoring risk of adverse pregnancy outcomes using the methods of the present invention. For instance, the individuals may be of a similar gestational age and comparable health status. Optionally, the individuals are of similar age or similar ethnic background.
The normal delivery time of the selected individuals will be confirmed later on, and any of the selected individuals who turn out to give birth sooner or later than the normal delivery time frame will be excluded from the group providing data for the “standard control.”
The health status of the selected individuals is confirmed by well-established, routinely employed methods, including but not limited to general physical examination of the individuals and general review of their medical history.
Furthermore, the selected group of healthy individuals must be of a reasonable size, such that the average amount/concentration of bacteria of one or more bacterial taxa in the cervical tissue sample obtained from the group can be reasonably regarded as representative of the normal or average level among the general population of healthy pregnant women. Preferably, the selected group comprises at least 10 pregnant human subjects.
Once an average value for the bacteria of one or more taxa is established based on the individual values found in each subject of the selected healthy control group, this average, median, or other representative value or profile is considered a standard control. A standard deviation is also determined during the same process. In some cases, separate standard controls may be established for separately defined groups having distinct characteristics such as age, gestational age, or ethnic background.
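The control-group summary can be sketched in Python with the standard-library `statistics` module. The sketch assumes that subjects who did not deliver within the normal time frame have already been excluded, enforces the preferred minimum of 10 subjects, and uses a mean-plus-k-standard-deviations cut-off purely as a hypothetical example, since this section does not specify how a cut-off is derived from the control values. The function names and all values are made up.

```python
import statistics


def standard_control(values: list[float], min_subjects: int = 10) -> dict[str, float]:
    """Mean, median and sample standard deviation of one taxon's control values."""
    if len(values) < min_subjects:
        raise ValueError(f"need at least {min_subjects} control subjects")
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),
    }


def upper_cutoff(summary: dict[str, float], k: float = 2.0) -> float:
    """Hypothetical cut-off: mean + k standard deviations of the control group."""
    return summary["mean"] + k * summary["sd"]


controls = [0.2, 0.5, 0.4, 0.9, 0.1, 0.6, 0.3, 0.8, 0.5, 0.7]  # hypothetical log10 abundances
summary = standard_control(controls)
print(summary, upper_cutoff(summary))
```

Separate summaries could be computed in the same way for subgroups stratified by age, gestational age, or ethnic background.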
V. Predicting Risk of Adverse Pregnancy or Neonatal Outcome Based on Cervical Microbe Score
Using the methods described herein, it can be predicted whether a pregnant subject has a likelihood of having an adverse pregnancy outcome, e.g., spontaneous preterm birth or birth prior to 34 weeks in the current pregnancy. Likewise, it can be predicted whether a non-pregnant subject has a likelihood of having an adverse pregnancy outcome in a future pregnancy. In some embodiments, for each subject, the abundance levels of bacteria belonging to at least 1, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more bacterial taxa can be used to calculate a cervical microbe score. The subject’s score can be compared to a cut-off value established by the scores of the standard control group and used to determine whether the pregnant subject is likely to have an adverse pregnancy outcome. In some embodiments, the bacteria belong to at least 3, e.g., 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more bacterial taxa. In some cases, the bacteria include Sneathia sanguinegens, Parvimonas micra, Ureaplasma urealyticum (or Ureaplasma parvum), Atopobium vaginae, Peptoniphilus lacrimalis, Megasphaera cerevisiae, and Parvibacter caecicola. In other cases, the abundance levels of Parvimonas micra, Ureaplasma urealyticum (or Ureaplasma parvum), Atopobium vaginae, Peptoniphilus lacrimalis, Megasphaera cerevisiae, and Parvibacter caecicola are used to calculate the cervical microbe score for the subject. In some embodiments, each log10(abundance) value for the selected bacterial taxa is transformed into the linear scale, the linear values are added together, and the total is log-transformed and expressed as a cervical microbe score on the log10 scale. In some instances, a cervical microbe score greater than 1.15 can indicate that the pregnant subject is at risk of having a spontaneous preterm birth.
In some cases, the pregnant subject is at risk of having a spontaneous preterm birth after intervention, such as cerclage or pessary. In some embodiments, a cervical microbe score greater than 1.15 can indicate that the pregnant subject has an increased likelihood of delivering at less than about 37.0 weeks, e.g., about 36.5 weeks, about 36.0 weeks, about 35.0 weeks, about 34.0 weeks, about 33.0 weeks, about 32.0 weeks, about 31.0 weeks, about 30.0 weeks, about 29.0 weeks, about 28.0 weeks, about 27.0 weeks, about 26.0 weeks, about 25.0 weeks, about 24.0 weeks, about 23.0 weeks, about 22.0 weeks, about 21.0 weeks, or less.
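The log-sum scoring step described above can be written as a short Python function: each log10(abundance) is converted to the linear scale, the values are summed, and the total is returned on the log10 scale, then compared with the 1.15 cut-off. The taxon names follow the six-taxon embodiment above, but the abundance values and function names are hypothetical, and the sketch is not presented as the exact calculation used in the Examples.

```python
import math


def cervical_microbe_score(log10_abundances: dict[str, float]) -> float:
    """Sum the selected taxa on the linear scale, then return the total in log10."""
    if not log10_abundances:
        raise ValueError("at least one taxon is required")
    linear_total = sum(10 ** value for value in log10_abundances.values())
    return math.log10(linear_total)


def indicates_risk(score: float, cutoff: float = 1.15) -> bool:
    """True when the score exceeds the cut-off (1.15 in the embodiment above)."""
    return score > cutoff


sample = {  # hypothetical log10(abundance) values for one subject
    "Parvimonas micra": 0.4,
    "Ureaplasma urealyticum": 1.0,
    "Atopobium vaginae": 0.2,
    "Peptoniphilus lacrimalis": -0.3,
    "Megasphaera cerevisiae": 0.1,
    "Parvibacter caecicola": -0.5,
}
score = cervical_microbe_score(sample)
print(round(score, 2), indicates_risk(score))
```

Summing on the linear scale means the score is dominated by the most abundant taxa, whereas averaging the log10 values directly would weight every taxon equally.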
In some embodiments, for each biological sample, the abundance level of a bacterial taxon is determined by a quantitative PCR assay or a sequence-specific hybridization assay targeting polynucleotide sequences specific to the taxon concerned. In other embodiments, for each biological sample, the abundance level of a bacterial taxon is determined by massively parallel sequencing of marker gene sequences that serve as proxies for the respective taxon. In some instances, the abundance level of a bacterial taxon in a sample is the normalized read count of the 16S ribosomal RNA (rRNA) gene sequence representing the taxon concerned. In some cases, the normalized read counts are calculated from the raw read counts using established normalization methods that minimize technical variation between samples, such as differences in sequencing depth or library size per sample. In some cases, the abundance level of a bacterial taxon is expressed on the log10 scale.
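One simple way to obtain normalized, log10-scale abundance levels from raw 16S rRNA read counts is total-count scaling to a common library size followed by a log10 transform. The sketch below assumes that approach; the `target_depth` and `pseudocount` defaults are illustrative choices not taken from the text (the pseudocount only keeps undetected taxa finite on the log scale).

```python
import math
from typing import Dict


def log10_normalized_abundance(
    raw_counts: Dict[str, int],
    target_depth: int = 10_000,
    pseudocount: float = 1.0,
) -> Dict[str, float]:
    """Scale raw read counts to a common library size, then log10-transform.

    target_depth and pseudocount are illustrative assumptions, not values
    specified in the text.
    """
    total = sum(raw_counts.values())
    if total <= 0:
        raise ValueError("sample has no reads")
    return {
        taxon: math.log10(count / total * target_depth + pseudocount)
        for taxon, count in raw_counts.items()
    }
```

Because each count is divided by the sample's own total, two samples with the same taxon proportions but different sequencing depths receive the same normalized values.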
In some embodiments, the abundances of a selected group of bacterial taxa (e.g., taxa that are significantly more abundant in the dilated cervices than in the closed cervices) are combined by addition or multiplication to give the cervical microbe score, as illustrated by the LA6 value in Example 2. In other embodiments, the score is calculated in 3 steps: (a) the abundance levels of one sub-selected group of bacterial taxa (e.g., taxa that are significantly more abundant in the dilated cervices than in the closed cervices) are combined by addition or multiplication; (b) the abundance levels of another sub-selected group of taxa (e.g., taxa that are significantly more abundant in the closed cervices than in the dilated cervices) are also combined by addition or multiplication; and (c) the cervical microbe score is calculated by subtracting the sum or product of the second sub-selected group from that of the first sub-selected group, or by dividing the former by the latter. In other embodiments, the cervical microbe score is calculated based on the number of kinds of selected bacterial taxa present in a sample, as illustrated by the DIBT1, DIBT2 and DIBT3 tests in Example 1. In the DIBT1, DIBT2 and DIBT3 tests, the cervical microbe score is the number of selected bacterial taxa present, and a score greater than or equal to 1 indicates an increased risk of an adverse pregnancy or neonatal outcome. In other embodiments, the cervical microbe score is calculated based on the ranking of selected bacterial taxa among all taxa detected in a sample.
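The three-step score above can be sketched as follows, assuming linear-scale abundances. The taxon names in the usage example and the `how`/`compare` option names are hypothetical; the text leaves open which combination (sum or product) and which comparison (difference or ratio) is used in a given embodiment.

```python
import math
from typing import Mapping, Sequence


def combine(values: Sequence[float], how: str) -> float:
    """Combine abundances by addition ("sum") or multiplication ("product")."""
    if how == "sum":
        return math.fsum(values)
    if how == "product":
        return math.prod(values)
    raise ValueError(f"unknown combine mode: {how}")


def two_group_score(
    abundance: Mapping[str, float],
    increased: Sequence[str],
    decreased: Sequence[str],
    how: str = "sum",
    compare: str = "ratio",
) -> float:
    """Score = (combined first group) minus or divided by (combined second group)."""
    first = combine([abundance[t] for t in increased], how)   # step (a)
    second = combine([abundance[t] for t in decreased], how)  # step (b)
    if compare == "difference":                               # step (c)
        return first - second
    if compare == "ratio":
        if second == 0:
            raise ZeroDivisionError("second group combines to zero")
        return first / second
    raise ValueError(f"unknown compare mode: {compare}")


# Hypothetical linear abundances; "a" and "b" stand for taxa enriched in the
# dilated cervix, "c" and "d" for taxa enriched in the closed cervix.
sample = {"a": 4.0, "b": 6.0, "c": 2.0, "d": 3.0}
ratio_score = two_group_score(sample, ["a", "b"], ["c", "d"])  # (4 + 6) / (2 + 3)
```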
VI. Kits
The invention provides compositions and kits for practicing the methods described herein to assess the level of bacteria from one or more specific taxa in a pregnant subject, which can be used for various purposes such as determining the risk of having an adverse pregnancy or neonatal outcome.
Kits for carrying out assays for determining the RNA level of bacteria of a bacterial taxon of interest typically include at least one oligonucleotide useful for specific hybridization with at least one segment of a coding sequence of interest or its complementary sequence. Optionally, this oligonucleotide is labeled with a detectable moiety. In some cases, the kits may include at least two oligonucleotide primers that can be used in the amplification of at least one segment of a bacterial DNA or RNA transcript of interest by PCR, particularly by RT-PCR.
Kits for carrying out assays for determining the protein level of bacteria of a bacterial taxon of interest typically include at least one antibody useful for specific binding to the target protein amino acid sequence. Optionally, this antibody is labeled with a detectable moiety. The antibody can be either a monoclonal antibody or a polyclonal antibody. In some cases, the kits may include at least two different antibodies, one for specific binding to the target protein (i.e., the primary antibody) and the other for detection of the primary antibody (i.e., the secondary antibody), which is often attached to a detectable moiety.
Typically, the kits also include an appropriate standard control. The standard control indicates the average value of a target protein or a target mRNA expressed by bacteria from a specific bacterial taxon in the cervical epithelium of healthy pregnant subjects who are not at risk of having an adverse pregnancy or neonatal outcome. In some cases, such a standard control may be provided in the form of a set value. In addition, the kits of this invention may provide instruction manuals to guide users in analyzing test samples and assessing the risk of an adverse pregnancy event, such as preterm delivery, in a test subject.
EXAMPLES
The following examples are provided by way of illustration only and not by way of limitation. Those of skill in the art will readily recognize a variety of non-critical parameters that could be changed or modified to yield essentially the same or similar results.
Example 1. Differentially abundant bacterial taxa in the cervix of women with pregnancy-associated complications.
Background
To support the cervix and prolong pregnancy in women with advanced cervical dilation or premature cervical shortening, clinicians may place a cerclage (Owen, Hankins et al. 2009) or a cervical pessary. However, women with infection, such as intra-amniotic infection, have been shown to experience adverse pregnancy outcomes, including pregnancy loss, preterm birth and rupture of membranes, despite cerclage placement (Romero, Gonzalez et al., 1992). Specifically, among 33 women with cervical dilation ≥2 cm, intact membranes and no active labor between 14 and 24 weeks of gestation, 17 (51.5%) were found to have microbial invasion of the amniotic cavity. All patients with microbial invasion of the amniotic cavity had complications. Patients who underwent cervical cerclage in the presence of a positive amniotic fluid culture had rupture of membranes, clinical chorioamnionitis or pregnancy loss (Romero, Gonzalez et al., 1992). Hence, cerclage may not be beneficial to this subset of patients with advanced cervical dilation. Based primarily on consensus and expert opinion, it has been recommended that cerclage placement may be beneficial only if intra-amniotic infection is ruled out (American College of Obstetricians and Gynecologists. 2014. Cerclage for the management of cervical insufficiency. Practice Bulletin No. 142. Obstet Gynecol 2014; 123: 372-9). Similarly, it has been suggested that infection be ruled out before pessary placement, because trapping the fetus inside a highly infectious or inflammatory environment may lead to adverse neonatal outcomes, including cerebral palsy or brain damage of the fetus.
Currently, the detection of bacteria in a sample relies on culture, microscopy and bacterial species-specific polymerase chain reaction (PCR) assays, which offer only low to moderate sensitivity and hence are poorly suited to ruling out infection. For example, using amniotic fluid culture for aerobic and anaerobic bacteria as well as genital mycoplasmas, and PCR targeting Ureaplasma urealyticum and Ureaplasma parvum, 44 (76%) of 58 patients with advanced cervical dilation tested negative for bacteria by both culture and PCR (Oh, Lee et al., 2010). However, among these women who tested negative for bacteria, 63% (15/24) were eventually found positive for choriodeciduitis, 30% (7/23) for amnionitis and 25% (6/24) for funisitis; 62% (24/39) delivered a neonate with a 1-min Apgar score <4, and 41% (18/44) delivered a neonate who died within 1 day of birth (Oh, Lee et al., 2010). The prevalence of inflammation and poor neonatal outcome in these 44 patients with negative culture and PCR results suggests that many of these negative results may have been false negatives.
Furthermore, Ureaplasmas have also been detected in pregnancies with normal, uncomplicated outcomes (Gray, Robinson et al., 1992; Gerber, Vial et al., 2003; Perni, Vardhana et al., 2004). Ureaplasmas may therefore simply be part of the commensal bacteria, or normal flora, that also reside in the reproductive tract of women who deliver at term with a normal neonate. This highlights a limitation of many previous studies, which tested for only a few bacterial species selected by a candidate approach.
In contrast, we have systematically profiled the presence and relative abundance of essentially all bacterial taxa (>9,500 kinds/taxa of bacteria with known genomic sequences) in each cervical swab sample, obtained from women with and without advanced cervical dilation/shortening, using 16S ribosomal RNA (rRNA)-based massively parallel genomic sequencing. Our approach overcomes several limitations of those previous studies. First, it can universally detect essentially all bacterial taxa, rather than only a selected list of candidates. Second, it is independent of culture, and thus fastidious bacteria may also be detected. Third, it does not require live bacteria, since it is based on PCR, which can amplify genomic DNA fragments from dead or live bacteria. Fourth, it investigates a non-invasive sample type, the cervical swab, which is readily obtainable from any pregnant woman with or without complications. This facilitates appropriate matching with a normal control group and data comparison between the disease and control groups, unlike amniotic fluid, which requires invasive procedures that risk fetal loss and is not recommended in normal pregnancy unless there is an indication. Fifth, the cervical swab is readily obtainable from any pregnant woman at any gestational age. This facilitates not only matching with a control group but also earlier sampling, and hence earlier detection and treatment, unlike amniotic fluid or the placenta, which are usually obtained after 14 weeks or 37 weeks of gestation, respectively.
There are many publications utilizing 16S ribosomal RNA (rRNA) -based massively parallel genomic sequencing to profile bacterial communities in many anatomical sites from "normal and healthy" humans with no complications. These include publications on the bacterial communities in vaginal swab samples of nonpregnant women or "normal and healthy" pregnant women (Ravel, Gajer et al., 2011; Aagaard, Riehle et al., 2012, Gajer, Brotman et al., 2012) . However, few have analyzed samples of women with pregnancy complications (Hummelen, Fernandes et al., 2010) .
Specifically, one was a study on the bacterial communities in the vagina of HIV-positive women, which did not provide a dataset applicable to the major obstetric population without HIV (Hummelen, Fernandes et al., 2010). Another was a study on the bacterial communities in placental samples delivered before or after 37 weeks, which did not allow a gestational age-matched comparison and thus involved a dataset confounded by gestational age difference (Aagaard, Ma et al., 2014). Thus, to date, there is no publication on the systematic profiling of bacterial taxa in the cervix of women with such adverse outcomes and comparison with those in the cervix of gestational age-matched women without such complications.
Our current study provides the first systematic and relatively comprehensive profile of the bacterial communities in cervical swab samples of pregnant women with complications (advanced cervical dilation or shortening). Moreover, for the first time, we systematically compare this profile in women with complications with that of gestational age-matched women without such complications. Furthermore, for the first time, we provide a list of bacterial taxa that are differentially abundant in the cervix of women with such complications compared with that of women without such complications. This list is important groundwork for designing PCR assays that specifically target the "abnormal flora" in a gravid cervix, but not the "normal flora." Hence, we reason that our study provides data for improving the detection of bacteria associated with advanced cervical dilation.
Notably, the presence of selected members of this list of bacterial taxa between 13 and 25 gestational weeks is associated with, or predictive of, adverse outcomes, including birth before 34 weeks and intraventricular hemorrhage. Hence, we also reason that our list of bacterial taxa may provide early prognosis of these adverse outcomes, so as to facilitate early close monitoring or transfer to a tertiary hospital with proper neonatal care.
The purpose of the study described herein was to systematically profile all bacteria and to identify a list of bacterial taxa, and their partial genomic sequences, that are differentially abundant in the cervix of women with advanced cervical dilation or cervical shortening. Specifically, the study was designed (i) to systematically profile the bacterial taxa in the cervix of women with advanced cervical dilation ("the dilated cervix") and compare them with those in the cervix of appropriately matched women without this condition ("the closed cervix"); (ii) to systematically profile the bacterial taxa in the cervix of women with cervical shortening ("the shortened cervix") and compare them with those in the cervix of appropriately matched women without this condition ("the normal-length cervix"); (iii) to systematically identify a list of bacterial taxa differentially abundant in the dilated cervix relative to the closed cervix, using the data in (i); and (iv) to systematically identify a list of bacterial taxa differentially abundant in the shortened cervix relative to the normal-length cervix, using the data in (ii).
We hypothesized that the bacterial communities colonizing the dilated cervix differ from those colonizing the normally closed cervix. To test this hypothesis, we systematically profiled the bacterial taxa in the advanced dilated cervix ("the dilated cervix", n=19) and in the normally closed cervix ("the closed cervix", n=13) using 16S ribosomal RNA (rRNA)-based massively parallel genomic sequencing. To systematically test for differentially abundant taxa between the two groups, we compared the relative abundance of all profiled taxa in the dilated cervix group with that in the closed cervix group using appropriate statistical procedures. To test whether members of this list of bacterial taxa can predict adverse pregnancy or neonatal outcomes, we followed up these pregnancies from recruitment until 28 days after birth and calculated the sensitivity, specificity, positive predictive value and negative predictive value for predicting those outcomes whenever outcome data were available.
Similarly, we systematically profiled the bacterial taxa in the shortened cervix (n=11) and compared them with those from gestational age-matched controls with a normal-length cervix (n=11). We also identified a list of differentially abundant taxa between the shortened cervix and normal-length cervix groups using statistical procedures, and followed up the clinical outcomes. Whenever outcome data were available, we also assessed the predictive performance of selected members of that list.
Methods and Results
Recruitment of participants. This study was conducted with ethics approval from the respective institutional review board, and samples were collected from pregnant women with informed consent. In the test group (the dilated cervix group, n=19), only pregnant women with painless advanced cervical dilation >0.8 cm were recruited.
In the reference group (the closed cervix group, n=13), by contrast, only pregnant women without painless advanced cervical dilation >0.8 cm were recruited. Furthermore, to allow fair comparison of results and clinical outcomes, only women (i) with no regular and frequent before 34 gestational weeks and (ii) with an indication to undergo cervical cerclage placement were recruited to both the test and reference groups. The key characteristics of these 32 participants (19 in the dilated cervix group and 13 in the closed cervix group) are listed in Table 1.
To avoid confounding the phenotype (i.e., advanced cervical dilation), pregnancies involving preeclampsia, multiple pregnancy, fetal distress, growth restriction, or chromosomal or structural abnormalities at the time of sample collection were excluded from both the test and reference groups. To minimize some major confounding factors affecting the bacterial communities in the cervix, participants who had sexual activity or used any vaginal applications (e.g., vaginal medication or suppositories, douche) within 48 hours before sample collection, who had taken antibiotic or antimycotic drugs within 30 days before sample collection, or who had an ovarian tumor were also excluded. A history of previous miscarriage or birth between 14 and 36 weeks (n=3, all in the dilated cervix group; Fisher exact test, p=0.253), previous surgical evacuation for miscarriage or termination of pregnancy (n=16; 9 dilated and 7 closed; p=1.00), cervical surgery (n=6; 2 dilated and 4 closed; p=0.194), uterine abnormality (n=6; 3 dilated and 3 closed; p=0.666) or recent abnormal vaginal discharge (n=5; 4 dilated and 1 closed; p=0.625) was documented in our database. These factors did not differ significantly between the two groups but were taken into account when interpreting the data.
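The group comparisons above use the two-sided Fisher exact test on 2x2 tables (group by presence/absence of a factor). A minimal standard-library sketch is shown below; it sums the hypergeometric probabilities of all tables with the observed margins that are no more likely than the observed table. The function name and the small relative tolerance for ties are choices made here, not details from the text.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Rows are the two groups (e.g. dilated, closed); columns are factor
    present / absent. Sums the hypergeometric probabilities of every table
    with the same margins that is no more likely than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # Probability that group 1 contains x of the col1 "present" subjects.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance so tables tied with the observed one are counted.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
```

Applied to the counts reported above with 19 dilated and 13 closed participants, e.g. `fisher_exact_two_sided(3, 16, 0, 13)` for previous miscarriage or mid-trimester birth and `fisher_exact_two_sided(2, 17, 4, 9)` for cervical surgery, this sketch gives p≈0.253 and p≈0.194, consistent with the values in the text.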
Collection and DNA extraction of cervical swab samples: To minimize the chance of contamination from the environment, the clinical staff or other parts of the female reproductive tract, each cervical swab sample was collected using a Calgiswab Type III (Puritan, Guilford, Maine, USA) immediately after the female reproductive tract was opened with the speculum and before any other procedure. To ensure that the same anatomical location was sampled and compared, each cervical swab sample was collected from a fixed position on the peripheral side (the 12 o'clock position facing the clinician) of the external os. To maintain consistency for fair comparison across all samples, a single clinician collected all samples from the dilated cervix and closed cervix groups. To minimize variation in collection, each swab was rotated 360 degrees once. To minimize any increased risk of infecting the participant or her fetus, sterile swabs were used and the cervical mucus plug was not touched. During swab collection, extra care was also taken not to touch the labia or any part of the female reproductive tract other than the external os. To monitor for bacterial contamination from the operating room, the reagents and the collection procedures, a negative control swab was collected in parallel with each cervical swab but without touching the patient.
The cervical swabs and the negative control swabs were immersed in sterile, nuclease-free water and stored at -80℃ until extraction. Genomic DNA was extracted from the swabs using an established method (Method 1 in Yuan, Cohen et al., 2012), which ensures fair representation of the bacterial communities commonly found in the female reproductive tract. This method involves pretreatment of the sample with mutanolysin (Sigma-Aldrich) followed by column-based DNA extraction (QiaAmp DNA Mini Kit, Qiagen). To minimize batch variation, all samples were extracted on the same day.
PCR amplification and massively parallel sequencing: Since the cervical swab samples inevitably contain human genomic DNA in addition to bacterial genomic DNA, we specifically amplified the 16S rRNA gene, which is present in all bacteria but not in humans. To amplify the genomic DNA sequences of essentially all bacteria, we chose a pair of PCR primers, namely V4 and V5, which are complementary to highly conserved regions of the 16S rRNA gene (Claesson, Wang et al. 2010). Using the Ribosomal Database Project (RDP) (Wang, Garrity et al. 2007), the largest public database of 16S rRNA sequences, we checked that this primer pair could theoretically amplify >97% of >9,200 typed (established) bacterial taxa with known 16S rRNA genomic sequences. Therefore, this primer pair is applicable for systematic and unbiased profiling of bacterial communities in this study.
We amplified the genomic DNA extracted from each swab sample using the V4-V5 PCR primer pair (Claesson, Wang et al. 2010), which flanks the hypervariable regions V4 and V5 of the 16S rRNA gene. The sequences of the forward and reverse primers are 5'-[Primer A Key sequence][MID sequence]AYT GGG YDT AAA GNG-3' (SEQ ID NO: 1) and 5'-[Primer B Key sequence]CCG TCA ATT YYT TTR AGT TT-3' (SEQ ID NO: 2), respectively, where the Primer A Key sequence, Primer B Key sequence and MID sequence are as described in the "454 Sequencing System Guidelines for Amplicon Experimental Design July 2011" for the massively parallel sequencing platform GS-FLX 454 Titanium (Roche). Each PCR was performed as a 50-μL reaction with 2.5 units of FastStart Taq DNA polymerase (FastStart HiFi PCR System dNTPack, Roche), 4 mM MgCl2, 100 nM of each primer and 200 μM dNTPs. All PCRs were run on a PTC-100 thermal cycler (Bio-Rad) using the following thermocycling conditions: 95℃ for 2 minutes; 33 cycles of 95℃ for 30 seconds, 40℃ for 30 seconds and 72℃ for 1 minute; a final extension at 72℃ for 5 minutes; and 25℃ for 5 minutes. We then subjected the PCR products to electrophoresis. We confirmed a single PCR amplicon of the expected size for all 32 cervical swab samples and no PCR amplicon for any of the 32 corresponding negative controls, indicating that the environment, reagents and procedures were free from detectable contamination by bacterial 16S rRNA genomic sequences.
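The in-silico coverage check described above (testing whether the degenerate V4-V5 primers could amplify each known 16S rRNA sequence) can be illustrated with a toy version. This sketch assumes exact matching of the template-specific primer portions (key and MID adaptors omitted) with IUPAC ambiguity codes expanded into character classes and no mismatches allowed; the actual check was performed against RDP, and the function names and test templates here are hypothetical.

```python
import re
from typing import Dict

# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Complement of each IUPAC code (e.g. R = A/G pairs with Y = C/T).
COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A", "R": "Y", "Y": "R", "S": "S",
    "W": "W", "K": "M", "M": "K", "B": "V", "V": "B", "D": "H", "H": "D",
    "N": "N",
}

FORWARD = "AYTGGGYDTAAAGNG"       # template-specific part of SEQ ID NO: 1
REVERSE = "CCGTCAATTYYTTTRAGTTT"  # template-specific part of SEQ ID NO: 2


def to_regex(primer: str) -> "re.Pattern[str]":
    """Turn a degenerate primer into a regex of character classes."""
    return re.compile("".join(f"[{IUPAC[base]}]" for base in primer))


def reverse_complement(primer: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(primer))


def is_amplifiable(template: str, fwd: str = FORWARD, rev: str = REVERSE) -> bool:
    """True if fwd matches the template and rev's binding site lies downstream."""
    template = template.upper()
    hit = to_regex(fwd).search(template)
    if hit is None:
        return False
    # The reverse primer anneals to the opposite strand, so on the given strand
    # we look for its reverse complement after the forward site.
    return to_regex(reverse_complement(rev)).search(template, hit.end()) is not None


def coverage(templates: Dict[str, str]) -> float:
    """Fraction of template sequences predicted to be amplified."""
    hits = sum(is_amplifiable(seq) for seq in templates.values())
    return hits / len(templates)
```

A real coverage analysis would typically also tolerate a small number of mismatches near the primer's 5' end; exact matching is the simplest, most conservative assumption.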
Subsequently, we purified all 32 PCR products, which were derived from the 32 cervical swab samples and carried the multiplex identifier (MID) and adaptor sequences incorporated through the 5' ends of the PCR primers above, according to Roche's recommended instructions. The purified products were subjected to massively parallel genomic sequencing using the GS-FLX 454 Titanium (Roche) according to the manufacturer's instructions, targeting an average of approximately 10,000 raw sequencing reads per sample.
For each sample, raw sequencing data were denoised at the flowgram level using an implementation of Pyronoise (Quince, Lanzen et al., 2011) in mothur (Schloss, Westcott et al., 2009). Raw reads were flowgram-denoised, quality- and length-filtered, chimera-removed, aligned, pre-clustered and clustered into operational taxonomic units, which were then taxonomically classified based on the Ribosomal Database Project (RDP) training set (v9, 2012).
Systematic identification of differentially abundant bacterial taxa between the dilated cervix group (n=19) and the closed cervix group (n=13): After all the above analytical steps, we observed 342 bacterial taxa in the 32 cervical swab samples. To normalize the varying read counts across samples, we performed random subsampling using mothur (Schloss, Westcott et al. 2009) so that each of the 32 samples contained 7,594 processed reads for further analysis. Alternatively, we also normalized by expressing the read count of each taxon in a sample as a ratio of the total read count of that sample.
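The two normalizations above, random subsampling to a fixed depth (7,594 reads per sample here) and expressing each taxon as a ratio of the sample total, can be sketched as follows. This is not mothur's implementation; it assumes subsampling of individual reads without replacement, and the function names and seed handling are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, Optional


def rarefy(counts: Dict[str, int], depth: int, seed: Optional[int] = None) -> Dict[str, int]:
    """Randomly subsample reads without replacement down to a fixed depth.

    Taxa that lose all their reads during subsampling are absent from the result.
    """
    total = sum(counts.values())
    if depth > total:
        raise ValueError(f"depth {depth} exceeds the {total} reads in the sample")
    pool = [taxon for taxon, n in counts.items() for _ in range(n)]  # one entry per read
    kept = random.Random(seed).sample(pool, depth)
    return dict(Counter(kept))


def relative_abundance(counts: Dict[str, int]) -> Dict[str, float]:
    """Read count of each taxon as a ratio of the sample's total read count."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


# Hypothetical sample with 8,100 processed reads, subsampled to 7,594.
sample = {"taxon_A": 5000, "taxon_B": 3000, "taxon_C": 100}
subsampled = rarefy(sample, 7594, seed=1)
```

Subsampling discards reads but keeps counts as integers, which suits count-based tests; the ratio approach keeps all reads but yields proportions.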
To identify differentially abundant taxa between the dilated cervix group and the closed cervix group, we used Metastats, a statistical test specifically designed for this type of sequencing data (White, Nagarajan et al. 2009). In essence, Metastats applies a non-parametric t-test and, as a heuristic, uses the Fisher exact test when a taxon appears at an average of fewer than 1 read per sample (the so-called sparse count problem, which makes it challenging to detect significant changes in this type of data).
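The non-parametric t-test at the core of Metastats is a permutation test: a t-statistic is computed for the observed grouping and compared against its distribution under relabelled groupings. The simplified sketch below enumerates every relabelling exactly (practical only for small groups) and uses a Welch-type statistic; Metastats itself uses random permutations and adds the sparse-count Fisher heuristic, neither of which is reproduced here.

```python
import math
import statistics
from itertools import combinations
from typing import Sequence


def welch_t(x: Sequence[float], y: Sequence[float]) -> float:
    """Welch-type t-statistic; each group needs at least 2 values."""
    diff = statistics.mean(x) - statistics.mean(y)
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    if se == 0:
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def permutation_p_value(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sided p-value from exhaustively relabelling samples between groups."""
    pooled = list(x) + list(y)
    observed = abs(welch_t(x, y))
    n, k = len(pooled), len(x)
    extreme = total = 0
    for idx in combinations(range(n), k):
        chosen = set(idx)
        gx = [pooled[i] for i in idx]
        gy = [pooled[i] for i in range(n) if i not in chosen]
        total += 1
        if abs(welch_t(gx, gy)) >= observed - 1e-12:  # tolerance for float ties
            extreme += 1
    return extreme / total


# Hypothetical relative abundances of one taxon in two groups of three samples.
p = permutation_p_value([0.9, 0.8, 0.85], [0.1, 0.2, 0.15])
```

With three samples per group there are only 20 relabellings, so even perfect separation cannot give a p-value below 2/20; this is one reason the text applies a multiple-testing correction across many taxa rather than relying on very small per-taxon p-values.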
After removal of singleton sequences (sequences of a taxon that appeared only once) and adjustment for multiple testing using the False Discovery Rate (FDR) method at FDR <5% (Storey and Tibshirani, 2003), 16 taxa remained statistically significantly different (q<0.05). Among them, 9 taxa (Table 2A) were significantly increased and 7 taxa (Table 2B) were significantly decreased in the dilated cervix group relative to the closed cervix group. The nearest taxonomic classifications of these 16 genomic sequences, from kingdom to genus, are listed in Tables 3A and 3B. The 16S rRNA genomic sequences are listed in Tables 4A and 4B. The nearest species classifications of these 16 genomic sequences, based on BLAST nucleotide alignment against the 16S ribosomal RNA database (performed using the NCBI BLAST website in June 2014), are listed in Tables 4E and 4F.
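As a rough illustration of FDR adjustment, the sketch below computes Benjamini-Hochberg adjusted p-values. This is a substitution, not the Storey and Tibshirani procedure cited above: Storey's q-values additionally estimate the proportion of true null hypotheses (π0), and with π0 set to 1 they reduce to the Benjamini-Hochberg values computed here, so this version is the more conservative of the two.

```python
from typing import List, Sequence


def bh_adjust(p_values: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (Storey q-values with pi0 = 1)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-taxon p-values; taxa with adjusted value < 0.05 are called.
adjusted = bh_adjust([0.01, 0.04, 0.03, 0.20])
significant = [i for i, q in enumerate(adjusted) if q < 0.05]
```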
Systematic identification of differentially abundant bacterial taxa between the dilated cervix group (n=10) and the closed cervix group (n=10) matched 1:1 by the nearest gestational week at sample collection: The gestational age at sample collection did not differ significantly between the two groups. However, to minimize any effect of gestational age on our analysis, we matched each woman in the dilated cervix group with a woman in the closed cervix group by the nearest gestational week (within 2 weeks) at sample collection. Moreover, to minimize any effect of different group sizes, we performed this matching at a 1:1 ratio. In total, 10 women in the dilated cervix group were matched with 10 women in the closed cervix group by gestational week at sample collection.
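The text does not state how the 1:1 matching was carried out. The sketch below assumes a simple greedy approach: all case-control pairs within 2 gestational weeks are sorted by distance, and the closest unused pairs are accepted first. The participant identifiers and gestational ages in the example are hypothetical.

```python
from typing import Dict, Set


def match_by_gestational_week(
    cases: Dict[str, float],
    controls: Dict[str, float],
    max_gap: float = 2.0,
) -> Dict[str, str]:
    """Greedy 1:1 matching of cases to controls by nearest gestational week."""
    # All eligible pairs, closest first; names break ties deterministically.
    pairs = sorted(
        (abs(case_week - ctrl_week), case, ctrl)
        for case, case_week in cases.items()
        for ctrl, ctrl_week in controls.items()
        if abs(case_week - ctrl_week) <= max_gap
    )
    matched: Dict[str, str] = {}
    used: Set[str] = set()
    for _, case, ctrl in pairs:
        if case not in matched and ctrl not in used:
            matched[case] = ctrl
            used.add(ctrl)
    return matched


# Hypothetical gestational weeks at sample collection.
dilated = {"D1": 20.0, "D2": 22.5, "D3": 30.0}
closed = {"C1": 21.0, "C2": 23.0, "C3": 15.0}
pairs = match_by_gestational_week(dilated, closed)  # D3 has no control within 2 weeks
```

Greedy matching is simple but not guaranteed to maximize the number of matched pairs; an optimal assignment method could be substituted if that matters.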
After adjustment for multiple testing using the False Discovery Rate (FDR) method at FDR <5% (Storey and Tibshirani, 2003), 20 taxa remained statistically significantly different (q<0.05). Among them, 15 taxa (Table 2C) were significantly increased and 5 taxa (Table 2D) were significantly decreased in the dilated cervix group relative to the closed cervix group. The nearest taxonomic classifications of these 20 genomic sequences, from kingdom to genus, are listed in Tables 3C and 3D. The 16S rRNA genomic sequences are listed in Tables 4C and 4D. The nearest species classifications of these 20 genomic sequences, based on BLAST nucleotide alignment against the 16S ribosomal RNA database (performed using the NCBI BLAST website in June 2014), are listed in Tables 4G and 4H.
Predictive performance of selected members of the lists of differentially abundant bacterial taxa for various outcomes. We selected all 9 taxa from Table 2A and all 15 taxa from Table 2C to construct two tests, namely the Differentially Increased Bacteria Test (DIBT) 1 and DIBT2, respectively.
DIBT1 and DIBT2 may use massively parallel genomic sequencing data or, more preferably, data from species-specific PCR to detect the presence of a given bacterial taxon. If any of these 9 or 15 differentially increased taxa was present in a sample, the sample was defined as DIBT1 positive or DIBT2 positive, respectively. Otherwise, if all of these 9 or 15 taxa were absent from a sample, the sample was defined as DIBT1 negative or DIBT2 negative, respectively.
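The DIBT rule above (count the panel taxa detected in a sample; positive if at least one is present) can be sketched as follows. The detection threshold `min_reads` and the panel taxon names are hypothetical; the text only requires a taxon to be "present", whether detected by sequencing reads or by species-specific PCR.

```python
from typing import Iterable, Mapping


def dibt_score(read_counts: Mapping[str, int], panel: Iterable[str], min_reads: int = 1) -> int:
    """Number of panel taxa detected in the sample (the cervical microbe score)."""
    return sum(1 for taxon in panel if read_counts.get(taxon, 0) >= min_reads)


def dibt_positive(read_counts: Mapping[str, int], panel: Iterable[str], min_reads: int = 1) -> bool:
    """DIBT positive if at least one panel taxon is present."""
    return dibt_score(read_counts, panel, min_reads) >= 1


# Hypothetical three-taxon panel standing in for the 9 (DIBT1) or 15 (DIBT2) taxa.
panel = ["taxon_1", "taxon_2", "taxon_3"]
sample_a = {"taxon_2": 4, "other_taxon": 100}
sample_b = {"other_taxon": 100}
```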
To explore the association between DIBT1 or DIBT2 and adverse pregnancy or neonatal outcomes, we performed the Fisher exact test. To explore the potential of DIBT1 or DIBT2 in predicting key pregnancy and neonatal outcomes, we calculated the true positives, false positives, false negatives and true negatives of DIBT1 or DIBT2 in predicting these outcomes (Tables 5A and 5B, respectively) . Also, we calculated the sensitivity, specificity, positive and negative predictive values of DIBT1 and DIBT2 in predicting these outcomes (Tables 5A and 5B, respectively) .
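The sensitivity, specificity and predictive values follow directly from the counts of true positives, false positives, false negatives and true negatives. A minimal sketch is shown below; returning NaN for an empty denominator is a convention chosen here. The example uses the LA6 figures reported in Example 2 (9 of 10 patients with sPTB after intervention called positive, 0 of 15 false positives).

```python
import math
from typing import Dict


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Sensitivity, specificity, PPV and NPV from a 2x2 confusion table."""

    def ratio(num: int, den: int) -> float:
        return num / den if den else math.nan  # undefined when nothing to divide

    return {
        "sensitivity": ratio(tp, tp + fn),  # positives among those with the outcome
        "specificity": ratio(tn, tn + fp),  # negatives among those without it
        "ppv": ratio(tp, tp + fp),          # outcome among test positives
        "npv": ratio(tn, tn + fn),          # no outcome among test negatives
    }


la6 = diagnostic_metrics(tp=9, fp=0, fn=1, tn=15)
```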
There were significant associations between DIBT1 and advanced cervical dilation (Fisher exact test, p=0.028), spontaneous preterm birth <28 weeks (p=0.0023), spontaneous preterm birth <34 weeks (p=0.00065), preterm birth <28 weeks (p=0.0057), preterm birth <34 weeks (p=3.2×10⁻⁶) and intraventricular hemorrhage (p=0.01). Of note, all 7 cases of spontaneous preterm birth were DIBT1 positive (no false negatives). Also of note, all 13 cases classified as DIBT1 positive underwent preterm birth <34 weeks (no false positives). None of these preterm births was due to preeclampsia, intrauterine growth restriction or fetal chromosomal abnormality. Thus, we speculate that these 13 preterm births may have been triggered by infection with one or more of the "abnormal flora" that we identified in this study and selected for inclusion in DIBT1.
For DIBT2, significant associations were observed with spontaneous preterm birth <28 weeks (p=0.0023), spontaneous preterm birth <34 weeks (p=0.00065), preterm birth <28 weeks (p=0.0057) and preterm birth <34 weeks (p=0.0032). Of note, all 7 cases of spontaneous preterm birth <34 weeks were DIBT2 positive (no false negatives).
In another study (study B), we performed a similar analysis on cervical swab samples collected at matched gestational ages from women with a short cervix or with a normal-length cervix (Table 6). For each sample, DNA was extracted, PCR-amplified with the V4-V5 primers and subjected to GS-FLX 454 sequencing. After adjustment for multiple testing using the False Discovery Rate (FDR) method at FDR <5% (Storey and Tibshirani, 2003), 24 taxa remained statistically significantly different (q<0.05). Among them, 17 taxa (Table 7) were significantly increased and 7 taxa (not shown) were significantly decreased in the short cervix group relative to the normal-length cervix group. The 16S rRNA genomic sequences are listed in Table 8. The nearest species classifications of these 17 genomic sequences, based on BLAST nucleotide alignment against the 16S ribosomal RNA database (performed using the NCBI BLAST website in June 2014), are listed in Table 9.
Predictive performance by selected members of the lists of differentially abundant bacterial taxa on various outcomes. We selected all 17 taxa from Table 7 to construct another test, namely the Differentially Increased Bacteria Test (DIBT) 3.
DIBT3 may use massively parallel genomic sequencing data or, more preferably, data from species-specific PCR to detect the presence of a given bacterial taxon. If any of these 17 differentially increased taxa was present in a sample, the sample was defined as DIBT3 positive. Otherwise, if all of these 17 taxa were absent from a sample, the sample was defined as DIBT3 negative. The association between DIBT3-positive results and adverse pregnancy or neonatal outcomes is tabulated in Table 10.
There were significant associations between DIBT3 and premature cervical dilation (Fisher exact test, p=0.00022), spontaneous preterm birth <34 weeks (p=0.049) and preterm birth <34 weeks (p=0.049). Of note, all 3 cases of spontaneous preterm birth were DIBT3 positive (no false negatives).
The bacterial markers provided herein predicted adverse pregnancy and neonatal outcomes in these cohorts based on a molecular test performed as early as 13 weeks of gestation (Tables 5A, 5B and 10). The method described herein facilitates early intervention, such as close monitoring or timely transfer to a tertiary treatment unit with neonatal intensive care.
Example 2: Cervical microbiome signature for the identification of cervical insufficiency patients resulting in spontaneous preterm birth after clinical intervention
Bacterial taxa colonizing the cervices of cervical insufficiency (CI) patients who respond differently to clinical intervention (cerclage/pessary) have not been systematically investigated. Using massively parallel sequencing, we interrogated the abundances of over 9,600 taxa per cervical swab sample, obtained before intervention from serially recruited singleton-pregnancy CI patients and appropriately matched women without CI. We observed that the cervical microbiomes of the CI patients were altered compared with those of the non-CI controls. Notably, we identified 6 differentially abundant taxa in patients resulting in “spontaneous preterm birth (<34 weeks, sPTB) after intervention”, compared with those resulting in “term birth (≥37 weeks) after intervention”. Using log10(total abundance of these 6 taxa), LA6, >1.15 to define a positive result, we correctly classified all but one of the patients resulting in “sPTB after intervention” (9/10=90%), with no false positives (0/15=0%). LA6-positive patients remained undelivered for a shorter period after intervention than LA6-negative patients [median number of days between intervention and delivery, 10 days vs. 126 days; log-rank test, p<0.00001; hazard ratio, 6.34; 95% confidence interval, 1.51 to 26.6]. Moreover, LA6-positive patients delivered earlier than their LA6-negative counterparts [median gestational age at delivery, 23.7 weeks vs. 38.4 weeks; log-rank test, p<0.0001; hazard ratio, 6.24; 95% confidence interval, 1.50 to 25.9]. Our study highlights the potential use of the pre-intervention cervical microbiome to provide prognostic information on the pregnancy after cerclage/pessary intervention.
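The comparisons of time to delivery between LA6-positive and LA6-negative patients use the log-rank test. A minimal two-group version is sketched below, assuming delivery is the event and patients still undelivered at last follow-up are censored; the hazard ratios and confidence intervals reported above would come from a separate model and are not computed here. The example times are hypothetical.

```python
import math
from typing import Sequence, Tuple


def log_rank_test(
    times_a: Sequence[float], events_a: Sequence[bool],
    times_b: Sequence[float], events_b: Sequence[bool],
) -> Tuple[float, float]:
    """Two-group log-rank test; returns (chi-square statistic, p-value) with 1 df."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)] + \
           [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n = sum(1 for tt, _, _ in data if tt >= t)                # at risk, both groups
        n_a = sum(1 for tt, _, g in data if tt >= t and g == 0)   # at risk, group A
        d = sum(1 for tt, e, _ in data if tt == t and e)          # events at time t
        d_a = sum(1 for tt, e, g in data if tt == t and e and g == 0)
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:  # hypergeometric variance of group A's events at time t
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    stat = (observed_a - expected_a) ** 2 / variance
    p = math.erfc(math.sqrt(stat / 2))  # upper tail of chi-square with 1 df
    return stat, p


# Hypothetical days from intervention to delivery; all deliveries observed.
stat, p = log_rank_test([1, 2, 3], [True] * 3, [4, 5, 6], [True] * 3)
```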
Introduction
Cervical insufficiency (CI) is a risk factor for preterm birth (PTB), which is associated with neonatal morbidity and perinatal death. In affected women it manifests as a prematurely dilated (cervical dilation, 1 cm-5 cm) or shortened (cervical length <25 mm) cervix in the second, rather than the third, trimester. Clinical intervention by placement of a surgical cerclage or cervical pessary (Shirodkar, Antiseptic, 52, 299-300 (1955); Cross, Lancet, 274, 127 (1959)) on the weakened cervix of CI patients has been shown to decrease the rate of PTB <28 weeks (Pereira et al., Am J Obstet Gynecol 197, 483.e1-8 (2007)) or PTB <34 weeks (Althuisius et al., Am J Obstet Gynecol 189, 907-910 (2003); Goya et al., Lancet 379, 1800-1806 (2012)), to increase the rate of neonatal survival (3), and to prolong the interval between presentation and delivery (Pereira et al., Am J Obstet Gynecol 197, 483.e1-8 (2007)).
Nevertheless, cervical intervention does not benefit every CI patient. In particular, CI patients with intraamniotic infection (IAI) had a 4-fold higher rate of PTB <34 weeks after cerclage intervention than those receiving the same intervention without IAI (Romero et al., Am J Obstet Gynecol 167, 1086-1091 (1992)). Patients who received cerclage only after IAI had been ruled out had a lower rate of PTB <34 weeks than those who received the intervention without testing for IAI (Mays et al., Obstet Gynecol 95, 652-655 (2000)). Since IAI is highly prevalent (38%-51%) in CI patients (Romero et al., Am J Obstet Gynecol 167, 1086-1091 (1992); Mays et al., Obstet Gynecol 95, 652-655 (2000)), experts have suggested that clinicians consider ruling out IAI by pre-cerclage amniocentesis to detect microorganisms (Berghella et al., Am J Obstet Gynecol 209, 181-192 (2013); Airoldi et al., Am J Perinatol 26, 63-68 (2009)). However, amniocentesis is invasive and carries a small but finite risk of fetal loss. We therefore investigated cervical colonization in CI patients and correlated the cervical microbiome with the outcomes of cervical intervention. The results suggest that cervical swab sampling may be a relatively non-invasive alternative to amniocentesis in CI patients.
Massively parallel sequencing (MPS) has enabled a culture-independent, and hence more sensitive and comprehensive, view of the microorganisms colonizing different body sites. The placenta, which is located inside the amniotic cavity, has conventionally been thought to be sterile. Contrary to this view, an MPS-based microbiome and metagenomic study revealed that the placenta harbors a low-biomass microbiome that varies in association with a remote history of maternal antenatal infection and with preterm birth (Aagaard et al., Sci Transl Med 6, 237ra65 (2014)). The present study was performed to investigate the association between the antenatal cervical microbiome and preterm birth in CI patients.
The high-resolution data generated by MPS, and the quantitative information that can be inferred from them, have also broadened our knowledge of microorganisms in women’s health. Five major classes of bacterial communities (community groups) have been observed in the vaginal tract of reproductive-age non-pregnant women. In women with an increased vaginal proportion of non-Lactobacillus species, as commonly seen in community group IV, Nugent scores (a score commonly used to diagnose bacterial vaginosis) were also increased (Ravel et al., Proc Natl Acad Sci U S A 108, 4680-4687 (2011)). Notably, women with bacterial vaginosis were more likely to have a dilated cervix (adjusted odds ratio, 4.9; 95% confidence interval, 2.2-10.9) (Kilpatrick et al., Am J Obstet Gynecol 194, 1168-1176 (2006)). A longitudinal study of the vaginal microbiome in non-pregnant women revealed that some bacterial communities change markedly over short time periods, whereas others are relatively stable (Gajer et al., Sci Transl Med 4, 132ra52 (2012)). Ecological theory suggests that less stable communities are more susceptible to invasion by pathogenic organisms (Dunstan et al., Ecology 87, 2842-2850 (2006)). In this study, the cervical microbiomes of CI patients were compared with those of appropriately matched women without CI.
Results
We recruited 34 cervical insufficiency (CI) patients and obtained a cervical swab sample from each before cerclage/pessary treatment. At the time of cervical swab sampling, all of these second-trimester women with singleton pregnancies had: (i) painless advanced cervical dilation (1.5 cm-5.0 cm) and/or cervical shortening (cervical length <25 mm); (ii) intact membranes; and (iii) no labour contractions.
All these CI patients received cerclage or pessary intervention for a prematurely dilated or shortened cervix, respectively. The patients were followed up until one month after delivery. Nine pregnancies complicated by iatrogenic PTB due to preeclampsia, fetal distress, growth restriction, or fetal chromosomal or structural abnormalities were excluded. Among the 25 remaining CI patients, 15 had term births (TB, delivered at or after 37 weeks of gestation) after intervention and 10 had spontaneous preterm births (sPTB, delivered before 34 weeks) after intervention (FIG. 1A). Of these, 7 involved neonatal morbidity, including respiratory distress syndrome (RDS), bronchopulmonary dysplasia (BPD), intraventricular haemorrhage (IVH) and retinopathy of prematurity (ROP), or perinatal mortality (FIG. 1A).
Profiling bacterial taxa by massively parallel sequencing
Each cervical swab sample collected before the clinical intervention was subjected to extraction of bacterial DNA, PCR amplification of the 16S ribosomal RNA (rRNA) gene, and massively parallel sequencing of the amplicon (Titanium, GS-FLX 454, Roche). The PCR primers, which target the V4 and V5 regions of the 16S rRNA gene, can amplify the 16S rRNA genes of over 9,600 well-established bacteria with known 16S rRNA sequences (Claesson et al., Nucleic Acids Res 38, e200 (2010)). This scope is wider than that of most published microbiome studies of the female reproductive tract.
To minimize spurious detection of “new” taxa arising from sequencing errors, the raw sequencing reads (0.68 million) were denoised and quality-filtered into processed reads (0.54 million) using well-established methods (Schloss et al., Appl Environ Microbiol 75, 7537-7541 (2009); Quince et al., Nat Methods 6, 639-641 (2009)). On average, each of the 25 samples was sequenced to a depth of about 22,000 processed high-quality reads. This sequencing depth is greater than that of most published microbiome studies of the female reproductive tract.
Next, processed reads from all samples sharing at least 97% sequence identity were clustered into one operational taxonomic unit (Otu, i.e., a bacterial taxon). In total, 152 bacterial taxa were detected across the 25 cervices. To allow fairer comparison across samples sequenced to different read counts, we performed Cumulative Sum Scaling (CSS) normalization (Paulson et al., Nat Methods 10, 1200-1202 (2013)) and expressed the abundance of each taxon (unit: read counts) as its CSS-normalized read count on a log10 scale (i.e., the log(abundance) value).
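The normalization step can be illustrated with a minimal, pure-Python approximation of Cumulative Sum Scaling in the spirit of Paulson et al. (2013). It is not the implementation used in the study. It assumes a fixed quantile p = 0.5 (the published method chooses the quantile from the data), a scaling constant of 1000, and a pseudocount of 1 before the log10 transform so that an undetected taxon maps to a log(abundance) of 0. The function names are hypothetical.

```python
import math


def quantile(values, p):
    """Linear-interpolation quantile (the same rule as NumPy's default)."""
    s = sorted(values)
    pos = p * (len(s) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def css_normalize(counts, p=0.5, scale=1000.0):
    """Cumulative-sum-scale each sample's taxon counts.

    counts: {sample_id: [count_taxon_1, count_taxon_2, ...]} with the same
    taxon order in every sample; each sample needs at least one non-zero count.
    The scaling factor of a sample is the sum of its counts at or below the
    p-th quantile of its non-zero counts.
    """
    normalized = {}
    for sample, row in counts.items():
        nonzero = [c for c in row if c > 0]
        q = quantile(nonzero, p)
        factor = sum(c for c in row if c <= q)  # zero counts add nothing
        normalized[sample] = [c / factor * scale for c in row]
    return normalized


def log_abundance(normalized):
    """log10(x + 1): an undetected taxon (count 0) gets a log(abundance) of 0."""
    return {s: [math.log10(x + 1) for x in row] for s, row in normalized.items()}


counts = {"A": [0, 10, 20, 30, 40], "B": [5, 0, 5, 100, 900]}
norm = css_normalize(counts)
print([round(x, 1) for x in norm["A"]])  # [0.0, 333.3, 666.7, 1000.0, 1333.3]
```

Because the scaling factor only sums counts up to a low quantile, a single dominant taxon (such as the 900 reads in sample B) does not drive the normalization of the whole sample, which is the motivation for CSS over simple total-count scaling.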
Highly abundant bacterial taxa in cervical insufficiency patients
FIG. 1B shows the 10 most abundant bacterial taxa observed in the 10 “sPTB after intervention” cervices. In contrast to the healthy female reproductive tract, which is dominated by Lactobacilli, a member of the Gardnerella genus (Otu 4) was identified as the most abundant bacterial taxon in the “sPTB after treatment” cervices [FIG. 1B, row #1, i.e., the taxon with the greatest total log(abundance) value across the 10 “sPTB after intervention” cervices]. In fact, 7 of the 10 most abundant bacterial taxa in this group were classified as non-Lactobacillus genera: Gardnerella, two Sneathia taxa, Aerococcus, Megasphaera, Pseudomonas and Anaerococcus (FIG. 1B, rows #1 to #10, columns under “sPTB”).
In comparison, only 5 of the 10 most abundant bacterial taxa in the “TB after treatment” cervices were classified as non-Lactobacillus genera (FIG. 1C, rows #1 to #10, columns under “TB”). Notably, the 3 most abundant taxa were identified as Lactobacillus crispatus, L. iners and L. jensenii, which are known to dominate the healthy female reproductive tract (Ravel et al., Proc Natl Acad Sci U S A 108, 4680-4687 (2011)).
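The ranking rule used for FIG. 1B and FIG. 1C (the taxon with the greatest total log(abundance) across a group's cervices ranks first) can be illustrated with the short sketch below. The data layout, function names, taxon-to-genus mapping and values are hypothetical toy data, not study results.

```python
def top_taxa(log_abundance, group, n=10):
    """Rank taxa by their summed log(abundance) over the samples in `group`.

    log_abundance: {taxon_id: {sample_id: log10 value}}
    """
    totals = {taxon: sum(by_sample[s] for s in group)
              for taxon, by_sample in log_abundance.items()}
    return sorted(totals, key=totals.get, reverse=True)[:n]


def count_non_lactobacillus(taxa, genus_of):
    """Count ranked taxa whose assigned genus is not Lactobacillus."""
    return sum(1 for t in taxa if genus_of[t] != "Lactobacillus")


# Toy data (hypothetical values)
log_abundance = {
    "Otu4": {"s1": 3.0, "s2": 2.5},
    "Otu1": {"s1": 1.0, "s2": 0.0},
    "Otu11": {"s1": 2.0, "s2": 2.0},
}
genus_of = {"Otu4": "Gardnerella", "Otu1": "Lactobacillus", "Otu11": "Sneathia"}
ranked = top_taxa(log_abundance, ["s1", "s2"], n=2)
print(ranked)  # ['Otu4', 'Otu11']
print(count_non_lactobacillus(ranked, genus_of))  # 2
```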
Differentially abundant taxa in patients with different responses to intervention
Importantly, we identified 7 bacterial taxa that were differentially abundant between the “sPTB after treatment” and the “TB after treatment” groups (FIG. 1D). The log(abundance) values of the 7 taxa, namely Sneathia (Otu 11), Parvimonas (Otu 16), Ureaplasma (Otu 56), Atopobium (Otu 42), Peptoniphilus (Otu 28), Megasphaera (Otu 47) and Paraeggerthella (Otu 40), were higher in the former group (Mann-Whitney rank sum test, p<0.05; multiple testing correction by the False Discovery Rate (FDR) method, FDR<5%) (FIG. 1D, rows #1 to #7, last column). Strikingly, the latter 6 of these taxa were observed only in the “sPTB after treatment” group and not in the “TB after treatment” group [FIG. 1D, rows #2 to #7: many log(abundance) values under the “sPTB” columns are >0, whereas all log(abundance) values under the “TB” columns are 0].
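The two statistical steps named above, a per-taxon Mann-Whitney rank sum test followed by FDR control, can be sketched in pure Python. This sketch assumes the two-sided normal approximation with tie and continuity corrections, and uses the Benjamini-Hochberg step-up procedure as the FDR method. The text does not state the software or variant used (an exact test, or Storey's q-value approach, would give somewhat different numbers). Because many log(abundance) values are exactly 0, large tie groups occur, which is why the tie correction matters here. The example values are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks with ties given their average rank; also returns tie-group sizes."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of 1-based positions i+1..j+1
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def mann_whitney_p(x, y):
    """Two-sided Mann-Whitney U p-value (normal approximation, tie + continuity corrected)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks, tie_sizes = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    tie_term = sum(t ** 3 - t for t in tie_sizes) / (n * (n - 1))
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term)
    if var_u <= 0:  # every value identical: no evidence of a difference
        return 1.0
    z = max(abs(u1 - mean_u) - 0.5, 0.0) / math.sqrt(var_u)
    return math.erfc(z / math.sqrt(2))


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (step-up, kept monotone)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical log(abundance) values of one taxon in the two outcome groups
sptb = [2.1, 0.0, 1.8, 2.4, 0.0, 1.2, 2.9, 0.0, 1.5, 2.2]
tb = [0.0] * 15
p = mann_whitney_p(sptb, tb)
print(benjamini_hochberg([p, 0.04, 0.30]))
```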
Differentially abundant taxa and outcome after intervention
To summarize the abundances of the latter 6 differentially abundant taxa (Mann-Whitney, p<0.01; Table 11), we calculated for each cervical swab sample an LA6 value, defined as log10(total abundance of the 6 differentially abundant taxa). Briefly, for each sample, we back-transformed its log(abundance) values for Parvimonas (Otu 16), Ureaplasma (Otu 56), Atopobium (Otu 42), Peptoniphilus (Otu 28), Megasphaera (Otu 47) and Paraeggerthella (Otu 40) to the linear scale, summed these 6 abundance values, and log10-transformed the total to obtain the LA6 value.
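The LA6 calculation can be sketched as below. The back-transformation `10 ** value` is an assumption about how the values were "transformed back into the common linear scale", and the taxon keys are hypothetical identifiers. Under this assumption, a sample in which none of the six taxa was detected (all six log(abundance) values equal to 0, as for every TB sample in FIG. 1D) receives LA6 = log10(6) ≈ 0.78. That value coincides with the median LA6 reported for the TB group, which is consistent with, but does not prove, the assumed implementation.

```python
import math

# Hypothetical keys for the six LA6 taxa (Table 11): Parvimonas (Otu 16),
# Ureaplasma (Otu 56), Atopobium (Otu 42), Peptoniphilus (Otu 28),
# Megasphaera (Otu 47) and Paraeggerthella (Otu 40).
LA6_TAXA = ("Otu16", "Otu56", "Otu42", "Otu28", "Otu47", "Otu40")
LA6_THRESHOLD = 1.15  # a sample is called positive if LA6 > 1.15


def la6(log_abundance):
    """log10 of the summed linear-scale abundances of the six LA6 taxa.

    log_abundance: {taxon_id: log10 value}; each value is back-transformed
    as 10 ** value (an assumption, see the lead-in).
    """
    total = sum(10 ** log_abundance[taxon] for taxon in LA6_TAXA)
    return math.log10(total)


def la6_positive(log_abundance):
    return la6(log_abundance) > LA6_THRESHOLD


none_detected = {t: 0.0 for t in LA6_TAXA}
print(round(la6(none_detected), 2))  # 0.78, i.e. log10(6)

one_detected = dict(none_detected, Otu16=2.5)
print(la6_positive(one_detected))  # True
```

Under this reading, even one of the six taxa at a log(abundance) of 1 raises LA6 to log10(15) ≈ 1.18, just above the 1.15 cut-off.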
The median LA6 values were 2.61 and 0.78 in the “sPTB after treatment” and the “TB after treatment” groups, respectively (FIG. 2A); that is, the median LA6 value was 3.36-fold higher in the former group (Mann-Whitney, p<0.0001). To find the optimal threshold for identifying the “sPTB after treatment” group among CI patients receiving treatment, we plotted the receiver operating characteristic (ROC) curve [area under the ROC curve (95% confidence interval), 0.95 (0.84-1.06); p=0.0002]. Using LA6 >1.15 as the threshold for a positive result, we identified all but one of the “sPTB after intervention” CI patients (9/10 = 90% sensitivity) with no false positives (1 - 0/15 = 100% specificity).
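Sensitivity and specificity at the LA6 >1.15 cut-off, and the area under the ROC curve, can be computed as in the sketch below. The scores are hypothetical values chosen only so that 9 of 10 positives and 0 of 15 negatives exceed the threshold, mirroring the counts in the text; they are not the study's data. The AUC uses the Mann-Whitney formulation (ties count one half), which equals the trapezoidal area under the empirical ROC curve; the confidence interval and p-value reported above are not reproduced.

```python
def sens_spec(scores, labels, threshold):
    """Sensitivity and specificity when 'score > threshold' is called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y and s > threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y and s <= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if not y and s <= threshold)
    fp = sum(1 for s, y in zip(scores, labels) if not y and s > threshold)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(scores, labels):
    """AUC as P(random positive outscores random negative), ties counting 1/2."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical LA6 scores: 10 "sPTB after intervention" (label 1), 15 "TB" (label 0)
sptb = [2.6, 2.0, 3.1, 1.9, 2.7, 2.4, 3.3, 1.6, 2.9, 0.78]
tb = [0.78] * 15
scores = sptb + tb
labels = [1] * len(sptb) + [0] * len(tb)
print(sens_spec(scores, labels, 1.15))  # (0.9, 1.0)
print(roc_auc(scores, labels))  # 0.95
```

Sweeping the threshold over all observed scores and plotting sensitivity against 1 - specificity traces the ROC curve itself.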
Importantly, LA6-positive patients delivered earlier after clinical intervention than LA6-negative patients (FIG. 2B; median gestational age at delivery, 23.7 weeks vs. 38.4 weeks; 95% confidence interval, 20.6-25.4 weeks vs. 38.0-38.7 weeks; Chi-squared, 32.352; df, 1; Logrank test, p<0.0001; hazard ratio, 6.24; 95% confidence interval, 1.50 to 25.9; MedCalc, version 14.12). LA6-positive patients also remained undelivered for a shorter period after intervention than their LA6-negative counterparts (FIG. 2C; median interval between intervention and delivery, 10 days vs. 126 days; 95% confidence interval, 8-32 days vs. 112-134 days; Chi-squared, 32.520; df, 1; Logrank test, p<0.00001; hazard ratio, 6.34; 95% confidence interval, 1.51 to 26.6).
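The Logrank comparison can be sketched as a minimal two-group test: at each distinct event time, the observed number of deliveries in one group is compared with the number expected under equal hazards, using the hypergeometric variance, and the resulting statistic is referred to a chi-squared distribution with 1 df. This is not MedCalc's implementation, it does not estimate the hazard ratio, and the example times are hypothetical. Each subject contributes a time (e.g., days from intervention to delivery) and an event flag (1 = delivered, 0 = censored).

```python
import math


def logrank(times_a, events_a, times_b, events_b):
    """Two-group logrank test; returns (chi-squared statistic, p-value with 1 df)."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        at_risk = [(tt, e, g) for tt, e, g in data if tt >= t]
        n = len(at_risk)
        n_a = sum(1 for _, _, g in at_risk if g == 0)
        d = sum(1 for tt, e, _ in at_risk if tt == t and e)
        d_a = sum(1 for tt, e, g in at_risk if tt == t and e and g == 0)
        obs_minus_exp += d_a - d * n_a / n  # observed minus expected in group A
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / variance
    # Upper tail of chi-squared with 1 df: P(X > c) = erfc(sqrt(c / 2))
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical days from intervention to delivery (all deliveries observed)
positive_days = [8, 10, 12, 30, 32]
negative_days = [112, 120, 126, 130, 134]
chi2, p = logrank(positive_days, [1] * 5, negative_days, [1] * 5)
print(round(chi2, 2), p)
```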
Discussion
Cervical microbiome signature identified in this study
We have shown that the abundances of 7 bacterial taxa were significantly increased in CI patients who had “sPTB after clinical intervention”, compared with those who had “TB after clinical intervention”. Further, we calculated for each CI patient the LA6 value, which represents the log10 of the total abundance of the 6 most significantly increased taxa. Based on the LA6 values of the 25 cervical swab samples obtained before cerclage/pessary intervention, we correctly identified 9 of the 10 “sPTB after intervention” CI patients (9/10 = 90% sensitivity) without any false positive among the 15 “TB after intervention” patients (1 - 0/15 = 100% specificity).
This study was limited by its small sample size, its restriction to Asian pregnant women, and its retrospective design. Nevertheless, it has provided a non-obvious, focused panel of bacterial taxa associated with CI, an important risk factor for preterm birth. The cervical microbiome signature identified in this study, LA6, has been shown to provide prognostic information after cerclage/pessary intervention. CI patients who tested positive for LA6 were at increased risk of delivering at a significantly earlier gestational age after clinical intervention than those who tested negative (23.7 weeks vs. 38.4 weeks; hazard ratio, 6.24; 95% confidence interval, 1.50 to 25.9). LA6-positive patients were also at increased risk of delivering much sooner after intervention than their LA6-negative counterparts (10 days vs. 126 days; hazard ratio, 6.34; 95% confidence interval, 1.51 to 26.6).
Materials and Methods
Recruitment of participants
Ethics approval for this study was obtained from the respective Institutional Review Boards. Informed consent was obtained from Asian pregnant women attending the Department of Obstetrics and Gynaecology, Prince of Wales Hospital, The Chinese University of Hong Kong, or the Department of Obstetrics and Gynaecology, Hallym University, Seoul, South Korea, who fulfilled the following inclusion criteria: (i) painless cervical dilation (1.0 cm-5.0 cm) in the second trimester; (ii) intact membranes; and (iii) no labour contractions (once per 10 minutes). Women were excluded from this study if they had: (a) a multiple pregnancy (≥2 fetuses); (b) coitus or any vaginal application within 48 hours before collection of the cervical swab sample; or (c) used antibiotic/antimycotic drugs within 30 days before collection of the cervical swab sample.
Cervical swab collection
Before the cerclage treatment, a cervical swab sample was collected from each CI patient by rotating a sterile Dacron swab 360° once on the external os. For a dilated cervix, the swab was obtained from the os at the 12 o'clock position facing the clinician. The swab sample was collected immediately after the reproductive tract was opened with a speculum, and before any other procedures. Antiseptic techniques were applied. Special care was taken to prevent the swab from coming into contact with any part of the reproductive tract (e.g., vagina, labia) other than the designated location on the cervix.
Bacterial DNA extraction
We extracted bacterial genomic DNA from each sample according to a published protocol optimized to maintain good representation of the bacterial taxa in a sample (Yuan et al., PLoS One 7, e33865 (2012)).
Taxonomic classification
Each Otu was taxonomically classified at the genus level using the Ribosomal Database Project (RDP) Naïve Bayesian rRNA Classifier (Version 2.9, September 2014, RDP 16S rRNA training set 10). Otus classified as Lactobacillus were further matched against the 16S rRNA database (GenBank) using BLAST (highest score) and MOLE-BLAST (best multiple alignment of BLAST matches) to derive species information.
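The genus-assignment principle behind the RDP classifier (Wang et al., 2007) can be illustrated as a naive Bayesian classifier over 8-base words. This toy version assumes the word-probability estimate described by Wang et al., P(w|G) = (m(w) + P_w)/(M + 1) with prior P_w = (n(w) + 0.5)/(N + 1), where m(w) is the number of training sequences of genus G containing word w, M is the number of training sequences of G, n(w) is the number of all training sequences containing w, and N is the total number of training sequences. It scores each genus by the summed log-probabilities of the query's distinct words, and it omits the bootstrap confidence estimate and the RDP training set. The class name, genus labels and sequences are hypothetical.

```python
import math
from collections import defaultdict


def words(seq, k=8):
    """Distinct overlapping k-mers (8-mers by default)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


class NaiveBayesGenusClassifier:
    def __init__(self, training, k=8):
        """training: iterable of (genus, sequence) pairs."""
        self.k = k
        self.n_total = 0                                    # N
        self.n_word = defaultdict(int)                      # n(w)
        self.m_word = defaultdict(lambda: defaultdict(int))  # m(w) per genus
        self.m_genus = defaultdict(int)                     # M per genus
        for genus, seq in training:
            self.n_total += 1
            self.m_genus[genus] += 1
            for w in words(seq, k):
                self.n_word[w] += 1
                self.m_word[genus][w] += 1

    def word_prob(self, genus, w):
        prior = (self.n_word.get(w, 0) + 0.5) / (self.n_total + 1)
        return (self.m_word[genus].get(w, 0) + prior) / (self.m_genus[genus] + 1)

    def classify(self, seq):
        """Return the genus with the highest summed log word probability."""
        query = words(seq, self.k)
        scores = {g: sum(math.log(self.word_prob(g, w)) for w in query)
                  for g in self.m_genus}
        return max(scores, key=scores.get)


clf = NaiveBayesGenusClassifier([
    ("GenusA", "ACGTACGTACGTACGTAAAA"),
    ("GenusB", "TTTTGGGGCCCCTTTTGGGG"),
])
print(clf.classify("ACGTACGTACGTACGT"))  # GenusA
```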
References
Aagaard, K., J. Ma, K.M. Antony, R. Ganu, J. Petrosino and J. Versalovic (2014). "The placenta harbors a unique microbiome." Sci Transl Med 6(237): 237ra65.
Aagaard, K., K. Riehle, J. Ma, N. Segata, T.A. Mistretta, C. Coarfa, S. Raza, S. Rosenbaum, I. Van den Veyver, A. Milosavljevic, D. Gevers, C. Huttenhower, J. Petrosino and J. Versalovic (2012). "A metagenomic approach to characterization of the vaginal microbiome signature in pregnancy." PLoS One 7(6): e36466.
Claesson, M.J., Q. Wang, O. O'Sullivan, R. Greene-Diniz, J.R. Cole, R.P. Ross and P.W. O'Toole (2010). "Comparison of two next-generation sequencing technologies for resolving highly complex microbiota composition using tandem variable 16S rRNA gene regions." Nucleic Acids Res 38(22): e200.
Gajer, P., R.M. Brotman, G. Bai, J. Sakamoto, U.M. Schutte, X. Zhong, S.S. Koenig, L. Fu, Z.S. Ma, X. Zhou, Z. Abdo, L.J. Forney and J. Ravel (2012). "Temporal dynamics of the human vaginal microbiota." Sci Transl Med 4(132): 132ra52.
Gerber, S., Y. Vial, P. Hohlfeld and S.S. Witkin (2003). "Detection of Ureaplasma urealyticum in second-trimester amniotic fluid by polymerase chain reaction correlates with subsequent preterm labor and delivery." J Infect Dis 187(3): 518-521.
Gray, D.J., H.B. Robinson, J. Malone and R.B. Thomson, Jr. (1992). "Adverse outcome in pregnancy following amniotic fluid isolation of Ureaplasma urealyticum." Prenat Diagn 12(2): 111-117.
Hummelen, R., A.D. Fernandes, J.M. Macklaim, R.J. Dickson, J. Changalucha, G.B. Gloor and G. Reid (2010). "Deep sequencing of the vaginal microbiota of women with HIV." PLoS One 5(8): e12078.
Oh, K.J., S.E. Lee, H. Jung, G. Kim, R. Romero and B.H. Yoon (2010). "Detection of ureaplasmas by the polymerase chain reaction in the amniotic fluid of patients with cervical insufficiency." J Perinat Med 38(3): 261-268.
Owen, J., G. Hankins, J.D. Iams, V. Berghella, J.S. Sheffield, A. Perez-Delboy, R.S. Egerman, D.A. Wing, M. Tomlinson, R. Silver, S.M. Ramin, E.R. Guzman, M. Gordon, H.Y. How, E.J. Knudtson, J.M. Szychowski, S. Cliver and J.C. Hauth (2009). "Multicenter randomized trial of cerclage for preterm birth prevention in high-risk women with shortened midtrimester cervical length." Am J Obstet Gynecol 201(4): 375.e1-375.e8.
Perni, S.C., S. Vardhana, I. Korneeva, S.L. Tuttle, L.R. Paraskevas, S.T. Chasen, R.B. Kalish and S.S. Witkin (2004). "Mycoplasma hominis and Ureaplasma urealyticum in midtrimester amniotic fluid: association with amniotic fluid cytokine levels and pregnancy outcome." Am J Obstet Gynecol 191(4): 1382-1386.
Quince, C., A. Lanzen, R.J. Davenport and P.J. Turnbaugh (2011). "Removing noise from pyrosequenced amplicons." BMC Bioinformatics 12: 38.
Ravel, J., P. Gajer, Z. Abdo, G.M. Schneider, S.S. Koenig, S.L. McCulle, S. Karlebach, R. Gorle, J. Russell, C.O. Tacket, R.M. Brotman, C.C. Davis, K. Ault, L. Peralta and L.J. Forney (2011). "Vaginal microbiome of reproductive-age women." Proc Natl Acad Sci USA 108 Suppl 1: 4680-4687.
Romero, R., R. Gonzalez, W. Sepulveda, F. Brandt, M. Ramirez, Y. Sorokin, M. Mazor, M.C. Treadwell and D.B. Cotton (1992). "Infection and labor. VIII. Microbial invasion of the amniotic cavity in patients with suspected cervical incompetence: prevalence and clinical significance." Am J Obstet Gynecol 167(4 Pt 1): 1086-1091.
Schloss, P.D., S.L. Westcott, T. Ryabin, J.R. Hall, M. Hartmann, E.B. Hollister, R.A. Lesniewski, B.B. Oakley, D.H. Parks, C.J. Robinson, J.W. Sahl, B. Stres, G.G. Thallinger, D.J. Van Horn and C.F. Weber (2009). "Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities." Appl Environ Microbiol 75(23): 7537-7541.
Storey, J.D. and R. Tibshirani (2003). "Statistical significance for genomewide studies." Proc Natl Acad Sci USA 100(16): 9440-9445.
Wang, Q., G.M. Garrity, J.M. Tiedje and J.R. Cole (2007). "Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy." Appl Environ Microbiol 73(16): 5261-5267.
White, J.R., N. Nagarajan and M. Pop (2009). "Statistical methods for detecting differentially abundant features in clinical metagenomic samples." PLoS Comput Biol 5(4): e1000352.
Yuan, S., D.B. Cohen, J. Ravel, Z. Abdo and L.J. Forney (2012). "Evaluation of methods for the extraction and purification of DNA from the human microbiome." PLoS One 7(3): e33865.
All patents, patent applications, and other publications including sequences referred to by GenBank Accession Numbers cited in this application are incorporated by reference in the entirety for all purposes.
Although the foregoing invention has been described in some detail by way of illustration and example for purposes of clarity of understanding, one of skill in the art will appreciate that certain changes and modifications may be practiced within the scope of the appended claims. In addition, each reference provided herein is incorporated by reference in its entirety to the same extent as if each reference was individually incorporated by reference.
Table 1. Key characteristics of all participants.
Figure PCTCN2015082044-appb-000004
Mann-Whitney rank sum test for continuous variables; Fisher exact test for categorical variables.
**Gestational age at delivery, mode of delivery and birthweight.
Table 2A. Sequencing data of the differentially abundant (significantly increased) bacterial taxa identified by comparing all 342 taxa in the 19 dilated vs. the 13 closed cervices (normalized by random subsampling) .
Figure PCTCN2015082044-appb-000005
Adjusted for multiple testing by the False Discovery Rate method.
Table 2B. Sequencing data of the differentially abundant (significantly decreased) bacterial taxa identified by comparing all 342 taxa in the 19 dilated vs. the 13 closed cervices (normalized by random subsampling) .
Figure PCTCN2015082044-appb-000006
Adjusted for multiple testing by the False Discovery Rate method.
Table 2C. Sequencing data of the differentially abundant (significantly increased) bacterial taxa identified by comparing all 349 taxa in the 10 dilated vs. the 10 closed cervices (1:1 matched by gestational age at sample collection, to <2 weeks; normalized as a ratio to the total read count of each sample).
Figure PCTCN2015082044-appb-000007
Adjusted for multiple testing by the False Discovery Rate method.
Table 3A. Genus level classification of the differentially abundant (significantly increased) bacterial taxa identified by comparing all 342 taxa in the 19 dilated vs. the 13 closed cervices (normalized by random subsampling) .
Figure PCTCN2015082044-appb-000008
Note: An "Unclassified" genus represents a novel genus not previously recorded in the database used for taxonomic classification, but newly identified by sequencing in this study.
Table 3B. Genus level classification of the differentially abundant (significantly decreased) bacterial taxa identified by comparing all 342 taxa in the 19 dilated vs. the 13 closed cervices.
Figure PCTCN2015082044-appb-000009
Note: An "Unclassified" genus represents a novel genus not previously recorded in the database used for taxonomic classification, but newly identified by sequencing in this study.
Table 3C. Genus level classification of the differentially abundant (significantly increased) bacterial taxa identified by comparing all 349 taxa in the 10 dilated vs. the 10 closed cervices (1:1 matched by gestational age at sample collection, to <2 weeks; normalized as a ratio to the total read count of each sample).
Figure PCTCN2015082044-appb-000010
Note: An "Unclassified" genus represents a novel genus not previously recorded in the database used for taxonomic classification, but newly identified by sequencing in this study.
Table 3D. Genus level classification of the differentially abundant (significantly decreased) bacterial taxa identified by comparing all 349 taxa in the 10 dilated vs. the 10 closed cervices (1:1 matched by gestational age at sample collection, to <2 weeks; normalized as a ratio to the total read count of each sample).
Figure PCTCN2015082044-appb-000011
Note: An "Unclassified" genus represents a novel genus not previously recorded in the database used for taxonomic classification, but newly identified by sequencing in this study.
Table 4A. Sequences of the 16S rRNA gene of the differentially abundant (significantly increased) bacterial taxa identified by comparing all 342 taxa in the 19 dilated vs. the 13 closed cervices (normalized by random subsampling).
Figure PCTCN2015082044-appb-000012
d-c-019 (SEQ ID NO: 3) ; d-c-030 (SEQ ID NO: 4) ; d-c-037 (SEQ ID NO: 5) ; d-c-040 (SEQ ID NO: 6) ; d-c-043 (SEQ ID NO: 7) ; d-c-045 (SEQ ID NO: 8) ; d-c-038 (SEQ ID NO: 9) ; d-c-047 (SEQ ID NO: 10) ; d-c-054 (SEQ ID NO: 11) .
Table 4B. Sequences of the 16S rRNA gene of the differentially abundant (significantly decreased) bacterial taxa identified by comparing all 342 taxa in the 19 dilated vs. the 13 closed cervices.
Figure PCTCN2015082044-appb-000013
d-c-067 (SEQ ID NO: 12); d-c-039 (SEQ ID NO: 13); d-c-052 (SEQ ID NO: 14); d-c-074 (SEQ ID NO: 15); d-c-082 (SEQ ID NO: 16); d-c-092 (SEQ ID NO: 17); d-c-098 (SEQ ID NO: 18).
Table 4C. Sequences of the 16S rRNA gene of the differentially abundant (significantly increased) bacterial taxa identified by comparing all 349 taxa in the 10 dilated vs. the 10 closed cervices (1:1 matched by gestational age at sample collection, to <2 weeks; normalized as a ratio to the total read count of each sample).
Figure PCTCN2015082044-appb-000014
d-c-012 (SEQ ID NO: 19) ; d-c-030 (SEQ ID NO: 20) ; d-c-040 (SEQ ID NO: 21) ; d-c-047 (SEQ ID NO: 22) ; d-c-050 (SEQ ID NO: 23) ; d-c-053 (SEQ ID NO: 24) ; d-c-068 (SEQ ID NO: 25) ; d-c-071 (SEQ ID NO: 26) ; d-c-072 (SEQ ID NO: 27) ; d-c-081 (SEQ ID NO: 28) ; d-c-015 (SEQ ID NO: 29) ; d-c-0837 (SEQ ID NO: 30) ; d-c-087 (SEQ ID NO: 31) ; d-c-088 (SEQ ID NO: 32) ; d-c-105 (SEQ ID NO: 33) .
Table 4D. Sequences of the 16S rRNA gene of the differentially abundant (significantly decreased) bacterial taxa identified by comparing all 349 taxa in the 10 dilated vs. the 10 closed cervices (1:1 matched by gestational age at sample collection, to <2 weeks; normalized as a ratio to the total read count of each sample).
Figure PCTCN2015082044-appb-000015
d-c-018 (SEQ ID NO: 34) ; d-c-052 (SEQ ID NO: 35) ; d-c-074 (SEQ ID NO: 36) ; d-c-076 (SEQ ID NO: 37) ; d-c-082 (SEQ ID NO: 38) .
Table 4E. Species level classification by BLAST alignment to the 16S rRNA database of NCBI. Differentially abundant (significantly increased) bacterial taxa identified by comparing all 342 taxa in the 19 dilated vs. the 13 closed cervices (normalized by random subsampling) were aligned.
Figure PCTCN2015082044-appb-000016
Notes:
represents a taxon with <97% sequence identity to any known species; it may represent a novel, previously unreported species, now identified by sequencing in this study.
Table 4F. Species level classification by BLAST alignment to the 16S rRNA database of NCBI. Differentially abundant (significantly decreased) bacterial taxa identified by comparing all 342 taxa in the 19 dilated vs. the 13 closed cervices.
Figure PCTCN2015082044-appb-000017
Notes:
represents a taxon with <97% sequence identity to any known sequence; it may represent a novel, previously unreported species, now identified by sequencing in this study.
Table 4G. Species level classification by BLAST alignment to the 16S rRNA database of NCBI. Differentially abundant (significantly increased) bacterial taxa identified by comparing all 349 taxa in the 10 dilated vs. the 10 closed cervices (1:1 matched by gestational age at sample collection, to <2 weeks; normalized as a ratio to the total read count of each sample) were aligned.
Figure PCTCN2015082044-appb-000018
Notes:
represents a taxon with <97% sequence identity to any known sequence; it may represent a novel, previously unreported species, now identified by sequencing in this study.
Table 4H. Species level classification by BLAST alignment to the 16S rRNA database of NCBI. Differentially abundant (significantly decreased) bacterial taxa identified by comparing all 349 taxa in the 10 dilated vs. the 10 closed cervices (1:1 matched by gestational age at sample collection, to <2 weeks; normalized as a ratio to the total read count of each sample) were aligned.
Figure PCTCN2015082044-appb-000019
Notes:
NA indicates no significant BLAST match in the 16S ribosomal RNA database; such a taxon may represent a novel, previously unreported species, now identified by sequencing in this study.
represents a taxon with <97% identity; it may represent a novel, previously unreported species, now identified by sequencing in this study.
Table 5A. Association between the proposed test, DIBT1, and adverse pregnancy or neonatal outcomes. A sample with one or more of the selected taxa in DIBT1 is considered test-positive.
Figure PCTCN2015082044-appb-000020
*p<0.05
Table 5B. Association between the proposed test, DIBT2, and adverse pregnancy or neonatal outcomes. A sample with one or more of the selected taxa in DIBT2 is considered test-positive.
Figure PCTCN2015082044-appb-000021
*p<0.05
Table 6. Characteristics of participants in study B.
Figure PCTCN2015082044-appb-000022
T-test and Fisher exact test for continuous and categorical variables, respectively.
Table 7. Differentially abundant (significantly increased) bacterial taxa in the short cervix group, compared with the normal-length cervix group (normalized by random subsampling; P-values are adjusted for multiple testing by the False Discovery Rate method).
Figure PCTCN2015082044-appb-000023
Table 8. Genomic sequences of the 16S rRNA gene in the differentially abundant (significantly increased) bacterial taxa in the short cervix group, compared with the normal-length cervix group (normalized by random subsampling).
Figure PCTCN2015082044-appb-000024
Table 8. Continued from previous page.
Figure PCTCN2015082044-appb-000025
s-n-006 (SEQ ID NO: 39) ; s-n-007 (SEQ ID NO: 40) ; s-n-008 (SEQ ID NO: 41) ; s-n-012 (SEQ ID NO: 42) ; s-n-014 (SEQ ID NO: 43) ; s-n-022 (SEQ ID NO: 44) ; s-n-024 (SEQ ID NO: 45) ; s-n-025 (SEQ ID NO: 46) ; s-n-027 (SEQ ID NO: 47) ; s-n-028 (SEQ ID NO: 48) ; s-n-029 (SEQ ID NO: 49) ; s-n-030 (SEQ ID NO: 50) ; s-n-046 (SEQ ID NO: 51) ; s-n-054 (SEQ ID NO: 52) ; s-n-063 (SEQ ID NO: 53) ; s-n-068 (SEQ ID NO: 54) ; s-n-071 (SEQ ID NO: 55) .
Table 9. Species level classification by BLAST alignment to the 16S rRNA database at NCBI. The 16S rRNA genes of the differentially abundant (significantly increased) bacterial taxa in the short cervix group, compared with the normal-length cervix group, were aligned. *Indicates a taxon with <97% nucleotide identity to known sequences; it may represent a novel, previously unreported taxon.
Figure PCTCN2015082044-appb-000026
Table 10. Association between the differentially abundant (significantly increased) taxa identified in Study B and adverse pregnancy or neonatal outcomes. A sample with one or more of the selected taxa in DIBT1 is considered test-positive.
Figure PCTCN2015082044-appb-000027
P<0.05
Table 11. Bacterial taxa comprising LA6
Figure PCTCN2015082044-appb-000028
*Percent identity between the 16S rRNA gene nucleotide sequence of the bacterial taxon identified in our study and that of the nearest BLAST match as represented by the NCBI GenBank accession number.
Figure PCTCN2015082044-appb-000029
Figure PCTCN2015082044-appb-000030
Figure PCTCN2015082044-appb-000031
Figure PCTCN2015082044-appb-000032
Figure PCTCN2015082044-appb-000033
Figure PCTCN2015082044-appb-000034
Figure PCTCN2015082044-appb-000035

Claims (24)

  1. A method for determining the risk of an adverse pregnancy or neonatal outcome for a subject, said method comprising:
    (a) detecting in a biological sample taken from the subject the level of bacteria belonging to at least one bacterial taxon selected from the group consisting of Megasphaera cerevisiae, Alloscardovia omnicolens, Ureaplasma urealyticum, Ureaplasma parvum, Atopobium vaginae, Parvibacter caecicola, Lactobacillus casei, Veillonella montpellierensis, Anaerococcus senegalensis, Bulleidia extructa, Mycoplasma hominis, Propionimicrobium lymphophilum, uncultured bacteria having a 16S rRNA nucleotide sequence with at least 92% sequence identity to the nucleotide sequence of GenBank Accession No. JQ781443.1, Corynebacterium pyruviciproducens, Megasphaera cerevisiae, Acidipila rosea, Murdochiella asaccharolytica, uncultured bacteria having a 16S rRNA nucleotide sequence with at least 90% sequence identity to GenBank Accession No. JF295520.1, Howardella ureilytica, Actinobaculum schaalii, Peptoniphilus duerdenii, Fastidiosipila sanguinis, Sneathia sanguinegens, Parvimonas micra, Peptoniphilus lacrimalis, a bacterial taxon specified by a 16S rRNA nucleotide sequence in Tables 4A, 4C and 8, and a bacterial taxon specified by the nearest species classification based on BLAST nucleotide alignment in Tables 4E, 4G and 9; and
    (b) determining that the subject has an increased risk for an adverse pregnancy or neonatal outcome if the level of bacteria belonging to the at least one bacterial taxon is greater than a standard control level.
  2. The method of claim 1, wherein at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 different bacterial taxa are detected.
  3. The method of claim 1, wherein the subject is a pregnant woman or a non-pregnant woman.
  4. The method of claim 1, wherein the biological sample is a cervical swab, a vaginal swab, a urine sample, an amniotic fluid sample, a maternal blood sample, a maternal serum sample, a maternal plasma sample, a cervical mucus sample, a placental swab, an umbilical cord swab or any sample taken directly or indirectly from the reproductive system or the gastrointestinal system.
  5. The method of claim 1, wherein the detecting step comprises a polynucleotide amplification assay, an assay involving polynucleotide sequence determination, or an assay involving sequence-specific probe/primer hybridization.
  6. The method of claim 5, wherein the amplification assay is a polymerase chain reaction (PCR) assay.
  7. The method of claim 6, wherein the PCR assay is a quantitative PCR assay or a reverse-transcriptase PCR assay.
  8. The method of claim 1, wherein the adverse pregnancy or neonatal outcome comprises preterm birth at < 34 weeks, preterm birth at < 37 weeks, delivery within about 1-196 days after the biological sample is taken, delivery within about 1-196 days after a clinical intervention is performed, an Apgar score at 1 minute of < 7, an Apgar score at 5 minutes of < 7, chorioamnionitis, respiratory distress syndrome, bronchopulmonary dysplasia, intraventricular hemorrhage, neonatal death within 7 days after birth or neonatal sepsis.
  9. The method of claim 1, further comprising determining that the subject has a risk of having advanced cervical dilation or premature cervical shortening if the level of bacteria belonging to the at least one bacterial taxon is greater than a standard control level.
  10. The method of claim 1, further comprising determining that the subject is at risk of having an infection in the amniotic cavity, uterine cavity, cervix or vagina.
  11. The method of claim 1, further comprising extracting nucleic acids from the biological sample prior to step (a).
  12. The method of claim 1, further comprising, when the subject is predicted to have an adverse pregnancy or neonatal outcome, repeating steps (a) and (b) at a later time using another biological sample of the same sample type from the subject, wherein an increase in the level of bacteria belonging to the at least one bacterial taxon at the later time as compared to the level determined in the original step (a) indicates an increased risk of having an adverse pregnancy or neonatal outcome.
  13. The method of claim 1, further comprising:
    detecting in the biological sample the level of bacteria belonging to at least one bacterial taxon selected from the group consisting of Jonquetella anthropi, Aerococcus urinae, a bacterial taxon specified by a 16S rRNA nucleotide sequence in Tables 4B and 4D, and a bacterial taxon specified by the nearest species classification based on BLAST nucleotide alignment in Tables 4F and 4H; and
    determining that the subject has an increased risk for an adverse pregnancy or neonatal outcome if the level of bacteria belonging to the at least one bacterial taxon is lower than a standard control level.
  14. A kit for determining the risk of having an adverse pregnancy or neonatal outcome in a subject, comprising
    (a) a standard control that provides a biological sample taken from a pregnant subject containing bacteria belonging to at least one bacterial taxon selected from the group consisting of Megasphaera cerevisiae, Alloscardovia omnicolens, Ureaplasma urealyticum, Atopobium vaginae, Parvibacter caecicola, Lactobacillus casei, Veillonella montpellierensis, Anaerococcus senegalensis, Bulleidia extructa, Mycoplasma hominis, Propionimicrobium lymphophilum, uncultured bacteria having a 16S rRNA nucleotide sequence with at least 92% sequence identity to the nucleotide sequence of GenBank Accession No. JQ781443.1, Corynebacterium pyruviciproducens, Megasphaera cerevisiae, Acidipila rosea, Murdochiella asaccharolytica, uncultured bacteria having at least 90% sequence identity to the nucleotide sequence of GenBank Accession No. JF295520.1, Howardella ureilytica, Actinobaculum schaalii, Peptoniphilus duerdenii, Fastidiosipila sanguinis, Sneathia sanguinegens, Parvimonas micra, Peptoniphilus lacrimalis, a bacterial taxon specified by a 16S rRNA nucleotide sequence in Tables 4A, 4C and 8, and a bacterial taxon specified by the nearest species classification based on BLAST nucleotide alignment in Tables 4E, 4G and 9; and
    (b) one or more agents that specifically and quantitatively identify bacteria belonging to at least one bacterial taxon selected from the group consisting of Megasphaera cerevisiae, Alloscardovia omnicolens, Ureaplasma urealyticum, Atopobium vaginae, Parvibacter caecicola, Lactobacillus casei, Veillonella montpellierensis, Anaerococcus senegalensis, Bulleidia extructa, Mycoplasma hominis, Propionimicrobium lymphophilum, uncultured bacteria having at least 92% sequence identity to the nucleotide sequence of GenBank Accession No. JQ781443.1, Corynebacterium pyruviciproducens, Megasphaera cerevisiae, Acidipila rosea, Murdochiella asaccharolytica, uncultured bacteria having a 16S rRNA nucleotide sequence with at least 90% sequence identity to the nucleotide sequence of GenBank Accession No. JF295520.1, Howardella ureilytica, Actinobaculum schaalii, Peptoniphilus duerdenii, Fastidiosipila sanguinis, Sneathia sanguinegens, Parvimonas micra, Peptoniphilus lacrimalis, a bacterial taxon specified by a 16S rRNA nucleotide sequence in Tables 4A, 4C and 8, and a bacterial taxon specified by the nearest species classification based on BLAST nucleotide alignment in Tables 4E, 4G and 9.
  15. The kit of claim 14, wherein the agent is one or more oligonucleotide primers that specifically hybridize to and amplify a polynucleotide of the at least one bacterial taxon in an amplification assay.
  16. The kit of claim 14, wherein the agent is a polynucleotide probe that specifically hybridizes to a polynucleotide sequence of the at least one bacterial taxon.
  17. The kit of claim 14, further comprising an instruction manual.
  18. A method for determining whether a pregnant subject has an increased risk of having advanced cervical dilation or premature cervical shortening, said method comprising:
    (a) detecting in a biological sample taken from the subject the level of bacteria belonging to at least one bacterial taxon selected from the group consisting of Megasphaera cerevisiae, Alloscardovia omnicolens, Ureaplasma urealyticum, Ureaplasma parvum, Atopobium vaginae, Parvibacter caecicola, Lactobacillus casei, Veillonella montpellierensis, Anaerococcus senegalensis, Bulleidia extructa, Mycoplasma hominis, Propionimicrobium lymphophilum, uncultured bacteria having a 16S rRNA nucleotide sequence with at least 92% sequence identity to the nucleotide sequence of GenBank Accession No. JQ781443.1, Corynebacterium pyruviciproducens, Megasphaera cerevisiae, Acidipila rosea, Murdochiella asaccharolytica, Parvimonas micra, Peptoniphilus lacrimalis, uncultured bacteria having a 16S rRNA nucleotide sequence with at least 90% sequence identity to the nucleotide sequence of GenBank Accession No. JF295520.1, Howardella ureilytica, Actinobaculum schaalii, Peptoniphilus duerdenii, Fastidiosipila sanguinis, Sneathia sanguinegens, Parvimonas micra, Peptoniphilus lacrimalis, a bacterial taxon specified by a 16S rRNA nucleotide sequence in Tables 4A, 4C and 8, and a bacterial taxon specified by the nearest species classification based on BLAST nucleotide alignment in Tables 4E, 4G and 9; and
    (b) determining that the subject has an increased risk of having advanced cervical dilation or premature cervical shortening if the level of bacteria belonging to the at least one bacterial taxon is greater than a standard control level.
  19. The method of claim 18, further comprising extracting nucleic acids from the biological sample prior to performing step (a).
  20. The method of claim 18, wherein at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 different bacterial taxa are detected.
  21. The method of claim 18, wherein the biological sample is a cervical swab, a vaginal swab, a urine sample, an amniotic fluid sample, a maternal blood sample, a maternal serum sample, a maternal plasma sample, a cervical mucus sample, a placental swab, an umbilical cord swab or any sample taken directly or indirectly from the reproductive system or the gastrointestinal system.
  22. The method of claim 18, wherein the detecting step comprises a polynucleotide amplification assay, an assay involving polynucleotide sequence determination, or an assay involving sequence-specific probe/primer hybridization.
  23. The method of claim 22, wherein the amplification assay is a polymerase chain reaction (PCR) assay.
  24. The method of claim 23, wherein the PCR assay is a quantitative PCR assay or a reverse-transcriptase PCR assay.
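Claims 1, 12 and 13 describe comparing measured taxon levels with standard control levels: elevation of the claim 1 taxa, or depletion of the claim 13 taxa, indicates increased risk, and a rise between two samples of the same type indicates increased risk at follow-up. A minimal Python sketch of that decision logic follows. It assumes levels have already been quantified on a common scale and that a control level is supplied per taxon. The names `assess_risk`, `rising_taxa` and the `min_flagged` parameter are hypothetical. The default `min_flagged=1` mirrors the "at least one bacterial taxon" wording, and taxa missing from a profile are treated as not measured rather than as zero.

```python
from dataclasses import dataclass, field


@dataclass
class RiskCall:
    increased_risk: bool
    flagged_taxa: list = field(default_factory=list)


def assess_risk(levels, high_controls, low_controls=None, min_flagged=1):
    """Flag taxa that deviate from standard control levels.

    levels: taxon -> level measured in the subject's sample.
    high_controls: taxon -> control level; a level ABOVE it is flagged (claim 1).
    low_controls: taxon -> control level; a level BELOW it is flagged (claim 13).
    Taxa absent from `levels` are treated as not measured and skipped.
    """
    flagged = [t for t, ctl in high_controls.items() if t in levels and levels[t] > ctl]
    flagged += [t for t, ctl in (low_controls or {}).items() if t in levels and levels[t] < ctl]
    return RiskCall(len(flagged) >= min_flagged, flagged)


def rising_taxa(earlier, later, taxa):
    """Claim 12 style follow-up: taxa whose level rose between two samples
    of the same type taken from the same subject."""
    return [t for t in taxa if t in earlier and t in later and later[t] > earlier[t]]
```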
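Claims 7 and 24 recite a quantitative PCR assay but do not say how cycle threshold (Ct) values are converted into bacterial levels. One widely used approach, assumed here rather than taken from the patent, is absolute quantification against a standard curve of serially diluted reference DNA, where Ct = slope × log10(copies) + intercept and amplification efficiency is 10^(-1/slope) - 1. The Python sketch below uses hypothetical function names.

```python
def fit_standard_curve(log10_copies, ct_values):
    """Ordinary least-squares fit of Ct = slope * log10(copies) + intercept."""
    n = len(log10_copies)
    if n != len(ct_values) or n < 2:
        raise ValueError("need at least two paired standards")
    mean_x = sum(log10_copies) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_copies)
    if sxx == 0:
        raise ValueError("standards must span more than one concentration")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log10_copies, ct_values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def amplification_efficiency(slope):
    """Fractional efficiency; a slope near -3.32 corresponds to about 1.0 (100%)."""
    return 10 ** (-1.0 / slope) - 1.0


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate starting copies for a sample Ct."""
    return 10 ** ((ct - intercept) / slope)
```

Copy estimates obtained this way could then be compared with standard control levels as the claims describe; how levels are normalized is not specified in this text.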
PCT/CN2015/082044 2014-06-30 2015-06-23 Detecting bacterial taxa for predicting adverse pregnancy outcomes WO2016000539A1 (en)

Priority Applications (6)

Application Number Priority Date Filing Date Title
US15/317,021 US10683557B2 (en) 2014-06-30 2015-06-23 Detecting bacterial taxa for predicting adverse pregnancy outcomes
EP15814937.7A EP3161167B1 (en) 2014-06-30 2015-06-23 Detecting bacterial taxa for predicting adverse pregnancy outcomes
KR1020167036942A KR20170020382A (en) 2014-06-30 2015-06-23 Detecting bacterial taxa for predicting adverse pregnancy outcomes
ES15814937T ES2767527T3 (en) 2014-06-30 2015-06-23 Detection of bacterial taxa to predict adverse pregnancy outcomes
JP2017500020A JP6539721B2 (en) 2014-06-30 2015-06-23 Detection of bacterial taxa to predict adverse pregnancy outcomes
KR1020207002520A KR102207858B1 (en) 2014-06-30 2015-06-23 Detecting bacterial taxa for predicting adverse pregnancy outcomes

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201462018920P 2014-06-30 2014-06-30
US62/018,920 2014-06-30

Publications (1)

Publication Number Publication Date
WO2016000539A1 true WO2016000539A1 (en) 2016-01-07

Family

ID=55018426

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/082044 WO2016000539A1 (en) 2014-06-30 2015-06-23 Detecting bacterial taxa for predicting adverse pregnancy outcomes

Country Status (6)

Country Link
US (1) US10683557B2 (en)
EP (1) EP3161167B1 (en)
JP (2) JP6539721B2 (en)
KR (2) KR102207858B1 (en)
ES (1) ES2767527T3 (en)
WO (1) WO2016000539A1 (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101831416B1 (en) 2017-09-04 2018-02-22 Ewha University-Industry Collaboration Foundation Prediction of a risk of preterm delivery using differential microbial community in a sample
WO2018045359A1 (en) * 2016-09-02 2018-03-08 Karius, Inc. Detection and treatment of infection during pregnancy
KR101872457B1 (en) * 2018-04-20 2018-06-28 Ewha University-Industry Collaboration Foundation Prediction of a risk of preterm delivery using differential microbial community in a blood sample
WO2018129043A1 (en) * 2017-01-03 2018-07-12 The Trustees Of The University Of Pennsylvania Compositions and methods for predicting risk of preterm birth
JP2019511922A (en) * 2016-02-16 2019-05-09 TATA Consultancy Services Limited Methods and systems for early risk assessment for preterm birth outcomes
WO2019100113A1 (en) 2017-11-24 2019-05-31 The University Of Western Australia Infection-related preterm birth diagnostic method
CN110423804A (en) * 2019-08-12 2019-11-08 International Peace Maternity and Child Health Hospital of China Welfare Institute Biomarker set and screening method for screening the risk of missed abortion
WO2020178297A1 (en) * 2019-03-04 2020-09-10 Fundación Para La Investigación Biomédica Del Hospital Ramón Y Cajal Anal bacterial biomarkers for the diagnosis of anal precancerous lesions
CN113348367A (en) * 2018-10-31 2021-09-03 Coyote Medical Laboratory (Beijing) Co., Ltd. Methods, systems and kits for predicting preterm labor status
RU2793917C2 (en) * 2017-11-24 2023-04-10 The University of Western Australia Method for diagnosing infection-related preterm birth

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11959125B2 (en) 2016-09-15 2024-04-16 Sun Genomics, Inc. Universal method for extracting nucleic acid molecules from a diverse population of one or more types of microbes in a sample
US20200308627A1 (en) * 2016-09-15 2020-10-01 Sun Genomics, Inc. Universal method for extracting nucleic acid molecules from a diverse population of one or more types of microbes in a sample
JPWO2021112237A1 (en) * 2019-12-06 2021-06-10
JPWO2021112236A1 (en) * 2019-12-06 2021-06-10
CN113186311B (en) * 2021-04-27 2022-05-10 Peking Union Medical College Hospital, Chinese Academy of Medical Sciences Application of vaginal microorganism in differential diagnosis of chronic pelvic pain syndrome

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1999039007A1 (en) * 1998-01-30 1999-08-05 The UAB Research Foundation Nucleic acid probes and method for detecting Ureaplasma urealyticum

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7625697B2 (en) * 1994-06-17 2009-12-01 The Board Of Trustees Of The Leland Stanford Junior University Methods for constructing subarrays and subarrays made thereby
AUPR512401A0 (en) * 2001-05-18 2001-06-14 Queensland University Of Technology Adherent entities and uses therefor

Non-Patent Citations (9)

* Cited by examiner, † Cited by third party
Title
AAGAARD, K. ET AL.: "A metagenomic approach to characterization of the vaginal microbiome signature in pregnancy", PLOS ONE, vol. 7, no. 6, 30 June 2012 (2012-06-30), pages e36466, XP055250082 *
CLAESSON, M.J. ET AL.: "Comparison of two next-generation sequencing technologies for resolving highly complex microbiota composition using tandem variable 16S rRNA gene regions", NUCLEIC ACIDS RESEARCH, vol. 38, no. 22, 30 December 2010 (2010-12-30), pages 1 - 13, XP055250083 *
GAJER, P. ET AL.: "Temporal dynamics of the human vaginal microbiota", SCI TRANSL MED, vol. 4, no. 132, 2 May 2012 (2012-05-02), page 132ra52, XP055241847 *
GERBER, S. ET AL.: "Detection of Ureaplasma urealyticum in Second-Trimester Amniotic Fluid by Polymerase Chain Reaction Correlates with Subsequent Preterm Labor and Delivery", J INFECT DIS, vol. 187, no. 3, 1 February 2003 (2003-02-01), pages 518 - 521, XP055250002 *
GRAY, D.J. ET AL.: "Adverse outcome in pregnancy following amniotic fluid isolation of Ureaplasma urealyticum.", PRENATAL DIAGNOSIS, vol. 12, no. 2, 31 December 1992 (1992-12-31), pages 111 - 117, XP055250071 *
PERNI, S.C. ET AL.: "Mycoplasma hominis and Ureaplasma urealyticum in midtrimester amniotic fluid: Association with amniotic fluid cytokine levels and pregnancy outcome", AMERICAN JOURNAL OF OBSTETRICS AND GYNECOLOGY, vol. 191, no. 4, 31 December 2004 (2004-12-31), pages 1382 - 1386, XP004621301 *
See also references of EP3161167A4 *
WANG, KEDI ET AL.: "Diversity of Human Vaginal Bacterial Communities in Healthy Women and Bacterial Vaginosis", LABELED IMMUNOASSAYS & CLIN MED, vol. 18, no. 6, 31 December 2011 (2011-12-31), pages 402 - 407, XP008185614 *
XU, SURONG ET AL.: "Illumina sequencing 16S rRNA tagging reveals diverse vaginal microbiomes associated with bacterial vaginosis", J SOUTH MED UNIV, vol. 33, no. 5, 10 May 2013 (2013-05-10), pages 672 - 677, XP055249822 *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2019511922A (en) * 2016-02-16 2019-05-09 TATA Consultancy Services Limited Methods and systems for early risk assessment for preterm birth outcomes
EP3416653B1 (en) * 2016-02-16 2022-05-25 Tata Consultancy Services Limited Method and system for early risk assessment of preterm delivery outcome
JP7060518B2 (en) 2016-02-16 2022-04-26 TATA Consultancy Services Limited Methods and systems for early risk assessment of preterm labor outcomes
WO2018045359A1 (en) * 2016-09-02 2018-03-08 Karius, Inc. Detection and treatment of infection during pregnancy
WO2018129043A1 (en) * 2017-01-03 2018-07-12 The Trustees Of The University Of Pennsylvania Compositions and methods for predicting risk of preterm birth
US20190339269A1 (en) * 2017-01-03 2019-11-07 The Trustees Of The University Of Pennsylvania Compositions and methods for predicting risk of preterm birth
KR101831416B1 (en) 2017-09-04 2018-02-22 Ewha University-Industry Collaboration Foundation Prediction of a risk of preterm delivery using differential microbial community in a sample
EP3714061A4 (en) * 2017-11-24 2021-08-11 The University Of Western Australia Infection-related preterm birth diagnostic method
RU2793917C2 (en) * 2017-11-24 2023-04-10 The University of Western Australia Method for diagnosing infection-related preterm birth
WO2019100113A1 (en) 2017-11-24 2019-05-31 The University Of Western Australia Infection-related preterm birth diagnostic method
WO2019203422A1 (en) * 2018-04-20 2019-10-24 Ewha University-Industry Collaboration Foundation Preterm birth risk prediction using change in blood microbial community
KR101872457B1 (en) * 2018-04-20 2018-06-28 Ewha University-Industry Collaboration Foundation Prediction of a risk of preterm delivery using differential microbial community in a blood sample
CN113348367A (en) * 2018-10-31 2021-09-03 Coyote Medical Laboratory (Beijing) Co., Ltd. Methods, systems and kits for predicting preterm labor status
WO2020178297A1 (en) * 2019-03-04 2020-09-10 Fundación Para La Investigación Biomédica Del Hospital Ramón Y Cajal Anal bacterial biomarkers for the diagnosis of anal precancerous lesions
CN110423804A (en) * 2019-08-12 2019-11-08 International Peace Maternity and Child Health Hospital of China Welfare Institute Biomarker set and screening method for screening the risk of missed abortion

Also Published As

Publication number Publication date
KR102207858B1 (en) 2021-01-26
JP6539721B2 (en) 2019-07-03
EP3161167A1 (en) 2017-05-03
KR20170020382A (en) 2017-02-22
EP3161167B1 (en) 2019-11-20
JP2017519514A (en) 2017-07-20
JP2019068853A (en) 2019-05-09
EP3161167A4 (en) 2018-04-18
KR20200011617A (en) 2020-02-03
US20170114396A1 (en) 2017-04-27
ES2767527T3 (en) 2020-06-17
US10683557B2 (en) 2020-06-16

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application
Ref document number: 15814937
Country of ref document: EP
Kind code of ref document: A1

WWE WIPO information: entry into national phase
Ref document number: 15317021
Country of ref document: US

ENP Entry into the national phase
Ref document number: 20167036942
Country of ref document: KR
Kind code of ref document: A

NENP Non-entry into the national phase
Ref country code: DE

ENP Entry into the national phase
Ref document number: 2017500020
Country of ref document: JP
Kind code of ref document: A

REEP Request for entry into the European phase
Ref document number: 2015814937
Country of ref document: EP

WWE WIPO information: entry into national phase
Ref document number: 2015814937
Country of ref document: EP