[Title established by the ISA under Rule 37.2] METHOD FOR DETERMINING BACTERIAL COMPOSITION OF VAGINAL MICROBIOME
Field of The Invention
The present invention relates to the technical field of microbial gene sequencing analysis, in particular, relates to a method for determining the bacterial composition of vaginal microbiome in a subject, as well as a forward primer and a 16S rDNA sequencing method used therein, wherein the subject is an Asian woman, preferably a woman from China, North Korea, South Korea, Japan, Philippines, Vietnam, Laos, or Cambodia, and more preferably a Chinese woman.
Background of The Invention
Vaginal microbiome has been recognized as a critical factor involved in the protection of females from various bacterial, fungal and viral pathogens (Garcia-Velasco JA et al., Reprod Biomed Online 2017, 35 (1) : 103-12) . In clinical studies, the bacterial composition of vaginal microbiome is roughly estimated by morphology and manual counting. Another approach is based on conventional culture methods, which may overestimate the flora that can be cultured while often overlooking some fastidious bacteria (Relman DA, J Infect Dis 2002, 186 Suppl 2: S254-8) . With the advent of high-throughput sequencing methods, more and more studies have proposed 16S rDNA sequencing to estimate the bacterial composition of vaginal microbiome (Ravel J et al., Proc Natl Acad Sci U S A 2011, 108 Suppl 1: 4680-7; Ravel J et al., Microbiome 2013, 1 (1) : 29; and Tamarelle J et al., Sex Transm Infect 2018, 94 (8) : 616-8) .
Both biological and technical factors could affect the estimation of vaginal microbiome when 16S rDNA sequencing methods were utilized. On one hand, among various biological factors that affect the bacterial composition of vaginal microbiome, ethnic groups play an important role as they reflect the baseline of clinical diagnosis. Fettweis et al. found significant differences in the vaginal microbiomes of African American women and women of European ancestry (Fettweis JM et al., Microbiology 2014, 160 (Pt 10) : 2272-82) . Ravel et al. found that the characteristics of Lactobacillus in different ethnic groups (white, black, Hispanic, and Asian) were significantly different from their vaginal health status (Ravel J et al., Proc Natl Acad Sci U S A 2011, 108 Suppl 1: 4680-7) . On the other hand, although multiple 16S rDNA sequencing protocols have been applied in vaginal microbiome studies, not a single protocol had been proved universal for diverse ethnic groups. The differences among these protocols include PCR primer sequences (Ravel J et al., Proc Natl Acad Sci U S A 2011, 108 Suppl 1: 4680-7; Fettweis JM et al., BMC Genomics 2012, 13 Suppl 8: S17; Fadrosh DW et al., Microbiome 2014, 2 (1) : 6; and Srinivasan S et al., PLoS One 2012; 7 (6) : e37818) , target regions (Relman DA, J Infect Dis 2002, 186 Suppl 2: S254-8; Ravel J et al., Proc Natl Acad Sci U S A 2011, 108 Suppl 1: 4680-7; Ravel J et al., Microbiome 2013, 1 (1) : 29; Muzny CA et al., J Infect Dis 2018, 218 (6) : 966-78; and Mehta SD et al., Sci Rep 2017, 7 (1) : 15475) , sequencing platforms (Relman DA, J Infect Dis 2002, 186 Suppl 2: S254-8; and Gajer P et al., Sci Transl Med 2012, 4 (132) : 132ra52) , and reference databases (Van Der Pol WJ et al., J Infect Dis 2019, 219 (2) : 305-14) .
16S rDNA sequencing has been used to identify the bacterial composition of the human vaginal microbiome in multiple ethnic groups, but the study on the Chinese population's vaginal microbiome is still insufficient. In addition, no studies have examined whether different 16S rDNA sequencing protocols are an unbiased way to identify vaginal microbes. Due to the differences among ethnic groups and differences among 16S rDNA protocols, it is still unclear which 16S rDNA sequencing protocol can be best applied to the vaginal microbiome of Asian women, particularly Chinese women.
Accordingly, there is a need in the art for a method for rapidly and accurately determining the bacterial composition of vaginal microbiome in an Asian woman, particularly a Chinese woman.
Summary of The Invention
The present invention in some embodiments is directed to methods for determining the bacterial composition of vaginal microbiome in a subject, comprising applying a 16S rDNA sequencing method to a vaginal secretion sample from the subject, and processing the sequencing data obtained by the 16S rDNA sequencing method. In particular, the 16S rDNA sequencing method described above comprises amplifying the V1-V2 hyper-variable region of the 16S rDNA with a primer set comprising a forward primer 27F’ set forth in SEQ ID NO. 1 and a reverse primer. The present invention in some embodiments is directed to a 16S rDNA sequencing method comprising amplifying the V1-V2 hyper-variable region of the 16S rDNA with a primer set comprising a forward primer 27F’ set forth in SEQ ID NO. 1 and a reverse primer. Furthermore, the present invention in some embodiments is directed to a forward primer 27F’ set forth in SEQ ID NO. 1, and use thereof in a method for determining the bacterial composition of vaginal microbiome in a subject. In some preferred embodiments, the subject is an Asian woman, preferably a woman from China, North Korea, South Korea, Japan, Philippines, Vietnam, Laos, or Cambodia, and more preferably, a Chinese woman.
The summary above is not intended to describe each disclosed embodiment or every implementation of the present invention. These and other aspects of the present invention will become more readily apparent to those of ordinary skill in the art when reference is made to the following detailed description.
Brief Description of The Drawings
Figure 1: PCR primer fetching efficacy and target region identity quantification, wherein
Figure 1A: primer efficiency was quantified by the alignment of primer sequence to the reference sequences; wherein on the X-axis, two reference databases were used, i.e., SILVA and NCBI 16S Microbial, and the Y-axis shows the percentage of reference sequences aligned by each primer sequence, including 27F’ (blue) , 27F (orange) , 338R (grey) , 341F (yellow) and 806R (dark blue) ;
Figure 1B: number of identical sequences shared by two different species was shown in bar plot, wherein the X-axis represents the reference database used; and
Figure 1C: alignment of Lactobacillus crispatus and Lactobacillus gallinarum at V3-V4 region.
Figure 2: Comparison of the 16S rDNA sequencing results from 27F-338R, 27F’-338R and 341F-806R protocols, wherein
Figure 2A: the top ten bacteria from the BV group were shown, wherein three protocols were compared, i.e., 27F-338R (blue) , 27F’-338R (orange) and 341F-806R (grey) ; and
Figure 2B: as in Figure 2A, the top ten bacteria from the healthy group were shown, wherein three protocols, i.e., 27F-338R (blue) , 27F’-338R (orange) and 341F-806R (grey) , were compared.
Figure 3: Heatmap and dendrogram of the vaginal compositions from 28 healthy and 10 BV samples, wherein the vaginal compositions from 28 healthy and 10 BV samples utilizing 27F’-338R protocol were clustered and colored by relative abundance (from low to high abundance, color changes from green to red) .
Figure 4: Morphology of samples under 400× magnification after gram staining, wherein Figure 4A represents 28 normal samples, and Figure 4B represents 10 BV samples.
Figure 5: qPCR validation of the existence of Lactobacilli and Gardnerella vaginalis, wherein 10 vaginal microbiome samples from healthy women (highlighted in blue) and 5 vaginal microbiome samples from women with BV (highlighted in orange) were sampled and used to perform qPCR validation, and the difference between the Cq values of Lactobacilli and Gardnerella vaginalis was used.
Detailed Description of The Invention
In a first aspect, the present invention provides a method for determining the bacterial composition of vaginal microbiome in a subject, comprising applying a 16S rDNA sequencing method to a vaginal secretion sample from the subject, and processing the sequencing data obtained by the 16S rDNA sequencing method, wherein the 16S rDNA sequencing method comprises amplifying the V1-V2 hyper-variable region of the 16S rDNA with a primer set comprising a forward primer 27F’ set forth in SEQ ID NO. 1 and a reverse primer. In some embodiments, in the method for determining the bacterial composition of vaginal microbiome in a subject according to the first aspect of the present invention, the reverse primer is a reverse primer 338R set forth in SEQ ID NO. 2. In some embodiments, in the method for determining the bacterial composition of vaginal microbiome in a subject according to the first aspect of the present invention, the subject is a healthy woman or a woman with bacterial vaginosis. In some embodiments, in the method for determining the bacterial composition of vaginal microbiome in a subject according to the first aspect of the present invention, the subject is an Asian woman. In some preferred embodiments, in the method for determining the bacterial composition of vaginal microbiome in a subject according to the first aspect of the present invention, the subject is a woman from China, North Korea, South Korea, Japan, Philippines, Vietnam, Laos, or Cambodia. In some preferred embodiments, in the method for determining the bacterial composition of vaginal microbiome in a subject according to the first aspect of the present invention, the subject is a Chinese woman.
In a second aspect, the present invention provides a 16S rDNA sequencing method, comprising amplifying the V1-V2 hyper-variable region of the 16S rDNA with a primer set comprising a forward primer 27F’ set forth in SEQ ID NO. 1 and a reverse primer. In some embodiments, in the 16S rDNA sequencing method according to the second aspect of the present invention, the reverse primer is a reverse primer 338R set forth in SEQ ID NO. 2. In some embodiments, the 16S rDNA sequencing method according to the second aspect of the present invention is used for a method for determining the bacterial composition of vaginal microbiome in a subject. In some embodiments, the subject is a healthy woman or a woman with bacterial vaginosis. In some preferred embodiments, the subject is an Asian woman. In some preferred embodiments, the subject is a woman from China, North Korea, South Korea, Japan, Philippines, Vietnam, Laos, or Cambodia. In some more preferred embodiments, the subject is a Chinese woman.
In a third aspect, the present invention provides a primer set for 16S rDNA sequencing comprising a forward primer 27F’ set forth in SEQ ID NO. 1, and a reverse primer. In some embodiments, in the primer set according to the third aspect of the present invention, the reverse primer is a reverse primer 338R set forth in SEQ ID NO. 2. In some embodiments, the forward primer according to the third aspect of the present invention is used for a method for determining the bacterial composition of vaginal microbiome in a subject. In some preferred embodiments, the subject is a healthy woman or a woman with bacterial vaginosis. In some preferred embodiments, the subject is an Asian woman. In some preferred embodiments, the subject is a woman from China, North Korea, South Korea, Japan, Philippines, Vietnam, Laos, or Cambodia. In some more preferred embodiments, the subject is a Chinese woman.
In a fourth aspect, the present invention relates to use of the primer set according to the third aspect of the present invention in a method for determining the bacterial composition of vaginal microbiome in a subject. In some embodiments, in the use according to the fourth aspect of the present invention, the subject is a healthy woman or a woman with bacterial vaginosis. In some embodiments, in the use according to the fourth aspect of the present invention, the subject is an Asian woman. In some preferred embodiments, in the use according to the fourth aspect of the present invention, the subject is a woman from China, North Korea, South Korea, Japan, Philippines, Vietnam, Laos, or Cambodia. In some more preferred embodiments, in the use according to the fourth aspect of the present invention, the subject is a Chinese woman.
In a fifth aspect, the present invention relates to use of the primer set according to the third aspect of the present invention for the preparation of an agent for 16S rDNA sequencing.
In a sixth aspect, the present invention relates to use of the primer set according to the third aspect of the present invention for the preparation of an agent for determining the bacterial composition of vaginal microbiome in a subject. In some embodiments, the subject is a healthy woman or a woman with bacterial vaginosis. In some preferred embodiments, the subject is an Asian woman. In some preferred embodiments, the subject is a woman from China, North Korea, South Korea, Japan, Philippines, Vietnam, Laos, or Cambodia. In some more preferred embodiments, the subject is a Chinese woman.
The following explanations of terms are provided to better describe the present invention and to guide those of ordinary skill in the art in the practice of the present invention. As used herein and in the appended claims, the singular forms “a” , “an” or “the” include plural references unless the context clearly dictates otherwise.
Unless specified otherwise, all the technical and scientific terms used herein have the same meanings as commonly understood to those of ordinary skill in the art to which this disclosure belongs. In case of conflict, the present specification, including explanations of terms, will control. In order to facilitate review of the various embodiments of the present invention, the following explanations of specific terms are provided.
The term “primer” as used herein refers to short nucleic acids, such as DNA oligonucleotides of at least 10 nucleotides in length. A primer can be annealed to a complementary target DNA strand by nucleic acid hybridization to form a hybrid between the primer and the target DNA strand, and then extended along the target DNA strand by a DNA polymerase enzyme. Primer pairs can be used for amplification of a nucleic acid sequence, e.g., by the polymerase chain reaction (PCR) , or other nucleic-acid amplification methods known in the art.
The term “sample” as used herein refers to a material to be analyzed. In some embodiments, a sample is a biological sample, such as a vaginal secretion sample.
The term “subject” as used herein refers to any organism, for example, a mammalian subject, such as a human. In some embodiments, the subject is a healthy woman or a woman with bacterial vaginosis, preferably an Asian woman, more preferably a woman from China, North Korea, South Korea, Japan, Philippines, Vietnam, Laos, or Cambodia, and most preferably, a Chinese woman.
The term “16S rDNA” as used herein refers to a DNA sequence that codes for 16S ribosomal RNA (rRNA) , a component of the 30S small subunit of a prokaryotic ribosome that binds to the Shine-Dalgarno sequence. 16S rDNA contains hypervariable regions that can provide species-specific signature sequences useful for identification of bacteria. The bacterial 16S rDNA contains nine hypervariable regions (V1-V9) ranging from about 30-100 base pairs long that are involved in the secondary structure of the small ribosomal subunit. The identification of the hypervariable regions is within the ability of those skilled in the art. As used herein, a 16S rDNA sequencing method refers to a method for sequencing 16S rDNA, particularly hypervariable regions of 16S rDNA. 16S rDNA sequencing has become prevalent in medical microbiology as a rapid and cheap alternative to phenotypic methods of bacterial identification.
The details will be further described below by way of specific examples. However, it shall be understood that the specific embodiments are only used to explain the present invention and are not intended to limit the scope of the present invention. The instruments, devices, reagents, methods and the like used in the present application are instruments, devices, reagents and methods commonly used in the art unless otherwise specified.
Examples
Vaginal microbiome has profound effects on the health of women and their newborns. Recently, 16S rDNA sequencing has been extensively utilized to evaluate the composition of the human vaginal microbiome in various ethnic groups, and different amplification primers may bias the obtained results.
As shown in the Background of the Invention, a series of 16S rDNA sequencing protocols with different target regions and corresponding primer sets have been utilized in vaginal microbiome studies. In principle, the longer and more distinctive the target region is, the better. However, because read length is limited, only a subset of target regions is available. One recent study performed in-silico and experimental evaluations of primer sets for V1-V3, V3-V4 and V4, and concluded that the V4 region provides the best species-level resolution of the vaginal microbiome (Van Der Pol WJ et al., J Infect Dis 2019, 219 (2) : 305-14) . In the present evaluation, the inventors emphasized the consistency between the 16S rDNA sequencing results and clinical diagnostics, such as morphology and culture of the characteristic species. Continuity between new technologies and traditional ones is critical, especially when translating them into clinical application. Unoptimized 16S rDNA sequencing protocols utilizing the V1-V2 hypervariable region would produce biased estimation.
Though highly consistent with clinical diagnosis in women of European ancestry, 16S rDNA sequencing has not been thoroughly validated in the Chinese population. Our study is the first to investigate the human vaginal microbiome in the Chinese population in an unbiased manner.
Material and Methods:
Study Population:
28 healthy women without vaginitis such as aerobic vaginitis (AV) , bacterial vaginosis (BV) , vulvovaginal candidiasis (VVC) , and trichomonas vaginitis (TV) , and 10 women with BV only were enrolled at the gynecological clinic of Beijing Tsinghua Changgung Hospital from April 2018 to October 2018. All the women were 18-50 years old and were not pregnant or breast-feeding. Written informed consents were approved by the Medical Ethics Committee of Beijing Tsinghua Changgung Hospital.
Sample Collection and DNA Extraction
Vaginal secretions were obtained with two swabs from each woman. One swab was used to prepare a dry slide for Gram staining and visual examination under 400× magnification, to test for AV, BV, VVC, and TV. The criteria of Donders et al. were used to diagnose AV (a score of 3 or greater) (Donders GG et al., BJOG 2002, 109 (1) : 34-43) . BV was determined by Nugent’s criteria (Nugent score of 7 or greater) (Nugent RP et al., J Clin Microbiol 1991, 29 (2) : 297-301) . The diagnosis of VVC and TV was mainly based on morphological observation under a high-power field (400× magnification) . The other swab was quickly plunged into a tube containing 1 ml PBS solution and stored at -80℃ until total DNA extraction of the vaginal flora. The DNA of the sample was extracted with the TIANamp Bacteria DNA Kit (TIANGEN, China) according to the manufacturer's instructions. This step additionally required lysozyme (Sigma–Aldrich) , proteinase K, and RNase A (Sigma–Aldrich) ; finally, the DNA was washed and stored in 1×TE buffer. A spectrophotometer (Thermo Scientific NanoDrop One) was used to measure the concentration and purity of the DNA extracts, which were then stored at -20℃ until needed.
Sequencing
Taking data volume, sequencing accuracy, read length and economic factors into account, the paired-end Illumina Solexa sequencing platform was chosen over the 454 pyrosequencing platform in the present invention. The V1-V2 and V3-V4 regions of the 16S rDNA were separately amplified with the universal primers 27F (SEQ ID NO. 3: 5’-AGAGTTTGATCCTGGCTCAG-3’) and 338R (SEQ ID NO. 2: 5’-GCTGCCTCCCGTAGGAGT-3’) , and 341F (SEQ ID NO. 4: 5’-CCTAYGGGRBGCASCAG-3’) and 806R (SEQ ID NO. 5: 5’-GGACTACNNGGGTATCTAAT-3’) , respectively. The V1-V2 region was also amplified with the modified forward primer 27F’ (SEQ ID NO. 1: 5’-AGRGTTYGATYCTGGCTCAG-3’) and the reverse primer 338R (SEQ ID NO. 2: 5’-GCTGCCTCCCGTAGGAGT-3’) . Three 16S rDNA sequencing protocols (i.e., 27F-338R, 27F’-338R and 341F-806R protocols, named after their PCR primer sets) were used to test whether the sequencing results were consistent with the clinical diagnostics, morphology and qPCR results. All PCR reactions were carried out with High-Fidelity PCR MasterMix (New England Biolabs) . PCR products of 400-450 bp were selected and mixed in equal density ratios. Then, the mixture of PCR products was purified with the Qiagen Gel Extraction Kit (Qiagen, Germany) . Sequencing libraries were generated using a DNA PCR-Free Sample Preparation Kit (Illumina, USA) following the manufacturer's recommendations, and index codes were added. The library quality was assessed on the Qubit 2.0 Fluorometer (Thermo Scientific) and the Agilent Bioanalyzer 2100 system. Finally, the library was sequenced on an Illumina HiSeq 2500 platform and 250 bp paired-end reads were generated.
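The relationship between the universal primer 27F (SEQ ID NO. 3) and the modified primer 27F’ (SEQ ID NO. 1) can be illustrated by expanding the degenerate bases of 27F’ into the concrete sequences they stand for. The following Python sketch is for illustration only and is not part of the described protocol; it assumes the standard IUPAC nucleotide ambiguity codes, and the function name expand_primer is hypothetical.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

def expand_primer(primer):
    """Return every non-degenerate sequence encoded by a degenerate primer."""
    choices = (IUPAC[base] for base in primer)
    return ["".join(bases) for bases in product(*choices)]

PRIMER_27F = "AGAGTTTGATCCTGGCTCAG"      # SEQ ID NO. 3
PRIMER_27F_MOD = "AGRGTTYGATYCTGGCTCAG"  # SEQ ID NO. 1 (27F')

variants = expand_primer(PRIMER_27F_MOD)
print(len(variants))           # one R and two Y positions: 2 * 2 * 2 variants
print(PRIMER_27F in variants)  # the universal 27F is one of the variants
```

Because 27F’ contains one R and two Y positions, it represents eight concrete oligonucleotides, of which the universal 27F is one.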
Reference Database
SILVA and NCBI were compared in the following evaluations, as the Greengenes database has not been updated since 2013 (Park SC et al., Genomics Inform 2018, 16 (4) : e24) and the RDP database is curated semi-automatically (Balvociute M et al., BMC Genomics 2017, 18 (Suppl 2) : 114) . For the SILVA database, the SSU 128 Ref NR 99 version, downloaded from https://www.arb-silva.de, was used. For the NCBI database, the sequences were downloaded with the BLAST command blastdbcmd in June 2017. All the taxonomies were summarized at the species level.
Sequencing Data Processing
Paired-end reads were assigned to samples according to the sample-specific barcode and truncated by cutting off the barcode and primer sequence. Software FLASH (V1.2.7) (Magoc T et al., Bioinformatics 2011; 27 (21) : 2957-63) was used to merge paired-end reads. According to the QIIME (V1.7.0) quality control process (Caporaso JG et al., Nat Methods 2010, 7 (5) : 335-6) , the raw tags were mass filtered under specific filtration conditions to obtain high quality clean tags (Bokulich NA et al., Nat Methods 2013, 10 (1) : 57-9) .
The 16S sequence reference index was built using the command “bowtie2-build” with default parameters. All reads were aligned against the prebuilt index using bowtie2 with the parameter “bowtie2 --local” . Alignments were associated with taxonomy through a sequence-id-to-taxonomy map, provided by the reference database, using a custom Perl script. Unique reads were counted for each taxonomy, and abundance was calculated for every taxonomy. Species with an abundance lower than 1% or a read count of less than 5 were excluded.
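The counting and filtering step described above can be sketched as follows. This is a minimal Python illustration, not the custom Perl script itself; the function name species_abundance, the input layout (one best-hit reference id per read) and the toy read counts are assumptions made for the example. Whether abundances were renormalized after filtering is not stated, so the sketch leaves them as fractions of all assigned reads.

```python
from collections import Counter

def species_abundance(read_hits, id_to_taxonomy, min_abundance=0.01, min_reads=5):
    """Count reads per species and drop low-abundance or low-count species.

    read_hits: dict mapping read id -> reference sequence id (hypothetical layout)
    id_to_taxonomy: dict mapping reference sequence id -> species name
    """
    counts = Counter(
        id_to_taxonomy[ref_id]
        for ref_id in read_hits.values()
        if ref_id in id_to_taxonomy
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    # Exclude species with abundance below 1% or fewer than 5 reads.
    return {
        species: n / total
        for species, n in counts.items()
        if n / total >= min_abundance and n >= min_reads
    }

# Toy input (made up for illustration): 100 reads hitting three references.
id_to_taxonomy = {
    "ref1": "Lactobacillus crispatus",
    "ref2": "Lactobacillus iners",
    "ref3": "Gardnerella vaginalis",
}
read_hits = {}
for i in range(90):
    read_hits[f"read{i}"] = "ref1"
for i in range(90, 98):
    read_hits[f"read{i}"] = "ref2"
for i in range(98, 100):
    read_hits[f"read{i}"] = "ref3"

abundance = species_abundance(read_hits, id_to_taxonomy)
print(abundance)
```

In this toy input the third species is supported by only two reads, so it is removed by the read-count rule even though its abundance is above 1%.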
qPCR Validation
Lactobacilli and Gardnerella vaginalis specific qPCR primer and probe sequences were taken from a previous article (Menard JP et al., Clin Infect Dis 2008, 47 (1) : 33-43) . DNA of samples randomly selected from the healthy and BV groups was amplified using SGExcel GoldStar TaqMan qPCR Mix (Sangon Biotech) on a Bio-Rad CFX96 real-time PCR detection system.
Example 1: 27F-338R and 341F-806R 16S rDNA protocols for estimation of Chinese vaginal microbiome
Firstly, the inventors checked whether the widely used 27F-338R and 341F-806R 16S rDNA protocols could accurately evaluate the vaginal microbiome of the Chinese population. 16S rDNA sequencing was applied to the vaginal swab samples collected from 28 healthy women and 10 women with BV. As shown in Table 1, the top 10 bacteria that showed the highest abundance across all the samples were denoted as the representative bacteria of the vaginal microbiome. For each sample, any representative bacterium with an abundance over 10% was denoted as a major species (highlighted in bold and italic) .
Table 1. Summary of vaginal microbiome compositions from healthy and BV samples.
Abbreviation: BV, bacterial vaginosis. ND, not detected.
Each row represents a sample ID and each column represents the corresponding relative abundance of a species under a 16S rDNA sequencing protocol. Only the top 10 bacteria that showed the highest abundance across all the samples are shown. Abundance higher than 10% is highlighted in bold italic font, and others are labeled ND.
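The selection of representative bacteria and the 10% rule for major species can be sketched in Python as follows. This is only an illustration under stated assumptions: ranking by abundance summed over all samples is one possible reading of "highest abundance across all the samples", the function names are hypothetical, and the two toy profiles are made up rather than taken from Table 1.

```python
def top_species(samples, n=10):
    """Rank species by abundance summed over all samples (an assumed criterion)."""
    totals = {}
    for profile in samples.values():
        for species, abundance in profile.items():
            totals[species] = totals.get(species, 0.0) + abundance
    # Sort by descending total, breaking ties alphabetically for determinism.
    return sorted(totals, key=lambda s: (-totals[s], s))[:n]

def major_species(profile, representative, threshold=0.10):
    """Return representative species whose abundance exceeds the threshold."""
    return [s for s in representative if profile.get(s, 0.0) > threshold]

# Toy profiles (made up for illustration), abundances as fractions.
samples = {
    "H01": {"Lactobacillus crispatus": 0.85, "Lactobacillus iners": 0.15},
    "BV01": {
        "Gardnerella vaginalis": 0.55,
        "Prevotella spp.": 0.40,
        "Lactobacillus iners": 0.05,
    },
}

representative = top_species(samples, n=3)
for sample_id, profile in samples.items():
    print(sample_id, major_species(profile, representative))
```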
First, a significant difference in the abundance of Gardnerella vaginalis was observed between the 27F-338R and 341F-806R protocols: with the 27F-338R protocol, only 2 out of 10 BV samples (20%) showed Gardnerella vaginalis as a major species, while with the 341F-806R protocol, 10 out of 10 BV samples (100%) showed Gardnerella vaginalis as a major species. Gardnerella vaginalis was confirmed in all the BV samples by morphology and microscope results (Figure 4) , indicating that the 341F-806R protocol was more accurate. In addition, by using Lactobacilli and Gardnerella vaginalis specific primers, the qPCR validation results from 15 random samples also supported the 341F-806R protocol results (Figure 5) . Moreover, it was noticed that the 27F-338R protocol results were consistent with at least two publications, i.e., Ravel J et al. and Hickey RJ et al., which reported Gardnerella vaginalis at negligible or low abundance in BV samples (Ravel J et al., Proc Natl Acad Sci U S A 2011, 108 Suppl 1: 4680-7; and Hickey RJ et al., MBio 2015, 6 (2) ) . Despite the differences in samples, such as ethnic group and age, one unusual commonality was that both publications used the same primer set as the 27F-338R protocol. This indicated that the 27F-338R protocol may lead to a biased, low estimate of the abundance of Gardnerella vaginalis, which was inconsistent with the morphology and microscope results.
It was also noted that another unexpected bacterium, Lactobacillus gallinarum, showed up as a major species in 12 out of 28 healthy samples (43%) according to the 341F-806R protocol results. In contrast, no samples showed the presence of Lactobacillus gallinarum according to the 27F-338R protocol results. To our knowledge, unlike Lactobacillus crispatus, Lactobacillus gasseri, Lactobacillus iners, and Lactobacillus jensenii, Lactobacillus gallinarum is not a Lactobacillus species commonly found in the vaginal microbiome (Ravel J et al., Proc Natl Acad Sci U S A 2011, 108 Suppl 1: 4680-7) . It was inferred that the differences between the 16S rDNA protocols may be responsible for these conflicting results regarding Gardnerella vaginalis and Lactobacillus gallinarum.
Example 2: Biased abundance estimations caused by low fetching efficacy of primer 27F and identical sequences in the V3-V4 target region
The inventors quantified the differences between the 27F-338R and 341F-806R 16S rDNA protocols in terms of the fetching efficacy of the primer set and the identity of the target regions. To do this, the inventors evaluated the alignments of each primer set and target region to the reference databases. To eliminate the potential bias caused by any single reference database, two databases were tested in parallel, i.e., the SILVA and NCBI 16S Microbial databases.
Firstly, the PCR primer sequences of 27F, 338R, 341F and 806R were aligned to the reference 16S rDNA sequence databases in order to evaluate the primer fetching efficacy. As shown in Figure 1A, the 27F primer did not align to all of the reference sequences (88.9% in the SILVA database and 57.3% in the NCBI 16S Microbial database) , compared to 100% for the 338R, 341F and 806R primers (in both databases) . Two species, i.e., Gardnerella vaginalis and Bifidobacterium bifidum, were found unable to align with the 27F primer. Another characteristic species of the human vaginal microbiome, Atopobium vaginae, was found to match the 27F primer imperfectly. This was consistent with a previous work that argued that the 27F primer could reduce the PCR efficiency (Frank JA et al., Appl Environ Microbiol 2008, 74 (8) : 2461-70) . This also explained why Gardnerella vaginalis appeared at negligible or low abundance according to the 27F-338R protocol results.
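A simplified way to picture primer fetching efficacy is to count the fraction of reference sequences that contain a site matching the primer, with degenerate bases allowed to match any of their IUPAC alternatives. The Python sketch below uses exact pattern matching as a stand-in for the alignment described above, which is an assumption; the function names are hypothetical and the three reference sequences are made up for illustration and do not come from any database.

```python
import re

# IUPAC ambiguity codes written as regular-expression character classes.
IUPAC_CLASS = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def primer_regex(primer):
    """Compile a degenerate primer into a regular expression."""
    return re.compile("".join(IUPAC_CLASS[base] for base in primer))

def coverage(primer, references):
    """Return (fraction of references matched, names of matched references)."""
    pattern = primer_regex(primer)
    hits = [name for name, seq in references.items() if pattern.search(seq)]
    return len(hits) / len(references), hits

# Made-up toy references: each embeds a different 20-nt priming site.
TOY_REFERENCES = {
    "ref_A": "TTT" + "AGAGTTTGATCCTGGCTCAG" + "GGC",  # exact 27F site
    "ref_B": "TTT" + "AGGGTTCGATTCTGGCTCAG" + "GGC",  # differs only at R/Y positions
    "ref_C": "TTT" + "AGAGTTTGATCATGGCTCAG" + "GGC",  # mismatch at a fixed position
}

print(coverage("AGAGTTTGATCCTGGCTCAG", TOY_REFERENCES))  # universal 27F
print(coverage("AGRGTTYGATYCTGGCTCAG", TOY_REFERENCES))  # modified 27F'
```

In this toy set the degenerate primer additionally matches ref_B, whose site differs from 27F only at the degenerate positions, while neither primer matches ref_C.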
Secondly, the inventors extracted the target regions corresponding to the primer sets 27F-338R and 341F-806R (V1-V2 and V3-V4, respectively) and counted the identical sequences shared by different species. As shown in Figure 1B, many more species shared identical sequences with others in the target region of the 341F-806R protocol (1062 for the SILVA database, 747 for the NCBI 16S Microbial database and 543 for the intersection of the two databases) than in that of the 27F-338R protocol (36 for the SILVA database, 16 for the NCBI 16S Microbial database and 0 for the intersection of the two databases) . The inventors further checked the species that shared identical sequences with others, and found that Lactobacillus crispatus shares an identical sequence with Lactobacillus gallinarum in the target region of the 341F-806R primer set (Figure 1C) . This explained why Lactobacillus gallinarum appeared in high abundance according to the 341F-806R protocol results.
To optimize the 16S rDNA protocol, the sequence of the 27F primer (see Sequencing for details) was modified to allow higher PCR fetching efficacy. The modified 27F primer was denoted as 27F’ and the corresponding 16S protocol was named the 27F’-338R protocol. As shown in Figure 1A, in the SILVA and NCBI 16S Microbial databases, the 27F’ primer aligned to 92.6% and 63.4% of the reference 16S rDNA sequences, respectively, which is higher than the alignment rate of 27F (88.9% and 57.3%, respectively) . In addition, the 27F’ primer showed a perfect match with Gardnerella vaginalis, Bifidobacterium bifidum and Atopobium vaginae. Furthermore, as shown in Figure 1B, the 27F’-338R protocol showed 24, 16 and 0 species that shared identical sequences with others in the target region, for the SILVA database, the NCBI 16S Microbial database and the intersection of the two databases, respectively. These results indicated that the optimized 27F’-338R 16S rDNA protocol could be a better choice for the human vaginal microbiome.
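The count of species that share an identical target-region sequence with another species can be sketched by grouping species by sequence. The Python example below is an illustrative sketch only; the function name is hypothetical, the input is assumed to be target regions that have already been extracted, and the short sequences and species labels are made up rather than taken from any database.

```python
from collections import defaultdict

def species_sharing_identical_regions(amplicons):
    """Return species whose target-region sequence is also found in another species.

    amplicons: iterable of (species, target_region_sequence) pairs
    """
    species_by_sequence = defaultdict(set)
    for species, sequence in amplicons:
        species_by_sequence[sequence].add(species)
    shared = set()
    for species_set in species_by_sequence.values():
        if len(species_set) > 1:  # one sequence, two or more species
            shared |= species_set
    return shared

# Made-up toy amplicons: species_1 and species_2 share one identical region.
toy_amplicons = [
    ("species_1", "ACGTACGT"),
    ("species_2", "ACGTACGT"),
    ("species_3", "ACGTTCGT"),
    ("species_1", "ACGTAAGT"),
]

shared = species_sharing_identical_regions(toy_amplicons)
print(sorted(shared))
```

Reads from a region shared in this way cannot be assigned to a single species by sequence identity alone, which is the source of the misassignment discussed above.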
Example 3: Optimized 27F’-338R 16S rDNA protocol provided unbiased estimation of Chinese vaginal microbiome
The 27F’-338R protocol was further validated. Firstly, all the BV samples were merged to count the abundance of the top ten bacteria for the three 16S protocols (Figure 2A) . The top 10 species found in the BV condition included Gardnerella vaginalis, Prevotella spp., Lactobacillus iners, Veillonellaceae bacterium, Sneathia amnii, Clostridiales bacterium, Atopobium vaginae, Chlamydia trachomatis, Sneathia sanguinegens and Candidatus Saccharibacteria. Overall, it was noticed that the results from the 27F’-338R and 341F-806R protocols were quite similar, while the results from the 27F-338R protocol were quite different. The relative abundance of Gardnerella vaginalis was about 41%, 33% and 8% when applying the 27F’-338R, 341F-806R and 27F-338R protocols, respectively. This indicated that the low estimation of Gardnerella vaginalis by the 27F-338R protocol was recalibrated by the 27F’-338R protocol. Secondly, all the healthy samples were merged to count the abundance of the top bacteria under the different protocols (Figure 2B) . Unlike in the BV group, the top species were mainly Lactobacilli; the top species were Lactobacillus crispatus, Lactobacillus iners, Lactobacillus jensenii, Lactobacillus gasseri, Lactobacillus gallinarum, Gardnerella vaginalis, Prevotella spp., Lactobacillus helveticus, Lactobacillus acidophilus and Streptococcus anginosus. Here, it was noticed that the results of the 27F’-338R and 27F-338R protocols were quite similar, while the results of the 341F-806R protocol were quite different from the others. The emergence of irrelevant Lactobacillus spp., i.e., Lactobacillus gallinarum, Lactobacillus helveticus and Lactobacillus acidophilus, in the 341F-806R protocol was caused by misalignment due to the identical sequences in the target region. In conclusion, it was proved that the 27F’-338R protocol could recalibrate the biased estimation of Gardnerella vaginalis and Lactobacillus crispatus.
As a result, it was proved that the 27F’-338R protocol could restore the well-established community state type (CST) clustering (Ravel J et al., Proc Natl Acad Sci U S A 2011, 108 Suppl 1: 4680-7) . Unsupervised clustering of the 28 healthy and 10 BV samples was performed using the abundance of the top 20 bacteria (Figure 3) . It was noticed that all the healthy samples clustered together and all the BV samples clustered together. All the BV samples showed a diverse community with diminished Lactobacillus and dominant Gardnerella vaginalis, which was similar to the CST-IV cluster (Ravel J et al., Proc Natl Acad Sci U S A 2011, 108 Suppl 1: 4680-7) . For the healthy samples, it was noticed that all the Lactobacillus crispatus-enriched samples clustered together, as did the Lactobacillus gasseri-enriched samples, the Lactobacillus iners-enriched samples and the Lactobacillus jensenii-enriched samples; they formed the CST-I, CST-II, CST-III and CST-V clusters, respectively (Ravel J et al., Proc Natl Acad Sci U S A 2011, 108 Suppl 1: 4680-7) . In summary, the 27F’-338R protocol-based 16S rDNA sequencing method could give an unbiased estimation of the vaginal microbiome.
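Unsupervised clustering of abundance profiles can be sketched as follows. The distance measure and linkage used for Figure 3 are not specified here, so this Python example assumes Bray-Curtis dissimilarity with average-linkage agglomerative clustering purely for illustration; the function names are hypothetical and the four toy profiles are made up.

```python
from itertools import combinations

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance profiles (dicts)."""
    species = set(a) | set(b)
    num = sum(abs(a.get(s, 0.0) - b.get(s, 0.0)) for s in species)
    den = sum(a.get(s, 0.0) + b.get(s, 0.0) for s in species)
    return num / den if den else 0.0

def agglomerate(profiles, k):
    """Merge the two closest clusters (average linkage) until k clusters remain."""
    clusters = [[name] for name in profiles]

    def linkage(c1, c2):
        pairs = [(x, y) for x in c1 for y in c2]
        return sum(bray_curtis(profiles[x], profiles[y]) for x, y in pairs) / len(pairs)

    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters

# Toy profiles (made up for illustration), abundances as fractions.
profiles = {
    "H1": {"Lactobacillus crispatus": 0.9, "Lactobacillus iners": 0.1},
    "H2": {"Lactobacillus crispatus": 0.8, "Lactobacillus iners": 0.2},
    "B1": {"Gardnerella vaginalis": 0.6, "Prevotella spp.": 0.4},
    "B2": {"Gardnerella vaginalis": 0.5, "Prevotella spp.": 0.3, "Atopobium vaginae": 0.2},
}

clusters = agglomerate(profiles, k=2)
print(clusters)
```

With these toy profiles, the two Lactobacillus-dominated samples merge first and the two Gardnerella-dominated samples form the second cluster.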
Discussion
As shown in the trial experiments of the present invention, the 27F-338R protocol under-estimated the abundance of Gardnerella vaginalis. In addition, it was shown that a 16S rDNA sequencing protocol utilizing the V3-V4 hypervariable region would also introduce bias: the 341F-806R protocol misaligned Lactobacillus crispatus to other irrelevant Lactobacilli. Moreover, each bias occurred only in its own protocol and could not be reproduced in the other protocols. Therefore, it was inferred that such bias was associated with unoptimized 16S rDNA sequencing protocols, rather than with samples or ethnic groups. The inventors have pinned down the primer sequence and the target region as the major contributors to the bias. Subsequently, the protocol was optimized, i.e., the modified 27F primer was used and the V1-V2 hyper-variable region was chosen as the target region. The optimized 16S rDNA sequencing protocol was proven able to recalibrate the estimation of Gardnerella vaginalis, prevent misalignment of Lactobacillus crispatus and restore the authoritative five community state types (CSTs) .
The findings of the present application are as follows. (1) The 27F primer was not well aligned with Gardnerella vaginalis, resulting in poor amplification. By introducing degenerate bases into the primer sequence, 27F’ could amplify Gardnerella vaginalis well. (2) In the V3-V4 target region, the DNA sequence of Lactobacillus crispatus was the same as that of Lactobacillus gallinarum. There was a bias in the abundance estimation of Lactobacillus crispatus when V3-V4 was used as the target region of PCR, while there was no such bias when V1-V2 was used as the target region. (3) The optimized 27F’-338R protocol avoids the above deviations and restores the well-established community state type (CST) clustering.
In conclusion, the present invention provides an optimized 16S rDNA-based method for evaluating the composition of the human vaginal microbiome using a currently common NGS platform, and it is the first piece of work that systematically investigated the human vaginal microbiome in the Chinese population with the above-mentioned methods.
Having illustrated and described methods for determining the bacterial composition of vaginal microbiome in a subject, it should be apparent to those of ordinary skill in the art that the disclosure can be modified in arrangement and detail without departing from such principles. In view of the many possible embodiments to which the principles of our disclosure may be applied, it should be recognized that the illustrated embodiments are only particular examples of the disclosure and should not be taken as a limitation on the scope of the disclosure. Rather, the scope of the disclosure is in accordance with the following claims.