EP3526344B1 - Identification and antibiotic characterization of pathogens in metagenomic samples - Google Patents
- Publication number
- EP3526344B1 (application EP17780452.3A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- reads
- ard
- marker
- database
- assigned
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B40/20—Supervised data analysis
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
- C12Q1/6888—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms
- C12Q1/689—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms for bacteria
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/20—Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
- G16B30/20—Sequence assembly
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6869—Methods for sequencing
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
- C12Q1/6888—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/106—Pharmacogenomics, i.e. genetic variability in individual responses to drugs and drug metabolism
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/158—Expression markers
Definitions
- the invention relates to the field of metagenomics, and in particular to the characterization of the antibiotic susceptibility of pathogens in metagenomic samples by ascertaining the presence of antibiotic resistance markers in their genomes.
- AST Antibiotic Susceptibility Testing
- the microbiology workflow involves the growth of the pathogens (e.g. on a Petri dish) to isolate them and to get a critical biomass needed for subsequent tests.
- different bacteria may require different culture conditions (e.g. aerobic vs. anaerobic bacteria), may compete during culture, or even may not grow at all if the culture conditions are not chosen in a proper manner.
- the choice of a culture medium is thus usually based on assumptions about the pathogens in the sample.
- tests require pre-identification of the pathogens (e.g. Gram-positive or Gram-negative) to choose the reagents of the AST. The robustness of microbiological techniques may thus sometimes be questionable.
- metagenomics is a Nucleic Acid (NA) sequencing-based technique which aims at characterizing the microorganism content of a sample using a linear workflow with less a priori information on this content.
- NA Nucleic Acid
- metagenomics does not involve the growth of bacteria for isolating them and the choice of a step in the metagenomic workflow does not depend on the results of the preceding steps.
- the workflow duration is substantially independent of the microorganisms contained in the sample and it is possible to process samples comprising a mix of different microorganisms (e.g. different bacterial species) and get at the same time the global picture of the microbiological content of the sample.
- HTS High Throughput Sequencing
- WGS Whole Genome Sequencing
- NGS Next Generation Sequencing
- in MOCAT2 ("MOCAT2: a metagenomic assembly, annotation and profiling framework", Bioinformatics, 2016), reads are assembled using the "SOAPdenovo" assembler (Ruibang Luo et al., "SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler", GigaScience, 2012), genes are predicted, and annotated very efficiently against a combined catalogue of functional information from multiple databases (eggNOG, KEGG, SEED, ARDB, CARD, etc.). Taxonomic and functional profiling may be used to first identify and get the relative proportion of pathogens, and also to get the ARD present in the sample.
- taxonomic binning based pipelines: they comprise an assignment step (also called "taxonomic binning") consisting in:
- HTS techniques thus give simultaneous access to the set of pathogens present in a sample and to the set of antibiotic resistance determinants (ARD) contained in their genomes.
- those techniques cannot link ARD and pathogens, which is the main piece of information for a clinician who wants to know which pathogen is present in the sample, and which ARD (if any) this particular pathogen harbours.
- antibiotic resistance may be due to the presence or absence of resistance genes but also to the presence of specific resistance gene variants, and in this case it is crucial to have access to the most accurate sequences of the resistance determinants.
- a first step to circumvent this problem is to apply the pipeline described in Guigon et al. ("Pathogen Characterization within the Microbial Flora of Bronchoalveolar Lavages by Direct Sample Sequencing", ECCMID, 2015), called "Pipeline1" in the remainder of this document.
- the main steps are: quality control of the reads (filtering and trimming of reads with low quality), elimination of host DNA (filtering of human reads), taxonomic binning, assembly of reads corresponding to each pathogen present in the sample into "contigs", and finally annotation of the contigs with respect to an ARD reference database.
- FIG. 1 illustrates a typical case of failure.
- a metagenomic sample includes DNA from a bacterial species ("Species 1") which harbours a resistance gene.
- MGE Mobile Genetic Element
- MGEs are a type of DNA that moves between bacterial genomes and are an important source of genetic variability, and thus of the antibiotic adaptation capability of bacteria.
- in the reference database, only the representative genome of Species k harbours this ARD, contrary to the representative genomes of other species, including Species 1. This may happen precisely because this ARD is located on an MGE.
- the micro-organism from Species 1 present in the sample under study might have acquired it recently from a strain of Species k, although this transfer has not been observed yet in the reference sequences used to build the Reference Database for taxonomic binning.
- reads located in the ARD region of Species 1 will not be retrieved with the other reads of Species 1, since they will be set apart as representative of Species k.
- the assembly of Species 1 will lead, in the best case, to 2 contigs, and the ARD will be missing from the assembly.
- reference databases are a static snapshot of the knowledge available at a moment regarding pathogens.
- the only way to take into account genomic modification of pathogens in connection with ARD is to update the databases.
- prior art metagenomic analysis is of no help in characterizing the antibiotic susceptibility of the pathogen, and even worse, may be misleading by rendering a false result, e.g. in the aforementioned example, reporting Species k as the resistant pathogen rather than Species 1.
- the present invention proposes a new metagenomic analysis which makes it possible to take into account genetic modifications in markers of interest using a reference database which does not reference those modifications.
- an object of the invention is a method for identifying a pathogen (e.g. bacterium) contained in a metagenomic sample and for identifying pathogenic markers (e.g. antimicrobial susceptibility, virulence, etc.) in the genome of said pathogen, the method comprising the steps of:
- the present invention takes advantage of the shearing step described above.
- the sample comprises several individuals of each pathogen.
- these copies are not fragmented identically on purpose, thereby producing overlapping fragments, the overlap feature being thereafter used for the assembly step.
- the assembly process has the opportunity, for said pathogen, to construct contigs comprising the marker. This feature enables the reconstruction of genomes with markers that are different from the representative genomes in the taxonomic database.
- Figure 2 illustrates the invention applied to the sample described in Figure 1, namely a sample with majority DNA from a strain of Species 1 which harbours an ARD located on an MGE while the taxonomic database does not store any representative genome having such a feature for Species 1.
- Reads falling in the ARD region are retrieved by mapping reads against an exhaustive ARD database, and reads falling outside the ARD are retrieved by taxonomic binning of reads against the taxonomic database. Then, for each pathogen found in the sample (here only Species 1), reads identified as Species 1 and reads mapping against the ARD are pooled together to be assembled. Because of the "clipping" feature of the reads, i.e.
- At least the portions of reads falling inside the markers have a length greater than or equal to 20 bp, preferably greater than or equal to 25 bp, more preferably greater than or equal to 50 bp.
- standard assemblers succeed in assigning a read to a known pathogen genome or a marker with a good probability even when only a small portion of said read aligns with the ARD database.
- the reads have an average length of L bp, with L > 75, and reads that are astride said marker have a portion falling outside said marker in the range [1; L-55] bp.
- the reads have an average length of L bp, with L > 100, and reads that are astride said marker have a portion falling outside said marker in the range [1; L-80] bp.
- the reads have an average length of L bp, with L > 100, and reads that are astride said marker have a portion falling outside said marker in the range [1; L-50] bp.
- the reads that are astride said marker have a first portion falling into said marker and a second portion falling outside said marker, and wherein the length of the second portion is chosen based on the performance of the mapping against the ARD database, in particular maximized while still maintaining a correct mapping performance (an acceptable proportion of reads assigned to the correct ARD).
- the length of the second portion is chosen such that the probability of good alignment with the ARD database, or probability to get a "true hit", is greater than or equal to 70%, preferably greater than or equal to 80%.
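The selection rule above can be sketched as follows. This is a minimal illustration, assuming that a calibration table of true-hit probabilities per outside-marker length is already available; the function name `max_outside_length` and the table values are hypothetical and are not taken from the patent (only the 70% threshold is).

```python
def max_outside_length(true_hit_prob, min_prob=0.70):
    """Return the largest outside-marker length (bp) whose true-hit
    probability is at least min_prob, or None if no length qualifies.

    true_hit_prob: dict mapping outside length (bp) -> estimated probability
    that a read with that much sequence outside the marker still maps to
    the correct ARD (hypothetical calibration data).
    """
    eligible = [length for length, p in true_hit_prob.items() if p >= min_prob]
    return max(eligible) if eligible else None


# Hypothetical calibration values, for illustration only.
calibration = {50: 0.99, 150: 0.95, 250: 0.80, 280: 0.40}
chosen = max_outside_length(calibration, min_prob=0.70)
print(chosen)
```

Raising `min_prob` to 0.80 implements the preferred, stricter threshold mentioned in the text.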
- the comparison of the set of reads with the second database comprises the mapping of each read on the pathogenic markers of the second database, independently from the other reads of said set.
- the sequencing is a paired-end sequencing, and if a read is assigned to a marker, the read which is the complementary of said read is also included in the pool.
- a produced contig comprises only reads assigned to a known marker
- said known pathogenic marker is determined to be part of the known pathogen's genome if $D_{ARD} \in \left[\tfrac{1}{3} \times D_{path} ;\ 3 \times D_{path}\right]$, where $D_{ARD}$ is a median sequencing depth of the reads assigned to the known marker and $D_{path}$ is a median sequencing depth of the reads assigned to the known pathogen, and preferably >1.
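The depth criterion above, read as requiring the marker's median depth to lie between one third and three times the pathogen's median depth, can be sketched as below. The function name and the per-position depth lists are hypothetical, the interval bounds are assumed to be inclusive, and the additional "preferably >1" condition is not modelled.

```python
from statistics import median


def marker_belongs_to_pathogen(marker_depths, pathogen_depths):
    """Check whether the median depth of the marker (D_ARD) lies within
    [D_path / 3, 3 * D_path], D_path being the pathogen's median depth.

    Both arguments are non-empty sequences of sequencing depths
    (hypothetical inputs, e.g. one value per covered position).
    """
    d_ard = median(marker_depths)
    d_path = median(pathogen_depths)
    return d_path / 3 <= d_ard <= 3 * d_path


# A marker covered about as deeply as the pathogen is accepted ...
print(marker_belongs_to_pathogen([30, 40, 50], [35, 40, 45]))
# ... whereas a marker covered ten times more deeply is rejected.
print(marker_belongs_to_pathogen([400, 410, 420], [35, 40, 45]))
```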
- the method further comprises a step of comparing the contigs to 16S rDNA sequences and/or MetaPhlAn2 markers, and wherein the known pathogen is confirmed based on said comparison.
- the sample is taken from a human or an animal, and wherein the first database comprises also flora and host genomes, and wherein reads assigned to flora and host genomes are filtered out.
- the metagenomic sample is a bronchoalveolar lavage sample, a urine sample or a blood sample.
- the pathogenic markers are antibiotic resistance markers or virulence markers.
- Another object of the invention is a computer readable medium storing instructions for executing a method performed by a computer, the method comprising
- Said computer readable medium stores instructions for executing the aforementioned method.
- VAP Ventilator-Acquired Pneumonia
- BAL (mini) Broncho-Alveolar Lavage
- ICU Intensive Care Unit
- a BAL sample is collected from a patient, in 10, and thereafter processed in 12 for nucleic acid extraction from pathogens contained in the sample.
- This preparation comprises successively, by way of example:
- the extracted DNA is thereafter sequenced in 14 using whole genome sequencing HTS techniques, e.g. a shotgun technique comprising:
- a set of reads is thereby produced and stored in 16 in a memory of a computer system.
- the DNA sequencing is preferably carried out using HTS techniques which read both ends of the fragments, for example using Illumina® dye sequencing, for instance MiSeq WGS paired-end sequencing techniques, as described for example in Oulas et al., "Metagenomics: Tools and Insights for Analyzing Next-Generation Sequencing Data Derived from Biodiversity Studies", Bioinform Biol Insights, 2015. Having both ends of the fragments sequenced makes assembly of the reads easier, and in particular facilitates incorporation of an ARD in the genome of a particular pathogen in the case where the taxonomic database does not include representative genomes with the ARD.
- a bioinformatics pipeline 18 according to the invention is then run on the reads to list the pathogens in the sample and determine whether their genomes harbor antibiotic resistance determinants.
- a first step 20 of the pipeline 18 consists in a pre-processing of the reads (usually called "Quality Control" (QC)), namely:
- Pipeline 18 goes on in 22 with:
- a compositional approach such as the "Kraken" tool (Wood and Salzberg, "Kraken: ultrafast metagenomic sequence classification using exact alignments", Genome Biology, 2014), or the "Vowpal Wabbit" tool (Vervier et al., "Large-scale machine learning for metagenomics sequence classification", Bioinformatics, 2015), or a comparative approach, such as the "BWA-MEM" tool (Li, "Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM", Genomics, 2013).
- a read is assigned to a pathogen if it maps entirely in a representative genome of this pathogen stored in the taxonomic database.
- Pipeline 18 also comprises a mapping 24 of each read against an ARD reference database that includes ARD of interest.
- a read is assigned to an ARD if:
- Figure 4 illustrates the probability to retrieve an ARD for a read falling in the ARD, according to the number of bases of the read in the ARD.
- a length of 50 bp that maps on an ARD is sufficient to precisely assign a read to this ARD (or, in other words, a length of 50 bp is sufficient to determine that a read comes from a genome portion having the ARD). It has been shown that the probability of retrieving a read in an ARD was 80% for reads with 250 bp outside the ARD and 50 bp in the ARD, i.e. with 83% of the read outside the ARD.
- reads with a portion outside the ARD having a length in the range [0, L-50] bp are thus assigned to the ARD, L being the length of the read.
- reads with a length outside the ARD over 50 are assigned to ARD.
- BWA-MEM is run with the non-default parameters "-a -T 0 -k 16 -L 5 -d 100", leading to reads assigned to ARD having clipped lengths in the range [0, L-50] bp.
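As an illustration of the clipped-length rule, the sketch below parses the CIGAR string of an alignment (as found in SAM output) and accepts the read when at least 50 read bases align to the ARD, i.e. when the clipped length lies in [0, L-50]. The function names are hypothetical, and treating soft/hard-clipped bases as the portion "outside the ARD" is an assumption of this sketch rather than a statement of the patented implementation.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM format).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def clipped_and_aligned(cigar):
    """Return (clipped, aligned) read-base counts for a CIGAR string.

    Clipped bases are soft (S) or hard (H) clips; aligned bases are the
    read bases consumed by M, I, = and X operations.
    """
    clipped = aligned = 0
    for count, op in CIGAR_OP.findall(cigar):
        if op in "SH":
            clipped += int(count)
        elif op in "MI=X":
            aligned += int(count)
    return clipped, aligned


def assigned_to_ard(cigar, min_inside=50):
    """True if the clipped length lies in [0, L - min_inside], L = read length."""
    clipped, aligned = clipped_and_aligned(cigar)
    read_length = clipped + aligned
    return 0 <= clipped <= read_length - min_inside


print(assigned_to_ard("250S50M"))   # 50 bp inside the ARD
print(assigned_to_ard("260S40M"))   # only 40 bp inside the ARD
```

An unmapped read (CIGAR `*`) yields zero aligned bases and is therefore not assigned.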
- the reads are mapped independently against the ARD database, even if the reads are paired because of the technique used for sequencing the DNA fragment (e.g. WGS paired-end sequencing techniques).
- a read is usually assigned to an ARD not only if it maps against the database but also when its counterpart read maps. However, if one only keeps reads that map "in a proper pair", meaning that both reads of the pair map on the ARD database, one only gets paired-end reads with an insert size smaller than a typical ARD length (~1000 bp). For example, in Figure 5 only "read2.1" and "read2.2" would be retrieved as mapped in a proper pair, because they both fall in the ARD. When mapped independently, "read1.1", "read2.1" and "read2.2" are all retrieved.
- if a read maps on an ARD, its counterpart read is automatically assigned to this ARD.
- "read 1.2", which does not map on the ARD, is thus automatically assigned to the ARD because "read 1.1" does.
- "Read 1.2" is particularly useful because it falls in a chromosomal region of a pathogen, and together with reads retrieved by taxonomic binning it can be used to reconstruct the whole region, the chromosome and the ARD, as will be described later.
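The counterpart rule can be sketched as follows: each read is mapped on its own, and whenever one read of a pair maps on the ARD database, both reads of the pair are assigned. The read naming scheme (`fragment.mate`, as in the Figure 5 example) and the function name are assumptions made for illustration.

```python
def rescue_mates(independently_mapped):
    """Return all reads assigned to the ARD database.

    independently_mapped: iterable of read names of the form
    '<fragment>.<mate>' (mate is '1' or '2') that mapped on an ARD when
    each read was aligned on its own. The counterpart of every mapped
    read is added automatically, whether or not it maps itself.
    """
    assigned = set()
    for read_name in independently_mapped:
        fragment, _, _mate = read_name.rpartition(".")
        assigned.add(f"{fragment}.1")
        assigned.add(f"{fragment}.2")
    return assigned


# Reads from the Figure 5 example that map when aligned independently.
mapped = ["read1.1", "read2.1", "read2.2"]
print(sorted(rescue_mates(mapped)))
```

Here the read that falls entirely in the chromosomal region is recovered through its counterpart, which is what later allows the assembler to bridge the chromosome and the ARD.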
- Pipeline 18 goes on with a pooling step 26.
- a pool of reads is created, said pool comprising the reads assigned to said pathogen and all the reads assigned to ARD(s).
- the other read is included automatically in the pool because it has been assigned also to the ARD database.
- if the sequencing depth is larger than 150, a random set of pathogen reads is selected amongst the whole set of reads assigned to said pathogen to have a final average sequencing depth equal to 150.
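The pooling and depth-capping steps can be sketched as below. This is only an illustration: the average depth is estimated as number of reads times read length divided by genome length, which is an assumption of this sketch, and the function name, arguments and fixed random seed are hypothetical.

```python
import random


def build_pool(pathogen_reads, ard_reads, read_length, genome_length,
               max_depth=150, seed=0):
    """Pool the reads assigned to one pathogen with all ARD-assigned reads.

    If the estimated average depth of the pathogen reads exceeds max_depth,
    a random subset is drawn so that the final depth is about max_depth.
    Depth is approximated as n_reads * read_length / genome_length.
    """
    selected = list(pathogen_reads)
    depth = len(selected) * read_length / genome_length
    if depth > max_depth:
        n_keep = max_depth * genome_length // read_length
        selected = random.Random(seed).sample(selected, n_keep)
    # All ARD-assigned reads are always kept in the pool.
    return set(selected) | set(ard_reads)


# Toy example: 1000 reads of 100 bp on a 500 bp "genome" gives depth 200.
pathogen = [f"p{i}" for i in range(1000)]
ard = ["a1", "a2"]
pool = build_pool(pathogen, ard, read_length=100, genome_length=500)
print(len(pool))
```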
- An assembly step 28 is then carried out for each created pool of reads in order to produce contigs.
- the assembly step runs "de novo" assemblers such as "IDBA-UD" (Peng et al., "IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth", Bioinformatics, 2012), "MegaHit" (Li et al., "MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph", Bioinformatics, 2015), "Omega" (Haider et al., "Omega: an Overlap-graph de novo Assembler for Metagenomics", Bioinformatics, 2014), "Ray Meta" (Boisvert et al., "Ray Meta: scalable de novo metagenome assembly and profiling", Genome Biology, 2012), "Spades" ( Bankevich et al.,
- IDBA-UD and Spades give the best performance and are thus preferred.
- the parameters for IDBA-UD and Spades are for example default parameters, that is to say respectively "idba_ud500 --mink 40 --maxk maxReadLength --min_pairs 2" and "spades.py --careful --cov-cutoff 3".
- Assembly step 28 thus transforms each pool of reads into a set of contigs (usually named "assembly"), preliminarily assigned to a particular pathogen of the taxonomic database, which contigs may comprise one or more ARDs.
- the assembly step comprises the following steps: a) reads are first pre-processed with SGA (if this was not performed in QC step 20 ), b) they are then assembled using a de novo assembler, and c) the original reads are mapped against the contigs to polish the assembly (i.e. remove remaining assembly errors). In particular, a contig is discarded if none of the pairs of reads maps against it.
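The contig filter of step c) can be sketched as below. The sketch assumes that a pair "maps against" a contig when both mates align on that same contig, which is one possible reading of the text; the input format (one tuple of contig names per read pair, with `None` for an unmapped mate) and the function name are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

PairHit = Tuple[Optional[str], Optional[str]]


def keep_supported_contigs(contigs: Iterable[str],
                           pair_hits: Iterable[PairHit]) -> List[str]:
    """Discard contigs on which no read pair maps.

    Assumption: a pair supports a contig only if both mates hit that contig.
    """
    supported = {c1 for c1, c2 in pair_hits if c1 is not None and c1 == c2}
    return [c for c in contigs if c in supported]


# Hypothetical mapping result for three read pairs.
hits = [("contig1", "contig1"), ("contig2", None), ("contig3", "contig1")]
kept = keep_supported_contigs(["contig1", "contig2", "contig3"], hits)
```

Only "contig1" is retained here, because it is the only contig hit by both mates of at least one pair.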
- a following step 30 of the pipeline 18 consists in confirming the identity of pathogens based on the sets of contigs and identifying the ARD in the genome of the identified pathogen(s). In particular, for each set of contigs, the following steps are carried out:
- MetaPhlAn2 markers are used for identity confirmation, those markers being described for example in Segata et al., "Metagenomic microbial community profiling using unique clade-specific marker genes", Nature Methods, 2012 .
- a final processing step 30 is then carried out to process the identified ARDs in order to link them to pathogens.
- the origin of reads mapping against the contigs annotated with an ARD is analyzed. If some of the reads that map on a contig with an ARD were obtained from the taxonomic binning against the pathogen RDB (step 20 ), then the ARD is definitively linked to the pathogen. In practice, at least 5% of the total number of reads mapping against the contigs containing an ARD are required to come from step 20.
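The 5% read-origin criterion can be illustrated with a short sketch. It assumes that each read mapping on an ARD-bearing contig carries a label telling whether it came from taxonomic binning or from ARD binning; the labels `"taxonomic"` and `"ard"` and the function name are hypothetical.

```python
from typing import Iterable


def ard_linked_by_read_origin(read_origins: Iterable[str],
                              min_fraction: float = 0.05) -> bool:
    """Return True if enough reads on the ARD contig come from taxonomic binning.

    `read_origins` holds one label per read mapping on the contig:
    'taxonomic' (pathogen binning) or 'ard' (ARD binning). Labels are assumed.
    """
    origins = list(read_origins)
    if not origins:
        return False
    n_taxonomic = sum(1 for origin in origins if origin == "taxonomic")
    return n_taxonomic / len(origins) >= min_fraction


# Hypothetical contig covered by 10 taxonomic-binning reads and 90 ARD-binning reads.
linked = ard_linked_by_read_origin(["taxonomic"] * 10 + ["ard"] * 90)
```

A contig covered only by reads from ARD binning fails this test and falls back to the depth-based comparison.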
- the assembly may however comprise ARD contigs that are not derived from step 20.
- all the reads mapping on the contigs are obtained from the mapping of the reads against ARD database (step 24 ).
- a first reason rests on the fact that the ARD is not part of the pathogen's genome.
- those contigs may actually correspond to the pathogen genome. Indeed it may happen that the ARD is located on a particular MGE, that is to say a plasmid.
- the ARD is not integrated in the contigs corresponding to the chromosome of the pathogen, but constitutes an independent contig.
- the processing step 30 links the ARD to the pathogen with weaker evidence by comparing the median sequencing depth of the ARD ( D_ARD ) and the median sequencing depth of the pathogen ( D_path ), the median sequencing depth being the median of the distribution of the number of reads that map at each position of the assembly (obtained at step c. of assembly step 28 ).
- D_ARD is the median of the distribution of the number of reads that map at each position of an ARD
- D_path is the median of the distribution of the number of reads that map at each position of the assembly of the pathogen.
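The median sequencing depth defined above can be computed from read alignments as in the following sketch. It assumes that alignments are available as 0-based, half-open `(start, end)` intervals on a single sequence; this representation and the function name are illustrative choices, not the format used by the pipeline.

```python
from statistics import median
from typing import Iterable, Tuple


def median_depth(length: int, alignments: Iterable[Tuple[int, int]]) -> float:
    """Median of the per-position number of mapped reads.

    `alignments` are (start, end) intervals, 0-based and half-open (assumed).
    """
    depth = [0] * length
    for start, end in alignments:
        # Clip each alignment to the sequence and count it at every covered position.
        for pos in range(max(start, 0), min(end, length)):
            depth[pos] += 1
    return median(depth)


# Hypothetical 10 bp region covered by four reads.
d = median_depth(10, [(0, 10), (0, 5), (5, 10), (2, 8)])
```

The same function can be applied to the positions of an ARD to obtain D_ARD, or to the whole assembly of a pathogen to obtain D_path.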
- an ARD is linked to the pathogen(s) with the closest average sequencing depth.
- "ARD2" located on "contig2" should be assigned to "Species1" (because the median sequencing depth of "contig2" is 4 and the median sequencing depth of "Species1" is 4), while "ARD3" located on "contig3" should be assigned to "Species2" (because the median sequencing depth of "contig3" is 75 and the median sequencing depth of "Species2" is 8).
- the ARD is assigned to all the species that have a median sequencing depth between 1/3 and 3 times the ARD median sequencing depth, and preferably greater than 1, because an ARD may be present in several copies in the genome of the pathogen.
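The depth-based linking rule can be sketched as follows: keep every species whose median depth lies between one third and three times the ARD median depth, and order the candidates by how close their depth is to that of the ARD. The function name is hypothetical, and "Species3" with its depth is made up for the example; the depths of "Species1" and "Species2" are those quoted in the text.

```python
from typing import Dict, List


def candidate_species(d_ard: float, species_depths: Dict[str, float]) -> List[str]:
    """Species compatible with an ARD according to median sequencing depth.

    A species is kept if d_ard / 3 <= D_path <= 3 * d_ard; candidates are
    sorted from the closest depth to the farthest.
    """
    compatible = [name for name, d_path in species_depths.items()
                  if d_ard / 3 <= d_path <= 3 * d_ard]
    return sorted(compatible, key=lambda name: abs(species_depths[name] - d_ard))


# "contig2" carrying "ARD2" has a median depth of 4.
depths = {"Species1": 4, "Species2": 8, "Species3": 40}
ranked = candidate_species(4, depths)
```

With these numbers "Species1" comes first as the closest match, while "Species2" also satisfies the broader 1/3 to 3 window.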
- an information/storing step 34 comprising the storage of the results of the pipeline 18 , in particular, the list of identified pathogens and the ARD linked thereto, and/or the display of those results on a screen of a computer.
- the first validation study relies on in silico simulated metagenomes (validation study 1)
- the second validation study is a set of 3 positive miniBAL metagenomic samples for which only the culture identification is available (validation study 2)
- the third validation study is a set of 2 positive BAL metagenomic samples with identification and AST profiles available (validation study 3).
- Kraken is used for taxonomic binning and ARD binning (steps 22 , 24 ) and IDBA-UD is used for assembly (step 28).
- 21 metagenomes have been simulated, each including 1 of the 21 selected pathogens (see Table 1). Each metagenome contains 300000 read pairs from the main pathogen, and 15000 read pairs from flora genomes. Genomes used for the simulations are real public genomes. Reads are simulated according to the Illumina MiSeq error model, with 2 × 300 bp paired-end reads, with V2 chemistry. Table 1 presents the strain used for the 21 simulated metagenomes, the number of ARDs present in each strain, the number of ARDs that are retrieved by the prior art pipeline ("P1"), and the number of ARDs that are retrieved by the pipeline according to the invention ("P1+P2"). Results are clearly in favor of the new pipeline, which in most cases recovers all the ARDs that were present in the original genomes.
- Table 1: Simulated strains and number of ARDs found in the genomes of origin, in the assembly with IDBA-UD after P1 only, and in the assembly with IDBA-UD after P1+P2.

Strain | # ARDs in the strain | # ARDs retrieved by P1 only | # ARDs retrieved by P1+P2 |
---|---|---|---|
E. cloacae JZY01 | 15 | 1 | 15 |
E. coli LFXU01 | 9 | 1 | 8 |
E. coli LHAT01 | 6 | 1 | 9 |
K. oxytoca | 9 | 2 | 9 |
K. pneumoniae LFBF01 | 7 | 1 | 7 |
K. pneumoniae CBWI01 | 15 | 3 | 13 |
H. influenzae | 1 | 0 | 1 |
P. mirabilis | 8 | 0 | 8 |
P. vulgaris | 12 | 1 | 12 |
M. morganii | 5 | 1 | 5 |
P. aeruginosa BADP01 | 9 | 8 | 9 |
P. aeruginosa JTVP01 | 10 | 9 | 10 |
- Figure 7 illustrates a computer system carrying out the pipeline according to the invention.
- Said system comprises the databases described above (taxonomic database, ARD database) as well as a database storing the reads.
- Those databases are connected to a computing unit, for example a personal computer, a tablet, a smartphone, a server, a network of computers, and more generally any system comprising one or more microprocessors and/or one or more microcontrollers, e.g. a digital signal processor, and/or one or more programmable logic devices, configured to implement the digital processing of the reads as described above.
- the computer unit comprises computer memories (RAM, ROM, cache memory, mass memory) for storing the acquired distributions, instructions for executing the method according to the invention, and intermediate and final computation results, in particular the list of pathogens and their linked ARDs.
- the computer unit further comprises a screen for displaying the list of pathogens and their linked ARDs.
Claims (14)
- A method for identifying a pathogen contained in a metagenomic sample and for identifying pathogenic markers in the genome of the pathogen, the method comprising the following steps:
  - preparing (12) the metagenomic sample in order to extract DNA at least from pathogens present in the sample,
  - sequencing (14) the extracted DNA, thereby obtaining a set of digital nucleic acid sequences, or "reads",
  - comparing (22) the set of reads with a first database comprising genomes of known pathogens, in order to assign reads of the set to the known pathogens;
  - creating (26) a pool of reads comprising at least reads assigned to a pathogen among the known pathogens, and assembling (28) the reads of the pool in order to produce at least one assembled digital nucleic acid sequence, or "contig",
  - comparing (30) the produced contigs with a second database of known pathogenic genetic markers, in order to check whether the produced contigs contain a known marker,

  characterized in that
  - the method comprises the step of comparing (24) the set of reads with the second database in order to assign reads of the set to the known pathogenic markers, wherein a read is assigned to a known pathogenic marker if it is entirely contained in the marker or if it straddles the marker, and
  - the pool further comprises the reads assigned to the known pathogenic markers, the contigs thus being assembled from reads assigned to the known pathogen and reads assigned to the known pathogenic markers.
- The method according to claim 1, wherein the reads that straddle the marker have portions contained in the marker with a length of at least 20 bp.
- The method according to claim 1 or 2, wherein the reads have an average length of L bp, where L > 100, and wherein the reads that straddle the marker have a portion in the range [1; L-50] bp lying outside the marker.
- The method according to claim 1, 2 or 3, wherein the reads that straddle the marker have a first portion contained in the marker and a second portion lying outside the marker, and wherein the length of the second portion is chosen on the basis of the performance of mapping against the ARD database.
- The method according to claim 4, wherein the length of the second portion is chosen such that the probability of a correct assignment against the ARD database is at least 70 %, preferably at least 80 %.
- The method according to any one of the preceding claims, wherein comparing the set of reads with the second database comprises mapping each read onto the pathogenic markers of the second database, independently of the other reads of the set.
- The method according to any one of the preceding claims, wherein the sequencing is paired-end sequencing and wherein, when a read is assigned to a marker, a read complementary to that read is also added to the pool.
- The method according to any one of the preceding claims, wherein, for a produced contig comprising exclusively reads assigned to a known marker, it is determined that this known pathogenic marker is part of the genome of the known pathogen if:
- The method according to any one of the preceding claims, further comprising a step of comparing the contigs with 16S rDNA sequences and/or MetaPhlAn2 markers of a database, wherein the known pathogen is confirmed on the basis of this comparison.
- The method according to any one of the preceding claims, wherein the sample is taken from a human or an animal, wherein the first database also comprises genomes of flora and of hosts, and wherein reads assigned to genomes of flora and of hosts are filtered out.
- The method according to any one of the preceding claims, wherein the metagenomic sample is a bronchoalveolar lavage sample, a urine sample or a blood sample.
- The method according to any one of the preceding claims, wherein the pathogenic markers are antibiotic resistance markers or virulence markers.
- A machine-readable medium storing instructions for executing a method performed by a computer, the method comprising:
  - comparing a set of reads, produced by sequencing DNA extracted from a metagenomic sample, with a first database comprising genomes of known pathogens, in order to assign reads of the set to the known pathogens;
  - creating a pool of reads comprising at least reads assigned to a pathogen among the known pathogens, and assembling the reads of the pool in order to produce at least one assembled digital nucleic acid sequence, or "contig",
  - comparing the produced contigs with a second database of known pathogenic genetic markers, in order to check whether the produced contigs contain a known marker,

  characterized in that
  - the method comprises the step of comparing the set of reads with the second database in order to assign reads of the set to the known pathogenic markers,
  - the pool further comprises the reads assigned to the known markers, the contigs thus being assembled from reads assigned to the known pathogen and reads assigned to the known pathogenic markers.
- The machine-readable medium according to claim 13, storing instructions for executing a method according to any one of claims 2 to 12.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP16193621 | 2016-10-13 | ||
PCT/EP2017/076029 WO2018069430A1 (en) | 2016-10-13 | 2017-10-12 | Identification and antibiotic characterization of pathogens in metagenomic sample |
Publications (2)
Publication Number | Publication Date |
---|---|
EP3526344A1 (de) | 2019-08-21 |
EP3526344B1 (de) | 2020-09-30 |
Family
ID=57184303
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP17780452.3A Active EP3526344B1 (de) | 2016-10-13 | 2017-10-12 | Identifizierung und antibiotische charakterisierung von krankheitserregern in metagenomischen proben |
Country Status (5)
Country | Link |
---|---|
US (1) | US11749381B2 (de) |
EP (1) | EP3526344B1 (de) |
JP (1) | JP7068287B2 (de) |
CN (1) | CN109923217B (de) |
WO (1) | WO2018069430A1 (de) |
Families Citing this family (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
AU2015364286B2 (en) | 2014-12-20 | 2021-11-04 | Arc Bio, Llc | Compositions and methods for targeted depletion, enrichment, and partitioning of nucleic acids using CRISPR/Cas system proteins |
CA2995983A1 (en) | 2015-08-19 | 2017-02-23 | Arc Bio, Llc | Capture of nucleic acids using a nucleic acid-guided nuclease-based system |
US11124831B2 (en) * | 2016-08-10 | 2021-09-21 | New York Genome Center | Ultra-low coverage genome sequencing and uses thereof |
WO2019226992A1 (en) * | 2018-05-24 | 2019-11-28 | The Trustees Of Columbia University In The City Of New York | Bacterial capture sequencing platform and methods of designing, constructing and using |
FR3099180B1 (fr) | 2019-07-23 | 2022-11-25 | Biomerieux Sa | Procédé de détection et de quantification d'une espèce biologique d'intérêt par analyse métagénomique, comportant l'utilisation d'une espèce de contrôle. |
FR3099183B1 (fr) | 2019-07-23 | 2022-11-18 | Biomerieux Sa | Procédé de détection et de quantification d'une espèce biologique d'intérêt par analyse métagénomique, et détermination d'un niveau de confiance associé |
FR3099182B1 (fr) | 2019-07-23 | 2022-11-25 | Biomerieux Sa | Procédé de détection et de quantification d'une espèce biologique d'intérêt par analyse métagénomique |
FR3099181B1 (fr) | 2019-07-23 | 2022-11-18 | Biomerieux Sa | Procédé de détection et de quantification d'une espèce biologique d'intérêt par analyse métagénomique, avec prise en compte d'un calibrateur. |
CN110438199A (zh) * | 2019-08-15 | 2019-11-12 | 深圳谱元科技有限公司 | 一种新型病原微生物检测的方法 |
CN113470742B (zh) * | 2020-03-31 | 2024-08-09 | 浙江省疾病预防控制中心 | 数据处理方法、装置、存储介质及计算机设备 |
CN111627500A (zh) * | 2020-04-16 | 2020-09-04 | 中国科学院生态环境研究中心 | 一种基于宏基因组技术识别水体中携带毒性因子病原菌的方法 |
CN114067911B (zh) * | 2020-08-07 | 2024-02-06 | 西安中科茵康莱医学检验有限公司 | 获取微生物物种及相关信息的方法和装置 |
CN112786109B (zh) * | 2021-01-19 | 2024-04-16 | 南京大学 | 一种基因组完成图的基因组组装方法 |
CN113223618B (zh) * | 2021-05-26 | 2022-09-16 | 予果生物科技(北京)有限公司 | 基于宏基因组的临床重要致病菌毒力基因检测的方法及系统 |
CN113782100B (zh) * | 2021-11-10 | 2022-02-18 | 中国人民解放军军事科学院军事医学研究院 | 一种基于细菌基因组高通量测序数据鉴定细菌种群携带的质粒类型的方法 |
CN115101129B (zh) * | 2022-06-27 | 2023-03-24 | 青岛华大医学检验所有限公司 | 一种基于宏基因组测序数据组装病原微生物基因组的方法 |
CN118230820A (zh) * | 2024-03-19 | 2024-06-21 | 浙江洛兮医学检验实验室有限公司 | 基于宏基因测序数据的耐药基因物种来源鉴定方法 |
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5234809A (en) | 1989-03-23 | 1993-08-10 | Akzo N.V. | Process for isolating nucleic acid |
CA2460959A1 (en) * | 2001-09-28 | 2003-05-22 | Incyte Genomics, Inc. | Enzymes |
US20050221341A1 (en) * | 2003-10-22 | 2005-10-06 | Shimkets Richard A | Sequence-based karyotyping |
US20100112590A1 (en) * | 2007-07-23 | 2010-05-06 | The Chinese University Of Hong Kong | Diagnosing Fetal Chromosomal Aneuploidy Using Genomic Sequencing With Enrichment |
KR20110091719A (ko) * | 2008-10-31 | 2011-08-12 | 바이오메리욱스, 인코포레이티드. | 식별제를 사용한 미생물의 분리 및 특성규명 방법 |
CN103186716B (zh) * | 2011-12-29 | 2017-02-08 | 上海生物信息技术研究中心 | 基于元基因组学的未知病原快速鉴定系统及分析方法 |
US8209130B1 (en) * | 2012-04-04 | 2012-06-26 | Good Start Genetics, Inc. | Sequence assembly |
FR3001464B1 (fr) | 2013-01-25 | 2016-02-26 | Biomerieux Sa | Procede d'isolement specifique d'acides nucleiques d'interet |
US20140288844A1 (en) * | 2013-03-15 | 2014-09-25 | Cosmosid Inc. | Characterization of biological material in a sample or isolate using unassembled sequence information, probabilistic methods and trait-specific database catalogs |
US20150032711A1 (en) * | 2013-07-06 | 2015-01-29 | Victor Kunin | Methods for identification of organisms, assigning reads to organisms, and identification of genes in metagenomic sequences |
GB2551642B (en) * | 2014-10-31 | 2020-09-23 | Pendulum Therapeutics Inc | Methods and compositions relating to microbial treatment and diagnosis of disorders |
-
2017
- 2017-10-12 EP EP17780452.3A patent/EP3526344B1/de active Active
- 2017-10-12 JP JP2019519228A patent/JP7068287B2/ja active Active
- 2017-10-12 WO PCT/EP2017/076029 patent/WO2018069430A1/en unknown
- 2017-10-12 CN CN201780063630.5A patent/CN109923217B/zh active Active
- 2017-10-12 US US16/342,017 patent/US11749381B2/en active Active
Non-Patent Citations (1)
Title |
---|
None * |
Also Published As
Publication number | Publication date |
---|---|
JP7068287B2 (ja) | 2022-05-16 |
EP3526344A1 (de) | 2019-08-21 |
CN109923217B (zh) | 2023-06-16 |
JP2019537780A (ja) | 2019-12-26 |
CN109923217A (zh) | 2019-06-21 |
US20190252042A1 (en) | 2019-08-15 |
US11749381B2 (en) | 2023-09-05 |
WO2018069430A1 (en) | 2018-04-19 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: UNKNOWN |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE |
|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE |
|
17P | Request for examination filed |
Effective date: 20190506 |
|
AK | Designated contracting states |
Kind code of ref document: A1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
|
AX | Request for extension of the european patent |
Extension state: BA ME |
|
DAV | Request for validation of the european patent (deleted) | ||
DAX | Request for extension of the european patent (deleted) | ||
REG | Reference to a national code |
Ref country code: DE Ref legal event code: R079 Ref document number: 602017024699 Country of ref document: DE Free format text: PREVIOUS MAIN CLASS: C12Q0001680000 Ipc: C12Q0001689000 |
|
GRAP | Despatch of communication of intention to grant a patent |
Free format text: ORIGINAL CODE: EPIDOSNIGR1 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: GRANT OF PATENT IS INTENDED |
|
RIC1 | Information provided on ipc code assigned before grant |
Ipc: C12Q 1/6888 20180101ALI20200409BHEP Ipc: G16B 20/00 20190101ALI20200409BHEP Ipc: G16B 30/00 20190101ALI20200409BHEP Ipc: C12Q 1/689 20180101AFI20200409BHEP |
|
INTG | Intention to grant announced |
Effective date: 20200429 |
|
GRAS | Grant fee paid |
Free format text: ORIGINAL CODE: EPIDOSNIGR3 |
|
GRAA | (expected) grant |
Free format text: ORIGINAL CODE: 0009210 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE PATENT HAS BEEN GRANTED |
|
AK | Designated contracting states |
Kind code of ref document: B1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
|
REG | Reference to a national code |
Ref country code: CH Ref legal event code: EP Ref country code: GB Ref legal event code: FG4D |
|
REG | Reference to a national code |
Ref country code: AT Ref legal event code: REF Ref document number: 1318858 Country of ref document: AT Kind code of ref document: T Effective date: 20201015 |
|
REG | Reference to a national code |
Ref country code: DE Ref legal event code: R096 Ref document number: 602017024699 Country of ref document: DE |
|
REG | Reference to a national code |
Ref country code: IE Ref legal event code: FG4D |
|
REG | Reference to a national code |
Ref country code: NL Ref legal event code: FP |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: FI Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20200930 Ref country code: GR Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20201231 Ref country code: SE Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20200930 Ref country code: NO Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20201230 Ref country code: BG Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20201230 Ref country code: HR Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20200930 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: LV Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20200930 Ref country code: RS Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20200930 |
|
REG | Reference to a national code |
Ref country code: LT Ref legal event code: MG4D |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: LT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20200930 Ref country code: SM Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20200930 Ref country code: EE Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20200930 Ref country code: PT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20210201 Ref country code: RO Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20200930 Ref country code: CZ Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20200930 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: PL Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20200930 Ref country code: IS Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20210130 Ref country code: ES Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20200930 Ref country code: AL Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20200930 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: LU Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20201012 Ref country code: SK Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20200930 Ref country code: MC Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20200930 |
|
REG | Reference to a national code |
Ref country code: DE Ref legal event code: R097 Ref document number: 602017024699 Country of ref document: DE |
|
REG | Reference to a national code |
Ref country code: BE Ref legal event code: MM Effective date: 20201031 Ref country code: AT Ref legal event code: UEP Ref document number: 1318858 Country of ref document: AT Kind code of ref document: T Effective date: 20200930 |
|
PLBE | No opposition filed within time limit |
Free format text: ORIGINAL CODE: 0009261 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: DK Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20200930 Ref country code: BE Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20201031 |
|
26N | No opposition filed |
Effective date: 20210701 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: IE Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20201012 Ref country code: IT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20200930 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: SI Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20200930 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: IS Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20210130; Ref country code: TR Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20200930; Ref country code: MT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20200930; Ref country code: CY Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20200930 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: MK Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20200930 |
|
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] |
Ref country code: NL Payment date: 20231026 Year of fee payment: 7 |
|
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] |
Ref country code: GB Payment date: 20231027 Year of fee payment: 7 |
|
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] |
Ref country code: FR Payment date: 20231025 Year of fee payment: 7; Ref country code: DE Payment date: 20231027 Year of fee payment: 7; Ref country code: CH Payment date: 20231102 Year of fee payment: 7; Ref country code: AT Payment date: 20230920 Year of fee payment: 7