WO2020140848A1 - Intestinal microbial sequencing data processing method, device, storage medium and processor - Google Patents

Intestinal microbial sequencing data processing method, device, storage medium and processor

Info

Publication number
WO2020140848A1
Authority
WO
WIPO (PCT)
Prior art keywords
intestinal
target object
index
analysis
microorganisms
Application number
PCT/CN2019/129425
Other languages
English (en)
French (fr)
Inventor
张东亚
张陈陈
夏慧华
刘洋荧
薛文斌
Original Assignee
深圳碳云智能数字生命健康管理有限公司
深圳数字生命研究院
Application filed by 深圳碳云智能数字生命健康管理有限公司 and 深圳数字生命研究院
Publication of WO2020140848A1

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B25/10 Gene or protein expression profiling; Expression-ratio estimation or normalisation
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics

Definitions

  • The present application relates to the field of gene sequencing data analysis, and specifically to a method, device, storage medium, and processor for processing intestinal microbial sequencing data.
  • HMP: Human Microbiome Project.
  • MetaHIT: Metagenomics of the Human Intestinal Tract.
  • The 16S rRNA (small subunit ribosomal RNA) gene is the most commonly used molecular marker (biomarker) in the phylogenetic classification of prokaryotic microorganisms. It is widely used in microbial ecology research.
  • Biomarker: molecular marker.
  • A large number of studies based on the 16S rRNA gene have driven the rapid development of microbial ecology research. The gene is also widely used in intestinal microbial research.
  • However, 16S rRNA gene data analysis suffers from several problems, including horizontal gene transfer, heterogeneity among multiple gene copies, differences in amplification efficiency, and the choice of data analysis method. These problems reduce the accuracy of analyses of microbial community composition and diversity.
  • The metagenome refers to the sum of the genetic material of all microorganisms in a specific environment.
  • Metagenomic sequencing takes the entire microbial community in a specific environment as the research object. Rather than isolating and culturing microorganisms, it extracts the total DNA of the environmental microorganisms and sequences that DNA directly using next-generation high-throughput sequencing. Because metagenomics is well suited to studying microbial ecology, a growing number of studies use metagenomic analysis methods for this purpose.
  • The present application provides an intestinal microbial sequencing data processing method, device, storage medium, and processor. It addresses two problems in the related art. First, metagenomic data analysis has not yet been used as a method for effectively analyzing the intestinal microflora and the state of the flora. Second, it is not yet possible to use metagenomic analysis to carry out a comprehensive, multi-dimensional analysis of an individual's intestinal microorganisms that meets the needs of personalized analysis.
  • A method for processing intestinal microbial sequencing data includes: acquiring sequencing data of the intestinal microflora of a target object; annotating the sequencing data according to a standard gene database to obtain an annotation result; and evaluating the intestinal microflora of the target object according to the annotation result to obtain flora information on the target object's intestinal microorganisms.
  • The flora information of the target object's intestinal microorganisms includes the types of intestinal microorganisms and the relative abundance of each species of intestinal microorganism. The relative abundance of a species is the sum of the relative abundances of the target genes in the sequencing data that belong to that species. The relative abundance of each target gene is obtained from the annotation result.
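The following Python sketch illustrates the aggregation rule above by summing target-gene relative abundances into species relative abundances. It is a minimal sketch, not the applicant's implementation. The function name and the two input mappings (`gene_abundance`, `gene_to_species`) are hypothetical. Genes that are not species-specific target genes are assumed to be simply absent from `gene_to_species`.

```python
from collections import defaultdict
from typing import Dict, Mapping


def species_relative_abundance(
    gene_abundance: Mapping[str, float],
    gene_to_species: Mapping[str, str],
) -> Dict[str, float]:
    """Sum the relative abundances of each species' target genes.

    gene_abundance: hypothetical mapping gene ID -> relative abundance,
        as taken from the annotation result.
    gene_to_species: hypothetical mapping of species-specific target genes
        to their species; non-target genes are absent and are skipped.
    """
    totals: Dict[str, float] = defaultdict(float)
    for gene_id, abundance in gene_abundance.items():
        species = gene_to_species.get(gene_id)
        if species is not None:  # only target genes contribute
            totals[species] += abundance
    return dict(totals)


if __name__ == "__main__":
    genes = {"g1": 0.25, "g2": 0.125, "g3": 0.5, "g4": 0.125}
    targets = {"g1": "species_A", "g2": "species_A", "g3": "species_B"}
    print(species_relative_abundance(genes, targets))
    # {'species_A': 0.375, 'species_B': 0.5}
```

In this example, gene `g4` is not a target gene of any species, so its abundance is left out of every species total.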
  • Where the performance analysis involves diversity analysis, performing performance analysis on the target object's intestinal microorganisms based on the flora information includes: calculating the diversity index of the target object's intestinal microorganisms and determining the position of the diversity index in a reference population.
  • Where the performance analysis involves beneficial bacteria analysis and/or harmful bacteria analysis, performing performance analysis based on the flora information includes: calculating the beneficial bacteria index and/or harmful bacteria index of the target object's intestinal microorganisms, and determining the position of that index in the reference population.
  • Where the performance analysis involves disease prediction analysis, performing performance analysis on the target object's intestinal microorganisms based on the flora information includes: calculating the intestinal disease index of the target object and determining the position of the intestinal disease index in the reference population.
  • The diversity index is calculated as follows. For each intestinal microorganism of the target object, calculate the product of the species' relative abundance and the logarithm of that relative abundance. Then sum these products over all intestinal microorganisms to obtain the diversity index.
  • The intestinal disease index is calculated as follows. Compute the average relative abundance X1 of the species of the first type of microorganisms and the average relative abundance X2 of the species of the second type of microorganisms. The difference X1 − X2 is the intestinal disease index. The two types are defined as follows:
    • The first type of microorganisms is the combination of intestinal microorganisms whose species relative abundance in people with a specific disease, relative to that in healthy people, is greater than a first preset standard.
    • The second type of microorganisms is the combination of intestinal microorganisms whose species relative abundance in healthy people, relative to that in people with the specific disease, is greater than a second preset standard.
  • Where the performance analysis involves diversity analysis and the diversity index of the target object's intestinal microorganisms is calculated based on the flora information, the method further includes importing the diversity index into the database of the reference population for the next performance analysis step.
  • Where the performance analysis involves beneficial bacteria analysis and/or harmful bacteria analysis and the beneficial bacteria index and/or harmful bacteria index of the target object's intestinal microorganisms is calculated, the method further includes importing that index into the database of the reference population for the next performance analysis step.
  • Where the performance analysis involves disease prediction analysis and the intestinal disease index of the target object is calculated, the method further includes importing the intestinal disease index of the target object into the database of the reference population for the next performance analysis step.
  • An intestinal microbial sequencing data processing device includes:
    • a first acquisition module configured to acquire the sequencing data of the intestinal microflora of the target object;
    • an annotation module configured to annotate the sequencing data according to a standard gene database to obtain an annotation result; and
    • a second acquisition module configured to evaluate the intestinal microflora of the target object according to the annotation result to obtain flora information on the target object's intestinal microorganisms.
  • The device further includes a performance analysis module configured to perform performance analysis on the target object's intestinal microorganisms based on the flora information. The performance analysis involves at least one of the following: diversity analysis, beneficial bacteria analysis, harmful bacteria analysis, and disease prediction analysis.
  • Depending on the analysis involved, the performance analysis module further includes the following components:
    • For diversity analysis: a first calculation module configured to calculate the diversity index of the target object's intestinal microorganisms, and a first position determination module configured to determine the position of the diversity index in the reference population.
    • For beneficial bacteria analysis and/or harmful bacteria analysis: a second calculation module configured to calculate the beneficial bacteria index and/or harmful bacteria index of the target object's intestinal microorganisms, and a second position determination module configured to determine the position of that index in the reference population.
    • For disease prediction analysis: a third calculation module configured to calculate the intestinal disease index of the target object, and a third position determination module configured to determine the position of the intestinal disease index in the reference population.
  • The first calculation module includes:
    • a product unit configured to calculate, for each intestinal microorganism of the target object, the product of the species' relative abundance and the logarithm of that relative abundance; and
    • an addition unit configured to sum the products calculated for all intestinal microorganisms to obtain the diversity index.
  • The third calculation module includes:
    • a statistical unit configured to calculate the average relative abundance X1 of the species of the first type of microorganisms and the average relative abundance X2 of the species of the second type of microorganisms; and
    • a difference calculation unit configured to calculate the difference between X1 and X2 to obtain the intestinal disease index.
  • The first type of microorganisms refers to the combination of intestinal microorganisms whose species relative abundance in a population with a specific disease, relative to that in a healthy population, is greater than a first preset standard.
  • The second type of microorganisms refers to the combination of intestinal microorganisms whose species relative abundance in a healthy population, relative to that in the population with the specific disease, is greater than a second preset standard.
  • Where the performance analysis involves diversity analysis, the performance analysis module further includes a first import module. It is configured to import the diversity index of the target object's intestinal microorganisms into the database of the reference population for the next performance analysis step.
  • Where the performance analysis involves beneficial bacteria analysis and/or harmful bacteria analysis, the performance analysis module further includes a second import module. It is configured to import the beneficial bacteria index and/or harmful bacteria index of the target object's intestinal microorganisms into the database of the reference population for the next performance analysis step.
  • Where the performance analysis involves disease prediction analysis and the third calculation module calculates the intestinal disease index of the target object, the performance analysis module further includes a third import module. It is configured to import the intestinal disease index of the target object into the database of the reference population for the next performance analysis step.
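The calculation, position determination, and import modules described above can be pictured as one small class. This is a hypothetical sketch, and the class, method, and analysis names do not come from the application. Each analysis type pairs a calculation function with sorted reference-population values. Position determination counts the reference values strictly below the target's index, and the import step then adds the target's index to the reference values.

```python
from bisect import bisect_left, insort
from typing import Callable, Dict, List, Mapping, Tuple

Flora = Mapping[str, float]  # species name -> relative abundance


class PerformanceAnalysisModule:
    """Hypothetical sketch of a performance analysis module.

    For each analysis type it holds a calculation function (the
    "calculation module") and the sorted reference-population values
    (standing in for the "database of the reference population").
    """

    def __init__(self) -> None:
        self._calculators: Dict[str, Callable[[Flora], float]] = {}
        self.reference: Dict[str, List[float]] = {}

    def register(self, name: str, calculator: Callable[[Flora], float],
                 reference_values: List[float]) -> None:
        self._calculators[name] = calculator
        self.reference[name] = sorted(reference_values)

    def analyse(self, name: str, flora: Flora) -> Tuple[float, float]:
        index = self._calculators[name](flora)         # calculation module
        ref = self.reference[name]
        position = bisect_left(ref, index) / len(ref)  # position determination module
        insort(ref, index)                             # import module
        return index, position
```

For example, `register("richness", lambda f: float(sum(v > 0 for v in f.values())), [1.0, 2.0, 3.0, 4.0])` uses a hypothetical species count in place of a real index. Calling `analyse` on a flora with three detected species then returns the index 3.0 at position 0.5. The position is computed before the import so that the target is not ranked against its own value. Whether the application orders the steps this way is an assumption. An empty reference list would raise `ZeroDivisionError`.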
  • A storage medium includes a stored program. When run, the program executes any one of the intestinal microbial sequencing data processing methods described above.
  • A processor is configured to run a program. When the program is run, any one of the intestinal microbial sequencing data processing methods described above is executed.
  • In the present application, the following steps are adopted: acquiring sequencing data of the intestinal microflora of the target object; annotating the sequencing data according to a standard gene database to obtain an annotation result; and evaluating the intestinal microflora of the target object according to the annotation result to obtain flora information on the target object's intestinal microorganisms. This solves the technical problem that the related art has not used metagenomic data analysis as a method for effectively analyzing the intestinal microflora and the state of the flora.
  • The sequencing data are annotated through a standard gene database, and the intestinal microflora of the target object is evaluated according to the annotation result to obtain its flora information. This achieves the technical effect of effectively analyzing the intestinal microflora and the state of the flora.
  • FIG. 1 is a first flowchart of a method for processing intestinal microbial sequencing data according to an embodiment of the present application.
  • FIG. 2 is a second flowchart of a method for processing intestinal microbial sequencing data according to an embodiment of the present application.
  • FIG. 3 is a schematic diagram of an apparatus for processing intestinal microbial sequencing data according to an embodiment of the present application.
  • Standard gene database: a database containing a large number of gene sequences, together with the gene and/or species information corresponding to each gene sequence.
  • Standard gene databases include, but are not limited to, IGC, KO, COG, SEED subsystems, KEGG and other databases.
  • Target gene: in this application, the relative abundance of a species is calculated by summing the relative abundances of that species' target genes in the sequencing data.
  • A target gene is a gene specific to the microorganism. Either the gene exists only in that microorganism, or, after manual curation, the gene corresponds only to that species.
  • Annotation: the obtained sequence information is aligned against the standard gene database to obtain the gene corresponding to each sequence, together with that gene's function and biological source information.
  • Beneficial bacteria: generally, bacteria that grow in the human gastrointestinal tract and are positively correlated with human health.
  • A beneficial bacterium may be a single specific species that meets the above definition, or a collection of multiple specific species that meet the above definition.
  • Harmful bacteria: generally, bacteria that grow in the human gastrointestinal tract and are negatively correlated with human health, such as foodborne pathogens and opportunistic pathogens. In this application, unless otherwise specified, a harmful bacterium may be either a single specific species that meets this definition or a collection of multiple specific species that meet this definition.
  • Foodborne pathogens: generally, pathogenic bacteria that may cause food poisoning or that use food as a vector.
  • Opportunistic pathogens: generally, bacteria that are harmless to human health under normal circumstances. They cause a variety of diseases when the beneficial bacteria in the human intestine are damaged and reduced. They are also known as "two-faced pathogens".
  • A method for processing intestinal microbial sequencing data is provided.
  • FIG. 1 is a first flowchart of a method for processing intestinal microbial sequencing data according to an embodiment of the present application. As shown in FIG. 1, the method includes the following steps:
  • Step S102: acquire the sequencing data of the intestinal microflora of the target object.
  • Step S104: annotate the sequencing data according to a standard gene database to obtain an annotation result.
  • Step S106: evaluate the intestinal microflora of the target object according to the annotation result to obtain flora information on the target object's intestinal microorganisms.
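Steps S102 to S106 can be sketched end to end as follows. This toy illustration rests on several assumptions:
- The "standard gene database" is a hypothetical in-memory dict that maps an exact sequence to a (gene ID, species) pair. Real databases such as IGC are far larger, and annotation against them relies on sequence alignment rather than exact lookup.
- Unmapped reads are excluded from the relative-abundance denominator.
- Every gene in the toy database is treated as a target gene.

All function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical toy "standard gene database": exact sequence -> (gene ID, species).
GeneDB = Dict[str, Tuple[str, str]]


def acquire(reads: Iterable[str]) -> List[str]:
    """Step S102 stand-in: collect the target object's reads, dropping blanks."""
    return [r.strip().upper() for r in reads if r.strip()]


def annotate(reads: List[str], db: GeneDB) -> Dict[str, float]:
    """Step S104 stand-in: map reads to genes and return gene relative abundance.

    Exact lookup replaces sequence alignment here; unmapped reads are ignored.
    """
    hits = Counter(db[r][0] for r in reads if r in db)
    total = sum(hits.values())
    return {gene: n / total for gene, n in hits.items()} if total else {}


def evaluate(gene_abundance: Dict[str, float], db: GeneDB) -> Dict[str, float]:
    """Step S106 stand-in: sum gene abundances into species abundances."""
    gene_to_species = {gene: species for gene, species in db.values()}
    flora: Dict[str, float] = {}
    for gene, abundance in gene_abundance.items():
        species = gene_to_species[gene]
        flora[species] = flora.get(species, 0.0) + abundance
    return flora


if __name__ == "__main__":
    db = {"ACGT": ("g1", "sp_A"), "TTGA": ("g2", "sp_A"), "GGCC": ("g3", "sp_B")}
    reads = ["acgt", "ACGT", "GGCC", "TTGA", "NNNN", ""]
    print(evaluate(annotate(acquire(reads), db), db))
    # {'sp_A': 0.75, 'sp_B': 0.25}
```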
  • The method for processing intestinal microbial sequencing data proceeds in three steps. It acquires the sequencing data of the target object's intestinal microflora, annotates the sequencing data according to a standard gene database to obtain an annotation result, and evaluates the target object's intestinal microflora according to the annotation result to obtain its flora information. This solves the technical problem that the related art has not adopted metagenomic data analysis as a method for effectively analyzing the intestinal microflora and the state of the flora.
  • The flora information obtained above includes the types of the target object's intestinal microorganisms and the relative abundance of each species of intestinal microorganism. The relative abundance of a species is the sum of the relative abundances of the target genes in the sequencing data that belong to that species. The relative abundance of each target gene is obtained from the annotation result.
  • The processing method further includes performing performance analysis on the target object's intestinal microorganisms based on the flora information. The performance analysis involves at least one of the following: diversity analysis, beneficial bacteria analysis, harmful bacteria analysis, and disease prediction analysis.
  • In the method provided by the embodiments of the present application, the flora information of the target object's intestinal microorganisms is obtained first, and performance analysis is then performed on those microorganisms based on the flora information. This solves the technical problem that the related art cannot use metagenomic analysis to carry out a comprehensive, multi-dimensional analysis of an individual's intestinal microorganisms that meets the needs of personalized analysis.
  • As a result, a comprehensive, multi-dimensional analysis of an individual's intestinal microorganisms is achieved, meeting the needs of personalized analysis.
  • Where the performance analysis involves diversity analysis, performing performance analysis on the target object's intestinal microorganisms based on the flora information includes: calculating the diversity index of the target object's intestinal microorganisms and determining the position of the diversity index in the reference population.
  • The diversity index is calculated as follows. For each intestinal microorganism of the target object, calculate the product of the species' relative abundance and the logarithm of that relative abundance. Then sum the products over all intestinal microorganisms to obtain the diversity index.
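The diversity index described above has the form of the Shannon index. Summing p × log(p) exactly as described yields a value that is zero or negative. The Shannon index is conventionally reported as the negative of this sum, so that a more diverse flora receives a larger value. That convention is consistent with reading a high position in the reference population as high diversity.

The sketch below makes two assumptions, since the text states neither:
- It uses the natural logarithm.
- It renormalises abundances so they sum to 1.

The function name is hypothetical. Passing `negate=False` returns the literal sum instead.

```python
import math
from typing import Mapping


def diversity_index(species_abundance: Mapping[str, float], negate: bool = True) -> float:
    """Diversity index from species relative abundances.

    Computes sum(p * ln p) over species with p > 0, after renormalising the
    abundances to sum to 1 (an assumption). With negate=True (default) the
    conventional, non-negative Shannon index -sum(p * ln p) is returned;
    negate=False returns the sum exactly as literally described.
    """
    total = sum(species_abundance.values())
    s = 0.0
    for value in species_abundance.values():
        if value > 0:  # species with zero abundance contribute nothing
            p = value / total
            s += p * math.log(p)
    return -s if negate else s
```

Two equally abundant species give ln 2 (about 0.693), and four give ln 4 (about 1.386). A flora made up of a single species gives 0.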
  • The position of the diversity index in the reference population is determined as follows:
    • Sort the diversity indices of the reference population in ascending order. These are the diversity indices from the intestinal microbial test results of 100, 200, 300, 1000 or more healthy people.
    • Count the number i_d of diversity indices in the reference population that are smaller than the target object's diversity index.
    • Take the ratio c_d = i_d / s_d as the position of the target object's diversity index in the reference population, where s_d is the total number of people in the reference population.
  • If a (for example: 0) ≤ c_d < b (for example: 0.25), the diversity index of the target object's intestinal microflora is considered to be at a low level. In this case, the types and composition of the target object's intestinal microflora may be unbalanced, posing certain hidden risks to intestinal health.
  • If b (for example: 0.25) ≤ c_d < c (for example: 0.75), the diversity index of the target object's intestinal microflora is considered to be at a medium level. In this case, the types and composition of the target object's intestinal microflora are relatively balanced.
  • If c (for example: 0.75) ≤ c_d ≤ d (for example: 1), the diversity index of the target object's intestinal microflora is considered to be at a high level. In this case, the target object's intestinal microflora contains more types and a richer composition, and the risk of dysbiosis is relatively low. This is beneficial to intestinal and overall health.
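The ranking rule (the ratio of i_d to s_d) and the banding into low, medium, and high levels can be sketched as follows. The cut-offs 0.25 and 0.75 are the example values from the text. A boundary value such as a position of exactly 0.25 is assigned to the higher band here. That is an assumption, because the inequality signs are not fully recoverable from the source. The function names are hypothetical.

```python
from bisect import bisect_left
from typing import Sequence


def position_in_reference(value: float, reference: Sequence[float]) -> float:
    """Return c = i / s for a target index `value`.

    i is the number of reference-population values strictly smaller than
    `value`; s is the size of the reference population.
    """
    ordered = sorted(reference)      # ascending order, as described
    i = bisect_left(ordered, value)  # count of values < value
    return i / len(ordered)


def level(c: float, b: float = 0.25, c_cut: float = 0.75) -> str:
    """Map a position c in [0, 1] to a level using example cut-offs b and c_cut."""
    if c < b:
        return "low"
    if c < c_cut:
        return "medium"
    return "high"
```

On a sorted list, `bisect_left` returns exactly the number of elements strictly smaller than the probe value. That makes it a direct match for "the number of diversity indices in the reference population that are smaller than the target's".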
  • Performing performance analysis on the target object's intestinal microorganisms based on the flora information may further include determining a diversity score for the target object's intestinal microorganisms. The score is based on the target object's diversity index and the position of that index in the reference population. The diversity score can be calculated as follows:
  • if a ≤ c_d < b, diversity score = first parameter (for example: 80) × diversity index + second parameter (for example: 40);
  • if b ≤ c_d < c, diversity score = third parameter (for example: 40) × diversity index + fourth parameter (for example: 50);
  • if c ≤ c_d ≤ d, diversity score = fifth parameter (for example: 80) × diversity index + sixth parameter (for example: 20).
  • The parameter values in the above preset criteria can be adapted to the application scenario. This application does not specifically limit them.
  • The diversity index of the intestinal microflora reflects the richness and complexity of the microbes in the human intestine. It is often called the "magnifying glass of intestinal health" because it can reveal hidden health problems.
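The three diversity-score formulas can be expressed as one banded linear function. This sketch assumes that the (first, second), (third, fourth), and (fifth, sixth) parameter pairs apply to the low, medium, and high position bands respectively, which the low/medium/high banding suggests. The function name and parameter layout are hypothetical. The argument `x` is the quantity the formulas multiply, which the text calls the diversity index.

```python
from typing import Sequence, Tuple

# Example (slope, intercept) pairs from the text for the diversity score:
# (first, second), (third, fourth), (fifth, sixth) parameters.
DIVERSITY_PARAMS: Tuple[Tuple[float, float], ...] = ((80.0, 40.0), (40.0, 50.0), (80.0, 20.0))


def banded_score(x: float, c: float,
                 params: Sequence[Tuple[float, float]] = DIVERSITY_PARAMS,
                 b: float = 0.25, c_cut: float = 0.75) -> float:
    """Choose a (slope, intercept) pair by the band of position c and
    return slope * x + intercept.

    Assumed bands: c < b -> params[0]; b <= c < c_cut -> params[1];
    otherwise params[2].
    """
    band = 0 if c < b else (1 if c < c_cut else 2)
    slope, intercept = params[band]
    return slope * x + intercept
```

If the position c_d itself is passed as `x`, the example parameters join continuously. At c_d = 0.25, both 80 × 0.25 + 40 and 40 × 0.25 + 50 equal 60. At c_d = 0.75, both 40 × 0.75 + 50 and 80 × 0.75 + 20 equal 80. The result is a 40 to 100 scale. The excerpt does not settle whether this is intended, so treat it only as an observation. The beneficial and harmful bacteria scores use the same shape with the example pairs ((80, 40), (40, 20), (80, 20)).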
  • Where the performance analysis involves beneficial bacteria analysis, performing performance analysis on the target object's intestinal microorganisms based on the flora information includes: calculating the beneficial bacteria index of the target object's intestinal microorganisms and determining the position of the beneficial bacteria index in the reference population. In an optional example, the relative abundance of each beneficial bacterium in the target object's intestine is taken as the beneficial bacteria index of that bacterium.
  • The position is determined as follows:
    • Sort the beneficial bacteria indices of beneficial bacterium a in the intestines of the reference population in ascending order. These come from the intestinal microbial test results of 100, 200, 300, 1000 or more healthy people.
    • Count the number i_p of beneficial bacteria indices in the reference population that are smaller than the beneficial bacteria index of beneficial bacterium a in the target object's intestine.
    • Take the ratio c_p = i_p / s_p as the position of the target object's beneficial bacteria index for beneficial bacterium a in the reference population, where s_p is the total number of people in the reference population.
  • If a (for example: 0) ≤ c_p < b (for example: 0.25), the beneficial bacteria index of beneficial bacterium a in the target object's intestine is considered to be at a low level.
  • If b (for example: 0.25) ≤ c_p < c (for example: 0.75), the beneficial bacteria index of beneficial bacterium a in the target object's intestine is considered to be at a medium level.
  • If c (for example: 0.75) ≤ c_p ≤ d (for example: 1), the beneficial bacteria index of beneficial bacterium a in the target object's intestine is considered to be at a relatively high level.
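Each beneficial bacterium is ranked against the reference population separately, so a whole panel of bacteria can be processed in one pass. This minimal sketch makes the following assumptions:
- A bacterium not detected in the target object has relative abundance 0.
- The example cut-offs 0.25 and 0.75 apply.
- The reference population uses a hypothetical dict-of-lists layout (one list of reference values per bacterium).

The function name is hypothetical. The text applies the same ranking to harmful bacteria (i_o, s_o, c_o).

```python
from bisect import bisect_left
from typing import Dict, List, Mapping, Tuple


def panel_positions(target: Mapping[str, float],
                    reference: Mapping[str, List[float]],
                    b: float = 0.25, c_cut: float = 0.75
                    ) -> Dict[str, Tuple[float, float, str]]:
    """For each bacterium in the reference panel, return (index, c_p, level).

    index: the bacterium's relative abundance in the target (0.0 if absent).
    c_p:   share of reference values strictly smaller than index.
    level: "low", "medium" or "high" using example cut-offs b and c_cut.
    """
    result: Dict[str, Tuple[float, float, str]] = {}
    for name, ref_values in reference.items():
        index = target.get(name, 0.0)  # undetected bacterium -> 0 (assumption)
        ordered = sorted(ref_values)
        c_p = bisect_left(ordered, index) / len(ordered)
        lvl = "low" if c_p < b else ("medium" if c_p < c_cut else "high")
        result[name] = (index, c_p, lvl)
    return result
```

With "strictly smaller" counting, a target that lacks a bacterium which is also absent in many reference samples always ranks at position 0. The text does not say how such ties should be treated.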
  • Performing performance analysis on the target object's intestinal microorganisms based on the flora information may further include determining a beneficial bacteria score for the target object's intestinal microorganisms. The score is based on the target object's beneficial bacteria index and the position of that index in the reference population. The beneficial bacteria score can be calculated as follows:
  • if a ≤ c_p < b, beneficial bacteria score of the beneficial bacterium = first parameter (for example: 80) × beneficial bacteria index of the beneficial bacterium + second parameter (for example: 40);
  • if b ≤ c_p < c, beneficial bacteria score of the beneficial bacterium = third parameter (for example: 40) × beneficial bacteria index of the beneficial bacterium + fourth parameter (for example: 20);
  • if c ≤ c_p ≤ d, beneficial bacteria score of the beneficial bacterium = fifth parameter (for example: 80) × beneficial bacteria index of the beneficial bacterium + sixth parameter (for example: 20).
  • The parameter values in the above preset criteria can be adapted to the application scenario. This application does not specifically limit them.
  • The beneficial bacteria analysis can cover at least the following 26 types of beneficial bacteria in the target object's intestinal microorganisms; see Table 1 for details.
  • Where the performance analysis involves harmful bacteria analysis, performing performance analysis on the target object's intestinal microorganisms based on the flora information includes: calculating the harmful bacteria index of the target object's intestinal microorganisms and determining the position of the harmful bacteria index in the reference population. In an optional example, the relative abundance of each harmful bacterium in the target object's intestinal microorganisms is taken as its harmful bacteria index.
  • The position is determined as follows:
    • Sort the harmful bacteria indices of harmful bacterium b in the intestines of the reference population in ascending order. These come from the intestinal microbial test results of 100, 200, 300, 1000 or more healthy people.
    • Count the number i_o of harmful bacteria indices in the reference population that are smaller than the harmful bacteria index of harmful bacterium b in the target object's intestine.
    • Take the ratio c_o = i_o / s_o as the position of the target object's harmful bacteria index for harmful bacterium b in the reference population, where s_o is the total number of people in the reference population.
  • The level of the harmful bacteria index of harmful bacterium b in the target object's intestine is determined as follows:
    • If a (for example: 0) ≤ c_o < b (for example: 0.25), the index is considered to be at a low level.
    • If b (for example: 0.25) ≤ c_o < c (for example: 0.75), the index is considered to be at a medium level.
    • If c (for example: 0.75) ≤ c_o ≤ d (for example: 1), the index is considered to be at a relatively high level.
  • Performing performance analysis on the target object's intestinal microorganisms based on the flora information may further include determining a harmful bacteria score for the target object's intestinal microorganisms. The score is based on the target object's harmful bacteria index and the position of that index in the reference population. The harmful bacteria score can be calculated as follows:
  • if a ≤ c_o < b, harmful bacteria score of the harmful bacterium = first parameter (for example: 80) × harmful bacteria index of the harmful bacterium + second parameter (for example: 40);
  • if b ≤ c_o < c, harmful bacteria score of the harmful bacterium = third parameter (for example: 40) × harmful bacteria index of the harmful bacterium + fourth parameter (for example: 20);
  • if c ≤ c_o ≤ d, harmful bacteria score of the harmful bacterium = fifth parameter (for example: 80) × harmful bacteria index of the harmful bacterium + sixth parameter (for example: 20).
  • The parameter values in the above preset criteria can be adapted to the application scenario. This application does not specifically limit them.
  • Category a, foodborne pathogens:
      1. Campylobacter coli
      2. Campylobacter jejuni
      3. Clostridium botulinum
      4. Clostridium perfringens
      5. Cronobacter turicensis
      6. Staphylococcus aureus
      7. Vibrio cholerae
      8. Shigella
      9. Salmonella
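The foodborne pathogen list above mixes species (for example Campylobacter jejuni) with genus-level entries (Shigella, Salmonella). The sketch below shows one possible way to obtain a harmful bacteria index for each entry. It assumes that a genus-level entry should sum the relative abundances of all species of that genus, which the text does not state. The function name and input layout are hypothetical.

```python
from typing import Dict, Mapping, Sequence

# Entries of the foodborne-pathogen list above; single-word entries are genera.
FOODBORNE_PATHOGENS = (
    "Campylobacter coli", "Campylobacter jejuni", "Clostridium botulinum",
    "Clostridium perfringens", "Cronobacter turicensis",
    "Staphylococcus aureus", "Vibrio cholerae", "Shigella", "Salmonella",
)


def harmful_indices(species_abundance: Mapping[str, float],
                    panel: Sequence[str] = FOODBORNE_PATHOGENS) -> Dict[str, float]:
    """Harmful bacteria index (relative abundance) for each panel entry.

    Binomial entries are matched exactly; genus-only entries sum the
    abundances of every species whose first name token is that genus
    (an assumption, not stated in the text).
    """
    out: Dict[str, float] = {}
    for entry in panel:
        if " " in entry:  # species name: exact match
            out[entry] = species_abundance.get(entry, 0.0)
        else:             # genus name: sum over its species
            out[entry] = sum(
                (v for k, v in species_abundance.items() if k.split(" ", 1)[0] == entry),
                0.0,
            )
    return out
```

Each resulting index can then be ranked against the reference population in the same way as the beneficial bacteria indices.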
  • Where the performance analysis involves disease prediction analysis, performing performance analysis on the target object's intestinal microorganisms based on the flora information includes: calculating the intestinal disease index of the target object and determining the position of the intestinal disease index in the reference population.
  • The intestinal disease index is calculated as follows. Compute the average relative abundance X1 of the species of the first type of microorganisms and the average relative abundance X2 of the species of the second type of microorganisms. The difference X1 − X2 is the intestinal disease index.
  • The first type of microorganisms is the combination of intestinal microorganisms whose species relative abundance in people with a specific disease, relative to that in healthy people, is greater than a first preset standard.
  • The second type of microorganisms is the combination of intestinal microorganisms whose species relative abundance in healthy people, relative to that in people with the specific disease, is greater than a second preset standard.
  • Specifically, first-type microorganisms are the combination of intestinal microorganisms whose relative abundance in a population with a specific disease is higher than in a healthy population. If the relative abundance of a species in people with the specific disease is a1 and in healthy people is a2, "relative" here can mean a1/a2 or (a1-a2)/a2.
  • When "relative" means a1/a2, the first preset value may be 200%, 300%, 400%, 500%, 600%, 700%, 800%, 900%, 1000% or higher; when it means (a1-a2)/a2, the first preset value may be 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100% or higher.
  • Similarly, second-type microorganisms are the combination of intestinal microorganisms whose relative abundance in a healthy population, relative to a population with the specific disease, is higher than a second preset value. If the relative abundance of a species in healthy people is b1 and in people with the specific disease is b2, "relative" here can mean b1/b2 or (b1-b2)/b2.
  • When "relative" means b1/b2, the second preset value may be 200%, 300%, 400%, 500%, 600%, 700%, 800%, 900%, 1000% or higher; when it means (b1-b2)/b2, the second preset value may be 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100% or higher.
  • Disease index = (average relative abundance of the species enriched in cases of the disease) - (average relative abundance of the species enriched in the corresponding controls).
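The selection of first- and second-type species and the resulting index X1 - X2 can be sketched in Python as below. This is a minimal sketch under stated assumptions: the function names are hypothetical, species are keyed by ID in plain dictionaries, a species whose reference abundance is zero is skipped because its relative value is undefined, and "average relative abundance" is read as the mean of the target object's relative abundances over the species in each group. Thresholds are written as fractions (2.0 for 200%, 0.2 for 20%).

```python
from statistics import mean
from typing import Dict, Set


def enriched_species(group_a: Dict[str, float], group_b: Dict[str, float],
                     threshold: float, mode: str = "ratio") -> Set[str]:
    """Species whose relative abundance in group_a, relative to group_b, exceeds threshold.

    mode="ratio":       a / b > threshold        (e.g. threshold=2.0 for 200%)
    mode="fold_change": (a - b) / b > threshold  (e.g. threshold=0.2 for 20%)
    """
    if mode not in ("ratio", "fold_change"):
        raise ValueError(f"unknown mode: {mode}")
    selected = set()
    for species, a in group_a.items():
        b = group_b.get(species, 0.0)
        if b <= 0:
            # Assumption: the relative value is undefined when b is zero, so skip it.
            continue
        relative = a / b if mode == "ratio" else (a - b) / b
        if relative > threshold:
            selected.add(species)
    return selected


def intestinal_disease_index(sample: Dict[str, float],
                             first_type: Set[str], second_type: Set[str]) -> float:
    """X1 - X2: mean sample abundance of first-type species minus that of second-type species."""
    # Assumption: an empty group contributes an average of 0.
    x1 = mean(sample.get(s, 0.0) for s in first_type) if first_type else 0.0
    x2 = mean(sample.get(s, 0.0) for s in second_type) if second_type else 0.0
    return x1 - x2
```

First-type species come from enriched_species(disease_profile, healthy_profile, ...) and second-type species from enriched_species(healthy_profile, disease_profile, ...), mirroring the a1/a2 and b1/b2 definitions above.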
  • The position of the intestinal disease index in the reference population is determined by comparing the target object's intestinal disease index with the intestinal disease indices of the reference population (the intestinal microbial test results of 100, 200, 300, 1000 or more healthy people).
  • Let c_D denote this position. If c_D is below b, the intestinal disease index of the target object's intestinal microflora is considered to be at a low level; if c_D lies between b (for example: 0.25) and c (for example: 0.75), it is considered to be at a medium level; if c_D lies between c (for example: 0.75) and d (for example: 1), it is considered to be at a relatively high level.
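One way to compute the position c_D and map it to the three levels is an empirical cumulative fraction over the reference population. The sketch below assumes that definition of "position" (the text does not fix one) and assumes half-open boundaries at b and c; the function names are hypothetical.

```python
from bisect import bisect_right
from typing import Sequence


def position_in_reference(value: float, reference: Sequence[float]) -> float:
    """Fraction of reference values <= value (an assumed, empirical-CDF definition of c_D)."""
    if not reference:
        raise ValueError("reference population is empty")
    ordered = sorted(reference)
    return bisect_right(ordered, value) / len(ordered)


def level(c_d: float, b: float = 0.25, c: float = 0.75) -> str:
    """Map a position c_D in [0, 1] to a level; boundary handling at b and c is assumed."""
    if c_d < b:
        return "low"
    if c_d < c:
        return "medium"
    return "relatively high"
```

For a reference population with indices [0.1, 0.2, 0.3, 0.4], a target index of 0.25 has c_D = 0.5 and falls in the medium level.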
  • Performing performance analysis on the intestinal microorganisms of the target object based on the flora information may further include: determining the intestinal disease score of the target object's intestinal microorganisms based on the target object's intestinal disease index and the position of that index in the reference population.
  • the intestinal disease score of intestinal microorganisms can be calculated as follows:
  • intestinal disease score = first parameter (for example: 80) × intestinal disease index + second parameter (for example: 40);
  • intestinal disease score = third parameter (for example: 40) × intestinal disease index + fourth parameter (for example: 20);
  • intestinal disease score = fifth parameter (for example: 80) × intestinal disease index + sixth parameter (for example: 20).
  • The parameter values in the above preset test standards can be adapted to the application scenario; this application does not specifically limit them.
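The score formulas above, and the identically structured harmful bacteria score formulas, are linear in the index. The text lists three parameter pairs, but the conditions that select among them are not preserved here; the sketch below assumes, hypothetically, that the pairs apply to the low, medium and relatively high position levels in that order, and uses the example parameter values from the text. The names EXAMPLE_PARAMS and linear_score are illustrative only.

```python
from typing import Dict, Tuple

# Hypothetical mapping of the (first, second), (third, fourth) and (fifth, sixth)
# parameter pairs to the low / medium / relatively high levels, in that order.
# The example values come from the text; the level mapping is an assumption.
EXAMPLE_PARAMS: Dict[str, Tuple[float, float]] = {
    "low": (80.0, 40.0),
    "medium": (40.0, 20.0),
    "relatively high": (80.0, 20.0),
}


def linear_score(index: float, level: str,
                 params: Dict[str, Tuple[float, float]] = EXAMPLE_PARAMS) -> float:
    """score = slope * index + intercept, using the parameter pair chosen for the level."""
    slope, intercept = params[level]
    return slope * index + intercept
```

Passing a different params dictionary reflects the note that the parameter values can be adapted to the application scenario.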
  • FIG. 2 is a second flowchart of the method for processing intestinal microbial sequencing data according to an embodiment of the present application. As shown in FIG. 2, the method further includes the following steps:
  • Step S108a: when the performance analysis involves diversity analysis and the diversity index of the target object's intestinal microbes has been calculated from the flora information, import that diversity index into the reference population database for use in the next performance analysis step;
  • Step S108b: when the performance analysis involves beneficial bacteria analysis and the beneficial bacteria index of the target object's intestinal microbes has been calculated from the flora information, import that beneficial bacteria index into the reference population database for use in the next performance analysis step;
  • Step S108c: when the performance analysis involves harmful bacteria analysis and the harmful bacteria index of the target object's intestinal microbes has been calculated from the flora information, import that harmful bacteria index into the reference population database for use in the next performance analysis step;
  • Step S108d: when the performance analysis involves disease prediction analysis and the intestinal disease index of the target object has been calculated from the flora information, import that intestinal disease index into the reference population database for use in the next performance analysis step.
  • Step S108e: when the sequencing data has been annotated according to the standard gene database and the annotation result obtained, import the flora information of the target object (the types of intestinal microbes and the species abundance of each intestinal microbe) into the reference population database for use in the next performance analysis step.
  • The evaluation result (the flora information of the target object's intestinal microbes, which includes intestinal microbial diversity information and may further include the relative abundance of each intestinal microbe) is added to the reference population database so that the reference range of each indicator is updated in real time.
  • The processing method of intestinal microbial sequencing data of this application also selects a matching reference population based on the phenotypic characteristics of the individual being tested (including gender, age, race, height, weight, diet, place of residence, etc.) for the specific intestinal microbial health analysis, yielding more accurate and reliable health status assessments.
  • The database initially stores a target number of initial reference subjects and records, for each of them, phenotypic information (including gender, age, race, height, weight, diet, place of residence, etc.) and intestinal microbial health assessment information (including the subject's intestinal microbial diversity index, beneficial bacteria index, harmful bacteria index, intestinal disease index, etc., and optionally the relative abundance of each intestinal microbe).
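The reference population database of steps S108a to S108e, with its phenotype-based selection, can be sketched as a small in-memory store. This is a hypothetical illustration: the class and field names are assumptions, phenotype matching is exact equality on each supplied field (a real system would likely use ranges such as age bands), and "position" is taken as the fraction of matching reference values that are less than or equal to the target value.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ReferenceRecord:
    phenotype: Dict[str, object]  # hypothetical fields, e.g. {"gender": "F", "diet": "omnivore"}
    indices: Dict[str, float]     # hypothetical keys, e.g. {"diversity": 3.1, "disease": 0.02}


@dataclass
class ReferenceDatabase:
    records: List[ReferenceRecord] = field(default_factory=list)

    def add(self, record: ReferenceRecord) -> None:
        """Import an evaluated subject so later analyses see the updated reference range."""
        self.records.append(record)

    def select(self, **criteria: object) -> List[ReferenceRecord]:
        """Reference subjects whose phenotype matches every given field exactly."""
        return [r for r in self.records
                if all(r.phenotype.get(k) == v for k, v in criteria.items())]

    def position(self, index_name: str, value: float, **criteria: object) -> Optional[float]:
        """Fraction of matching subjects whose index is <= value; None if no data."""
        values = sorted(r.indices[index_name] for r in self.select(**criteria)
                        if index_name in r.indices)
        if not values:
            return None
        return bisect_right(values, value) / len(values)
```

Because each evaluated subject is appended with add, positions computed afterwards reflect the updated reference range, which corresponds to the real-time update described above.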
  • Step S102 of the processing method for intestinal microbial sequencing data of the present application is described as follows:
  • obtaining the sequencing data of the intestinal microflora of the target object in step S102 may be implemented in the following manner:
  • Step A1: perform gene sequencing on the intestinal microbial sample collected from the target object to obtain the raw genetic data of the target object's intestinal microflora;
  • Step A2: perform quality control on the raw genetic data, that is, remove sequences in which the number of ambiguous bases exceeds a preset value and remove low-quality sequences, where a low-quality sequence is one whose length after removal of low-quality consecutive bases is less than a certain number; this number can be a natural number such as 3, 4 or 5 and can be adjusted to the application scenario;
  • Step A3: delete host sequences from the raw genetic data to obtain the sequencing data of the target object's intestinal microflora, where host sequences are gene sequences of the target object itself.
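Steps A2 and A3 can be illustrated with a toy Python filter. Everything here is an assumption for illustration: reads are (ID, bases, Phred qualities) tuples, "N" marks an ambiguous base, low-quality removal only trims a trailing run of bases below min_q, max_n and min_q are hypothetical defaults, and min_len=4 follows one of the example values in the text. Host removal is reduced to exact k-mer sharing with a host sequence; practical pipelines usually align reads to the host reference genome instead.

```python
from typing import List, Sequence, Set, Tuple

Read = Tuple[str, str, List[int]]  # (read ID, bases, per-base Phred quality scores)


def trim_low_quality_tail(seq: str, quals: Sequence[int], min_q: int) -> Tuple[str, List[int]]:
    """Drop the trailing run of consecutive bases whose quality is below min_q."""
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end], list(quals[:end])


def quality_control(reads: List[Read], max_n: int = 2, min_q: int = 20,
                    min_len: int = 4) -> List[Read]:
    """Step A2 (sketch): drop reads with too many ambiguous bases or too short after trimming."""
    kept = []
    for read_id, seq, quals in reads:
        if seq.upper().count("N") > max_n:
            continue
        seq, quals = trim_low_quality_tail(seq, quals, min_q)
        if len(seq) < min_len:
            continue
        kept.append((read_id, seq, quals))
    return kept


def kmers(seq: str, k: int) -> Set[str]:
    """All length-k substrings of seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def remove_host(reads: List[Read], host_sequence: str, k: int = 5) -> List[Read]:
    """Step A3 (toy stand-in for alignment): drop reads sharing any k-mer with the host."""
    host = kmers(host_sequence, k)
    return [r for r in reads if not (kmers(r[1], k) & host)]
```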
  • Step S104 of the processing method for intestinal microbial sequencing data of the present application is described as follows:
  • Annotating the sequencing data according to a standard gene database to obtain the annotation result in step S104 can be implemented as follows:
  • Step B1: align the gene sequences in the sequencing data to a standard gene database (for example, the integrated gene set IGC of the human gut microbial metagenome) to determine the relative abundance of each gene sequence in the sequencing data (for example, by producing the gene abundance file for the sequencing data, whose right column contains gene IDs and whose left column contains, row by row, the relative abundance of the gene with the ID on the right);
  • Step B2: based on the annotation information recorded in the standard gene database for each gene sequence (including the species to which each gene sequence belongs) and the relative abundance of each gene sequence in the sequencing data, determine the relative abundance of each species in the sequencing data (for example, by producing the species abundance file, whose right column contains species IDs and whose left column contains, row by row, the relative abundance of the species with the ID on the right);
  • Step B3: based on the annotation information recorded in the standard gene database for each gene sequence (including the biological function to which each gene sequence belongs) and the relative abundance of each gene sequence in the sequencing data, determine the relative abundance of each biological function in the sequencing data (for example, by producing the biological function abundance file, whose right column contains biological function IDs and whose left column contains, row by row, the relative abundance of the biological function with the ID on the right).
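Steps B1 to B3, together with the rule that a species' relative abundance is the sum of the relative abundances of its genes, amount to normalizing per-gene values and summing them by annotation label. In the sketch below, normalizing raw counts to sum to 1 is an assumption (the text does not say how gene relative abundance is derived), the annotation is a hypothetical one-label-per-gene mapping, and unannotated genes are ignored.

```python
from collections import defaultdict
from typing import Dict, Mapping


def relative_abundance(counts: Mapping[str, float]) -> Dict[str, float]:
    """Normalize per-gene counts so they sum to 1 (assumed normalization)."""
    total = sum(counts.values())
    if total <= 0:
        return {}
    return {gene: c / total for gene, c in counts.items()}


def aggregate(gene_abundance: Mapping[str, float],
              annotation: Mapping[str, str]) -> Dict[str, float]:
    """Sum gene relative abundances by annotation label (species ID or function ID).

    Genes without an annotation entry are ignored.
    """
    totals: Dict[str, float] = defaultdict(float)
    for gene, abundance in gene_abundance.items():
        label = annotation.get(gene)
        if label is not None:
            totals[label] += abundance
    return dict(totals)
```

The same aggregate function serves for the species abundance file (step B2) and the biological function abundance file (step B3); only the annotation mapping changes. Genes annotated with several functions would need a list-valued mapping, which this sketch does not handle.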
  • This application analyzes data generated by metagenomic sequencing. Compared with the traditional 16S rRNA gene data analysis method, it captures the human intestinal microflora more comprehensively.
  • The processing method of intestinal microbial sequencing data provided in this application can predict at least 9 important diseases; compared with other detection and analysis technologies, its disease prediction is more comprehensive.
  • This technical solution obtains the intestinal microbial diversity information of the tested target object, including the types of intestinal microbes, the relative abundance of each specific microorganism and its relative abundance in the reference population, and the relative abundances of beneficial bacteria, food-borne pathogens and opportunistic pathogens together with their respective relative abundances in the population. The intestinal microbes as a whole and the status of specific microorganisms can thus be comprehensively evaluated, enabling comprehensive, multi-dimensional and personalized analysis of intestinal microbes.
  • An embodiment of the present application also provides a processing device for intestinal microbial sequencing data. It should be noted that this device can be used to perform the processing method for intestinal microbial sequencing data provided by the embodiments of the present application.
  • the processing apparatus for intestinal microbe sequencing data provided by the embodiments of the present application will be described below.
  • the device includes: a first acquisition module 31, an annotation module 33, and a second acquisition module 35.
  • the first obtaining module 31 is configured to obtain sequencing data of the intestinal microflora of the target object
  • the annotation module 33 is configured to annotate the sequencing data according to a standard gene database to obtain annotation results
  • the second obtaining module 35 evaluates the intestinal microbial flora of the target object according to the annotation result, and obtains the intestinal microbial flora information of the target object.
  • The flora information of the target object's intestinal microbes includes the types of intestinal microbes and the relative abundance of each species of intestinal microbe, where the relative abundance of a species is the sum of the relative abundances of the target genes in the sequencing data that belong to that species, and the relative abundance of each target gene is obtained from the annotation result.
  • The apparatus further includes: a performance analysis module configured to perform performance analysis on the intestinal microbes of the target object based on the flora information, where the performance analysis involves at least one of the following: diversity analysis, beneficial bacteria analysis, harmful bacteria analysis, and disease prediction analysis.
  • Where the performance analysis involves diversity analysis, the performance analysis module further includes: a first calculation module configured to calculate the diversity index of the intestinal microorganisms of the target object, and a first position determination module configured to determine the position of the diversity index in the reference population.
  • Where the performance analysis involves beneficial bacteria analysis and/or harmful bacteria analysis, the performance analysis module further includes: a second calculation module configured to calculate the beneficial bacteria index and/or harmful bacteria index of the target object's intestinal microorganisms, and a second position determination module configured to determine the position of the beneficial bacteria index and/or harmful bacteria index in the reference population.
  • Where the performance analysis involves disease prediction analysis, the performance analysis module further includes: a third calculation module configured to calculate the intestinal disease index of the target object, and a third position determination module configured to determine the position of the intestinal disease index in the reference population.
  • The first calculation module includes: a product unit configured to calculate, for each intestinal microorganism of the target object, the product of the species relative abundance and the logarithm of the species relative abundance; and an addition unit configured to add up the products calculated for all the microorganisms to obtain the diversity index.
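The product and addition units amount to summing p·log(p) over species. As literally described, this sum is zero or negative; the conventional Shannon diversity index is its negation, H = -Σ p ln p. The sketch below implements both, skips zero abundances (taking 0·log 0 as 0), and uses the natural logarithm by default since the text does not state a base; the function names are hypothetical.

```python
import math
from typing import Iterable


def diversity_index(abundances: Iterable[float], base: float = math.e) -> float:
    """Sum over species of p * log(p), as the text describes; zero abundances contribute 0."""
    return sum(p * math.log(p, base) for p in abundances if p > 0)


def shannon_index(abundances: Iterable[float], base: float = math.e) -> float:
    """Conventional Shannon index H = -sum(p * log p), i.e. the negated sum above."""
    return -diversity_index(abundances, base)
```

For two equally abundant species, shannon_index([0.5, 0.5], base=2) gives 1 bit, while a single-species community gives 0.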
  • The third calculation module includes: a statistics unit configured to compute the average relative abundance X1 of the species of first-type microorganisms and the average relative abundance X2 of the species of second-type microorganisms; and a difference calculation unit configured to calculate the difference between X1 and X2 to obtain the intestinal disease index. First-type microorganisms are the combination of intestinal microorganisms whose relative abundance in people with a specific disease, relative to healthy people, is greater than a first preset standard; second-type microorganisms are the combination of intestinal microorganisms whose relative abundance in healthy people, relative to people with the specific disease, is greater than a second preset standard.
  • Where the performance analysis involves diversity analysis and the first calculation module has calculated the diversity index of the target object's intestinal microbes, the performance analysis module further includes: a first import module configured to import that diversity index into the database of the reference population for use in the next performance analysis step;
  • the performance analysis module further includes: a second import module configured to import the beneficial bacteria index and/or harmful bacteria index in the intestinal microorganisms of the target object into the database of the reference population for use in the next performance analysis step;
  • The performance analysis module further includes: a third import module configured to import the target object's intestinal disease index into the database of the reference population for use in the next performance analysis step.
  • The apparatus for processing intestinal microbial sequencing data provided by the embodiment of the present application obtains the sequencing data of the target object's intestinal microflora through the first acquisition module 31; the annotation module 33 annotates the sequencing data according to the standard gene database to obtain the annotation result; and the second acquisition module 35 evaluates the target object's intestinal microflora according to the annotation result to obtain the flora information of the target object's intestinal microbes. This solves the technical problem that the related art lacks a method that uses metagenomic gene data analysis to effectively analyze the intestinal microflora and its status.
  • The sequencing data is annotated through a standard gene database, and the target object's intestinal microflora is evaluated according to the annotation result to obtain its flora information, thereby achieving the technical effect of effectively analyzing the intestinal microflora and its status.
  • the processing device for intestinal microbial sequencing data includes a processor and a memory.
  • The first acquisition module 31, the annotation module 33, the second acquisition module 35 and the like are all stored in the memory as program units, and the processor executes the program units stored in the memory to implement the corresponding functions.
  • the processor contains a core, and the core retrieves the corresponding program unit from the memory.
  • One or more cores can be provided, and the core parameters can be adjusted to effectively analyze the intestinal microflora and the status of the flora.
  • the memory may include non-permanent memory, random access memory (RAM) and/or non-volatile memory in a computer-readable medium, such as read only memory (ROM) or flash memory (flash RAM), and the memory includes at least one Memory chip.
  • An embodiment of the present application provides a storage medium on which a program is stored, and when the program is executed by a processor, the method for processing intestinal microbe sequencing data is implemented.
  • An embodiment of the present application provides a processor for running a program, where the processing method of intestinal microbial sequencing data is executed when the program is run.
  • An embodiment of the present application provides a device, which includes a processor, a memory, and a program stored on the memory and executable on the processor.
  • When the processor executes the program, the following steps are implemented: acquiring sequencing data of the intestinal microflora of the target object; annotating the sequencing data according to a standard gene database to obtain an annotation result; and evaluating the intestinal microflora of the target object according to the annotation result to obtain the flora information of the target object's intestinal microbes.
  • The flora information of the target object's intestinal microorganisms includes the types of intestinal microorganisms and the relative abundance of each species of intestinal microorganism, where the relative abundance of a species is the sum of the relative abundances of the target genes in the sequencing data that belong to that species, and the relative abundance of each target gene is obtained from the annotation result.
  • Where the performance analysis involves diversity analysis, performing performance analysis on the target object's intestinal microorganisms based on the flora information includes: calculating the diversity index of the target object's intestinal microorganisms and determining the position of the diversity index in the reference population.
  • Where the performance analysis involves beneficial bacteria analysis and/or harmful bacteria analysis, performing performance analysis based on the flora information includes: calculating the beneficial bacteria index and/or harmful bacteria index of the target object's intestinal microorganisms and determining the position of the beneficial bacteria index and/or harmful bacteria index in the reference population.
  • Where the performance analysis involves disease prediction analysis, performing performance analysis based on the flora information includes: calculating the intestinal disease index of the target object and determining the position of the intestinal disease index in the reference population.
  • The diversity index is calculated as follows: for each intestinal microorganism of the target object, calculate the product of the species relative abundance and the logarithm of the species relative abundance, then add up the products for all the intestinal microorganisms to obtain the diversity index. The intestinal disease index is calculated as follows: compute the average relative abundance X1 of the species of first-type microorganisms and the average relative abundance X2 of the species of second-type microorganisms; the difference between X1 and X2 is the intestinal disease index. First-type microorganisms are the combination of intestinal microorganisms whose relative abundance in people with a specific disease, relative to healthy people, is greater than a first preset standard; second-type microorganisms are the combination of intestinal microorganisms whose relative abundance in healthy people, relative to people with the specific disease, is greater than a second preset standard.
  • Where the performance analysis involves diversity analysis and the diversity index of the target object's intestinal microorganisms has been calculated from the flora information, the method further includes importing that diversity index into the reference population database for the next performance analysis step; where the performance analysis involves beneficial bacteria analysis and/or harmful bacteria analysis and the beneficial bacteria index and/or harmful bacteria index of the target object's intestinal microorganisms has been calculated, the method further includes importing that index into the reference population database for the next performance analysis step; where the performance analysis involves disease prediction analysis and the intestinal disease index of the target object has been calculated, the method further includes importing that intestinal disease index into the reference population database for the next performance analysis step.
  • The present application also provides a computer program product which, when executed on a data processing device, is adapted to execute a program initialized with the following method steps: acquiring sequencing data of the intestinal microflora of the target object; annotating the sequencing data according to a standard gene database to obtain an annotation result; and evaluating the intestinal microflora of the target object according to the annotation result to obtain the flora information of the target object's intestinal microbes.
  • The flora information of the target object's intestinal microorganisms includes the types of intestinal microorganisms and the relative abundance of each species of intestinal microorganism, where the relative abundance of a species is the sum of the relative abundances of the target genes in the sequencing data that belong to that species, and the relative abundance of each target gene is obtained from the annotation result.
  • Where the performance analysis involves diversity analysis, performing performance analysis on the target object's intestinal microorganisms based on the flora information includes: calculating the diversity index of the target object's intestinal microorganisms and determining the position of the diversity index in the reference population.
  • Where the performance analysis involves beneficial bacteria analysis and/or harmful bacteria analysis, performing performance analysis based on the flora information includes: calculating the beneficial bacteria index and/or harmful bacteria index of the target object's intestinal microorganisms and determining the position of the beneficial bacteria index and/or harmful bacteria index in the reference population.
  • Where the performance analysis involves disease prediction analysis, performing performance analysis based on the flora information includes: calculating the intestinal disease index of the target object and determining the position of the intestinal disease index in the reference population.
  • The diversity index is calculated as follows: for each intestinal microorganism of the target object, calculate the product of the species relative abundance and the logarithm of the species relative abundance, then add up the products for all the intestinal microorganisms to obtain the diversity index. The intestinal disease index is calculated as follows: compute the average relative abundance X1 of the species of first-type microorganisms and the average relative abundance X2 of the species of second-type microorganisms; the difference between X1 and X2 is the intestinal disease index. First-type microorganisms are the combination of intestinal microorganisms whose relative abundance in people with a specific disease, relative to healthy people, is greater than a first preset standard; second-type microorganisms are the combination of intestinal microorganisms whose relative abundance in healthy people, relative to people with the specific disease, is greater than a second preset standard.
  • Where the performance analysis involves diversity analysis and the diversity index of the target object's intestinal microorganisms has been calculated from the flora information, the method further includes importing that diversity index into the reference population database for the next performance analysis step; where the performance analysis involves beneficial bacteria analysis and/or harmful bacteria analysis and the beneficial bacteria index and/or harmful bacteria index of the target object's intestinal microorganisms has been calculated, the method further includes importing that index into the reference population database for the next performance analysis step; where the performance analysis involves disease prediction analysis and the intestinal disease index of the target object has been calculated, the method further includes importing that intestinal disease index into the reference population database for the next performance analysis step.
  • the embodiments of the present application may be provided as methods, systems, or computer program products. Therefore, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present application may take the form of a computer program product implemented on one or more computer usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer usable program code.
  • These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, and the instruction apparatus implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing device, so that a series of operating steps are performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
  • the computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • the memory may include non-permanent memory, random access memory (RAM) and/or non-volatile memory in a computer-readable medium, such as read only memory (ROM) or flash memory (flash RAM).
  • Computer-readable media, including permanent and non-permanent, removable and non-removable media, can store information by any method or technology.
  • the information may be computer readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassette tapes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
  • Computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
  • the embodiments of the present application may be provided as methods, systems, or computer program products. Therefore, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present application may take the form of a computer program product implemented on one or more computer usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer usable program code.
  • The following steps are adopted: acquiring sequencing data of the intestinal microbiota of a target object; annotating the sequencing data against a standard gene database to obtain an annotation result; and evaluating the intestinal microbiota of the target object according to the annotation result to obtain microbiota information for the target object's intestinal microorganisms. This solves the technical problem that the related art lacks a method that uses metagenomic gene-data analysis to effectively analyze the intestinal microbiota and its state.
  • That is, the sequencing data are annotated against a standard gene database and the intestinal microbiota of the target object is evaluated according to the annotation result to obtain the target object's microbiota information, thereby achieving the technical effect of effectively analyzing the intestinal microbiota and its state.

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Medical Informatics (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Genetics & Genomics (AREA)
  • General Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Molecular Biology (AREA)
  • Databases & Information Systems (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Bioethics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

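A method, device, storage medium and processor for processing intestinal microbiota sequencing data. The method comprises: acquiring sequencing data of the intestinal microbiota of a target object (S102); annotating the sequencing data against a standard gene database to obtain an annotation result (S104); and evaluating the intestinal microbiota of the target object according to the annotation result to obtain microbiota information for the target object's intestinal microorganisms (S106). The method addresses two gaps in the related art: there has been no method that uses metagenomic gene-data analysis to effectively analyze the intestinal microbiota and its state, and metagenomic analysis has not yet been used to analyze an individual's intestinal microorganisms comprehensively and from multiple angles so as to meet the need for personalized analysis.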

Description

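Intestinal microbiota sequencing data processing method, device, storage medium and processor
Technical Field
The present application relates to the field of gene sequencing data analysis, and in particular to a method, device, storage medium and processor for processing intestinal microbiota sequencing data.
Background
With the Human Microbiome Project (HMP) and the human intestinal metagenomics (MetaHIT) project under way, a growing body of research shows that human physiology, metabolism, growth and development are not controlled solely by the host's own genes: many phenomena, such as disease susceptibility and drug response, cannot be fully explained by differences in human genes. This is because large numbers of microorganisms live in the human body, and their composition and activity are closely tied to human growth and development, aging, illness and death.
The 16S rRNA gene (the gene for the small-subunit ribosomal RNA) is the molecular marker (biomarker) most commonly used for phylogenetic classification of prokaryotes and is widely used in microbial ecology. In recent years, with continued advances in high-throughput sequencing and data analysis methods, a large number of 16S rRNA-based studies have driven rapid progress in microbial ecology, and the approach is widely applied in intestinal microbiota research. However, 16S rRNA analysis has several limitations, such as horizontal gene transfer, heterogeneity among multiple gene copies, differences in amplification efficiency, and the choice of data analysis method, all of which reduce the accuracy of community composition and diversity analyses.
A metagenome is the total genetic material of all microorganisms in a particular environment. Metagenomic sequencing takes the entire microbial community of an environment as the object of study: instead of isolating and culturing microorganisms, it extracts total microbial DNA from the environment and sequences it directly with next-generation high-throughput sequencing. Because of these advantages for studying microbial ecology, more and more studies use metagenomic analysis. However, there is as yet no method that uses metagenomic gene-data analysis to effectively analyze the intestinal microbiota and its state, nor a way to use metagenomic analysis to analyze an individual's intestinal microorganisms comprehensively and from multiple angles so as to meet the need for personalized analysis.
Summary
The present application provides a method, device, storage medium and processor for processing intestinal microbiota sequencing data, to solve the problems in the related art that there is no method using metagenomic gene-data analysis to effectively analyze the intestinal microbiota and its state, and that metagenomic analysis cannot yet be used to analyze an individual's intestinal microorganisms comprehensively and from multiple angles to meet the need for personalized analysis.
According to one aspect of the present application, a method for processing intestinal microbiota sequencing data is provided. The method includes: acquiring sequencing data of the intestinal microbiota of a target object; annotating the sequencing data against a standard gene database to obtain an annotation result; and evaluating the intestinal microbiota of the target object according to the annotation result to obtain microbiota information for the target object's intestinal microorganisms.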
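Optionally, the microbiota information of the target object's intestinal microorganisms includes the types of intestinal microorganisms and the species relative abundance of each, where the species relative abundance is the sum of the relative abundances of that species' target genes in the sequencing data, and the relative abundance of each target gene is obtained from the annotation result.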
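Optionally, after the microbiota information is obtained, the method further includes performing a performance analysis of the target object's intestinal microorganisms based on the microbiota information, the performance analysis involving at least one of diversity analysis, beneficial-bacteria analysis, harmful-bacteria analysis and disease prediction analysis. Where the performance analysis involves diversity analysis, it includes calculating the diversity index of the target object's intestinal microorganisms and determining the position of the diversity index in a reference population; where it involves beneficial-bacteria analysis and/or harmful-bacteria analysis, it includes calculating the beneficial-bacteria index and/or harmful-bacteria index of the target object's intestinal microorganisms and determining the position of that index in the reference population; where it involves disease prediction analysis, it includes calculating the intestinal disease index of the target object and determining the position of the intestinal disease index in the reference population.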
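Optionally, the diversity index is calculated as follows: for each intestinal microorganism of the target object, compute the product of its species relative abundance and the logarithm of that abundance, then sum these products over all intestinal microorganisms to obtain the diversity index. The intestinal disease index is calculated as follows: compute the mean species relative abundance X1 of first-class microorganisms and the mean species relative abundance X2 of second-class microorganisms; the difference X1 - X2 is the intestinal disease index. First-class microorganisms are the set of intestinal microorganisms whose species relative abundance in a population with a specific disease, relative to that in a healthy population, exceeds a first preset criterion; second-class microorganisms are the set of intestinal microorganisms whose species relative abundance in the healthy population, relative to that in the population with the specific disease, exceeds a second preset criterion.
Optionally, where the performance analysis involves diversity analysis and the diversity index of the target object's intestinal microorganisms has been calculated, the method further includes importing that diversity index into the reference-population database for use in subsequent performance analyses; where it involves beneficial-bacteria analysis and/or harmful-bacteria analysis and the beneficial-bacteria index and/or harmful-bacteria index has been calculated, the method further includes importing that index into the reference-population database for use in subsequent performance analyses; where it involves disease prediction analysis and the intestinal disease index has been calculated, the method further includes importing the intestinal disease index into the reference-population database for use in subsequent performance analyses.
According to another aspect of the present application, a device for processing intestinal microbiota sequencing data is provided. The device includes: a first acquisition module configured to acquire sequencing data of the intestinal microbiota of a target object; an annotation module configured to annotate the sequencing data against a standard gene database to obtain an annotation result; and a second acquisition module configured to evaluate the intestinal microbiota of the target object according to the annotation result to obtain microbiota information for the target object's intestinal microorganisms.
Optionally, the microbiota information of the target object's intestinal microorganisms includes the types of intestinal microorganisms and the species relative abundance of each, where the species relative abundance is the sum of the relative abundances of that species' target genes in the sequencing data, and the relative abundance of each target gene is obtained from the annotation result.
Optionally, the device further includes a performance analysis module configured to perform a performance analysis of the target object's intestinal microorganisms based on the microbiota information, the performance analysis involving at least one of: diversity analysis, beneficial-bacteria analysis, harmful-bacteria analysis and disease prediction analysis.
Optionally, where the performance analysis involves diversity analysis, the performance analysis module further includes: a first calculation module configured to calculate the diversity index of the target object's intestinal microorganisms; and a first position determination module configured to determine the position of the diversity index in a reference population. Where the performance analysis involves beneficial-bacteria analysis and/or harmful-bacteria analysis, the module further includes: a second calculation module configured to calculate the beneficial-bacteria index and/or harmful-bacteria index of the target object's intestinal microorganisms; and a second position determination module configured to determine the position of that index in the reference population. Where the performance analysis involves disease prediction analysis, the module further includes: a third calculation module configured to calculate the intestinal disease index of the target object; and a third position determination module configured to determine the position of the intestinal disease index in the reference population.
Optionally, the first calculation module includes: a product unit configured to compute, for each intestinal microorganism of the target object, the product of its species relative abundance and the logarithm of that abundance; and a summation unit configured to sum these products over all intestinal microorganisms to obtain the diversity index. The third calculation module includes: a statistics unit configured to compute the mean species relative abundance X1 of first-class microorganisms and the mean species relative abundance X2 of second-class microorganisms; and a difference calculation unit configured to calculate the difference between X1 and X2 to obtain the intestinal disease index, where the first-class microorganisms are the set of intestinal microorganisms whose species relative abundance in a population with a specific disease, relative to that in a healthy population, exceeds a first preset criterion, and the second-class microorganisms are the set of intestinal microorganisms whose species relative abundance in the healthy population, relative to that in the population with the specific disease, exceeds a second preset criterion.
Optionally, where the performance analysis involves diversity analysis and the first calculation module has calculated the diversity index of the target object's intestinal microorganisms, the performance analysis module further includes a first import module configured to import that diversity index into the reference-population database for use in subsequent performance analyses; where it involves beneficial-bacteria analysis and/or harmful-bacteria analysis and the second calculation module has calculated the beneficial-bacteria index and/or harmful-bacteria index, the module further includes a second import module configured to import that index into the reference-population database for use in subsequent performance analyses; where it involves disease prediction analysis and the third calculation module has calculated the intestinal disease index, the module further includes a third import module configured to import the intestinal disease index into the reference-population database for use in subsequent performance analyses.
According to another aspect of the present application, a storage medium is provided, the storage medium including a stored program, where the program performs any one of the intestinal microbiota sequencing data processing methods described above.
According to another aspect of the present application, a processor is provided, the processor being configured to run a program, where the program, when run, performs any one of the intestinal microbiota sequencing data processing methods described above.
The present application adopts the following steps: acquiring sequencing data of the intestinal microbiota of a target object; annotating the sequencing data against a standard gene database to obtain an annotation result; and evaluating the intestinal microbiota of the target object according to the annotation result to obtain microbiota information for the target object's intestinal microorganisms. This solves the technical problem that the related art lacks a method using metagenomic gene-data analysis to effectively analyze the intestinal microbiota and its state.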
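That is, the sequencing data are annotated against a standard gene database and the target object's intestinal microbiota is evaluated according to the annotation result to obtain its microbiota information, thereby achieving the technical effect of effectively analyzing the intestinal microbiota and its state.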
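Brief Description of the Drawings
The accompanying drawings, which form part of the present application, are provided for a further understanding of the application; the illustrative embodiments and their descriptions explain the application and do not unduly limit it. In the drawings:
Fig. 1 is a first flowchart of the method for processing intestinal microbiota sequencing data according to an embodiment of the present application;
Fig. 2 is a second flowchart of the method for processing intestinal microbiota sequencing data according to an embodiment of the present application;
Fig. 3 is a schematic diagram of the device for processing intestinal microbiota sequencing data according to an embodiment of the present application.
Detailed Description
It should be noted that, where no conflict arises, the embodiments of the present application and the features in them may be combined with one another. The present application is described in detail below with reference to the drawings and in combination with the embodiments.
To help those skilled in the art better understand the solution of the present application, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the present application.
It should be noted that the terms "first", "second" and the like in the description, claims and drawings of the present application are used to distinguish similar objects and do not necessarily describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments of the present application described here can be implemented. In addition, the terms "include" and "have" and any variants of them are intended to cover non-exclusive inclusion; for example, a process, method, system, product or device that includes a series of steps or units is not necessarily limited to the steps or units explicitly listed, and may include other steps or units that are not explicitly listed or that are inherent to the process, method, product or device.
For ease of description, some nouns and terms used in the embodiments of the present application are explained below:
Standard gene database: a database containing a large number of gene sequences together with the gene and/or species information corresponding to each sequence. Standard gene databases include, but are not limited to, IGC, KO, COG, SEED subsystems and KEGG.
Target gene: in the present application, a species' relative abundance is obtained by summing the relative abundances of that species' target genes in the sequencing data. A target gene here is a gene specific to the microorganism in question: it occurs only in that microorganism, or, after manual curation, corresponds only to that species.
Annotation: aligning the obtained sequence information against the standard gene database to determine the gene corresponding to each sequence, together with that gene's function and biological origin.
Beneficial bacteria: generally, bacteria growing in the human gastrointestinal tract that are positively correlated with human health. In the present application, unless otherwise specified, "beneficial bacteria" may denote either a single named species meeting this definition or a collection of such species.
Harmful bacteria: generally, bacteria growing in the human gastrointestinal tract that are negatively correlated with human health, such as foodborne pathogens and opportunistic pathogens. In the present application, unless otherwise specified, "harmful bacteria" may denote either a single named species meeting this definition or a collection of such species.
Foodborne pathogens: generally, pathogenic bacteria that can cause food poisoning or that are transmitted via food.
Opportunistic pathogens: generally, bacteria that are harmless to human health under normal conditions but can cause a variety of diseases when the gut's beneficial bacteria are damaged and decline; they are also called "two-faced pathogens".
According to an embodiment of the present application, a method for processing intestinal microbiota sequencing data is provided.
Fig. 1 is a first flowchart of the method for processing intestinal microbiota sequencing data according to an embodiment of the present application. As shown in Fig. 1, the method includes the following steps:
Step S102: acquire sequencing data of the intestinal microbiota of a target object;
Step S104: annotate the sequencing data against a standard gene database to obtain an annotation result;
Step S106: evaluate the intestinal microbiota of the target object according to the annotation result to obtain microbiota information for the target object's intestinal microorganisms.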
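The method provided by this embodiment acquires sequencing data of the target object's intestinal microbiota, annotates the sequencing data against a standard gene database to obtain an annotation result, and evaluates the target object's intestinal microbiota according to the annotation result to obtain its microbiota information, thereby solving the technical problem that the related art lacks a method using metagenomic gene-data analysis to effectively analyze the intestinal microbiota and its state.
That is, the sequencing data are annotated against a standard gene database and the target object's intestinal microbiota is evaluated according to the annotation result to obtain its microbiota information, thereby achieving the technical effect of effectively analyzing the intestinal microbiota and its state.
It should be noted that the microbiota information obtained above includes the types of intestinal microorganisms and the species relative abundance of each, where the species relative abundance is the sum of the relative abundances of that species' target genes in the sequencing data, and the relative abundance of each target gene is obtained from the annotation result.
Further, after the microbiota information is obtained, the method provided by this embodiment also includes performing a performance analysis of the target object's intestinal microorganisms based on the microbiota information, the performance analysis involving at least one of: diversity analysis, beneficial-bacteria analysis, harmful-bacteria analysis and disease prediction analysis.
That is, by further performing a performance analysis based on the microbiota information once it has been obtained, the method solves the technical problem that the related art cannot use metagenomic analysis to analyze an individual's intestinal microorganisms comprehensively and from multiple angles to meet the need for personalized analysis, and achieves the technical effect of such comprehensive, multi-dimensional, personalized analysis.
Where the performance analysis involves diversity analysis, performing it based on the microbiota information includes: calculating the diversity index of the target object's intestinal microorganisms and determining the position of the diversity index in a reference population.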
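In an optional example, the diversity index is calculated as follows: for each intestinal microorganism of the target object, compute the product of its species relative abundance and the logarithm of that abundance, then sum these products over all intestinal microorganisms to obtain the diversity index.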
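A minimal Python sketch of this calculation, assuming the species relative abundances are supplied as a mapping from species name to abundance; the function name `shannon_index` is illustrative, not from the source. The sketch returns the conventional Shannon index, which is the negative of the summed products (each product p·ln p is at most 0), so that more diverse communities get higher values.

```python
import math
from typing import Mapping


def shannon_index(abundances: Mapping[str, float]) -> float:
    """Shannon diversity H = -sum(p * ln p) over species with p > 0.

    Abundances are renormalised to sum to 1 first; zero entries are skipped
    because p * ln(p) tends to 0 as p tends to 0.
    """
    total = sum(abundances.values())
    if total <= 0:
        raise ValueError("at least one abundance must be positive")
    h = 0.0
    for value in abundances.values():
        if value > 0:
            p = value / total
            h -= p * math.log(p)  # math.log is the natural logarithm
    return h


# Four equally abundant species: H = ln(4), roughly 1.386.
print(round(shannon_index({"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25}), 3))
```

Renormalising is a design choice for profiles that do not sum exactly to 1 (for example after unassigned reads are dropped); it can be omitted if the abundances are already normalised.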
An example of diversity analysis of the target object's intestinal microorganisms:
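First, calculate the diversity index of the target object's intestinal microorganisms, namely the Shannon index:
$$H = -\sum_i P_i \ln P_i,$$
where $P_i$ is the relative abundance of the $i$-th species. The leading minus sign follows the standard Shannon definition and makes the index non-negative, so that a higher value indicates a more diverse community, consistent with the interpretation below; the sum of products alone is never positive.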
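Second, determine the position of the diversity index in the reference population: sort the diversity indices of the reference population (for example, the diversity indices from intestinal microbiota test results of 100, 200, 300, 1000 or more healthy individuals) in ascending order, count the number of reference individuals whose diversity index is lower than the target object's ($i_d$), and compute the ratio of $i_d$ to the total size of the reference population ($s_d$) to obtain the position of the target object's diversity index in the reference population:
$$c_d = \frac{i_d}{s_d}$$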
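Taking one preset test standard as an example: if $a \le c_d < b$ (e.g., $a = 0$, $b = 0.25$), the diversity index of the target object's intestinal microbiota is considered low; the types and composition of the microbiota are unbalanced (possibly dysbiotic), and intestinal health may be at risk. If $b \le c_d < c$ (e.g., $b = 0.25$, $c = 0.75$), the diversity index is considered moderate, and the types and composition of the microbiota are relatively balanced. If $c \le c_d < d$ (e.g., $c = 0.75$, $d = 1$), the diversity index is considered high: the microbiota contains many types with a rich composition, the risk of dysbiosis is relatively low, and this benefits intestinal and overall health.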
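The position and tiering steps can be sketched in Python as below, assuming the reference population's index values are available as a list. `bisect_left` on the sorted list counts the values strictly below the target, which is $i_d$; dividing by the list length gives $c_d$. The cut-offs 0.25 and 0.75 are the example values above, and the tier labels are illustrative. Because the text's last interval ends at $c_d < 1$, it does not say how to label a target above every reference value ($c_d = 1$); the sketch assumes "high".

```python
from bisect import bisect_left
from typing import Sequence


def population_position(value: float, reference: Sequence[float]) -> float:
    """c = i / s: share of the reference population strictly below `value`."""
    if not reference:
        raise ValueError("reference population is empty")
    ordered = sorted(reference)
    i = bisect_left(ordered, value)  # count of reference values < value
    return i / len(ordered)


def level(position: float, b: float = 0.25, c: float = 0.75) -> str:
    """Tier for a position in [0, 1] using the example cut-offs b and c."""
    if position < b:
        return "low"
    if position < c:
        return "medium"
    return "high"  # assumption: also covers position == 1, which the text leaves open


reference = [2.1, 2.8, 3.0, 3.3, 3.6, 3.9, 4.0, 4.2]
pos = population_position(3.5, reference)
print(pos, level(pos))  # 0.5 medium
```

Sorting on every call keeps the sketch simple; with a large, fixed reference population the sorted list would normally be built once and reused.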
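In addition, the performance analysis based on the microbiota information may further include determining a diversity score for the target object's intestinal microorganisms from the diversity index and its position in the reference population. The diversity score may be calculated as follows: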
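When the diversity index of the target object's intestinal microbiota is low, diversity score = first parameter (e.g., 80) × diversity index + second parameter (e.g., 40);
When the diversity index is moderate, diversity score = third parameter (e.g., 40) × diversity index + fourth parameter (e.g., 50);
When the diversity index is high, diversity score = fifth parameter (e.g., 80) × diversity index + sixth parameter (e.g., 20).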
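A sketch of the tier-dependent diversity score using the example parameters above. The tier labels ("low", "medium", "high") are illustrative, and the text does not say how the diversity index is scaled before scoring, so the input value in the example is arbitrary.

```python
from typing import Dict, Tuple

# Example (slope, intercept) pairs from the text, keyed by tier.
DIVERSITY_SCORE_PARAMS: Dict[str, Tuple[float, float]] = {
    "low": (80.0, 40.0),     # first and second parameters
    "medium": (40.0, 50.0),  # third and fourth parameters
    "high": (80.0, 20.0),    # fifth and sixth parameters
}


def diversity_score(index: float, tier: str,
                    params: Dict[str, Tuple[float, float]] = DIVERSITY_SCORE_PARAMS) -> float:
    """Score = slope * diversity index + intercept, with the pair chosen by tier."""
    if tier not in params:
        raise ValueError(f"unknown tier: {tier!r}")
    slope, intercept = params[tier]
    return slope * index + intercept


print(diversity_score(0.5, "medium"))  # 70.0
```

Keeping the parameters in a table that can be passed in, rather than hard-coding them, reflects the note that these values may be replaced for a given application.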
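It should be noted that the parameter values in the above preset test standard may be replaced to suit the application scenario; the present application does not specifically limit them.
Finally, the diversity index of the intestinal microbiota reflects the richness and complexity of a person's intestinal microorganisms. It is often called a "magnifying glass for gut health" because it reflects health problems that are otherwise invisible. Within a certain range, a higher diversity index indicates a more stable microbial ecosystem that is less easily disturbed by external factors (such as irregular eating); conversely, within a certain range, a lower diversity index indicates that the intestinal microbiota is more easily influenced by the external environment, leading to an imbalanced community structure and, in turn, a series of gut-related health problems such as gastrointestinal disorders.
Where the performance analysis involves beneficial-bacteria analysis, performing it based on the microbiota information includes: calculating the beneficial-bacteria index of the target object's intestinal microorganisms and determining the position of the beneficial-bacteria index in the reference population. In an optional example, the relative abundance of each beneficial bacterium is used as that bacterium's beneficial-bacteria index in the target object's intestinal microorganisms.
An example of beneficial-bacteria analysis of the target object's intestinal microorganisms:
First, calculate the beneficial-bacteria index of the target object's intestinal microorganisms, i.e., take the relative abundance of each beneficial bacterium as that bacterium's beneficial-bacteria index in the target object's gut.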
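Second, determine the position of each beneficial bacterium's index in the reference population. Taking beneficial bacterium a as an example: sort the reference population's beneficial-bacteria indices for bacterium a (for example, the indices for bacterium a from intestinal microbiota test results of 100, 200, 300, 1000 or more healthy individuals) in ascending order, count the number of reference individuals whose index is lower than the target object's index for bacterium a ($i_p$), and compute the ratio of $i_p$ to the total size of the reference population ($s_p$) to obtain the position of the target object's index for beneficial bacterium a in the reference population:
$$c_p = \frac{i_p}{s_p}$$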
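Taking one preset test standard as an example: if $a \le c_p < b$ (e.g., $a = 0$, $b = 0.25$), the target object's index for beneficial bacterium a is considered low; if $b \le c_p < c$ (e.g., $b = 0.25$, $c = 0.75$), it is considered moderate; if $c \le c_p < d$ (e.g., $c = 0.75$, $d = 1$), it is considered high.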
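Because each beneficial bacterium is ranked against the reference population separately, the position step runs once per bacterium. A sketch, assuming the reference data are stored as one list of relative abundances per bacterium; the names `bacterium_a` and `bacterium_b` are placeholders, not entries from Table 1, and treating a bacterium absent from the target's profile as abundance 0 is an assumption.

```python
from bisect import bisect_left
from typing import Dict, List


def per_bacterium_positions(target: Dict[str, float],
                            reference: Dict[str, List[float]]) -> Dict[str, float]:
    """Return c_p = i_p / s_p for every bacterium that has reference data.

    i_p is the number of reference individuals whose relative abundance of the
    bacterium is strictly below the target's; s_p is the number of individuals.
    """
    positions: Dict[str, float] = {}
    for name, values in reference.items():
        if not values:
            raise ValueError(f"no reference values for {name!r}")
        ordered = sorted(values)
        abundance = target.get(name, 0.0)  # absent from the profile -> 0
        positions[name] = bisect_left(ordered, abundance) / len(ordered)
    return positions


reference = {
    "bacterium_a": [0.001, 0.004, 0.010, 0.020],
    "bacterium_b": [0.02, 0.05, 0.08, 0.12],
}
target = {"bacterium_a": 0.015, "bacterium_b": 0.01}
print(per_bacterium_positions(target, reference))
# {'bacterium_a': 0.75, 'bacterium_b': 0.0}
```

The same function applies unchanged to the per-bacterium harmful-bacteria positions $c_o$.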
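In addition, the performance analysis may further include determining a beneficial-bacteria score from the target object's beneficial-bacteria index and its position in the reference population. The score may be calculated as follows: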
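When the index of a given beneficial bacterium in the target object's gut is low, that bacterium's beneficial-bacteria score = first parameter (e.g., 80) × its beneficial-bacteria index + second parameter (e.g., 40);
When the index is moderate, the score = third parameter (e.g., 40) × its beneficial-bacteria index + fourth parameter (e.g., 20);
When the index is high, the score = fifth parameter (e.g., 80) × its beneficial-bacteria index + sixth parameter (e.g., 20).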
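A sketch that scores several beneficial bacteria at once from their indices and positions, using the example parameters above. The tier labels, the 0.25/0.75 cut-offs as defaults, and the function names are assumptions for illustration; bacterium names are placeholders.

```python
from typing import Dict, Tuple

# Example parameters from the text: tier -> (slope, intercept).
BENEFICIAL_SCORE_PARAMS: Dict[str, Tuple[float, float]] = {
    "low": (80.0, 40.0),       # first and second parameters
    "moderate": (40.0, 20.0),  # third and fourth parameters
    "high": (80.0, 20.0),      # fifth and sixth parameters
}


def tier_of(position: float, b: float = 0.25, c: float = 0.75) -> str:
    """Tier from a position in the reference population (example cut-offs)."""
    if position < b:
        return "low"
    return "moderate" if position < c else "high"


def beneficial_scores(indices: Dict[str, float],
                      positions: Dict[str, float]) -> Dict[str, float]:
    """Per-bacterium score = slope * index + intercept for that bacterium's tier.

    Every bacterium in `indices` must also have an entry in `positions`.
    """
    scores: Dict[str, float] = {}
    for name, index in indices.items():
        slope, intercept = BENEFICIAL_SCORE_PARAMS[tier_of(positions[name])]
        scores[name] = slope * index + intercept
    return scores


print(beneficial_scores({"bacterium_a": 0.5, "bacterium_b": 0.125},
                        {"bacterium_a": 0.8, "bacterium_b": 0.1}))
# {'bacterium_a': 60.0, 'bacterium_b': 50.0}
```

The harmful-bacteria and intestinal-disease scores described later use the same example parameters (80/40, 40/20, 80/20), so the same table could be reused for them; the diversity score differs only in its fourth parameter (50).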
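It should be noted that the parameter values in the above preset test standard may be replaced to suit the application scenario; the present application does not specifically limit them.
It should also be noted that, in the above beneficial-bacteria analysis, the beneficial bacteria that can be analyzed include at least the following 26 species; see Table 1 for details.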
Table 1 Beneficial bacteria for beneficial-bacteria index analysis
(Table 1 is provided as an image in the original publication and is not reproduced here.)
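Where the performance analysis involves harmful-bacteria analysis, performing it based on the microbiota information includes: calculating the harmful-bacteria index of the target object's intestinal microorganisms and determining the position of the harmful-bacteria index in the reference population. In an optional example, the relative abundance of a harmful bacterium is used as its harmful-bacteria index in the target object's intestinal microorganisms.
An example of harmful-bacteria analysis of the target object's intestinal microorganisms:
First, calculate the harmful-bacteria index of the target object's intestinal microorganisms, i.e., take the relative abundance of each harmful bacterium as that bacterium's harmful-bacteria index in the target object's gut.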
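Second, determine the position of each harmful bacterium's index in the reference population. Taking harmful bacterium b as an example: sort the reference population's harmful-bacteria indices for bacterium b (for example, the indices for bacterium b from intestinal microbiota test results of 100, 200, 300, 1000 or more healthy individuals) in ascending order, count the number of reference individuals whose index is lower than the target object's index for bacterium b ($i_o$), and compute the ratio of $i_o$ to the total size of the reference population ($s_o$) to obtain the position of the target object's index for harmful bacterium b in the reference population:
$$c_o = \frac{i_o}{s_o}$$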
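Taking one preset test standard as an example: if $a \le c_o < b$ (e.g., $a = 0$, $b = 0.25$), the target object's index for harmful bacterium b is considered low; if $b \le c_o < c$ (e.g., $b = 0.25$, $c = 0.75$), it is considered moderate; if $c \le c_o < d$ (e.g., $c = 0.75$, $d = 1$), it is considered high.
In addition, the performance analysis may further include determining a harmful-bacteria score from the target object's harmful-bacteria index and its position in the reference population. The score may be calculated as follows:
When the index of a given harmful bacterium in the target object's gut is low, that bacterium's harmful-bacteria score = first parameter (e.g., 80) × its harmful-bacteria index + second parameter (e.g., 40);
When the index is moderate, the score = third parameter (e.g., 40) × its harmful-bacteria index + fourth parameter (e.g., 20);
When the index is high, the score = fifth parameter (e.g., 80) × its harmful-bacteria index + sixth parameter (e.g., 20).
It should be noted that the parameter values in the above preset test standard may be replaced to suit the application scenario; the present application does not specifically limit them.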
It should also be noted that, in the above harmful-bacteria analysis, the harmful bacteria that can be analyzed include at least the following 29 (9 foodborne pathogens and 20 opportunistic pathogens); see Tables 2 and 3 for details.
Table 2 Foodborne pathogens for harmful-bacteria index analysis
  Foodborne pathogens
1 Campylobacter coli
2 Campylobacter jejuni
3 Clostridium botulinum
4 Clostridium perfringens
5 Cronobacter turicensis
6 Staphylococcus aureus
7 Vibrio cholerae
8 Shigella
9 Salmonella (genus)
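A sketch of extracting harmful-bacteria indices for the Table 2 pathogens from a species-level abundance profile. It assumes species names in the profile are binomial strings such as "Genus species"; the two genus-level entries (Shigella, Salmonella) are matched on the first word of the name and their species abundances summed, which is an assumption about how genus entries are resolved. The function name and the profile values are made up for illustration.

```python
from typing import Dict, List

# The nine foodborne pathogens of Table 2; the last two are genus-level entries.
FOODBORNE_PATHOGENS: List[str] = [
    "Campylobacter coli", "Campylobacter jejuni", "Clostridium botulinum",
    "Clostridium perfringens", "Cronobacter turicensis", "Staphylococcus aureus",
    "Vibrio cholerae", "Shigella", "Salmonella",
]


def foodborne_indices(profile: Dict[str, float]) -> Dict[str, float]:
    """Harmful-bacteria index (relative abundance) for each Table 2 entry.

    A two-word entry is looked up as an exact species name; a one-word entry
    is treated as a genus and sums every profile species of that genus.
    """
    indices: Dict[str, float] = {}
    for taxon in FOODBORNE_PATHOGENS:
        if " " in taxon:
            indices[taxon] = profile.get(taxon, 0.0)
        else:
            indices[taxon] = sum((value for name, value in profile.items()
                                  if name.split(" ")[0] == taxon), 0.0)
    return indices


profile = {"Salmonella enterica": 0.002, "Salmonella bongori": 0.001,
           "Staphylococcus aureus": 0.0005, "Bacteroides vulgatus": 0.2}
idx = foodborne_indices(profile)
print(round(idx["Salmonella"], 6), idx["Staphylococcus aureus"], idx["Vibrio cholerae"])
# 0.003 0.0005 0.0
```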
Table 3 Opportunistic pathogens for harmful-bacteria index analysis
(Table 3 is provided as an image in the original publication and is not reproduced here.)
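Where the performance analysis involves disease prediction analysis, performing it based on the microbiota information includes: calculating the intestinal disease index of the target object and determining the position of the intestinal disease index in the reference population.
In an optional example, the intestinal disease index is calculated as follows: compute the mean species relative abundance X1 of first-class microorganisms and the mean species relative abundance X2 of second-class microorganisms; the difference X1 - X2 is the intestinal disease index. First-class microorganisms are the set of intestinal microorganisms whose species relative abundance in a population with a specific disease, relative to that in a healthy population, exceeds a first preset criterion; second-class microorganisms are the set of intestinal microorganisms whose species relative abundance in the healthy population, relative to that in the population with the specific disease, exceeds a second preset criterion.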
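Specifically, first-class microorganisms are the set of intestinal microorganisms whose species relative abundance in a population with a specific disease, relative to that in a healthy population, is above a first preset value. If the species relative abundance in the diseased population is a1 and that in the healthy population is a2, "relative" here may mean a1/a2 or (a1 - a2)/a2. When it means a1/a2, the first preset value may be 200%, 300%, 400%, 500%, 600%, 700%, 800%, 900%, 1000% or higher; when it means (a1 - a2)/a2, the first preset value may be 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100% or higher. Second-class microorganisms are the set of intestinal microorganisms whose species relative abundance in the healthy population, relative to that in the population with the specific disease, is above a second preset value. Similarly, if the species relative abundance in the healthy population is b1 and that in the diseased population is b2, "relative" may mean b1/b2 or (b1 - b2)/b2. When it means b1/b2, the second preset value may be 200%, 300%, 400%, 500%, 600%, 700%, 800%, 900%, 1000% or higher; when it means (b1 - b2)/b2, the second preset value may be 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100% or higher.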
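A sketch of this selection rule, supporting both readings of "relative". Preset values are passed as fractions, so 200% becomes 2.0 and 20% becomes 0.2. The group means are assumed to be precomputed from labelled disease and healthy cohorts; only species present in both mappings are considered, and the handling of a zero baseline is an assumption the text does not address. Function names are illustrative.

```python
from typing import Dict, List, Tuple


def relative(x: float, y: float, mode: str) -> float:
    """'Relative' comparison of x to y: x / y (mode 'ratio') or (x - y) / y ('change')."""
    if mode not in ("ratio", "change"):
        raise ValueError(f"unknown mode: {mode!r}")
    if y == 0:
        return float("inf") if x > 0 else 0.0  # assumption for a zero baseline
    return x / y if mode == "ratio" else (x - y) / y


def split_enriched(disease_mean: Dict[str, float], healthy_mean: Dict[str, float],
                   first_preset: float, second_preset: float,
                   mode: str = "ratio") -> Tuple[List[str], List[str]]:
    """Return (first-class, second-class) species lists.

    First class: relative(a1, a2) > first_preset, with a1 the mean abundance in
    the diseased group and a2 in the healthy group. Second class: the same test
    with the groups swapped (b1 healthy, b2 diseased) against second_preset.
    """
    first: List[str] = []
    second: List[str] = []
    for species in sorted(disease_mean.keys() & healthy_mean.keys()):
        a1, a2 = disease_mean[species], healthy_mean[species]
        if relative(a1, a2, mode) > first_preset:
            first.append(species)
        if relative(a2, a1, mode) > second_preset:
            second.append(species)
    return first, second


disease = {"s1": 0.03, "s2": 0.01, "s3": 0.02}
healthy = {"s1": 0.01, "s2": 0.04, "s3": 0.02}
print(split_enriched(disease, healthy, 2.0, 2.0))  # (['s1'], ['s2'])
```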
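An example of disease prediction analysis of the target object's intestinal microorganisms:
First, calculate the target object's intestinal disease index: Disease Index = mean relative abundance of the species enriched in cases of a given disease - mean relative abundance of the species enriched in the corresponding controls.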
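A sketch of the disease index for one individual. X1 and X2 are computed from the target object's own profile, over species lists selected as described above. It assumes a species missing from the profile contributes abundance 0; the function name and example values are illustrative.

```python
from statistics import mean
from typing import Dict, Sequence


def disease_index(profile: Dict[str, float],
                  first_class: Sequence[str],
                  second_class: Sequence[str]) -> float:
    """Intestinal disease index X1 - X2 for one individual.

    X1: mean relative abundance of first-class (case-enriched) species.
    X2: mean relative abundance of second-class (control-enriched) species.
    Species missing from the profile count as abundance 0.
    """
    if not first_class or not second_class:
        raise ValueError("both species groups must be non-empty")
    x1 = mean(profile.get(s, 0.0) for s in first_class)
    x2 = mean(profile.get(s, 0.0) for s in second_class)
    return x1 - x2


profile = {"s1": 0.5, "s2": 0.125, "s4": 0.25}
print(disease_index(profile, ["s1", "s3"], ["s2", "s4"]))  # 0.0625
```

A positive value means the case-enriched species dominate in this individual; its position in the reference population is then obtained in the same way as for the other indices.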
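Second, determine the position of the intestinal disease index in the reference population: sort the intestinal disease indices of the reference population (for example, the intestinal disease indices from intestinal microbiota test results of 100, 200, 300, 1000 or more healthy individuals) in ascending order, count the number of reference individuals whose intestinal disease index is lower than the target object's ($i_D$), and compute the ratio of $i_D$ to the total size of the reference population ($s_D$) to obtain the position of the target object's intestinal disease index in the reference population:
$$c_D = \frac{i_D}{s_D}$$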
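Taking one preset test standard as an example: if $a \le c_D < b$ (e.g., $a = 0$, $b = 0.25$), the intestinal disease index of the target object's intestinal microbiota is considered low; if $b \le c_D < c$ (e.g., $b = 0.25$, $c = 0.75$), it is considered moderate; if $c \le c_D < d$ (e.g., $c = 0.75$, $d = 1$), it is considered high.
In addition, the performance analysis may further include determining an intestinal disease score from the target object's intestinal disease index and its position in the reference population. The score may be calculated as follows:
When the intestinal disease index of the target object's intestinal microbiota is low, intestinal disease score = first parameter (e.g., 80) × intestinal disease index + second parameter (e.g., 40);
When the intestinal disease index is moderate, intestinal disease score = third parameter (e.g., 40) × intestinal disease index + fourth parameter (e.g., 20);
When the intestinal disease index is high, intestinal disease score = fifth parameter (e.g., 80) × intestinal disease index + sixth parameter (e.g., 20).
It should be noted that the parameter values in the above preset test standard may be replaced to suit the application scenario; the present application does not specifically limit them.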
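It should also be noted that, in the above disease prediction analysis, the diseases that can be analyzed include at least the following 9; see Table 4 for details.
Table 4 Diseases for predictive disease analysis
  Microbiome risk of diseases
1 T2D (type 2 diabetes)
2 Obese (obesity)
3 CRC (colorectal cancer)
4 CHD (coronary heart disease)
5 RA (rheumatoid arthritis)
6 NAFLD (non-alcoholic fatty liver disease)
7 Gout
8 IBD (inflammatory bowel disease)
9 AD (Alzheimer's disease)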
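Further, Fig. 2 is a second flowchart of the method for processing intestinal microbiota sequencing data according to an embodiment of the present application. As shown in Fig. 2, the method further includes the following steps:
Step S108a: where the performance analysis involves diversity analysis and the diversity index of the target object's intestinal microorganisms has been calculated, import that diversity index into the reference-population database for use in subsequent performance analyses;
Step S108b: where the performance analysis involves beneficial-bacteria analysis and the beneficial-bacteria index of the target object's intestinal microorganisms has been calculated, import the beneficial-bacteria index into the reference-population database for use in subsequent performance analyses;
Step S108c: where the performance analysis involves harmful-bacteria analysis and the harmful-bacteria index of the target object's intestinal microorganisms has been calculated, import the harmful-bacteria index into the reference-population database for use in subsequent performance analyses;
Step S108d: where the performance analysis involves disease prediction analysis and the intestinal disease index of the target object has been calculated, import the intestinal disease index into the reference-population database for use in subsequent performance analyses;
Step S108e: once the sequencing data have been annotated against the standard gene database to obtain an annotation result, import the target object's microbiota information (the types of intestinal microorganisms and the species relative abundance of each) into the reference-population database for use in subsequent performance analyses.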
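That is, after each target object's intestinal microbiota has been evaluated and its microbiota information obtained, the evaluation results (the target object's microbiota information, which includes intestinal microbial diversity information and may further include the relative abundance of each intestinal microorganism) are also added to the reference-population database, so that the reference range of each indicator is updated in real time.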
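A sketch of the feedback loop in steps S108a to S108d, assuming an in-memory store that keeps each index's reference values sorted, so that positions can be read with a binary search and new results inserted without re-sorting. The class `ReferenceDatabase` and its methods are hypothetical; a real system would persist the data.

```python
from bisect import bisect_left, insort
from typing import Dict, List


class ReferenceDatabase:
    """Hypothetical in-memory reference store: one sorted list per index name."""

    def __init__(self) -> None:
        self._values: Dict[str, List[float]] = {}

    def position(self, index_name: str, value: float) -> float:
        """Fraction of stored values strictly below `value` (c = i / s)."""
        ordered = self._values.get(index_name)
        if not ordered:
            raise ValueError(f"no reference values for {index_name!r}")
        return bisect_left(ordered, value) / len(ordered)

    def add(self, index_name: str, value: float) -> None:
        """Import a newly computed index; insort keeps the list sorted."""
        insort(self._values.setdefault(index_name, []), value)

    def size(self, index_name: str) -> int:
        return len(self._values.get(index_name, []))


db = ReferenceDatabase()
for v in (2.0, 3.0, 4.0, 5.0):
    db.add("diversity", v)
print(db.position("diversity", 3.5))  # 0.5: position computed first
db.add("diversity", 3.5)              # then the result joins the reference
print(db.size("diversity"))           # 5
```

Computing the position before importing the new value matches the order in Fig. 2, where the import step follows the analysis.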
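As more individuals have their intestinal microbiota health assessed with this method, the reference population stored in the database keeps growing, so the assessment results become increasingly accurate and assessments based on this method become increasingly valuable as a reference.
Finally, once the reference population stored in the database is sufficiently large, the method can also select a matching reference population according to the phenotypic characteristics of the individual being tested (including sex, age, ethnicity, height, weight, diet and region of residence) for a specific analysis of intestinal microbiota health, yielding more accurate and reliable assessment results.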
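A sketch of phenotype-matched reference selection. The record fields, the ±5-year age window and the `min_size` fallback are assumptions: the text names the phenotype traits but not the matching rule. The fallback mirrors the idea that matching is used only once enough reference data exist.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ReferenceRecord:
    # Hypothetical layout: a few of the phenotype fields named in the text
    # plus one stored index.
    sex: str
    age: int
    region: str
    diversity_index: float


def matched_reference(records: List[ReferenceRecord], sex: str, age: int,
                      region: Optional[str] = None, age_window: int = 5,
                      min_size: int = 100) -> List[ReferenceRecord]:
    """Select reference individuals with the same sex, a similar age and,
    optionally, the same region.

    Falls back to the full reference set when fewer than `min_size` records
    match, since a very small reference group gives unstable positions.
    """
    subset = [r for r in records
              if r.sex == sex
              and abs(r.age - age) <= age_window
              and (region is None or r.region == region)]
    return subset if len(subset) >= min_size else records


records = [
    ReferenceRecord("F", 30, "south", 3.2),
    ReferenceRecord("M", 31, "south", 3.0),
    ReferenceRecord("F", 52, "north", 2.9),
]
print(len(matched_reference(records, "F", 32, min_size=1)))  # 1
print(len(matched_reference(records, "F", 32)))              # 3 (fallback)
```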
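It should also be noted that the database initially stores a target number of initial reference subjects. For each initial reference subject, the database records phenotype information (including sex, age, ethnicity, height, weight, diet and region of residence) and intestinal microbiota health assessment information (including the diversity index, beneficial-bacteria index, harmful-bacteria index and intestinal disease index of that subject's intestinal microorganisms, and optionally the relative abundance of each intestinal microorganism).
In addition, step S102 of the method is explained as follows:
In an optional example, step S102 (acquiring sequencing data of the target object's intestinal microbiota) can be implemented as follows: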
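Step A1: perform gene sequencing on an intestinal microbiota sample from the target object to obtain raw gene data for the target object's intestinal microbiota;
Step A2: perform quality control on the raw gene data, i.e., remove gene sequences whose number of ambiguous bases exceeds a preset value, and remove low-quality gene sequences, a low-quality gene sequence being one whose length, after low-quality consecutive bases are removed, is less than a certain number; this number may be a natural number such as 3, 4 or 5 and may be adjusted to suit the application scenario;
Step A3: remove host gene sequences from the raw gene data to obtain the sequencing data of the target object's intestinal microbiota, host gene sequences being the target object's own gene sequences.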
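Steps A2 and A3 can be sketched as a read filter. Assumptions: reads are (id, sequence, quality) tuples with Phred+33 quality strings, as in FASTQ; "removing low-quality consecutive bases" is interpreted as trimming the low-quality run at the 3' end; and host reads are identified beforehand (for example by aligning to the human reference genome), so step A3 reduces to dropping listed read IDs. All thresholds are parameters, and the function names are hypothetical.

```python
from typing import Iterable, Iterator, Set, Tuple

# Simplified FASTQ record: (read_id, sequence, quality string in Phred+33).
Read = Tuple[str, str, str]


def trim_3prime(seq: str, qual: str, min_q: int) -> str:
    """Remove the run of consecutive bases below `min_q` at the 3' end."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - 33 < min_q:
        end -= 1
    return seq[:end]


def filter_reads(reads: Iterable[Read], host_ids: Set[str], max_n: int,
                 min_q: int, min_len: int) -> Iterator[Tuple[str, str]]:
    """Yield (read_id, trimmed_sequence) for reads passing steps A2 and A3."""
    for read_id, seq, qual in reads:
        if seq.upper().count("N") > max_n:  # A2: too many ambiguous bases
            continue
        trimmed = trim_3prime(seq, qual, min_q)
        if len(trimmed) < min_len:          # A2: too short after trimming
            continue
        if read_id in host_ids:             # A3: host (target object) read
            continue
        yield read_id, trimmed


reads = [
    ("r1", "ACGTACGT", "IIIIIIII"),  # 'I' = Q40 everywhere: kept
    ("r2", "ACNNNCGT", "IIIIIIII"),  # 3 ambiguous bases: dropped (max_n=2)
    ("r3", "ACGTACGT", "II######"),  # '#' = Q2; 2 bases remain: dropped
    ("r4", "ACGTACGT", "IIIIII##"),  # 6 bases remain: kept, trimmed
    ("h1", "ACGTACGT", "IIIIIIII"),  # listed host read: dropped
]
print(list(filter_reads(reads, {"h1"}, max_n=2, min_q=20, min_len=5)))
# [('r1', 'ACGTACGT'), ('r4', 'ACGTAC')]
```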
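In addition, step S104 of the method is explained as follows:
In an optional example, step S104 (annotating the sequencing data against a standard gene database to obtain an annotation result) can be implemented as follows: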
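Step B1: align the gene sequences in the sequencing data to a standard gene database (e.g., the integrated gene catalog of the human gut microbial metagenome, IGC) and determine the relative abundance of each gene sequence contained in the sequencing data (e.g., produce a gene abundance file for the sequencing data, in which the right column holds gene IDs and the left column holds the relative abundance of the gene in the same row);
Step B2: based on the annotation information recorded in the standard gene database for each gene sequence (including the species to which each gene sequence belongs) and the relative abundance of each gene sequence in the sequencing data, determine the relative abundance of each species contained in the sequencing data (e.g., produce a species abundance file, in which the right column holds species IDs and the left column holds the corresponding species relative abundances);
Step B3: based on the annotation information recorded in the standard gene database for each gene sequence (including the biological function of each gene sequence) and the relative abundance of each gene sequence in the sequencing data, determine the relative abundance of each biological function contained in the sequencing data (e.g., produce a biological-function abundance file, in which the right column holds function IDs and the left column holds the corresponding function relative abundances).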
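Steps B2 and B3 both sum gene abundances by an annotation label. A sketch, assuming the B1 gene abundance table and the database annotations are already loaded as dictionaries. For B2 the gene-to-species mapping is assumed to contain only the species-specific target genes, so genes missing from it are ignored. The labels in the example are placeholders, not real database identifiers.

```python
from collections import defaultdict
from typing import Dict


def aggregate(gene_abundance: Dict[str, float],
              gene_label: Dict[str, str]) -> Dict[str, float]:
    """Sum gene relative abundances by annotation label (species or function).

    Genes with no entry in `gene_label` are ignored.
    """
    totals: Dict[str, float] = defaultdict(float)
    for gene_id, abundance in gene_abundance.items():
        label = gene_label.get(gene_id)
        if label is not None:
            totals[label] += abundance
    return dict(totals)


gene_abundance = {"g1": 0.25, "g2": 0.125, "g3": 0.5, "g4": 0.125}  # step B1 output
gene_to_species = {"g1": "species_x", "g2": "species_x", "g3": "species_y"}  # g4: no target gene
gene_to_function = {"g1": "func_1", "g2": "func_2", "g3": "func_1", "g4": "func_2"}
print(aggregate(gene_abundance, gene_to_species))   # {'species_x': 0.375, 'species_y': 0.5}
print(aggregate(gene_abundance, gene_to_function))  # {'func_1': 0.75, 'func_2': 0.25}
```

Whether species totals should be renormalised after unassigned genes are dropped is not specified in the text; the sketch leaves them unnormalised.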
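In summary, the method provided by the embodiments of the present application achieves the following technical effects:
1. By analyzing data produced by metagenomic sequencing, the present application can detect a more complete picture of the human intestinal microbiota than traditional 16S rRNA gene data analysis.
2. The method provided by the present application can predict at least 9 important diseases. Compared with other detection and analysis techniques, its disease prediction is more comprehensive.
3. The technical solution can obtain the intestinal microbial diversity information of the tested target object, including the types of intestinal microorganisms, the relative abundance of each specific microorganism and its position in the reference population, and the relative abundances of beneficial bacteria, foodborne pathogens, opportunistic pathogens and so on together with their positions in the population. It can therefore assess the tested person's intestinal microbiota comprehensively, at the level of both the whole community and specific microorganisms, enabling comprehensive, multi-dimensional and personalized analysis of intestinal microorganisms.
It should be noted that the steps shown in the flowcharts of the drawings may be executed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in a different order.
An embodiment of the present application further provides a device for processing intestinal microbiota sequencing data. This device can be used to perform the method for processing intestinal microbiota sequencing data provided by the embodiments of the present application. The device is described below.
Fig. 3 is a schematic diagram of the device for processing intestinal microbiota sequencing data according to an embodiment of the present application. As shown in Fig. 3, the device includes a first acquisition module 31, an annotation module 33 and a second acquisition module 35.
First acquisition module 31, configured to acquire sequencing data of the intestinal microbiota of a target object;
Annotation module 33, configured to annotate the sequencing data against a standard gene database to obtain an annotation result;
Second acquisition module 35, configured to evaluate the intestinal microbiota of the target object according to the annotation result to obtain microbiota information for the target object's intestinal microorganisms.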
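Optionally, in the device provided by the embodiments of the present application, the microbiota information of the target object's intestinal microorganisms includes the types of intestinal microorganisms and the species relative abundance of each, where the species relative abundance is the sum of the relative abundances of that species' target genes in the sequencing data, and the relative abundance of each target gene is obtained from the annotation result.
Optionally, in the device provided by the embodiments of the present application, the device further includes a performance analysis module configured to perform a performance analysis of the target object's intestinal microorganisms based on the microbiota information, the performance analysis involving at least one of: diversity analysis, beneficial-bacteria analysis, harmful-bacteria analysis and disease prediction analysis.
Optionally, in the device provided by the embodiments of the present application, where the performance analysis involves diversity analysis, the performance analysis module further includes: a first calculation module configured to calculate the diversity index of the target object's intestinal microorganisms; and a first position determination module configured to determine the position of the diversity index in a reference population. Where the performance analysis involves beneficial-bacteria analysis and/or harmful-bacteria analysis, the module further includes: a second calculation module configured to calculate the beneficial-bacteria index and/or harmful-bacteria index of the target object's intestinal microorganisms; and a second position determination module configured to determine the position of that index in the reference population. Where the performance analysis involves disease prediction analysis, the module further includes: a third calculation module configured to calculate the intestinal disease index of the target object; and a third position determination module configured to determine the position of the intestinal disease index in the reference population.
Optionally, in the device provided by the embodiments of the present application, the first calculation module includes: a product unit configured to compute, for each intestinal microorganism of the target object, the product of its species relative abundance and the logarithm of that abundance; and a summation unit configured to sum these products over all the microorganisms to obtain the diversity index. The third calculation module includes: a statistics unit configured to compute the mean species relative abundance X1 of first-class microorganisms and the mean species relative abundance X2 of second-class microorganisms; and a difference calculation unit configured to calculate the difference between X1 and X2 to obtain the intestinal disease index, where the first-class microorganisms are the set of intestinal microorganisms whose species relative abundance in a population with a specific disease, relative to that in a healthy population, exceeds a first preset criterion, and the second-class microorganisms are the set of intestinal microorganisms whose species relative abundance in the healthy population, relative to that in the population with the specific disease, exceeds a second preset criterion.
Optionally, in the device provided by the embodiments of the present application, where the performance analysis involves diversity analysis and the first calculation module has calculated the diversity index of the target object's intestinal microorganisms, the performance analysis module further includes a first import module configured to import that diversity index into the reference-population database for use in subsequent performance analyses; where it involves beneficial-bacteria analysis and/or harmful-bacteria analysis and the second calculation module has calculated the beneficial-bacteria index and/or harmful-bacteria index, the module further includes a second import module configured to import that index into the reference-population database for use in subsequent performance analyses; where it involves disease prediction analysis and the third calculation module has calculated the intestinal disease index, the module further includes a third import module configured to import the intestinal disease index into the reference-population database for use in subsequent performance analyses.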
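In the device provided by the embodiments of the present application, the first acquisition module 31 acquires sequencing data of the target object's intestinal microbiota; the annotation module 33 annotates the sequencing data against a standard gene database to obtain an annotation result; and the second acquisition module 35 evaluates the target object's intestinal microbiota according to the annotation result to obtain its microbiota information, thereby solving the technical problem that the related art lacks a method using metagenomic gene-data analysis to effectively analyze the intestinal microbiota and its state.
That is, the sequencing data are annotated against a standard gene database and the target object's intestinal microbiota is evaluated according to the annotation result to obtain its microbiota information, thereby achieving the technical effect of effectively analyzing the intestinal microbiota and its state.
The device for processing intestinal microbiota sequencing data includes a processor and a memory. The first acquisition module 31, the annotation module 33, the second acquisition module 35 and so on are stored in the memory as program units, and the processor executes these program units to implement the corresponding functions.
The processor contains one or more kernels, which fetch the corresponding program units from the memory. By adjusting kernel parameters, the intestinal microbiota and its state can be effectively analyzed.
The memory may include non-persistent memory in a computer-readable medium, random access memory (RAM) and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip.
An embodiment of the present application provides a storage medium storing a program that, when executed by a processor, implements the method for processing intestinal microbiota sequencing data.
An embodiment of the present application provides a processor configured to run a program, where the program, when run, performs the method for processing intestinal microbiota sequencing data.
An embodiment of the present application provides a device including a processor, a memory, and a program stored in the memory and executable on the processor. When executing the program, the processor implements the following steps: acquiring sequencing data of the intestinal microbiota of a target object; annotating the sequencing data against a standard gene database to obtain an annotation result; and evaluating the intestinal microbiota of the target object according to the annotation result to obtain microbiota information for the target object's intestinal microorganisms.
Optionally, the microbiota information of the target object's intestinal microorganisms includes the types of intestinal microorganisms and the species relative abundance of each, where the species relative abundance is the sum of the relative abundances of that species' target genes in the sequencing data, and the relative abundance of each target gene is obtained from the annotation result.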
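Optionally, after the microbiota information is obtained, a performance analysis of the target object's intestinal microorganisms is performed based on the microbiota information, the performance analysis involving at least one of diversity analysis, beneficial-bacteria analysis, harmful-bacteria analysis and disease prediction analysis. Where the performance analysis involves diversity analysis, it includes calculating the diversity index of the target object's intestinal microorganisms and determining the position of the diversity index in a reference population; where it involves beneficial-bacteria analysis and/or harmful-bacteria analysis, it includes calculating the beneficial-bacteria index and/or harmful-bacteria index of the target object's intestinal microorganisms and determining the position of that index in the reference population; where it involves disease prediction analysis, it includes calculating the intestinal disease index of the target object and determining the position of the intestinal disease index in the reference population.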
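Optionally, the diversity index is calculated as follows: for each intestinal microorganism of the target object, compute the product of its species relative abundance and the logarithm of that abundance, then sum these products over all intestinal microorganisms to obtain the diversity index. The intestinal disease index is calculated as follows: compute the mean species relative abundance X1 of first-class microorganisms and the mean species relative abundance X2 of second-class microorganisms; the difference X1 - X2 is the intestinal disease index. First-class microorganisms are the set of intestinal microorganisms whose species relative abundance in a population with a specific disease, relative to that in a healthy population, exceeds a first preset criterion; second-class microorganisms are the set of intestinal microorganisms whose species relative abundance in the healthy population, relative to that in the population with the specific disease, exceeds a second preset criterion.
Optionally, where the performance analysis involves diversity analysis and the diversity index of the target object's intestinal microorganisms has been calculated, the steps further include importing that diversity index into the reference-population database for use in subsequent performance analyses; where it involves beneficial-bacteria analysis and/or harmful-bacteria analysis and the beneficial-bacteria index and/or harmful-bacteria index has been calculated, the steps further include importing that index into the reference-population database for use in subsequent performance analyses; where it involves disease prediction analysis and the intestinal disease index has been calculated, the steps further include importing the intestinal disease index into the reference-population database for use in subsequent performance analyses. The device herein may be a server, a PC, a tablet (PAD), a mobile phone, etc.
The present application further provides a computer program product which, when executed on a data processing device, is adapted to execute a program initialized with the following method steps: acquiring sequencing data of the intestinal microbiota of a target object; annotating the sequencing data against a standard gene database to obtain an annotation result; and evaluating the intestinal microbiota of the target object according to the annotation result to obtain microbiota information for the target object's intestinal microorganisms.
Optionally, the microbiota information of the target object's intestinal microorganisms includes the types of intestinal microorganisms and the species relative abundance of each, where the species relative abundance is the sum of the relative abundances of that species' target genes in the sequencing data, and the relative abundance of each target gene is obtained from the annotation result.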
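Optionally, after the microbiota information is obtained, a performance analysis of the target object's intestinal microorganisms is performed based on the microbiota information, the performance analysis involving at least one of diversity analysis, beneficial-bacteria analysis, harmful-bacteria analysis and disease prediction analysis. Where the performance analysis involves diversity analysis, it includes calculating the diversity index of the target object's intestinal microorganisms and determining the position of the diversity index in a reference population; where it involves beneficial-bacteria analysis and/or harmful-bacteria analysis, it includes calculating the beneficial-bacteria index and/or harmful-bacteria index of the target object's intestinal microorganisms and determining the position of that index in the reference population; where it involves disease prediction analysis, it includes calculating the intestinal disease index of the target object and determining the position of the intestinal disease index in the reference population.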
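Optionally, the diversity index is calculated as follows: for each intestinal microorganism of the target object, compute the product of its species relative abundance and the logarithm of that abundance, then sum these products over all intestinal microorganisms to obtain the diversity index. The intestinal disease index is calculated as follows: compute the mean species relative abundance X1 of first-class microorganisms and the mean species relative abundance X2 of second-class microorganisms; the difference X1 - X2 is the intestinal disease index. First-class microorganisms are the set of intestinal microorganisms whose species relative abundance in a population with a specific disease, relative to that in a healthy population, exceeds a first preset criterion; second-class microorganisms are the set of intestinal microorganisms whose species relative abundance in the healthy population, relative to that in the population with the specific disease, exceeds a second preset criterion.
Optionally, where the performance analysis involves diversity analysis and the diversity index of the target object's intestinal microorganisms has been calculated, the steps further include importing that diversity index into the reference-population database for use in subsequent performance analyses; where it involves beneficial-bacteria analysis and/or harmful-bacteria analysis and the beneficial-bacteria index and/or harmful-bacteria index has been calculated, the steps further include importing that index into the reference-population database for use in subsequent performance analyses; where it involves disease prediction analysis and the intestinal disease index has been calculated, the steps further include importing the intestinal disease index into the reference-population database for use in subsequent performance analyses.
Those skilled in the art should understand that the embodiments of the present application may be provided as methods, systems, or computer program products. Therefore, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM and optical storage) containing computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of methods, devices (systems) and computer program products according to its embodiments. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in them, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce a computer-implemented process, and the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces and memory.
The memory may include non-persistent memory in a computer-readable medium, random access memory (RAM) and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including persistent and non-persistent, removable and non-removable media, can store information by any method or technology. The information may be computer-readable instructions, data structures, program modules or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise" and any variants of them are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article or device that includes that element.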
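The above are merely embodiments of the present application and are not intended to limit it. Those skilled in the art may make various modifications and changes to the present application. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present application shall fall within the scope of the claims of the present application.
Industrial Applicability
The present application adopts the following steps: acquiring sequencing data of the intestinal microbiota of a target object; annotating the sequencing data against a standard gene database to obtain an annotation result; and evaluating the intestinal microbiota of the target object according to the annotation result to obtain microbiota information for the target object's intestinal microorganisms. This solves the technical problem that the related art lacks a method using metagenomic gene-data analysis to effectively analyze the intestinal microbiota and its state.
That is, the sequencing data are annotated against a standard gene database and the target object's intestinal microbiota is evaluated according to the annotation result to obtain its microbiota information, thereby achieving the technical effect of effectively analyzing the intestinal microbiota and its state.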

Claims (14)

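  1. A method for processing intestinal microbiota sequencing data, the method comprising:
    acquiring sequencing data of the intestinal microbiota of a target object;
    annotating the sequencing data against a standard gene database to obtain an annotation result; and
    evaluating the intestinal microbiota of the target object according to the annotation result to obtain microbiota information for the target object's intestinal microorganisms.
  2. The method according to claim 1, wherein the microbiota information of the target object's intestinal microorganisms comprises the types of intestinal microorganisms and the species relative abundance of each of the intestinal microorganisms, wherein the species relative abundance is the sum of the relative abundances of the target genes of the corresponding species in the sequencing data, and the relative abundance of each target gene is obtained from the annotation result.
  3. The method according to claim 2, wherein after the microbiota information of the target object's intestinal microorganisms is obtained, the method further comprises:
    performing a performance analysis of the target object's intestinal microorganisms based on the microbiota information, the performance analysis involving at least one of: diversity analysis, beneficial-bacteria analysis, harmful-bacteria analysis and disease prediction analysis.
  4. The method according to claim 3, wherein
    when the performance analysis involves the diversity analysis, performing the performance analysis of the target object's intestinal microorganisms based on the microbiota information comprises: calculating the diversity index of the target object's intestinal microorganisms, and determining the position of the diversity index in a reference population;
    when the performance analysis involves beneficial-bacteria analysis and/or harmful-bacteria analysis, performing the performance analysis of the target object's intestinal microorganisms based on the microbiota information comprises: calculating the beneficial-bacteria index and/or harmful-bacteria index of the target object's intestinal microorganisms, and determining the position of the beneficial-bacteria index and/or harmful-bacteria index in the reference population;
    when the performance analysis involves disease prediction analysis, performing the performance analysis of the target object's intestinal microorganisms based on the microbiota information comprises: calculating the intestinal disease index of the target object, and determining the position of the intestinal disease index in the reference population.
  5. The method according to claim 4, wherein
    the diversity index is calculated as follows: for each intestinal microorganism of the target object, computing the product of its species relative abundance and the logarithm of that species relative abundance; and summing the products computed for all the intestinal microorganisms to obtain the diversity index;
    the intestinal disease index is calculated as follows: computing the mean species relative abundance X1 of first-class microorganisms and the mean species relative abundance X2 of second-class microorganisms, the difference between X1 and X2 being the intestinal disease index; wherein the first-class microorganisms are the set of intestinal microorganisms whose species relative abundance in a population with a specific disease, relative to that in a healthy population, exceeds a first preset criterion; and the second-class microorganisms are the set of intestinal microorganisms whose species relative abundance in the healthy population, relative to that in the population with the specific disease, exceeds a second preset criterion.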
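  6. The method according to claim 3, wherein
    when the performance analysis involves diversity analysis and the diversity index of the target object's intestinal microorganisms has been calculated by the performance analysis based on the microbiota information, the method further comprises importing the diversity index of the target object's intestinal microorganisms into a reference-population database for use in a subsequent performance analysis step;
    when the performance analysis involves beneficial-bacteria analysis and/or harmful-bacteria analysis and the beneficial-bacteria index and/or harmful-bacteria index of the target object's intestinal microorganisms has been calculated by the performance analysis based on the microbiota information, the method further comprises importing the beneficial-bacteria index and/or harmful-bacteria index into the reference-population database for use in a subsequent performance analysis step;
    when the performance analysis involves disease prediction analysis and the intestinal disease index of the target object has been calculated by the performance analysis based on the microbiota information, the method further comprises importing the intestinal disease index of the target object into the reference-population database for use in a subsequent performance analysis step.
  7. A device for processing intestinal microbiota sequencing data, wherein the device comprises:
    a first acquisition module, configured to acquire sequencing data of the intestinal microbiota of a target object;
    an annotation module, configured to annotate the sequencing data against a standard gene database to obtain an annotation result; and
    a second acquisition module, configured to evaluate the intestinal microbiota of the target object according to the annotation result to obtain microbiota information for the target object's intestinal microorganisms.
  8. The device according to claim 7, wherein the microbiota information of the target object's intestinal microorganisms comprises the types of intestinal microorganisms and the species relative abundance of each of the intestinal microorganisms, wherein the species relative abundance is the sum of the relative abundances of the target genes of the corresponding species in the sequencing data, and the relative abundance of each target gene is obtained from the annotation result.
  9. The device according to claim 8, wherein the device further comprises a performance analysis module configured to perform a performance analysis of the target object's intestinal microorganisms based on the microbiota information, the performance analysis involving at least one of: diversity analysis, beneficial-bacteria analysis, harmful-bacteria analysis and disease prediction analysis.
  10. The device according to claim 9, wherein
    when the performance analysis involves diversity analysis, the performance analysis module further comprises: a first calculation module, configured to calculate the diversity index of the target object's intestinal microorganisms; and a first position determination module, configured to determine the position of the diversity index in a reference population;
    when the performance analysis involves beneficial-bacteria analysis and/or harmful-bacteria analysis, the performance analysis module further comprises: a second calculation module, configured to calculate the beneficial-bacteria index and/or harmful-bacteria index of the target object's intestinal microorganisms; and a second position determination module, configured to determine the position of the beneficial-bacteria index and/or harmful-bacteria index in the reference population;
    when the performance analysis involves disease prediction analysis, the performance analysis module further comprises: a third calculation module, configured to calculate the intestinal disease index of the target object; and a third position determination module, configured to determine the position of the intestinal disease index in the reference population.
  11. The device according to claim 10, wherein
    the first calculation module comprises: a product unit, configured to compute, for each intestinal microorganism of the target object, the product of its species relative abundance and the logarithm of that species relative abundance; and a summation unit, configured to sum the products computed for all the intestinal microorganisms to obtain the diversity index;
    the third calculation module comprises: a statistics unit, configured to compute the mean species relative abundance X1 of first-class microorganisms and the mean species relative abundance X2 of second-class microorganisms; and a difference calculation unit, configured to calculate the difference between X1 and X2 to obtain the intestinal disease index, wherein the first-class microorganisms are the set of intestinal microorganisms whose species relative abundance in a population with a specific disease, relative to that in a healthy population, exceeds a first preset criterion; and the second-class microorganisms are the set of intestinal microorganisms whose species relative abundance in the healthy population, relative to that in the population with the specific disease, exceeds a second preset criterion.
  12. The device according to claim 9, wherein the performance analysis module further comprises:
    a first import module, configured to, when the performance analysis involves diversity analysis and the first calculation module has calculated the diversity index of the target object's intestinal microorganisms, import the diversity index of the target object's intestinal microorganisms into a reference-population database for use in a subsequent performance analysis step;
    a second import module, configured to, when the performance analysis involves beneficial-bacteria analysis and/or harmful-bacteria analysis and the second calculation module has calculated the beneficial-bacteria index and/or harmful-bacteria index of the target object's intestinal microorganisms, import the beneficial-bacteria index and/or harmful-bacteria index into the reference-population database for use in a subsequent performance analysis step;
    a third import module, configured to, when the performance analysis involves disease prediction analysis and the third calculation module has calculated the intestinal disease index of the target object, import the intestinal disease index of the target object into the reference-population database for use in a subsequent performance analysis step.
  13. A storage medium comprising a stored program, wherein the program performs the method for processing intestinal microbiota sequencing data according to any one of claims 1 to 6.
  14. A processor configured to run a program, wherein the program, when run, performs the method for processing intestinal microbiota sequencing data according to any one of claims 1 to 6.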
PCT/CN2019/129425 2018-12-30 2019-12-27 Intestinal microbiota sequencing data processing method, device, storage medium and processor WO2020140848A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811648980.0 2018-12-30
CN201811648980.0A CN111161794B (zh) 2018-12-30 2018-12-30 Intestinal microbiota sequencing data processing method, device, storage medium and processor

Publications (1)

Publication Number Publication Date
WO2020140848A1 true WO2020140848A1 (zh) 2020-07-09

Family

ID=70555593

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/129425 WO2020140848A1 (zh) 2018-12-30 2019-12-27 Intestinal microbiota sequencing data processing method, device, storage medium and processor

Country Status (2)

Country Link
CN (1) CN111161794B (zh)
WO (1) WO2020140848A1 (zh)

Also Published As

Publication number Publication date
CN111161794B (zh) 2024-03-22
CN111161794A (zh) 2020-05-15


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19907270

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19907270

Country of ref document: EP

Kind code of ref document: A1