WO2020147557A1 - Method and device for processing intestinal microorganism sequencing data, storage medium, and processor - Google Patents

Method and device for processing intestinal microorganism sequencing data, storage medium, and processor

Info

Publication number
WO2020147557A1
Authority
WO
WIPO (PCT)
Prior art keywords
intestinal
target object
antibiotic resistance
genes
function
Prior art date
Application number
PCT/CN2019/129426
Other languages
French (fr)
Chinese (zh)
Inventor
张东亚
夏慧华
张陈陈
刘洋荧
薛文斌
Original Assignee
深圳碳云智能数字生命健康管理有限公司
深圳数字生命研究院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳碳云智能数字生命健康管理有限公司, 深圳数字生命研究院 filed Critical 深圳碳云智能数字生命健康管理有限公司
Publication of WO2020147557A1 publication Critical patent/WO2020147557A1/en

Classifications

    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12QMEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10Sequence alignment; Homology search

Definitions

  • This application relates to the field of gene sequencing data analysis, and in particular to a method, device, storage medium and processor for processing intestinal microbial sequencing data.
  • HMP: Human Microbiome Project
  • MetaHIT: Metagenomics of the Human Intestinal Tract
  • 16S rRNA (Small subunit ribosomal RNA) gene is the most commonly used molecular marker (Biomarker) in the study of phylogenetic classification of prokaryotic microorganisms, and is widely used in microbial ecology research.
  • Biomarker: molecular marker
  • a large number of studies based on 16S rRNA genes have led to rapid development of microbial ecology research, and it has also been widely used in intestinal microbial research.
  • The 16S rRNA gene data analysis method also has many problems, such as horizontal gene transfer, multi-copy heterogeneity, differences in gene amplification efficiency, and the choice of data analysis methods. These problems affect the accuracy of composition and diversity analysis of microbial communities.
  • Metagenome: the sum of the genetic material of all microorganisms in a specific environment.
  • The metagenomic sequencing method takes the entire microbial community in a specific environment as the research object. It does not require the isolation and culture of microorganisms; instead, the total DNA of the environmental microorganisms is extracted and sequenced directly with next-generation high-throughput sequencing technology. Because of the advantages of metagenomics in studying microbial ecology, more and more studies have adopted metagenomic analysis methods.
  • This application provides a method, device, storage medium and processor for processing intestinal microbial sequencing data, to solve the problem that metagenomic data analysis methods in the related art produce a single kind of analysis result and provide limited information, and therefore cannot meet the needs of personalized analysis.
  • A method for processing intestinal microbial sequencing data includes: obtaining the sequencing data of the intestinal microbial flora of the target object; annotating the sequencing data according to a standard gene database to obtain an annotation result; and evaluating the intestinal microbial flora of the target object according to the annotation result to obtain functional information of the intestinal microbes of the target object.
  • A device for processing intestinal microbial sequencing data includes: a first acquisition module, configured to acquire sequencing data of the intestinal microbial flora of the target object; an annotation module, configured to annotate the sequencing data according to a standard gene database to obtain an annotation result; and a second acquisition module, configured to evaluate the intestinal microbial flora of the target object according to the annotation result and obtain functional information of the intestinal microbes of the target object.
  • a storage medium includes a stored program, wherein the program executes the intestinal microbial sequencing data processing method described in any one of the above.
  • a processor for running a program wherein the intestinal microbial sequencing data processing method described in any one of the above is executed when the program is running.
  • The following steps are adopted: acquiring the sequencing data of the intestinal microbial flora of the target object; annotating the sequencing data according to the standard gene database to obtain the annotation result; and evaluating the intestinal microbial flora of the target object according to the annotation result to obtain functional information of the intestinal microbes of the target object.
  • The above method of the present application can not only analyze the relative abundance of genes, but also evaluate the intestinal microbial flora of the target object and the status of that flora based on the relative abundance of each gene, and thereby obtain relatively complete information about the function of the intestinal microbes of the target object. That is, the information provided by the above method is more comprehensive and diversified, and can meet individual needs.
  • Fig. 1 is a first flowchart of a method for processing intestinal microbial sequencing data according to an embodiment of the present application;
  • Fig. 2 is a second flowchart of a method for processing intestinal microbial sequencing data according to an embodiment of the present application;
  • Fig. 3 is a schematic diagram of a device for processing intestinal microbial sequencing data provided according to an embodiment of the present application;
  • Fig. 4 is a schematic diagram of a gene abundance file in an embodiment of the present application;
  • Fig. 5 is a schematic diagram of an IGC KO annotation file in an embodiment of the present application;
  • Fig. 6 is a schematic diagram of the relative abundance file of KO in an embodiment of the present application.
  • Standard gene database: a database that contains a large number of gene sequences, the gene corresponding to each gene sequence, and the functional information related to that gene.
  • Standard gene databases include but are not limited to IGC (Integrated Gene Catalog), KEGG (Kyoto Encyclopedia of Genes and Genomes), GO (Gene Ontology), EggNOG (evolutionary genealogy of genes: Non-supervised Orthologous Groups), CAZy (carbohydrate-active enzymes database), ARDB (Antibiotic Resistance Genes Database) and other databases.
  • Target gene: a gene that exists only in a specific microorganism, or a gene whose sequence, after manual curation, corresponds only to the sequence of that species.
  • The obtained sequence information is compared against the standard gene database to obtain the gene corresponding to each piece of sequence information, as well as the function and biological source of the gene.
  • Antibiotic resistance genes are genes that make bacteria resistant to antibiotics. Such genes often arise through genetic mutation: when humans use antibiotics to kill bacteria, the bacteria are induced to evolve resistance genes, which are then transferred and spread in the bacterial community.
  • KEGG, the Kyoto Encyclopedia of Genes and Genomes, contains many databases; the KEGG Orthology database is the most basic one.
  • KEGG Orthology is abbreviated as KO.
  • For each KO, all genes that are homologous to a given gene are grouped into one category, and each KO is assigned a K number.
  • The function of the gene is extended as follows: the homologous genes of the gene are searched for in different species, and these homologous genes are compared.
  • Such a group of genes is defined as an orthology, and the function of the gene is used as the function of the orthology; in this way, research on gene function in different species can be combined to provide a comprehensive database for the study of gene function.
  • a method for processing intestinal microbial sequencing data is provided.
  • Fig. 1 is a first flowchart of a method for processing intestinal microbial sequencing data according to an embodiment of the present application. As shown in Figure 1, the method includes the following steps:
  • Step S102: Obtain sequencing data of the intestinal microflora of the target object.
  • Step S104: Annotate the sequencing data according to the standard gene database to obtain an annotation result.
  • Step S106: According to the annotation result, evaluate the intestinal microbiota of the target object to obtain functional information of the intestinal microbes of the target object.
  • The method for processing intestinal microbial sequencing data obtains the sequencing data of the intestinal microbial flora of the target object, annotates the sequencing data according to the standard gene database to obtain the annotation result, and evaluates the intestinal microbial flora of the target object according to the annotation result to obtain functional information of the intestinal microbes of the target object. That is, the sequencing data is annotated through the standard gene database and the flora is evaluated based on the annotation result, so that the intestinal microbial flora and its status can be analyzed effectively.
  • The above method of the present application can not only analyze the relative abundance of genes, but also evaluate the intestinal microbial flora of the target object and the status of that flora based on the relative abundance of each gene, and thereby obtain relatively complete information about the function of the intestinal microbes of the target object. That is, the information provided by the above method is more comprehensive and diversified, and can meet individual needs.
  • the functional information of the gut microbes of the target object obtained above includes the functional genes of the gut microbes and the relative abundance of each functional gene.
  • The relative abundance of a functional gene is the ratio of the sum of the relative abundances of the genes related to the same function in the sequencing data to the sum of the relative abundances of the genes related to each function in the sequencing data; the relative abundances of the genes belonging to the same function are obtained from the annotation result.
  • the above-mentioned “functional gene” refers to a collection of genes related to a specific biological function, where the number of genes may be one or more, depending on the specific biological function.
  • the above-mentioned “sum of the relative abundance of genes related to each function” refers to the sum of the relative abundances of all function-related genes included in the sequencing data.
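  • The ratio defined above can be written compactly. The symbols are introduced here for illustration only and do not appear in the source: a_g is the relative abundance of gene g, G_f is the set of genes annotated to function f, F is the set of all annotated functions, and R_f is the relative abundance of functional gene f.

```latex
R_f = \frac{\sum_{g \in G_f} a_g}{\sum_{f' \in F} \sum_{g \in G_{f'}} a_g}
```

By construction the values R_f sum to 1 over all functions in F.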
  • The method for processing intestinal microbial sequencing data further includes: performing performance analysis on the intestinal microbes of the target object based on the functional information, where the performance analysis involves at least one of the following: antibiotic resistance gene analysis and intestinal function gene analysis.
  • In the method provided by the embodiments of the present application, after the functional information of the intestinal microbes of the target object is obtained, performance analysis such as antibiotic resistance gene analysis and/or intestinal function gene analysis is performed on the intestinal microbes of the target object based on the functional information. This achieves a comprehensive, multi-faceted analysis of an individual's intestinal microbes and meets individual analysis requirements, thereby solving the technical problem in the existing technology that the analysis result is single and the information is limited, so that individualized analysis needs cannot be met.
  • The performance analysis of the intestinal microbes of the target object based on the functional information can be achieved through the following steps: calculate the antibiotic resistance index of the intestinal microbes of the target object, and determine the position of the antibiotic resistance index in the reference population, where the relative abundance of the antibiotic resistance genes of an antibiotic is used as the antibiotic resistance index for that antibiotic.
  • The antibiotic resistance index is calculated as follows: calculate the sum of the relative abundances of the genes of the target object's intestinal microbes that belong to the same antibiotic resistance gene; then divide that sum by the sum of the relative abundances of all antibiotic resistance genes to obtain the relative abundance of the antibiotic resistance gene.
  • the antibiotic resistance index of each antibiotic of the intestinal microbe of the target object is calculated.
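  • The calculation described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function name, the dictionary inputs and the gene IDs are hypothetical, and each gene is assumed to be annotated to at most one antibiotic.

```python
from collections import defaultdict


def antibiotic_resistance_indices(gene_abundance, gene_to_antibiotic):
    """Return {antibiotic: index} from per-gene relative abundances.

    gene_abundance: {gene_id: relative abundance} (hypothetical input shape)
    gene_to_antibiotic: {gene_id: antibiotic name}; assumes one antibiotic per gene
    """
    per_antibiotic = defaultdict(float)
    for gene_id, abundance in gene_abundance.items():
        antibiotic = gene_to_antibiotic.get(gene_id)
        if antibiotic is not None:  # genes without a resistance annotation are ignored
            per_antibiotic[antibiotic] += abundance

    # Denominator: sum over all antibiotic resistance genes in the sample.
    total = sum(per_antibiotic.values())
    if total == 0:
        return {name: 0.0 for name in per_antibiotic}
    return {name: value / total for name, value in per_antibiotic.items()}


# Illustrative values only; "g4" has no resistance annotation and is skipped.
example = antibiotic_resistance_indices(
    {"g1": 0.002, "g2": 0.001, "g3": 0.001, "g4": 0.5},
    {"g1": "erythromycin", "g2": "erythromycin", "g3": "netilmicin"},
)
```

Because the denominator only covers annotated resistance genes, the indices of all antibiotics in one sample sum to 1.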
  • Taking erythromycin as an example, the erythromycin resistance indexes of the reference population (for example, those in the intestinal microbial test results of 100, 200, 300, 1000 or more people, such as healthy people) are sorted from small to large, and the number of people in the reference population whose erythromycin resistance index is less than that of the target object (i_a) is determined. The position of the target object's erythromycin resistance index in the reference population (c_a) is determined by calculating the ratio of i_a to the total number of people in the reference population (s_a), that is, c_a = i_a / s_a.
  • The level is then determined from c_a: (1) if c_a falls in the lowest range, the erythromycin resistance index of the target object's intestinal microbial flora is considered to be at a low level; (2) if c_a lies between the lower threshold (e.g. 0.25) and c (e.g. 0.75), the erythromycin resistance index is considered to be at the middle level; (3) if c_a lies between c (e.g. 0.75) and d (e.g. 1), the erythromycin resistance index is considered to be at a high level.
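  • The position and level determination can be sketched as below. The function names are hypothetical, the cut-offs 0.25 and 0.75 are the example values from the text, and the handling of values exactly on a cut-off is an assumption, since the comparison operators are not legible in the source.

```python
def position_in_reference(target_index, reference_indices):
    """Fraction of the reference population whose index is below the target's."""
    if not reference_indices:
        raise ValueError("reference population is empty")
    smaller = sum(1 for value in reference_indices if value < target_index)
    return smaller / len(reference_indices)


def level(position, low_cut=0.25, high_cut=0.75):
    """Map a position in [0, 1] to a level.

    Assumption: a position equal to a cut-off is assigned to the lower level.
    """
    if position <= low_cut:
        return "low"
    if position <= high_cut:
        return "middle"
    return "high"


# Illustrative reference indices only.
reference = [0.1, 0.2, 0.3, 0.4]
c_a = position_in_reference(0.25, reference)  # two of four values are smaller
print(c_a, level(c_a))
```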
  • The performance analysis of the intestinal microbes of the target object based on the functional information may also include: determining the antibiotic resistance score of the intestinal microbes of the target object based on the antibiotic resistance index of the target object and the position of the antibiotic resistance index in the reference population. The antibiotic resistance score can be calculated as follows:
  • antibiotic resistance score = first parameter (for example: 80) * antibiotic resistance index + second parameter (for example: 40);
  • antibiotic resistance score = third parameter (for example: 40) * antibiotic resistance index + fourth parameter (for example: 50);
  • antibiotic resistance score = fifth parameter (for example: 80) * antibiotic resistance index + sixth parameter (for example: 20).
  • Parameter data such as the first, second, third, fourth, fifth and sixth parameters in the above preset test standards can be replaced as appropriate for the application scenario; this application imposes no specific restrictions.
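  • A minimal sketch of the three linear formulas follows. Two points are assumptions rather than statements from the source: that the three formulas apply to the low, middle and high levels in that order, and what quantity is fed in as x. The text names the antibiotic resistance index as the input; with the example parameters the three lines happen to join at 0.25 and 0.75 when x is the position in the reference population, so the sketch leaves the choice to the caller.

```python
# Example (slope, intercept) pairs taken from the text; the level labels are assumed.
EXAMPLE_PARAMETERS = {
    "low": (80, 40),     # first and second parameter
    "middle": (40, 50),  # third and fourth parameter
    "high": (80, 20),    # fifth and sixth parameter
}


def resistance_score(x, level_name, parameters=None):
    """Return slope * x + intercept for the given level.

    x: the quantity being scored (index or position; see the note above).
    level_name: "low", "middle" or "high" (hypothetical labels).
    """
    slope, intercept = (parameters or EXAMPLE_PARAMETERS)[level_name]
    return slope * x + intercept


print(resistance_score(0.0, "low"))      # 40.0
print(resistance_score(0.25, "middle"))  # 60.0
print(resistance_score(1.0, "high"))     # 100.0
```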
  • antibiotic resistance genes are genes related to antibiotic resistance, and the resistance of each antibiotic is related to more than one gene.
  • The following table lists 16 antibiotics and some of their corresponding gene IDs (numbers). Each gene ID corresponds to a gene sequence. See Table 1 for details.
  • The 16 antibiotics and the genes related to each antibiotic in the above table are only examples. In a specific implementation, one or more of them can be selected as needed, and other related genes can also be selected.
  • The performance analysis of the intestinal microbes of the target object based on the functional information can be implemented through the following steps: calculate the intestinal function index of the intestinal microbes of the target object and determine the position of the intestinal function index in the reference population, where, in an optional example, the sum of the relative abundances of the functional genes belonging to the same function is used as the intestinal function index of that intestinal function.
  • Taking intestinal energy metabolism capacity as an example, the intestinal function indexes of intestinal energy metabolism capacity of the reference population (for example, those in the intestinal microbial test results of 100, 200, 300, 1000 or more individuals, such as healthy people) are sorted from small to large, and the number of people in the reference population whose intestinal function index is less than that of the target object (i_o) is determined. The position of the target object's intestinal function index in the reference population is determined by calculating the ratio of i_o to the total number of people in the reference population (s_o).
  • The performance analysis of the intestinal microbes of the target object based on the functional information may also include: determining the intestinal function score of the intestinal microbes of the target object based on the intestinal function index of the target object and the position of the intestinal function index in the reference population. The intestinal function score can be calculated as follows:
  • intestinal function score = seventh parameter (for example: 80) * intestinal function index + eighth parameter (for example: 40);
  • intestinal function score = ninth parameter (for example: 40) * intestinal function index + tenth parameter (for example: 50);
  • intestinal function score = eleventh parameter (for example: 80) * intestinal function index + twelfth parameter (for example: 20).
  • The seventh, eighth, ninth, tenth, eleventh and twelfth parameters in the above preset test standards can be replaced as appropriate for the application scenario; this application imposes no specific restrictions.
  • All genes homologous to a given gene are classified into one category, that is, each KO is assigned a K number, and the function of that gene is used as the function of the KO.
  • The function of each gene is extended as follows: for a gene that has a clear function in one species, its homologous genes in different species are searched for and compared.
  • FIG. 2 is a second flowchart of the method for processing intestinal microbial sequencing data according to an embodiment of the present application. As shown in Figure 2, the method for processing gut microbial sequencing data further includes the following steps:
  • Step S108a: when the performance analysis involves antibiotic resistance gene analysis, after the performance analysis is performed on the intestinal microbes of the target object based on the functional information and the antibiotic resistance index of the intestinal microbes of the target object is calculated, the antibiotic resistance index is imported into the database of the reference population for use in the next performance analysis.
  • Step S108b: when the performance analysis involves intestinal function gene analysis, after the performance analysis is performed on the intestinal microbes of the target object based on the functional information and the intestinal function index of the intestinal microbes of the target object is calculated, the intestinal function index is imported into the database of the reference population for use in the next performance analysis.
  • Step S108c: after the sequencing data is annotated according to the standard gene database and the annotation result is obtained, the functional information of the target object (the functional genes of the intestinal microbes and the relative abundance of each functional gene) is imported into the database of the reference population for use in the next performance analysis.
  • the evaluation results are added to the database of the reference population, so that the reference range of each indicator can be updated in real time.
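  • One way to keep the reference range current as new results arrive is to hold each index of the reference population in sorted order. The sketch below is an illustration under that assumption; the class name and in-memory storage are hypothetical and the source does not specify how the database is implemented.

```python
import bisect


class ReferenceIndex:
    """Sorted store of one index (e.g. a resistance index) across a reference population."""

    def __init__(self, values=()):
        self._sorted = sorted(values)

    def __len__(self):
        return len(self._sorted)

    def position(self, target):
        """Fraction of stored values strictly smaller than target."""
        if not self._sorted:
            raise ValueError("reference population is empty")
        # bisect_left returns the number of stored values that are < target.
        return bisect.bisect_left(self._sorted, target) / len(self._sorted)

    def add(self, value):
        """Import a newly evaluated individual so that later analyses see it."""
        bisect.insort(self._sorted, value)


reference = ReferenceIndex([0.3, 0.1, 0.2])
before = reference.position(0.25)  # evaluated against the current population
reference.add(0.25)                # then imported for the next analysis
after = reference.position(0.25)
```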
  • The "database of the reference population" in this application refers to a database that contains genetic information of the intestinal microbes of multiple individuals. The genetic information of intestinal microbes includes the functional genes of the intestinal microbes (functional genes include antibiotic resistance genes, intestinal function genes, etc.), the relative abundance of each functional gene, the antibiotic resistance index, the intestinal function index, etc.
  • The database may also include each volunteer's gender, age, height, weight, living habits and geographical information.
  • The processing method of this application also selects the corresponding reference population according to the phenotypic characteristics of the individual to be tested (including gender, age, race, height, weight, diet, living area, etc.) for the specific evaluation and analysis of the intestinal microbial flora, thereby making the evaluation results more accurate and reliable.
  • A target number of initial reference objects are initially stored in the database, and the database records, for each initial reference object, the phenotype information (including gender, age, race, height, weight, diet, living area, etc.) and the evaluation information of the intestinal microbial flora (including antibiotic resistance gene analysis, intestinal function gene analysis, etc., and optionally the relative abundance of each intestinal microbe).
  • Step S102 of the method for processing intestinal microbial sequencing data of this application also needs to be described:
  • obtaining the sequencing data of the gut microbial flora of the target object in step S102 can be achieved in the following manner:
  • Step A1: Perform gene sequencing on the intestinal microbes of the target object to obtain the original sequencing data of the intestinal microbial flora of the target object (raw reads, usually in fastq format; a fastq file contains the quality information of all bases in each sequenced read);
  • Step A2: Perform quality control on the original sequencing data. Reads in which the number of ambiguous bases N is greater than a preset value are removed (the preset value can be 3, 4, 5 or more, and can be adjusted according to actual application conditions), and low-quality reads in the original sequencing data are removed. For example, consecutive bases at the end of a read whose quality value is less than a specific value can be removed according to the quality information of the read; the specific value can be 20, 25, 30 or another higher value, adjusted according to actual needs. Further, reads whose length after removing these consecutive bases is less than a specific length are excluded; the length can be 30 bp, 35 bp or longer, adjusted based on the application scenario. Existing software with the above functions, such as fastx, can be used to remove low-quality reads;
  • Step A3: Remove the host gene sequences from the original sequencing data to obtain the sequencing data of the intestinal microbial flora of the target object, where the host gene sequence is the gene sequence of the target object.
  • The soap software can be used in this step.
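  • The quality-control rules of Step A2 can be illustrated with a small pure-Python sketch. It is not a replacement for the fastx or soap tools named above: the function name is hypothetical, quality values are assumed to be already decoded into integer Phred scores, and the defaults are the example thresholds from the text.

```python
def clean_read(sequence, qualities, max_n=3, min_quality=20, min_length=30):
    """Apply the Step A2 rules to one read.

    Returns (trimmed_sequence, trimmed_qualities), or None if the read is discarded.
    qualities: list of integer quality values, one per base (assumed pre-decoded).
    """
    # Rule 1: discard reads with more than max_n ambiguous bases.
    if sequence.count("N") > max_n:
        return None

    # Rule 2: trim the run of low-quality bases at the end of the read.
    end = len(sequence)
    while end > 0 and qualities[end - 1] < min_quality:
        end -= 1

    # Rule 3: discard reads that are too short after trimming.
    if end < min_length:
        return None
    return sequence[:end], qualities[:end]


# Illustrative read: 40 bases, the last 5 with low quality.
kept = clean_read("ACGT" * 10, [30] * 35 + [10] * 5)
too_many_n = clean_read("NNNN" + "ACGT" * 9, [30] * 40)
too_short = clean_read("ACGT" * 10, [30] * 20 + [5] * 20)
```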
  • Step S104 of the method for processing intestinal microbial sequencing data of this application also needs to be described:
  • In step S104, annotating the sequencing data according to the standard gene database to obtain the annotation result can be achieved in the following manner:
  • Step B1: Align the reads (gene sequences) in the sequencing data to a standard gene database (for example, IGC, the integrated gene catalog of human gut microbial metagenomics), and determine the relative abundance of each gene sequence contained in the sequencing data (for example, determine the gene abundance file corresponding to the sequencing data, where the file contains two columns of data: the data in the right column is the gene ID, and the data in the left column is the relative abundance of the gene corresponding to each gene ID);
  • Step B2: Based on the annotation information of each gene sequence recorded in the standard gene database (the annotation information includes the resistance gene information corresponding to each antibiotic, which can form an antibiotic resistance gene annotation file) and the relative abundance of each gene sequence contained in the sequencing data, determine the relative abundance of each antibiotic resistance gene contained in the sequencing data. For example, determine the antibiotic resistance gene abundance file corresponding to the sequencing data, where the file contains two columns of data: the left column is the information of the various antibiotic resistance genes, and the right column is the relative abundance of the resistance gene on the left. The calculation can be, for example: according to the gene abundance file of step B1 and the antibiotic resistance gene annotation file of step B2, add the relative abundances of the genes in the sample that belong to the same antibiotic resistance gene, then divide by the sum of the relative abundances of all antibiotic resistance genes in the sample; the result is the relative abundance of that antibiotic resistance gene;
  • Step B3: Based on the annotation information of each gene sequence recorded in the standard gene database (the annotation information includes the gene information corresponding to each biological function, where the number of genes corresponding to each biological function is one or more, depending on the specific biological function) and the relative abundance of each gene sequence contained in the sequencing data, determine the relative abundance of each biological function gene contained in the sequencing data. For example, determine the biological function gene abundance file corresponding to the sequencing data, where the file contains two columns of data: the left column is the gene information corresponding to the various biological functions, and the right column is the relative abundance of the genes corresponding to the biological function on the left. The calculation can be, for example: add the relative abundances of the genes belonging to the same function, then divide the relative abundance of the genes related to each function by the sum of the relative abundances of the related genes of all functions in the sample; the result is the relative abundance of each functional gene.
  • Remove host reads: use soap software to align the reads against the host sequence, and remove the reads that can be aligned to the host sequence.
  • Mapped to IGC: use soap software to align the processed reads against the IGC sequences to obtain the gene abundance of the target object's intestinal microbes.
  • Function annotation:
  • The gene abundance file (as shown in Figure 4) consists of two columns of data: the left column is the gene ID, and the right column is the gene abundance corresponding to the gene ID.
  • The IGC KO annotation file (as shown in Figure 5) consists of two columns of data: the left column is the gene ID, and the right column is the KO information corresponding to the gene ID.
  • The gene abundances annotated to the same KO are added together and then normalized to give the relative abundance of each KO, and the relative abundance file of each KO is output (Figure 6).
  • The file consists of two columns of data: the left column is the KO number (each KO number corresponds to the information of one KO), and the right column is the relative abundance of the KO corresponding to that KO number. That is, according to the input gene abundance file and KO annotation file, the abundances of the genes belonging to the same KO are added and then divided by the sum of the abundances of all KO genes in the sample for normalization; the result is the relative abundance of the KO.
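  • The file-based computation described above can be sketched as follows. This is an illustration under stated assumptions: both inputs are whitespace-separated two-column text without a header line, each gene is annotated to at most one KO, and the function names, gene IDs and KO labels in the example are placeholders.

```python
import io
from collections import defaultdict


def read_two_columns(handle):
    """Yield (left, right) pairs from a two-column text file; other lines are skipped."""
    for line in handle:
        parts = line.split()
        if len(parts) == 2:
            yield parts[0], parts[1]


def ko_relative_abundance(gene_abundance_file, ko_annotation_file):
    """Sum gene abundances per KO, then normalize by the total over all KOs."""
    gene_to_ko = dict(read_two_columns(ko_annotation_file))  # gene ID -> KO
    ko_sum = defaultdict(float)
    for gene_id, abundance in read_two_columns(gene_abundance_file):
        ko = gene_to_ko.get(gene_id)
        if ko is not None:  # genes without a KO annotation do not contribute
            ko_sum[ko] += float(abundance)
    total = sum(ko_sum.values())
    if total == 0:
        return {}
    return {ko: value / total for ko, value in ko_sum.items()}


# In-memory stand-ins for the two input files (placeholder IDs and values).
genes = io.StringIO("g1\t0.25\ng2\t0.25\ng3\t0.5\ng4\t1.0\n")
annotation = io.StringIO("g1\tKO_a\ng2\tKO_a\ng3\tKO_b\n")
result = ko_relative_abundance(genes, annotation)
```

Writing `result` back out as two columns (KO number, relative abundance) gives a file of the shape described for Figure 6.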
  • This antibiotic resistance index analysis covers the resistance indexes of Cefoxitin and Netilmicin for the intestinal microbes of target object A. The Gene IDs corresponding to Cefoxitin and Netilmicin in IGC are shown in Table 3 below:
  • Netilmicin resistance index: 0.000601
  • the reference population has currently included 346 individuals with an antibiotic resistance index.
  • the resistance index of Cefoxintin (cefoxitin) 0 is sorted from small to large in the reference population, ranking 0th, Netilmicin (netilmicin)
  • the detection level is determined as:
  • the Netilmicin resistance index of the microbial flora is at a low level.
  • the Netilmicin resistance gene level score of the target object A is:
  • This intestinal function analysis covers the short-chain fatty acid synthesis ability (Short-chain fatty acids) and the bile salt hydrolysis ability (Bile Salt Hydrolase) of the intestinal microbes of the target object A.
  • Short-chain fatty acids (short-chain fatty acid synthesis ability)
  • the KO and Gene ID corresponding to Bile Salt Hydrolase in IGC are shown in Table 5 below:
  • the sum of the relative abundances of the KOs involved in each function is the index of that function, so the index of Short-chain fatty acids (short-chain fatty acid synthesis capacity) is 0.00198, and the index of Bile Salt Hydrolase (bile salt hydrolysis capacity) is 0.000588.
  • the reference population has currently included 346 individual antibiotic resistance indexes.
  • the set detection value level evaluation is:
  • This application analyzes data generated by metagenomic sequencing. Compared with the traditional 16S rRNA gene data analysis method, it can detect a more comprehensive picture of the human intestinal microflora: the detection content includes not only the species of intestinal microorganisms but also information on the function of intestinal microorganisms.
  • This technical solution can obtain intestinal microbial function information of the detected target object, including: the functional genes of the intestinal microorganisms, the relative abundance of each specific functional gene, and the position of that relative abundance in the reference population (the relative abundance of antibiotic resistance genes, the relative abundance of intestinal function genes, and their respective positions in the reference population). The status of the intestinal microbes as a whole and of specific microbes can thus be fully evaluated, enabling comprehensive, diverse, and individualized analysis of human intestinal microbes.
  • the embodiment of the present application also provides a device for processing gut microbial sequencing data. It should be noted that the device for processing gut microbial sequencing data in the embodiment of the present application can be used to execute the method for processing gut microbial sequencing data provided by the embodiment of the present application. The processing device is introduced below.
  • Fig. 3 is a schematic diagram of a processing device for intestinal microbial sequencing data according to an embodiment of the present application. As shown in FIG. 3, the device includes: a first acquisition module 31, an annotation module 33, and a second acquisition module 35.
  • the first obtaining module 31 is configured to obtain sequencing data of the intestinal microbial flora of the target object
  • the annotation module 33 is configured to annotate the sequencing data according to the standard gene database to obtain the annotation result;
  • the second acquisition module 35 is configured to evaluate the intestinal microbial flora of the target object according to the annotation results, and obtain functional information of the intestinal microbes of the target object.
  • the sequencing data of the gut microbial flora of the target object is obtained through the first obtaining module 31; the annotation module 33 then annotates the sequencing data according to the standard gene database to obtain annotation results; finally, the second acquisition module 35 evaluates the intestinal microbial flora of the target object according to the annotation results and obtains the functional information of the intestinal microbes of the target object. Because the microbial flora is evaluated in this way, the flora information of the intestinal microbes of the target object is obtained, achieving the technical effect of effective analysis of the intestinal microflora and the state of the flora.
  • the processing device of this embodiment provides relatively more comprehensive and diversified information, and thus can meet individual needs.
  • the functional information of the gut microbes of the target object includes the functional genes of the gut microbes and the relative abundance of each functional gene.
  • the relative abundance of a functional gene is the ratio of the sum of the relative abundances of genes belonging to the same function in the sequencing data to the sum of the relative abundances of all genes in the sequencing data; the relative abundances of genes belonging to the same function are obtained from the annotation results.
  • the above-mentioned “functional gene” refers to a collection of genes related to a specific biological function, where the number of genes may be one or more, depending on the specific biological function.
  • the above-mentioned “sum of the relative abundance of genes related to each function” refers to the sum of the relative abundances of all function-related genes included in the sequencing data.
  • the device for processing gut microbial sequencing data provided in the embodiment of the present application further includes: a performance analysis module configured to perform performance analysis on the gut microbes of the target object based on the function information, and the performance analysis involves at least one of the following: Antibiotic resistance gene analysis and intestinal function gene analysis.
  • the processing device including the above-mentioned performance analysis module, after obtaining the functional information of the gut microbes of the target object through the second acquisition module, can also use the performance analysis module to perform performance analysis, such as antibiotic resistance gene analysis and/or intestinal functional gene analysis, on the gut microbes of the target object based on the functional information. This realizes comprehensive and diverse analysis of an individual's intestinal microbes and meets individualized analysis requirements, thereby solving the technical problem in the existing technology that analysis results are single, the information is limited, and individualized analysis needs cannot be met.
  • when the performance analysis involves antibiotic resistance gene analysis, the performance analysis module includes: a first calculation module configured to calculate the antibiotic resistance index of the intestinal microbes of the target object, and a first position determination module configured to determine the position of the antibiotic resistance index in the reference population; when the performance analysis involves intestinal functional gene analysis, the performance analysis module includes: a second calculation module configured to calculate the intestinal function index of the intestinal microbes of the target object, and a second position determination module configured to determine the position of the intestinal function index in the reference population.
  • the step of "analyzing the intestinal microbes of the target object for antibiotic resistance genes" performed by the performance analysis module is illustrated below as an example:
  • the first calculation module calculates the antibiotic resistance index of each antibiotic of the gut microbe of the target object.
  • the first position determination module determines the position of the antibiotic resistance index of each antibiotic in the reference population.
  • taking erythromycin as an example: the erythromycin resistance indices in the intestinal microbial test results of the reference population (such as 100, 200, 300, 1000 or more individuals, for example healthy people) are sorted from small to large, and the number of people in the reference population whose erythromycin resistance index is smaller than that of the target object (i_a) is determined.
  • the position of the target object's erythromycin resistance index in the reference population (c_a) is determined by calculating the ratio of i_a to the total number of people in the reference population (s_a), that is, c_a = i_a / s_a.
  • (1) if c_a falls in the lowest range, the erythromycin resistance index of the gut microflora of the target object is considered to be at a low level; (2) if b (e.g.: 0.25) ≤ c_a < c (e.g.: 0.75), the erythromycin resistance index of the gut microflora of the target object is considered to be at a middle level; (3) if c (e.g.: 0.75) ≤ c_a ≤ d (e.g.: 1), the erythromycin resistance index of the gut microflora of the target object is considered to be at a high level.
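  • The position calculation and level assignment described above can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the cut-offs 0.25 and 0.75 are the example values from the text, and which side of each boundary is inclusive is an assumption, since the comparison operators are not legible in this text.

```python
def position_in_reference(target_index, reference_indices):
    """Return the fraction of reference individuals whose index is
    strictly smaller than the target's (i / s in the text)."""
    if not reference_indices:
        raise ValueError("reference population is empty")
    smaller = sum(1 for value in reference_indices if value < target_index)
    return smaller / len(reference_indices)


def level_from_position(position, low_cut=0.25, high_cut=0.75):
    """Map a position in [0, 1] to a level.

    Boundary handling (lower bound inclusive) is an assumption.
    """
    if position < low_cut:
        return "low"
    if position < high_cut:
        return "middle"
    return "high"


if __name__ == "__main__":
    reference = [0.1, 0.2, 0.6, 0.9]  # made-up reference indices
    pos = position_in_reference(0.5, reference)
    print(pos, level_from_position(pos))
```

  • The same two functions apply unchanged to an intestinal function index, since the text defines its position in the reference population by the same ratio.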
  • after the performance analysis module determines the antibiotic resistance index of the target object and, through the first position determination module, the position of the antibiotic resistance index in the reference population, it may also include an antibiotic resistance scoring module. The antibiotic resistance scoring module is used to determine the antibiotic resistance score of the gut microbes of the target object, and calculates the antibiotic resistance score by executing the following method:
  • the antibiotic resistance score = the first parameter (for example: 80) * antibiotic resistance index + the second parameter (for example: 40);
  • the antibiotic resistance score = the third parameter (for example: 40) * antibiotic resistance index + the fourth parameter (for example: 50);
  • the antibiotic resistance score = the fifth parameter (for example: 80) * antibiotic resistance index + the sixth parameter (for example: 20).
  • parameter data such as the first parameter, the second parameter, the third parameter, the fourth parameter, the fifth parameter and the sixth parameter in the above-mentioned preset test standards can be adaptively replaced based on the application scenario; this application places no specific restrictions on them.
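  • The three linear formulas above can be sketched as a lookup of (multiplier, offset) pairs. This is a hedged illustration: the name `resistance_score` is hypothetical, and the pairing of the first, second, and third formula with the low, middle, and high levels is an assumption based only on the order in which they are listed, because the conditions attached to each formula are not present in this text.

```python
# Example parameter values from the text: (multiplier, offset) per formula.
# Assigning them to "low" / "middle" / "high" in listed order is an assumption.
SCORE_PARAMS = {
    "low": (80, 40),     # first and second parameters
    "middle": (40, 50),  # third and fourth parameters
    "high": (80, 20),    # fifth and sixth parameters
}


def resistance_score(value, level, params=SCORE_PARAMS):
    """Apply score = multiplier * value + offset for the given level."""
    multiplier, offset = params[level]
    return multiplier * value + offset


if __name__ == "__main__":
    for level, value in (("low", 0.1), ("middle", 0.5), ("high", 0.9)):
        print(level, resistance_score(value, level))
```

  • As an arithmetic observation rather than a statement from the source: if `value` is taken to be the position in the reference population (a number between 0 and 1), the example parameters give the same score on both sides of 0.25 (60) and of 0.75 (80), so the three segments join without a jump.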
  • Table 1 exemplarily lists 16 antibiotics and their corresponding partial gene IDs (numbers), and each gene ID corresponds to a gene sequence. In actual research applications, one or more of them can be selected according to different research directions or purposes, or other related genes can be selected.
  • the step of "intestinal functional gene analysis of the intestinal microbes of the target object" performed by the performance analysis module is illustrated below as an example:
  • the second calculation module calculates the intestinal function index of each function of the gut microbe of the target object, that is, the sum of the relative abundances of functional genes related to the same function is used as the intestinal function index corresponding to the function.
  • the second position determination module determines the position of the intestinal function index corresponding to each function in the reference population.
  • taking intestinal energy metabolism capacity as an example: among the intestinal function indices of energy metabolism capacity in the intestinal microbial test results of the reference population (for example, 100, 200, 300, 1000 or more people, such as healthy people), determine the number of people (i_o) whose intestinal function index of energy metabolism capacity is smaller than that of the target object; by calculating the ratio of i_o to the total number of people in the reference population (s_o), determine the position of the target object's intestinal function index of energy metabolism capacity in the reference population.
  • after determining the intestinal function index of the target object and the position of the intestinal function index in the reference population through the second calculation module and the second position determination module, respectively, the performance analysis module may further include an intestinal function scoring module.
  • the intestinal function scoring module is used to determine the intestinal function score of the intestinal microbes of the target object.
  • the intestinal function scoring module can calculate the intestinal function score by executing the following method:
  • the intestinal function score = the seventh parameter (for example: 80) * the intestinal function index + the eighth parameter (for example: 40);
  • the intestinal function score = the ninth parameter (for example: 40) * the intestinal function index + the tenth parameter (for example: 50);
  • the intestinal function score = the eleventh parameter (for example: 80) * the intestinal function index + the twelfth parameter (for example: 20).
  • the seventh parameter, eighth parameter, ninth parameter, tenth parameter, eleventh parameter, and twelfth parameter in the above-mentioned preset test standards can be adaptively replaced based on application scenarios. This application does not make specific restrictions.
  • Table 2 exemplarily lists 9 functions. In specific applications, one or more of them can be selected according to needs, and other functions of interest and related genes can also be selected.
  • the first calculation module includes: a first summation unit configured to, in the case that the antibiotic resistance index is the relative abundance of antibiotic resistance genes, calculate the sum of the relative abundances of the same kind of antibiotic resistance gene of the gut microbes of the target object; and a division unit configured to divide the sum of the relative abundances of the same kind of antibiotic resistance gene by the sum of the relative abundances of all antibiotic resistance genes to obtain the relative abundance of the antibiotic resistance gene. The second calculation module includes: a second summation unit configured to add the relative abundances of functional genes belonging to the same intestinal function to obtain the intestinal function index.
  • the performance analysis module further includes: a first import module configured to, in the case that the performance analysis involves antibiotic resistance gene analysis and the first calculation module has calculated the antibiotic resistance index of the gut microbes of the target object, import that antibiotic resistance index into the database of the reference population for use in the next performance analysis step; and a second import module configured to, in the case that the performance analysis involves intestinal function gene analysis and the second calculation module has calculated the intestinal function index of the intestinal microbes of the target object, import that intestinal function index into the database of the reference population for use in the next performance analysis step.
  • the processing device can evaluate the intestinal microflora of each target object and obtain functional information of the target object's intestinal microbes. It can also add the evaluation result (the functional information of the gut microbes of the target object, which includes the functional genes of the gut microbes and the relative abundance of each functional gene) to the database of the reference population stored in the processing device, in order to update the reference range of each indicator in real time.
  • as the processing device for intestinal microbial sequencing data of this application is used, more and more individuals participate in the evaluation of the intestinal microbial flora, and the scale of the reference population stored in the database continues to expand. The results of the microbial flora evaluation by the processing device therefore become more and more accurate, and the reference value of the processing device for the evaluation of intestinal microflora also increases.
  • the processing device of this application will also, according to the phenotypic characteristics of the individual to be tested (including gender, age, race, height, weight, diet, living area, etc.), select the corresponding reference population for specific intestinal microbial flora assessment and analysis, thereby making the assessment results more accurate and reliable.
  • a target number of initial reference objects are initially stored in the database of the processing device, and the database specifically records the phenotype information of each initial reference object (including gender, age, race, height, weight, diet, living area, etc.) and the evaluation information of the intestinal microbial flora (including antibiotic resistance gene analysis and intestinal function gene analysis, etc., and possibly also the relative abundance information of each intestinal microbe species, etc.).
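  • The reference-population database described above, with phenotype-based selection and import of newly computed indices, can be sketched as follows. Everything named here (`ReferenceRecord`, `select_reference`, `add_to_reference`, the phenotype keys) is hypothetical, and exact-match filtering is a simplification; the application does not specify how phenotypes such as age or weight are matched.

```python
from dataclasses import dataclass, field


@dataclass
class ReferenceRecord:
    """One individual in the reference population (hypothetical layout)."""
    phenotype: dict                               # e.g. {"gender": "F", "region": "south"}
    indices: dict = field(default_factory=dict)   # e.g. {"netilmicin": 0.0006}


def select_reference(records, **criteria):
    """Keep records whose phenotype matches every given key exactly."""
    return [
        record for record in records
        if all(record.phenotype.get(key) == value for key, value in criteria.items())
    ]


def add_to_reference(records, phenotype, indices):
    """Import a newly evaluated individual so later analyses can use it."""
    records.append(ReferenceRecord(dict(phenotype), dict(indices)))


if __name__ == "__main__":
    db = [
        ReferenceRecord({"gender": "F", "region": "south"}, {"netilmicin": 0.0004}),
        ReferenceRecord({"gender": "M", "region": "south"}, {"netilmicin": 0.0009}),
    ]
    add_to_reference(db, {"gender": "F", "region": "north"}, {"netilmicin": 0.0006})
    print(len(select_reference(db, gender="F")))
```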
  • the first obtaining module 31 obtains the sequencing data of the gut microbial flora of the target object by performing the following steps:
  • Step A1: Perform gene sequencing on the gut microbes of the target object to obtain the original sequencing data of the gut microbial flora of the target object (raw reads, usually in fastq format; the fastq file contains the quality information of all bases in each sequenced read);
  • Step A2: Perform quality control on the original sequencing data. That is, remove reads in which the number of ambiguous bases N is greater than a preset value (the preset value can be 3, 4, 5 or more, and can be reasonably adjusted according to actual application conditions), and remove low-quality reads from the original sequencing data. For example, consecutive bases at the end of a read whose quality value is less than a specific value can be removed according to the quality information in the reads; the specific value can be 20, 25, 30 or another higher value, and can be adjusted according to actual needs. Further, reads whose length after removing these consecutive bases is less than a specific length are excluded; the length can be 30bp, 35bp or longer, and can be adjusted based on the application scenario. Existing software with the above functions, such as the fastx software, can be used to remove low-quality reads;
  • Step A3: Remove the host gene sequences from the original sequencing data to obtain the sequencing data of the gut microbial flora of the target object, where the host gene sequence is the gene sequence of the target object.
  • this step can be performed with, for example, the SOAP software.
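  • The quality control in Step A2 can be sketched in plain Python as follows. This is a stand-in for illustration, not the fastx software mentioned in the text: the function names are hypothetical, quality values are assumed to be already decoded into integer Phred scores, and the defaults (3 ambiguous bases, quality 20, length 30) are the example values given in the step.

```python
def trim_low_quality_tail(seq, quals, min_quality=20):
    """Drop the run of consecutive bases at the read end whose quality
    is below min_quality."""
    end = len(seq)
    while end > 0 and quals[end - 1] < min_quality:
        end -= 1
    return seq[:end], quals[:end]


def filter_read(seq, quals, max_n=3, min_quality=20, min_length=30):
    """Return the trimmed (seq, quals) pair, or None if the read is rejected.

    quals is a list of integer Phred scores, one per base (an assumption
    about the input representation).
    """
    if seq.upper().count("N") > max_n:      # too many ambiguous bases
        return None
    seq, quals = trim_low_quality_tail(seq, quals, min_quality)
    if len(seq) < min_length:               # too short after trimming
        return None
    return seq, quals


if __name__ == "__main__":
    read = "ACGT" * 10
    scores = [30] * 35 + [10] * 5
    print(filter_read(read, scores))
```

  • Checking the N count before trimming follows the order in which the step lists the two filters; a production pipeline might order them differently.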
  • the annotation module 33 can be implemented by performing the following steps:
  • Step B1: Align the reads (gene sequences) in the sequencing data to a standard gene database (for example: the integrated gene catalog, IGC, of the human gut microbial metagenome), and determine the relative abundance of each gene sequence contained in the sequencing data (for example: determine the gene abundance file corresponding to the sequencing data, where the file contains two columns of data, the left column being the gene ID and the right column being the relative abundance of the gene corresponding to each gene ID in turn);
  • Step B2: Based on the annotation information of each gene sequence recorded in the standard gene database (the annotation information includes the resistance gene information corresponding to each antibiotic, which can form an antibiotic resistance gene annotation file) and the relative abundance of each gene sequence contained in the sequencing data, determine the relative abundance of each antibiotic resistance gene sequence contained in the sequencing data (for example: determine the antibiotic resistance gene abundance file corresponding to the sequencing data, where the file contains two columns of data, the left column being the information of the various antibiotic resistance genes and the right column being the relative abundance corresponding to the resistance genes on the left. The specific calculation method can be: according to the gene abundance file from step B1 and the antibiotic resistance gene annotation file from step B2, add the relative abundances of genes belonging to the same antibiotic resistance gene, and then divide by the sum of the relative abundances of all antibiotic resistance genes in the sample; the result is the relative abundance of the antibiotic resistance gene);
  • Step B3: Based on the annotation information of each gene sequence recorded in the standard gene database (the annotation information includes the gene information corresponding to each biological function, where the number of genes corresponding to each biological function is one or more and differs depending on the specific biological function) and the relative abundance of each gene sequence contained in the sequencing data, determine the relative abundance of each biological function gene contained in the sequencing data (for example: determine the biological function gene abundance file corresponding to the sequencing data, where the file contains two columns of data, the left column being the gene information corresponding to the various biological functions and the right column being the relative abundance corresponding to the biological functions on the left. The specific calculation method can be, for example: add the relative abundances of genes belonging to the same function, and then divide the summed relative abundance of each function's genes by the sum of the relative abundances of the genes corresponding to all functions in the sample; the result is the relative abundance of each functional gene).
  • the apparatus for processing gut microbial sequencing data of the present application includes a processor and a memory.
  • the above-mentioned first acquisition module 31, annotation module 33, and second acquisition module 35 are all stored in the memory as program units, and the processor executes the above-mentioned program units stored in the memory to realize the corresponding functions.
  • the processor contains a kernel, which calls the corresponding program unit from the memory.
  • One or more kernels can be set, and the intestinal microbial flora and the status of the flora can be effectively and comprehensively analyzed by adjusting the kernel parameters.
  • the memory may include non-permanent memory in computer-readable media, random access memory (RAM) and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM), and the memory includes at least one Memory chip.
  • the embodiment of the present application provides a storage medium on which a program is stored, and when the program is executed by a processor, a method for processing gut microbial sequencing data is realized.
  • the embodiment of the present application provides a processor, which is used to run a program, wherein the intestinal microbial sequencing data processing method is executed when the program is running.
  • the embodiment of the present application provides a device, which includes a processor, a memory, and a program stored on the memory and running on the processor.
  • the processor executes the program to implement the following steps: obtain the sequencing data of the intestinal microflora of the target object; annotate the sequencing data according to the standard gene database to obtain the annotation results; according to the annotation results, evaluate the gut microbiota of the target object to obtain functional information of the gut microbes of the target object.
  • the functional information of the gut microbes of the target object includes the functional genes of the gut microbes and the relative abundance of each functional gene, where the relative abundance of the functional gene is the relative abundance of genes belonging to the same function in the sequencing data The ratio of the sum of and the sum of the relative abundances of all functional genes in the sequencing data. The relative abundances of genes belonging to the same function are obtained from the annotation results.
  • the method further includes: performing performance analysis on the gut microbes of the target object based on the function information, and the performance analysis involves at least one of the following: antibiotic resistance gene analysis and Intestinal function gene analysis.
  • in the case that the performance analysis involves antibiotic resistance gene analysis, the performance analysis of the gut microbes of the target object based on the functional information includes: calculating the antibiotic resistance index of the gut microbes of the target object and determining the position of the antibiotic resistance index in the reference population; in the case that the performance analysis involves intestinal function gene analysis, the performance analysis of the gut microbes of the target object based on the function information includes: calculating the gut function index of the gut microbes of the target object and determining the position of the intestinal function index in the reference population.
  • in the case that the antibiotic resistance index is the relative abundance of antibiotic resistance genes, the relative abundance of antibiotic resistance genes is calculated according to the following method: calculate the sum of the relative abundances of the same kind of antibiotic resistance gene in the gut microbes of the target object; divide the sum of the relative abundances of the same kind of antibiotic resistance gene by the sum of the relative abundances of all antibiotic resistance genes to get the relative abundance of antibiotic resistance genes. The intestinal function index is calculated by the following method: the sum of the relative abundances of the functional genes belonging to the same intestinal function is the intestinal function index.
  • in the case that the performance analysis involves the analysis of antibiotic resistance genes, and the antibiotic resistance index of the gut microbes of the target object has been calculated by performing performance analysis on the gut microbes of the target object based on the functional information, the method also includes importing the antibiotic resistance index of the gut microbes of the target object into the database of the reference population for use in the next performance analysis step; in the case that the performance analysis involves the analysis of gut function genes, and the intestinal function index of the intestinal microbes of the target object has been calculated by performing performance analysis based on the functional information, the method also includes importing the intestinal function index of the intestinal microbes of the target object into the database of the reference population for use in the next performance analysis step.
  • the devices in this article can be servers, PCs, PADs, mobile phones, etc.
  • This application also provides a computer program product which, when executed on a data processing device, is suitable for executing a program initialized with the following method steps: acquiring the sequencing data of the gut microbial flora of the target object; annotating the sequencing data according to the standard gene database to obtain the annotation result; according to the annotation result, evaluating the gut microbial flora of the target object to obtain the functional information of the gut microbes of the target object.
  • the functional information of the gut microbes of the target object includes the functional genes of the gut microbes and the relative abundance of each functional gene, where the relative abundance of the functional gene is the relative abundance of genes belonging to the same function in the sequencing data The ratio of the sum of and the sum of the relative abundances of all functional genes in the sequencing data. The relative abundances of genes belonging to the same function are obtained from the annotation results.
  • the method further includes: performing performance analysis on the gut microbes of the target object based on the function information, and the performance analysis involves at least one of the following: antibiotic resistance gene analysis and Intestinal function gene analysis.
  • in the case that the performance analysis involves antibiotic resistance gene analysis, the performance analysis of the gut microbes of the target object based on the functional information includes: calculating the antibiotic resistance index of the gut microbes of the target object and determining the position of the antibiotic resistance index in the reference population; in the case that the performance analysis involves intestinal function gene analysis, the performance analysis of the gut microbes of the target object based on the function information includes: calculating the gut function index of the gut microbes of the target object and determining the position of the intestinal function index in the reference population.
  • in the case that the antibiotic resistance index is the relative abundance of antibiotic resistance genes, the relative abundance of antibiotic resistance genes is calculated according to the following method: calculate the sum of the relative abundances of the same kind of antibiotic resistance gene in the gut microbes of the target object; divide the sum of the relative abundances of the same kind of antibiotic resistance gene by the sum of the relative abundances of all antibiotic resistance genes to get the relative abundance of antibiotic resistance genes. The intestinal function index is calculated by the following method: the sum of the relative abundances of the functional genes belonging to the same intestinal function is the intestinal function index.
  • in the case that the performance analysis involves the analysis of antibiotic resistance genes, and the antibiotic resistance index of the gut microbes of the target object has been calculated by performing performance analysis on the gut microbes of the target object based on the functional information, the method also includes importing the antibiotic resistance index of the gut microbes of the target object into the database of the reference population for use in the next performance analysis step; in the case that the performance analysis involves the analysis of gut function genes, and the intestinal function index of the intestinal microbe of the target object has been calculated by performing performance analysis based on the functional information, the method also includes importing the intestinal function index of the intestinal microbe of the target object into the database of the reference population for use in the next performance analysis step.
  • the embodiments of the present application can be provided as methods, systems, or computer program products. Therefore, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present application may take the form of a computer program product implemented on one or more computer usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer usable program code.
  • These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing equipment to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device, and the instruction device implements the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing device, so that a series of operating steps are performed on the computer or other programmable device to generate computer-implemented processing, and the instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • the computing device includes one or more processors (CPU), input/output interfaces, network interfaces, and memory.
  • the memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM).
  • Computer-readable media, including permanent and non-permanent, removable and non-removable media, can store information by any method or technology.
  • The information can be computer-readable instructions, data structures, program modules, or other data.
  • Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
  • Computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
  • the following steps are adopted: acquiring sequencing data of the intestinal microbial flora of the target object; annotating the sequencing data against a standard gene database to obtain an annotation result; and evaluating the intestinal microbial flora of the target object according to the annotation result to obtain functional information on the target object's intestinal microbes.
  • the method of the present application can not only analyze the relative abundance of genes but also evaluate the intestinal microbial flora of the target object, and the state of that flora, based on the relative abundance of each gene, thereby obtaining relatively complete functional information on the target object's intestinal microbes. The information provided by the method is therefore more comprehensive and diverse and can meet personalized needs.

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Organic Chemistry (AREA)
  • Analytical Chemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • Biotechnology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • Wood Science & Technology (AREA)
  • Zoology (AREA)
  • Molecular Biology (AREA)
  • Theoretical Computer Science (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Medical Informatics (AREA)
  • Genetics & Genomics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biochemistry (AREA)
  • Immunology (AREA)
  • General Engineering & Computer Science (AREA)
  • Microbiology (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

A method and device for processing intestinal microorganism sequencing data, a processor, and a memory. The method comprises: obtaining sequencing data of intestinal microbial flora of a target object (S102); annotating the sequencing data according to a standard gene database to obtain an annotation result (S104); evaluating the intestinal microbial flora of the target object according to the annotation result to obtain functional information of intestinal microorganisms of the target object (S106). A relatively comprehensive and diverse analysis of intestinal microorganisms of an individual is implemented by using a metagenome gene analysis method, and the method can better satisfy the needs of personalized analysis compared with existing methods.

Description

Intestinal microbial sequencing data processing method, device, storage medium and processor
Technical field
This application relates to the field of gene sequencing data analysis, and in particular to a method, device, storage medium and processor for processing intestinal microbial sequencing data.
Background
With the development of the Human Microbiome Project (HMP) and the Metagenomics of the Human Intestinal Tract (MetaHIT) project, a growing body of research has shown that human physiological metabolism, growth and development are not controlled by the human genome alone: many phenomena, such as susceptibility to disease and response to drugs, cannot be fully explained by differences in human genes. This is because a large number of microorganisms live in the human body, and their composition and activity are closely tied to human growth, development, aging, disease and death.
The 16S rRNA (small subunit ribosomal RNA) gene is the most commonly used molecular marker (biomarker) in phylogenetic classification studies of prokaryotic microorganisms and is widely used in microbial ecology research. In recent years, with continuing advances in high-throughput sequencing and data analysis methods, a large number of studies based on the 16S rRNA gene have driven rapid progress in microbial ecology, and the approach has also been widely applied to intestinal microbial research. However, 16S rRNA gene data analysis has a number of problems, such as horizontal gene transfer, heterogeneity among multiple copies, differences in gene amplification efficiency, and the choice of data analysis method, all of which affect the accuracy of analyses of microbial community composition and diversity.
A metagenome is the sum of the genetic material of all the microorganisms in a particular environment. Metagenomic sequencing takes the entire microbial community in a given environment as its object of study. It does not require the microorganisms to be isolated and cultured; instead, the total DNA of the environmental microorganisms is extracted and sequenced directly using next-generation high-throughput sequencing technology. Because of the advantages of metagenomics for studying microbial ecology, more and more studies use metagenomic gene analysis methods.
However, current metagenomic gene analysis methods perform only gene relative abundance analysis. The results are one-dimensional, so the information provided is limited and cannot meet the needs of personalized analysis.
Summary of the invention
This application provides a method, device, storage medium and processor for processing intestinal microbial sequencing data, to solve the problem in the related art that metagenomic gene data analysis yields one-dimensional results and limited information and cannot meet the needs of personalized analysis.
According to one aspect of the present application, a method for processing intestinal microbial sequencing data is provided. The method includes: acquiring sequencing data of the intestinal microbial flora of a target object; annotating the sequencing data against a standard gene database to obtain an annotation result; and evaluating the intestinal microbial flora of the target object according to the annotation result to obtain functional information on the target object's intestinal microbes.
According to another aspect of the present application, a device for processing intestinal microbial sequencing data is provided. The device includes: a first acquisition module, configured to acquire sequencing data of the intestinal microbial flora of the target object; an annotation module, configured to annotate the sequencing data against a standard gene database to obtain an annotation result; and a second acquisition module, configured to evaluate the intestinal microbial flora of the target object according to the annotation result and obtain functional information on the target object's intestinal microbes.
According to another aspect of the present application, a storage medium is provided. The storage medium includes a stored program, wherein the program executes any one of the intestinal microbial sequencing data processing methods described above.
According to another aspect of the present application, a processor is provided. The processor is used to run a program, wherein the program, when running, executes any one of the intestinal microbial sequencing data processing methods described above.
Through this application, the following steps are adopted: acquiring sequencing data of the intestinal microbial flora of the target object; annotating the sequencing data against a standard gene database to obtain an annotation result; and evaluating the intestinal microbial flora of the target object according to the annotation result to obtain functional information on the target object's intestinal microbes. Compared with the related prior art, the method of the present application can not only analyze the relative abundance of genes but also evaluate the intestinal microbial flora of the target object, and the state of that flora, based on the relative abundance of each gene, thereby obtaining relatively complete functional information on the target object's intestinal microbes. The information provided by the method is therefore more comprehensive and diverse and can meet personalized needs.
Brief description of the drawings
The drawings, which form part of this application, are provided for a further understanding of the application. The illustrative embodiments of the application and their descriptions are used to explain the application and do not unduly limit it. In the drawings:
Fig. 1 is a first flowchart of a method for processing intestinal microbial sequencing data according to an embodiment of the present application;
Fig. 2 is a second flowchart of a method for processing intestinal microbial sequencing data according to an embodiment of the present application;
Fig. 3 is a schematic diagram of a device for processing intestinal microbial sequencing data according to an embodiment of the present application;
Fig. 4 is a schematic diagram of a gene abundance file in an embodiment of the present application;
Fig. 5 is a schematic diagram of an IGC KO annotation file in an embodiment of the present application;
Fig. 6 is a schematic diagram of a KO relative abundance file in an embodiment of the present application.
Detailed description
It should be noted that, where there is no conflict, the embodiments in this application and the features in those embodiments can be combined with one another. The application is described in detail below with reference to the drawings and in conjunction with the embodiments.
To enable those skilled in the art to better understand the solutions of this application, the technical solutions in the embodiments are described clearly and completely below in conjunction with the drawings. The described embodiments are only some of the embodiments of this application, not all of them. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments in this application, without creative work, fall within the scope of protection of this application.
It should be noted that the terms "first", "second" and the like in the description, the claims and the above drawings are used to distinguish similar objects and do not necessarily describe a specific order or sequence. It should be understood that the data so used may be interchanged where appropriate for the purposes of the embodiments of the application described here. In addition, the terms "including" and "having", and any variations of them, are intended to cover non-exclusive inclusion: for example, a process, method, system, product or device that contains a series of steps or units is not necessarily limited to the steps or units explicitly listed, but may include other steps or units that are not explicitly listed or that are inherent to the process, method, product or device.
For ease of description, some of the terms used in the embodiments of this application are explained below:
Standard gene database: a database containing a large number of gene sequences together with the gene corresponding to each sequence and the functional information related to that gene. Standard gene databases include, but are not limited to, IGC (Integrated Gene Catalog), KEGG (Kyoto Encyclopedia of Genes and Genomes), GO (Gene Ontology), EggNOG (evolutionary genealogy of genes: Non-supervised Orthologous Groups), CAZy (Carbohydrate-Active enZYmes database) and ARDB (Antibiotic Resistance Genes Database).
Target gene: a gene specific to the microorganism it belongs to. The specific gene is present only in that microorganism, or, after manual curation, its sequence corresponds only to the sequence of that species.
Annotation: aligning the acquired sequence information against a standard gene database to obtain the gene corresponding to each sequence, together with the function and biological source of that gene.
Antibiotic resistance genes: genes involved in making bacteria resistant to antibiotics. Such genes often arise through mutation: when humans use antibiotics to kill bacteria, the bacteria are also driven to evolve resistance genes, which then transfer and spread through the bacterial community.
KEGG: the Kyoto Encyclopedia of Genes and Genomes, which comprises many databases. For the study of gene function, the KEGG Orthology database is the most fundamental of them.
KEGG Orthology (KO): for each gene of known function, all genes homologous to it are grouped into one class, a KO, which is assigned a K number and takes the function of that gene as its own. The approach rests on the assumption that homologous genes have similar functions, and it extends the functional knowledge of each gene: for a gene whose function has been well studied in one species, its homologs are sought in other species, the set of homologs is defined as an orthology, and the function of the original gene is used as the function of the orthology. In this way, research on gene function in different species is pooled to provide a comprehensive database for studying gene function.
According to an embodiment of the present application, a method for processing intestinal microbial sequencing data is provided.
Fig. 1 is a first flowchart of a method for processing intestinal microbial sequencing data according to an embodiment of the present application. As shown in Fig. 1, the method includes the following steps:
Step S102: acquire sequencing data of the intestinal microbial flora of the target object.
Step S104: annotate the sequencing data against a standard gene database to obtain an annotation result.
Step S106: evaluate the intestinal microbial flora of the target object according to the annotation result to obtain functional information on the target object's intestinal microbes.
In the method provided by this embodiment, sequencing data of the intestinal microbial flora of the target object are acquired, the sequencing data are annotated against a standard gene database to obtain an annotation result, and the intestinal microbial flora of the target object is evaluated according to the annotation result to obtain functional information on the target object's intestinal microbes. In other words, annotating the sequencing data against a standard gene database and evaluating the flora on the basis of the annotation result makes it possible to analyze both the intestinal microbial flora and the state of that flora effectively.
Compared with the related art, the method can therefore not only analyze the relative abundance of genes but also evaluate the intestinal microbial flora of the target object, and the state of that flora, based on the relative abundance of each gene, thereby obtaining relatively complete functional information on the target object's intestinal microbes. The information provided by the method is more comprehensive and diverse and can meet personalized needs.
It should be noted that the functional information obtained above includes the functional genes of the intestinal microbes and the relative abundance of each functional gene. The relative abundance of a functional gene is the ratio of (a) the sum of the relative abundances of the genes in the sequencing data that are associated with the same function to (b) the sum of the relative abundances of the genes in the sequencing data that are associated with any function. The relative abundances of the genes associated with a given function are obtained from the annotation result.
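The ratio described above can be illustrated with a minimal Python sketch. The function name, the dictionary-based inputs and the labels `func_A`/`func_B` are hypothetical and not taken from the application; the sketch also assumes that each annotated gene maps to exactly one function and that unannotated genes are simply excluded from the denominator.

```python
from collections import defaultdict


def function_relative_abundance(gene_abundance, gene_to_function):
    """Return {function: share}, where share is the summed relative abundance
    of the genes annotated to that function divided by the summed relative
    abundance of all annotated genes."""
    totals = defaultdict(float)
    for gene, abundance in gene_abundance.items():
        function = gene_to_function.get(gene)
        if function is None:
            # Assumption: genes without an annotation do not contribute.
            continue
        totals[function] += abundance

    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {function: total / grand_total for function, total in totals.items()}


# Hypothetical gene IDs and function labels, for illustration only.
abundance = {"g1": 0.2, "g2": 0.1, "g3": 0.3, "g4": 0.4}
annotation = {"g1": "func_A", "g2": "func_A", "g3": "func_B"}
shares = function_relative_abundance(abundance, annotation)
```

In this toy input, `func_A` and `func_B` each account for about half of the annotated abundance, and `g4` is ignored because it has no annotation.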
The term "functional gene" above refers to the set of genes associated with a particular biological function. The set may contain one gene or several, depending on the biological function in question. The phrase "the sum of the relative abundances of the genes associated with each function" refers to the sum of the relative abundances of all function-associated genes covered by the sequencing data.
Further, after the functional information on the target object's intestinal microbes has been obtained, the method provided by this embodiment also includes performing a performance analysis of the target object's intestinal microbes based on the functional information. The performance analysis involves at least one of the following: antibiotic resistance gene analysis and intestinal function gene analysis.
In other words, after the functional information has been obtained, the method goes on to perform performance analyses such as antibiotic resistance gene analysis and/or intestinal function gene analysis based on that information. This allows a comprehensive, multi-faceted analysis of an individual's intestinal microbes that meets personalized analysis needs, and thereby solves the technical problem in the prior art that analysis results are one-dimensional, the information is limited, and personalized analysis needs cannot be met.
Where the performance analysis involves antibiotic resistance gene analysis, the performance analysis based on the functional information can be carried out by the following steps: calculating the antibiotic resistance index of the target object's intestinal microbes, and determining the position of that index in a reference population, where the relative abundance of the resistance genes for a given antibiotic is taken as the antibiotic resistance index for that antibiotic.
In an optional example, the antibiotic resistance index is calculated as follows: sum the relative abundances of the target object's intestinal microbial resistance genes for the same antibiotic, then divide that sum by the sum of the relative abundances of all antibiotic resistance genes to obtain the relative abundance of the resistance genes for that antibiotic.
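A minimal Python sketch of this optional calculation follows. The function name, the input layout and the gene IDs are hypothetical; the denominator is taken over the distinct set of resistance genes, which is an assumption made here so that a gene linked to several antibiotics is not counted twice.

```python
def antibiotic_resistance_index(gene_abundance, antibiotic_to_genes):
    """Return {antibiotic: index}, where index is the summed relative abundance
    of that antibiotic's resistance genes divided by the summed relative
    abundance of all resistance genes."""
    # Distinct resistance genes across all antibiotics (assumption: a gene
    # shared by several antibiotics contributes once to the denominator).
    all_resistance_genes = set()
    for genes in antibiotic_to_genes.values():
        all_resistance_genes.update(genes)

    total = sum(gene_abundance.get(gene, 0.0) for gene in all_resistance_genes)

    indices = {}
    for antibiotic, genes in antibiotic_to_genes.items():
        per_antibiotic = sum(gene_abundance.get(gene, 0.0) for gene in set(genes))
        indices[antibiotic] = per_antibiotic / total if total > 0 else 0.0
    return indices


# Hypothetical gene IDs; real IDs would come from the annotation database.
abundance = {"gene_1": 0.02, "gene_2": 0.01, "gene_3": 0.03, "gene_4": 0.5}
resistance_map = {
    "erythromycin": ["gene_1", "gene_2"],
    "tetracycline": ["gene_3"],
}
indices = antibiotic_resistance_index(abundance, resistance_map)
```

Here `gene_4` is not a resistance gene, so it affects neither the numerator nor the denominator.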
The antibiotic resistance gene analysis of the target object's intestinal microbes is illustrated by the following example:
First, the antibiotic resistance index of the target object's intestinal microbes is calculated for each antibiotic.
Second, the position of each antibiotic's resistance index in the reference population is determined. Taking erythromycin as an example, the erythromycin resistance indices of the reference population (for example, the indices from the gut microbial test results of 100, 200, 300, 1000 or more people, such as healthy people) are sorted in ascending order, and the number of people in the reference population whose erythromycin resistance index is lower than that of the target object ($i_a$) is determined. The position of the target object's erythromycin resistance index in the reference population is then given by the ratio of $i_a$ to the total number of people in the reference population ($s_a$):
$$c_a = \frac{i_a}{s_a}$$
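The ranking step can be sketched in Python as below. The function name and the sample values are hypothetical; the sketch sorts the reference indices in ascending order, counts the reference values strictly below the target value, and divides by the size of the reference population.

```python
from bisect import bisect_left


def position_in_reference(target_index, reference_indices):
    """Fraction of the reference population whose index is strictly lower
    than the target object's index."""
    ordered = sorted(reference_indices)
    if not ordered:
        raise ValueError("reference population must not be empty")
    # bisect_left returns the number of sorted entries strictly below the target.
    count_below = bisect_left(ordered, target_index)
    return count_below / len(ordered)


# Hypothetical reference indices for one antibiotic.
reference = [0.4, 0.1, 0.3, 0.2]
position = position_in_reference(0.25, reference)
```

With these values two of the four reference indices lie below 0.25, so the position is 0.5. Reference values equal to the target are not counted as lower.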
Taking one preset test standard as an example: (1) if the lower bound (for example, 0) ≤ $c_a$ < $b$ (for example, 0.25), the erythromycin resistance index of the target object's intestinal microbial flora is considered to be at a low level; (2) if the lower bound (for example, 0.25) ≤ $c_a$ < $c$ (for example, 0.75), the index is considered to be at a medium level; (3) if the lower bound (for example, 0.75) ≤ $c_a$ < $d$ (for example, 1), the index is considered to be at a high level.
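A minimal Python sketch of this three-level classification follows, using the example thresholds 0, 0.25, 0.75 and 1. The function and parameter names are hypothetical. The text gives a strict upper bound for the high level; the sketch treats a position of exactly 1 as high as well, which is an assumption made here so that a target above the whole reference population still receives a level.

```python
def classify_level(position, low_upper=0.25, mid_upper=0.75, high_upper=1.0):
    """Map a position in the reference population to 'low', 'medium' or 'high'."""
    if 0.0 <= position < low_upper:
        return "low"
    if low_upper <= position < mid_upper:
        return "medium"
    # Assumption: the upper bound is inclusive so that position == 1.0 is classified.
    if mid_upper <= position <= high_upper:
        return "high"
    raise ValueError("position must lie between 0 and the upper threshold")


level = classify_level(0.5)
```

The thresholds are keyword arguments so that a different preset standard can be substituted without changing the logic.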
In addition, the performance analysis based on the functional information may also include determining an antibiotic resistance score for the target object's intestinal microbes from the target object's antibiotic resistance index and the position of that index in the reference population. The antibiotic resistance score can be calculated as follows:
where the antibiotic resistance index of the target object's intestinal microbial flora is at a low level, antibiotic resistance score = first parameter (for example, 80) × antibiotic resistance index + second parameter (for example, 40);
where the antibiotic resistance index is at a medium level, antibiotic resistance score = third parameter (for example, 40) × antibiotic resistance index + fourth parameter (for example, 50);
where the antibiotic resistance index is at a high level, antibiotic resistance score = fifth parameter (for example, 80) × antibiotic resistance index + sixth parameter (for example, 20).
It should be noted that the first to sixth parameters in the preset test standard above can be replaced as appropriate for the application scenario; this application does not specifically limit them.
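The piecewise linear score can be sketched in Python as below, using the example parameter values from the text. The names `SCORE_PARAMS` and `resistance_score` are hypothetical. The text applies the formula to the antibiotic resistance index; the sketch leaves the choice of input to the caller. As an observation rather than a statement from the application, the example parameters give the same score on both sides of 0.25 and of 0.75, so the three pieces join up when the input is the position in the reference population.

```python
# (slope, intercept) per level, taken from the example values in the text.
SCORE_PARAMS = {
    "low": (80, 40),     # first and second parameters
    "medium": (40, 50),  # third and fourth parameters
    "high": (80, 20),    # fifth and sixth parameters
}


def resistance_score(value, level, params=SCORE_PARAMS):
    """Linear score for one level: slope * value + intercept."""
    slope, intercept = params[level]
    return slope * value + intercept


# The example parameters meet at the level boundaries.
boundary_low_medium = (resistance_score(0.25, "low"), resistance_score(0.25, "medium"))
boundary_medium_high = (resistance_score(0.75, "medium"), resistance_score(0.75, "high"))
```

Passing a different `params` dictionary is one way to substitute other parameter values for another application scenario.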
It should also be noted that antibiotic resistance genes are genes associated with antibiotic resistance, and resistance to each antibiotic is associated with one or more genes. For the antibiotic resistance gene analysis described above, Table 1 lists 16 antibiotics and some of the gene IDs (numbers) corresponding to each; every gene ID corresponds to one gene sequence.
Table 1 Types of antibiotics
Figure PCTCN2019129426-appb-000002
Figure PCTCN2019129426-appb-000003
It should be noted that the 16 antibiotics in the table above and the genes associated with each are only examples. In a specific implementation, one or more of them can be selected as needed, and other related genes can also be selected.
Where the performance analysis involves intestinal function gene analysis, the performance analysis based on the functional information can be carried out by the following steps: calculating the intestinal function index of the target object's intestinal microbes, and determining the position of that index in the reference population. In an optional example, the sum of the relative abundances of the functional genes belonging to the same function is taken as the intestinal function index for that intestinal function.
The intestinal function gene analysis of the target object's intestinal microbes is illustrated by the following example:
First, the intestinal function index of the target object's intestinal microbes is calculated for each function: the sum of the relative abundances of the functional genes associated with the same function is taken as the intestinal function index for that function.
Next, determine the position of the intestinal function index for each function in the reference population. Taking energy metabolism capacity as an example, the intestinal function indices for energy metabolism capacity in the reference population (for example, the indices from the intestinal microorganism test results of 100, 200, 300, 1000 or more individuals, such as healthy individuals) are sorted in ascending order, and the number of reference individuals whose index is lower than that of the target object, i_o, is determined. The position c_o of the target object's energy metabolism index in the reference population is the ratio of i_o to the total number of individuals in the reference population, s_o:

$$c_o = \frac{i_o}{s_o}$$
Taking one preset test standard as an example: (1) if a (e.g., 0) ≤ c_o < b (e.g., 0.25), the intestinal function index for the target object's energy metabolism capacity is considered to be at a low level; (2) if b (e.g., 0.25) ≤ c_o < c (e.g., 0.75), it is considered to be at a medium level; (3) if c (e.g., 0.75) ≤ c_o < d (e.g., 1), it is considered to be at a high level.
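The position calculation and the three-level classification above can be sketched as follows. This is a minimal Python illustration rather than the application's implementation: the function names `position_in_reference` and `classify_level` are hypothetical, the reference values are made up, and the cut-offs 0.25 and 0.75 are simply the example values quoted in the text.

```python
from bisect import bisect_left


def position_in_reference(target_index, reference_indices):
    """Return c_o = i_o / s_o.

    i_o is the number of reference individuals whose index is strictly
    lower than the target's index; s_o is the reference population size.
    """
    ordered = sorted(reference_indices)        # ascending order
    i_o = bisect_left(ordered, target_index)   # count of values < target
    return i_o / len(ordered)


def classify_level(c_o, b=0.25, c=0.75):
    """Map a position to a level using the example thresholds b and c."""
    if c_o < b:
        return "low"
    if c_o < c:
        return "medium"
    # Assumption: a position of exactly 1 is also reported as "high".
    return "high"


# Hypothetical reference indices for one function.
reference = [0.001, 0.002, 0.004, 0.005, 0.007, 0.008, 0.009, 0.010]
c_o = position_in_reference(0.0045, reference)
print(c_o, classify_level(c_o))
```

Counting strictly lower values means a target whose index ties with the smallest reference value gets position 0, which matches how the worked example later treats an index of 0.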
In addition, the performance analysis of the target object's intestinal microorganisms based on the function information may further include determining the intestinal function score of the target object's intestinal microorganisms from the target object's intestinal function index and the position of that index in the reference population. The intestinal function score may be calculated as follows:
Where the intestinal function index of the target object's intestinal flora is at a low level, intestinal function score = seventh parameter (e.g., 80) * intestinal function index + eighth parameter (e.g., 40);
Where the intestinal function index of the target object's intestinal flora is at a medium level, intestinal function score = ninth parameter (e.g., 40) * intestinal function index + tenth parameter (e.g., 50);
Where the intestinal function index of the target object's intestinal flora is at a high level, intestinal function score = eleventh parameter (e.g., 80) * intestinal function index + twelfth parameter (e.g., 20).
It should be noted that the parameter values in the preset test standard above (the seventh through twelfth parameters) can be replaced to suit the application scenario; this application does not specifically limit them.
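The three scoring rules form a piecewise-linear function, sketched below in Python. One assumption should be stated plainly: the rules are written in terms of the "intestinal function index", but the worked example later in the document applies the same coefficients to the position in the reference population, so the sketch takes that position as its input. The name `function_score` and the `EXAMPLE_SEGMENTS` table are illustrative only.

```python
# Each segment: (upper bound of the position, slope, intercept).
# The slopes and intercepts are the example parameter values 7 to 12.
EXAMPLE_SEGMENTS = (
    (0.25, 80, 40),   # low level
    (0.75, 40, 50),   # medium level
    (1.00, 80, 20),   # high level
)


def function_score(position, segments=EXAMPLE_SEGMENTS):
    """Return the score for a position in [0, 1] in the reference population."""
    if not 0 <= position <= 1:
        raise ValueError("position must lie between 0 and 1")
    for upper, slope, intercept in segments:
        if position < upper:
            return slope * position + intercept
    # position == 1.0: fall back to the last (high-level) segment.
    _, slope, intercept = segments[-1]
    return slope * position + intercept


for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(p, function_score(p))
```

With the example parameters the segments meet at the level boundaries (60 at 0.25 and 80 at 0.75), so the score rises continuously from 40 to 100.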
It should also be noted that KEGG Orthology is abbreviated KO. For each gene of known function, all genes homologous to it are grouped into one class, which constitutes a KO and is assigned a K number; the function of that gene is used as the function of the KO. This extends the functional annotation of each gene on the assumption that homologous genes have similar functions: for a gene whose function has been well characterized in one species, its homologs are searched for in other species, these homologs are defined as one orthology, and the function of the characterized gene is used as the function of that orthology. In this way, studies of gene function in different species are brought together into a comprehensive database for studying gene function. For the intestinal function analysis of the target object's intestinal microorganisms described above, Table 2 lists 9 intestinal functions, part of the KO set corresponding to each function, and some of the gene numbers (Gene IDs) corresponding to each KO; each gene number corresponds to one gene sequence.
Table 2:
Figure PCTCN2019129426-appb-000005
Figure PCTCN2019129426-appb-000006
Figure PCTCN2019129426-appb-000007
It should be noted that the 9 functions in the table above are only examples; in a specific implementation, one or more of them may be selected as needed, and other functions of interest and their related genes may also be selected.
Further, FIG. 2 is a second flowchart of the method for processing intestinal microorganism sequencing data according to an embodiment of the present application. As shown in FIG. 2, the method further includes the following steps:
Step S108a: where the performance analysis involves antibiotic resistance gene analysis and the antibiotic resistance index of the target object's intestinal microorganisms has been calculated from the function information, import that antibiotic resistance index into the database of the reference population for use in the next performance analysis.
Step S108b: where the performance analysis involves intestinal function gene analysis and the intestinal function index of the target object's intestinal microorganisms has been calculated from the function information, import that intestinal function index into the database of the reference population for use in the next performance analysis.
Step S108c: where the sequencing data has been annotated against the standard gene database and an annotation result obtained, import the target object's function information (the functional genes of the intestinal microorganisms and the relative abundance of each functional gene) into the database of the reference population for use in the next performance analysis.
In other words, after the intestinal flora of each target object has been evaluated and the function information of its intestinal microorganisms obtained, the evaluation result (that function information, comprising the functional genes of the intestinal microorganisms and the relative abundance of each functional gene) is added to the database of the reference population, so that the reference range of each indicator is updated in real time.
It should be noted that the "database of the reference population" in this application refers to a database holding the intestinal microorganism gene information of multiple individuals. That information includes the functional genes of the intestinal microorganisms (including antibiotic resistance genes, intestinal function genes and so on), the relative abundance of each functional gene, the antibiotic resistance indices and the intestinal function indices. The database may also contain information such as the volunteers' sex, age, height, weight, living habits and region.
As more individuals have their intestinal flora evaluated with the processing method of this application, the reference population stored in the database keeps growing, so the evaluation results become more accurate and the reference value of evaluations based on this method increases.
Finally, once the reference population stored in the database is sufficiently large, the processing method of this application also selects a matching reference population according to the phenotypic characteristics of the individual being tested (including sex, age, ethnicity, height, weight, diet, area of residence and so on) for the specific intestinal flora evaluation, which makes the evaluation results more precise and reliable.
It should also be noted that the database initially stores a target number of initial reference objects. For each initial reference object it records the phenotype information (including sex, age, ethnicity, height, weight, diet, area of residence and so on) and the intestinal flora evaluation information (including the antibiotic resistance gene analysis and the intestinal function gene analysis, and optionally the relative species abundance of each intestinal microorganism).
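One way to picture the reference-population database described above is a list of records that can be extended after each evaluation and filtered by phenotype before a comparison. The Python sketch below is purely illustrative: the class names, field names and sample values are hypothetical, and the application does not prescribe any particular storage layout.

```python
from dataclasses import dataclass, field


@dataclass
class ReferenceRecord:
    phenotype: dict   # e.g. {"sex": "F", "region": "south"}
    indices: dict     # e.g. {"Netilmicin": 0.00006}


@dataclass
class ReferenceDatabase:
    records: list = field(default_factory=list)

    def add(self, record):
        """Import an evaluated individual (steps S108a to S108c)."""
        self.records.append(record)

    def select(self, **phenotype):
        """Return the records matching every given phenotype value."""
        return [r for r in self.records
                if all(r.phenotype.get(k) == v for k, v in phenotype.items())]

    def index_values(self, name, **phenotype):
        """Collect one index from the (optionally filtered) reference group."""
        return [r.indices[name] for r in self.select(**phenotype)
                if name in r.indices]


db = ReferenceDatabase()
db.add(ReferenceRecord({"sex": "F", "region": "south"}, {"SCFA": 0.0020}))
db.add(ReferenceRecord({"sex": "M", "region": "south"}, {"SCFA": 0.0031}))
db.add(ReferenceRecord({"sex": "F", "region": "north"}, {"SCFA": 0.0012}))
print(db.index_values("SCFA", sex="F"))
```

Calling `index_values` without phenotype arguments returns the whole population, which corresponds to the situation before the database is large enough to support matched sub-populations.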
In addition, step S102 of the method for processing intestinal microorganism sequencing data of this application is described further:
In an optional example, step S102, obtaining the sequencing data of the target object's intestinal flora, may be implemented as follows:
Step A1: perform gene sequencing on the target object's intestinal microorganisms to obtain the raw sequencing data of the target object's intestinal flora (raw reads, usually in FASTQ format; a FASTQ file contains the quality information of every base in the sequenced reads);
Step A2: perform quality control on the raw sequencing data. First, remove reads in which the number of ambiguous bases N exceeds a preset value (the preset value may be 3, 4, 5 or more and can be adjusted to the actual application). Then remove low-quality reads: for example, using the quality information in the reads, trim the consecutive bases at the end of each read whose quality value is below a specific value (which may be 20, 25, 30 or higher, adjusted as needed), and then remove reads whose length after trimming is below a specific length (which may be 30 bp, 35 bp or longer, adjusted to the application scenario). Existing software with these functions, such as fastx, may be used to remove low-quality reads;
Step A3: remove the host gene sequences from the raw sequencing data to obtain the sequencing data of the target object's intestinal flora, where the host gene sequences are the gene sequences of the target object. The soap software may be used for this step.
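The quality-control rules of step A2 can be sketched for a single read as follows. This is a simplified stand-in for tools such as fastx, not a reproduction of them. It assumes Phred+33 quality encoding, and it discards a read when it has 3 or more N bases, which is the convention used in Example 1; the function name `clean_read` and its parameter names are hypothetical.

```python
def clean_read(sequence, quality, max_n=3, min_quality=20, min_length=30,
               phred_offset=33):
    """Return the trimmed (sequence, quality) pair, or None if discarded."""
    # Discard reads that contain max_n or more ambiguous bases.
    if sequence.upper().count("N") >= max_n:
        return None
    # Trim the run of consecutive low-quality bases at the end of the read.
    end = len(sequence)
    while end > 0 and ord(quality[end - 1]) - phred_offset < min_quality:
        end -= 1
    # Discard reads that are too short after trimming.
    if end < min_length:
        return None
    return sequence[:end], quality[:end]


# 'I' encodes quality 40 and '#' encodes quality 2 under Phred+33.
kept = clean_read("ACGT" * 10, "I" * 36 + "#" * 4)
print(kept)
```

Host-read removal (step A3) needs an aligner and a host reference, so it is left out of this sketch.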
In addition, step S104 of the method for processing intestinal microorganism sequencing data of this application is described further:
In an optional example, step S104, annotating the sequencing data against the standard gene database to obtain an annotation result, may be implemented as follows:
Step B1: align the reads (gene sequences) in the sequencing data against the standard gene database (for example, the integrated gene catalog of the human gut microbial metagenome, IGC) and determine the relative abundance of each gene sequence contained in the sequencing data. For example, a gene abundance file is produced for the sequencing data; the file contains two columns, where the right column holds the gene IDs and the left column holds the relative abundance of the corresponding gene;
Step B2: based on the annotation information of each gene sequence recorded in the standard gene database (the annotation information includes the resistance genes corresponding to each antibiotic and can form an antibiotic resistance gene annotation file) and the relative abundance of each gene sequence contained in the sequencing data, determine the relative abundance of each antibiotic resistance gene contained in the sequencing data. For example, an antibiotic resistance gene abundance file is produced; the file contains two columns, where the left column holds the antibiotic resistance genes and the right column holds the corresponding relative abundances. One possible calculation: using the gene abundance file from step B1 and the antibiotic resistance gene annotation file from step B2, sum the relative abundances of the genes belonging to the same antibiotic resistance gene, then divide by the sum of the relative abundances of all antibiotic resistance genes in the sample; the result is the relative abundance of that antibiotic resistance gene;
Step B3: based on the annotation information of each gene sequence recorded in the standard gene database (the annotation information includes the genes corresponding to each biological function; each function corresponds to one or more genes, and the number differs from function to function) and the relative abundance of each gene sequence contained in the sequencing data, determine the relative abundance of the genes for each biological function contained in the sequencing data. For example, a biological function gene abundance file is produced; the file contains two columns, where the left column holds the gene information for each biological function and the right column holds the corresponding relative abundances. One possible calculation: sum the relative abundances of the genes belonging to the same function, then divide the summed abundance for each function by the sum over all functions in the sample; the result is the relative abundance of the genes for that function.
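Steps B2 and B3 share one pattern: group gene abundances by an annotation, sum within each group, and normalize by the total over all groups. A minimal Python sketch of that pattern is given below, with in-memory dictionaries standing in for the two-column files; the function name `category_relative_abundance` and the sample identifiers are hypothetical.

```python
from collections import defaultdict


def category_relative_abundance(gene_abundance, gene_to_category):
    """Sum gene abundances per category and normalize by the grand total.

    gene_abundance:   {gene_id: relative abundance}        (step B1 output)
    gene_to_category: {gene_id: antibiotic, function or KO} (annotation)
    Genes without an annotation are ignored.
    """
    totals = defaultdict(float)
    for gene_id, abundance in gene_abundance.items():
        category = gene_to_category.get(gene_id)
        if category is not None:
            totals[category] += abundance
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {category: 0.0 for category in totals}
    return {category: value / grand_total for category, value in totals.items()}


abundance = {"g1": 0.2, "g2": 0.1, "g3": 0.1, "g4": 0.5}
annotation = {"g1": "drugA", "g2": "drugA", "g3": "drugB"}   # g4 is unannotated
print(category_relative_abundance(abundance, annotation))
```

The same function applies to KO annotation by passing a gene-to-KO mapping instead of a gene-to-antibiotic mapping.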
Example 1
(1) Obtain the sequencing data (reads) of the intestinal microorganisms of target object A and process the sequencing data as follows:
1.1 Filter reads: from the FASTQ-format sequencing data, remove reads containing 3 or more ambiguous bases "N".
1.2 Trim reads (remove low-quality reads): using the Fastx software, trim the consecutive bases with a quality value below 20 from the end of each read remaining after step 1.1; then remove reads whose length after trimming is less than 30 bp.
1.3 Remove host reads: align the reads against the host sequence using the soap software and remove the reads that align to the host sequence.
1.4 Mapped to IGC and Function annotation: align the processed reads against the IGC sequences using the soap software to obtain the gene abundance file of the target object's intestinal microorganisms and the KO annotation file of the IGC. The gene abundance file (shown in Figure 4) consists of two columns: the left column is the gene ID and the right column is the abundance of that gene. The IGC KO annotation file (shown in Figure 5) consists of two columns: the left column is the gene ID and the right column is the KO information for that gene. The abundances of the genes annotated to the same KO are summed and then normalized to give the relative abundance of each KO, and a KO relative abundance file is output (shown in Figure 6). That file consists of two columns: the left column is the KO number (each KO number corresponds to one set of KO information) and the right column is the relative abundance of that KO. In other words, based on the input gene abundance file and KO annotation file, the relative abundances of the genes belonging to the same KO are summed and then divided by the sum of the abundances of all KO-annotated genes in the sample; the normalized result is the relative abundance of the KO.
(2) Functional analysis of intestinal microorganisms
2.1 Antibiotic resistance gene analysis
2.1.1 Antibiotic resistance index (relative abundance of antibiotic resistance genes) analysis
This antibiotic resistance index analysis examines the Cefoxitin and Netilmicin resistance indices of the intestinal microorganisms of target object A. The Gene IDs corresponding to Cefoxitin and Netilmicin in the IGC are shown in Table 3:
Table 3:
Figure PCTCN2019129426-appb-000008
Through step 1.4 above (Mapped to IGC and Function annotation), the sum of the relative abundances of all antibiotic resistance genes of target object A was found to be 0.354. The relative abundance corresponding to each Gene ID in Table 3 is given in Table 4.
Table 4:
Gene ID    Relative abundance
3225488    0
6122102    0
9296652    0
1777598    0
1561048    0
7680759    0.0000601
2.1.2 Calculate the Cefoxitin and Netilmicin resistance indices:
Cefoxitin resistance index = 0 + 0 + 0 + 0 + 0 = 0
Netilmicin resistance index = 0.0000601
2.1.3 Determine the position of target object A's Cefoxitin and Netilmicin resistance indices in the reference population:
The reference population currently holds the antibiotic resistance indices of 346 individuals. With the reference population sorted in ascending order, the Cefoxitin resistance index of 0 ranks at position 0 and the Netilmicin resistance index of 0.0000601 ranks at position 13. The position of target object A's Cefoxitin resistance index in the reference population is therefore c_a = 0/346 = 0, and the position of target object A's Netilmicin resistance index is c_a = 13/346 = 0.0376.
2.1.4 Determine the Cefoxitin and Netilmicin resistance scores of the intestinal microorganisms of target object A:
According to the preset test standard, the level is determined as follows:
① if 0 ≤ c_a < 0.25, the antibiotic resistance genes of the tested person's intestinal flora are at a low level;
② if 0.25 ≤ c_a < 0.75, the antibiotic resistance genes of the tested person's intestinal flora are at a medium level;
③ if 0.75 ≤ c_a < 1, the antibiotic resistance genes of the tested person's intestinal flora are at a high level.
The antibiotic resistance gene level score is calculated as follows:
① if 0 ≤ c_a < 0.25, score = 40 + (c_a - 0) * (60 - 40) / (0.25 - 0) = 80 * c_a + 40;
② if 0.25 ≤ c_a < 0.75, score = 60 + (c_a - 0.25) * (80 - 60) / (0.75 - 0.25) = 40 * c_a + 50;
③ if 0.75 ≤ c_a < 1, score = 80 + (c_a - 0.75) * (100 - 80) / (1 - 0.75) = 80 * c_a + 20.
Therefore, the position of target object A's Cefoxitin resistance index in the reference population is c_a = 0; since 0 ≤ 0 < 0.25, the Cefoxitin resistance index of target object A's intestinal flora is considered to be at a low level. The position of target object A's Netilmicin resistance index in the reference population is c_a = 0.0376; since 0 ≤ 0.0376 < 0.25, the Netilmicin resistance index of target object A's intestinal flora is considered to be at a low level.
Cefoxitin resistance gene level score of target object A:
Score = 80*0 + 40 = 40
Netilmicin resistance gene level score of target object A:
Score = 80*0.0376 + 40 = 43.008
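The Netilmicin figures in sections 2.1.2 to 2.1.4 can be rechecked with a few lines of Python. This is only an arithmetic check under two assumptions: the first five Gene IDs of Table 4 are attributed to Cefoxitin and the last to Netilmicin (inferred from the sums in section 2.1.2, since Table 3 is available only as an image), and the low-level rule score = 80 * c_a + 40 is the one that applies. The variable names are illustrative.

```python
# Relative abundances from Table 4, grouped by antibiotic (grouping inferred).
table4 = {
    "Cefoxitin": {"3225488": 0.0, "6122102": 0.0, "9296652": 0.0,
                  "1777598": 0.0, "1561048": 0.0},
    "Netilmicin": {"7680759": 0.0000601},
}
reference_size = 346
# Number of reference individuals with a lower index, as stated in the text.
lower_count = {"Cefoxitin": 0, "Netilmicin": 13}

# Resistance index: sum of the relative abundances of the related genes.
resistance_index = {name: sum(genes.values()) for name, genes in table4.items()}

# Position in the reference population, rounded to four decimals as in the text.
position = {name: round(lower_count[name] / reference_size, 4)
            for name in table4}

# Both positions fall in the low-level band (below 0.25).
netilmicin_score = round(80 * position["Netilmicin"] + 40, 3)

print(resistance_index)
print(position)
print(netilmicin_score)
```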
2.2 Intestinal function gene analysis
2.2.1 Intestinal function index analysis
This intestinal function analysis examines the Short-chain fatty acids (short-chain fatty acid synthesis capacity) and Bile Salt Hydrolase (bile salt hydrolysis capacity) functions of the intestinal microorganisms of target object A. The KOs and Gene IDs corresponding to these two functions in the IGC are shown in Table 5:
Table 5:
Figure PCTCN2019129426-appb-000009
Through step 1.4 above (Mapped to IGC and Function annotation), the relative abundances of all KOs of target object A were obtained, and the relative abundances of the KOs involved in Short-chain fatty acids and Bile Salt Hydrolase were summed, as shown in Table 6.
Table 6:
Figure PCTCN2019129426-appb-000010
Figure PCTCN2019129426-appb-000011
The sum of the relative abundances of the KOs involved in a function is the index of that function, so the Short-chain fatty acids index is 0.00198 and the Bile Salt Hydrolase index is 0.000588.
2.2.2 Determine the position of target object A's Short-chain fatty acids and Bile Salt Hydrolase indices in the reference population:
The reference population currently holds the intestinal function indices of 346 individuals. With the reference population sorted in ascending order, the Short-chain fatty acids index of 0.00198 ranks at position 66 and the Bile Salt Hydrolase index of 0.000588 ranks at position 269. The position of target object A's Short-chain fatty acids index in the reference population is therefore c_k = 66/346 = 0.191, and the position of target object A's Bile Salt Hydrolase index is c_k = 269/346 = 0.777.
2.2.3 Determine the Short-chain fatty acids and Bile Salt Hydrolase function scores of target object A:
The level of the measured value is evaluated as follows:
① if 0 ≤ c_k < 0.25, the function of the tested person's intestinal flora is at a low level;
② if 0.25 ≤ c_k < 0.75, the function of the tested person's intestinal flora is at a medium level;
③ if 0.75 ≤ c_k < 1, the function of the tested person's intestinal flora is at a high level.
The preset rules for calculating the intestinal microorganism function score are:
① if 0 ≤ c_k < 0.25, score = 40 + (c_k - 0) * (60 - 40) / (0.25 - 0) = 80 * c_k + 40;
② if 0.25 ≤ c_k < 0.75, score = 60 + (c_k - 0.25) * (80 - 60) / (0.75 - 0.25) = 40 * c_k + 50;
③ if 0.75 ≤ c_k < 1, score = 80 + (c_k - 0.75) * (100 - 80) / (1 - 0.75) = 80 * c_k + 20.
Therefore, the position of target object A's Short-chain fatty acids index in the reference population is c_k = 0.191; since 0 ≤ 0.191 < 0.25, the Short-chain fatty acids function index of target object A's intestinal flora is considered to be at a low level. The position of target object A's Bile Salt Hydrolase index in the reference population is c_k = 0.777; since 0.75 ≤ 0.777 < 1, the Bile Salt Hydrolase function index of target object A's intestinal flora is considered to be at a high level.
Short-chain fatty acids function level score of target object A:
Score = 80*0.191 + 40 = 55.28 ≈ 55.3
Bile Salt Hydrolase function level score of target object A:
Score = 80*0.777 + 20 = 82.16
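The two function scores can be reproduced from the ranks given in section 2.2.2. The Python sketch below rounds each position to three decimals before scoring, as the text does; `example_score` is a hypothetical helper that hard-codes the example coefficients.

```python
def example_score(c_k):
    """Piecewise-linear score using the example coefficients from 2.2.3."""
    if c_k < 0.25:
        return 80 * c_k + 40     # low level
    if c_k < 0.75:
        return 40 * c_k + 50     # medium level
    return 80 * c_k + 20         # high level


reference_size = 346
# Number of reference individuals with a lower index, as stated in the text.
lower_count = {"Short-chain fatty acids": 66, "Bile Salt Hydrolase": 269}

results = {}
for name, count in lower_count.items():
    c_k = round(count / reference_size, 3)
    results[name] = (c_k, round(example_score(c_k), 2))

print(results)
```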
In summary, the method for processing intestinal microorganism sequencing data provided by the embodiments of this application achieves the following technical effects:
1. This application analyzes data produced by metagenomic sequencing. Compared with the traditional 16S rRNA gene data analysis method, it can detect a more comprehensive picture of the human intestinal flora: the results cover not only the species of the intestinal microorganisms but also their function information.
2. This technical solution comprehensively tests the genes related to resistance to 16 antibiotics in the tested person; from these results, the level of each antibiotic resistance gene in the tested person's intestinal flora can be evaluated.
3. This technical solution obtains the intestinal microorganism function information of the tested target object, including the functional genes of the intestinal microorganisms, the relative abundance of each specific functional gene, and the position of each relative abundance in the reference population (the relative abundances of the antibiotic resistance genes and of the intestinal function genes, and their respective positions in the reference population). The state of the tested person's intestinal microorganisms can therefore be evaluated both for the flora as a whole and for specific microorganisms, enabling a comprehensive, multi-dimensional and personalized analysis.
It should be noted that the steps shown in the flowcharts of the accompanying drawings can be executed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in a different order.
An embodiment of this application also provides a device for processing intestinal microorganism sequencing data. The device can be used to execute the method for processing intestinal microorganism sequencing data provided by the embodiments of this application. The processing device is described below.
Fig. 3 is a schematic diagram of a device for processing intestinal microorganism sequencing data according to an embodiment of this application. As shown in Fig. 3, the device includes a first acquisition module 31, an annotation module 33, and a second acquisition module 35.
The first acquisition module 31 is configured to acquire sequencing data of the intestinal microbial flora of the target object;
The annotation module 33 is configured to annotate the sequencing data according to a standard gene database to obtain an annotation result;
The second acquisition module 35 is configured to evaluate the intestinal microbial flora of the target object according to the annotation result and obtain functional information of the target object's intestinal microorganisms.
In the processing device provided by this embodiment, the first acquisition module 31 acquires the sequencing data of the target object's intestinal microbial flora; the annotation module 33 then annotates the sequencing data according to the standard gene database to obtain the annotation result; finally, the second acquisition module 35 evaluates the target object's intestinal microbial flora according to the annotation result and obtains the functional information of the target object's intestinal microorganisms. Because the device evaluates the flora through the annotation result, it can obtain the flora information of the target object's intestinal microorganisms and thereby achieves the technical effect of effectively analyzing the intestinal microbial flora and its state. Compared with prior-art processing devices, which provide only a single result and limited information, the processing device of this embodiment provides more comprehensive and more diversified information and can therefore meet personalized needs.
It should be noted that, in the processing device provided by the embodiments of this application, the functional information of the target object's intestinal microorganisms includes the functional genes of the intestinal microorganisms and the relative abundance of each functional gene. The relative abundance of a functional gene is the ratio of the sum of the relative abundances of the genes in the sequencing data that belong to that function to the sum of the relative abundances of the genes of all functions in the sequencing data; the relative abundances of the genes belonging to a given function are obtained from the annotation result.
The "functional gene" above refers to the set of genes related to a specific biological function; the number of genes in the set may be one or more, depending on the biological function. The "sum of the relative abundances of the genes of all functions" refers to the sum of the relative abundances of all function-related genes covered by the sequencing data.
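The ratio defined above can be written compactly as follows. The notation is introduced here only for illustration and does not appear in the original text: a_g is the relative abundance of gene g obtained from the annotation result, G_f is the set of genes annotated to function f, and F is the set of all functions covered by the sequencing data.

```latex
R_f \;=\; \frac{\sum_{g \in G_f} a_g}{\sum_{f' \in F} \, \sum_{g \in G_{f'}} a_g}
```

Under this definition the values R_f over all functions in F sum to 1.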
Further, the processing device provided by the embodiments of this application also includes a performance analysis module, configured to perform performance analysis on the target object's intestinal microorganisms based on the functional information. The performance analysis involves at least one of the following: antibiotic resistance gene analysis and intestinal function gene analysis.
After the second acquisition module has obtained the functional information of the target object's intestinal microorganisms, a processing device that includes the above performance analysis module can additionally run that module to perform antibiotic resistance gene analysis and/or intestinal function gene analysis on the target object's intestinal microorganisms based on the functional information. This achieves comprehensive, multi-faceted analysis of an individual's intestinal microorganisms and meets personalized analysis needs, thereby solving the technical problem of the prior art that analysis results are single, information is limited, and personalized analysis needs cannot be met.
Optionally, in the processing device provided by the embodiments of this application, where the performance analysis involves antibiotic resistance gene analysis, the performance analysis module includes: a first calculation module, configured to calculate the antibiotic resistance index of the target object's intestinal microorganisms, and a first position determination module, configured to determine the position of the antibiotic resistance index in the reference population. Where the performance analysis involves intestinal function gene analysis, the performance analysis module includes: a second calculation module, configured to calculate the intestinal function index of the target object's intestinal microorganisms, and a second position determination module, configured to determine the position of the intestinal function index in the reference population.
In an optional example, the steps by which the performance analysis module performs antibiotic resistance gene analysis on the target object's intestinal microorganisms are illustrated as follows:
First, the first calculation module calculates the antibiotic resistance index of the target object's intestinal microorganisms for each antibiotic.
Second, the first position determination module determines the position of each antibiotic's resistance index in the reference population. Taking erythromycin as an example, the erythromycin resistance indices of the reference population (for example, those from the intestinal microorganism test results of 100, 200, 300, 1000, or more people, such as healthy people) are sorted in ascending order, and the number of people in the reference population whose erythromycin resistance index is smaller than that of the target object (i_a) is determined. The position of the target object's erythromycin resistance index in the reference population is then the ratio of i_a to the total number of people in the reference population (s_a):
c_a = i_a / s_a
Taking a certain preset test standard as an example: (1) if the lower bound (for example, 0) ≤ c_a < b (for example, 0.25), the erythromycin resistance index of the target object's intestinal microbial flora is considered to be at a lower level; (2) if b (for example, 0.25) ≤ c_a < c (for example, 0.75), it is considered to be at a medium level; (3) if c (for example, 0.75) ≤ c_a < d (for example, 1), it is considered to be at a higher level.
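The position calculation and the three-level classification described above can be sketched as follows. This is a minimal sketch, not the applicant's implementation: the function names, the sample reference values, and the handling of c = 1 (placed in the higher band) are assumptions made for illustration; the default thresholds 0.25 and 0.75 are the example values from the text.

```python
from bisect import bisect_left
from typing import Sequence


def position_in_reference(target_index: float, reference_indices: Sequence[float]) -> float:
    """Return c = i / s, where i counts reference values strictly below the target."""
    if not reference_indices:
        raise ValueError("reference population must not be empty")
    ordered = sorted(reference_indices)        # ascending order, as in the description
    i = bisect_left(ordered, target_index)     # number of reference values < target_index
    return i / len(ordered)


def level(c: float, b: float = 0.25, upper: float = 0.75) -> str:
    """Classify a position into the lower / medium / higher bands."""
    if c < b:
        return "lower"
    if c < upper:
        return "medium"
    return "higher"  # c == 1 is included here by assumption


# Hypothetical erythromycin resistance indices for a reference population of 8 people.
reference = [0.01, 0.02, 0.03, 0.05, 0.08, 0.10, 0.12, 0.20]
c_a = position_in_reference(0.04, reference)
print(c_a, level(c_a))
```

The same two functions apply unchanged to an intestinal function index, since the description uses the same ratio and the same example thresholds for both kinds of index.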
In addition, once the antibiotic resistance index of the target object and its position in the reference population have been determined by the first calculation module and the first position determination module, the performance analysis module may also include an antibiotic resistance score module, which determines the antibiotic resistance score of the target object's intestinal microorganisms. The antibiotic resistance score module calculates the score as follows:
Where the antibiotic resistance index of the target object's intestinal microbial flora is at a lower level, antibiotic resistance score = first parameter (for example, 80) * antibiotic resistance index + second parameter (for example, 40);
Where the antibiotic resistance index of the target object's intestinal microbial flora is at a medium level, antibiotic resistance score = third parameter (for example, 40) * antibiotic resistance index + fourth parameter (for example, 50);
Where the antibiotic resistance index of the target object's intestinal microbial flora is at a higher level, antibiotic resistance score = fifth parameter (for example, 80) * antibiotic resistance index + sixth parameter (for example, 20).
It should be noted that the first, second, third, fourth, fifth, and sixth parameters in the above preset test standard can be replaced as appropriate for the application scenario; this application places no specific limitation on them.
It should also be noted that Table 1 above lists, by way of example, 16 antibiotics and some of the gene IDs (numbers) corresponding to each; each gene ID corresponds to one gene sequence. In actual research applications, one or more of these may be selected according to the research direction or purpose, and other related genes may also be selected.
In an optional example, the steps by which the performance analysis module performs intestinal function gene analysis on the target object's intestinal microorganisms are illustrated as follows:
First, the second calculation module calculates the intestinal function index of the target object's intestinal microorganisms for each function; that is, the sum of the relative abundances of the functional genes related to the same function is taken as the intestinal function index for that function.
Second, the second position determination module determines the position of the intestinal function index for each function in the reference population. Taking energy metabolism capacity as an example, the intestinal function indices for intestinal energy metabolism capacity in the reference population (for example, those from the intestinal microorganism test results of 100, 200, 300, 1000, or more people, such as healthy people) are sorted in ascending order, and the number of people in the reference population whose index is smaller than the target object's intestinal function index for intestinal energy metabolism capacity (i_o) is determined. The position of the target object's index in the reference population is then the ratio of i_o to the total number of people in the reference population (s_o):
c_o = i_o / s_o
Taking a certain preset test standard as an example: (1) if the lower bound (for example, 0) ≤ c_o < b (for example, 0.25), the intestinal function index for the target object's intestinal energy metabolism capacity is considered to be at a lower level; (2) if b (for example, 0.25) ≤ c_o < c (for example, 0.75), it is considered to be at a medium level; (3) if c (for example, 0.75) ≤ c_o < d (for example, 1), it is considered to be at a higher level.
In addition, once the intestinal function index of the target object and its position in the reference population have been determined by the second calculation module and the second position determination module respectively, the performance analysis module may further include an intestinal function score module, which determines the intestinal function score of the target object's intestinal microorganisms. The intestinal function score module can calculate the score as follows:
Where the intestinal function index of the target object's intestinal microbial flora is at a lower level, intestinal function score = seventh parameter (for example, 80) * intestinal function index + eighth parameter (for example, 40);
Where the intestinal function index of the target object's intestinal microbial flora is at a medium level, intestinal function score = ninth parameter (for example, 40) * intestinal function index + tenth parameter (for example, 50);
Where the intestinal function index of the target object's intestinal microbial flora is at a higher level, intestinal function score = eleventh parameter (for example, 80) * intestinal function index + twelfth parameter (for example, 20).
It should be noted that the seventh, eighth, ninth, tenth, eleventh, and twelfth parameters in the above preset test standard can be replaced as appropriate for the application scenario; this application places no specific limitation on them.
It should also be noted that Table 2 above lists 9 functions by way of example. In a specific application, one or more of them may be selected as needed, and other functions of interest and their related genes may also be selected.
Optionally, in the processing device provided by the embodiments of this application, the first calculation module includes: a first summation unit, configured, where the antibiotic resistance index is the relative abundance of an antibiotic resistance gene, to calculate the sum of the relative abundances of the genes of the target object's intestinal microorganisms that belong to the same antibiotic resistance gene; and a division unit, configured to divide that sum by the sum of the relative abundances of all antibiotic resistance genes to obtain the relative abundance of the antibiotic resistance gene. The second calculation module includes: a second summation unit, configured to sum the relative abundances of the functional genes belonging to the same intestinal function to obtain the intestinal function index.
Optionally, in the processing device provided by the embodiments of this application, the performance analysis module further includes: a first import module, configured, where the performance analysis involves antibiotic resistance gene analysis and the first calculation module has calculated the antibiotic resistance index of the target object's intestinal microorganisms, to import that index into the reference population database for use in the next performance analysis; and a second import module, configured, where the performance analysis involves intestinal function gene analysis and the second calculation module has calculated the intestinal function index of the target object's intestinal microorganisms, to import that index into the reference population database for use in the next performance analysis.
Where the performance analysis module also includes the first import module and the second import module, the processing device can, after evaluating the intestinal microbial flora of each target object and obtaining the functional information of its intestinal microorganisms, add the evaluation result (the functional information of the target object's intestinal microorganisms, including the functional genes and the relative abundance of each functional gene) to the reference population database stored in the processing device, so that the reference range of each indicator is updated in real time.
As more and more individuals have their intestinal microbial flora evaluated with the processing device of this application, the reference population stored in the database will keep growing, so the device's evaluation results will become increasingly accurate and their reference value increasingly high.
Finally, once the reference population stored in the database is sufficiently large, the processing device of this application will also select a matching reference population according to the phenotypic characteristics of the individual to be tested (including sex, age, ethnicity, height, weight, diet, area of residence, and so on) for the specific intestinal microbial flora evaluation, making the evaluation results more precise and reliable.
It should also be noted that the database of the processing device initially stores a target number of initial reference objects. For each initial reference object, the database records phenotype information (including sex, age, ethnicity, height, weight, diet, area of residence, and so on) and intestinal microbial flora evaluation information (including antibiotic resistance gene analysis and intestinal function gene analysis, and optionally the relative abundance of each intestinal microorganism species).
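The growing reference database and the phenotype-matched selection of a reference population described above could be organized roughly as follows. This is a hedged sketch under stated assumptions: the class names `ReferenceRecord` and `ReferenceDatabase`, the field names, and the exact-match selection rule are all hypothetical, since the description does not specify a storage format or a matching criterion.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ReferenceRecord:
    phenotype: Dict[str, object]   # e.g. {"sex": "F", "region": "south"} (hypothetical keys)
    indices: Dict[str, float]      # e.g. {"erythromycin": 0.031}


@dataclass
class ReferenceDatabase:
    records: List[ReferenceRecord] = field(default_factory=list)

    def add(self, record: ReferenceRecord) -> None:
        """Import a newly evaluated individual so later analyses can use it."""
        self.records.append(record)

    def select(self, index_name: str, **phenotype: object) -> List[float]:
        """Return index values of reference individuals matching every given phenotype field."""
        return [
            r.indices[index_name]
            for r in self.records
            if index_name in r.indices
            and all(r.phenotype.get(k) == v for k, v in phenotype.items())
        ]


db = ReferenceDatabase()
db.add(ReferenceRecord({"sex": "F", "region": "south"}, {"erythromycin": 0.02}))
db.add(ReferenceRecord({"sex": "M", "region": "south"}, {"erythromycin": 0.05}))
db.add(ReferenceRecord({"sex": "F", "region": "north"}, {"erythromycin": 0.08}))
print(db.select("erythromycin", sex="F"))
```

Calling `select` with no phenotype arguments returns the whole reference population for that index, which corresponds to the case where the database is not yet large enough for phenotype matching.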
In addition, the first acquisition module 31 of the processing device of this application is described further:
In an optional example, the first acquisition module 31 can acquire the sequencing data of the target object's intestinal microbial flora by performing the following steps:
Step A1: perform gene sequencing on the target object's intestinal microorganisms to obtain the raw sequencing data of the target object's intestinal microbial flora (raw reads, usually in fastq format; a fastq file contains the quality information for every base of each sequencing read);
Step A2: perform quality control on the raw sequencing data. That is, remove reads in which the number of ambiguous bases N exceeds a preset value (which may be 3, 4, 5, or more, adjusted as appropriate for the application), and remove low-quality reads (for example, using the quality information in the reads, trim the run of consecutive bases at the end of a read whose quality values are below a specific value, which may be 20, 25, 30, or higher, adjusted according to actual needs; then remove reads whose length after this trimming is below a specific length, which may be 30 bp, 35 bp, or longer, adjusted for the application scenario). Existing software with these functions, such as fastx, can be used to remove low-quality reads;
Step A3: remove the host gene sequences from the raw data to obtain the sequencing data of the target object's intestinal microbial flora, where the host gene sequences are the target object's own gene sequences. The soap software can be used for this step.
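The quality-control rules of step A2 can be sketched for a single read as follows. This is an illustrative sketch, not the behavior of the fastx software named in the text: the function name `filter_read` is hypothetical, the defaults use example values from the description (3 ambiguous bases, quality 20, 30 bp), and Phred+33 quality encoding is an assumption. Host-sequence removal (step A3) is not covered.

```python
from typing import Optional, Tuple


def filter_read(
    seq: str,
    qual: str,
    max_n: int = 3,
    min_q: int = 20,
    min_len: int = 30,
    offset: int = 33,   # assumed Phred+33 encoding of the fastq quality string
) -> Optional[Tuple[str, str]]:
    """Return the trimmed (sequence, quality) pair, or None if the read is rejected."""
    # Reject reads with more ambiguous bases than the preset value.
    if seq.upper().count("N") > max_n:
        return None
    # Trim the run of consecutive trailing bases whose quality is below min_q.
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    # Reject reads that are too short after trimming.
    if end < min_len:
        return None
    return seq[:end], qual[:end]


# 40-base read whose last 5 bases have quality 2 ('#'); the rest have quality 40 ('I').
trimmed = filter_read("ACGT" * 10, "I" * 35 + "#" * 5)
print(len(trimmed[0]) if trimmed else None)
```

In practice this function would be applied to each record parsed from the fastq file, keeping only the reads for which it returns a value.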
In addition, the annotation module 33 of the processing device of this application is described further:
In an optional example, the annotation module 33 can perform the following steps:
Step B1: align the reads (i.e., gene sequences) in the sequencing data to the standard gene database (for example, the integrated gene catalog IGC of the human gut microbial metagenome) and determine the relative abundance of each gene sequence contained in the sequencing data (for example, produce the gene abundance file for the sequencing data, which contains two columns of data: the right column holds gene IDs and the left column holds the relative abundances of the corresponding gene IDs);
Step B2: based on the annotation information for each gene sequence recorded in the standard gene database (the annotation information includes the resistance gene information for each antibiotic and can form an antibiotic resistance gene annotation file) and the relative abundance of each gene sequence contained in the sequencing data, determine the relative abundance of each antibiotic resistance gene contained in the sequencing data (for example, produce the antibiotic resistance gene abundance file for the sequencing data, which contains two columns of data: the left column lists the antibiotic resistance genes and the right column gives their respective relative abundances. One possible calculation: using the gene abundance file from step B1 and the antibiotic resistance gene annotation file from step B2, sum the relative abundances of the genes belonging to the same antibiotic resistance gene, then divide by the sum of the relative abundances of all antibiotic resistance genes in the sample; the result is the relative abundance of that antibiotic resistance gene);
Step B3: based on the annotation information for each gene sequence recorded in the standard gene database (the annotation information includes the gene information for each biological function, where the number of genes corresponding to a biological function is one or more and varies with the function) and the relative abundance of each gene sequence contained in the sequencing data, determine the relative abundance of the genes of each biological function contained in the sequencing data (for example, produce the biological function gene abundance file for the sequencing data, which contains two columns of data: the left column lists the gene information for each biological function and the right column gives the corresponding relative abundances. One possible calculation: sum the relative abundances of the genes belonging to the same function, then divide the sum for each function by the sum of the relative abundances of the genes of all functions in the sample; the result is the relative abundance of each functional gene).
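The aggregation in steps B2 and B3 can be sketched as follows, starting from a per-gene abundance table as produced in step B1. This is a minimal sketch: the function name, the in-memory dictionaries standing in for the abundance and annotation files, the gene IDs, and the assumption that each gene maps to at most one category are all illustrative and not taken from the original.

```python
from collections import defaultdict
from typing import Dict


def category_relative_abundance(
    gene_abundance: Dict[str, float],
    gene_to_category: Dict[str, str],
) -> Dict[str, float]:
    """Sum gene abundances per category, then divide by the total over all categories."""
    totals: Dict[str, float] = defaultdict(float)
    for gene_id, abundance in gene_abundance.items():
        category = gene_to_category.get(gene_id)
        if category is not None:          # genes without an annotation are ignored
            totals[category] += abundance
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {category: total / grand_total for category, total in totals.items()}


# Hypothetical gene IDs; "g4" has no resistance annotation and does not enter the ratio.
abundance = {"g1": 0.02, "g2": 0.01, "g3": 0.03, "g4": 0.50}
annotation = {"g1": "erythromycin", "g2": "erythromycin", "g3": "tetracycline"}
print(category_relative_abundance(abundance, annotation))
```

The same function covers both cases: passing an antibiotic annotation map gives the per-antibiotic relative abundances of step B2, and passing a function annotation map gives the per-function relative abundances of step B3.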
The device for processing intestinal microorganism sequencing data of this application includes a processor and a memory. The first acquisition module 31, the annotation module 33, the second acquisition module 35, and so on are all stored in the memory as program units, and the processor executes these program units stored in the memory to implement the corresponding functions.
The processor contains a kernel, which retrieves the corresponding program unit from the memory. One or more kernels may be provided, and the intestinal microbial flora and its state are analyzed effectively and comprehensively by adjusting kernel parameters.
The memory may include volatile memory in computer-readable media, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). The memory includes at least one memory chip.
本申请实施例提供了一种存储介质,其上存储有程序,该程序被处理器执行时实现肠道微生物测序数据的处理方法。The embodiment of the present application provides a storage medium on which a program is stored, and when the program is executed by a processor, a method for processing gut microbial sequencing data is realized.
本申请实施例提供了一种处理器,处理器用于运行程序,其中,程序运行时执行肠道微生物测序数据的处理方法。The embodiment of the present application provides a processor, which is used to run a program, wherein the intestinal microbial sequencing data processing method is executed when the program is running.
An embodiment of the present application provides a device that includes a processor, a memory, and a program stored in the memory and executable on the processor. When executing the program, the processor implements the following steps: acquiring sequencing data of the intestinal microbial flora of a target object; annotating the sequencing data according to a standard gene database to obtain an annotation result; and, according to the annotation result, evaluating the intestinal microbial flora of the target object to obtain functional information of the intestinal microorganisms of the target object.
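The three steps executed by the processor can be pictured as a small pipeline. The sketch below is only an assumed arrangement: the function names `annotate`, `evaluate`, and `process`, the in-memory dictionaries standing in for the sequencing data and the standard gene database, and the sequence identifiers are all hypothetical, and the alignment of real reads against a real database is not modeled.

```python
def annotate(sequencing_data, standard_db):
    """Step 2: keep sequences present in the database and attach their function labels."""
    return {
        seq_id: (abundance, standard_db[seq_id])
        for seq_id, abundance in sequencing_data.items()
        if seq_id in standard_db
    }


def evaluate(annotation):
    """Step 3: aggregate annotated abundances per function and normalise them."""
    totals = {}
    for abundance, functions in annotation.values():
        for function in functions:
            totals[function] = totals.get(function, 0.0) + abundance
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {function: value / grand_total for function, value in totals.items()}


def process(sequencing_data, standard_db):
    """Run annotation and evaluation on already acquired data (step 1 is the input)."""
    return evaluate(annotate(sequencing_data, standard_db))


# Toy inputs (illustrative only): seq3 is absent from the database and is dropped.
sequencing_data = {"seq1": 0.5, "seq2": 0.25, "seq3": 0.25}
standard_db = {"seq1": ["F1"], "seq2": ["F2"]}
functional_info = process(sequencing_data, standard_db)
```

Keeping annotation and evaluation as separate functions loosely mirrors the separation between the annotation module and the second acquisition module named in the description.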
Optionally, the functional information of the intestinal microorganisms of the target object includes the functional genes of the intestinal microorganisms and the relative abundance of each functional gene. The relative abundance of a functional gene is the ratio of the sum of the relative abundances of the genes belonging to the same function in the sequencing data to the sum of the relative abundances of the genes of all functions in the sequencing data. The relative abundances of the genes belonging to the same function are obtained from the annotation result.
Optionally, after the functional information of the intestinal microorganisms of the target object is obtained, the method further includes: performing performance analysis on the intestinal microorganisms of the target object based on the functional information, the performance analysis involving at least one of the following: antibiotic resistance gene analysis and intestinal function gene analysis.
Optionally, where the performance analysis involves antibiotic resistance gene analysis, performing performance analysis on the intestinal microorganisms of the target object based on the functional information includes: calculating an antibiotic resistance index of the intestinal microorganisms of the target object, and determining the position of the antibiotic resistance index in a reference population. Where the performance analysis involves intestinal function gene analysis, performing performance analysis on the intestinal microorganisms of the target object based on the functional information includes: calculating an intestinal function index of the intestinal microorganisms of the target object, and determining the position of the intestinal function index in the reference population.
Optionally, the antibiotic resistance index is the relative abundance of antibiotic resistance genes, calculated as follows: the relative abundances of the genes for the same antibiotic resistance in the intestinal microorganisms of the target object are summed; this sum is then divided by the sum of the relative abundances of all antibiotic resistance genes to give the relative abundance of the antibiotic resistance genes. The intestinal function index is calculated as follows: the sum of the relative abundances of the functional genes belonging to the same intestinal function is the intestinal function index.
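The two indices can be sketched as follows, under stated assumptions. The function names, the dictionary inputs, the grouping of resistance genes by antibiotic class, and the gut function labels are hypothetical choices for illustration; the gene names are used only as familiar examples, and the abundance values are invented.

```python
def antibiotic_resistance_index(arg_abundance, arg_to_antibiotic):
    """Per antibiotic: summed abundance of its resistance genes divided by
    the summed abundance of all resistance genes (hypothetical helper)."""
    sums = {}
    for gene, abundance in arg_abundance.items():
        antibiotic = arg_to_antibiotic[gene]
        sums[antibiotic] = sums.get(antibiotic, 0.0) + abundance
    total = sum(sums.values())
    if total == 0:
        return {}
    return {antibiotic: value / total for antibiotic, value in sums.items()}


def gut_function_index(gene_abundance, gene_to_gut_function):
    """Per gut function: plain sum of the abundances of its functional genes,
    without normalisation, as the text describes."""
    index = {}
    for gene, abundance in gene_abundance.items():
        function = gene_to_gut_function.get(gene)
        if function is not None:
            index[function] = index.get(function, 0.0) + abundance
    return index


# Invented abundances for illustration.
arg_abundance = {"tetM": 0.02, "tetW": 0.01, "blaTEM": 0.01}
arg_to_antibiotic = {"tetM": "tetracycline", "tetW": "tetracycline", "blaTEM": "beta-lactam"}
resistance = antibiotic_resistance_index(arg_abundance, arg_to_antibiotic)

gene_abundance = {"g1": 0.05, "g2": 0.15, "g3": 0.10, "g4": 0.70}
gene_to_gut_function = {"g1": "butyrate_production", "g2": "butyrate_production", "g3": "mucin_degradation"}
gut_index = gut_function_index(gene_abundance, gene_to_gut_function)
```

Note the asymmetry taken from the text: the antibiotic resistance index is normalised by the total over all resistance genes, whereas the intestinal function index is an unnormalised sum.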
Optionally, where the performance analysis involves antibiotic resistance gene analysis and the antibiotic resistance index of the intestinal microorganisms of the target object has been calculated from the functional information, the method further includes importing that antibiotic resistance index into the database of the reference population for use in the next performance analysis. Where the performance analysis involves intestinal function gene analysis and the intestinal function index of the intestinal microorganisms of the target object has been calculated from the functional information, the method further includes importing that intestinal function index into the database of the reference population for use in the next performance analysis. The device herein may be a server, a PC, a tablet (PAD), a mobile phone, or the like.
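Locating an index within the reference population and then importing it for later analyses can be sketched as below. The text does not say how the position is computed, so the percentile-style rank used here (the fraction of reference values less than or equal to the new value) is an assumption, as are the class name `ReferencePopulation`, its methods, and the sample values.

```python
from bisect import bisect_right, insort


class ReferencePopulation:
    """In-memory stand-in for the reference population database (hypothetical)."""

    def __init__(self, values=()):
        self._values = sorted(values)

    def __len__(self):
        return len(self._values)

    def position(self, value):
        """Fraction of reference values less than or equal to `value`;
        returns None when the reference population is empty."""
        if not self._values:
            return None
        return bisect_right(self._values, value) / len(self._values)

    def add(self, value):
        """Import a newly calculated index so later analyses include it."""
        insort(self._values, value)


# Invented reference indices for illustration.
reference = ReferencePopulation([0.10, 0.20, 0.30, 0.40])
new_index = 0.25
position_before = reference.position(new_index)  # rank against the existing population
reference.add(new_index)                         # import for the next analysis
position_after = reference.position(new_index)   # the population now includes the new value
```

Ranking first and importing afterwards keeps a sample from being compared against itself; whether the original system follows this order is not stated in the text.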
The present application further provides a computer program product which, when executed on a data processing device, is adapted to execute a program initialized with the following method steps: acquiring sequencing data of the intestinal microbial flora of a target object; annotating the sequencing data according to a standard gene database to obtain an annotation result; and, according to the annotation result, evaluating the intestinal microbial flora of the target object to obtain functional information of the intestinal microorganisms of the target object.
Optionally, the functional information of the intestinal microorganisms of the target object includes the functional genes of the intestinal microorganisms and the relative abundance of each functional gene. The relative abundance of a functional gene is the ratio of the sum of the relative abundances of the genes belonging to the same function in the sequencing data to the sum of the relative abundances of the genes of all functions in the sequencing data. The relative abundances of the genes belonging to the same function are obtained from the annotation result.
Optionally, after the functional information of the intestinal microorganisms of the target object is obtained, the method further includes: performing performance analysis on the intestinal microorganisms of the target object based on the functional information, the performance analysis involving at least one of the following: antibiotic resistance gene analysis and intestinal function gene analysis.
Optionally, where the performance analysis involves antibiotic resistance gene analysis, performing performance analysis on the intestinal microorganisms of the target object based on the functional information includes: calculating an antibiotic resistance index of the intestinal microorganisms of the target object, and determining the position of the antibiotic resistance index in a reference population. Where the performance analysis involves intestinal function gene analysis, performing performance analysis on the intestinal microorganisms of the target object based on the functional information includes: calculating an intestinal function index of the intestinal microorganisms of the target object, and determining the position of the intestinal function index in the reference population.
Optionally, the antibiotic resistance index is the relative abundance of antibiotic resistance genes, calculated as follows: the relative abundances of the genes for the same antibiotic resistance in the intestinal microorganisms of the target object are summed; this sum is then divided by the sum of the relative abundances of all antibiotic resistance genes to give the relative abundance of the antibiotic resistance genes. The intestinal function index is calculated as follows: the sum of the relative abundances of the functional genes belonging to the same intestinal function is the intestinal function index.
Optionally, where the performance analysis involves antibiotic resistance gene analysis and the antibiotic resistance index of the intestinal microorganisms of the target object has been calculated from the functional information, the method further includes importing that antibiotic resistance index into the database of the reference population for use in the next performance analysis. Where the performance analysis involves intestinal function gene analysis and the intestinal function index of the intestinal microorganisms of the target object has been calculated from the functional information, the method further includes importing that intestinal function index into the database of the reference population for use in the next performance analysis.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or other programmable data processing equipment to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing equipment produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing equipment to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing equipment, so that a series of operational steps is performed on the computer or other programmable equipment to produce computer-implemented processing. The instructions executed on the computer or other programmable equipment thus provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may take forms such as volatile memory in computer-readable media, random access memory (RAM), and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise", and any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of additional identical elements in the process, method, article, or device that includes the element.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The above are merely embodiments of the present application and are not intended to limit it. For those skilled in the art, the present application may be subject to various modifications and changes. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present application shall fall within the scope of the claims of the present application.
Industrial applicability
The present application adopts the following steps: acquiring sequencing data of the intestinal microbial flora of a target object; annotating the sequencing data according to a standard gene database to obtain an annotation result; and, according to the annotation result, evaluating the intestinal microbial flora of the target object to obtain functional information of the intestinal microorganisms of the target object. Compared with the related prior art, the above method can not only analyze the relative abundance of genes but also evaluate the intestinal microbial flora of the target object and the state of that flora based on the relative abundance of each gene, and thereby obtain relatively complete functional information on the intestinal microorganisms of the target object. That is, the information provided by the above method is more comprehensive and more diversified, and can meet individualized needs.

Claims (14)

  1. A method for processing intestinal microbial sequencing data, wherein the method comprises:
    acquiring sequencing data of the intestinal microbial flora of a target object;
    annotating the sequencing data according to a standard gene database to obtain an annotation result;
    according to the annotation result, evaluating the intestinal microbial flora of the target object to obtain functional information of the intestinal microorganisms of the target object.
  2. The method according to claim 1, wherein the functional information of the intestinal microorganisms of the target object comprises functional genes of the intestinal microorganisms and the relative abundance of each of the functional genes, wherein the relative abundance of a functional gene is the ratio of the sum of the relative abundances of the genes related to the same function in the sequencing data to the sum of the relative abundances of the genes related to all functions in the sequencing data, and the relative abundances of the genes related to the same function are obtained from the annotation result.
  3. The method according to claim 1 or 2, wherein, after the functional information of the intestinal microorganisms of the target object is obtained, the method further comprises:
    performing performance analysis on the intestinal microorganisms of the target object based on the functional information, the performance analysis involving at least one of the following: antibiotic resistance gene analysis and intestinal function gene analysis.
  4. The method according to claim 3, wherein
    where the performance analysis involves antibiotic resistance gene analysis, performing performance analysis on the intestinal microorganisms of the target object based on the functional information comprises: calculating an antibiotic resistance index of the intestinal microorganisms of the target object, and determining the position of the antibiotic resistance index in a reference population;
    where the performance analysis involves intestinal function gene analysis, performing performance analysis on the intestinal microorganisms of the target object based on the functional information comprises: calculating an intestinal function index of the intestinal microorganisms of the target object, and determining the position of the intestinal function index in the reference population.
  5. The method according to claim 4, wherein
    the antibiotic resistance index is the relative abundance of antibiotic resistance genes, and the relative abundance of the antibiotic resistance genes is calculated as follows: calculating the sum of the relative abundances of the genes related to the same antibiotic resistance in the intestinal microorganisms of the target object; and dividing the sum of the relative abundances of the genes related to the same antibiotic resistance by the sum of the relative abundances of all genes related to antibiotic resistance, to obtain the relative abundance of the antibiotic resistance genes;
    the intestinal function index is calculated as follows: the sum of the relative abundances of the functional genes related to the same function is the intestinal function index.
  6. The method according to claim 4, wherein
    where the performance analysis involves antibiotic resistance gene analysis, and the antibiotic resistance index of the intestinal microorganisms of the target object has been calculated by performing performance analysis on the intestinal microorganisms of the target object based on the functional information, the method further comprises importing the antibiotic resistance index of the intestinal microorganisms of the target object into the database of the reference population for use in the next performance analysis step;
    where the performance analysis involves intestinal function gene analysis, and the intestinal function index of the intestinal microorganisms of the target object has been calculated by performing performance analysis on the intestinal microorganisms of the target object based on the functional information, the method further comprises importing the intestinal function index of the intestinal microorganisms of the target object into the database of the reference population for use in the next performance analysis step.
  7. An apparatus for processing intestinal microbial sequencing data, wherein the apparatus comprises:
    a first acquisition module, configured to acquire sequencing data of the intestinal microbial flora of a target object;
    an annotation module, configured to annotate the sequencing data according to a standard gene database to obtain an annotation result;
    a second acquisition module, configured to evaluate the intestinal microbial flora of the target object according to the annotation result and obtain functional information of the intestinal microorganisms of the target object.
  8. The apparatus according to claim 7, wherein the functional information of the intestinal microorganisms of the target object comprises functional genes of the intestinal microorganisms and the relative abundance of each of the functional genes, wherein the relative abundance of a functional gene is the ratio of the sum of the relative abundances of the genes related to the same function in the sequencing data to the sum of the relative abundances of the genes related to all functions in the sequencing data, and the relative abundances of the genes related to the same function are obtained from the annotation result.
  9. The apparatus according to claim 7 or 8, wherein the apparatus further comprises: a performance analysis module, configured to perform performance analysis on the intestinal microorganisms of the target object based on the functional information, the performance analysis involving at least one of the following: antibiotic resistance gene analysis and intestinal function gene analysis.
  10. The apparatus according to claim 9, wherein
    where the performance analysis involves antibiotic resistance gene analysis, the performance analysis module comprises: a first calculation module, configured to calculate an antibiotic resistance index of the intestinal microorganisms of the target object; and a first position determination module, configured to determine the position of the antibiotic resistance index in a reference population;
    where the performance analysis involves intestinal function gene analysis, the performance analysis module comprises: a second calculation module, configured to calculate an intestinal function index of the intestinal microorganisms of the target object; and a second position determination module, configured to determine the position of the intestinal function index in the reference population.
  11. The apparatus according to claim 10, wherein
    the first calculation module comprises: a first summation unit, configured to calculate, where the antibiotic resistance index is the relative abundance of antibiotic resistance genes, the sum of the relative abundances of the genes related to the same antibiotic resistance in the intestinal microorganisms of the target object; and a division unit, configured to divide the sum of the relative abundances of the genes related to the same antibiotic resistance by the sum of the relative abundances of all genes related to antibiotic resistance, to obtain the relative abundance of the antibiotic resistance genes;
    the second calculation module comprises: a second summation unit, configured to sum the relative abundances of the functional genes related to the same function to obtain the intestinal function index.
  12. The apparatus according to claim 10, wherein the performance analysis module further comprises:
    a first import module, configured to, where the performance analysis involves antibiotic resistance gene analysis and the first calculation module has calculated the antibiotic resistance index of the intestinal microorganisms of the target object, import the antibiotic resistance index of the intestinal microorganisms of the target object into the database of the reference population for use in the next performance analysis step;
    a second import module, configured to, where the performance analysis involves intestinal function gene analysis and the second calculation module has calculated the intestinal function index of the intestinal microorganisms of the target object, import the intestinal function index of the intestinal microorganisms of the target object into the database of the reference population for use in the next performance analysis step.
  13. A storage medium comprising a stored program, wherein the program executes the method for processing intestinal microbial sequencing data according to any one of claims 1 to 6.
  14. A processor configured to run a program, wherein the program, when running, executes the method for processing intestinal microbial sequencing data according to any one of claims 1 to 6.
PCT/CN2019/129426 2019-01-15 2019-12-27 Method and device for processing intestinal microorganism sequencing data, storage medium, and processor WO2020147557A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910038192.8A CN111161795A (en) 2019-01-15 2019-01-15 Intestinal microorganism sequencing data processing method and device, storage medium and processor
CN201910038192.8 2019-01-15

Publications (1)

Publication Number Publication Date
WO2020147557A1 true WO2020147557A1 (en) 2020-07-23

Family

ID=70555610

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/129426 WO2020147557A1 (en) 2019-01-15 2019-12-27 Method and device for processing intestinal microorganism sequencing data, storage medium, and processor

Country Status (2)

Country Link
CN (1) CN111161795A (en)
WO (1) WO2020147557A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111870617B (en) * 2019-11-04 2022-06-10 深圳碳云智能数字生命健康管理有限公司 Method and device for determining intestinal probiotic supplement formula, storage medium and processor
CN112037847A (en) * 2020-09-15 2020-12-04 中国科学院微生物研究所 Microbial strain genome analysis method and device and electronic equipment
CN113611357A (en) * 2020-11-17 2021-11-05 上海美吉生物医药科技有限公司 Resistance gene analysis method, device, medium and terminal based on metagenome

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040142325A1 (en) * 2001-09-14 2004-07-22 Liat Mintz Methods and systems for annotating biomolecular sequences
CN102517392A (en) * 2011-12-26 2012-06-27 深圳华大基因研究院 Metagenome 16S hypervariable region V3 based classification method and device thereof
CN107937582A (en) * 2017-12-29 2018-04-20 苏州普瑞森基因科技有限公司 A kind of primer sets and its application for being used to analyze enteric microorganism
CN108138219A (en) * 2014-09-25 2018-06-08 阿雷斯遗传学有限公司 For predicting the heredity test of klebsiella species combating microorganisms agent resistance

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016050111A1 (en) * 2014-09-30 2016-04-07 Bgi Shenzhen Biomarkers for rheumatoid arthritis and usage thereof
CN105046094B (en) * 2015-08-26 2018-08-14 深圳谱元科技有限公司 Detection system and method for intestinal flora, and dynamic database
CN107254518A (en) * 2017-05-24 2017-10-17 中山大学 Quantitative detection method for antibiotic resistance genes of intestinal bacteria
CN108804875B (en) * 2018-06-21 2020-11-17 中国科学院北京基因组研究所 Method for analyzing microbial population function by using metagenome data

Also Published As

Publication number Publication date
CN111161795A (en) 2020-05-15

Similar Documents

Publication Publication Date Title
Asgari et al. MicroPheno: predicting environments and host phenotypes from 16S rRNA gene sequencing using a k-mer based representation of shallow sub-samples
Xia et al. Hypothesis testing and statistical analysis of microbiome
Jeffery et al. Composition and temporal stability of the gut microbiota in older persons
Kraal et al. The prevalence of species and strains in the human microbiome: a resource for experimental efforts
Tang et al. A general framework for association analysis of microbial communities on a taxonomic tree
WO2020140848A1 (en) Intestinal microbe sequencing data processing method and device, storage medium, and processor
Zhang et al. A distance-based approach for testing the mediation effect of the human microbiome
Volant et al. SHAMAN: a user-friendly website for metataxonomic analysis from raw reads to statistical analysis
WO2020147557A1 (en) Method and device for processing intestinal microorganism sequencing data, storage medium, and processor
Robinson et al. Intricacies of assessing the human microbiome in epidemiologic studies
Carr et al. Reconstructing the genomic content of microbiome taxa through shotgun metagenomic deconvolution
Wani et al. Metagenomics and artificial intelligence in the context of human health
Ni et al. COMAN: a web server for comprehensive metatranscriptomics analysis
Lan et al. POGO-DB—a database of pairwise-comparisons of genomes and conserved orthologous genes
Harrison et al. Fungal microbiomes are determined by host phylogeny and exhibit widespread associations with the bacterial microbiome
Choi et al. Sparsely correlated hidden Markov models with application to genome-wide location studies
Hurtado et al. WGS-Based lineage and antimicrobial resistance pattern of Salmonella Typhimurium isolated during 2000–2017 in Peru
Vokou et al. Metagenomic characterization reveals pronounced seasonality in the diversity and structure of the phyllosphere bacterial community in a Mediterranean ecosystem
Missarova et al. Sensitive cluster-free differential expression testing
Aggarwal et al. Pangenomics in microbial and crop research: progress, applications, and perspectives
Chetty et al. Multi-omic approaches for host-microbiome data integration
Song et al. An adaptive independence test for microbiome community data
Kim Bioinformatic and Statistical Analysis of Microbiome Data
Yadav et al. OTUX: V-region specific OTU database for improved 16S rRNA OTU picking and efficient cross-study taxonomic comparison of microbiomes
An et al. Statistical approach of functional profiling for a microbial community

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application. Ref document number: 19910708; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established. Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 20/12/2021)
122 Ep: pct application non-entry in european phase. Ref document number: 19910708; Country of ref document: EP; Kind code of ref document: A1