CN114093411B

CN114093411B - Method and equipment for analyzing evolutionary relationship and abundance information of microbial population based on sample

Info

Publication number: CN114093411B
Application number: CN202111430273.6A
Authority: CN
Inventors: 何昆仑; 韩洋; 贾志龙; 宋欣雨; 于康
Original assignee: Chinese PLA General Hospital
Current assignee: Chinese PLA General Hospital
Priority date: 2021-11-29
Filing date: 2021-11-29
Publication date: 2022-08-09
Anticipated expiration: 2041-11-29
Also published as: CN114093411A

Abstract

The invention relates to a method and equipment for analyzing the evolutionary relationships and abundance information of microbial populations based on samples. The method comprises the following steps: acquiring genetic information of the microbial populations in a sample, constructing a phylogenetic tree of those populations, and extracting features of the evolutionary relationships among the microbial populations from the phylogenetic tree; acquiring abundance information of the microbial populations in the sample and extracting features of that abundance information; fusing the features of the evolutionary relationships with the features of the abundance information to obtain a feature set; and inputting the feature set into a classifier to obtain a classification result for the sample. By integrating the evolutionary information provided by the microbial phylogenetic tree with microbial abundance information, the method uncovers biological patterns hidden in microbial data and therefore has significant application value.

Description

Method and equipment for analyzing evolutionary relationship and abundance information of microbial populations based on samples

Technical Field

The invention relates to the field of microbial data analysis, and in particular to a method, a device, a system, a computer-readable storage medium and applications thereof for analyzing the evolutionary relationships and abundance information of microbial populations based on samples.

Background

The microbiome plays an important role in human health and in the development of disease. In recent years, a large body of research has shown that the composition and structure of the intestinal flora are closely related to many chronic systemic metabolic diseases, such as diabetes and obesity, and even to cancer. Microorganisms also affect the developmental maturation of the body's immune system, and a growing body of research suggests that human health is closely related to the microorganisms living in the body. Human-associated microorganisms are highly diverse and differ greatly in distribution and abundance across the body, and the sheer volume and complexity of microbiome data pose major challenges for researchers. AI offers a new tool for analyzing microbiome data and can help reveal more associations between the microbiome and host health. However, current machine-learning and AI research on microbes mainly uses microbial abundances and DNA sequences to study the relationships between microbes and disease or population characteristics, and does not consider the relationships among the microbes themselves.

In recent years, graph neural networks, an emerging technique for learning from graph data, have attracted extensive attention in deep learning because they combine graph-structured data with deep learning. A graph is a general data representation for describing relationships and has a wide range of applications. Analyzing a microbial relationship graph with a graph neural network, and further combining it with other data, is a highly innovative approach in the life sciences and can benefit research in this field.

Disclosure of Invention

The method integrates the evolutionary information provided by the microbial phylogenetic tree with microbial abundance information, and uncovers the biological patterns hidden in microbial data in order to address related problems in the life sciences.

The application discloses a method for analyzing evolutionary relationship and abundance information of microbial populations based on a sample, comprising:

acquiring genetic information of microbial communities in a sample, constructing a phylogenetic tree of the microbial communities in the sample, and extracting characteristics of the evolutionary relationship among the microbial communities according to the phylogenetic tree;

acquiring abundance information of microbial populations of a sample, and extracting characteristics of the abundance information of the microbial populations;

fusing features of the evolutionary relationships among the populations of microorganisms with features of the abundance information of the populations of microorganisms to obtain a set of features;

and inputting the feature set into a classifier to obtain a classification result of the sample.

Further, the analysis method also comprises: extracting features of the abundance information of the microbial populations, fusing the features of the feature set with the features of the abundance information of the microbial populations to obtain a fused multi-dimensional feature set, and inputting the multi-dimensional feature set into a classifier to obtain a classification result for the sample;

optionally, the abundance information of the microbial populations is convolved to obtain abundance features, the feature set is convolved to obtain features of the feature set, and the two sets of convolved features are fused to obtain the fused multi-dimensional feature set.

Further, the genetic information of the microbial populations of the sample is obtained using a method comprising high-throughput sequencing; optionally, the high-throughput sequencing is of two types: amplicon sequencing of the 16S rDNA, 18S rDNA or ITS regions, and metagenomic sequencing;

optionally, the genetic information of the microbial populations is the DNA or protein sequence and/or the DNA or protein structure of the microbial populations, from which the phylogenetic tree of the microbial populations is constructed;

optionally, the phylogenetic tree of the microbial populations of the sample is constructed using a distance method, the maximum parsimony method, the maximum likelihood method or a Bayesian method;

optionally, a relationship matrix among the microbial populations is obtained from the phylogenetic tree, and the evolutionary relationships among the microbial populations are extracted.

Further, the abundance information of the microbial populations of the sample is obtained based on an OTU analysis;

optionally, after the abundance information of the microbial populations in the sample is obtained, preprocessing including normalization is performed.

Further, the sample is a host of a microbial population or a growth environment of a microbial population; optionally, the host comprises a human, an animal or a plant, and the growth environment comprises soil or water; optionally, the sample is from the mouth, intestine, skin, stomach, esophagus, stool, urethra, blood, eye, nasopharynx, external auditory canal, vagina, or lung of the host.

An analysis apparatus based on evolutionary relationship and abundance information of a population of microorganisms of a sample, the apparatus comprising: a memory and a processor;

the memory is to store program instructions;

the processor is configured to invoke program instructions that, when executed, perform the above-described method of analyzing the evolutionary relationship and abundance information of a population of microorganisms based on a sample.

An analysis system based on evolutionary relationship and abundance information of a population of microorganisms of a sample, comprising:

a first acquisition unit configured to acquire genetic information of microbial populations of a sample, construct a phylogenetic tree of the microbial populations of the sample, and extract characteristics of an evolutionary relationship among the microbial populations based on the phylogenetic tree;

a second acquisition unit configured to acquire abundance information of a microbial population of a sample and extract a feature of the abundance information of the microbial population;

a first fusion unit for fusing the characteristics of the evolutionary relationship among the microbial populations with the characteristics of the abundance information of the microbial populations to obtain a set of characteristics;

and the classification unit is used for inputting the feature set into a classifier to obtain a classification result of the sample.

Optionally, the system further comprises:

a second fusion unit used for fusing the feature set with the features of the abundance information of the microbial population from the second acquisition unit to obtain a fused multi-dimensional feature set;

and the classification unit is used for inputting the multi-dimensional feature set into a classifier to obtain a classification result of the sample.

A computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the above-described method for analyzing evolutionary relationship and abundance information of a population of microorganisms based on a sample.

Any of the following applications:

the application of the device in diagnosing the occurrence and development of diseases; optionally, the development of the disease is associated with changes in the types and numbers of microorganisms; optionally, the disease includes hypertension, obesity, tumors, food allergy, cholelithiasis, urinary incontinence, acne, osteoarthritis, inflammatory bowel disease, type 2 diabetes (T2D), constipation, recurrent urinary tract infection (UTI), celiac disease, colitis, kidney disease, neurological disease, and the like.

the use of the apparatus described above to assist in selecting a disease treatment regimen; optionally, such treatment regimens, which are affected by changes in the types and amounts of microorganisms, include the selection of therapeutic drugs, whether to administer immunotherapy, and the like;

The application of the equipment in ecosystem monitoring;

the application of the device in sample classification or prediction of the properties of the sample;

the application of the system in diagnosis of the occurrence and development of diseases; optionally the development of said disease is associated with changes in the type and number of microorganisms;

the use of the system described above in sample classification or predicting attributes of a sample;

use of the system described above to assist in the selection of a disease treatment regimen;

the application of the system in ecosystem monitoring.

The application has the advantages that:

1. the application uncovers, at a deep level, the biological patterns hidden in microbial data, and improves the accuracy and depth of data analysis through in-depth, multi-dimensional analysis of the evolutionary relationships, abundance information and other properties of microbial populations;

2. the application fuses features of the evolutionary relationships among the microbial communities with features of their abundance information to obtain a feature set, and then fuses this feature set with the abundance features again to obtain a multi-dimensional feature set, thereby making full use of microbial community data;

3. the application discloses an analysis device and system based on the evolutionary relationships and abundance information of the microbial populations of a sample; through in-depth interpretation of a sample's microbial population data, the application can be applied more accurately to assisting the diagnosis of disease occurrence and development associated with changes in microbial species and quantity, assisting the selection of treatment regimens, ecosystem monitoring, predicting sample attributes, and the like.

Drawings

To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can derive other drawings from them without creative effort.

FIG. 1 is a schematic flow chart diagram of analysis of evolutionary relationship and abundance information of a sample-based population of microorganisms provided by an embodiment of the present invention;

FIG. 2 is a schematic diagram of an analysis apparatus for analyzing information on the evolutionary relationships and abundances of microbial populations based on a sample, according to an embodiment of the present invention;

FIG. 3 is a schematic flow chart of an analysis system for sample-based information on the evolutionary relationships and abundances of microbial populations provided by embodiments of the present invention;

FIG. 4 shows the classification results of multiple deep learning algorithms on the Han and Tibetan population dataset according to an embodiment of the present invention.

Detailed Description

In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention.

Some of the flows described in this specification, the claims and the above figures include multiple operations that appear in a particular order; however, these operations may be performed out of the order in which they appear herein, or in parallel. Operation labels such as 101 and 102 merely distinguish the operations and do not by themselves imply any order of execution. In addition, the flows may include more or fewer operations, which may be performed sequentially or in parallel. It should be noted that the terms "first", "second", etc. in this document are used to distinguish different messages, devices, modules, etc.; they do not imply a sequential order, nor do they require that the "first" and "second" items be of different types.

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.

Fig. 1 is a schematic flow chart of a method for analyzing evolutionary relationship and abundance information of microbial populations based on a sample, which includes the following steps:

101: acquiring genetic information of microbial communities in a sample, constructing a phylogenetic tree of the microbial communities in the sample, and extracting characteristics of the evolutionary relationship among the microbial communities according to the phylogenetic tree;

In one embodiment, the genetic information of the microbial populations of the sample is obtained using a method comprising high-throughput sequencing; optionally, the high-throughput sequencing falls into two categories: sequencing based on the 16S rDNA, 18S rDNA or ITS regions, and metagenomic sequencing.

In one embodiment, the genetic information of a microbial population is the DNA or protein sequence and/or the DNA or protein structure of the microbial population, from which the phylogenetic tree of the microbial population is constructed.

In one embodiment, the phylogenetic tree of the microbial populations of the sample is constructed using a distance method, the maximum parsimony method, the maximum likelihood method or a Bayesian method; optionally, a relationship matrix among the microbial populations is obtained from the phylogenetic tree, and the evolutionary relationships among the microbial populations are extracted.

In one embodiment, a relationship matrix between populations of microorganisms is obtained from the phylogenetic tree, and a graph of relationships between populations of microorganisms is constructed using a graph-related software package (e.g., NetworkX) to extract evolutionary relationships between the populations of microorganisms.
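The text does not specify how the relationship matrix is derived from the tree. The sketch below assumes one common choice: the patristic distance (the sum of branch lengths on the path between two leaves), converted into a similarity weight exp(-d / scale) and thresholded into graph edges. The tree encoding (`child -> (parent, branch_length)`), the helper names, and the `scale` and `threshold` values are hypothetical. The resulting edge list could equally be loaded into a NetworkX graph with `Graph.add_edge(u, v, weight=w)`.

```python
import math
from itertools import combinations

# Hypothetical toy tree: child -> (parent, branch_length). "root" has no parent entry.
TREE = {
    "n1": ("root", 0.1),
    "n2": ("root", 0.3),
    "taxonA": ("n1", 0.2),
    "taxonB": ("n1", 0.2),
    "taxonC": ("n2", 0.1),
    "taxonD": ("n2", 0.4),
}


def distances_to_ancestors(node, tree):
    """Map the node and each of its ancestors to the path length from the node."""
    dist = {node: 0.0}
    total = 0.0
    while node in tree:
        node, length = tree[node]
        total += length
        dist[node] = total
    return dist


def patristic_distance(a, b, tree):
    """Sum of branch lengths on the path between leaves a and b."""
    anc_a = distances_to_ancestors(a, tree)
    anc_b = distances_to_ancestors(b, tree)
    # The lowest common ancestor gives the shortest combined path.
    return min(anc_a[n] + anc_b[n] for n in anc_a if n in anc_b)


def relationship_edges(leaves, tree, scale=1.0, threshold=0.5):
    """Weighted edges {(a, b): exp(-d / scale)}, kept only when the weight >= threshold."""
    edges = {}
    for a, b in combinations(leaves, 2):
        weight = math.exp(-patristic_distance(a, b, tree) / scale)
        if weight >= threshold:
            edges[(a, b)] = weight
    return edges


EDGES = relationship_edges(["taxonA", "taxonB", "taxonC", "taxonD"], TREE)
```

With these toy values only the sister pairs (taxonA, taxonB) and (taxonC, taxonD) are close enough to be linked; lowering `threshold` adds the more distant pairs.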

In one embodiment, genetic information of the microbial populations in a sample is obtained, a phylogenetic tree of the microbial populations is constructed, a relationship graph among the microbial populations is obtained, and a graph neural network is used to extract features of the evolutionary relationships among the microbial populations. Optionally, the graph neural network may be a graph convolutional network (GCN), a graph attention network (GAT), a Graph LSTM, or the like.
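As an illustration of the GCN option, the sketch below uses the widely used propagation rule ReLU(D^-1/2 (A + I) D^-1/2 X W). The patent names GCN but does not give its exact formulation, so this normalization, the toy adjacency matrix, the per-taxon features and the weight matrix are all assumptions. In a trained model W would be learned and several layers would typically be stacked.

```python
import numpy as np


def normalized_adjacency(adj):
    """Symmetrically normalised adjacency with self-loops: D^-1/2 (A + I) D^-1/2."""
    a_hat = np.asarray(adj, dtype=float) + np.eye(len(adj))
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(adj_norm, features, weight):
    """One graph-convolution layer: ReLU(A_norm @ X @ W)."""
    return np.maximum(adj_norm @ features @ weight, 0.0)


# Hypothetical 4-taxon graph: taxa 0-1 and taxa 2-3 are phylogenetically close.
ADJ = np.array([
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
], dtype=float)
# One input feature per taxon (for example a normalised abundance value).
X = np.array([[1.0], [3.0], [0.0], [2.0]])
# Identity weight, so the output shows pure neighbourhood averaging.
W = np.array([[1.0]])

H = gcn_layer(normalized_adjacency(ADJ), X, W)
```

Each taxon's output mixes its own value with those of its phylogenetic neighbours (here each output is the mean of the linked pair), which is how evolutionary structure enters the learned features.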

16S rDNA sequencing: 16S rDNA is the DNA sequence encoding the small-subunit ribosomal RNA of prokaryotes. A HiSeq or the newer MiSeq sequencer is used to sequence a hypervariable region of the 16S rDNA, and the diversity of bacteria or archaea in environmental microorganisms is then analyzed.

18S rDNA sequencing: 18S rDNA is the DNA sequence encoding eukaryotic small-subunit ribosomal RNA. Structurally, it is divided into conserved regions and hypervariable regions: the conserved regions reflect the genetic relationships among biological species, while the hypervariable regions reflect the differences between species.

ITS sequencing: the ITS (internal transcribed spacer) consists of two regions, ITS1 and ITS2. ITS1 lies between the 18S and 5.8S rDNA sequences of the eukaryotic ribosome, and ITS2 lies between the 5.8S and 28S rDNA sequences. ITS1 or ITS2 is sequenced to further analyze fungal diversity in environmental microorganisms.

Metagenomics refers to the simultaneous study of the DNA of an entire microbial community. Metagenomic sequencing sequences the collective genomes of the microbial community in a sample.

In one embodiment, the sample is a host of a population of microorganisms or a growth environment of a population of microorganisms; optionally, the host comprises a human, an animal or a plant, and the growth environment comprises soil or water; optionally, the sample is from the mouth, intestine, skin, stomach, esophagus, stool, urethra, blood, eye, nasopharynx, external auditory canal, vagina, or lung of the host.

Sequencing can currently be performed on a variety of sequencers, including the Roche 454; the Illumina NovaSeq, MiSeq and HiSeq; the Life Technologies Ion PGM; and third-generation sequencers from PacBio and Oxford Nanopore.

The phylogenetic tree of the microbial community in the sample is constructed using a distance method, the maximum parsimony method, the maximum likelihood method or a Bayesian method.

The distance method is represented by the Neighbor-Joining (NJ) method, which is suitable for constructing phylogenetic trees from short sequences with small evolutionary distances and few informative sites.
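A compact sketch of the Neighbor-Joining algorithm as it is commonly presented in textbooks is shown below. The five-taxon distance matrix is a small illustrative example, not data from this application. Splitting the final edge in half (so the Newick string appears rooted) is only a presentational choice; real analyses would normally use dedicated software such as MEGA.

```python
import numpy as np


def neighbor_joining(dist, names):
    """Build a tree with Neighbor-Joining and return it as a Newick string.

    dist: symmetric matrix of pairwise evolutionary distances.
    names: leaf labels in the same order as the matrix rows.
    """
    d = np.array(dist, dtype=float)
    nodes = list(names)
    while len(nodes) > 2:
        n = len(nodes)
        r = d.sum(axis=1)
        # Q(i, j) = (n - 2) * d(i, j) - r_i - r_j; the minimum picks the pair to join.
        q = (n - 2) * d - r[:, None] - r[None, :]
        np.fill_diagonal(q, np.inf)
        i, j = np.unravel_index(np.argmin(q), q.shape)
        # Branch lengths from the new internal node u to i and to j.
        li = 0.5 * d[i, j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i, j] - li
        joined = f"({nodes[i]}:{li:.3f},{nodes[j]}:{lj:.3f})"
        # Distance from u to every remaining node k.
        du = 0.5 * (d[i] + d[j] - d[i, j])
        keep = [k for k in range(n) if k not in (i, j)]
        top = np.hstack([d[np.ix_(keep, keep)], du[keep][:, None]])
        d = np.vstack([top, np.append(du[keep], 0.0)])
        nodes = [nodes[k] for k in keep] + [joined]
    # Join the last two nodes, splitting the edge length in half for display.
    half = d[0, 1] / 2
    return f"({nodes[0]}:{half:.3f},{nodes[1]}:{half:.3f});"


D = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
TREE_NEWICK = neighbor_joining(D, ["a", "b", "c", "d", "e"])
```

For this matrix the first join is (a, b) with branch lengths 2 and 3, and the result is `((c:4.000,(a:2.000,b:3.000):3.000):1.000,(d:2.000,e:1.000):1.000);`.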

The maximum parsimony (MP) method calculates, for every candidate topology, the minimum number of nucleotide (or amino acid) substitutions required during evolution to explain the observed sequences, and selects the topology requiring the fewest substitutions as the optimal phylogenetic tree; that is, among all possible trees, the one with the shortest length is chosen as the final phylogenetic tree, the maximum parsimony tree.
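One standard way to count the minimum number of substitutions on a fixed topology is Fitch's algorithm. The sketch below applies it to a hypothetical four-taxon alignment and compares all three possible unrooted four-taxon topologies exhaustively. Exhaustive comparison is only feasible for a handful of taxa; real MP programs rely on heuristic tree searches.

```python
def fitch(tree, site):
    """Fitch algorithm for one alignment column.

    tree: a leaf name (str) or a (left, right) tuple of subtrees.
    site: dict mapping leaf name -> character observed at this column.
    Returns (candidate state set, minimum substitutions in this subtree).
    """
    if isinstance(tree, str):
        return {site[tree]}, 0
    left_set, left_cost = fitch(tree[0], site)
    right_set, right_cost = fitch(tree[1], site)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # No shared state: take the union and count one substitution.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_length(tree, alignment):
    """Tree length = minimum substitutions summed over all alignment columns."""
    n_cols = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[c] for name, seq in alignment.items()})[1]
        for c in range(n_cols)
    )


# Hypothetical 4-taxon alignment and the three possible unrooted topologies.
ALIGNMENT = {"A": "AAGT", "B": "AAGC", "C": "GTCT", "D": "GTCC"}
TOPOLOGIES = {
    "AB|CD": (("A", "B"), ("C", "D")),
    "AC|BD": (("A", "C"), ("B", "D")),
    "AD|BC": (("A", "D"), ("B", "C")),
}

lengths = {label: parsimony_length(t, ALIGNMENT) for label, t in TOPOLOGIES.items()}
best = min(lengths, key=lengths.get)  # the maximum parsimony tree
```

Here three columns support grouping A with B, so AB|CD needs only 5 substitutions, against 7 and 8 for the alternatives.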

Maximum likelihood was first applied to phylogenetic analysis in the analysis of gene frequency data. For each site, it accumulates the probabilities of all possible residue substitutions, taking into account the probability of each residue occurring at that site, to obtain the likelihood of that site. The maximum likelihood (ML) method calculates the likelihood function for all possible phylogenetic trees, and the tree with the largest likelihood value is taken as the most probable phylogenetic tree.
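The per-site likelihood described above is usually computed with Felsenstein's pruning algorithm. The sketch below assumes the Jukes-Cantor (JC69) substitution model, uniform base frequencies, independent sites and fixed branch lengths; a full ML analysis would also optimize branch lengths and search over topologies. The `quartet` helper and the alignment are hypothetical.

```python
import math

BASES = "ACGT"


def jc69(same, t):
    """JC69 probability of ending in a given base after branch length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if same else 0.25 - 0.25 * e


def conditional_likelihoods(tree, site):
    """Felsenstein pruning: P(observed leaves below node | node state) for A, C, G, T.

    tree: a leaf name (str) or a tuple of (subtree, branch_length) pairs.
    site: dict mapping leaf name -> observed base at this alignment column.
    """
    if isinstance(tree, str):
        return [1.0 if base == site[tree] else 0.0 for base in BASES]
    result = [1.0] * 4
    for child, t in tree:
        child_l = conditional_likelihoods(child, site)
        for x in range(4):
            result[x] *= sum(jc69(x == y, t) * child_l[y] for y in range(4))
    return result


def log_likelihood(tree, alignment):
    """Log-likelihood of the alignment with independent sites and uniform root frequencies."""
    names = list(alignment)
    total = 0.0
    for c in range(len(alignment[names[0]])):
        site = {name: alignment[name][c] for name in names}
        total += math.log(sum(0.25 * p for p in conditional_likelihoods(tree, site)))
    return total


def quartet(p, q, r, s, t=0.1):
    """Hypothetical helper: rooted tree ((p, q), (r, s)) with every branch length set to t."""
    return ((((p, t), (q, t)), t), (((r, t), (s, t)), t))


ALIGNMENT = {"A": "AAGT", "B": "AAGC", "C": "GTCT", "D": "GTCC"}
scores = {
    "AB|CD": log_likelihood(quartet("A", "B", "C", "D"), ALIGNMENT),
    "AC|BD": log_likelihood(quartet("A", "C", "B", "D"), ALIGNMENT),
}
```

The topology with the larger log-likelihood is preferred; with three of four columns grouping A with B, AB|CD scores higher here.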

Bayesian methods use Markov chain Monte Carlo (MCMC) sampling to estimate, under a given molecular evolution model, the posterior probabilities of all parameters, including the tree topology, the branch lengths and the parameters of the substitution model. The method not only quantifies the model parameters directly but can also handle large datasets, and the support for each branch is expressed as a posterior probability, so no bootstrap test is needed.
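To show the MCMC idea on the smallest possible case, the sketch below fixes the topology (two sequences) and samples only one parameter, their JC69 distance d, with a random-walk Metropolis sampler. The Exponential(1) prior, step size, chain length and seed are assumptions. Programs such as MrBayes sample topologies, branch lengths and model parameters jointly.

```python
import math
import random


def jc69_log_likelihood(d, n_sites, n_diff):
    """JC69 log-likelihood (up to a constant) of two sequences at distance d."""
    x = -4.0 * d / 3.0
    p_same = 0.25 + 0.75 * math.exp(x)
    p_diff = -0.25 * math.expm1(x)  # 0.25 * (1 - e^x), accurate for small d
    return (n_sites - n_diff) * math.log(p_same) + n_diff * math.log(p_diff)


def log_prior(d, rate=1.0):
    """Exponential(rate) prior on the distance; zero probability for d <= 0."""
    return math.log(rate) - rate * d if d > 0 else -math.inf


def sample_posterior(n_sites, n_diff, steps=20000, burn_in=2000, step_size=0.05, seed=1):
    """Random-walk Metropolis sampler for the posterior distribution of d."""
    rng = random.Random(seed)
    d = 0.1
    current = jc69_log_likelihood(d, n_sites, n_diff) + log_prior(d)
    samples = []
    for step in range(steps):
        proposal = d + rng.gauss(0.0, step_size)  # symmetric proposal
        if proposal > 0:  # proposals outside the prior's support are rejected
            candidate = jc69_log_likelihood(proposal, n_sites, n_diff) + log_prior(proposal)
            # Accept with probability min(1, posterior ratio).
            if math.log(1.0 - rng.random()) < candidate - current:
                d, current = proposal, candidate
        if step >= burn_in:
            samples.append(d)
    return samples


samples = sorted(sample_posterior(n_sites=100, n_diff=20))
posterior_mean = sum(samples) / len(samples)
credible_95 = (samples[int(0.025 * len(samples))], samples[int(0.975 * len(samples))])
```

The sampled values directly give a posterior mean and a 95% credible interval, the Bayesian counterpart of a bootstrap support value. For 20 differences in 100 sites, the JC69 maximum-likelihood distance is about 0.233, and the posterior concentrates around it.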

Steps for constructing and displaying a phylogenetic tree:

(1) Preparing data: data commonly used to construct phylogenetic trees include morphological data and molecular data. Morphological data are mainly obtained by encoding morphological characters; molecular data are mainly obtained by sequencing or downloaded from public databases such as GenBank.

(2) Processing data: this includes sequence assembly, sequence alignment and correction of ambiguous sites;

(3) Model selection: the best-fitting substitution model is usually evaluated before the phylogenetic tree is constructed, using software such as ModelTest, MrModelTest or jModelTest;

(4) Tree construction: commonly used methods for constructing phylogenetic trees include the distance method, maximum parsimony (MP), maximum likelihood (ML) and Bayesian inference (BI) (Hall, 2008);

(5) Display: commonly used software for editing and displaying phylogenetic trees includes TreeView, FigTree, MEGA, iTOL (http://itol.embl.de/) and R packages (ggtree, ape).

102: acquiring abundance information of microbial populations of a sample, and extracting characteristics of the abundance information of the microbial populations;

In one embodiment, the abundance information of the microbial populations of the sample is obtained by OTU-based analysis.

An OTU (operational taxonomic unit) is a uniform label artificially assigned to a taxonomic unit (strain, species, genus, group, etc.) in phylogenetic or population genetics studies for convenience of analysis. Sequences are typically grouped into OTUs at a 97% similarity threshold, and each OTU is generally regarded as one microbial species. Sequences with less than 97% similarity can be considered to belong to different species, and sequences with less than 93% to 95% similarity to different genera.
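The sketch below illustrates 97%-threshold OTU clustering as a simple greedy centroid scheme in which the most abundant unique reads seed OTUs first. It assumes reads that are already aligned and of equal length. Dedicated OTU-picking tools also perform alignment, quality filtering and chimera removal. The reads are hypothetical, and the returned counts form one sample's OTU abundance profile.

```python
from collections import Counter


def identity(a, b):
    """Fraction of identical positions between two aligned, equal-length reads."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otu_clustering(reads, threshold=0.97):
    """Greedy centroid clustering; the most abundant unique reads seed OTUs first.

    Returns (centroids, counts), where counts[i] is the number of reads in OTU i.
    """
    centroids, counts = [], []
    for seq, n in Counter(reads).most_common():
        for i, centroid in enumerate(centroids):
            if identity(seq, centroid) >= threshold:
                counts[i] += n
                break
        else:  # no centroid is similar enough: start a new OTU
            centroids.append(seq)
            counts.append(n)
    return centroids, counts


base = "ACGT" * 25          # hypothetical 100-bp read
near = "TT" + base[2:]      # 2 mismatches -> 98% identity to base
far = "T" * 10 + base[10:]  # 8 mismatches -> 92% identity to base
reads = [base] * 5 + [near] * 2 + [far] * 3
centroids, counts = greedy_otu_clustering(reads)
```

The near variant (98% identity) is merged into the first OTU, while the far variant (92%) falls below the 97% threshold and forms its own OTU.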

In one embodiment, after the abundance information of the microbial populations of the sample is obtained, preprocessing including normalization is performed.
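The normalization method is not specified. The sketch below assumes total-sum scaling, a common choice that converts each sample's OTU counts into relative abundances, plus an optional log transform with a pseudocount. The OTU table and the pseudocount value are hypothetical.

```python
import numpy as np


def relative_abundance(counts):
    """Total-sum scaling: divide each sample's OTU counts (one row per sample) by its total."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    # Rows with zero reads stay all-zero instead of producing NaN.
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)


def log_relative_abundance(counts, pseudocount=1e-6):
    """Log of relative abundance; the pseudocount avoids log(0) for absent OTUs."""
    return np.log(relative_abundance(counts) + pseudocount)


otu_table = [[2, 2, 4], [10, 0, 30], [0, 0, 0]]  # hypothetical: 3 samples x 3 OTUs
rel = relative_abundance(otu_table)
```

Scaling removes differences in sequencing depth between samples, so abundance features are comparable before they are fused with the phylogenetic features.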

103: fusing features of the evolutionary relationships among the populations of microorganisms with features of the abundance information of the populations of microorganisms to obtain a set of features;

In one embodiment, genetic information of the microbial populations of a sample is obtained, a phylogenetic tree of the microbial populations is constructed, a relationship graph among the microbial populations is obtained, and features of the evolutionary relationships among the microbial populations are extracted using a graph neural network; abundance information of the microbial populations of the sample is obtained and its features are extracted; and a graph neural network is used to fuse the features of the evolutionary relationships with the features of the abundance information to obtain a feature set. Optionally, the graph neural network may be a graph convolutional network (GCN), a graph attention network (GAT), a Graph LSTM, or the like.

104: and inputting the feature set into a classifier to obtain a classification result of the sample.

In one embodiment, the feature set is input into a pre-trained classifier to obtain a classification result of the sample.

In one embodiment, the classifier may be any one of a random forest, decision tree, logistic regression, support vector machine (SVM) or neural network model, without limitation.

In one embodiment, the classifier may be trained as follows: labeled samples of different types are obtained (for example, diseased and normal samples, samples at different disease stages, or soil samples from different periods); the feature extraction and fusion processes described above are repeated to obtain a feature set or multi-dimensional feature set for each sample; the feature set or multi-dimensional feature set is input into the classifier to obtain a classification result; the loss between the classification result and the true label is computed with a loss function; the loss is back-propagated; and the parameters are updated with an optimizer to obtain the trained classifier.
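The training loop described above (forward pass, loss, back-propagation, optimizer update) can be sketched with logistic regression, one of the listed classifier options, trained by plain gradient descent. The learning rate, epoch count and the six labeled feature vectors are hypothetical stand-ins for fused feature sets. In practice a deep-learning framework would compute the gradients automatically.

```python
import numpy as np


def train_logistic_classifier(features, labels, lr=0.1, epochs=500, seed=0):
    """Binary logistic regression trained by full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    x = np.asarray(features, dtype=float)
    y = np.asarray(labels, dtype=float)
    w = rng.normal(scale=0.01, size=x.shape[1])
    b = 0.0
    losses = []
    eps = 1e-12  # keeps log() finite when a probability reaches 0 or 1
    for _ in range(epochs):
        probs = 1.0 / (1.0 + np.exp(-(x @ w + b)))
        # Binary cross-entropy between predicted probabilities and true labels.
        losses.append(-np.mean(y * np.log(probs + eps) + (1 - y) * np.log(1 - probs + eps)))
        # Gradient of the loss (the back-propagation step for this one-layer model).
        grad = (probs - y) / len(y)
        # Plain gradient-descent update (the optimizer step).
        w -= lr * (x.T @ grad)
        b -= lr * grad.sum()
    return w, b, losses


def predict(features, w, b):
    """Predict class 1 when the logit is positive, otherwise class 0."""
    return (np.asarray(features, dtype=float) @ w + b > 0).astype(int)


# Hypothetical fused feature vectors for 6 labeled samples (0 = normal, 1 = diseased).
X = [[-2.0, -1.0], [-1.0, -2.0], [-1.5, -1.5], [1.0, 2.0], [2.0, 1.0], [1.5, 1.5]]
Y = [0, 0, 0, 1, 1, 1]
w, b, losses = train_logistic_classifier(X, Y)
```

The same loop structure applies when the classifier is a neural network; only the forward pass and the gradient computation change.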

In one embodiment, the analysis method further comprises extracting the characteristics of the abundance information of the microbial population, fusing the characteristics of the characteristic set and the characteristics of the abundance information of the microbial population to obtain a fused multi-dimensional characteristic set, and inputting the multi-dimensional characteristic set into a classifier to obtain the classification result of the sample.

In one embodiment, the abundance information of the microbial populations is convolved to obtain abundance features, the feature set is convolved to obtain features of the feature set, and the two sets of convolved features are fused to obtain a fused multi-dimensional feature set.
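A minimal forward pass for this two-branch fusion is sketched below. Each input is convolved separately, the results are stacked along a channel dimension, and a fully connected layer produces class scores. It assumes both inputs hold one value per taxon (equal length) and use equal kernel sizes so the outputs can be stacked, and it uses ReLU activations. All kernels and weights are hypothetical; in a real model they would be learned.

```python
import numpy as np


def conv1d_valid(signal, kernel):
    """'Valid' 1-D convolution written as cross-correlation, as in deep-learning layers."""
    signal = np.asarray(signal, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    k = len(kernel)
    return np.array([signal[i:i + k] @ kernel for i in range(len(signal) - k + 1)])


def two_branch_forward(abundance, feature_set, k_abund, k_feat, fc_weight, fc_bias):
    """Convolve both inputs, stack them as two channels, then apply one fully connected layer."""
    a = np.maximum(conv1d_valid(abundance, k_abund), 0.0)   # branch 1: abundance features
    f = np.maximum(conv1d_valid(feature_set, k_feat), 0.0)  # branch 2: fused graph feature set
    fused = np.stack([a, f])                                # shape: (2 channels, length)
    return np.asarray(fc_weight) @ fused.ravel() + np.asarray(fc_bias)  # class scores


logits = two_branch_forward(
    abundance=[1.0, 2.0, 3.0, 4.0],
    feature_set=[0.0, 1.0, 0.0, 1.0],
    k_abund=[1.0, 1.0],
    k_feat=[1.0, -1.0],
    fc_weight=np.ones((2, 6)),
    fc_bias=[0.0, 1.0],
)
```

Stacking along the channel axis keeps the two information sources separate until the fully connected layer, which learns how to weight them.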

In one embodiment, the sample is a host of a microbial population or a growth environment of a microbial population; optionally, the host comprises a human, an animal or a plant, and the growth environment comprises soil or water; optionally, the sample is from the mouth, intestine, skin, stomach, esophagus, stool, urethra, blood, eye, nasopharynx, external auditory canal, vagina, or lung of the host.

In one example, the above method was applied to the analysis of Han and Tibetan populations using metagenomic data of intestinal microorganisms from 363 samples: 102 Tibetans living on the plateau, 92 Han living on the plain, 81 Han who had returned to the plain after living on the plateau, 67 Han one week after moving from the plain to the plateau, and 21 Han living on the plateau. For each individual, a phylogenetic tree was constructed by the maximum likelihood method from the DNA sequences of the microbial population, a relationship graph among the microorganisms was built with a graph software package (e.g., NetworkX), and the features of the evolutionary relationships among the microbial populations were extracted and subjected to Laplacian regularization. These features were integrated with the features of the microbial abundance information by a GCN to obtain a feature set fusing the evolutionary relationships and abundance information of the microbial populations. Graph convolution was then applied to this feature set while the abundance features were convolved in parallel; the two convolution results were concatenated along the channel dimension, passed through a fully connected layer, and used to predict the population type of each sample.

The accuracy of population-type prediction was used as the metric to compare different models in this case: the model of this application, the AutoGenome model and an SVM model. Under ten-fold cross-validation, the model of this application (Phylo-GDL) achieved a prediction accuracy of 0.802, higher than 0.769 for AutoGenome and 0.647 for SVM, as shown in FIG. 4.
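The ten-fold cross-validation protocol can be sketched as follows. The text does not say whether the folds were stratified. Stratification is assumed here because the group sizes are very uneven (21 to 102 samples), and the group names are hypothetical labels for the five groups listed above. `fit_predict` is a hypothetical callback in which a caller would train a model (Phylo-GDL, AutoGenome, SVM or any other) on the training indices and predict the test indices. The sketch does not reproduce the reported accuracies.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=10, seed=0):
    """Yield (train_indices, test_indices); each class is dealt round-robin across k folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    position = 0
    for indices in by_class.values():
        rng.shuffle(indices)
        for idx in indices:
            folds[position % k].append(idx)
            position += 1
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


def cross_validated_accuracy(labels, fit_predict, k=10):
    """Mean test accuracy; fit_predict(train_idx, test_idx) returns predictions for test_idx."""
    scores = []
    for train, test in stratified_k_fold(labels, k):
        preds = fit_predict(train, test)
        scores.append(sum(p == labels[i] for p, i in zip(preds, test)) / len(test))
    return sum(scores) / len(scores)


# Group sizes from the example above; the group names are hypothetical labels.
GROUPS = {"tibetan_plateau": 102, "han_plain": 92, "han_returned_to_plain": 81,
          "han_plateau_1_week": 67, "han_plateau": 21}
labels = [name for name, size in GROUPS.items() for _ in range(size)]
```

Every sample is used for testing exactly once, and each fold keeps roughly the same mix of population groups as the full dataset.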

In one embodiment, the method is used for diagnostic analysis of disease occurrence and development: metagenomic data of the intestinal microorganisms of samples (patients with tumors at different stages) are obtained, a phylogenetic tree of the microbial populations of each sample is constructed, and features of the evolutionary relationships among the microbial populations are extracted from the phylogenetic tree; abundance information of the microbial populations of the sample is acquired and its features are extracted; the features of the evolutionary relationships are fused with the features of the abundance information to obtain a feature set; and the feature set is input into a classifier to obtain the classification result of the sample, which is the tumor stage of the patient.

In one example, where the above method is used to assist in selecting a disease treatment regimen, the sample may come from a patient who is a candidate for immunotherapy, and a classification of whether immunotherapy is appropriate is given based on the genetic information and abundance information of the microbial population of the sample.

In one example, where the above method is used for ecosystem monitoring, the samples can be microorganism-containing soil collected at different periods, and a classification of ecological environment change is given based on the genetic information and abundance information of the microbial populations of the samples.

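Fig. 2 shows an apparatus for analyzing the evolutionary relationship and abundance information of microbial populations based on a sample according to an embodiment of the present invention. The apparatus includes a memory and a processor;

the memory is used to store program instructions; and

the processor is configured to invoke the program instructions, which, when executed, perform the above method for analyzing the evolutionary relationship and abundance information of microbial populations based on a sample.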

Fig. 3 shows a system for analyzing the evolutionary relationship and abundance information of microbial populations based on a sample according to an embodiment of the present invention, the system including:

a first acquisition unit 301 configured to acquire genetic information of microbial populations of a sample, construct a phylogenetic tree of the microbial populations of the sample, and extract characteristics of an evolutionary relationship between the microbial populations based on the phylogenetic tree;

a second acquisition unit 302 for acquiring abundance information of a microbial population of a sample, and extracting a feature of the abundance information of the microbial population;

a first fusion unit 303 configured to fuse the characteristics of the evolutionary relationship between the microbial populations and the characteristics of the abundance information of the microbial populations to obtain a characteristic set;

and a classification unit 304 configured to input the feature set into a classifier to obtain a classification result of the sample.
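The logical division into units 301 to 304 can be mirrored in code as four interchangeable functions composed by one system object. The sketch below is hypothetical throughout: unit 301 is reduced to phylogeny-weighted smoothing of abundances using a precomputed tree-distance matrix, unit 302 to relative-abundance normalization (one possible form of the normalization named in claim 10), unit 303 to concatenation, and unit 304 to a nearest-centroid stand-in for a trained classifier. The centroids, labels (`group_1`, `group_2`), and distances are invented for illustration.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List

Vector = List[float]


def relative_abundance(counts: Vector) -> Vector:
    """Unit 302 (sketch): normalize raw read counts to relative abundance."""
    total = sum(counts)
    return [c / total for c in counts]


def phylo_smoothed(abundance: Vector, dist: List[Vector]) -> Vector:
    """Unit 301 (sketch): average each taxon's abundance with its relatives,
    weighted by exp(-tree distance) from a precomputed distance matrix."""
    out = []
    for row in dist:
        w = [math.exp(-d) for d in row]
        out.append(sum(wi * a for wi, a in zip(w, abundance)) / sum(w))
    return out


def fuse(evo: Vector, abund: Vector) -> Vector:
    """Unit 303 (sketch): concatenate the two feature groups into one feature set."""
    return evo + abund


def nearest_centroid(centroids: Dict[str, Vector]) -> Callable[[Vector], str]:
    """Unit 304 (sketch): stand-in classifier returning the label of the closest centroid."""
    def classify(x: Vector) -> str:
        return min(centroids,
                   key=lambda k: sum((a - b) ** 2 for a, b in zip(x, centroids[k])))
    return classify


@dataclass
class AnalysisSystem:
    dist: List[Vector]                  # pairwise tree distances between taxa
    classify: Callable[[Vector], str]   # any trained classifier can be plugged in here

    def run(self, counts: Vector) -> str:
        abund = relative_abundance(counts)
        features = fuse(phylo_smoothed(abund, self.dist), abund)
        return self.classify(features)


# Usage with a hypothetical two-taxon distance matrix and invented centroids.
system = AnalysisSystem(
    dist=[[0.0, 1.0], [1.0, 0.0]],
    classify=nearest_centroid({
        "group_1": [0.6, 0.4, 0.7, 0.3],
        "group_2": [0.4, 0.6, 0.3, 0.7],
    }),
)
print(system.run([30, 10]))  # prints group_1
```

Keeping each unit a plain function reflects the point that the units are a logical division only: any of them can be replaced (for example, the nearest-centroid stand-in by a neural network) without changing the others.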

The embodiment of the invention provides an analysis system for the evolutionary relationship and abundance information of microbial populations based on samples, which comprises:

The validation results of this validation example show that assigning an intrinsic weight to an indication can moderately improve the performance of the method relative to the default settings.

It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by relevant hardware instructed by a program, which may be stored in a computer-readable storage medium; the storage medium may include read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disk, and the like.

While the invention has been described in detail with reference to specific embodiments thereof, it will be apparent to one skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention.

Claims

1. A method for analyzing evolutionary relationship and abundance information of a population of microorganisms based on a sample, comprising:

acquiring genetic information of microbial populations of a sample, constructing a phylogenetic tree of the microbial populations of the sample to obtain a graph of the relationships among the microbial populations, and extracting features of the evolutionary relationships among the microbial populations using a graph neural network;

2. The method of claim 1, further comprising extracting the features of the abundance information of the microbial population, fusing the features of the abundance information of the microbial population with the features of the feature set to obtain a fused multi-dimensional feature set, and inputting the multi-dimensional feature set into a classifier to obtain the classification result of the sample.

3. The method of claim 1, wherein the method comprises convolving the abundance information of the population of microorganisms to obtain the feature of the abundance information of the population of microorganisms, convolving the feature set to obtain the feature of the feature set, and fusing the feature of the abundance information of the population of microorganisms obtained by the convolution and the feature of the feature set obtained by the convolution to obtain the fused multi-dimensional feature set.

4. The method of claim 1, wherein the genetic information of the microbial population of the sample is obtained using a method comprising high-throughput sequencing.

5. The method of claim 4, wherein the high-throughput sequencing comprises two categories: sequencing based on the 16S rDNA, 18S rDNA, or ITS region; and metagenomic sequencing.

6. The method according to claim 1, wherein the genetic information of the microbial population is a DNA or protein sequence and/or a DNA or protein structure of the microbial population, and the DNA or protein sequence and/or DNA or protein structure of the microbial population is used to construct a phylogenetic tree of the microbial population.

7. The method of claim 1, wherein the phylogenetic tree of the microbial populations of the sample is constructed using an approach selected from distance-based methods, maximum parsimony, maximum likelihood, and Bayesian methods.

8. The method of claim 1, wherein a relationship matrix among the microbial populations is obtained from the phylogenetic tree, and the evolutionary relationships among the microbial populations are extracted from the relationship matrix.

9. The method of claim 1, wherein the sample-based analysis of the evolutionary relationship and abundance information of the microbial populations is based on operational taxonomic units (OTUs).

10. The method of claim 1, wherein, after the abundance information of the microbial population of the sample is obtained, the abundance information is pre-processed, the pre-processing comprising normalization.

11. The method of claim 1, wherein the sample is a host of a population of microorganisms or a growing environment of a population of microorganisms.

12. The method of claim 11, wherein the host comprises a human, an animal, or a plant, and the growing environment comprises soil or water.

13. The method of claim 1, wherein the sample is from the oral cavity, intestinal tract, skin, stomach, esophagus, stool, urethra, blood, eye, nasopharyngeal cavity, external auditory canal, vagina, or lung of the host.

14. An analysis apparatus based on evolutionary relationship and abundance information of a population of microorganisms of a sample, the apparatus comprising: a memory and a processor;

the memory is to store program instructions;

the processor is configured to invoke the program instructions, which, when executed, perform the sample-based method for analyzing the evolutionary relationship and abundance information of a microbial population according to any one of claims 1-13.

15. An analysis system based on evolutionary relationship and abundance information of a population of microorganisms of a sample, comprising:

a first acquisition unit configured to acquire genetic information of microbial populations of a sample, construct a phylogenetic tree of the microbial populations of the sample to obtain a graph of the relationships among the microbial populations, and extract features of the evolutionary relationships among the microbial populations using a graph neural network;

16. An analysis system based on evolutionary relationship and abundance information of a population of microorganisms of a sample, comprising:

17. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the sample-based method for analyzing the evolutionary relationship and abundance information of a microbial population according to any one of claims 1-13.

18. Use of the device of claim 14 for ecosystem monitoring.

19. Use of the apparatus of claim 14 in sample classification or predicting properties of a sample.

20. Use of the system of any of claims 15-16 for sample classification or predicting properties of a sample.

21. Use of the system of any one of claims 15-16 for ecosystem monitoring.