CN110111846A - Method and apparatus for determining the correlation between environmental factors and flora structure and function - Google Patents


Info

Publication number
CN110111846A
Authority
CN
China
Prior art keywords
flora
strains
correlation
operable
environmental factors
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910334811.8A
Other languages
Chinese (zh)
Other versions
CN110111846B (en)
Inventor
宁康
陈超云
何睿乔
成章昱
韩毛振
查毓国
杨朋硕
姚奇
周豪
钟朝芳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Ezhou Institute of Industrial Technology Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Ezhou Institute of Industrial Technology Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology, Ezhou Institute of Industrial Technology Huazhong University of Science and Technology filed Critical Huazhong University of Science and Technology
Priority to CN201910334811.8A priority Critical patent/CN110111846B/en
Publication of CN110111846A publication Critical patent/CN110111846A/en
Application granted granted Critical
Publication of CN110111846B publication Critical patent/CN110111846B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869 Methods for sequencing
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10 Sequence alignment; Homology search
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Chemical & Material Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Medical Informatics (AREA)
  • Organic Chemistry (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biotechnology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Analytical Chemistry (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Theoretical Computer Science (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Evolutionary Biology (AREA)
  • Molecular Biology (AREA)
  • Immunology (AREA)
  • Epidemiology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Bioethics (AREA)
  • Databases & Information Systems (AREA)
  • Microbiology (AREA)
  • Public Health (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Biochemistry (AREA)
  • General Engineering & Computer Science (AREA)
  • Genetics & Genomics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

An embodiment of the invention provides a method and an apparatus for determining the correlation between environmental factors and flora structure and function. The method comprises: obtaining the marker gene sequences of all strains in a flora sample by second-generation high-throughput sequencing, analyzing the marker gene sequences to obtain the operational taxonomic units (OTUs) of the strains, and visualizing the OTUs using the R language; and performing difference analysis on the flora samples according to the OTUs of the strains to obtain the flora structure of the flora samples and, in combination with semi-closed environment characteristic parameters, determining the correlation between the flora structure and flora function and the environmental factors. The method and apparatus provided by the embodiment can determine the correlation between environmental factors and flora structure and function more accurately.

Description

Method and equipment for determining correlation between environmental factors and flora structure and function
Technical Field
The embodiment of the invention relates to the technical field of microbiomics and bioinformatics, in particular to a method and equipment for determining the correlation between environmental factors and flora structure and function.
Background
In recent years, the development of sequencing technology and of modern genomics, proteomics, metabolomics and other "omics" theories, together with the introduction of systems-biology perspectives and the application of bioinformatics, has greatly advanced microbial genomics and thereby enriched the study of microecology. Research has shown that the microbiome plays a very important role in the geochemical cycle and a crucial role in local chemical cycles in soil, oceans and lakes. Meanwhile, habitat disruption, climatic factors and geographical factors in cities affect the structural composition of microbial communities and the functions of microorganisms. In addition, human activity can strongly affect the microbial community of the environment within a residence, and the density of human activity can also affect the state of the microbial community in the environment. A growing number of studies of the association between the microbiome and environmental factors have shown that environmental microorganisms and human activities interact.
Metagenomic research based on high-throughput sequencing is currently one of the most effective and important approaches for identifying and analyzing the structure and function of mixed biological systems. It can comprehensively detect the types and composition of the microorganisms in a microbial sample, so that the community composition and functional information of environmental microorganisms can be readily obtained and the potential interactions among microorganisms, the environment and humans can be studied. The development of genomics, proteomics, metabolomics and related theories, and the improvement of multi-omics joint analysis techniques, have drawn growing attention to the study of microorganisms in micro-environments. With the convenience of high-throughput sequencing and the big-data era brought about by the rapid development of high-performance computing, the human-microorganism-environment interaction network has been improved to a certain extent. However, the factors affecting microbial communities are diverse, existing problem-oriented research usually focuses on one or a few of them, and systematic longitudinal analysis of microorganisms in a semi-closed environment has rarely been reported. Therefore, a method for determining the correlation between a semi-closed environment and flora structure and flora function is a technical gap that the industry needs to fill.
Disclosure of Invention
In order to effectively make up for the technical gap in the prior art, the embodiment of the invention provides a method and equipment for determining the correlation between environmental factors and flora structure and function.
In a first aspect, an embodiment of the present invention provides a method for determining the correlation between environmental factors and flora structure and function, comprising: obtaining marker gene sequences of all strains in a flora sample by second-generation high-throughput sequencing, analyzing the marker gene sequences to obtain operational taxonomic units (OTUs) of the strains, and visualizing the OTUs using the R language; and performing difference analysis on the flora samples according to the OTUs of the strains to obtain the flora structure of the flora samples, and determining the correlation between the flora structure and flora function and the environmental factors in combination with semi-closed environment characteristic parameters.
Further, analyzing the marker gene sequences to obtain the OTUs of the strains comprises: analyzing the marker gene sequences with the mothur and QIIME bioinformatics tools to obtain the OTUs of the strains.
Further, performing difference analysis on the flora samples according to the OTUs of the strains to obtain the flora structure of the flora samples comprises: obtaining relative abundance composition information of the flora samples by comparing the OTUs of the strains against the Greengenes database, grouping the flora samples according to the relative abundance composition information, and performing difference analysis on the relative abundance composition information and the OTUs of the strains using principal coordinate analysis and principal component analysis to obtain the flora structure of the flora samples.
Further, the principal coordinate analysis comprises: using the unweighted UniFrac method to calculate the distances between strains from the relative abundance composition information of the flora samples and the OTUs of the strains, thereby obtaining the Euclidean distances between the strains, and obtaining the Pearson correlation of the relative abundance composition information from the Euclidean distances.
Further, the semi-closed environment characteristic parameters comprise: atmospheric pressure, human activity, and temperature.
Further, determining the correlation between the flora structure and flora function and the environmental factors in combination with the semi-closed environment characteristic parameters comprises: performing Z-score standardization on the flora structure of the flora samples and on the atmospheric pressure, human activity and temperature, and performing a Mantel test on the standardized results to obtain indices of the effect of atmospheric pressure, human activity and temperature on the flora structure and flora function of the flora samples.
In a second aspect, an embodiment of the present invention provides an apparatus for determining the correlation between environmental factors and flora structure and function, comprising:
a strain operational taxonomic unit (OTU) acquisition module, configured to obtain marker gene sequences of all strains in a flora sample by second-generation high-throughput sequencing, analyze the marker gene sequences to obtain the OTUs of the strains, and visualize the OTUs using the R language;
and a correlation determination module, configured to perform difference analysis on the flora samples according to the OTUs of the strains to obtain the flora structure of the flora samples, and to determine the correlation between the flora structure and flora function and the environmental factors in combination with semi-closed environment characteristic parameters.
In a third aspect, an embodiment of the present invention provides an electronic device, including:
at least one processor; and
at least one memory communicatively coupled to the processor, wherein:
the memory stores program instructions executable by the processor to perform the method for determining the correlation between environmental factors and flora structure and function provided in any of the possible implementations of the first aspect.
In a fourth aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium storing computer instructions that cause a computer to perform the method for determining the correlation between environmental factors and flora structure and function provided in any of the possible implementations of the first aspect.
According to the method and equipment for determining the correlation between environmental factors and flora structure and function provided by the embodiments of the invention, the operational taxonomic units (OTUs) of the strains are obtained by analyzing the marker gene sequences of all strains in the flora sample, the flora structure of the flora sample is obtained on that basis, and, in combination with the semi-closed environment characteristic parameters, the correlation between environmental factors and flora structure and function can be determined more accurately.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, a brief description will be given below to the drawings required for the description of the embodiments or the prior art, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flowchart of a method for determining the correlation between environmental factors and flora structure and function according to an embodiment of the present invention;
FIG. 2 is a graph showing the relative abundance of species at the phylum level in the flora samples according to an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating the relationship between the structure of bacteria and temperature in a semi-closed environment according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the structural diversity of the bacterial communities among CCDLs and within the semi-closed environment according to the present invention;
FIG. 5 is a schematic diagram illustrating the structural and functional stability relationships of a flora in a semi-enclosed environment according to an embodiment of the present invention;
FIG. 6 is a schematic diagram illustrating the influence of multiple factors on the function of a bacterial colony in a semi-closed environment according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of an apparatus for determining correlation between environmental factors and flora structure and function according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of the physical structure of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention. In addition, technical features of various embodiments or individual embodiments provided by the invention can be arbitrarily combined with each other to form a feasible technical solution, but must be realized by a person skilled in the art, and when the technical solution combination is contradictory or cannot be realized, the technical solution combination is not considered to exist and is not within the protection scope of the present invention.
An embodiment of the present invention provides a method for determining correlation between environmental factors and flora structure and function, and referring to fig. 1, the method includes:
101. obtaining marker gene sequences of all strains in a flora sample by second-generation high-throughput sequencing, analyzing the marker gene sequences to obtain operational taxonomic units (OTUs) of the strains, and visualizing the OTUs using the R language;
102. performing difference analysis on the flora samples according to the OTUs of the strains to obtain the flora structure of the flora samples, and determining the correlation between the flora structure and flora function and the environmental factors in combination with semi-closed environment characteristic parameters.
On the basis of the above embodiments, in the method for determining the correlation between environmental factors and flora structure and function provided in the embodiments of the present invention, analyzing the marker gene sequences to obtain the operational taxonomic units (OTUs) of the strains comprises: analyzing the marker gene sequences with the mothur and QIIME bioinformatics tools to obtain the OTUs of the strains.
On the basis of the foregoing embodiments, in the method for determining the correlation between environmental factors and flora structure and function provided in the embodiments of the present invention, performing difference analysis on the flora samples according to the operational taxonomic units (OTUs) of the strains to obtain the flora structure of the flora samples comprises: obtaining relative abundance composition information of the flora samples by comparing the OTUs of the strains against the Greengenes database, grouping the flora samples according to the relative abundance composition information, and performing difference analysis on the relative abundance composition information and the OTUs of the strains using principal coordinate analysis and principal component analysis to obtain the flora structure of the flora samples.
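The relative abundance step can be sketched in a few lines of Python. This is a minimal illustration only: the function names and the toy read counts are hypothetical, the lookup against the reference database is not shown, and pooling the less abundant taxa into an "other" category is an assumption about how such data are usually prepared for plotting.

```python
def relative_abundance(counts):
    """Convert one sample's read counts per taxon into fractions summing to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {taxon: n / total for taxon, n in counts.items()}


def collapse_to_top(abundance, top_n):
    """Keep the top_n most abundant taxa and pool the remainder as 'other'."""
    ranked = sorted(abundance.items(), key=lambda kv: kv[1], reverse=True)
    kept = dict(ranked[:top_n])
    rest = sum(value for _, value in ranked[top_n:])
    if rest > 0:
        kept["other"] = rest
    return kept


# Hypothetical read counts for one flora sample (not data from the patent).
sample = {"Proteobacteria": 60, "Cyanobacteria": 25, "Firmicutes": 10, "Actinobacteria": 5}
rel = relative_abundance(sample)
top2 = collapse_to_top(rel, top_n=2)
print(top2)
```

Working with fractions rather than raw counts makes samples with different sequencing depths comparable before they are grouped.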
On the basis of the above embodiments, in the method for determining the correlation between environmental factors and flora structure and function provided in the embodiments of the present invention, the principal coordinate analysis comprises: using the unweighted UniFrac method to calculate the distances between strains from the relative abundance composition information of the flora samples and the operational taxonomic units (OTUs) of the strains, thereby obtaining the Euclidean distances between the strains, and obtaining the Pearson correlation of the relative abundance composition information from the Euclidean distances.
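Principal coordinate analysis on a distance matrix can be sketched as follows. This is an illustrative stand-in, not the patented implementation: UniFrac distances require a phylogenetic tree and are not computed here, so plain Euclidean distances between hypothetical abundance profiles are used instead, and the eigenvectors are found by simple power iteration, which assumes the centred matrix has no dominant negative eigenvalue (true for Euclidean distances).

```python
import math
import random


def euclidean_distance_matrix(profiles):
    """Pairwise Euclidean distances between abundance profiles."""
    return [[math.dist(a, b) for b in profiles] for a in profiles]


def double_center(dist):
    """Gower centring of squared distances: B = -1/2 * J * D^2 * J."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # The matrix is symmetric, so column means equal row means.
    return [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
             for j in range(n)] for i in range(n)]


def top_eigenpair(mat, iters=1000, seed=0):
    """Dominant eigenvalue and unit eigenvector by power iteration."""
    n = len(mat)
    rng = random.Random(seed)
    vec = [rng.random() for _ in range(n)]
    for _ in range(iters):
        nxt = [sum(mat[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            return 0.0, vec
        vec = [x / norm for x in nxt]
    mv = [sum(mat[i][j] * vec[j] for j in range(n)) for i in range(n)]
    return sum(v * m for v, m in zip(vec, mv)), vec  # Rayleigh quotient


def pcoa(dist, n_axes=2, tol=1e-9):
    """Return per-sample coordinates on up to n_axes principal coordinates."""
    n = len(dist)
    b = double_center(dist)
    coords = [[] for _ in range(n)]
    for _ in range(n_axes):
        value, vec = top_eigenpair(b)
        if not value > tol:
            break
        scale = math.sqrt(value)
        for i in range(n):
            coords[i].append(scale * vec[i])
        # Deflate so the next pass finds the next-largest eigenpair.
        b = [[b[i][j] - value * vec[i] * vec[j] for j in range(n)] for i in range(n)]
    return coords


# Hypothetical relative abundance profiles for four samples (three taxa each).
profiles = [[0.6, 0.3, 0.1], [0.5, 0.3, 0.2], [0.2, 0.5, 0.3], [0.1, 0.6, 0.3]]
distances = euclidean_distance_matrix(profiles)
coords = pcoa(distances, n_axes=2)
print(coords)
```

Because the toy profiles lie in a two-dimensional plane, two principal coordinates are enough to reproduce the original pairwise distances.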
On the basis of the above embodiments, in the method for determining the correlation between environmental factors and flora structure and function provided in the embodiments of the present invention, the semi-closed environment characteristic parameters include: atmospheric pressure, human activity and temperature.
On the basis of the above embodiments, in the method for determining the correlation between environmental factors and flora structure and function provided in the embodiments of the present invention, determining the correlation between the flora structure and flora function and the environmental factors in combination with the semi-closed environment characteristic parameters comprises: performing Z-score standardization on the flora structure of the flora samples and on the atmospheric pressure, human activity and temperature, and performing a Mantel test on the standardized results to obtain indices of the effect of atmospheric pressure, human activity and temperature on the flora structure and flora function of the flora samples.
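The standardization and Mantel test can be sketched as below. This is a hedged illustration under stated assumptions: the data are hypothetical, only temperature is used as the environmental variable, Euclidean distance is assumed for both matrices, and the test is the basic permutation form with a Pearson statistic and a one-sided p-value.

```python
import math
import random
import statistics


def zscore(values):
    """Standardize a list of numbers to mean 0 and standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        raise ValueError("cannot standardize a constant variable")
    return [(v - mean) / sd for v in values]


def zscore_columns(rows):
    """Z-score every column (variable) of a samples-by-variables table."""
    cols = [zscore(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def distance_matrix(rows):
    """Pairwise Euclidean distances between samples."""
    return [[math.dist(a, b) for b in rows] for a in rows]


def pearson(x, y):
    mx, my = statistics.fmean(x), statistics.fmean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den


def upper_triangle(mat, order):
    """Upper-triangle entries after permuting rows and columns together."""
    n = len(order)
    return [mat[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel(dist_a, dist_b, permutations=999, seed=0):
    """Mantel statistic r and one-sided permutation p-value."""
    identity = list(range(len(dist_a)))
    flat_b = upper_triangle(dist_b, identity)
    observed = pearson(upper_triangle(dist_a, identity), flat_b)
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)
        if pearson(upper_triangle(dist_a, order), flat_b) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical data: five samples, temperature in degrees Celsius and two-taxon profiles.
temperature = [[5.0], [12.0], [20.0], [28.0], [33.0]]
profiles = [[0.10, 0.90], [0.24, 0.76], [0.40, 0.60], [0.56, 0.44], [0.66, 0.34]]
r, p_value = mantel(distance_matrix(zscore_columns(profiles)),
                    distance_matrix(zscore_columns(temperature)))
print(r, p_value)
```

Permuting rows and columns of one matrix together, rather than shuffling individual distances, preserves the dependence among distances that share a sample, which is the point of the Mantel procedure.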
According to the method for determining the correlation between environmental factors and flora structure and function provided by the embodiments of the invention, the operational taxonomic units (OTUs) of the strains are obtained by analyzing the marker gene sequences of all strains in the flora sample, the flora structure of the flora sample is obtained on that basis, and, in combination with the semi-closed environment characteristic parameters, the correlation between environmental factors and flora structure and function can be determined more accurately.
In order to more clearly illustrate the essence of the technical solution of the present invention, on the basis of the above-mentioned embodiments, an overall embodiment is proposed, which shows the overall view of the technical solution of the present invention. It should be noted that the whole embodiment is only for further embodying the technical essence of the present invention, and is not intended to limit the scope of the present invention, and those skilled in the art can obtain any combination type technical solution meeting the essence of the technical solution of the present invention by combining technical features based on the various embodiments of the present invention, and as long as the combined technical solution can be practically implemented, the combined technical solution is within the scope of the present patent.
1) Collecting flora samples in the semi-closed environment of the Huazhong University of Science and Technology campus
Selecting sampling areas:
areas with highly regular human activity: classrooms, canteens, student dormitories and libraries;
areas with moderately regular human activity: campus bus stations, the track and field ground, the school hospital, the school gates and the international academic exchange center;
areas with little human activity: the mountain and the landscape lakes.
Collecting samples:
Flora samples are collected as a time series using a non-destructive sampling method. A sterile cotton swab moistened with physiological saline is held at an angle of about 20 degrees to the ground and rotated in one direction while being wiped repeatedly across the ground surface, moving as it wipes. Each swab covers a 20 cm square of ground. Within a 1 m square at each sampling point this is repeated five times, at the four corners and at the center, and the five swabs are pooled into one microbial sample. Samples are taken once every other day, three times in total per season, and six seasons are collected in total. In 2016, samples are collected three times in every quarter in spring and autumn in 2017, samples are collected once in summer before and after a once-in-a-century rainstorm, and in 2017, in spring and summer in 2018, samples are collected once in each of winter and summer. Specific sample information is shown in Table 1.
TABLE 1 sample conditions
2) Extracting genomic DNA, PCR amplification and sequencing
The genomic DNA of each sample is extracted using a modified CTAB method. The extraction steps are as follows:
(1) cut up the cotton swab sample with sterile scissors, add 500 mg of sterile glass beads (2-3 mm in diameter) and 1 mL of dissolving solution [0.1 M Tris-HCl (pH 8.0), 0.02 M EDTA], and vortex for 2 min;
(2) add 500 µL of lysis buffer [0.1 M Tris-HCl (pH 8.0), 0.02 M EDTA (pH 8.0), 2% CTAB and 1.4 M NaCl];
(3) add 400 µL of 10% SDS solution freshly prepared with sterile water;
(4) add 10 µL of proteinase K (Sigma, MO, USA) and 10 µL of RNase, at a concentration of 10 mg/mL;
(5) add 100 µL of β-mercaptoethanol and vortex to mix thoroughly;
(6) lyse at 65 °C for 1.5 h, mixing several times during lysis (once every 10 min, for 5 s each time);
(7) add an equal volume (1 mL) of phenol:chloroform:isoamyl alcohol (25:24:1), mix by inversion, centrifuge at 12,500 rpm for 10 min, and transfer the supernatant to a new Ep tube; repeat this step once;
(8) add an equal volume (1 mL) of chloroform:isoamyl alcohol (24:1), mix well, centrifuge at 12,500 rpm for 10 min, and transfer the supernatant to a new Ep tube;
(9) add 0.6 volumes of pre-cooled isopropanol to the supernatant, precipitate at -20 °C for 30 min, centrifuge, and discard the supernatant; wash twice with 1 mL of 75% ethanol (pre-cooled to -20 °C), centrifuge at 12,500 rpm for 1 min, and discard the supernatant;
(10) allow the ethanol to evaporate, add 20 µL of TE buffer, and dissolve overnight.
The 16S rRNA sequence in the DNA is then amplified by PCR;
a sequencing library is constructed from the amplified fragments and subjected to high-throughput sequencing.
3) Obtaining operational taxonomic unit (OTU) statistics
The marker gene sequences of all strains in the flora samples are obtained by second-generation high-throughput sequencing. The sequences are then analyzed with bioinformatics tools such as mothur and QIIME to obtain the OTU statistics of the strains in each flora sample.
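To make the idea of per-sample OTU statistics concrete, the following Python sketch counts reads per OTU for each sample. In practice this table is produced by the sequence analysis tools themselves; the function name, the input format of (sample, OTU) pairs and the toy data are all hypothetical.

```python
from collections import Counter, defaultdict


def build_otu_table(assignments):
    """Count reads per OTU for each sample.

    `assignments` is an iterable of (sample_id, otu_id) pairs, one per read.
    Returns (samples, otus, table) where table[i][j] is the number of reads
    of otus[j] observed in samples[i].
    """
    counts = defaultdict(Counter)
    for sample_id, otu_id in assignments:
        counts[sample_id][otu_id] += 1
    samples = sorted(counts)
    otus = sorted({otu for per_sample in counts.values() for otu in per_sample})
    # A Counter returns 0 for OTUs that were never seen in a sample.
    table = [[counts[s][o] for o in otus] for s in samples]
    return samples, otus, table


# Hypothetical read assignments (not data from the patent).
reads = [
    ("canteen", "OTU_1"), ("canteen", "OTU_1"), ("canteen", "OTU_2"),
    ("library", "OTU_2"), ("library", "OTU_3"),
]
samples, otus, table = build_otu_table(reads)
print(samples, otus, table)
```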
4) Processing statistical information
The results are visualized using R plotting software. FIG. 2 shows the distribution, across all flora samples, of 22 taxa at the phylum level (from k__Bacteria.p__Proteobacteria.c__Alphaproteobacteria.o__Sphingomonadales.f__Sphingomonadaceae to k__Bacteria.p__Cyanobacteria.c__Oscillatoriophycideae.o__Oscillatoriales.f__Phormidiaceae). Taxa other than these 22 relatively abundant ones are grouped as "other"; different colors represent different taxa, and the same color represents the same taxon at the phylum level. FIG. 2 shows that the flora structure does not differ significantly between sampling points, but does differ between seasons.
In FIG. 3, panels (A-F) correspond to (A) winter 2015, (B) spring 2016, (C) summer 2016, (D) autumn 2016, (E) winter 2016 and (F) spring 2017. For each season the flora samples are shown as an OTU network based on Spearman correlation, and the size of each dot is proportional to the relative abundance of the detected OTU. Scatter plots (G) and (H) both show the Euclidean distance of each season's data from the winter 2015 data, which is taken as the origin; each data point (five-pointed star) represents one season, and the line is the fitted line. (G) shows species composition and (H) shows the temperature at each sampling point. (I) shows the linear relationship between temperature and the species composition of the flora samples. Panel (J) shows the seasonal marker strains calculated with LEfSe; the height of the gray columns represents species abundance, and the vertical boxes represent different strains. The combinations of circles and lines represent Flavobacterium, Acinetobacter, Chryseobacterium and Erwinia, and the filled circles represent the different bacterial groups, Cluster 1 to Cluster 6. The OTU network structure (FIG. 3 (A-F)) shows that the composition of the flora samples varies between seasons, and seasonally representative species can be found by biomarker analysis (FIG. 3 (J)). Based on the Spearman distance calculation, the composition of the flora samples and the measured temperature at each sampling point both change regularly (FIG. 3 (G) and (H)), and the Spearman distance obtained from the seasonally changing composition of the flora samples is linearly correlated with temperature (FIG. 3 (I)).
FIG. 4 shows the situation on the campus of Huazhong University of Science and Technology. (A) Principal component analysis shows the differences in flora composition among three classes of samples: CCDL, the mountain and the lake (PC1: 24.03%, PC2: 12.06%, PC3: 9.73%). (B) The CCDL sites, namely teaching buildings, canteens, libraries and dormitories (the same four as in (C)), are hierarchically clustered; the vertical axis is height. (C) Samples are divided by season into two categories, "W" and "E", according to the characteristics of the campus environment, with the thick black line in FIG. 4 (C) as the boundary. Each panel has six columns, from left to right: spring 2016, summer 2016, autumn 2016, winter 2016, spring 2017 and summer 2017, showing the 22 selected taxa at the phylum level (the names are listed in the lower right corner of FIG. 4 and are not repeated here). In terms of microbial community structure, the CCDL sites (Classroom, Canteen, Dormitory, Library), which are typical of high human activity density, differ from the mountain and the campus lake, which are typical of relatively little human activity (FIG. 4 (A)); however, no such difference appears among the individual CCDL sampling points, which have similar microbial community structures (FIG. 4 (B-C)).
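The comparison of seasonal composition against temperature described for FIG. 3 (G-I) can be sketched as follows: compute each season's distance from a baseline season, do the same for temperature, and fit a straight line between the two. The seasonal values are hypothetical, Euclidean distance is assumed throughout, and ordinary least squares is used for the fit; none of these numbers come from the patent.

```python
import math


def distances_from_baseline(rows, baseline=0):
    """Euclidean distance of every row from the chosen baseline row."""
    origin = rows[baseline]
    return [math.dist(origin, row) for row in rows]


def linear_fit(x, y):
    """Ordinary least squares fit y = slope * x + intercept, plus Pearson r."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, sxy / math.sqrt(sxx * syy)


# Hypothetical seasonal means for six seasons; the first season is the baseline.
temperature = [[4.0], [16.0], [29.0], [18.0], [5.0], [17.0]]
composition = [[0.20, 0.80], [0.32, 0.68], [0.45, 0.55],
               [0.34, 0.66], [0.21, 0.79], [0.33, 0.67]]

temp_dist = distances_from_baseline(temperature)
comp_dist = distances_from_baseline(composition)
slope, intercept, r = linear_fit(temp_dist, comp_dist)
print(slope, intercept, r)
```

Measuring both quantities as distances from the same baseline season puts composition and temperature on a common footing, so a near-linear fit indicates that composition shifts track temperature shifts.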
5) Prediction of flora function
Starting from the operable classification units (operational taxonomic units, OTUs) of the strains in the samples, tools such as PICRUSt, Tax4Fun and FAPROTAX are used to predict the functions of the strains, and the functional diversity of the strains and the functional differences among the sample flora are analyzed. The results for a semi-closed environment, the campus of Huazhong University of Science and Technology, can be seen in fig. 5. In fig. 5, (A-E) show principal component analysis between groups of samples with different flora composition. Among these groups, the analyses of the academic exchange center versus the others (A) and the school gate versus the others (B) cover 251 samples collected at Huazhong University of Science and Technology. "CCDL" versus the academic exchange center (C) covers 121 samples collected on the campus, and "CCDL" versus the school gate (D) covers 124 samples collected on the campus. To compare the microbial community structures of different campuses in the Wuhan area, (E) shows 284 samples in total collected from the campus of Huazhong University of Science and Technology, the campus of Wuhan University, and two further university campuses in Huazhong. (F-J) show the corresponding comparisons at the functional level. (K) covers 10 samples from summer 2016, 10 samples taken after the summer 2016 rainstorm, and 14 samples from autumn 2016, all from Huazhong University of Science and Technology (the sampling locations run from the west canteen to the west campus bus station; see the lower right corner of (K) for details, which are omitted here for brevity). To show the dynamic changes in campus flora structure, the summer 2016 samples are analyzed in (L), the summer 2016 post-rainstorm samples in (M), and the autumn 2016 samples in (N). The structure of the flora varied (fig. 5 (A-E)), but its function was relatively stable (fig. 5 (F-J)), with no great difference in the functions of the strains. Nor was there a significant functional difference compared with other local university campuses. Even after the heavy rain of summer 2016, the functions of the strains on campus quickly returned to their original pattern of change (fig. 5 (K-N)).
Difference analysis: the diversity of the strains in the samples and the differences among the flora sample groups are analyzed based on the statistical table of the operable classification units of the strains.
In this example, relative abundance information on species composition was obtained by comparing the operable classification unit (OTU) table against the Greengenes database (see fig. 2). The samples were grouped according to the abundance composition information, and inter-group difference analysis was performed on the relative abundance composition information and on the operable classification units by principal coordinate analysis and principal component analysis. For the principal coordinate analysis, the Unweighted UniFrac method was used to calculate distances between samples from the strain relative abundance table and from the OTU table of the samples, respectively, as shown in fig. 3. To explore how the flora structure in the environment changes over time, Euclidean distances were calculated for the collected flora samples, giving the Euclidean distance of the samples in each season from the first sampling as the starting point, so that the dynamic change of the environmental flora composition is represented as a distance, as shown in fig. 3. In addition, to visualize the dynamic change more directly, the Pearson correlation of the relative abundance composition was calculated, and the species composition was displayed in a correlation network with the time series as the vertical axis for dynamic observation, as shown in fig. 3.
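The time-series part of this step can be sketched as follows. This is an illustrative sketch only, assuming a seasons-by-taxa count table ordered in time; the helper names (`relative_abundance`, `distance_from_first`, `taxon_pearson`) and the toy counts are hypothetical, and the UniFrac calculation, which needs a phylogenetic tree, is not reproduced here.

```python
import numpy as np


def relative_abundance(counts):
    """Convert raw counts (samples x taxa) to per-sample relative abundance."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=1, keepdims=True)


def distance_from_first(rel):
    """Euclidean distance of every sample from the first sampling (row 0)."""
    return np.linalg.norm(rel - rel[0], axis=1)


def taxon_pearson(rel):
    """Pearson correlation between taxa across the time series (taxa x taxa)."""
    return np.corrcoef(rel, rowvar=False)


# Hypothetical toy counts: 4 consecutive seasons (rows) x 3 taxa (columns).
counts = np.array([
    [50, 30, 20],
    [40, 40, 20],
    [20, 50, 30],
    [10, 60, 30],
])
rel = relative_abundance(counts)
dist = distance_from_first(rel)
corr = taxon_pearson(rel)
print("distance from first sampling:", np.round(dist, 4))
print("taxon-taxon Pearson correlation:\n", np.round(corr, 3))
```

Taxon pairs whose entries in `corr` are strongly positive or negative would be the natural candidates for edges in a correlation network such as the one described for fig. 3.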
Correlation analysis: the flora structure (mainly the strain composition) is combined with the semi-closed environment characteristic parameters to analyze the correlation between the flora structure and environmental factors.
Starting from the structure and function of the flora, and combining the semi-closed environment characteristic parameters, correlation analysis is performed between the strain composition of the flora and the environmental factors, and between the functions of the strains and the environmental factors. To reveal the influence of different environmental and human factors on the composition of the semi-closed environment flora and on the functions of the strains, a Mantel test is used; the tested factors are atmospheric pressure, whether the sampling site belongs to CCDL (Classroom, Canteen, Dormitory, Library), and temperature.
To assess the magnitude of the effect of potential environmental factors on flora structure and function, the distance matrices calculated from Euclidean distances were further tested for significance; see fig. 6. In fig. 6, (A) shows the Mantel test used in this study to reveal the influence of different environmental and human factors on the composition and function of the semi-closed environment flora. The tested factors were atmospheric pressure (1004.7 mb to 1028.1 mb), whether the site belongs to CCDL (other sites: Clinic, Gate, Hotel, Station, Sports fields, Hill, Lake), and temperature (3.4 °C to 53.4 °C). Data for the different samples (Samples 1 through m) were normalized by Z-Score and used as input to the Mantel test. (B) shows the relationship between the different factors and the strain composition; arc length is proportional to the correlation. (C) shows the relationship between the different factors and the strain function; arc length is proportional to the correlation. Data for the different groups (flora structure, atmospheric pressure, human activity, temperature) were normalized by Z-Score (i.e. standard deviation normalization) and used as input to the Mantel test. The test was run separately on the distance matrices of strain composition and of function by calling the 'mantel' function in the 'vegan' package for R. In the semi-closed environment of Huazhong University of Science and Technology, the correlation with flora structure increases in the order atmospheric pressure, human activity, temperature (as shown in fig. 6, with effect indexes ES of 0.100, 0.111 and 0.167, respectively), and the correlation with flora function increases in the order human activity, atmospheric pressure, temperature (ES of 0.110, 0.136 and 0.137, respectively). Here ES is the effect size, that is, the magnitude of the effect.
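The Z-Score normalization and Mantel test described above can be sketched in a few lines. This is a hedged illustration, not a port of the 'mantel' function in 'vegan': it assumes Euclidean distance matrices, a Pearson correlation between their upper triangles, and a one-sided permutation test, and the function names and toy data are hypothetical. Whether the returned statistic corresponds exactly to the ES values reported for fig. 6 is not stated in the source.

```python
import numpy as np


def zscore(values):
    """Standard-deviation normalization, column by column."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean(axis=0)) / values.std(axis=0, ddof=1)


def euclidean_matrix(x):
    """Pairwise Euclidean distances between samples (rows)."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def mantel(d1, d2, permutations=999, seed=0):
    """Mantel statistic (Pearson r of upper triangles) and permutation p-value."""
    n = d1.shape[0]
    upper = np.triu_indices(n, k=1)
    observed = np.corrcoef(d1[upper], d2[upper])[0, 1]
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(permutations):
        # Permute rows and columns of one matrix together.
        p = rng.permutation(n)
        permuted = d1[np.ix_(p, p)]
        if np.corrcoef(permuted[upper], d2[upper])[0, 1] >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical toy data: 8 samples, one environmental factor, two taxa.
temperature = np.array([3.4, 10.0, 15.0, 20.0, 25.0, 30.0, 35.0, 40.0])
taxon_a = np.array([0.10, 0.22, 0.28, 0.41, 0.50, 0.58, 0.72, 0.79])
abundance = np.column_stack([taxon_a, 1.0 - taxon_a])

d_flora = euclidean_matrix(zscore(abundance))
d_temp = euclidean_matrix(zscore(temperature))
r, p_value = mantel(d_flora, d_temp)
print(f"Mantel r = {r:.3f}, p = {p_value:.3f}")
```

Running the same function once per factor (atmospheric pressure, CCDL membership, temperature) against the composition matrix and again against a function matrix would mirror the two sets of comparisons shown in fig. 6 (B) and (C).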
The implementation basis of the various embodiments of the present invention is realized by programmed processing performed by a device having a processor function. Therefore, in engineering practice, the technical solutions and functions thereof of the embodiments of the present invention can be packaged into various modules. Based on this reality, on the basis of the above embodiments, the embodiments of the present invention provide an apparatus for determining the structural and functional relevance of environmental factors to flora, which is used to execute the method for determining the structural and functional relevance of environmental factors to flora in the above method embodiments. Referring to fig. 7, the apparatus includes:
an operable classification unit obtaining module 701 for strains, configured to obtain marker gene sequences of all strains in a flora sample by using second-generation high-throughput sequencing, analyze the marker gene sequences to obtain an operable classification unit for the strains, and visualize the operable classification unit by using an R language;
a correlation determination module 702, configured to perform difference analysis on the flora sample according to the operable classification unit of the strain to obtain a flora structure of the flora sample, and determine the correlation between the flora structure and flora function and the environmental factor by combining the semi-closed environmental characteristic parameter.
The device for determining the correlation between the environmental factors and the flora structure and function provided by the embodiment of the invention adopts the operable classification unit acquisition module and the correlation determination module of the strains, obtains the operable classification unit of the strains by analyzing the marker gene sequences of all the strains in the flora sample, obtains the flora structure of the flora sample on the basis, and can more accurately determine the correlation between the environmental factors and the flora structure and function by combining with the semi-closed environmental characteristic parameters.
The method of the embodiments of the present invention is implemented on electronic equipment, so the relevant electronic equipment is introduced here. To this end, an embodiment of the present invention provides an electronic device, as shown in fig. 8, including: at least one processor 801, a communication interface 804, at least one memory 802, and a communication bus 803, wherein the at least one processor 801, the communication interface 804, and the at least one memory 802 communicate with each other via the communication bus 803. The at least one processor 801 may invoke logic instructions in the at least one memory 802 to perform the following method: adopting second-generation high-throughput sequencing to obtain marker gene sequences of all strains in a flora sample, analyzing the marker gene sequences to obtain operable classification units of the strains, and visualizing the operable classification units by adopting R language; and performing difference analysis on the flora samples according to the operable classification units of the strains to obtain the flora structure of the flora samples, and determining the correlation between the flora structure and the flora function and environmental factors by combining semi-closed environmental characteristic parameters.
Furthermore, the logic instructions in the at least one memory 802 may be implemented as software functional units and, when sold or used as a stand-alone product, stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention, for example: adopting second-generation high-throughput sequencing to obtain marker gene sequences of all strains in a flora sample, analyzing the marker gene sequences to obtain operable classification units of the strains, and visualizing the operable classification units by adopting R language; and performing difference analysis on the flora samples according to the operable classification units of the strains to obtain the flora structure of the flora samples, and determining the correlation between the flora structure and the flora function and environmental factors by combining semi-closed environmental characteristic parameters. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and various other media capable of storing program code.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. Based on this recognition, each block in the flowchart or block diagrams may represent a module, a program segment, or a portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In this patent, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (9)

1. A method for determining the structural and functional relevance of environmental factors to a flora, comprising:
adopting second-generation high-throughput sequencing to obtain marker gene sequences of all strains in a flora sample, analyzing the marker gene sequences to obtain operable classification units of the strains, and visualizing the operable classification units by adopting R language;
and performing difference analysis on the flora samples according to the operable classification units of the strains to obtain the flora structure of the flora samples, and determining the correlation between the flora structure and the flora function and environmental factors by combining semi-closed environmental characteristic parameters.
2. The method of claim 1, wherein said analyzing said marker gene sequences to obtain operable classification units of said strains comprises:
analyzing the marker gene sequences with the Mothur and QIIME bioinformatics tools to obtain the operable classification units of the strains.
3. The method for determining correlation between environmental factors and flora structure and function according to claim 1, wherein the differential analysis of the flora sample according to the operable classification units of the bacterial species to obtain the flora structure of the flora sample comprises:
obtaining relative abundance composition information of the flora samples by comparing the operable classification units of the strains against the Greengenes database, grouping the flora samples according to the relative abundance composition information, and performing difference analysis on the relative abundance composition information and the operable classification units of the strains by a principal coordinate analysis method and a principal component analysis method to obtain the flora structure of the flora samples.
4. The method of determining the structural and functional relevance of environmental factors to flora according to claim 3, wherein the principal coordinate analysis method comprises:
using the Unweighted UniFrac method, calculating distances between strains from the relative abundance composition information of the flora sample and from the operable classification units of the strains, respectively, to obtain the Euclidean distance between the strains, and obtaining the Pearson correlation of the relative abundance composition information according to the Euclidean distance.
5. The method for determining the structural and functional relevance of environmental factors to flora according to claim 4, wherein the semi-closed environmental characteristic parameters comprise:
atmospheric pressure, human activity, and temperature.
6. The method for determining the correlation of environmental factors and flora structure and function according to claim 5, wherein the correlation of the flora structure and flora function and environmental factors is determined by combining semi-closed environmental characteristic parameters, and comprises the following steps:
performing Z-Score standardization on the flora structure, the atmospheric pressure, the human activity and the temperature of the flora sample, and performing a Mantel test on the standardized result to obtain the effect indexes of the atmospheric pressure, the human activity and the temperature on the flora structure and the flora function of the flora sample.
7. An apparatus for determining the structural and functional relevance of environmental factors to a flora, comprising:
an operable classification unit acquisition module for strains, configured to obtain marker gene sequences of all strains in a flora sample by adopting second-generation high-throughput sequencing, analyze the marker gene sequences to obtain operable classification units of the strains, and visualize the operable classification units by adopting R language;
a correlation determination module, configured to perform difference analysis on the flora samples according to the operable classification units of the strains to obtain the flora structure of the flora samples, and determine the correlation between the flora structure and the flora function and the environmental factors by combining semi-closed environmental characteristic parameters.
8. An electronic device, comprising:
at least one processor, at least one memory, a communication interface, and a bus; wherein,
the processor, the memory and the communication interface complete mutual communication through the bus;
the memory stores program instructions executable by the processor, the processor calling the program instructions to perform the method of any of claims 1 to 6.
9. A non-transitory computer-readable storage medium storing computer instructions that cause a computer to perform the method of any one of claims 1-6.
CN201910334811.8A 2019-04-24 2019-04-24 Method and equipment for determining correlation between environmental factors and flora structure and function Active CN110111846B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910334811.8A CN110111846B (en) 2019-04-24 2019-04-24 Method and equipment for determining correlation between environmental factors and flora structure and function

Publications (2)

Publication Number Publication Date
CN110111846A (en) 2019-08-09
CN110111846B (en) 2023-03-14

Family

ID=67486505

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910334811.8A Active CN110111846B (en) 2019-04-24 2019-04-24 Method and equipment for determining correlation between environmental factors and flora structure and function

Country Status (1)

Country Link
CN (1) CN110111846B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114999574A (en) * 2022-08-01 2022-09-02 中山大学 Parallel identification and analysis method and system for intestinal flora big data

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007068431A (en) * 2005-09-05 2007-03-22 Oji Paper Co Ltd Method and kit for analyzing bacterial flora
CN101724690A (en) * 2009-08-31 2010-06-09 华南理工大学 Method for detecting polymorphism of flora of prawn culture water body
CN107287293A (en) * 2017-06-13 2017-10-24 浙江大学 The absolute abundance assay method of biological community structure in a kind of environment
CN107402302A (en) * 2017-08-06 2017-11-28 潘荣兰 A kind of environmental monitoring prokaryotes quantity rapid evaluation chip
CN107475385A (en) * 2017-08-21 2017-12-15 上海派森诺生物科技股份有限公司 A kind of bacterial diversity composition modal data analysis method based on SMRT high throughput sequencing technologies
CN108283089A (en) * 2018-01-10 2018-07-17 河海大学 A kind of non-maintaining floating cultivation platforms and soil system

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Xiaoquan Su et al.: "GPU-Meta-Storms: computing the structure similarities among massive amount of microbial community samples using GPU", 《Bioinformatics》 *
Yong Zhang et al.: "Climate change and human activities altered the diversity and composition of soil microbial community in alpine grasslands of the Qinghai-Tibetan Plateau", 《Science of the Total Environment》 *
Yang Hao et al.: "Analysis of microbial community diversity and differences in cellar water in typical rainwater-harvesting drinking-water areas", 《Environmental Science》 (环境科学) *

Also Published As

Publication number Publication date
CN110111846B (en) 2023-03-14

Similar Documents

Publication Publication Date Title
Wan et al. Biogeographic patterns of microbial association networks in paddy soil within Eastern China
Curtis et al. What is the extent of prokaryotic diversity?
Hu et al. Mountain biodiversity and ecosystem functions: interplay between geology and contemporary environments
Zinger et al. DNA metabarcoding—Need for robust experimental designs to draw sound ecological conclusions
O'Brien et al. Spatial scale drives patterns in soil bacterial diversity
Xiong et al. Geographic distance and pH drive bacterial distribution in alkaline lake sediments across Tibetan Plateau
Wang et al. Mechanisms of soil bacterial and fungal community assembly differ among and within islands
Zinger et al. Extracellular DNA extraction is a fast, cheap and reliable alternative for multi-taxa surveys based on soil DNA
Ramette Multivariate analyses in microbial ecology
Suleiman et al. Shifts in soil bacterial community after eight years of land-use change
Gonnella et al. Endemic hydrothermal vent species identified in the open ocean seed bank
Wang et al. Temperature drives local contributions to beta diversity in mountain streams: stochastic and deterministic processes
Erdozain et al. Metabarcoding of storage ethanol vs. conventional morphometric identification in relation to the use of stream macroinvertebrates as ecological indicators in forest management
Massatti et al. Contrasting support for alternative models of genomic variation based on microhabitat preference: Species‐specific effects of climate change in alpine sedges
Sommers et al. Diversity patterns of microbial eukaryotes mirror those of bacteria in Antarctic cryoconite holes
Banerjee et al. Linking microbial co‐occurrences to soil ecological processes across a woodland‐grassland ecotone
Zhao et al. The scale dependence of fungal community distribution in paddy soil driven by stochastic and deterministic processes
Keet et al. Strong spatial and temporal turnover of soil bacterial communities in South Africa's hyperdiverse fynbos biome
Bendiksby et al. Combining genetic analyses of archived specimens with distribution modelling to explain the anomalous distribution of the rare lichen Staurolemma omphalarioides: long‐distance dispersal or vicariance?
Simonin et al. Consistent declines in aquatic biodiversity across diverse domains of life in rivers impacted by surface coal mining
Ahmadi et al. Evolutionary applications of phylogenetically-informed ecological niche modelling (ENM) to explore cryptic diversification over cryptic refugia
Roy et al. Differences in the fungal communities nursed by two genetic groups of the alpine cushion plant, Silene acaulis
Dellicour et al. Landscape genetic analyses of Cervus elaphus and Sus scrofa: comparative study and analytical developments
Li et al. Sampling cores and sequencing depths affected the measurement of microbial diversity in soil quadrats
Escalas et al. A unifying quantitative framework for exploring the multiple facets of microbial biodiversity across diverse scales

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant