CN111061818A

CN111061818A - Metabolome and other omics joint analysis method and device

Info

Publication number: CN111061818A
Application number: CN201911380147.7A
Authority: CN
Inventors: 郑洪坤 (Zheng Hongkun); 秦刚 (Qin Gang); 张蕾 (Zhang Lei); 梁若冰 (Liang Ruobing)
Original assignee: Beijing Biomarker Technologies Co ltd
Current assignee: Beijing Biomarker Technologies Co ltd
Priority date: 2019-12-27
Filing date: 2019-12-27
Publication date: 2020-04-24
Anticipated expiration: 2039-12-27
Also published as: CN111061818B

Abstract

The embodiments of the invention provide a joint analysis method and device for the metabolome and other omics. The method comprises: performing multiple co-inertia analysis on metabolome data and on data of one or more omics other than the metabolome to obtain association relationships between the metabolome data and the other omics data; taking each associated metabolome datum and other omics datum as a data pair, performing correlation analysis on each data pair, and screening the data pairs according to the correlation analysis results; and performing constrained correspondence analysis on each screened data pair, determining the finally retained data pairs according to the constrained correspondence analysis results, and determining the final association relationships between the metabolome data and the other omics data according to the finally retained data pairs. The embodiments thereby achieve joint analysis of the metabolome and other omics, with more accurate results than correlation analysis alone.

Description

Metabolome and other omics joint analysis method and device

Technical Field

The invention belongs to the technical field of bioinformatics analysis, and in particular relates to a joint analysis method and device for the metabolome and other omics.

Background

Metabolomics is an omics discipline that emerged with the continuous development of mass spectrometry and information technology; it takes the complete set of metabolites in an organism as its object of study. The metabolome plays a key role in disease diagnosis and prevention, new drug screening and development, and ecological research.

At present, the development of omics such as the transcriptome, proteome, metabolome and microbiome has advanced the understanding of biological and physiological activities, but the complex life activities of organisms are difficult to elucidate through single-omics research. Each omics layer reflects the state of an organism under particular spatial and temporal conditions, so multiple omics must be integrated to understand the organism's activity mechanisms as a whole. Joint multi-omics analysis helps explain the intrinsic mechanisms of organisms systematically, and how to effectively integrate metabolome data with other omics data and extract the key information is a pressing problem.

Most existing multi-omics joint analysis methods rely mainly on correlation analysis, but correlation analysis alone does not accurately capture the association relationships between different omics datasets.

Disclosure of Invention

To overcome, or at least partially solve, the problem that existing multi-omics joint analysis methods give inaccurate results, the embodiments of the invention provide a joint analysis method and device for the metabolome and other omics.

According to a first aspect of the embodiments of the invention, there is provided a joint analysis method for the metabolome and other omics, comprising:

performing multiple co-inertia analysis on metabolome data and on data of one or more omics other than the metabolome, to obtain association relationships between the metabolome data and the other omics data;

taking each metabolome datum and other omics datum that have an association relationship as a data pair, performing correlation analysis on each data pair, and screening the data pairs according to the correlation analysis result of each data pair;

and performing constrained correspondence analysis on each screened data pair, determining the finally retained data pairs according to the constrained correspondence analysis result of each data pair, and determining the final association relationships between the metabolome data and the other omics data according to the finally retained data pairs.
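The three steps act as successive filters over candidate data pairs. A minimal sketch, assuming each pair's MCIA association strength, Spearman statistics and constrained correspondence analysis score have already been computed elsewhere, is shown below. The class and function names are hypothetical, and so are the first threshold 0.7 and the score cutoff 0.5; only 0.8 and 0.05 follow the example values given in the detailed description.

```python
from dataclasses import dataclass


@dataclass
class CandidatePair:
    """One metabolite/feature data pair with precomputed (hypothetical) scores."""
    metabolite: str
    feature: str       # gene, protein or microorganism from another omics layer
    mcia_assoc: float  # association strength in the shared MCIA space
    rho: float         # Spearman correlation coefficient
    p_value: float     # Spearman correlation P value
    cca_score: float   # score from the constrained correspondence analysis


def joint_analysis(pairs, t1=0.7, t2=0.8, t3=0.05, t_cca=0.5):
    """Apply the three steps as successive filters and group the survivors."""
    # Step 1: preliminary association from multiple co-inertia analysis.
    step1 = [p for p in pairs if p.mcia_assoc > t1]
    # Step 2: keep pairs with rho above t2 and P value below t3.
    step2 = [p for p in step1 if p.rho > t2 and p.p_value < t3]
    # Step 3: keep pairs whose constrained correspondence score passes t_cca.
    retained = [p for p in step2 if p.cca_score > t_cca]
    # Final association relationships: metabolite -> associated features.
    result = {}
    for p in retained:
        result.setdefault(p.metabolite, []).append(p.feature)
    return result


pairs = [
    CandidatePair("M1", "G1", 0.9, 0.85, 0.01, 0.6),   # passes all three steps
    CandidatePair("M1", "G2", 0.9, 0.50, 0.01, 0.9),   # fails the Spearman screen
    CandidatePair("M2", "P1", 0.3, 0.95, 0.001, 0.9),  # fails the MCIA step
]
print(joint_analysis(pairs))  # {'M1': ['G1']}
```

Ordering the filters this way means the more expensive constrained correspondence analysis only runs on pairs that already survived the cheaper screens.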

Specifically, the step of performing multiple co-inertia analysis on the metabolome data and the data of one or more omics other than the metabolome to obtain the association relationships between the metabolome and the other omics comprises:

projecting the metabolome data and the other omics data into the same dimensional space using the omicade4 software;

determining the association strength between any metabolome datum and any datum of any other omics according to the angle, in that space, between the lines connecting the origin to the coordinates of the two data;

and preliminarily determining that an association relationship exists between a metabolome datum and an other omics datum whose association strength is greater than a first preset threshold.
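One way to make the angle criterion concrete, assuming the cosine of the angle is used as the association strength (the source names only the angle, and states later that a smaller angle means a stronger association), is shown below. Here a_i and b_j are the coordinates of metabolome datum i and other omics datum j in the shared space, and τ1 stands for the first preset threshold; these symbols are introduced for illustration.

```latex
\cos\theta_{ij} = \frac{\mathbf{a}_i \cdot \mathbf{b}_j}{\lVert \mathbf{a}_i \rVert \, \lVert \mathbf{b}_j \rVert},
\qquad \text{preliminary association if } \cos\theta_{ij} > \tau_1
```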

Specifically, the step of performing multiple co-inertia analysis on the metabolome data and the data of one or more omics other than the metabolome comprises:

performing differential analysis on the expression level file of the metabolome data and on the expression level file of the other omics data;

performing multiple co-inertia analysis on the differential analysis results of the metabolome data and of the other omics data.

Specifically, the step of performing correlation analysis on each data pair and screening the data pairs according to the correlation analysis result of each data pair comprises:

performing correlation analysis on each data pair using the Spearman method to obtain a correlation coefficient and a correlation P value for each data pair, and screening out the data pairs whose correlation coefficient is greater than a second preset threshold and whose correlation P value is less than a third preset threshold.
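For reference, the Spearman coefficient is the Pearson correlation computed on ranks; when there are no ties it reduces to the closed form below. Here x_ik and y_jk are the levels of the two data of a pair in sample k of n samples, and τ2 and τ3 stand for the second and third preset thresholds (symbols introduced for illustration).

```latex
\rho_{ij} = 1 - \frac{6 \sum_{k=1}^{n} d_k^{2}}{n\,(n^{2}-1)}, \qquad
d_k = \operatorname{rank}(x_{ik}) - \operatorname{rank}(y_{jk}), \qquad
\text{retain the pair if } \rho_{ij} > \tau_2 \text{ and } p_{ij} < \tau_3
```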

Specifically, the step of performing constrained correspondence analysis on each screened data pair and determining the finally retained data pairs according to the constrained correspondence analysis result of each data pair comprises:

performing constrained correspondence analysis on each screened data pair using the vegan software to obtain a score for each data pair, wherein the metabolome datum of the data pair is used as the constraint in the constrained correspondence analysis;

and determining the finally retained data pairs according to the score of each data pair.
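The source does not spell out the computation behind the constrained correspondence analysis or how a per-pair score is derived. The standard formulation is given below as background, with notation introduced here: Y is the samples-by-features abundance matrix of the other omics, y_{++} its grand total, r and c the row and column weights, D_r and D_c the diagonal matrices built from them, and X the constraint matrix (here, the metabolome datum of the pair). Scoring a pair by the fraction of feature j's inertia explained by the constraint is an assumption, not something the source states.

```latex
\begin{aligned}
P &= Y / y_{++}, \quad \mathbf{r} = P\mathbf{1}, \quad \mathbf{c} = P^{\top}\mathbf{1}, \\
\bar{Q} &= D_r^{-1/2}\,(P - \mathbf{r}\mathbf{c}^{\top})\,D_c^{-1/2}, \\
X_w &= D_r^{1/2}\,(X - \mathbf{1}\mathbf{r}^{\top}X), \\
\hat{Q} &= X_w\,(X_w^{\top}X_w)^{-1}X_w^{\top}\,\bar{Q}, \\
\text{score}_j &= \lVert \hat{\mathbf{q}}_{j} \rVert^{2} \,/\, \lVert \bar{\mathbf{q}}_{j} \rVert^{2}
\quad (\text{column } j \text{ of } \hat{Q} \text{ and } \bar{Q}).
\end{aligned}
```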

Specifically, the other omics include one or more of the transcriptome, the proteome and the microbiome.

Specifically, after the step of determining the final association relationships between the metabolome and the other omics according to the finally retained data pairs, the method further comprises:

if a metabolite produced by an organism carrying the transcriptome genes is a preset metabolite, acquiring the other omics data associated with the preset metabolite according to the finally determined association relationships between the metabolome data and the other omics data;

and, if the organism has the other omics data associated with the preset metabolite, determining that a predetermined differential gene is present among the transcriptome genes.

According to a second aspect of the embodiments of the invention, there is provided a joint analysis device for the metabolome and other omics, comprising:

an obtaining module, configured to perform multiple co-inertia analysis on metabolome data and on data of one or more omics other than the metabolome, to obtain association relationships between the metabolome data and the other omics data;

a screening module, configured to take each associated metabolome datum and other omics datum as a data pair, perform correlation analysis on each data pair, and screen the data pairs according to the correlation analysis result of each data pair;

and a determining module, configured to perform constrained correspondence analysis on each screened data pair, determine the finally retained data pairs according to the constrained correspondence analysis result of each data pair, and determine the final association relationships between the metabolome data and the other omics data according to the finally retained data pairs.

According to a third aspect of the embodiments of the invention, there is further provided an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor invokes the program instructions to perform the joint analysis method for the metabolome and other omics provided in any of the possible implementations of the first aspect.

According to a fourth aspect of the embodiments of the invention, there is further provided a non-transitory computer-readable storage medium storing computer instructions that cause a computer to perform the joint analysis method for the metabolome and other omics provided in any of the possible implementations of the first aspect.

In the joint analysis method and device provided by the embodiments of the invention, multiple co-inertia analysis is first performed on the metabolome data and the other omics data to preliminarily determine the association relationships between them. Correlation analysis is then performed on the associated metabolome and other omics data to screen out more reliable associations, and constrained correspondence analysis is performed on the screened associations to determine the final association relationships. This achieves joint analysis of the metabolome and other omics with more accurate results.

Drawings

To describe the technical solutions in the embodiments of the invention or in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the invention; a person of ordinary skill in the art may derive other drawings from them without creative effort.

FIG. 1 is a schematic overall flowchart of the joint analysis method for the metabolome and other omics provided by an embodiment of the invention;

FIG. 2 is a schematic diagram of the overall structure of the joint analysis device for the metabolome and other omics provided by an embodiment of the invention;

FIG. 3 is a schematic diagram of the overall structure of an electronic device according to an embodiment of the invention.

Detailed Description

To make the objectives, technical solutions and advantages of the embodiments of the invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, but not all, embodiments of the invention. All other embodiments obtained by a person skilled in the art from these embodiments without creative effort fall within the protection scope of the invention.

An embodiment of the invention provides a joint analysis method for the metabolome and other omics. FIG. 1 is a schematic overall flowchart of the method, which includes: S101, performing multiple co-inertia analysis on metabolome data and on data of one or more omics other than the metabolome to obtain association relationships between the metabolome data and the other omics data;

The metabolome data are data for a plurality of metabolites, and the other omics include one or more of the transcriptome, the proteome and the microbiome. Transcriptome data are data for multiple genes, proteome data are data for multiple proteins, and microbiome data are data for multiple microorganisms. Multiple co-inertia analysis of the metabolome data and the other omics data preliminarily determines the association relationships between the metabolome data and the data of each of the other omics.

S102, taking each associated metabolome datum and other omics datum as a data pair, performing correlation analysis on each data pair, and screening the data pairs according to the correlation analysis result of each data pair;

Correlation analysis is then performed on the two data in each data pair that were preliminarily determined to be associated, to determine the correlation between them, and the data pairs whose two data are strongly correlated are screened out.

S103, performing constrained correspondence analysis on each screened data pair, determining the finally retained data pairs according to the constrained correspondence analysis result of each data pair, and determining the final association relationships between the metabolome data and the other omics data according to the finally retained data pairs.

Starting from the screened, strongly correlated data pairs, the finally retained data pairs are determined by constrained correspondence analysis. In each finally retained data pair, the other omics datum is taken as finally determined to be associated with the metabolome datum of that pair, which gives a more accurate association analysis result.

In this embodiment, multiple co-inertia analysis of the metabolome and other omics data preliminarily determines the association relationships between them; correlation analysis of the associated data then screens out more reliable associations; and constrained correspondence analysis of the screened associations determines the final association relationships. Joint analysis of the metabolome and other omics is thereby achieved, with more accurate results.

On the basis of the above embodiment, in this embodiment the step of performing multiple co-inertia analysis on the metabolome data and the data of one or more omics other than the metabolome to obtain the association relationships between the metabolome and the other omics includes: projecting the metabolome data and the other omics data into the same dimensional space using the omicade4 software; determining the association strength between any metabolome datum and any datum of any other omics according to the angle, in that space, between the lines connecting the origin to the coordinates of the two data; and preliminarily determining that an association relationship exists between a metabolome datum and an other omics datum whose association strength is greater than the first preset threshold.

Multiple co-inertia analysis is an exploratory data analysis method for mining the common structure shared by several datasets. It projects multiple omics datasets into the same dimensional space, so that the association between each metabolite in the metabolome and each datum of the other omics (for example, the transcriptome, proteome or microbiome) can be displayed visually. In this embodiment, the analysis is performed with the omicade4 software and the omics data are displayed in a two-dimensional plot, in which the angle formed between the lines connecting the origin to the points of any two data reflects their association: the smaller the angle, the stronger the association. A metabolome datum and an other omics datum whose association strength is greater than the first preset threshold are preliminarily determined to be associated.
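Assuming the coordinates of each metabolite and each other omics feature in the shared MCIA space have already been exported from the analysis result, the angle criterion can be sketched with the cosine of the angle between the two position vectors, which grows as the angle shrinks. The function names, the feature names and the threshold 0.9 are hypothetical; the source gives no value for the first preset threshold.

```python
import math


def cosine(a, b):
    """Cosine of the angle between two position vectors drawn from the origin."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.hypot(*a) * math.hypot(*b)
    return dot / norm if norm else 0.0  # a point at the origin has no direction


def preliminary_associations(metab_coords, omics_coords, threshold):
    """Return (metabolite, feature) pairs whose cosine exceeds the threshold.

    metab_coords / omics_coords: dict name -> coordinates in the shared space.
    """
    pairs = []
    for m, a in metab_coords.items():
        for f, b in omics_coords.items():
            if cosine(a, b) > threshold:
                pairs.append((m, f))
    return pairs


metab = {"M1": (1.0, 0.1)}
omics = {"G1": (2.0, 0.3), "G2": (-1.0, 0.5)}
print(preliminary_associations(metab, omics, 0.9))  # [('M1', 'G1')]
```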

On the basis of the above embodiment, in this embodiment the step of performing multiple co-inertia analysis on the metabolome data and the data of one or more omics other than the metabolome specifically includes: performing differential analysis on the expression level file of the metabolome data and on the expression level file of the other omics data; and performing multiple co-inertia analysis on the differential analysis results of the metabolome data and of the other omics data.

Specifically, before the joint analysis of the metabolome and the other omics, expression level files for the metabolome and for the other omics need to be prepared. Differential analysis is generally performed on these expression level files, and the differential analysis results of the metabolome and the other omics are used as the input to the multiple co-inertia analysis. The correlation analysis and the constrained correspondence analysis are also performed on the basis of the expression level files.
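The source does not name the differential analysis method. As a minimal stand-in, the sketch below keeps features whose absolute log2 fold change between two sample groups reaches 1; the function names, group labels, pseudocount and cutoff are assumptions, and a real pipeline would normally add a statistical test on top of the fold change.

```python
import math


def log2_fold_changes(expr, group_a, group_b, pseudocount=1.0):
    """log2(mean_b + pc) - log2(mean_a + pc) for every feature.

    expr: dict feature -> dict sample -> expression level (one expression level file).
    """
    out = {}
    for feature, levels in expr.items():
        mean_a = sum(levels[s] for s in group_a) / len(group_a)
        mean_b = sum(levels[s] for s in group_b) / len(group_b)
        out[feature] = math.log2(mean_b + pseudocount) - math.log2(mean_a + pseudocount)
    return out


def differential_features(expr, group_a, group_b, min_abs_lfc=1.0):
    """Keep features whose absolute log2 fold change reaches min_abs_lfc."""
    lfc = log2_fold_changes(expr, group_a, group_b)
    return {f: v for f, v in lfc.items() if abs(v) >= min_abs_lfc}


expr = {"M1": {"c1": 1, "c2": 1, "t1": 7, "t2": 7},
        "M2": {"c1": 5, "c2": 5, "t1": 5, "t2": 5}}
print(differential_features(expr, ["c1", "c2"], ["t1", "t2"]))  # {'M1': 2.0}
```

The same filter would be applied separately to the metabolome file and to each other omics file before the filtered tables are passed on to the multiple co-inertia analysis.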

On the basis of the foregoing embodiment, in this embodiment the step of performing correlation analysis on each data pair and screening the data pairs according to the correlation analysis result of each data pair includes: performing correlation analysis on each data pair using the Spearman method to obtain a correlation coefficient and a correlation P value for each data pair, and screening out the data pairs whose correlation coefficient is greater than the second preset threshold and whose correlation P value is less than the third preset threshold.

Specifically, correlation analysis is performed on the metabolome datum and the other omics datum in each data pair using the Spearman method; the result is a correlation coefficient and a correlation P value for the two data in each pair. Data pairs whose correlation coefficient is greater than the second preset threshold and whose correlation P value is less than the third preset threshold are screened out; for example, the second preset threshold may be 0.8 and the third preset threshold 0.05. This identifies the other omics data, such as genes, proteins and microorganisms, that are correlated with the metabolome data. The screening result is visualized as a network diagram.
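A standard-library sketch of this screening step follows. The Spearman coefficient is computed as the Pearson correlation of average ranks, so ties are handled. Because the source does not say how the P value is obtained, a two-sided permutation P value is used here as a stand-in; the function names and the permutation settings are hypothetical, while 0.8 and 0.05 are the example thresholds given above.

```python
import random


def ranks(values):
    """1-based average ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx and syy else 0.0


def spearman(x, y, n_perm=2000, seed=0):
    """Spearman rho plus a two-sided permutation P value (assumed method)."""
    rx, ry = ranks(x), ranks(y)
    rho = pearson(rx, ry)
    rng = random.Random(seed)
    shuffled = ry[:]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(rx, shuffled)) >= abs(rho) - 1e-12:
            hits += 1
    return rho, (hits + 1) / (n_perm + 1)


def screen_pairs(pairs, data, rho_min=0.8, p_max=0.05):
    """Keep (metabolite, feature) pairs with rho > rho_min and P < p_max."""
    kept = []
    for m, f in pairs:
        rho, p = spearman(data[m], data[f])
        if rho > rho_min and p < p_max:
            kept.append((m, f, rho, p))
    return kept
```

The surviving tuples are exactly the edges one would draw in the network diagram mentioned above, with rho available as an edge weight.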

On the basis of the foregoing embodiment, in this embodiment the step of performing constrained correspondence analysis on each screened data pair and determining the finally retained data pairs according to the constrained correspondence analysis result of each data pair includes: performing constrained correspondence analysis on each screened data pair using the vegan software to obtain a score for each data pair, with the metabolome datum of the data pair used as the constraint; and determining the finally retained data pairs according to the score of each data pair.

Specifically, the vegan software is used to perform constrained correspondence analysis on the data in each screened data pair, with the metabolite in each data pair used as the constraint. The other omics data associated with the metabolome data are finally determined according to the score of each data pair.
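With a single metabolite as the constraint, the projection in constrained correspondence analysis has a closed form: if w is the weighted-centred constraint vector scaled by the square roots of the row weights, the inertia of column j explained by the constraint is (w·q̄_j)²/(w·w). The sketch below relies on that, under several assumptions that the source does not state: the response matrix holds all screened features of the other omics for the same samples (a single-feature matrix has zero inertia in correspondence analysis), every row and column total is positive, and a pair is scored by the fraction of its feature's inertia explained by the metabolite. The function names and the cutoff 0.5 are hypothetical, and this is not vegan's code.

```python
import math


def constrained_scores(Y, x):
    """Single-constraint CCA: per-feature fraction of inertia explained by x.

    Y: samples x features matrix (list of lists, non-negative abundances).
    x: constraint values (one metabolite) for the same samples.
    Returns (per_feature_scores, overall_constrained_fraction).
    """
    total = sum(map(sum, Y))
    n, m = len(Y), len(Y[0])
    r = [sum(row) / total for row in Y]                             # row weights
    c = [sum(Y[i][j] for i in range(n)) / total for j in range(m)]  # column weights
    # Chi-square standardised residuals, as in correspondence analysis.
    q = [[(Y[i][j] / total - r[i] * c[j]) / math.sqrt(r[i] * c[j])
          for j in range(m)] for i in range(n)]
    # Weighted-centre the constraint and scale it by sqrt(row weight).
    x_bar = sum(ri * xi for ri, xi in zip(r, x))
    w = [math.sqrt(ri) * (xi - x_bar) for ri, xi in zip(r, x)]
    ww = sum(v * v for v in w)
    scores, explained_total, inertia_total = [], 0.0, 0.0
    for j in range(m):
        col = [q[i][j] for i in range(n)]
        inertia = sum(v * v for v in col)
        explained = sum(a * b for a, b in zip(w, col)) ** 2 / ww if ww else 0.0
        scores.append(explained / inertia if inertia else 0.0)
        explained_total += explained
        inertia_total += inertia
    overall = explained_total / inertia_total if inertia_total else 0.0
    return scores, overall


def retain_pairs(pairs, Y, features, metabolite_levels, cutoff=0.5):
    """Keep (metabolite, feature) pairs whose feature score exceeds the cutoff."""
    kept = []
    for metabolite, feature in pairs:
        scores, _ = constrained_scores(Y, metabolite_levels[metabolite])
        if scores[features.index(feature)] > cutoff:
            kept.append((metabolite, feature))
    return kept


# Features f0 (tracks the metabolite), f1 (U-shaped), f2 (filler) in 4 samples.
Y = [[1, 2, 7], [2, 1, 7], [3, 1, 6], [4, 2, 4]]
x = [1, 2, 3, 4]
print(retain_pairs([("M1", "f0"), ("M1", "f1")], Y, ["f0", "f1", "f2"], {"M1": x}))
```

In the toy data, f0 rises with the metabolite and is fully explained by it, whereas the U-shaped f1 is orthogonal to the constraint and scores zero, so only the pair with f0 is retained.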

On the basis of the foregoing embodiments, in this embodiment, after the step of determining the final association relationships between the metabolome and the other omics according to the finally retained data pairs, the method further includes: if a metabolite produced by an organism carrying the transcriptome genes is a preset metabolite, acquiring the other omics data associated with the preset metabolite according to the finally determined association relationships between the metabolome data and the other omics data; and, if the organism has the other omics data associated with the preset metabolite, determining that a predetermined differential gene is present among the transcriptome genes.

Specifically, to determine whether a predetermined differential gene is present among the transcriptome genes, it is first determined whether a metabolite produced by the organism carrying those genes is a preset metabolite. The predetermined differential gene confers a particular function, such as drought resistance, on the organism. If the metabolite is the preset metabolite, the other omics data corresponding to the preset metabolite are looked up in the finally determined association relationships, and it is then judged whether the organism has those other omics data. If it does, a predetermined differential gene is determined to be present among the transcriptome genes. Using both the metabolite and the other omics data associated with it to identify differential genes makes the result more accurate.
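The decision rule can be sketched as a small predicate over the final association relationships. The function name and argument layout are hypothetical, and the source does not say whether one associated feature or all of them must be present in the organism; the sketch assumes that one shared feature is enough.

```python
def has_predetermined_differential_gene(produced_metabolites, preset_metabolite,
                                        associations, organism_features):
    """Sketch of the decision rule for a predetermined differential gene.

    produced_metabolites: metabolites produced by the organism carrying the genes
    preset_metabolite:    the preset metabolite of interest
    associations:         final map, metabolite -> set of associated features
    organism_features:    other omics features detected in the organism
    """
    # Precondition: the organism must produce the preset metabolite.
    if preset_metabolite not in produced_metabolites:
        return False
    # Other omics data linked to the preset metabolite by the final analysis.
    linked = associations.get(preset_metabolite, set())
    # Assumption: one shared associated feature is enough to conclude "present".
    return bool(linked & set(organism_features))


assoc = {"M1": {"G1", "P3"}}
print(has_predetermined_differential_gene({"M1"}, "M1", assoc, ["G1", "G9"]))  # True
```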

Another embodiment of the invention provides a joint analysis device for the metabolome and other omics, configured to carry out the methods of the preceding embodiments. The descriptions and definitions in the foregoing method embodiments therefore apply to the execution modules of this embodiment. FIG. 2 is a schematic diagram of the overall structure of the joint analysis device for the metabolome and other omics provided by an embodiment of the invention, which includes an obtaining module 201, a screening module 202 and a determining module 203, wherein:

the obtaining module 201 is configured to perform multiple co-inertia analysis on metabolome data and on data of one or more omics other than the metabolome, to obtain association relationships between the metabolome data and the other omics data;

The metabolome data are data for a plurality of metabolites, and the other omics include one or more of the transcriptome, the proteome and the microbiome. Transcriptome data are data for multiple genes, proteome data are data for multiple proteins, and microbiome data are data for multiple microorganisms. The obtaining module 201 performs multiple co-inertia analysis on the metabolome data and the other omics data, and preliminarily determines the association relationships between the metabolome data and the data of each of the other omics.

The screening module 202 is configured to take each associated metabolome datum and other omics datum as a data pair, perform correlation analysis on each data pair, and screen the data pairs according to the correlation analysis result of each data pair;

The screening module 202 performs correlation analysis on the two data in each data pair that were preliminarily determined to be associated, determines the correlation between them, and screens out the data pairs whose two data are strongly correlated.

The determining module 203 is configured to perform constrained correspondence analysis on each screened data pair, determine the finally retained data pairs according to the constrained correspondence analysis result of each data pair, and determine the final association relationships between the metabolome data and the other omics data according to the finally retained data pairs.

Starting from the screened, strongly correlated data pairs, the determining module 203 determines the finally retained data pairs by constrained correspondence analysis. In each finally retained data pair, the other omics datum is taken as finally determined to be associated with the metabolome datum of that pair, which gives a more accurate association analysis result.
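The module split of FIG. 2 can be sketched as three small classes, each wrapping one step, plus a device that chains them. The class names and the injected callables are hypothetical; the callables stand in for the multiple co-inertia, Spearman and constrained correspondence computations described for the method.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]  # (metabolite, other omics feature)


@dataclass
class ObtainingModule:  # module 201: multiple co-inertia analysis
    associate: Callable[[], List[Pair]]

    def run(self) -> List[Pair]:
        return self.associate()


@dataclass
class ScreeningModule:  # module 202: correlation screening
    passes_correlation: Callable[[Pair], bool]

    def run(self, pairs: List[Pair]) -> List[Pair]:
        return [p for p in pairs if self.passes_correlation(p)]


@dataclass
class DeterminingModule:  # module 203: constrained correspondence analysis
    passes_cca: Callable[[Pair], bool]

    def run(self, pairs: List[Pair]) -> Dict[str, List[str]]:
        result: Dict[str, List[str]] = {}
        for metabolite, feature in pairs:
            if self.passes_cca((metabolite, feature)):
                result.setdefault(metabolite, []).append(feature)
        return result


@dataclass
class JointAnalysisDevice:
    obtaining: ObtainingModule
    screening: ScreeningModule
    determining: DeterminingModule

    def analyse(self) -> Dict[str, List[str]]:
        # Modules are chained in the order 201 -> 202 -> 203.
        pairs = self.obtaining.run()
        return self.determining.run(self.screening.run(pairs))
```

Injecting each computation keeps the modules independent, so, for example, the screening rule can be swapped without touching the other two modules.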

On the basis of the foregoing embodiment, the obtaining module in this embodiment is specifically configured to: project the metabolome data and the other omics data into the same dimensional space using the omicade4 software; determine the association strength between any metabolome datum and any datum of any other omics according to the angle, in that space, between the lines connecting the origin to the coordinates of the two data; and preliminarily determine that an association relationship exists between a metabolome datum and an other omics datum whose association strength is greater than the first preset threshold.

On the basis of the foregoing embodiment, the obtaining module in this embodiment is specifically configured to: perform differential analysis on the expression level file of the metabolome data and on the expression level file of the other omics data; and perform multiple co-inertia analysis on the differential analysis results of the metabolome data and of the other omics data.

On the basis of the above embodiment, the screening module in this embodiment is specifically configured to: perform correlation analysis on each data pair using the Spearman method to obtain a correlation coefficient and a correlation P value for each data pair, and screen out the data pairs whose correlation coefficient is greater than the second preset threshold and whose correlation P value is less than the third preset threshold.

On the basis of the foregoing embodiment, the determining module in this embodiment is specifically configured to: perform constrained correspondence analysis on each screened data pair using the vegan software to obtain a score for each data pair, with the metabolome datum of the data pair used as the constraint; and determine the finally retained data pairs according to the score of each data pair.

On the basis of the above embodiments, the other omics in this embodiment include one or more of the transcriptome, the proteome and the microbiome.

On the basis of the above embodiments, this embodiment further includes an application module, configured to: if a metabolite produced by an organism carrying the transcriptome genes is a preset metabolite, obtain the other omics data associated with the preset metabolite according to the finally determined association relationships between the metabolome data and the other omics data; and, if the organism has the other omics data associated with the preset metabolite, determine that a predetermined differential gene is present among the transcriptome genes.

FIG. 3 illustrates the physical structure of an electronic device, which, as shown in FIG. 3, may include a processor 301, a communications interface 302, a memory 303 and a communication bus 304, wherein the processor 301, the communications interface 302 and the memory 303 communicate with one another through the communication bus 304. The processor 301 may invoke logic instructions in the memory 303 to perform the following method: performing multiple co-inertia analysis on metabolome data and on data of one or more omics other than the metabolome to obtain association relationships between the metabolome data and the other omics data; taking each associated metabolome datum and other omics datum as a data pair, performing correlation analysis on each data pair, and screening the data pairs according to the correlation analysis result of each data pair; and performing constrained correspondence analysis on each screened data pair, determining the finally retained data pairs according to the constrained correspondence analysis result of each data pair, and determining the final association relationships between the metabolome data and the other omics data according to the finally retained data pairs.

In addition, when implemented as software functional units and sold or used as an independent product, the logic instructions in the memory 303 may be stored in a computer-readable storage medium. On this understanding, the technical solution of the invention may be embodied as a software product that is stored in a storage medium and includes instructions causing a computer device (a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods of the embodiments of the invention. The storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or any other medium capable of storing program code.

This embodiment provides a non-transitory computer-readable storage medium storing computer instructions that cause a computer to perform the methods provided by the above method embodiments, for example: performing multiple co-inertia analysis on metabolome data and on data of one or more omics other than the metabolome to obtain association relationships between the metabolome data and the other omics data; taking each associated metabolome datum and other omics datum as a data pair, performing correlation analysis on each data pair, and screening the data pairs according to the correlation analysis result of each data pair; and performing constrained correspondence analysis on each screened data pair, determining the finally retained data pairs according to the constrained correspondence analysis result of each data pair, and determining the final association relationships between the metabolome data and the other omics data according to the finally retained data pairs.

A person of ordinary skill in the art will understand that all or part of the steps of the method embodiments may be implemented by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the method embodiments. The storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk or an optical disc.

The device embodiments described above are merely illustrative: units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected as needed to achieve the objectives of the solution of this embodiment, which a person of ordinary skill in the art can understand and implement without creative effort.

From the above description of the embodiments, a person skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general-purpose hardware platform, or by hardware. On this understanding, the above technical solutions may be embodied as a software product stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk or an optical disc, and including instructions that cause a computer device (a personal computer, a server, a network device, or the like) to execute the methods of the embodiments or of parts of the embodiments.

Finally, it should be noted that the above embodiments are only intended to illustrate, not to limit, the technical solutions of the invention. Although the invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art will understand that the technical solutions described in them may still be modified, or some of their technical features equivalently replaced, and such modifications or replacements do not cause the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the invention.

Claims

1. A joint analysis method for the metabolome and other omics, characterized by comprising the following steps:

2. The joint analysis method for the metabolome and other omics according to claim 1, wherein the step of performing multiple co-inertia analysis on the metabolome data and the data of one or more omics other than the metabolome to obtain the association relationships between the metabolome and the other omics comprises:

3. The joint analysis method for the metabolome and other omics according to claim 1, wherein the step of performing multiple co-inertia analysis on the metabolome data and the data of one or more omics other than the metabolome comprises:

4. The joint analysis method for the metabolome and other omics according to claim 1, wherein the step of performing correlation analysis on each data pair and screening the data pairs according to the correlation analysis result of each data pair comprises:

5. The joint analysis method for the metabolome and other omics according to claim 1, wherein the step of performing constrained correspondence analysis on each screened data pair and determining the finally retained data pairs according to the constrained correspondence analysis result of each data pair comprises:

6. The joint analysis method for the metabolome and other omics according to any one of claims 1 to 5, wherein the other omics comprise one or more of the transcriptome, the proteome and the microbiome.

7. The joint analysis method for the metabolome and other omics according to any one of claims 1 to 5, wherein, after the step of determining the final association relationships between the metabolome and the other omics according to the finally retained data pairs, the method further comprises:

8. A joint analysis device for the metabolome and other omics, comprising:

9. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, performs the steps of the joint analysis method for the metabolome and other omics according to any one of claims 1 to 7.

10. A non-transitory computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the joint analysis method for the metabolome and other omics according to any one of claims 1 to 7.