CN109979527A - Method and system for association analysis of transcriptome and metabolome data - Google Patents
- Publication number: CN109979527A
- Application number: CN201910176587.4A
- Authority: CN (China)
- Prior art keywords: gene, metabolite, analysis, transcriptome, differential
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abstract
The invention discloses a method and system for association analysis of transcriptome and metabolome data. The method comprises the following steps: step S1, performing transcriptome sequencing on a sample and performing bioinformatics analysis on the transcriptome data to obtain differentially expressed genes; step S2, performing metabolome sequencing on the sample and performing bioinformatics analysis on the metabolome data to obtain differential metabolites; step S3, analyzing association features by bioinformatics analysis based on the obtained differentially expressed genes and differential metabolites. The invention addresses the problem that single-omics sequencing data in the prior art are one-sided and partly unreliable. Moreover, because biomolecules transmit information to one another and act in coordination before a specific function is finally expressed, the multi-omics integrated analysis of the invention is more systematic and better suited to revealing complex functional mechanisms.
Description
Technical field
The present invention relates to the technical field of transcriptomics and metabolomics, and in particular to a method and system for association analysis of transcriptome and metabolome data based on a combination of bioinformatics algorithms.
Background art
Metabolomics is an important component of systems biology. Metabolites are the final products of life activities and directly reflect the effects that environmental changes or physiological and pathological changes have on the organism. Transcriptome sequencing can yield a large number of differentially expressed genes and regulated metabolic pathways. However, transcription and metabolism do not occur independently in a biological system; single-omics sequencing data are one-sided and partly unreliable, and may not describe an entire biological process completely. In addition, because it is difficult to link genes directly to phenotypes, key signaling pathways are hard to identify, and single-omics studies often fail to achieve their intended research purpose. Since genes and metabolites that participate in the same biological process show the same or similar patterns of change, transcriptome and metabolome data can be integrated and subjected to association analysis, so as to mine in depth the genes and metabolites involved in a regulatory process, reveal the true gene expression regulatory network, and obtain a more complete elucidation of pathways and mechanisms. Meanwhile, with the rapid development of high-throughput sequencing, transcriptome sequencing and metabolome sequencing generate massive amounts of biological data, which places higher demands on data computation and analysis. Data integration and mining need to be carried out on high-performance computing clusters, and high-throughput integrated analysis is an essential means of obtaining the final biological information.
Chinese invention patent publication No. CN107832585 discloses an RNAseq data analysis method, but it does not disclose a method for association analysis of sequencing data from two omics, so the problem that single-omics sequencing data are one-sided and partly unreliable remains.
Summary of the invention
To overcome the above deficiencies of the prior art, the object of the present invention is to provide a method and system for association analysis of transcriptome and metabolome data, so as to solve the problem that single-omics sequencing data in the prior art are one-sided and partly unreliable, to describe the entire biological process completely, to mine the genes and metabolites involved in the regulatory process, to reveal the true gene expression regulatory network, to identify key signaling pathways, and to obtain a more complete elucidation of pathways and mechanisms.
In view of the above and other objects, the present invention proposes a method for association analysis of transcriptome and metabolome data, comprising the following steps:
Step S1: perform transcriptome sequencing on a sample and perform bioinformatics analysis on the transcriptome data to obtain differentially expressed genes;
Step S2: perform metabolome sequencing on the sample and perform bioinformatics analysis on the metabolome data to obtain differential metabolites;
Step S3: based on the obtained differentially expressed genes and differential metabolites, analyze association features by bioinformatics analysis.
Preferably, in step S3, the analyses performed on the obtained gene expression levels and metabolite abundance data include, but are not limited to, Pathway functional model analysis, O2PLS model analysis and correlation coefficient model analysis. The Pathway functional model analysis queries the KEGG metabolic pathways shared by genes and metabolites and analyzes the association features of the genes and metabolites in the shared pathways. The O2PLS model analysis builds an O2PLS model from the gene expression and metabolite abundance data and uses model prediction to obtain sets of correlated genes and metabolites for joint analysis. The correlation coefficient model analysis calculates the Pearson correlation coefficients between gene expression levels and metabolite abundances and outputs them for display.
Preferably, the Pathway functional model analysis includes, but is not limited to: analysis of the metabolic pathways shared by between-group differential genes and between-group differential metabolites; analysis of the metabolic pathways shared by between-group differential genes and all metabolites; and analysis of the metabolic pathways shared by all genes and all metabolites.
Preferably, in each analysis type, pathway annotation is performed first: the genes and metabolites to be analyzed are compared and matched against the gene and metabolite information contained in the pathways of the KEGG database, so as to obtain the pathways in which the genes and metabolites are located. The genes and metabolites present in the same pathway are then displayed as statistical charts; genes and metabolites annotated to the same pathway are considered likely to be functionally associated. Finally, the association information between genes and metabolites is obtained in the form of shared pathways.
Preferably, the Pathway functional model analysis also draws and outputs metabolic pathway maps in which genes and metabolites are associated, so as to present the functional features associating genes with metabolites intuitively.
Preferably, the O2PLS model analysis comprises the following steps:
Step 2.1: build models repeatedly by cross-validation, calculate the prediction error of each model, and select the most suitable of the preliminary models; in general, a smaller prediction error indicates a more reasonable model. The most suitable O2PLS model is obtained through this repeated preliminary modeling;
Step 2.2: calculate the contribution of the effective parts of the transcriptome and the metabolome to the model, so as to assess how well the model has been constructed;
Step 2.3: calculate the contribution of each gene and each metabolite of the joint part to the whole model, the magnitude of the contribution being expressed by the loading value, and draw the loading plots of the different omics for all transcriptome and metabolome data;
Step 2.4: according to the element loading values obtained in step 2.3, screen genes and metabolites and draw an integrated loading plot to show the most strongly associated genes and metabolites, and draw and output the between-omics association loading plot.
Preferably, the correlation coefficient calculation includes, but is not limited to, the Pearson coefficients between the expression levels of all differential genes and the abundances of all differential metabolites, and the Pearson coefficients between the expression levels of all genes and the abundances of all metabolites.
Preferably, the correlation coefficient model analysis also uses the calculated coefficients to draw and output correlation heat maps and network diagrams.
Preferably, the correlation coefficient model analysis comprises the following steps:
Step 3.1: calculate the Pearson coefficients between gene expression levels and metabolite abundances;
Step 3.2: based on the Pearson coefficients between differential gene expression levels and differential metabolite abundances calculated in step 3.1, sort the genes from high to low by absolute correlation, select the top-ranked genes whose absolute correlation exceeds a preset value, and display the correlations between the selected genes and all differential metabolites as a heat map;
Step 3.3: based on the Pearson coefficients between differential gene expression levels and differential metabolite abundances calculated in step 3.1, screen the data and draw and output a correlation network diagram to show the genes or metabolites that occupy relatively important positions.
To achieve the above objects, the present invention also provides a system for association analysis of transcriptome and metabolome data, comprising:
a differentially expressed gene analysis unit, configured to perform transcriptome sequencing on a sample and perform bioinformatics analysis on the transcriptome data to obtain differentially expressed genes;
a differential metabolite analysis unit, configured to perform metabolome sequencing on the sample and perform bioinformatics analysis on the metabolome data to obtain differential metabolites;
an association analysis unit, configured to analyze association features by bioinformatics analysis based on the differentially expressed genes and differential metabolites.
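The way the three units hand data to one another can be sketched as a small pipeline. This is only an illustration under stated assumptions: the class name `AssociationSystem`, the `Table` type and the three callables are hypothetical stand-ins, and the source does not describe how the units are implemented.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical data layout: feature name -> values across samples.
Table = Dict[str, List[float]]


@dataclass
class AssociationSystem:
    """Illustrative wiring of the three units; all names are assumptions."""
    deg_unit: Callable[[Table], List[str]]           # returns differentially expressed genes
    metabolite_unit: Callable[[Table], List[str]]    # returns differential metabolites
    association_unit: Callable[[Table, Table], dict] # analyzes association features

    def run(self, expression: Table, abundance: Table) -> dict:
        # Each omics dataset is screened independently first.
        genes = self.deg_unit(expression)
        metabolites = self.metabolite_unit(abundance)
        # Only the screened features are passed on to the association analysis.
        selected_expression = {g: expression[g] for g in genes}
        selected_abundance = {m: abundance[m] for m in metabolites}
        return self.association_unit(selected_expression, selected_abundance)
```

Passing the units in as callables keeps the sketch independent of any particular screening or association method.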
Compared with the prior art, the method and system for association analysis of transcriptome and metabolome data of the present invention perform transcriptome sequencing and metabolome sequencing on the sample separately and carry out association analysis on the two resulting datasets, gene expression levels and metabolite abundances, to analyze their association features. This solves the problem that single-omics sequencing data are one-sided and partly unreliable, allows the entire biological process to be described completely, mines the genes and metabolites involved in the regulatory process, reveals the true gene expression regulatory network, identifies key signaling pathways, and yields a more complete elucidation of pathways and mechanisms.
Brief description of the drawings
Fig. 1 is a flow chart of the steps of the method for association analysis of transcriptome and metabolome data according to the present invention;
Fig. 2 is a system architecture diagram of the system for association analysis of transcriptome and metabolome data according to the present invention.
Detailed description of the embodiments
The embodiments of the present invention are described below by way of specific examples and with reference to the accompanying drawings. Those skilled in the art can readily understand other advantages and effects of the invention from the content disclosed in this specification. The invention may also be implemented or applied through other, different specific examples, and the details in this specification may be modified and changed in various ways based on different viewpoints and applications without departing from the spirit of the invention.
Analysis has shown that association analysis of two omics datasets (transcriptome and metabolome) yields better biological data analysis results and is more conducive to research into underlying mechanisms. With a multi-omics association analysis method for the metabolome and transcriptome, information on the many expressed genes and the differentially accumulated metabolites is analyzed in an integrated manner and, in combination with molecular biology techniques, the biological phenotype of interest is explained at the molecular level and the mechanisms of growth and development and of physiological and pathological responses are explored. The present invention therefore proposes a method for association analysis of transcriptome and metabolome data based on a combination of bioinformatics algorithms.
Fig. 1 is a flow chart of the steps of the method for association analysis of transcriptome and metabolome data according to the present invention. As shown in Fig. 1, the method comprises the following steps:
Step S1: perform transcriptome sequencing on a sample and perform bioinformatics analysis on the transcriptome data to obtain differentially expressed genes. Specifically, a set of associated genes that influence the sample grouping is obtained by screening.
Step S2: perform metabolome sequencing on the sample and perform bioinformatics analysis on the metabolome data to obtain differential metabolites. Specifically, a set of metabolites that influence the sample grouping is obtained by screening.
Step S3: based on the differentially expressed genes and differential metabolites, analyze association features by bioinformatics analysis. In a specific embodiment of the invention, the bioinformatics association analysis is carried out on the basis of cloud computing, and the following three model analyses are performed on the two datasets, gene expression levels and metabolite abundances:
1. Pathway functional model analysis: query the KEGG metabolic pathways (pathways) shared by genes and metabolites and analyze the association features of the genes and metabolites in the shared pathways.
Specifically, the Pathway functional model analysis includes, but is not limited to, shared metabolic pathway analysis and result display. Through between-group difference analysis, differentially expressed genes have been obtained from the transcriptome data and differentially expressed metabolites from the metabolome data, and KEGG enrichment analysis has been carried out for each omics separately; in the association analysis, the KEGG metabolic pathways (pathways) shared by genes and metabolites are analyzed.
In a specific embodiment of the invention, the Pathway functional model analysis includes, but is not limited to, the following three types:
1) analysis of the metabolic pathways (pathways) shared by between-group differential genes and between-group differential metabolites;
2) because there may be very few kinds of between-group differential metabolites, or the target metabolite may not be among the differential metabolites, analysis of the metabolic pathways (pathways) shared by between-group differential genes and all metabolites;
3) as a basis for the screening of other customized analyses, analysis of the metabolic pathways (pathways) shared by all genes and all metabolites.
Specifically, in each analysis type, pathway annotation is performed first: the genes and metabolites to be analyzed are compared and matched against the gene and metabolite information contained in the pathways of the KEGG database, so as to obtain the pathways in which the genes and metabolites are located. The genes and metabolites present in the same pathway are then displayed as statistical charts; genes and metabolites annotated to the same pathway are considered likely to be functionally associated. Finally, the association information between genes and metabolites is obtained in the form of shared pathways.
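The shared-pathway step can be sketched as a set intersection over pathway annotations. This is a minimal sketch assuming the annotations have already been retrieved and are held in two plain dictionaries; the function name and the placeholder identifiers are hypothetical, and no KEGG lookup is performed here.

```python
from collections import defaultdict


def shared_pathways(gene_to_pathways, metabolite_to_pathways):
    """Return {pathway: (genes, metabolites)} for pathways containing both.

    Both arguments map a feature name to the pathway identifiers it is
    annotated to (assumed to come from a prior annotation step).
    """
    genes_in = defaultdict(set)
    metabolites_in = defaultdict(set)
    for gene, pathways in gene_to_pathways.items():
        for pathway in pathways:
            genes_in[pathway].add(gene)
    for metabolite, pathways in metabolite_to_pathways.items():
        for pathway in pathways:
            metabolites_in[pathway].add(metabolite)
    # A pathway is "shared" when at least one gene and one metabolite map to it.
    return {
        pathway: (sorted(genes_in[pathway]), sorted(metabolites_in[pathway]))
        for pathway in genes_in.keys() & metabolites_in.keys()
    }
```

The three analysis types listed above differ only in which features are passed in: differential genes with differential metabolites, differential genes with all metabolites, or all genes with all metabolites.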
Specifically, the result display outputs metabolic pathway maps (pathway maps) showing the differential metabolites and their associated genes. To present the relationships among genes, metabolites and the biological question under study intuitively, the maps show the metabolites whose abundance is up- or down-regulated and the genes whose expression is up- or down-regulated.
2. O2PLS (bidirectional orthogonal projections to latent structures) model analysis: an O2PLS model is built from the gene expression and metabolite abundance data, the noise part of the model is removed, and sets of correlated genes and metabolites are obtained by model prediction for joint analysis.
Specifically, the O2PLS model analysis performs O2PLS analysis on all transcriptome and all metabolome data using OmicsPLS. O2PLS is a generalization of OPLS that supports bidirectional modeling and prediction between two data matrices. This analysis can uncover the intrinsic links between the transcriptome and the metabolome, determine the degree of association between the transcriptome and metabolome data, and identify the major genes or metabolites responsible for that association.
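The bidirectional modeling of two data matrices can be written compactly in the form commonly used for O2PLS. The symbols are not taken from the source and are assumptions for illustration: X is the samples-by-genes expression matrix, Y the samples-by-metabolites abundance matrix, T and U the joint scores, W and C the joint loadings, the terms with the subscript ⊥ the parts of each matrix that are unrelated to the other, and E, F, H, H' residual (noise) terms.

```latex
\begin{aligned}
X &= T W^{\top} + T_{\perp} P_{\perp}^{\top} + E, \\
Y &= U C^{\top} + U_{\perp} Q_{\perp}^{\top} + F, \\
U &= T B_{T} + H, \qquad T = U B_{U} + H'.
\end{aligned}
```

The last line expresses the two directions of prediction: joint scores of the metabolome are predicted from those of the transcriptome and vice versa.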
In a specific embodiment of the invention, the O2PLS model analysis includes, but is not limited to, the following steps:
Step 2.1, model construction: because both under-fitting and over-fitting of the model affect the analysis, before the formal model analysis the model is first built repeatedly using cross-validation, the prediction error of each model is calculated, and the most suitable of the preliminary models is selected; in general, a smaller prediction error indicates a more reasonable model. The most suitable O2PLS model obtained through this repeated preliminary modeling splits the transcriptome and metabolome data into six main parts: the joint part of the transcriptome (mainly genes strongly associated with the metabolites), the orthogonal part of the transcriptome (genes that do not affect metabolite abundance and influence only the transcriptome data), the noise part of the transcriptome (genes that affect neither the transcriptome data nor the metabolome data), and the corresponding joint, orthogonal and noise parts of the metabolome.
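Selecting the most suitable preliminary model amounts to a search over candidate model sizes that keeps the one with the smallest cross-validated prediction error. The sketch below shows only that selection logic; `cv_error` is a hypothetical caller-supplied function that fits an O2PLS model with the given numbers of joint and orthogonal components and returns its cross-validated error. The fitting itself (for example with OmicsPLS) is not reproduced here.

```python
from itertools import product


def select_component_counts(cv_error, joint_range, orth_x_range, orth_y_range):
    """Grid-search the component counts that minimize cross-validated error.

    cv_error(n_joint, n_orth_x, n_orth_y) -> float is an assumed callback;
    the ranges are iterables of candidate integer counts.
    Returns ((n_joint, n_orth_x, n_orth_y), error) for the best candidate.
    """
    best = None
    for combo in product(joint_range, orth_x_range, orth_y_range):
        error = cv_error(*combo)
        # Keep the candidate with the smallest prediction error seen so far.
        if best is None or error < best[1]:
            best = (combo, error)
    if best is None:
        raise ValueError("at least one candidate combination is required")
    return best
```

Because the error is estimated on held-out samples, both under-fitted models (too few components) and over-fitted models (too many) tend to score worse than the selected one.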
Step 2.2, model evaluation: after the O2PLS model has been established, the contribution of the effective parts (joint and orthogonal parts) of the transcriptome and the metabolome to the model is calculated in order to assess how well the model has been constructed. The larger the contribution of the effective parts, the more reasonable the model.
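The source does not define how the contribution of the effective parts is measured. One common convention, assumed here, is the fraction of the total sum of squares of a data matrix that is captured by its joint and orthogonal parts. The function names and the list-of-rows matrix layout are illustrative.

```python
def sum_of_squares(matrix):
    """Sum of squared entries of a matrix given as a list of rows."""
    return sum(value * value for row in matrix for value in row)


def effective_contribution(data, joint_part, orthogonal_part):
    """Fractions of the total sum of squares explained by each part.

    Assumption: contribution is measured as an R^2-like ratio; the source
    only says that a larger effective contribution means a better model.
    """
    total = sum_of_squares(data)
    if total == 0:
        raise ValueError("data matrix has zero total sum of squares")
    joint = sum_of_squares(joint_part) / total
    orthogonal = sum_of_squares(orthogonal_part) / total
    return {"joint": joint, "orthogonal": orthogonal, "effective": joint + orthogonal}
```

The function is applied once to the transcriptome matrix and once to the metabolome matrix, each with its own joint and orthogonal parts.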
Step 2.3, element contribution analysis: using the constructed O2PLS model, the contribution of each gene and each metabolite of the joint part to the whole model is calculated; the magnitude of the contribution is expressed by the loading value. The larger the absolute loading value of a gene or metabolite, the stronger its association with the metabolites or genes of the other omics. To allow the importance of the elements in the two omics to be inspected visually, loading plots of the different omics are drawn and output for all transcriptome and metabolome data.
Step 2.4, between-omics element association analysis: according to the element loading values obtained in the previous step, the genes and metabolites in the top 2.5% at both ends (2.5% each for positive and negative values, 5% in total) are selected and drawn together in an integrated loading plot to show the most strongly associated genes and metabolites, and the between-omics association loading plot is drawn and output.
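The two-sided 2.5% screen in step 2.4 can be sketched as follows. The function name is hypothetical, and two details are assumptions because the source does not specify them: the count per end is rounded to the nearest integer with a minimum of one, and only positive loadings are kept at the positive end and negative loadings at the negative end.

```python
def select_extreme_loadings(loadings, fraction=0.025):
    """Pick the features with the most extreme loadings at each end.

    loadings: dict mapping a feature (gene or metabolite) to its loading
    value on a joint component. Returns (positive_end, negative_end).
    """
    ranked = sorted(loadings.items(), key=lambda item: item[1])
    # Assumption: round to the nearest count, but keep at least one feature.
    count = max(1, round(len(ranked) * fraction))
    negative_end = [name for name, value in ranked[:count] if value < 0]
    positive_end = [name for name, value in ranked[-count:] if value > 0]
    return positive_end, negative_end
```

Running the function separately on gene loadings and metabolite loadings gives the two feature sets that are then drawn together in the integrated loading plot.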
3. Correlation coefficient model analysis: the Pearson correlation coefficients between gene expression levels and metabolite abundances are calculated and displayed as heat maps and network diagrams. It should be noted that, in a specific embodiment of the invention, correlation coefficient model analysis is required only when the number of sample groups is ≥ 3.
Specifically, the correlation coefficient model analysis includes, but is not limited to, the following steps:
Step 3.1, calculating the Pearson correlation coefficient: the Pearson correlation coefficient measures the linear correlation between two variables, that is, how strongly the two variables vary together, and takes values in the range [-1, +1]. The Pearson coefficients between gene expression levels and metabolite abundances are calculated with the R function cor.test to assess the correlation between genes and metabolites. In a specific embodiment of the invention, two types of Pearson correlation coefficient are calculated: 1) the Pearson coefficients between the expression levels of all differential genes and the abundances of all differential metabolites; 2) the Pearson coefficients between the expression levels of all differential genes and the abundances of all metabolites.
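For readers who want to see the calculation itself, the Pearson coefficient for every gene-metabolite pair can be sketched in plain Python. This is an illustration rather than the patent's implementation, which uses R's cor.test (that function additionally reports a p-value, omitted here); the function names and the dictionary layout are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long value lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def correlation_table(expression, abundance):
    """Return {(gene, metabolite): r} for all pairs.

    expression / abundance: dict mapping a feature to its values across
    the same samples, in the same sample order.
    """
    return {
        (gene, metabolite): pearson(gene_values, metabolite_values)
        for gene, gene_values in expression.items()
        for metabolite, metabolite_values in abundance.items()
    }
```

The two coefficient types described above correspond to calling `correlation_table` with differential metabolites only, or with all metabolites, as the second argument.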
Step 3.2, drawing the correlation heat map: based on the Pearson coefficients between differential gene expression levels and differential metabolite abundances calculated in step 3.1, the genes are sorted from high to low by absolute correlation, and the top-ranked genes (for example the top 10) whose absolute correlation exceeds a preset value (for example 0.5) are selected. The correlations between the selected genes and all differential metabolites (the union of the between-group differential metabolites) are then displayed as a heat map.
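The screening that precedes the heat map can be sketched as below. The source does not say how a gene with several coefficients is ranked; this sketch assumes each gene is ranked by its largest absolute correlation with any metabolite. The function name is hypothetical, and the defaults of 10 and 0.5 follow the examples in the text.

```python
def screen_genes_for_heatmap(correlations, top_n=10, min_abs_r=0.5):
    """Select genes and build the gene-by-metabolite matrix for a heat map.

    correlations: dict {(gene, metabolite): r} containing every pair.
    Returns (genes, metabolites, matrix) with matrix[i][j] = r of
    genes[i] and metabolites[j].
    """
    strongest = {}
    for (gene, _metabolite), r in correlations.items():
        # Assumption: a gene is ranked by its strongest absolute correlation.
        strongest[gene] = max(strongest.get(gene, 0.0), abs(r))
    ranked = sorted(strongest, key=lambda gene: strongest[gene], reverse=True)
    genes = [gene for gene in ranked[:top_n] if strongest[gene] > min_abs_r]
    metabolites = sorted({metabolite for _gene, metabolite in correlations})
    matrix = [[correlations[(gene, m)] for m in metabolites] for gene in genes]
    return genes, metabolites, matrix
```

The returned matrix can be handed to any plotting library to draw the heat map; rows are the selected genes and columns are all differential metabolites.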
Step 3.3, drawing the correlation network diagram: based on the Pearson coefficients between differential gene expression levels and differential metabolite abundances calculated in step 3.1, the data are screened and a correlation network diagram is drawn to show the genes or metabolites that occupy relatively important positions.
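A correlation network of this kind can be sketched as an edge list plus a node degree count. The screening rule is not given in the source; the sketch assumes an absolute-correlation threshold (0.8 is an arbitrary illustrative value) and uses node degree as a simple proxy for how important a gene or metabolite is in the network.

```python
from collections import Counter


def build_correlation_network(correlations, min_abs_r=0.8):
    """Return (edges, degree) for pairs passing the threshold.

    correlations: dict {(gene, metabolite): r}. Each edge is a tuple
    (gene, metabolite, r); degree counts the edges touching each node.
    Gene and metabolite names are assumed not to collide.
    """
    edges = [
        (gene, metabolite, r)
        for (gene, metabolite), r in correlations.items()
        if abs(r) >= min_abs_r
    ]
    degree = Counter()
    for gene, metabolite, _r in edges:
        degree[gene] += 1
        degree[metabolite] += 1
    return edges, degree
```

Nodes with the highest degree are connected to the most partners in the other omics and are natural candidates to highlight when the network is drawn.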
Fig. 2 is a system architecture diagram of the system for association analysis of transcriptome and metabolome data according to the present invention. As shown in Fig. 2, the system comprises:
Differentially expressed gene analysis unit 201, configured to perform transcriptome sequencing on the sample and perform bioinformatics analysis on the transcriptome data to obtain differentially expressed genes. Specifically, differentially expressed gene analysis unit 201 screens for the set of associated genes that influence the sample grouping.
Differential metabolite analysis unit 202, configured to perform metabolome sequencing on the sample and perform bioinformatics analysis on the metabolome data to obtain differential metabolites. Specifically, differential metabolite analysis unit 202 screens for the set of metabolites that influence the sample grouping.
Association analysis unit 203, configured to analyze association features by bioinformatics analysis based on the differentially expressed genes and differential metabolites. In a specific embodiment of the invention, association analysis unit 203 carries out the bioinformatics association analysis on the basis of cloud computing and performs the following three model analyses on the two datasets, gene expression levels and metabolite abundances:
1. Pathway functional model analysis: query the KEGG metabolic pathways (pathways) shared by genes and metabolites and analyze the association features of the genes and metabolites in the shared pathways.
Specifically, the Pathway functional model analysis includes, but is not limited to, shared metabolic pathway analysis and result display. Through between-group difference analysis, differentially expressed genes have been obtained from the transcriptome data and differentially expressed metabolites from the metabolome data, and KEGG enrichment analysis has been carried out for each omics separately; in the association analysis, the KEGG metabolic pathways (pathways) shared by genes and metabolites are analyzed.
In a specific embodiment of the invention, the Pathway functional model analysis includes, but is not limited to, the following three types:
1) analysis of the metabolic pathways (pathways) shared by between-group differential genes and between-group differential metabolites;
2) because there may be very few kinds of between-group differential metabolites, or the target metabolite may not be among the differential metabolites, analysis of the metabolic pathways (pathways) shared by between-group differential genes and all metabolites;
3) as a basis for the screening of other customized analyses, analysis of the metabolic pathways (pathways) shared by all genes and all metabolites.
Preferably, the result display outputs metabolic pathway maps (pathway maps) showing the differential metabolites and their associated genes. To present the functional features associating genes with metabolites intuitively, the maps show the metabolites whose abundance is up- or down-regulated and the genes whose expression is up- or down-regulated.
2. O2PLS (bidirectional orthogonal projections to latent structures) model analysis: an O2PLS model is built from the gene expression and metabolite abundance data, the noise part of the model is removed, and sets of correlated genes and metabolites are obtained by model prediction for joint analysis.
Specifically, the O2PLS model analysis performs O2PLS analysis on all transcriptome and all metabolome data using OmicsPLS. O2PLS is a generalization of OPLS that supports bidirectional modeling and prediction between two data matrices. This analysis can uncover the intrinsic links between the transcriptome and the metabolome, determine the degree of association between the transcriptome and metabolome data, and identify the major genes or metabolites responsible for that association.
In a specific embodiment of the invention, the O2PLS model analysis includes, but is not limited to, the following procedure:
1) Model construction: because both under-fitting and over-fitting of the model affect the analysis, before the formal model analysis the model is first built repeatedly using cross-validation, the prediction error of each model is calculated, and the most suitable of the preliminary models is selected; in general, a smaller prediction error indicates a more reasonable model. The most suitable O2PLS model obtained through this repeated preliminary modeling splits the transcriptome and metabolome data into six main parts: the joint part of the transcriptome (mainly genes strongly associated with the metabolites), the orthogonal part of the transcriptome (genes that do not affect metabolite abundance and influence only the transcriptome data), the noise part of the transcriptome (genes that affect neither the transcriptome data nor the metabolome data), and the corresponding joint, orthogonal and noise parts of the metabolome.
2) it model evaluation: after establishing O2PLS model, calculates transcript profile and (is associated with and just with the live part in metabolism group
Hand over part) to the percentage contribution of model, carry out the building situation of assessment models.The contribution degree of live part is bigger, indicates model
It is more reasonable to construct.
3), element Contribution Analysis: by constructing O2PLS model, associated section (joint part) each gene is calculated
It is embodied with the size of contribution degree of each metabolin in entire model, contribution degree by load (loading) value.Gene or generation
Load (loading) the value absolute value for thanking to object is bigger, indicates this gene or metabolin and in addition organizes the metabolin or gene learned
Correlation degree is bigger.Meanwhile intuitively to check the element importance of two groups, for all transcript profiles and metabolism group data,
The load diagram (loading plot) of different groups of output is drawn respectively.
4) Inter-omics element association analysis: based on the element loading values obtained in the previous step,
the genes and metabolites in the top 2.5% at both ends (2.5% each for positive and negative loadings, 5% in
total) are screened and integrated into a loading plot showing the genes and metabolites with the greatest
degree of association, and the inter-omics joint loading plot is drawn and output.
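As an illustration only, the two-tailed screening of step 4) might be sketched in Python as follows. The function names `select_two_tailed` and `integrate_tails`, the dictionary layout, the choice to screen each omics separately before merging, and the rule of rounding the tail size up to at least one element are all assumptions made for this sketch; the embodiment does not specify them. The loading values would come from a fitted O2PLS model, which is not shown here.

```python
import math


def select_two_tailed(loadings, tail_fraction=0.025):
    """Return (negative_tail, positive_tail) element names for one omics.

    `loadings` maps an element name (gene or metabolite) to its loading value.
    """
    ranked = sorted(loadings, key=loadings.get)  # ascending by loading value
    if not ranked:
        return [], []
    # Assumption: round the tail size up so that each tail keeps at least one element.
    n_tail = max(1, math.ceil(len(ranked) * tail_fraction))
    return ranked[:n_tail], ranked[-n_tail:]


def integrate_tails(gene_loadings, metabolite_loadings, tail_fraction=0.025):
    """Screen each omics separately, then merge the selections for a joint plot."""
    merged = []
    for omics, loadings in (("gene", gene_loadings), ("metabolite", metabolite_loadings)):
        negative, positive = select_two_tailed(loadings, tail_fraction)
        # dict.fromkeys removes duplicates (possible for very small inputs) and keeps order.
        for name in dict.fromkeys(negative + positive):
            merged.append((omics, name, loadings[name]))
    return merged
```

With the default `tail_fraction` of 0.025, each tail holds 2.5% of the elements, so the two tails together cover the 5% described above. The merged list of `(omics, name, loading)` tuples is the kind of input a joint loading plot would need.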
3. Correlation coefficient model analysis: the Pearson correlation coefficient between gene expression and
metabolite abundance is calculated and displayed as a heat map and a network diagram. It should be noted that,
in the specific embodiment of the invention, correlation coefficient model analysis is carried out only when
the number of sample groups is ≥ 3.
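For reference, the Pearson correlation coefficient between a gene's expression and a metabolite's abundance, measured across the same samples, can be sketched in plain Python as below. The names `pearson` and `correlation_table` and the dictionary-of-lists input layout are hypothetical and chosen only for this sketch; the embodiment does not prescribe an implementation.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("the coefficient is undefined for a constant sequence")
    return covariance / math.sqrt(var_x * var_y)


def correlation_table(expression, abundance):
    """Map every (gene, metabolite) pair to its Pearson coefficient.

    `expression` maps gene -> per-sample expression values;
    `abundance` maps metabolite -> per-sample abundance values (same sample order).
    """
    return {
        (gene, metabolite): pearson(gene_values, metabolite_values)
        for gene, gene_values in expression.items()
        for metabolite, metabolite_values in abundance.items()
    }
```

The resulting value lies in [-1, +1]; a constant expression or abundance profile has no defined coefficient, so the sketch raises an error for that case rather than returning a number.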
Specifically, the correlation coefficient model analysis includes, but is not limited to, the following steps:
3.1 Calculating the Pearson correlation coefficient: the Pearson correlation coefficient measures the
correlation between two variables and represents the strength of their co-variation; its value range is
[-1, +1]. The Pearson coefficient between gene expression and metabolite abundance is calculated to assess the
correlation between genes and metabolites. In the specific embodiment of the invention, two types of Pearson
correlation coefficient are calculated: 1) the Pearson coefficients between the expression of all differential
genes and the abundance of all differential metabolites; 2) the Pearson coefficients between the expression of
all differential genes and the abundance of all metabolites.
3.2 Drawing the correlation heat map: based on the Pearson coefficients between differential gene expression and
differential metabolite abundance calculated in 3.1, the genes are sorted by absolute correlation from high to
low, and the top-ranked genes (for example, the top 10) whose absolute correlation is greater than a preset
value (for example, 0.5) are selected; the correlations between the selected genes and all differential
metabolites (the union of the between-group differential metabolites) are then displayed as a heat map.
3.3 Drawing the correlation network diagram: based on the Pearson coefficients between differential gene
expression and differential metabolite abundance calculated in 3.1, the data are screened and a correlation
network diagram is drawn to show the genes or metabolites that occupy important relative positions.
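The screening in steps 3.2 and 3.3 could be sketched as follows, starting from a table of already-calculated Pearson coefficients. Several points are assumptions of this sketch rather than part of the embodiment: each gene is ranked by its strongest absolute correlation with any differential metabolite, the network is screened with the same absolute-value threshold as the heat map, and node degree is used as a simple proxy for an "important relative position". All function names are hypothetical.

```python
def top_genes_for_heatmap(corr, top_n=10, min_abs_r=0.5):
    """Pick the top-ranked genes whose strongest |r| exceeds the preset value.

    `corr` maps (gene, metabolite) -> Pearson coefficient.
    """
    strongest = {}
    for (gene, _metabolite), r in corr.items():
        strongest[gene] = max(strongest.get(gene, 0.0), abs(r))
    # Sort by strongest |r| from high to low; ties are broken by gene name.
    ranked = sorted(strongest, key=lambda g: (-strongest[g], g))
    return [g for g in ranked[:top_n] if strongest[g] > min_abs_r]


def heatmap_matrix(corr, genes, metabolites):
    """Rows are the selected genes, columns are the differential metabolites."""
    return [[corr[(g, m)] for m in metabolites] for g in genes]


def network_edges(corr, min_abs_r=0.5):
    """Keep gene-metabolite pairs with |r| above the threshold as network edges."""
    edges = [(g, m, r) for (g, m), r in corr.items() if abs(r) > min_abs_r]
    return sorted(edges, key=lambda e: (-abs(e[2]), e[0], e[1]))


def node_degrees(edges):
    """Count the edges per node; well-connected nodes sit at central positions."""
    degrees = {}
    for gene, metabolite, _r in edges:
        degrees[gene] = degrees.get(gene, 0) + 1
        degrees[metabolite] = degrees.get(metabolite, 0) + 1
    return degrees
```

The matrix returned by `heatmap_matrix` and the edge list returned by `network_edges` can be passed to any plotting or graph library; plotting itself is left out so that the sketch stays dependency-free.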
In conclusion a kind of transcript profile of the present invention and metabolism group data relation analysis method and system are by distinguishing sample
Transcript profile sequencing and the sequencing of metabolism group are carried out, two groups of data of gene expression amount and metabolin abundance based on acquisition are associated point
It analyses, analyzing and associating feature, can solve the problems, such as that there are one-sidedness and partial data unreliability for the single data for organizing sequencing, originally
The entire biological process of invention complete picture excavates the gene and metabolin for participating in regulation process, discloses true gene table
Up to regulated and control network, crucial signal path is determined, obtain more complete access and mechanism parsing, pass through information between biomolecule
Transmitting, matching coordinative finally show specific function, and the confluence analysis of multiple groups is more systematic, and it is complicated to be more advantageous to announcement
Functional mechanism.
The above embodiments merely illustrate the principles and effects of the invention and are not intended to
limit it. Anyone skilled in the art may modify and change the above embodiments without departing from the
spirit and scope of the invention. Therefore, the scope of protection of the invention shall be as set out in
the claims.
Claims (10)
1. A transcriptome and metabolome data association analysis method, comprising the following steps:
Step S1: performing transcriptome sequencing on samples, and performing bioinformatics analysis on the
transcriptome data to obtain differentially expressed genes;
Step S2: performing metabolome sequencing on the samples, and performing bioinformatics analysis on the
metabolome data to obtain differential metabolites;
Step S3: performing bioinformatics analysis of associated features based on the obtained differentially
expressed genes and differential metabolites.
2. The transcriptome and metabolome data association analysis method according to claim 1, characterized in
that: in step S3, the analysis performed on the obtained gene expression and metabolite abundance data includes,
but is not limited to, Pathway functional model analysis, O2PLS model analysis and correlation coefficient
model analysis; the Pathway functional model analysis is used to query the KEGG metabolic pathways shared by
genes and metabolites and to analyze the associated features of the genes and metabolites in the shared
metabolic pathways; the O2PLS model analysis is used to build an O2PLS model from the gene expression and
metabolite abundance data and thereby predict a set of correlated genes and metabolites for joint analysis; and
the correlation coefficient model analysis is used to calculate the Pearson correlation coefficient between
gene expression and metabolite abundance and to output and display it.
3. The transcriptome and metabolome data association analysis method according to claim 2, characterized in
that: the Pathway functional model analysis includes, but is not limited to, analysis of the metabolic pathways
shared by between-group differential genes and between-group differential metabolites, analysis of the
metabolic pathways shared by between-group differential genes and all metabolites, and analysis of the
metabolic pathways shared by all genes and all metabolites.
4. The transcriptome and metabolome data association analysis method according to claim 3, characterized in
that: in each type of analysis, pathway annotation is performed first, that is, the genes and metabolites to be
analyzed are compared and matched against the gene and metabolite information contained in the pathways of the
KEGG database to obtain the pathways in which the genes and metabolites are located; the genes and metabolites
present in the same pathway are then counted and displayed graphically; genes and metabolites annotated to the
same pathway are considered likely to have a potential functional association; and the association information
between genes and metabolites is finally obtained in the form of shared pathways.
5. The transcriptome and metabolome data association analysis method according to claim 3, characterized in
that: the Pathway functional model analysis also draws and outputs a diagram of the metabolic pathways in which
genes and metabolites are associated, so as to present the associated functional features of the genes and
metabolites visually.
6. The transcriptome and metabolome data association analysis method according to claim 2, characterized in
that the O2PLS model analysis comprises the following steps:
Step 2.1: fitting the model repeatedly by cross-validation, calculating the prediction error of each fit, and
selecting the most suitable model from the pre-modeling runs, a smaller prediction error generally indicating a
more reasonable model; the most suitable O2PLS model being obtained through repeated pre-modeling;
Step 2.2: calculating the contribution of the effective parts of the transcriptome and the metabolome to the
model in order to assess how well the model has been constructed;
Step 2.3: calculating the contribution of each gene and each metabolite of the joint part to the whole model,
the size of the contribution being expressed by the loading value, and drawing a separate loading plot for each
omics from all transcriptome and metabolome data;
Step 2.4: based on the element loading values obtained in step 2.3, screening genes and metabolites and
integrating them into a loading plot showing the genes and metabolites with the greatest degree of association,
and drawing and outputting the inter-omics joint loading plot.
7. The transcriptome and metabolome data association analysis method according to claim 2, characterized in
that: the correlation coefficient calculation includes, but is not limited to, the Pearson coefficients between
the expression of all differential genes and the abundance of all differential metabolites, and the Pearson
coefficients between the expression of all genes and the abundance of all metabolites.
8. The transcriptome and metabolome data association analysis method according to claim 7, characterized in
that: the correlation coefficient model analysis is also used to draw and output a correlation heat map and a
network diagram from the calculated coefficients.
9. The transcriptome and metabolome data association analysis method according to claim 8, characterized in
that the correlation coefficient model analysis comprises the following steps:
Step 3.1: calculating the Pearson coefficients between gene expression and metabolite abundance;
Step 3.2: based on the Pearson coefficients between differential gene expression and differential metabolite
abundance calculated in step 3.1, sorting by absolute correlation from high to low, selecting the top-ranked
genes whose absolute correlation is greater than a preset value, and displaying the correlations between the
selected genes and all differential metabolites as a heat map;
Step 3.3: based on the Pearson coefficients between differential gene expression and differential metabolite
abundance calculated in step 3.1, screening the data and drawing and outputting a correlation network diagram
to show the genes or metabolites that occupy important relative positions.
10. A transcriptome and metabolome data association analysis system, comprising:
a differentially expressed gene analysis unit, configured to perform transcriptome sequencing on samples and to
perform bioinformatics analysis on the transcriptome data to obtain differentially expressed genes;
a differential metabolite analysis unit, configured to perform metabolome sequencing on the samples and to
perform bioinformatics analysis on the metabolome data to obtain differential metabolites;
an association analysis unit, configured to perform bioinformatics analysis of associated features based on the
differentially expressed genes and the differential metabolites.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910176587.4A CN109979527A (en) | 2019-03-08 | 2019-03-08 | A kind of transcript profile and metabolism group data relation analysis method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910176587.4A CN109979527A (en) | 2019-03-08 | 2019-03-08 | A kind of transcript profile and metabolism group data relation analysis method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109979527A true CN109979527A (en) | 2019-07-05 |
Family
ID=67078287
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910176587.4A Pending CN109979527A (en) | 2019-03-08 | 2019-03-08 | A kind of transcript profile and metabolism group data relation analysis method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109979527A (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110970116A (en) * | 2019-12-05 | 2020-04-07 | 吉林省蒲川生物医药有限公司 | Transcriptomics-based traditional Chinese medicine pharmacological mechanism analysis method |
CN111061818A (en) * | 2019-12-27 | 2020-04-24 | 北京百迈客生物科技有限公司 | Metabolic group and other omics combined analysis method and device |
CN111292809A (en) * | 2020-01-20 | 2020-06-16 | 至本医疗科技(上海)有限公司 | Method, electronic device, and computer storage medium for detecting RNA level gene fusion |
CN111709219A (en) * | 2020-04-28 | 2020-09-25 | 上海欧易生物医学科技有限公司 | Method for personalized display of single omics and multi-group science KEGG PATHWAY map expression heatmaps and application |
CN112986411A (en) * | 2019-12-17 | 2021-06-18 | 中国科学院地理科学与资源研究所 | Biological metabolite screening method |
CN113707221A (en) * | 2021-08-31 | 2021-11-26 | 中国水产科学研究院南海水产研究所 | Fish sauce flavor forming functional microbial exoenzyme mining method based on multi-dimensional data |
CN114333994A (en) * | 2020-09-30 | 2022-04-12 | 天津现代创新中药科技有限公司 | Method and system for determining differential gene pathways based on reference-free transcriptome sequencing |
CN116129991A (en) * | 2023-04-17 | 2023-05-16 | 南京派森诺基因科技有限公司 | Non-targeted metabolic component analysis method based on qualitative and quantitative data of metabolites |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103558354A (en) * | 2013-11-15 | 2014-02-05 | 南京大学 | Water toxicity analysis method based on biologic omics integrated technology |
CN108103176A (en) * | 2018-01-02 | 2018-06-01 | 中国药科大学 | Method based on metabolism group and transcription group association analysis screening fritillaria alkaloid synthesis key gene |
Non-Patent Citations (1)
Title |
---|
金玉等: "转录组-代谢组分析方法及其在药物作用机理研究中的应用", 《生物技术通报》 * |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110970116A (en) * | 2019-12-05 | 2020-04-07 | 吉林省蒲川生物医药有限公司 | Transcriptomics-based traditional Chinese medicine pharmacological mechanism analysis method |
CN110970116B (en) * | 2019-12-05 | 2023-09-01 | 吉林省蒲川生物医药有限公司 | Traditional Chinese medicine pharmacological mechanism analysis method based on transcriptome |
CN112986411A (en) * | 2019-12-17 | 2021-06-18 | 中国科学院地理科学与资源研究所 | Biological metabolite screening method |
CN111061818A (en) * | 2019-12-27 | 2020-04-24 | 北京百迈客生物科技有限公司 | Metabolic group and other omics combined analysis method and device |
CN111061818B (en) * | 2019-12-27 | 2023-06-30 | 北京百迈客生物科技有限公司 | Metabolic group and other group combined analysis method and device |
CN111292809A (en) * | 2020-01-20 | 2020-06-16 | 至本医疗科技(上海)有限公司 | Method, electronic device, and computer storage medium for detecting RNA level gene fusion |
CN111709219A (en) * | 2020-04-28 | 2020-09-25 | 上海欧易生物医学科技有限公司 | Method for personalized display of single omics and multi-group science KEGG PATHWAY map expression heatmaps and application |
CN114333994A (en) * | 2020-09-30 | 2022-04-12 | 天津现代创新中药科技有限公司 | Method and system for determining differential gene pathways based on reference-free transcriptome sequencing |
CN113707221A (en) * | 2021-08-31 | 2021-11-26 | 中国水产科学研究院南海水产研究所 | Fish sauce flavor forming functional microbial exoenzyme mining method based on multi-dimensional data |
CN116129991A (en) * | 2023-04-17 | 2023-05-16 | 南京派森诺基因科技有限公司 | Non-targeted metabolic component analysis method based on qualitative and quantitative data of metabolites |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20190705 |