CN104182654A - Protein-protein interaction network based gene set identification method - Google Patents

Info

Publication number
CN104182654A
Authority
CN
China
Prior art keywords
genes
proteins
protein
gene
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410370730.0A
Other languages
Chinese (zh)
Other versions
CN104182654B (en)
Inventor
吴康
黄家颖
范小勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SHANGHAI PUBLIC HEALTH CLINICAL CENTER
Original Assignee
SHANGHAI PUBLIC HEALTH CLINICAL CENTER
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SHANGHAI PUBLIC HEALTH CLINICAL CENTER
Priority to CN201410370730.0A
Publication of CN104182654A
Application granted
Publication of CN104182654B
Legal status: Active
Anticipated expiration

Landscapes

  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The invention relates to a gene set identification method based on a protein-protein interaction network and belongs to the technical field of genetics. The method comprises the following steps: finding in "data set B" the genes/proteins that interact directly with "set A" and naming them "node set B"; counting, for each gene/protein in "node set B", the number of its direct interactions with "set A" and naming this count the dimension "i"; using "node set B[i]" at different minimum dimensions "i" to recall the interacting genes/proteins from "set A" and naming them "set A[i]"; calculating the aggregate z-score of each "set A[i]"; and taking the "set A[i]" with the maximum aggregate z-score as the identified gene set. The method can identify gene sets that are more relevant to the biological process and helps researchers carry out related work.

Description

Gene set identification method based on a protein-protein interaction network
Technical field
The invention belongs to the field of gene technology and specifically relates to a gene set identification method based on a protein-protein interaction network.
Background
Dynamic changes in the transcriptome/proteome lead to changes in cell function. Genes/proteins do not act independently; they act within the protein-protein interaction network by interacting with other proteins. Mining omics data on the basis of the protein-protein interaction network can therefore uncover new biological information. Accordingly, if omics data are analyzed with the aid of protein-protein interaction information, the results will be more biologically relevant.
At present, interaction network analysis of significantly modulated genes/proteins relies mainly on the direct interactions among those genes/proteins. However, the expression of several genes/proteins may indicate that they interact with a key node gene/protein that is not itself significantly modulated. That key node gene/protein may in turn interact with many other genes/proteins. An analysis restricted to direct interactions among significantly modulated genes/proteins may therefore miss significantly modulated genes/proteins that interact indirectly through a key node gene/protein. Omics data analysis based on the protein-protein interaction network thus cannot ignore these key node genes/proteins.
Summary of the invention
The object of the invention is to overcome the deficiencies of the prior art by providing a gene set identification method based on a protein-protein interaction network. The method can identify gene sets that are more relevant to the biological process and helps researchers carry out related work.
The invention is realized by the following technical scheme. The invention relates to a gene set identification method based on a protein-protein interaction network, comprising the following steps:
Step 1: find in "data set B" the genes/proteins that interact directly with "set A" and name them "node set B"; the genes/proteins in "node set B" come from "data set B" and share no genes/proteins with "set A";
Step 2: count, for each gene/protein in "node set B", the number of its direct interactions with "set A"; this number is named the dimension "i" of that gene/protein, and the genes/proteins in "node set B" have different dimensions;
Step 3: using "node set B[i]" at different minimum dimensions "i", recall the interacting genes/proteins from "set A" and name them "set A[i]"; the genes/proteins remaining in "set A" are named "set A[i] remainder";
Step 4: calculate the aggregate z-score of "set A[i]";
Step 5: the "set A[i]" with the maximum aggregate z-score is the gene set identified on the basis of the protein-protein interaction network.
Preferably, in step 1, data set B is protein-protein interaction data from a public database.
Preferably, in step 1, set A is a gene set obtained from the relevant whole-genome transcriptome data that is significantly modulated and enriched for a biological function.
Preferably, in step 4, the aggregate z-score is calculated as follows:
a) calculate the expression significance of each gene/protein, i.e. the corrected p-value of the significance comparison between the treatments of interest applied to the biological samples;
b) subtract the corrected p-value from 1 and convert the result to a z-score with the inverse normal cumulative distribution function;
c) sum the z-scores of all genes/proteins in "set A[i]" and divide by the square root of the number of genes/proteins in "set A[i]" to obtain the aggregate z-score. The aggregate z-score makes it possible to compare the relative expression changes of "set A[i]" sets that contain different numbers of genes/proteins; the higher the aggregate z-score, the more significant the expression of "set A[i]".
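Steps a) to c) can be sketched in a few lines of Python. This is a minimal illustration, not the inventors' implementation: it assumes the per-gene z-score is z = Φ⁻¹(1 − p), where Φ⁻¹ is the inverse standard normal CDF, and that the aggregate is the sum of z-scores divided by the square root of the gene count. The function names and the clamping constant `eps` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist
from typing import Sequence


def z_from_corrected_p(p_adj: float, eps: float = 1e-12) -> float:
    """Convert a corrected p-value to a z-score via the inverse normal CDF."""
    # Clamp 1 - p into the open interval (0, 1); inv_cdf is undefined at 0 and 1.
    q = min(max(1.0 - p_adj, eps), 1.0 - eps)
    return NormalDist().inv_cdf(q)


def aggregate_z(p_adj_values: Sequence[float]) -> float:
    """Sum the per-gene z-scores and divide by sqrt(number of genes)."""
    if not p_adj_values:
        raise ValueError("the gene set is empty")
    z_scores = [z_from_corrected_p(p) for p in p_adj_values]
    return sum(z_scores) / sqrt(len(z_scores))
```

Dividing by the square root of the set size keeps the score comparable between sets of different sizes, which is what allows the candidate sets "set A[i]" to be ranked against each other.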
Compared with the prior art, the invention has the following beneficial effects:
The technical scheme as a whole takes the biological process itself into account: when functionally related genes respond to a signal, their modulation may be influenced by a key gene (the "key node gene/protein") that is not necessarily significantly modulated itself. How critical a key node gene/protein is is expressed by the number of significantly modulated genes/proteins it interacts with, i.e. its dimension "i". The larger the dimension "i", the more critical the node. The overall expression modulation of the identified genes is also considered, through the aggregate z-score. The larger the aggregate z-score, the more significant the modulation of the gene set. Both the dimension "i" of a key node gene/protein and the aggregate z-score of a gene set are objective, direct indicators of the biological process and make use of natural laws of biology.
The gene set identified by the method has the following effects: it is more relevant to the biological process, and the node genes/proteins that interact with it also have important biological functions. The gene set and/or the node genes/proteins help researchers carry out follow-up work such as gene function analysis, disease diagnosis, and prognosis of disease treatment.
Brief description of the drawings
Other features, objects and advantages of the invention will become more apparent from the detailed description of non-limiting embodiments given with reference to the following drawings:
Fig. 1 is the analysis workflow for gene set identification based on a protein-protein interaction network.
Fig. 2 is the main strategy for gene set identification, based on protein-protein interaction data, from the transcription profile of THP-1 cells infected with Mycobacterium tuberculosis (Mtb).
Fig. 3 shows (A) the aggregate z-score of the identified gene set THP1r2Mtb-iNet[i] when nodes of different minimum dimension are used, and (B) box plots of the expression of the identified gene set TMtb-iNet, the corresponding remaining gene set TMtb-iEx, and the original gene set THP1r2Mtb-induced when nodes with a minimum dimension of 14 are used.
Fig. 4 shows the transcription factor binding site enrichment analysis of the gene promoter regions of THP1r2Mtb-iNet[i] and THP1r2Mtb-iEx[i] (A-C), and of the identified gene set TMtb-iNet and the corresponding remaining gene set TMtb-iEx when nodes with a minimum dimension of 14 are used (D).
Fig. 5 is the biological pathway analysis of THP1r2Mtb-induced and TMtb-iNet.
Fig. 6 is the gene overlap analysis of THP1r2Mtb-induced (A), TMtb-iNet (B) and TMtb-iEx (C) with the interferon module genes (M3.1).
Fig. 7 is the correlation analysis of THP1r2Mtb-induced, TMtb-iNet and TMtb-iEx with expression profile data from pulmonary tuberculosis patients.
Embodiments
The invention is further described below with reference to specific embodiments. These embodiments only illustrate the invention and do not limit its scope. Experimental methods for which no specific conditions are given in the following examples generally follow conventional conditions, for example those described in Sambrook et al., Molecular Cloning: A Laboratory Manual (New York: Cold Spring Harbor Laboratory Press, 1989), or the conditions recommended by the manufacturer.
Interaction network analysis of significantly modulated genes/proteins relies mainly on the direct interactions among them. However, the expression of several genes/proteins may indicate that they interact with a key node gene/protein, even though that node is not itself significantly modulated. The invention considers both the degree of modulation of genes/proteins and their interactions with key node genes/proteins in order to identify, from high-throughput data, gene sets that are more relevant to the biological process. Specifically, gene set identification takes into account the extent of interaction between key node genes/proteins and significantly modulated genes/proteins (the dimension) and the overall expression of the identified gene set (the aggregate z-score). The gene set with the most pronounced expression (the maximum aggregate z-score) is the identified gene set.
By integrating protein-protein interaction information with transcriptome gene modulation information, and by considering the interactions between significantly modulated genes/proteins and key node genes/proteins (which are not significantly modulated), the invention identifies gene sets that are more relevant to the biological process.
Before the technical scheme is carried out, the following must be obtained: 1) the gene expression modulation information of the relevant whole-genome transcriptome data, from which a significantly modulated gene set enriched for a certain biological function has been derived. This gene set is named "set A", and its genes/proteins are either consistently up-regulated or consistently down-regulated under the condition of interest (a specific time point, a specific treatment, etc.). An example is "THP1r2Mtb-induced" in the embodiment, whose genes are significantly up-regulated 18 h after Mtb infection (relative to 4 h); 2) protein-protein interaction data from a public database, named "data set B", such as the "STRING protein-protein interaction data" in the embodiment.
Fig. 1 is the analysis workflow for gene set identification based on a protein-protein interaction network:
1) Find in "data set B" the genes/proteins that interact directly with "set A", i.e. the partners in those interaction pairs of "data set B" in which only one protein comes from "set A", and name them "node set B". The genes/proteins in "node set B" come from "data set B" and share no genes/proteins with "set A".
2) Count, for each gene/protein in "node set B", the number of its direct interactions with "set A", i.e. how many genes/proteins in "set A" it interacts with directly. This number is named the dimension "i" of that gene/protein. The genes/proteins in "node set B" have different dimensions.
3) Using "node set B[i]" at different minimum dimensions "i", recall the interacting genes/proteins from "set A" and name them "set A[i]", such as "THP1r2Mtb-iNet[i]" in the embodiment. The genes/proteins in "set A[i]" may interact with each other directly, or indirectly through "node set B[i]" with minimum dimension "i". Correspondingly, the genes/proteins remaining in "set A" are named "set A[i] remainder", such as "THP1r2Mtb-iEx[i]" in the embodiment.
4) Calculate the aggregate z-score of "set A[i]" [1]. Specifically, the aggregate z-score is calculated as follows: a) calculate the expression significance of each gene/protein, i.e. the corrected p-value of the significance comparison between the treatments of interest applied to the biological samples; b) subtract the corrected p-value from 1 and convert the result to a z-score with the inverse normal cumulative distribution function (normal CDF); c) sum the z-scores of all genes/proteins in "set A[i]" and divide by the square root of the number of genes/proteins in "set A[i]" to obtain the aggregate z-score. The aggregate z-score makes it possible to compare the relative expression changes of "set A[i]" sets that contain different numbers of genes/proteins. The higher the aggregate z-score, the more significant the expression of "set A[i]", and vice versa.
5) The "set A[i]" with the maximum aggregate z-score is the gene set identified on the basis of the protein-protein interaction network.
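The five steps above can be sketched end to end as follows. This is an illustrative reading of the workflow rather than the inventors' code. It makes three assumptions: interactions arrive as unordered name pairs, per-gene z-scores have already been computed and are passed in as the dictionary `z`, and "set A[i]" is taken to be the set A members linked to at least one node whose dimension is at least "i". All function names are hypothetical.

```python
from collections import defaultdict
from math import sqrt
from typing import Dict, Iterable, Set, Tuple


def node_neighbours(set_a: Set[str],
                    pairs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Steps 1-2: map each node outside set A to the set A members it touches."""
    neighbours: Dict[str, Set[str]] = defaultdict(set)
    for u, v in pairs:
        # Keep only pairs in which exactly one partner belongs to set A.
        if (u in set_a) != (v in set_a):
            member, node = (u, v) if u in set_a else (v, u)
            neighbours[node].add(member)
    return dict(neighbours)


def recall(neighbours: Dict[str, Set[str]], min_dim: int) -> Set[str]:
    """Step 3: set A members linked to any node whose dimension is >= min_dim."""
    recalled: Set[str] = set()
    for members in neighbours.values():
        if len(members) >= min_dim:  # the dimension is the number of set A partners
            recalled |= members
    return recalled


def identify(set_a: Set[str], pairs: Iterable[Tuple[str, str]],
             z: Dict[str, float]) -> Tuple[int, Set[str]]:
    """Steps 4-5: return (best i, set A[i]) with the maximum aggregate z-score."""
    neighbours = node_neighbours(set_a, pairs)
    if not neighbours:
        raise ValueError("no node interacts with set A")
    dims = {len(members) for members in neighbours.values()}
    candidates = {i: recall(neighbours, i) for i in dims}

    def score(i: int) -> float:
        genes = candidates[i]
        return sum(z[g] for g in genes) / sqrt(len(genes))

    best = max(sorted(candidates), key=score)
    return best, candidates[best]
```

Only dimensions that actually occur among the nodes are tried as thresholds, because any other value of "i" would recall the same set as the next occurring dimension above it.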
The method is described in detail below. The data in the following embodiment are based on the interferon-related gene set of host macrophages (THP-1 cells) after Mycobacterium tuberculosis infection (THP1r2Mtb-induced, i.e. "set A" in the claims) [2]. By combining it with the STRING protein-protein interaction data (i.e. "data set B" in the claims) [3,4], a gene set based on the protein-protein interaction network, TMtb-iNet, was further mined and then validated.
Embodiment
1 Methods
1.1 Protein-protein interaction data
Protein-protein interaction data came from the STRING database [3,4]. STRING contains physical and functional protein-protein interaction data for many species. The inventors extracted the human-specific protein-protein interactions whose combined score was at least 0.7. This threshold ensures both high coverage and high quality of the data.
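The confidence filter described above can be sketched as follows. The record layout (protein A, protein B, combined score on a 0 to 1 scale) and the function name are assumptions made for illustration; the actual STRING download format is not described in this document, and if a file stores scaled scores (for example 0 to 1000) the threshold would need to be rescaled accordingly.

```python
from typing import Iterable, List, Tuple


def high_confidence_pairs(records: Iterable[Tuple[str, str, float]],
                          min_score: float = 0.7) -> List[Tuple[str, str]]:
    """Keep each unordered pair once if its combined score is >= min_score."""
    seen = set()
    kept: List[Tuple[str, str]] = []
    for a, b, score in records:
        # Skip self-interactions and pairs below the confidence threshold.
        if a == b or score < min_score:
            continue
        key = tuple(sorted((a, b)))  # treat (a, b) and (b, a) as the same pair
        if key not in seen:
            seen.add(key)
            kept.append(key)
    return kept
```

Collapsing (a, b) and (b, a) into one pair matters for the later dimension count, which should count each interaction partner only once.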
1.2 Deriving a gene set based on the protein-protein interaction network from THP1r2Mtb-induced
First, the genes/proteins that interact directly with THP1r2Mtb-induced were found in the STRING protein-protein interaction data and named the "node set", i.e. the aforementioned "node set B". The genes/proteins in the node set come from the protein-protein interaction data and share no genes/proteins with THP1r2Mtb-induced. Second, for each gene/protein in the node set, the number of its direct interactions with THP1r2Mtb-induced was counted; this number is named the dimension "i" of that gene/protein. Of the two nodes shown in Fig. 2, one has a dimension of 3 and the other a dimension of 4. The genes/proteins in the node set have different dimensions. Using node set [i] at different minimum dimensions "i", the interacting genes/proteins were recalled from THP1r2Mtb-induced and named "THP1r2Mtb-iNet[i]", i.e. "set A[i]" in the claims. The genes/proteins in THP1r2Mtb-iNet[i] may interact with each other directly, or indirectly through node set [i] with minimum dimension "i". Correspondingly, the genes/proteins remaining in THP1r2Mtb-induced were named "THP1r2Mtb-iEx[i]", i.e. "set A[i] remainder" in the claims. The aggregate z-score of THP1r2Mtb-iNet[i] was then calculated [1]. Specifically, the aggregate z-score is calculated as follows: a) calculate the expression significance of each gene/protein, i.e. its corrected p-value; b) subtract the corrected p-value from 1 and convert the result to a z-score with the inverse normal cumulative distribution function (normal CDF); c) sum the z-scores of all genes/proteins in THP1r2Mtb-iNet[i] and divide by the square root of the number of genes/proteins in THP1r2Mtb-iNet[i] to obtain the aggregate z-score. The aggregate z-score makes it possible to compare the relative expression changes of THP1r2Mtb-iNet[i] sets that contain different numbers of genes/proteins. The higher the aggregate z-score, the more significant the expression of THP1r2Mtb-iNet[i], and vice versa. The THP1r2Mtb-iNet[i] with the maximum aggregate z-score is the gene set identified on the basis of the protein-protein interaction network.
1.3 Transcription factor binding site enrichment analysis
PRomoter Integration in Microarray Analysis (PRIMA) was used for the transcription factor binding site (TFBS) enrichment analysis of the relevant gene sets [5]. The promoter region analyzed spanned from 2000 bp upstream to 200 bp downstream of the transcription start site. All genes in the genome served as the background. A Bonferroni-corrected p-value < 0.01 was considered statistically significant.
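The two numerical conventions in this paragraph, the promoter window and the Bonferroni correction, can be illustrated with a short sketch. It does not reproduce PRIMA itself. The 1-based coordinates, the strand symbols "+" and "-", and the function names are assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def promoter_window(tss: int, strand: str,
                    upstream: int = 2000, downstream: int = 200) -> Tuple[int, int]:
    """Return (start, end) of the promoter window around a transcription start site."""
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        # On the minus strand, upstream lies at higher genomic coordinates.
        start, end = tss - downstream, tss + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    return max(start, 1), end  # clip at the first base (1-based coordinates assumed)


def bonferroni(p_values: Sequence[float]) -> List[float]:
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

With three tested binding sites, for instance, a raw p-value would have to be below roughly 0.0033 to pass the corrected threshold of 0.01.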
1.4 KEGG signaling pathway enrichment analysis
Signaling pathway enrichment analysis was carried out with the online Database for Annotation, Visualization and Integrated Discovery (DAVID) v6.7 [6]. Statistical analysis was based on the false discovery rate (FDR) with Benjamini-Hochberg correction.
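For readers unfamiliar with the Benjamini-Hochberg correction mentioned here, the following sketch shows the standard step-up adjustment of a list of p-values. It is a generic illustration, not DAVID's code, and the function name is hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda k: p_values[k])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

A pathway is then reported as enriched when its adjusted value falls below the chosen FDR threshold.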
1.5 Gene set enrichment analysis (GSEA) for pulmonary tuberculosis (PTB)
GSEA determines whether a gene set is distributed mainly toward the top or toward the bottom of a ranked data set (sorted from high to low expression) [7]. The inventors downloaded the transcription profile data set GSE19491 from NCBI GEO [8].
GSE19491 contains whole-blood expression profiles from a large number of PTB patients, individuals with latent infection (latent tuberculosis, LTB), and healthy controls (HC). The volunteers were divided into several groups: 1) the training set, comprising PTB, LTB and HC, all from London; 2) the test set, comprising PTB, LTB and HC, also from London; 3) the validation set, comprising PTB and LTB, from Cape Town, South Africa; 4) the separated test set (test set_seperated), comprising neutrophils (neut), monocytes (mono), CD4+ (CD4) and CD8+ (CD8) T cells isolated from PTB and HC; 5) the treatment set (longitudinal), comprising PTB before treatment, PTB two months after the start of drug treatment (PTB_2m), PTB twelve months after the start of drug treatment (PTB_12m), and HC.
GSEA results are judged by the NES (normalized enrichment score) and the FDR (false discovery rate). A positive NES indicates that the gene set is enriched at the top of the expression profile data set, i.e. the gene set is positively correlated with the data set and is mainly up-regulated in it. A negative NES indicates that the gene set is enriched at the bottom of the expression profile data set, i.e. the gene set is negatively correlated with the data set and is mainly down-regulated in it. FDR <= 0.05 indicates that the NES is statistically significant [7].
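The idea of a gene set sitting "at the top" or "at the bottom" of a ranked list can be illustrated with a simplified, unweighted running-sum statistic. This sketch is not the GSEA software: it omits the correlation weighting, the permutation-based normalization that yields the NES, and the FDR estimate. The function name and the gene labels in the usage note are hypothetical.

```python
from typing import Sequence, Set


def running_sum_es(ranked_genes: Sequence[str], gene_set: Set[str]) -> float:
    """Unweighted enrichment score: the signed extreme of a hit/miss running sum."""
    hits = [g in gene_set for g in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(hits) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("the ranked list must contain both set and non-set genes")
    running, extreme = 0.0, 0.0
    for is_hit in hits:
        # Step up on a set member, step down otherwise; both step types sum to 1.
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(extreme):
            extreme = running
    return extreme
```

For a list ranked from high to low expression, a set concentrated at the top gives a positive score and a set concentrated at the bottom gives a negative one, which matches the sign convention described for the NES.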
2 Results
2.1 A gene set based on the protein-protein interaction network identified from THP1r2Mtb-induced reflects the main features of THP1r2Mtb-induced
Genes/proteins act within molecular networks, and perturbation of a molecular network affects the phenotype of the cell [9]. THP1r2Mtb-induced can therefore be refined further by integrating protein-protein interaction data. As shown in Fig. 2, the inventors extracted from THP1r2Mtb-induced the genes/proteins that interact with each other directly, or indirectly through the node set. The interacting gene set, the remaining gene set, and THP1r2Mtb-induced were then used for GSEA against patient expression profile data (Fig. 2). The genes/proteins that interact with THP1r2Mtb-induced, i.e. the node set, were selected from the protein-protein interaction database. For each gene/protein in the node set, the number of genes/proteins in THP1r2Mtb-induced that it interacts with is named the dimension of the node, i. The genes/proteins in THP1r2Mtb-induced that interact with each other directly, or indirectly through node set [i] with minimum dimension i, are named THP1r2Mtb-iNet[i]. The remaining genes in THP1r2Mtb-induced are named THP1r2Mtb-iEx[i]. Because different nodes have different dimensions, the inventors calculated the aggregate z-score for each member of the series THP1r2Mtb-iNet[i]. As shown in Fig. 3A, the aggregate z-score is highest when the minimum dimension of the node set is 14 (node set [i=14]), i.e. for THP1r2Mtb-iNet[i=14]. The inventors refer to THP1r2Mtb-iNet[i=14] as TMtb-iNet and to the corresponding THP1r2Mtb-iEx[i=14] as TMtb-iEx. TMtb-iNet is more significantly up-regulated than TMtb-iEx (Fig. 3B).
The gene promoter regions of THP1r2Mtb-induced are significantly enriched for three interferon-related transcription factor binding sites: ISRE (IFN-stimulated response element), IRF-1 (interferon regulatory factor 1) and IRF-7 [2]. Accordingly, the inventors also analyzed in detail the enrichment of these three binding sites in the gene promoter regions of THP1r2Mtb-iNet[i] and THP1r2Mtb-iEx[i]. As shown in Fig. 4A, 4B and 4D, whatever minimum dimension is used for the node set, ISRE and IRF-7 are more significantly enriched in the gene promoter regions of THP1r2Mtb-iNet[i]. In contrast, IRF-1 is significantly enriched in the gene promoter regions of both THP1r2Mtb-iNet[i] and THP1r2Mtb-iEx[i], regardless of the dimension of the node set (Fig. 4C and 4D).
Compared with THP1r2Mtb-induced, TMtb-iNet is more significantly enriched for the cytokine-cytokine receptor interaction, chemokine signalling and NOD-like receptor signalling pathways (Fig. 5). TMtb-iEx is not enriched for any signaling pathway.
In summary, by applying the node set with a minimum dimension of 14, the inventors identified a gene set based on the protein-protein interaction network, TMtb-iNet. TMtb-iNet shows significant expression modulation (the highest aggregate z-score), and its gene promoter regions are significantly enriched for the three transcription factor binding sites ISRE, IRF-7 and IRF-1.
2.2 TMtb-iNet contains more interferon-related genes than TMtb-iEx
THP1r2Mtb-induced is related to the interferon process [2]. TMtb-iNet inherits the main biological features of THP1r2Mtb-induced (Fig. 4 and Fig. 5). The inventors therefore analyzed whether TMtb-iNet contains more interferon-related genes than TMtb-iEx. Chaussabel D et al. built a series of gene modules from the analysis of expression profiles of peripheral blood mononuclear cells from patients with multiple diseases. These gene modules show distinctive, coordinated expression across multiple diseases. Based on literature research, the authors functionally annotated several of the modules, including an interferon-related module, M3.1 [10,11]. THP1r2Mtb-induced contains nearly half of the genes in the interferon module, 44 of 95 genes [2]. By comparison, TMtb-iNet contains 33 of these genes, whereas TMtb-iEx contains only 11 (p = 4.32 × 10^-6) (Fig. 6). This result shows that the gene set identified on the basis of the protein-protein interaction network, TMtb-iNet, contains more interferon-related genes than TMtb-iEx, and it supports the soundness of the gene set identification method based on the protein-protein interaction network.
2.3 Relative to THP1r2Mtb-induced and TMtb-iEx, TMtb-iNet has a broadly consistent positive correlation with PTB patients but a higher positive correlation with specific cell populations isolated from PTB patients
As shown in PTB_1 & 2 of Fig. 7, whether the PTB patients come from the training set or the test set, TMtb-iNet and THP1r2Mtb-induced show roughly the same degree of positive correlation with PTB. The positive correlation of TMtb-iEx with PTB is lower. This result shows that TMtb-iNet, identified on the basis of the protein-protein interaction network, is up-regulated in the blood of PTB patients to a similar degree as THP1r2Mtb-induced.
The inventors further analyzed the positive correlation of TMtb-iNet with neutrophils, monocytes, CD4+ and CD8+ cells isolated from PTB patients. The results show that TMtb-iNet is also significantly positively correlated with these four cell types.
The positive correlation of TMtb-iNet with CD4+ and CD8+ T cells is higher than that of THP1r2Mtb-induced. Because the positive correlation of TMtb-iNet with neutrophils and monocytes is similar to that of THP1r2Mtb-induced, the higher positive correlation of TMtb-iNet with CD4+ and CD8+ cells is specific. TMtb-iEx shows a lower positive correlation with neutrophils and monocytes and no significant correlation with CD4+ and CD8+ T cells (PTB_3-6 of Fig. 7).
In summary, relative to THP1r2Mtb-induced and TMtb-iEx, the gene set TMtb-iNet identified on the basis of the protein-protein interaction network has a broadly consistent positive correlation with PTB patients but a higher positive correlation with specific cell populations isolated from PTB patients.
2.4 During PTB treatment, TMtb-iNet declines faster than THP1r2Mtb-induced or TMtb-iEx
As shown in PTB_7-9 of Fig. 7, two months after the start of treatment the positive correlation between TMtb-iNet and PTB had declined but was still significant. Twelve months after the start of treatment, the correlation between TMtb-iNet and PTB was no longer significant. By contrast, the positive correlation of THP1r2Mtb-induced and of TMtb-iEx with PTB remained statistically significant before treatment, two months after its start, and twelve months after its start. These results show that the gene set TMtb-iNet, identified on the basis of the protein-protein interaction network, is more responsive to PTB treatment.
In summary, the invention takes the biological process itself into account: when functionally related genes respond to a signal, their modulation may be influenced by a key gene (the "key node gene/protein") that is not necessarily significantly modulated itself. How critical a key node gene/protein is is expressed by the number of significantly modulated genes/proteins it interacts with, i.e. its dimension "i". The larger the dimension "i", the more critical the node. The overall expression modulation of the identified genes is also considered, through the aggregate z-score. The larger the aggregate z-score, the more significant the modulation of the gene set. Both the dimension "i" of a key node gene/protein and the aggregate z-score of a gene set are objective, direct indicators of the biological process.
The gene set identified by the method has the following effects: it is more relevant to the biological process, and the node genes/proteins that interact with it also have important biological functions. The gene set and/or the node genes/proteins help researchers carry out follow-up work such as gene function analysis, disease diagnosis, and prognosis of disease treatment.
The references cited in the invention are listed below:
1. Ideker T, Ozier O, Schwikowski B, Siegel AF. Discovering regulatory and signalling circuits in molecular interaction networks. Bioinformatics 2002;18 Suppl 1:S233-S240.
2. Wu K, Dong D, Fang H et al. An interferon-related signature in the transcriptional core response of human macrophages to Mycobacterium tuberculosis infection. PLoS One 2012;7(6):e38367.
3. Snel B, Lehmann G, Bork P, Huynen MA. STRING: a web-server to retrieve and display the repeatedly occurring neighbourhood of a gene. Nucleic Acids Res 2000;28(18):3442-3444.
4. Franceschini A, Szklarczyk D, Frankild S et al. STRING v9.1: protein-protein interaction networks, with increased coverage and integration. Nucleic Acids Res 2013;41(Database issue):D808-D815.
5. Ulitsky I, Maron-Katz A, Shavit S et al. Expander: from expression microarrays to networks and functions. Nat Protoc 2010;5(2):303-322.
6. Huang dW, Sherman BT, Lempicki RA. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc 2009;4(1):44-57.
7. Subramanian A, Tamayo P, Mootha VK et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A 2005;102(43):15545-15550.
8. Berry MP, Graham CM, McNab FW et al. An interferon-inducible neutrophil-driven blood transcriptional signature in human tuberculosis. Nature 2010;466(7309):973-977.
9. Vidal M, Cusick ME, Barabasi AL. Interactome networks and human disease. Cell 2011;144(6):986-998.
10. Chaussabel D, Quinn C, Shen J et al. A modular analysis framework for blood genomics studies: application to systemic lupus erythematosus. Immunity 2008;29(1):150-164.
11. Chaussabel D, Sher A. Mining microarray expression data by literature profiling. Genome Biol 2002;3(10):RESEARCH0055.
Specific embodiments of the invention have been described above. It should be understood that the invention is not limited to these specific embodiments; those skilled in the art can make various changes or modifications within the scope of the claims without affecting the substance of the invention.

Claims (4)

1. A gene set identification method based on a protein-protein interaction network, characterized in that it comprises the following steps:
Step 1: find, in "data set B", the genes/proteins that interact directly with "set A", and name them "node set B"; the genes/proteins in "node set B" come from "data set B" and have interacting genes/proteins in "set A";
Step 2: for each gene/protein in "node set B", count the number of its direct interactions with "set A"; this number is named the dimension "i" of that gene/protein, so that the genes/proteins in "node set B" have different dimensions;
Step 3: using "node set B[i]" with different minimum dimensions "i", recall from "set A" the genes/proteins that interact with it, and name them "set A[i]"; the genes/proteins remaining in "set A" are named "set A[i] remainder";
Step 4: calculate the aggregate z-score of "set A[i]";
Step 5: the "set A[i]" with the maximum aggregate z-score is the gene set identified on the basis of the protein-protein interaction network.
2. The gene set identification method based on a protein-protein interaction network as claimed in claim 1, characterized in that, in step 1, said data set B is protein-protein interaction data from a public database.
3. The gene set identification method based on a protein-protein interaction network as claimed in claim 1, characterized in that, in step 1, said set A is a gene set obtained from the relevant whole-genome transcriptome data that is significantly modulated and enriched for biological function.
4. The gene set identification method based on a protein-protein interaction network as claimed in claim 1, characterized in that, in step 4, the calculation of said aggregate z-score comprises the following steps:
a) calculate the expression significance of each gene/protein, i.e. the adjusted p-value of the significance comparison between the different treatments of the biological samples of interest;
b) subtract this adjusted p-value from 1, then pass the result through the inverse of the normal cumulative distribution function to obtain a z-score;
c) sum the z-scores of all genes/proteins in "set A[i]" and divide by the square root of the number of genes/proteins in "set A[i]" to obtain the aggregate z-score; the aggregate z-score allows the relative expression changes of "set A[i]" with different numbers of genes/proteins to be compared: the higher the aggregate z-score, the more significant the expression of "set A[i]".
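The claimed steps can be illustrated with a minimal Python sketch. It is not the applicant's implementation. The function names (`z_from_p`, `aggregate_z`, `identify_gene_set`), the representation of data set B as a list of pairs, and the small clamp `eps` are all assumptions made for illustration. The sketch also assumes that step b) means z = Φ⁻¹(1 − p), and that any gene/protein in data set B with at least one partner in set A qualifies as a node.

```python
from math import sqrt
from statistics import NormalDist


def z_from_p(p_adj, eps=1e-12):
    """Claim 4 b): z = Phi^-1(1 - p). `eps` is an assumed clamp that keeps
    p = 0 or p = 1 inside the open interval required by inv_cdf."""
    q = min(max(1.0 - p_adj, eps), 1.0 - eps)
    return NormalDist().inv_cdf(q)


def aggregate_z(genes, p_adj):
    """Claim 4 c): sum of z-scores divided by sqrt(number of genes)."""
    zs = [z_from_p(p_adj[g]) for g in genes]
    return sum(zs) / sqrt(len(zs))


def identify_gene_set(set_a, interactions, p_adj):
    """Return (i, set A[i], aggregate z) with the maximum aggregate z-score,
    or None if no interaction touches set A.

    set_a        -- iterable of gene/protein identifiers ("set A")
    interactions -- iterable of (u, v) pairs ("data set B", undirected)
    p_adj        -- dict mapping each member of set A to its adjusted p-value
    """
    set_a = set(set_a)

    # Steps 1-2: node set B and, for each node, its partners in set A.
    # The number of partners is the node's dimension "i".
    partners = {}
    for u, v in interactions:
        if u == v:
            continue  # ignore self-interactions (assumption)
        if v in set_a:
            partners.setdefault(u, set()).add(v)
        if u in set_a:
            partners.setdefault(v, set()).add(u)

    best = None
    # Step 3: for each minimum dimension i, keep nodes with dimension >= i
    # and recall the members of set A that they interact with.
    for i in sorted({len(p) for p in partners.values()}):
        recalled = set().union(*(p for p in partners.values() if len(p) >= i))
        # Step 4: aggregate z-score of set A[i].
        score = aggregate_z(recalled, p_adj)
        # Step 5: keep the set A[i] with the maximum aggregate z-score.
        if best is None or score > best[2]:
            best = (i, recalled, score)
    return best


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    p_values = {"a1": 0.001, "a2": 0.001, "a3": 0.001, "a4": 0.9}
    edges = [("n1", "a1"), ("n1", "a2"), ("n1", "a3"), ("n2", "a4")]
    print(identify_gene_set(p_values, edges, p_values))
```

In this toy input, node `n1` has dimension 3 and `n2` has dimension 1. Raising the minimum dimension to 3 drops the weakly significant `a4`, so the aggregate z-score increases and "set A[3]" is selected. The "set A[i] remainder" is simply `set_a - recalled`.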
CN201410370730.0A 2014-07-30 2014-07-30 Protein-protein interaction network based gene set identification method Active CN104182654B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410370730.0A CN104182654B (en) 2014-07-30 2014-07-30 Protein-protein interaction network based gene set identification method

Publications (2)

Publication Number Publication Date
CN104182654A true CN104182654A (en) 2014-12-03
CN104182654B CN104182654B (en) 2017-04-12

Family

ID=51963689

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410370730.0A Active CN104182654B (en) 2014-07-30 2014-07-30 Protein-protein interaction network based gene set identification method

Country Status (1)

Country Link
CN (1) CN104182654B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110929282A (en) * 2019-12-05 2020-03-27 武汉深佰生物科技有限公司 Protein interaction-based biological characteristic information early warning method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050197783A1 (en) * 2004-03-04 2005-09-08 Kuchinsky Allan J. Methods and systems for extension, exploration, refinement, and analysis of biological networks
CN102086473A (en) * 2010-05-12 2011-06-08 天津市泌尿外科研究所 Undirected network screening method for key genes of human polygenic disease
CN102375840A (en) * 2010-08-19 2012-03-14 浙江中医药大学附属第一医院 Method for screening micro ribonucleic acid (microRNA) target gene based on natural language processing system
CN103065066A (en) * 2013-01-22 2013-04-24 四川大学 Drug combination network based drug combined action predicting method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
TREY IDEKER ET AL: "Discovering regulatory and signalling circuits in molecular interaction networks", 《BIOINFORMATICS》 *
李冬果 等: "复杂疾病风险基因模块识别及其调控机制研究", 《中国优生与遗传杂志》 *
袁芳 等: "基于蛋白质相互作用网络预测癌症致病基因", 《计算机应用研究》 *

Also Published As

Publication number Publication date
CN104182654B (en) 2017-04-12

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant