CN108509768A - Key protein identification method and identification system based on protein spatio-temporal sub-networks - Google Patents


Info

Publication number
CN108509768A
Authority
CN
China
Prior art keywords
protein
node
network
space
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810287578.8A
Other languages
Chinese (zh)
Other versions
CN108509768B (en
Inventor
李敏
李文凯
郑瑞清
王建新
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Central South University
Original Assignee
Central South University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Central South University
Priority to CN201810287578.8A
Publication of CN108509768A
Application granted
Publication of CN108509768B
Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00: ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations

Landscapes

  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Biotechnology (AREA)
  • Biophysics (AREA)
  • Genetics & Genomics (AREA)
  • Molecular Biology (AREA)
  • Chemical & Material Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Analytical Chemistry (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The invention discloses a key protein identification method and identification system based on protein spatio-temporal sub-networks. The method includes the following steps. Step 1: obtain the original protein network. Step 2: build the active protein sets at different moments. Step 3: build the protein sets in different subcellular structures. Step 4: build the spatio-temporal sub-network of each class of subcellular structure at each moment from the connections between protein nodes, the active protein sets at different moments, and the protein sets in different subcellular structures. Step 5: obtain the maximal degree centrality of each protein node. Step 6: sort all protein nodes in descending order of maximal degree centrality and select the top M protein nodes as the predicted key proteins. The method improves the accuracy of key protein identification.

Description

Key protein identification method and identification system based on protein spatio-temporal sub-networks
Technical field
The invention belongs to the technical field of systems biology, and in particular relates to a key protein identification method and identification system based on protein spatio-temporal sub-networks.
Background
Interactions between protein molecules are an important foundation of cellular activity and of protein function. Many life processes, such as DNA replication, metabolism, signal transduction and cell-cycle regulation, are closely tied to protein interactions. Studying protein interactions gives a better understanding of the life processes of an organism and hence of disease mechanisms, which helps in discovering drug targets and in preventing and treating disease. With the continued development and wide use of high-throughput experimental techniques, abundant protein interaction data have been published and used to study protein networks. Predicting key proteins at the molecular level from network topology is of practical significance: it can effectively mine the key information hidden in the data and thereby advance fields such as drug design and compound mining. By analyzing the yeast protein network, Jeong et al. found that, as in other biological networks, the degree distribution of protein nodes follows a power law. This shows that connectivity among protein nodes, as in a scale-free network, is highly uneven: most nodes in the network are connected to few other nodes, while a small number of nodes have high connectivity. Such nodes are called Hub nodes. Because of the central position of Hub proteins in the network, removing them affects the overall network topology more than deleting other nodes at random. In addition, statistical analysis shows that Hub proteins tend to be key proteins, a rule known as "centrality-lethality".
Why do Hub proteins tend to be key? From the perspective of network topology, Jeong et al. argued that the topological centrality of a protein in the interaction network is closely related to the importance of its function. He et al. argued that Hub nodes are significantly more likely to be key mainly because they interact with more proteins and are therefore more likely to take part in key protein interactions. Zotenko et al. proposed the concept of key complex biological modules: groups of closely connected proteins that share a biological function and are enriched in key proteins. On this view, many Hub nodes are key precisely because they participate in such modules.
When a protein network is built, nodes and edges represent protein molecules and protein-protein interactions, respectively. The interaction usually meant is physical interaction: proteins physically bind to one another and act jointly, and physical binding among several proteins forms a protein complex. Cells also exhibit genetic interactions, in which a mutation in one gene causes a change in another gene; these mainly reflect functional links between proteins. Protein interactions can be identified by biological experiments or predicted computationally. Because experimental techniques are limited by experimental conditions, the species used, and so on, the resulting protein interaction data are incomplete. For example, low-affinity protein interactions are hard to detect experimentally. Interactions obtained by computational methods, in turn, have a higher false-positive rate. The protein interaction data currently available therefore inevitably contain considerable noise. Because of this noise, some nodes that appear to be Hub nodes are not actually high-degree nodes, which biases the close association between Hub proteins and their criticality. It is therefore crucial to reduce the influence of noise in the data set effectively so that the key proteins in a protein network can be identified.
A subcellular atlas of human proteins has now been formally published in the journal Science, showing comprehensively how proteins are distributed across cellular structures. It gives researchers the basis they need to investigate protein function and modes of interaction at the subcellular level, and it is important for understanding the deeper principles of human life activity and for studying disease. Subcellular structures provide a stable location in which proteins carry out their functions: a protein can function normally only in a suitable subcellular structure, and only proteins in the same subcellular structure can form stable physical interactions and so take part in the life activities of the organism. Protein interaction data alone cannot capture this spatial dynamic property of proteins, whereas appropriate use of subcellular localization information can make key protein identification more effective. Acencio and Lemke found that subcellular localization is an important factor influencing protein criticality. Building on this finding, Peng et al. divided the original protein network into multiple subcellular sub-networks according to where protein interactions occur and re-verified the centrality-lethality rule.
On the other hand, the state of a gene or protein in the cell is not fixed. Over time, some proteins are degraded or synthesized, and the interactions among them disappear or form accordingly, remaining in dynamic equilibrium at every moment, so that the intracellular interaction network keeps changing to ensure that life activities proceed normally. Existing studies show that the onset and progression of disease are closely related to this dynamic change. A static protein network cannot capture this temporal dynamic, so protein functional module identification and disease-related studies based on static networks are severely limited. Through statistical analysis of large-scale gene expression and protein interaction data, Grigoriev found that protein pairs encoded by co-expressed genes are more likely to interact than randomly selected protein pairs. Bhardwaj et al. later confirmed this finding experimentally. Together with the wide use of technologies such as gene microarrays and next-generation sequencing, this offers a new approach to proteomics research based on gene expression patterns.
In summary, existing network-based studies of node criticality rely on protein interaction data that inevitably contain considerable noise, so some apparent Hub nodes are not actually high-degree nodes, which biases the close association between Hub proteins and their criticality. It is therefore necessary to provide a key protein identification method based on protein spatio-temporal sub-networks that improves the accuracy of key protein identification.
Summary of the invention
The object of the present invention is to provide a key protein identification method based on protein spatio-temporal sub-networks that reduces the influence of noisy data on key protein identification and improves identification accuracy.
The present invention provides a key protein identification method based on protein spatio-temporal sub-networks, including the following steps:
Step 1: Obtain the original protein network G;
a(v, u) = 1, if a connection exists between v and u
a(v, u) = 0, if no connection exists between v and u
where a(v, u) denotes the connection between protein nodes v and u in the original protein network G;
Step 2: Obtain the gene expression values at different moments of the protein nodes in the original protein network G, and build the active protein sets at different moments;
If the gene expression value e(v, t) of protein node v at moment t is greater than or equal to the activity threshold THR(v) of protein node v, the protein node v is added to the active protein set TP(t) of moment t;
TP(t) = {v | e(v, t) ≥ THR(v), v ∈ V}
where V is the set of protein nodes in the original protein network G;
Step 3: Obtain the subcellular structure to which each protein node in the original protein network G belongs, and build the protein sets in different subcellular structures;
where CP(s) denotes the set of proteins in subcellular structure s;
Step 4: Build the spatio-temporal sub-network of each class of subcellular structure at each moment;
If protein node v and protein node u, which are connected in the original protein network G, both belong to the active protein set TP(t) of moment t and both lie in the same subcellular structure s, then protein node v and protein node u are assigned to the spatio-temporal sub-network G(t, s) of subcellular structure s at moment t;
G(t, s) = (V(t, s), E(t, s))
V(t, s) = {v | v ∈ V, v ∈ TP(t), v ∈ CP(s)}
E(t, s) = {(v, u) | a(v, u) = 1, v ∈ V(t, s), u ∈ V(t, s)}
where V(t, s) denotes the set of protein nodes that belong both to the active protein set TP(t) of moment t and to the protein set CP(s) of subcellular structure s, and E(t, s) denotes the set of connections between protein nodes that lie in subcellular structure s and are active at moment t;
Step 5: Calculate the maximal degree centrality of each protein node from the spatio-temporal sub-networks built in step 4;
The maximal degree centrality of a protein node is calculated as follows:
MDC(v) = max(DC(v)), v ∈ V(t, s)
where DC(v) denotes the number of neighbors of protein node v in one spatio-temporal sub-network, N denotes the number of protein nodes in the spatio-temporal sub-network containing protein node v, and MDC(v) denotes the maximal degree centrality of protein node v, the maximum being taken over all spatio-temporal sub-networks G(t, s) that contain v;
Step 6: Sort all protein nodes in descending order of the maximal degree centrality obtained in step 5, and select the top M protein nodes as the predicted key proteins;
where M is an integer.
The present invention combines gene expression data and subcellular structure information, divides the original network into sub-networks from the perspective of spatio-temporal distribution, and then uses the sub-networks to assess each node of the network comprehensively in order to study the criticality of protein nodes and obtain the predicted key proteins. In this way the scheme can greatly improve the accuracy of key protein identification. Selecting the top M protein nodes as key proteins amounts to taking the top-ranked Hub nodes as the predicted key proteins.
The spatio-temporal sub-networks are, in essence, built from the connections between protein nodes in step 1, the active protein sets at different moments in step 2, and the protein sets in different subcellular structures in step 3.
Further preferably, the activity threshold THR(v) of protein node v is calculated as follows:
THR(v) = μ(v) + 2.5 · σ(v) · (1 − 1/(1 + σ²(v)))
where μ(v) denotes the arithmetic mean of the gene expression values of protein node v, computed from its gene expression values at the different moments, and σ(v) denotes the standard deviation of the gene expression values of protein node v, computed from the same values.
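The threshold formula can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `activity_threshold` and the sample values are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say whether the population or the sample form is intended.

```python
import statistics


def activity_threshold(expr_values, k=2.5):
    """THR(v) = mu + k * sigma * (1 - 1 / (1 + sigma^2)) for one protein.

    expr_values: gene expression values of the protein at each moment.
    k: the multiplier, 2.5 in the text.
    """
    mu = statistics.fmean(expr_values)
    # Assumption: population standard deviation; the source does not specify.
    sigma = statistics.pstdev(expr_values)
    return mu + k * sigma * (1.0 - 1.0 / (1.0 + sigma ** 2))


# Hypothetical expression profile: low at three moments, high at one.
thr = activity_threshold([1.0, 1.0, 1.0, 3.0])
```

With a perfectly flat profile σ(v) is zero and the threshold collapses to the mean, so the protein counts as active at every moment under the ≥ comparison used to build TP(t).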
Further preferably, the gene expression values of a protein node at different moments in step 2 are obtained as follows:
First, obtain the gene expression value of the protein node at each moment within each of the different metabolic cycles;
Then, calculate the average of the gene expression values of the protein node at the same moment across the different metabolic cycles; this average is the gene expression value of the protein node at the corresponding moment;
Each metabolic cycle is divided into 2 or more moments.
Further preferably, the number of metabolic cycles is 3, each metabolic cycle is divided into 12 moments, and the time interval between adjacent moments is 25 min.
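The averaging across cycles can be sketched as below. The function name `average_over_cycles` is hypothetical, and the layout of the input (all moments of cycle 1, then cycle 2, then cycle 3) is an assumption about how the samples are ordered, not something the text states.

```python
def average_over_cycles(samples, n_cycles=3, n_times=12):
    """Average one protein's expression at the same moment across cycles.

    samples: flat list of n_cycles * n_times values, assumed to be ordered
             cycle by cycle (an assumption about the data layout).
    Returns a list of n_times averaged expression values e(v, t).
    """
    if len(samples) != n_cycles * n_times:
        raise ValueError("expected n_cycles * n_times expression values")
    return [
        sum(samples[c * n_times + t] for c in range(n_cycles)) / n_cycles
        for t in range(n_times)
    ]


# Hypothetical profile: 36 measurements for one protein.
profile = average_over_cycles(list(range(36)))
```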
Further preferably, the classes of subcellular structure include cytoskeleton, cytosol, endoplasmic reticulum, endosome, extracellular matrix, Golgi apparatus, lysosome, mitochondrion, nucleus, peroxisome and cytoplasm.
On the other hand, the present invention also provides an identification system using the above identification method. The system includes a data acquisition module, an active protein set building module, a subcellular structure protein set building module, a spatio-temporal sub-network building module, a computing module and a prediction module;
The active protein set building module, the subcellular structure protein set building module, the spatio-temporal sub-network building module and the computing module are each connected to the data acquisition module, and the prediction module is connected to the computing module;
The data acquisition module is used to obtain the original protein network G, the gene expression values at different moments of the protein nodes in the original protein network G, and the subcellular structure to which each protein node belongs;
The active protein set building module is used to build the active protein sets at different moments;
The subcellular structure protein set building module is used to build the protein sets in different subcellular structures;
The spatio-temporal sub-network building module is used to build the spatio-temporal sub-network of each class of subcellular structure at each moment;
The computing module is used to calculate the maximal degree centrality of each protein node;
The prediction module is used to sort all protein nodes in descending order of maximal degree centrality and select the top M protein nodes as the predicted key proteins.
Beneficial effects
Compared with the prior art, the advantages of the present invention are as follows:
The present invention combines the active protein sets at different moments, the protein sets in different subcellular structures, and the connections between protein nodes to build the spatio-temporal sub-network of each class of subcellular structure at each moment. It then performs topological analysis on the nodes of these sub-networks and selects the top-ranked nodes by maximal degree centrality to predict key proteins. On the one hand, this reduces the influence of data noise. Prior-art Hub node analysis refines the original network in a single respect only, using either subcellular localization information or gene expression data, and the original network it uses reflects neither the temporal nor the spatial properties of proteins. The method of the invention takes full account of how proteins change dynamically over time and how they are distributed among subcellular compartments, which greatly reduces data noise and makes the final prediction more reliable. On the other hand, it better reflects the spatio-temporal dynamics of proteins.
Description of the drawings
Fig. 1 is a flow chart of the key protein identification method based on protein spatio-temporal sub-networks provided by the invention;
Fig. 2 shows the differences in key protein identification under different conditions based on YBioGRID;
Fig. 3 shows the influence of data noise on analyzing the criticality of protein nodes in a protein-protein interaction network;
Fig. 4 is a schematic diagram of the Hub node difference analysis provided by the invention.
Detailed description of the embodiments
The present invention is further described below with reference to an embodiment.
Biological data used in the present invention: existing databases contain abundant biological data for yeast, which is a widely studied species, so the present invention is illustrated using yeast data. The public yeast data used in the present invention are protein interaction data, gene expression data, protein subcellular localization data and known key protein data.
First, the original protein network and the protein interaction data are obtained. The original protein network is taken from an existing database such as BioGRID. Self-interactions and duplicate interactions are deleted to give the protein interaction data and hence the original protein network G = (V, E), where V is the set of protein nodes, that is, all proteins in the protein interaction data, and E is the set of edges, that is, all protein interactions. For example, the original protein network G contains 4746 protein nodes and 15166 edges. To verify the influence of noisy data on key protein identification, protein interaction data sets of four different confidence levels were collected, from highest to lowest confidence Y2K, Y11K, Y45K and Y78K. Y2K contains 2455 interactions; Y11K contains 11000 interactions, including all interactions of Y2K; Y45K contains 45000 interactions, including all interactions of Y11K; and Y78K contains 78390 interactions, including all interactions of Y45K.
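The clean-up of the raw interaction list described above (dropping self-interactions and duplicates) can be sketched as follows. The function names and the toy pairs are hypothetical; parsing of the actual database export is not shown because its file layout is not given in the text.

```python
def build_network(pairs):
    """Build an undirected adjacency map from raw interaction pairs.

    Self-interactions (v, v) are skipped; duplicate pairs, in either
    orientation, collapse automatically because neighbors are stored in sets.
    """
    adjacency = {}
    for v, u in pairs:
        if v == u:
            continue  # drop self-interaction
        adjacency.setdefault(v, set()).add(u)
        adjacency.setdefault(u, set()).add(v)
    return adjacency


def edge_count(adjacency):
    """Each undirected edge is stored twice, once per endpoint."""
    return sum(len(neighbors) for neighbors in adjacency.values()) // 2


# Hypothetical raw pairs with one duplicate and one self-interaction.
raw = [("A", "B"), ("B", "A"), ("A", "A"), ("B", "C")]
network = build_network(raw)
```

In this representation a(v, u) = 1 exactly when `u in network[v]`, so the map plays the role of the adjacency matrix A.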
Then, the gene expression data are obtained. This embodiment uses data from the NCBI GEO database (accession GSE3431), which contain yeast gene expression data over three metabolic cycles. Each metabolic cycle has 12 time points, with 25 min between adjacent time points.
The subcellular localization data are also obtained. This embodiment uses data from the COMPARTMENTS database, in which the classes of subcellular structure are cytoskeleton, cytosol, endoplasmic reticulum, endosome, extracellular matrix, Golgi apparatus, lysosome, mitochondrion, nucleus, peroxisome and cytoplasm, 11 classes in total; 4455 proteins have annotation information.
It should be understood that the data selected in this embodiment serve to explain the present invention, and that the present invention is not limited to the data used in this embodiment.
As shown in Fig. 1, a method for identifying Hub nodes of a protein network based on spatio-temporal dynamics, provided by an embodiment of the present invention, includes the following steps:
Step 1: Obtain the original protein network G.
The original protein network G records the connections between protein nodes.
a(v, u) = 1, if a connection exists between v and u
a(v, u) = 0, if no connection exists between v and u
where a(v, u) denotes the connection between protein nodes v and u in the original protein network G. The original protein network G = (V, E) can be represented as an undirected graph, and its adjacency matrix is denoted A = [a(v, u)].
Step 2: Obtain the gene expression values at different moments of the protein nodes in the original protein network G, and build the active protein sets at different moments.
If the gene expression value e(v, t) of protein node v at moment t is greater than or equal to the activity threshold THR(v) of protein node v, the protein node v is added to the active protein set TP(t) of moment t. That is, if e(v, t) ≥ THR(v) holds for protein node v at moment t, the protein node v is regarded as active at moment t. After all proteins have been screened in this way for that moment, we obtain the active protein set TP(t) of moment t:
TP(t) = {v | e(v, t) ≥ THR(v), v ∈ V}
where V is the set of protein nodes in the original protein network G. As the expression shows, each protein node has its own activity threshold. In this embodiment t ranges from 1 to 12, so 12 active protein sets are obtained; in other feasible embodiments t may take other values.
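Building the sets TP(t) from the expression values and per-protein thresholds can be sketched as follows. The function name `active_sets`, the dictionary layout and the toy values are hypothetical, and moments are indexed from 0 here rather than from 1 as in the text.

```python
def active_sets(expression, thresholds):
    """Return [TP(0), TP(1), ...]: the proteins active at each moment.

    expression: {protein: [e(v, 0), e(v, 1), ...]}, equal-length lists.
    thresholds: {protein: THR(v)}.
    A protein is active at moment t when e(v, t) >= THR(v).
    """
    if not expression:
        return []
    n_times = len(next(iter(expression.values())))
    return [
        {v for v, values in expression.items() if values[t] >= thresholds[v]}
        for t in range(n_times)
    ]


# Hypothetical two-moment example.
tp = active_sets({"A": [1.0, 3.0], "B": [2.0, 2.0]}, {"A": 2.0, "B": 2.0})
```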
The activity threshold THR(v) of protein node v is calculated as follows:
THR(v) = μ(v) + 2.5 · σ(v) · (1 − 1/(1 + σ²(v)))
where μ(v) denotes the arithmetic mean of the gene expression values of protein node v, computed from its gene expression values at the different moments, and σ(v) denotes the standard deviation of the gene expression values of protein node v, computed from the same values. In this embodiment, for example, both the arithmetic mean μ(v) and the standard deviation σ(v) are computed from the gene expression values of protein node v at the 12 moments.
Step 3: Obtain the subcellular structure to which each protein node in the original protein network G belongs, and build the protein sets in different subcellular structures;
where CP(s) denotes the set of proteins in subcellular structure s. In this embodiment the classes of subcellular structure are cytoskeleton, cytosol, endoplasmic reticulum, endosome, extracellular matrix, Golgi apparatus, lysosome, mitochondrion, nucleus, peroxisome and cytoplasm, 11 classes in total.
Step 4: Build the spatio-temporal sub-network of each class of subcellular structure at each moment from the connections between protein nodes in step 1, the active protein sets at different moments in step 2, and the protein sets in different subcellular structures in step 3;
If protein node v and protein node u, which are connected in the original protein network G, both belong to the active protein set TP(t) of moment t and both lie in the same subcellular structure s, then protein node v and protein node u are assigned to the spatio-temporal sub-network G(t, s) of subcellular structure s at moment t;
G(t, s) = (V(t, s), E(t, s))
V(t, s) = {v | v ∈ V, v ∈ TP(t), v ∈ CP(s)}
E(t, s) = {(v, u) | a(v, u) = 1, v ∈ V(t, s), u ∈ V(t, s)}
where V(t, s) denotes the set of protein nodes that belong both to the active protein set TP(t) of moment t and to the protein set CP(s) of subcellular structure s, and E(t, s) denotes the set of connections between protein nodes that lie in subcellular structure s and are active at moment t.
That is, since this embodiment has 12 moments and 11 classes of subcellular structure, there are 11 × 12 = 132 spatio-temporal sub-networks G(t, s).
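A single sub-network G(t, s) can be derived from the three inputs as sketched below. The function name and the toy data are hypothetical; edges are stored as frozensets so that (v, u) and (u, v) count as the same undirected edge, which is a representation choice rather than something the text prescribes.

```python
def spatiotemporal_subnetwork(adjacency, active_t, compartment_s):
    """Return (V(t, s), E(t, s)) for one moment t and one compartment s.

    adjacency:     {protein: set of neighbors} for the original network G.
    active_t:      TP(t), the proteins active at moment t.
    compartment_s: CP(s), the proteins located in subcellular structure s.
    """
    nodes = set(adjacency) & active_t & compartment_s
    edges = {
        frozenset((v, u))
        for v in nodes
        for u in adjacency[v]
        if u in nodes
    }
    return nodes, edges


# Hypothetical triangle A-B-C in which C is inactive at this moment.
g = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B"}}
nodes, edges = spatiotemporal_subnetwork(g, {"A", "B"}, {"A", "B", "C"})
```

Calling this once for every pair of moment and subcellular structure yields the full collection of sub-networks.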
Step 5: Calculate the maximal degree centrality of each protein node;
The maximal degree centrality of each protein node is calculated as follows:
MDC(v) = max(DC(v)), v ∈ V(t, s)
where DC(v) denotes the number of neighbors of protein node v in one spatio-temporal sub-network, N denotes the number of protein nodes in the spatio-temporal sub-network containing protein node v, and MDC(v) denotes the maximal degree centrality of protein node v.
Hub nodes are usually defined as nodes with a high number of connections, so the present invention uses degree centrality as the identifying feature; the degree centrality of a node in a protein network is its number of neighbors. It should be understood that a protein node v may appear in several spatio-temporal sub-networks G(t, s) and may therefore have several values DC(v); the largest of them is taken as the maximal degree centrality of protein node v. If a protein node v appears in only one spatio-temporal sub-network G(t, s), its maximal degree centrality is its degree centrality DC(v) in that sub-network.
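The maximum over sub-networks can be sketched as follows. This is an illustrative sketch under stated assumptions: DC(v) is taken as the raw neighbor count described in the text, no normalization by N is applied because the text does not show one, and a node that appears in no sub-network is given 0, which the text does not specify. All names and toy data are hypothetical.

```python
def max_degree_centrality(adjacency, active_sets, compartments):
    """MDC(v): the largest neighbor count of v over all sub-networks G(t, s).

    adjacency:    {protein: set of neighbors} for the original network G.
    active_sets:  [TP(0), TP(1), ...], one set of active proteins per moment.
    compartments: {structure name: CP(s)}.
    """
    # Assumption: nodes that never enter a sub-network keep a score of 0.
    mdc = {v: 0 for v in adjacency}
    for active in active_sets:
        for members in compartments.values():
            nodes = set(adjacency) & active & members
            for v in nodes:
                degree = len(adjacency[v] & nodes)  # DC(v) in this G(t, s)
                mdc[v] = max(mdc[v], degree)
    return mdc


# Hypothetical network: A has 3 neighbors in G but at most 2 in any G(t, s).
g = {"A": {"B", "C", "D"}, "B": {"A"}, "C": {"A"}, "D": {"A"}}
tp = [{"A", "B"}, {"A", "B", "C", "D"}]
cp = {"nucleus": {"A", "B", "D"}, "cytoplasm": {"A", "C"}}
mdc = max_degree_centrality(g, tp, cp)
```

In the toy data node A has three neighbors in the original network, but only two of them are ever active in the same compartment at the same moment, which is the kind of filtering the sub-network construction is meant to achieve.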
Step 6: Sort all protein nodes in descending order of the maximal degree centrality obtained in step 5, and select the top M protein nodes as the predicted key proteins.
Specifically, each protein node has a maximal degree centrality, and the nodes are sorted in descending order of this value: the larger the maximal degree centrality of a protein node, the higher it ranks.
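The ranking step can be sketched as follows. The function name `top_m` and the scores are hypothetical, and breaking ties by protein name is an assumption added only to make the order deterministic; the text does not say how ties are handled.

```python
def top_m(mdc, m):
    """Return the m proteins with the highest maximal degree centrality.

    mdc: {protein: MDC(v)}.
    Ties are broken alphabetically (an assumption, not from the source).
    """
    ranked = sorted(mdc, key=lambda v: (-mdc[v], v))
    return ranked[:m]


# Hypothetical scores.
predicted = top_m({"A": 2, "B": 1, "C": 3, "D": 1}, 3)
```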
On the other hand, the present invention also provides an identification system using the above identification method. The system includes a data acquisition module, an active protein set building module, a subcellular structure protein set building module, a spatio-temporal sub-network building module, a computing module and a prediction module;
The active protein set building module, the subcellular structure protein set building module, the spatio-temporal sub-network building module and the computing module are each connected to the data acquisition module, and the prediction module is connected to the computing module;
The data acquisition module is used to obtain the original protein network G, the gene expression values at different moments of the protein nodes in the original protein network G, and the subcellular structure to which each protein node belongs. For example, the data acquisition module obtains the biological data used in the present invention from existing databases.
The active protein set building module is used to build the active protein sets at different moments. For the building process, refer to the description of the method above.
The subcellular structure protein set building module is used to build the protein sets in different subcellular structures. For the building process, refer to the description of the method above.
The spatio-temporal sub-network building module is used to build the spatio-temporal sub-network of each class of subcellular structure at each moment. For the building process, refer to the description of the method above.
The computing module is used to calculate the maximal degree centrality of each protein node. The computing module is also used to calculate the gene expression values of the protein nodes at different moments. For the calculation process, refer to the description of the method above.
The prediction module is used to sort all protein nodes in descending order of maximal degree centrality and select the top M protein nodes as the predicted key proteins.
Simulation and verification
A. Criticality analysis of Hub nodes under spatio-temporal conditions:
To compare how effective the spatio-temporal dynamic network is at identifying key proteins, we ran experiments on the yeast protein network from the public database BioGRID. In this experiment we computed the scores of the protein nodes in four cases (the original network, the temporal dynamic network, the spatial dynamic network and the spatio-temporal dynamic network), sorted the results in descending order, and then successively selected a given number of proteins as the candidate set. Comparing the candidates with the known key protein data set gives the number of key proteins identified in each case. As shown in Fig. 2(a) and (b), we selected the top 100, 200, ..., 800 proteins as the prediction sets. When gene expression data or subcellular localization information is used to build a temporal or spatial dynamic network, the identification results improve markedly over the original static network, and the method that merges both kinds of data into a spatio-temporal dynamic network achieves the best results.
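The comparison against the known key protein set can be sketched as a simple count at each cut-off. The function name `hits_at` and the toy data are hypothetical and are not taken from the experiment.

```python
def hits_at(ranked, known_key, cutoffs):
    """Count known key proteins among the top-k predictions for each k.

    ranked:    proteins in descending order of score.
    known_key: set of proteins known to be key.
    cutoffs:   iterable of k values, e.g. (100, 200, ..., 800).
    """
    return {
        k: sum(1 for v in ranked[:k] if v in known_key)
        for k in cutoffs
    }


# Hypothetical ranking of four proteins, two of which are known to be key.
hits = hits_at(["C", "A", "B", "D"], {"A", "D"}, (2, 4))
```

Dividing each count by its cut-off gives the proportion of known key proteins in the predicted set, which is how the confidence-level comparison is reported.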
B. Essentiality analysis of hub nodes in networks of different confidence levels:
To analyze how networks of different confidence levels affect key protein identification, we collected the protein interaction data from the paper by Mering et al. and divided them into protein networks of different confidence levels, from high to low Y2K, Y11K, Y45K and Y78K. Based on these data, we ranked the nodes by their maximum-neighbor-count score and computed the proportion of known key proteins in the top 100, 200, 300 and 400 predicted protein sets. As shown in Fig. 3, the polylines marked with triangles represent the results on the original networks, and the polylines marked with filled circles represent the results after building the space-time dynamic network from the original network. For the top 100, the proportions of key proteins identified by directly selecting hub nodes in the original networks are 0.78, 0.58, 0.49 and 0.51, respectively; after building the space-time dynamic network the proportions are 0.78, 0.87, 0.88 and 0.84, respectively. Thus, in the high-confidence network the space-time dynamic network gives results equal to or slightly better than those of the static network, and as network confidence decreases the gap between the two becomes pronounced. This indicates that gene expression data and subcellular localization information capture the functional characteristics of proteins and thereby effectively filter out noisy data, improving the accuracy of key protein identification. In addition, the figure shows that for the four filled-circle polylines the proportions across the different top sets are relatively close, whereas the four triangle-marked polylines trend downward as noisy data increase. This result shows that noisy data affect the analysis of hub-node essentiality, but that analyzing node essentiality from different angles by fusing multiple kinds of biological data is a good way to address the problem.
C. Analysis of differences between hub nodes:
Existing research has shown that hub nodes tend to be key proteins, but noise in the protein network can make this tendency less apparent. The experiments above show that building the space-time dynamic network improves the accuracy of key protein identification; the question is how to explain why. To this end, we analyzed the hub nodes obtained in the different cases and divided them into two classes: nodes that are hubs in the original network and also hubs in the space-time network, denoted Hub_Hub; and nodes that are hubs in the original network but not hubs in the space-time subnetworks, denoted Hub_NonHub. We then ran experiments on the YBioGRID network and computed the proportion of known key proteins in each class. As Fig. 4 shows, regardless of the threshold, the Hub_Hub nodes contain more key proteins than the Hub_NonHub nodes. When the hub threshold is set to 400, the proportion of key proteins among the hubs of the YBioGRID network is 0.52 (= 208/400); within that set, the proportion of key proteins is 0.7771 among the Hub_Hub nodes and 0.3376 among the Hub_NonHub nodes. Building the space-time dynamic network therefore removes the noisy or invalid neighbor nodes that interact with hub proteins in the raw data, which improves the precision of key protein identification.
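The Hub_Hub / Hub_NonHub split can be illustrated with a short sketch. It assumes that the "hub threshold" means the number of top-ranked nodes treated as hubs in each ranking; the function name `split_hubs` and the toy scores are hypothetical. The group sizes are not stated in the text; as a consistency check only, the reported ratios match a split of 166 Hub_Hub and 234 Hub_NonHub nodes (129/166 ≈ 0.7771, 79/234 ≈ 0.3376, 129 + 79 = 208), but that split is inferred here, not reported.

```python
def split_hubs(orig_scores, st_scores, threshold, known_key):
    """Split the top-`threshold` hubs of the original network into Hub_Hub and Hub_NonHub."""
    def top(scores):
        # Top-`threshold` nodes by descending score; ties broken by name (assumption).
        return set(sorted(scores, key=lambda p: (-scores[p], p))[:threshold])

    orig_hubs, st_hubs = top(orig_scores), top(st_scores)
    hub_hub = orig_hubs & st_hubs        # hub in both rankings
    hub_nonhub = orig_hubs - st_hubs     # hub only in the original network

    def ratio(group):
        return len(group & known_key) / len(group) if group else 0.0

    return hub_hub, hub_nonhub, ratio(hub_hub), ratio(hub_nonhub)


# Hypothetical toy scores: degree in the original network vs. in the space-time subnetworks.
orig_scores = {"A": 10, "B": 9, "C": 8, "D": 7, "E": 2, "F": 1}
st_scores = {"A": 5, "C": 4, "E": 3, "F": 3, "B": 1, "D": 0}
hub_hub, hub_nonhub, r_hh, r_hn = split_hubs(orig_scores, st_scores, 4, {"A", "C", "D"})
print(sorted(hub_hub), sorted(hub_nonhub), r_hh, r_hn)
```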
The demonstration above shows that, compared with the prior art, the present invention can greatly improve the accuracy of key protein identification.
It should be emphasized that the examples described herein are illustrative rather than restrictive. The present invention is therefore not limited to the examples described in the detailed description; any other embodiment obtained by those skilled in the art according to the technical solution of the present invention, whether by modification or by substitution, that does not depart from the concept and scope of the present invention also falls within the scope of protection of the present invention.

Claims (6)

1. A key protein identification method based on protein space-time subnetworks, characterized by comprising the following steps:
Step 1: obtain the original protein network G;
$$a(v,u) = \begin{cases} 1, & \text{if a connection exists between } v \text{ and } u \\ 0, & \text{if no connection exists between } v \text{ and } u \end{cases}$$
where a(v, u) denotes the connection relationship between protein nodes v and u in the original protein network G;
Step 2: obtain the gene expression values of the protein nodes in the original protein network G at different times, and build the active protein set for each time;
wherein, if the gene expression value e(v, t) of protein node v at time t is greater than or equal to the activity threshold THR(v) of protein node v, protein node v is added to the active protein set TP(t) of time t;
$$TP(t) = \{\, v \mid e(v,t) \ge THR(v),\ v \in V \,\}$$
where V is the set of protein nodes in the original protein network G;
Step 3: obtain the subcellular structures to which the protein nodes in the original protein network G belong, and build the protein sets of the different subcellular structures;
where CP(s) denotes the protein set of subcellular structure s;
Step 4: build the space-time subnetworks of each type of subcellular structure at different times;
wherein, if protein node v and protein node u, which are connected in the original protein network G, both belong to the active protein set TP(t) of time t and are located in the same subcellular structure s, protein node v and protein node u are assigned to the space-time subnetwork G(t, s) of subcellular structure s at time t;
$$G(t,s) = \big(V(t,s),\ E(t,s)\big)$$
$$V(t,s) = \{\, v \mid v \in V \wedge v \in TP(t) \wedge v \in CP(s) \,\}$$
$$E(t,s) = \{\, (v,u) \mid a(v,u) = 1,\ v \in V(t,s),\ u \in V(t,s) \,\}$$
where V(t, s) denotes the set of protein nodes that belong both to the active protein set TP(t) of time t and to the protein set CP(s) of subcellular structure s, and E(t, s) denotes the set of connections between protein nodes that are located in subcellular structure s and are simultaneously active at time t;
Step 5: calculate the maximal degree centrality of each protein node based on the space-time subnetworks built in step 4;
wherein the maximal degree centrality of a protein node is calculated as follows:
$$MDC(v) = \max\big(DC(v)\big), \quad v \in V(t,s)$$
where DC(v) denotes the number of neighbors of protein node v in a space-time subnetwork, N denotes the total number of protein nodes in the space-time subnetwork in which protein node v resides, and MDC(v) denotes the maximal degree centrality of protein node v;
Step 6: sort all protein nodes in descending order of the maximal degree centrality obtained in step 5, and select the top M protein nodes as the predicted key proteins;
where M is an integer.
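Steps 2 to 6 of claim 1 can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: all function and variable names are hypothetical, the network is assumed to be an undirected edge list with each edge listed once, DC(v) is taken as the raw neighbor count (as the definition above states), and ties in the final ranking are broken by node name.

```python
from collections import defaultdict


def build_active_sets(expr, thr):
    """Step 2: TP(t) = {v | e(v, t) >= THR(v)}; expr[v] is a list indexed by time t."""
    n_times = len(next(iter(expr.values())))
    return [{v for v in expr if expr[v][t] >= thr[v]} for t in range(n_times)]


def build_subnetworks(edges, active_sets, compartments):
    """Step 4: G(t, s) keeps the edges whose two endpoints lie in both TP(t) and CP(s)."""
    subnets = {}
    for t, tp in enumerate(active_sets):
        for s, cp in compartments.items():
            nodes = tp & cp  # V(t, s)
            subnets[(t, s)] = [(v, u) for v, u in edges if v in nodes and u in nodes]
    return subnets


def max_degree_centrality(subnets):
    """Step 5: MDC(v) = largest neighbor count of v over all space-time subnetworks."""
    mdc = defaultdict(int)
    for sub_edges in subnets.values():
        degree = defaultdict(int)
        for v, u in sub_edges:
            degree[v] += 1
            degree[u] += 1
        for v, d in degree.items():
            mdc[v] = max(mdc[v], d)
    return dict(mdc)


def predict_key_proteins(mdc, nodes, m):
    """Step 6: sort nodes by MDC in descending order and return the top M."""
    return sorted(nodes, key=lambda v: (-mdc.get(v, 0), v))[:m]


# Hypothetical toy input: 4 proteins, 2 time points, 2 subcellular structures.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
expr = {"A": [5, 1], "B": [5, 5], "C": [1, 5], "D": [5, 5]}
thr = {"A": 3, "B": 3, "C": 3, "D": 3}
compartments = {"nucleus": {"A", "B", "D"}, "cytosol": {"B", "C"}}  # CP(s), step 3

active_sets = build_active_sets(expr, thr)
subnets = build_subnetworks(edges, active_sets, compartments)
mdc = max_degree_centrality(subnets)
top = predict_key_proteins(mdc, expr.keys(), m=2)
print(mdc, top)
```

In this toy input, node A has three neighbors in the original network but only two in any single space-time subnetwork, which shows how the filtering in step 4 changes the degree used for ranking.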
2. The method according to claim 1, characterized in that the activity threshold THR(v) of protein node v is calculated as follows:
$$THR(v) = \mu(v) + 2.5\,\sigma(v)\left(1 - \frac{1}{1 + \sigma^{2}(v)}\right)$$
where μ(v) denotes the arithmetic mean of the gene expression values of protein node v, computed from its gene expression values at the different times, and σ(v) denotes the standard deviation of the gene expression values of protein node v, computed from the same values.
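The threshold formula of claim 2 translates directly into code. The sketch below is illustrative: the function name `activity_threshold` is hypothetical, and the use of the population standard deviation (`statistics.pstdev`) is an assumption, since the text does not say whether the population or the sample form is intended.

```python
from statistics import mean, pstdev


def activity_threshold(values):
    """THR(v) = mu(v) + 2.5 * sigma(v) * (1 - 1 / (1 + sigma(v)^2))."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    return mu + 2.5 * sigma * (1 - 1 / (1 + sigma ** 2))


# A flat profile has sigma = 0, so the threshold equals the mean.
flat = activity_threshold([1, 1, 1, 1])
# A fluctuating profile (mean 1, sigma 1) gives 1 + 2.5 * 1 * (1 - 1/2).
varying = activity_threshold([0, 2, 0, 2])
print(flat, varying)
```

The factor in parentheses grows from 0 towards 1 as σ(v) increases, so proteins with strongly fluctuating expression get a threshold further above their mean than proteins with nearly constant expression.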
3. The method according to claim 1, characterized in that the gene expression values of a protein node at different times in step 2 are obtained as follows:
first, obtain the gene expression value of the protein node at each time point within the different metabolic cycles;
then, calculate the average of the gene expression values of the protein node at the same time point across the different metabolic cycles, the average being the gene expression value of the protein node at the corresponding time;
wherein each metabolic cycle is divided into two or more time points.
4. The method according to claim 3, characterized in that the number of metabolic cycles is 3, each metabolic cycle is divided into 12 time points, and the interval between adjacent time points is 25 min.
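The averaging in claims 3 and 4 can be sketched as follows. The sketch assumes that the raw readings of one protein are stored as a flat list ordered cycle by cycle (3 cycles of 12 time points, 36 readings); that layout and the function name `average_over_cycles` are assumptions made for illustration only.

```python
def average_over_cycles(series, n_cycles=3, points_per_cycle=12):
    """Average the readings taken at the same time point of each metabolic cycle."""
    if len(series) != n_cycles * points_per_cycle:
        raise ValueError("series length must equal n_cycles * points_per_cycle")
    return [
        sum(series[c * points_per_cycle + t] for c in range(n_cycles)) / n_cycles
        for t in range(points_per_cycle)
    ]


# Hypothetical readings 0..35: time point 0 is sampled at indices 0, 12 and 24.
readings = list(range(36))
averaged = average_over_cycles(readings)
print(averaged[0], averaged[11], len(averaged))
```

The 12 averaged values per protein are then the e(v, t) values used in step 2 of claim 1.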
5. The method according to claim 1, characterized in that the categories of subcellular structure include cytoskeleton, cytoplasmic matrix, endoplasmic reticulum, endosome, extracellular matrix, Golgi apparatus, lysosome, mitochondria, nucleus, peroxisome and cytoplasm.
6. An identification system using the method according to any one of claims 1-5, characterized by comprising a data acquisition module, an active protein set construction module, a subcellular-structure protein set construction module, a space-time subnetwork construction module, a computing module and a prediction module;
wherein the active protein set construction module, the subcellular-structure protein set construction module, the space-time subnetwork construction module and the computing module are each connected to the data acquisition module, and the prediction module is connected to the computing module;
the data acquisition module is used to obtain the original protein network G, the gene expression values of the protein nodes in the original protein network G at different times, and the subcellular structures to which the protein nodes belong;
the active protein set construction module is used to build the active protein sets for the different times;
the subcellular-structure protein set construction module is used to build the protein sets of the different subcellular structures;
the space-time subnetwork construction module is used to build the space-time subnetworks of each type of subcellular structure at different times;
the computing module is used to calculate the maximal degree centrality of each protein node;
the prediction module is used to sort all protein nodes in descending order of maximal degree centrality and then select the top M protein nodes as the predicted key proteins.
CN201810287578.8A 2018-03-31 2018-03-31 Key protein identification method and system based on protein space-time subnetwork Active CN108509768B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810287578.8A CN108509768B (en) 2018-03-31 2018-03-31 Key protein identification method and system based on protein space-time subnetwork

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810287578.8A CN108509768B (en) 2018-03-31 2018-03-31 Key protein identification method and system based on protein space-time subnetwork

Publications (2)

Publication Number Publication Date
CN108509768A true CN108509768A (en) 2018-09-07
CN108509768B CN108509768B (en) 2022-02-11

Family

ID=63379840

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810287578.8A Active CN108509768B (en) 2018-03-31 2018-03-31 Key protein identification method and system based on protein space-time subnetwork

Country Status (1)

Country Link
CN (1) CN108509768B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102945333A (en) * 2012-12-04 2013-02-27 中南大学 Key protein predicating method based on priori knowledge and network topology characteristics
CN104156634A (en) * 2014-08-14 2014-11-19 中南大学 Key protein identification method based on subcellular localization specificity
CN105930684A (en) * 2016-04-26 2016-09-07 中南大学 Genetic expression and subcellular localization information-based protein network refining method
CN106874961A (en) * 2017-03-03 2017-06-20 北京奥开信息科技有限公司 A kind of indoor scene recognition methods using the very fast learning machine based on local receptor field

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
MIN LI ET AL.: "Identifying essential proteins based on sub-network partition and prioritization by integrating subcellular localization information", 《ELSEVIER》 *
QIANGHUA XIAO ET AL.: "Identifying essential proteins from active PPI networks constructed with dynamic gene expression", 《BMC GENOMICS》 *
XIANGMAO MENG ET AL.: "Construction of the spatial and temporal active protein interaction network for identifying protein complexes", 《IEEE》 *
XIAOQING PENG ET AL.: "Framework to Identify Protein Complexes Based on Similarity Preclustering", 《TSINGHUA SCIENCE AND TECHNOLOGY》 *
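ZHANG HANHUI: "Key protein identification method fusing protein networks and gene expression data", 《China Master's Theses Full-text Database, Basic Sciences》 *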

Also Published As

Publication number Publication date
CN108509768B (en) 2022-02-11

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant