CN108509768A - Key protein identification method and identification system based on protein spatiotemporal sub-networks - Google Patents
- Publication number
- CN108509768A CN201810287578.8A
- Authority
- CN
- China
- Prior art keywords
- protein
- node
- network
- space
- module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
Landscapes
- Bioinformatics & Cheminformatics (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Biotechnology (AREA)
- Biophysics (AREA)
- Genetics & Genomics (AREA)
- Molecular Biology (AREA)
- Chemical & Material Sciences (AREA)
- Bioinformatics & Computational Biology (AREA)
- Analytical Chemistry (AREA)
- Evolutionary Biology (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Theoretical Computer Science (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Investigating Or Analysing Biological Materials (AREA)
Abstract
The invention discloses a key protein identification method and identification system based on protein spatiotemporal sub-networks. The method comprises the following steps. Step 1: obtain the original protein network. Step 2: build the active protein sets at different time points. Step 3: build the protein sets of the different subcellular structures. Step 4: build the spatiotemporal sub-network of each subcellular structure at each time point from the connections between protein nodes, the active protein sets at different time points, and the protein sets of the different subcellular structures. Step 5: obtain the maximum degree centrality of each protein node. Step 6: sort all protein nodes in descending order of maximum degree centrality and select the top M protein nodes as the predicted key proteins. The method improves the accuracy of key protein identification.
Description
Technical field
The invention belongs to the technical field of systems biology, and in particular relates to a key protein identification method and identification system based on protein spatiotemporal sub-networks.
Background art
Interactions between protein molecules are an important foundation of cellular activity and of protein function. Many life processes, such as DNA replication, metabolism, signal transduction and cell-cycle regulation, depend closely on protein interactions. Studying protein interactions gives a better understanding of how organisms carry out their vital activities and of how diseases act, which helps in discovering drug targets and in preventing and treating disease. With the continuing development and wide use of high-throughput experimental techniques, abundant protein interaction data have been published and used to study protein networks. Predicting key proteins at the molecular level from network topology is of practical significance: it can uncover key information hidden in the data and thereby advance fields such as drug design and compound mining. By analyzing the yeast protein network, Jeong et al. found that, as in other biological networks, the degree distribution of protein nodes follows a power law. This shows that the connectivity of protein nodes is highly uneven, as in a scale-free network: most nodes are connected to few other nodes, while a small number of nodes have high connectivity; the latter are called Hub nodes. Because Hub proteins occupy a central position in the network, removing them affects the topology of the whole network more than deleting other nodes at random. In addition, statistical analysis shows that Hub proteins tend to be key proteins, which is known as the "centrality-lethality" rule.
Why do Hub proteins tend to be key? From the perspective of network topology, Jeong et al. argued that a protein's topological centrality in the biological interaction network is closely tied to the importance of its function. He et al. argued that Hub nodes are significantly more likely to be key mainly because they interact with more proteins and therefore have a higher chance of taking part in essential protein interactions. Zotenko et al. proposed the concept of the essential complex biological module, a module enriched in key proteins that consists of a group of densely connected proteins sharing a common biological function; many Hub nodes are key precisely because they take part in such modules.
In a protein network, nodes and edges represent protein molecules and protein-protein interactions respectively. The interaction usually refers to a physical interaction: the proteins bind to each other physically and act together, and the physical binding of multiple proteins forms a protein complex. Cells also contain genetic interactions, in which a mutation of one gene causes a change in another gene; these are reflected mainly in functional associations between proteins. Protein interactions can be identified by biological experiments or predicted by computational methods. Because biological experiments are restricted by experimental conditions, the species used and similar factors, the protein interaction data they yield are incomplete. For example, low-affinity protein interactions are hard to detect experimentally. The interactions obtained by computational methods, in turn, have a higher false-positive rate. The protein interaction data currently available therefore inevitably contain considerable noise. Because of this noise, some Hub nodes are not actually high-degree nodes, which biases the apparent close link between Hub proteins and essentiality. Reducing the influence of noise in the data set in an effective way is therefore crucial for identifying the key proteins in a protein network.
Recently, a map of human protein subcellular structures was formally published in the journal Science. It shows comprehensively how proteins are distributed across cellular structures, gives researchers a basis for studying protein function and interaction modes at the subcellular level, and is important for understanding the deeper laws of human life activity and for disease research. A subcellular structure provides a stable site for a protein to carry out its function, so a protein functions normally only in a suitable subcellular structure, and only proteins located in the same subcellular structure can form stable physical interactions and so take part in the vital activities of the organism. Protein interaction data alone cannot reflect this spatial dynamic property of proteins, whereas appropriate use of subcellular localization information allows key proteins to be identified more effectively. Acencio and Lemke found that subcellular localization is an important factor influencing protein essentiality. Building on this finding, Peng et al. divided the original protein network into multiple subcellular sub-networks according to the location at which each protein interaction occurs and re-verified the centrality-lethality rule.
On the other hand, the state of a gene or protein in the cell is not fixed. Over time, proteins are degraded or synthesized, and the interactions between them disappear or form accordingly, so the system is always in dynamic equilibrium and the intracellular interaction network keeps changing to ensure that vital activities proceed normally. Existing studies show that the onset and progression of disease are closely related to this dynamic change. A static protein network cannot reflect this temporal dynamic property, so the identification of protein functional modules and disease-related studies based on static networks are severely limited. Through statistical analysis of large-scale gene expression and protein interaction data, Grigoriev found that pairs of proteins encoded by co-expressed genes are more likely to interact than randomly selected pairs. Bhardwaj et al. later confirmed this finding experimentally. Together with the wide use of gene microarray and next-generation sequencing technologies, this offers a new approach to proteomics research based on gene expression patterns.
In summary, in existing network-based studies of node essentiality, the protein interaction data inevitably contain considerable noise, so some Hub nodes are not actually high-degree nodes and the close link between Hub proteins and essentiality is biased. It is therefore necessary to provide a key protein identification method based on protein spatiotemporal sub-networks that improves the accuracy of key protein identification.
Summary of the invention
The object of the invention is to provide a key protein identification method based on protein spatiotemporal sub-networks that reduces the influence of highly noisy data on key protein identification and improves identification accuracy.
The invention provides a key protein identification method based on protein spatiotemporal sub-networks, comprising the following steps:
Step 1: Obtain the original protein network G;
a(v, u) = 1, if a connection exists between v and u
a(v, u) = 0, if no connection exists between v and u
where a(v, u) denotes the connection between protein nodes v and u in the original protein network G;
Step 2: Obtain the gene expression values of the protein nodes of the original protein network G at different time points, and build the active protein set of each time point;
If the gene expression value e(v, t) of protein node v at time t is greater than or equal to the activity threshold THR(v) of protein node v, protein node v is added to the active protein set TP(t) of time t;
TP(t) = { v | e(v, t) ≥ THR(v), v ∈ V }
where V is the set of protein nodes in the original protein network G;
Step 3: Obtain the subcellular structure to which each protein node of the original protein network G belongs, and build the protein set of each subcellular structure;
where CP(s) denotes the protein set of subcellular structure s;
Step 4: Build the spatiotemporal sub-network of each subcellular structure at each time point;
If two protein nodes v and u that are connected in the original protein network G both belong to the active protein set TP(t) of time t and are both located in the same subcellular structure s, protein nodes v and u are assigned to the spatiotemporal sub-network G(t, s) of subcellular structure s at time t;
G(t, s) = (V(t, s), E(t, s))
V(t, s) = { v | v ∈ V ∧ v ∈ TP(t) ∧ v ∈ CP(s) }
E(t, s) = { (v, u) | a(v, u) = 1, v ∈ V(t, s), u ∈ V(t, s) }
where V(t, s) denotes the set of protein nodes that belong both to the active protein set TP(t) of time t and to the protein set CP(s) of subcellular structure s, and E(t, s) denotes the set of connections between protein nodes that are located in subcellular structure s and are simultaneously active at time t;
Step 5: Calculate the maximum degree centrality of each protein node from the spatiotemporal sub-networks built in step 4;
The maximum degree centrality of a protein node is calculated as follows:
MDC(v) = max { DC(v) in G(t, s) : v ∈ V(t, s) }
where DC(v) denotes the number of neighbors of protein node v in one spatiotemporal sub-network, N denotes the number of protein nodes in the spatiotemporal sub-network containing protein node v, and MDC(v) denotes the maximum degree centrality of protein node v;
Step 6: Sort all protein nodes in descending order of the maximum degree centrality obtained in step 5, and select the top M protein nodes as the predicted key proteins;
where M is an integer.
The invention combines gene expression data with subcellular structure information, divides the original network into sub-networks according to the temporal and spatial distribution of proteins, and then uses the sub-networks to assess every node in the network comprehensively and study the essentiality of each protein node, which yields the predicted key proteins. In this way the scheme can greatly improve the accuracy of key protein identification. Selecting the top M protein nodes as key proteins amounts to taking the top-ranked Hub nodes as the predicted key proteins.
The spatiotemporal sub-networks are built from the connections between protein nodes in step 1, the active protein sets at different time points in step 2, and the protein sets of the different subcellular structures in step 3.
Further preferably, the activity threshold THR(v) of protein node v is calculated as follows:
THR(v) = μ(v) + 2.5 · σ(v) · (1 − 1/(1 + σ²(v)))
where μ(v) denotes the arithmetic mean of the gene expression values of protein node v, calculated from its gene expression values at the different time points, and σ(v) denotes the standard deviation of the gene expression values of protein node v, calculated from the same values.
Further preferably, the gene expression values of the protein nodes at different time points in step 2 are obtained as follows:
First, obtain the gene expression value of the protein node at each time point of several metabolic cycles;
Then, calculate the average of the gene expression values of the protein node at the same time point across the metabolic cycles; this average is the gene expression value of the protein node at that time point;
Each metabolic cycle is divided into 2 or more time points.
Further preferably, the number of metabolic cycles is 3, each metabolic cycle is divided into 12 time points, and the interval between adjacent time points is 25 min.
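The averaging across metabolic cycles described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the raw measurements of one gene form a flat list ordered chronologically, cycle after cycle, and the function name average_over_cycles and the sample values are hypothetical.

```python
from statistics import mean

def average_over_cycles(series, n_cycles=3, points_per_cycle=12):
    """Average a flat expression series across cycles, time point by time point.

    Assumes series[c * points_per_cycle + t] is the value at time point t
    of cycle c (chronological order, one cycle after another).
    """
    if len(series) != n_cycles * points_per_cycle:
        raise ValueError("series length must equal n_cycles * points_per_cycle")
    return [
        mean(series[c * points_per_cycle + t] for c in range(n_cycles))
        for t in range(points_per_cycle)
    ]

# Hypothetical series: time point t of cycle c has the value 10*c + t.
raw = [10 * c + t for c in range(3) for t in range(12)]
profile = average_over_cycles(raw)  # 12 averaged values, one per time point
```

With 3 cycles of 12 time points, the 36 raw values of a gene reduce to the 12 values e(v, t) used in step 2.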
Further preferably, the categories of subcellular structure include cytoskeleton, cytoplasmic matrix, endoplasmic reticulum, endosome, extracellular matrix, Golgi apparatus, lysosome, mitochondria, nucleus, peroxisome and cytoplasm.
In another aspect, the invention also provides an identification system using the above identification method. The system comprises a data acquisition module, an active protein set building module, a subcellular-structure protein set building module, a spatiotemporal sub-network building module, a computing module and a prediction module;
The active protein set building module, the subcellular-structure protein set building module, the spatiotemporal sub-network building module and the computing module are connected to the data acquisition module, and the prediction module is connected to the computing module;
The data acquisition module is used to obtain the original protein network G, the gene expression values of the protein nodes of the original protein network G at different time points, and the subcellular structure to which each protein node belongs;
The active protein set building module is used to build the active protein sets at different time points;
The subcellular-structure protein set building module is used to build the protein sets of the different subcellular structures;
The spatiotemporal sub-network building module is used to build the spatiotemporal sub-network of each subcellular structure at each time point;
The computing module is used to calculate the maximum degree centrality of each protein node;
The prediction module is used to sort all protein nodes in descending order of maximum degree centrality and select the top M protein nodes as the predicted key proteins.
Beneficial effects
Compared with the prior art, the invention has the following advantages:
The invention combines the active protein sets at different time points, the protein sets of the different subcellular structures and the connections between protein nodes to build the spatiotemporal sub-network of each subcellular structure at each time point, performs topological analysis on the nodes of these sub-networks, and selects the top-ranked nodes by maximum degree centrality as the predicted key proteins. On the one hand, this reduces the influence of data noise. Prior-art Hub node analysis refines the original network under a single condition, using either subcellular localization information or gene expression data, and the original network it uses reflects neither the temporal nor the spatial characteristics of proteins. The method of the invention takes full account of how proteins change over time and how they are distributed among subcellular compartments, which greatly reduces data noise and makes the final prediction more reliable. On the other hand, it better reflects the spatiotemporal dynamics of proteins.
Description of the drawings
Fig. 1 is a flow diagram of the key protein identification method based on protein spatiotemporal sub-networks provided by the invention;
Fig. 2 shows the differences in key protein identification under different conditions, based on YBioGRID, provided by the invention;
Fig. 3 shows the influence of data noise on the analysis of protein node essentiality in protein-protein interaction networks provided by the invention;
Fig. 4 is a schematic diagram of the Hub node difference analysis provided by the invention.
Detailed description of the embodiments
The invention is described further below with reference to an embodiment.
Biological data used in the invention: because existing databases contain abundant biological data on yeast, yeast is a widely studied species, and the invention is illustrated with yeast biological data. The public yeast data used in the invention are protein interaction data, gene expression data, protein subcellular localization data and known key protein data.
First, obtain the original protein network and the protein interaction data. The original protein network is obtained from an existing database, such as the BioGRID database. After self-interactions and duplicate interactions are removed, the protein interaction data give the original protein network G = (V, E), where V is the set of protein nodes, i.e. all proteins in the interaction data, and E is the set of edges, i.e. all protein interactions. For example, the original protein network G contains 4746 protein nodes and 15166 edges. To verify the influence of noisy data on key protein identification, four protein interaction data sets of different confidence levels were collected, named Y2K, Y11K, Y45K and Y78K from highest to lowest confidence. Y2K contains 2455 interactions; Y11K contains 11000 interactions, including all interactions of Y2K; Y45K contains 45000 interactions, including all interactions of Y11K; Y78K contains 78390 interactions, including all interactions of Y45K.
Then, obtain the gene expression data. This embodiment uses data from the GEO database of NCBI (accession GSE3431), which contain the gene expression data of three yeast metabolic cycles; each metabolic cycle has 12 time points, with 25 min between adjacent time points.
Next, obtain the subcellular localization data. This embodiment uses data from the COMPARTMENTS database, in which the types of subcellular structure include cytoskeleton, cytoplasmic matrix, endoplasmic reticulum, endosome, extracellular matrix, Golgi apparatus, lysosome, mitochondria, nucleus, peroxisome and cytoplasm, 11 types in total; 4455 proteins have annotation information.
It should be understood that the data selected in this embodiment serve to explain the invention, and the invention is not limited to the data used in this embodiment.
As shown in Fig. 1, the protein network Hub node identification method based on spatiotemporal dynamics provided by this embodiment of the invention comprises the following steps:
Step 1: Obtain the original protein network G.
The original protein network G contains the connections between protein nodes.
a(v, u) = 1, if a connection exists between v and u
a(v, u) = 0, if no connection exists between v and u
where a(v, u) denotes the connection between protein nodes v and u in the original protein network G. The original protein network G = (V, E) can be represented as an undirected graph, and the matrix A = [a(v, u)] is the adjacency matrix of the original protein network G.
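As an illustration of step 1, the following sketch builds an undirected network from a list of interaction pairs, dropping self-interactions and duplicates as described for the data preparation. The function names build_network and a, the adjacency-set representation and the protein identifiers are assumptions made for this example; the source does not prescribe a data structure.

```python
def build_network(pairs):
    """Return {protein: set(neighbors)} for an undirected network.

    Self-interactions are skipped; duplicate pairs, in either
    orientation, collapse because neighbors are stored in sets.
    """
    adj = {}
    for v, u in pairs:
        if v == u:
            continue  # self-interaction, removed as in the data preparation
        adj.setdefault(v, set()).add(u)
        adj.setdefault(u, set()).add(v)
    return adj

def a(adj, v, u):
    """Connection indicator a(v, u): 1 if v and u are connected, else 0."""
    return 1 if u in adj.get(v, set()) else 0

# Hypothetical interaction list with one duplicate and one self-interaction.
pairs = [("P1", "P2"), ("P2", "P1"), ("P2", "P3"), ("P3", "P3")]
G = build_network(pairs)
n_edges = sum(len(nbrs) for nbrs in G.values()) // 2
```

Adjacency sets are used here instead of a dense matrix A because a network with several thousand nodes is sparse; a(v, u) still returns the matrix entry on demand.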
Step 2: Obtain the gene expression values of the protein nodes of the original protein network G at different time points, and build the active protein set of each time point.
If the gene expression value e(v, t) of protein node v at time t is greater than or equal to the activity threshold THR(v) of protein node v, protein node v is added to the active protein set TP(t) of time t. In other words, if e(v, t) ≥ THR(v) holds for protein node v at time t, protein node v is regarded as active at time t. After all proteins have been screened for that time point, the active protein set TP(t) at time t is obtained:
TP(t) = { v | e(v, t) ≥ THR(v), v ∈ V }
where V is the set of protein nodes in the original protein network G. As the expression shows, each protein node has its own activity threshold. In this embodiment t ranges from 1 to 12, so 12 active protein sets are obtained; in other feasible embodiments t may take other values.
The activity threshold THR(v) of protein node v is calculated as follows:
THR(v) = μ(v) + 2.5 · σ(v) · (1 − 1/(1 + σ²(v)))
where μ(v) denotes the arithmetic mean of the gene expression values of protein node v, calculated from its gene expression values at the different time points, and σ(v) denotes the standard deviation of the gene expression values of protein node v, calculated from the same values. For example, in this embodiment μ(v) and σ(v) are the arithmetic mean and the standard deviation of the gene expression values of protein node v at the 12 time points.
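A minimal sketch of the threshold and of the active protein sets follows. It assumes the population standard deviation (statistics.pstdev); the source does not say whether the population or the sample form is meant. The names activity_threshold and active_sets and the expression values are hypothetical, and time points are 0-based in the code while the text counts them from 1.

```python
from statistics import mean, pstdev

def activity_threshold(values, k=2.5):
    """THR(v) = mu + k * sigma * (1 - 1 / (1 + sigma^2)), with k = 2.5."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    return mu + k * sigma * (1 - 1 / (1 + sigma ** 2))

def active_sets(expr):
    """expr: {protein: [e(v, t) for each t]} -> list of sets TP(t)."""
    n_times = len(next(iter(expr.values())))
    thr = {v: activity_threshold(vals) for v, vals in expr.items()}
    return [
        {v for v, vals in expr.items() if vals[t] >= thr[v]}
        for t in range(n_times)
    ]

# Hypothetical expression profiles over 4 time points.
expr = {
    "P1": [1.0, 1.0, 1.0, 2.0],  # one clear peak at the last time point
    "P2": [5.0, 5.0, 5.0, 5.0],  # constant: sigma = 0, so THR equals the mean
}
TP = active_sets(expr)
```

In this example P1 is active only at its peak, while the constant profile P2 is active at every time point because its threshold collapses to its mean and the comparison is "greater than or equal to".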
Step 3: Obtain the subcellular structure to which each protein node of the original protein network G belongs, and build the protein set of each subcellular structure;
where CP(s) denotes the protein set of subcellular structure s. In this embodiment, the categories of subcellular structure include cytoskeleton, cytoplasmic matrix, endoplasmic reticulum, endosome, extracellular matrix, Golgi apparatus, lysosome, mitochondria, nucleus, peroxisome and cytoplasm, 11 in total.
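Building the sets CP(s) amounts to grouping annotations by compartment, as in the sketch below. The input format, a list of (protein, compartment) pairs, is an assumption for illustration and not the file format of any particular database; the function name compartment_sets and the annotations are hypothetical.

```python
from collections import defaultdict

def compartment_sets(annotations):
    """Group (protein, compartment) pairs into {compartment: set(proteins)}."""
    cp = defaultdict(set)
    for protein, compartment in annotations:
        cp[compartment].add(protein)
    return dict(cp)

# Hypothetical annotations; a protein may be annotated to several compartments.
annotations = [
    ("P1", "nucleus"),
    ("P2", "nucleus"),
    ("P3", "nucleus"),
    ("P3", "mitochondria"),
]
CP = compartment_sets(annotations)
```

Because a protein can carry several annotations, the sets CP(s) may overlap, which is why one protein can later appear in several sub-networks.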
Step 4: Build the spatiotemporal sub-network of each subcellular structure at each time point from the connections between protein nodes in step 1, the active protein sets at different time points in step 2, and the protein sets of the different subcellular structures in step 3;
If two protein nodes v and u that are connected in the original protein network G both belong to the active protein set TP(t) of time t and are both located in the same subcellular structure s, protein nodes v and u are assigned to the spatiotemporal sub-network G(t, s) of subcellular structure s at time t;
G(t, s) = (V(t, s), E(t, s))
V(t, s) = { v | v ∈ V ∧ v ∈ TP(t) ∧ v ∈ CP(s) }
E(t, s) = { (v, u) | a(v, u) = 1, v ∈ V(t, s), u ∈ V(t, s) }
where V(t, s) denotes the set of protein nodes that belong both to the active protein set TP(t) of time t and to the protein set CP(s) of subcellular structure s, and E(t, s) denotes the set of connections between protein nodes that are located in subcellular structure s and are simultaneously active at time t.
That is, since this embodiment has time points 1-12 and 11 categories of subcellular structure, there are 11 × 12 spatiotemporal sub-networks G(t, s).
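The construction of G(t, s) can be sketched as a double loop over time points and compartments. The representation is an assumption for this example: edges is a list of pairs, TP a list of sets indexed from 0, and CP a dictionary of sets. The function name spatiotemporal_subnetworks and the sample data are hypothetical.

```python
def spatiotemporal_subnetworks(edges, TP, CP):
    """Return {(t, s): (V_ts, E_ts)} for every time point t and compartment s.

    V_ts = TP[t] & CP[s]; E_ts keeps each edge whose two ends are in V_ts.
    Edges are stored as frozensets so that (v, u) and (u, v) coincide.
    """
    nets = {}
    for t, active in enumerate(TP):
        for s, located in CP.items():
            nodes = active & located
            links = {
                frozenset((v, u))
                for v, u in edges
                if v != u and v in nodes and u in nodes
            }
            nets[(t, s)] = (nodes, links)
    return nets

# Hypothetical inputs: 2 time points, 2 compartments.
edges = [("P1", "P2"), ("P2", "P3")]
TP = [{"P1", "P2"}, {"P1", "P2", "P3"}]
CP = {"nucleus": {"P1", "P2", "P3"}, "mitochondria": {"P3"}}
nets = spatiotemporal_subnetworks(edges, TP, CP)
```

With 12 time points and 11 compartments the same loop yields the 11 × 12 = 132 sub-networks mentioned above; some of them may be empty.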
Step 5: Calculate the maximum degree centrality of each protein node;
The maximum degree centrality of each protein node is calculated as follows:
MDC(v) = max { DC(v) in G(t, s) : v ∈ V(t, s) }
where DC(v) denotes the number of neighbors of protein node v in one spatiotemporal sub-network, N denotes the number of protein nodes in the spatiotemporal sub-network containing protein node v, and MDC(v) denotes the maximum degree centrality of protein node v.
Hub nodes are usually defined as the nodes with a high number of connections, so the invention uses degree centrality as the feature for identification; the degree centrality of a node in a protein network is its number of neighbors. It should be understood that a protein node v may belong to several spatiotemporal sub-networks G(t, s), so one protein node may have several values DC(v); the largest of them is taken as the maximum degree centrality of protein node v. If a protein node v belongs to only one spatiotemporal sub-network G(t, s), its maximum degree centrality is its degree centrality DC(v) in that spatiotemporal sub-network G(t, s).
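The following sketch computes MDC(v) by counting neighbors in each sub-network and keeping the largest count. It uses the plain neighbor count stated in the text and applies no normalization by N, which is an assumption because the exact DC formula is not reproduced in the text. The function name, the sub-network representation and the data are hypothetical.

```python
def max_degree_centrality(nets):
    """nets: {(t, s): (nodes, links)} -> {protein: MDC}.

    links holds frozensets of two distinct proteins. DC is the plain
    neighbor count inside one sub-network; MDC is its maximum over all
    sub-networks that contain the protein.
    """
    mdc = {}
    for nodes, links in nets.values():
        degree = dict.fromkeys(nodes, 0)
        for link in links:
            for v in link:
                degree[v] += 1
        for v, d in degree.items():
            mdc[v] = max(mdc.get(v, 0), d)
    return mdc

# Hypothetical sub-networks keyed by (time point, compartment).
nets = {
    (0, "nucleus"): ({"P1", "P2"}, {frozenset(("P1", "P2"))}),
    (1, "nucleus"): (
        {"P1", "P2", "P3"},
        {frozenset(("P1", "P2")), frozenset(("P2", "P3"))},
    ),
    (1, "mitochondria"): ({"P3"}, set()),
}
mdc = max_degree_centrality(nets)
```

Proteins that appear in no sub-network are absent from the result; treating them as having a score of 0 when ranking would be a further assumption.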
Step 6: Sort all protein nodes in descending order of the maximum degree centrality obtained in step 5, and select the top M protein nodes as the predicted key proteins.
Specifically, each protein node has a maximum degree centrality, and the nodes are sorted in descending order of this value, so that a protein node with a larger maximum degree centrality is ranked higher.
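The ranking step can be sketched in a few lines. The source does not say how ties are broken, so breaking ties by protein name below is an assumption made only to keep the output deterministic; the function name top_m and the scores are hypothetical.

```python
def top_m(mdc, m):
    """Return the m proteins with the largest MDC, in descending order.

    Ties are broken alphabetically by protein name (an assumption).
    """
    ranked = sorted(mdc, key=lambda v: (-mdc[v], v))
    return ranked[:m]

# Hypothetical MDC scores.
scores = {"P1": 1, "P2": 2, "P3": 1, "P4": 3}
predicted = top_m(scores, 2)
```

Here predicted holds P4 and P2, the two proteins with the highest scores.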
In another aspect, the invention also provides an identification system using the above identification method. The system comprises a data acquisition module, an active protein set building module, a subcellular-structure protein set building module, a spatiotemporal sub-network building module, a computing module and a prediction module;
The active protein set building module, the subcellular-structure protein set building module, the spatiotemporal sub-network building module and the computing module are connected to the data acquisition module, and the prediction module is connected to the computing module;
The data acquisition module is used to obtain the original protein network G, the gene expression values of the protein nodes of the original protein network G at different time points, and the subcellular structure to which each protein node belongs. For example, the data acquisition module obtains the biological data used in the invention from existing databases.
The active protein set building module is used to build the active protein sets at different time points. For the building process, refer to the description of the method above.
The subcellular-structure protein set building module is used to build the protein sets of the different subcellular structures. For the building process, refer to the description of the method above.
The spatiotemporal sub-network building module is used to build the spatiotemporal sub-network of each subcellular structure at each time point. For the building process, refer to the description of the method above.
The computing module is used to calculate the maximum degree centrality of each protein node. The computing module is also used to calculate the gene expression values of the protein nodes at different time points. For the calculation process, refer to the description of the method above.
The prediction module is used to sort all protein nodes in descending order of maximum degree centrality and select the top M protein nodes as the predicted key proteins.
Simulation and verification
A: Analysis of Hub node essentiality under spatiotemporal conditions:
To compare how effectively the spatiotemporal dynamic network identifies key proteins, we ran experiments on the yeast protein network from the public database BioGRID. In this experiment we computed the score of each protein node in four cases, namely the original network, the temporal dynamic network, the spatial dynamic network and the spatiotemporal dynamic network, sorted the results in descending order, and selected a given number of top-ranked proteins as the candidate set. Comparing the candidate set with the known key protein data set gives the number of key proteins identified in each case. As shown in panels (a) and (b) of Fig. 2, we selected the top 100, 200, ..., 800 proteins as prediction sets for comparison. After a temporal or spatial dynamic network is built using gene expression data or subcellular localization information, the identification results improve markedly compared with the original static network, and the method that fuses both kinds of data to build the spatiotemporal dynamic network gives the best results throughout.
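The comparison against the known key protein set can be sketched as counting hits among the top-ranked proteins at several cutoffs. The function name hits_at, the ranking and the known set below are hypothetical and are not the experimental data of the patent.

```python
def hits_at(ranked, known_key, cutoffs):
    """Count known key proteins among the top n of a ranking, for each n."""
    known = set(known_key)
    return {n: sum(1 for v in ranked[:n] if v in known) for n in cutoffs}

# Hypothetical ranking and known key protein set.
ranked = ["P4", "P2", "P1", "P3"]
known_key = {"P4", "P1"}
hits = hits_at(ranked, known_key, [1, 2, 4])
ratios = {n: hits[n] / n for n in hits}  # proportion of key proteins in the top n
```

The counts correspond to the kind of comparison described for Fig. 2 (top 100 to top 800), and the ratios to the proportions reported for the networks of different confidence levels.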
The key analysis in different confidence level networks of B.Hub nodes:
To analyze how networks of different confidence levels affect the identification of key proteins, we collected the protein interaction data from the paper by Mering et al. and divided them into protein networks of different confidence levels, which are, from high to low, Y2K, Y11K, Y45K and Y78K. Based on these data, we computed the proportion of known key proteins in the predicted protein sets formed by the top 100, 200, 300 and 400 proteins ranked by the maximum neighbor count score. As shown in Fig. 3, the polylines marked with triangles represent the experimental results on the original networks, and the polylines marked with filled circles represent the experimental results after the space-time dynamic network has been built from the original network. When the top 100 proteins are selected, the proportions of key proteins identified by looking for Hub nodes directly in the original networks are 0.78, 0.58, 0.49 and 0.51 respectively, whereas the proportions after the space-time dynamic network has been built are 0.78, 0.87, 0.88 and 0.84 respectively. It follows that in a high-confidence network the result obtained with the space-time dynamic network is equal to or slightly higher than that of the static network, and that as the confidence of the network decreases the two begin to differ markedly. This shows that gene expression data and subcellular localization information can reflect the functional characteristics of proteins, so that noisy data are filtered out effectively and the accuracy of key protein identification improves. In addition, the figure shows that the proportions given by the four filled-circle polylines are relatively close to one another across the different Top sets, whereas the four triangle polylines trend downward as the noise in the data increases. This result shows that noisy data affect the analysis of the essentiality of Hub nodes, and that analyzing the nodes from different angles by fusing multiple kinds of biological data is a good way to address the problem.
C. Difference analysis of Hub nodes:
Existing studies have shown that Hub nodes tend to be key proteins. However, the noise in protein networks makes this tendency less apparent. The preceding experiments show that building the space-time dynamic network improves the accuracy of key protein identification, but the reason for this still needs to be explained. To this end, we selected the Hub nodes under different conditions for analysis and divided them into two classes: nodes that are Hub nodes in the original network and also Hub nodes in the space-time network, denoted Hub_Hub; and nodes that are Hub nodes in the original network but not Hub nodes in the space-time sub-networks, denoted Hub_NonHub. We then ran experiments on the YBioGRID network and counted the proportion of known key proteins in each of the two classes of Hub nodes. The experimental results are shown in Fig. 4. As can be seen from Fig. 4, regardless of how the threshold varies, the Hub_Hub nodes contain more key proteins than the Hub_NonHub nodes. When the Hub threshold is set to 400, the proportion of key proteins in the YBioGRID network is 0.52 (= 208/400); the proportion of key proteins among the Hub_Hub nodes is 0.7771, and the proportion among the Hub_NonHub nodes is 0.3376. Therefore, building the space-time dynamic network removes noisy or invalid neighbor nodes that interact with Hub proteins in the original data, which improves the precision of key protein identification.
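The Hub_Hub / Hub_NonHub split can be illustrated with a short sketch. It assumes that "Hub threshold k" means the k top-ranked nodes are treated as Hub nodes, which is consistent with the 208/400 figure above but is not stated explicitly; the names `split_hubs` and `key_ratio` and all toy values are hypothetical.

```python
def top_k(score, k):
    """Return the set of the k highest-scoring nodes (ties broken by name)."""
    return set(sorted(score, key=lambda p: (-score[p], p))[:k])


def split_hubs(orig_degree, st_score, k):
    """Split the original-network hubs into (Hub_Hub, Hub_NonHub).

    orig_degree -- node degree in the original static network
    st_score    -- node score in the space-time sub-networks
    k           -- number of top-ranked nodes treated as hubs (assumption)
    """
    orig_hubs = top_k(orig_degree, k)
    st_hubs = top_k(st_score, k)
    return orig_hubs & st_hubs, orig_hubs - st_hubs


def key_ratio(nodes, known_key):
    """Proportion of known key proteins in a node set (0.0 for an empty set)."""
    return len(nodes & known_key) / len(nodes) if nodes else 0.0


# Toy data, invented for illustration.
orig_degree = {"A": 10, "B": 9, "C": 8, "D": 2, "E": 1}
st_score = {"A": 6, "B": 1, "C": 5, "D": 4, "E": 0}
known_key = {"A", "C", "D"}

hub_hub, hub_nonhub = split_hubs(orig_degree, st_score, k=3)
print(sorted(hub_hub), key_ratio(hub_hub, known_key))
print(sorted(hub_nonhub), key_ratio(hub_nonhub, known_key))
```

In this toy case node B is a hub only in the static network, so it falls into Hub_NonHub; the two ratios correspond to the two proportions reported for each class.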
The above demonstrations show that, compared with the prior art, the present invention can greatly improve the accuracy of key protein identification.
It should be emphasized that the examples described herein are illustrative rather than restrictive. The present invention is therefore not limited to the examples given in the detailed description; any other embodiment that those skilled in the art derive from the technical solution of the present invention without departing from its concept and scope, whether by modification or by substitution, likewise falls within the protection scope of the present invention.
Claims (6)
1. A key protein identification method based on protein space-time sub-networks, characterized in that it comprises the following steps:
Step 1: Obtain the original protein network G;
$$a(v,u)=\begin{cases}1, & \text{if a connection exists between } v \text{ and } u\\ 0, & \text{if no connection exists between } v \text{ and } u\end{cases}$$
where a(v, u) denotes the connection relation between protein nodes v and u in the original protein network G;
Step 2: Obtain the gene expression values of the protein nodes in the original protein network G at different moments, and construct the active protein sets for the different moments;
where, if the gene expression value e(v, t) of protein node v at moment t is greater than or equal to the activity threshold THR(v) of protein node v, protein node v is added to the active protein set TP(t) of moment t;
$$TP(t)=\{\,v \mid e(v,t)\ge THR(v),\ v\in V\,\}$$
where V is the set of protein nodes in the original protein network G;
Step 3: Obtain the subcellular structure to which each protein node in the original protein network G belongs, and construct the protein sets of the different subcellular structures;
where CP(s) denotes the protein set of subcellular structure s;
Step 4: Construct the space-time sub-networks of each class of subcellular structure at different moments;
where, if protein node v and protein node u, which are connected in the original protein network G, are both in the active protein set TP(t) of moment t and both in the same subcellular structure s, protein node v and protein node u are assigned to the space-time sub-network G(t, s) of subcellular structure s at moment t;
$$G(t,s)=\big(V(t,s),\,E(t,s)\big)$$
$$V(t,s)=\{\,v \mid v\in V \wedge v\in TP(t) \wedge v\in CP(s)\,\}$$
$$E(t,s)=\{\,(v,u) \mid a(v,u)=1,\ v\in V(t,s),\ u\in V(t,s)\,\}$$
where V(t, s) denotes the set of protein nodes that belong both to the active protein set TP(t) of moment t and to the protein set CP(s) of subcellular structure s, and E(t, s) denotes the set of connections between the protein nodes that are located in subcellular structure s and are active at moment t;
Step 5: Calculate the maximal degree centrality of each protein node based on the space-time sub-networks constructed in step 4;
where the maximal degree centrality of a protein node is calculated as follows:
$$MDC(v)=\max\big(DC(v)\big),\quad v\in V(t,s)$$
where DC(v) denotes the number of neighbors of protein node v in a space-time sub-network, N denotes the total number of protein nodes in the space-time sub-network in which protein node v is located, and MDC(v) denotes the maximal degree centrality of protein node v;
Step 6: Sort all protein nodes in descending order of the maximal degree centrality obtained in step 5, and then select the top M protein nodes as the predicted key proteins;
where M is an integer.
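Steps 4 to 6 can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the claimed implementation: the text above mentions N but does not reproduce a formula for DC(v) that uses it, so the sketch takes DC(v) to be the raw neighbor count and takes the maximum over all sub-networks G(t, s). The function names and the toy network are hypothetical, and each undirected edge is assumed to be listed once.

```python
from collections import defaultdict


def build_subnetworks(edges, active, compartments):
    """Step 4: return {(t, s): set of edges of G(t, s)}.

    edges        -- iterable of (v, u) pairs with a(v, u) = 1
    active       -- {t: TP(t)}, the active protein set for each moment
    compartments -- {s: CP(s)}, the protein set of each subcellular structure
    """
    subnets = {}
    for t, tp in active.items():
        for s, cp in compartments.items():
            nodes = tp & cp  # V(t, s)
            subnets[(t, s)] = {
                (v, u) for v, u in edges if v != u and v in nodes and u in nodes
            }
    return subnets


def maximal_degree_centrality(subnets):
    """Step 5 (assumed form): MDC(v) = largest neighbor count of v over all G(t, s)."""
    mdc = defaultdict(int)
    for sub_edges in subnets.values():
        degree = defaultdict(int)
        for v, u in sub_edges:
            degree[v] += 1
            degree[u] += 1
        for v, d in degree.items():
            mdc[v] = max(mdc[v], d)
    return dict(mdc)


def predict_top(mdc, all_nodes, m):
    """Step 6: the top M nodes in descending order of MDC (ties broken by name)."""
    return sorted(all_nodes, key=lambda v: (-mdc.get(v, 0), v))[:m]


# Toy network, invented for illustration.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
active = {0: {"A", "B", "C"}, 1: {"A", "C", "D"}}
compartments = {"nucleus": {"A", "B", "C"}, "mitochondrion": {"C", "D"}}

subnets = build_subnetworks(edges, active, compartments)
mdc = maximal_degree_centrality(subnets)
predicted = predict_top(mdc, {"A", "B", "C", "D"}, m=2)
print(mdc, predicted)
```

Nodes that never appear in any sub-network edge receive a score of 0 in `predict_top`, which is one possible convention for proteins that are never active together with a neighbor in the same subcellular structure.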
2. The method according to claim 1, characterized in that the activity threshold THR(v) of protein node v is calculated as follows:
$$THR(v)=\mu(v)+2.5\,\sigma(v)\left(1-\frac{1}{1+\sigma^{2}(v)}\right)$$
where μ(v) denotes the arithmetic mean of the gene expression values of protein node v, calculated from the gene expression values of protein node v at the different moments, and σ(v) denotes the standard deviation of the gene expression values of protein node v, calculated from the gene expression values of protein node v at the different moments.
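The threshold formula and the construction of the active sets TP(t) can be checked numerically with a small sketch. The claim does not say whether σ(v) is the population or the sample standard deviation; the sketch assumes the population form (`statistics.pstdev`). The function names and the expression values are invented for illustration.

```python
from statistics import mean, pstdev


def activity_threshold(expr):
    """THR(v) = mu + 2.5 * sigma * (1 - 1 / (1 + sigma^2)).

    expr -- list of gene expression values of one protein over the moments.
    Assumption: sigma is the population standard deviation.
    """
    mu = mean(expr)
    sigma = pstdev(expr)
    return mu + 2.5 * sigma * (1 - 1 / (1 + sigma ** 2))


def active_sets(expression):
    """Return {t: TP(t)} where v is in TP(t) iff e(v, t) >= THR(v).

    expression -- {protein: [e(v, 0), e(v, 1), ...]}, all lists of equal length.
    """
    n_times = len(next(iter(expression.values())))
    thr = {v: activity_threshold(e) for v, e in expression.items()}
    return {
        t: {v for v, e in expression.items() if e[t] >= thr[v]}
        for t in range(n_times)
    }


# Toy expression profiles, invented for illustration.
expression = {
    "A": [1.0, 1.0, 3.0, 3.0],  # mu = 2, sigma = 1, so THR = 3.25: never active
    "B": [2.0, 2.0, 2.0, 2.0],  # sigma = 0, so THR = mu: active at every moment
    "C": [0.0, 0.0, 0.0, 0.4],  # small sigma, THR is about 0.11: active at t = 3
}
tp = active_sets(expression)
print(activity_threshold(expression["A"]), tp)
```

The example shows how the factor 1 − 1/(1 + σ²) behaves: for a nearly constant profile it shrinks the margin above the mean towards zero, while for a strongly fluctuating profile the threshold approaches μ + 2.5σ.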
3. The method according to claim 1, characterized in that the gene expression values of the protein nodes at different moments in step 2 are obtained as follows:
first, the gene expression value of the protein node at each moment within the different metabolic cycles is obtained;
then, the average of the gene expression values of the protein node at the same moment in the different metabolic cycles is calculated, and this average is the gene expression value of the protein node at the corresponding moment;
each metabolic cycle is divided into two or more moments.
4. The method according to claim 3, characterized in that the number of metabolic cycles is 3, each metabolic cycle is divided into 12 moments, and the time interval between adjacent moments is 25 min.
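The averaging over metabolic cycles can be sketched as below. It assumes that the raw measurements of one protein are stored as a flat list ordered cycle by cycle (3 cycles of 12 moments, so 36 values); that storage layout, the function name and the toy series are assumptions for illustration.

```python
def average_over_cycles(series, n_cycles=3, points_per_cycle=12):
    """Average the expression values measured at the same moment of each cycle.

    series -- flat list of n_cycles * points_per_cycle values, ordered cycle
              by cycle (assumed layout).
    Returns a list of points_per_cycle averaged values.
    """
    if len(series) != n_cycles * points_per_cycle:
        raise ValueError("series length does not match n_cycles * points_per_cycle")
    return [
        sum(series[c * points_per_cycle + i] for c in range(n_cycles)) / n_cycles
        for i in range(points_per_cycle)
    ]


# Toy series 0, 1, ..., 35: moment i averages the values i, i + 12 and i + 24.
raw = list(range(36))
profile = average_over_cycles(raw)
print(profile)
```

The resulting 12-value profile is what would then be compared against the activity threshold of the protein at each moment.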
5. The method according to claim 1, characterized in that the classes of subcellular structure include cytoskeleton, cytoplasmic matrix, endoplasmic reticulum, endosome, extracellular matrix, Golgi apparatus, lysosome, mitochondrion, nucleus, peroxisome and cytoplasm.
6. An identification system using the method of any one of claims 1 to 5, characterized in that it comprises a data acquisition module, an active protein set construction module, a subcellular structure protein set construction module, a space-time sub-network construction module, a computing module and a prediction module;
where the active protein set construction module, the subcellular structure protein set construction module, the space-time sub-network construction module and the computing module are connected to the data acquisition module, and the prediction module is connected to the computing module;
the data acquisition module is configured to obtain the original protein network G, the gene expression values of the protein nodes in the original protein network G at different moments, and the subcellular structures to which the protein nodes belong;
the active protein set construction module is configured to construct the active protein sets for the different moments;
the subcellular structure protein set construction module is configured to construct the protein sets of the different subcellular structures;
the space-time sub-network construction module is configured to construct the space-time sub-networks of each class of subcellular structure at different moments;
the computing module is configured to calculate the maximal degree centrality of each protein node;
the prediction module is configured to sort all protein nodes in descending order of maximal degree centrality and then select the top M protein nodes as the predicted key proteins.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810287578.8A CN108509768B (en) | 2018-03-31 | 2018-03-31 | Key protein identification method and system based on protein space-time subnetwork |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108509768A true CN108509768A (en) | 2018-09-07 |
CN108509768B CN108509768B (en) | 2022-02-11 |
Family
ID=63379840
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810287578.8A Active CN108509768B (en) | 2018-03-31 | 2018-03-31 | Key protein identification method and system based on protein space-time subnetwork |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108509768B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102945333A (en) * | 2012-12-04 | 2013-02-27 | 中南大学 | Key protein predicating method based on priori knowledge and network topology characteristics |
CN104156634A (en) * | 2014-08-14 | 2014-11-19 | 中南大学 | Key protein identification method based on subcellular localization specificity |
CN105930684A (en) * | 2016-04-26 | 2016-09-07 | 中南大学 | Genetic expression and subcellular localization information-based protein network refining method |
CN106874961A (en) * | 2017-03-03 | 2017-06-20 | 北京奥开信息科技有限公司 | A kind of indoor scene recognition methods using the very fast learning machine based on local receptor field |
Non-Patent Citations (5)
Title |
---|
MIN LI ET AL.: "Identifying essential proteins based on sub-network partition and prioritization by integrating subcellular localization information", 《ELSEVIER》 * |
QIANGHUA XIAO 等: "Identifying essential proteins from active PPI networks constructed with dynamic gene expression", 《BMC GENOMICS》 * |
XIANGMAO MENG ET AL.: "Construction of the spatial and temporal active protein interaction network for identifying protein complexes", 《IEEE》 * |
XIAOQING PENG ET AL.: "Framework to Identify Protein Complexes Based on Similarity Preclustering", 《TSINGHUA SCIENCE AND TECHNOLOGY》 * |
张含会: "融合蛋白质网络和基因表达数据的关键蛋白质识别方法", 《中国优秀硕士学位论文全文数据库 基础科学辑》 * |
Also Published As
Publication number | Publication date |
---|---|
CN108509768B (en) | 2022-02-11 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||