CN107784196A - Method for identifying essential proteins based on an artificial fish swarm optimization algorithm - Google Patents
- Publication number
- CN107784196A (application number CN201710912037.5A)
- Authority
- CN
- China
- Prior art keywords
- protein
- node
- fish
- formula
- artificial fish
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/004—Artificial life, i.e. computing arrangements simulating life
- G06N3/006—Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B5/00—ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Theoretical Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Biophysics (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Evolutionary Biology (AREA)
- Biotechnology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Medical Informatics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- General Physics & Mathematics (AREA)
- Genetics & Genomics (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Chemical & Material Sciences (AREA)
- Analytical Chemistry (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Evolutionary Computation (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Biomedical Technology (AREA)
- Artificial Intelligence (AREA)
- Physiology (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a method for identifying essential proteins based on an artificial fish swarm optimization algorithm. The method converts a protein-protein interaction (PPI) network into an undirected graph; constructs a purified PPI network; obtains, for each protein, its gene expression values, its GO annotation information, and its degree within known protein complexes; processes the edges and nodes of the purified PPI network; selects known essential proteins as the initial artificial fish; and has the artificial fish perform foraging, random, following, and swarming behaviors to produce the essential proteins. The method identifies essential proteins accurately: simulation results show better sensitivity, specificity, positive predictive value, and negative predictive value. Compared with other essential protein identification methods, it combines the optimization characteristics of the artificial fish swarm with the topological features of the PPI network, which improves the identification accuracy of essential proteins.
Description
Technical field
The invention belongs to the field of bioinformatics, and in particular relates to a method for identifying essential proteins based on an artificial fish swarm optimization algorithm.
Background art
Essential proteins are the products of essential genes and are indispensable for an organism to sustain its life activities. The absence of an essential protein prevents life activities from proceeding normally and can even cause the organism to die. Predicting and identifying essential proteins is therefore significant research work: on the one hand, it helps in studying the regulation of cell growth; on the other hand, it has far-reaching significance for disease diagnosis and drug design. Initially, essential proteins were identified mainly by biological experiments such as single-gene knockout and RNA interference. Although these experimental techniques identify essential proteins accurately and effectively, they are costly and inefficient. Identifying essential proteins by computational methods has consequently become a focus of research in bioinformatics.
At present, computational methods for identifying essential proteins fall into two main classes: node centrality methods based on network topology, and methods that combine the PPI network with biological data.
The "centrality-lethality" rule proposed by Jeong et al. in 2001 states that the essentiality of a protein is closely related to its topological properties in the PPI network: the removal of a protein with more neighbor nodes is more likely to affect the topology of the whole network. In short, protein nodes with higher degree tend to be essential, and the loss of such proteins more easily leads to loss of function and lethality. This rule laid the foundation for essential protein identification based on network topology. Subsequently, a series of topology-based centrality methods were proposed, including degree centrality (Degree Centrality, DC), betweenness centrality (Betweenness Centrality, BC), closeness centrality (Closeness Centrality, CC), eigenvector centrality (Eigenvector Centrality, EC), information centrality (Information Centrality, IC), and subgraph centrality (Subgraph Centrality, SC). These methods all score every protein node in the PPI network by some centrality value, rank the nodes, and identify essential proteins from the ranking. However, centrality methods depend heavily on the reliability of the PPI network. Because PPI networks are obtained from high-throughput biological experiments, they contain a large number of false positives, which substantially reduces the accuracy of essential protein identification.
To address the shortcomings of centrality methods, researchers have proposed new methods that improve identification accuracy. For example, the PeC method combines the PPI network with gene expression profiles; the ION method combines protein homology information with the PPI network; the UDoNC method combines protein domains with the PPI network; and the SCP method combines subcellular localization information with the PPI network. In addition, some methods identify essential proteins from prior knowledge, such as CPPK and CEPPK, which take a subset of known essential proteins as prior knowledge and judge the essentiality of the other proteins in the network by how closely they are connected to that prior knowledge.
Numerous studies show that protein essentiality is closely related to protein complexes. Hart et al. found experimentally that essentiality is not determined by a single protein but often depends on the function of the protein complex, and experimental data show that essential proteins tend to be enriched in certain complexes. Consequently, a large number of essential protein identification methods based on protein complexes and functional modules have been proposed.
Although bioinformatics has developed and researchers have studied essential protein identification in depth, the accuracy of current network-topology-based methods remains low, and most methods use only a few parameters or features in isolation, without considering each node from a global perspective. Moreover, protein interaction data obtained by high-throughput techniques contain many false positives and cannot represent the real protein network. Constructing a PPI network that more faithfully reflects the organism can therefore further improve the identification accuracy of essential proteins.
In summary, existing essential protein identification methods have the following main defects: they do not consider the reliability of the PPI network, they consider only some features and lack a global view, and their identification accuracy is low.
Summary of the invention
The object of the invention is to overcome the shortcomings and deficiencies of the prior art by providing a method for identifying essential proteins based on an artificial fish swarm optimization algorithm, which constructs a purified PPI network and achieves high identification accuracy.
To achieve this object, the invention adopts the following technical scheme, which comprises the following steps:
(1) Convert the PPI network into an undirected graph:
The PPI network is converted into an undirected graph $G=(V,E)$, where $V=\{v_i,\ i=1,2,\dots,n\}$ is the set of nodes $v_i$ and $E$ is the set of edges $e$; a node $v_i$ represents a protein and an edge $e$ represents an interaction between proteins;
(2) Construct the purified PPI network:
At time point $t$, if the gene expression value $Ep_{it}$ of node $v_i$ is greater than the gene expression activity threshold Active_Th(i), node $v_i$ is considered active at time point $t$; otherwise it is considered inactive at time point $t$. If any two different nodes $v$, $u$ in $V$ are both active at time point $t$, they are considered co-expressed at time point $t$. All edges in the undirected graph whose protein interactions are not co-expressed at any time point are deleted, which yields a purified PPI network;
(3) Process the edges and nodes of the purified PPI network: calculate the edge clustering coefficient ECC of each edge, the Pearson correlation coefficient PCC of each edge, the GO functional similarity of each edge, and the degree of each node within protein complexes;
(4) Select known essential proteins to form the initial artificial fish:
Let $N$ be the size of the artificial fish swarm and $m$ the number of known essential proteins contained in each artificial fish. For each fish, $m$ proteins are randomly selected from the currently known essential proteins to form an artificial fish carrying prior knowledge. Fish(k) denotes the set of known essential proteins contained in the $k$-th initial artificial fish, $k=1,2,\dots,N$; Cn is the number of candidate essential proteins;
(5) Foraging behavior:
Find all neighbor proteins of the proteins in each artificial fish to form the neighbor protein node set Neighbor(k), such that the proteins in Neighbor(k) and Neighbor(l) are different, $k=1,2,\dots,N$, $l=1,2,\dots,N$, $k\neq l$. For each node $v_i$ in Neighbor(k), the possibility of merging it into the artificial fish Fish(k) is determined by score1(i) = fitness1($v_i$, Fish(k)). The nodes in Neighbor(k) are sorted in descending order of score1, and the protein node with the highest score1 is added to Fish(k) and also to the set Add(k). The foraging behavior is repeated Tn times, so that Tn protein nodes are added to the initial artificial fish;
(6) Following behavior:
After the foraging behavior has been performed, the artificial fish in the optimal state is determined by computing Score2(k) = fitness2(Add(k)) for every artificial fish. All artificial fish are sorted in descending order of Score2; the fish with the highest Score2 is the optimal artificial fish Fish(p), $p\in[1,N]$, and the protein nodes in its set Add(p) are added to the set Candidate;
(7) Swarming behavior:
Excluding the set Add(p) of the optimal artificial fish Fish(p), each node $v_i$ in the sets Add(k) of the remaining artificial fish Fish(k), $k\neq p$, is scored by Score3(i) = fitness3($v_i$). All such $v_i$ are sorted in descending order of Score3. Let δ be the crowding factor; the top δ protein nodes are added to the set Candidate;
(8) Produce the essential proteins:
The protein nodes in the set Candidate obtained in step (7) are output as the essential proteins.
Further, the gene expression activity threshold Active_Th(i) is obtained by formula (1):
$$\mathrm{Active\_Th}(i)=\mu(i)+3\sigma(i)\,\bigl(1-F(i)\bigr) \qquad (1)$$
In formula (1), $\mu(i)$ is the mean gene expression value of node $v_i$, $\sigma(i)$ is the standard deviation of its gene expression values, and $F(i)=1/(1+\sigma^2(i))$ is a weighting function.
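Formula (1) can be evaluated directly from one protein's expression profile. The following is a minimal Python sketch, not the patented implementation; the function names are illustrative, and the use of the population standard deviation is an assumption, since the text does not say which estimator is meant.

```python
from statistics import fmean, pstdev


def active_threshold(expression):
    """Return Active_Th(i) = mu + 3*sigma*(1 - F), with F = 1 / (1 + sigma^2)."""
    mu = fmean(expression)
    sigma = pstdev(expression)  # assumption: population standard deviation
    weight = 1.0 / (1.0 + sigma ** 2)
    return mu + 3.0 * sigma * (1.0 - weight)


def active_time_points(expression):
    """Return the indices t at which the expression value exceeds the threshold."""
    threshold = active_threshold(expression)
    return {t for t, value in enumerate(expression) if value > threshold}
```

For a flat profile the standard deviation is zero, so the threshold equals the mean and no time point counts as active.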
Further, in step (3), the edge clustering coefficient of an edge is calculated by formula (2):
where $N_i$ and $N_j$ denote the neighbor node sets of nodes $v_i$ and $v_j$, respectively;
The Pearson correlation coefficient of an edge is calculated by formula (3):
where $Ep_{it}$ and $Ep_{jt}$ denote the gene expression values of nodes $v_i$ and $v_j$ at time point $t$, $\mu(i)$ and $\mu(j)$ are the mean gene expression values of nodes $v_i$ and $v_j$, and $T$ is the maximum time point;
The GO functional similarity of an edge is calculated by formula (4):
where $GO_i$ and $GO_j$ denote the sets of GO terms annotating node $v_i$ and node $v_j$, respectively;
The degree of node $v_i$ within protein complexes is calculated by formula (5):
where V(|C|) denotes the set of nodes contained in a protein complex, $C_{v_i}$ denotes a protein complex that contains node $v_i$, $D_{in}(v_i, C_{v_i})$ denotes the degree of node $v_i$ within the protein complex $C_{v_i}$, and $v_j$ is a neighbor node of $v_i$.
Further, in step (5), the possibility fitness1 of adding a node $v_i$ from the set Neighbor(k) to the artificial fish Fish(k) is obtained by formula (6):
where $v_j$ is a protein node inside the artificial fish Fish(k), ECC is the clustering coefficient of the edge between node $v_i$ and node $v_j$, PCC is the Pearson correlation coefficient of that edge, and GO_sim is the functional similarity between node $v_i$ and node $v_j$.
Further, in step (5), if no suitable protein node can be added to the artificial fish during the foraging behavior, the random behavior is performed: a randomly selected protein node is added to the neighbor protein node set Neighbor(k).
Further, in step (6), the possibility fitness2 that an artificial fish is in the optimal state is obtained by formula (7):
where Add(k) denotes the set of protein nodes added to the $k$-th artificial fish by the Tn foraging behaviors.
Further, in step (7), the score fitness3 of a node $v_i$ in the set Add(k), $k\neq p$, is obtained by formula (8):
$$W(v_i,v_j)=\mathrm{ECC}(v_i,v_j)\times\bigl(\mathrm{PCC}(v_i,v_j)+\mathrm{GO\_sim}(v_i,v_j)\bigr) \qquad (9)$$
In formula (8), $a$ and $b$ are coefficients satisfying $a+b=1$, Nei($v_i$) denotes the neighbor node set of node $v_i$, and DIC($v_i$) denotes the degree of node $v_i$ within protein complexes.
Further, in step (7), δ = Cn - Tn.
Compared with existing methods, the invention has the following advantages:
1. The invention takes a subset of known essential proteins as prior knowledge. Because essential proteins tend to be connected to one another, the foraging behavior of the artificial fish searches the neighbor nodes of the proteins that make up each fish to predict essential proteins, which takes full account of the topological properties of essential proteins in the network.
2. When the artificial fish perform the swarming behavior to score protein nodes, the invention uses the edge clustering coefficient (ECC), the Pearson correlation coefficient (PCC), and the GO functional similarity (GO_sim), which together account for how tightly two interacting proteins are connected, how similar their gene expression is, and how related their functions are. It also uses the degree of a protein within complexes (DIC), which accounts for the relationship between protein essentiality and complexes. Fusing these characteristics makes the identification of essential proteins more accurate.
3. The invention identifies essential proteins by simulating how an artificial fish swarm forages and seeks companions. It builds a reliable PPI network; jointly considers the topological properties of the PPI network, the gene expression values of proteins, GO semantic similarity, protein complex information, and prior knowledge; and adds the optimization mechanism of the artificial fish swarm. Using these features together makes the essential proteins identified by the invention more accurate than those identified by other current essential protein identification methods.
4. The method identifies essential proteins accurately. Simulation results show better sensitivity, specificity, positive predictive value, and negative predictive value. Compared with other essential protein identification methods, it combines the optimization characteristics of the artificial fish swarm with the topological features of the PPI network in the identification process, which improves the identification accuracy of essential proteins.
5. The invention effectively identifies essential proteins from a PPI network, which not only helps in understanding the regulation of cell growth and the mechanisms of life activities, but also has important theoretical value for developing drugs precisely and for diagnosing and treating disease.
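The four evaluation indices named in advantage 4 have standard definitions in terms of true and false positives and negatives. The following minimal Python sketch computes them, assuming predictions and the reference standard are given as sets of protein identifiers; the function and argument names are illustrative and do not come from the patent.

```python
def classification_metrics(predicted, essential, all_proteins):
    """Sensitivity, specificity, PPV and NPV of a predicted essential-protein set."""
    predicted, essential = set(predicted), set(essential)
    all_proteins = set(all_proteins)
    tp = len(predicted & essential)                 # essential and predicted
    fp = len(predicted - essential)                 # predicted but not essential
    fn = len(essential - predicted)                 # essential but missed
    tn = len(all_proteins - predicted - essential)  # correctly left out

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else 0.0

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }
```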
Brief description of the drawings
Fig. 1 is the process flowchart of Embodiment 1 of the invention.
Fig. 2 is a partial schematic of the essential proteins obtained with Embodiment 1 within the whole PPI network.
Fig. 3 shows the essential proteins in the standard set corresponding to Fig. 2.
Detailed description
The technical scheme in the embodiments of the invention is described clearly and completely below with reference to the accompanying drawings. The described embodiments are obviously only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the invention without creative work fall within the scope of protection of the invention.
As shown in Fig. 1, the method of the invention for identifying essential proteins based on an artificial fish swarm optimization algorithm comprises the following steps:
(1) Convert the PPI network into an undirected graph
The PPI network is converted into an undirected graph $G=(V,E)$, where $V=\{v_i,\ i=1,2,\dots,n\}$ is the set of nodes $v_i$ and $E$ is the set of edges $e$; a node $v_i$ represents a protein and an edge $e$ represents an interaction between proteins;
(2) Construct the purified PPI network
At time point $t$, if the gene expression value $Ep_{it}$ of node $v_i$ is greater than the gene expression activity threshold Active_Th(i), node $v_i$ is considered active at time point $t$; otherwise it is considered inactive at time point $t$. If any two different nodes $v$, $u$ in $V$ are both active at time point $t$, they are considered co-expressed at time point $t$. All edges in the undirected graph whose protein interactions are not co-expressed at any time point are deleted, which yields a new PPI network, namely the purified protein network;
$Ep_{it}$ is the gene expression value of node $v_i$ at time point $t$;
The gene expression activity threshold Active_Th(i) is obtained by formula (1):
$$\mathrm{Active\_Th}(i)=\mu(i)+3\sigma(i)\,\bigl(1-F(i)\bigr) \qquad (1)$$
In formula (1), $\mu(i)$ is the mean gene expression value of node $v_i$, $\sigma(i)$ is the standard deviation of its gene expression values, and $F(i)=1/(1+\sigma^2(i))$ is a weighting function;
(3) Process the edges and nodes of the purified PPI network
The edge clustering coefficient of an edge is calculated by formula (2):
where $N_i$ and $N_j$ denote the neighbor node sets of nodes $v_i$ and $v_j$, respectively;
The Pearson correlation coefficient of an edge is calculated by formula (3):
where $Ep_{it}$ and $Ep_{jt}$ denote the gene expression values of nodes $v_i$ and $v_j$ at time point $t$, $\mu(i)$ and $\mu(j)$ are the mean gene expression values of nodes $v_i$ and $v_j$, and $T$ is the maximum time point;
The GO functional similarity of an edge is calculated by formula (4):
where $GO_i$ and $GO_j$ denote the sets of GO terms annotating node $v_i$ and node $v_j$, respectively.
Preprocessing of node $v_i$: the degree of node $v_i$ within protein complexes is calculated by formula (5):
where V(|C|) denotes the set of protein nodes contained in a protein complex, $C_{v_i}$ denotes a protein complex that contains node $v_i$, $D_{in}(v_i, C_{v_i})$ denotes the degree of node $v_i$ within the protein complex $C_{v_i}$, and $v_j$ is a neighbor node of $v_i$;
(4) key protein matter is as original manual fish known to choosing
It is artificial fingerling group scale to make N, and m is the quantity of the known key protein matter included in every Artificial Fish;In standard
The Artificial Fish that m known key protein matter form a priori is randomly selected in storehouse (the key protein matter being currently known);
Fish (k) represents the known key protein matter set included in kth bar original manual fish, k=1,2 ... N;Cn is candidate key egg
The number of white matter;
(5) foraging behavior
Artificial Fish search of food in visual range is to find the albumen that direct interaction be present with artificial fish protein
Matter, all neighbours' protein of protein in every Artificial Fish are found out, form neighbours protein node set Neighbor (k),
And node in set Neighbor (k) and set Neighbor (l) it is different (k=1,2 ... N, l=1,2 ... N, k ≠
L), for each node v in Neighbor (k)iAccording to formula score1 (i)=fitness1 (vi, Fish (k)) determine to close
And to the possibility in Artificial Fish Fish (k), by the node in neighbours protein node set Neighbor (k) according to it
Score1 scores carry out descending sort, score1 value highest node are added in Fish (k), while be added to set
In Add (k);
Random behavior:If there is no suitable protein to be added in Artificial Fish in foraging behavior implementation procedure, hold
Row random behavior, one protein node of random selection are added in set Neighbor (k);
Foraging behavior is repeated Tn times, i.e., Tn protein node is added into original manual fish;
(6) Following behavior
After the foraging behavior has been performed, the artificial fish in the optimal state is determined by computing Score2(k) = fitness2(Add(k)) for every artificial fish. All artificial fish are sorted in descending order of Score2; the fish with the highest Score2 is the optimal artificial fish Fish(p), $p\in[1,N]$, and the protein nodes in its set Add(p) are added to the set Candidate;
(7) Swarming behavior
Excluding the set Add(p) of the optimal artificial fish Fish(p), each node $v_i$ in the sets Add(k) of the remaining artificial fish Fish(k) ($k\neq p$) is scored by Score3(i) = fitness3($v_i$). All such $v_i$ are sorted in descending order of Score3. Let δ be the crowding factor, δ = Cn - Tn; the top δ protein nodes are added to the set Candidate;
(8) Produce the essential proteins
The proteins in the set Candidate are output as the essential proteins.
In step (5) of the invention, the possibility fitness1 of adding a node $v_i$ from the set Neighbor(k) to the artificial fish Fish(k) is obtained by formula (6):
where $v_j$ is a node inside the artificial fish Fish(k); ECC is the clustering coefficient of the edge between node $v_i$ and node $v_j$, obtained by formula (2); PCC is the Pearson correlation coefficient of that edge, obtained by formula (3); GO_sim is the functional similarity between node $v_i$ and node $v_j$, obtained by formula (4).
In step (6) of the invention, the possibility fitness2 that an artificial fish is in the optimal state is obtained by formula (7):
where Add(k) denotes the set of protein nodes added to the $k$-th artificial fish by the Tn foraging behaviors, and fitness1($v_i$, Fish(k)) is as given in formula (6).
In step (7) of the invention, the score fitness3 of a protein node in the set Add(k) ($k\neq p$) is obtained by formula (8):
$$W(v_i,v_j)=\mathrm{ECC}(v_i,v_j)\times\bigl(\mathrm{PCC}(v_i,v_j)+\mathrm{GO\_sim}(v_i,v_j)\bigr) \qquad (9)$$
In formula (8), $a$ and $b$ are coefficients satisfying $a+b=1$, Nei($v_i$) denotes the neighbor node set of node $v_i$, and DIC($v_i$), the degree of node $v_i$ within complexes, is obtained by formula (5).
The invention is described in more detail below by way of a specific embodiment:
Embodiment 1
Taking a protein network as an example, the steps of a method for identifying essential proteins based on an artificial fish swarm optimization algorithm are as follows:
This embodiment uses the yeast data set from the DIP database (version 20160114) as the simulation data set; the DIP data contain 5028 proteins and 22303 interactions. The gene expression data set is the yeast metabolic expression data set GSE3431 from the GEO database, which comprises 9336 genes with expression values at 36 time points over 3 cycles and covers 95% of the proteins in DIP. The GO data include the annotation profiles and SGD. The known protein complex information comes from CYC2008, which comprises 408 protein complexes covering 1492 proteins. The essential protein data are obtained by integrating the data in four databases, MIPS, SGD, DEG, and SGDP, and contain 1285 essential proteins in total; of the 5028 proteins, 1152 are essential and the remainder are regarded as non-essential. The experimental platform is the Windows 10 operating system with an Intel Core i5-6600 dual-core 3.31 GHz processor and 8 GB of physical memory; the method of the invention is implemented in Matlab R2014a.
1. Convert the PPI network into an undirected graph
The PPI network comprising 5028 proteins and 22303 interactions is converted into an undirected graph $G=(V,E)$, where $V=\{v_i,\ i=1,2,\dots,5028\}$ is the set of nodes $v_i$ and $E$ is the set of edges $e$; a node $v_i$ represents a protein and an edge $e$ represents an interaction between proteins.
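The conversion to an undirected graph can be sketched in Python as follows. This is a minimal illustration that assumes the interactions are supplied as pairs of protein identifiers; dropping self-interactions and merging duplicate pairs are assumptions that the text does not state, and the function names are illustrative.

```python
from collections import defaultdict


def build_undirected_graph(interactions):
    """Map each protein to the set of proteins it interacts with."""
    adjacency = defaultdict(set)
    for u, v in interactions:
        if u == v:
            continue  # assumption: self-interactions are ignored
        adjacency[u].add(v)
        adjacency[v].add(u)
    return dict(adjacency)


def edge_set(adjacency):
    """Return every undirected edge exactly once as a frozenset {u, v}."""
    return {frozenset((u, v)) for u, nbrs in adjacency.items() for v in nbrs}
```

Storing neighbors as sets makes the later neighbor-intersection and degree computations direct set operations.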
2. Construct the purified PPI network
At time point $t$, if the gene expression value $Ep_{it}$ of node $v_i$ is greater than the gene expression activity threshold Active_Th(i), node $v_i$ is considered active at time point $t$; otherwise it is considered inactive at time point $t$. If any two different nodes $v$, $u$ in $V$ are both active at time point $t$, they are considered co-expressed at time point $t$. The gene expression activity threshold Active_Th(i) is obtained by formula (1):
$$\mathrm{Active\_Th}(i)=\mu(i)+3\sigma(i)\,\bigl(1-F(i)\bigr) \qquad (1)$$
In formula (1), $\mu(i)$ is the mean gene expression value of node $v_i$, $\sigma(i)$ is the standard deviation of its gene expression values, and $F(i)=1/(1+\sigma^2(i))$ is a weighting function. By this processing, every protein interaction in the original PPI network that is not co-expressed at any time point is deleted, forming a new PPI network with 5028 protein nodes and 9576 edges, namely the purified PPI network.
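The purification rule keeps an edge only if its two proteins are active at one or more common time points. A minimal Python sketch follows, assuming the thresholds have already been computed by formula (1) and are passed in as a dictionary; treating proteins without expression data as never active is an assumption, and all names are illustrative.

```python
def purify_network(edges, expression, threshold):
    """Keep only the edges whose endpoints are co-expressed at some time point.

    edges: iterable of (u, v) protein pairs
    expression: dict mapping protein -> list of expression values per time point
    threshold: dict mapping protein -> Active_Th value for that protein
    """
    active = {
        protein: {t for t, value in enumerate(values) if value > threshold[protein]}
        for protein, values in expression.items()
    }
    # Assumption: a protein with no expression profile is never active.
    return [
        (u, v)
        for u, v in edges
        if active.get(u, set()) & active.get(v, set())
    ]
```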
3. Process the edges and nodes of the purified PPI network
The edge clustering coefficient of an edge is calculated by formula (2):
where $N_i$ and $N_j$ denote the neighbor node sets of nodes $v_i$ and $v_j$, respectively, and $d_i$ and $d_j$ are the degrees of nodes $v_i$ and $v_j$. The Pearson correlation coefficient of an edge is calculated by formula (3):
where $Ep_{it}$ and $Ep_{jt}$ denote the gene expression values of nodes $v_i$ and $v_j$ at time point $t$, $\mu(i)$ and $\mu(j)$ are the mean gene expression values of nodes $v_i$ and $v_j$, and $T$ is the maximum time point. The GO functional similarity of an edge is calculated by formula (4):
where $GO_i$ and $GO_j$ denote the sets of GO terms annotating protein $v_i$ and protein $v_j$, respectively.
Preprocessing of node $v_i$: for each $i=1,2,\dots,5028$, the participation of node $v_i$ within protein complexes can be calculated; the degree of node $v_i$ within protein complexes is calculated by formula (5):
where V(|C|) denotes the set of protein nodes contained in a protein complex, $C_{v_i}$ denotes a protein complex that contains protein $v_i$, $D_{in}(v_i, C_{v_i})$ denotes the degree of protein $v_i$ within the protein complex $C_{v_i}$, and $v_j$ is a neighbor node of $v_i$.
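The three edge measures can be sketched in Python under commonly used definitions, which are assumptions here and not the patent's exact formulas (2) to (4): the edge clustering coefficient is taken as the number of common neighbors divided by min(d_i - 1, d_j - 1), the Pearson coefficient is the standard one, and the GO similarity is taken as |GO_i ∩ GO_j|² / (|GO_i| · |GO_j|). Function names are illustrative.

```python
from math import sqrt


def ecc(adjacency, u, v):
    """Edge clustering coefficient (assumed form): shared neighbours over min(d-1)."""
    denominator = min(len(adjacency[u]) - 1, len(adjacency[v]) - 1)
    if denominator <= 0:
        return 0.0
    return len(adjacency[u] & adjacency[v]) / denominator


def pcc(x, y):
    """Standard Pearson correlation of two equally long expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0 or dev_y == 0:
        return 0.0  # a constant profile carries no correlation information
    return cov / (dev_x * dev_y)


def go_sim(go_u, go_v):
    """GO functional similarity (assumed form) of two sets of GO terms."""
    if not go_u or not go_v:
        return 0.0
    return len(go_u & go_v) ** 2 / (len(go_u) * len(go_v))
```

Returning 0.0 for degenerate cases (degree-one endpoints, constant profiles, unannotated proteins) is a design choice of this sketch so that such edges simply contribute nothing to later scores.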
4. Select known essential proteins to form the initial artificial fish
Let $N$ be the size of the artificial fish swarm and $m$ the number of known essential proteins contained in each artificial fish. For each artificial fish, 100 known essential proteins are randomly selected from the 1152 essential proteins in the standard set to form an artificial fish carrying prior knowledge. Fish(k) denotes the set of proteins contained in the $k$-th initial artificial fish. In this example, $N=100$ and $m=100$; Cn is the number of candidate essential proteins.
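The initialization of the swarm can be sketched as repeated random sampling without replacement. This minimal Python sketch assumes each fish is sampled independently, so different fish may share proteins, which the text does not rule out; the `seed` parameter and the function name are illustrative additions for reproducibility.

```python
import random


def init_fish(known_essential, n_fish, m, seed=None):
    """Create n_fish artificial fish, each a random set of m known essential proteins."""
    rng = random.Random(seed)
    pool = sorted(known_essential)  # sorted so that a fixed seed is reproducible
    return [set(rng.sample(pool, m)) for _ in range(n_fish)]
```

With the values of this embodiment, the call would be `init_fish(standard_set, n_fish=100, m=100)`, where `standard_set` stands for the 1152 known essential proteins.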
5. Foraging behavior
An artificial fish searching for food within its visual range corresponds to finding the proteins that interact directly with the proteins in the artificial fish. All neighbor proteins of the proteins in each artificial fish are found and form the set Neighbor(k); the proteins in set Neighbor(k) and set Neighbor(l) are different (k = 1, 2, ..., 100; l = 1, 2, ..., 100; k ≠ l). For each protein v_i in Neighbor(k), score1(i) = fitness1(v_i, Fish(k)) determines the possibility of merging it into artificial fish Fish(k). The nodes in Neighbor(k) are sorted in descending order of their score1 values, and the node with the highest score1 value is added to Fish(k) and, at the same time, to the set Add(k). Here score1(i) is the closeness between protein v_i and all the proteins in the artificial fish, obtained by formula (6):
In the formula, v_j is a protein node inside artificial fish Fish(k); ECC is the edge clustering coefficient of the edge between nodes v_i and v_j, obtained by formula (2); PCC is the Pearson correlation coefficient of the edge between nodes v_i and v_j, obtained by formula (3); and GO_sim is the functional similarity between nodes v_i and v_j, obtained by formula (4).
If no suitable protein can be added to the artificial fish while the foraging behavior is being executed, the random behavior is executed: a randomly selected protein node is added to the set Neighbor(k). The foraging behavior (or random behavior) is repeated Tn times, i.e., Tn protein nodes are added to the initial artificial fish.
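The foraging loop described above can be sketched as follows. This is only a sketch under stated assumptions: `forage` and its parameters are hypothetical names, "no suitable protein" is interpreted here as an empty neighbor set, and the toy `fitness1` used in the example simply counts interactions with the fish; it is a stand-in, not formula (6).

```python
import random


def forage(fish, graph, fitness1, tn, rng=None):
    """Grow one artificial fish by tn proteins (hypothetical helper).

    fish     -- iterable of proteins currently in the fish, Fish(k)
    graph    -- dict mapping each protein to the set of its neighbors
    fitness1 -- callable (candidate, fish_members) -> score (assumed interface)
    Returns (updated fish, list of added proteins, i.e. Add(k)).
    """
    rng = rng or random.Random(0)
    members = set(fish)
    added = []
    for _ in range(tn):
        # Neighbor(k): proteins that interact directly with the fish.
        neighbors = {u for v in members for u in graph[v]} - members
        if not neighbors:
            # Random behavior: put one randomly chosen outside node into Neighbor(k).
            outside = sorted(set(graph) - members)
            if not outside:
                break  # the whole network is already inside the fish
            neighbors = {rng.choice(outside)}
        # Highest score1 wins; sorting first makes tie-breaking deterministic.
        best = max(sorted(neighbors), key=lambda v: fitness1(v, members))
        members.add(best)
        added.append(best)
    return members, added


# Toy undirected network; the identifiers are placeholders.
graph = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B"},
    "D": {"B", "E"},
    "E": {"D"},
}


def toy_fitness1(v, members):
    """Stand-in score: number of fish members that v interacts with."""
    return sum(1 for u in members if u in graph[v])


grown, add_k = forage({"A"}, graph, toy_fitness1, tn=2)
```

In this toy run, B is added first (it ties with C and wins alphabetically), then C, which interacts with both A and B.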
6. Following behavior
After the foraging behavior (or random behavior) has been executed, the artificial fish in the optimal state is determined according to Score2(k) = fitness2(Add(k)). All artificial fish are sorted in descending order of their Score2 values; the artificial fish with the highest Score2 value is the optimal artificial fish Fish(p) (p ∈ [1, 100]), and the protein nodes in the set Add(p) corresponding to Fish(p) are added to the set Candidate. fitness2 denotes the fitness of each artificial fish after proteins have been added and is obtained by formula (7):
In the formula, Add(k) denotes the set of protein nodes added to the k-th artificial fish by the Tn foraging behaviors, and fitness1(v_i, Fish(k)) is as given in formula (6).
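Selecting the optimal artificial fish can be sketched as below. The helper name `follow` and the callable `fitness2` are assumptions made for illustration; the toy scoring function sums made-up per-protein scores and is not formula (7).

```python
def follow(add_sets, fitness2):
    """Pick the fish whose added proteins score highest (hypothetical helper).

    add_sets -- list where add_sets[k] is Add(k) for fish k (0-based here,
                whereas the text numbers fish from 1)
    fitness2 -- callable Add(k) -> Score2(k) (assumed interface)
    Returns (index p of the optimal fish, copy of Add(p), list of all scores).
    """
    if not add_sets:
        raise ValueError("at least one artificial fish is required")
    scores = [fitness2(added) for added in add_sets]
    p = max(range(len(scores)), key=scores.__getitem__)
    return p, set(add_sets[p]), scores


# Made-up per-protein scores, used only to exercise the selection logic.
toy_score = {"A": 0.2, "B": 0.9, "C": 0.4, "D": 0.1}
add_sets = [{"A", "C"}, {"B", "D"}, {"A", "D"}]
p, candidate, scores = follow(add_sets, lambda added: sum(toy_score[v] for v in added))
```

Here the second fish (index 1) has the highest toy score, so its added proteins B and D seed the set Candidate.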
7. Swarming behavior
Apart from the set Add(p) corresponding to the optimal artificial fish Fish(p), the score of each protein node v_i in the sets Add(k) (k ≠ p) corresponding to the remaining artificial fish Fish(k) is calculated according to Score3(i) = fitness3(v_i). All v_i are sorted in descending order of their Score3 values. Let δ (δ = Cn − Tn) be the crowding factor; the top δ protein nodes are added to the set Candidate. fitness3 denotes the score of a protein node in the sets Add(k) (k ≠ p) and is obtained by formula (8):
$$W(v_i, v_j) = ECC(v_i, v_j) \times \bigl(PCC(v_i, v_j) + GO\_sim(v_i, v_j)\bigr) \tag{9}$$
In the formulas, a and b are coefficients, a = 0.8, b = 0.2; Nei(v_i) denotes the set of neighbor nodes of node v_i; and DIC(v_i) denotes the degree of node v_i inside protein complexes, obtained by formula (5).
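The edge weight of formula (9) and the top-δ selection of the swarming step can be sketched as follows. `edge_weight` follows formula (9) as written; `swarm`, its parameters and the toy scores are illustrative assumptions, and `fitness3` is passed in as a callable because formula (8) is not reproduced in the text.

```python
def edge_weight(ecc, pcc, go_sim):
    """Formula (9): W = ECC * (PCC + GO_sim) for one edge."""
    return ecc * (pcc + go_sim)


def swarm(add_sets, p, fitness3, delta):
    """Return the top-delta proteins from all Add(k) with k != p (hypothetical helper).

    add_sets -- list where add_sets[k] is Add(k)
    p        -- index of the optimal fish, whose Add(p) is excluded
    fitness3 -- callable protein -> Score3 (assumed interface)
    delta    -- crowding factor, Cn - Tn in the text
    """
    pool = set().union(*(a for k, a in enumerate(add_sets) if k != p))
    # Descending by score; the protein name breaks ties deterministically.
    ranked = sorted(pool, key=lambda v: (-fitness3(v), v))
    return ranked[:delta]


# Made-up scores, used only to exercise the ranking.
toy_score3 = {"B": 0.9, "C": 0.5, "D": 0.7, "E": 0.1}
add_sets = [{"A", "B"}, {"C", "D"}, {"B", "E"}]
top = swarm(add_sets, p=0, fitness3=toy_score3.__getitem__, delta=2)
w = edge_weight(0.5, 0.2, 0.3)
```

Because Candidate is a set, a protein that also appears in Add(p) would simply be merged when the two groups are combined.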
8. Produce the key proteins
The proteins in the set Candidate are output as the key proteins.
To verify the effectiveness of the invention, the inventors used the method of Embodiment 1 for identifying key proteins with the artificial fish swarm optimization algorithm to identify key proteins in the protein network from the DIP database, and analyzed the correctly identified key proteins when the number of candidate key proteins (Cn) was 100, 200, 300, 400, 500 and 600. In this experiment, 100 known key proteins were taken as prior knowledge for each artificial fish. Because the known key proteins used as prior knowledge are selected at random, the experiment was run 50 times and the average of the 50 results was taken as the final result. The results are shown in Table 1, Fig. 2 and Fig. 3. Table 1 compares the recognition accuracy of the invention with that of other current methods for identifying key proteins. Fig. 2 shows the distribution in the network of some of the key proteins identified by the invention, and Fig. 3 shows the corresponding part of the standard library.
Table 1. Comparison of the recognition accuracy of the present invention and other methods for identifying key proteins
Table 1 shows the recognition accuracy obtained when the top 100, 200, 300, 400, 500 and 600 proteins identified by the invention are taken as candidate key proteins and compared with the key proteins in the standard library, together with the results of other current methods for identifying key proteins. When the top 600 key proteins are identified, the invention shows a higher prediction accuracy than the other 8 key protein identification methods. As Table 2 shows, the invention identifies key proteins effectively: for every number of candidate key proteins from 100 to 600, the invention has the highest recognition accuracy. Fig. 2 shows the positions in the protein-protein interaction network of some of the key proteins identified by the invention. In Fig. 2, nodes with a dark background are key proteins correctly identified by the invention, nodes with a light background are proteins wrongly identified as key proteins, and nodes with a white background are non-key proteins. Fig. 3 shows the key proteins in the standard library that correspond to Fig. 2. Comparing Fig. 2 with Fig. 3 shows that the proteins wrongly identified by the invention are "YDR283C" and "YPL246C", and the key protein that was missed is "YBR152W". If some known key proteins are used as prior knowledge, the method of the invention can correctly identify most of the key proteins around that prior knowledge.
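The evaluation protocol (repeated runs, averaged accuracy against the standard library) can be sketched as below. It assumes that recognition accuracy means the fraction of output candidates that appear in the standard library; the function names and the toy data are illustrative and do not reproduce the reported results.

```python
from statistics import mean


def accuracy(candidates, standard):
    """Fraction of candidate proteins that are key proteins in the standard library."""
    candidates = list(candidates)
    if not candidates:
        raise ValueError("candidate list is empty")
    return sum(1 for protein in candidates if protein in standard) / len(candidates)


def mean_accuracy(runs, standard):
    """Average accuracy over repeated runs (the text uses 50 runs)."""
    return mean(accuracy(candidates, standard) for candidates in runs)


# Toy data: two runs with four candidates each, not real experimental output.
standard = {"A", "B", "C"}
runs = [["A", "B", "X", "Y"], ["A", "B", "C", "Z"]]
avg = mean_accuracy(runs, standard)
```

With the toy data the two runs score 0.5 and 0.75, so the averaged accuracy is 0.625.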
In the method of the present invention for identifying key proteins based on the artificial fish swarm optimization algorithm, the protein-protein interaction network is converted into an undirected graph; a purified protein-protein interaction network is built; the gene expression values, the GO annotation information and the degree within known complexes of each protein are obtained; the edges and nodes of the purified protein-protein interaction network are processed; known key proteins are selected as the initial artificial fish; and the artificial fish perform the foraging behavior, random behavior, following behavior and swarming behavior to produce the key proteins. The method can identify key proteins accurately. Simulation results show that its sensitivity, specificity, positive predictive value, negative predictive value and other indicators are better. Compared with other key protein identification methods, it combines the optimization characteristics of the artificial fish swarm with the topological characteristics of the protein interaction network to carry out the identification of key proteins, improving the recognition accuracy of key proteins.
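The four indicators named above can be computed as in the following sketch. It assumes the usual confusion-matrix definitions, which the text does not spell out; the function name and the toy protein sets are illustrative only.

```python
def confusion_metrics(predicted, essential, all_proteins):
    """Sensitivity, specificity, PPV and NPV from three protein sets.

    predicted    -- proteins output as key proteins
    essential    -- key proteins according to the standard library
    all_proteins -- every protein in the network
    """
    predicted, essential, all_proteins = set(predicted), set(essential), set(all_proteins)
    tp = len(predicted & essential)                 # correctly identified key proteins
    fp = len(predicted - essential)                 # wrongly identified as key
    fn = len(essential - predicted)                 # missed key proteins
    tn = len(all_proteins - predicted - essential)  # correctly left out

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Toy example with ten placeholder proteins.
metrics = confusion_metrics(set("ABE"), set("ABCD"), set("ABCDEFGHIJ"))
```

In the toy example there are 2 true positives, 1 false positive, 2 false negatives and 5 true negatives.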
The above describes the preferred embodiment of the present invention. Based on the above description, those skilled in the art can make various improvements and substitutions without departing from the technical principle of the invention, and these improvements and substitutions shall also be regarded as falling within the protection scope of the present invention.
Claims (8)
1. A method for identifying key proteins based on an artificial fish swarm optimization algorithm, characterized by comprising the following steps:
(1) Converting the protein-protein interaction network into an undirected graph:
The protein-protein interaction network is converted into an undirected graph G = (V, E), where V = {v_i, i = 1, 2, ..., n} is the set of nodes v_i and E is the set of edges e; a node v_i represents a protein, and an edge e represents an interaction between proteins;
(2) Building the purified protein-protein interaction network:
At time point t, if the gene expression value Ep_it of node v_i is greater than the gene expression activity threshold Active_Th(i), node v_i is considered active at time point t; otherwise the node is considered inactive at time point t; if any two different nodes v and u in V are active at the same time point t, nodes v and u are considered co-expressed at time point t; all edges of the undirected graph that correspond to protein interactions without co-expression at any time point are deleted, building a purified protein-protein interaction network;
(3) Processing the edges and nodes of the purified protein-protein interaction network: calculating the edge clustering coefficient ECC, the Pearson correlation coefficient PCC of each edge, the GO functional similarity of each edge, and the degree of each node inside protein complexes;
(4) Selecting known key proteins to form the initial artificial fish:
Let N be the size of the artificial fish population and m the number of known key proteins contained in each artificial fish; m known key proteins are randomly selected from the currently known key proteins to form an artificial fish carrying prior knowledge; Fish(k) denotes the set of known key proteins contained in the k-th initial artificial fish, k = 1, 2, ..., N; Cn is the number of candidate key proteins;
(5) Foraging behavior:
All neighbor proteins of the proteins in each artificial fish are found and form the neighbor protein node set Neighbor(k), and the proteins in set Neighbor(k) and set Neighbor(l) are different, k = 1, 2, ..., N, l = 1, 2, ..., N, k ≠ l; for each node v_i in Neighbor(k), the possibility of merging it into artificial fish Fish(k) is determined according to the formula score1(i) = fitness1(v_i, Fish(k)); the nodes in the neighbor protein node set Neighbor(k) are sorted in descending order of their score1 values, and the protein node with the highest score1 value is added to Fish(k) and, at the same time, to the set Add(k); the foraging behavior is repeated Tn times, adding Tn protein nodes to the initial artificial fish;
(6) Following behavior:
After the foraging behavior has been executed, the artificial fish in the optimal state is determined according to the formula Score2(k) = fitness2(Add(k)); all artificial fish are sorted in descending order of their Score2 values, and the artificial fish with the highest Score2 value is the optimal artificial fish Fish(p), p ∈ [1, N]; the protein nodes in the set Add(p) corresponding to the optimal artificial fish Fish(p) are added to the set Candidate;
(7) Swarming behavior:
Apart from the set Add(p) corresponding to the optimal artificial fish Fish(p), the score of each node v_i in the sets Add(k) corresponding to the remaining artificial fish Fish(k) is calculated according to the formula Score3(i) = fitness3(v_i), where k ≠ p; all v_i are sorted in descending order of their Score3 values; let δ be the crowding factor, and the top δ protein nodes are added to the set Candidate;
(8) Producing the key proteins:
The protein nodes in the set Candidate obtained in step (7) are output as the key proteins.
2. The method for identifying key proteins based on an artificial fish swarm optimization algorithm according to claim 1, characterized in that the gene expression activity threshold Active_Th(i) is obtained by formula (1):
$$Active\_Th(i) = \mu(i) + 3\sigma(i)\,\bigl(1 - F(i)\bigr) \tag{1}$$
In formula (1), μ(i) is the average gene expression value of node v_i, σ(i) is the standard deviation of its gene expression values, and F(i) = 1/(1 + σ(i)²) is a weight function.
3. The method for identifying key proteins based on an artificial fish swarm optimization algorithm according to claim 1, characterized in that in step (3) the clustering coefficient of an edge is calculated by formula (2):
In the formula, N_i and N_j denote the neighbor node sets of nodes v_i and v_j, respectively;
The Pearson correlation coefficient of an edge is calculated by formula (3):
In the formula, Ep_it and Ep_jt denote the gene expression values of nodes v_i and v_j at time point t, μ(i) and μ(j) are the average gene expression values of nodes v_i and v_j, and T is the maximum time point;
The GO functional similarity of an edge is calculated by formula (4):
In the formula, GO_i and GO_j denote the GO terms annotating node v_i and node v_j, respectively;
The degree of node v_i inside protein complexes is calculated by formula (5):
In the formula, V(|C|) denotes the set of nodes contained in the protein complexes, C_vi denotes a protein complex that contains node v_i, D_in(v_i, C_vi) denotes the degree of node v_i within protein complex C_vi, and v_j is a neighbor node of v_i.
4. The method for identifying key proteins based on an artificial fish swarm optimization algorithm according to claim 1, characterized in that in step (5) the possibility fitness1 of adding node v_i of the set Neighbor(k) to artificial fish Fish(k) is obtained by formula (6):
In the formula, v_j is a protein node inside artificial fish Fish(k), ECC is the edge clustering coefficient of the edge between nodes v_i and v_j, PCC is the Pearson correlation coefficient of the edge between nodes v_i and v_j, and GO_sim is the functional similarity between nodes v_i and v_j.
5. The method for identifying key proteins based on an artificial fish swarm optimization algorithm according to claim 1, characterized in that in step (5), if no suitable protein node can be added to the artificial fish while the foraging behavior is being executed, a random behavior is executed: a randomly selected protein node is added to the neighbor protein node set Neighbor(k).
6. The method for identifying key proteins based on an artificial fish swarm optimization algorithm according to claim 1, characterized in that in step (6) the possibility fitness2 that an artificial fish is in the optimal state is obtained by formula (7):
In the formula, Add(k) denotes the set of protein nodes added to the k-th artificial fish by the Tn foraging behaviors.
7. The method for identifying key proteins based on an artificial fish swarm optimization algorithm according to claim 1, characterized in that in step (7) the score fitness3 of node v_i in the sets Add(k), k ≠ p, is obtained by formula (8):
$$W(v_i, v_j) = ECC(v_i, v_j) \times \bigl(PCC(v_i, v_j) + GO\_sim(v_i, v_j)\bigr) \tag{9}$$
In formula (8), a and b are coefficients satisfying a + b = 1, Nei(v_i) denotes the set of neighbor nodes of node v_i, and DIC(v_i) denotes the degree of node v_i inside protein complexes.
8. The method for identifying key proteins based on an artificial fish swarm optimization algorithm according to claim 1, characterized in that in step (7) δ = Cn − Tn.
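The network purification of step (2) in claim 1, together with the activity threshold of formula (1) in claim 2, can be sketched as follows. This is a sketch under stated assumptions: the function names are hypothetical, the population standard deviation is used for σ(i) (the text does not say which estimator is meant), edges whose proteins lack expression data are dropped, and the expression values are toy numbers.

```python
from statistics import mean, pstdev


def active_threshold(expr):
    """Formula (1): Active_Th = mu + 3*sigma*(1 - F), with F = 1/(1 + sigma^2)."""
    mu = mean(expr)
    sigma = pstdev(expr)  # assumption: population standard deviation
    f = 1.0 / (1.0 + sigma ** 2)
    return mu + 3.0 * sigma * (1.0 - f)


def purify(edges, expression):
    """Keep only edges whose two proteins are active at a common time point.

    edges      -- iterable of (u, v) pairs of the undirected graph
    expression -- dict mapping protein -> list of expression values over time
    """
    threshold = {p: active_threshold(x) for p, x in expression.items()}
    kept = []
    for u, v in edges:
        if u not in expression or v not in expression:
            continue  # assumption: no expression data means the edge is dropped
        co_expressed = any(
            a > threshold[u] and b > threshold[v]
            for a, b in zip(expression[u], expression[v])
        )
        if co_expressed:
            kept.append((u, v))
    return kept


# Toy expression profiles over four time points.
expression = {"A": [0, 0, 0, 1], "B": [0, 0, 0, 1], "C": [1, 0, 0, 0]}
purified = purify([("A", "B"), ("A", "C")], expression)
```

In the toy data A and B are both active only at the last time point, so their edge is kept, while A and C are never active together and their edge is removed.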
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710912037.5A CN107784196B (en) | 2017-09-29 | 2017-09-29 | Method for identifying key protein based on artificial fish school optimization algorithm |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107784196A (en) | 2018-03-09 |
CN107784196B (en) | 2021-07-09 |
Family
ID=61433970
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710912037.5A Active CN107784196B (en) | 2017-09-29 | 2017-09-29 | Method for identifying key protein based on artificial fish school optimization algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107784196B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102945333A (en) * | 2012-12-04 | 2013-02-27 | 中南大学 | Key protein predicating method based on priori knowledge and network topology characteristics |
WO2015054266A1 (en) * | 2013-10-08 | 2015-04-16 | The Regents Of The University Of California | Predictive optimization of network system response |
CN105279397A (en) * | 2015-10-26 | 2016-01-27 | 华东交通大学 | Method for identifying key proteins in protein-protein interaction network |
CN107169983A (en) * | 2017-04-13 | 2017-09-15 | 西安电子科技大学 | Multi-threshold image segmentation method based on cross and variation artificial fish-swarm algorithm |
Non-Patent Citations (5)
Title |
---|
WOOCHANG HWANG等: "A novel functional module detection algorithm for protein-protein interaction networks", 《ALGORITHMS FOR MOLECULAR BIOLOGY》 * |
吴爽 等: "融合人工鱼群机理的PPI网络聚类模型与算法", 《计算机科学》 * |
吴爽: "基于群智能机理的PPI网络功能模块聚类", 《中国优秀硕士学位论文全文数据库 医药卫生科技辑》 * |
尤梦丽: "群智能优化算法及其在PPI网络中的应用及评价研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》 * |
陈新: "基于图的蛋白质相互作用网络比对方法", 《中国优秀硕士学位论文全文数据库 基础科学辑》 * |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108629159A (en) * | 2018-05-14 | 2018-10-09 | 辽宁大学 | A method of for finding the pathogenic key protein matter of alzheimer's disease |
CN109509509A (en) * | 2018-09-29 | 2019-03-22 | 江西理工大学 | Protein complex method for digging based on dynamic weighting protein-protein interaction network |
CN109509509B (en) * | 2018-09-29 | 2020-12-22 | 江西理工大学 | Protein compound mining method based on dynamic weighted protein interaction network |
CN110895672A (en) * | 2018-12-29 | 2020-03-20 | 研祥智能科技股份有限公司 | Face recognition method based on artificial fish swarm algorithm |
CN110895672B (en) * | 2018-12-29 | 2022-05-17 | 研祥智能科技股份有限公司 | Face recognition method based on artificial fish swarm algorithm |
CN111312330A (en) * | 2020-02-13 | 2020-06-19 | 兰州理工大学 | Key protein identification method and system based on protein node characteristics |
CN112259157A (en) * | 2020-10-28 | 2021-01-22 | 杭州师范大学 | Protein interaction prediction method |
CN112259157B (en) * | 2020-10-28 | 2023-10-03 | 杭州师范大学 | Protein interaction prediction method |
Also Published As
Publication number | Publication date |
---|---|
CN107784196B (en) | 2021-07-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107784196A (en) | Method based on Artificial Fish Swarm Optimization Algorithm identification key protein matter | |
Tan | Cascade ARTMAP: Integrating neural computation and symbolic knowledge processing | |
CN104156634B (en) | key protein identification method based on subcellular localization specificity | |
CN106874708B (en) | Using the method for the artificial bee colony optimization algorithm identification key protein matter for the mechanism of looking for food | |
CN105279397B (en) | A kind of method of key protein matter in identification of protein interactive network | |
CN109637579B (en) | Tensor random walk-based key protein identification method | |
CN108319812A (en) | A method of key protein matter is identified based on cuckoo searching algorithm | |
CN107885971A (en) | Using the method for improving flower pollination algorithm identification key protein matter | |
Ceccarelli | Behavioral mimicry in Myrmarachne species (Araneae, Salticidae) from North Queensland, Australia | |
CN109727637A (en) | Method based on shuffled frog leaping algorithm identification key protein matter | |
Brophy et al. | Otolith shape variation provides a marker of stock origin for north Atlantic bluefin tuna (Thunnus thynnus) | |
CN109816087B (en) | Strong convection weather discrimination method for rough set attribute reduction based on artificial fish swarm and frog swarm hybrid algorithm | |
CN109686403A (en) | Based on key protein matter recognition methods in uncertain protein-protein interaction network | |
CN108229643A (en) | A kind of method using drosophila optimization algorithm identification key protein matter | |
Lein et al. | Studying the evolution of social behaviour in one of Darwin’s Dreamponds: a case for the Lamprologine shell-dwelling cichlids | |
CN108804871A (en) | Key protein matter recognition methods based on maximum neighbours' subnet | |
Liu et al. | Simple primitives with feasibility-and contextuality-dependence for open-world compositional zero-shot learning | |
Xu et al. | Prdp: Person reidentification with dirty and poor data | |
Bertrand et al. | Reconstruction of ancestral genome subject to whole genome duplication, speciation, rearrangement and loss | |
Aslan | An Artificial Bee Colony-Guided Approach for Electro-Encephalography Signal Decomposition-Based Big Data Optimization | |
Cardoso et al. | Snake Species Identification Using Deep Convolutional Neural Networks | |
Carmona et al. | Mapping extinction risk in the global functional spectra across the tree of life | |
CN113254458A (en) | Intelligent diagnosis method for aquatic disease | |
Yu et al. | Knowledge-aware global reasoning for situation recognition | |
CN110400599A (en) | Method based on dove colony optimization algorithm identification key protein matter |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||