CN106874708A - Method for identifying essential proteins using an artificial bee colony optimization algorithm based on a foraging mechanism - Google Patents

Publication number
CN106874708A
Authority
CN
China
Prior art keywords
protein
node
nectar source
formula
honeybee
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710050587.0A
Other languages
Chinese (zh)
Other versions
CN106874708B (en
Inventor
雷秀娟
丁玉连
陆铖
代才
程适
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shaanxi Normal University
Original Assignee
Shaanxi Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shaanxi Normal University
Priority to CN201710050587.0A
Publication of CN106874708A
Application granted
Publication of CN106874708B
Legal status: Expired - Fee Related

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • General Physics & Mathematics (AREA)
  • Chemical & Material Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Analytical Chemistry (AREA)
  • Artificial Intelligence (AREA)
  • Genetics & Genomics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Peptides Or Proteins (AREA)

Abstract

The invention discloses a method for identifying essential proteins using an artificial bee colony optimization algorithm based on a foraging mechanism. The protein-protein interaction network is converted into an undirected graph; the gene expression values corresponding to the proteins are obtained; the edges and nodes of the network are preprocessed; a dynamic protein-protein interaction network is built; known essential proteins are chosen as nectar sources; employed bees search the neighborhood of the nectar sources; onlooker bees search the neighborhood of the employed bees; the nectar sources are updated; scout bees search globally for new nectar sources; the nectar sources are updated again; and the essential proteins are output. The method identifies essential proteins accurately. Simulation results show better performance on indices such as sensitivity, specificity, positive predictive value and negative predictive value. Compared with other essential-protein identification methods, it combines the optimization characteristics of the artificial bee colony with the features of the protein-protein interaction network and thereby improves identification accuracy.

Description

Method for identifying essential proteins using an artificial bee colony optimization algorithm based on a foraging mechanism
【Technical Field】
The invention belongs to the field of bioinformatics and relates to a method for identifying essential proteins in a dynamic protein-protein interaction network, and in particular to a method for identifying essential proteins using an artificial bee colony optimization algorithm based on a foraging mechanism.
【Background Art】
Essential proteins are proteins necessary for the survival and reproduction of an organism. The absence of an essential protein causes the related protein complexes to lose their function and makes the organism unable to survive. Because essential proteins play an important role in life activities, their prediction and identification has become an important research task. In biology, essential proteins are identified mainly by experimental methods such as single-gene knockout and conditional gene knockout. The results obtained by these techniques are clear and reliable, but the experiments are costly, inefficient and limited in their scope of application. Predicting essential proteins with computational biology methods has therefore become a new direction of development.
At present, computational identification of essential proteins relies mainly on two kinds of approach: topological centrality methods and heterogeneous data fusion methods.
The "centrality-lethality" rule proposed in 2001 states that the essentiality of a protein is closely related to the topological structure of the protein-protein interaction network: the removal of a protein with more neighbor nodes is more likely to affect the topology of the whole network and thus to be lethal. In other words, nodes of higher degree in the protein network tend to be essential. This theory became the basis of essential-protein identification from network topology. Many researchers subsequently proposed identification methods based on topological centrality, including degree centrality (Degree Centrality, DC), betweenness centrality (Betweenness Centrality, BC), closeness centrality (Closeness Centrality, CC), eigenvector centrality (Eigenvector Centrality, EC), information centrality (Information Centrality, IC) and subgraph centrality (Subgraph Centrality, SC). These methods compute a centrality value for every protein node in the protein-protein interaction network and use its magnitude to judge how likely the protein is to be essential. They depend heavily on the accuracy of the protein-protein interaction network. However, these networks are obtained from high-throughput biological experiments and contain many false positives, which greatly reduces the accuracy of essential-protein identification.
To address the shortcomings of identification based on topological centrality, researchers have proposed new methods that further improve identification accuracy. For example, the PeC method integrates the protein-protein interaction network with gene expression profiles, and the ION method combines orthology information of proteins with the protein-protein interaction network. Methods based on the edge clustering coefficient identify essential proteins by considering how a protein and its surrounding neighbors cluster. There are also methods that identify essential proteins by fusing other information, such as methods based on protein domains and methods based on gene co-expression.
In recent years, studies have pointed out that biological networks are markedly modular: protein networks contain a large number of protein complexes and functional modules. Hart et al. proposed that essentiality is a property of protein complexes and showed with experimental data that essential proteins tend to be concentrated in certain complexes. Zotenko et al. then proposed the concept of the essential complex module and pointed out that densely connected functional modules of the protein network, whose members share the same or similar biological functions, contain a large number of essential proteins. Many researchers have therefore proposed identification methods based on protein complexes and functional modules.
Although the identification of essential proteins is attracting increasing attention, the accuracy of current methods that incorporate network information is still low. Most methods analyze essential proteins with a small number of parameters or features, used in isolation or piecemeal, and lack a global view of the nodes. In addition, most current methods work on a static protein-protein interaction network, whereas the activity of proteins changes over the life cycle of the organism. Building a protein-protein interaction network that better simulates the dynamics of the living organism can therefore help to improve identification accuracy further.
In summary, the main defects of existing essential-protein identification methods are that they do not consider the dynamics of the protein-protein interaction network, that they consider only local features and ignore the global nature of the network, and that they ignore the false positives in protein-protein interaction data; as a result, identification accuracy is low.
【Summary of the Invention】
The object of the invention is to overcome the shortcomings and deficiencies of the prior art and to provide a method for identifying essential proteins using an artificial bee colony optimization algorithm based on a foraging mechanism, which realistically simulates the dynamics of the protein-protein interaction network and identifies essential proteins with high accuracy.
To achieve this object, the invention adopts the following technical solution:
The method for identifying essential proteins using an artificial bee colony optimization algorithm based on a foraging mechanism comprises the following steps:
(1) Convert the protein-protein interaction network into an undirected graph
The protein-protein interaction network is converted into an undirected graph G = (V, E), where V = {v_i, i = 1, 2, ..., n} is the set of nodes v_i and E is the set of edges e; a node v_i represents a protein and an edge e represents an interaction between proteins;
(2) Preprocess the edges and nodes of the protein-protein interaction network
Node preprocessing: the betweenness centrality of node v is calculated by formula (1):
$$BC(v) = \sum_{s \neq v \neq t} \frac{\rho(s, v, t)}{\rho(s, t)} \qquad \text{formula (1)}$$
where ρ(s, v, t) is the number of shortest paths between node s and node t in the protein-protein interaction network that pass through node v, and ρ(s, t) is the number of shortest paths between node s and node t;
The edge clustering coefficient is calculated by formula (2):
$$ECC(v_i, v_j) = \frac{Z(v_i, v_j)}{\min(d_i - 1,\; d_j - 1)} \qquad \text{formula (2)}$$
where Z(v_i, v_j) is the number of triangles that contain the edge (v_i, v_j), and d_i, d_j are the degrees of nodes v_i, v_j respectively;
The Pearson correlation coefficient of an edge is calculated by formula (3):
$$PCC(v_x, v_y) = \frac{\sum_{i=1}^{T} (x_i - \mu(x))(y_i - \mu(y))}{\sqrt{\sum_{i=1}^{T} (x_i - \mu(x))^2}\,\sqrt{\sum_{i=1}^{T} (y_i - \mu(y))^2}} \qquad \text{formula (3)}$$
where x_i, y_i are the gene expression values of proteins v_x, v_y at time point i, μ(x), μ(y) are the mean gene expression values of proteins v_x, v_y, and T is the last time point, i.e. the number of time points;
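The two edge measures of step (2) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the network is held in a hypothetical `adj` dictionary (node to set of neighbors), the edge clustering coefficient uses the commonly cited form Z / min(d_i − 1, d_j − 1), and both functions return 0 when the denominator vanishes, a convention the text does not specify.

```python
from math import sqrt

def edge_clustering_coefficient(adj, u, v):
    """ECC of edge (u, v): triangles on the edge over min(d_u - 1, d_v - 1)."""
    triangles = len(adj[u] & adj[v])      # common neighbours close a triangle
    denom = min(len(adj[u]) - 1, len(adj[v]) - 1)
    return triangles / denom if denom > 0 else 0.0   # assumed convention

def pearson(x, y):
    """Pearson correlation of two expression profiles of equal length."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0  # assumed convention

# Hypothetical toy network: triangle A-B-C with a pendant node D on B.
adj = {"A": {"B", "C"}, "B": {"A", "C", "D"}, "C": {"A", "B"}, "D": {"B"}}
print(edge_clustering_coefficient(adj, "A", "B"))
print(pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

Storing neighbors as sets keeps the triangle count a single set intersection, which matters when the measure is evaluated for every edge of a network with tens of thousands of interactions.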
(3) Build the dynamic protein-protein interaction network
At time point t, if the gene expression value GE_it of protein v_i is greater than the gene expression threshold AT(i), protein v_i is considered active at time point t; otherwise the node is considered inactive at time point t. The proteins that are active at the individual time points are pooled and mapped onto the original static protein-protein interaction network to form a new protein-protein interaction network, i.e. the dynamic protein network;
GE_it is the gene expression value of protein v_i at time point t;
The gene expression threshold AT(i) is obtained by formula (4):
$$AT(i) = \mu(i) + 3\sigma(i)\,(1 - F(i)) \qquad \text{formula (4)}$$
where μ(i) is the mean gene expression value of protein v_i, σ(i) is the standard deviation of its gene expression values, and F(i) = 1/(1 + σ²(i)) is a weight function;
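As a rough illustration of formula (4), the following Python sketch computes the threshold for one expression profile and lists the time points at which the protein counts as active. The function names are hypothetical, and the use of the population standard deviation (dividing by the number of time points) is an assumption, since the text does not say which estimator is meant.

```python
def active_threshold(profile):
    """AT = mu + 3 * sigma * (1 - F), with F = 1 / (1 + sigma^2)."""
    n = len(profile)
    mu = sum(profile) / n
    var = sum((x - mu) ** 2 for x in profile) / n   # population variance (assumed)
    sigma = var ** 0.5
    f = 1.0 / (1.0 + var)
    return mu + 3.0 * sigma * (1.0 - f)

def active_time_points(profile):
    """Indices t at which the expression value exceeds the threshold."""
    at = active_threshold(profile)
    return [t for t, value in enumerate(profile) if value > at]

# Hypothetical profile: flat expression with a single peak at the last time point.
profile = [0.0] * 9 + [10.0]
print(active_threshold(profile), active_time_points(profile))
```

The weight F(i) approaches 1 for a nearly constant profile, so the threshold collapses to the mean; for a strongly fluctuating profile F(i) approaches 0 and the threshold moves toward the mean plus three standard deviations.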
(4) Choose known essential proteins as nectar sources
Let N be the number of known essential proteins contained in the nectar sources. N essential proteins are selected at random from the currently known essential proteins as the prior-knowledge nectar sources. Ep_set denotes the set of proteins contained in the nectar sources; iter and maxiter denote the current iteration number and the maximum number of iterations respectively, with iter = 1 and maxiter ∈ [100, 800];
(5) Employed bees search the neighborhood of the nectar sources
The neighborhood of the nectar sources is the set niber_set1 of protein nodes that interact with nectar-source proteins, and each neighborhood node is regarded as an employed bee. The nectar-source profitability at the position of an employed bee, and hence the likelihood that the neighborhood node becomes a new nectar source, is determined by score1(i) = relevant(v_i, Ep_set), where score1(i) is the profitability at the current position of the employed bee, v_i is the protein node represented by the employed bee, and relevant is the degree of association between protein node v_i and the current nectar-source set Ep_set;
(6) Onlooker bees search the neighborhood of the employed bees
The neighborhood of employed bee v_i is the set niber_set2 of protein nodes that interact with the protein represented by the employed bee and are not in the current nectar-source set Ep_set. The onlooker bees receive the information from the employed bees and search their neighborhoods, i.e. they determine the likelihood that the current position becomes a new nectar source by score2(i) = fitness(v_i, niber_set2, Ep_set), where v_i is the protein node represented by the employed bee, niber_set2 is the set of neighborhood protein nodes of the employed bee, and fitness is the fitness of the current position as a nectar source;
(7) Update the nectar sources
The nodes in the protein node set niber_set1 are sorted in descending order of their score2 values. The node with the highest score2 is taken as the best nectar-source position g_best, and the node with the second-highest score2 as the suboptimal candidate nectar source s_best. If score2(g_best) - score2(s_best) > threshold thd, g_best is added to the set Ep_set as a new nectar source and the method returns to step (5); otherwise it proceeds to step (8). iter is incremented by 1;
(8) Scout bees search globally for a new nectar source
The scout bees calculate the betweenness centrality of the proteins in the protein-protein interaction network other than the nectar sources, sort these nodes in descending order of the betweenness centrality value BC, and select the node with the largest betweenness centrality as the best nectar-source position g_best;
(9) Update the nectar sources
The best nectar-source position g_best is added to the set Ep_set as a new nectar source;
(10) Output the essential proteins
If the value of iter is less than or equal to maxiter, the method returns to step (5); otherwise the proteins in the set Ep_set are output as the essential proteins.
Further, the degree of association relevant between protein node v_i and the current nectar-source set Ep_set in step (5) is obtained by formula (5), where v_j is a protein node in the nectar-source set Ep_set, ECC is the edge clustering coefficient of the edge between node v_i and node v_j obtained by formula (2), and PCC is the Pearson correlation coefficient of the edge between node v_i and node v_j obtained by formula (3).
Further, the fitness of the current position as a nectar source in step (6) is obtained by formula (6), where niber_set2 is the set of neighborhood protein nodes of employed bee v_i and Ep_set is the current nectar-source set.
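The control flow of steps (4) to (10) can be sketched as follows. This is one possible reading, not the patented implementation: the function name `abc_identify` is hypothetical, `fitness` is a caller-supplied stand-in because formula (6) is not reproduced in this text, `bc` is assumed to hold betweenness values computed beforehand, the iteration counter is assumed to advance once per pass, and falling back to the scout bees when fewer than two candidates exist is also an assumption.

```python
def abc_identify(adj, bc, seeds, fitness, thd, maxiter):
    """Sketch of steps (4)-(10).

    adj     : dict mapping protein -> set of interacting proteins
    bc      : dict mapping protein -> precomputed betweenness centrality
    seeds   : known essential proteins used as the initial nectar sources
    fitness : callable(v, niber_set2, ep_set) -> float, stand-in for formula (6)
    """
    ep_set = set(seeds)
    for _ in range(maxiter):
        # Step (5): employed bees sit on the neighbours of the nectar sources.
        niber_set1 = set().union(*(adj[p] for p in ep_set)) - ep_set
        # Step (6): onlooker bees score each employed bee through its own
        # neighbourhood outside the current nectar sources.
        score2 = {v: fitness(v, adj[v] - ep_set, ep_set) for v in niber_set1}
        # Step (7): accept g_best only if it beats s_best by more than thd.
        ranked = sorted(score2, key=score2.get, reverse=True)
        if len(ranked) >= 2 and score2[ranked[0]] - score2[ranked[1]] > thd:
            ep_set.add(ranked[0])
            continue
        # Steps (8)-(9): scout bees pick the highest-BC protein outside Ep_set.
        rest = [v for v in adj if v not in ep_set]
        if not rest:
            break
        ep_set.add(max(rest, key=lambda v: bc.get(v, 0.0)))
    return ep_set

# Hypothetical toy data; the fitness here is a placeholder, not formula (6).
adj = {"A": {"B", "C"}, "B": {"A", "C", "D"}, "C": {"A", "B"},
       "D": {"B", "E"}, "E": {"D"}}
bc = {"A": 0.0, "B": 4.0, "C": 0.0, "D": 3.0, "E": 0.0}
print(sorted(abc_identify(adj, bc, {"A"}, lambda v, n2, ep: len(n2), 0.5, 2)))
```

Passing the scoring function as a parameter keeps the loop independent of the exact form of formulas (5) and (6), so the same skeleton can be reused once those formulas are available.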
Compared with existing methods, the present invention has the following advantages:
1. The invention starts from prior knowledge of a subset of known essential proteins. Employed bees and onlooker bees search the neighbor nodes of the current nectar sources (essential proteins) and the neighbors of those neighbor nodes to complete the local prediction of essential proteins. This two-level search considers not only the local characteristics of the nectar-source nodes but also the local characteristics of the neighbors of their neighbor nodes, so it reflects the characteristics of essential proteins in the protein-protein interaction network better than current identification methods based on a one-level local search.
2. When the employed bees and onlooker bees fail to find an optimal essential protein locally, scout bees search globally to determine the optimal solution. The prediction therefore considers both the local and the global characteristics of essential proteins in the network, which addresses the inability of current essential-protein prediction to take the global nature of the network into account.
3. The invention simulates the foraging process of an artificial bee colony to identify essential proteins. It jointly considers the topological and dynamic properties of the protein-protein interaction network, the gene expression values of the proteins and prior knowledge, and adds the foraging optimization mechanism of the artificial bee colony. Using these features together makes the essential proteins identified by the invention more accurate than those identified by other current methods.
4. The results of the invention identify the essential proteins in a protein-protein interaction network effectively, provide a theoretical basis for researchers investigating the mechanisms of major diseases, disease treatment, disease prevention and new drug development, and help us understand the basic requirements for sustaining life. The identified essential proteins help researchers provide important information at the proteome and genome level for fields such as biology and medicine; their study helps in understanding the regulation of cell growth and is significant for the discovery of genetic diseases and the design of drug targets.
【Brief Description of the Drawings】
Fig. 1 is the process flow chart of Embodiment 1 of the present invention.
Fig. 2 is a partial schematic diagram of the essential proteins obtained with Embodiment 1 within the whole protein-protein interaction network.
Fig. 3 shows the essential proteins in the standard library corresponding to Fig. 2.
【Detailed Description】
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the invention.
As shown in Fig. 1, the method of the present invention for identifying essential proteins using an artificial bee colony optimization algorithm based on a foraging mechanism comprises the following steps:
(1) Convert the protein-protein interaction network into an undirected graph
The protein-protein interaction network is converted into an undirected graph G = (V, E), where V = {v_i, i = 1, 2, ..., n} is the set of nodes v_i and E is the set of edges e; a node v_i represents a protein and an edge e represents an interaction between proteins;
(2) Preprocess the edges and nodes of the protein-protein interaction network
Node preprocessing: the betweenness centrality of node v is calculated by formula (1):
$$BC(v) = \sum_{s \neq v \neq t} \frac{\rho(s, v, t)}{\rho(s, t)} \qquad \text{formula (1)}$$
where ρ(s, v, t) is the number of shortest paths between node s and node t in the protein-protein interaction network that pass through node v, and ρ(s, t) is the number of shortest paths between node s and node t;
The edge clustering coefficient is calculated by formula (2):
$$ECC(v_i, v_j) = \frac{Z(v_i, v_j)}{\min(d_i - 1,\; d_j - 1)} \qquad \text{formula (2)}$$
where Z(v_i, v_j) is the number of triangles that contain the edge (v_i, v_j), and d_i, d_j are the degrees of nodes v_i, v_j respectively;
The Pearson correlation coefficient of an edge is calculated by formula (3):
$$PCC(v_x, v_y) = \frac{\sum_{i=1}^{T} (x_i - \mu(x))(y_i - \mu(y))}{\sqrt{\sum_{i=1}^{T} (x_i - \mu(x))^2}\,\sqrt{\sum_{i=1}^{T} (y_i - \mu(y))^2}} \qquad \text{formula (3)}$$
where x_i, y_i are the gene expression values of proteins v_x, v_y at time point i, μ(x), μ(y) are the mean gene expression values of proteins v_x, v_y, and T is the last time point, i.e. the number of time points;
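For the betweenness centrality of formula (1), a compact way to count shortest paths on an unweighted network is Brandes' algorithm, sketched below in Python. The patent does not say how the value is computed, so the choice of algorithm, the `adj` dictionary representation and the counting of each unordered pair (s, t) once are all assumptions of this sketch; the resulting ranking does not depend on that last choice.

```python
from collections import deque

def betweenness(adj):
    """Betweenness centrality of every node of an undirected, unweighted graph."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate pair dependencies from the farthest nodes inwards.
        delta = dict.fromkeys(adj, 0.0)
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every unordered pair was visited from both endpoints, so halve the totals.
    return {v: value / 2.0 for v, value in bc.items()}

# Hypothetical toy network: a path A - B - C.
print(betweenness({"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}))
```

One breadth-first search per source node keeps the cost proportional to the number of nodes times the number of edges, which is practical for a network of a few thousand proteins.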
(3) Build the dynamic protein-protein interaction network
At time point t, if the gene expression value GE_it of protein v_i is greater than the gene expression threshold AT(i), protein v_i is considered active at time point t; otherwise the node is considered inactive at time point t. The proteins that are active at the individual time points are pooled and mapped onto the original static protein-protein interaction network to form a new protein-protein interaction network, i.e. the dynamic protein network;
GE_it is the gene expression value of protein v_i at time point t;
The gene expression threshold AT(i) is obtained by formula (4):
$$AT(i) = \mu(i) + 3\sigma(i)\,(1 - F(i)) \qquad \text{formula (4)}$$
where μ(i) is the mean gene expression value of protein v_i, σ(i) is the standard deviation of its gene expression values, and F(i) = 1/(1 + σ²(i)) is a weight function;
(4) Choose known essential proteins as nectar sources
Let N be the number of known essential proteins contained in the nectar sources. N essential proteins are selected at random from the currently known essential proteins as the prior-knowledge nectar sources. Ep_set denotes the set of proteins contained in the nectar sources; iter and maxiter denote the current iteration number and the maximum number of iterations respectively, with iter = 1 and maxiter ∈ [100, 800];
(5) Employed bees search the neighborhood of the nectar sources
The neighborhood of the nectar sources is the set niber_set1 of protein nodes that interact with nectar-source proteins, and each neighborhood node is regarded as an employed bee. The nectar-source profitability at the position of an employed bee, and hence the likelihood that the neighborhood node becomes a new nectar source, is determined by score1(i) = relevant(v_i, Ep_set), where score1(i) is the profitability at the current position of the employed bee, v_i is the protein node represented by the employed bee, and relevant is the degree of association between protein node v_i and the current nectar-source set Ep_set;
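The employed-bee step amounts to collecting niber_set1 and scoring each member. A minimal Python sketch follows; the function names are hypothetical, the exclusion of current nectar sources from niber_set1 is an assumption (the text states it explicitly only for niber_set2), and `relevant` is left as a caller-supplied function because formula (5) is not reproduced in this text.

```python
def employed_bee_neighbourhood(adj, ep_set):
    """niber_set1: proteins interacting with at least one nectar-source protein."""
    niber_set1 = set()
    for source in ep_set:
        niber_set1 |= adj[source]
    return niber_set1 - ep_set        # assumed: nectar sources are not candidates

def employed_bee_scores(adj, ep_set, relevant):
    """score1 for every employed bee, using a caller-supplied relevance function."""
    return {v: relevant(v, ep_set)
            for v in employed_bee_neighbourhood(adj, ep_set)}

# Hypothetical toy network and a placeholder relevance (count of nectar-source
# neighbours); the placeholder is NOT formula (5).
adj = {"A": {"B", "C"}, "B": {"A", "C", "D"}, "C": {"A", "B"}, "D": {"B"}}
print(employed_bee_scores(adj, {"A"}, lambda v, ep: len(adj[v] & ep)))
```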
(6) Onlooker bees search the neighborhood of the employed bees
The neighborhood of employed bee v_i is the set niber_set2 of protein nodes that interact with the protein represented by the employed bee and are not in the current nectar-source set Ep_set. The onlooker bees receive the information from the employed bees and search their neighborhoods, i.e. they determine the likelihood that the current position becomes a new nectar source by score2(i) = fitness(v_i, niber_set2, Ep_set), where v_i is the protein node represented by the employed bee, niber_set2 is the set of neighborhood protein nodes of the employed bee, and fitness is the fitness of the current position as a nectar source;
(7) Update the nectar sources
The nodes in the protein node set niber_set1 are sorted in descending order of their score2 values. The node with the highest score2 is taken as the best nectar-source position g_best, and the node with the second-highest score2 as the suboptimal candidate nectar source s_best. If score2(g_best) - score2(s_best) > threshold thd, g_best is added to the set Ep_set as a new nectar source and the method returns to step (5); otherwise it proceeds to step (8). iter is incremented by 1;
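The acceptance rule of step (7) can be illustrated in isolation. In this Python sketch the function name is hypothetical, `score2` is assumed to be a dictionary of already computed scores, and returning `None` when fewer than two candidates exist (so that the scout bees take over) is an assumption, since the text does not cover that case.

```python
def update_nectar_sources(score2, ep_set, thd):
    """Add g_best to ep_set if it beats s_best by more than thd.

    Returns the accepted protein, or None when the search should fall
    through to the scout bees of step (8).
    """
    ranked = sorted(score2, key=score2.get, reverse=True)   # descending score2
    if len(ranked) < 2:
        return None                     # assumed: too few candidates to compare
    g_best, s_best = ranked[0], ranked[1]
    if score2[g_best] - score2[s_best] > thd:
        ep_set.add(g_best)
        return g_best
    return None

# Hypothetical scores for three employed bees.
ep_set = {"A"}
print(update_nectar_sources({"B": 0.9, "C": 0.4, "D": 0.1}, ep_set, thd=0.3), ep_set)
```

Requiring a margin over the runner-up means a candidate is accepted locally only when it stands out clearly; near-ties are handed to the global search instead.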
(8) Scout bees search globally for a new nectar source
The scout bees calculate the betweenness centrality of the proteins in the protein-protein interaction network other than the nectar sources, sort these nodes in descending order of the betweenness centrality value BC, and select the node with the largest betweenness centrality as the best nectar-source position g_best;
(9) Update the nectar sources
The best nectar-source position g_best is added to the set Ep_set as a new nectar source;
(10) Output the essential proteins
If the value of iter is less than or equal to maxiter, the method returns to step (5); otherwise the proteins in the set Ep_set are output as the essential proteins.
In step (5) of the invention, the degree of association relevant between protein node v_i and the current nectar-source set Ep_set is obtained by formula (5), where v_j is a protein node in the nectar-source set Ep_set, ECC is the edge clustering coefficient of the edge between node v_i and node v_j obtained by formula (2), and PCC is the Pearson correlation coefficient of the edge between node v_i and node v_j obtained by formula (3);
In step (6) of the invention, the fitness of the current position as a nectar source is obtained by formula (6), where niber_set2 is the set of neighborhood protein nodes of employed bee v_i and Ep_set is the current nectar-source set.
The invention is described in more detail below by way of a specific embodiment:
Embodiment 1
Taking a protein network as an example, the steps of the method for identifying essential proteins using an artificial bee colony optimization algorithm based on a foraging mechanism are as follows:
This embodiment uses the yeast data set from the DIP database (DIP version 20140427) as the simulation data set; the DIP data contain 4995 proteins and 21554 interactions. The gene expression data set is the yeast metabolic expression data set GSE3431 from the GEO database, which contains 6777 genes with expression values at 36 time points over 3 cycles and covers 95% of the proteins in DIP. The essential-protein data are obtained by integrating the data of four databases, MIPS, SGD, DEG and SGDP, and contain 1167 essential proteins in total. The experimental platform is the Windows 7 operating system with an Intel Core 2 Duo 3.1 GHz processor and 4 GB of physical memory; the method of the invention is implemented in Matlab R2010b.
1. Convert the protein-protein interaction network into an undirected graph
The protein-protein interaction network containing 4995 proteins and 21554 interactions is converted into an undirected graph G = (V, E), where V = {v_i, i = 1, 2, ..., 4995} is the set of nodes v_i and E is the set of the 21554 edges e; a node v_i represents a protein and an edge e represents an interaction between proteins.
2. Preprocess the edges and nodes of the protein-protein interaction network
Node preprocessing: for each i = 1, 2, ..., 4995, the betweenness centrality of node v_i is calculated by formula (1):
$$BC(v) = \sum_{s \neq v \neq t} \frac{\rho(s, v, t)}{\rho(s, t)} \qquad \text{formula (1)}$$
where ρ(s, v, t) is the number of shortest paths between node s and node t in the protein-protein interaction network that pass through node v, and ρ(s, t) is the number of shortest paths between node s and node t. The edge clustering coefficient is calculated by formula (2):
$$ECC(v_i, v_j) = \frac{Z(v_i, v_j)}{\min(d_i - 1,\; d_j - 1)} \qquad \text{formula (2)}$$
where Z(v_i, v_j) is the number of triangles that contain the edge (v_i, v_j), and d_i, d_j are the degrees of nodes v_i, v_j respectively. The Pearson correlation coefficient of an edge is calculated by formula (3):
$$PCC(v_x, v_y) = \frac{\sum_{i=1}^{T} (x_i - \mu(x))(y_i - \mu(y))}{\sqrt{\sum_{i=1}^{T} (x_i - \mu(x))^2}\,\sqrt{\sum_{i=1}^{T} (y_i - \mu(y))^2}} \qquad \text{formula (3)}$$
where x_i, y_i are the gene expression values of proteins v_x, v_y at time point i, μ(x), μ(y) are the mean gene expression values of proteins v_x, v_y, and T is the last time point, i.e. the number of time points.
3. Build the dynamic protein-protein interaction network
At time point t, if the gene expression value GE_it of protein v_i is greater than the gene expression threshold AT(i), protein v_i is considered active at time point t; otherwise the node is considered inactive at time point t. The gene expression threshold AT(i) is obtained by formula (4):
$$AT(i) = \mu(i) + 3\sigma(i)\,(1 - F(i)) \qquad \text{formula (4)}$$
where μ(i) is the mean gene expression value of protein v_i, σ(i) is the standard deviation of its gene expression values, and F(i) = 1/(1 + σ²(i)) is a weight function. This processing determines whether each protein node is active at each time point. The proteins that are active at the individual time points are pooled and mapped onto the original static protein-protein interaction network; protein nodes that are not active at any time point, together with the edges connected to them, are deleted. The result is a new protein-protein interaction network with 3172 protein nodes and 10234 edges, i.e. the dynamic protein network.
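The pruning described above, dropping every protein that is never active together with its edges, can be sketched as follows. The function name and the `activity` input (protein to a list of per-time-point booleans, assumed to have been derived from the threshold test beforehand) are hypothetical, and treating proteins without expression data as inactive is an assumption, since the text does not say how the uncovered proteins are handled.

```python
def build_dynamic_network(adj, activity):
    """Keep proteins active at one or more time points and the edges among them.

    adj      : dict mapping protein -> set of interacting proteins (static network)
    activity : dict mapping protein -> iterable of booleans, one per time point
    """
    kept = {p for p in adj if any(activity.get(p, ()))}   # no data => inactive (assumed)
    return {p: adj[p] & kept for p in kept}

# Hypothetical toy data: C is never active and D has no expression profile.
adj = {"A": {"B"}, "B": {"A", "C", "D"}, "C": {"B"}, "D": {"B"}}
activity = {"A": [False, True], "B": [True, False], "C": [False, False]}
print(build_dynamic_network(adj, activity))
```

Intersecting each neighbor set with the kept proteins removes the dangling edges in the same pass, so the returned dictionary is again a consistent undirected graph.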
4th, known key protein matter is chosen as nectar source
It is the quantity of known key protein matter included in nectar source to make N, in 1167 key protein matter being currently known Randomly select nectar source of the N=100 key protein matter as priori;Ep_set represents the collection of the protein that nectar source includes Close, i.e., the 100 protein nodes chosen from known 1167 key proteins matter node at random;Iter, maxiter distinguish Represent current iteration number of times and maximum iteration, iter=1, matxiter ∈ [100,1200].
5. Employed bees search the nectar source neighborhood
The neighborhood of the nectar source is the set niber_set1 of protein nodes that interact with nectar source proteins, and each neighborhood node is regarded as an employed bee. The nectar yield at an employed bee's position, and hence the likelihood that this neighborhood node becomes a new nectar source, is determined by score1(i) = relevant(v_i, Ep_set), where score1(i) is the nectar yield at the current position of the employed bee, v_i is the protein node represented by the employed bee, and relevant is the degree of association between protein node v_i and the current nectar source set Ep_set. The degree of association is obtained from formula (5):
In the formula, v_j is a protein node in the nectar source set Ep_set, ECC is the edge clustering coefficient of the edge between nodes v_i and v_j, obtained from formula (2), and PCC is the Pearson correlation coefficient of the edge between nodes v_i and v_j, obtained from formula (3).
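The two edge measures named here can be sketched as below. Formulas (2), (3) and (5) are not reproduced in this text, so the sketch makes explicit assumptions: ECC is taken in its commonly used form Z(v_i, v_j) / min(d_i − 1, d_j − 1), PCC is the standard Pearson correlation of two expression profiles, and the way formula (5) combines them is deliberately left out. The function names and the convention of returning 0 for degenerate cases are also assumptions.

```python
import math

def edge_clustering_coefficient(adj, u, v):
    """ECC of edge (u, v); adj maps each node to the set of its neighbors.

    Assumed form: Z / min(d_u - 1, d_v - 1), where Z counts triangles on the edge.
    """
    z = len(adj[u] & adj[v])  # common neighbors = triangles containing (u, v)
    denom = min(len(adj[u]) - 1, len(adj[v]) - 1)
    return z / denom if denom > 0 else 0.0  # assumption: 0 when no triangle is possible

def pearson(x, y):
    """Pearson correlation coefficient of two equally long expression profiles."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same number of time points")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # assumption: treat a constant profile as uncorrelated
    return cov / math.sqrt(vx * vy)
```

Because the same normalization appears in the numerator and denominator, the Pearson value does not depend on whether the standard deviation is computed with T or T − 1.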
6. Onlooker bees search the employed bee neighborhood
The neighborhood of employed bee v_i is the set niber_set2 of protein nodes that interact with the protein represented by the employed bee and are not in the current nectar source set Ep_set. The onlooker bees receive the information from the employed bees and search their neighborhoods; that is, an onlooker bee uses score2(i) = fitness(v_i, niber_set2, Ep_set) to determine the likelihood that the current position becomes a new nectar source, where v_i is the protein node represented by the employed bee, niber_set2 is the set of neighborhood protein nodes of the employed bee, and fitness is the fitness of the current position as a nectar source, obtained from formula (6):
In the formula, niber_set2 denotes the neighborhood protein nodes of employed bee v_i, and Ep_set denotes the current nectar source.
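The two neighborhood sets used in steps 5 and 6 can be built from an edge list as sketched below. The helper names are hypothetical. Excluding nodes already in Ep_set from niber_set1 is an assumption (the text states this exclusion only for niber_set2), and the scoring functions of formulas (5) and (6) are not shown because those formulas are not reproduced in this text.

```python
def build_adjacency(edges):
    """Undirected adjacency map: node -> set of interacting nodes."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def nectar_neighborhood(adj, ep_set):
    """niber_set1: nodes that interact with one or more nectar source proteins."""
    neighbors = set()
    for protein in ep_set:
        neighbors |= adj.get(protein, set())
    return neighbors - ep_set  # assumption: current nectar sources are not candidates

def bee_neighborhood(adj, v, ep_set):
    """niber_set2: nodes that interact with v and are not in Ep_set."""
    return adj.get(v, set()) - ep_set
```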
7. Update the nectar source
The nodes in the set niber_set1 are sorted in descending order of their score2 values. The node with the highest score2 is taken as the optimal nectar source position g_best, and the node with the second-highest score2 as the suboptimal candidate nectar source s_best. If score2(g_best) − score2(s_best) > threshold thd, g_best is added to the set Ep_set as a new nectar source and the method returns to step (5); otherwise it proceeds to step (8). iter is incremented by 1.
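The acceptance test of this step can be sketched as below, assuming the score2 values have already been computed and are supplied as a dictionary. The function name is hypothetical, and the handling of fewer than two candidates is an assumption, since the text does not cover that case.

```python
def update_nectar_source(score2, ep_set, thd):
    """Add g_best to ep_set if it beats s_best by more than thd.

    score2: dict mapping each candidate in niber_set1 to its score2 value.
    Returns True if a new nectar source was added (go back to step 5),
    False otherwise (hand over to the global search in step 8).
    """
    ranked = sorted(score2, key=score2.get, reverse=True)  # descending by score2
    if len(ranked) < 2:
        return False  # assumption: no second candidate to compare against
    g_best, s_best = ranked[0], ranked[1]
    if score2[g_best] - score2[s_best] > thd:
        ep_set.add(g_best)
        return True
    return False
```

The margin test means a candidate is accepted only when it clearly stands out from the runner-up; otherwise the decision is deferred to the global search.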
8. Scout bees search globally for a new nectar source
The scout bees compute the betweenness centrality of the proteins in the protein-protein interaction network other than the nectar sources. All nodes are then sorted in descending order of the betweenness centrality value BC obtained from formula (1), and the node with the largest betweenness centrality is selected as the optimal nectar source position g_best.
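A minimal sketch of this global search follows. It assumes formula (1) is the usual betweenness centrality, i.e. the sum over node pairs (s, t) of ρ(s, v, t) / ρ(s, t), counted here once per unordered pair, and it computes that quantity with Brandes' accumulation scheme, which is a swapped-in technique and not necessarily what the inventors used. The function names are hypothetical, and the adjacency map must be symmetric and contain every node as a key.

```python
from collections import deque

def betweenness(adj):
    """Betweenness centrality of every node in an undirected, unweighted graph."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        preds = {v: [] for v in adj}
        sigma[s], dist[s] = 1, 0
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate dependencies from the farthest nodes toward s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every unordered pair was visited from both endpoints, so halve the totals.
    return {v: score / 2.0 for v, score in bc.items()}

def scout_best(adj, ep_set):
    """g_best for step 8: the non-source node with the largest betweenness."""
    bc = betweenness(adj)
    candidates = [v for v in adj if v not in ep_set]
    return max(candidates, key=bc.get) if candidates else None
```

The betweenness values depend only on the network, so in a full implementation they could be computed once during preprocessing and reused in every global search.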
9. Update the nectar source
g_best is added to the set Ep_set as a new nectar source.
10. Output the key proteins
If iter is less than or equal to maxiter, return to step (5); otherwise, output the proteins in the set Ep_set as key proteins.
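The control flow of steps 5 to 10 can be summarized in a short skeleton. This is a simplified sketch under stated assumptions: the local search of steps 5 to 7 and the global search of step 8 are passed in as hypothetical callables, the iteration limit is checked on every pass (the text checks it only in step 10), and the loop stops early if neither search returns a candidate.

```python
def forage(ep_set, maxiter, local_step, scout_step):
    """Grow the nectar source set until maxiter iterations have been used.

    local_step(ep_set) -> protein accepted by steps 5-7, or None if the
        margin test of step 7 fails.
    scout_step(ep_set) -> protein chosen by the global search of step 8,
        or None if no candidate remains.
    """
    ep_set = set(ep_set)  # work on a copy of the initial nectar sources
    iteration = 1
    while iteration <= maxiter:
        new_source = local_step(ep_set)
        if new_source is None:
            new_source = scout_step(ep_set)  # fall back to the global search
        if new_source is None:
            break  # assumption: stop when nothing is left to add
        ep_set.add(new_source)  # steps 7 and 9: merge into Ep_set
        iteration += 1
    return ep_set  # step 10: output as the identified key proteins
```

In this sketch each iteration adds exactly one protein, so the size of the output is bounded by the initial set size plus maxiter.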
In order to verify the beneficial effects of the present invention, the inventors applied the key protein identification method of Embodiment 1, which uses the artificial bee colony optimization algorithm, to the protein network in the DIP database and analyzed the top 600 identified key proteins. The results are shown in Table 1, Fig. 2 and Fig. 3. Table 1 compares the accuracy of the identification results with those of other current methods for identifying key proteins. Fig. 2 shows the distribution in the network of some of the key proteins identified by the present invention, and Fig. 3 shows the corresponding part of the standard library.
Table 1. Comparison of accuracy between the key proteins identified by the present invention and those identified by other methods
Table 2 shows the accuracy obtained by comparing the top 600 key proteins identified by the present invention with the key proteins in the standard library, together with a comparison against the results of other current key protein identification methods. Compared with the six traditional centrality methods, the accuracy of the top 600 key proteins identified by the present invention is better than that of all six. Compared with the more recent LAC and NC methods, the accuracy of the top 400 key proteins identified by the present invention is considerably higher. Table 2 thus indicates that the present invention can identify key proteins effectively, with particularly high accuracy in the top-ranked part of the results. Fig. 2 shows the positions in the protein-protein interaction network of some of the key proteins identified by the present invention. In Fig. 2, nodes without a background color are key proteins correctly identified by the present invention, nodes with a dark background are non-key proteins, and nodes with a light background are proteins wrongly identified as key proteins. Fig. 3 shows the key proteins in the standard library corresponding to Fig. 2. Comparing Fig. 2 with Fig. 3, the proteins wrongly identified by the present invention are "YGL163W" and "YLR191W", and the key protein that was missed is "YBR103W". When the core is a prior-knowledge key protein, the method of the present invention can correctly identify most of the key proteins surrounding it.
The above is the preferred embodiment of the present invention. Based on the above description, persons skilled in the art can make various improvements and substitutions without departing from the technical principle of the present invention, and such improvements and substitutions shall also be regarded as falling within the protection scope of the present invention.

Claims (3)

1. the method for key protein matter being recognized using the artificial bee colony optimized algorithm of mechanism of looking for food, it is characterised in that including following step Suddenly:
(1) protein-protein interaction network is converted into non-directed graph
Protein-protein interaction network is changed into a non-directed graph G=(V, E), wherein, V={ vi, i=1,2 ..., n } it is knot Point viSet, E for side e set, node viProtein is represented, side e represents the interaction between protein;
(2) Preprocess the edges and nodes of the protein-protein interaction network
Preprocessing of node v_i: the betweenness centrality of node v_i is calculated by formula (1):
In the formula, ρ(s, v, t) is the number of shortest paths between node s and node t in the protein-protein interaction network that pass through node v, and ρ(s, t) is the number of shortest paths between node s and node t in the protein-protein interaction network;
The clustering coefficient of an edge is calculated by formula (2):
In the formula, Z(v_i, v_j) is the number of triangles containing the edge (v_i, v_j), and d_i and d_j are the degrees of nodes v_i and v_j, respectively;
The Pearson correlation coefficient of an edge is calculated by formula (3):
In the formula, x_i and y_i are the gene expression values of proteins v_x and v_y at time point t, μ(x) and μ(y) are the mean gene expression values of proteins v_x and v_y, and T is the maximum time point;
(3) Construct the dynamic protein-protein interaction network
At time point t, if the gene expression value GE_it of protein v_i is greater than the gene expression threshold AT(i), protein v_i is considered active at time point t; otherwise the node is considered inactive at time point t. The active proteins from all time points are combined and mapped onto the original static protein-protein interaction network to form a new protein-protein interaction network, i.e. the dynamic protein network;
GE_it is the gene expression value of protein v_i at time point t;
The gene expression threshold AT(i) is obtained from formula (4):
AT(i) = μ(i) + 3σ(i)(1 − F(i))    formula (4)
In the formula, μ(i) is the mean gene expression value of protein v_i, σ(i) is the standard deviation of its gene expression values, and F(i) = 1/(1 + σ²(i)) is a weight function;
(4) Select known key proteins as nectar sources
Let N be the number of known key proteins contained in the nectar source; N key proteins are selected at random from the currently known key proteins as the prior-knowledge nectar source. Ep_set denotes the set of proteins contained in the nectar source. iter and maxiter denote the current iteration number and the maximum number of iterations, respectively, with iter = 1 and maxiter ∈ [100, 800];
(5) Employed bees search the nectar source neighborhood
The neighborhood of the nectar source is the set niber_set1 of protein nodes that interact with nectar source proteins, and each neighborhood node is regarded as an employed bee. The nectar yield at an employed bee's position, and hence the likelihood that this neighborhood node becomes a new nectar source, is determined by score1(i) = relevant(v_i, Ep_set), where score1(i) is the nectar yield at the current position of the employed bee, v_i is the protein node represented by the employed bee, and relevant is the degree of association between protein node v_i and the current nectar source set Ep_set;
(6) Onlooker bees search the employed bee neighborhood
The neighborhood of employed bee v_i is the set niber_set2 of protein nodes that interact with the protein represented by the employed bee and are not in the current nectar source set Ep_set. The onlooker bees receive the information from the employed bees and search their neighborhoods; that is, an onlooker bee uses score2(i) = fitness(v_i, niber_set2, Ep_set) to determine the likelihood that the current position becomes a new nectar source, where v_i is the protein node represented by the employed bee, niber_set2 is the set of neighborhood protein nodes of the employed bee, and fitness is the fitness of the current position as a nectar source;
(7) Update the nectar source
The nodes in the protein node set niber_set1 are sorted in descending order of their score2 values. The node with the highest score2 is taken as the optimal nectar source position g_best, and the node with the second-highest score2 as the suboptimal candidate nectar source s_best. If score2(g_best) − score2(s_best) > threshold thd, g_best is added to the set Ep_set as a new nectar source and the method returns to step (5); otherwise it proceeds to step (8). iter is incremented by 1;
(8) Scout bees search globally for a new nectar source
The scout bees compute the betweenness centrality of the proteins in the protein-protein interaction network other than the nectar sources. All nodes are then sorted in descending order of the betweenness centrality value BC, and the node with the largest betweenness centrality is selected as the optimal nectar source position g_best;
(9) Update the nectar source
The optimal nectar source position g_best is added to the set Ep_set as a new nectar source;
(10) Output the key proteins
If iter is less than or equal to maxiter, return to step (5); otherwise, output the proteins in the set Ep_set as key proteins.
2. The method for identifying key proteins using an artificial bee colony optimization algorithm with a foraging mechanism as claimed in claim 1, characterized in that the degree of association relevant between protein node v_i and the current nectar source set Ep_set in step (5) is obtained from formula (5):
In the formula, v_j is a protein node in the nectar source set Ep_set, ECC is the edge clustering coefficient of the edge between nodes v_i and v_j, obtained from formula (2), and PCC is the Pearson correlation coefficient of the edge between nodes v_i and v_j, obtained from formula (3).
3. The method for identifying key proteins using an artificial bee colony optimization algorithm with a foraging mechanism as claimed in claim 1, characterized in that the fitness of the current position as a nectar source in step (6) is obtained from formula (6):
In the formula, niber_set2 denotes the set of neighborhood protein nodes of employed bee v_i, and Ep_set denotes the current nectar source set.
CN201710050587.0A 2017-01-23 2017-01-23 Method for identifying key proteins using an artificial bee colony optimization algorithm with a foraging mechanism Expired - Fee Related CN106874708B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710050587.0A CN106874708B (en) 2017-01-23 2017-01-23 Method for identifying key proteins using an artificial bee colony optimization algorithm with a foraging mechanism

Publications (2)

Publication Number Publication Date
CN106874708A true CN106874708A (en) 2017-06-20
CN106874708B CN106874708B (en) 2018-06-22

Family

ID=59157870

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710050587.0A Expired - Fee Related CN106874708B (en) 2017-01-23 2017-01-23 Method for identifying key proteins using an artificial bee colony optimization algorithm with a foraging mechanism

Country Status (1)

Country Link
CN (1) CN106874708B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102779241A (en) * 2012-07-06 2012-11-14 陕西师范大学 PPI (Point-Point Interaction) network clustering method based on artificial swarm reproduction mechanism

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
GUOPU ZHU等: ""Gbest-guided artificial bee colony algorithm for numerical function optimization"", 《APPLIED MATHEMATICS AND COMPUTATION》 *
XIUJUAN LEI等: ""Detecting protein complexes from DPINs by density based clustering with Pigeon-Inspired Optimization Algorithm"", 《SCIENCE CHINA INFORMATION SCIENCES》 *
XIUJUAN LEI等: ""Identification of dynamic protein complexes based on fruit fly optimization algorithm"", 《KNOWLEDGE-BASED SYSTEMS》 *
田建芳等: ""基于蜂群和广度优先遍历的PPT网络聚类"", 《模式识别与人工智能》 *
秦全德等: ""人工蜂群算法研究综述"", 《智能系统学报》 *
雷秀娟等: ""蛋白质相互作用网络的蜂群信息流聚类模型与算法"", 《计算机学报》 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107609341A (en) * 2017-08-16 2018-01-19 天津师范大学 Based on shortest path from global interactions between protein network extraction sub-network method and system
CN107885971A (en) * 2017-10-30 2018-04-06 陕西师范大学 Using the method for improving flower pollination algorithm identification key protein matter
CN108229643A (en) * 2018-02-05 2018-06-29 陕西师范大学 A kind of method using drosophila optimization algorithm identification key protein matter
CN108229643B (en) * 2018-02-05 2022-04-29 陕西师范大学 Method for identifying key protein by using drosophila optimization algorithm
CN109376842A (en) * 2018-08-20 2019-02-22 安徽大学 A kind of functional module method for digging based on attribute optimization protein network
CN109376842B (en) * 2018-08-20 2022-04-05 安徽大学 Functional module mining method based on attribute optimization protein network
CN111312330A (en) * 2020-02-13 2020-06-19 兰州理工大学 Key protein identification method and system based on protein node characteristics

Also Published As

Publication number Publication date
CN106874708B (en) 2018-06-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20180622

Termination date: 20210123