CN108009403A - Protein complex identification method based on multi-source data fusion and multi-objective optimization

Info

Publication number: CN108009403A
Application number: CN201711190016.3A
Authority: CN
Inventors: 朱媛; 彭晓宇; 吴崇
Original assignee: China University of Geosciences
Current assignee: China University of Geosciences
Priority date: 2017-11-24
Filing date: 2017-11-24
Publication date: 2018-05-08

Abstract

The invention discloses a protein complex identification method based on multi-source data fusion and multi-objective optimization, comprising: pre-processing protein-protein interaction network data to obtain an adjacency matrix; performing preliminary clustering of protein complexes to obtain initial protein complex modules; further optimizing the complex modules, wherein the topological structure characteristics of the protein interaction network data and the functional similarity characteristics of GO annotation data are fused during optimization, and the optimization is carried out within an adaptive multi-objective black hole optimization algorithm framework, to obtain more accurate protein complex modules; and performing post-processing to obtain the final optimal protein complexes. The invention improves the speed and accuracy of protein complex identification, is applicable to protein-protein interaction networks and can also be extended to the analysis of other complex community networks, and is highly practical for complex network analysis.

Description

Protein complex identification method based on multi-source data fusion and multi-objective optimization

Technical field

The present invention relates to the field of bioinformatics, and more particularly to a protein complex identification method based on multi-source data fusion and multi-objective optimization.

Background art

Proteins are the products of gene expression, the executors of the physiological functions of organisms, and the direct embodiment of biological phenomena. Proteomics is the discipline that systematically studies the characteristics of proteins, and it can provide a detailed description of the structure, function and regulation of biological systems in healthy and diseased states. Almost all biological processes are carried out through a series of protein interactions. From the perspective of systems biology, using protein-protein interaction networks to study and analyze biological function has important prospects and practical value.

A protein complex is a set of proteins that interact at the same time and place to form a multi-molecular machine; it is the principal form in which proteins perform their functions. Identifying protein complexes not only helps in understanding complex life activities, but also provides theoretical support for uncovering the mechanisms by which complex diseases form and for rational drug development. The development of high-throughput experimental techniques and proteomics has made it possible to use network-theoretic methods to investigate protein function and interaction relationships and to uncover the mechanisms of complex diseases. Numerous studies show that protein networks (the interaction relationships among all proteins in an organism) have an obvious modular structure, and that these structures usually correspond to protein complexes; identifying protein complexes from protein networks can therefore improve efficiency and guide biological experiments. However, protein interaction data obtained by high-throughput sequencing techniques often have high false-positive and false-negative rates, so relying on protein interaction data alone degrades the precision of protein complex identification.

With the development of biotechnology, multi-source biological data continue to emerge, such as protein-protein interaction (PPI) data, Gene Ontology (GO) data, time-series RNA-seq data, time-series gene expression data, subcellular localization information, and disease-related databases. Improving the accuracy of protein complex identification by integrating multi-source data has therefore become a research direction that attracts much attention.

Summary of the invention

In view of this, embodiments of the present invention provide a fusion strategy for multi-source biological information, combined with a multi-objective optimization method for identifying protein complexes, so as to realize a protein complex identification method based on multi-source data fusion and multi-objective optimization that identifies and predicts protein complexes.

An embodiment of the present invention provides a protein complex identification method based on multi-source data fusion and multi-objective optimization, comprising the following steps:

S1. Treat the protein-protein interaction network as a connected graph and pre-process it to obtain an adjacency matrix;

S2. Cluster all protein nodes in the adjacency matrix to obtain preliminary protein clustering modules;

S3. On the basis of the preliminary protein clustering modules, further optimize each clustering module. During optimization, the topological structure characteristics of the protein interaction network data are fused with the functional similarity characteristics of the GO (Gene Ontology) annotation data, and the optimization is carried out within an adaptive multi-objective black hole optimization algorithm framework: each protein module is regarded as a black hole and each protein node as a star; the center of a black hole is the cluster center of an initial coarse clustering module; the black holes are continually updated by selecting and deleting new stars that differ from the original individuals; the fitness values of the new black hole and of the black hole containing the original star are calculated and compared; and if the fitness value of the new black hole is better than that of the original black hole, the original black hole is replaced by the newly generated black hole, thereby obtaining protein complex modules;

S4. Perform post-processing: in each protein complex module, remove the isolated nodes that have no connecting edge to any other protein node, and remove all protein complex modules whose size is less than 3; the protein complex modules obtained after this processing are the optimal protein complexes identified by the method.
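The post-processing of step S4 can be illustrated with a minimal Python sketch. It assumes that a module is a set of node indices into the adjacency matrix and that "isolated" means having no edge to any other node of the same module; the function name `postprocess` and the toy data are hypothetical.

```python
def postprocess(modules, adj, min_size=3):
    """Drop isolated nodes from each module, then drop modules smaller than min_size."""
    kept = []
    for module in modules:
        # Keep only nodes that have at least one edge to another node of the same module.
        connected = {i for i in module if any(adj[i][j] for j in module if j != i)}
        if len(connected) >= min_size:
            kept.append(connected)
    return kept


# Toy network: a triangle (0, 1, 2) plus a separate edge (3, 4).
adj = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
]
final_modules = postprocess([{0, 1, 2, 3}, {3, 4}], adj)
print(final_modules)
```

In this toy run node 3 is removed from the first module because it has no edge into it, and the second module is discarded because it has fewer than 3 nodes.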

Further, in step S1, the adjacency matrix is obtained as follows:

S1.1. Obtain a protein interaction database;

S1.2. Remove redundant proteins, retain the list of interacting protein pairs, and take the union of its two columns to obtain the complete protein list;

S1.3. Process the result with MATLAB simulation software to obtain the adjacency matrix.
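Steps S1.1 to S1.3 can be sketched as follows. The method itself uses MATLAB; this Python version is only an illustration, and the function name `build_adjacency` and the protein identifiers are hypothetical. It assumes that redundancy means self-interactions and repeated pairs.

```python
def build_adjacency(pairs):
    """Build the protein list and a symmetric 0/1 adjacency matrix from interaction pairs."""
    # Drop self-interactions and repeated interactions (pair order is ignored).
    edges = {frozenset(pair) for pair in pairs if pair[0] != pair[1]}
    # The union of the two columns gives the complete protein list.
    proteins = sorted({name for edge in edges for name in edge})
    index = {name: i for i, name in enumerate(proteins)}
    n = len(proteins)
    adj = [[0] * n for _ in range(n)]
    for edge in edges:
        a, b = tuple(edge)
        adj[index[a]][index[b]] = 1
        adj[index[b]][index[a]] = 1
    return proteins, adj


pairs = [("P1", "P2"), ("P2", "P1"), ("P2", "P3"), ("P3", "P3")]
proteins, adj = build_adjacency(pairs)
print(proteins)
print(adj)
```

Row and column `k` of the matrix belong to `proteins[k]`, which keeps the one-to-one correspondence between the protein list and the matrix needed for later lookup.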

Further, in step S3, further optimizing each clustering module on the basis of the preliminary protein clustering modules specifically comprises: a black hole initialization strategy, a multi-source data fusion strategy, and a star addition and deletion strategy.

Further, the black hole initialization strategy is specifically as follows: drawing on the idea of bi-clustering algorithms, K-means clustering is performed on the rows and on the columns of the adjacency matrix respectively; that is, K cluster centers are selected among all nodes of the protein-protein interaction network, the distance from each remaining node to each cluster center is compared, and each node is assigned to the module of its nearest cluster center, yielding K initial modules.
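A minimal sketch of this initialization is given below, assuming plain Lloyd-style K-means with squared Euclidean distance on the rows of the adjacency matrix. Only rows are clustered here because, for a symmetric adjacency matrix, the columns are the same vectors. The function name `kmeans_rows` and the choice of seed nodes are hypothetical.

```python
def kmeans_rows(adj, seeds, iters=10):
    """Cluster the rows of an adjacency matrix; `seeds` lists the nodes used as initial centers."""
    centers = [[float(x) for x in adj[s]] for s in seeds]
    labels = [0] * len(adj)
    for _ in range(iters):
        # Assign every node to its nearest cluster center.
        for i, row in enumerate(adj):
            dists = [sum((x - c) ** 2 for x, c in zip(row, center)) for center in centers]
            labels[i] = dists.index(min(dists))
        # Recompute each center as the mean row of its members.
        for c in range(len(centers)):
            members = [adj[i] for i in range(len(adj)) if labels[i] == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Two disjoint triangles: nodes 0-2 and nodes 3-5.
adj = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
labels = kmeans_rows(adj, seeds=[0, 3])
print(labels)
```

Each distinct label corresponds to one initial module, that is, one initial black hole.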

Further, the multi-source data fusion strategy is mainly as follows: the topological structure characteristics of the protein interaction network are combined with the functional characteristics of the GO annotation data; by integrating different types of genomic data with the protein interaction data, they serve as the objective functions of the multi-objective optimization strategy and guide the iterative search for the optimal solution.

Further, the objective functions of the multi-objective optimization strategy are set as follows: combining the topological structure characteristics of the protein interaction network with the functional characteristics of the GO annotation data, suitable objective functions are selected while taking full account of the substitutability among the objectives.

Further, the objective functions of the multi-objective optimization strategy are density, node closeness centrality, outward interaction, and functional similarity.

Density is expressed as:

$$\mathrm{Density}(S) = \frac{2 \times N_v}{N_v \times (N_v - 1)}$$

where $N_v$ denotes the number of protein nodes in the protein complex module. Maximizing the density of a protein complex module ensures that the modules obtained by clustering are internally compact and tightly connected;
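For illustration, the density objective can be sketched in Python. The sketch assumes the conventional graph density, in which the numerator counts the interaction edges inside the module (written `n_e` below, a name not used in the text) rather than nodes; the function name `density` is hypothetical.

```python
def density(module, adj):
    """Conventional graph density of a module: 2 * edges / (nodes * (nodes - 1))."""
    nodes = sorted(module)
    n_v = len(nodes)
    if n_v < 2:
        return 0.0
    # Count each undirected edge inside the module once.
    n_e = sum(adj[a][b] for i, a in enumerate(nodes) for b in nodes[i + 1:])
    return 2 * n_e / (n_v * (n_v - 1))


triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
print(density({0, 1, 2}, triangle), density({0, 1, 2}, path))
```

Under this assumption a fully connected module scores 1, and sparser modules score proportionally less.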

Node closeness centrality is expressed as:

$$\mathrm{Closeness\ centrality}(v_i) = \frac{N - 1}{\sum_{j=1,\, j \neq i}^{N} d_{ij}}$$

where the closeness of protein node $v_i$ is the reciprocal of the sum of its shortest distances $d_{ij}$, multiplied by the number of other nodes. Maximizing node closeness brings all nodes of the same cluster closer to the cluster center;
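A small Python sketch of this quantity follows, assuming unweighted edges so that the shortest distances $d_{ij}$ can be obtained by breadth-first search. Returning 0 when some node is unreachable is a convention chosen for this sketch, and the function name `closeness` is hypothetical.

```python
from collections import deque


def closeness(adj, i):
    """Closeness centrality of node i: (N - 1) divided by the sum of shortest distances."""
    n = len(adj)
    dist = {i: 0}
    queue = deque([i])
    while queue:
        u = queue.popleft()
        for v in range(n):
            if adj[u][v] and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    if len(dist) < n:
        return 0.0  # assumption: a node that cannot reach every other node scores 0
    total = sum(dist.values())
    return (n - 1) / total if total else 0.0


path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
print(closeness(path, 1), closeness(path, 0))
```

On the three-node path the middle node reaches both neighbors in one step, so it has the highest closeness.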

Outward interaction is expressed as:

$$\mathrm{Outward\ Interaction} = \sum_{n_i \in S} \frac{|N_{n_i}|}{\mathrm{degree}(n_i)}$$

where $|N_{n_i}|$ denotes the number of nodes directly connected to node $n_i$ of the protein complex module. Minimizing outward interaction ensures that as few bridging proteins as possible connect different protein complex modules, so that different clusters are as independent as possible;
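The following Python sketch illustrates one reading of this objective. It assumes that $|N_{n_i}|$ counts the neighbors of $n_i$ lying outside module $S$, which is the reading under which minimizing the sum reduces links between modules; the text does not state this explicitly. The function name `outward_interaction` is hypothetical.

```python
def outward_interaction(module, adj):
    """Sum over module nodes of (neighbors outside the module) / (degree of the node)."""
    total = 0.0
    for i in module:
        neighbors = [j for j, linked in enumerate(adj[i]) if linked]
        if not neighbors:
            continue  # a node of degree 0 contributes nothing
        outside = [j for j in neighbors if j not in module]
        total += len(outside) / len(neighbors)
    return total


# Triangle (0, 1, 2) with one extra node 3 attached to node 2.
adj = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
score = outward_interaction({0, 1, 2}, adj)
print(score)
```

Only node 2 has a neighbor outside the module, and that neighbor is one of its three, so the module scores 1/3.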

In a protein-protein interaction network, proteins belonging to the same protein complex have similar functional characteristics. A random-walk probability is added on top of the standard feature similarity formula; the random-walk probability is calculated as:

$$\mathrm{RWC}(v_0, v_1) = \sum_{i,j \in \zeta} W_\infty^{v_0}(i)\, W_\infty^{v_1}(j)\, \mathrm{sim}(i, j)$$

where $\zeta$ denotes all known leaf nodes, and $W_\infty^{v_0}(i)$ and $W_\infty^{v_1}(j)$ denote the random-walk probabilities at infinite time for node $i$ and node $j$, taken from $v_0$ and $v_1$ respectively;
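The double sum in the random-walk formula can be written out directly. In the sketch below the limiting walk probabilities and the pairwise similarity table are supplied as plain dictionaries; how they are obtained is outside the sketch, and the leaf-term names and numbers are invented for illustration only.

```python
def rwc(w0, w1, sim):
    """Sum of W0(i) * W1(j) * sim(i, j) over all pairs of leaf terms i, j."""
    return sum(w0[i] * w1[j] * sim[i][j] for i in w0 for j in w1)


# Hypothetical limiting probabilities over two leaf terms "a" and "b".
w_v0 = {"a": 0.5, "b": 0.5}
w_v1 = {"a": 1.0, "b": 0.0}
sim = {"a": {"a": 1.0, "b": 0.2}, "b": {"a": 0.2, "b": 1.0}}
value = rwc(w_v0, w_v1, sim)
print(value)
```

Here the result is 0.5 × 1.0 × 1.0 + 0.5 × 1.0 × 0.2 = 0.6, since the terms weighted by a zero probability vanish.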

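The standard feature similarity formula is:

$$\mathrm{sim}(s) = \frac{\sum_{v_0 = 1,\, v_1 = 1,\, v_0 \neq v_1}^{p} w(n_{v_0}, n_{v_1})}{p}$$

where $s$ contains the child nodes $n_{v_0}$ and $n_{v_1}$, the value range of E is [0, 1], and the weight is denoted by $w$;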

Functional similarity is expressed as：

Further, the star addition and deletion strategy is specifically as follows: stars are regarded as protein nodes, and addition and deletion operations are applied; during addition and deletion, the black hole selects surrounding stars to absorb or remove.

Further, the adaptive multi-objective black hole optimization algorithm framework carries out the optimization as follows:

S3.1. Initialize each protein module as an object;

S3.2. Evaluate all objects according to the evaluation criteria and select the object with the highest fitness as the initial black hole;

S3.3. Move the remaining objects toward the initial black hole; during the movement, if the fitness of an object is higher than that of the current black hole, the object replaces the current black hole and becomes the new black hole;

S3.4. If, while moving toward the black hole, an object enters the range in which it is absorbed by the black hole, it is absorbed, and an equal number of objects are randomly generated at the same time; the algorithm then terminates; otherwise, return to step S3.2.
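The loop of steps S3.1 to S3.4 can be illustrated with a single-objective toy in Python. This is not the adaptive multi-objective framework of the invention: objects are points in a continuous space, fitness is minimized, and the movement rule and absorption radius follow a commonly used form of the black hole heuristic. All names, the search range and the iteration count are assumptions of this sketch.

```python
import math
import random


def black_hole_search(fitness, dim, n_stars=20, iters=50, seed=0):
    """Toy black hole search that minimizes `fitness` over points in [-5, 5]^dim."""
    rng = random.Random(seed)

    def new_star():
        return [rng.uniform(-5.0, 5.0) for _ in range(dim)]

    stars = [new_star() for _ in range(n_stars)]      # S3.1: initialize the objects
    hole = list(min(stars, key=fitness))              # S3.2: the fittest object is the black hole
    for _ in range(iters):
        for k, star in enumerate(stars):
            # S3.3: move toward the black hole; a fitter object replaces it.
            moved = [x + rng.random() * (h - x) for x, h in zip(star, hole)]
            stars[k] = moved
            if fitness(moved) < fitness(hole):
                hole = list(moved)
        # S3.4: objects inside the absorption radius are absorbed and regenerated at random.
        total = sum(fitness(s) for s in stars)
        radius = fitness(hole) / total if total else 0.0
        for k, star in enumerate(stars):
            if math.dist(star, hole) < radius:
                stars[k] = new_star()
    return hole


def sphere(point):
    return sum(x * x for x in point)


best = black_hole_search(sphere, dim=2)
print(best, sphere(best))
```

Because the black hole is replaced only by a fitter object, its fitness never gets worse from one iteration to the next.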

Compared with the prior art, the invention has the following advantages: it fuses the topological structure characteristics of protein interaction network data with the functional similarity characteristics of Gene Ontology annotation data, improving the accuracy of complex identification from multiple angles. The use of the adaptive multi-objective black hole framework (AMOBH) improves the search range and computational efficiency; the bi-clustering black hole initialization method ensures the reliability of the initial coarse classification; and the star movement and absorption process improves the accuracy of the later fine classification. The method improves the speed and accuracy of protein complex identification, is applicable to protein-protein interaction networks and can be extended to the analysis of other complex community networks, and is highly practical for complex network analysis.

Brief description of the drawings

Fig. 1 is a flow diagram of the protein complex identification method of the invention based on multi-source data fusion and multi-objective optimization.

Fig. 2 is a schematic diagram of the principle of the protein complex identification method of the invention based on multi-source data fusion and multi-objective optimization.

Fig. 3 shows the movement of stars and their absorption by the black hole in the multi-objective framework of the invention.

Fig. 4 compares the Pareto fronts obtained by the method of the invention and by NSGA-II.

Embodiment

To make the objects, technical solutions and advantages of the present invention clearer, embodiments of the invention are further described below with reference to the accompanying drawings.

Referring to Fig. 1 and Fig. 2, an embodiment of the present invention provides a protein complex identification method based on multi-source data fusion and multi-objective optimization, comprising the following steps:

A protein interaction database is obtained from a public website, and the protein-protein interaction network is abstracted as a connected network graph G = (V, E) formed by protein nodes and the interactions between them, where V is the set of protein nodes and E is the set of edges representing interactions between protein nodes. Because the protein interaction database contains some redundant data, namely self-interactions and repeated interactions, data pre-processing mainly comprises: (1) removing redundant proteins (isolated proteins that take part in no interaction), retaining the list of interacting protein pairs (two columns), and taking the union of the two columns to obtain the complete protein list; and (2) converting the resulting interaction relationships into an adjacency matrix using MATLAB simulation software. A pair of proteins that interact is represented by "1" and a pair that do not interact by "0"; the protein list corresponds one-to-one with the adjacency matrix, for later lookup.

S3. On the basis of the preliminary protein clustering modules, further optimize each clustering module. During optimization, the topological structure characteristics of the protein interaction network data are fused with the functional similarity characteristics of the GO annotation data, and the optimization is carried out within an adaptive multi-objective black hole optimization algorithm framework: each protein module is regarded as a black hole and each protein node as a star; the center of a black hole is the cluster center of an initial coarse clustering module; the black holes are continually updated by selecting and deleting new stars that differ from the original individuals; the fitness values of the new black hole and of the black hole containing the original star are calculated and compared; and if the fitness value of the new black hole is better than that of the original black hole, the original black hole is replaced by the newly generated black hole, thereby obtaining protein complex modules;

Further optimizing each clustering module on the basis of the preliminary protein clustering modules specifically comprises: a black hole initialization strategy, a multi-source data fusion strategy, and a star addition and deletion strategy.

The black hole initialization strategy is specifically as follows: drawing on the idea of bi-clustering algorithms, K-means clustering is performed on the rows and on the columns of the adjacency matrix respectively; that is, K cluster centers are selected among all nodes of the protein-protein interaction network, the distance from each remaining node to each cluster center is compared, and each node is assigned to the module of its nearest cluster center, yielding K initial modules.

The multi-source data fusion strategy is mainly as follows: the topological structure characteristics of the protein interaction network are combined with the functional characteristics of the GO annotation data; by integrating different types of genomic data with the protein interaction data, they serve as the objective functions of the multi-objective optimization strategy and guide the iterative search for the optimal solution.

The objective functions of the multi-objective optimization strategy are set as follows: combining the topological structure characteristics of the protein interaction network with the functional characteristics of the GO annotation data, and taking full account of the substitutability among the objectives, the objective functions of the multi-objective optimization strategy are set to density, node closeness centrality, outward interaction, and functional similarity;

Density is expressed as:

$$\mathrm{Density}(S) = \frac{2 \times N_v}{N_v \times (N_v - 1)}$$

Node closeness centrality is expressed as:

$$\mathrm{Closeness\ centrality}(v_i) = \frac{N - 1}{\sum_{j=1,\, j \neq i}^{N} d_{ij}}$$

where the closeness of protein node $v_i$ is the reciprocal of the sum of its shortest distances $d_{ij}$, multiplied by the number of other nodes. Maximizing node closeness brings all nodes of the same cluster closer to the cluster center;

Outward interaction is expressed as:

$$\mathrm{Outward\ Interaction} = \sum_{n_i \in S} \frac{|N_{n_i}|}{\mathrm{degree}(n_i)}$$

where $|N_{n_i}|$ denotes the number of nodes directly connected to node $n_i$ of the protein complex module. Minimizing outward interaction ensures that as few bridging proteins as possible connect different protein complex modules, so that different clusters are as independent as possible.

In a protein-protein interaction network, proteins belonging to the same protein complex have similar functional characteristics; functional similarity, as an important component of multi-source information fusion, can therefore take part in the selection of the multi-objective strategy to further improve the reliability of clustering.

A random-walk probability is added on top of the standard feature similarity formula; the random-walk probability is calculated as:

$$\mathrm{RWC}(v_0, v_1) = \sum_{i,j \in \zeta} W_\infty^{v_0}(i)\, W_\infty^{v_1}(j)\, \mathrm{sim}(i, j)$$

The standard feature similarity formula is:

$$\mathrm{sim}(s) = \frac{\sum_{v_0 = 1,\, v_1 = 1,\, v_0 \neq v_1}^{p} w(n_{v_0}, n_{v_1})}{p}$$

Functional similarity is expressed as：

After the objective functions are determined, the following operations are performed on them within the adaptive multi-objective black hole optimization algorithm framework. First, clustering is carried out under the constraint of the topological structure characteristics of the protein network, and the GO annotation data do not take part in the constraint at this stage. Then, during the addition and deletion of protein nodes, network nodes (proteins) are selected according to a probability, and the GO annotation information of these proteins is used: because proteins in the same protein complex generally have closely related functions, it is determined whether a node has the same functional information as the complex, and the constraint operation is performed accordingly. Finally, with the topological structure of the protein network as the objective, the movement of protein nodes is jointly constrained so that functional similarity within each protein complex is maximized, increasing the accuracy of clustering.

The star addition and deletion strategy is specifically as follows: stars are regarded as protein nodes, and addition and deletion operations are applied; during addition and deletion, the black hole selects surrounding stars to absorb or remove.

The specific operation is shown in Fig. 3, which illustrates the execution strategy of the adaptive multi-objective black hole optimization algorithm during selection and deletion. The shaded area represents an initial black hole (cluster) of the multi-objective algorithm, in which the off-white node is an arbitrary node selected with a certain probability. The addition and deletion operations are then performed with equal probability. In the addition process, shown in the upper-right panel, the grey proteins connected to the off-white node are preferentially absorbed to form a new black hole. In the deletion process, shown in the lower-right panel, the off-white node is removed from the initial black hole and forms a new black hole together with other, external proteins.
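One addition or deletion move of the kind described for Fig. 3 can be sketched in Python as follows. The sketch assumes a module is a set of node indices, picks the node uniformly at random, and only proposes a candidate module; comparing the fitness of the candidate with that of the original black hole is left out. For the deletion case it only removes the node and does not form the second new black hole. The function name `propose_move` is hypothetical.

```python
import random


def propose_move(module, adj, rng):
    """Return a candidate module produced by one addition or one deletion."""
    node = rng.choice(sorted(module))        # node selected at random from the black hole
    candidate = set(module)
    if rng.random() < 0.5:
        # Addition: absorb the proteins directly connected to the selected node.
        candidate |= {j for j, linked in enumerate(adj[node]) if linked}
    else:
        # Deletion: remove the selected node from the black hole.
        candidate.discard(node)
    return candidate


path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
rng = random.Random(1)
print([sorted(propose_move({0, 1}, path, rng)) for _ in range(5)])
```

A candidate would replace the original black hole only if its fitness is better, as stated in step S3.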

The main features of the optimization carried out by the adaptive multi-objective black hole optimization algorithm framework are as follows: it uses an improved multi-objective particle swarm algorithm, whose gains lie mainly in a wider search range and higher computational efficiency, in addition to the processing-speed advantage that multi-objective particle swarm algorithms already have over conventional evolutionary algorithms. The specific improvements are mainly the following three:

1. A fixed region centered on the global-best particle is used as the black hole; this region approximates the region where the true solution lies. This not only gives the particles a new search space, but also accelerates convergence.

2. The optimal front is obtained using the Pareto dominance condition for constrained problems, and a compromise optimal solution is selected from the optimal front according to a "distance evaluation index". For multi-objective optimization problems, a new movement and addition method is used to increase the diversity of solutions.

3. A new population density assessment method is used, and mutation is introduced into the black hole algorithm: ordinary mutation is applied locally, and elite mutation is applied to the global optimal solution. Calculation formulas are defined for the black hole boundary and encroachment and for the star radius under high-dimensional multiple objectives. Compared with conventional evolutionary algorithms (NSGA-II and the like), this improves computational efficiency and shortens running time.
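The Pareto-based selection mentioned in the second improvement can be illustrated with a short Python sketch. It assumes that every objective is minimized, ignores constraints, and takes the "distance evaluation index" to be the Euclidean distance to the ideal point; the text does not define the index, so this choice and all function names are assumptions.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]


def compromise(front):
    """Pick the front member closest to the ideal point (best value of each objective)."""
    ideal = [min(p[k] for p in front) for k in range(len(front[0]))]
    return min(front, key=lambda p: sum((x - i) ** 2 for x, i in zip(p, ideal)))


points = [(1, 5), (2, 2), (5, 1), (4, 4)]
front = pareto_front(points)
print(front, compromise(front))
```

In this example (4, 4) is dominated by (2, 2) and drops out of the front, and (2, 2) is the compromise because it lies closest to the ideal point (1, 1).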

The adaptive multi-objective black hole optimization algorithm framework carries out the optimization as follows:

S3.1. Initialize each protein module as an object;

The adaptive multi-objective black hole optimization algorithm framework and the NSGA-II multi-objective optimization framework are run with identical input parameters and their Pareto fronts are compared, as shown in Fig. 4. Optimization with the adaptive multi-objective black hole optimization algorithm framework performs better in both search range and search efficiency.

The final output of the invention is a set of black holes (a cell array), each of which gathers a number of indices; by matching these one-to-one with the protein list obtained during pre-processing, the names of the proteins forming the final protein complex in each black hole can be determined.

The invention fuses the topological structure characteristics of protein interaction network data with the functional similarity characteristics of Gene Ontology annotation data, improving the accuracy of complex identification from multiple angles. The use of the adaptive multi-objective black hole framework (AMOBH) improves the search range and computational efficiency; the bi-clustering black hole initialization method ensures the reliability of the initial coarse classification; and the star movement and absorption process improves the accuracy of the later fine classification. The method improves the speed and accuracy of protein complex identification, is applicable to protein-protein interaction networks and can be extended to the analysis of other complex community networks, and is highly practical for complex network analysis.

Where no conflict arises, the embodiments described above and the features in them may be combined with one another.

The foregoing describes only preferred embodiments of the present invention and is not intended to limit it; any modification, equivalent replacement, improvement or the like made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

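1. A protein complex identification method based on multi-source data fusion and multi-objective optimization, characterized by comprising the following steps:

S1. Treat the protein-protein interaction network as a connected graph and pre-process it to obtain an adjacency matrix;

S2. Cluster all protein nodes in the adjacency matrix to obtain preliminary protein clustering modules;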

S3. On the basis of the preliminary protein clustering modules, further optimize each clustering module. During optimization, the topological structure characteristics of the protein interaction network data are fused with the functional similarity characteristics of the GO annotation data, and the optimization is carried out within an adaptive multi-objective black hole optimization algorithm framework: each protein module is regarded as a black hole and each protein node as a star; the center of a black hole is the cluster center of an initial coarse clustering module; the black holes are continually updated by selecting and deleting new stars that differ from the original individuals; the fitness values of the new black hole and of the black hole containing the original star are calculated and compared; and if the fitness value of the new black hole is better than that of the original black hole, the original black hole is replaced by the newly generated black hole, thereby obtaining protein complex modules;

S4. Perform post-processing: in each protein complex module, remove the isolated nodes that have no connecting edge to any other protein node, and remove all protein complex modules whose size is less than 3; the protein complex modules obtained after this processing are the optimal protein complexes identified by the method.

2. The protein complex identification method based on multi-source data fusion and multi-objective optimization according to claim 1, characterized in that, in step S1, the adjacency matrix is obtained as follows:

S1.1. Obtain a protein interaction database;

S1.2. Remove redundant proteins, retain the list of interacting protein pairs, and take the union of its two columns to obtain the complete protein list;

S1.3. Process the result with MATLAB simulation software to obtain an adjacency matrix that reflects the interaction relationships.

3. The protein complex identification method based on multi-source data fusion and multi-objective optimization according to claim 1, characterized in that, in step S3, further optimizing each clustering module on the basis of the preliminary protein clustering modules specifically comprises: a black hole initialization strategy, a multi-source data fusion strategy, and a star addition and deletion strategy.

4. The protein complex identification method based on multi-source data fusion and multi-objective optimization according to claim 3, characterized in that the black hole initialization strategy is specifically as follows: drawing on the idea of bi-clustering algorithms, K-means clustering is performed on the rows and on the columns of the adjacency matrix respectively; that is, K cluster centers are selected among all nodes of the protein-protein interaction network, the distance from each remaining node to each cluster center is compared, and each node is assigned to the module of its nearest cluster center, yielding K initial protein complex modules.

5. The protein complex identification method based on multi-source data fusion and multi-objective optimization according to claim 3, characterized in that the multi-source data fusion strategy is mainly as follows: the topological structure characteristics of the protein interaction network are combined with the functional similarity characteristics of the GO annotation data; by integrating different types of genomic data with the protein interaction data, they serve as the objective functions of the multi-objective optimization strategy and guide the iterative search for the optimal solution.

6. The protein complex identification method based on multi-source data fusion and multi-objective optimization according to claim 5, characterized in that the objective functions of the multi-objective optimization strategy are set as follows: combining the topological structure characteristics of the protein interaction network with the functional characteristics of the GO annotation data, suitable objective functions are selected while taking full account of the substitutability among the objectives.

7. The protein complex identification method based on multi-source data fusion and multi-objective optimization according to claim 5, characterized in that the objective functions of the multi-objective optimization strategy are density, node closeness centrality, outward interaction, and functional similarity;

Density is expressed as:

$$\mathrm{Density}(S) = \frac{2 \times N_v}{N_v \times (N_v - 1)}$$

where $N_v$ denotes the number of protein nodes in the protein complex module. Maximizing the density of a protein complex module ensures that the modules obtained by clustering are internally compact and tightly connected;

Node closeness centrality is expressed as:

$$\mathrm{Closeness\ centrality}(v_i) = \frac{N - 1}{\sum_{j=1,\, j \neq i}^{N} d_{ij}}$$

where the closeness of protein node $v_i$ is the reciprocal of the sum of its shortest distances $d_{ij}$, multiplied by the number of other nodes. Maximizing node closeness brings all nodes of the same cluster closer to the cluster center;

Outward interaction is expressed as:

$$\mathrm{Outward\ Interaction} = \sum_{n_i \in S} \frac{|N_{n_i}|}{\mathrm{degree}(n_i)}$$

where $|N_{n_i}|$ denotes the number of nodes directly connected to node $n_i$ of the protein complex module. Minimizing outward interaction ensures that as few bridging proteins as possible connect different protein complex modules, so that different clusters are as independent as possible;

In a protein-protein interaction network, proteins belonging to the same protein complex have similar functional characteristics. A random-walk probability is added on top of the standard feature similarity formula; the random-walk probability is calculated as:

$$\mathrm{RWC}(v_0, v_1) = \sum_{i,j \in \zeta} W_\infty^{v_0}(i)\, W_\infty^{v_1}(j)\, \mathrm{sim}(i, j)$$

where $\zeta$ denotes all known leaf nodes, and $W_\infty^{v_0}(i)$ and $W_\infty^{v_1}(j)$ denote the random-walk probabilities at infinite time for node $i$ and node $j$, taken from $v_0$ and $v_1$ respectively;

The standard feature similarity formula is:

$$\mathrm{sim}(s) = \frac{\sum_{v_0 = 1,\, v_1 = 1,\, v_0 \neq v_1}^{p} w(n_{v_0}, n_{v_1})}{p}$$

where $s$ contains the child nodes $n_{v_0}$ and $n_{v_1}$, the value range of E is [0, 1], and the weight is denoted by $w$;

Functional similarity is expressed as：

8. The protein complex identification method based on multi-source data fusion and multi-objective optimization according to claim 3, characterized in that the star addition and deletion strategy is specifically as follows: stars are regarded as protein nodes, and addition and deletion operations are applied; during addition and deletion, the black hole selects surrounding stars to absorb or remove.

9. The protein complex identification method based on multi-source data fusion and multi-objective optimization according to claim 1, characterized in that the adaptive multi-objective black hole optimization algorithm framework carries out the optimization as follows:

S3.1. Initialize each protein module as an object;

S3.2. Evaluate all objects according to the evaluation criteria and select the object with the highest fitness as the initial black hole;

S3.3. Move the remaining objects toward the initial black hole; during the movement, if the fitness of an object is higher than that of the current black hole, the object replaces the current black hole and becomes the new black hole;

S3.4. If, while moving toward the black hole, an object enters the range in which it is absorbed by the black hole, it is absorbed, and an equal number of objects are randomly generated at the same time; the algorithm then terminates; otherwise, repeat steps S3.2 to S3.4.