CN108009403A - Protein complex identification method based on multi-source data fusion and multi-objective optimization - Google Patents


Info

Publication number
CN108009403A
Authority
CN
China
Prior art keywords
protein
black hole
protein complex
Legal status
Pending
Application number
CN201711190016.3A
Other languages
Chinese (zh)
Inventor
朱媛
彭晓宇
吴崇
Current Assignee
China University of Geosciences
Original Assignee
China University of Geosciences
Application filed by China University of Geosciences
Priority to CN201711190016.3A
Publication of CN108009403A
Legal status: Pending

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B5/00 ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medical Informatics (AREA)
  • Biophysics (AREA)
  • Theoretical Computer Science (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Biotechnology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Molecular Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioethics (AREA)
  • Public Health (AREA)
  • Evolutionary Computation (AREA)
  • Epidemiology (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Physiology (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Genetics & Genomics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a protein complex identification method based on multi-source data fusion and multi-objective optimization. The method comprises: preprocessing protein-protein interaction network data to obtain an adjacency matrix; performing preliminary clustering of protein complexes to obtain initial protein complex modules; further optimizing the complex modules, where the optimization fuses the topological characteristics of the protein-protein interaction network data with the functional similarity of GO annotation data and searches for optima within an adaptive multi-objective black hole optimization algorithm framework, yielding more accurate protein complex modules; and performing post-processing to obtain the final optimal protein complexes. The invention improves both the speed and the accuracy of protein complex identification and, beyond protein-protein interaction networks, can be extended to the analysis of other complex community networks, giving it strong practicality in complex network analysis.

Description

Protein complex identification method based on multi-source data fusion and multi-objective optimization
Technical field
The present invention relates to the field of bioinformatics, and more particularly to a protein complex identification method based on multi-source data fusion and multi-objective optimization.
Background art
Proteins are the products of gene expression, the executors of an organism's physiological functions, and the direct embodiment of biological phenomena. Proteomics is the discipline that systematically studies the characteristics of proteins; it can describe in detail the structure, function, and regulation of biological systems in health and disease. Almost all biological processes are carried out through a series of protein-protein interactions. From the perspective of systems biology, studying and analyzing biological function through protein-protein interaction networks has important prospects and practical value.
A protein complex is a set of proteins that interact at the same time and place to form a multi-molecular machine, and it is the main form in which proteins perform their functions. Identifying protein complexes not only helps in understanding complex life activities but also provides theoretical support for uncovering the formation mechanisms of complex diseases and for rational drug development. With the development of high-throughput experimental technologies and proteomics, it has become possible to use network theory to explore protein functions and interaction relationships and to uncover the mechanisms of complex diseases. Numerous studies show that protein networks (the interaction relationships among all proteins in an organism) have an evident modular structure, and these modules usually correspond to protein complexes; identifying protein complexes from protein networks can therefore improve efficiency and guide biological experiments. However, protein interaction data obtained by high-throughput sequencing technologies often have high false-positive and false-negative rates, so relying on protein interaction data alone limits the precision of protein complex identification.
With the development of biotechnology, multi-source biological data continue to emerge, such as protein-protein interaction (PPI) data, Gene Ontology (GO) data, time-series RNA-seq data, time-series gene expression data, subcellular localization information, and disease association databases. Improving the accuracy of protein complex identification by integrating multi-source data has therefore become a research direction of wide interest.
Summary of the invention
In view of this, embodiments of the present invention provide a fusion strategy for multi-source biological information and combine it with a multi-objective optimization method to identify protein complexes, finally realizing a protein complex identification method based on multi-source data fusion and multi-objective optimization for identifying and predicting protein complexes.
An embodiment of the present invention provides a protein complex identification method based on multi-source data fusion and multi-objective optimization, comprising the following steps:
S1. Treat the protein-protein interaction network as a connected graph and preprocess it to obtain an adjacency matrix;
S2. Cluster all protein nodes in the adjacency matrix to obtain preliminary protein cluster modules;
S3. Further optimize each cluster module on the basis of the preliminary protein cluster modules. During optimization, fuse the topological characteristics of the protein-protein interaction network data with the functional similarity of the GO (Gene Ontology) annotation data, and search for optima within an adaptive multi-objective black hole optimization algorithm framework. Each protein module is regarded as a black hole and each protein node as a star; the center of a black hole is the cluster center of the initial coarse cluster module. The black holes are continuously updated by selecting and deleting new stars that differ from the original individuals; the fitness values of the new black hole and of the black hole containing the original star are computed and compared, and if the new black hole's fitness is better than the original's, the newly generated black hole replaces the original one, yielding the protein complex modules;
S4. Post-process: remove from each protein complex module the isolated nodes that have no edges to other protein nodes, and remove all protein complex modules containing fewer than 3 proteins. The protein complex modules remaining after this processing are the optimal protein complexes identified by the method.
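A minimal Python sketch of the S4 post-processing, assuming each module is a set of protein names and the network is an undirected adjacency dictionary (`adj`, `postprocess`, and `min_size` are illustrative names, not taken from the patent). "Isolated node" is read here as a member with no edge to any other member of the same module.

```python
from typing import Dict, List, Set

def postprocess(modules: List[Set[str]], adj: Dict[str, Set[str]],
                min_size: int = 3) -> List[Set[str]]:
    """S4: drop members with no edge to another member, then drop small modules."""
    cleaned = []
    for module in modules:
        # Keep a node only if at least one neighbour is another member of this module.
        kept = {n for n in module if adj.get(n, set()) & (module - {n})}
        # Discard modules with fewer than min_size proteins.
        if len(kept) >= min_size:
            cleaned.append(kept)
    return cleaned

adj = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B"},
       "D": set(), "E": {"F"}, "F": {"E"}}
print(postprocess([{"A", "B", "C", "D"}, {"E", "F"}], adj))
```

A single pass is enough: a removed node had no edges inside its module, so removing it cannot isolate any remaining member.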
Further, in step S1, the adjacency matrix is obtained as follows:
S1.1. Obtain a protein interaction database;
S1.2. Remove redundant proteins, keep the pairs of proteins that interact, and take the union of the two columns to obtain the complete protein list;
S1.3. Process the data with MATLAB simulation software to obtain the adjacency matrix.
Further, in step S3, further optimizing each cluster module on the basis of the preliminary protein cluster modules specifically comprises: a black hole initialization strategy, a multi-source data fusion strategy, and a star addition and deletion strategy.
Further, the black hole initialization strategy specifically is: drawing on the idea of biclustering algorithms, perform K-means clustering on the rows and on the columns of the adjacency matrix respectively; that is, select K cluster centers among all nodes of the protein-protein interaction network, compute the distance from each remaining node to each cluster center, and assign each node to the module of the nearest cluster center, obtaining K initial modules.
Further, the multi-source data fusion strategy mainly is: combine the topological characteristics of the protein-protein interaction network with the functional characteristics of the GO annotation data, integrating different types of genomic data with protein interaction data as the objective functions of the multi-objective optimization strategy, which guide the iterative search for the optimal solution.
Further, the objective functions of the multi-objective optimization strategy are set as follows: combining the topological characteristics of the protein-protein interaction network with the functional characteristics of the GO annotation data, suitable objective functions are selected while fully accounting for the trade-offs among the objectives.
Further, the objective functions of the multi-objective optimization strategy are density, node closeness centrality, outward interaction, and functional similarity;
Density is expressed as:
where N_v denotes the number of protein nodes in the protein complex module. Maximizing the density of a protein complex module ensures that the modules obtained by clustering are internally compact and densely connected;
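A sketch of the density objective for one module. Claim 7 writes the numerator as 2 × N_v; this sketch assumes instead the usual subgraph-density reading, in which the numerator counts edges inside the module, 2|E_S| / (|V_S|(|V_S| − 1)), so that a fully connected module scores 1. The function name and the adjacency-dictionary representation are illustrative.

```python
from itertools import combinations
from typing import Dict, Set

def density(module: Set[str], adj: Dict[str, Set[str]]) -> float:
    """2*|E_S| / (|V_S| * (|V_S| - 1)) for the subgraph induced by `module`."""
    n = len(module)
    if n < 2:
        return 0.0  # density is undefined for fewer than two nodes
    # Each unordered pair is visited once, so each internal edge is counted once.
    edges = sum(1 for u, v in combinations(module, 2) if v in adj.get(u, set()))
    return 2.0 * edges / (n * (n - 1))
```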
Node closeness centrality is expressed as:
where the closeness of protein node v_i is the reciprocal of the sum of its shortest-path distances d_ij, multiplied by the number of other nodes. Maximizing node closeness draws the nodes of a cluster closer to the cluster center;
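The closeness description above amounts to (number of other nodes) / Σ_j d_ij. A sketch that computes it for one node inside its module's induced subgraph with breadth-first search, assuming unweighted edges and counting only members reachable within the module. How per-node values are aggregated into a single module objective is not stated in the text, so no aggregation is shown.

```python
from collections import deque
from typing import Dict, Set

def closeness(node: str, module: Set[str], adj: Dict[str, Set[str]]) -> float:
    """(number of other reachable members) / (sum of shortest-path lengths d_ij),
    with paths restricted to the subgraph induced by `module`."""
    dist = {node: 0}
    queue = deque([node])
    while queue:  # breadth-first search yields unweighted shortest paths
        u = queue.popleft()
        for v in adj.get(u, set()) & module:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total > 0 else 0.0
```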
Outward interaction is expressed as:
where the per-node term denotes, for child node n_i of the protein complex module, the number of nodes directly connected to it. Minimizing outward interaction ensures that different protein complex modules are linked through as few bridge proteins as possible, so that different clusters remain as independent as possible;
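Because the symbol for the per-node term is lost, this sketch assumes it counts the neighbours of n_i that lie outside the module, and sums that count over all members; a smaller value means fewer bridge edges. The name `outward_interaction` is illustrative.

```python
from typing import Dict, Set

def outward_interaction(module: Set[str], adj: Dict[str, Set[str]]) -> int:
    """Number of edges leaving `module` (member -> non-member); lower is better."""
    # For each member, count neighbours that are not in the module.
    return sum(len(adj.get(n, set()) - module) for n in module)
```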
In a protein-protein interaction network, proteins belonging to the same protein complex have similar functional characteristics. A random walk probability is therefore added to the standard functional similarity formula; the random walk probability is computed as:
where ζ denotes the set of all known leaf nodes, and the two probability terms denote the probabilities that node i and node j, respectively, walk from v_0 to v_1 in the limit of infinite time;
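The exact random-walk formula is not recoverable from the text. As a stand-in, this sketch computes the steady-state distribution of a random walk with restart (RWR) from a start node, a common way to obtain "infinite-time" walk probabilities on an interaction graph; it is not presented as the patent's formula. The restart probability of 0.15 and all names are assumptions, and the adjacency dictionary is assumed symmetric (every neighbour is also a key).

```python
from typing import Dict, Set

def random_walk_with_restart(adj: Dict[str, Set[str]], start: str, restart: float = 0.15,
                             tol: float = 1e-10, max_iter: int = 1000) -> Dict[str, float]:
    """Steady-state probability of being at each node for a walk that, at every
    step, jumps back to `start` with probability `restart`."""
    p = {n: 1.0 if n == start else 0.0 for n in adj}
    for _ in range(max_iter):
        # Restart mass returns to the start node.
        nxt = {n: (restart if n == start else 0.0) for n in adj}
        # Remaining mass is spread evenly over each node's neighbours.
        for u, nbrs in adj.items():
            if nbrs:
                share = (1.0 - restart) * p[u] / len(nbrs)
                for v in nbrs:
                    nxt[v] += share
        if sum(abs(nxt[n] - p[n]) for n in adj) < tol:
            return nxt
        p = nxt
    return p
```

The iteration converges because each step shrinks the difference between successive distributions by a factor of (1 − restart).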
The standard functional similarity formula is:
where s denotes the set that contains the child node, E takes values in the range [0, 1], and w denotes the weight;
Functional similarity is expressed as:
Further, the star addition and deletion strategy specifically is: stars are regarded as protein nodes and are handled through addition and deletion operations, during which a black hole selects surrounding stars to absorb or remove.
Further, the adaptive multi-objective black hole optimization algorithm framework searches for optima as follows:
S3.1. Initialize each protein module as an object;
S3.2. Evaluate all objects against the evaluation criteria and select the fittest object as the initial black hole;
S3.3. Move the remaining objects toward the initial black hole; if during the move an object's fitness becomes better than that of the current black hole, the object replaces the current black hole and becomes the new black hole;
S3.4. If an object enters the absorption range of the black hole while moving toward it, the object is absorbed, and an equal number of new objects are randomly generated at the same time. If the stopping condition is met, the algorithm terminates; otherwise, return to step S3.2.
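Steps S3.1 to S3.4 follow the outline of the classic black hole heuristic. The sketch below is a single-objective, continuous-vector version of that loop minimizing a test function; it does not reproduce the patent's multi-objective, module-based framework. The event-horizon radius uses the common form f_BH / Σ f_i, which assumes non-negative fitness values, and a fixed iteration count stands in for the stopping condition. Parameters and names are illustrative.

```python
import random
from typing import Callable, List, Tuple

def black_hole_optimize(f: Callable[[List[float]], float], dim: int, n_stars: int = 20,
                        bounds: Tuple[float, float] = (-5.0, 5.0), iters: int = 200,
                        seed: int = 0) -> List[float]:
    """Minimise f (assumed non-negative) with a basic black hole heuristic."""
    rng = random.Random(seed)
    lo, hi = bounds

    def new_star() -> List[float]:
        return [rng.uniform(lo, hi) for _ in range(dim)]

    stars = [new_star() for _ in range(n_stars)]            # S3.1: initialise objects
    for _ in range(iters):
        fit = [f(s) for s in stars]
        bh = min(range(n_stars), key=lambda i: fit[i])       # S3.2: fittest object is the black hole
        for i in range(n_stars):
            if i == bh:
                continue
            r = rng.random()
            # S3.3: move the star a random fraction of the way toward the black hole.
            stars[i] = [x + r * (b - x) for x, b in zip(stars[i], stars[bh])]
            fit[i] = f(stars[i])
            if fit[i] < fit[bh]:                              # a fitter star becomes the black hole
                bh = i
        total = sum(fit)
        radius = fit[bh] / total if total > 0 else 0.0        # event-horizon radius
        for i in range(n_stars):
            if i == bh:
                continue
            dist = sum((x - b) ** 2 for x, b in zip(stars[i], stars[bh])) ** 0.5
            if dist < radius:                                 # S3.4: absorbed, replaced by a random star
                stars[i] = new_star()
    return min(stars, key=f)

def sphere(x: List[float]) -> float:
    return sum(v * v for v in x)

best = black_hole_optimize(sphere, dim=2)
print(best, sphere(best))
```

The black hole itself is never moved or replaced at random, so the best fitness found can only improve from one iteration to the next.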
Compared with the prior art, the invention has the following beneficial effects: it fuses the topological characteristics of the protein-protein interaction network data with the functional similarity of the Gene Ontology annotation data, improving the accuracy of complex identification from multiple angles. The adaptive multi-objective black hole framework (AMOBH) enlarges the search range and improves computational efficiency; the biclustering-based black hole initialization ensures the reliability of the initial coarse classification; and the movement and absorption of stars improve the accuracy of the later fine classification. The method improves both the speed and the accuracy of protein complex identification and, beyond protein-protein interaction networks, can be extended to the analysis of other complex community networks, giving it strong practicality in complex network analysis.
Brief description of the drawings
Fig. 1 is a schematic flow diagram of the protein complex identification method based on multi-source data fusion and multi-objective optimization of the present invention.
Fig. 2 is a schematic diagram of the principle of the protein complex identification method based on multi-source data fusion and multi-objective optimization of the present invention.
Fig. 3 illustrates star movement and assimilation by a black hole in the multi-objective framework of the present invention.
Fig. 4 compares the Pareto fronts obtained by the method of the present invention and by NSGA-II.
Embodiment
To make the objectives, technical solutions, and advantages of the present invention clearer, embodiments of the present invention are further described below with reference to the accompanying drawings.
Referring to Fig. 1 and Fig. 2, an embodiment of the present invention provides a protein complex identification method based on multi-source data fusion and multi-objective optimization, comprising the following steps:
S1. Treat the protein-protein interaction network as a connected graph and preprocess it to obtain an adjacency matrix;
A protein interaction database is obtained from a public website, and the protein-protein interaction network is abstracted as a connected graph G = (V, E) formed by protein nodes and the interactions between them, where V is the set of protein nodes and E is the set of edges representing interactions between protein nodes. Because protein interaction databases contain redundant data such as self-interactions and repeated interactions, preprocessing mainly comprises: (1) removing redundant proteins (proteins that do not interact with any other protein), keeping the list of interacting protein pairs (two columns), and taking the union of the two columns to obtain the complete protein list; (2) converting the resulting interaction relationships into an adjacency matrix with MATLAB simulation software. An entry of "1" means that the two proteins interact and "0" that they do not; the protein list corresponds one-to-one with the rows of the adjacency matrix for subsequent lookup.
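The patent performs this step in MATLAB. The following Python sketch mirrors the described preprocessing, assuming the database is a list of (protein, protein) pairs: self-interactions and duplicate pairs are dropped, the union of both columns gives the protein list, and a symmetric 0/1 matrix is built whose row order matches that list. The function name is illustrative.

```python
from typing import Iterable, List, Tuple

def build_adjacency(pairs: Iterable[Tuple[str, str]]) -> Tuple[List[str], List[List[int]]]:
    """Return (protein list, symmetric 0/1 adjacency matrix) from interaction pairs."""
    # Drop self-interactions; frozenset makes (A, B) and (B, A) the same edge.
    edges = {frozenset((a, b)) for a, b in pairs if a != b}
    # Union of both columns, sorted so that row i of the matrix is proteins[i].
    proteins = sorted({p for e in edges for p in e})
    index = {p: i for i, p in enumerate(proteins)}
    matrix = [[0] * len(proteins) for _ in proteins]
    for e in edges:
        a, b = tuple(e)
        matrix[index[a]][index[b]] = matrix[index[b]][index[a]] = 1  # 1 = interaction
    return proteins, matrix

proteins, matrix = build_adjacency([("P1", "P2"), ("P2", "P1"), ("P3", "P3"), ("P2", "P4")])
print(proteins)  # ['P1', 'P2', 'P4']
print(matrix)    # [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
```

P3 appears only in a self-interaction, so it is dropped from the protein list.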
S2. Cluster all protein nodes in the adjacency matrix to obtain preliminary protein cluster modules;
S3. Further optimize each cluster module on the basis of the preliminary protein cluster modules. During optimization, fuse the topological characteristics of the protein-protein interaction network data with the functional similarity of the GO annotation data, and search for optima within an adaptive multi-objective black hole optimization algorithm framework. Each protein module is regarded as a black hole and each protein node as a star; the center of a black hole is the cluster center of the initial coarse cluster module. The black holes are continuously updated by selecting and deleting new stars that differ from the original individuals; the fitness values of the new black hole and of the black hole containing the original star are computed and compared, and if the new black hole's fitness is better than the original's, the newly generated black hole replaces the original one, yielding the protein complex modules;
Further optimizing each cluster module on the basis of the preliminary protein cluster modules specifically comprises: a black hole initialization strategy, a multi-source data fusion strategy, and a star addition and deletion strategy.
The black hole initialization strategy specifically is: drawing on the idea of biclustering algorithms, perform K-means clustering on the rows and on the columns of the adjacency matrix respectively; that is, select K cluster centers among all nodes of the protein-protein interaction network, compute the distance from each remaining node to each cluster center, and assign each node to the module of the nearest cluster center, obtaining K initial modules.
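A sketch of the initialization on the adjacency-matrix rows. Because the matrix of an undirected network is symmetric, clustering rows and clustering columns give the same partition here, so only rows are clustered. Centers are seeded farthest-first starting from node 0, an assumption made to keep the example deterministic; the patent does not say how the K centers are chosen. `init_black_holes` and `sq_dist` are illustrative names.

```python
from typing import List, Sequence

def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))

def init_black_holes(matrix: List[List[int]], k: int, iters: int = 20) -> List[int]:
    """K-means on adjacency rows; returns a cluster label (0..k-1) per node."""
    n = len(matrix)
    # Farthest-first seeding: node 0, then the node farthest from its nearest centre.
    chosen = [0]
    while len(chosen) < k:
        chosen.append(max(range(n), key=lambda i: min(sq_dist(matrix[i], matrix[c]) for c in chosen)))
    centres = [[float(x) for x in matrix[c]] for c in chosen]
    labels: List[int] = []
    for _ in range(iters):
        # Assign each node to the nearest centre.
        labels = [min(range(k), key=lambda c: sq_dist(row, centres[c])) for row in matrix]
        # Move each centre to the mean row of its members (keep it if the cluster is empty).
        for c in range(k):
            members = [matrix[i] for i in range(n) if labels[i] == c]
            if members:
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Two disjoint triangles: nodes 0-2 and nodes 3-5.
two_triangles = [[0, 1, 1, 0, 0, 0], [1, 0, 1, 0, 0, 0], [1, 1, 0, 0, 0, 0],
                 [0, 0, 0, 0, 1, 1], [0, 0, 0, 1, 0, 1], [0, 0, 0, 1, 1, 0]]
print(init_black_holes(two_triangles, k=2))  # [0, 0, 0, 1, 1, 1]
```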
The multi-source data fusion strategy mainly is: combine the topological characteristics of the protein-protein interaction network with the functional characteristics of the GO annotation data, integrating different types of genomic data with protein interaction data as the objective functions of the multi-objective optimization strategy, which guide the iterative search for the optimal solution.
The objective functions of the multi-objective optimization strategy are set as follows: combining the topological characteristics of the protein-protein interaction network with the functional characteristics of the GO annotation data, suitable objectives are selected while fully accounting for the trade-offs among them. The objective functions of the multi-objective optimization strategy are density, node closeness centrality, outward interaction, and functional similarity;
Density is expressed as:
where N_v denotes the number of protein nodes in the protein complex module. Maximizing the density of a protein complex module ensures that the modules obtained by clustering are internally compact and densely connected;
Node closeness centrality is expressed as:
where the closeness of protein node v_i is the reciprocal of the sum of its shortest-path distances d_ij, multiplied by the number of other nodes. Maximizing node closeness draws the nodes of a cluster closer to the cluster center;
Outward interaction is expressed as:
where the per-node term denotes, for child node n_i of the protein complex module, the number of nodes directly connected to it. Minimizing outward interaction ensures that different protein complex modules are linked through as few bridge proteins as possible, so that different clusters remain as independent as possible.
In a protein-protein interaction network, proteins belonging to the same protein complex have similar functional characteristics. Functional similarity can therefore serve as an important component of multi-source information fusion and take part in the multi-objective selection, further improving the reliability of clustering.
A random walk probability is added to the standard functional similarity formula; the random walk probability is computed as:
where ζ denotes the set of all known leaf nodes, and the two probability terms denote the probabilities that node i and node j, respectively, walk from v_0 to v_1 in the limit of infinite time;
The standard functional similarity formula is:
where s denotes the set that contains the child node, E takes values in the range [0, 1], and w denotes the weight;
Functional similarity is expressed as:
After the objective functions are determined, the following operations on them are performed within the adaptive multi-objective black hole optimization algorithm framework. First, clustering is driven by constraints derived from the topological characteristics of the protein network; the GO annotation data do not take part in the constraints at this stage. Next, while protein nodes are added and deleted, network nodes (proteins) are selected by probability and their GO annotation information is used: because proteins in the same protein complex usually have similar functions, the method determines whether a node shares functional information with the complex and applies the corresponding constraint. Finally, the protein network topology serves as an objective that jointly constrains the movement of protein nodes, so that functional similarity within each protein complex is maximized and clustering accuracy increases.
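The text does not state how "shared functional information" is tested. One possible reading, sketched below, accepts a candidate protein if at least a given fraction of its GO terms also annotate some member of the complex. The `threshold` of 0.5, the function name, and the small annotation dictionary are hypothetical and for illustration only.

```python
from typing import Dict, Set

def go_compatible(candidate: str, module: Set[str], go: Dict[str, Set[str]],
                  threshold: float = 0.5) -> bool:
    """True if at least `threshold` of the candidate's GO terms also annotate the module."""
    cand_terms = go.get(candidate, set())
    if not cand_terms:
        return False  # no annotation: shared function cannot be confirmed
    module_terms: Set[str] = set()
    for protein in module:
        module_terms |= go.get(protein, set())
    return len(cand_terms & module_terms) / len(cand_terms) >= threshold

# Illustrative annotations only.
go = {"A": {"GO:0006412", "GO:0005840"}, "B": {"GO:0006412"}, "C": {"GO:0006281"}}
print(go_compatible("B", {"A"}, go), go_compatible("C", {"A"}, go))  # True False
```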
The star addition and deletion strategy specifically is: stars are regarded as protein nodes and are handled through addition and deletion operations, during which a black hole selects surrounding stars to absorb or remove.
Fig. 3 shows the concrete operation, illustrating the strategy of the adaptive multi-objective black hole optimization algorithm during selection and deletion. The shaded area represents an initial black hole (cluster) of the multi-objective algorithm, and the light-gray node is a node selected with a certain probability. Addition and deletion are then performed with equal probability. In the addition process (upper-right panel), the gray proteins connected to the light-gray node are preferentially absorbed to form a new black hole; in the deletion process (lower-right panel), the light-gray node is removed from the initial black hole and forms a new black hole with other external proteins.
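A sketch of one add-or-delete move as described for Fig. 3, combined with the keep-if-better rule of step S3: a member is picked, then with equal probability either its neighbours are absorbed or the member is removed. Picking the member uniformly at random is an assumption, and the scoring function is left as a parameter into which any module-level objective could be plugged. Names are illustrative.

```python
import random
from typing import Callable, Dict, Set

def perturb(module: Set[str], adj: Dict[str, Set[str]], rng: random.Random) -> Set[str]:
    """One Fig. 3 style move on a black hole (a set of proteins)."""
    node = rng.choice(sorted(module))  # sorted so the choice is reproducible for a given seed
    if rng.random() < 0.5:
        # Addition: absorb the selected node's neighbours into the black hole.
        return module | adj.get(node, set())
    # Deletion: remove the selected node from the black hole.
    return module - {node}

def try_move(module: Set[str], adj: Dict[str, Set[str]],
             score: Callable[[Set[str]], float], rng: random.Random) -> Set[str]:
    """Keep the new black hole only if it scores better than the original."""
    candidate = perturb(module, adj, rng)
    return candidate if score(candidate) > score(module) else module
```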
The main feature of the optimization performed by the adaptive multi-objective black hole optimization algorithm framework is that it is based on an improved multi-objective particle swarm algorithm. The improvement lies mainly in a larger search range and higher computational efficiency, on top of the speed advantage that multi-objective particle swarm algorithms already have over conventional evolutionary algorithms. The specific improvements are the following three points:
1. A fixed region centered on the global best particle is taken as the black hole. This region approximates the location of the true solution; the particles do not add new search space, which accelerates convergence.
2. The Pareto dominance conditions for constrained problems are used to obtain the optimal front, and a compromise optimal solution is selected from that front according to a "distance evaluation index". For multi-objective optimization problems, a new movement and addition method is used to increase the diversity of solutions.
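A sketch of the Pareto filtering and compromise selection in point 2, assuming every objective has first been turned into a maximization (for example, by negating outward interaction) and interpreting the unspecified "distance evaluation index" as Euclidean distance to the ideal point of the front. Constraint handling is omitted and no normalization is applied; both are simplifications, and the score values are illustrative.

```python
from typing import List, Sequence

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b: no worse in every objective and strictly better in one (maximisation)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(points: List[Sequence[float]]) -> List[int]:
    """Indices of the points not dominated by any other point."""
    return [i for i, p in enumerate(points)
            if not any(dominates(q, p) for j, q in enumerate(points) if j != i)]

def compromise(points: List[Sequence[float]], front: List[int]) -> int:
    """Front member closest (Euclidean) to the ideal point of the front."""
    m = len(points[front[0]])
    ideal = [max(points[i][k] for i in front) for k in range(m)]
    return min(front, key=lambda i: sum((ideal[k] - points[i][k]) ** 2 for k in range(m)))

# Each row: (density, closeness, -outward_interaction, functional_similarity).
scores = [(0.9, 0.5, -4, 0.3), (0.6, 0.8, -2, 0.6), (0.5, 0.7, -3, 0.5), (0.7, 0.7, -2, 0.7)]
front = pareto_front(scores)
print(front, compromise(scores, front))  # [0, 1, 3] 3
```

Without normalization, the outward-interaction column dominates the distance because its range is larger; scaling each objective to a common range before computing distances is a common refinement.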
3. A new population density evaluation method is used and mutation is introduced into the black hole algorithm: ordinary mutation is applied locally and elite mutation is applied to the global optimum. Formulas are defined for the black hole boundary, the encroachment region, and the celestial-body radius in the high-dimensional multi-objective case. Compared with traditional evolutionary algorithms (NSGA-II, etc.), this improves computational efficiency and shortens running time.
The adaptive multi-objective black hole optimization algorithm framework specifically searches for optima as follows:
S3.1. Initialize each protein module as an object;
S3.2. Evaluate all objects against the evaluation criteria and select the fittest object as the initial black hole;
S3.3. Move the remaining objects toward the initial black hole; if during the move an object's fitness becomes better than that of the current black hole, the object replaces the current black hole and becomes the new black hole;
S3.4. If an object enters the absorption range of the black hole while moving toward it, the object is absorbed, and an equal number of new objects are randomly generated at the same time. If the stopping condition is met, the algorithm terminates; otherwise, return to step S3.2.
S4. Post-process: remove from each protein complex module the isolated nodes that have no edges to other protein nodes, and remove all protein complex modules containing fewer than 3 proteins. The protein complex modules remaining after this processing are the optimal protein complexes identified by the method.
With identical input parameters, the Pareto fronts of the adaptive multi-objective black hole optimization algorithm framework and of the NSGA-II multi-objective optimization framework are compared, as shown in Fig. 4. Optimization with the adaptive multi-objective black hole framework performs better in both search range and search efficiency.
The final output of the present invention is the set of black holes (a cell array), each of which gathers a set of numeric node indices. Mapping these indices one-to-one onto the protein list obtained during preprocessing gives the names of the proteins in the final protein complex of each black hole.
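A sketch of the final lookup, assuming each black hole is a list of integer indices into the protein list produced during preprocessing. MATLAB cell arrays are 1-based, so indices exported from MATLAB need shifting; the `one_based` flag handles that. Names are illustrative.

```python
from typing import List

def name_complexes(black_holes: List[List[int]], proteins: List[str],
                   one_based: bool = False) -> List[List[str]]:
    """Translate each black hole's node indices into protein names."""
    offset = 1 if one_based else 0  # MATLAB indices start at 1
    return [[proteins[i - offset] for i in bh] for bh in black_holes]

print(name_complexes([[1, 3], [2]], ["P1", "P2", "P3"], one_based=True))  # [['P1', 'P3'], ['P2']]
```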
The present invention fuses the topological characteristics of the protein-protein interaction network data with the functional similarity of the Gene Ontology annotation data, improving the accuracy of complex identification from multiple angles. Using the adaptive multi-objective black hole framework (AMOBH) enlarges the search range and improves computational efficiency; the biclustering-based black hole initialization ensures the reliability of the initial coarse classification; and the movement and absorption of stars improve the accuracy of the later fine classification. The method improves both the speed and the accuracy of protein complex identification and, beyond protein-protein interaction networks, can be extended to the analysis of other complex community networks, giving it strong practicality in complex network analysis.
Where no conflict arises, the embodiments described above and the features in them may be combined with one another.
The foregoing are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (9)

1. the protein complex recognizing method based on multisource data fusion and multiple-objection optimization, it is characterised in that including following Step:
S1. regard protein-protein interaction network as full-mesh figure, pre-process, obtain adjacency matrix;
S2. all proteins node that will abut against in matrix is clustered, and obtains preliminary protein clustering module;
S3. further optimize each cluster module on the basis of preliminary protein clustering module, melt in optimization process The function similar characteristic of topological structure characteristic and GO the annotation data of hop protein matter interactive network data, and combining adaptive Multiple target black hole optimization algorithm frame carries out optimizing operation, regards each protein module as black hole, each protein node is seen Make asterism, black hole center is the cluster centre of initial thick cluster module, is different from former individual new asterism by selecting and deleting To constantly update black hole, the adaptive value in new black hole and black hole where former asterism is calculated, is compared, if the adaptive value in new black hole is excellent In original black hole, then original black hole is substituted with newly generated black hole, obtain protein complex module;
S4. post-processed, remove the edge fit that is not connected with other protein nodes in each protein complex module Isolated node, and remove the protein complex module that all scales are less than 3, the protein complex module obtained by processing The as optimal protein complex of this method identification.
2. the protein complex recognizing method based on multisource data fusion and multiple-objection optimization as claimed in claim 1, its It is characterized in that, in the step S1, adjacency matrix obtains by the following method:
S1.1. protein interaction database is obtained;
S1.2. the protein of redundancy is removed, retains the protein list pair with interaction, two row are taken into union, have been obtained Whole protein list;
S1.3. handle to obtain the adjacency matrix that can reflect interaction relationship by MATLAB simulation softwares.
3. the protein complex recognizing method based on multisource data fusion and multiple-objection optimization as claimed in claim 1, its It is characterized in that, in the step S3, further optimizes each cluster module on the basis of preliminary protein clustering module Concretely comprise the following steps:Initialization strategy, multisource data fusion strategy, the addition of asterism and the deletion strategy in black hole.
4. the protein complex recognizing method based on multisource data fusion and multiple-objection optimization as claimed in claim 3, its It is characterized in that, the specific method of the initialization strategy in the black hole is:The thought of biclustering algorithms is used for reference, respectively to neighbour The row and column for connecing matrix performs K-means cluster operations, i.e., selects K to gather in all nodes in protein-protein interaction network Class center, compares remaining node to the distance of each cluster centre, is included in the module where nearest cluster centre, Obtain K initial protein complex module.
5. The protein complex identification method based on multi-source data fusion and multi-objective optimization according to claim 3, characterized in that the multi-source data fusion strategy mainly comprises the following. The topological characteristics of the protein-protein interaction network are combined with the functional-similarity characteristics of the GO annotation data, integrating different types of genomic data with protein interaction data. The results serve as the objective functions of the multi-objective optimization strategy and guide the iterative search for the optimal solution.
6. The protein complex identification method based on multi-source data fusion and multi-objective optimization according to claim 5, characterized in that the objective functions in the multi-objective optimization strategy are set as follows. The topological characteristics of the protein-protein interaction network are combined with the functional characteristics of the GO annotation data. Suitable objective functions are then selected while fully accounting for the trade-offs among the objectives.
7. The protein complex identification method based on multi-source data fusion and multi-objective optimization according to claim 5, characterized in that the objective functions in the multi-objective optimization strategy are density, node closeness centrality, outward interaction, and functional similarity;
Density is expressed as:
$$\mathrm{Density}(S) = \frac{2 \times |E_S|}{N_v \times (N_v - 1)}$$
where $N_v$ is the number of protein nodes in the protein complex module $S$ and $|E_S|$ is the number of interactions among those nodes. Maximizing module density ensures that the protein complex modules obtained by clustering are compact and tightly connected internally;
Node closeness centrality is expressed as:
$$\mathrm{Closeness\ centrality}(v_i) = \frac{N - 1}{\sum_{j=1,\, j \neq i}^{N} d_{ij}}$$
where the closeness of protein node $v_i$ is the reciprocal of the sum of its shortest-path distances $d_{ij}$ to the other nodes, multiplied by the number of other nodes, $N - 1$. Maximizing node closeness brings the nodes of each cluster closer to the cluster centre;
Outward interaction is expressed as:
$$\mathrm{Outward\ Interaction} = \sum_{n_i \in S} \frac{|N_{n_i}|}{\mathrm{degree}(n_i)}$$
where $|N_{n_i}|$ is the number of nodes directly connected to node $n_i$ of the protein complex module $S$, and $\mathrm{degree}(n_i)$ is the degree of $n_i$. Minimizing outward interaction ensures that different protein complex modules are linked by as few bridge proteins as possible, so that clusters remain as independent of one another as possible;
In a protein-protein interaction network, proteins belonging to the same protein complex have similar functional characteristics. A random-walk probability term is therefore added on top of the standard functional similarity formula, and this random-walk term is calculated as:
$$\mathrm{RWC}(v_0, v_1) = \sum_{i, j \in \zeta} W_{\infty}^{v_0}(i)\, W_{\infty}^{v_1}(j)\, \mathrm{sim}(i, j)$$
where $\zeta$ denotes the set of all known leaf nodes. $W_{\infty}^{v_0}(i)$ and $W_{\infty}^{v_1}(j)$ denote the random-walk probabilities, at infinite time, of reaching node $i$ from $v_0$ and node $j$ from $v_1$, respectively;
The standard functional similarity formula is:
$$\mathrm{sim}(s) = \frac{\sum_{v_0 = 1,\, v_1 = 1,\, v_0 \neq v_1}^{p} w(n_{v_0}, n_{v_1})}{p}$$
where $s$ denotes the set containing the child nodes $n_{v_0}$ and $n_{v_1}$, the value range of $E$ is [0, 1], and $w$ denotes the weight;
Functional similarity is expressed as:
$$\mathrm{ISM}(v_0, v_1) = \frac{\mathrm{RWC}(v_0, v_1) + \mathrm{sim}(v_0, v_1)}{2}.$$
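The four objective functions of claim 7 can be sketched in Python as follows. Several assumptions are made, and each is also labelled in the code:

- The network is a dict mapping each node to the set of its neighbours.
- Density uses the number of interactions inside the module as its numerator.
- The neighbours counted by outward interaction are those outside the module.
- Closeness is computed over the whole graph passed in, and a node that cannot reach every other node scores 0.
- The random-walk vectors $W_{\infty}$, the leaf-term similarity and the weight function are supplied by the caller rather than derived from GO data.

The standard similarity divides by $p$ exactly as the formula is written.

```python
from collections import deque

def density(module, adj):
    """2|E_S| / (N_v (N_v - 1)); the inner-interaction numerator is an assumption."""
    nodes = sorted(module)
    n_v = len(nodes)
    if n_v < 2:
        return 0.0
    inner = sum(1 for a, u in enumerate(nodes) for v in nodes[a + 1:] if v in adj[u])
    return 2 * inner / (n_v * (n_v - 1))

def closeness(v, adj):
    """(N - 1) / sum of BFS shortest-path lengths from v to every other node."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    if len(dist) < len(adj):
        return 0.0  # assumption: a node that cannot reach every node scores 0
    total = sum(dist.values())
    return (len(adj) - 1) / total if total else 0.0

def outward_interaction(module, adj):
    """Sum over members of |neighbours outside S| / degree (outside reading assumed)."""
    members = set(module)
    return sum(
        sum(1 for w in adj[u] if w not in members) / len(adj[u])
        for u in members if adj[u]
    )

def rwc(w0, w1, sim, leaves):
    """Sum over leaf pairs (i, j) of W_inf^{v0}(i) * W_inf^{v1}(j) * sim(i, j)."""
    return sum(w0.get(i, 0.0) * w1.get(j, 0.0) * sim(i, j) for i in leaves for j in leaves)

def standard_sim(nodes, w):
    """Sum of w over ordered pairs of distinct nodes, divided by p as written."""
    p = len(nodes)
    return sum(w(a, b) for a in nodes for b in nodes if a != b) / p

def ism(rwc_value, sim_value):
    """Functional similarity: mean of the random-walk term and the standard term."""
    return (rwc_value + sim_value) / 2
```

Density, closeness and functional similarity are maximized, and outward interaction is minimized. A multi-objective comparison therefore has to treat the third objective with the opposite sign.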
8. The protein complex identification method based on multi-source data fusion and multi-objective optimization according to claim 3, characterized in that the star addition and deletion strategy is specifically as follows. Each star is regarded as a protein node, and addition and deletion operations are applied to the modules. During addition and deletion, a black hole selects surrounding stars to absorb or to remove.
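Claims 3 and 8 describe absorbing and removing stars, but they do not say how a candidate module is judged better when there are several objectives. The following sketch assumes Pareto dominance over two of the claim 7 objectives: density is maximized and outward interaction is minimized, with outward interaction counting neighbours outside the module. A move is accepted only if it dominates the current module. The helper names are hypothetical, and the network is assumed to be a dict of neighbour sets.

```python
def objectives(module, adj):
    """Return (density, -outward interaction) so both components are maximised."""
    members = set(module)
    n = len(members)
    inner = sum(1 for v in members for w in adj[v] if w in members) // 2
    dens = 2 * inner / (n * (n - 1)) if n > 1 else 0.0
    outward = sum(
        sum(1 for w in adj[v] if w not in members) / len(adj[v])
        for v in members if adj[v]
    )
    return (dens, -outward)

def dominates(a, b):
    """Pareto dominance for maximisation: no worse everywhere, better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def add_delete_step(module, adj):
    """One move: return the first candidate module that dominates, else the original."""
    current = set(module)
    score = objectives(current, adj)
    # Stars the black hole could absorb: neighbours of the module that are not members
    frontier = {w for v in current for w in adj[v]} - current
    candidates = [current | {w} for w in sorted(frontier)]
    # Stars the black hole could remove: each current member (keep at least one node)
    if len(current) > 1:
        candidates += [current - {v} for v in sorted(current)]
    for cand in candidates:
        if dominates(objectives(cand, adj), score):
            return cand
    return current
```

Take a triangle `{0, 1, 2}` with a tail `2-3-4`. Starting from `{0, 1}`, absorbing node 2 keeps density at 1.0 and lowers outward interaction, so the move is accepted. From `{0, 1, 2}`, no single addition or deletion dominates, so the module is left unchanged.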
9. The protein complex identification method based on multi-source data fusion and multi-objective optimization according to claim 1, characterized in that the adaptive multi-objective black hole optimization framework performs the optimization as follows:
S3.1. Initialize each protein module as an object;
S3.2. Evaluate all objects against the evaluation criteria and select the object with the highest fitness as the initial black hole;
S3.3. Move the remaining objects toward the initial black hole. If, during its movement, an object's fitness becomes better than that of the current black hole, the object replaces the current black hole and becomes the new black hole;
S3.4. If an object enters the black hole's absorption range while moving toward it, the object is absorbed, and an equal number of new objects is randomly generated at the same time. If the termination condition is met, the algorithm ends; otherwise, steps S3.2 to S3.4 are repeated.
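Claim 9 follows the general shape of the black hole algorithm of Hatamlou, which is listed under Non-Patent Citations. The sketch below shows that generic continuous algorithm on a toy sphere function. It is not the patent's discrete, multi-objective version over protein modules. It uses the commonly cited update, in which each star moves a random fraction of the way toward the black hole. It also uses the commonly cited event-horizon radius R = f_BH / Σ f_i. The population size, search bounds, iteration count and the fixed iteration budget used as the termination condition are all assumptions.

```python
import math
import random

def sphere(x):
    """Toy objective to minimise: sum of squares, optimum 0 at the origin."""
    return sum(v * v for v in x)

def black_hole(f, dim, n_stars=20, lower=-5.0, upper=5.0, iters=200, seed=0):
    """Continuous black hole algorithm minimising f (illustrative parameters)."""
    rng = random.Random(seed)

    def new_star():
        return [rng.uniform(lower, upper) for _ in range(dim)]

    # S3.1: initialise the population of objects (stars)
    stars = [new_star() for _ in range(n_stars)]
    fit = [f(s) for s in stars]
    # S3.2: the fittest object (lowest f here) becomes the initial black hole
    best = min(range(n_stars), key=fit.__getitem__)
    bh, bh_fit = stars[best][:], fit[best]

    for _ in range(iters):
        # S3.3: move each star a random fraction of the way toward the black hole
        for i, s in enumerate(stars):
            step = rng.random()
            stars[i] = [x + step * (xb - x) for x, xb in zip(s, bh)]
            fit[i] = f(stars[i])
            if fit[i] < bh_fit:
                # A star fitter than the black hole swaps places with it
                bh, stars[i] = stars[i], bh
                bh_fit, fit[i] = fit[i], bh_fit
        # S3.4: stars inside the event horizon are absorbed and replaced by new random stars
        total = sum(fit)
        radius = bh_fit / total if total > 0 else 0.0
        for i, s in enumerate(stars):
            if math.dist(s, bh) < radius:
                stars[i] = new_star()
                fit[i] = f(stars[i])
    return bh, bh_fit
```

The black hole only ever changes to a strictly better position, so its fitness never gets worse from one iteration to the next. Calling `black_hole(sphere, 2)` returns the best position found together with its objective value.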
CN201711190016.3A 2017-11-24 2017-11-24 Protein complex recognizing method based on multisource data fusion and multiple-objection optimization Pending CN108009403A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711190016.3A CN108009403A (en) 2017-11-24 2017-11-24 Protein complex recognizing method based on multisource data fusion and multiple-objection optimization

Publications (1)

Publication Number Publication Date
CN108009403A true CN108009403A (en) 2018-05-08

Family

ID=62053784

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711190016.3A Pending CN108009403A (en) 2017-11-24 2017-11-24 Protein complex recognizing method based on multisource data fusion and multiple-objection optimization

Country Status (1)

Country Link
CN (1) CN108009403A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105590039A (en) * 2015-03-05 2016-05-18 Central China Normal University Method for identifying protein complex based on BSO (Brain Storm Optimization)
CN105930688A (en) * 2016-04-18 2016-09-07 Fuzhou University Improved PSO algorithm based protein function module detection method
CN106228036A (en) * 2016-07-26 2016-12-14 Shaanxi Normal University A kind of method using fireworks algorithm identification of protein complex
CN106778057A (en) * 2016-11-15 2017-05-31 Zhejiang University of Technology A kind of protein conformation space optimization method based on quantum evolutionary algorithm

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
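Hatamlou A. et al.: "A new heuristic optimization approach for data clustering", Information Science *
Li Kun et al.: "Chaotic time series prediction based on IBH-LSSVM and its application to short-term prediction of the dynamic fluid level of pumping wells", Information and Control *
Jiang Xingpeng et al.: "Big data research in microbiomics", Mathematical Modeling and Its Applications *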

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108629159A (en) * 2018-05-14 2018-10-09 Liaoning University A method of for finding the pathogenic key protein matter of alzheimer's disease
CN108733976B (en) * 2018-05-23 2021-12-03 Yangzhou University Key protein identification method based on fusion biology and topological characteristics
CN108733976A (en) * 2018-05-23 2018-11-02 Yangzhou University Key protein matter recognition methods based on fusion biology and topological characteristic
CN108932402A (en) * 2018-06-27 2018-12-04 Central China Normal University A kind of protein complex recognizing method
CN109033746A (en) * 2018-06-29 2018-12-18 Dalian University of Technology A kind of protein complex recognizing method based on knot vector
CN109033746B (en) * 2018-06-29 2020-01-14 Dalian University of Technology Protein compound identification method based on node vector
CN109390057A (en) * 2018-08-20 2019-02-26 Anhui University A kind of disease module detection method based on multiple-objection optimization
CN109390057B (en) * 2018-08-20 2021-12-14 Anhui University Disease module detection method based on multi-objective optimization
CN109166604A (en) * 2018-08-22 2019-01-08 East China Jiaotong University A kind of calculation method merging more data characteristics prediction key protein matter
CN109166604B (en) * 2018-08-22 2021-07-02 East China Jiaotong University Calculation method for predicting key protein by fusing multi-data features
CN110459264A (en) * 2019-08-02 2019-11-15 Shaanxi Normal University Based on grad enhancement decision tree prediction circular rna and disease associated method
CN110706740A (en) * 2019-09-29 2020-01-17 Changsha University of Science and Technology Method, device and equipment for predicting protein function based on module decomposition
CN110706740B (en) * 2019-09-29 2022-03-22 Changsha University of Science and Technology Method, device and equipment for predicting protein function based on module decomposition
CN111554346A (en) * 2020-04-29 2020-08-18 Shanghai Jiao Tong University Protein sequence design implementation method based on multi-objective optimization
CN111554346B (en) * 2020-04-29 2023-05-23 Shanghai Jiao Tong University Protein sequence design implementation method based on multi-objective optimization
CN113223610A (en) * 2021-05-27 2021-08-06 Zhejiang University Method for integrating disease protein interaction network and mining cross-disease action module
CN113470739A (en) * 2021-07-03 2021-10-01 Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences Protein interaction prediction method and system based on mixed membership degree random block model

Similar Documents

Publication Publication Date Title
CN108009403A (en) Protein complex recognizing method based on multisource data fusion and multiple-objection optimization
Otero et al. Inducing decision trees with an ant colony optimization algorithm
Guendouz et al. A discrete modified fireworks algorithm for community detection in complex networks
Shi et al. Protein complex detection with semi-supervised learning in protein interaction networks
Cuevas et al. A cuckoo search algorithm for multimodal optimization
CN105930688A (en) Improved PSO algorithm based protein function module detection method
Forsati et al. A novel approach for feature selection based on the bee colony optimization
CN106789359A (en) A kind of net flow assorted method and device based on grey wolf algorithm
Forti et al. Growing Hierarchical Tree SOM: An unsupervised neural network with dynamic topology
Dehuri et al. Multi-criterion Pareto based particle swarm optimized polynomial neural network for classification: A review and state-of-the-art
Xiong et al. Multi-feature fusion and selection method for an improved particle swarm optimization
Zhang et al. Application of natural computation inspired method in community detection
Attea et al. Improving the performance of evolutionary-based complex detection models in protein–protein interaction networks
CN109509509A (en) Protein complex method for digging based on dynamic weighting protein-protein interaction network
Ji et al. Ant colony optimization with multi-agent evolution for detecting functional modules in protein-protein interaction networks
Shang et al. Multi-objective clustering technique based on k-nodes update policy and similarity matrix for mining communities in social networks
Moreira The use of Boolean concepts in general classification contexts
Lim et al. Predicting drug-target interaction using 3D structure-embedded graph representations from graph neural networks
Chowdhury et al. Cell type identification from single-cell transcriptomic data via gene embedding
Luong et al. Lightweight multi-objective evolutionary neural architecture search with low-cost proxy metrics
Ray et al. Disease associated protein complex detection: a multi-objective evolutionary approach
Hu et al. Apenas: An asynchronous parallel evolution based multi-objective neural architecture search
Shi et al. Semi-supervised learning protein complexes from protein interaction networks
Chiu et al. Cluster analysis based on artificial immune system and ant algorithm
Michelakos et al. Ant colony optimization and data mining: Techniques and trends

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180508
