CN108009403A - Protein complex identification method based on multi-source data fusion and multi-objective optimization
- Publication number: CN108009403A
- Application number: CN201711190016.3A
- Authority: CN (China)
- Prior art keywords
- protein
- black hole
- protein complex
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G: PHYSICS
- G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00: ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B40/00: ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B5/00: ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks
Abstract
The invention discloses a protein complex identification method based on multi-source data fusion and multi-objective optimization, comprising: preprocessing protein-protein interaction network data to obtain an adjacency matrix; performing preliminary clustering of protein complexes to obtain initial protein complex modules; further optimizing the complex modules, wherein the topological properties of the protein-protein interaction network data and the functional similarity of the GO annotation data are fused during optimization and the search is carried out within an adaptive multi-objective black hole optimization algorithm framework, to obtain more accurate protein complex modules; and performing post-processing to obtain the final optimal protein complexes. The invention improves the speed and accuracy of protein complex identification. It is applicable to protein-protein interaction networks and can also be extended to the analysis of other complex community networks, which gives it strong practicality in complex network analysis.
Description
Technical field
The present invention relates to the field of bioinformatics, and in particular to a protein complex identification method based on multi-source data fusion and multi-objective optimization.
Background
Proteins are the products of gene expression, the executors of an organism's physiological functions, and the direct embodiment of biological phenomena. Proteomics is the discipline that systematically studies the properties of proteins; it can provide a detailed description of the structure, function, and regulation of biological systems in healthy and diseased states. Almost all biological processes are carried out through a series of protein interactions. From the perspective of systems biology, using protein-protein interaction networks to study and analyze biological function has important prospects and practical value.
A protein complex is a set of proteins that, by interacting at the same time and place, form a multi-molecular machine; it is the principal form in which proteins perform their functions. Identifying protein complexes not only helps in understanding complex life activities but also provides theoretical support for uncovering the mechanisms by which complex diseases arise and for rational drug development. The development of high-throughput experimental techniques and proteomics has made it possible to use network-theoretic methods to explore protein function and interaction relationships and to uncover complex disease mechanisms. Numerous studies show that protein networks (the interaction relationships among all proteins in an organism) have an obvious modular structure and that these structures usually correspond to protein complexes; identifying protein complexes from protein networks can improve efficiency and guide biological experiments. However, protein interaction data obtained by high-throughput sequencing technologies often have high false-positive and false-negative rates, and relying on protein interaction data alone reduces the precision of protein complex identification.
With the development of biotechnology, multi-source biological data continue to emerge, such as protein-protein interaction (PPI) data, Gene Ontology (GO) data, time-series RNA-seq data, time-series gene expression data, subcellular localization information, and disease-related databases. Improving the accuracy of protein complex identification by integrating multi-source data has therefore become a research direction of great interest.
Summary of the invention
In view of this, the embodiments of the present invention provide a fusion strategy for multi-source biological information and combine it with a multi-objective optimization method to identify protein complexes, thereby realizing a protein complex identification method, based on multi-source data fusion and multi-objective optimization, for the identification and prediction of protein complexes.
The embodiments of the present invention provide a protein complex identification method based on multi-source data fusion and multi-objective optimization, comprising the following steps:
S1. Regard the protein-protein interaction network as a connected graph and preprocess it to obtain an adjacency matrix;
S2. Cluster all protein nodes in the adjacency matrix to obtain preliminary protein clustering modules;
S3. On the basis of the preliminary protein clustering modules, further optimize each clustering module. During optimization, fuse the topological properties of the protein-protein interaction network data with the functional similarity of the GO (Gene Ontology) annotation data, and carry out the search within an adaptive multi-objective black hole optimization algorithm framework: each protein module is regarded as a black hole and each protein node as a star; the center of a black hole is the cluster center of the initial coarse clustering module; the black hole is continually updated by selecting and deleting new stars that differ from the original individuals; the fitness values of the new black hole and of the black hole containing the original star are computed and compared; if the fitness of the new black hole is better than that of the original black hole, the original black hole is replaced by the newly generated one, yielding the protein complex modules;
S4. Perform post-processing: in each protein complex module, remove isolated nodes that have no edge to any other protein node, and remove all protein complex modules whose size is less than 3. The protein complex modules obtained after this processing are the optimal protein complexes identified by the method.
Further, in step S1, the adjacency matrix is obtained as follows:
S1.1. Obtain a protein interaction database;
S1.2. Remove redundant proteins and retain the list of interacting protein pairs; take the union of the two columns to obtain the complete protein list;
S1.3. Process the result with the MATLAB simulation software to obtain the adjacency matrix.
Further, in step S3, the further optimization of each clustering module on the basis of the preliminary protein clustering modules specifically comprises: a black hole initialization strategy, a multi-source data fusion strategy, and a star addition and deletion strategy.
Further, the black hole initialization strategy is as follows: borrowing the idea of biclustering algorithms, K-means clustering is performed on the rows and on the columns of the adjacency matrix respectively. That is, K cluster centers are selected among all nodes of the protein-protein interaction network, the distance from each remaining node to every cluster center is compared, and the node is assigned to the module of the nearest cluster center, giving K initial modules.
Further, the multi-source data fusion strategy is as follows: the topological properties of the protein-protein interaction network are combined with the functional properties of the GO annotation data; different types of genomic data and protein interaction data are integrated and used as the objective functions of the multi-objective optimization strategy, guiding the iterative search for the optimal solution.
Further, the objective functions of the multi-objective optimization strategy are set as follows: combining the topological properties of the protein-protein interaction network with the functional properties of the GO annotation data, suitable objective functions are selected with full consideration of the substitutability among the objectives.
Further, the objective functions of the multi-objective optimization strategy are density, node closeness centrality, outward interaction, and functional similarity.
Density is expressed as:
$$\mathrm{Density}(S) = \frac{2 \times N_v}{N_v \times (N_v - 1)}$$
where $N_v$ denotes the number of protein nodes in the protein complex module. Maximizing the density of the protein complex modules ensures that the modules obtained by clustering are internally compact and tightly connected.
Node closeness centrality is expressed as:
$$\mathrm{Closeness\ centrality}(v_i) = \frac{N - 1}{\sum_{\substack{j = 1 \\ j \neq i}}^{N} d_{ij}}$$
where the closeness of protein node $v_i$ is the reciprocal of the sum of its shortest distances $d_{ij}$ multiplied by the number of other nodes. Maximizing node closeness brings the nodes of the same cluster closer to the cluster center.
Outward interaction is expressed in terms of the number of nodes in the protein complex module that are directly connected to child node $n_i$. Minimizing outward interaction ensures that different protein complex modules are linked by as few bridging proteins as possible, so that different clusters are as independent of one another as possible.
In a protein-protein interaction network, proteins belonging to the same protein complex have similar functional properties. A random walk probability is added on top of the standard functional similarity formula. The random walk probability is computed over $\zeta$, the set of all known leaf nodes, from the probabilities that, as time tends to infinity, node $i$ and node $j$ respectively walk from $v_0$ to $v_1$.
In the standard functional similarity formula, $s$ denotes the set containing the child nodes, $E$ takes values in $[0, 1]$, and the weight is denoted by $w$.
Functional similarity is then obtained from the standard formula augmented with the random walk probability.
Further, the star addition and deletion strategy is as follows: stars are regarded as protein nodes, and addition and deletion operations are applied to them; during addition and deletion, a black hole selects surrounding stars to absorb or remove.
Further, the search within the adaptive multi-objective black hole optimization algorithm framework proceeds as follows:
S3.1. Initialize each protein module as an object;
S3.2. Evaluate all objects against the evaluation criteria and select the object with the highest fitness as the initial black hole;
S3.3. Move the remaining objects toward the initial black hole; during the movement, if the fitness of an object is higher than that of the current black hole, the object replaces the current black hole and becomes the new black hole;
S3.4. If, while moving toward the black hole, an object enters the range within which the black hole absorbs, it is absorbed, and an equal number of objects is generated at random at the same moment. If the termination condition is met, the algorithm ends; otherwise, return to step S3.2.
Compared with the prior art, the invention has the following advantages. It fuses the topological properties of the protein-protein interaction network data with the functional similarity of the gene ontology annotation data, improving the accuracy of complex identification from several angles. The use of the adaptive multi-objective black hole framework (AMOBH) improves the search range and computational efficiency. The biclustering-based black hole initialization ensures the reliability of the initial coarse classification, and the movement and absorption of stars improve the accuracy of the later fine classification. The method improves the speed and accuracy of protein complex identification; it is applicable to protein-protein interaction networks and can also be extended to the analysis of other complex community networks, which gives it strong practicality in complex network analysis.
Brief description of the drawings
Fig. 1 is a flow diagram of the protein complex identification method of the invention based on multi-source data fusion and multi-objective optimization.
Fig. 2 is a schematic diagram of the principle of the protein complex identification method of the invention based on multi-source data fusion and multi-objective optimization.
Fig. 3 shows the movement of stars and their absorption by a black hole in the multi-objective framework of the invention.
Fig. 4 compares the Pareto fronts of the method of the invention and of NSGA-II.
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the present invention clearer, the embodiments of the invention are further described below with reference to the accompanying drawings.
Referring to Fig. 1 and Fig. 2, the embodiments of the present invention provide a protein complex identification method based on multi-source data fusion and multi-objective optimization, comprising the following steps:
S1. Regard the protein-protein interaction network as a connected graph and preprocess it to obtain an adjacency matrix;
A protein interaction database is obtained from a public website, and the protein-protein interaction network is abstracted as a connected network graph G = (V, E) formed by protein nodes and the interactions between them, where V is the set of protein nodes and E is the set of edges representing interactions between protein nodes. Because protein interaction databases contain some self-interactions and duplicated interactions, which are redundant, the preprocessing of the data mainly comprises: 1. removing redundant proteins (those that take part in no interaction with another protein) and retaining the list of interacting protein pairs (two columns), then taking the union of the two columns to obtain the complete protein list; 2. converting the resulting interaction relationships into an adjacency matrix with the MATLAB simulation software. An interaction between two proteins is represented by "1" and the absence of an interaction by "0". The protein list corresponds one-to-one with the rows and columns of the adjacency matrix, for use in later look-ups.
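The preprocessing above can be sketched in a few lines. This is a minimal illustration in Python rather than the MATLAB processing the text mentions; the function name `build_adjacency` and the sample pairs are hypothetical, and the input is assumed to be a two-column list of protein identifiers.

```python
from typing import Iterable, List, Tuple


def build_adjacency(pairs: Iterable[Tuple[str, str]]) -> Tuple[List[str], List[List[int]]]:
    """Return (protein_list, adjacency_matrix) for an undirected PPI network."""
    edges = set()
    for a, b in pairs:
        if a == b:
            continue  # drop self-interactions
        # (a, b) and (b, a) describe the same interaction, so store one canonical order
        edges.add((a, b) if a < b else (b, a))

    # The union of both columns gives the complete protein list; sorting fixes the index order.
    proteins = sorted({p for edge in edges for p in edge})
    index = {name: i for i, name in enumerate(proteins)}

    n = len(proteins)
    matrix = [[0] * n for _ in range(n)]
    for a, b in edges:
        i, j = index[a], index[b]
        matrix[i][j] = matrix[j][i] = 1  # 1 = interaction, 0 = no interaction
    return proteins, matrix


# Hypothetical sample: one duplicated interaction and one self-interaction.
pairs = [("P1", "P2"), ("P2", "P1"), ("P2", "P3"), ("P3", "P3")]
proteins, adjacency = build_adjacency(pairs)
```

Row and column `i` of `adjacency` refer to `proteins[i]`, which is the one-to-one correspondence used later to map module indices back to protein names.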
S2. Cluster all protein nodes in the adjacency matrix to obtain preliminary protein clustering modules;
S3. On the basis of the preliminary protein clustering modules, further optimize each clustering module. During optimization, fuse the topological properties of the protein-protein interaction network data with the functional similarity of the GO annotation data, and carry out the search within an adaptive multi-objective black hole optimization algorithm framework: each protein module is regarded as a black hole and each protein node as a star; the center of a black hole is the cluster center of the initial coarse clustering module; the black hole is continually updated by selecting and deleting new stars that differ from the original individuals; the fitness values of the new black hole and of the black hole containing the original star are computed and compared; if the fitness of the new black hole is better than that of the original black hole, the original black hole is replaced by the newly generated one, yielding the protein complex modules;
The further optimization of each clustering module on the basis of the preliminary protein clustering modules specifically comprises: a black hole initialization strategy, a multi-source data fusion strategy, and a star addition and deletion strategy.
The black hole initialization strategy is as follows: borrowing the idea of biclustering algorithms, K-means clustering is performed on the rows and on the columns of the adjacency matrix respectively. That is, K cluster centers are selected among all nodes of the protein-protein interaction network, the distance from each remaining node to every cluster center is compared, and the node is assigned to the module of the nearest cluster center, giving K initial modules.
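As a rough illustration of this initialization, the sketch below runs plain Lloyd-style K-means on the rows of a small adjacency matrix. Because the matrix of an undirected network is symmetric, clustering its columns gives the same result here. The function names, the explicit `seeds` argument, and the six-node example graph are assumptions made for the sketch, not details from the text.

```python
from typing import List, Sequence


def squared_distance(u: Sequence[float], v: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans_rows(matrix: List[List[int]], seeds: List[int], iterations: int = 20) -> List[List[int]]:
    """Group row indices of `matrix` into len(seeds) modules; seeds are initial centre rows."""
    centres = [[float(x) for x in matrix[s]] for s in seeds]
    modules: List[List[int]] = []
    for _ in range(iterations):
        # Assignment step: each node joins the module of its nearest centre.
        modules = [[] for _ in seeds]
        for node, row in enumerate(matrix):
            nearest = min(range(len(centres)), key=lambda k: squared_distance(row, centres[k]))
            modules[nearest].append(node)
        # Update step: each centre becomes the mean row of its members.
        new_centres = []
        for k, members in enumerate(modules):
            if not members:
                new_centres.append(centres[k])  # keep the centre of an empty module
                continue
            columns = zip(*(matrix[m] for m in members))
            new_centres.append([sum(col) / len(members) for col in columns])
        if new_centres == centres:
            break  # assignments are stable
        centres = new_centres
    return modules


# Hypothetical network: two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
ADJ = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
initial_modules = kmeans_rows(ADJ, seeds=[0, 5])
```

The result is only a coarse partition, which matches its role in the text: the later optimization refines these initial modules.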
The multi-source data fusion strategy is as follows: the topological properties of the protein-protein interaction network are combined with the functional properties of the GO annotation data; different types of genomic data and protein interaction data are integrated and used as the objective functions of the multi-objective optimization strategy, guiding the iterative search for the optimal solution.
The objective functions of the multi-objective optimization strategy are set as follows: combining the topological properties of the protein-protein interaction network with the functional properties of the GO annotation data, and with full consideration of the substitutability among the objectives, the objective functions are set to density, node closeness centrality, outward interaction, and functional similarity.
Density is expressed as:
$$\mathrm{Density}(S) = \frac{2 \times N_v}{N_v \times (N_v - 1)}$$
where $N_v$ denotes the number of protein nodes in the protein complex module. Maximizing the density of the protein complex modules ensures that the modules obtained by clustering are internally compact and tightly connected.
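A density objective of this kind can be sketched as follows. The sketch uses the conventional graph density, in which the numerator counts the edges inside the module ($N_e$); that edge count is an assumption made here, since the text only defines $N_v$. The function name and example graph are hypothetical.

```python
from typing import List, Sequence


def module_density(adjacency: List[List[int]], module: Sequence[int]) -> float:
    """Conventional graph density 2*N_e / (N_v * (N_v - 1)) of the induced subgraph."""
    n_v = len(module)
    if n_v < 2:
        return 0.0  # density is undefined for fewer than two nodes
    # N_e (assumed): number of edges with both endpoints inside the module.
    n_e = sum(adjacency[a][b] for i, a in enumerate(module) for b in module[i + 1:])
    return 2.0 * n_e / (n_v * (n_v - 1))


# Hypothetical network: two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
ADJ = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
triangle_density = module_density(ADJ, [0, 1, 2])
extended_density = module_density(ADJ, [0, 1, 2, 3])
```

A fully connected module scores 1, and absorbing a loosely attached node lowers the score, which is why maximizing this quantity favors compact modules.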
Node closeness centrality is expressed as:
$$\mathrm{Closeness\ centrality}(v_i) = \frac{N - 1}{\sum_{\substack{j = 1 \\ j \neq i}}^{N} d_{ij}}$$
where the closeness of protein node $v_i$ is the reciprocal of the sum of its shortest distances $d_{ij}$ multiplied by the number of other nodes. Maximizing node closeness brings the nodes of the same cluster closer to the cluster center.
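The closeness of a node can be computed with a breadth-first search for the shortest distances $d_{ij}$. The sketch below assumes distances are measured inside the module's induced subgraph and returns 0 when some member is unreachable; both choices, like the function name, are assumptions for illustration.

```python
from collections import deque
from typing import Dict, List, Sequence


def closeness(adjacency: List[List[int]], module: Sequence[int], source: int) -> float:
    """(N - 1) / sum of shortest-path lengths from `source` to the other module members."""
    members = set(module)
    dist: Dict[int, int] = {source: 0}
    queue = deque([source])
    while queue:  # breadth-first search restricted to the module
        u = queue.popleft()
        for v in members:
            if adjacency[u][v] and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    n = len(members)
    if n < 2 or len(dist) < n:
        return 0.0  # single node, or some member cannot be reached
    return (n - 1) / sum(dist.values())


# Hypothetical network: two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
ADJ = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
module = [0, 1, 2, 3]
scores = {node: closeness(ADJ, module, node) for node in module}
```

In this example node 2 is adjacent to every other member and therefore has the highest closeness, while the peripheral node 3 has the lowest.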
Outward interaction is expressed in terms of the number of nodes in the protein complex module that are directly connected to child node $n_i$. Minimizing outward interaction ensures that different protein complex modules are linked by as few bridging proteins as possible, so that different clusters are as independent of one another as possible.
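One plausible reading of this objective, offered only as an assumption because the exact formula is not available here, is to count the edges that leave a module. The function name and the example graph are hypothetical.

```python
from typing import List, Sequence


def outward_interaction(adjacency: List[List[int]], module: Sequence[int]) -> int:
    """Count edges with one endpoint inside the module and the other outside (assumed reading)."""
    members = set(module)
    return sum(
        adjacency[u][v]
        for u in members
        for v in range(len(adjacency))
        if v not in members
    )


# Hypothetical network: two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
ADJ = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
triangle_out = outward_interaction(ADJ, [0, 1, 2])
pair_out = outward_interaction(ADJ, [0, 1])
```

Under this reading a well-separated module has few outgoing edges, so minimizing the count pushes bridging proteins out of the cluster boundaries.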
In a protein-protein interaction network, proteins belonging to the same protein complex have similar functional properties. Functional similarity, as an important component of multi-source information fusion, can therefore take part in the selection of the multi-objective strategy to further improve the reliability of the clustering.
A random walk probability is added on top of the standard functional similarity formula. The random walk probability is computed over $\zeta$, the set of all known leaf nodes, from the probabilities that, as time tends to infinity, node $i$ and node $j$ respectively walk from $v_0$ to $v_1$.
In the standard functional similarity formula, $s$ denotes the set containing the child nodes, $E$ takes values in $[0, 1]$, and the weight is denoted by $w$.
Functional similarity is then obtained from the standard formula augmented with the random walk probability.
Once the objective functions are determined, the following operations are applied to them within the adaptive multi-objective black hole optimization algorithm framework. First, clustering is performed under the constraint of the topological properties of the protein network; the GO annotation data do not take part in the constraint at this stage. Next, while protein nodes are being added and deleted, network nodes (proteins) are selected according to a probability, and the GO annotation information of these proteins is used: since proteins in the same protein complex generally have closely related functions, it is judged whether a node carries the same functional information as the complex, and the constraint is applied accordingly. Finally, with the protein topology as the objective, the movement of the protein nodes is jointly constrained so that functional similarity within the same protein complex is maximized, increasing the accuracy of the clustering.
The star addition and deletion strategy is as follows: stars are regarded as protein nodes, and addition and deletion operations are applied to them; during addition and deletion, a black hole selects surrounding stars to absorb or remove.
The operation is shown in Fig. 3, which illustrates how the adaptive multi-objective black hole optimization algorithm carries out selection and deletion. The shaded area represents an initial black hole (cluster) of the multi-objective algorithm, and the light grey node is an arbitrary node selected with a certain probability. The addition and deletion operations are then performed with equal probability. In the addition process (upper right of the figure), the grey proteins connected to the light grey node are preferentially absorbed to form a new black hole. In the deletion process (lower right of the figure), the light grey node is removed from the initial black hole and forms a new black hole with other proteins outside.
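The two moves described for Fig. 3 can be sketched on modules stored as sets of node indices. The sketch only generates candidate modules; whether a candidate replaces the original black hole would depend on the fitness comparison described in step S3. The function names and the uniform choice of node are assumptions.

```python
import random
from typing import FrozenSet, List


def add_move(adjacency: List[List[int]], module: FrozenSet[int], node: int) -> FrozenSet[int]:
    """Absorb every neighbour of `node` into the module."""
    neighbours = {v for v, linked in enumerate(adjacency[node]) if linked}
    return module | neighbours


def delete_move(module: FrozenSet[int], node: int) -> FrozenSet[int]:
    """Remove `node` from the module."""
    return module - {node}


def propose(adjacency: List[List[int]], module: FrozenSet[int], rng: random.Random) -> FrozenSet[int]:
    """Pick a member at random, then add or delete with equal probability."""
    node = rng.choice(sorted(module))
    if rng.random() < 0.5:
        return add_move(adjacency, module, node)
    return delete_move(module, node)


# Hypothetical network: two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
ADJ = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
black_hole = frozenset({0, 1, 2})
grown = add_move(ADJ, black_hole, 2)
shrunk = delete_move(black_hole, 2)
candidate = propose(ADJ, black_hole, random.Random(0))
```

Keeping modules immutable (`frozenset`) means a rejected candidate never alters the current black hole.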
The main characteristic of the search within the adaptive multi-objective black hole optimization algorithm framework is that it is based on an improved multi-objective particle swarm algorithm. The improvement lies in a wider search range and higher computational efficiency, in addition to the speed advantage that multi-objective particle swarm algorithms already have over conventional evolutionary algorithms. The specific improvements are the following three points:
1. A fixed region centered on the global-best particle is used as the black hole. This region approximates the area where the true solution lies; it not only gives the particles a new search space but also accelerates convergence.
2. The optimal front is obtained using the Pareto dominance condition for constrained problems, and a compromise optimal solution is selected from the optimal front according to a "distance evaluation index". For the multi-objective optimization problem, a new movement and addition method is used to increase the diversity of the solutions.
3. A new population density assessment method is used, and mutation is introduced into the black hole algorithm: local solutions use ordinary mutation and the global optimal solution uses elite mutation. The black hole boundary and absorption under high-dimensional multiple objectives are defined, together with the formula for the star radius. Compared with traditional evolutionary algorithms (NSGA-II and others), this improves computational efficiency and shortens running time.
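The Pareto dominance test mentioned in point 2 can be sketched as follows. This is the textbook definition for unconstrained maximization, not the constrained variant or the "distance evaluation index" of the method; objectives that should be minimized, such as outward interaction, are negated first. The function names and the sample scores are hypothetical.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is at least as good as `b` everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidate modules scored as (density, closeness, -outward interaction).
candidates = [
    (1.0, 1.0, -1.0),
    (0.67, 0.75, -2.0),
    (0.5, 1.0, 0.0),
]
front = pareto_front(candidates)
```

The second candidate is worse than the first on every objective and is discarded; the first and third each win on at least one objective, so both stay on the front.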
The search within the adaptive multi-objective black hole optimization algorithm framework proceeds as follows:
S3.1. Initialize each protein module as an object;
S3.2. Evaluate all objects against the evaluation criteria and select the object with the highest fitness as the initial black hole;
S3.3. Move the remaining objects toward the initial black hole; during the movement, if the fitness of an object is higher than that of the current black hole, the object replaces the current black hole and becomes the new black hole;
S3.4. If, while moving toward the black hole, an object enters the range within which the black hole absorbs, it is absorbed, and an equal number of objects is generated at random at the same moment. If the termination condition is met, the algorithm ends; otherwise, return to step S3.2.
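The loop in S3.1 to S3.4 follows the shape of the generic single-objective black hole heuristic, which can be sketched on a toy continuous problem. This is not the adaptive multi-objective variant applied to protein modules: the update rule, the absorption radius (black hole fitness divided by total fitness), the fixed iteration count, and all names are assumptions taken from the generic heuristic, and the fitness is assumed to be positive.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def black_hole_search(
    fitness: Callable[[Sequence[float]], float],
    dim: int,
    bounds: Tuple[float, float],
    n_stars: int = 20,
    iterations: int = 50,
    seed: int = 0,
) -> Tuple[Vector, List[float]]:
    rng = random.Random(seed)
    low, high = bounds

    def random_star() -> Vector:
        return [rng.uniform(low, high) for _ in range(dim)]

    # S3.1: initialise the objects (stars).
    stars = [random_star() for _ in range(n_stars)]
    # S3.2: the fittest object becomes the initial black hole.
    stars.sort(key=fitness, reverse=True)
    hole = stars.pop(0)
    history = [fitness(hole)]

    for _ in range(iterations):
        for i in range(len(stars)):
            # S3.3: move each star a random fraction of the way toward the black hole.
            step = rng.random()
            stars[i] = [x + step * (h - x) for x, h in zip(stars[i], hole)]
            if fitness(stars[i]) > fitness(hole):
                stars[i], hole = hole, stars[i]  # the fitter star becomes the black hole
        # S3.4: stars inside the absorption radius are swallowed and regenerated at random.
        radius = fitness(hole) / (fitness(hole) + sum(fitness(s) for s in stars))
        for i, star in enumerate(stars):
            distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(star, hole)))
            if distance < radius:
                stars[i] = random_star()
        history.append(fitness(hole))
    return hole, history


def toy_fitness(x: Sequence[float]) -> float:
    return 1.0 / (1.0 + sum(v * v for v in x))  # positive, with its peak at the origin


best, history = black_hole_search(toy_fitness, dim=2, bounds=(-5.0, 5.0))
```

Because the black hole is replaced only by a strictly fitter star, its fitness never decreases from one iteration to the next; regenerating absorbed stars keeps the population exploring.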
S4. Perform post-processing: in each protein complex module, remove isolated nodes that have no edge to any other protein node, and remove all protein complex modules whose size is less than 3. The protein complex modules obtained after this processing are the optimal protein complexes identified by the method.
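The post-processing step can be sketched directly. The sketch assumes that "isolated" means having no edge to any other member of the same module, and that the size threshold is checked after the isolated nodes are removed; the function name and sample modules are hypothetical.

```python
from typing import List, Sequence


def postprocess(
    adjacency: List[List[int]], modules: Sequence[Sequence[int]], min_size: int = 3
) -> List[List[int]]:
    """Drop isolated members, then drop modules smaller than `min_size`."""
    cleaned = []
    for module in modules:
        members = set(module)
        # Keep a node only if it has at least one neighbour inside the same module.
        kept = sorted(
            u for u in members if any(adjacency[u][v] for v in members if v != u)
        )
        if len(kept) >= min_size:
            cleaned.append(kept)
    return cleaned


# Hypothetical network: two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
ADJ = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
final_modules = postprocess(ADJ, [[0, 1, 2, 5], [3, 4]])
```

In the example, node 5 has no neighbour inside the first module and is removed, while the second module falls below the size threshold and is discarded.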
The adaptive multi-objective black hole optimization algorithm framework and the NSGA-II multi-objective optimization framework were run with identical input parameters, and their Pareto fronts are compared in Fig. 4. The search using the adaptive multi-objective black hole optimization algorithm framework performs better in both search range and search efficiency.
The final output of the invention is the set of black holes (a cell array). Each black hole holds a group of numbers which, through their one-to-one correspondence with the protein list obtained during preprocessing, determine the names of the proteins in the final complex of each black hole.
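The look-up from indices back to names is a direct indexing operation. The sketch assumes that the modules hold zero-based positions in the protein list built during preprocessing; the names below are hypothetical.

```python
from typing import List, Sequence


def name_complexes(proteins: Sequence[str], modules: Sequence[Sequence[int]]) -> List[List[str]]:
    """Translate each module of node indices into the corresponding protein names."""
    return [[proteins[i] for i in module] for module in modules]


# Hypothetical protein list and identified modules.
proteins = ["P1", "P2", "P3", "P4", "P5", "P6"]
complexes = name_complexes(proteins, [[0, 1, 2], [3, 4, 5]])
```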
The invention fuses the topological properties of the protein-protein interaction network data with the functional similarity of the gene ontology annotation data, improving the accuracy of complex identification from several angles. The use of the adaptive multi-objective black hole framework (AMOBH) improves the search range and computational efficiency. The biclustering-based black hole initialization ensures the reliability of the initial coarse classification, and the movement and absorption of stars improve the accuracy of the later fine classification. The method improves the speed and accuracy of protein complex identification; it is applicable to protein-protein interaction networks and can also be extended to the analysis of other complex community networks, which gives it strong practicality in complex network analysis.
Where no conflict arises, the embodiments described above and the features within them may be combined with one another.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (9)
1. A protein complex identification method based on multi-source data fusion and multi-objective optimization, characterized by comprising the following steps:
S1. Regard the protein-protein interaction network as a connected graph and preprocess it to obtain an adjacency matrix;
S2. Cluster all protein nodes in the adjacency matrix to obtain preliminary protein clustering modules;
S3. On the basis of the preliminary protein clustering modules, further optimize each clustering module. During optimization, fuse the topological properties of the protein-protein interaction network data with the functional similarity of the GO annotation data, and carry out the search within an adaptive multi-objective black hole optimization algorithm framework: each protein module is regarded as a black hole and each protein node as a star; the center of a black hole is the cluster center of the initial coarse clustering module; the black hole is continually updated by selecting and deleting new stars that differ from the original individuals; the fitness values of the new black hole and of the black hole containing the original star are computed and compared; if the fitness of the new black hole is better than that of the original black hole, the original black hole is replaced by the newly generated one, yielding the protein complex modules;
S4. Perform post-processing: in each protein complex module, remove isolated nodes that have no edge to any other protein node, and remove all protein complex modules whose size is less than 3. The protein complex modules obtained after this processing are the optimal protein complexes identified by the method.
2. The protein complex identification method based on multi-source data fusion and multi-objective optimization according to claim 1, characterized in that, in step S1, the adjacency matrix is obtained as follows:
S1.1. Obtain a protein interaction database;
S1.2. Remove redundant proteins and retain the list of interacting protein pairs; take the union of the two columns to obtain the complete protein list;
S1.3. Process the result with the MATLAB simulation software to obtain an adjacency matrix that reflects the interaction relationships.
3. The protein complex identification method based on multi-source data fusion and multi-objective optimization according to claim 1, characterized in that, in step S3, the further optimization of each clustering module on the basis of the preliminary protein clustering modules specifically comprises: a black hole initialization strategy, a multi-source data fusion strategy, and a star addition and deletion strategy.
4. The protein complex identification method based on multi-source data fusion and multi-objective optimization according to claim 3, characterized in that the black hole initialization strategy is as follows: borrowing the idea of biclustering algorithms, K-means clustering is performed on the rows and on the columns of the adjacency matrix respectively; that is, K cluster centers are selected among all nodes of the protein-protein interaction network, the distance from each remaining node to every cluster center is compared, and the node is assigned to the module of the nearest cluster center, giving K initial protein complex modules.
5. The protein complex identification method based on multi-source data fusion and multi-objective optimization according to claim 3, characterized in that the multi-source data fusion strategy is as follows: the topological properties of the protein-protein interaction network are combined with the functional similarity of the GO annotation data; different types of genomic data and protein interaction data are integrated and used as the objective functions of the multi-objective optimization strategy, guiding the iterative search for the optimal solution.
6. The protein complex identification method based on multi-source data fusion and multi-objective optimization according to claim 5, characterized in that the objective functions of the multi-objective optimization strategy are set as follows: combining the topological properties of the protein-protein interaction network with the functional properties of the GO annotation data, suitable objective functions are selected with full consideration of the substitutability among the objectives.
7. The protein complex recognition method based on multisource data fusion and multi-objective optimization as claimed in claim 5, characterized in that the objective functions in the multi-objective optimization strategy are density, node closeness centrality, outward interaction, and functional similarity;
Density is expressed as:
$$\mathrm{Density}(S) = \frac{2 \times N_v}{N_v \times (N_v - 1)}$$
where $N_v$ denotes the number of protein nodes in the protein complex module; maximizing the density of a protein complex module ensures that the module obtained by clustering is internally compact and tightly connected;
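The density objective can be sketched in a few lines. This is a minimal illustration, not the patented implementation. It assumes the conventional graph density, with the number of edges inside the module (named `n_e` here, a symbol the claim does not define) in the numerator. It also assumes an undirected edge list with each edge listed once. The function name `module_density` and the toy edge list are hypothetical.

```python
def module_density(module, edges):
    """Conventional density 2 * n_e / (n_v * (n_v - 1)) of the subgraph induced by module."""
    nodes = set(module)
    n_v = len(nodes)
    if n_v < 2:
        return 0.0  # density is undefined for fewer than two nodes
    # Count edges with both endpoints inside the module (assumption: no duplicates).
    n_e = sum(1 for u, v in edges if u != v and u in nodes and v in nodes)
    return 2.0 * n_e / (n_v * (n_v - 1))


# Toy network (assumed): a triangle 0-1-2 with a pendant node 3 attached to node 2.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
print(module_density({0, 1, 2}, edges))     # fully connected triangle
print(module_density({0, 1, 2, 3}, edges))  # looser module
```

A fully connected module reaches the maximum value of 1, so adding the loosely attached node lowers the score.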
Node closeness centrality is expressed as:
$$\mathrm{Closeness\ centrality}(v_i) = \frac{N - 1}{\sum_{j=1,\, j \neq i}^{N} d_{ij}}$$
where the closeness of protein node $v_i$ is the reciprocal of the sum of its shortest-path distances $d_{ij}$ to the other nodes, multiplied by the number of other nodes $N - 1$; maximizing node closeness brings the nodes in the same cluster closer to the cluster center;
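The closeness formula can be evaluated with a breadth-first search when the network is unweighted. The sketch below is illustrative only. It assumes an undirected, unweighted graph stored as a dictionary of neighbor sets, and it returns 0.0 when some node is unreachable, a convention the claim does not specify. The function name `closeness_centrality` is hypothetical.

```python
from collections import deque


def closeness_centrality(graph, source):
    """(N - 1) divided by the sum of shortest-path distances from source to all other nodes."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in distance:
                distance[w] = distance[u] + 1  # BFS gives shortest hop counts
                queue.append(w)
    n = len(graph)
    if n < 2 or len(distance) < n:
        return 0.0  # assumption: disconnected or trivial graphs score zero
    return (n - 1) / sum(distance.values())


# Toy network (assumed): the path 0 - 1 - 2 - 3.
path_graph = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(closeness_centrality(path_graph, 1))  # inner node
print(closeness_centrality(path_graph, 0))  # end node
```

The inner node is closer to everything else than the end node, so it receives the higher score.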
Outward interaction is expressed as:
$$\mathrm{Outward\ Interaction} = \sum_{n_i \in S} \frac{|N_{n_i}|}{\mathrm{degree}(n_i)}$$
where $|N_{n_i}|$ denotes the number of nodes in the protein complex module that are directly connected to child node $n_i$; minimizing outward interaction ensures that different protein complex modules are connected by as few bridging proteins as possible, so that different clusters are as independent as possible;
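A small sketch can make the outward interaction objective concrete. It rests on an explicit assumption: the count in the numerator is taken to be the number of neighbors of a node that lie outside the module S, which is the reading under which minimizing the sum isolates modules from each other. The claim wording is ambiguous on this point, so the code is an interpretation rather than the patented method. The function name `outward_interaction` and the toy graph are hypothetical.

```python
def outward_interaction(module, graph):
    """Sum over module nodes of (neighbors outside the module) / (node degree)."""
    members = set(module)
    total = 0.0
    for node in members:
        degree = len(graph[node])
        if degree == 0:
            continue  # isolated nodes contribute nothing
        # Assumption: the numerator counts neighbors that lie outside the module.
        outside = sum(1 for w in graph[node] if w not in members)
        total += outside / degree
    return total


# Toy network (assumed): triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
two_triangles = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
print(outward_interaction({0, 1, 2}, two_triangles))  # one bridging edge
print(outward_interaction({2, 3}, two_triangles))     # module cut across both triangles
```

The well-separated triangle scores lower than the module that cuts across both triangles, matching the stated goal of keeping clusters independent.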
In a protein-protein interaction network, proteins belonging to the same protein complex have similar functional characteristics; a random walk probability is added on top of the standard functional similarity formula, and the random walk probability is calculated as:
$$RWC(v_0, v_1) = \sum_{i,j \in \zeta} W_{\infty}^{v_0}(i)\, W_{\infty}^{v_1}(j)\, \mathrm{sim}(i, j)$$
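where $\zeta$ denotes the set of all known leaf nodes, and $W_{\infty}^{v_0}(i)$ and $W_{\infty}^{v_1}(j)$ denote the random walk probabilities, at infinite time, of node $i$ and node $j$ for walks associated with $v_0$ and $v_1$ respectively;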
The standard functional similarity formula is:
$$\mathrm{sim}(s) = \frac{\sum_{v_0 = 1,\, v_1 = 1,\, v_0 \neq v_1}^{p} w(n_{v_0}, n_{v_1})}{p}$$
Wherein s represents to include child nodeThe value range of E is [0,1], weights w
Represent;
Functional similarity is expressed as:
$$ISM(v_0, v_1) = \frac{RWC(v_0, v_1) + \mathrm{sim}(v_0, v_1)}{2}.$$
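The way the random walk term and the standard similarity are combined can be shown with a short sketch. It assumes that the limiting random walk probabilities over the leaf terms have already been computed and are supplied as dictionaries, and that the pairwise term similarity is given as a nested dictionary. The names `rwc`, `ism`, `w_v0`, `w_v1`, and `term_sim`, as well as all numeric values, are hypothetical placeholders.

```python
def rwc(w_v0, w_v1, term_sim):
    """Double sum over leaf terms i, j of W_v0(i) * W_v1(j) * sim(i, j)."""
    return sum(
        w_v0[i] * w_v1[j] * term_sim[i][j]
        for i in w_v0
        for j in w_v1
    )


def ism(rwc_value, sim_value):
    """Functional similarity: the average of the random walk term and the standard similarity."""
    return (rwc_value + sim_value) / 2.0


# Assumed inputs: two leaf terms "a" and "b" with precomputed walk probabilities.
w_v0 = {"a": 0.5, "b": 0.5}
w_v1 = {"a": 1.0, "b": 0.0}
term_sim = {
    "a": {"a": 1.0, "b": 0.2},
    "b": {"a": 0.2, "b": 1.0},
}
rwc_value = rwc(w_v0, w_v1, term_sim)
print(rwc_value)
print(ism(rwc_value, sim_value=0.4))
```

Because the two terms are averaged with equal weight, neither the annotation-based similarity nor the random walk contribution dominates the functional similarity score.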
8. The protein complex recognition method based on multisource data fusion and multi-objective optimization as claimed in claim 3, characterized in that the specific method of the star addition and deletion strategy is: each star is regarded as a protein node, and addition and deletion operations are applied; during addition and deletion, the black hole selects surrounding stars to absorb or remove.
9. The protein complex recognition method based on multisource data fusion and multi-objective optimization as claimed in claim 1, characterized in that the specific method by which the adaptive multi-objective black hole optimization algorithm framework performs the optimization is:
S3.1. initialize each protein module as an object;
S3.2. evaluate all objects according to the evaluation criteria and select the object with the highest fitness as the initial black hole;
S3.3. move the remaining objects toward the initial black hole; during the movement, if an object's fitness is higher than that of the current black hole, it replaces the current black hole and becomes the new black hole;
S3.4. if an object enters the black hole's absorption range while moving toward it, the object is absorbed, and an equal number of new objects are randomly generated at the same time; the algorithm terminates when the termination condition is met; otherwise, steps S3.2 to S3.4 are repeated.
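Steps S3.1 to S3.4 can be sketched as a generic black hole search loop. The sketch below is a simplified, single-objective, continuous-valued illustration and not the adaptive multi-objective framework of the claims. The absorption radius (black hole fitness divided by the total fitness of all stars), the fixed iteration budget used as the termination condition, the search bounds, and every name (`black_hole_optimize`, `sphere_fitness`) are assumptions made for the example.

```python
import math
import random


def black_hole_optimize(fitness, dim, n_stars=20, iterations=50,
                        bounds=(-5.0, 5.0), seed=0):
    """Return (best_position, best_fitness) for a higher-is-better fitness function."""
    rng = random.Random(seed)
    low, high = bounds

    def random_star():
        return [rng.uniform(low, high) for _ in range(dim)]

    # S3.1: initialize the candidate objects (stars).
    stars = [random_star() for _ in range(n_stars)]
    # S3.2: the fittest object becomes the initial black hole.
    black_hole = list(max(stars, key=fitness))

    for _ in range(iterations):  # assumption: a fixed budget is the stopping rule
        # S3.3: move each star a random fraction of the way toward the black hole.
        for idx, star in enumerate(stars):
            step = rng.random()
            moved = [x + step * (b - x) for x, b in zip(star, black_hole)]
            stars[idx] = moved
            if fitness(moved) > fitness(black_hole):
                black_hole = list(moved)  # a fitter star replaces the black hole
        # S3.4: absorb stars inside the radius and generate as many new ones.
        total = sum(fitness(s) for s in stars)
        radius = fitness(black_hole) / total if total > 0 else 0.0
        for idx, star in enumerate(stars):
            gap = math.sqrt(sum((x - b) ** 2 for x, b in zip(star, black_hole)))
            if gap < radius:
                stars[idx] = random_star()
    return black_hole, fitness(black_hole)


def sphere_fitness(position):
    """Toy fitness (assumed): highest at the origin, always in (0, 1]."""
    return 1.0 / (1.0 + sum(x * x for x in position))


start, start_value = black_hole_optimize(sphere_fitness, dim=2, iterations=0)
best, best_value = black_hole_optimize(sphere_fitness, dim=2, iterations=50)
print(start_value, best_value)
```

Because the black hole is replaced only by a strictly fitter star, its fitness never decreases between iterations. In the claimed method the objects are protein modules and the evaluation uses several objectives at once, which this sketch does not attempt to reproduce.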
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711190016.3A CN108009403A (en) | 2017-11-24 | 2017-11-24 | Protein complex recognizing method based on multisource data fusion and multiple-objection optimization |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711190016.3A CN108009403A (en) | 2017-11-24 | 2017-11-24 | Protein complex recognizing method based on multisource data fusion and multiple-objection optimization |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108009403A (en) | 2018-05-08 |
Family ID: 62053784
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711190016.3A (Pending; published as CN108009403A) | Protein complex recognizing method based on multisource data fusion and multiple-objection optimization | 2017-11-24 | 2017-11-24 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108009403A (en) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108629159A (en) * | 2018-05-14 | 2018-10-09 | 辽宁大学 | A method of for finding the pathogenic key protein matter of alzheimer's disease |
CN108733976A (en) * | 2018-05-23 | 2018-11-02 | 扬州大学 | Key protein matter recognition methods based on fusion biology and topological characteristic |
CN108932402A (en) * | 2018-06-27 | 2018-12-04 | 华中师范大学 | A kind of protein complex recognizing method |
CN109033746A (en) * | 2018-06-29 | 2018-12-18 | 大连理工大学 | A kind of protein complex recognizing method based on knot vector |
CN109166604A (en) * | 2018-08-22 | 2019-01-08 | 华东交通大学 | A kind of calculation method merging more data characteristics prediction key protein matter |
CN109390057A (en) * | 2018-08-20 | 2019-02-26 | 安徽大学 | A kind of disease module detection method based on multiple-objection optimization |
CN110459264A (en) * | 2019-08-02 | 2019-11-15 | 陕西师范大学 | Based on grad enhancement decision tree prediction circular rna and disease associated method |
CN110706740A (en) * | 2019-09-29 | 2020-01-17 | 长沙理工大学 | Method, device and equipment for predicting protein function based on module decomposition |
CN111554346A (en) * | 2020-04-29 | 2020-08-18 | 上海交通大学 | Protein sequence design implementation method based on multi-objective optimization |
CN113223610A (en) * | 2021-05-27 | 2021-08-06 | 浙江大学 | Method for integrating disease protein interaction network and mining cross-disease action module |
CN113470739A (en) * | 2021-07-03 | 2021-10-01 | 中国科学院新疆理化技术研究所 | Protein interaction prediction method and system based on mixed membership degree random block model |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105590039A (en) * | 2015-03-05 | 2016-05-18 | 华中师范大学 | Method for identifying protein complex based on BSO (Brain Storm Optimization) |
CN105930688A (en) * | 2016-04-18 | 2016-09-07 | 福州大学 | Improved PSO algorithm based protein function module detection method |
CN106228036A (en) * | 2016-07-26 | 2016-12-14 | 陕西师范大学 | A kind of method using fireworks algorithm identification of protein complex |
CN106778057A (en) * | 2016-11-15 | 2017-05-31 | 浙江工业大学 | A kind of protein conformation space optimization method based on quantum evolutionary algorithm |
Non-Patent Citations (3)
Title |
---|
HATAMLOU A.ETC: "A new heuristic optimization approach for data clustering", 《INFORMATION SCIENCE》 * |
李琨等: "基于IBH-LSSVM的混沌时间序列预测及其在抽油井动液面短期预测中的应用", 《信息与控制》 * |
蒋兴鹏等: "微生物组学的大数据研究", 《数学建模及其应用》 * |
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108629159A (en) * | 2018-05-14 | 2018-10-09 | 辽宁大学 | A method of for finding the pathogenic key protein matter of alzheimer's disease |
CN108733976B (en) * | 2018-05-23 | 2021-12-03 | 扬州大学 | Key protein identification method based on fusion biology and topological characteristics |
CN108733976A (en) * | 2018-05-23 | 2018-11-02 | 扬州大学 | Key protein matter recognition methods based on fusion biology and topological characteristic |
CN108932402A (en) * | 2018-06-27 | 2018-12-04 | 华中师范大学 | A kind of protein complex recognizing method |
CN109033746A (en) * | 2018-06-29 | 2018-12-18 | 大连理工大学 | A kind of protein complex recognizing method based on knot vector |
CN109033746B (en) * | 2018-06-29 | 2020-01-14 | 大连理工大学 | Protein compound identification method based on node vector |
CN109390057A (en) * | 2018-08-20 | 2019-02-26 | 安徽大学 | A kind of disease module detection method based on multiple-objection optimization |
CN109390057B (en) * | 2018-08-20 | 2021-12-14 | 安徽大学 | Disease module detection method based on multi-objective optimization |
CN109166604A (en) * | 2018-08-22 | 2019-01-08 | 华东交通大学 | A kind of calculation method merging more data characteristics prediction key protein matter |
CN109166604B (en) * | 2018-08-22 | 2021-07-02 | 华东交通大学 | Calculation method for predicting key protein by fusing multi-data features |
CN110459264A (en) * | 2019-08-02 | 2019-11-15 | 陕西师范大学 | Based on grad enhancement decision tree prediction circular rna and disease associated method |
CN110706740A (en) * | 2019-09-29 | 2020-01-17 | 长沙理工大学 | Method, device and equipment for predicting protein function based on module decomposition |
CN110706740B (en) * | 2019-09-29 | 2022-03-22 | 长沙理工大学 | Method, device and equipment for predicting protein function based on module decomposition |
CN111554346A (en) * | 2020-04-29 | 2020-08-18 | 上海交通大学 | Protein sequence design implementation method based on multi-objective optimization |
CN111554346B (en) * | 2020-04-29 | 2023-05-23 | 上海交通大学 | Protein sequence design implementation method based on multi-objective optimization |
CN113223610A (en) * | 2021-05-27 | 2021-08-06 | 浙江大学 | Method for integrating disease protein interaction network and mining cross-disease action module |
CN113470739A (en) * | 2021-07-03 | 2021-10-01 | 中国科学院新疆理化技术研究所 | Protein interaction prediction method and system based on mixed membership degree random block model |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108009403A (en) | Protein complex recognizing method based on multisource data fusion and multiple-objection optimization | |
Otero et al. | Inducing decision trees with an ant colony optimization algorithm | |
Guendouz et al. | A discrete modified fireworks algorithm for community detection in complex networks | |
Shi et al. | Protein complex detection with semi-supervised learning in protein interaction networks | |
Cuevas et al. | A cuckoo search algorithm for multimodal optimization | |
CN105930688A (en) | Improved PSO algorithm based protein function module detection method | |
Forsati et al. | A novel approach for feature selection based on the bee colony optimization | |
CN106789359A (en) | A kind of net flow assorted method and device based on grey wolf algorithm | |
Forti et al. | Growing Hierarchical Tree SOM: An unsupervised neural network with dynamic topology | |
Dehuri et al. | Multi-criterion Pareto based particle swarm optimized polynomial neural network for classification: A review and state-of-the-art | |
Xiong et al. | Multi-feature fusion and selection method for an improved particle swarm optimization | |
Zhang et al. | Application of natural computation inspired method in community detection | |
Attea et al. | Improving the performance of evolutionary-based complex detection models in protein–protein interaction networks | |
CN109509509A (en) | Protein complex method for digging based on dynamic weighting protein-protein interaction network | |
Ji et al. | Ant colony optimization with multi-agent evolution for detecting functional modules in protein-protein interaction networks | |
Shang et al. | Multi-objective clustering technique based on k-nodes update policy and similarity matrix for mining communities in social networks | |
Moreira | The use of Boolean concepts in general classification contexts | |
Lim et al. | Predicting drug-target interaction using 3D structure-embedded graph representations from graph neural networks | |
Chowdhury et al. | Cell type identification from single-cell transcriptomic data via gene embedding | |
Luong et al. | Lightweight multi-objective evolutionary neural architecture search with low-cost proxy metrics | |
Ray et al. | Disease associated protein complex detection: a multi-objective evolutionary approach | |
Hu et al. | Apenas: An asynchronous parallel evolution based multi-objective neural architecture search | |
Shi et al. | Semi-supervised learning protein complexes from protein interaction networks | |
Chiu et al. | Cluster analysis based on artificial immune system and ant algorithm | |
Michelakos et al. | Ant colony optimization and data mining: Techniques and trends |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180508 |