CN111507506A - Consensus embedding-based complex network community discovery method - Google Patents

Consensus embedding-based complex network community discovery method

Info

Publication number
CN111507506A
CN111507506A
Authority
CN
China
Prior art keywords
consensus
network
embedding
pop
solution
Prior art date
Legal status
Pending
Application number
CN202010202056.0A
Other languages
Chinese (zh)
Inventor
曾湘祥 (Xiangxiang Zeng)
杜妍孜 (Yanzi Du)
刘向荣 (Xiangrong Liu)
Current Assignee
Xiamen University
Original Assignee
Xiamen University
Priority date
2020-03-20
Filing date
2020-03-20
Publication date
2020-08-07
Application filed by Xiamen University
Priority to CN202010202056.0A
Publication of CN111507506A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/80 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for detecting, monitoring or modelling epidemics or pandemics, e.g. flu

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Public Health (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • General Business, Economics & Management (AREA)
  • Medical Informatics (AREA)
  • Tourism & Hospitality (AREA)
  • Primary Health Care (AREA)
  • Marketing (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Computing Systems (AREA)
  • Operations Research (AREA)
  • Game Theory and Decision Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Development Economics (AREA)
  • Biomedical Technology (AREA)
  • Quality & Reliability (AREA)
  • Pathology (AREA)
  • Epidemiology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A consensus embedding-based complex network community discovery method, relating to multi-objective optimization technology. The method comprises the following steps: 1) set the maximum number of generations maxgen and the particle swarm size pop; 2) given a network G = (V, E) of size n, perform network representation learning; 3) using the network representation learning result, initialize the particle swarm to obtain a population POP of 100 particles, and set the iteration counter t = 1; 4) perform consensus embedding-based updating and mutation on POP; 5) stopping condition: if t ≤ maxgen, set t ← t + 1 and go to step 3); otherwise, stop and return the Pareto front solutions, i.e., multiple community partition results. The updating process is more efficient and accurate, and the resulting Pareto front is more competitive; the method improves the accuracy of community discovery, effectively reduces convergence time, and is practical for applications such as function prediction and recommendation systems.

Description

Consensus embedding-based complex network community discovery method
Technical Field
The invention relates to a multi-objective optimization technology, in particular to a consensus embedding-based complex network community discovery method which can be applied to the fields of function prediction, recommendation systems and the like.
Background
Networks that exhibit some or all of the properties of self-organization, self-similarity, attractors, small-world behavior, and scale-freeness are called complex networks. In a complex network, nodes of the same type are densely connected and form communities, while the comparatively sparse connections between nodes of different types form important channels linking different communities.
To explore the structural characteristics of complex networks and further understand their functions, researchers have studied community structure extensively and proposed many community discovery methods, which fall into four main categories: agglomerative methods, divisive methods, optimization-based methods, and simulation-based methods. These categories are not mutually exclusive; a single method may embody several of these ideas at once.
In recent years, multi-objective optimization methods have been applied successfully to community discovery, so improving the multi-objective particle swarm algorithm is of considerable significance. A representative method is that of Maoguo Gong et al. (M. Gong, Q. Cai, X. Chen, and L. Ma, "Complex network clustering by multiobjective discrete particle swarm optimization based on decomposition," IEEE Transactions on Evolutionary Computation, vol. 18, no. 1, pp. 82-97, 2014), which applies a discrete particle swarm optimization algorithm to multi-objective network clustering (MODPSO). However, the initialization of this PSO algorithm generates a large amount of redundancy in the search space, so much of the computation is wasted; the algorithm scales poorly and performs well only on small networks.
Disclosure of Invention
The invention aims to provide a consensus embedding-based complex network community discovery method that addresses problems in the prior art such as slow convergence, large numbers of redundant solutions, and insufficient mining of the underlying structural information hidden in the network.
The invention comprises the following steps:
1) set the maximum number of generations maxgen and the particle swarm size pop;
2) given a network G = (V, E) of size n, perform network representation learning;
3) using the network representation learning result, initialize the particle swarm to obtain a population POP of 100 particles, and set the iteration counter t = 1;
4) perform consensus embedding-based updating and mutation on POP;
5) stopping condition: if t ≤ maxgen, set t ← t + 1 and go to step 3); otherwise, stop and return the Pareto front solutions, i.e., multiple community partition results.
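The generation loop of steps 3) to 5) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the callables `init`, `update`, `consensus_embed`, `mutate` and `pareto_front` are hypothetical placeholders for the operations described below, the loop returns to the update step rather than re-initializing the swarm, and the order of operations within a generation is assumed. It runs maxgen generations, matching the statement that the update and mutation process is iterated maxgen times.

```python
from typing import Callable, List, TypeVar

P = TypeVar("P")  # a particle, e.g. a list of community labels


def run_swarm(
    init: Callable[[], List[P]],
    update: Callable[[List[P], int], List[P]],
    consensus_embed: Callable[[List[P]], List[P]],
    mutate: Callable[[List[P]], List[P]],
    pareto_front: Callable[[List[P]], List[P]],
    maxgen: int,
    embed_every: int = 10,
) -> List[P]:
    """Generation loop: update every generation, consensus embedding every
    `embed_every` generations, mutation every generation, maxgen generations."""
    pop = init()                           # step 3: initial swarm, t = 1
    for t in range(1, maxgen + 1):         # step 5: stop after maxgen generations
        pop = update(pop, t)               # step 4: consensus-based update
        if t % embed_every == 0:
            pop = consensus_embed(pop)     # step 4: consensus embedding
        pop = mutate(pop)                  # step 4: neighborhood-based mutation
    return pareto_front(pop)               # Pareto front = candidate partitions
```

With maxgen = 25, this sketch performs 25 updates and two consensus embeddings (at t = 10 and t = 20).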
In step 2), the specific step of performing network representation learning may be:
the network embedding method AROPE, which is built on a singular value decomposition and eigendecomposition framework and preserves proximity of arbitrary order, is used to map the high-dimensional adjacency matrix into a low-dimensional continuous feature space, mining the underlying structural information hidden in the network and yielding the node feature vectors $E = \{e_1, e_2, e_3, \ldots, e_n\}$, where $e_i = \{e_{i1}, e_{i2}, e_{i3}, \ldots, e_{id}\}$ and d is the dimensionality after reduction.
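AROPE itself is not reproduced here. As a rough, hypothetical stand-in, the sketch below embeds an undirected graph by eigendecomposing its adjacency matrix A once and reweighting each eigenvalue λ with f(λ) = w1·λ + w2·λ² + w3·λ³, which gives the spectrum of the higher-order proximity w1·A + w2·A² + w3·A³. This mirrors the general idea of arbitrary-order proximity, but the function name and the weights are assumptions.

```python
import numpy as np


def spectral_proximity_embedding(adj, d, weights=(1.0, 0.1, 0.01)):
    """Hypothetical AROPE-like embedding of an undirected graph.

    Eigendecomposes A once, then reweights each eigenvalue lam with
    f(lam) = sum_q weights[q-1] * lam**q, which are the eigenvalues of the
    higher-order proximity matrix sum_q weights[q-1] * A**q.
    """
    A = np.asarray(adj, dtype=float)
    if not np.allclose(A, A.T):
        raise ValueError("this sketch assumes a symmetric (undirected) adjacency matrix")
    lam, U = np.linalg.eigh(A)                      # A = U diag(lam) U^T
    f = sum(w * lam ** (q + 1) for q, w in enumerate(weights))
    top = np.argsort(-np.abs(f))[:d]                # d eigenpairs with largest |f(lam)|
    return U[:, top] * np.sqrt(np.abs(f[top]))      # row i is the feature vector e_i
```

For example, `E = spectral_proximity_embedding(A, d=2)` returns an n × 2 array whose rows play the role of the feature vectors e_i.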
In step 3), the specific steps of initializing the particle swarm from the network representation learning result to obtain the 100-particle population POP may be:
the similarity between nodes is calculated to obtain the similarity matrix $S = (s_{ij})_{n \times n}$;
k-means is applied based on this similarity to obtain the initialized particle swarm $POP_t = \{x_1, x_2, x_3, \ldots, x_{pop}\}_t$; the personal best solutions of the particles are also initialized as $Pbest = POP_t$.
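A minimal sketch of this initialization follows. It assumes cosine similarity (as stated in the embodiment), assumes each node is clustered by its row of S, and assumes each particle draws a random number of clusters k so that the swarm is diverse; the function names and the k range are hypothetical. A small NumPy implementation of Lloyd's k-means keeps the example self-contained.

```python
import numpy as np


def cosine_similarity_matrix(E):
    """S[i, j] = cos(e_i, e_j); zero vectors get similarity 0 to everything."""
    E = np.asarray(E, dtype=float)
    norms = np.linalg.norm(E, axis=1, keepdims=True)
    unit = E / np.where(norms == 0, 1.0, norms)
    return unit @ unit.T


def kmeans_labels(X, k, rng, iters=100):
    """Plain Lloyd's k-means on the rows of X; returns one label per row."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def init_swarm(E, pop, k_min=2, k_max=None, seed=0):
    """Hypothetical initialization: each particle is a k-means partition of the rows of S."""
    rng = np.random.default_rng(seed)
    S = cosine_similarity_matrix(E)
    n = len(S)
    k_max = min(n, k_max if k_max is not None else max(k_min, n // 2))
    k_min = min(k_min, k_max)
    swarm = [kmeans_labels(S, int(rng.integers(k_min, k_max + 1)), rng) for _ in range(pop)]
    pbest = [x.copy() for x in swarm]  # Pbest = POP_t at t = 1
    return S, swarm, pbest
```

Each particle is a label vector of length n, which is the label-based encoding the later sketches also assume.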
In step 4), the specific steps of updating and mutating the POP based on consensus embedding are as follows:
performing consensus-based updates in each generation;
performing consensus embedding every 10 generations;
and performing neighborhood-based single-point mutation on the particles with mutation probability of pm in each generation.
The update and mutation process of the particles is iterated for maxgen generations.
In the consensus-based update performed in each generation, the communities in gbest are taken as the consensus, and one of them is randomly selected and embedded into the current particle. In the consensus embedding performed every 10 generations, the feature solutions of the current particle swarm are extracted, all two-node communities are extracted from these feature solutions, and each is put to a vote in the current particle swarm; if its support is at least 70%, the two-node community is selected as a consensus community and fully embedded into the current particles.
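The per-generation consensus update can be sketched as below, assuming a particle is encoded as a list of community labels; `embed_gbest_community` is a hypothetical name. The sketch gives the copied community a fresh label so that it appears in the child exactly as it does in gbest; the child would then compete with its parent through non-dominated selection.

```python
import random


def communities(labels):
    """Group node indices by community label."""
    groups = {}
    for node, lab in enumerate(labels):
        groups.setdefault(lab, []).append(node)
    return list(groups.values())


def embed_gbest_community(particle, gbest, rng):
    """Copy one randomly chosen gbest community into a copy of particle."""
    community = rng.choice(communities(gbest))
    child = list(particle)
    fresh = max(child) + 1          # unused label, so the community is copied exactly
    for node in community:
        child[node] = fresh
    return child
```

For example, with gbest = [0, 0, 0, 1, 1, 1], the child contains either nodes {0, 1, 2} or nodes {3, 4, 5} as one community under a new label, and the parent particle is left unchanged.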
The feature solutions extracted from the solution set are the following three representative solutions:
a. the solution with the minimum KKM;
b. the solution with the minimum RC;
c. the local knee solution, i.e., the solution with the minimum Manhattan distance.
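Selecting the three representative solutions might look like this. The text does not say what the Manhattan distance is measured to; this sketch assumes it is the distance to the ideal point after min-max normalizing both objectives, and it uses the whole front rather than a local neighborhood, so it only approximates a local knee.

```python
import numpy as np


def representative_indices(objs):
    """Return indices of the min-KKM, min-RC and knee solutions.

    objs: (m, 2) array-like of (KKM, RC) values, both minimized.
    """
    F = np.asarray(objs, dtype=float)
    i_kkm = int(F[:, 0].argmin())
    i_rc = int(F[:, 1].argmin())
    lo, hi = F.min(axis=0), F.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)      # avoid division by zero
    norm = (F - lo) / span                      # each objective scaled to [0, 1]
    i_knee = int(norm.sum(axis=1).argmin())     # L1 distance to the ideal point (0, 0)
    return i_kkm, i_rc, i_knee
```

For the front [(1, 10), (3, 3), (10, 1)], the sketch returns (0, 2, 1): the balanced middle point is the knee.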
When neighborhood-based single-point mutation is applied in each generation to particles selected with mutation probability pm, a position i is randomly chosen in each particle to be mutated, and the label at position i is replaced with a different label taken from its neighborhood, ensuring that a new candidate solution is generated.
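A sketch of the neighborhood-based single-point mutation follows, assuming label-based particles and an adjacency list `adj_list` (a hypothetical input format) that defines each node's neighborhood. If all neighbors of the chosen node already share its label, the sketch leaves the particle unchanged.

```python
import random


def neighborhood_mutation(labels, adj_list, pm, rng):
    """Single-point mutation: with probability pm, relabel one random node
    with a different label found among its neighbors."""
    child = list(labels)
    if rng.random() >= pm:
        return child
    i = rng.randrange(len(child))
    options = sorted({child[j] for j in adj_list[i]} - {child[i]})
    if options:                     # otherwise all neighbors already share i's label
        child[i] = rng.choice(options)
    return child
```

Restricting the new label to neighbor labels keeps the mutation local, so a node only moves into a community it is actually connected to.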
The objective functions employed in the present invention are variants of KKM and RC:
[formula for the KKM and RC variants not reproduced in the source text]
where $V_h \subseteq V$ is the node set of the h-th community [defining formula not reproduced in the source text];
the two quantities [formulas not reproduced in the source text]
are, respectively, the sums of the internal and external similarities of the nodes in $V_h$, and $|\cdot|$ denotes the size of a set.
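Because the formulas are not reproduced, the sketch below assumes the standard forms used with MODPSO, KKM = 2(n - k) - Σ_h L(V_h, V_h)/|V_h| and RC = Σ_h L(V_h, V̄_h)/|V_h| over k communities, and assumes the variant replaces the adjacency-based link sums L with the corresponding sums over the similarity matrix S (diagonal used as given). It is an illustration under those assumptions, not the patent's exact objective. Both objectives are minimized.

```python
import numpy as np


def kkm_rc(S, labels):
    """Assumed similarity-based KKM and RC for one partition (both minimized)."""
    S = np.asarray(S, dtype=float)
    labels = np.asarray(labels)
    n = len(labels)
    comms = np.unique(labels)
    k = len(comms)
    internal_term = 0.0
    rc = 0.0
    for c in comms:
        mask = labels == c
        size = mask.sum()
        s_in = S[np.ix_(mask, mask)].sum()      # internal similarity of V_h
        s_out = S[np.ix_(mask, ~mask)].sum()    # similarity from V_h to the rest
        internal_term += s_in / size
        rc += s_out / size
    kkm = 2 * (n - k) - internal_term
    return float(kkm), float(rc)
```

On a graph of two disjoint edges, splitting along the edges gives (KKM, RC) = (2.0, 0.0), while cutting both edges gives (4.0, 2.0).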
A non-dominated sorting strategy is used in the selection of particles.
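A simple non-dominated sorting routine for two minimized objectives such as (KKM, RC) is sketched below. It repeatedly peels off the set of mutually non-dominated points; this is slower than the fast non-dominated sort of NSGA-II but produces the same fronts. The function names are illustrative.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and better in one (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_sort(objs):
    """Return fronts as sorted lists of indices; fronts[0] is the Pareto front."""
    remaining = set(range(len(objs)))
    fronts = []
    while remaining:
        front = [
            i for i in remaining
            if not any(dominates(objs[j], objs[i]) for j in remaining if j != i)
        ]
        fronts.append(sorted(front))
        remaining -= set(front)
    return fronts
```

For objectives [(1, 5), (2, 2), (5, 1), (3, 3), (4, 4)], the fronts are [[0, 1, 2], [3], [4]].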
Compared with the prior art, the invention has the following outstanding technical effects:
compared with the evolution process of MODPSO, the particles of the consensus embedding-based complex network community discovery algorithm are updated differently, and the update process is more efficient and accurate. Experimental data also show that the Pareto front obtained by the method is more competitive and that the invention requires less convergence time. The method improves the accuracy of community discovery, effectively reduces convergence time, and is practical for applications such as function prediction and recommendation systems. Examples include analyzing metabolic networks and gene regulatory networks and identifying key genes in biology; identifying the key communities and key nodes of an infectious disease in modeling, predicting, preventing and controlling its transmission, so that transmission paths can be predicted and cut off in time; and enabling precise online advertising, more reliable recommendation systems in e-commerce, and more personalized results in search engines.
Drawings
FIG. 1 is a schematic diagram of the particle update method in each generation.
Fig. 2 is a schematic diagram of the consensus embedding-based update method applied to the particles every 10 generations.
FIG. 3 is a comparison graph of the runtime of the method of the present invention (NE-PSO) versus the MODPSO algorithm on a small scale network data set.
FIG. 4 is a line graph of the runtime of the method of the present invention (NE-PSO) on a large scale network data set.
Detailed Description
The following examples will further illustrate the present invention with reference to the accompanying drawings.
The embodiment of the consensus embedding-based complex network community discovery method specifically comprises the following steps:
1) Given a network G = (V, E), the maximum number of generations maxgen, the particle swarm size pop, the network size n, and the mutation probability pm;
2) network representation learning is performed on G with AROPE, a network embedding method built on a singular value decomposition and eigendecomposition framework that preserves proximity of arbitrary order: the high-dimensional adjacency matrix is mapped into a low-dimensional continuous feature space, mining the underlying structural information hidden in the network and yielding the node feature vectors $E = \{e_1, e_2, e_3, \ldots, e_n\}$, where $e_i = \{e_{i1}, e_{i2}, e_{i3}, \ldots, e_{id}\}$ and d is the dimensionality after reduction;
3) using the network representation learning result, i.e., the node feature vectors, the cosine similarity between nodes is calculated to obtain $S = (s_{ij})_{n \times n}$;
k-means is applied based on this similarity to obtain the initialized particle swarm $POP_t = \{x_1, x_2, x_3, \ldots, x_{pop}\}_t$; the personal best solutions of the particles are also initialized as $Pbest = POP_t$; and the iteration counter is set to t = 1;
4) carrying out updating and mutation based on consensus embedding on POP, wherein:
carrying out consensus-based updating in each generation (as shown in FIG. 1), taking communities in the gbest as consensus, and randomly selecting one of the communities to be embedded into the current particle;
performing consensus embedding every 10 generations (as shown in FIG. 2), extracting feature solutions in the current particle swarm every 10 generations, extracting all two-node communities from the feature solutions, voting in the current particle swarm, and if the support degree is more than or equal to 70%, selecting the two-node communities as the consensus communities and completely embedding the two-node communities into the current particles;
and performing neighborhood-based single-point mutation on the particles with mutation probability of pm in each generation, and randomly selecting a position i for each particle needing mutation, wherein the label of the position i is replaced by another label in the neighborhood, so that the generation of a new possible solution is ensured.
5) Stopping condition: if t ≤ maxgen, set t ← t + 1 and go to step 3; otherwise, stop the algorithm and return the Pareto front solutions, i.e., multiple community partition results.
FIG. 1 is a schematic diagram of the particle update method in each generation. In each generation, all particles are sorted by non-dominance and a global best solution gbest is maintained. The communities in the gbest partition are regarded as consensus communities; a consensus community is randomly embedded into each particle of the current generation, the resulting particle and the original particle undergo non-dominated sorting selection, and the selected particle is retained.
Fig. 2 is a schematic diagram of the consensus embedding-based update method applied to the particles every 10 generations. Every 10 generations, the local knee point (i.e., an extreme point), the point with the minimum KKM, and the point with the minimum RC are selected from the Pareto front formed by all particles; consensus communities are extracted from these three points (the support threshold used in the invention is 70%), and the resulting consensus communities are embedded into all particles of the current generation to update them.
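The every-10-generation voting step can be sketched as below. The phrase "two-node communities" is interpreted here, as an assumption, as pairs of nodes that a representative solution places in the same community; a pair becomes consensus if at least 70% of the particles also co-cluster it, and embedding it moves node j into node i's community. The function names are hypothetical.

```python
from itertools import combinations


def consensus_pairs(representatives, population, support=0.7):
    """Node pairs co-clustered in some representative solution and in at
    least `support` of the particles (assumed reading of 'two-node communities')."""
    candidates = set()
    for sol in representatives:
        groups = {}
        for node, lab in enumerate(sol):
            groups.setdefault(lab, []).append(node)
        for members in groups.values():
            candidates.update(combinations(members, 2))
    consensus = []
    for i, j in sorted(candidates):
        votes = sum(1 for p in population if p[i] == p[j])
        if votes / len(population) >= support:
            consensus.append((i, j))
    return consensus


def embed_pairs(particle, pairs):
    """Embed each consensus pair by moving node j into node i's community."""
    child = list(particle)
    for i, j in pairs:
        child[j] = child[i]
    return child
```

Note that the result of `embed_pairs` depends on the order of the pairs when they share nodes; the sketch simply applies them in sorted order.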
Table 1 shows the modularity Q (maximum and mean) of the method of the invention (NE-PSO) versus other reference algorithms over 11 real world network datasets.
TABLE 1
[Table 1 is provided only as images in the source and is not reproduced here.]
Note: "-" indicates that the corresponding method did not produce a result within the given time; "?" indicates that results for the corresponding method were not published, so the corresponding experimental results are unknown.
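Table 1 reports modularity Q. For reference, Newman's modularity of a partition of an undirected graph is Q = (1/2m) Σ_ij [A_ij - k_i k_j / (2m)] δ(c_i, c_j), where k_i is the degree of node i, m is the number of edges, and δ is 1 when nodes i and j share a community. A minimal NumPy sketch (not the evaluation code used for Table 1) is:

```python
import numpy as np


def modularity(adj, labels):
    """Newman modularity Q of a partition of an undirected graph."""
    A = np.asarray(adj, dtype=float)
    labels = np.asarray(labels)
    k = A.sum(axis=1)                              # node degrees
    two_m = k.sum()                                # 2m
    same = labels[:, None] == labels[None, :]      # delta(c_i, c_j)
    return float(((A - np.outer(k, k) / two_m) * same).sum() / two_m)
```

For two disjoint edges split into their natural communities, Q = 0.5; putting every node in one community gives Q = 0.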
As Table 1 shows, the method of the invention outperforms the other reference algorithms on 9 of the data sets, especially those with more network nodes. The running-time graphs for the small-scale and large-scale network data sets in Figs. 3 and 4 also show that NE-PSO clearly reduces the convergence time of the multi-objective particle swarm algorithm. The invention therefore improves the accuracy of community discovery, effectively reduces the convergence time of the algorithm, and is practical in real applications.
The above-described embodiments are merely preferred embodiments of the present invention, and should not be construed as limiting the scope of the invention. All equivalent changes and modifications made within the scope of the present invention shall fall within the scope of the present invention.

Claims (5)

1. A consensus embedding-based complex network community discovery method is characterized by comprising the following steps:
1) setting the maximum number of generations maxgen and the particle swarm size pop;
2) given a network G = (V, E) of size n, performing network representation learning;
3) using the network representation learning result, initializing the particle swarm to obtain a population POP of 100 particles, and setting the iteration counter t = 1;
4) performing consensus embedding-based updating and mutation on POP;
5) stopping condition: if t ≤ maxgen, setting t ← t + 1 and going to step 3); otherwise, stopping and returning the Pareto front solutions, i.e., multiple community partition results.
2. The consensus embedding-based complex network community discovery method as claimed in claim 1, wherein in step 2), the specific steps of performing network representation learning are:
the network embedding method AROPE, which is built on a singular value decomposition and eigendecomposition framework and preserves proximity of arbitrary order, is used to map the high-dimensional adjacency matrix into a low-dimensional continuous feature space, mining the underlying structural information hidden in the network and yielding the node feature vectors $E = \{e_1, e_2, e_3, \ldots, e_n\}$, where $e_i = \{e_{i1}, e_{i2}, e_{i3}, \ldots, e_{id}\}$ and d is the dimensionality after reduction.
3. The method as claimed in claim 1, wherein in step 3), the specific steps of initializing the particle swarm from the network representation learning result to obtain the 100-particle population POP are:
the similarity is calculated to obtain $S = (s_{ij})_{n \times n}$; k-means is applied based on this similarity to obtain the initialized particle swarm $POP_t = \{x_1, x_2, x_3, \ldots, x_{pop}\}_t$; the personal best solutions of the particles are also initialized as $Pbest = POP_t$.
4. The method as claimed in claim 1, wherein in step 4), the step of updating and mutating the POP based on consensus embedding comprises:
performing consensus-based updates in each generation;
performing consensus embedding every 10 generations;
and performing neighborhood-based single-point mutation on the particles with mutation probability of pm in each generation.
5. The consensus embedding-based complex network community discovery method as claimed in claim 1, wherein in step 4), the update and mutation process of the particles is iterated maxgen times;
in the consensus-based update performed in each generation, the communities in gbest are taken as the consensus, and one of them is randomly selected and embedded into the current particle; in the consensus embedding performed every 10 generations, the feature solutions of the current particle swarm are extracted, all two-node communities are extracted from these feature solutions, and each is put to a vote in the current particle swarm; if its support is at least 70%, the two-node community is selected as a consensus community and fully embedded into the current particles;
wherein the feature solutions extracted from the solution set are the following three representative solutions:
(1) the solution with the minimum KKM;
(2) the solution with the minimum RC;
(3) the local knee solution, i.e., the solution with the minimum Manhattan distance.
When neighborhood-based single-point mutation is applied in each generation to particles selected with mutation probability pm, a position i is randomly chosen in each particle to be mutated, and the label at position i is replaced with a different label taken from its neighborhood, ensuring that a new candidate solution is generated.
CN202010202056.0A 2020-03-20 2020-03-20 Consensus embedding-based complex network community discovery method Pending CN111507506A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010202056.0A CN111507506A (en) 2020-03-20 2020-03-20 Consensus embedding-based complex network community discovery method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010202056.0A CN111507506A (en) 2020-03-20 2020-03-20 Consensus embedding-based complex network community discovery method

Publications (1)

Publication Number Publication Date
CN111507506A true CN111507506A (en) 2020-08-07

Family

ID=71869303

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010202056.0A Pending CN111507506A (en) 2020-03-20 2020-03-20 Consensus embedding-based complex network community discovery method

Country Status (1)

Country Link
CN (1) CN111507506A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022142467A1 (en) * 2020-12-30 2022-07-07 南方科技大学 (Southern University of Science and Technology) Epidemic prevention and control method and apparatus, and device and medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080294403A1 (en) * 2004-04-30 2008-11-27 Jun Zhu Systems and Methods for Reconstructing Gene Networks in Segregating Populations
CN103971160A (en) * 2014-05-05 2014-08-06 北京航空航天大学 (Beihang University) Particle swarm optimization method based on complex network
CN109859065A (en) * 2019-02-28 2019-06-07 桂林理工大学 (Guilin University of Technology) Multi-objective complex network community discovery method based on spectral clustering
US20190179615A1 (en) * 2016-10-27 2019-06-13 Tencent Technology (Shenzhen) Company Limited Community discovery method, device, server and computer storage medium
CN109921936A (en) * 2019-03-13 2019-06-21 南京邮电大学 (Nanjing University of Posts and Telecommunications) Multi-objective dynamic network community division method based on a memetic framework

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
XIANGRONG LIU, YANZI DU, MIN JIANG AND XIANGXIANG ZENG: "Multiobjective Particle Swarm Optimization Based on Network Embedding for Complex Network Community Detection", IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, vol. 7, no. 2, 4 February 2020 (2020-02-04), pages 437-449, XP011781725, DOI: 10.1109/TCSS.2020.2964027 *

Similar Documents

Publication Publication Date Title
Wang et al. A cluster-based competitive particle swarm optimizer with a sparse truncation operator for multi-objective optimization
CN114022693B (en) Single-cell RNA-seq data clustering method based on double self-supervision
Hu et al. FCAN-MOPSO: an improved fuzzy-based graph clustering algorithm for complex networks with multi-objective particle swarm optimization
Li et al. Disentangled graph contrastive learning with independence promotion
Yang et al. Linearly decreasing weight particle swarm optimization with accelerated strategy for data clustering
Kao et al. Combining K-means and particle swarm optimization for dynamic data clustering problems
CN109002858B (en) Evidence reasoning-based integrated clustering method for user behavior analysis
CN111507506A (en) Consensus embedding-based complex network community discovery method
Yi et al. New feature analysis-based elastic net algorithm with clustering objective function
Jin et al. Neural networks for fitness approximation in evolutionary optimization
CN113989544A (en) Group discovery method based on deep map convolution network
Dey et al. A quantum inspired differential evolution algorithm for automatic clustering of real life datasets
CN117056763A (en) Community discovery method based on variogram embedding
Ye et al. Feature selection based on adaptive particle swarm optimization with leadership learning
Maulik et al. Multiobjective fuzzy biclustering in microarray data: method and a new performance measure
de Oliveira et al. Data clustering based on complex network community detection
Hesamipour et al. Detecting communities in complex networks using an adaptive genetic algorithm and node similarity-based encoding
CN110288606B (en) Three-dimensional grid model segmentation method of extreme learning machine based on ant lion optimization
Guo et al. THGNCDA: circRNA–disease association prediction based on triple heterogeneous graph network
Ellouze Social Network Community Detection by Combining Self-Organizing Maps and Genetic Algorithms
CN112561599A (en) Click rate prediction method based on attention network learning and fusing domain feature interaction
Bhattacharya et al. DAFHEA: a dynamic approximate fitness-based hybrid EA for optimisation problems
Luo et al. An entropy driven multiobjective particle swarm optimization algorithm for feature selection
Chopade et al. Recent trends in incremental clustering: A review
Li et al. An efficient feature selection algorithm for computer-aided polyp detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200807
