CN111507506A

CN111507506A - Consensus embedding-based complex network community discovery method

Info

Publication number: CN111507506A
Application number: CN202010202056.0A
Authority: CN
Inventors: 曾湘祥; 杜妍孜; 刘向荣
Original assignee: Xiamen University
Current assignee: Xiamen University
Priority date: 2020-03-20
Filing date: 2020-03-20
Publication date: 2020-08-07

Abstract

A consensus embedding-based complex network community discovery method, relating to multi-objective optimization technology. The method comprises the following steps: 1) specifying a maximum number of generations maxgen and a particle swarm size pop; 2) given a network G = (V, E) of size n, performing network representation learning; 3) using the network representation learning result, initializing the particle swarm to obtain a swarm POP of 100 particles, with the iteration counter t = 1; 4) performing consensus embedding-based updating and mutation on POP; 5) stopping condition: if t is not more than maxgen, set t ← t + 1 and go to step 3); otherwise, stop and return the Pareto front solutions, namely a plurality of community partitions. The updating process is more efficient and accurate, and the resulting Pareto front is more competitive; the accuracy of community discovery is improved, the convergence time of the method is effectively reduced, and the method is practical in applications such as function prediction and recommendation systems.

Description

Consensus embedding-based complex network community discovery method

Technical Field

The invention relates to a multi-objective optimization technology, in particular to a consensus embedding-based complex network community discovery method which can be applied to the fields of function prediction, recommendation systems and the like.

Background

Networks with some or all of the properties of self-organization, self-similarity, attractors, small-world structure, and scale-free structure are called complex networks. In a complex network, nodes of the same type are densely connected and form communities, while nodes of different types are sparsely connected, and these sparse links form important channels between different communities.

To explore the structural characteristics of complex networks and better understand their functions, the community structure of complex networks has been studied extensively, and many community discovery methods have been proposed. They fall mainly into four categories: agglomerative methods, divisive methods, optimization methods, and simulation methods. These four categories are not mutually exclusive, and a single method may embody several of these ideas at once.

In recent years, multi-objective optimization methods have been successful, so improving the multi-objective particle swarm algorithm is of considerable significance. A representative method is that of Maoguo Gong et al. (M. Gong, Q. Cai, X. Chen, and L. Ma, "Complex network clustering by multiobjective discrete particle swarm optimization based on decomposition," IEEE Transactions on Evolutionary Computation, vol. 18, no. 1, pp. 82-97, 2014), which applies a discrete particle swarm algorithm to multi-objective network clustering (MODPSO). However, the initialization of that algorithm generates a large amount of redundant and useless search space, so it performs well only on small-scale networks and poorly on large-scale networks.

Disclosure of Invention

The invention aims to provide a consensus embedding-based complex network community discovery method that addresses problems of the prior art, such as convergence that is too slow, a large number of redundant solutions, and insufficient mining of the underlying structural information hidden in the network.

The invention comprises the following steps:

1) specifying a maximum number of generations maxgen and a particle swarm size pop;

2) given a network G = (V, E) of size n, performing network representation learning;

3) using the network representation learning result, initializing the particle swarm to obtain a swarm POP of 100 particles, with the iteration counter t = 1;

4) performing consensus embedding-based updating and mutation on POP;

5) stopping condition: if t is not more than maxgen, set t ← t + 1 and go to step 3); otherwise, stop and return the Pareto front solutions, namely a plurality of community partitions.

In step 2), the specific step of performing network representation learning may be:

AROPE, a network embedding method that is built on a singular value decomposition and eigen-decomposition framework and preserves arbitrary-order proximity, is used to map the high-dimensional adjacency matrix into a low-dimensional continuous feature space, mining the underlying structural information hidden in the network and yielding the node feature vectors $E = \{e_1, e_2, e_3, \ldots, e_n\}$, where $e_i = \{e_{i1}, e_{i2}, e_{i3}, \ldots, e_{id}\}$ and $d$ is the dimensionality after dimensionality reduction.
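For orientation, the idea of preserving arbitrary-order proximity can be summarized in one line. This is a hedged summary of how AROPE is usually described in the network embedding literature, not a statement of the claimed procedure: $A$ denotes the adjacency matrix, and the weights $w_i$, the order $q$, and the function $\mathcal{F}$ are notation introduced here for illustration.

```latex
\mathcal{F}(A) = w_1 A + w_2 A^2 + \cdots + w_q A^q,
\qquad
A x = \lambda x \;\Longrightarrow\; \mathcal{F}(A)\, x = \mathcal{F}(\lambda)\, x .
```

Because every polynomial of $A$ shares the eigenvectors of $A$, a single truncated eigen-decomposition can be reweighted to give a $d$-dimensional embedding for any choice of order and weights.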

In step 3), the specific steps of using the network representation learning result to initialize the particle swarm and obtain the 100 particles of POP include:

calculating the similarity to obtain $S = (S_{ij})_{n \times n}$; based on the similarity, using k-means to obtain the initialized particle swarm $POP^t = \{x_1, x_2, x_3, \ldots, x_{pop}\}^t$; the local best solutions of the particles are likewise initialized as $Pbest = POP^t$.
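The initialization above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: each particle is assumed to be a list holding one community label per node, k-means is assumed to run on the rows of the similarity matrix, and diversity among particles is assumed to come from varying the seed and the number of clusters. The names `cosine_similarity_matrix`, `kmeans_labels`, `init_swarm`, and `k_choices` are hypothetical.

```python
import math
import random


def cosine_similarity_matrix(embeddings):
    """Return S with S[i][j] = cos(e_i, e_j) for equal-length node vectors."""
    # A zero vector gets norm 1.0 so that the division below stays defined.
    norms = [math.sqrt(sum(v * v for v in e)) or 1.0 for e in embeddings]
    n = len(embeddings)
    return [
        [
            sum(a * b for a, b in zip(embeddings[i], embeddings[j]))
            / (norms[i] * norms[j])
            for j in range(n)
        ]
        for i in range(n)
    ]


def kmeans_labels(rows, k, rng, iters=20):
    """Plain Lloyd k-means over the rows; returns one label per row."""
    centers = [list(r) for r in rng.sample(rows, k)]
    labels = [0] * len(rows)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(
                range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(r, centers[c])),
            )
            for r in rows
        ]
        # Update step: mean of the members; an empty cluster keeps its center.
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def init_swarm(embeddings, pop, k_choices, seed=0):
    """Build S, an initial swarm of `pop` label vectors, and a copy as Pbest."""
    rng = random.Random(seed)
    S = cosine_similarity_matrix(embeddings)
    swarm = [kmeans_labels(S, rng.choice(k_choices), rng) for _ in range(pop)]
    pbest = [list(x) for x in swarm]  # Pbest starts as a copy of POP
    return S, swarm, pbest
```

For example, `init_swarm(E, pop=100, k_choices=[2, 3, 4])` would produce 100 label vectors for node embeddings `E`; the choice of candidate cluster counts is an assumption, since the source does not state how k is chosen.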

In step 4), the specific steps of updating and mutating the POP based on consensus embedding are as follows:

performing consensus-based updates in each generation;

performing consensus embedding every 10 generations;

and performing neighborhood-based single-point mutation on the particles with mutation probability of pm in each generation.

The update and mutation process of the particles is iterated maxgen times.

When the consensus-based update is performed in each generation, the communities in gbest are taken as the consensus, and one of them is randomly selected and embedded into the current particle; when consensus embedding is performed every 10 generations, the feature solutions of the current particle swarm are extracted, all two-node communities are extracted from those feature solutions and put to a vote in the current particle swarm, and any two-node community whose support is greater than or equal to 70% is selected as a consensus community and embedded in full into the current particle;

wherein the feature solutions extracted from the solution set are three representative solutions, namely:

a. the solution with minimum KKM

b. the solution with minimum RC

c. the local knee solution with minimum Manhattan distance
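Selecting the three representative solutions can be sketched as below. The source does not say from which reference point the Manhattan distance is measured, so this sketch assumes the distance from each min-max-normalized objective vector to the ideal point (the per-objective minimum); the function name `representative_indices` is hypothetical.

```python
def representative_indices(front):
    """front: list of (kkm, rc) pairs; returns (min-KKM, min-RC, knee) indices."""
    idx = range(len(front))
    i_kkm = min(idx, key=lambda i: front[i][0])
    i_rc = min(idx, key=lambda i: front[i][1])

    # Normalize each objective to [0, 1]; a constant objective gets span 1.0.
    lo = [min(f[d] for f in front) for d in range(2)]
    span = [(max(f[d] for f in front) - lo[d]) or 1.0 for d in range(2)]

    def manhattan_to_ideal(i):
        # Assumed knee criterion: L1 distance to the normalized ideal point.
        return sum((front[i][d] - lo[d]) / span[d] for d in range(2))

    i_knee = min(idx, key=manhattan_to_ideal)
    return i_kkm, i_rc, i_knee
```

With the front `[(1.0, 9.0), (3.0, 3.0), (9.0, 1.0)]`, the two extreme solutions are indices 0 and 2, and the middle solution is chosen as the knee.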

When neighborhood-based single-point mutation is performed in each generation on the particles with mutation probability pm, a position i is randomly selected in each particle to be mutated, and the label at position i is replaced by another label from its neighborhood, which ensures that a new possible solution is generated.
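A minimal sketch of this mutation, assuming a particle is a list of community labels indexed by node and the neighborhood of node i is its set of adjacent nodes in the network. The name `neighborhood_mutation` and the adjacency-dictionary representation are illustrative assumptions.

```python
import random


def neighborhood_mutation(particle, adjacency, pm, rng):
    """With probability pm, give one random node a different neighbor label."""
    out = list(particle)
    if rng.random() >= pm:
        return out  # this particle is not mutated in this generation
    i = rng.randrange(len(out))
    # Candidate labels: those carried by neighbors of i, excluding i's own.
    options = sorted({out[j] for j in adjacency[i]} - {out[i]})
    if options:
        out[i] = rng.choice(options)
    return out
```

Drawing the replacement only from labels already present among the neighbors keeps the mutated node attached to a community it is actually linked to, which is one way to read "another label in the neighborhood".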

The objective functions employed in the present invention are variants of KKM and RC, defined over communities $V_h \subseteq V$ in terms of the sum of the internal similarities and the sum of the external similarities of the nodes in each set $V_h$, where "|·|" denotes the size of the set.
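The exact expressions of the variants are not reproduced in this text, so the following is only a sketch under an assumption: it uses the forms of kernel k-means (KKM) and ratio cut (RC) commonly used in multi-objective network clustering, $KKM = 2(n-k) - \sum_h L(V_h, V_h)/|V_h|$ and $RC = \sum_h L(V_h, \bar{V}_h)/|V_h|$, with $L$ summing entries of the similarity matrix $S$. The function name `kkm_rc`, the symbol $k$ for the number of communities, and the inclusion of diagonal entries are illustrative choices; the claimed variants may differ.

```python
def kkm_rc(labels, S):
    """Return (KKM, RC) for a label vector and a similarity matrix S."""
    n = len(labels)
    communities = {}
    for node, lab in enumerate(labels):
        communities.setdefault(lab, []).append(node)

    kkm = 2.0 * (n - len(communities))
    rc = 0.0
    for members in communities.values():
        inside = set(members)
        # Sum of similarities between nodes inside the community.
        internal = sum(S[i][j] for i in members for j in members)
        # Sum of similarities from the community to all other nodes.
        external = sum(
            S[i][j] for i in members for j in range(n) if j not in inside
        )
        kkm -= internal / len(members)
        rc += external / len(members)
    return kkm, rc
```

Both values are minimized: a lower KKM rewards dense communities, a lower RC rewards weak links between communities, and the two pull in opposite directions as the number of communities changes.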

A non-dominated sorting strategy is used in the selection of particles.
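The core of non-dominated selection is the dominance test. The sketch below extracts only the first non-dominated front for minimization objectives; the function names are hypothetical, and a full non-dominated sorting procedure would repeat this on the remaining solutions to rank the later fronts.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(
        x < y for x, y in zip(a, b)
    )


def pareto_front(objectives):
    """Indices of the objective vectors that no other vector dominates."""
    return [
        i
        for i, a in enumerate(objectives)
        if not any(dominates(b, a) for j, b in enumerate(objectives) if j != i)
    ]
```

Identical objective vectors do not dominate each other, so duplicates of a non-dominated solution all remain on the front.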

Compared with the prior art, the invention has the following outstanding technical effects:

Compared with the evolution process of MODPSO, the consensus embedding-based complex network community discovery algorithm updates its particles differently, and its updating process is more efficient and accurate. Experimental data also show that the Pareto front obtained by the method is more competitive and that the invention requires less convergence time. The method improves the accuracy of community discovery, effectively reduces convergence time, and is practical in applications such as function prediction and recommendation systems. Examples include: in biology, the analysis of metabolic networks, the analysis of gene regulatory networks, and the identification of key genes; in the study of disease spread, evolution, prediction, prevention, and control, finding the key communities and key nodes of an infectious disease so as to predict its transmission paths and cut them off in time; and on the Internet, delivering advertisements accurately, building more reliable recommendation systems on e-commerce platforms, and providing more personalized results in search engines.

Drawings

FIG. 1 is a schematic diagram of the particle update method in each generation.

FIG. 2 is a schematic diagram of the consensus embedding-based update method applied to the particles every 10 generations.

FIG. 3 is a comparison graph of the runtime of the method of the present invention (NE-PSO) versus the MODPSO algorithm on a small scale network data set.

FIG. 4 is a line graph of the runtime of the method of the present invention (NE-PSO) on a large scale network data set.

Detailed Description

The following examples will further illustrate the present invention with reference to the accompanying drawings.

The embodiment of the consensus embedding-based complex network community discovery method specifically comprises the following steps:

1) specifying a network G = (V, E), a maximum number of generations maxgen, a particle swarm size pop, a network size n, and a mutation probability pm;

2) performing network representation learning on the network G with AROPE, a network embedding method that is built on a singular value decomposition and eigen-decomposition framework and preserves arbitrary-order proximity, mapping the high-dimensional adjacency matrix into a low-dimensional continuous feature space, mining the underlying structural information hidden in the network, and obtaining the node feature vectors $E = \{e_1, e_2, e_3, \ldots, e_n\}$, where $e_i = \{e_{i1}, e_{i2}, e_{i3}, \ldots, e_{id}\}$ and $d$ is the dimensionality after dimensionality reduction;

3) using the network representation learning result, namely the feature vectors of the nodes, calculating the cosine similarity between nodes to obtain $S = (S_{ij})_{n \times n}$; based on the similarity, using k-means to obtain the initialized particle swarm $POP^t = \{x_1, x_2, x_3, \ldots, x_{pop}\}^t$; likewise initializing the local best solutions of the particles as $Pbest = POP^t$; and setting the iteration counter t = 1;

4) carrying out updating and mutation based on consensus embedding on POP, wherein:

carrying out consensus-based updating in each generation (as shown in FIG. 1), taking communities in the gbest as consensus, and randomly selecting one of the communities to be embedded into the current particle;

performing consensus embedding every 10 generations (as shown in FIG. 2), extracting feature solutions in the current particle swarm every 10 generations, extracting all two-node communities from the feature solutions, voting in the current particle swarm, and if the support degree is more than or equal to 70%, selecting the two-node communities as the consensus communities and completely embedding the two-node communities into the current particles;

and performing neighborhood-based single-point mutation on the particles with mutation probability of pm in each generation, and randomly selecting a position i for each particle needing mutation, wherein the label of the position i is replaced by another label in the neighborhood, so that the generation of a new possible solution is ensured.

5) Stopping condition: if t is not more than maxgen, set t ← t + 1 and go to step 3; otherwise, stop the algorithm and return the Pareto front solutions, namely a plurality of community partitions.
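The control flow of steps 3) to 5) can be sketched as a loop over pluggable stages. This is a hypothetical skeleton, not the embodiment itself: it assumes initialization runs once and each iteration returns to the update stage, and the order of update, mutation, and consensus embedding within a generation is also an assumption. The parameter names `update`, `mutate`, `consensus_embed`, `select`, and `every` are placeholders for the operations described above.

```python
def run_swarm(pop0, maxgen, update, mutate, consensus_embed, select, every=10):
    """Iterate update and mutation maxgen times; embed consensus periodically."""
    swarm = list(pop0)
    for t in range(1, maxgen + 1):
        # Per-generation consensus-based update followed by mutation.
        candidates = [mutate(update(x)) for x in swarm]
        # Swarm-wide consensus embedding every `every` generations.
        if t % every == 0:
            candidates = consensus_embed(candidates)
        # Selection between old and new particles (non-dominated in the source).
        swarm = select(swarm, candidates)
    return swarm
```

Passing the stages as callables keeps the skeleton independent of how particles and objectives are represented.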

FIG. 1 is a schematic diagram of the particle update method in each generation. In each generation's update, all particles are sorted by non-dominance and a global best solution gbest is maintained. The communities of the partition in gbest are regarded as consensus communities and are randomly embedded into the particles of the current generation; the particles with embedded consensus communities and the original particles then undergo non-dominated sorting selection, and the resulting new particles are retained.
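The embedding step of the per-generation update can be sketched as follows, assuming particles and gbest are lists of community labels indexed by node. Giving the embedded community a fresh label, so that it appears in the particle exactly as it appears in gbest, is an assumption of this sketch; the name `embed_gbest_community` is hypothetical.

```python
import random


def embed_gbest_community(particle, gbest, rng):
    """Copy one randomly chosen gbest community into a copy of the particle."""
    chosen = rng.choice(sorted(set(gbest)))
    members = [i for i, lab in enumerate(gbest) if lab == chosen]
    out = list(particle)
    fresh = max(out) + 1  # unused label, so no other node joins by accident
    for i in members:
        out[i] = fresh
    return out
```

The returned particle would then compete with the original under non-dominated selection, as the description of FIG. 1 states.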

FIG. 2 is a schematic diagram of the consensus embedding-based update method applied to the particles every 10 generations. In the update performed every 10 generations, a local knee point (namely an extreme point), the KKM minimum point, and the RC minimum point are selected from the Pareto front formed by all particles; consensus communities are extracted from these points (the support threshold set in the invention is 70%), and the resulting consensus communities are embedded into all particles of the current generation to update them.
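The voting and embedding performed every 10 generations can be sketched as follows. It assumes a two-node community is a pair of nodes that a solution places in the same community, and that embedding a consensus pair means forcing both nodes to share a label in the target particle; pairs that overlap are merged into one group. All function names are hypothetical.

```python
from itertools import combinations


def two_node_communities(labels):
    """All pairs (u, v), u < v, that a solution puts in the same community."""
    return {
        (u, v)
        for u, v in combinations(range(len(labels)), 2)
        if labels[u] == labels[v]
    }


def consensus_pairs(representatives, swarm, min_support=0.7):
    """Pairs from the representative solutions backed by enough of the swarm."""
    candidates = set().union(*(two_node_communities(r) for r in representatives))
    return {
        (u, v)
        for u, v in candidates
        if sum(1 for x in swarm if x[u] == x[v]) / len(swarm) >= min_support
    }


def embed_pairs(particle, pairs):
    """Return a copy of the particle in which every pair shares one label."""
    parent = list(range(len(particle)))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for u, v in pairs:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[max(ru, rv)] = min(ru, rv)  # smallest node is the root
    # Each node takes the label of the smallest node in its merged group.
    return [particle[find(i)] for i in range(len(particle))]
```

With a swarm of four particles, a pair needs agreement from at least three of them to reach the 70% threshold.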

Table 1 shows the modularity Q (maximum and mean) of the method of the invention (NE-PSO) versus other reference algorithms over 11 real world network datasets.

TABLE 1

Note: "-" indicates that the corresponding method did not produce a result within the given time. "?" indicates that the corresponding method has not been made public, so its experimental results are unknown.

As can be seen from Table 1, the method of the present invention outperforms the other reference algorithms on 9 data sets, especially on data sets with more network nodes. The run-time plots for the small-scale and large-scale network data sets in FIG. 3 and FIG. 4 further show that NE-PSO markedly reduces the convergence time of the multi-objective particle swarm algorithm. Therefore, the invention improves the accuracy of community discovery, effectively reduces the convergence time of the algorithm, and is practical in real applications.

The above-described embodiments are merely preferred embodiments of the present invention, and should not be construed as limiting the scope of the invention. All equivalent changes and modifications made within the scope of the present invention shall fall within the scope of the present invention.

Claims

1. A consensus embedding-based complex network community discovery method is characterized by comprising the following steps:

1) specifying a maximum number of generations maxgen and a particle swarm size pop;

4) carrying out updating and mutation based on consensus embedding on the POP;

2. The method for discovering complex network communities based on consensus embedding as claimed in claim 1, wherein in step 2), the specific steps of performing network representation learning are:

3. The method as claimed in claim 1, wherein in step 3), the specific steps of using the network representation learning result to initialize the particle swarm and obtain the 100 particles of POP are as follows:

calculating the similarity to obtain $S = (S_{ij})_{n \times n}$; based on the similarity, using k-means to obtain the initialized particle swarm $POP^t = \{x_1, x_2, x_3, \ldots, x_{pop}\}^t$; the local best solutions of the particles are likewise initialized as $Pbest = POP^t$.

4. The method as claimed in claim 1, wherein in step 4), the step of updating and mutating the POP based on consensus embedding comprises:

performing consensus-based updates in each generation;

performing consensus embedding every 10 generations;

5. The method for discovering complex network communities based on consensus embedding as claimed in claim 1, wherein in step 4), the updating and mutation process of the particles is iterated maxgen times;

(1) solutions with minimum KKM

(2) Solutions with minimum RC

(3) Local knee solution with minimum Manhattan distance