CN113065037A - Label propagation community detection method and device based on density peak optimization - Google Patents


Publication number
CN113065037A
Authority
CN
China
Prior art keywords
nodes
matrix
label
node
density
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110407213.6A
Other languages
Chinese (zh)
Inventor
陈国强
马岩
赵艳丽
周宏基
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Henan University
Original Assignee
Henan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Henan University
Priority to CN202110407213.6A
Publication of CN113065037A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/906 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions

Abstract

The invention belongs to the technical field of complex networks and discloses a label propagation community detection method and device based on density peak optimization. The invention introduces density peaks to find the cluster centers: the prototype of each community is determined first, which fixes the number of communities and the cluster centers of the complex network, and the communities are then detected with a label propagation algorithm. This improves the accuracy and robustness of community discovery, reduces the number of iterations, and accelerates the formation of communities. Compared with other advanced algorithms, the method solves the community detection problem quickly and effectively and can predict the number of communities without prior knowledge; because the number of communities found is always consistent with the actual number, it has better stability and accuracy.

Description

Label propagation community detection method and device based on density peak optimization
Technical Field
The invention belongs to the technical field of complex networks, and particularly relates to a label propagation community detection method and device based on density peak optimization.
Background
Community structure is an extremely important property of complex networks. It plays a crucial role in analyzing social relationships in social networks, functional relationships between tissues and organs in biological networks, and citation relationships in scientific collaboration networks. The discovery of community structure in complex networks has therefore been studied extensively over the last decade. In 2002, Girvan and Newman (M. Girvan, M. E. J. Newman. Community Structure in Social and Biological Networks [J]. Proceedings of the National Academy of Sciences of the United States of America, 2002, 99(12)) did pioneering work: they pointed out that community structure is ubiquitous in complex networks and proposed the modularity Q to measure the strength of the community structure in a network. Although the literature has not agreed on a single definition of community structure, a community is generally considered to be a group of nodes, also called a cluster or a module, whose connections inside the community are dense and whose connections to the outside are sparse.
Community discovery based on label propagation is one of the current research hot spots and is widely used in community detection. The algorithm is a graph-based semi-supervised learning method; the advantage of semi-supervised learning is that a large number of unlabeled samples can be labeled from a small number of labeled samples, which makes learning more effective. The basic idea of label propagation is to start from the label information of the labeled nodes and use the topological relationships between nodes to predict the labels of the unlabeled nodes, finally partitioning the graph into a cluster structure. The algorithm is simple to implement, has clear logic, does not need to know the number of communities in advance, and has near-linear time complexity, but its partition results are unstable and strongly random. In each iteration, a node adopts the label with the largest cumulative weight among its neighbors, so when several labels tie for the maximum, one of them is chosen at random. This randomness causes an avalanche effect: a small clustering error that appears early is amplified continuously. The order in which node labels are updated also affects the result, and updating the more important nodes earlier accelerates convergence. Moreover, the closer the initial labels are placed to the core points, the more accurate the clustering.
Disclosure of Invention
To address the random selection of labels in existing label propagation algorithms and the resulting instability of the community partition, the invention provides a label propagation community detection method and device based on density peak optimization.
To achieve this purpose, the invention adopts the following technical scheme:
A label propagation community detection method based on density peak optimization comprises the following steps:
Step 1: constructing an adjacency matrix A from a complex network G = (V, E), where V is the node set of G and contains n nodes, and E is the edge set of G and contains m edges;
Step 2: calculating a similarity matrix S between the nodes in the complex network using cosine similarity;
Step 3: calculating a distance matrix d of the nodes in the complex network based on the similarity matrix S;
Step 4: calculating the local density of the nodes using a Gaussian kernel function and normalizing it to obtain the normalized local density ρ* of the nodes;
Step 5: based on the distance matrix d and the normalized local density ρ*, obtaining the distance between each node in the complex network and the higher-density nodes, and normalizing it to obtain the normalized distance δ* between the node and the higher-density nodes;
Step 6: obtaining K core points based on the normalized local density ρ* and the normalized distance δ*;
Step 7: constructing the weights between nodes using a Gaussian kernel function, and constructing a probability transition matrix P based on these weights;
Step 8: constructing a label matrix F based on the K core points obtained;
Step 9: propagating the label matrix F according to the similarity between nodes encoded in the probability transition matrix P, resetting the label matrix F, then propagating and resetting it again, and iterating this process until the change of the unlabeled entries of the label matrix F reaches the critical value, which completes the label partition.
Further, step 3 comprises:
calculating the distance matrix d of the nodes in the complex network as follows:
$$d_{i,j} = \frac{1}{S(i,j) + \sigma}$$
where d_{i,j} is an element of the distance matrix d and denotes the distance between node i and node j; S(i, j) is an element of the similarity matrix S and denotes the similarity of node i and node j; and σ is a small positive number.
Further, step 5 comprises:
calculating the distance between each node in the complex network and the higher-density nodes as follows:
$$\delta_i = \begin{cases} \min\limits_{j:\,\rho_j > \rho_i} d_{i,j}, & \text{if some node } j \text{ has } \rho_j > \rho_i \\ \max\limits_{j} d_{i,j}, & \text{otherwise} \end{cases}$$
where ρ_i denotes the local density of node i and ρ_j denotes the local density of node j.
Further, step 6 comprises:
calculating the product γ = ρ* × δ* for each node, placing the values greater than the sum of the mean of γ and the standard deviation of γ into a list, sorting the list, and finally selecting the first n × 20% nodes with the largest values in the list as core points, that is, the number K of core points equals n × 20%.
A label propagation community detection device based on density peak optimization comprises:
a first construction module, configured to construct an adjacency matrix A from a complex network G = (V, E), where V is the node set of G and contains n nodes, and E is the edge set of G and contains m edges;
a first calculation module, configured to calculate a similarity matrix S between the nodes in the complex network using cosine similarity;
a second calculation module, configured to calculate a distance matrix d of the nodes in the complex network based on the similarity matrix S;
a third calculation module, configured to calculate the local density of the nodes using a Gaussian kernel function and normalize it to obtain the normalized local density ρ* of the nodes;
a fourth calculation module, configured to obtain, based on the distance matrix d and the normalized local density ρ*, the distance between each node in the complex network and the higher-density nodes, and to normalize it to obtain the normalized distance δ* between the node and the higher-density nodes;
a core point derivation module, configured to obtain K core points based on the normalized local density ρ* and the normalized distance δ*;
a second construction module, configured to construct the weights between nodes using a Gaussian kernel function and to construct a probability transition matrix P based on these weights;
a third construction module, configured to construct a label matrix F based on the K core points obtained;
a label propagation module, configured to propagate the label matrix F according to the similarity between nodes encoded in the probability transition matrix P, reset the label matrix F, then propagate and reset it again, and iterate this process until the change of the unlabeled entries of the label matrix F reaches the critical value, which completes the label partition.
Compared with the prior art, the invention has the following beneficial effects:
The invention can predict the number of communities without prior knowledge, avoids the unstable partitions and strong randomness of label propagation with random label selection, and effectively improves the accuracy of community mining and the stability of the algorithm. In addition, because a probability transition matrix is constructed, the number of label propagation iterations is reduced, so the method runs efficiently and finds the community structure of the network quickly. Compared with other advanced algorithms, the method solves the community detection problem quickly and effectively; because the number of communities found is always consistent with the actual number, it has better stability and accuracy.
Drawings
FIG. 1 is a basic flowchart of a label propagation community detection method based on density peak optimization according to an embodiment of the present invention;
FIG. 2 is a comparison of the NMI results of different algorithms on the LFR benchmark data set;
FIG. 3 is a visualization of the partition of the Football network obtained with the method of the present invention;
FIG. 4 is a visualization of the partition of the Karate network obtained with the method of the present invention;
FIG. 5 is a visualization of the partition of the Dolphins network obtained with the method of the present invention.
Detailed Description
The invention is further illustrated by the following examples in conjunction with the accompanying drawings:
Example 1
As shown in FIG. 1, a label propagation community detection method based on density peak optimization, abbreviated as DPLPA for convenience of description, comprises:
Step S101: constructing an adjacency matrix A from a complex network G = (V, E), where V is the node set of G and contains n nodes, and E is the edge set of G and contains m edges;
Step S102: calculating a similarity matrix S between the nodes in the complex network using cosine similarity;
Step S103: calculating a distance matrix d of the nodes in the complex network based on the similarity matrix S;
Step S104: calculating the local density of the nodes using a Gaussian kernel function and normalizing it to obtain the normalized local density ρ* of the nodes;
Step S105: based on the distance matrix d and the normalized local density ρ*, obtaining the distance between each node in the complex network and the higher-density nodes, and normalizing it to obtain the normalized distance δ* between the node and the higher-density nodes;
Step S106: obtaining K core points based on the normalized local density ρ* and the normalized distance δ*;
Step S107: constructing the weights between nodes using a Gaussian kernel function, and constructing a probability transition matrix P based on these weights;
Step S108: constructing a label matrix F based on the K core points obtained;
Step S109: propagating the label matrix F according to the similarity between nodes encoded in the probability transition matrix P, resetting the label matrix F, then propagating and resetting it again, and iterating this process until the change of the unlabeled entries of the label matrix F reaches the critical value, which completes the label partition.
Further, step S103 comprises:
calculating the distance matrix d of the nodes in the complex network as follows:
$$d_{i,j} = \frac{1}{S(i,j) + \sigma}$$
where d_{i,j} is an element of the distance matrix d and denotes the distance between node i and node j; S(i, j) is an element of the similarity matrix S and denotes the similarity of node i and node j; and σ is a small positive number.
Further, step S105 comprises:
calculating the distance between each node in the complex network and the higher-density nodes as follows:
$$\delta_i = \begin{cases} \min\limits_{j:\,\rho_j > \rho_i} d_{i,j}, & \text{if some node } j \text{ has } \rho_j > \rho_i \\ \max\limits_{j} d_{i,j}, & \text{otherwise} \end{cases}$$
where ρ_i denotes the local density of node i and ρ_j denotes the local density of node j.
Further, step S106 comprises:
calculating the product γ = ρ* × δ* for each node, placing the values greater than the sum of the mean of γ and the standard deviation of γ into a list, sorting the list, and finally selecting the first n × 20% nodes with the largest values in the list as core points, that is, the number K of core points equals n × 20%.
Specifically:
Let G = (V, E) be an undirected, unweighted complex network. The node set V contains n nodes, the edge set E contains m edges, and A is the adjacency matrix of the graph G: if there is an edge between node i and node j, then a_ij = 1 in A; otherwise a_ij = 0. The similarity between node i and node j is expressed by the cosine similarity:
$$S(i,j) = \frac{|N(i) \cap N(j)|}{\sqrt{|N(i)| \times |N(j)|}}$$
where N(i) and N(j) denote the sets of neighbor nodes of node i and node j, and |N(i)| denotes the number of neighbors of node i, so |N(i) ∩ N(j)| is the number of neighbors shared by node i and node j, and the denominator
$$\sqrt{|N(i)| \times |N(j)|}$$
represents the number of neighbors that node i and node j are expected to share. S(i, j) takes values between 0 and 1; the closer S(i, j) is to 1, the more similar the two nodes are. The distance between node i and node j is:
$$d_{i,j} = \frac{1}{S(i,j) + \sigma}$$
where σ is a small positive number that prevents the denominator from being 0.
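The similarity and distance computations described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function names are hypothetical, and the reciprocal form d = 1 / (S + σ) is an assumption inferred from the remark that σ keeps the denominator from being 0.

```python
import math

def cosine_similarity(A):
    """Cosine (Salton) similarity from a 0/1 adjacency matrix given as a list of lists."""
    n = len(A)
    neighbours = [{j for j in range(n) if A[i][j]} for i in range(n)]
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            denom = math.sqrt(len(neighbours[i]) * len(neighbours[j]))
            if denom > 0:  # isolated nodes keep similarity 0
                S[i][j] = len(neighbours[i] & neighbours[j]) / denom
    return S

def distance_from_similarity(S, sigma=1e-6):
    """Assumed form d = 1 / (S + sigma); sigma avoids division by zero."""
    return [[1.0 / (s + sigma) for s in row] for row in S]

# Toy graph: triangle 0-1-2 with a pendant node 3 attached to node 2.
A = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 0]]
S = cosine_similarity(A)
d = distance_from_similarity(S)
```

Because S counts only shared neighbors, two adjacent nodes without a common neighbor (nodes 2 and 3 here) get similarity 0 and therefore a very large distance under this assumed form.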
Next, the local density of the nodes is calculated using a Gaussian kernel function:
$$\rho_i = \sum_{j \neq i} \exp\left(-\left(\frac{d_{i,j}}{d_c}\right)^{2}\right)$$
where ρ_i denotes the local density of node i, d_{i,j} the distance between node i and node j, and d_c the cut-off distance, which is set to 1% to 2% of the total number of data points; in one embodiment, d_c is chosen as 1.5% of the total number of data points. The values ρ_i are then normalized:
Figure BDA0003022766130000071
Then the distance between a node and the higher-density nodes is defined:
$$\delta_i = \begin{cases} \min\limits_{j:\,\rho_j > \rho_i} d_{i,j}, & \text{if some node } j \text{ has } \rho_j > \rho_i \\ \max\limits_{j} d_{i,j}, & \text{otherwise} \end{cases}$$
where, when node i has the maximum local density, δ_i is the maximum of the distances between node i and the other nodes; otherwise, δ_i is the minimum distance from node i to any node whose local density is greater than that of node i.
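A small sketch may help make the two density-peak quantities concrete. It assumes the usual Gaussian-kernel density from density peak clustering, ρ_i = Σ_j exp(−(d_ij / d_c)²), and takes d_c as a plain number; the function names and the one-dimensional toy data are illustrative only.

```python
import math

def local_density(d, d_c):
    """Gaussian-kernel local density (assumed standard density-peak form)."""
    n = len(d)
    return [sum(math.exp(-(d[i][j] / d_c) ** 2) for j in range(n) if j != i)
            for i in range(n)]

def delta_distance(d, rho):
    """Distance from each node to the nearest node of higher density.

    The densest node has no such neighbour and gets its maximum distance instead.
    """
    n = len(d)
    delta = []
    for i in range(n):
        higher = [d[i][j] for j in range(n) if rho[j] > rho[i]]
        if higher:
            delta.append(min(higher))
        else:
            delta.append(max(d[i][j] for j in range(n) if j != i))
    return delta

# Toy data: four points on a line; the distance is the absolute difference.
positions = [0.0, 1.0, 2.0, 10.0]
d = [[abs(p - q) for q in positions] for p in positions]
rho = local_density(d, d_c=2.0)
delta = delta_distance(d, rho)
```

In this toy example the point at position 1.0 is the densest, so its δ is its largest distance (9.0), while the outlier at 10.0 has low density but a large δ (8.0).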
Then δ_i is normalized:
Figure BDA0003022766130000073
The threshold d_a is chosen as the value located at about the 80% position of the δ list sorted in ascending order.
Finally, the product γ = ρ* × δ* is calculated for each node; the values greater than the sum of the mean of γ and the standard deviation of γ are placed in a list and sorted, and K = n × 20% nodes are selected as core points (nodes with known labels), which are passed to the label propagation algorithm (LP algorithm) to form the label matrix.
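The core point selection can be sketched as below. Several details are assumptions, since the normalization formulas are not recoverable from this text: min-max scaling stands in for the normalization of ρ and δ, the population standard deviation is used, and K is read as n × 20% capped by the number of candidates that pass the threshold. All names are hypothetical.

```python
import statistics

def min_max(values):
    """Min-max scaling to [0, 1]; an assumed stand-in for the normalization step."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]

def select_core_points(rho, delta, fraction=0.2):
    """Return the indices of the core points, largest gamma first."""
    gamma = [r * d for r, d in zip(min_max(rho), min_max(delta))]
    threshold = statistics.mean(gamma) + statistics.pstdev(gamma)
    # Keep nodes whose gamma exceeds mean + standard deviation, sorted descending.
    candidates = sorted((i for i, g in enumerate(gamma) if g > threshold),
                        key=lambda i: gamma[i], reverse=True)
    k = max(1, round(fraction * len(gamma)))  # K = n * 20%
    return candidates[:k]

# Toy values: nodes 0 and 5 combine high density with a large delta.
rho = [5, 1, 1, 1, 1, 4, 1, 1, 1, 1]
delta = [9, 1, 1, 1, 1, 8, 1, 1, 1, 1]
core_points = select_core_points(rho, delta)
```

Each selected core point then serves as the seed of one community in the label propagation stage.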
The label propagation algorithm is a graph-based clustering algorithm, so a graph G must be constructed first. The nodes of the graph are the data points, and the weight between two nodes is constructed using a Gaussian kernel function:
Figure BDA0003022766130000074
where d_ij denotes the distance between node i and node j, and β is a hyperparameter; the weights w form a similarity matrix.
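As an illustration of the weight construction, the sketch below assumes the common Gaussian form w_ij = exp(−d_ij² / β²). The exact expression is not recoverable from this text, so both the formula and the choice to zero the diagonal are assumptions, and the function name is hypothetical.

```python
import math

def gaussian_weights(d, beta=1.0):
    """Edge weights from distances, assuming w_ij = exp(-d_ij**2 / beta**2).

    Self-weights are set to 0 so that a node does not propagate to itself
    (an illustrative choice, not stated in the source).
    """
    n = len(d)
    return [[0.0 if i == j else math.exp(-(d[i][j] ** 2) / beta ** 2)
             for j in range(n)]
            for i in range(n)]

d = [[0.0, 1.0, 3.0],
     [1.0, 0.0, 2.0],
     [3.0, 2.0, 0.0]]
w = gaussian_weights(d, beta=2.0)
```

A smaller β makes the weights decay faster with distance, so labels spread mainly between very close nodes.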
Next, the known labels are propagated along the edges between nodes. The larger the weight of an edge, the more similar its two nodes are and the more easily a label propagates across it. The probability transition matrix is defined as:
$$P_{ij} = \frac{w_{ij}}{\sum_{k=1}^{n} w_{ik}}$$
where P_ij denotes the probability of propagating the label of node i to node j. Because there are K core points with known labels, a K × K label matrix YL of the known-label nodes is defined, whose i-th row is the label indicator vector of node i: if the label of the i-th node is j, the j-th element of the row is 1 and the rest are 0. A label matrix YU of the unknown-label nodes is defined likewise. Combining the two gives the label matrix of all nodes:
F=[YL,YU] (13)
The label matrix F is then propagated according to the similarity between nodes encoded in the probability transition matrix P:
F=PF (14)
After each propagation pass, the label matrix F must be reset, because the block YL of known labels in F is changed during propagation, whereas YL was obtained beforehand and these accurate labels should not change:
FL=YL (15)
The label matrix F is then propagated and reset again, and this process is iterated until the change of the unlabeled part of F reaches the critical value, at which point DPLPA has completed the label partition.
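The propagate-and-reset loop of equations (13) to (15) can be sketched in a few lines. This is a hedged illustration: the row normalization used for P, the tolerance `tol`, the iteration cap `max_iter`, and the function names are assumptions, and the rows of F are kept in node order instead of stacking the labeled block YL first.

```python
def row_normalise(w):
    """Transition matrix P_ij = w_ij / sum_k w_ik (assumes every row sum is > 0)."""
    P = []
    for row in w:
        total = sum(row)
        P.append([x / total for x in row])
    return P

def propagate_labels(P, core_labels, tol=1e-6, max_iter=1000):
    """Iterate F = P F and re-clamp the core rows until F stops changing.

    core_labels maps a core node index to its community index in 0..K-1.
    """
    n = len(P)
    k = len(set(core_labels.values()))
    F = [[0.0] * k for _ in range(n)]

    def clamp(M):
        # Reset step: rows of known labels are forced back to one-hot vectors.
        for node, label in core_labels.items():
            M[node] = [1.0 if c == label else 0.0 for c in range(k)]

    clamp(F)
    for _ in range(max_iter):
        F_new = [[sum(P[i][j] * F[j][c] for j in range(n)) for c in range(k)]
                 for i in range(n)]
        clamp(F_new)
        change = max(abs(F_new[i][c] - F[i][c]) for i in range(n) for c in range(k))
        F = F_new
        if change < tol:
            break
    return F

# Toy example: path graph 0-1-2-3 with unit weights; nodes 0 and 3 are core points.
w = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
P = row_normalise(w)
F = propagate_labels(P, {0: 0, 3: 1})
```

On this path graph the unlabeled nodes settle at about (2/3, 1/3) and (1/3, 2/3), so each one leans toward the nearer core point.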
TABLE 1 DPLPA pseudo code
Figure BDA0003022766130000082
Figure BDA0003022766130000091
After the clustered label matrix F is obtained, DPLPA groups the nodes whose entry in the same dimension (column) of F is 1 into one community. Once all nodes have been assigned by dimension, the clustering algorithm terminates and the complex network is partitioned.
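Reading communities off the label matrix can be sketched as follows. Since the propagated rows of F are generally soft scores, this sketch assigns each node to its largest column, which reduces to "the column whose value is 1" when rows are one-hot; that argmax rule and the function name are assumptions.

```python
from collections import defaultdict

def communities_from_labels(F):
    """Group node indices by the column holding the largest value in their row."""
    groups = defaultdict(list)
    for node, row in enumerate(F):
        groups[row.index(max(row))].append(node)
    return dict(groups)

F = [[1.0, 0.0],
     [0.7, 0.3],
     [0.2, 0.8],
     [0.0, 1.0]]
communities = communities_from_labels(F)
```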
To evaluate DPLPA, the invention was tested on various real and synthetic data sets and compared with several classical methods: Newman's fast greedy algorithm (FN) (Newman M E J. Fast algorithm for detecting community structure in networks [J]. Physical Review E, Statistical, Nonlinear, and Soft Matter Physics, 2004, 69(6 Pt 2)), the Louvain algorithm (BGLL) (Vincent D. Blondel, Jean-Loup Guillaume, Renaud Lambiotte, Etienne Lefebvre. Fast unfolding of communities in large networks [J]. Journal of Statistical Mechanics: Theory and Experiment, 2008(10)), the original LPA algorithm (Raghavan U N, Albert R, Kumara S. Near linear time algorithm to detect community structures in large-scale networks [J]. Physical Review E, 2007, 76(3)), and the LPAm algorithm (Barber M J, Clark J W. Detecting network communities by propagating labels under constraints [J]. Physical Review E, 2009, 80(2): 026129). The hardware environment of the experiments was an Intel(R) Core(TM) i7-7700M CPU at 3.60 GHz with 8 GB of memory. The programming language was Python 3.7 (64-bit).
The modularity function Q proposed by Newman is used as an evaluation index in the experiments. The modularity is defined as:
$$Q = \frac{1}{2E} \sum_{i,j} \left( A_{ij} - \frac{k_i k_j}{2E} \right) \theta(c_i, c_j)$$
where E denotes the total number of edges of the network, A denotes the adjacency matrix, k_i is the degree of node i, and c_i denotes the community to which node i is assigned. θ(c_i, c_j) is defined as:
$$\theta(c_i, c_j) = \begin{cases} 1, & c_i = c_j \\ 0, & \text{otherwise} \end{cases}$$
that is, θ(c_i, c_j) is 1 when node i and node j are in the same community and 0 otherwise. It is generally considered that the higher the modularity, the more pronounced the community structure.
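For a quick sanity check of the modularity definition, the sketch below evaluates Q directly from an adjacency matrix and a list of community assignments. It is a straightforward O(n²) illustration with hypothetical names, not an optimized implementation.

```python
def modularity(A, communities):
    """Newman modularity Q for an undirected, unweighted graph.

    A is a 0/1 adjacency matrix; communities[i] is the community of node i.
    """
    n = len(A)
    degree = [sum(row) for row in A]
    two_e = sum(degree)  # twice the number of edges
    q = 0.0
    for i in range(n):
        for j in range(n):
            if communities[i] == communities[j]:  # theta(c_i, c_j) = 1
                q += A[i][j] - degree[i] * degree[j] / two_e
    return q / two_e

# Two triangles joined by a single edge (2-3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = [[0] * 6 for _ in range(6)]
for u, v in edges:
    A[u][v] = A[v][u] = 1
q_split = modularity(A, [0, 0, 0, 1, 1, 1])
q_single = modularity(A, [0, 0, 0, 0, 0, 0])
```

Splitting the graph into its two triangles gives Q = 5/14, whereas placing all nodes in one community gives Q = 0.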
To verify the accuracy of DPLPA, the invention also uses normalized mutual information (NMI) to measure the similarity of two clustering results. NMI is one of the important metrics in community discovery and gives a largely objective assessment of how accurately a community partition matches the real partition. NMI takes values in [0, 1]; a higher value means that the detected communities are closer to the real communities. NMI(A, B) is defined as:
$$\mathrm{NMI}(A,B) = \frac{-2 \sum_{i=1}^{C_A} \sum_{j=1}^{C_B} C_{ij} \log\left( \frac{C_{ij} N}{C_{i\cdot} C_{\cdot j}} \right)}{\sum_{i=1}^{C_A} C_{i\cdot} \log\left( \frac{C_{i\cdot}}{N} \right) + \sum_{j=1}^{C_B} C_{\cdot j} \log\left( \frac{C_{\cdot j}}{N} \right)}$$
where A and B denote the partitions produced by community discovery algorithms A and B, C is the confusion matrix, C_ij is the number of nodes shared by community i of partition A and community j of partition B, C_A (C_B) is the number of communities found by method A (B), C_i· (C_·j) is the sum of the elements of row i (column j) of C, and N is the number of nodes. If the clustering results of algorithms A and B are the same, NMI(A, B) = 1.
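The NMI index can be computed from two label lists as in the sketch below, which follows the widely used confusion-matrix form of the definition. The function name and the convention of returning 1.0 when both partitions consist of a single community are assumptions.

```python
import math
from collections import Counter

def nmi(labels_a, labels_b):
    """Normalized mutual information between two partitions of the same nodes."""
    n = len(labels_a)
    joint = Counter(zip(labels_a, labels_b))  # non-zero confusion matrix entries C_ij
    row = Counter(labels_a)                   # row sums C_i.
    col = Counter(labels_b)                   # column sums C_.j
    numerator = -2.0 * sum(c * math.log(c * n / (row[a] * col[b]))
                           for (a, b), c in joint.items())
    denominator = (sum(c * math.log(c / n) for c in row.values())
                   + sum(c * math.log(c / n) for c in col.values()))
    if denominator == 0:
        return 1.0  # both partitions are a single community
    return numerator / denominator

same = nmi([0, 0, 1, 1], [1, 1, 0, 0])         # identical up to relabeling
independent = nmi([0, 0, 1, 1], [0, 1, 0, 1])  # statistically independent
```

Relabeling the communities does not change the score, so `same` evaluates to 1, while the two independent partitions score 0.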
Evaluating an algorithm on artificially synthesized networks is an effective way to test its quality; the most commonly used benchmark network for community detection is the LFR benchmark proposed by Andrea Lancichinetti. The LFR benchmark network is an extension of the GN benchmark network and has higher practical value. It reflects the heterogeneity of the community size distribution and the power-law distribution of node degrees. Its important parameters are as follows: n is the number of network nodes, k the average node degree, maxk the maximum node degree, minc the minimum community size, maxc the maximum community size, τ1 and τ2 the negative exponents of the power-law distributions of node degree and community size respectively, and μ the ratio of the number of edges between communities to the total number of edges, which characterizes how pronounced the communities are: the smaller μ is, the more pronounced the community structure. FIG. 2 compares the NMI results of the algorithms on the LFR benchmark data set.
The parameters of the LFR experiment were set as: n = 1000, k = 15, maxk = 40, minc = 20, maxc = 50, τ1 = 2, τ2 = 1, with μ ranging from 0.1 to 0.8. FIG. 2 shows that when μ is small, that is, when the community structure of the complex network is pronounced, the NMI values of all algorithms except FN are high. As μ increases and the community structure becomes more complex, the NMI values of the FN and LPA algorithms begin to drop markedly, and the remaining algorithms begin to drop at μ = 0.6; however, DPLPA declines more slowly than the BGLL and LPAm algorithms and retains a high final NMI value. This indicates that DPLPA is highly accurate in community discovery and more stable on networks with complex community structure.
To compare the strengths and weaknesses of the algorithms further, they were also tested on several real community networks. These networks differ in size and come from various fields. The details are shown in Table 2, where n denotes the number of nodes, m the number of edges, and k the number of known communities.
TABLE 2 detailed description of the real network
Figure BDA0003022766130000111
Karate is a membership data set of a karate club at a university in the United States; the network is built from the interactions among the club members and is commonly used in social network analysis. Dolphins is a network built from the living habits of 62 bottlenose dolphins, in which an edge between two nodes means that the corresponding dolphins are often seen together. Polbook is a network built from political books sold by Amazon in the United States; each node represents a book, and two nodes are joined by an edge if the two books were purchased by the same customer. Football is a network built from American college football games; nodes represent teams, and an edge joins two teams that have played a game against each other. The results of the different algorithms on the different networks are shown in Table 3.
TABLE 3 comparison of Q values for different algorithms in a real network
Figure BDA0003022766130000121
In order to better compare the clustering effect of the DPLPA algorithm on the data set, the invention is explained in detail through the Football data set. The actual grouping of the Football data sets is shown in Table 4, and the clustering effect of the DPLPA algorithm is shown in FIG. 3.
TABLE 4 actual grouping of football dataset networks
Figure BDA0003022766130000122
As can be seen from Table 3, although the Q value of the method of the present invention is not the best on some data sets, the partition produced by DPLPA is identical to the actual community distribution, as Table 4 and FIG. 3 show. During label propagation, the probability transition matrix suppresses the randomness of the propagation process, so that each node update adopts, as far as possible, the label of nodes in the same community; the community partition is therefore more stable and closer to the real communities. The K values of the different algorithms on the different networks are compared in Table 5.
TABLE 5 comparison of K values for different algorithms in a real network
Figure BDA0003022766130000131
Table 5 also shows that DPLPA detects the true number of communities, which is exactly the same as the actual K value. This is mainly because DPLPA computes the local density and distance of the nodes from the network topology at the very beginning and selects the value of K by means of a decision graph. The K value therefore does not need to be provided, and DPLPA has the advantage of detecting it automatically.
To present the experimental results more clearly, the Karate network and the Dolphins network are taken as case studies and the detected communities are visualized. Nodes of the same community are drawn in the same color. FIG. 4 shows the partition of the Karate network by DPLPA, and FIG. 5 shows the partition of the Dolphins network by DPLPA.
As can be seen from FIG. 4, node 1 and node 34 have the highest local densities, and as can be seen from FIG. 5, node 15 and node 18 have the highest local densities; these nodes also have large node distances, so it is reasonable for DPLPA to select them as the K core points, and the resulting partition is fully consistent with the actual communities. DPLPA can therefore perform high-quality community detection on real networks.
In conclusion, the invention can predict the number of communities without prior knowledge, avoids the unstable partitions and strong randomness of label propagation with random label selection, and effectively improves the accuracy of community mining and the stability of the algorithm. In addition, because a probability transition matrix is constructed, the number of label propagation iterations is reduced, so the method runs efficiently and finds the community structure of the network quickly. Compared with other advanced algorithms, the method solves the community detection problem quickly and effectively; because the number of communities found is always consistent with the actual number, it has better stability and accuracy.
Example 2
The invention also discloses a label propagation community detection device based on density peak optimization, which comprises the following modules:
a first construction module, configured to construct an adjacency matrix A from a complex network G = (V, E), where V is the node set of G and comprises n nodes, and E is the edge set of G and comprises m edges;
a first calculation module, configured to calculate a similarity matrix S between nodes in the complex network using cosine similarity;
a second calculation module, configured to calculate a distance matrix d of the nodes in the complex network based on the similarity matrix S between the nodes;
a third calculation module, configured to calculate the local density of the nodes using a Gaussian kernel function and normalize it to obtain the normalized local density ρ* of the nodes;
a fourth calculation module, configured to obtain the distance between nodes in the complex network based on the distance matrix d of the nodes and the normalized local density ρ* of the nodes, and to normalize it to obtain the normalized distance δ* between nodes;
a core point deriving module, configured to acquire K core points based on the normalized local density ρ* of the nodes and the normalized distance δ* between nodes;
a second construction module, configured to construct the weights between nodes using a Gaussian kernel function method and to construct a probability transition matrix P based on the weights between the nodes;
a third construction module, configured to construct a label matrix F based on the K core points obtained;
and a label propagation module, configured to propagate the label matrix F according to the similarity between nodes in the probability transition matrix P, reset the label matrix F, and iterate this propagate-and-reset process until the change in the labels of the unlabeled nodes in the label matrix F reaches a critical point, thereby completing the label division.
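The first calculation module can be illustrated with a minimal sketch. It assumes that the cosine similarity is taken between rows of the adjacency matrix A as given; whether self-loops are added to A first is not stated in this section. The function name `cosine_similarity_matrix` and the toy graph are hypothetical and only serve the illustration.

```python
import math

def cosine_similarity_matrix(adj):
    """Cosine similarity between the rows of an adjacency matrix (list of lists)."""
    n = len(adj)
    norms = [math.sqrt(sum(x * x for x in row)) for row in adj]
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if norms[i] == 0 or norms[j] == 0:
                continue  # isolated node: similarity is left at 0 (assumption)
            dot = sum(a * b for a, b in zip(adj[i], adj[j]))
            sim[i][j] = dot / (norms[i] * norms[j])
    return sim

# Hypothetical toy network: a path 0-1-2-3.
A = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
S = cosine_similarity_matrix(A)
```

With raw adjacency rows, two nodes are similar when they share neighbors, so nodes 0 and 2 (which share neighbor 1) obtain a positive similarity while the directly linked nodes 0 and 1 obtain zero.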
Further, the second calculation module is specifically configured to:
calculate a distance matrix d of the nodes in the complex network as follows:
Figure BDA0003022766130000141
where d_{i,j} is an element of the distance matrix d and represents the distance between node i and node j; s(i, j) is an element of the similarity matrix S and represents the similarity between node i and node j; and σ is a small positive number.
Further, the fourth calculation module is specifically configured to:
calculate the distance between nodes in the complex network as follows:
Figure BDA0003022766130000151
where ρ_i represents the local density of node i, and ρ_j represents the local density of node j.
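The formula image is not reproduced in this text, so the following is only a sketch under an explicit assumption: it follows the usual density peak clustering convention, in which the distance of node i is the smallest d_{i,j} over all nodes j with ρ_j > ρ_i, and the node with the highest density receives its largest distance to any other node. The function name `delta_distances` and the sample values are hypothetical.

```python
def delta_distances(d, rho):
    """Distance of each node to its nearest node of higher density.

    Assumes the standard density peak convention (not confirmed by the
    formula image): a node with no denser node gets its maximum distance.
    Requires at least two nodes.
    """
    n = len(rho)
    delta = [0.0] * n
    for i in range(n):
        higher = [d[i][j] for j in range(n) if rho[j] > rho[i]]
        if higher:
            delta[i] = min(higher)
        else:
            delta[i] = max(d[i][j] for j in range(n) if j != i)
    return delta

# Hypothetical 3-node distance matrix and local densities.
d = [[0, 1, 4],
     [1, 0, 2],
     [4, 2, 0]]
rho = [3, 2, 1]
delta = delta_distances(d, rho)
print(delta)  # [4, 1, 2]
```

The normalization that turns these values into δ* is not shown here.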
Further, the core point deriving module is specifically configured to:
calculate the product γ = ρ* × δ* for each node, place the values that are greater than the sum of the mean of γ and the standard deviation of γ into a list, sort the list, and finally select the top n × 20% nodes in the list as core points, that is, the number K of core points equals n × 20%.
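The selection rule can be sketched as follows. Several details are assumptions, because the text leaves them open: the population standard deviation is used, n × 20% is rounded to the nearest integer, and fewer than K nodes are returned when fewer candidates pass the threshold. The function name `select_core_points` and the sample values are hypothetical.

```python
import statistics

def select_core_points(rho_star, delta_star, fraction=0.2):
    """Pick core points from the normalized density and distance lists."""
    gamma = [r * d for r, d in zip(rho_star, delta_star)]
    # Candidates must exceed mean(gamma) + std(gamma); pstdev is an assumption.
    threshold = statistics.mean(gamma) + statistics.pstdev(gamma)
    candidates = [i for i, g in enumerate(gamma) if g > threshold]
    candidates.sort(key=lambda i: gamma[i], reverse=True)
    # K = n * 20%, rounded to the nearest integer (rounding is an assumption).
    k = max(1, round(len(gamma) * fraction))
    return candidates[:k]

# Hypothetical normalized values for a 10-node network.
rho_star = [1.0, 0.9, 0.2, 0.1, 0.3, 0.2, 0.1, 0.2, 0.1, 0.3]
delta_star = [1.0, 0.8, 0.1, 0.1, 0.2, 0.1, 0.1, 0.1, 0.2, 0.1]
cores = select_core_points(rho_star, delta_star)
print(cores)  # [0, 1]
```

Nodes 0 and 1 combine high density with large distance, so they are the only ones whose γ exceeds the threshold.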
The above describes only the preferred embodiments of the present invention. It should be noted that those skilled in the art can make various modifications and improvements without departing from the principle of the present invention, and such modifications and improvements should also be regarded as falling within the protection scope of the present invention.

Claims (5)

1. A label propagation community detection method based on density peak optimization, characterized by comprising the following steps:
step 1: constructing an adjacency matrix A from a complex network G = (V, E), where V is the node set of G and comprises n nodes, and E is the edge set of G and comprises m edges;
step 2: calculating a similarity matrix S between nodes in the complex network using cosine similarity;
step 3: calculating a distance matrix d of the nodes in the complex network based on the similarity matrix S between the nodes;
step 4: calculating the local density of the nodes using a Gaussian kernel function and normalizing it to obtain the normalized local density ρ* of the nodes;
step 5: obtaining, based on the distance matrix d of the nodes and the normalized local density ρ* of the nodes, the distance between each node in the complex network and the high-density nodes, and normalizing it to obtain the normalized distance δ* between each node and the high-density nodes;
step 6: acquiring K core points based on the normalized local density ρ* of the nodes and the normalized distance δ* between each node and the high-density nodes;
step 7: constructing the weights between nodes using a Gaussian kernel function method, and constructing a probability transition matrix P based on the weights between the nodes;
step 8: constructing a label matrix F based on the K core points obtained;
step 9: propagating the label matrix F according to the similarity between nodes in the probability transition matrix P, resetting the label matrix F, and iterating this propagate-and-reset process until the change in the labels of the unlabeled nodes in the label matrix F reaches a critical point, thereby completing the label division.
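Steps 8 and 9 can be sketched as a clamped label propagation loop. The sketch makes several assumptions: each core point seeds its own community, "resetting" means restoring the one-hot rows of the core points after every propagation, the "critical point" is read as the largest per-entry change falling below a tolerance, and the final community of a node is the column with the largest value. The names `propagate_labels`, `tol`, and `max_iter` are hypothetical.

```python
def propagate_labels(P, core_points, tol=1e-6, max_iter=1000):
    """Propagate labels from core points over transition matrix P."""
    n, k = len(P), len(core_points)
    # Label matrix F: row i holds node i's membership in each of k communities.
    F = [[0.0] * k for _ in range(n)]
    for c, node in enumerate(core_points):
        F[node][c] = 1.0
    for _ in range(max_iter):
        # Propagation step: F <- P F.
        new_F = [[sum(P[i][j] * F[j][c] for j in range(n)) for c in range(k)]
                 for i in range(n)]
        # Reset step: core points keep their original one-hot labels.
        for c, node in enumerate(core_points):
            new_F[node] = [1.0 if cc == c else 0.0 for cc in range(k)]
        change = max(abs(new_F[i][c] - F[i][c])
                     for i in range(n) for c in range(k))
        F = new_F
        if change < tol:
            break
    # Assign each node to the community with the largest membership value.
    return [max(range(k), key=lambda c: F[i][c]) for i in range(n)]

# Hypothetical path network 0-1-2-3 with a row-normalized transition matrix.
P = [[0.0, 1.0, 0.0, 0.0],
     [0.5, 0.0, 0.5, 0.0],
     [0.0, 0.5, 0.0, 0.5],
     [0.0, 0.0, 1.0, 0.0]]
labels = propagate_labels(P, core_points=[0, 3])
print(labels)  # [0, 0, 1, 1]
```

Because the core rows never change after the reset, the measured change comes only from the unlabeled nodes, which matches the stopping condition described in step 9.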
2. The label propagation community detection method based on density peak optimization as claimed in claim 1, wherein step 3 comprises:
calculating a distance matrix d of the nodes in the complex network as follows:
Figure FDA0003022766120000011
where d_{i,j} is an element of the distance matrix d and represents the distance between node i and node j; s(i, j) is an element of the similarity matrix S and represents the similarity between node i and node j; and σ is a small positive number.
3. The label propagation community detection method based on density peak optimization as claimed in claim 2, wherein step 5 comprises:
calculating the distance between each node and the high-density nodes in the complex network as follows:
Figure FDA0003022766120000021
where ρ_i represents the local density of node i, and ρ_j represents the local density of node j.
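The local density ρ_i used here comes from step 4, which this section describes only as a Gaussian kernel computation followed by normalization. A minimal sketch, assuming the Gaussian form common in density peak clustering, ρ_i = Σ_j exp(−(d_{i,j}/d_c)²), and min-max scaling to [0, 1], is given below. The cutoff distance `d_c`, both function names, and the sample values are hypothetical.

```python
import math

def gaussian_local_density(d, d_c):
    """Gaussian-kernel local density; d_c is an assumed cutoff distance."""
    n = len(d)
    return [sum(math.exp(-(d[i][j] / d_c) ** 2) for j in range(n) if j != i)
            for i in range(n)]

def min_max_normalize(values):
    """Scale values to [0, 1]; min-max scaling is an assumption."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical 3-node distance matrix.
d = [[0, 1, 4],
     [1, 0, 2],
     [4, 2, 0]]
rho = gaussian_local_density(d, d_c=2.0)
rho_star = min_max_normalize(rho)
```

In this example node 1 is close to both other nodes and therefore obtains the highest normalized density, while node 2 is the most remote and obtains the lowest.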
4. The label propagation community detection method based on density peak optimization as claimed in claim 1, wherein step 6 comprises:
calculating the product γ = ρ* × δ* for each node, placing the values that are greater than the sum of the mean of γ and the standard deviation of γ into a list, sorting the list, and finally selecting the top n × 20% nodes in the list as core points, that is, the number K of core points equals n × 20%.
5. A label propagation community detection device based on density peak optimization, characterized by comprising:
a first construction module, configured to construct an adjacency matrix A from a complex network G = (V, E), where V is the node set of G and comprises n nodes, and E is the edge set of G and comprises m edges;
a first calculation module, configured to calculate a similarity matrix S between nodes in the complex network using cosine similarity;
a second calculation module, configured to calculate a distance matrix d of the nodes in the complex network based on the similarity matrix S between the nodes;
a third calculation module, configured to calculate the local density of the nodes using a Gaussian kernel function and normalize it to obtain the normalized local density ρ* of the nodes;
a fourth calculation module, configured to obtain, based on the distance matrix d of the nodes and the normalized local density ρ* of the nodes, the distance between each node in the complex network and the high-density nodes, and to normalize it to obtain the normalized distance δ* between each node and the high-density nodes;
a core point deriving module, configured to acquire K core points based on the normalized local density ρ* of the nodes and the normalized distance δ* between each node and the high-density nodes;
a second construction module, configured to construct the weights between nodes using a Gaussian kernel function method and to construct a probability transition matrix P based on the weights between the nodes;
a third construction module, configured to construct a label matrix F based on the K core points obtained;
and a label propagation module, configured to propagate the label matrix F according to the similarity between nodes in the probability transition matrix P, reset the label matrix F, and iterate this propagate-and-reset process until the change in the labels of the unlabeled nodes in the label matrix F reaches a critical point, thereby completing the label division.
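The second construction module can be illustrated with a sketch in which the weight between two nodes is exp(−d_{i,j}² / (2·bandwidth²)) and P is obtained by dividing each weight by its row sum. Both the exact kernel form and the row normalization are assumptions borrowed from common label propagation practice; the text only states that a Gaussian kernel is used. The parameter `bandwidth` is hypothetical and is distinct from the small positive number σ in the distance formula.

```python
import math

def transition_matrix(d, bandwidth=1.0):
    """Row-normalized Gaussian-kernel weights (kernel form is an assumption)."""
    n = len(d)
    # Weight between distinct nodes decays with distance; no self-weight.
    W = [[math.exp(-d[i][j] ** 2 / (2 * bandwidth ** 2)) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    P = []
    for row in W:
        total = sum(row)
        # Each row is scaled to sum to 1 so that it acts as a probability row.
        P.append([w / total if total > 0 else 0.0 for w in row])
    return P

# Hypothetical 3-node distance matrix.
d = [[0, 1, 4],
     [1, 0, 2],
     [4, 2, 0]]
P = transition_matrix(d, bandwidth=2.0)
```

Each row of P then gives the probability that a label moves from one node to each of the others, with closer nodes receiving more of the probability mass.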
CN202110407213.6A 2021-04-15 2021-04-15 Label propagation community detection method and device based on density peak optimization Pending CN113065037A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110407213.6A CN113065037A (en) 2021-04-15 2021-04-15 Label propagation community detection method and device based on density peak optimization

Publications (1)

Publication Number Publication Date
CN113065037A true CN113065037A (en) 2021-07-02

Family

ID=76566710

Country Status (1)

Country Link
CN (1) CN113065037A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113706459A (en) * 2021-07-15 2021-11-26 电子科技大学 Detection and simulation restoration device for abnormal brain area of autism patient
CN113706459B (en) * 2021-07-15 2023-06-20 电子科技大学 Detection and simulation repair device for abnormal brain area of autism patient
CN115563400A (en) * 2022-09-19 2023-01-03 广东技术师范大学 Multi-path network community detection method and device based on motif weighted aggregation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination