CN111738516B - Social network community discovery system through local distance and node rank optimization function - Google Patents

Publication number
CN111738516B
Authority
CN
China
Prior art keywords
node
module
network
social
community
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010582081.6A
Other languages
Chinese (zh)
Other versions
CN111738516A (en)
Inventor
刘小洋
丁楠
吴松阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei Jiuzhou Longteng Scientific And Technological Achievement Transformation Co ltd
Nantong Baisong Data Technology Co.,Ltd.
Original Assignee
Chongqing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Technology
Priority to CN202010582081.6A
Publication of CN111738516A
Application granted
Publication of CN111738516B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Development Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a social network community discovery system based on a local distance and node rank optimization function. The system comprises a data acquisition module, a Laplacian node matrix calculation module, a network social node value calculation and discovery module, a community optimization module and a display module. The data output end of the data acquisition module is connected to the data input end of the Laplacian node matrix calculation module; the data output end of the Laplacian node matrix calculation module is connected to the data input end of the network social node value calculation and discovery module; the data output end of the network social node value calculation and discovery module is connected to the data input end of the community optimization module; and the data output end of the community optimization module is connected to the display data end of the display module. First, the invention accounts for node self-transmission. Second, it takes edge weights fully into account and can effectively reveal the characteristic structure of the whole social network. Third, when handling the multi-scale optimization problem, its optimization function can effectively find the optimal community structure. Finally, it performs better than other methods.

Description

Social network community discovery system through local distance and node rank optimization function
Technical Field
The invention relates to the technical field of social networks, in particular to a social network community discovery system based on local distance and node rank optimization functions.
Background
Over the last two decades the Internet has spread rapidly around the globe, data networks have become increasingly important in human society, and researchers have grown increasingly interested in the study of complex networks. In nature, complex networks take diverse forms and are composed of communities that are relatively independent yet influence one another; examples include social networks, biological networks, economic networks and information networks. Community structure is an important topological attribute of a complex network, so community discovery is of great significance in complex network analysis, data mining and related research. It allows complex networks to be analyzed more effectively and useful information to be extracted and applied in various fields, such as text analysis, personalized recommendation systems, user identification, epidemic propagation and behavior prediction.
Many articles have been published on social network community discovery. In a network, the nodes contained in each cluster must be related in some way to one another, rather than to nodes outside the cluster, in order to form a community. Most researchers hold that a community is characterized by dense connections among its own nodes and sparse connections to nodes outside it. Since the pioneering work of Girvan and Newman, many algorithms for community detection in complex networks have been proposed; the most typical are modularity optimization algorithms, label propagation algorithms, greedy algorithms, random walk algorithms, spectral partitioning algorithms and fuzzy algorithms.
Disclosure of Invention
The invention aims to at least solve the technical problems in the prior art, and particularly provides a social network community discovery system based on a local distance and node rank optimization function.
To achieve the above object, the present invention provides a social network community discovery system based on a local distance and node rank optimization function, which comprises a data acquisition module, a Laplacian node matrix calculation module, a network social node value calculation and discovery module, a community optimization module, and a display module;
the data output end of the data acquisition module is connected to the data input end of the Laplacian node matrix calculation module, the data output end of the Laplacian node matrix calculation module is connected to the data input end of the network social node value calculation and discovery module, the data output end of the network social node value calculation and discovery module is connected to the data input end of the community optimization module, and the data output end of the community optimization module is connected to the display data end of the display module;
the data acquisition module is used for acquiring a network social node data set;
the Laplacian node matrix calculation module is used for performing Laplacian normalization on the network social node data set acquired by the data acquisition module to obtain a Laplacian node matrix;
the network social node value calculation and discovery module is used for calculating a network social node value from the internal distance and the external distance of the social network:
if the network social node value is greater than or equal to the preset network social node value, a network social community is discovered;
if the network social node value is smaller than the preset network social node value, the network social community is rediscovered;
the community optimization module is used for optimizing the network social communities discovered by the network social node value calculation and discovery module;
and the display module is used for displaying the network social communities obtained by the community optimization module and/or the network social node value calculation and discovery module.
In a preferred embodiment of the present invention, the Laplacian normalization of the acquired network social nodes in the Laplacian node matrix calculation module is calculated as follows:
Figure GDA0003144233580000021
wherein D represents the node degree matrix;
Figure GDA0003144233580000022
represents the unnormalized Laplacian matrix;
A represents the adjacency matrix.
In a preferred embodiment of the present invention, the element values in the Laplacian node matrix calculation module are calculated as follows:
Figure GDA0003144233580000031
wherein deg(v_i) represents the degree of node i;
deg(v_j) represents the degree of node j;
v_i represents node i;
v_j represents node j;
Figure GDA0003144233580000032
represents the element value in the ith row and jth column of the Laplacian node matrix.
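The formula images for the normalization are not reproduced in this text, so the following is only a minimal sketch. It assumes the standard symmetric normalized Laplacian, L_sym = I - D^(-1/2) A D^(-1/2), whose entries depend on deg(v_i) and deg(v_j) as in the symbol list above. The function name `normalized_laplacian` and the example graph are hypothetical, and the patent's own formula may differ.

```python
import math


def normalized_laplacian(adj):
    """Symmetric normalized Laplacian of an undirected graph.

    adj is a square 0/1 adjacency matrix given as a list of lists.
    Assumed form (not confirmed by the source): L_sym = I - D^(-1/2) A D^(-1/2).
    """
    n = len(adj)
    deg = [sum(row) for row in adj]  # deg(v_i) for every node
    lap = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j and deg[i] > 0:
                lap[i][j] = 1.0
            elif i != j and adj[i][j]:
                lap[i][j] = -1.0 / math.sqrt(deg[i] * deg[j])
    return lap


# Path graph v_1 - v_2 - v_3 (illustrative data only)
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
L_sym = normalized_laplacian(A)
```

Isolated nodes (degree 0) keep an all-zero row in this sketch, which is one common convention.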
In a preferred embodiment of the present invention, the internal distance of the social network is calculated in the network social node value calculation and discovery module as follows:
Figure GDA0003144233580000033
wherein L_sym represents the Laplacian node matrix;
Figure GDA0003144233580000034
represents the adjacency matrix of the node set V_k;
G represents the social network;
V_k represents a node set, k = 1, 2, 3, ..., K;
d_internal(G, V_k) represents the internal distance of the social network.
In a preferred embodiment of the present invention, the external distance of the social network is calculated in the network social node value calculation and discovery module as follows:
Figure GDA0003144233580000035
wherein L_sym represents the Laplacian node matrix;
Figure GDA0003144233580000036
represents the adjacency matrix of V - V_k;
Figure GDA0003144233580000041
represents the adjacency matrix of the node set V_k;
V represents the node partition set, V = {V_1, V_2, V_3, ..., V_K};
G represents the social network;
V_k represents a node set, k = 1, 2, 3, ..., K;
d_external(G, V_k) represents the external distance of the social network.
In a preferred embodiment of the present invention, the network social node value is calculated in the network social node value calculation and discovery module as follows:
Figure GDA0003144233580000042
wherein V_k represents a node set, k = 1, 2, 3, ..., K;
V represents the node partition set, V = {V_1, V_2, V_3, ..., V_K};
d_internal(G, V_k) represents the internal distance of the social network;
d_external(G, V_k) represents the external distance of the social network;
S_LDL(G, V) represents the network social node value.
In a preferred embodiment of the present invention, the adjacency matrix of the node set V_k
Figure GDA0003144233580000043
is calculated in the network social node value calculation and discovery module as follows:
Figure GDA0003144233580000044
wherein V_k represents a node set, k = 1, 2, 3, ..., K;
V represents the node partition set, V = {V_1, V_2, V_3, ..., V_K};
v_x represents node x, x = 1, 2, 3, ..., N;
v_y represents node y, y = 1, 2, 3, ..., N.
In a preferred embodiment of the present invention, the adjacency matrix of the node set V - V_k
Figure GDA0003144233580000045
is calculated in the network social node value calculation and discovery module as follows:
Figure GDA0003144233580000046
wherein V_k represents a node set, k = 1, 2, 3, ..., K;
V represents the node partition set, V = {V_1, V_2, V_3, ..., V_K};
v_x represents node x, x = 1, 2, 3, ..., N;
v_y represents node y, y = 1, 2, 3, ..., N.
In a preferred embodiment of the present invention, the community optimization module optimizes the discovered network social communities as follows:
Figure GDA0003144233580000051
wherein V_k represents a node set, k = 1, 2, 3, ..., K;
V represents the node partition set, V = {V_1, V_2, V_3, ..., V_K};
v_i represents node i;
indicates that in the case of … …, there is … …;
V[v_i] indicates the node set to which node i belongs;
v_j represents node j;
A_ij represents the element value in the ith row and jth column of the adjacency matrix A;
if the condition holds, the node set V[v_i] is kept;
if not, the node set V[v_i] is discarded.
In a preferred embodiment of the present invention, the system further includes a performance metric module, and the performance metric is calculated in the performance metric module as follows:
Figure GDA0003144233580000052
wherein m represents the total number of edges connecting nodes; A_ij represents the element values of the adjacency matrix A; F_ij represents the proportion of any edge connecting the two nodes i and j;
Figure GDA0003144233580000053
wherein deg(v_i) represents the degree of node i; deg(v_j) represents the degree of node j; v_i represents node i; v_j represents node j;
Figure GDA0003144233580000061
and/or the system further includes a Jaccard coefficient module, and the Jaccard coefficient is calculated in the Jaccard coefficient module as follows:
Figure GDA0003144233580000062
wherein V_M is the optimal community structure;
V_0 is the reference vector;
J(V_M, V_0) represents the Jaccard coefficient;
when V_M and V_0 are both empty, J(V_M, V_0) = 1;
and/or the system further includes an Error index module, and the Error index is calculated in the Error index module as follows:
Figure GDA0003144233580000063
wherein V_M' is the structural feature of V_M;
V_0' is the structural feature of V_0;
E(V_M', V_0') represents the Error index;
when V_M has the same community structure as V_0, E(V_M', V_0') is equal to 0;
and any one or any combination of the performance metric value, the Jaccard coefficient value and the Error index value is displayed on the display module.
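The Jaccard coefficient described above can be illustrated with a short sketch. It assumes that V_M and V_0 can each be treated as a plain set of items (for example node identifiers or node pairs), which the text does not spell out, and it uses the usual definition |V_M ∩ V_0| / |V_M ∪ V_0| together with the stated convention that the coefficient is 1 when both are empty. The function name `jaccard` is hypothetical.

```python
def jaccard(found, reference):
    """Jaccard coefficient of two collections, treated as sets.

    Returns 1.0 when both are empty, matching the convention in the text.
    """
    found, reference = set(found), set(reference)
    if not found and not reference:
        return 1.0
    return len(found & reference) / len(found | reference)


# Two of four distinct items are shared, so the coefficient is 0.5.
print(jaccard({1, 2, 3}, {2, 3, 4}))
```

A value of 1 means the discovered structure and the reference coincide; a value of 0 means they share nothing.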
In summary, by adopting the above technical scheme, the invention has the following advantages. First, it accounts for node self-transmission. Second, it takes edge weights fully into account and can effectively reveal the characteristic structure of the whole social network. Third, when handling the multi-scale optimization problem, its optimization function can effectively find the optimal community structure. Finally, it performs better than other methods.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a schematic block diagram of the system of the present invention.
FIG. 2 is a schematic diagram of local distance community partitioning according to the present invention.
Fig. 3 is a schematic diagram comparing different algorithms of the present invention on an artificial network.
FIG. 4 is a schematic diagram illustrating an overview of the community discovery process of the present invention.
Fig. 5 is a schematic view of the visualization of the present invention on different networks.
FIG. 6 is a schematic diagram of community membership in a real data network according to the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
To date, a number of classical and effective local community discovery algorithms and matrix factorization (MF) algorithms have been proposed. Liu et al. propose a local community discovery framework based on node-pair similarity; a new local community discovery algorithm can be obtained by embedding a better node similarity measure into it.
Clauset et al. propose a measure R of local community structure, calculated as follows:
R = B_in / (B_in + B_out)
wherein B is a local community, R measures the local community structure, B_in represents the number of edges whose endpoints are both in local community B, and B_out is the number of edges that have only one endpoint in local community B. The algorithm requires the size of the community to be predefined. It keeps adding to the current community the neighbor node that increases R the most, until the current community reaches the predefined size.
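The greedy procedure described above can be sketched as follows. This is an illustration under assumptions rather than the published algorithm: it takes R = B_in / (B_in + B_out) with B_in and B_out counted over the whole set B as worded in the text, and the names `local_modularity_r`, `grow_community` and the example edge list are hypothetical.

```python
def local_modularity_r(edges, community):
    """R = B_in / (B_in + B_out) for a node set (assumed form of the measure)."""
    community = set(community)
    b_in = b_out = 0
    for u, v in edges:
        inside = (u in community) + (v in community)
        if inside == 2:
            b_in += 1    # both endpoints inside the community
        elif inside == 1:
            b_out += 1   # exactly one endpoint inside the community
    total = b_in + b_out
    return b_in / total if total else 0.0


def grow_community(edges, seed, max_size):
    """Greedily add the neighbor giving the largest R until max_size is reached."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    community = {seed}
    while len(community) < max_size:
        frontier = set().union(*(neighbors.get(n, set()) for n in community)) - community
        if not frontier:
            break
        # sorted() makes tie-breaking deterministic (smallest label wins)
        best = max(sorted(frontier),
                   key=lambda n: local_modularity_r(edges, community | {n}))
        community.add(best)
    return community


# Two triangles joined by the edge (3, 4); illustrative data only.
edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (5, 6), (4, 6)]
print(grow_community(edges, seed=1, max_size=3))
```

The `max_size` argument reflects the drawback noted in the text: the community size has to be chosen in advance.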
Luo et al. propose another local community discovery measure M, calculated as follows:
M = E_in / E_out
wherein M is the local community discovery measure, E_in represents the number of edges inside the community, and E_out represents the number of edges between the community boundary and external nodes. The algorithm provides three heuristic node search methods that partially solve the problem of community discovery in complex networks. However, different thresholds must be set for networks of different sizes.
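A minimal sketch of this ratio, assuming M = E_in / E_out as the symbol definitions suggest. The function name `modularity_m` is hypothetical, and returning infinity when there are no outgoing edges is a choice made for this sketch, not something stated in the text.

```python
def modularity_m(edges, community):
    """Ratio of internal edges to boundary-crossing edges for a node set."""
    community = set(community)
    e_in = sum(1 for u, v in edges if u in community and v in community)
    e_out = sum(1 for u, v in edges if (u in community) != (v in community))
    if e_out == 0:
        return float("inf")  # sketch-level convention for a fully isolated set
    return e_in / e_out


# Two triangles joined by the edge (3, 4); illustrative data only.
edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (5, 6), (4, 6)]
print(modularity_m(edges, {1, 2, 3}))  # 3 internal edges, 1 outgoing edge
```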
These two algorithms have several desirable advantages: they can detect clusters of any shape, do not require the number of clusters to be preset, and can show the center-selection process through a decision graph. However, DPC still has drawbacks. First, the truncation distance has a large impact on the clustering results. Second, manual intervention is required to select suitable cluster centers.
Lancichinetti et al. propose a fitness function F_c to measure the density of nodes within a community. The fitness function is defined as follows:
F_c = k_in^c / (k_in^c + k_out^c)^α
In the formula, k_in^c represents the sum of the internal degrees of community c, k_out^c represents the sum of the external degrees of community c, α represents a resolution parameter that controls the size of the detected communities, and F_c represents the density of nodes within the community. This quality function can effectively measure how closely the nodes in a community are connected, but it cannot make full use of the local information between nodes.
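As a hedged illustration of the fitness function, the sketch below assumes the commonly cited form F_c = k_in / (k_in + k_out)^α, where k_in is the sum of internal degrees (each internal edge counted twice) and k_out is the number of edges leaving the community. The function name `fitness` and the example edges are hypothetical.

```python
def fitness(edges, community, alpha=1.0):
    """Fitness k_in / (k_in + k_out) ** alpha of a node set (assumed form)."""
    community = set(community)
    k_in = k_out = 0
    for u, v in edges:
        u_in, v_in = u in community, v in community
        if u_in and v_in:
            k_in += 2   # an internal edge adds 1 to the internal degree of both endpoints
        elif u_in or v_in:
            k_out += 1  # a crossing edge adds 1 to the external degree
    total = k_in + k_out
    return k_in / total ** alpha if total else 0.0


# Two triangles joined by the edge (3, 4); illustrative data only.
edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (5, 6), (4, 6)]
print(fitness(edges, {1, 2, 3}, alpha=1.0))  # 6 / 7
```

Raising `alpha` above 1 penalizes large total degree more strongly and so favors smaller communities, which is how the resolution parameter controls community size.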
Xu et al. studied how to apply genetic algorithms from computational intelligence to community discovery in directed and undirected networks, and developed an optimization algorithm through iteration. Wang et al. propose a method for discovering overlapping communities using a Bayesian MF model. The advantages of this approach are that the number of communities is determined automatically and there is no resolution limit. However, its internal estimate of the number of communities may mislead the decomposition and return a wrong solution.
Guo et al. use local central nodes and Jaccard coefficients to detect the core members of communities as seeds in the network, thereby ensuring that the selected seeds are the central nodes of their communities. Each time, the node with the greatest degree in the seed is pre-expanded using the fitness function. The top k nodes that perform best in the pre-expansion are then expanded, according to the fitness function and using the internal force between nodes, to obtain high-quality communities in the network.
Chen et al. propose a novel community discovery method that separates overlapping communities from the network using a non-negative matrix factorization (NMF) model, and solves the problem of an unknown number of communities through feature matrix preprocessing and sorting optimization, so that the algorithm can partition network structures whose number of communities is unknown. Hu et al. propose an improved Lagrangian alternating direction algorithm for symmetric non-negative matrix factorization.
Recently, Li et al. proposed a community partitioning method based on semi-supervised matrix factorization and random walks. The transition probabilities between nodes are calculated from the network topology, the final walk probabilities are obtained with a random walk model, and a feature matrix is constructed.
Wu et al. propose a novel framework called Mixed Hypergraph Regularized Non-negative Matrix Factorization (MHGNMF), which takes higher-order information between nodes into account to improve clustering performance. The hypergraph regularization term forces nodes in the same hyperedge to be projected into the same latent subspace, yielding a more discriminative representation. In the proposed framework, topological connectivity information and structural similarity information are exploited by mixing the two kinds of neighbors of each centroid to generate a set of hyperedges.
The local community discovery algorithms above all rely on the topological properties of the network and assume by default that all edges have the same weight. In real data networks, however, the connection strength between entities differs, and these algorithms do not consider node bias, so weights are easily misestimated. To the best of the inventors' knowledge, no other community discovery work combines local distance with a method based on Laplacian matrix decomposition.
To better describe the proposed model, the invention uses the following mathematical definitions:
Definition 1: The network G = (V, E) consists of a node partition set V and an edge set E. The nodes contained in the node partition set V are labeled v_1, v_2, v_3, ..., v_N, where v_p represents node p, p = 1, 2, 3, ..., N, and N represents the total number of nodes in the node partition set. The edges contained in the edge set E indicate which nodes are connected. Thus, if an edge pair ∪_{x≠y}(v_x, v_y) is in the edge set E, with x = 1, 2, 3, ..., N and y = 1, 2, 3, ..., N, then v_x is connected to v_y. The invention handles only undirected graphs, so the edge pair ∪_{x≠y}(v_x, v_y) is equal to the edge pair ∪_{y≠x}(v_y, v_x), where ∪_{y≠x}(v_y, v_x) denotes that v_y is connected to v_x; that is, v_y and v_x are connected to each other. Here ∪_ζ denotes a condition ζ; that is, ∪_{x≠y} denotes the condition x ≠ y and ∪_{y≠x} denotes the condition y ≠ x.
Definition 2: Each network G has an adjacency matrix A. If the network G has N nodes, the adjacency matrix A is an N × N matrix whose entries are 0 or 1. A_pq = 1 if and only if the edge pair ∪_{p≠q}(v_p, v_q) ∈ E, with p = 1, 2, 3, ..., N and q = 1, 2, 3, ..., N; that is, v_p and v_q are connected. From Definition 1, ∪_{p≠q}(v_p, v_q) = ∪_{q≠p}(v_q, v_p), so the adjacency matrix A is a symmetric matrix.
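Definitions 1 and 2 can be made concrete with a short sketch that builds the symmetric 0/1 adjacency matrix from a list of undirected edge pairs. The function name `adjacency_matrix` and the example edges are hypothetical; nodes are assumed to be labeled 1 to N as in Definition 1.

```python
def adjacency_matrix(n, edges):
    """Build the N x N adjacency matrix A of an undirected graph.

    Nodes are labeled 1..n; each edge is an unordered pair (x, y) with x != y.
    """
    a = [[0] * n for _ in range(n)]
    for x, y in edges:
        if x == y:
            continue  # Definition 1 only admits pairs with x != y
        a[x - 1][y - 1] = 1
        a[y - 1][x - 1] = 1  # undirected: (v_x, v_y) equals (v_y, v_x)
    return a


# Path v_1 - v_2 - v_3 - v_4; illustrative data only.
A = adjacency_matrix(4, [(1, 2), (2, 3), (3, 4)])
for row in A:
    print(row)
```

Setting both `a[x-1][y-1]` and `a[y-1][x-1]` is what makes A symmetric, as Definition 2 requires.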
Definition 3: Community discovery on the network G = (V, E) divides the node partition set V into node sets V_1, V_2, V_3, ..., V_K such that V_1 ∪ V_2 ∪ V_3 ∪ ... ∪ V_K is equal to V and none of V_1, V_2, V_3, ..., V_K is the empty set. That is, the node sets V_1, V_2, V_3, ..., V_K are the community structure. The invention defines a partition as V = {V_1, V_2, V_3, ..., V_K}. The number of partitions is K = <V>, where <V> represents the number of node sets in the node partition set V.
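The two conditions in Definition 3 (the node sets cover V and none of them is empty) can be checked mechanically. The sketch below is illustrative only; the function name `is_community_structure` is hypothetical, and it deliberately does not require the sets to be disjoint because Definition 3 does not state that requirement.

```python
def is_community_structure(nodes, partition):
    """Check Definition 3: no node set is empty and their union equals V."""
    nodes = set(nodes)
    parts = [set(p) for p in partition]
    if any(not p for p in parts):
        return False  # an empty node set is not allowed
    return set().union(*parts) == nodes


nodes = [1, 2, 3, 4, 5]
partition = [{1, 2}, {3, 4, 5}]
print(is_community_structure(nodes, partition))  # True
print(len(partition))                            # K = <V> = 2
```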
Definition 4: Given a network G = (V, E) and a node partition set V = {V_1, V_2, V_3, ..., V_K}, the edges of the network G can be divided into edge sets E_mn, i.e. E_mn ∈ E,
Figure GDA0003144233580000101
and
Figure GDA0003144233580000102
if and only if
Figure GDA0003144233580000103
and
Figure GDA0003144233580000104
there is an edge pair
Figure GDA0003144233580000105
Figure GDA0003144233580000106
Definition 5: In particular, define
Figure GDA0003144233580000107
and
Figure GDA0003144233580000108
In other words, the internal edge set
Figure GDA0003144233580000109
contains the edges inside the node set V_k: the two nodes of any edge pair in the internal edge set
Figure GDA00031442335800001010
belong to the same community; and the external edge set
Figure GDA00031442335800001011
contains the edges external to V_k: for any edge pair in the external edge set
Figure GDA00031442335800001012
one node is in the node set V_k, while the other node does not belong to the node set V_k but to the node set V - V_k.
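The internal and external edge sets of Definition 5 can be sketched as a simple split of the edge list. The formula images are not reproduced here, so this follows only the verbal description: an internal edge has both endpoints in V_k, and an external edge has exactly one endpoint in V_k. The function name `split_edges` and the example data are hypothetical.

```python
def split_edges(edges, community):
    """Return (internal, external) edge lists for the node set V_k."""
    community = set(community)
    internal, external = [], []
    for x, y in edges:
        x_in, y_in = x in community, y in community
        if x_in and y_in:
            internal.append((x, y))   # both endpoints in V_k
        elif x_in != y_in:
            external.append((x, y))   # one endpoint in V_k, the other in V - V_k
    return internal, external


# Two triangles joined by the edge (3, 4); illustrative data only.
edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (5, 6), (4, 6)]
internal, external = split_edges(edges, {1, 2, 3})
print(internal)
print(external)
```

Edges with neither endpoint in V_k fall into neither list, since they belong to other node sets.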
The invention provides a social network community discovery system based on a local distance and node rank optimization function, which comprises a data acquisition module, a Laplacian node matrix calculation module, a network social node value calculation and discovery module, a community optimization module and a display module;
the data output end of the data acquisition module is connected to the data input end of the Laplacian node matrix calculation module, the data output end of the Laplacian node matrix calculation module is connected to the data input end of the network social node value calculation and discovery module, the data output end of the network social node value calculation and discovery module is connected to the data input end of the community optimization module, and the data output end of the community optimization module is connected to the display data end of the display module;
the data acquisition module is used for acquiring a network social node data set;
the Laplacian node matrix calculation module is used for performing Laplacian normalization on the network social node data set acquired by the data acquisition module to obtain a Laplacian node matrix;
the network social node value calculation and discovery module is used for calculating a network social node value from the internal distance and the external distance of the social network:
if the network social node value is greater than or equal to the preset network social node value, a network social community is discovered;
if the network social node value is smaller than the preset network social node value, the network social community is rediscovered;
the community optimization module is used for optimizing the network social communities discovered by the network social node value calculation and discovery module;
and the display module is used for displaying the network social communities obtained by the community optimization module and/or the network social node value calculation and discovery module.
In a preferred embodiment of the present invention, the Laplacian normalization of the acquired network social nodes in the Laplacian node matrix calculation module is calculated as follows:
Figure GDA0003144233580000111
wherein D represents the node degree matrix;
Figure GDA0003144233580000112
represents the unnormalized Laplacian matrix;
A represents the adjacency matrix.
In a preferred embodiment of the present invention, the element values in the Laplacian node matrix calculation module are calculated as follows:
Figure GDA0003144233580000113
wherein deg(v_i) represents the degree of node i;
deg(v_j) represents the degree of node j;
v_i represents node i;
v_j represents node j;
Figure GDA0003144233580000121
represents the element value in the ith row and jth column of the Laplacian node matrix.
In a preferred embodiment of the present invention, the internal distance of the social network is calculated in the network social node value calculation and discovery module as follows:
Figure GDA0003144233580000122
wherein L_sym represents the Laplacian node matrix;
Figure GDA0003144233580000126
represents the adjacency matrix of the node set V_k;
G represents the social network;
V_k represents a node set, k = 1, 2, 3, ..., K;
d_internal(G, V_k) represents the internal distance of the social network.
In a preferred embodiment of the present invention, the external distance of the social network is calculated in the network social node value calculation and discovery module as follows:
Figure GDA0003144233580000123
wherein L_sym represents the Laplacian node matrix;
Figure GDA0003144233580000124
represents the adjacency matrix of V - V_k;
Figure GDA0003144233580000125
represents the adjacency matrix of the node set V_k;
V represents the node partition set, V = {V_1, V_2, V_3, ..., V_K};
G represents the social network;
V_k represents a node set, k = 1, 2, 3, ..., K;
d_external(G, V_k) represents the external distance of the social network.
In a preferred embodiment of the present invention, the network social node value is calculated in the network social node value calculation and discovery module as follows:
Figure GDA0003144233580000131
wherein V_k represents a node set, k = 1, 2, 3, ..., K;
V represents the node partition set, V = {V_1, V_2, V_3, ..., V_K};
d_internal(G, V_k) represents the internal distance of the social network;
d_external(G, V_k) represents the external distance of the social network;
S_LDL(G, V) represents the network social node value.
In a preferred embodiment of the present invention, the adjacency matrix of the node set V_k
Figure GDA0003144233580000132
is calculated in the network social node value calculation and discovery module as follows:
Figure GDA0003144233580000133
wherein V_k represents a node set, k = 1, 2, 3, ..., K;
V represents the node partition set, V = {V_1, V_2, V_3, ..., V_K};
v_x represents node x, x = 1, 2, 3, ..., N;
v_y represents node y, y = 1, 2, 3, ..., N.
In a preferred embodiment of the present invention, the adjacency matrix of the node set V - V_k
Figure GDA0003144233580000134
is calculated in the network social node value calculation and discovery module as follows:
Figure GDA0003144233580000135
wherein V_k represents a node set, k = 1, 2, 3, ..., K;
V represents the node partition set, V = {V_1, V_2, V_3, ..., V_K};
v_x represents node x, x = 1, 2, 3, ..., N;
v_y represents node y, y = 1, 2, 3, ..., N.
In a preferred embodiment of the present invention, the community optimization module optimizes the discovered network social communities as follows:
Figure GDA0003144233580000136
wherein V_k represents a node set, k = 1, 2, 3, ..., K;
V represents the node partition set, V = {V_1, V_2, V_3, ..., V_K};
v_i represents node i;
indicates that in the case of … …, there is … …;
V[v_i] indicates the node set to which node i belongs;
v_j represents node j;
A_ij represents the element value in the ith row and jth column of the adjacency matrix A;
if the condition holds, the node set V[v_i] is kept;
if not, the node set V[v_i] is discarded.
In a preferred embodiment of the present invention, the system further includes a performance metric module, and the method for calculating the performance metric in the performance metric module includes:
Figure GDA0003144233580000141
wherein m represents the total number of edges connecting nodes; A_ij represents the element values of the adjacency matrix A; F_ij represents the proportion of any edge connecting the two nodes i and j;
Figure GDA0003144233580000142
wherein deg(v_i) represents the degree of node i; deg(v_j) represents the degree of node j; v_i represents node i; v_j represents node j;
Figure GDA0003144233580000143
and/or the system further includes a Jaccard coefficient module, in which the Jaccard coefficient is calculated as follows:
Figure GDA0003144233580000144
wherein V_M is the optimal community structure;
V_0 is the reference vector;
J(V_M, V_0) represents the Jaccard coefficient;
when V_M and V_0 are both empty, J(V_M, V_0) = 1;
and/or the system further includes an Error index module, in which the Error index is calculated as follows:
Figure GDA0003144233580000151
wherein V_M' is the structural feature of V;
V_0' is the structural feature of V_0;
E(V_M', V_0') represents the Error index;
when V has the same community structure as V_0, E(V_M', V_0') is equal to 0;
and one of the performance metric value, the Jaccard coefficient value and the Error index value, or any combination of them, is displayed on the display module.
As shown in fig. 2, the entire network G is divided into 5 partitions, i.e. V = {V_1, V_2, V_3, V_4, V_5}, and the figure briefly indicates, for partition V_1, the internal edge set
Figure GDA0003144233580000152
and the external edge set
Figure GDA0003144233580000153
Community discovery is to find a node partition set V = {V_1, V_2, V_3, ..., V_K} of a network G(V, E). To form a community, the nodes contained in each cluster must be related to each other in some way and not to nodes outside the cluster. First, to address the problem of node information self-propagation, the invention takes into account the influence of each node on itself, introduces a self-degree matrix, and constructs the following model using the Laplacian matrix decomposition principle:
Figure GDA0003144233580000154
wherein D represents the node degree matrix;
Figure GDA0003144233580000155
represents the unnormalized Laplacian matrix; A represents the adjacency matrix; I_n represents the n-order identity matrix; L_sym is the Laplacian node matrix.
To extract the network features completely, the edge weights must be fully taken into account. The matrix is obtained by normalizing the adjacency matrix: the adjacency matrix is multiplied on both sides by the inverse of the square root of the node degrees. For a single node, normalization means dividing by the degree of that node, so that the information passed along each adjacent edge is normalized. A node with 10 edges then does not have a larger influence than a node with 1 edge, because after normalization each of its edges carries a weight of only 0.1. Lifting this operation from a single node to a two-dimensional matrix means inverting the degree matrix; by the nature of the matrix inverse, multiplying by that inverse performs the normalization as a matrix division. The left and right factors use the square roots of the degrees of nodes i and j respectively, that is, the degrees of the two endpoints of an edge. For each node pair v_i, v_j, the elements of the matrix are given by the following equation:
Figure GDA0003144233580000161
wherein deg(v_i) represents the degree of node i and deg(v_j) represents the degree of node j, i.e. the values of the degree matrix at nodes i and j; v_i represents node i; v_j represents node j;
Figure GDA0003144233580000162
represents the element in row i and column j of the Laplacian node matrix.
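The normalization described above can be sketched in a few lines. The equation images are not recoverable here, so this is only an illustration under stated assumptions: each entry is taken to be A_ij / sqrt(deg(v_i) * deg(v_j)), and the degrees are counted after a self-loop (A_xx = 1) has been added to every node. The function name `normalized_node_matrix` is hypothetical.

```python
import math

def normalized_node_matrix(adj):
    """Symmetrically normalize an adjacency matrix with self-loops.

    Assumption (not confirmed by the source): entry (i, j) equals
    A_ij / sqrt(deg(v_i) * deg(v_j)), with degrees taken after adding
    a self-loop to every node.
    """
    n = len(adj)
    # Copy the matrix and set A_xx = 1 so that every node has a self-loop.
    a = [[1.0 if i == j else float(adj[i][j]) for j in range(n)]
         for i in range(n)]
    deg = [sum(row) for row in a]  # node degrees, self-loop included
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

# Path graph 0 - 1 - 2: degrees with self-loops are 2, 3, 2.
path = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]
l_sym = normalized_node_matrix(path)
```

In this toy graph the middle node has more edges than the end nodes, so each of its entries is scaled down more strongly, which is the effect the paragraph above describes for a node with 10 edges versus a node with 1 edge.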
The inner and outer distances are given by:
Figure GDA0003144233580000163
Figure GDA0003144233580000164
wherein L_sym represents the Laplacian node matrix;
Figure GDA0003144233580000165
represents the adjacency matrix of V_k;
Figure GDA0003144233580000166
represents the adjacency matrix of V - V_k; d_internal(G, V_k) represents the internal distance of the social network; d_external(G, V_k) represents the external distance of the social network; d_external(G, V_k) can be written as d_e(G, V_k) or d_e, and d_internal(G, V_k) can be written as d_i(G, V_k) or d_i.
Figure GDA0003144233580000167
Figure GDA0003144233580000168
In equation (8), when node x and node y both belong to the node set V_k (a node set is also called a community), the element in row x and column y of the adjacency matrix of V_k,
Figure GDA0003144233580000169
is 1; when node x belongs to the node set V_k and node y belongs to the node set V - V_k, the element in row x and column y of the adjacency matrix of V_k,
Figure GDA00031442335800001610
is 0. Similarly, in equation (9), when node x and node y both belong to the node set V_k, the element in row x and column y of the adjacency matrix of V - V_k,
Figure GDA00031442335800001611
is 0; when node x belongs to the node set V_k and node y belongs to the node set V - V_k, the element in row x and column y of the adjacency matrix of V - V_k,
Figure GDA00031442335800001612
is 1.
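The case analysis for equations (8) and (9) can be written as two 0/1 masks. The sketch below implements only the cases stated in the text; node pairs that the text does not describe (for example, x outside V_k) are set to 0, which is an assumption. The name `community_masks` is hypothetical.

```python
def community_masks(n, community):
    """Return (internal, external) 0/1 matrices for a node set V_k.

    internal[x][y] = 1 when x and y both belong to V_k.
    external[x][y] = 1 when x belongs to V_k and y belongs to V - V_k.
    Assumption: every case not described in the text is set to 0.
    """
    members = set(community)
    internal = [[1 if (x in members and y in members) else 0
                 for y in range(n)] for x in range(n)]
    external = [[1 if (x in members and y not in members) else 0
                 for y in range(n)] for x in range(n)]
    return internal, external

# Four nodes, with V_k = {0, 1} and V - V_k = {2, 3}.
internal, external = community_masks(4, [0, 1])
```

Multiplying such a mask element-wise with a normalized node matrix would select the entries inside a community or crossing its boundary; whether the patented distances combine the matrices in exactly this way cannot be confirmed from the text.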
For all v_x ∈ V, A_xx = 1 (i.e., each node has a self-loop). All edges except the self-loops are counted twice. d_internal(G, V_k) takes values in [0, 1]. The case in which the network G is a union of mutually disconnected communities is a perfect community structure graph. d_external(G, V_k) also takes values in [0, 1] (for a perfect community structure graph, its value is 0).
The local distance Laplace network social node value function is as follows:
Figure GDA0003144233580000171
wherein V_k represents a node set; V represents the node partition set; d_internal(G, V_k) represents the internal distance of the social network; d_external(G, V_k) represents the external distance of the social network; S_LDL(G, V) represents the network social node value.
One point to emphasize for the LDL model is that the weight of each local partition (local internal distance plus local external distance) is |V_k| / (2|V|). This prevents smaller communities from having a disproportionate impact on the total community score.
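As a small worked example of the weighting, the sketch below computes |V_k| / (2|V|) for each community of a partition. It assumes that |V| is the total number of nodes in the network; the function name `partition_weights` is hypothetical, and the distances themselves are not computed here.

```python
def partition_weights(partition):
    """Weight |V_k| / (2 |V|) for every community in a partition.

    Assumption: |V| is the total number of nodes over all communities.
    """
    total = sum(len(community) for community in partition)
    return [len(community) / (2 * total) for community in partition]

# One community of 4 nodes and two communities of 2 nodes (8 nodes in total).
weights = partition_weights([[0, 1, 2, 3], [4, 5], [6, 7]])
# weights == [0.25, 0.125, 0.125]
```

The weights are proportional to community size and sum to 1/2 over a full partition, so a two-node community contributes half as much as a four-node one.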
3.3 Node Rank Optimization Function
The community discovery algorithm proposed by the invention can generate more than one possible community discovery result, so the discovered communities need to be optimized. The optimal community selection method of the invention is based on the idea of community discovery validity: a community should have more edges inside it than outside it. The weak criterion (WRC) and the strong criterion (SRC) were first proposed by Radicchi et al., but the WRC is too weak and cannot discriminate between individual nodes. Even if a node u is completely disconnected from V_k, u can be added to V_k and the WRC is still satisfied. This can invalidate many discovered communities. Therefore, the invention provides a node rank optimization function, as follows:
Figure GDA0003144233580000172
wherein,
Figure GDA0003144233580000173
that is,
Figure GDA0003144233580000174
Phi represents the node set to which node j belongs; A_ij represents the element in row i and column j of the adjacency matrix A; V[i] denotes the node set to which node i belongs; that is, in the case of … …, there is … ….
Figure GDA0003144233580000181
wherein v_i represents node i and v_y represents node y.
Thus, the NRO function is expressed as follows:
Figure GDA0003144233580000182
wherein V_k represents a node set and V represents the node partition set; v_i represents node i; indicates that in the case of … …, there is … …; V[v_i] denotes the node set to which node i belongs; v_j represents node j; A_ij represents the element in row i and column j of the adjacency matrix A.
If the condition holds, the node set V[v_i] is kept;
if not, the node set V[v_i] is discarded.
Because the two coordination parameters V[i] and V - V[i] are set, the criterion is stronger than the WRC and weaker than the SRC, and thus achieves a better optimization effect.
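For orientation, the two reference criteria that the NRO function sits between can be sketched as follows. This is not the NRO function itself, whose formula is not recoverable from the text; it shows the standard weak and strong community definitions of Radicchi et al. (in the strong sense every member has more neighbors inside than outside; in the weak sense only the sums are compared). All function names are hypothetical.

```python
def adjacency(n, edges):
    """Build a symmetric 0/1 adjacency matrix from an edge list."""
    a = [[0] * n for _ in range(n)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1
    return a

def in_out_degree(adj, node, community):
    """Count the neighbors of `node` inside and outside `community`."""
    members = set(community)
    k_in = sum(adj[node][j] for j in members if j != node)
    k_out = sum(adj[node][j] for j in range(len(adj)) if j not in members)
    return k_in, k_out

def is_strong(adj, community):
    """SRC: every member has more neighbors inside than outside."""
    return all(k_in > k_out
               for k_in, k_out in (in_out_degree(adj, v, community)
                                   for v in community))

def is_weak(adj, community):
    """WRC: total inside degree exceeds total outside degree."""
    pairs = [in_out_degree(adj, v, community) for v in community]
    return sum(p[0] for p in pairs) > sum(p[1] for p in pairs)

# Triangle 0-1-2; node 2 also links to 3, 4 and 5; node 6 is isolated.
adj = adjacency(7, [(0, 1), (0, 2), (1, 2), (2, 3), (2, 4), (2, 5)])
weak_triangle = is_weak(adj, [0, 1, 2])           # True
strong_triangle = is_strong(adj, [0, 1, 2])       # False: node 2 has 2 in, 3 out
weak_with_isolated = is_weak(adj, [0, 1, 2, 6])   # True
```

The last call reproduces the weakness noted above: adding the completely disconnected node 6 to the triangle leaves the weak criterion satisfied.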
The main flow of the algorithm provided by the invention is as follows:
Figure GDA0003144233580000183
Figure GDA0003144233580000191
4 Experimental Results and Analysis
To evaluate the proposed algorithm, the invention uses eleven real data networks and artificial network datasets. The data sources are http://www-personal.umich.edu/mejn/Netdata/ and http://snap.stanford.edu/data/. The experimental environment was as follows: Intel(R) Core(TM) i5-4160M CPU, 3.60 GHz, 4 GB memory, Windows 10, MATLAB R2019a.
4.1 Evaluation Indexes
In the experiments, Q is used as a performance metric to evaluate performance on networks that have no ground-truth community structure. The performance metric Q is:
Figure GDA0003144233580000192
wherein m represents the total number of edges connecting nodes; A_ij represents the element values of the adjacency matrix A; F_ij represents the proportion of any edge connecting the two nodes i and j;
wherein
Figure GDA0003144233580000193
deg(v_i) represents the degree of node i; deg(v_j) represents the degree of node j; v_i represents node i; v_j represents node j.
δ_ij(c_i, c_j) is expressed as follows:
Figure GDA0003144233580000201
wherein c_i is the community to which vertex i is assigned and c_j is the community to which vertex j is assigned.
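The symbols listed above match the standard Newman modularity, so the metric can be sketched as below. This assumes that Q = (1 / 2m) * sum over i, j of (A_ij - F_ij) * δ(c_i, c_j) with F_ij = deg(v_i) * deg(v_j) / (2m); the equation images themselves are not recoverable, so the exact form used in the patent may differ. The function name `modularity` is hypothetical.

```python
def modularity(adj, labels):
    """Newman modularity of a labelled partition (assumed form of Q).

    adj    -- symmetric 0/1 adjacency matrix without self-loops
    labels -- labels[i] is the community c_i of node i
    """
    n = len(adj)
    deg = [sum(row) for row in adj]
    two_m = sum(deg)  # every edge is counted twice
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:  # delta(c_i, c_j) = 1
                q += adj[i][j] - deg[i] * deg[j] / two_m
    return q / two_m

# Two triangles (0-1-2 and 3-4-5) joined by the single edge 2-3, so m = 7.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = [[0] * 6 for _ in range(6)]
for i, j in edges:
    adj[i][j] = adj[j][i] = 1
q = modularity(adj, [0, 0, 0, 1, 1, 1])  # equals 5/14, about 0.357
```

Putting all six nodes into one community gives Q = 0, which is why Q is useful for comparing partitions when no reference structure is available.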
The Jaccard coefficient (Jaccard similarity coefficient, JSC) is used to compare the similarity and difference between finite sample sets.
Given two sets V_M and V_0, the Jaccard coefficient is defined as the ratio of the size of the intersection of V_M and V_0 to the size of their union; a larger value indicates a higher degree of similarity.
Figure GDA0003144233580000202
wherein V_M is the optimal community structure and V_0 is the reference vector; when V_M and V_0 are both empty, J(V_M, V_0) = 1.
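The definition above translates directly into code. The sketch works on plain Python sets; how the patent encodes a community structure V_M and a reference vector V_0 as sets is not stated, so that mapping is left open. The function name `jaccard` is hypothetical.

```python
def jaccard(v_m, v_0):
    """Jaccard coefficient: |intersection| / |union| of two finite sets."""
    a, b = set(v_m), set(v_0)
    if not a and not b:
        return 1.0  # both sets empty: defined as 1 in the text
    return len(a & b) / len(a | b)

score = jaccard({1, 2, 3}, {2, 3, 4})  # 2 shared elements out of 4 -> 0.5
```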
The larger the Rand index (RI), the more consistent the community discovery result is with the real situation. A larger RI also indicates more accurate clustering and higher purity within each class.
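The text gives no formula for the RI. Assuming it refers to the standard Rand index, it is the fraction of node pairs on which two labelings agree, that is, pairs placed together in both or apart in both. A minimal pair-counting sketch, with the hypothetical name `rand_index`:

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Standard Rand index of two labelings of the same nodes."""
    pairs = list(combinations(range(len(labels_a)), 2))
    # A pair agrees when both labelings group it together or both split it.
    agree = sum((labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
                for i, j in pairs)
    return agree / len(pairs)

# Four nodes: the labelings agree on 3 of the 6 node pairs.
ri = rand_index([0, 0, 1, 1], [0, 0, 0, 1])  # 0.5
```

Identical labelings give RI = 1, the value the evaluation treats as a perfect match with the reference communities.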
Error index: when V has the same community structure as V_0, E(V_M', V_0') is equal to 0. It is defined as follows:
Figure GDA0003144233580000203
wherein V_M' is the structural feature of V and V_0' is the structural feature of V_0.
4.2 Performance Comparison on Artificial Networks
The invention runs the algorithm on an artificial data network (the GN benchmark network). For each node, the internal edge set E_internal connects it to other nodes in the same community, and the external edge set E_external connects it to other communities. As the external edge set E_external grows, the community structure becomes less clear and the community discovery task becomes more challenging.
TABLE 1 Artificial network parameters
Figure GDA0003144233580000204
Fig. 3 shows the performance comparison of 8 algorithms on an artificial data network. The proposed LDL algorithm was analyzed experimentally on various datasets of artificial and real networks and compared with the conventional algorithms LinkLPA, MFM, LFK, NMF, LRLFP, speClust1 and speClust2.
Fig. 3(a) shows the performance of the 8 algorithms on the Jaccard coefficient. The larger the external edge probability, the lower the Jaccard coefficient. When the external edge probability is less than 0.4, the proposed LDL algorithm has a clear advantage; above 0.4, its Jaccard coefficient is slightly lower than that of the other algorithms, but always higher than that of the LinkLPA algorithm.
Fig. 3(b) shows the performance of the algorithms on the Rand index. The overall trend of each algorithm is similar to that in Fig. 3(a): the Rand index gradually decreases as the external edge probability increases. The proposed LDL algorithm and LinkLPA have significant advantages over the other algorithms, and when the external edge probability is less than 0.4, the LDL algorithm is better than the LinkLPA algorithm.
Fig. 3(c) shows that the algorithms do not differ much on the Modularity index, but the LDL algorithm remains ahead throughout.
The Error values of the algorithms in Fig. 3(d) differ significantly. The Error value of the LDL algorithm is the lowest when the external edge probability is less than 0.8, and the LDL algorithm is only 3% worse than the MFM algorithm when the probability is greater than 0.8. In conclusion, the proposed LDL algorithm is better and more stable than the other 7 algorithms.
4.3 Performance Comparison on Real Networks
To further evaluate the proposed LDL algorithm, eleven representative social networks of different sizes were selected. In Table 2, Networks denotes the real data network, Node the number of nodes, Edge the number of edges, a-co the average clustering coefficient of the nodes, a-Lenth the average path length, and Description the practical meaning of the network. As shown in Table 2:
TABLE 2 Real networks
Figure GDA0003144233580000211
Figure GDA0003144233580000221
To better illustrate the overall social network community discovery process, Figs. 4(a) to 4(i) give a brief overview of the process, taking the power grid network as an example. There are 9 subgraphs in total, i.e. 9 communities are finally formed.
As shown in Fig. 4(a), the first community structure is divided (green cut set); next, the second community structure is divided (purple cut set), as shown in Fig. 4(b); then the third community structure is divided, as shown in Fig. 4(c); and so on, until the ninth community is divided and the convergence criterion is reached, i.e. every node is contained in some community, as shown in Fig. 4(i).
The divided social networks have clear community structures. Figs. 5(a) to 5(d) show the visualized community discovery results of the proposed LDL algorithm on 4 social networks: Dolphins, Lemis, Celegansnertal and Netscience. The LDL algorithm achieves high recognition quality in large-scale data networks (as shown in Table 2); the higher the degree and the average clustering coefficient of a node, the more prominent it is in the visualization and the more easily it becomes a community center around which a community structure forms.
Table 4 shows the results of the proposed LDL algorithm and the conventional algorithms on the Jaccard index for the real datasets. Bold values in the table indicate the best-performing algorithm, and gray-shaded values indicate the second-best algorithm.
Table 4 Results of the proposed LDL algorithm and the conventional algorithms on the Jaccard index (real datasets)
Network         LinkLPA    MFM        LDL        NMF        LRLFP      LFK        speClust1  speClust2
Karate          0.5        0.7375     0.6507     0.5882     0.325      0.6052     0.5593     0.2852
Dolphins        0.1035     0.1918     0.2131     0.1877     0.0541     0.2118     0.2161     0.2136
Lemis           0.4112     0.4793     0.6524     0.2844     0.2410     0.6276     0.4159     0.1972
Public book     0.3403     0.6671     0.6440     0.6512     0.0551     0.3951     0.6749     0.6951
Football        0.7147     0.6357     0.4052     0.8413     0.6920     0.0798     0.0798     0.07798
Celegansnertal  0.3445     0.2151     0.4804     0.343      0.0681     0.3551     0.2150     0.2151
Email           0.2599     0.0460     0.2085     0.1912     0.1251     0.0462     0.0467     0.0467
Public blogs    0.3112     0.5167     0.5690     0.5426     0.0162     0.4027     0.4120     0.4998
Netscience      0.2186     0.1332     0.2213     0.1780     0.0841     0.1464     0.0239     0.0100
Power           0.1603     0.0168     0.2240     0.2092     0.0048     0.0023     0.1371     0.0285
Hep_th          0.1524     0.1912     0.2203     0.1036     0.2015     0.1242     0.2003     0.0972
As shown in Table 4, the proposed LDL algorithm performs best on the Lemis, Celegansnertal, Public blogs, Netscience, Power and Hep_th data networks, outperforming the other 7 algorithms. It is suboptimal on the Karate, Dolphins and Email data networks, second only to the MFM, speClust1 and LinkLPA algorithms respectively, but better than the other 6 algorithms. On the Public book and Football data networks, the LDL algorithm performs moderately, better than some of the other algorithms.
Table 5 Results of the proposed LDL algorithm and the conventional algorithms on the Rand index (real datasets)
Network         LinkLPA    MFM        LDL        NMF        LRLFP      LFK        speClust1  speClust2
Karate          0.8503     0.9251     0.8574     0.9037     0.80214    0.8396     0.85       0.2852
Dolphins        0.7536     0.2523     0.7612     0.7536     0.7721     0.2523     0.2517     0.2136
Lemis           0.8835     0.8558     0.8914     0.73       0.8445     0.8168     0.4686     0.1972
Public book     0.7304     0.8419     0.7132     0.8377     0.639      0.3951     0.8432     0.395
Football        0.978      0.9682     0.8874     0.76       0.95       0.0781     0.08       0.0798
Celegansnertal  0.8281     0.2151     0.8468     0.7717     0.795      0.2151     0.2151     0.2151
Email           0.9107     0.0866     0.9233     0.9131     0.95       0.08       0.045      0.045
Public blogs    0.6779     0.7542     0.7799     0.7545     0.5085     0.5007     0.4998     0.4998
Netscience      0.9872     0.9879     0.9935     0.9881     0.99       0.9903     0.7792     0.01
Power           0.964      0.9599     0.976      0.9656     0.9668     0.8995     0.9394     0.9201
Hep_th          0.8674     0.9015     0.9395     0.9041     0.9192     0.803      0.9077     0.9234
As shown in Table 5, the proposed LDL algorithm performs best on the Lemis, Celegansnertal, Public blogs, Netscience, Power and Hep_th data networks, outperforming the other 7 algorithms. It is suboptimal on the Dolphins and Email data networks, second only to the LRLFP algorithm, but better than the other 6 algorithms. On the Karate, Public book and Football data networks, the LDL algorithm performs moderately, slightly better than some of the other algorithms. For example, on the Karate data network it outperforms 5 algorithms: LFK, speClust1, LinkLPA, LRLFP and speClust2.
Table 6 Results of the proposed LDL algorithm and the conventional algorithms on the Modularity index (real datasets)
Network         LinkLPA    MFM        LDL        NMF        LRLFP      LFK        speClust1  speClust2
Karate          0.4427     0.4477     0.4347     0.4459     0.3663     0.4343     0.4116     0.1545
Dolphins        0.46       0.0108     0.4709     0.4486     0.4022     0.01080    0.0054     0.1299
Lemis           0.5882     0.5768     0.5772     0.4849     0.5298     0.5632     0.1088     0.2034
Public book     0.5531     0.5091     0.5196     0.5182     0.4117     0.4065     0.4595     0.4209
Football        0.6189     0.5423     0.6092     0.6236     0.6171     0.1075     0.5933     0.5753
Celegansnertal  0.433      0.4378     0.4521     0.3761     0.1722     0.2035     0.0092     0.3874
Email           0.6178     0.0381     0.5008     0.6547     0.6547     0.3292     0.1002     0.3048
Public blogs    0.3007     0.3431     0.3967     0.367      0.1864     0.1155     0.0087     0.2133
Netscience      0.8085     0.8118     0.872      0.8238     0.8238     0.8011     0.2062     0.73
Power           0.5826     0.531      0.6438     0.6241     0.5471     0.5289     0.6207     0.5227
Hep_th          0.6021     0.6754     0.7181     0.501      0.6503     0.685      0.6821     0.5431
As shown in Table 6, the proposed LDL algorithm performs best on the Dolphins, Celegansnertal, Public blogs, Netscience, Power and Hep_th data networks, outperforming the other 7 algorithms. It is suboptimal on the Lemis and Public book data networks, second only to the LinkLPA algorithm, but better than the other 6 algorithms. On the Karate, Football and Email data networks, the LDL algorithm performs moderately, slightly better than some of the other algorithms. Taking the Football data network as an example, it outperforms 4 algorithms: MFM, LFK, speClust1 and speClust2.
Table 7 Results of the proposed LDL algorithm and the conventional algorithms on the Error index (real datasets)
Figure GDA0003144233580000241
Figure GDA0003144233580000251
As shown in Table 7, the proposed LDL algorithm has a more pronounced advantage on the Error index: it is optimal on ten data networks (Karate, Dolphins, Lemis, Public book, Celegansnertal, Email, Public blogs, Netscience, Power and Hep_th) and only slightly worse than the LRLFP algorithm on Football. The experimental data indicate that the algorithm is more stable.
In summary, although the proposed LDL algorithm does not perform optimally on every data network, its share of top rankings (optimal plus suboptimal) is much higher than that of the other algorithms. The LDL algorithm performs better on social networks with a higher average clustering coefficient and a more complex data network, and is therefore better suited to the large scale and complexity of modern social networks.
Figs. 6(a) to 6(e) compare the community structures found by the LDL algorithm with the reference community structures on 5 real data networks: Karate, Lemis, Celegansnertal, Public blogs and Power grid. The abscissa represents the node number and the ordinate represents the community membership, i.e. the community to which the node belongs; blue is the reference community structure and red is the community structure found by the LDL algorithm. The more similar the community structure produced by the algorithm is to the reference community structure, the higher the score.
Table 8 summarizes the performance of the LDL algorithm and the conventional algorithms of Tables 4 to 7 on each index, using the loss function constructed by the invention, Y = log10((X_1 + X_2 + X_3) / (X_4 + 1)). X_1 represents the coefficient of variation on the Jaccard index; X_2 represents the coefficient of variation on the Rand index; X_3 represents the coefficient of variation on the Modularity index; X_4 represents the coefficient of variation on the Error index; Y represents the constructed loss function.
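The loss function can be evaluated with a few lines of code. The sketch below follows the formula for Y as stated; for the coefficient-of-variation score it uses mean divided by standard deviation, which is how the source describes the score in its discussion of Table 8. The choice of population standard deviation (`statistics.pstdev`) is an assumption, and both function names are hypothetical.

```python
import math
import statistics

def variation_score(values):
    """Coefficient-of-variation score: mean / standard deviation.

    Assumption: population standard deviation. The values must not all
    be equal, otherwise the standard deviation is zero.
    """
    return statistics.mean(values) / statistics.pstdev(values)

def loss(x1, x2, x3, x4):
    """Y = log10((X1 + X2 + X3) / (X4 + 1))."""
    return math.log10((x1 + x2 + x3) / (x4 + 1))

cv = variation_score([1.0, 2.0, 3.0])  # 2 / sqrt(2/3), about 2.449
y = loss(4.0, 5.0, 1.0, 0.0)           # log10(10 / 1) = 1.0
```

Because X_4 comes from the Error index and appears in the denominator, a smaller X_4 yields a larger Y; the three similarity scores in the numerator raise Y as they grow.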
Table 8 Results of the LDL algorithm and the conventional algorithms on each index (real datasets)
Figure GDA0003144233580000252
Figure GDA0003144233580000261
As shown in Table 8, the performance of the LDL algorithm on each index is analyzed through the mean, the standard deviation, the coefficient of variation and the constructed loss function. First, the proposed LDL algorithm has the highest score (bold values) on both the Jaccard and Rand indexes. On the Jaccard index, the mean of the LDL algorithm over the data networks is the highest, but its standard deviation is higher than that of LinkLPA, which indicates that its performance varies more across data networks than that of LinkLPA; taken together, its coefficient-of-variation score (mean / standard deviation, the higher the better) is still the highest. On the Rand index, the LDL algorithm is superior to the other algorithms in both mean and standard deviation, and its coefficient-of-variation score is clearly higher, by close to 77%, than that of NMF, the best of the other algorithms. Second, the score of the LDL algorithm on the Modularity index is second only to that of LinkLPA, because its performance varies more across data networks than that of LinkLPA. On the Error index, the LDL algorithm performs best, with an Error rate of only 0.0548, far better than the other algorithms; the experimental data show that the proposed algorithm is more robust. Finally, its loss function score is also the highest, approximately 7 percentage points higher than the best conventional method.
While embodiments of the invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.

Claims (7)

1. A social network community discovery system based on local distance and node rank optimization functions is characterized by comprising a data acquisition module, a Laplace node matrix calculation module, a network social node value calculation discovery module, a community optimization module and a display module;
the data output end of the data acquisition module is connected with the data input end of the Laplace node matrix calculation module, the data output end of the Laplace node matrix calculation module is connected with the data input end of the network social node value calculation discovery module, the data output end of the network social node value calculation discovery module is connected with the data input end of the community optimization module, and the data output end of the community optimization module is connected with the display data end of the display module;
the data acquisition module is used for acquiring a network social node data set;
the Laplace node matrix calculation module is used for carrying out Laplace normalization processing on the network social node data set acquired in the data acquisition module; obtaining a Laplace node matrix;
the network social node value calculation discovery module is used for calculating to obtain a network social node value according to the internal distance and the external distance of the network social:
the calculation method of the internal distance comprises the following steps:
Figure FDA0003144233570000011
wherein L_sym represents the Laplacian node matrix;
Figure FDA0003144233570000012
represents the adjacency matrix of the node set V_k;
G represents the social network;
V_k represents a node set, k = 1, 2, 3, ..., K;
d_internal(G, V_k) represents the internal distance of the social network;
the method for calculating the external distance comprises the following steps:
Figure FDA0003144233570000013
wherein L_sym represents the Laplacian node matrix;
Figure FDA0003144233570000021
represents the adjacency matrix of V - V_k;
Figure FDA0003144233570000022
represents the adjacency matrix of the node set V_k;
V represents the node partition set, V = {V_1, V_2, V_3, ..., V_K};
G represents the social network;
V_k represents a node set, k = 1, 2, 3, ..., K;
d_external(G, V_k) represents the external distance of the social network;
the method for calculating the social network node value comprises the following steps:
Figure FDA0003144233570000023
wherein V_k represents a node set, k = 1, 2, 3, ..., K;
V represents the node partition set, V = {V_1, V_2, V_3, ..., V_K};
d_internal(G, V_k) represents the internal distance of the social network;
d_external(G, V_k) represents the external distance of the social network;
S_LDL(G, V) represents the network social node value;
if the network social node value is greater than or equal to the preset network social node value, a network social community is discovered;
if the network social node value is smaller than the preset network social node value, the network social community is rediscovered;
the community optimization module is used for optimizing the social network communities found in the social network node value calculation and discovery module;
and the display module is used for displaying the social networking communities obtained by the community optimization module or/and the social networking node value calculation and discovery module.
2. The social network community discovery system through a local distance and node rank optimization function according to claim 1, wherein the calculation method of performing laplacian normalization processing on the obtained network social nodes in the laplacian node matrix calculation module is as follows:
Figure FDA0003144233570000024
wherein D represents the node degree matrix;
Figure FDA0003144233570000031
represents the unnormalized Laplacian matrix;
A represents the adjacency matrix;
I_n represents the n-order identity matrix.
3. The social network community discovery system through local distance and node rank optimization function according to claim 1, wherein the calculation method of the element values in the laplacian node matrix calculation module is as follows:
Figure FDA0003144233570000032
wherein deg(v_i) represents the degree of node i;
deg(v_j) represents the degree of node j;
v_i represents node i;
v_j represents node j;
Figure FDA0003144233570000033
represents the element in row i and column j of the Laplacian node matrix.
4. The social network community discovery system through local distance and node rank optimization function of claim 1, wherein the adjacency matrix of the node set V_k in the network social node value calculation discovery module,
Figure FDA0003144233570000034
is calculated as follows:
Figure FDA0003144233570000035
wherein V_k represents a node set, k = 1, 2, 3, ..., K;
V represents the node partition set, V = {V_1, V_2, V_3, ..., V_K};
v_x represents node x, x = 1, 2, 3, ..., N;
v_y represents node y, y = 1, 2, 3, ..., N.
5. The social network community discovery system through local distance and node rank optimization function of claim 1, wherein the adjacency matrix of the node set V - V_k in the network social node value calculation discovery module,
Figure FDA0003144233570000041
is calculated as follows:
Figure FDA0003144233570000042
wherein V_k represents a node set, k = 1, 2, 3, ..., K;
V represents the node partition set, V = {V_1, V_2, V_3, ..., V_K};
v_x represents node x, x = 1, 2, 3, ..., N;
v_y represents node y, y = 1, 2, 3, ..., N.
6. The system of claim 1, wherein the method for optimizing the discovered social networking communities in the community optimization module comprises:
Figure FDA0003144233570000043
wherein V_k represents a node set, k = 1, 2, 3, ..., K;
V represents the node partition set, V = {V_1, V_2, V_3, ..., V_K};
v_i represents node i;
indicates that in the case of … …, there is … …;
V[v_i] denotes the node set to which node i belongs;
v_j represents node j;
A_ij represents the element in row i and column j of the adjacency matrix A;
if the condition holds, the node set V[v_i] is kept;
if not, the node set V[v_i] is discarded.
7. The social network community discovery system through local distance and node rank optimization function of claim 1, further comprising a performance metric module, wherein the performance metric in the performance metric module is calculated by:
Figure FDA0003144233570000044
wherein m represents the total number of edges connecting nodes; A_ij represents the element values of the adjacency matrix A; F_ij represents the proportion of any edge connecting the two nodes i and j;
Figure FDA0003144233570000051
wherein deg(v_i) represents the degree of node i; deg(v_j) represents the degree of node j; v_i represents node i; v_j represents node j;
Figure FDA0003144233570000052
and/or the system further comprises a Jaccard coefficient module, in which the Jaccard coefficient is calculated as follows:
Figure FDA0003144233570000053
wherein V_M is the optimal community structure;
V_0 is the reference vector;
J(V_M, V_0) represents the Jaccard coefficient;
when V_M and V_0 are both empty, J(V_M, V_0) = 1;
and/or the system further comprises an Error index module, in which the Error index is calculated as follows:
Figure FDA0003144233570000054
wherein V_M′ is the structural feature of V_M;
V_0′ is the structural feature of V_0;
E(V_M′, V_0′) represents the Error index;
when V_M has the same community structure as V_0, E(V_M′, V_0′) is equal to 0;
and one of, or any combination of, the performance metric value, the Jaccard coefficient value and the Error index value is displayed on the display module.
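The formula images referenced in claim 7 are not reproduced in this text, so the following Python sketch is illustrative only and is not the claimed calculation. It assumes that the Jaccard coefficient takes the usual set form |A ∩ B| / |A ∪ B| (with the both-empty case equal to 1, as stated in the claim), and that the performance metric resembles the standard Newman modularity, which is built from the same quantities the claim defines (m, the adjacency matrix A, and the node degrees). The function names `jaccard` and `modularity` and the `labels` argument are hypothetical.

```python
from typing import Hashable, Sequence, Set


def jaccard(v_m: Set[Hashable], v_0: Set[Hashable]) -> float:
    """Set-based Jaccard coefficient |v_m & v_0| / |v_m | v_0| (assumed form)."""
    if not v_m and not v_0:
        # Convention stated in the claim: both sets empty gives 1.
        return 1.0
    return len(v_m & v_0) / len(v_m | v_0)


def modularity(adj: Sequence[Sequence[int]], labels: Sequence[int]) -> float:
    """Standard Newman modularity (assumed stand-in for the claimed metric).

    adj    : symmetric 0/1 adjacency matrix A of an undirected graph
    labels : labels[i] is the community index of node i
    """
    n = len(adj)
    degrees = [sum(row) for row in adj]  # deg(v_i)
    two_m = sum(degrees)                 # 2m, twice the number of edges
    if two_m == 0:
        return 0.0                       # no edges: define the score as 0
    total = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:   # only pairs in the same community
                total += adj[i][j] - degrees[i] * degrees[j] / two_m
    return total / two_m
```

For a graph made of two disconnected edges, assigning each edge to its own community scores higher under this sketch than placing all four nodes in a single community, which is the behaviour a community quality metric is expected to show.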
CN202010582081.6A 2020-06-23 2020-06-23 Social network community discovery system through local distance and node rank optimization function Active CN111738516B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010582081.6A CN111738516B (en) 2020-06-23 2020-06-23 Social network community discovery system through local distance and node rank optimization function

Publications (2)

Publication Number Publication Date
CN111738516A (en) 2020-10-02
CN111738516B (en) 2021-08-10

Family

ID=72650688

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010582081.6A Active CN111738516B (en) 2020-06-23 2020-06-23 Social network community discovery system through local distance and node rank optimization function

Country Status (1)

Country Link
CN (1) CN111738516B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111738515B (en) * 2020-06-23 2021-08-10 Chongqing University of Technology Social network community discovery method based on local distance and node rank optimization function
CN113065099B (en) * 2021-03-26 2024-03-05 Zhejiang University of Science and Technology Social network substructure counting method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108920678A (en) * 2018-07-10 2018-11-30 Fuzhou University Overlapping community discovery method based on spectral clustering and fuzzy sets
CN109039716A (en) * 2018-07-19 2018-12-18 Shanxi University Method for estimating the number of communities in a complex network
CN109859065A (en) * 2019-02-28 2019-06-07 Guilin University of Technology Multi-objective complex network community discovery method based on spectral clustering
CN109859054A (en) * 2018-12-13 2019-06-07 Ping An Technology (Shenzhen) Co., Ltd. Network community mining method, device, computer equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10671936B2 (en) * 2017-04-06 2020-06-02 Universite Paris Descartes Method for clustering nodes of a textual network taking into account textual content, computer-readable storage device and system implementing said method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Enhancement of Synchronizability in Networks with Community Structure through Adding Efficient Inter-Community Links";Mahdi Jalili;《IEEE TRANSACTIONS ON NETWORK SCIENCE AND ENGINEERING》;20160630;pp. 106-116 *
"Research on Complex Network Community Discovery Algorithms Based on Local Information";彭燕;《China Master's Theses Full-text Database, Basic Sciences》;20130815;pp. A002-101 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20221114

Address after: 230000 Room 203, building 2, phase I, e-commerce Park, Jinggang Road, Shushan Economic Development Zone, Hefei City, Anhui Province

Patentee after: Hefei Jiuzhou Longteng scientific and technological achievement transformation Co.,Ltd.

Address before: No.69 Hongguang Avenue, Banan District, Chongqing

Patentee before: Chongqing University of Technology

Effective date of registration: 20221114

Address after: 226000 West of 7th Floor, Building 11B, Zilang Science and Technology City, No. 60 Chongzhou Avenue, Nantong, Jiangsu

Patentee after: Nantong Baisong Data Technology Co.,Ltd.

Address before: 230000 Room 203, building 2, phase I, e-commerce Park, Jinggang Road, Shushan Economic Development Zone, Hefei City, Anhui Province

Patentee before: Hefei Jiuzhou Longteng scientific and technological achievement transformation Co.,Ltd.