CN111860866A - Network representation learning method and device with community structure
- Publication number: CN111860866A (application CN202010723330.9A)
- Authority: CN (China)
- Prior art keywords: vertex, network, vertex sequence, community, representation
- Legal status: Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Abstract
The invention discloses a network representation learning method with a community structure, which comprises the following steps. Step 1, data collection and processing stage: using a density function, obtain vertex sequence samples S = {s_1, s_2, ..., s_n} by a random walk strategy on the network G. Step 2, data representation learning stage: optimize the Skip-gram model, and train the vertex sequence samples S = {s_1, s_2, ..., s_n} with the Skip-gram model to obtain the vector representation of each vertex sequence. Step 3, data calculation stage: perform similarity calculation on the vector representations of the vertex sequences to obtain the community division similarity. The method captures the community structure of a network better and achieves higher accuracy in the vertex classification task.
Description
Technical Field
The invention relates to the technical field of computers, in particular to a network representation learning method and device with a community structure.
Background
Many complex systems can be abstracted into a network structure, which is usually represented by a graph, i.e., composed of a set of nodes and a set of edges. For a small-scale network, we can quickly perform many complex tasks on it, such as community mining and multi-label classification. For large-scale networks (e.g., networks with billions of vertices), however, performing these complex tasks is a challenge. To solve this problem, we must find a more compact and efficient representation of the network. Network embedding is an effective strategy for this: it learns low-dimensional vector representations of the vertices in a network. For each vertex, we map its structural features in the network to a low-dimensional vector, which is then applied to complex tasks on the network.
In the last few years, many network embedding methods have been proposed that characterize the local structure of a network. The DeepWalk method characterizes the neighborhood structure of network vertices using a truncated random walk strategy. The Node2vec method shows that DeepWalk cannot capture the diversity of connection patterns in a network, and proposes a biased random walk strategy that combines the ideas of BFS and DFS to explore vertex neighborhoods. The LINE method is mainly aimed at large-scale network embedding: it preserves higher-order vertex neighborhood structure and easily scales to millions of vertices. Cao et al. propose a deep graph representation model that uses a random surfing strategy to capture the structural information of the graph. Feng et al. propose a "degree penalty" principle that preserves the scale-free property by penalizing the proximity between high-degree vertices. Wang et al. propose a semi-supervised deep model that captures highly non-linear network structure by optimizing multiple layers of non-linear functions. Yanardag et al. propose a unified framework to capture mid-level similar structures. In addition, several methods have been proposed for preserving the global network structure. Wang et al. propose a modularized non-negative matrix factorization model that preserves the community structure of the network. Tu et al. propose a heuristic community-enhancement mechanism that maps community structure information into the vertex vector representations. Chen et al. propose a multi-level network representation learning paradigm that progressively merges the initial network into smaller but structurally similar networks, thereby capturing the global structure of the initial network.
Disclosure of Invention
The technical problem to be solved by the invention is that the three network embedding methods that characterize the local network structure in the prior art cannot capture the community structure of a network well and cannot achieve high accuracy in the vertex classification task. The invention aims to provide a network representation learning method and device with a community structure to solve these problems.
The invention is realized by the following technical scheme:
a network representation learning method having a community structure, comprising the steps of:
Step 1: data collection and processing stage: using a density function, obtain vertex sequence samples S = {s_1, s_2, ..., s_n} by a random walk strategy on the network G;
Step 2: data representation learning stage: optimize the Skip-gram model, and train the vertex sequence samples S = {s_1, s_2, ..., s_n} with the Skip-gram model to obtain the vector representation of each vertex sequence;
Step 3: data calculation stage: perform similarity calculation on the vector representation of each vertex sequence to obtain the community division similarity.
Further, in the network representation learning method with a community structure, in step 1 each vertex sequence in the sample S = {s_1, s_2, ..., s_n} is denoted as s = {v_1, v_2, ..., v_{|s|}}.
Further, in the network representation learning method with a community structure, the density function in step 1 is defined as:

$$f_s = \frac{k_{in}^{s}}{\left(k_{in}^{s} + k_{out}^{s}\right)^{\alpha}}$$

where $k_{in}^{s}$ and $k_{out}^{s}$ are, respectively, the sum of the internal degrees and the sum of the external degrees of all vertices in the vertex sequence s, and α is a resolution parameter used to control the size of the community;

the density function also has a density gain $\Delta f_s^{v}$, which satisfies the following formula:

$$\Delta f_s^{v} = f_{s+\{v\}} - f_s$$

where the notation s + {v} denotes the new vertex sequence resulting from moving vertex v into s.
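For illustration, a minimal Python sketch of this density and its gain; the function name and the degree values are ours (hypothetical), and the formula is the fitness-style density defined above:

```python
def density(k_in, k_out, alpha):
    # f_s = k_in / (k_in + k_out)^alpha, where k_in and k_out are the
    # summed internal and external degrees of the vertices in s
    return k_in / (k_in + k_out) ** alpha

# Density gain of absorbing a vertex v into s: f_{s+{v}} - f_s
f_s = density(k_in=20, k_out=10, alpha=1.0)   # current sequence s
f_sv = density(k_in=26, k_out=9, alpha=1.0)   # sequence s + {v}
gain = f_sv - f_s                             # positive, so v would be kept
```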
Further, in the network representation learning method with a community structure, the specific steps for obtaining the vertex sequence samples in step 1 are as follows:
Step 11: randomly select a vertex $v_{|s|+1}$ from the set $N'(v_{|s|})$;
Step 12: calculate $\Delta f_s^{v_{|s|+1}}$ according to the formula $\Delta f_s^{v} = f_{s+\{v\}} - f_s$;
Step 13: if $\Delta f_s^{v_{|s|+1}} < 0$, delete $v_{|s|+1}$ from the set $N'(v_{|s|})$ and return to step 11;
Step 14: if $\Delta f_s^{v_{|s|+1}} > 0$, add $v_{|s|+1}$ to the sequence s and mark $v_{|s|+1}$ as the current vertex;
wherein the vertex $v_{|s|}$ is the last vertex added, and the last added vertex $v_{|s|}$ serves as the current vertex; $N'(v_{|s|})$ denotes all neighbor vertices of the current vertex $v_{|s|}$ that are not in s. Steps 11 to 14 are repeated until the density of the vertex sequence s can no longer be increased.
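The loop of steps 11 to 14 can be sketched in Python as follows; this is a minimal illustration under our own naming (a networkx graph G and a helper seq_density), not the patent's reference implementation:

```python
import random
import networkx as nx

def seq_density(G, s, alpha):
    # k_in: summed internal degree (2 * internal edges); k_out: edges leaving s
    in_s = set(s)
    k_in = 2 * G.subgraph(in_s).number_of_edges()
    k_out = sum(1 for u in in_s for w in G[u] if w not in in_s)
    total = k_in + k_out
    return k_in / total ** alpha if total > 0 else 0.0

def density_walk(G, start, alpha):
    s = [start]
    f_s = seq_density(G, s, alpha)
    while True:
        # N'(v_{|s|}): neighbors of the current vertex not yet in s
        candidates = [u for u in G[s[-1]] if u not in set(s)]
        grew = False
        while candidates:
            v = random.choice(candidates)                  # step 11
            gain = seq_density(G, s + [v], alpha) - f_s    # step 12
            if gain <= 0:
                candidates.remove(v)                       # step 13
            else:
                s.append(v)                                # step 14: v becomes
                f_s += gain                                # the current vertex
                grew = True
                break
        if not grew:
            return s  # the density of s can no longer be increased
```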
Further, in the network representation learning method with a community structure, the Skip-gram model in step 2 trains the vertex sequence samples by minimizing the following objective function:

$$\min_{\Phi}\; -\sum_{j=i-t,\, j \neq i}^{i+t} \log p\left(v_j \mid v_i\right)$$

where t is the window size and $v_j$ is a vertex within the window around $v_i$ in its context. The probability $p(v_j \mid v_i)$ in the above formula is defined as

$$p\left(v_j \mid v_i\right) = \frac{\exp\left(\Phi'(v_j) \cdot \Phi(v_i)\right)}{\sum_{u \in V} \exp\left(\Phi'(u) \cdot \Phi(v_i)\right)}$$

where Φ(·) denotes the embedding vector of a vertex, Φ'(·) denotes its context vector, and V denotes the vertex set of the network.
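As a sketch of this training step, the Skip-gram optimization can be run with gensim's Word2Vec by treating each vertex sequence as a sentence; the library choice and the hyperparameter values are assumptions of ours, not specified by the patent:

```python
from gensim.models import Word2Vec

# samples: the vertex sequence samples S = {s_1, ..., s_n} from step 1
walks = [[str(v) for v in s] for s in samples]

model = Word2Vec(
    walks,
    vector_size=128,   # embedding dimension of Phi(v)
    window=5,          # window size t
    sg=1,              # use the Skip-gram architecture
    negative=5,        # negative sampling approximates the softmax
    min_count=0,
    workers=4,
)
phi = {v: model.wv[v] for v in model.wv.index_to_key}  # vertex -> vector
```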
Further, in the network representation learning method with a community structure, the similarity calculation on the vector representation of each vertex sequence in step 3 specifically comprises: for the vector representation of each vertex sequence in the network, computing its degree of similarity with the vector representations of the other vertex sequences, the degree of similarity being computed with the NMI formula.
A network representation learning apparatus having a community structure, comprising:
a data collection and processing module for reading vertex sequences to obtain the vertex sequence samples S = {s_1, s_2, ..., s_n};
a data representation learning module for optimizing the Skip-gram model and training the vertex sequence samples S = {s_1, s_2, ..., s_n} with the Skip-gram model to obtain the vector representation of each vertex sequence;
and a similarity calculation module for performing similarity calculation on the vector representation of each vertex sequence to obtain the community division similarity.
The method of the invention calculates the similarity using the following NMI formula.

The Normalized Mutual Information (NMI) measure is an information-theory-based index used to measure the similarity between two community partitions A and B. NMI is defined as follows:

$$NMI(A, B) = \frac{-2 \sum_{i=1}^{C_A} \sum_{j=1}^{C_B} C_{ij} \log\left(\frac{C_{ij} N}{C_{i\cdot} C_{\cdot j}}\right)}{\sum_{i=1}^{C_A} C_{i\cdot} \log\left(\frac{C_{i\cdot}}{N}\right) + \sum_{j=1}^{C_B} C_{\cdot j} \log\left(\frac{C_{\cdot j}}{N}\right)}$$

where C is the confusion matrix, whose rows correspond to the "real" communities and whose columns correspond to the "detected" communities, and N is the number of nodes. $C_{ij}$ is the number of vertices shared by real community i and detected community j. $C_A$ and $C_B$ denote the numbers of real and detected communities, respectively. $C_{i\cdot}$ and $C_{\cdot j}$ denote the sums of the i-th row and the j-th column of C, respectively. NMI ranges from 0 to 1: it equals 1 if the real partition is identical to the detected partition, and approaches 0 when the two partitions are independent.
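In practice this index can be computed from two label vectors; a minimal sketch using scikit-learn (our choice of library, with illustrative labels), whose arithmetic-mean normalization matches the 2·I(A,B)/(H(A)+H(B)) form of the formula above:

```python
from sklearn.metrics import normalized_mutual_info_score

true_labels = [0, 0, 0, 1, 1, 1, 2, 2]   # "real" community of each node
found_labels = [0, 0, 1, 1, 1, 1, 2, 2]  # "detected" community of each node

nmi = normalized_mutual_info_score(true_labels, found_labels,
                                   average_method="arithmetic")
print(nmi)  # 1.0 iff the two partitions are identical
```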
Compared with the prior art, the invention has the following advantages and beneficial effects:
the method can better capture the community structure in the network and can obtain higher accuracy in the vertex classification task.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention. In the drawings:
FIG. 1 is an overall flow chart of the present invention.
Fig. 2 shows the NMI improvement ratio of Q-Walker on the artificial networks, with the parameter α set to 1.5.
Fig. 3 shows the optimal interval of the parameter α on the four real networks.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to examples and accompanying drawings, and the exemplary embodiments and descriptions thereof are only used for explaining the present invention and are not meant to limit the present invention.
Examples
As shown in fig. 1, a network representation learning method having a community structure includes the steps of:
Step 1: data collection and processing stage: using a density function, obtain vertex sequence samples S = {s_1, s_2, ..., s_n} by a random walk strategy on the network G;
Step 2: data representation learning stage: optimize the Skip-gram model, and train the vertex sequence samples S = {s_1, s_2, ..., s_n} with the Skip-gram model to obtain the vector representation of each vertex sequence;
Step 3: data calculation stage: perform similarity calculation on the vector representation of each vertex sequence to obtain the community division similarity.
Further, in the network representation learning method with a community structure, in step 1 each vertex sequence in the sample S = {s_1, s_2, ..., s_n} is denoted as s = {v_1, v_2, ..., v_{|s|}}.
Further, in the network representation learning method with a community structure, the density function in step 1 is defined as:

$$f_s = \frac{k_{in}^{s}}{\left(k_{in}^{s} + k_{out}^{s}\right)^{\alpha}}$$

where $k_{in}^{s}$ and $k_{out}^{s}$ are, respectively, the sum of the internal degrees and the sum of the external degrees of all vertices in the vertex sequence s, and α is a resolution parameter used to control the size of the community;

the density function also has a density gain $\Delta f_s^{v}$, which satisfies the following formula:

$$\Delta f_s^{v} = f_{s+\{v\}} - f_s$$

where the notation s + {v} denotes the new vertex sequence resulting from moving vertex v into s.
Further, in the network representation learning method with a community structure, the specific steps for obtaining the vertex sequence samples in step 1 are as follows:
Step 11: randomly select a vertex $v_{|s|+1}$ from the set $N'(v_{|s|})$;
Step 12: calculate $\Delta f_s^{v_{|s|+1}}$ according to the formula $\Delta f_s^{v} = f_{s+\{v\}} - f_s$;
Step 13: if $\Delta f_s^{v_{|s|+1}} < 0$, delete $v_{|s|+1}$ from the set $N'(v_{|s|})$ and return to step 11;
Step 14: if $\Delta f_s^{v_{|s|+1}} > 0$, add $v_{|s|+1}$ to the sequence s and mark $v_{|s|+1}$ as the current vertex;
wherein the vertex $v_{|s|}$ is the last vertex added, and the last added vertex $v_{|s|}$ serves as the current vertex; $N'(v_{|s|})$ denotes all neighbor vertices of the current vertex $v_{|s|}$ that are not in s. Steps 11 to 14 are repeated until the density of the vertex sequence s can no longer be increased.
Further, in the network representation learning method with a community structure, the Skip-gram model in step 2 trains the vertex sequence samples by minimizing the following objective function:

$$\min_{\Phi}\; -\sum_{j=i-t,\, j \neq i}^{i+t} \log p\left(v_j \mid v_i\right)$$

where t is the window size and $v_j$ is a vertex within the window around $v_i$ in its context. The probability $p(v_j \mid v_i)$ in the above formula is defined as

$$p\left(v_j \mid v_i\right) = \frac{\exp\left(\Phi'(v_j) \cdot \Phi(v_i)\right)}{\sum_{u \in V} \exp\left(\Phi'(u) \cdot \Phi(v_i)\right)}$$

where Φ(·) denotes the embedding vector of a vertex, Φ'(·) denotes its context vector, and V denotes the vertex set of the network.
Further, in the network representation learning method with a community structure, the similarity calculation on the vector representation of each vertex sequence in step 3 specifically comprises: for the vector representation of each vertex sequence in the network, computing its degree of similarity with the vector representations of the other vertex sequences, the degree of similarity being computed with the NMI formula.
A network representation learning apparatus having a community structure, comprising:
a data collection and processing module for reading vertex sequences to obtain the vertex sequence samples S = {s_1, s_2, ..., s_n};
a data representation learning module for optimizing the Skip-gram model and training the vertex sequence samples S = {s_1, s_2, ..., s_n} with the Skip-gram model to obtain the vector representation of each vertex sequence;
and a similarity calculation module for performing similarity calculation on the vector representation of each vertex sequence to obtain the community division similarity.
In this example, the similarity is calculated using the following NMI formula.

The Normalized Mutual Information (NMI) measure is an information-theory-based index used to measure the similarity between two community partitions A and B. NMI is defined as follows:

$$NMI(A, B) = \frac{-2 \sum_{i=1}^{C_A} \sum_{j=1}^{C_B} C_{ij} \log\left(\frac{C_{ij} N}{C_{i\cdot} C_{\cdot j}}\right)}{\sum_{i=1}^{C_A} C_{i\cdot} \log\left(\frac{C_{i\cdot}}{N}\right) + \sum_{j=1}^{C_B} C_{\cdot j} \log\left(\frac{C_{\cdot j}}{N}\right)}$$

where C is the confusion matrix, whose rows correspond to the "real" communities and whose columns correspond to the "detected" communities, and N is the number of nodes. $C_{ij}$ is the number of vertices shared by real community i and detected community j. $C_A$ and $C_B$ denote the numbers of real and detected communities, respectively. $C_{i\cdot}$ and $C_{\cdot j}$ denote the sums of the i-th row and the j-th column of C, respectively. NMI ranges from 0 to 1: it equals 1 if the real partition is identical to the detected partition, and approaches 0 when the two partitions are independent.
Many classical embedding methods, such as DeepWalk, Node2vec, and DP-Walker, obtain a set of vertex sequence samples S = {s_1, s_2, ..., s_n} by using a random walk strategy on the network G, where each vertex sequence may be denoted s = {v_1, ..., v_{|s|}}. By treating each vertex sequence as a sentence in a document, the Skip-gram model can be used to learn the vertex representations of the network.
The DeepWalk method uses a uniform transition probability $p(v_{i+1} \mid v_i)$ during the random walk, i.e., each neighbor of $v_i$ is selected with equal probability.
The Node2vec method uses a biased transition probability $p(v_{i+1} \mid v_i)$ in the random walk process, defined through a search bias of the standard form

$$\alpha_{pq}(v_{i-1}, v_{i+1}) = \begin{cases} 1/p, & d_{i-1,i+1} = 0 \\ 1, & d_{i-1,i+1} = 1 \\ 1/q, & d_{i-1,i+1} = 2 \end{cases}$$

where $d_{i-1,i+1}$ denotes the shortest-path distance between vertex $v_{i-1}$ and vertex $v_{i+1}$. In this formula (Equation (3) of the description), the parameters p and q respectively control the proportions of the breadth-first and depth-first search strategies during the random walk.
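A minimal sketch contrasting the two transition rules (the helper names are ours; G is a networkx-style graph):

```python
import random

def deepwalk_step(G, v):
    # DeepWalk: each neighbor of v is selected with equal probability.
    return random.choice(list(G[v]))

def node2vec_step(G, prev, v, p, q):
    # Node2vec: neighbors weighted by the search bias alpha_{pq}, which
    # depends on the shortest-path distance d(prev, x) in {0, 1, 2}.
    neighbors = list(G[v])
    weights = []
    for x in neighbors:
        if x == prev:         # d = 0: step back to the previous vertex
            weights.append(1.0 / p)
        elif x in G[prev]:    # d = 1: x is also a neighbor of prev
            weights.append(1.0)
        else:                 # d = 2
            weights.append(1.0 / q)
    return random.choices(neighbors, weights=weights, k=1)[0]
```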
For the DP-Walker method, the transition probability $p(v_{i+1} \mid v_i)$ is defined in terms of $k_i$, the degree of vertex $v_i$, $C_{i,i+1}$, a quantity defined on the pair $v_i$ and $v_{i+1}$, and a model parameter.
The mean shift clustering method is a non-parametric clustering procedure. Compared with the classical k-means clustering method, it requires no assumption on the shape of the distribution or the number of clusters. Given n data points $x_i \in R^d$ (i = 1, ..., n), the multivariate kernel density estimate based on a radially symmetric kernel K(x) is

$$\hat{f}(x) = \frac{1}{n h^d} \sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right)$$

where h is the kernel radius (bandwidth). For each data point $x_i$, a gradient-ascent optimization is performed on its locally estimated density until convergence. All data points associated with the same mode (center point) belong to the same cluster.
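A minimal numeric sketch of this procedure (our own helper, with a Gaussian kernel and illustrative data), iterating one point to a density mode:

```python
import numpy as np

def mean_shift_step(x, points, h):
    # Shift x toward the kernel-weighted mean of all points: one
    # gradient-ascent step on the kernel density estimate.
    w = np.exp(-np.sum((points - x) ** 2, axis=1) / (2 * h ** 2))
    return (w[:, None] * points).sum(axis=0) / w.sum()

rng = np.random.default_rng(0)
points = rng.random((200, 2))    # illustrative 2-d data
x = points[0].copy()
for _ in range(100):
    x_new = mean_shift_step(x, points, h=0.2)
    if np.linalg.norm(x_new - x) < 1e-6:
        break                    # converged to a local density mode
    x = x_new
```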
The following experimental analyses:
(1) Real networks
In the community mining experiments, we used four real networks: the Karate, Football, Dolphin, and PolBooks networks. Table 1 lists the details of the four networks, including the number of nodes (|V|), the number of edges (|E|), the average degree (⟨k⟩), the mean square degree (⟨k²⟩), the average clustering coefficient (cc), and the true number of communities (nc).
Table 1: Statistics of the four networks with real communities
(2) Artificial network
In the community mining experiments, we further used artificial networks to evaluate the performance of our method. The planted partition model is a classic artificial benchmark network generator. The model generates a network with n = g·z vertices, where z is the number of communities and g is the number of vertices in each community. Within the same community, the probability that an edge exists between any two vertices is p_in; between different communities, the probability that an edge exists between any two vertices is p_out. The average degree of each vertex is ⟨k⟩ = p_in(g − 1) + p_out·g(z − 1). If p_in > p_out, the network has a community structure, because the link density within communities is greater than the link density between communities. In the present invention, we use the special case of the l-partition model proposed by Girvan and Newman, who set z = 4, g = 32, and ⟨k⟩ = 16. Table 2 shows the 7 artificial networks; the larger the average internal degree ⟨k_in⟩ of the vertices, the stronger the community structure.
Table 2: Statistics of the 7 artificial networks with different community structures
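For reference, such benchmark networks can be generated with networkx's planted partition generator; a sketch under the Girvan-Newman setting, where the choice k_in = 12 is our illustrative value (the experiments vary ⟨k_in⟩ per network):

```python
import networkx as nx

# Girvan-Newman special case: z = 4 communities of g = 32 vertices, <k> = 16
z, g, k_avg, k_in = 4, 32, 16, 12

p_in = k_in / (g - 1)                   # within-community edge probability
p_out = (k_avg - k_in) / (g * (z - 1))  # between-community edge probability
G = nx.planted_partition_graph(z, g, p_in, p_out, seed=42)

# Ground-truth communities are stored as a graph attribute (list of sets)
partition = G.graph["partition"]
```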
(3) Baseline methods
We compare our method (Q-Walker) with three network embedding methods: DeepWalk, Node2vec, and DP-Walker.
(4) Community detection
After learning the embedded vector representation for each node, we use a mean shift clustering algorithm to mine communities.
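A sketch of this step using scikit-learn's MeanShift (our library choice; phi is the vertex-to-vector mapping from the Skip-gram sketch above):

```python
import numpy as np
from sklearn.cluster import MeanShift

# One row per vertex: its learned embedding vector Phi(v)
X = np.vstack([phi[v] for v in sorted(phi)])

ms = MeanShift()                 # bandwidth is estimated automatically
communities = ms.fit_predict(X)  # vertices sharing a mode share a community
```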
(5) Accuracy index
The Normalized Mutual Information (NMI) measure is an information-theory-based index used to measure the similarity between two community partitions A and B. NMI is defined as follows:

$$NMI(A, B) = \frac{-2 \sum_{i=1}^{C_A} \sum_{j=1}^{C_B} C_{ij} \log\left(\frac{C_{ij} N}{C_{i\cdot} C_{\cdot j}}\right)}{\sum_{i=1}^{C_A} C_{i\cdot} \log\left(\frac{C_{i\cdot}}{N}\right) + \sum_{j=1}^{C_B} C_{\cdot j} \log\left(\frac{C_{\cdot j}}{N}\right)}$$

where C is the confusion matrix, whose rows correspond to the "real" communities and whose columns correspond to the "detected" communities, and N is the number of nodes. $C_{ij}$ is the number of vertices shared by real community i and detected community j. $C_A$ and $C_B$ denote the numbers of real and detected communities, respectively. $C_{i\cdot}$ and $C_{\cdot j}$ denote the sums of the i-th row and the j-th column of C, respectively. NMI ranges from 0 to 1: it equals 1 if the real partition is identical to the detected partition, and approaches 0 when the two partitions are independent.
(6) Real network analysis
We first evaluated the performance of Q-Walker on the four real networks with known communities; the results are shown in Table 3. As can be seen from Table 3, Q-Walker and DP-Walker perform better on all networks than the other two algorithms, DeepWalk and Node2vec, so we next compare only Q-Walker and DP-Walker. On the Karate network, the NMIs of Q-Walker and DP-Walker are 1 and 0.837, respectively: Q-Walker correctly detects all known communities, and its NMI is 19.47% higher than that of DP-Walker. Similarly, on the Dolphin network, Q-Walker also finds all known communities, and its NMI is 12.48% higher. On the PolBooks network, the NMIs of Q-Walker and DP-Walker are 0.679 and 0.581, respectively, an improvement of 16.86%. On the Football network, although all methods perform well, the NMI of Q-Walker is still slightly higher than the other three methods, an improvement of 1.81% over DP-Walker.
Table 3: NMI over four real networks with known communities
(7) Artificial network analysis
We also evaluated the performance of our method on artificial networks with different community structures; Table 2 shows the details of the 7 artificial networks, and the results of the experiment are shown in Fig. 2. As can be seen from Fig. 2, when ⟨k_in⟩ ≤ 10.5 the performance of the Q-Walker method is superior to that of the other three methods. We also note that the ratio by which Q-Walker improves on the NMI of the other three methods is inversely related to ⟨k_in⟩: the smaller ⟨k_in⟩ is, the higher the improvement ratio. Taking Node2vec as an example, when ⟨k_in⟩ = 8.5 the improvement ratio of Q-Walker is above 50%, and when ⟨k_in⟩ = 11.5 it is 0%. The reason is as follows. When ⟨k_in⟩ = 8.5, the network contains many weak community structures. Because there are many connecting edges between weak communities, a node can easily jump from one weak community to another during a random walk. The vertex sequences s sampled by Node2vec therefore do not describe the weak community structure well, since most of the vertices in s come from different weak communities. For Q-Walker, by contrast, most of the vertices in a vertex sequence s come from the same community, so s has relatively tight internal connections; hence Q-Walker delineates weak community structure well. When ⟨k_in⟩ = 11.5, the network contains many strong community structures. Because the internal link density of strong communities is high and there are few connecting edges between them, nodes spend most of the random walk within the same community. Thus the vertex sequences s sampled by Node2vec delineate the strong community structure well, since most of the vertices in s come from the same community. The same analysis explains the behavior of the other two baseline methods, DeepWalk and DP-Walker. In summary, Q-Walker performs well not only in networks with weak community structure but also in networks with strong community structure.
(8) Sensitivity of parameters
Finally, we vary the value of the parameter α in the density formula to evaluate the performance of our method. In the experiments, the parameter α ranges over 0.05 ≤ α ≤ 1.5; the results are shown in Fig. 3. As can be seen from Fig. 3, each network contains an optimal resolution interval within which the performance of the algorithm is stable and best. Taking the Karate network as an example, for 0.55 ≤ α ≤ 0.7 our algorithm can find all known community structures. In addition, the optimal interval of α differs from network to network: for the Dolphin and PolBooks networks, the optimal intervals are 0.5 ≤ α ≤ 0.8 and 0.05 ≤ α ≤ 0.2, respectively. The difference in optimal intervals is related to the hierarchy of community structure in the network: the community hierarchies of different networks are generally not the same, so the optimal α intervals of different networks also differ.
(9) Multi-label classification
Furthermore, we evaluated the performance of our approach on the multi-label classification task. To compare our method with the other three methods, we used the following experimental procedure. We randomly select a fraction of the vertices as the training set and use the remaining vertices as the test set. A logistic multi-class classification model implemented with LibLinear then returns the label with the highest probability. We repeated this procedure 50 times and averaged the Micro-F1 and Macro-F1 scores. The experiment was performed on the BlogCatalog network with the parameter α set to 1. To accelerate the training of the classifier, a small training set is selected on the BlogCatalog network, with proportions from 1% to 9%. Tables 4 and 5 show the Micro-F1 and Macro-F1 scores on BlogCatalog, respectively; bold numbers indicate the highest score in each column. As can be seen from Tables 4 and 5, the Q-Walker algorithm outperforms the other three methods on both the Micro-F1 and Macro-F1 scores.
Table 4: micro F1 score on BlogCatalog
Table 5: macro F1 score on BlogCatalog
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are merely exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.
Claims (7)
1. A network representation learning method having a community structure, comprising the steps of:
Step 1: data collection and processing stage: using a density function, obtain vertex sequence samples S = {s_1, s_2, ..., s_n} by a random walk strategy on the network G;
Step 2: data representation learning stage: optimize the Skip-gram model, and train the vertex sequence samples S = {s_1, s_2, ..., s_n} with the Skip-gram model to obtain the vector representation of each vertex sequence;
Step 3: data calculation stage: perform similarity calculation on the vector representation of each vertex sequence to obtain the community division similarity.
2. The network representation learning method with a community structure as claimed in claim 1, wherein in step 1 each vertex sequence in the sample S = {s_1, s_2, ..., s_n} is denoted as s = {v_1, v_2, ..., v_{|s|}}.
3. The network representation learning method with a community structure as claimed in claim 2, wherein the density function in step 1 is defined as:

$$f_s = \frac{k_{in}^{s}}{\left(k_{in}^{s} + k_{out}^{s}\right)^{\alpha}}$$

where $k_{in}^{s}$ and $k_{out}^{s}$ are, respectively, the sum of the internal degrees and the sum of the external degrees of all vertices in the vertex sequence s, and α is a resolution parameter used to control the size of the community;

the density function also has a density gain $\Delta f_s^{v}$, which satisfies the following formula:

$$\Delta f_s^{v} = f_{s+\{v\}} - f_s$$

where the notation s + {v} denotes the new vertex sequence resulting from moving vertex v into s.
4. The network representation learning method with a community structure as claimed in claim 3, wherein the specific steps for obtaining the vertex sequence samples in step 1 are as follows:
Step 11: randomly select a vertex $v_{|s|+1}$ from the set $N'(v_{|s|})$;
Step 12: calculate $\Delta f_s^{v_{|s|+1}}$ according to the formula $\Delta f_s^{v} = f_{s+\{v\}} - f_s$;
Step 13: if $\Delta f_s^{v_{|s|+1}} < 0$, delete $v_{|s|+1}$ from the set $N'(v_{|s|})$ and return to step 11;
Step 14: if $\Delta f_s^{v_{|s|+1}} > 0$, add $v_{|s|+1}$ to the sequence s and mark $v_{|s|+1}$ as the current vertex;
wherein the vertex $v_{|s|}$ is the last vertex added, and the last added vertex $v_{|s|}$ serves as the current vertex; $N'(v_{|s|})$ denotes all neighbor vertices of the current vertex $v_{|s|}$ that are not in s. Steps 11 to 14 are repeated until the density of the vertex sequence s can no longer be increased.
5. The network representation learning method with a community structure as claimed in claim 2, wherein the Skip-gram model in step 2 trains the vertex sequence samples by minimizing the following objective function:

$$\min_{\Phi}\; -\sum_{j=i-t,\, j \neq i}^{i+t} \log p\left(v_j \mid v_i\right)$$

where t is the window size and $v_j$ is a vertex within the window around $v_i$ in its context, and the probability $p(v_j \mid v_i)$ in the above formula is defined as

$$p\left(v_j \mid v_i\right) = \frac{\exp\left(\Phi'(v_j) \cdot \Phi(v_i)\right)}{\sum_{u \in V} \exp\left(\Phi'(u) \cdot \Phi(v_i)\right)}$$

where Φ(·) denotes the embedding vector of a vertex, Φ'(·) denotes its context vector, and V denotes the vertex set of the network.
6. The network representation learning method with a community structure as claimed in claim 1, wherein the similarity calculation on the vector representation of each vertex sequence in step 3 specifically comprises: for the vector representation of each vertex sequence in the network, computing its degree of similarity with the vector representations of the other vertex sequences, the degree of similarity being computed with the NMI formula.
7. A network representation learning apparatus having a community structure, comprising:
a data collection and processing module for reading vertex sequences to obtain the vertex sequence samples S = {s_1, s_2, ..., s_n};
a data representation learning module for optimizing the Skip-gram model and training the vertex sequence samples S = {s_1, s_2, ..., s_n} with the Skip-gram model to obtain the vector representation of each vertex sequence;
and a similarity calculation module for performing similarity calculation on the vector representation of each vertex sequence to obtain the community division similarity.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202010723330.9A | 2020-07-24 | 2020-07-24 | Network representation learning method and device with community structure

Publications (1)

Publication Number | Publication Date
---|---
CN111860866A | 2020-10-30

Family ID: 72950884
Citations (5)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
US20140317736A1 | 2013-04-23 | 2014-10-23 | Telefonica Digital Espana, S.L.U. | Method and system for detecting fake accounts in online social networks
CN108399189A | 2018-01-23 | 2018-08-14 | Chongqing University of Posts and Telecommunications | Friend recommendation system based on community discovery and its method
US20200142957A1 | 2018-11-02 | 2020-05-07 | Oracle International Corporation | Learning property graph representations edge-by-edge
CN109615550A | 2018-11-26 | 2019-04-12 | Lanzhou University | A similarity-based local community detection method
CN110598128A | 2019-09-11 | 2019-12-20 | Xidian University | Community detection method for large-scale networks resisting Sybil attacks

Non-Patent Citations (2)

He Jialin, "Community detection and evolution analysis methods in temporal networks", Computer Engineering and Design, vol. 38, no. 8.
Wang Huixue, "A community detection method based on node2vec", Computer and Digital Engineering, no. 02.

Cited By (2)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN116980559A | 2023-06-09 | 2023-10-31 | 负熵信息科技(武汉)有限公司 | Metropolitan area level video intelligent bayonet planning layout method
CN116980559B | 2023-06-09 | 2024-02-09 | 负熵信息科技(武汉)有限公司 | Metropolitan area level video intelligent bayonet planning layout method
Legal Events

Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination