CN111860866A - Network representation learning method and device with community structure - Google Patents

Network representation learning method and device with community structure

Info

Publication number
CN111860866A
CN111860866A (application CN202010723330.9A)
Authority
CN
China
Prior art keywords: vertex, network, vertex sequence, community, representation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010723330.9A
Other languages
Chinese (zh)
Inventor
何嘉林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China West Normal University
Original Assignee
China West Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China West Normal University
Priority to CN202010723330.9A
Publication of CN111860866A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 - Machine learning


Abstract

The invention discloses a network representation learning method with a community structure, which comprises the following steps. Step 1, data collection and processing stage: using a density function, vertex sequence samples S = {s1, s2, …, sn} are obtained by a random walk strategy on the network G. Step 2, data representation learning stage: the Skip-gram model is optimized and trained on the vertex sequence samples S = {s1, s2, …, sn} to obtain a vector representation of each vertex sequence. Step 3, data calculation stage: similarity calculation is carried out on the vector representations of the vertex sequences to obtain the community division similarity. The method can better capture the community structure in the network and can obtain higher accuracy in the vertex classification task.

Description

Network representation learning method and device with community structure
Technical Field
The invention relates to the technical field of computers, in particular to a network representation learning method and device with a community structure.
Background
Many complex systems can be abstracted into a network structure, which is usually represented by a graph, i.e., composed of a set of nodes and a set of edges. For small-scale networks, we can quickly perform many complex tasks on them, such as community mining and multi-label classification. However, for large-scale networks (e.g., networks with billions of vertices), performing these complex tasks is a challenge. To solve this problem, we must find another compact and efficient representation of the network. Network embedding is an effective strategy for this purpose, i.e., learning low-dimensional vector representations of the vertices in a network. For each vertex, we map its structural features in the network to a low-dimensional vector, which is then applied to complex tasks on the network.
In the last few years, many network embedding methods that characterize the local structure of the network have been proposed. The DeepWalk method characterizes the neighborhood structure of the network vertices by using a truncated random walk strategy. The Node2vec method showed that DeepWalk cannot capture the diversity of connection patterns in the network, and proposed a biased random walk strategy that combines the BFS and DFS ideas to explore vertex neighborhood information. The LINE method is mainly applied to large-scale network embedding; it preserves a high-order vertex neighborhood structure and can easily be extended to millions of vertices. Cao et al. proposed a deep graph representation model that uses a random surfing strategy to capture the structural information of the graph. Feng et al. proposed a "degree penalty" principle that preserves the scale-free property by penalizing the proximity between high-degree vertices. Wang et al. proposed a semi-supervised deep model that captures highly non-linear network structure by optimizing multiple layers of non-linear functions. Yanardag et al. proposed a universal framework to capture mid-level similar structures. In addition, some methods have been proposed for preserving the global network structure. Wang et al. proposed a modularized non-negative matrix factorization model that preserves the community structure in the network. Tu et al. proposed a heuristic community enhancement mechanism that maps community structure information into the vertex vector representations. Chen et al. proposed a multi-level network representation learning paradigm that progressively merges the initial network into smaller but structurally similar networks, capturing the global structure of the initial network.
Disclosure of Invention
The technical problem to be solved by the invention is that the prior-art network embedding methods that describe only the local network structure cannot capture the community structure in a network well and cannot achieve high accuracy in the vertex classification task. The invention aims to provide a network representation learning method and device with a community structure to solve these problems.
The invention is realized by the following technical scheme:
a network representation learning method having a community structure, comprising the steps of:
step 1: data collection and processing stage: using a density function, vertex sequence samples S = {s1, s2, …, sn} are obtained by a random walk strategy on the network G;
step 2: data representation learning stage: the Skip-gram model is optimized and trained on the vertex sequence samples S = {s1, s2, …, sn} to obtain a vector representation of each vertex sequence;
step 3: data calculation stage: similarity calculation is carried out on the vector representations of the vertex sequences to obtain the community division similarity.
Further, in the network representation learning method with a community structure, in step 1 each vertex sequence in the sample S = {s1, s2, …, sn} is denoted as s = {v1, v2, …, v|s|}.
Further, in the network representation learning method with a community structure, the density function in step 1 is defined as:

f_s = k_in^s / (k_in^s + k_out^s)^α

where k_in^s and k_out^s are, respectively, the sum of the internal degrees and the sum of the external degrees of all vertices in the vertex sequence s, and α is a resolution parameter used to control the size of the community.

The density function also has a density gain Δf_s^v, which satisfies the following formula:

Δf_s^v = f_(s+{v}) − f_s

where the notation s + {v} denotes the new vertex sequence resulting from moving vertex v into s.
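For illustration, the density function and its gain can be computed directly from these definitions. The following Python sketch assumes networkx as the graph library; the function names density and density_gain are ours, not the patent's.

```python
import networkx as nx

def density(G, s, alpha=1.0):
    """Density f_s = k_in / (k_in + k_out)^alpha of a vertex sequence s.

    k_in sums the internal degrees (neighbors inside s) and k_out the
    external degrees (neighbors outside s) over all vertices of s.
    """
    members = set(s)
    k_in = k_out = 0
    for v in members:
        for u in G.neighbors(v):
            if u in members:
                k_in += 1   # internal-degree contribution
            else:
                k_out += 1  # external-degree contribution
    total = k_in + k_out
    return k_in / total ** alpha if total > 0 else 0.0

def density_gain(G, s, v, alpha=1.0):
    """Density gain from moving vertex v into s: f_{s+{v}} - f_s."""
    return density(G, list(s) + [v], alpha) - density(G, s, alpha)
```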
Further, in the network representation learning method with a community structure, the specific steps of obtaining the vertex sequence samples in step 1 are as follows:
step 11: randomly select a vertex v_(|s|+1) from the set N'(v_|s|);
step 12: calculate Δf_s^(v_(|s|+1)) according to the formula Δf_s^v = f_(s+{v}) − f_s;
step 13: if Δf_s^(v_(|s|+1)) < 0, delete v_(|s|+1) from the set N'(v_|s|) and return to step 11;
step 14: if Δf_s^(v_(|s|+1)) > 0, add v_(|s|+1) to the sequence s and mark v_(|s|+1) as the current vertex;
where the vertex v_|s| is the last vertex added and is taken as the current vertex, and N'(v_|s|) denotes all neighbor vertices of the current vertex v_|s| that are not in s. Steps 11 to 14 are repeated until the density of the vertex sequence s can no longer be increased.
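A minimal sketch of steps 11 to 14, reusing the density_gain helper from the sketch above. The text does not specify how a zero gain is handled; treating it as a rejection here is our assumption.

```python
import random

def sample_vertex_sequence(G, start, alpha=1.0):
    """Grow a vertex sequence from `start` until its density stops rising."""
    s = [start]
    current = start
    while True:
        # N'(v_|s|): neighbors of the current vertex that are not yet in s
        candidates = [u for u in G.neighbors(current) if u not in s]
        extended = False
        while candidates:
            v = random.choice(candidates)        # step 11
            gain = density_gain(G, s, v, alpha)  # step 12
            if gain <= 0:                        # step 13 (gain == 0 treated as reject)
                candidates.remove(v)
            else:                                # step 14
                s.append(v)
                current = v
                extended = True
                break
        if not extended:
            return s  # density of s can no longer be increased
```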
Further, in the network representation learning method with a community structure, the Skip-gram model in step 2 trains the vertex sequence samples by minimizing the following objective function:

min_Φ −Σ_(v_i ∈ V) Σ_(−t ≤ j ≤ t, j ≠ 0) log p(v_(i+j) | v_i)

where t is the window size and v_(i+j) is a context vertex of v_i within the window. The probability p(v_j | v_i) in the above formula is defined as

p(v_j | v_i) = exp(Φ'(v_j) · Φ(v_i)) / Σ_(u ∈ V) exp(Φ'(u) · Φ(v_i))

where Φ(v) denotes the embedding vector of vertex v, Φ'(v) denotes its context vector, and V denotes the vertex set.
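The patent optimizes its own Skip-gram variant, so the exact training procedure is not reproduced here; as an approximation, an off-the-shelf Skip-gram implementation such as gensim's Word2Vec can be trained on the sampled sequences. A sketch, with illustrative parameter choices:

```python
from gensim.models import Word2Vec

def learn_embeddings(S, dim=128, window=5):
    """Train Skip-gram (sg=1) on vertex sequences S and return {vertex: vector}.

    S is a list of vertex sequences, e.g. [["1", "5", "7"], ["2", "3"], ...];
    vertices are passed as strings so gensim can build its vocabulary.
    """
    model = Word2Vec(
        sentences=S,
        vector_size=dim,  # embedding dimension
        window=window,    # context window size t
        sg=1,             # Skip-gram rather than CBOW
        min_count=0,      # keep every vertex, however rarely visited
        workers=4,
    )
    return {v: model.wv[v] for v in model.wv.index_to_key}
```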
Further, in the network representation learning method with a community structure, the similarity calculation on the vector representations of the vertex sequences in step 3 specifically comprises: for the vector representation of each vertex sequence in the network, calculating the degree of similarity between it and the vector representations of the other vertex sequences, where the degree of similarity is calculated using the NMI formula.
A network representation learning apparatus having a community structure, comprising:
a data collection and processing module for reading vertex sequences to obtain vertex sequence samples S = {s1, s2, …, sn};
a data representation learning module for optimizing the Skip-gram model and training it on the vertex sequence samples S = {s1, s2, …, sn} to obtain a vector representation of each vertex sequence;
and a similarity calculation module for carrying out similarity calculation on the vector representations of the vertex sequences to obtain the community division similarity.
The method of the invention uses the following NMI formula to calculate the similarity:

The normalized mutual information (NMI) is an information-theory-based index used to measure the similarity between two community partitions A and B. NMI is defined as follows:

NMI(A, B) = −2 Σ_(i=1..C_A) Σ_(j=1..C_B) C_ij log(C_ij N / (C_i. C_.j)) / [Σ_(i=1..C_A) C_i. log(C_i./N) + Σ_(j=1..C_B) C_.j log(C_.j/N)]

where C is a confusion matrix whose rows correspond to the "real" communities and whose columns correspond to the "detected" communities, and N is the number of nodes. C_ij is the number of vertices shared by real community i and detected community j, C_A and C_B are the numbers of real and detected communities, and C_i. and C_.j are the sums of the i-th row and the j-th column of the matrix C, respectively. NMI ranges from 0 to 1: it equals 1 if the real community partition is identical to the detected partition, and 0 if the two partitions are independent.
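A direct implementation of this formula could look as follows; the function name nmi and the dict-based partition encoding are ours. (scikit-learn's normalized_mutual_info_score computes a similar score with a different normalization.)

```python
import numpy as np

def nmi(real, detected):
    """NMI of two partitions given as {vertex: community_id} dicts."""
    r_ids = sorted(set(real.values()))
    d_ids = sorted(set(detected.values()))
    # Confusion matrix C: rows = real communities, columns = detected ones
    C = np.zeros((len(r_ids), len(d_ids)))
    for v in real:
        C[r_ids.index(real[v]), d_ids.index(detected[v])] += 1
    N = C.sum()
    Ci, Cj = C.sum(axis=1), C.sum(axis=0)  # row and column sums
    num = 0.0
    for i in range(len(r_ids)):
        for j in range(len(d_ids)):
            if C[i, j] > 0:
                num += C[i, j] * np.log(C[i, j] * N / (Ci[i] * Cj[j]))
    den = Ci @ np.log(Ci / N) + Cj @ np.log(Cj / N)
    return -2 * num / den if den != 0 else 1.0
```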
Compared with the prior art, the invention has the following advantages and beneficial effects:
the method can better capture the community structure in the network and can obtain higher accuracy in the vertex classification task.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention. In the drawings:
FIG. 1 is an overall flow chart of the present invention.
Fig. 2 shows the NMI improvement ratio of Q-Walker on the artificial networks, with the parameter α = 1.5.
Fig. 3 shows the optimal interval of the parameter α over four real networks.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to examples and accompanying drawings, and the exemplary embodiments and descriptions thereof are only used for explaining the present invention and are not meant to limit the present invention.
Examples
As shown in fig. 1, a network representation learning method having a community structure includes the steps of:
step 1: data collection and processing stage: using a density function, vertex sequence samples S = {s1, s2, …, sn} are obtained by a random walk strategy on the network G;
step 2: data representation learning stage: the Skip-gram model is optimized and trained on the vertex sequence samples S = {s1, s2, …, sn} to obtain a vector representation of each vertex sequence;
step 3: data calculation stage: similarity calculation is carried out on the vector representations of the vertex sequences to obtain the community division similarity.
Further, in the network representation learning method with a community structure, in step 1 each vertex sequence in the sample S = {s1, s2, …, sn} is denoted as s = {v1, v2, …, v|s|}.
Further, in the network representation learning method with a community structure, the density function in step 1 is defined as:

f_s = k_in^s / (k_in^s + k_out^s)^α

where k_in^s and k_out^s are, respectively, the sum of the internal degrees and the sum of the external degrees of all vertices in the vertex sequence s, and α is a resolution parameter used to control the size of the community.

The density function also has a density gain Δf_s^v, which satisfies the following formula:

Δf_s^v = f_(s+{v}) − f_s

where the notation s + {v} denotes the new vertex sequence resulting from moving vertex v into s.
Further, in the network representation learning method with a community structure, the specific steps of obtaining the vertex sequence samples in step 1 are as follows:
step 11: randomly select a vertex v_(|s|+1) from the set N'(v_|s|);
step 12: calculate Δf_s^(v_(|s|+1)) according to the formula Δf_s^v = f_(s+{v}) − f_s;
step 13: if Δf_s^(v_(|s|+1)) < 0, delete v_(|s|+1) from the set N'(v_|s|) and return to step 11;
step 14: if Δf_s^(v_(|s|+1)) > 0, add v_(|s|+1) to the sequence s and mark v_(|s|+1) as the current vertex;
where the vertex v_|s| is the last vertex added and is taken as the current vertex, and N'(v_|s|) denotes all neighbor vertices of the current vertex v_|s| that are not in s. Steps 11 to 14 are repeated until the density of the vertex sequence s can no longer be increased.
Further, in the network representation learning method with a community structure, the Skip-gram model in step 2 trains the vertex sequence samples by minimizing the following objective function:

min_Φ −Σ_(v_i ∈ V) Σ_(−t ≤ j ≤ t, j ≠ 0) log p(v_(i+j) | v_i)

where t is the window size and v_(i+j) is a context vertex of v_i within the window. The probability p(v_j | v_i) in the above formula is defined as

p(v_j | v_i) = exp(Φ'(v_j) · Φ(v_i)) / Σ_(u ∈ V) exp(Φ'(u) · Φ(v_i))

where Φ(v) denotes the embedding vector of vertex v, Φ'(v) denotes its context vector, and V denotes the vertex set.
Further, in the network representation learning method with a community structure, the similarity calculation on the vector representations of the vertex sequences in step 3 specifically comprises: for the vector representation of each vertex sequence in the network, calculating the degree of similarity between it and the vector representations of the other vertex sequences, where the degree of similarity is calculated using the NMI formula.
A network representation learning apparatus having a community structure, comprising:
a data collection and processing module for reading vertex sequences to obtain vertex sequence samples S = {s1, s2, …, sn};
a data representation learning module for optimizing the Skip-gram model and training it on the vertex sequence samples S = {s1, s2, …, sn} to obtain a vector representation of each vertex sequence;
and a similarity calculation module for carrying out similarity calculation on the vector representations of the vertex sequences to obtain the community division similarity.
In this example, the similarity is calculated using the following NMI formula:
The normalized mutual information (NMI) is an information-theory-based index used to measure the similarity between two community partitions A and B. NMI is defined as follows:

NMI(A, B) = −2 Σ_(i=1..C_A) Σ_(j=1..C_B) C_ij log(C_ij N / (C_i. C_.j)) / [Σ_(i=1..C_A) C_i. log(C_i./N) + Σ_(j=1..C_B) C_.j log(C_.j/N)]

where C is a confusion matrix whose rows correspond to the "real" communities and whose columns correspond to the "detected" communities, and N is the number of nodes. C_ij is the number of vertices shared by real community i and detected community j, C_A and C_B are the numbers of real and detected communities, and C_i. and C_.j are the sums of the i-th row and the j-th column of the matrix C, respectively. NMI ranges from 0 to 1: it equals 1 if the real community partition is identical to the detected partition, and 0 if the two partitions are independent.
Many classical embedding methods, such as DeepWalk, Node2vec, and DP-Walker, obtain a set of vertex sequence samples S = {s1, s2, …, sn} by using a random walk strategy on the network G, where each vertex sequence may be denoted as s = {v1, …, v|s|}. By treating each vertex sequence as a sentence in a document, we can use the Skip-gram model to learn the vertex representations in the network:
for the Deepwalk method, it uses a uniform distribution of p (v) during random walksi+1|vi) I.e. viIs equal in probability of each neighbor being selected.
For the Node2vec method, a biased probability p(v_(i+1) | v_i) is used in the random walk process. Up to normalization, it is defined as:

p(v_(i+1) | v_i) ∝ 1/p if d_(i−1,i+1) = 0; 1 if d_(i−1,i+1) = 1; 1/q if d_(i−1,i+1) = 2

where d_(i−1,i+1) denotes the shortest-path distance between vertex v_(i−1) and vertex v_(i+1). The parameters p and q control, respectively, the proportions of the breadth-first and depth-first search strategies during the random walk.
For DP-Walker, the probability p(v_(i+1) | v_i) is given by a formula (shown as an image in the original publication) in which k_i is the degree of vertex v_i, C_(i,i+1) is a proximity term between v_i and v_(i+1) (in the degree-penalty model, based on their common neighbors), and β is a model parameter.
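To make the contrast concrete, the following sketch implements a single step of each baseline walk as just defined. Real Node2vec implementations precompute alias tables instead of weighting neighbors on the fly; this simplified version is ours.

```python
import random

def deepwalk_step(G, v):
    """DeepWalk: every neighbor of v is chosen with equal probability."""
    return random.choice(list(G.neighbors(v)))

def node2vec_step(G, prev, v, p=1.0, q=1.0):
    """Node2vec: weight each neighbor x of v by the distance d(prev, x)."""
    nbrs, weights = [], []
    for x in G.neighbors(v):
        if x == prev:              # d = 0: step back to the previous vertex
            w = 1.0 / p
        elif G.has_edge(x, prev):  # d = 1: stay near the previous vertex
            w = 1.0
        else:                      # d = 2: move outward
            w = 1.0 / q
        nbrs.append(x)
        weights.append(w)
    return random.choices(nbrs, weights=weights, k=1)[0]
```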
The mean shift clustering method is a non-parametric clustering procedure. Compared with the classic k-means clustering method, it does not need to assume the shape of the distribution or the number of clusters. Given n data points x_i ∈ R^d (i = 1, …, n), the multivariate kernel density estimate based on a radially symmetric kernel K(x) is given by the following equation:

f̂(x) = (1 / (n h^d)) Σ_(i=1..n) K((x − x_i) / h)

where h is the radius (bandwidth) of the kernel. For each data point x_i, a gradient-ascent optimization strategy is performed on its locally estimated density until convergence. All data points associated with the same mode (center point) belong to the same cluster.
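A sketch of this clustering step with scikit-learn's MeanShift, where estimate_bandwidth plays the role of choosing the kernel radius h; the embeddings-to-labels wrapper is ours.

```python
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth

def cluster_embeddings(vectors):
    """Cluster vertex embeddings without fixing the number of communities."""
    X = np.asarray(vectors)
    h = estimate_bandwidth(X)  # data-driven choice of the kernel radius
    labels = MeanShift(bandwidth=h).fit_predict(X)
    return labels              # one community label per vertex
```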
The following experimental analyses:
(1) real network
In the community mining experiments, we used four real networks: the Karate, Football, Dolphin, and PolBooks networks. Table 1 lists the details of the four networks, including the number of nodes (|V|), the number of edges (|E|), the average degree (⟨k⟩), the mean square degree (⟨k²⟩), the average clustering coefficient (cc), and the number of real communities (nc).
Table 1: four network statistical information with real communities
[Table 1 appears as an image in the original publication and is not reproduced here.]
(2) Artificial network
In the community mining experiments, we further used artificial networks to evaluate the performance of our method. The planted partition model is a classic artificial benchmark network generator. The model generates a network with n = g·z vertices, where z is the number of communities and g is the number of vertices in each community. Within the same community, any two vertices are connected with probability p_in, and between different communities any two vertices are connected with probability p_out. The average degree of each vertex is ⟨k⟩ = p_in(g − 1) + p_out·g(z − 1). If p_in > p_out, the network has a community structure, because the density of links within a community is greater than the density of links between communities. In the present invention we use the special case of the planted l-partition model proposed by Girvan and Newman, who set z = 4, g = 32, and ⟨k⟩ = 16. Table 2 shows the 7 artificial networks; the larger the average internal degree ⟨k_in⟩ of the vertices, the stronger the community structure (a generation sketch is given after Table 2).
Table 2: 7 artificial network statistical information with different community structures
[Table 2 appears as an image in the original publication and is not reproduced here.]
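For reference, such benchmark networks can be generated with networkx's planted_partition_graph. The conversion from ⟨k_in⟩ to (p_in, p_out) below follows the degree relations stated above and is our assumption, not the patent's code.

```python
import networkx as nx

def gn_benchmark(k_in, z=4, g=32, k_avg=16):
    """Planted l-partition network: z communities of g vertices,
    average degree k_avg, average internal degree k_in."""
    p_in = k_in / (g - 1)                   # <k_in> = p_in * (g - 1)
    p_out = (k_avg - k_in) / (g * (z - 1))  # <k_out> = p_out * g * (z - 1)
    return nx.planted_partition_graph(z, g, p_in, p_out)

G = gn_benchmark(k_in=8.5)   # weak community structure
H = gn_benchmark(k_in=11.5)  # strong community structure
```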
(3) Reference method
We compare our method (Q-Walker) with three network embedding methods: DeepWalk, Node2vec, and DP-Walker.
(4) Community detection
After learning the embedded vector representation for each node, we use a mean shift clustering algorithm to mine communities.
(5) Accuracy index
The normalized mutual information (NMI) is an information-theory-based index used to measure the similarity between two community partitions A and B. NMI is defined as follows:

NMI(A, B) = −2 Σ_(i=1..C_A) Σ_(j=1..C_B) C_ij log(C_ij N / (C_i. C_.j)) / [Σ_(i=1..C_A) C_i. log(C_i./N) + Σ_(j=1..C_B) C_.j log(C_.j/N)]

where C is a confusion matrix whose rows correspond to the "real" communities and whose columns correspond to the "detected" communities, and N is the number of nodes. C_ij is the number of vertices shared by real community i and detected community j, C_A and C_B are the numbers of real and detected communities, and C_i. and C_.j are the sums of the i-th row and the j-th column of the matrix C, respectively. NMI ranges from 0 to 1: it equals 1 if the real community partition is identical to the detected partition, and 0 if the two partitions are independent.
(6) Real network analysis
We first evaluated the performance of Q-Walker on four real networks with known communities; the results are shown in Table 3. As can be seen from Table 3, Q-Walker and DP-Walker perform better on all networks than the other two algorithms, DeepWalk and Node2vec, so we next compare only Q-Walker and DP-Walker. On the Karate network, the NMIs of Q-Walker and DP-Walker are 1 and 0.837, respectively; Q-Walker therefore correctly detects all known communities, and its NMI is 19.47% higher than that of DP-Walker. Similarly, on the Dolphin network, Q-Walker also finds all known communities, and its NMI is 12.48% higher. On the PolBooks network, the NMIs of Q-Walker and DP-Walker are 0.679 and 0.581, respectively, so the NMI of Q-Walker is 16.86% higher than that of DP-Walker. On the Football network, the NMI of Q-Walker is still slightly higher than those of the other three methods, although all methods perform well; compared with DP-Walker, the NMI of Q-Walker is 1.81% higher.
Table 3: NMI over four real networks with known communities
[Table 3 appears as an image in the original publication and is not reproduced here.]
(7) Artificial network analysis
We also evaluated the performance of our method on artificial networks with different community structures; Table 2 shows the details of the 7 artificial networks. The results of the experiment are shown in Fig. 2. As can be seen from Fig. 2, when ⟨k_in⟩ ≤ 10.5, the performance of the Q-Walker method is superior to that of the other three methods. We also note that the ratio by which Q-Walker improves on the NMI of the other three methods is inversely related to ⟨k_in⟩, i.e., the smaller ⟨k_in⟩ is, the higher the improvement ratio. Taking Node2vec as an example, the improvement of Q-Walker exceeds 50% when ⟨k_in⟩ = 8.5 and drops to 0% when ⟨k_in⟩ = 11.5. The reason is as follows. When ⟨k_in⟩ = 8.5, the network has many weak community structures. Because there are many connecting edges between weak communities, a node can easily jump from one weak community to another during a random walk. A vertex sequence s sampled by Node2vec therefore does not describe the weak community structure well, since most of the vertices in s come from different weak communities. For Q-Walker, however, most of the vertices in a sampled sequence s come from the same community, so s has relatively tight internal connections, and Q-Walker can delineate the weak community structure well. When ⟨k_in⟩ = 11.5, the network has many strong community structures. Because the internal connection density of strong communities is high and the connecting edges between them are few, a node spends most of a random walk inside the same community. Thus, the vertex sequences sampled by Node2vec can also delineate a strong community structure well, since most of the vertices in s come from the same community. The same analysis explains the behavior of the other two benchmark methods, DeepWalk and DP-Walker. In summary, Q-Walker performs well not only on networks with weak community structure but also on networks with strong community structure.
(8) Sensitivity of parameters
Finally, we vary the value of the resolution parameter α in the density function f_s = k_in^s / (k_in^s + k_out^s)^α to evaluate the performance of our method. In the experiment, the parameter α ranges over 0.05 ≤ α ≤ 1.5; the results are shown in Fig. 3. As can be seen from Fig. 3, each network has an optimal resolution interval within which the performance of the algorithm is stable and best. Taking the Karate network as an example, for 0.55 ≤ α ≤ 0.7 our algorithm can find all known community structures. In addition, the optimal interval of α differs from network to network: for the Dolphin and PolBooks networks, the optimal intervals are 0.5 ≤ α ≤ 0.8 and 0.05 ≤ α ≤ 0.2, respectively. The difference in the optimal intervals is related to the hierarchy of community structure in the networks; since the community hierarchies of different networks generally differ, the optimal α intervals also differ.
(9) Multi-label classification
Furthermore, we evaluated the performance of our approach on the multi-label classification task. To compare our method with the other three methods, we used the following experimental procedure: we randomly select a portion of the vertices as the training set and use the remaining vertices as the test set; a logistic multi-class model implemented with LibLinear then returns the label with the highest probability. We repeated this procedure 50 times and averaged the Micro-F1 and Macro-F1 scores. We performed the experiment on the BlogCatalog network with the parameter α set to 1. To speed up the training of the multi-label classifier, a small training set is selected on the BlogCatalog network, with a ratio ranging from 1% to 9%. Tables 4 and 5 show the Micro-F1 and Macro-F1 scores on BlogCatalog, respectively; bold numbers indicate the highest score in each column. As can be seen from Tables 4 and 5, the Q-Walker algorithm outperforms the other three methods in both Micro-F1 and Macro-F1.
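A sketch of this evaluation protocol with scikit-learn: a one-vs-rest logistic regression using the liblinear solver, averaged over repeated random splits. Unlike the top-probability rule described above, predict here thresholds each label independently, a common simplification; the helper name evaluate is ours.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

def evaluate(X, Y, train_ratio=0.05, runs=50, seed=0):
    """Average Micro-F1/Macro-F1 over `runs` random splits.

    X: (n, d) vertex embeddings; Y: (n, c) binary label-indicator matrix.
    """
    micro, macro = [], []
    for r in range(runs):
        Xtr, Xte, Ytr, Yte = train_test_split(
            X, Y, train_size=train_ratio, random_state=seed + r)
        clf = OneVsRestClassifier(LogisticRegression(solver="liblinear"))
        clf.fit(Xtr, Ytr)
        pred = clf.predict(Xte)
        micro.append(f1_score(Yte, pred, average="micro"))
        macro.append(f1_score(Yte, pred, average="macro"))
    return np.mean(micro), np.mean(macro)
```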
Table 4: micro F1 score on BlogCatalog
[Table 4 appears as an image in the original publication and is not reproduced here.]
Table 5: macro F1 score on BlogCatalog
[Table 5 appears as an image in the original publication and is not reproduced here.]
The above-described embodiments further illustrate the objects, technical solutions and advantages of the present invention in detail. It should be understood that they are merely exemplary embodiments of the present invention and are not intended to limit the scope of protection of the present invention; any modifications, equivalent substitutions, improvements and the like made within the spirit and principles of the present invention shall be included in the scope of protection of the present invention.

Claims (7)

1. A network representation learning method having a community structure, comprising the steps of:
step 1: data collection and processing stage: using a density function, vertex sequence samples S = {s1, s2, …, sn} are obtained by a random walk strategy on the network G;
step 2: data representation learning stage: the Skip-gram model is optimized and trained on the vertex sequence samples S = {s1, s2, …, sn} to obtain a vector representation of each vertex sequence;
step 3: data calculation stage: similarity calculation is carried out on the vector representations of the vertex sequences to obtain the community division similarity.
2. The network representation learning method with a community structure as claimed in claim 1, wherein in step 1 each vertex sequence in the sample S = {s1, s2, …, sn} is denoted as s = {v1, v2, …, v|s|}.
3. The network representation learning method with a community structure as claimed in claim 2, wherein the density function in step 1 is defined as:

f_s = k_in^s / (k_in^s + k_out^s)^α

where k_in^s and k_out^s are, respectively, the sum of the internal degrees and the sum of the external degrees of all vertices in the vertex sequence s, and α is a resolution parameter used to control the size of the community;

the density function also has a density gain Δf_s^v, which satisfies the following formula:

Δf_s^v = f_(s+{v}) − f_s

where the notation s + {v} denotes the new vertex sequence resulting from moving vertex v into s.
4. The network representation learning method with a community structure as claimed in claim 3, wherein the specific steps of obtaining the vertex sequence samples in step 1 are as follows:
step 11: randomly select a vertex v_(|s|+1) from the set N'(v_|s|);
step 12: calculate Δf_s^(v_(|s|+1)) according to the formula Δf_s^v = f_(s+{v}) − f_s;
step 13: if Δf_s^(v_(|s|+1)) < 0, delete v_(|s|+1) from the set N'(v_|s|) and return to step 11;
step 14: if Δf_s^(v_(|s|+1)) > 0, add v_(|s|+1) to the sequence s and mark v_(|s|+1) as the current vertex;
where the vertex v_|s| is the last vertex added and is taken as the current vertex, and N'(v_|s|) denotes all neighbor vertices of the current vertex v_|s| that are not in s; steps 11 to 14 are repeated until the density of the vertex sequence s can no longer be increased.
5. The network representation learning method with a community structure as claimed in claim 2, wherein the Skip-gram model in step 2 trains the vertex sequence samples by minimizing the following objective function:

min_Φ −Σ_(v_i ∈ V) Σ_(−t ≤ j ≤ t, j ≠ 0) log p(v_(i+j) | v_i)

where t is the window size and v_(i+j) is a context vertex of v_i within the window, and the probability p(v_j | v_i) in the above formula is defined as

p(v_j | v_i) = exp(Φ'(v_j) · Φ(v_i)) / Σ_(u ∈ V) exp(Φ'(u) · Φ(v_i))

where Φ(v) denotes the embedding vector of vertex v, Φ'(v) denotes its context vector, and V denotes the vertex set.
6. The network representation learning method with a community structure as claimed in claim 1, wherein the similarity calculation on the vector representations of the vertex sequences in step 3 specifically comprises: for the vector representation of each vertex sequence in the network, calculating the degree of similarity between it and the vector representations of the other vertex sequences, where the degree of similarity is calculated using the NMI formula.
7. A network representation learning apparatus having a community structure, comprising:
a data collection and processing module for reading vertex sequences to obtain vertex sequence samples S = {s1, s2, …, sn};
a data representation learning module for optimizing the Skip-gram model and training it on the vertex sequence samples S = {s1, s2, …, sn} to obtain a vector representation of each vertex sequence;
and a similarity calculation module for carrying out similarity calculation on the vector representations of the vertex sequences to obtain the community division similarity.
CN202010723330.9A 2020-07-24 2020-07-24 Network representation learning method and device with community structure Pending CN111860866A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010723330.9A CN111860866A (en) 2020-07-24 2020-07-24 Network representation learning method and device with community structure

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010723330.9A CN111860866A (en) 2020-07-24 2020-07-24 Network representation learning method and device with community structure

Publications (1)

Publication Number Publication Date
CN111860866A 2020-10-30

Family

ID=72950884

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010723330.9A Pending CN111860866A (en) 2020-07-24 2020-07-24 Network representation learning method and device with community structure

Country Status (1)

Country Link
CN (1) CN111860866A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116980559A (en) * 2023-06-09 2023-10-31 负熵信息科技(武汉)有限公司 Metropolitan area level video intelligent bayonet planning layout method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140317736A1 (en) * 2013-04-23 2014-10-23 Telefonica Digital Espana, S.L.U. Method and system for detecting fake accounts in online social networks
CN108399189A (en) * 2018-01-23 2018-08-14 重庆邮电大学 Friend recommendation system based on community discovery and its method
CN109615550A (en) * 2018-11-26 2019-04-12 兰州大学 A kind of local corporations' detection method based on similitude
CN110598128A (en) * 2019-09-11 2019-12-20 西安电子科技大学 Community detection method for large-scale network for resisting Sybil attack
US20200142957A1 (en) * 2018-11-02 2020-05-07 Oracle International Corporation Learning property graph representations edge-by-edge

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140317736A1 (en) * 2013-04-23 2014-10-23 Telefonica Digital Espana, S.L.U. Method and system for detecting fake accounts in online social networks
CN108399189A (en) * 2018-01-23 2018-08-14 重庆邮电大学 Friend recommendation system based on community discovery and its method
US20200142957A1 (en) * 2018-11-02 2020-05-07 Oracle International Corporation Learning property graph representations edge-by-edge
CN109615550A (en) * 2018-11-26 2019-04-12 兰州大学 A kind of local corporations' detection method based on similitude
CN110598128A (en) * 2019-09-11 2019-12-20 西安电子科技大学 Community detection method for large-scale network for resisting Sybil attack

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
He Jialin: "Community detection and evolution analysis method in temporal networks", Computer Engineering and Design, vol. 38, no. 8
Wang Huixue: "Community detection method based on node2vec", Computer and Digital Engineering, no. 02

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116980559A (en) * 2023-06-09 2023-10-31 负熵信息科技(武汉)有限公司 Metropolitan area level video intelligent bayonet planning layout method
CN116980559B (en) * 2023-06-09 2024-02-09 负熵信息科技(武汉)有限公司 Metropolitan area level video intelligent bayonet planning layout method


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination