CN111860866A - A network representation learning method and device with community structure - Google Patents

A network representation learning method and device with community structure

Info

Publication number
CN111860866A
Authority
CN
China
Prior art keywords
vertex
network
vertex sequence
community
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010723330.9A
Other languages
Chinese (zh)
Inventor
何嘉林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China West Normal University
Original Assignee
China West Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China West Normal University filed Critical China West Normal University
Priority to CN202010723330.9A priority Critical patent/CN111860866A/en
Publication of CN111860866A publication Critical patent/CN111860866A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a network representation learning method with a community structure, comprising the following steps. Step 1, data collection and processing: using a density function and a random walk strategy on the network G, obtain vertex sequence samples S = {s_1, s_2, ..., s_n}. Step 2, data representation learning: optimize the Skip-gram model and use it to train the vertex sequence samples S = {s_1, s_2, ..., s_n}, obtaining a vector representation for each vertex sequence. Step 3, data calculation: perform a similarity calculation on the vector representation of each vertex sequence to obtain the community-partition similarity. The method captures the community structure of a network more effectively and achieves higher accuracy in vertex classification tasks.

Description

A network representation learning method and device with community structure

Technical Field

The present invention relates to the field of computer technology, and in particular to a network representation learning method and device with a community structure.

Background

Many complex systems can be abstracted as networks, and a network is usually represented by a graph, i.e., a set of nodes and a set of edges. On small-scale networks we can quickly perform many complex tasks, such as community mining and multi-label classification. However, performing these tasks on large-scale networks (for example, networks with billions of vertices) is a challenge. To solve this problem, we must find another concise and effective network representation. Network embedding is one effective strategy: it learns low-dimensional vector representations of the vertices in a network. For each vertex, its structural features in the network are mapped to a low-dimensional vector, and these vectors are then used in complex tasks on the network.

In the past few years, many network embedding methods have been proposed to characterize the local structure of a network. The DeepWalk method characterizes the neighborhood structure of network vertices using a truncated random walk strategy. The Node2vec method shows that DeepWalk cannot capture the diversity of connection patterns in a network; it proposes a biased random walk strategy that combines BFS and DFS ideas to explore vertex neighborhood information. The LINE method is mainly applied to large-scale network embedding; it preserves higher-order vertex neighborhood structure and scales easily to millions of vertices. Cao et al. proposed a deep graph representation model that uses a random surfing strategy to capture the structural information of a graph. Feng et al. proposed the "degree penalty" principle, which preserves the scale-free property by penalizing the proximity between high-degree vertices. Wang et al. proposed a semi-supervised deep model that captures highly nonlinear network structure by optimizing multiple layers of nonlinear functions. Yanardag et al. proposed a general framework for capturing mid-level similar structures. In addition, several methods have been proposed to preserve the global network structure. Wang et al. proposed a modular non-negative matrix factorization model that preserves the community structure of a network. Tu et al. proposed a heuristic community enhancement mechanism that maps community structure information into vertex vector representations. Chen et al. proposed a multi-level network representation learning paradigm that gradually merges the initial network into smaller but structurally similar networks while recapturing the global structure of the initial network.

Summary of the Invention

The technical problem to be solved by the present invention is that the prior-art network embedding methods that characterize the local structure of a network cannot adequately capture the community structure of the network and cannot achieve high accuracy in vertex classification tasks. The object of the invention is to provide a network representation learning method and device with community structure that solves the above problems.

The present invention is realized through the following technical solutions:

A network representation learning method with community structure, comprising the following steps:

Step 1: data collection and processing stage: using a density function and a random walk strategy on the network G, obtain vertex sequence samples S = {s_1, s_2, ..., s_n};

Step 2: data representation learning stage: optimize the Skip-gram model and use it to train the vertex sequence samples S = {s_1, s_2, ..., s_n}, obtaining the vector representation of each vertex sequence;

Step 3: data calculation stage: perform a similarity calculation on the vector representation of each vertex sequence to obtain the community-partition similarity.

Further, in step 1, each vertex sequence in the sample S = {s_1, s_2, ..., s_n} is represented as s = {v_1, v_2, ..., v_|s|}.

Further, the density function in step 1 is defined as

f_s = k_in^s / (k_in^s + k_out^s)^α

where k_in^s and k_out^s are, respectively, the sums of the internal and external degrees of all vertices in the vertex sequence s, and α is a resolution parameter that controls the size of the communities.

The density function also has a density gain Δf_s^v, which satisfies the following formula:

Δf_s^v = f_{s+{v}} - f_s

where the symbol s+{v} denotes the new vertex sequence obtained by adding vertex v to s.

Further, the specific steps for obtaining the vertex sequence samples in step 1 are as follows (a code sketch follows the list):

Step 11: randomly select a vertex v_{|s|+1} from the set N′(v_{|s|});

Step 12: compute Δf_s^{v_{|s|+1}} according to the formula Δf_s^v = f_{s+{v}} - f_s;

Step 13: if Δf_s^{v_{|s|+1}} < 0, remove v_{|s|+1} from the set N′(v_{|s|}) and return to step 11;

Step 14: if Δf_s^{v_{|s|+1}} > 0, add v_{|s|+1} to s and mark v_{|s|+1} as the current vertex;

where v_{|s|} is the vertex added last, which is taken as the current vertex, and N′(v_{|s|}) denotes the set of all neighbors of the current vertex v_{|s|} that are not in s. Steps 11 to 14 are repeated until the density of the vertex sequence s can no longer be increased.
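The following is a minimal Python sketch of this density-guided walk. The graph representation (a dict mapping each vertex to the set of its neighbors), the helper names, and the handling of a zero density gain are our assumptions, not details fixed by the patent:

```python
import random

def density(k_in, k_out, alpha):
    # f_s = k_in^s / (k_in^s + k_out^s)^alpha; 0 for an isolated vertex
    return k_in / (k_in + k_out) ** alpha if (k_in + k_out) > 0 else 0.0

def sequence_degrees(G, members):
    # Sums of internal and external degrees of the vertices in `members`
    k_in = k_out = 0
    for u in members:
        for w in G[u]:
            if w in members:
                k_in += 1
            else:
                k_out += 1
    return k_in, k_out

def density_gain(G, in_s, v, alpha):
    # Delta f_s^v = f_{s+{v}} - f_s
    f_old = density(*sequence_degrees(G, in_s), alpha)
    f_new = density(*sequence_degrees(G, in_s | {v}), alpha)
    return f_new - f_old

def density_walk(G, start, alpha=1.0):
    s, in_s = [start], {start}
    while True:
        candidates = set(G[s[-1]]) - in_s           # N'(v_|s|)
        grew = False
        while candidates:
            v = random.choice(tuple(candidates))    # step 11
            gain = density_gain(G, in_s, v, alpha)  # step 12
            if gain > 0:                            # step 14
                s.append(v)
                in_s.add(v)
                grew = True
                break
            candidates.discard(v)                   # step 13 (we also drop
                                                    # zero-gain vertices so
                                                    # the walk terminates)
        if not grew:
            return s
```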

Further, in step 2 the Skip-gram model trains the vertex sequence samples by minimizing the following objective function:

L = - Σ_{s∈S} Σ_{v_i∈s} Σ_{-t≤j≤t, j≠0} log p(v_{i+j} | v_i)

where t is the window size and v_j denotes a context vertex of v_i within the window. The probability p(v_j | v_i) in the formula above is defined by the softmax

p(v_j | v_i) = exp(Φ′(v_j) · Φ(v_i)) / Σ_{v∈V} exp(Φ′(v) · Φ(v_i))

where Φ(·) denotes the embedding vector, Φ′(·) denotes the context vector, and the denominator sums over all vertices of the network.
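In practice this objective can be optimized with an off-the-shelf Skip-gram implementation. Below is a sketch using gensim's Word2Vec; the placeholder walks, the embedding dimension, and the use of negative sampling to approximate the full softmax are assumptions, not choices stated in the patent:

```python
from gensim.models import Word2Vec

# Each vertex sequence from the density-guided walks is treated as a
# "sentence" whose words are vertex ids (placeholder walks shown here).
vertex_sequences = [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 0, 1]]
walks = [[str(v) for v in seq] for seq in vertex_sequences]

model = Word2Vec(
    sentences=walks,
    vector_size=128,   # embedding dimension (our choice)
    window=10,         # window size t
    sg=1,              # Skip-gram architecture
    hs=0, negative=5,  # negative sampling approximates the full softmax
    min_count=0,
    workers=4,
)
embedding = {v: model.wv[v] for v in model.wv.index_to_key}
```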

Further, the similarity calculation in step 3 specifically comprises: for the vector representation of each vertex sequence in the network, calculating how similar it is to the vector representations of the other vertex sequences, specifically using the NMI formula to calculate the similarity.

A network representation learning device with community structure, comprising:

a data collection and processing module for reading vertex sequence samples and obtaining the vertex sequence samples S = {s_1, s_2, ..., s_n};

a data representation learning module for optimizing the Skip-gram model and using it to train the vertex sequence samples S = {s_1, s_2, ..., s_n}, obtaining the vector representation of each vertex sequence;

a similarity calculation module for performing a similarity calculation on the vector representation of each vertex sequence to obtain the community-partition similarity.

The method of the present invention uses the following NMI formula to calculate similarity:

The Normalized Mutual Information (NMI) measure is an information-theoretic index for quantifying the similarity of two community partitions A and B. NMI is defined as:

NMI(A, B) = -2 Σ_{i=1}^{C_A} Σ_{j=1}^{C_B} C_ij log( C_ij N / (C_i. C_.j) ) / [ Σ_{i=1}^{C_A} C_i. log(C_i. / N) + Σ_{j=1}^{C_B} C_.j log(C_.j / N) ]

where C is the confusion matrix whose rows correspond to the "real" communities and whose columns correspond to the "detected" communities, and N is the number of nodes. C_ij is the number of vertices shared by real community i and detected community j; C_A and C_B are the numbers of real and detected communities, respectively; and C_i. and C_.j denote the sums of the i-th row and the j-th column of C. NMI ranges from 0 to 1: it equals 1 when the detected partition is identical to the real partition, and 0 when the two partitions are independent.
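A small self-contained sketch of this formula in Python follows; representing each partition as a list of per-node labels is our convention. As a cross-check, this matches scikit-learn's normalized_mutual_info_score with its default arithmetic averaging:

```python
import numpy as np
from collections import Counter

def nmi(real, detected):
    # real[i] and detected[i] are the community labels of node i
    n = len(real)
    joint = Counter(zip(real, detected))   # confusion-matrix entries C_ij
    row = Counter(real)                    # row sums C_i.
    col = Counter(detected)                # column sums C_.j
    num = -2.0 * sum(c * np.log(c * n / (row[i] * col[j]))
                     for (i, j), c in joint.items())
    den = (sum(ci * np.log(ci / n) for ci in row.values()) +
           sum(cj * np.log(cj / n) for cj in col.values()))
    return num / den
```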

Compared with the prior art, the present invention has the following advantages and beneficial effects:

The method of the present invention not only captures the community structure of a network more effectively, but also achieves higher accuracy in vertex classification tasks.

Brief Description of the Drawings

The accompanying drawings described here provide a further understanding of the embodiments of the present invention and constitute a part of this application; they do not limit the embodiments of the present invention. In the drawings:

Figure 1 is the overall flow chart of the present invention.

Figure 2 shows the NMI improvement ratios of Q-Walker on the artificial networks, with parameter α = 1.5.

Figure 3 shows the optimal intervals of the parameter α on the four real networks.

Detailed Description

To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the embodiments and the accompanying drawings. The illustrative embodiments and their description are intended only to explain the present invention and do not limit it.

Embodiment

As shown in Figure 1, a network representation learning method with community structure comprises the following steps:

Step 1: data collection and processing stage: using a density function and a random walk strategy on the network G, obtain vertex sequence samples S = {s_1, s_2, ..., s_n};

Step 2: data representation learning stage: optimize the Skip-gram model and use it to train the vertex sequence samples S = {s_1, s_2, ..., s_n}, obtaining the vector representation of each vertex sequence;

Step 3: data calculation stage: perform a similarity calculation on the vector representation of each vertex sequence to obtain the community-partition similarity.

Further, in step 1, each vertex sequence in the sample S = {s_1, s_2, ..., s_n} is represented as s = {v_1, v_2, ..., v_|s|}.

Further, the density function in step 1 is defined as

f_s = k_in^s / (k_in^s + k_out^s)^α

where k_in^s and k_out^s are, respectively, the sums of the internal and external degrees of all vertices in the vertex sequence s, and α is a resolution parameter that controls the size of the communities.

The density function also has a density gain Δf_s^v, which satisfies the following formula:

Δf_s^v = f_{s+{v}} - f_s

where the symbol s+{v} denotes the new vertex sequence obtained by adding vertex v to s.

Further, the specific steps for obtaining the vertex sequence samples in step 1 are:

Step 11: randomly select a vertex v_{|s|+1} from the set N′(v_{|s|});

Step 12: compute Δf_s^{v_{|s|+1}} according to the formula Δf_s^v = f_{s+{v}} - f_s;

Step 13: if Δf_s^{v_{|s|+1}} < 0, remove v_{|s|+1} from the set N′(v_{|s|}) and return to step 11;

Step 14: if Δf_s^{v_{|s|+1}} > 0, add v_{|s|+1} to s and mark v_{|s|+1} as the current vertex;

where v_{|s|} is the vertex added last, which is taken as the current vertex, and N′(v_{|s|}) denotes the set of all neighbors of the current vertex v_{|s|} that are not in s. Steps 11 to 14 are repeated until the density of the vertex sequence s can no longer be increased.

Further, in step 2 the Skip-gram model trains the vertex sequence samples by minimizing the following objective function:

L = - Σ_{s∈S} Σ_{v_i∈s} Σ_{-t≤j≤t, j≠0} log p(v_{i+j} | v_i)

where t is the window size and v_j denotes a context vertex of v_i within the window. The probability p(v_j | v_i) in the formula above is defined by the softmax

p(v_j | v_i) = exp(Φ′(v_j) · Φ(v_i)) / Σ_{v∈V} exp(Φ′(v) · Φ(v_i))

where Φ(·) denotes the embedding vector, Φ′(·) denotes the context vector, and the denominator sums over all vertices of the network.

Further, the similarity calculation in step 3 specifically comprises: for the vector representation of each vertex sequence in the network, calculating how similar it is to the vector representations of the other vertex sequences, specifically using the NMI formula to calculate the similarity.

A network representation learning device with community structure, comprising:

a data collection and processing module for reading vertex sequence samples and obtaining the vertex sequence samples S = {s_1, s_2, ..., s_n};

a data representation learning module for optimizing the Skip-gram model and using it to train the vertex sequence samples S = {s_1, s_2, ..., s_n}, obtaining the vector representation of each vertex sequence;

a similarity calculation module for performing a similarity calculation on the vector representation of each vertex sequence to obtain the community-partition similarity.

In this embodiment, the following NMI formula is used to calculate the similarity:

The Normalized Mutual Information (NMI) measure is an information-theoretic index for quantifying the similarity of two community partitions A and B. NMI is defined as:

NMI(A, B) = -2 Σ_{i=1}^{C_A} Σ_{j=1}^{C_B} C_ij log( C_ij N / (C_i. C_.j) ) / [ Σ_{i=1}^{C_A} C_i. log(C_i. / N) + Σ_{j=1}^{C_B} C_.j log(C_.j / N) ]

where C is the confusion matrix whose rows correspond to the "real" communities and whose columns correspond to the "detected" communities, and N is the number of nodes. C_ij is the number of vertices shared by real community i and detected community j; C_A and C_B are the numbers of real and detected communities, respectively; and C_i. and C_.j denote the sums of the i-th row and the j-th column of C. NMI ranges from 0 to 1: it equals 1 when the detected partition is identical to the real partition, and 0 when the two partitions are independent.

Many classical embedding methods, such as DeepWalk, Node2vec, and DP-Walker, obtain a set of vertex sequence samples S = {s_1, s_2, ..., s_n} by running a random walk strategy on the network G, where each vertex sequence can be represented as s = {v_1, ..., v_|s|}. By treating each vertex sequence as a sentence in a document, the Skip-gram model can be used to learn the vertex representations of the network:

The DeepWalk method uses a uniform distribution p(v_{i+1} | v_i) during the random walk, i.e., each neighbor of v_i is selected with equal probability.
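As a reference point, a uniform walk of this kind is only a few lines of Python; the graph representation (a dict of neighbor sets) and the fixed walk length are our assumptions:

```python
import random

def deepwalk_walk(G, start, length):
    # Uniform (truncated) random walk: each step picks one of the
    # current vertex's neighbors with equal probability.
    walk = [start]
    while len(walk) < length:
        walk.append(random.choice(tuple(G[walk[-1]])))
    return walk
```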

The Node2vec method uses a biased probability p(v_{i+1} | v_i) during the random walk. Given the previously visited vertex v_{i-1}, the (unnormalized) bias is

p(v_{i+1} | v_i) ∝ 1/p if d_{i-1,i+1} = 0; 1 if d_{i-1,i+1} = 1; 1/q if d_{i-1,i+1} = 2,   (3)

where d_{i-1,i+1} denotes the shortest-path distance between vertex v_{i-1} and vertex v_{i+1}. In equation (3), the parameters p and q control the relative weight of the breadth-first and depth-first search strategies during the random walk.
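A sketch of this bias as code may make the three cases concrete; the function names and the sampling helper are ours, not taken from the patent:

```python
import random

def node2vec_bias(G, prev, nxt, p, q):
    # Unnormalized weight of stepping to `nxt`, given the previously
    # visited vertex `prev` (the three cases of equation (3)).
    if nxt == prev:        # d(prev, nxt) = 0: step back
        return 1.0 / p
    if nxt in G[prev]:     # d(prev, nxt) = 1: neighbor of prev
        return 1.0
    return 1.0 / q         # d(prev, nxt) = 2: move outward

def node2vec_step(G, prev, cur, p, q):
    # Sample the next vertex proportionally to the bias weights.
    neighbors = tuple(G[cur])
    weights = [node2vec_bias(G, prev, n, p, q) for n in neighbors]
    return random.choices(neighbors, weights=weights, k=1)[0]
```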

For DP-Walker, the probability p(v_{i+1} | v_i) is defined as

(equation provided as an image in the original publication)

where k_i is the degree of vertex v_i, C_{i,i+1} is the number of common neighbors of v_i and v_{i+1}, and β is a model parameter.

The mean-shift clustering method is a nonparametric clustering procedure. Unlike the classical k-means method, it requires no assumption about the shape of the distribution or the number of clusters. Given n data points x_i ∈ R^d (i = 1, ..., n), the multivariate kernel density estimate based on a radially symmetric kernel K(x) is given by

f(x) = (1 / (n h^d)) Σ_{i=1}^{n} K((x - x_i) / h)

where h is the kernel bandwidth (radius). For each data point x_i, a gradient-ascent procedure is run on the locally estimated density until convergence; all data points associated with the same mode (center point) belong to the same cluster.
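In an implementation, this step can be delegated to scikit-learn's MeanShift estimator; the placeholder data below stands in for the learned Skip-gram vectors, and leaving the bandwidth to be estimated automatically is our assumption:

```python
import numpy as np
from sklearn.cluster import MeanShift

# X: one learned embedding vector per vertex; random placeholder here,
# while in the method X stacks the Skip-gram vectors learned above.
rng = np.random.default_rng(0)
X = rng.normal(size=(128, 16))   # 128 vertices, 16-dimensional embeddings

ms = MeanShift(bandwidth=None)   # bandwidth=None: estimate h from the data
labels = ms.fit_predict(X)       # labels[i] = detected community of vertex i
```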

The experimental analysis is as follows:

(1) Real networks

In the community mining experiments, we use four real networks: the Karate, Football, Dolphin, and PolBooks networks. Table 1 lists the details of the four networks, including the number of nodes (|V|), the number of edges (|E|), the average degree (<k>), the mean-square degree (<k^2>), the average clustering coefficient (cc), and the number of real communities (nc).

Table 1: Statistics of the four networks with known real communities

(table provided as an image in the original publication)

(2) Artificial networks

In the community mining experiments, we further use artificial networks to evaluate the performance of our method. The planted partition model is a classic generator of artificial benchmark networks. The model generates a network with n = g·z vertices, where z is the number of communities and g is the number of vertices per community. Within the same community, any two vertices are connected by an edge with probability p_in, while between different communities any two vertices are connected with probability p_out. The average degree of each vertex is <k> = p_in(g - 1) + p_out·g(z - 1). If p_in ≥ p_out, the network has community structure, because the edge density within communities is greater than the edge density between communities. In the present invention, we adopt the special case of the planted l-partition model proposed by Girvan and Newman, with z = 4, g = 32, and <k> = 16. Table 2 shows the 7 artificial networks; the larger the internal average degree <k_in> of the vertices, the stronger the community structure (a generation sketch is given after Table 2).

Table 2: Statistics of the 7 artificial networks with different community structures

(table provided as an image in the original publication)
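Benchmarks of this kind can be generated directly with networkx; the choice <k_in> = 12 below is only an illustrative value (the actual per-network values sit in the table image above), and deriving p_in and p_out from <k> and <k_in> follows the formulas just given:

```python
import networkx as nx

z, g, k_avg = 4, 32, 16   # Girvan-Newman setting: 4 communities of 32 vertices
k_in = 12.0               # internal average degree (illustrative value)
p_in = k_in / (g - 1)                    # <k_in> = p_in * (g - 1)
p_out = (k_avg - k_in) / (g * (z - 1))   # <k> - <k_in> = p_out * g * (z - 1)
G = nx.planted_partition_graph(l=z, k=g, p_in=p_in, p_out=p_out, seed=42)
```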

(3) Benchmark methods

We compare our method (Q-Walker) with three network embedding methods: DeepWalk, Node2vec, and DP-Walker.

(4) Community detection

After learning the embedding vector representation of each node, we use the mean-shift clustering algorithm to mine communities.

(5) Accuracy metric

The Normalized Mutual Information (NMI) measure is an information-theoretic index for quantifying the similarity of two community partitions A and B. NMI is defined as:

NMI(A, B) = -2 Σ_{i=1}^{C_A} Σ_{j=1}^{C_B} C_ij log( C_ij N / (C_i. C_.j) ) / [ Σ_{i=1}^{C_A} C_i. log(C_i. / N) + Σ_{j=1}^{C_B} C_.j log(C_.j / N) ]

where C is the confusion matrix whose rows correspond to the "real" communities and whose columns correspond to the "detected" communities, and N is the number of nodes. C_ij is the number of vertices shared by real community i and detected community j; C_A and C_B are the numbers of real and detected communities, respectively; and C_i. and C_.j denote the sums of the i-th row and the j-th column of C. NMI ranges from 0 to 1: it equals 1 when the detected partition is identical to the real partition, and 0 when the two partitions are independent.

(6) Real-network analysis

We first evaluate the performance of Q-Walker on the four real networks with known communities; the results are shown in Table 3. As Table 3 shows, Q-Walker and DP-Walker outperform the other two algorithms, DeepWalk and Node2vec, on all networks; we therefore compare only Q-Walker and DP-Walker in what follows. On the Karate network, the NMI of Q-Walker and DP-Walker is 1 and 0.837, respectively: Q-Walker correctly detects all known communities, and its NMI is 19.47% higher than that of DP-Walker. Likewise, on the Dolphin network Q-Walker also finds all known communities, with an NMI improvement of 12.48%. On the PolBooks network, the NMI of Q-Walker and DP-Walker is 0.679 and 0.581, respectively, an improvement of 16.86%. On the Football network, although all methods perform fairly well, the NMI of Q-Walker is still slightly higher than that of the other three methods, an improvement of 1.81% over DP-Walker.

Table 3: NMI on the four real networks with known communities

(table provided as an image in the original publication)

(7) Artificial-network analysis

We also evaluate the performance of our method on artificial networks with different community structures; Table 2 gives the details of the 7 artificial networks, and the experimental results are shown in Figure 2. As Figure 2 shows, when <k_in> ≤ 10.5 the Q-Walker method outperforms the other three methods. We also note that the improvement of Q-Walker over the NMI of the other three methods is inversely related to <k_in>: the smaller <k_in>, the larger the improvement. Taking Node2vec as an example, when <k_in> = 8.5 the improvement of Q-Walker exceeds 50%, while when <k_in> = 11.5 the improvement is 0%. The reason is as follows. When <k_in> = 8.5, the network contains many weak community structures. Because there are many edges between weak communities, a node can easily jump from one weak community to another during a random walk, so a vertex sequence s sampled by Node2vec does not characterize the weak community structure well: most vertices in s come from different weak communities. For Q-Walker, by contrast, most vertices in a sequence s come from the same community, so s has relatively dense internal connections, and Q-Walker characterizes weak community structure well. When <k_in> = 11.5, the network contains many strong community structures. Because the internal connection density of strong communities is high and there are few edges between them, a node spends most of its time walking within the same community; a vertex sequence s sampled by Node2vec then characterizes the strong community structure well, since most vertices in s come from the same community. The same analysis applies to the other two benchmark methods, DeepWalk and DP-Walker. In summary, Q-Walker performs well not only on networks with weak community structure but also on networks with strong community structure.

(8) Parameter sensitivity

Finally, we evaluate the performance of our method by varying the value of the parameter α in the density function f_s = k_in^s / (k_in^s + k_out^s)^α. In the experiments, α ranges over 0.05 ≤ α ≤ 1.5; the results are shown in Figure 3. As Figure 3 shows, each network has an optimal resolution interval within which the performance of the algorithm is stable and best. Taking the Karate network as an example, for 0.55 ≤ α ≤ 0.7 our algorithm discovers all known community structures. Moreover, the optimal interval of α differs from network to network: for the Dolphin and PolBooks networks, the optimal intervals are 0.5 ≤ α ≤ 0.8 and 0.05 ≤ α ≤ 0.2, respectively. These differences are related to the hierarchical community structure of the network; since different networks generally have different community hierarchies, their optimal intervals of α should also differ.

(9) Multi-label classification

In addition, we evaluate the performance of our method on a multi-label classification task. To make our method comparable to the other three methods, we use the following experimental procedure. Specifically, we randomly select a portion of the vertices as the training set and use the remaining vertices as the test set; a logistic regression multi-class model implemented with LibLinear then returns the labels with the highest probabilities. We repeat this procedure 50 times and average the Micro-F1 and Macro-F1 scores. The experiment is conducted on the BlogCatalog network with parameter α = 1. To speed up the training of the multi-label classifier, we use small training sets on BlogCatalog, with ratios from 1% to 9%. Tables 4 and 5 show the Micro-F1 and Macro-F1 scores on BlogCatalog, respectively; bold numbers indicate the highest score in each column. As Tables 4 and 5 show, the Q-Walker algorithm outperforms the other three methods on both Micro-F1 and Macro-F1. A sketch of this evaluation procedure follows the tables.

Table 4: Micro-F1 scores on BlogCatalog

(table provided as an image in the original publication)

Table 5: Macro-F1 scores on BlogCatalog

(table provided as an image in the original publication)
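The following Python sketch mirrors the procedure described above with scikit-learn, whose liblinear solver wraps the LibLinear library; the one-vs-rest wrapper and the multilabel indicator format for y are our assumptions about details the text leaves open:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

def evaluate(X, y, train_ratio, runs=50, seed=0):
    # X: (n, d) embedding matrix; y: (n, L) multilabel indicator matrix.
    # Returns Micro-F1 and Macro-F1 averaged over `runs` random splits.
    micro, macro = [], []
    for r in range(runs):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, train_size=train_ratio, random_state=seed + r)
        clf = OneVsRestClassifier(LogisticRegression(solver="liblinear"))
        y_pred = clf.fit(X_tr, y_tr).predict(X_te)
        micro.append(f1_score(y_te, y_pred, average="micro"))
        macro.append(f1_score(y_te, y_pred, average="macro"))
    return np.mean(micro), np.mean(macro)
```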

The specific embodiments described above further explain the objectives, technical solutions, and beneficial effects of the present invention in detail. It should be understood that the above are only specific embodiments of the present invention and are not intended to limit its scope of protection; any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (7)

1. A network representation learning method with community structure, characterized in that it comprises the following steps:

Step 1: data collection and processing stage: using a density function and a random walk strategy on the network G, obtain vertex sequence samples S = {s_1, s_2, ..., s_n};

Step 2: data representation learning stage: optimize the Skip-gram model and use it to train the vertex sequence samples S = {s_1, s_2, ..., s_n}, obtaining the vector representation of each vertex sequence;

Step 3: data calculation stage: perform a similarity calculation on the vector representation of each vertex sequence to obtain the community-partition similarity.

2. The network representation learning method with community structure according to claim 1, characterized in that, in step 1, each vertex sequence in the sample S = {s_1, s_2, ..., s_n} is represented as s = {v_1, v_2, ..., v_|s|}.

3. The network representation learning method with community structure according to claim 2, characterized in that the density function in step 1 is defined as

f_s = k_in^s / (k_in^s + k_out^s)^α

where k_in^s and k_out^s are, respectively, the sums of the internal and external degrees of all vertices in the vertex sequence s, and α is a resolution parameter that controls the size of the communities; the density function also has a density gain Δf_s^v, which satisfies the formula

Δf_s^v = f_{s+{v}} - f_s

where the symbol s+{v} denotes the new vertex sequence obtained by adding vertex v to s.

4. The network representation learning method with community structure according to claim 3, characterized in that the specific steps of obtaining the vertex sequence samples in step 1 are:

Step 11: randomly select a vertex v_{|s|+1} from the set N′(v_{|s|});

Step 12: compute Δf_s^{v_{|s|+1}} according to the formula Δf_s^v = f_{s+{v}} - f_s;

Step 13: if Δf_s^{v_{|s|+1}} < 0, remove v_{|s|+1} from the set N′(v_{|s|}) and return to step 11;

Step 14: if Δf_s^{v_{|s|+1}} > 0, add v_{|s|+1} to s and mark v_{|s|+1} as the current vertex;

where v_{|s|} is the vertex added last, which is taken as the current vertex, and N′(v_{|s|}) denotes the set of all neighbors of the current vertex v_{|s|} that are not in s; steps 11 to 14 are repeated until the density of the vertex sequence s can no longer be increased.

5. The network representation learning method with community structure according to claim 2, characterized in that, in step 2, the Skip-gram model trains the vertex sequence samples by minimizing the following objective function:

L = - Σ_{s∈S} Σ_{v_i∈s} Σ_{-t≤j≤t, j≠0} log p(v_{i+j} | v_i)

where t is the window size and v_j denotes a context vertex of v_i within the window; the probability p(v_j | v_i) in the formula above is defined as

p(v_j | v_i) = exp(Φ′(v_j) · Φ(v_i)) / Σ_{v∈V} exp(Φ′(v) · Φ(v_i))

where Φ(·) denotes the embedding vector and Φ′(·) denotes the context vector.

6. The network representation learning method with community structure according to claim 1, characterized in that the similarity calculation in step 3 specifically comprises: for the vector representation of each vertex sequence in the network, calculating how similar it is to the vector representations of the other vertex sequences, specifically using the NMI formula to calculate the similarity.

7. A network representation learning device with community structure, characterized in that it comprises:

a data collection and processing module for reading vertex sequence samples and obtaining the vertex sequence samples S = {s_1, s_2, ..., s_n};

a data representation learning module for optimizing the Skip-gram model and using it to train the vertex sequence samples S = {s_1, s_2, ..., s_n}, obtaining the vector representation of each vertex sequence;

a similarity calculation module for performing a similarity calculation on the vector representation of each vertex sequence to obtain the community-partition similarity.
CN202010723330.9A 2020-07-24 2020-07-24 A network representation learning method and device with community structure Pending CN111860866A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010723330.9A CN111860866A (en) 2020-07-24 2020-07-24 A network representation learning method and device with community structure

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010723330.9A CN111860866A (en) 2020-07-24 2020-07-24 A network representation learning method and device with community structure

Publications (1)

Publication Number Publication Date
CN111860866A true CN111860866A (en) 2020-10-30

Family

ID=72950884

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010723330.9A Pending CN111860866A (en) 2020-07-24 2020-07-24 A network representation learning method and device with community structure

Country Status (1)

Country Link
CN (1) CN111860866A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115422445A (en) * 2022-08-22 2022-12-02 深圳大学 Network representation learning method based on hierarchical negative sampling
CN116980559A (en) * 2023-06-09 2023-10-31 负熵信息科技(武汉)有限公司 Metropolitan area level video intelligent bayonet planning layout method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140317736A1 (en) * 2013-04-23 2014-10-23 Telefonica Digital Espana, S.L.U. Method and system for detecting fake accounts in online social networks
CN108399189A (en) * 2018-01-23 2018-08-14 重庆邮电大学 Friend recommendation system based on community discovery and its method
CN109615550A (en) * 2018-11-26 2019-04-12 兰州大学 A Similarity-Based Local Community Detection Method
CN110598128A (en) * 2019-09-11 2019-12-20 西安电子科技大学 A Community Detection Method for Large-Scale Networks Against Sybil Attacks
US20200142957A1 (en) * 2018-11-02 2020-05-07 Oracle International Corporation Learning property graph representations edge-by-edge

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140317736A1 (en) * 2013-04-23 2014-10-23 Telefonica Digital Espana, S.L.U. Method and system for detecting fake accounts in online social networks
CN108399189A (en) * 2018-01-23 2018-08-14 重庆邮电大学 Friend recommendation system based on community discovery and its method
US20200142957A1 (en) * 2018-11-02 2020-05-07 Oracle International Corporation Learning property graph representations edge-by-edge
CN109615550A (en) * 2018-11-26 2019-04-12 兰州大学 A Similarity-Based Local Community Detection Method
CN110598128A (en) * 2019-09-11 2019-12-20 西安电子科技大学 A Community Detection Method for Large-Scale Networks Against Sybil Attacks

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
何嘉林 [He Jialin]: "Community detection and evolution analysis methods in temporal networks", Computer Engineering and Design (计算机工程与设计), vol. 38, no. 8
王慧雪 [Wang Huixue]: "A community detection method based on node2vec", Computer and Digital Engineering (计算机与数字工程), no. 02

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115422445A (en) * 2022-08-22 2022-12-02 深圳大学 Network representation learning method based on hierarchical negative sampling
CN115422445B (en) * 2022-08-22 2024-11-26 深圳大学 A network representation learning method based on layered negative sampling
CN116980559A (en) * 2023-06-09 2023-10-31 负熵信息科技(武汉)有限公司 Metropolitan area level video intelligent bayonet planning layout method
CN116980559B (en) * 2023-06-09 2024-02-09 负熵信息科技(武汉)有限公司 Metropolitan area level video intelligent bayonet planning layout method

Similar Documents

Publication Publication Date Title
CN108228716B (en) SMOTE _ Bagging integrated sewage treatment fault diagnosis method based on weighted extreme learning machine
CN110378366B (en) A Cross-Domain Image Classification Method Based on Coupling Knowledge Transfer
Wu et al. Attribute weighting via differential evolution algorithm for attribute weighted naive bayes (wnb)
Wu et al. AutoCTS+: Joint neural architecture and hyperparameter search for correlated time series forecasting
Liang et al. Review–a survey of learning from noisy labels
CN109086412A (en) A kind of unbalanced data classification method based on adaptive weighted Bagging-GBDT
CN104731882B (en) A kind of adaptive querying method that weighting sequence is encoded based on Hash
De Sousa et al. Evaluating and comparing the igraph community detection algorithms
CN108734223A (en) The social networks friend recommendation method divided based on community
CN103455612B (en) Based on two-stage policy non-overlapped with overlapping network community detection method
CN111539448A (en) A meta-learning-based method for few-shot image classification
Zhang et al. Normalized modularity optimization method for community identification with degree adjustment
CN111860866A (en) A network representation learning method and device with community structure
CN114329031A (en) Fine-grained bird image retrieval method based on graph neural network and deep hash
Jabbour et al. Triangle-driven community detection in large graphs using propositional satisfiability
Ye et al. SAME: Uncovering GNN black box with structure-aware shapley-based multipiece explanations
CN107451617A (en) One kind figure transduction semisupervised classification method
CN109543114A (en) Heterogeneous Information network linking prediction technique, readable storage medium storing program for executing and terminal
CN115086179B (en) Detection method for community structure in social network
CN118038167A (en) A small sample image classification method based on metric meta-learning
Wang et al. Community focusing: yet another query-dependent community detection
Jokar et al. Discovering community structure in social networks based on the synergy of label propagation and simulated annealing
Qiao et al. Genetic feature fusion for object skeleton detection
CN109522954A (en) Heterogeneous Information network linking prediction meanss
Liu et al. Deep pyramidal residual networks for plankton image classification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20201030