WO2016090877A1 - Generalized maximum-degree random walk graph sampling algorithm - Google Patents

Generalized maximum-degree random walk graph sampling algorithm

Info

Publication number
WO2016090877A1
Authority
WO
WIPO (PCT)
Prior art keywords
node
algorithm
sample
random walk
degree
Prior art date
Application number
PCT/CN2015/081147
Other languages
English (en)
French (fr)
Inventor
李荣华
邱宇轩
毛睿
秦璐
金檀
蔡涛涛
Original Assignee
深圳大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳大学
Publication of WO2016090877A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists

Definitions

  • the invention belongs to the technical field of large-graph data mining, and in particular relates to a generalized maximum-degree random walk graph sampling algorithm.
  • graph-traversal-based methods mainly use breadth-first search (BFS) or depth-first search (DFS) to collect nodes.
  • BFS: breadth-first search
  • DFS: depth-first search
  • the main disadvantage of this type of method is that, while collecting nodes, the algorithm is biased towards higher-degree nodes, which does not match the goal of a uniform node sample.
  • the size of this bias towards high-degree nodes cannot be characterized theoretically, so it is difficult to correct, and a uniform node sample cannot be obtained.
  • random-walk-based algorithms overcome the defects of graph-traversal-based algorithms. They can directly generate unbiased node samples, or generate biased node samples whose bias is known. Such algorithms are therefore widely used in graph sampling.
  • RW: re-weighted random walk
  • MD: maximum-degree random walk
  • n = |V| represents the number of nodes, and m = |E| the number of edges
  • N(u) is the set of all neighboring nodes of node u ∈ V
  • d_u = |N(u)| is the degree of node u
  • f: V → R is a real-valued function defined on the node set V, representing the value of a certain property of the node u, such as the degree of the node, or a certain attribute value of the node.
  • the goal is to estimate the average of the f(u) values of all nodes in the entire network.
  • both the RW and MD algorithms can produce an unbiased estimate of this average.
  • the RW algorithm uses a re-weighting strategy. Specifically, it uses a weighted estimator (S represents the set of collected sample nodes, and w_rw(u) ∝ 1/d_u is the weight of node u, where ∝ denotes proportionality). This estimator can be explained by the importance sampling (IS) framework.
  • the IS framework collects sample nodes from a trial distribution that is relatively easy to sample from, in place of the target distribution, and then uses importance weighting to construct an unbiased estimate.
  • the target distribution is the uniform distribution π_u and the trial distribution is π_rw.
  • the estimation accuracy of a sampling algorithm based on the IS framework depends on the chi-square distance between the trial distribution and the target distribution. The larger the chi-square distance between the two, the worse the estimation accuracy of the sampling algorithm.
  • the chi-square distance is defined as follows: let p and q be the trial distribution and the target distribution, respectively; then the chi-square distance between p and q is var_p(q(X)/p(X)), where var denotes the variance.
  • the MD algorithm is an unbiased graph sampling algorithm that collects nodes by a random walk on a dynamically constructed regular graph, and it can directly obtain uniform node samples.
  • the principle is that, by adding self-loops to the nodes of the original graph, the degree of each node is made equal to the maximum degree of the graph, and a regular graph (a graph in which all node degrees are equal) is generated.
  • when the random walk reaches node u, it moves to each node in the neighbor set N(u) of u with probability 1/d_max, where d_max represents the maximum degree of the graph (the degree of the node with the largest degree).
  • the algorithm stays on the original node u with probability (d_max - d_u)/d_max.
  • the trial distribution π_rw of the RW algorithm is proportional to the degree of the node, and the target distribution is the uniform distribution π_u.
  • the node degrees of a network are often not uniform but long-tailed. Therefore, in many applications, the trial distribution π_rw of the RW algorithm deviates greatly from the target distribution π_u.
  • the effectiveness of the RW algorithm depends on the closeness of π_rw and π_u. Therefore, in a real network, the RW algorithm tends to have a large deviation.
  • the MD algorithm is capable of producing uniform samples, so it avoids the "large deviation problem" of the RW algorithm. But it introduces self-loops, which produce many repeated samples, and this is especially severe on nodes with smaller degrees. Too many repeated samples usually lead to a large estimation variance, which reduces the estimation accuracy of the algorithm. This defect of the MD algorithm is called the "repeated samples problem".
  • the maximum degree of a node is usually unknown. The usual workaround is to set the maximum degree to a very large constant, so that the constant is guaranteed to exceed the true maximum degree. This leads to more self-loops, which aggravates the "repeated samples problem".
  • the invention provides a generalized maximum-degree random walk graph sampling algorithm that effectively balances the "large deviation problem" of the RW algorithm and the "repeated samples problem" of the MD algorithm, thereby improving the overall efficiency of collecting sample points from a network.
  • a generalized maximum-degree random walk graph sampling algorithm, comprising the following steps:
  • S_i represents the i-th node collected by the algorithm
  • ξ_i represents the number of repetitions of the sample S_i.
  • d_u represents the degree of node u, and C is a non-negative integer.
  • the above generalized maximum-degree random walk algorithm, referred to as the GMD algorithm, addresses the problem of extracting uniform samples from a "hidden" online social network: it balances the "large deviation problem" of the RW algorithm and the "repeated samples problem" of the MD algorithm, and can replace the widely used RW and MD algorithms for sampling online social networks.
  • FIG. 1 is a schematic diagram of the graph on which random-walk sample collection is to be performed.
  • the invention provides a new generalized maximum-degree random walk algorithm, hereinafter referred to as the GMD algorithm.
  • the GMD algorithm introduces a parameter C (a non-negative integer) on top of the MD algorithm to control the number of self-loops.
  • the GMD algorithm includes two steps: first, collecting samples by a random walk on the graph using the transition probability; second, constructing an unbiased estimate from the collected samples.
  • in the first step, the node u is taken as S_i and added to the set S;
  • the geometric random variable ξ_i in the random walk represents the number of repetitions of the sample S_i.
  • the GMD algorithm adds fewer self-loops to each graph node than the MD algorithm, so it can alleviate the "repeated samples problem" of the MD algorithm to some extent. The GMD algorithm also solves the problem of the unknown maximum degree in the MD algorithm. In addition, it can be proved that the chi-square distance between the trial distribution of the GMD algorithm and the target (uniform) distribution is smaller than the chi-square distance between the trial distribution of the RW algorithm and the target distribution. The GMD algorithm can therefore also alleviate the "large deviation problem" of the RW algorithm to some extent.
  • π_gmd(v)/π_gmd(u) = max{d_v, C}/max{d_u, C}.
  • Output: a set S containing 2 sample points
  • a node v is selected uniformly at random from the neighbors of v4 and used as the initial node for the next step.

Abstract

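A generalized maximum-degree random walk graph sampling algorithm, comprising: collecting samples by a random walk on a graph; and constructing an unbiased estimate from the collected samples. The algorithm effectively balances the "large deviation problem" of the RW algorithm and the "repeated samples problem" of the MD algorithm, thereby improving the overall efficiency of collecting sample points from a network.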

Description

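Generalized maximum-degree random walk graph sampling algorithm
Technical Field
The invention belongs to the technical field of large-graph data mining, and in particular relates to a generalized maximum-degree random walk graph sampling algorithm.
Background Art
In recent years, online social network analysis has attracted wide attention in both academia and industry. Among all research on online social network analysis, one of the most fundamental problems is estimating the properties of nodes in a social network and the topological characteristics of the network as a whole. However, many online social network companies, such as Tencent, Sina Weibo, Facebook, and Twitter, do not release their social graph data to third parties, and even the size of the whole social graph is often unknown to third parties. Researchers and developers working on social network analysis therefore face a very difficult data collection problem. The main difficulty is how to design and develop a simple method for extracting a uniform sample of graph nodes from a social network that is "invisible to the researcher".
To solve this problem, many crawler-based network sampling methods have been proposed and are widely used in academia. These methods fall into two broad categories: methods based on graph traversal and methods based on random walks. Graph-traversal-based methods mainly use breadth-first search (BFS) or depth-first search (DFS) to collect nodes. The main drawback of this category is that, while collecting nodes, the algorithm is biased towards high-degree nodes, which conflicts with the goal of obtaining a uniform node sample. Moreover, the size of this bias cannot be characterized theoretically, so it is hard to correct and a uniform node sample cannot be obtained. Such algorithms are gradually being abandoned by academia and industry. Random-walk-based algorithms overcome these defects: they either generate unbiased node samples directly or generate biased samples whose bias is known, so they are popular in graph sampling. There are currently two very popular random-walk-based graph sampling algorithms. The first is the re-weighted random walk algorithm, called the RW algorithm; the second is the maximum-degree random walk algorithm, called the MD algorithm. Both are briefly introduced below.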
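Abstract the network as a graph G=(V,E), where n=|V| is the number of nodes and m=|E| is the number of edges. Let N(u) be the set of all neighbors of node u∈V, and let d_u=|N(u)| denote the degree of node u. Let f:V→R be a real-valued function defined on the node set V, representing the value of some property of node u, such as its degree or one of its attribute values. In the problem of estimating network properties, the goal is to estimate the average of f(u) over all nodes in the network, denoted
Figure PCTCN2015081147-appb-000001
Here π_u=[1/n,...,1/n] denotes the uniform distribution. For example, if f(u)=d_u, then
Figure PCTCN2015081147-appb-000002
is the average node degree of graph G. If we define
Figure PCTCN2015081147-appb-000003
then
Figure PCTCN2015081147-appb-000004
is the degree distribution of the nodes of graph G, where
Figure PCTCN2015081147-appb-000005
is an indicator function: if d_u=d, then
Figure PCTCN2015081147-appb-000006
otherwise
Figure PCTCN2015081147-appb-000007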
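In the existing literature, both the RW and MD algorithms can produce an unbiased estimate of
Figure PCTCN2015081147-appb-000008
The RW algorithm performs a random walk on the graph to collect node samples. It is well known that node samples collected by a random walk on an aperiodic, connected, undirected graph are not uniformly distributed. According to the theory of the stationary distribution of random walks, the probability that a node is selected is proportional to its degree; that is, for u∈V, π_rw(u)=d_u/2m, where π_rw denotes the stationary distribution of the random walk. Under the random-walk sampling strategy, nodes are therefore collected with different probabilities: a high-degree node is more likely to be collected than a low-degree node, i.e. the random walk is biased towards high-degree nodes. To correct this bias, the RW algorithm adopts a re-weighting strategy. Specifically, the RW algorithm uses the estimator
Figure PCTCN2015081147-appb-000009
(S denotes the set of collected sample nodes, and w_rw(u)∝1/d_u is the weight of node u, where ∝ denotes proportionality) to estimate
Figure PCTCN2015081147-appb-000010
This estimator can be explained by the importance sampling (IS) framework. Specifically, the IS framework collects sample nodes from a trial distribution that is relatively easy to sample from, in place of the target distribution, and then uses importance weighting to construct an unbiased estimate. In the RW algorithm, the target distribution is the uniform distribution π_u and the trial distribution is π_rw. Under the IS framework, the importance weight of node u is w_rw(u)=π_u(u)/π_rw(u)=2m/(n d_u)∝1/d_u. Under the IS framework we therefore obtain the estimator
Figure PCTCN2015081147-appb-000011
and it can be proved theoretically that
Figure PCTCN2015081147-appb-000012
is asymptotically unbiased. That is, as n→∞,
Figure PCTCN2015081147-appb-000013
The variance of
Figure PCTCN2015081147-appb-000014
depends on the variance of f(u)w_rw(u). When f(u) is independent of w_rw(u)=π_u(u)/π_rw(u), the variance of
Figure PCTCN2015081147-appb-000015
depends only on how close π_u(u) and π_rw(u) are. According to "Liu's rule", the estimation accuracy of a sampling algorithm based on the IS framework depends on the chi-square distance between the trial distribution and the target distribution: the larger the chi-square distance, the worse the estimation accuracy. The chi-square distance is defined as follows: let p and q be the trial distribution and the target distribution, respectively; then the chi-square distance between p and q is var_p(q(X)/p(X)), where var denotes the variance. The MD algorithm is an unbiased graph sampling algorithm that collects nodes by a random walk on a dynamically constructed regular graph, and it directly yields uniform node samples. Its principle is to add self-loops to the nodes of the original graph so that every node's degree equals the maximum degree of the graph, producing a regular graph (a graph in which all nodes have the same degree). When the random walk reaches node u, it moves to each node in the neighbor set N(u) of u with probability 1/d_max, where d_max denotes the maximum degree of the graph (the degree of the node with the largest degree). Consequently, the walk stays at node u with probability (d_max-d_u)/d_max. Under the importance sampling (IS) framework, the trial distribution π_md of the MD algorithm coincides with the target distribution π_u=[1/n,...,1/n]. The MD algorithm can therefore use the sample mean directly to estimate
Figure PCTCN2015081147-appb-000016
and this estimate is also asymptotically unbiased.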
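The re-weighting step of the RW algorithm can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `rw_estimate`, the toy star graph, and the hand-written walk are hypothetical and do not come from the patent; the weights follow w_rw(u)∝1/d_u as described above.

```python
def rw_estimate(samples, degree, f):
    """Re-weighted estimate of the mean of f over all nodes.

    samples: nodes visited by a simple random walk (repeats allowed)
    degree:  dict mapping node -> degree d_u (assumed > 0)
    f:       function mapping a node to a real value
    """
    # The weight 1/d_u cancels the degree-proportional bias of the walk.
    num = sum(f(u) / degree[u] for u in samples)
    den = sum(1.0 / degree[u] for u in samples)
    return num / den


# Hypothetical toy data: a star with centre "c" and leaves "a", "b", "x".
degree = {"c": 3, "a": 1, "b": 1, "x": 1}
walk = ["c", "a", "c", "b", "c", "x"]  # one possible walk on the star
avg_degree = rw_estimate(walk, degree, lambda u: degree[u])
```

On this toy walk the plain sample mean of the degrees would be 2.0, while the re-weighted value matches the true average degree (3+1+1+1)/4 of the star.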
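In the algorithms above, under the IS framework, the trial distribution π_rw of the RW algorithm is proportional to node degree, while the target distribution is the uniform distribution π_u. In many real social networks, node degrees are far from uniform and exhibit a long tail. In many applications, therefore, the trial distribution π_rw deviates greatly from the target distribution π_u. According to "Liu's rule", the effectiveness of the RW algorithm depends on how close π_rw and π_u are. In real networks the RW algorithm therefore tends to produce large deviations; this is called the "large deviation problem". The MD algorithm produces uniform samples and so avoids the "large deviation problem" of the RW algorithm. However, it introduces self-loops and therefore produces many repeated samples, which is especially severe at low-degree nodes. Too many repeated samples usually lead to a large estimation variance, which reduces estimation accuracy; this defect of the MD algorithm is called the "repeated samples problem". In addition, in many real networks the maximum node degree is generally unknown. The usual workaround is to set the maximum degree to a very large constant, so that the constant is guaranteed to exceed the true maximum degree. Clearly, this leads to even more self-loops and aggravates the "repeated samples problem".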
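Summary of the Invention
The invention provides a generalized maximum-degree random walk graph sampling algorithm that effectively balances the "large deviation problem" of the RW algorithm and the "repeated samples problem" of the MD algorithm, thereby improving the overall efficiency of collecting sample points from a network.
The invention is realized by the following technical means:
A generalized maximum-degree random walk graph sampling algorithm, comprising the following steps:
S1: collect samples by a random walk on the graph to obtain the sample point set S: randomly select a node u in the graph as the initial node and set counter i to 1; generate a geometric random variable ξ_i with parameter d_u/max{d_u,C} and add it to the set ξ; take node u as S_i and add it to the sample point set S; select a node v uniformly at random from the neighbors of node u; take node v as node u for the next step and increment counter i by 1; repeat until the loop condition no longer holds; return the collected sample point set S and the corresponding set of geometric random variables ξ;
S2: construct an unbiased estimate from the collected samples; the formula for the unbiased estimate is: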
Figure PCTCN2015081147-appb-000017
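where S_i denotes the i-th node collected by the algorithm and ξ_i denotes the number of repetitions of sample S_i.
The transition probability equation of the random walk used to collect samples on the graph is as follows: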
Figure PCTCN2015081147-appb-000018
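where d_u denotes the degree of node u and C is a non-negative integer.
The above generalized maximum-degree random walk algorithm, referred to as the GMD algorithm, addresses the problem of extracting uniform samples from a "hidden" online social network, and it balances the "large deviation problem" of the RW algorithm against the "repeated samples problem" of the MD algorithm. On this basis, the GMD algorithm can replace the widely used RW and MD algorithms for sampling online social networks.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of the graph on which random-walk sample collection is to be performed.
Detailed Description of the Embodiments
Specific embodiments of the invention are described in detail below with reference to the drawing.
The invention provides a new generalized maximum-degree random walk algorithm, hereinafter referred to as the GMD algorithm.
The GMD algorithm introduces a parameter C (a non-negative integer) on top of the MD algorithm to control the number of self-loops. Its transition probability equation is as follows: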
Figure PCTCN2015081147-appb-000019
where C is a non-negative integer.
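The equation image for the transition probability is not reproduced in this text. A reconstruction that is consistent with the sampling procedure of the description (the walk leaves node u with probability d_u/max{d_u,C} and then picks a neighbor uniformly at random) would be the following; the symbol P(u,v) is notation assumed here:

```latex
P(u,v) =
\begin{cases}
  \dfrac{1}{\max\{d_u, C\}}       & \text{if } v \in N(u), \\[2ex]
  1 - \dfrac{d_u}{\max\{d_u, C\}} & \text{if } v = u, \\[2ex]
  0                               & \text{otherwise.}
\end{cases}
```

Under this form, C = d_max gives the stay probability (d_max - d_u)/d_max of the MD algorithm, and C = 0 gives a plain random walk with no self-loops.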
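Specifically, the GMD algorithm consists of two steps: first, collect samples by a random walk on the graph using the transition probability above; second, construct an unbiased estimate from the collected samples. The detailed procedure of the first step is as follows:
Input: graph G=(V,E)
Output: the collected sample point set S
1 Randomly select a node u in the graph as the initial node, and set counter i to 1
2 Repeat until the loop condition no longer holds
2.1 Generate a geometric random variable ξ_i with parameter d_u/max{d_u,C} and add it to ξ;
2.2 Take node u as S_i and add it to the set S;
2.3 Select a node v uniformly at random from the neighbors of u;
2.4 Take node v as node u for the next step
2.5 Increment counter i by 1
3 Return the collected sample point set S and the corresponding set of geometric random variables ξ
In this step, a geometric random variable is used to simulate the number of times the random walk stays on a self-loop, so the walk does not have to actually traverse self-loops, which improves the efficiency of the algorithm. In other words, the geometric random variable ξ_i represents the number of repetitions of sample S_i.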
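The sampling procedure above can be sketched in Python as follows. This is a minimal sketch, not the patent's implementation: the function name `gmd_sample`, the adjacency-list representation, the fixed sample count used as the loop condition, and the toy graph are all assumptions made for illustration. The graph is assumed to be connected and undirected, so every degree is at least 1.

```python
import random


def gmd_sample(adj, C, num_samples, rng=None, start=None):
    """Collect num_samples nodes S and their repetition counts xi.

    adj: dict mapping node -> list of neighbors (hypothetical representation)
    C:   non-negative integer controlling the number of self-loops
    """
    rng = rng or random.Random()
    u = start if start is not None else rng.choice(list(adj))
    S, xi = [], []
    for _ in range(num_samples):
        d_u = len(adj[u])
        p = d_u / max(d_u, C)  # probability of leaving u in one step
        # Geometric variable on {1, 2, ...}: steps spent at u before leaving.
        k = 1
        while rng.random() >= p:
            k += 1
        S.append(u)
        xi.append(k)
        u = rng.choice(adj[u])  # uniform choice among the neighbors of u
    return S, xi


# Hypothetical toy graph (not FIG. 1): hub "v4" with 8 neighbors, plus edge v1-v2.
leaves = ["v1", "v2", "v3", "v5", "v6", "v7", "v8", "v9"]
adj = {"v4": list(leaves)}
for v in leaves:
    adj[v] = ["v4"]
adj["v1"].append("v2")
adj["v2"].append("v1")

S, xi = gmd_sample(adj, C=4, num_samples=5, rng=random.Random(0), start="v1")
```

Drawing the repetition count directly avoids simulating each self-loop step; nodes whose degree is at least C always get a count of 1.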
After the node samples have been collected, the GMD algorithm constructs an unbiased estimate by the following formula:
Figure PCTCN2015081147-appb-000020
where S_i denotes the i-th node collected by the algorithm and ξ_i denotes the number of repetitions of sample S_i.
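The estimator image is not reproduced in this text. A reconstruction that follows the importance-sampling weights 1/max{d_u,C} implied by the ratio π_gmd(v)/π_gmd(u)=max{d_v,C}/max{d_u,C}, and that reproduces the value 3.2 of the worked example in the description, would be the following; the symbol on the left-hand side is notation assumed here:

```latex
\hat{f}_{gmd}
  = \frac{\displaystyle\sum_{i=1}^{|S|} \xi_i \, f(S_i) \,/\, \max\{d_{S_i}, C\}}
         {\displaystyle\sum_{i=1}^{|S|} \xi_i \,/\, \max\{d_{S_i}, C\}}
```

Each collected node S_i counts ξ_i times, once for every step the walk would have spent on it.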
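Clearly, the GMD algorithm adds fewer self-loops to each graph node than the MD algorithm. It can therefore alleviate the "repeated samples problem" of the MD algorithm to some extent. The GMD algorithm also solves the problem of the unknown maximum degree in the MD algorithm. Furthermore, it can be proved that the chi-square distance between the trial distribution of the GMD algorithm and the target (uniform) distribution is smaller than that between the trial distribution of the RW algorithm and the target distribution. The GMD algorithm can therefore also alleviate the "large deviation problem" of the RW algorithm to some extent.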
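The chi-square comparison can be checked numerically on a small example. The sketch below assumes a hypothetical long-tailed degree sequence and a helper name `chi_square_to_uniform` that are not part of the patent; it uses the definition var_p(q/p) with q uniform, and trial distributions proportional to d_u (RW), max{d_u,C} (GMD), and d_max (MD).

```python
def chi_square_to_uniform(weights):
    """Chi-square distance var_p(q/p) between p ∝ weights and uniform q."""
    n = len(weights)
    total = sum(weights)
    p = [w / total for w in weights]
    # var_p(q/p) = E_p[(q/p)^2] - (E_p[q/p])^2 = sum(q^2 / p) - 1
    return sum((1.0 / n) ** 2 / p_u for p_u in p) - 1.0


degrees = [8, 2, 2, 1, 1, 1, 1, 1, 1]  # hypothetical degree sequence
C = 4

chi_rw = chi_square_to_uniform(degrees)
chi_gmd = chi_square_to_uniform([max(d, C) for d in degrees])
chi_md = chi_square_to_uniform([max(degrees)] * len(degrees))
```

For this sequence the distances are ordered MD ≤ GMD ≤ RW, with the MD value equal to zero up to rounding, which matches the trade-off described above.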
This conclusion is proved in detail below.
Theorem:
Figure PCTCN2015081147-appb-000021
where π(u) is the uniform distribution, i.e. π(u)=1/n.
Proof: First, it is easy to obtain
Figure PCTCN2015081147-appb-000022
Similarly,
Figure PCTCN2015081147-appb-000023
Therefore, to prove the theorem it suffices to show that
Figure PCTCN2015081147-appb-000024
holds.
Specifically,
Figure PCTCN2015081147-appb-000025
By definition, π_rw(v)/π_rw(u)=d_v/d_u and
π_gmd(v)/π_gmd(u)=max{d_v,C}/max{d_u,C}.
Let g(u,v)=π^2(u)[π_gmd(v)/π_gmd(u)-π_rw(v)/π_rw(u)].
For any u,v∈V, let h(u,v)=g(u,v)+g(v,u).
To prove E_{π_gmd}[(π(u)/π_gmd(u))^2]≤E_{π_rw}[(π(u)/π_rw(u))^2], it suffices to show that h(u,v)≤0. Clearly, h(u,v)=0 when u=v. When u≠v:
Figure PCTCN2015081147-appb-000026
Without loss of generality, let d_u≥d_v. Consider the following three cases:
(1) If d_u≥d_v≥C, then h(u,v)=0;
(2) If d_u≥C≥d_v, then
Figure PCTCN2015081147-appb-000027
(3) If C≥d_u≥d_v, then
Figure PCTCN2015081147-appb-000028
In summary, h(u,v)≤0.
This completes the proof.
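The equation images of the proof are not reproduced in this text. The following LaTeX is a hedged reconstruction of the argument from the definitions of g and h, using π(u)=1/n and the shorthand M_u = max{d_u, C}, which is introduced here and is not in the original. It assumes all degrees are positive, and the original figures may arrange the terms differently.

```latex
\begin{align*}
\operatorname{var}_{p}\!\left(\frac{\pi(u)}{p(u)}\right)
  &= E_{p}\!\left[\left(\frac{\pi(u)}{p(u)}\right)^{2}\right] - 1,
  \qquad p \in \{\pi_{gmd}, \pi_{rw}\}, \\
E_{p}\!\left[\left(\frac{\pi(u)}{p(u)}\right)^{2}\right]
  &= \sum_{u \in V} \frac{\pi^{2}(u)}{p(u)}
   = \sum_{u \in V}\sum_{v \in V} \pi^{2}(u)\,\frac{p(v)}{p(u)}, \\
E_{\pi_{gmd}}[\cdot] - E_{\pi_{rw}}[\cdot]
  &= \sum_{u \in V}\sum_{v \in V} g(u,v)
   = \frac{1}{2}\sum_{u \in V}\sum_{v \in V} h(u,v), \\
h(u,v)
  &= \frac{1}{n^{2}}\left[\frac{M_v}{M_u} + \frac{M_u}{M_v}
     - \frac{d_v}{d_u} - \frac{d_u}{d_v}\right], \\
d_u \ge C \ge d_v:\quad
h(u,v) &= \frac{(C - d_v)\,(C\,d_v - d_u^{2})}{n^{2}\,C\,d_u\,d_v} \le 0, \\
C \ge d_u \ge d_v:\quad
h(u,v) &= -\frac{(d_u - d_v)^{2}}{n^{2}\,d_u\,d_v} \le 0 .
\end{align*}
```

In the second case the sign follows from C ≥ d_v and C d_v ≤ d_u^2, since both C and d_v are at most d_u.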
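The invention is further illustrated by the following example. With C=0.5*d_max=4, it walks through how the generalized maximum-degree random walk algorithm (GMD algorithm) draws 2 nodes from FIG. 1, and how the average node degree of FIG. 1 is estimated from the sampled nodes, in order to illustrate the flow of the GMD algorithm. Drawing more node samples, or using other values of C, is analogous.
(1) Perform one random walk on the graph using the state transition probability matrix
Figure PCTCN2015081147-appb-000029
to collect the node sample set.
Input: FIG. 1
Output: a set S containing 2 sample points
1 Randomly select a node u in the graph as the initial node. Suppose v1 is selected as the initial node, and set counter i to 1
2 Generate a geometric random variable ξ_1 with parameter d_u/max{d_u,C}=d_v1/max{d_v1,C}=2/max{2,4}=0.5 and add it to ξ; suppose the value generated here is ξ_1=2.
3 Add node v1 to the set S as S_1;
4 Select a node v uniformly at random from the neighbors of v1. Suppose the selected neighbor is v4
5 Take v4 as the initial node for the next step
6 Generate a geometric random variable ξ_2 with parameter d_u/max{d_u,C}=d_v4/max{d_v4,C}=8/max{8,4}=1 and add it to ξ; suppose the value generated here is ξ_2=1.
7 Add node v4 to the set S as S_2;
8 Select a node v uniformly at random from the neighbors of v4 and take it as the initial node for the next step.
9 Sample collection is complete and the collection process ends. At this point S={v1,v4} and ξ={2,1}
(2) For the collected sample point set, use
Figure PCTCN2015081147-appb-000030
to estimate the average node degree of FIG. 1. Here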
Figure PCTCN2015081147-appb-000031
so the average node degree of FIG. 1 estimated from this sample set is 3.2.
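The arithmetic of this example can be reproduced with a short Python sketch. The function name `gmd_estimate` and the weighted-ratio form of the estimator are assumptions (the estimator image is not reproduced in this text); the inputs S={v1,v4}, ξ={2,1}, d_v1=2, d_v4=8, and C=4 are taken from the example.

```python
def gmd_estimate(S, xi, degree, C, f):
    """Weighted-ratio estimate of the mean of f, with weights 1/max{d_u, C}."""
    num = sum(k * f(u) / max(degree[u], C) for u, k in zip(S, xi))
    den = sum(k / max(degree[u], C) for u, k in zip(S, xi))
    return num / den


degree = {"v1": 2, "v4": 8}      # degrees stated in the example
S = ["v1", "v4"]                 # collected nodes
xi = [2, 1]                      # repetition counts
estimate = gmd_estimate(S, xi, degree, C=4, f=lambda u: degree[u])
# numerator:   2*2/4 + 1*8/8 = 2.0
# denominator: 2/4   + 1/8   = 0.625
```

The ratio 2.0/0.625 gives 3.2, in agreement with the value stated in the example.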
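As shown above, the generalized maximum-degree random walk algorithm, i.e. the GMD algorithm, addresses the problem of extracting uniform samples from a "hidden" online social network, and it balances the "large deviation problem" of the RW algorithm against the "repeated samples problem" of the MD algorithm. On this basis, the GMD algorithm can replace the widely used RW and MD algorithms for sampling online social networks.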

Claims (2)

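  1. A generalized maximum-degree random walk graph sampling algorithm, comprising the following steps:
    S1: collect samples by a random walk on the graph to obtain the sample point set S: randomly select a node u in the graph as the initial node and set counter i to 1; generate a geometric random variable ξ_i with parameter d_u/max{d_u,C} and add it to the set ξ; take node u as S_i and add it to the sample point set S; select a node v uniformly at random from the neighbors of node u; take node v as node u for the next step and increment counter i by 1; repeat until the loop condition no longer holds; return the collected sample point set S and the corresponding set of geometric random variables ξ;
    S2: construct an unbiased estimate from the collected samples; the formula for the unbiased estimate is: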
    Figure PCTCN2015081147-appb-100001
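    where S_i denotes the i-th node collected by the algorithm and ξ_i denotes the number of repetitions of sample S_i.
  2. The generalized maximum-degree random walk graph sampling algorithm according to claim 1, characterized in that the transition probability equation of the random walk used to collect samples on the graph is as follows: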
    Figure PCTCN2015081147-appb-100002
    where d_u denotes the degree of node u and C is a non-negative integer.
PCT/CN2015/081147 2014-12-09 2015-06-10 Generalized maximum-degree random walk graph sampling algorithm WO2016090877A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410749244.XA CN104462374B (zh) 2014-12-09 2014-12-09 Generalized maximum-degree random walk graph sampling method
CN201410749244.X 2014-12-09

Publications (1)

Publication Number Publication Date
WO2016090877A1 true WO2016090877A1 (zh) 2016-06-16

Family

ID=52908409

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/081147 WO2016090877A1 (zh) 2014-12-09 2015-06-10 Generalized maximum-degree random walk graph sampling algorithm

Country Status (2)

Country Link
CN (1) CN104462374B (zh)
WO (1) WO2016090877A1 (zh)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
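CN104462374B (zh) * 2014-12-09 2018-06-05 深圳大学 Generalized maximum-degree random walk graph sampling method
CN106713035B (zh) * 2016-12-23 2019-12-27 西安电子科技大学 Congested link localization method based on group testing
CN107358534A (zh) * 2017-06-29 2017-11-17 浙江理工大学 Unbiased data collection system and collection method for social networks
CN110019975B (zh) 2017-10-10 2020-10-16 创新先进技术有限公司 Random walk and cluster-based random walk method, apparatus, and device
CN109658094B (zh) * 2017-10-10 2020-09-18 阿里巴巴集团控股有限公司 Random walk and cluster-based random walk method, apparatus, and device
CN109547265A (zh) * 2018-12-29 2019-03-29 中国人民解放军国防科技大学 Local immunization method and system for complex networks based on random walk sampling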

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8396855B2 (en) * 2010-05-28 2013-03-12 International Business Machines Corporation Identifying communities in an information network
US20140095616A1 (en) * 2012-09-28 2014-04-03 7517700 Canada Inc. O/A Girih Method and system for sampling online social networks
CN104462374A (zh) * 2014-12-09 2015-03-25 深圳大学 Generalized maximum-degree random walk graph sampling algorithm

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8719211B2 (en) * 2011-02-01 2014-05-06 Microsoft Corporation Estimating relatedness in social network
US8583659B1 (en) * 2012-07-09 2013-11-12 Facebook, Inc. Labeling samples in a similarity graph
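CN103617609B (zh) * 2013-10-24 2016-04-13 上海交通大学 Graph-theory-based k-means nonlinear manifold clustering and representative point selection method
CN103942308B (zh) * 2014-04-18 2017-04-05 中国科学院信息工程研究所 Method and apparatus for detecting communities in large-scale social networks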

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LI, R.H: "On Random Walk Based Graph Sampling", ICDE CONFERENCE, 2015, pages 931 - 933 *
ZIV, B.Y. ET AL.: "Approximating Aggregate Queries about Web Pages via Random Walks.", PROCEEDINGS OF THE 26 TH VLDB CONFERENCE, 2000, Cairo, Egypt *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
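CN110196995A (zh) * 2019-04-30 2019-09-03 西安电子科技大学 Complex network feature extraction method based on biased random walk
CN110196995B (zh) * 2019-04-30 2022-12-06 西安电子科技大学 Complex network feature extraction method based on biased random walk
CN111147311A (zh) * 2019-12-31 2020-05-12 杭州师范大学 Method for quantifying network structural differences based on graph embedding
CN111147311B (zh) * 2019-12-31 2022-06-21 杭州师范大学 Method for quantifying network structural differences based on graph embedding
CN112132326A (zh) * 2020-08-31 2020-12-25 浙江工业大学 Social network friend prediction method based on a random-walk degree penalty mechanism
CN112132326B (zh) * 2020-08-31 2023-12-01 浙江工业大学 Social network friend prediction method based on a random-walk degree penalty mechanism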

Also Published As

Publication number Publication date
CN104462374B (zh) 2018-06-05
CN104462374A (zh) 2015-03-25

Similar Documents

Publication Publication Date Title
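WO2016090877A1 (zh) Generalized maximum-degree random walk graph sampling algorithm
Cui et al. Detecting overlapping communities in networks using the maximal sub-graph and the clustering coefficient
CN102456062B (zh) Community similarity calculation method and social network cooperation pattern discovery method
CN107276793B (zh) Node importance measurement method based on probabilistic-jump random walk
CN110705045B (zh) Link prediction method that constructs a weighted network using network topology characteristics
CN103838803A (zh) Social network community discovery method based on node Jaccard similarity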
Hou et al. Prediction methods and applications in the science of science: A survey
Meng et al. A novel potential edge weight method for identifying influential nodes in complex networks based on neighborhood and position
Zhang et al. Identifying node importance by combining betweenness centrality and katz centrality
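WO2016086634A1 (zh) Metropolis-Hastings graph sampling algorithm with controllable rejection rate
CN107784327A (zh) GN-based personalized community discovery method
CN105162654A (zh) Link prediction method based on local community information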
Ma et al. The local triangle structure centrality method to rank nodes in networks
Han et al. Generating uncertain networks based on historical network snapshots
Chen et al. Fast community detection based on distance dynamics
Dong Application of Big Data Mining Technology in Blockchain Computing
Jin et al. Heterogeneous graph neural networks using self-supervised reciprocally contrastive learning
CN109492677A (zh) Time-varying network link prediction method based on Bayesian theory
Jiang et al. Efficiency improvements in social network communication via MapReduce
Jia et al. Effect of weak ties on degree and H-index in link prediction of complex network
Xiu et al. An extended self-representation model of complex networks for link prediction
Jiang et al. Community Detection using Closeness Similarity based on Common Neighbor Node Clustering Entropy.
Jiang et al. Robust size estimation of online social networks via subgraph sampling
Junjie et al. Local optimization overlapping community discovery algorithm combining attribute features
Mehdiabadi et al. Sampling from diffusion networks

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15868061

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15868061

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 01.06.2018)
