CN109657160A - In-degree information estimation method and system based on random walk access frequency
- Publication number: CN109657160A
- Application number: CN201811632238.0A
- Authority: CN (China)
- Prior art keywords: node, network, random walk, degree, visited
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
Abstract
Description
Technical field
The invention belongs to the field of social network topology information estimation, and in particular relates to a method and system for estimating in-degree information based on random walk access frequency.
Background
Online social networks are now very large and give researchers a platform for studying complex networks and the characteristics and behavior of real populations. Because of that scale, however, researchers cannot collect or obtain information on the whole network for analysis. In general, only part of the network's information can be obtained, by means of a random walk. Recovering the network topology from this partial information is the basis for subsequent complex network analysis and group characteristic analysis. An important step in that recovery is estimating the in-degree distribution of the network, because in-degree information is latent and hidden during a random walk. Only with an estimate of the in-degree information, that is, of the in-degree distribution, can the network topology be recovered and the characteristics of the whole network then be derived.
The traditional in-degree estimation method uses the out-degree information that can be collected during a random walk. It assumes that the in-edges and out-edges of the nodes are highly symmetric, that is, that the network has a high degree of undirectedness (undirectedness being the proportion of undirected edges). Under this assumption the out-degree-based estimation method EST_out, formula (1), takes the estimated out-degree distribution as the estimate of the in-degree distribution.
In formula (1), $\hat{p}(k_{in})$ denotes the estimate of the in-degree distribution of the network, $\hat{p}(k_{out})$ denotes the estimate of its out-degree distribution, and $q_d(k_{out})$ is the out-degree distribution of the sample obtained by random walk sampling.
However, in online social networks the relationships or behaviors between users are directed. For example, a "following" behavior gives rise to the two relations "follows" and "is followed by", and an "election" behavior to "votes for" and "is voted for". The edges of the network can therefore be divided into in-edges and out-edges, which describe respectively the relations (edges) pointing to a node and the relations (edges) from that node to other nodes. In most cases a directed network has only weak undirectedness. The in-degree estimate obtained from formula (1) is then strongly biased, so the problem of estimating in-degree information in directed networks needs to be solved.
Summary of the invention
The technical problem to be solved by the invention is as follows. The prior-art method estimates network in-degree information from out-degree information and presupposes a high degree of undirectedness. Applied to a strongly directed social network, it gives in-degree estimates with large errors, so the in-degree information cannot be used reliably to understand user behavior in the network or to recover the network topology. The invention therefore provides an in-degree information estimation method based on random walk access frequency whose estimation error on directed networks is smaller.
To solve this problem, the invention adopts the following technical solution:
An in-degree information estimation method based on random walk access frequency, comprising the following steps:
Step 1: randomly select a seed node for the random walk from the directed network whose in-degree information is to be estimated, the seed node being any node of the network, and then perform the random walk, each subsequent node being selected at random from the neighbors of the current node;
Step 2: during the random walk, record the number of times $x_i$ that each node $i$ is visited, repeat visits included;
Step 3: when the number of steps $n$ of the walk equals the number of nodes $N$ of the network, count the number of visits $x_i$ of each node $i$;
Step 4: estimate the in-degree information from the counted number of visits of each node $i$ and output it, the estimate being formed from the quantities $m_i$, where $m_i$ is the number of nodes visited $x_i$ times during the random walk.
The invention also provides a network in-degree information estimation system based on random walk access frequency, characterized in that it comprises a processor and a memory connected to the processor, the memory storing a program of the in-degree information estimation method based on random walk access frequency, and the program, when executed by the processor, implementing the steps of the method described above.
Compared with the prior art, the invention has the following beneficial effects:
The in-degree information estimation method of the invention rests on the finding that, when the number of steps of the random walk equals the number of nodes of the network, the number of times (frequency) a node is visited during the walk is approximately proportional to its in-degree. For directed networks with weak undirectedness, the in-degree information is therefore estimated by counting the visits to each node during the random walk; the resulting estimate has a smaller error and is obtained more efficiently.
Description of the drawings
Fig. 1 is a flow chart of the invention;
Fig. 2 is a schematic comparison of the estimated distributions and the true distributions obtained on different real networks: (a) the Wikipedia election network (WEL), (b) the Edinburgh Associative Thesaurus network (EAT), (c) the Stanford hyperlink network (SFH), and (d) the Amazon recommendation network (AMR);
Fig. 3 compares the $D_{KS}$ values obtained by the in-degree (EST_rw) and out-degree (EST_out) estimation methods on different real networks: (a) the Wikipedia election network (WEL), (b) the Edinburgh Associative Thesaurus network (EAT), (c) the Stanford hyperlink network (SFH), and (d) the Amazon recommendation network (AMR). 100 simulations were run on each network.
Detailed description
Figs. 1 to 3 show an embodiment of the in-degree information estimation method based on random walk access frequency according to the invention.
First, it is shown that when the number of steps of the random walk equals the number of nodes of the network, the number of times (frequency) a node is visited during the walk is approximately proportional to its in-degree; the in-degree estimation method is then derived from this result.
Suppose an $n$-step random walk is performed on a directed network, with a single seed node selected at random and each subsequent node selected at random from the neighbors of the current node. Then, for any node $i$ with in-degree $k_i^{in}$, the process by which node $i$ is visited can be approximately modeled as $n$ Bernoulli trials:
$$X_i \sim B(n, p_i) \tag{2}$$
where $X_i$ is the random variable giving the number of times node $i$ is visited during the random walk, and $p_i$ is the probability that node $i$ is visited in a step of the walk (its inclusion probability). The expectation of $X_i$ can therefore be written as:
$$E[X_i] = n p_i \tag{3}$$
The reference Lu X, Malmros J, Liljeros F, et al. Respondent-driven sampling on directed networks [J]. Electronic Journal of Statistics, 2013, 7(1): 292-322 shows that, in a directed network, the inclusion probability $p_i$ of any node $i$ is approximately proportional to its in-degree $k_i^{in}$, namely:
$$p_i \approx \frac{k_i^{in}}{N \langle k_{in} \rangle} \tag{4}$$
where $\langle k_{in} \rangle$ is the average in-degree of the network and $N$ is the number of nodes in the network. Substituting (4) into (3) gives the expectation of $X_i$ as:
$$E[X_i] \approx \frac{n\, k_i^{in}}{N \langle k_{in} \rangle} \tag{5}$$
If the number of steps $n$ of the random walk is set to $N$, then
$$E[X_i] \approx \frac{k_i^{in}}{\langle k_{in} \rangle} \tag{6}$$
In other words, in this case the number of times (frequency) a node is visited during the random walk is approximately proportional to its in-degree $k_i^{in}$. The distribution of visit counts therefore gives, approximately, an estimate of the in-degree information up to a change of scale:
$$\hat{p}(x_i) = \frac{m_i}{\sum_j m_j} \tag{7}$$
where $m_i$ is the number of nodes visited $x_i$ times during the random walk. In this embodiment, this in-degree estimation method is called EST_rw.
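The proportionality between expected visits and in-degree can be checked with a small numerical sketch. The function name `expected_visits` and the toy in-degree values below are illustrative assumptions, not part of the patent; the function only evaluates $E[X_i] = n k_i^{in} / (N \langle k_{in} \rangle)$ for each node, with $n = N$ by default.

```python
def expected_visits(in_degree, n_steps=None):
    """Expected visit count per node under the binomial approximation.

    in_degree: dict mapping node -> in-degree (hypothetical toy input).
    n_steps:   length of the walk; defaults to N, the number of nodes.
    """
    n_nodes = len(in_degree)
    if n_steps is None:
        n_steps = n_nodes  # the setting n = N used in the text
    mean_in = sum(in_degree.values()) / n_nodes  # average in-degree <k_in>
    # E[X_i] = n * k_i_in / (N * <k_in>)
    return {node: n_steps * k / (n_nodes * mean_in)
            for node, k in in_degree.items()}


if __name__ == "__main__":
    toy = {"a": 1, "b": 2, "c": 3}
    print(expected_visits(toy))
```

With $n = N$, each expected count is simply the node's in-degree divided by the average in-degree, so the counts sum to $N$.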
The specific in-degree estimation method is as follows:
An in-degree information estimation method based on random walk access frequency, comprising the following steps:
Step 1: randomly select a seed node for the random walk from the directed network whose in-degree information is to be estimated, the seed node being any node of the network, and then perform the random walk, each subsequent node being selected at random from the neighbors of the current node;
Step 2: during the random walk, record the number of times $x_i$ that each node $i$ is visited, repeat visits included;
Step 3: when the number of steps $n$ of the walk equals the number of nodes $N$ of the network, count the number of visits $x_i$ of each node $i$;
Step 4: estimate the in-degree information from the counted number of visits of each node $i$ and output it, the estimate being formed from the quantities $m_i$, where $m_i$ is the number of nodes visited $x_i$ times during the random walk.
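Steps 1 to 4 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `est_rw` and the adjacency-dictionary input are hypothetical, "neighbors" is taken to mean out-neighbors, a walk that reaches a node without out-neighbors restarts from a random node, and the estimate is taken to be the share of visited nodes that were visited $x$ times.

```python
import random
from collections import Counter


def est_rw(out_neighbors, rng=None):
    """Estimate a scaled in-degree distribution from random-walk visit counts.

    out_neighbors: dict mapping node -> list of successor nodes (assumed input).
    Returns a dict mapping visit count x -> share of visited nodes with count x.
    """
    rng = rng or random.Random()
    nodes = list(out_neighbors)
    n_steps = len(nodes)              # step 3: walk length n equals N
    current = rng.choice(nodes)       # step 1: random seed node
    visits = Counter()
    for _ in range(n_steps):
        visits[current] += 1          # step 2: record each visit
        successors = out_neighbors[current]
        if successors:
            current = rng.choice(successors)
        else:
            current = rng.choice(nodes)  # assumption: restart at a dead end
    # step 4: m[x] is the number of nodes visited exactly x times
    m = Counter(visits.values())
    total = sum(m.values())
    return {x: count / total for x, count in sorted(m.items())}


if __name__ == "__main__":
    cycle = {0: [1], 1: [2], 2: [3], 3: [0]}
    print(est_rw(cycle, random.Random(0)))
```

On the four-node directed cycle in the usage example, a four-step walk visits every node exactly once, so the only observed visit count is 1 and its share is 1.0. Nodes that are never visited are left out of the estimate here; whether they should be counted is a design choice the sketch does not settle.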
The invention also provides a network in-degree information estimation system based on random walk access frequency, characterized in that it comprises a processor and a memory connected to the processor, the memory storing a program of the in-degree information estimation method based on random walk access frequency, and the program, when executed by the processor, implementing the steps of the method described above.
The proposed in-degree estimation method for directed networks is verified below on four real directed networks. (1) The Wikipedia election network (WEL): nodes represent Wikipedia users, and a directed edge from node i to node j indicates that user i voted for user j. (2) The Edinburgh Associative Thesaurus network (EAT): nodes represent English words, and a directed edge from node i to node j indicates that, in the user experiment, subjects stimulated with word i responded with word j. (3) The Stanford hyperlink network (SFH): nodes represent different web pages under the Stanford University home page, and a directed edge from node i to node j indicates that page i has a hyperlink to page j. (4) The Amazon recommendation network (AMR): nodes represent different products, and a directed edge from node i to node j indicates that product j is also purchased when product i is purchased. To make all nodes of each network reachable, this embodiment extracts the largest connected component (giant connected component, GCC) of each of the four networks to obtain the final experimental networks. The main network statistics of these experimental networks are shown in Table 1.
Table 1 Basic network statistics of the experimental networks
The simulation experiments on in-degree estimation are carried out on the four real networks (WEL, EAT, SFH and AMR). In each simulation, a node of the network is selected at random as the initial seed of the random walk, and at every step the next node is selected at random from the neighbors of the current node. The number of steps of the random walk is set to the number of nodes N of the network. During the walk, the repeated visits to each node and the out-degree of each node are recorded. In this embodiment, 100 simulations are run on each network.
In each simulation, two methods are used to estimate the in-degree information of the network, namely
(I) the proposed in-degree estimation method, EST_rw;
(II) the traditional method based on out-degree information, EST_out.
To compare the performance of the two methods, the K-S statistic is used to measure how similar their estimates are to the true in-degree information. The procedure is as follows.
First, the data are normalized: the true in-degree information and the data collected by each estimation method (EST_rw: node visit counts; EST_out: out-degrees of sampled nodes) are projected onto the same scale:
$$z = \frac{y - y_{min}}{y_{max} - y_{min}}\,(z_{max} - z_{min}) + z_{min} \tag{8}$$
For convenience of calculation, $z_{min}$ can be set to 1 and $z_{max}$ to 100, so that the values of $y$ are projected onto the interval [1, 100]. The quantity $y$ depends on what is being projected: for the true in-degree information, $y$ is the true in-degree of a node; for EST_rw, $y$ is the number of times the node is visited in the N-step random walk; for EST_out, $y$ is the out-degree of a sampled node. Correspondingly, $y_{max}$ and $y_{min}$ are the maximum and minimum of the observed values in each case: of the out-degree for the comparison method EST_out, of the visit count for the proposed EST_rw, and of the true in-degree for the true values used as reference.
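The projection onto [1, 100] is an ordinary min-max rescaling and can be sketched as below. The function name `rescale` is hypothetical, and returning $z_{min}$ for a constant input is an assumption added to avoid division by zero; the text does not discuss that case.

```python
def rescale(values, z_min=1.0, z_max=100.0):
    """Project observations y linearly onto the interval [z_min, z_max]."""
    y_min, y_max = min(values), max(values)
    if y_max == y_min:
        # Assumption: a constant sample is mapped to z_min.
        return [z_min for _ in values]
    return [z_min + (y - y_min) * (z_max - z_min) / (y_max - y_min)
            for y in values]


if __name__ == "__main__":
    visit_counts = [0, 5, 10]  # toy values, e.g. visit counts for EST_rw
    print(rescale(visit_counts))
```

The same function would be applied separately to the true in-degrees, the visit counts and the sampled out-degrees, so that each is rescaled by its own minimum and maximum.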
Then, using the projected data, the projected true degree distribution and its estimates by the two methods are calculated (formula (7) for the proposed EST_rw, formula (1) for EST_out). Finally, the K-S statistic between the projected true distribution and each estimated distribution is calculated. The K-S statistic is the basis for rejecting the null hypothesis in the Kolmogorov-Smirnov test (K-S test), that is, for judging whether two distributions agree; in this embodiment it is used simply to measure the similarity between the true distribution and the estimated distribution:
$$D_{KS} = \max_z \left\{ \left| F'(z) - F(z) \right| \right\} \tag{9}$$
where $z$ ranges over the projected (normalized) in-degree values, and $F(z)$ and $F'(z)$ are the cumulative distribution functions of the true in-degree distribution and of the estimated distribution respectively (both after normalization). $D_{KS}$ is the maximum distance between these two cumulative distribution functions. In general, a small $D_{KS}$ means that the estimated distribution is highly similar to the true distribution in shape and position; otherwise the similarity is low. An estimation method that performs better than another can therefore be expected to give a smaller $D_{KS}$.
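Formula (9) can be evaluated directly from two samples of normalized values by comparing their empirical cumulative distribution functions. The sketch below is illustrative: the name `ks_statistic` is hypothetical, and treating both distributions as samples (for example, rescaled true in-degrees and rescaled visit counts) is an assumption about how the comparison is set up.

```python
from bisect import bisect_right


def ks_statistic(true_values, estimated_values):
    """Maximum absolute gap between two empirical CDFs (the K-S statistic)."""
    true_sorted = sorted(true_values)
    est_sorted = sorted(estimated_values)
    # The gap can only change at observed values, so those form the grid.
    grid = sorted(set(true_sorted) | set(est_sorted))

    def ecdf(sorted_sample, z):
        # Share of observations less than or equal to z.
        return bisect_right(sorted_sample, z) / len(sorted_sample)

    return max(abs(ecdf(est_sorted, z) - ecdf(true_sorted, z)) for z in grid)


if __name__ == "__main__":
    print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))
```

Identical samples give 0 and samples with no overlap in range give 1, which matches the reading of $D_{KS}$ as a distance between distributions.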
The results of the simulation experiments on the four real networks are as follows. First, the estimated distributions are compared visually with the true distributions; the results are shown in Fig. 2. The proposed method gives better estimates than the traditional out-degree-based method, closer to the true in-degree information. In the WEL, EAT and SFH networks, the distribution obtained by EST_rw is visibly more similar in shape to the true distribution, and its distance from the true distribution is much smaller than that of the distribution estimated by EST_out. For the AMR network, the first half of the distribution estimated by EST_rw deviates somewhat in shape from the true distribution, but overall its shape is much closer to the true distribution than the result of EST_out.
Next, the estimates of EST_rw and EST_out are analyzed quantitatively, using the index $D_{KS}$ to measure their difference from the true distribution; the results are shown in Fig. 3. In all four real networks, the $D_{KS}$ values obtained by EST_rw are smaller than those obtained by EST_out. Specifically, the average $D_{KS}$ of EST_rw is smaller than that of EST_out by 0.29 in the WEL network, 0.60 in the EAT network, 0.78 in the SFH network and 0.85 in the AMR network. That is, the estimates of EST_rw are more similar to the true distribution than those of EST_out.
These results show that the error of the in-degree estimation method EST_rw proposed by the invention is smaller than that of the out-degree estimation method. They also show that, both in model networks and in real networks, the distribution estimated by the invention is highly similar to the true distribution.
Compared with traversing the entire network (the "global acquisition method"), the proposed method cannot derive exact in-degree information, which the global method obtains by collecting all or most of the out-edges in the network, but it is far more efficient. To obtain all out-edges of the network, every edge must be traversed at least once, so the random walk needs at least as many steps as the total number of edges in the network. The invention sets the number of steps of the random walk equal to the number of nodes, so only as many edges as there are nodes need to be followed. The efficiency of the invention is therefore $\langle k \rangle$ times that of the global acquisition method, where $\langle k \rangle$ is the average degree of the network.
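The efficiency comparison reduces to a ratio of step counts, which the following sketch makes explicit. The function name `efficiency_ratio` and the node and edge counts in the usage example are hypothetical; the only relation taken from the text is that the walk uses N steps while a full traversal needs at least as many steps as there are edges.

```python
def efficiency_ratio(n_nodes, n_edges):
    """Lower bound on how many times more steps a full edge traversal needs."""
    walk_steps = n_nodes        # proposed method: n = N steps
    min_global_steps = n_edges  # global acquisition: every edge at least once
    return min_global_steps / walk_steps  # equals the average degree <k>


if __name__ == "__main__":
    # Hypothetical network with 1,000 nodes and 12,000 directed edges.
    print(efficiency_ratio(1000, 12000))
```

Because a random walk rarely covers every edge without repeats, the real cost of the global acquisition method would usually exceed this lower bound.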
In addition, unlike the out-degree method, the proposed method does not need to collect extra data such as out-degrees during the random walk; it only needs to record the number of visits. In terms of the amount of data collected, the new method is therefore also more efficient than the out-degree method and needs to store less intermediate data.
The above is only a preferred embodiment of the invention. The scope of protection of the invention is not limited to the above embodiment; all technical solutions within the concept of the invention fall within its scope of protection. It should be noted that, for those of ordinary skill in the art, improvements and refinements made without departing from the principle of the invention are also to be regarded as within the scope of protection of the invention.
Claims (2)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811632238.0A CN109657160B (en) | 2018-12-29 | 2018-12-29 | In-degree information estimation method and system based on random walk access frequency |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811632238.0A CN109657160B (en) | 2018-12-29 | 2018-12-29 | In-degree information estimation method and system based on random walk access frequency |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109657160A true CN109657160A (en) | 2019-04-19 |
CN109657160B CN109657160B (en) | 2023-01-06 |
Family
ID=66117905
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811632238.0A Active CN109657160B (en) | 2018-12-29 | 2018-12-29 | In-degree information estimation method and system based on random walk access frequency |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109657160B (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100121792A1 (en) * | 2007-01-05 | 2010-05-13 | Qiong Yang | Directed Graph Embedding |
US20100125572A1 (en) * | 2008-11-20 | 2010-05-20 | Yahoo! Inc. | Method And System For Generating A Hyperlink-Click Graph |
CN106886524A (en) * | 2015-12-15 | 2017-06-23 | 天津科技大学 | A kind of community network community division method based on random walk |
CN106530097A (en) * | 2016-10-11 | 2017-03-22 | 中国人民武装警察部队工程大学 | Oriented social network key propagation node discovering method based on random walking mechanism |
CN107276793A (en) * | 2017-05-31 | 2017-10-20 | 西北工业大学 | The node importance measure of random walk is redirected based on probability |
CN108600013A (en) * | 2018-04-26 | 2018-09-28 | 北京邮电大学 | The overlapping community discovery method and device of dynamic network |
Non-Patent Citations (5)
Title |
---|
LAI,DR;LU,HT;NARDINI,C: "Extracting weights from edge directions to find communities in directed networks", 《IEEE》 * |
RIBEIRO,B;WANG,PH;MURAI,F;TOWSLEY,D: "Sampling Directed Graphs with Random Walks", 《JOURNAL OF STATISTICAL MECHANICS:THEORY AND EXPERIMENT》 * |
SANTO FORTUNATO,MARIAN BOGUNA,ALESSANDRO FLAMMINI,FILIPPO MENCZE: "Approximating PageRank from In-Degree", 《SPRINGER LINK》 * |
SHEN LUYI, WANG XIAOFAN: "Bi-graph Random Walk Sampling of Directed Online Social", 《IEEE》 * |
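Zhai Bingyu: "Research on key issues of user sampling, measurement and evaluation in large-scale online social networks (OSN)", 《China Master's Theses Full-text Database, Information Science and Technology》 * |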
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111371619A (en) * | 2020-03-10 | 2020-07-03 | 广州大学 | A method and system for estimating the number of instant messaging network users |
CN111371619B (en) * | 2020-03-10 | 2022-06-10 | 广州大学 | A method and system for estimating the number of instant messaging network users |
Also Published As
Publication number | Publication date |
---|---|
CN109657160B (en) | 2023-01-06 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||