CN103457799B - Microblog zombie user detection method based on a relationship graph

Info

Publication number: CN103457799B
Application number: CN201310396404.2A
Authority: CN
Inventors: 邹福泰; 姚雨石; 吴嘉玮
Original assignee: Shanghai Jiao Tong University
Current assignee: Shanghai Jiao Tong University
Priority date: 2013-09-03
Filing date: 2013-09-03
Publication date: 2016-08-17
Anticipated expiration: 2033-09-03
Also published as: CN103457799A

Abstract

A microblog zombie user detection method based on relationship graph analysis comprises a data collection module and a relationship graph analysis module. The data collection module collects the data of a known zombie user and selects sample users from that data. The relationship graph analysis module judges whether each sample user is a zombie user: it builds a relationship graph of the known zombie user and the sample users; initializes each user's malicious score; calculates each user's relevance value, then calculates and updates the malicious scores of the sample users according to the relationship graph and the propagation rules; checks whether the propagation of the sample users' malicious scores has converged; and, once it has converged, compares each sample user's malicious score with a threshold, judging the sample user to be a zombie user if the score is greater than the threshold. The invention uses the social and semantic relationships of zombie users to find and identify other zombie users, thereby improving the efficiency of zombie user discrimination. Applied to social networks, the invention provides a safer and more effective detection service and improves the security of the social network.

Description

Microblog Zombie User Detection Method Based on Relationship Graph

Technical field

The invention relates to a method for detecting microblog zombie users, and in particular to a method for detecting microblog zombie users based on a relationship graph.

Background

In today's technological era, with the popularity of smart communication terminals, the mobile Internet has increasingly entered people's daily lives. Social networking is currently one of the most popular applications on the mobile Internet, with "Facebook" and "Twitter" as examples abroad. In China, online social networks have also gradually become a major platform; the best known and most commonly used is Weibo, through which people collect information and make friends with like-minded people. Weibo, short for microblog (MicroBlog), is a blog-like system that can publish messages in real time, and is a platform for sharing, disseminating, and acquiring information based on user relationships. Weibo has been called China's "Twitter" because of its resemblance to Twitter. On Weibo, users can recommend online information of mutual interest to each other, follow celebrities or friends they admire, check other people's latest updates, or post their own latest remarks, and thereby share their enjoyment with others. As a result, Weibo has become popular with more and more people.

However, Weibo and "Twitter" differ in terms of users' habits and cultural background. According to research by HP Labs, people on Weibo prefer forwarding messages to posting original ones; as long as the shared content is valuable, people are eager to help forward it. The two also differ in user experience. On "Twitter", people can only share text, whereas on Weibo they can also share pictures, video, and audio. In addition, Weibo allows users to reply to a status and forward it at the same time, which is not possible on "Twitter".

With the widespread development of Weibo, many fake users, namely zombie users, have appeared. There are several reasons for this. On the one hand, to satisfy their vanity and raise the profile of their personal Weibo accounts, some people pay for fake users to inflate their number of fans, which greatly damages their personal integrity. On the other hand, some people exploit the lack of detection on Weibo to sell "zombie users" on a large scale. Those who control the "zombie users" behind the scenes make considerable profits from these transactions, which has given rise to a zombie user industry chain and has had a considerable negative impact on Weibo. This is also an obvious difference between Weibo and "Twitter".

Many people now study Western social networking sites, whereas research on Chinese social networks remains essentially blank. Because of the large differences between Weibo and "Twitter", those skilled in the art are working to develop a method for detecting Weibo zombie users.

Summary of the invention

In view of the above-mentioned shortcomings of the prior art, the technical problem to be solved by the present invention is to provide a method for detecting microblog zombie users based on visual relationship graph analysis.

To achieve the above object, the present invention provides a method for detecting microblog zombie users based on a visual relationship network, characterized in that it comprises a data collection module and a relationship graph analysis module;

The data collection module is used to collect, starting from a known zombie user, the data of that known zombie user, and to select sample users;

The relationship graph analysis module is used to judge whether a sample user is a zombie user, and specifically includes the following steps:

Step 201: visualize the relationship attributes of the known zombie user and the sample users as a relationship graph, in which the known zombie user and the sample users are all nodes;

Step 202: initialize the malicious scores of the known zombie user and the sample users;

Step 203: analyze the commonality of the relationship graph, calculate the relevance value of each node in the relationship graph, and calculate and update the malicious scores of the sample users according to the propagation rules and the relationship graph;

Step 204: judge whether the propagation of the sample users' malicious scores has converged; if it has converged, go to step 205; if it has not, go back to step 203;

Step 205: judge whether the malicious score of a sample user is greater than a threshold; if it is greater than the threshold, go to step 206; if it is less than the threshold, the sample user is judged to be a normal user;

Step 206: the sample user is judged to be a zombie user;

Step 207: processing is complete.

Further, the data collection module collects the data of the known zombie user through the Weibo API.

Further, the data of the known zombie user includes the names and numbers of the user's fans and followers.

Further, the data collection module selects the sample users at random.

Further, the number of fans and followers of each sample user selected by the data collection module is less than 1000.

Further, adjacent nodes of the relationship graph in step 201 are linked by a following/being-followed relationship.

Further, in step 202, the malicious score of the known zombie user is initialized to 1, and the malicious score of each sample user is initialized to 0.

Further, in step 203, the relevance value of a node is the reciprocal of the number of fans of the user corresponding to that node.

Further, the propagation rules in step 203 include:

a) when calculating the malicious score of a user's fan, the fan's malicious score is the user's malicious score multiplied by the user's relevance value;

b) when a user follows multiple users, that user's malicious score is the sum of the malicious scores of the users it follows.

Further, the propagation convergence in step 204 means that the malicious scores of the sample users have stabilized and no longer change.
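
Read together, propagation rules a) and b) can be written as a single update, shown here as one interpretation rather than as notation taken from the patent. The symbols F(v) (the set of fans of user v), N(u) (the set of users that u follows), s_t(u) (the malicious score of u after t iterations), and the tolerance ε are introduced here for illustration, and holding the known zombie user's score at 1 throughout is an assumption.

```latex
r(v) = \frac{1}{|F(v)|}, \qquad
s_{t+1}(u) = \sum_{v \in N(u)} r(v)\, s_t(v), \qquad
\text{stop when } \max_{u} \left| s_{t+1}(u) - s_t(u) \right| \le \varepsilon
```

For example, if the known zombie user has 4 fans, its relevance value is 1/4, so a sample user who follows only that zombie user receives a score of 1 × 1/4 = 0.25. The strict "no longer changes" condition corresponds to ε = 0.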

Weibo lacks a detection mechanism for zombie users. Through a comprehensive analysis of the relationship network of zombie users, the resulting detection method achieves relatively high accuracy and regression and good overall performance, and is suitable for identifying zombie users on Chinese social networks. Because most zombie users are generated automatically by a system, their IDs are largely similar. In addition, to avoid detection, zombie users often follow one another so that they appear no different from normal users. As a result, many users with similar IDs cluster together in their social network graph, so if one zombie user can be found in advance, the zombie users related to it are very likely to be found as well, which greatly improves the efficiency of the inference algorithm.

The concept, specific structure, and technical effects of the present invention are further described below with reference to the accompanying drawings, so that the purpose, features, and effects of the present invention can be fully understood.

Description of the drawings

Fig. 1 shows the processing procedure of the data collection module of the present invention;

Fig. 2 is a processing flowchart of the relationship graph analysis module of the present invention.

Detailed description

An embodiment of the present invention is described in detail below with reference to the accompanying drawings. This embodiment is implemented on the premise of the technical solution of the present invention, and a detailed implementation and specific operating procedure are given, but the scope of protection of the present invention is not limited to the following embodiment.

The microblog zombie user detection method based on a relationship graph of the present invention is divided into two modules: a data collection module and a relationship graph analysis module.

The processing flow of the data collection module is shown in Fig. 1. First, starting from the account of a known zombie user, the data 102 of the known zombie user, namely the user names and numbers of that zombie user's fans and followers, is collected through the Weibo API (Application Programming Interface) 101. Then, the data of the known zombie user's fans and followers is collected. Finally, sample users are selected from the fans and followers of the known zombie user, and the data of the sample users and the known zombie user is stored in a relational database; the sample users are selected at random. In a preferred embodiment of the present invention, to ensure the randomness of the selection, users with no more than 1000 fans and followers are selected as sample users from among the fans and followers of the known zombie user.
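
The sample selection described above might be sketched as follows. This is a minimal illustration, not the patented implementation: the `Candidate` record, the `select_sample_users` function, and all field names are hypothetical, no real Weibo API call is shown, and applying the 1000 limit to the fan count and the following count separately is an assumption, since the text does not say whether the limit applies to each count or to their total.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one fan or follower of the known zombie user."""
    user_name: str
    fan_count: int
    following_count: int


def select_sample_users(candidates, sample_size, max_count=1000, seed=None):
    """Randomly pick sample users whose counts do not exceed max_count."""
    # Assumption: the limit applies to each count separately.
    eligible = [
        c for c in candidates
        if c.fan_count <= max_count and c.following_count <= max_count
    ]
    rng = random.Random(seed)
    # random.sample raises ValueError when asked for more items than exist,
    # so the requested size is clamped to the number of eligible users.
    return rng.sample(eligible, min(sample_size, len(eligible)))


candidates = [
    Candidate("user_a", 12, 300),
    Candidate("user_b", 25000, 40),  # too many fans, excluded
    Candidate("user_c", 999, 1000),
    Candidate("user_d", 3, 1500),    # follows too many users, excluded
]
sample = select_sample_users(candidates, sample_size=2, seed=0)
```

The `<=` comparison follows the "no more than 1000" wording of the preferred embodiment; the summary states "less than 1000", which would correspond to `<`.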

The data collection module obtains user information by manually logging in to Weibo. Every Weibo user has a user name, and each user has a personal page whose link is derived from that user name: http://weibo.com/userid. After logging in to this page, the user's data (fans and followers) can be seen at a glance.

The processing flow of the relationship graph analysis module is shown in Fig. 2 and specifically includes the following steps:

Step 201: visualize the relationship attributes of the known zombie user and the sample users as a relationship graph:

Each user (both the known zombie user and the sample users) is treated as a node. For any two users, if a following/being-followed relationship exists between them, the two nodes are connected by a directed edge pointing from the fan to the followed user.
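
As an illustration of step 201, the directed graph could be held in two adjacency maps built from (fan, followed user) pairs. The function name `build_relationship_graph` and the user identifiers below are hypothetical, and the visualization itself is not shown.

```python
from collections import defaultdict


def build_relationship_graph(follow_pairs):
    """Build a directed graph from (fan, followed_user) pairs.

    Returns (nodes, follows, fans), where
      follows[u] is the set of users that u follows (outgoing edges of u),
      fans[v] is the set of users that follow v (incoming edges of v).
    """
    nodes = set()
    follows = defaultdict(set)
    fans = defaultdict(set)
    for fan, followed in follow_pairs:
        nodes.update((fan, followed))
        follows[fan].add(followed)  # edge direction: fan -> followed user
        fans[followed].add(fan)
    return nodes, dict(follows), dict(fans)


pairs = [("s1", "zombie"), ("s2", "zombie"), ("s1", "s2"), ("zombie", "s1")]
nodes, follows, fans = build_relationship_graph(pairs)
```

Keeping both maps makes it cheap to look up whom a user follows (needed when summing scores) and who follows a user (needed when counting fans inside the graph).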

Step 202: initialize the malicious score of each node user (both the known zombie user and the sample users): set the malicious score of the known zombie user to 1 and the malicious score of each sample user to 0;

Step 203: analyze the commonality of the relationship graph to derive the social relationships of zombie users, and calculate and update the malicious scores of the sample users:

1) Calculate each user's relevance value: count the number of fans of each user and take the reciprocal of that number; the result is the relevance value between the user and its fans;

2) Use the relevance value as the weight of the edges between the user and its adjacent users in the relationship graph;

3) Calculate the malicious scores of the sample users from the relevance values and the malicious score of the known zombie user according to the propagation rules, which are: a) when calculating the malicious score of a user's fan, the fan's malicious score is the user's malicious score multiplied by the user's relevance value; b) when a user follows multiple users, that user's score is the sum of the malicious scores of all the users it follows;

4) Iterate over the relationship graph, updating the malicious score of each sample user.
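
Steps 202 and 203 can be sketched as the loop below. It is one possible reading of the propagation rules, under stated assumptions: the known zombie user's score is held at 1 in every iteration, all scores are updated synchronously from the previous iteration's values, a small tolerance `tol` stands in for "no longer changes", and `max_iter` caps the loop because the text does not establish that the scores always converge. The function and parameter names are hypothetical.

```python
def propagate_scores(follows, fan_counts, known_zombies, max_iter=100, tol=1e-9):
    """Iteratively propagate malicious scores from known zombie users.

    follows[u]    -> iterable of users that u follows
    fan_counts[v] -> number of fans of v
    """
    users = set(follows) | {v for vs in follows.values() for v in vs}
    users |= set(known_zombies)
    # Step 202: known zombie users start at 1, sample users at 0.
    scores = {u: (1.0 if u in known_zombies else 0.0) for u in users}
    # Relevance value: reciprocal of the fan count, used as the edge weight.
    relevance = {v: 1.0 / n for v, n in fan_counts.items() if n > 0}

    for _ in range(max_iter):
        new_scores = {}
        for u in users:
            if u in known_zombies:
                new_scores[u] = 1.0  # assumption: seed scores stay fixed
                continue
            # Rules a) and b): sum score * relevance over every followed user.
            # A followed user with no recorded fan count contributes nothing.
            new_scores[u] = sum(
                scores[v] * relevance.get(v, 0.0) for v in follows.get(u, ())
            )
        change = max((abs(new_scores[u] - scores[u]) for u in users), default=0.0)
        scores = new_scores
        if change < tol:
            break
    return scores


follows = {"s1": ["zombie"], "s2": ["zombie"], "s3": ["s1"]}
fan_counts = {"zombie": 4, "s1": 2}
scores = propagate_scores(follows, fan_counts, known_zombies={"zombie"})
```

In this small example, s1 and s2 each follow the known zombie user (4 fans) and end at 1 × 1/4 = 0.25, while s3 follows only s1 (2 fans) and ends at 0.25 × 1/2 = 0.125.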

Step 204: judge whether the malicious scores of the sample users have reached stable values and no longer change, that is, whether the propagation has converged: if it has converged, go to step 205; if not, go back to step 203.

Step 205: evaluate the current malicious score of each sample user: if the malicious score is greater than the threshold (the threshold can be determined by heuristic experiments), go to step 206; if the malicious score is less than the threshold, the user is judged to be a normal user.

Step 206: the sample user is judged to be a zombie user.

Step 207: zombie user determination ends.
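
Steps 205 to 207 amount to a threshold comparison on the converged scores, which might look like the sketch below. The function name `classify_users`, the labels, and the threshold value 0.2 are illustrative assumptions; the text leaves the threshold to heuristic experiments and does not say how a score exactly equal to the threshold is handled, so this sketch treats it as normal.

```python
def classify_users(scores, known_zombies, threshold):
    """Label each sample user as 'zombie' or 'normal' by its converged score."""
    labels = {}
    for user, score in scores.items():
        if user in known_zombies:
            continue  # already known to be a zombie, not a sample user
        # Step 205: strictly greater than the threshold means zombie.
        labels[user] = "zombie" if score > threshold else "normal"
    return labels


scores = {"zombie": 1.0, "s1": 0.25, "s2": 0.25, "s3": 0.125}
labels = classify_users(scores, known_zombies={"zombie"}, threshold=0.2)
```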

The preferred specific embodiments of the present invention have been described in detail above. It should be understood that those of ordinary skill in the art can make many modifications and variations according to the concept of the present invention without creative effort. Therefore, any technical solution that a person skilled in the art can obtain on the basis of the prior art through logical analysis, reasoning, or limited experimentation in accordance with the concept of the present invention shall fall within the scope of protection defined by the claims.

Claims

1. A microblog zombie user detection method based on a visual relationship network, characterized in that it comprises a data collection module and a relationship graph analysis module;

The data collection module is used to collect, starting from a known zombie user, the data of that known zombie user, and to select sample users;

The relationship graph analysis module is used to judge whether a sample user is a zombie user, and specifically includes the following steps:

Step (201): visualize the relationship attributes of the known zombie user and the sample users as a relationship graph, in which the known zombie user and the sample users are all nodes;

Step (202): initialize the malicious scores of the known zombie user and the sample users;

Step (203): analyze the commonality of the relationship graph, calculate the relevance value of each node in the relationship graph, and calculate and update the malicious scores of the sample users according to the propagation rules and the relationship graph;

Step (204): judge whether the propagation of the sample users' malicious scores has converged; if it has converged, go to step (205); if it has not, go back to step (203);

Step (205): judge whether the malicious score of a sample user is greater than a threshold; if it is greater than the threshold, go to step (206); if it is less than the threshold, the sample user is judged to be a normal user;

Step (206): the sample user is judged to be a zombie user;

Step (207): processing is complete.

2. The microblog zombie user detection method according to claim 1, wherein the data collection module collects the data of the known zombie user through a microblog API.

3. The microblog zombie user detection method according to claim 1, wherein the data of the known zombie user includes the names and numbers of the user's fans and followers.

4. The microblog zombie user detection method according to claim 1, wherein the data collection module selects the sample users at random.

5. The microblog zombie user detection method according to claim 1, wherein the number of fans and followers of each sample user selected by the data collection module is less than 1000.

6. The microblog zombie user detection method according to claim 1, wherein adjacent nodes of the relationship graph in step (201) are linked by a following/being-followed relationship.

7. The microblog zombie user detection method according to claim 1, wherein, in step (202), the malicious score of the known zombie user is initialized to 1, and the malicious score of each sample user is initialized to 0.

8. The microblog zombie user detection method according to claim 1, wherein, in step (203), the relevance value of a node is the reciprocal of the number of fans of the user corresponding to that node.

9. The microblog zombie user detection method according to claim 1, wherein the propagation rules in step (203) comprise:

a) when calculating the malicious score of a user's fan, the fan's malicious score is the user's malicious score multiplied by the user's relevance value;

b) when a user follows multiple users, that user's malicious score is the sum of the malicious scores of the users it follows.

10. The microblog zombie user detection method according to claim 1, wherein the propagation convergence in step (204) means that the malicious scores of the sample users have stabilized and no longer change.