CN103294833A - Junk user discovering method based on user following relationships - Google Patents

Info

Publication number
CN103294833A
Authority
CN
China
Prior art keywords
user
rubbish
close attention
local triangle
ratio
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2013102689495A
Other languages
Chinese (zh)
Other versions
CN103294833B (en)
Inventor
丁兆云
贾焰
杨树强
周斌
韩伟红
李爱平
韩毅
李莎莎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology
Priority to CN201310268949.5A
Publication of CN103294833A
Application granted
Publication of CN103294833B
Legal status: Active
Anticipated expiration

Abstract

A junk user discovering method based on user following relationships comprises: obtaining following relationships among users; for any first user, calculating the number of local triangles of the first user on the basis of the following relationships, wherein any one of the local triangles is formed by the first user and two other users, the first user follows each of the two other users, and a following relationship also exists between the two other users; calculating the proportion of the local triangles of the first user according to the number of the local triangles of the first user; and judging whether the first user is a junk user at least partially on the basis of the proportion of the local triangles of the first user.

Description

Junk user discovering method based on user following relationships
Technical field
The present invention relates to the field of Web mining, and in particular to a method for discovering junk users (junk accounts) based on following relationships among users.
Background
Twitter-like microblogging services have recently developed rapidly as a new communication medium. According to the statistics of the 29th China Internet report, by the end of December 2011 the actual number of microblog users in China had reached 250 million, an increase of 296.0% over the end of the previous year, and the usage rate among Internet users was 48.7%. Unlike Facebook-like social networking services, the social network relationships of a microblogging service are unidirectional: a user can "follow" other users without their authorization. For example, the social network in Twitter is formed by following relationships. The users that a user follows are that user's friends (followed friends), and the users who follow a user are that user's followers. All posts published by a user appear on the public timeline, and all of the user's messages are shown on the timelines of all of that user's followers.
With the spread of microblogging services, large numbers of artificially created junk users have appeared, whose purposes include prying into private information, commercial marketing and boosting user popularity. These junk users strain the account resources of microblogging service providers, make account management more difficult, and raise the cost of developing and managing account resources. For example, a large number of junk users forces a provider to spend more hardware resources or labor on account management. Their presence also interferes with normal users' use of the service. There has therefore long been a desire to discover the junk users in a microblog so that they can be handled appropriately.
Traditional methods for discovering junk users in microblogs judge mainly on the basis of users' explicit statistical features, such as posting regularity, the ratio of the number of friends followed to the number of followers, and the ratio of mentions of other users (@userScreenName) in posts. Examples of such methods include:
Reference 1, "Chu Z, Gianvecchio S, Wang H, et al. Who is tweeting on Twitter: human, bot, or cyborg [C]. Proc of the 26th Annual Computer Security Applications Conference. ACM, 2010: 21-30.", relies on explicit statistical properties of the posts published by Twitter users to distinguish junk bots, cyborgs and normal users, and identifies junk users by posting regularity, the ratio of the number of friends followed to the number of followers, the ratio of mentions of other users (@userScreenName) in posts, and so on.
Reference 2, "McCord M, Chuah M. Spam Detection on Twitter Using Traditional Classifiers [C]. Proc of the 8th International Conference on Autonomic and Trusted Computing. NJ: IEEE, 2011: 175-186.", designs a classifier from user features and post features to distinguish normal users from junk users; the classifier uses a Bayesian classification method.
Reference 3, "Stringhini G, Kruegel C, Vigna G. Detecting spammers on social networks [C]. Proc of the 26th Annual Computer Security Applications Conference. ACM, 2010: 1-9.", analyzes the posting behavior of junk users and relies on explicit statistical properties to identify junk users and large-scale junk user campaigns.
Reference 4, "Thomas K, Grier C, Paxson V, et al. Suspended Accounts in Retrospect: An Analysis of Twitter Spam [C]. Proc of the 2011 ACM SIGCOMM conference on Internet measurement conference. New York: ACM, 2011: 243-258.", uses suspended Twitter accounts to analyze the characteristics of junk users.
Herein, junk users that can be found by the above traditional methods on the basis of explicit statistical features are called explicit junk users. The traditional methods can indeed find junk users to some extent, but because their algorithms are relatively coarse (for example, they consider only some explicit statistical features), they cannot provide high reliability: they may miss a large number of junk users, or they may misjudge a large number of normal users as junk users. In particular, as these traditional methods have come into use, those who maliciously create junk users have adopted countermeasures that make junk users resemble normal users more closely in their explicit statistical features, for example by giving junk users large numbers of friends and followers as well. This makes the characteristics of junk users more complex and makes it harder to distinguish junk users from normal users accurately. Herein, junk users that closely resemble normal users in their explicit statistical features are called implicit junk users.
Therefore, to remedy the deficiencies of traditional junk user discovering methods, a method is needed that can find the junk users in a microblog (in particular implicit junk users) more accurately, so that the microblogging service provider can handle them accordingly. This saves the hardware resources or labor that the provider spends on account management and also avoids the interference of these junk users with normal users.
Summary of the invention
One aspect of the present invention relates to a junk user discovering method based on user following relationships, comprising: obtaining the following relationships among users; for any first user, counting the local triangles of the first user on the basis of the following relationships, wherein each local triangle is formed by the first user and two other users, the first user follows each of the two other users, and a following relationship also exists between the two other users; calculating the local triangle ratio of the first user from the number of local triangles of the first user; and judging, at least partially on the basis of the local triangle ratio of the first user, whether the first user is a junk user.
Preferably, calculating the local triangle ratio of the first user from the number of local triangles of the first user comprises: calculating the local triangle ratio from the number of local triangles of the first user and the maximum number of local triangles of the first user that can be formed between the first user and the other users it follows; or calculating the local triangle ratio from the number of local triangles of the first user and the number of other users that the first user follows.
Preferably, judging, at least partially on the basis of the local triangle ratio of the first user, whether the first user is a junk user comprises: if the local triangle ratio of the first user is below a predetermined threshold, judging that the first user is a junk user.
Preferably, the judgment of whether the first user is a junk user is further based on a forward trust propagation process and/or a reverse trust propagation process among users.
Preferably, the forward trust propagation process comprises: determining normal-user seed nodes; and determining all nodes that the normal-user seed nodes follow directly or indirectly, wherein the nodes followed directly or indirectly by the normal-user seed nodes have a higher probability of being normal users. The reverse trust propagation process comprises: determining junk-user seed nodes; and determining all nodes that directly or indirectly follow the junk-user seed nodes, wherein the nodes that directly or indirectly follow the junk-user seed nodes have a higher probability of being junk users.
Another aspect of the present invention relates to a junk user discovering apparatus based on user following relationships, comprising: means for obtaining the following relationships among users; means for counting, for any first user, the local triangles of the first user on the basis of the following relationships, wherein each local triangle is formed by the first user and two other users, the first user follows each of the two other users, and a following relationship also exists between the two other users; means for calculating the local triangle ratio of the first user from the number of local triangles of the first user; and means for judging, at least partially on the basis of the local triangle ratio of the first user, whether the first user is a junk user.
Preferably, the means for calculating the local triangle ratio of the first user from the number of local triangles of the first user comprises: means for calculating the local triangle ratio from the number of local triangles of the first user and the maximum number of local triangles of the first user that can be formed between the first user and the other users it follows; or means for calculating the local triangle ratio from the number of local triangles of the first user and the number of other users that the first user follows.
Preferably, the means for judging, at least partially on the basis of the local triangle ratio of the first user, whether the first user is a junk user comprises: means for judging that the first user is a junk user if the local triangle ratio of the first user is below a predetermined threshold.
Preferably, the judgment of whether the first user is a junk user is further based on a forward trust propagation process and/or a reverse trust propagation process among users.
Preferably, the forward trust propagation process comprises: determining normal-user seed nodes; and determining all nodes that the normal-user seed nodes follow directly or indirectly, wherein the nodes followed directly or indirectly by the normal-user seed nodes have a higher probability of being normal users. The reverse trust propagation process comprises: determining junk-user seed nodes; and determining all nodes that directly or indirectly follow the junk-user seed nodes, wherein the nodes that directly or indirectly follow the junk-user seed nodes have a higher probability of being junk users.
Description of the drawings
The present invention is described in detail with reference to the accompanying drawings. It should be understood that the drawings and the corresponding description are illustrative and not restrictive. In the drawings:
Fig. 1 shows two different forms of link structure in the following network of microblog users, wherein Fig. 1(a) shows the link structure of the following network of a normal user A, and Fig. 1(b) shows the link structure of the following network of a typical junk user B;
Fig. 2 shows an example distribution of the local triangle ratios of normal users and junk users;
Fig. 3 shows an exemplary junk user discovering method according to the present invention;
Fig. 4 shows the three kinds of relationship between two nodes in the directed network graph of microblog users;
Fig. 5 shows the three ways in which microblog users form a local triangle;
Fig. 6 shows the distribution of the time interval Δt of users whose local triangle ratio is 0 in one experiment.
Embodiments
Preferred embodiments of the present invention are described in detail below with reference to the drawings, taking a microblog application as an example. It will be appreciated that the invention is equally applicable to other network applications in which following relationships exist among users.
As is generally understood, a microblogging service functions not only as a social network but even more as a news medium: users can make friends through the service, and each user can also act as a news medium and publish information. It can thus be inferred that a normal user typically exhibits two kinds of behavior when using a microblogging service: 1) publishing information; 2) making friends through the service in order to find more valuable information.
Statistics show that most normal users in a microblog (about 90%) look for new friends by way of their existing friends, so that a user and its friends form local triangles. For example, Fig. 1(a) shows the link structure of the following network of a normal user A, in which local triangles are formed among users A, c and d, among users A, e and f, and among users A, g and h. Taking the local triangle among users A, e and f as an example, it may have formed as follows: user A first followed user e (that is, e is a friend of A, and A is a follower of e); user A then found that user e follows user f, and so may in turn have followed user f, making f its friend as well. Alternatively, user A first followed user f, then found that user e is among the followers of user f, and so may in turn have followed user e, making e its friend as well. As can be seen, normal users rely on 2-hop relationships to find more valuable friends, which ultimately forms local triangles among three users.
Junk users in a microblog, by contrast, usually do not have many friends or followers; or, even if they have some, they merely follow other users arbitrarily and without purpose, and seldom actively seek more valuable friends through 2-hop relationships. Fig. 1(b) shows the link structure of the following network of a typical junk user B, which may simply have followed some friends arbitrarily when it was created, without then seeking new friends by way of those friends.
To verify the above inference further, the distribution of the local triangle ratio can be computed separately for a sample of explicit junk users (or confirmed junk users) and a sample of normal users in a microblog. Fig. 2 shows the result of one such statistical analysis. The local triangle ratio of more than 70% of the junk users falls within [0, 0.1], indicating that junk users usually make friends through 2-hop relationships with very low probability, whereas the local triangle ratio of about 80% of the normal users falls within [0.1, 0.5], indicating that normal users often make friends through 2-hop relationships. There is thus a clear difference between the local triangle ratios of junk users and normal users: junk users make friends through 2-hop relationships with lower probability than normal users do. The local triangle ratio mentioned above is generally proportional to the number of local triangles of the user. An exemplary definition or formula for the local triangle ratio is given in the embodiments below, but the definition or formula can be chosen or adjusted according to the actual situation and is not limited to what is shown in the embodiments herein.
The statistics and experimental results above show that normal users and junk users can be distinguished more accurately on the basis, or partly on the basis, of users' local triangle ratios. An exemplary junk user discovering method of the present invention is described below with reference to Fig. 3.
In step 301, the following relationships among users are obtained.
To obtain the local triangle ratio of a user, the number of local triangles of the user must first be obtained. This number can be computed by extracting the following relationships among microblog users; preferably, the following relationships are abstracted as a directed network graph on which the computation is performed. For example, for the network formed by microblog users, a directed network graph $G_d = (V_d, E_d)$ can be constructed, where $V_d$ denotes the set of nodes in $G_d$ (that is, the set of users or accounts) and $E_d$ denotes the set of directed edges between the nodes in $G_d$, which represent the following relationships among users. For two example nodes u and v in this directed network graph, there are the three relationships shown in Fig. 4 (ignoring the case in which no edge exists between u and v). Specifically, Fig. 4(a) shows that v is a friend followed by u; Fig. 4(b) shows that v is a follower of u; Fig. 4(c) shows that u and v are mutual followers.
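The graph construction described above can be sketched in Python as follows. This is a minimal illustration only: the function names `build_follow_graph` and `relation`, the `(follower, followee)` edge format and the choice to ignore self-follows are assumptions of the sketch and do not come from the patent.

```python
from collections import defaultdict

def build_follow_graph(edges):
    """Build followee and follower sets from (follower, followee) pairs."""
    friends = defaultdict(set)    # friends[u]: users that u follows
    followers = defaultdict(set)  # followers[u]: users that follow u
    for u, v in edges:
        if u == v:
            continue  # ignore self-follows (an assumption of this sketch)
        friends[u].add(v)
        followers[v].add(u)
    return friends, followers

def relation(friends, u, v):
    """Classify the relationship between u and v as in Fig. 4."""
    u_follows_v = v in friends.get(u, set())
    v_follows_u = u in friends.get(v, set())
    if u_follows_v and v_follows_u:
        return "mutual"    # Fig. 4(c): u and v follow each other
    if u_follows_v:
        return "friend"    # Fig. 4(a): v is a friend followed by u
    if v_follows_u:
        return "follower"  # Fig. 4(b): v is a follower of u
    return "none"          # no edge between u and v

# Hypothetical toy edges: u follows v, w follows u, u and x follow each other.
edges = [("u", "v"), ("w", "u"), ("u", "x"), ("x", "u")]
friends, followers = build_follow_graph(edges)
```

Keeping both the followee map and the follower map makes the later set intersections cheap, at the cost of storing every edge twice.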
In step 302, for any first user, the local triangles of the first user are counted on the basis of the following relationships, wherein each local triangle is formed by the first user and two other users, the first user follows each of the two other users, and a following relationship also exists between the two other users.
Specifically, counting the local triangles of a user is mainly a matter of determining how the user makes friends through 2-hop relationships. Let user v be a friend followed by user u. User u can then make friends by way of the friends that v follows, by way of the followers of v, and also by way of the mutual followers of v. Referring to Fig. 5: in Fig. 5(a), user u takes a friend w followed by user v as its own friend; in Fig. 5(b), user u takes a follower w of user v as its own friend; in Fig. 5(c), user u takes a mutual follower w of user v as its own friend. The three types shown in Fig. 5 are all the possible types of local triangle of user u. As can be seen, a local triangle of user u is formed by user u and two other users v and w that it follows (that is, two friends v and w of user u), and a following relationship also exists between v and w: user v follows user w, or user w follows user v, or users v and w follow each other.
For a directed network graph formed by users u, v and w, considering the three types of local triangle shown in Fig. 5, the number of local triangles of user u can be defined as follows:
$$T(u) = |\{e_{vw} \in E_d,\ e_{uv} \in E_d,\ e_{uw} \in E_d\}| + |\{e_{wv} \in E_d,\ e_{uv} \in E_d,\ e_{uw} \in E_d\}| - |\{e_{vwwv} \in E_d,\ e_{uv} \in E_d,\ e_{uw} \in E_d\}|$$
where $e_{uv}$ denotes that user u follows user v, $e_{uw}$ denotes that user u follows user w, $e_{vw}$ denotes that user v follows user w, $e_{wv}$ denotes that user w follows user v, $e_{vwwv}$ denotes that users v and w follow each other and thus form a mutual-follower relationship (equivalent to "$e_{vw}$ and $e_{wv}$"), and $E_d$ denotes the set of directed edges between the nodes in the directed network graph. As can be seen, $|\{e_{vw} \in E_d,\ e_{uv} \in E_d,\ e_{uw} \in E_d\}|$ covers local triangles of the type shown in Fig. 5(a) or Fig. 5(c), and $|\{e_{wv} \in E_d,\ e_{uv} \in E_d,\ e_{uw} \in E_d\}|$ covers local triangles of the type shown in Fig. 5(b) or Fig. 5(c). The term $|\{e_{vwwv} \in E_d,\ e_{uv} \in E_d,\ e_{uw} \in E_d\}|$ covers local triangles of the type shown in Fig. 5(c); because these have already been counted twice by the first two terms, they must be subtracted once.
The above shows how to compute the number of local triangles of user u for three users. The method is now extended to the directed network graph $G_d$ formed by all microblog users, converting the counting of local triangles in the directed network graph into set operations. Specifically, for a user u in the directed network graph $G_d = (V_d, E_d)$, let $Fr(u)$ denote the set of all friends followed by u, that is, $Fr(u) = \{w \in V_d : e_{uw} \in E_d\}$. For any friend v followed by user u, let $Fr(v)$ correspondingly denote the set of all friends followed by v, that is, $Fr(v) = \{w \in V_d : e_{vw} \in E_d\}$. In addition, let $Fo(v) = \{w \in V_d : e_{wv} \in E_d\}$ denote the set of all followers of user v, and let $F(v) = \{w \in V_d : e_{wv} \in E_d \text{ and } e_{vw} \in E_d\}$ denote the set of all mutual followers of user v.
The number of all local triangles of node u in the directed network graph $G_d = (V_d, E_d)$ can then be computed as:
$$N_{Tri} = \frac{1}{2}\left(\sum_{v \in Fr(u)} |Fr(u) \cap Fr(v)| + \sum_{v \in Fr(u)} |Fr(u) \cap Fo(v)| - \sum_{v \in Fr(u)} |Fr(u) \cap F(v)|\right)$$
In this formula, $|Fr(u) \cap Fr(v)|$ covers local triangles of the type shown in Fig. 5(a) or Fig. 5(c), $|Fr(u) \cap Fo(v)|$ covers local triangles of the type shown in Fig. 5(b) or Fig. 5(c), and $|Fr(u) \cap F(v)|$ covers local triangles of the type shown in Fig. 5(c).
With the above formula, counting the local triangles of a user in the directed network graph is converted into three set intersection operations. An exemplary algorithm implementing the formula is given below. The intersection of two sets takes m iterations, and for the two nodes of each relationship pair the algorithm computes the size of the intersection of one node's friend set with the friend set, the follower set and the mutual-follower set of the other node. Each relationship pair therefore requires three intersection operations, so the overall time cost of the algorithm is O(m|E|).
[Algorithm listing for local triangle counting; image not reproduced in the text]
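Because the original algorithm listing survives only as an image, the following Python sketch shows one way the set-intersection formula for $N_{Tri}$ might be evaluated. It is not the patent's listing: the function name `count_local_triangles`, the dictionary-of-sets representation and the toy edges (loosely modelled on Fig. 1) are assumptions.

```python
from collections import defaultdict

def count_local_triangles(u, friends, followers):
    """Evaluate the N_Tri formula for user u with set intersections."""
    fr_u = friends.get(u, set())           # Fr(u): users that u follows
    total = 0
    for v in fr_u:
        fr_v = friends.get(v, set())       # Fr(v): users that v follows
        fo_v = followers.get(v, set())     # Fo(v): users that follow v
        f_v = fr_v & fo_v                  # F(v): mutual followers of v
        total += len(fr_u & fr_v) + len(fr_u & fo_v) - len(fr_u & f_v)
    return total // 2  # each linked pair {v, w} was counted once from v and once from w

# Toy data: A follows six users, some of whom follow each other; B follows
# three unrelated users. The edges among friends are illustrative assumptions.
edges = [("A", x) for x in "cdefgh"] + [("c", "d"), ("e", "f"), ("f", "e"), ("h", "g")]
edges += [("B", x) for x in "pqr"]
friends, followers = defaultdict(set), defaultdict(set)
for src, dst in edges:
    friends[src].add(dst)
    followers[dst].add(src)

n_tri_a = count_local_triangles("A", friends, followers)
n_tri_b = count_local_triangles("B", friends, followers)
```

In this toy graph the mutual pair e and f contributes a single triangle, which is what the subtraction of the $F(v)$ term is for.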
After the number of local triangles of any first user has been obtained in step 302, the local triangle ratio of the first user can be calculated from that number in step 303.
For example, for a given user u, if the number of local triangles of u has been computed as $N_{Tri}$ and the number of all friends followed by u is known to be $N_{fr}$, the local triangle ratio TRatio of u can be defined as follows:
$$TRatio = \begin{cases} \dfrac{N_{Tri}}{C_{N_{fr}}^{2}}, & N_{fr} \ge 2 \\ 0, & N_{fr} = 1 \end{cases}$$
where $C_{N_{fr}}^{2}$ denotes the maximum number of local triangles of user u that can be formed between user u and all the friends it follows, that is:
$$C_{N_{fr}}^{2} = \frac{N_{fr} \times (N_{fr} - 1)}{2}, \quad N_{fr} \ge 2$$
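The definition above translates directly into a few lines of Python. In this sketch the function name `triangle_ratio` is hypothetical, and returning 0 when a user follows nobody ($N_{fr} = 0$) is an assumption, since the definition only covers $N_{fr} = 1$ and $N_{fr} \ge 2$.

```python
from math import comb

def triangle_ratio(n_tri, n_fr):
    """Local triangle ratio: N_Tri divided by C(N_fr, 2)."""
    if n_fr < 2:
        # The definition gives 0 for N_fr = 1; treating N_fr = 0 the same
        # way is an assumption of this sketch.
        return 0.0
    return n_tri / comb(n_fr, 2)  # comb(n, 2) == n * (n - 1) / 2

# A user with 6 followed friends and 3 local triangles: 3 / 15.
example_ratio = triangle_ratio(3, 6)
```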
Then, in step 304, whether the first user is a junk user can be judged at least partially on the basis of the local triangle ratio of the first user. For example, the local triangle ratio TRatio obtained for user u can be used to judge whether user u is a junk user.
It should be understood that the above definition of the local triangle ratio TRatio is only an example, and any other reasonable definition can be used. For example, TRatio can be computed as the ratio of the number of local triangles of a user to the number of friends of that user. In addition, in some embodiments, other factors (for example the number of posts published by the user, the number of friends of the user, the number of followers of the user, and so on) can also be taken into account to judge comprehensively whether the user is a junk user.
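Steps 301 to 304 can be strung together as in the sketch below. The function name `flag_junk_users` and the threshold value 0.1 are hypothetical; the patent speaks only of a "predetermined threshold", and 0.1 is chosen here merely because Fig. 2 places most junk users in [0, 0.1]. Instead of the set-intersection formula, the sketch counts linked pairs of friends directly, which yields the same number of local triangles.

```python
from collections import defaultdict
from itertools import combinations

def flag_junk_users(edges, threshold=0.1):
    """Flag users whose local triangle ratio is below a threshold.

    edges: iterable of (follower, followee) pairs.
    Only users who follow at least one other user are evaluated.
    """
    friends = defaultdict(set)
    for follower, followee in edges:
        friends[follower].add(followee)
    follows = {u: frozenset(vs) for u, vs in friends.items()}

    flagged = set()
    for u, fr_u in follows.items():
        pairs = list(combinations(fr_u, 2))  # all C(N_fr, 2) pairs of friends
        if not pairs:
            ratio = 0.0                      # N_fr = 1 gives ratio 0
        else:
            linked = sum(1 for v, w in pairs
                         if w in follows.get(v, ()) or v in follows.get(w, ()))
            ratio = linked / len(pairs)
        if ratio < threshold:
            flagged.add(u)
    return flagged

# Hypothetical toy data: A's friends c and d are linked; B's friends are not.
edges = [("A", "c"), ("A", "d"), ("A", "e"), ("c", "d"),
         ("B", "p"), ("B", "q"), ("B", "r")]
flagged = flag_junk_users(edges)
```

Note that user c, who follows only one account, also ends up flagged because its ratio is 0 by definition. This illustrates why the ratio is meant to be only part of the judgment and why other factors may need to be combined with it.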
The above describes a method of judging whether a user is a junk user by calculating the user's local triangle ratio. To further improve the accuracy and reliability of the judgment, the embodiments below additionally consider a forward trust propagation model and/or a reverse trust propagation model. These two models are based respectively on the following two statistical regularities among microblog users: 1) normal users usually follow normal users; 2) users who follow junk users are usually junk users.
Forward trust propagation
First, some normal users are confirmed as seed nodes according to users' statistical features (for example, some highly influential microblog users can be identified as normal users with great certainty and can therefore serve as seed nodes). The other nodes followed by these seed nodes are then determined; these are the nodes directly reachable by propagation from the seed nodes. Next, the further nodes followed by those nodes are determined, and so on, until no new reachable node is obtained. In the end, all nodes reachable by propagation from the seed nodes are obtained, that is, the nodes directly or indirectly followed by the seed nodes. These nodes are considered to have a higher probability of being normal users.
The nodes reachable by propagation from the seed nodes can be obtained with various models or algorithms, and in more preferred embodiments a forward score can further be computed for each node. For example, in one embodiment, after the seed nodes have been confirmed, the forward scores of the other nodes can be computed according to a random walk model. To guarantee that the transition matrix M is stochastic, all nodes that cannot be reached by propagation from the seed nodes are pruned, which guarantees that every remaining node is reachable from a seed node; the pruning algorithm is thereby converted into a seed node propagation algorithm. Specifically, let the initial seed node set be $Set_0$, the set of nodes reachable by propagation from the seed nodes be $Set_{Done}$, the seed node set processed in the current iteration be $Set_{Seed}$, and the newly added seed node set be $Set_{Temp}$. An exemplary seed node propagation algorithm, Seed_Diffusion, is as follows.
[Algorithm listing: Seed_Diffusion; image not reproduced in the text]
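The Seed_Diffusion listing itself is not available in the text, so the following Python sketch only reconstructs the iteration as the prose describes it. The variable names mirror $Set_0$, $Set_{Done}$, $Set_{Seed}$ and $Set_{Temp}$; counting the seed nodes themselves as part of $Set_{Done}$ is an assumption of the sketch.

```python
def seed_diffusion(friends, set_0):
    """Return all nodes reachable from the seeds along 'follows' edges.

    friends: dict mapping each user to the set of users it follows.
    set_0:   iterable of initial (normal-user) seed nodes.
    """
    set_done = set(set_0)   # reachable nodes (seeds included, by assumption)
    set_seed = set(set_0)   # seeds processed in the current iteration
    while set_seed:
        set_temp = set()    # seeds newly discovered in this iteration
        for node in set_seed:
            for followee in friends.get(node, ()):
                if followee not in set_done:
                    set_done.add(followee)
                    set_temp.add(followee)
        set_seed = set_temp  # stop when no new reachable node appears
    return set_done

# Hypothetical toy graph: seed s follows a and b, a follows c, c follows s,
# and x follows a but is followed by nobody reachable from s.
friends = {"s": {"a", "b"}, "a": {"c"}, "c": {"s"}, "x": {"a"}}
reachable = seed_diffusion(friends, {"s"})
```

Every node is added to `set_done` at most once, so the loop terminates even when the following relationships contain cycles.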
The seed node propagation algorithm Seed_Diffusion above is iterative: when no new reachable node can be obtained, the iteration stops, and the algorithm finally yields the set $Set_{Done}$ of all nodes reachable by propagation from the initial seed nodes. In the subgraph of the directed network graph $G_d = (V_d, E_d)$ that contains the node set $Set_{Done}$, the forward score of each node can then be computed as:
$$\vec{r} = \alpha \cdot T_s \cdot \vec{r} + (1 - \alpha) \cdot \vec{d}$$
where $\vec{r}$ is the vector of forward scores, $T_s$ is the transition matrix of the subgraph, $\alpha$ is the random jump factor, and $\vec{d}$ is the seed node vector. Converted to comprehensive matrix form, the formula becomes:
$$\vec{r_k} = \vec{r_{k-1}} \cdot M_s$$
where $M_s$ is the comprehensive matrix of the subgraph that takes the jump factor into account, that is:
$$M_s = \alpha \cdot T_s + (1 - \alpha) \cdot \vec{d} \cdot e$$
where e is a unit vector. In $T_s$, every node other than the seed nodes can be reached by propagation from the seed nodes, so some other node is certain to point to it. At the same time, the jump term of $M_s$ means that the algorithm jumps to a seed node with probability $(1 - \alpha)$, so some other node is certain to jump randomly to each seed node. Therefore, after all nodes that cannot be reached by propagation from the seed nodes have been pruned, the resulting comprehensive matrix $M_s$ is a stochastic matrix.
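The forward score iteration can be sketched without any matrix library, as below. Several points are assumptions rather than statements of the patent: the function name `forward_scores`, the value $\alpha = 0.85$, a uniform seed vector $\vec{d}$, a fixed iteration count, and a transition rule in which each node splits its score equally among the nodes it follows inside the pruned subgraph. Nodes that follow nobody in the subgraph simply pass nothing on, so the scores need not sum to 1 in this simplified form.

```python
def forward_scores(friends, nodes, seeds, alpha=0.85, iterations=50):
    """Iterate r = alpha * T_s * r + (1 - alpha) * d over the pruned subgraph.

    friends: dict mapping each user to the set of users it follows.
    nodes:   nodes kept after pruning (reachable from the seeds).
    seeds:   normal-user seed nodes; d is uniform over them (assumption).
    """
    nodes = list(nodes)
    d = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    r = dict(d)  # initial vector
    for _ in range(iterations):
        nxt = {n: (1 - alpha) * d[n] for n in nodes}   # random jump to seeds
        for n in nodes:
            out = [m for m in friends.get(n, ()) if m in d]
            for m in out:
                nxt[m] += alpha * r[n] / len(out)      # trust flows to followees
        r = nxt
    return r

# Hypothetical toy graph: seed s follows a and b, a follows c; x is pruned.
friends = {"s": {"a", "b"}, "a": {"c"}, "x": {"a"}}
scores = forward_scores(friends, {"s", "a", "b", "c"}, {"s"})
```

In this toy graph the score decays with distance from the seed, which matches the intuition that directly followed nodes inherit more trust than indirectly followed ones.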
In the calculation, users with different statistical features can be given different vote contributions. For example, according to their statistical features, users can be divided into five classes: 1) near-normal users; 2) suspected normal users; 3) uncertain users; 4) suspected junk users; 5) near-junk users, with the vote contributions of these five classes in descending order. Let $z(i)$ be the vote contribution value assigned to user $i$ according to its statistical features, and let the vote contribution values $z(i)$ of all nodes in the node set $Set_{Done}$ form the diagonal matrix $Z = \mathrm{diag}(z)$. The forward score AttriGoodRank of a node is defined as follows:
$$\vec{r} = \alpha \cdot Z \cdot T_s \cdot \vec{r} + (1 - \alpha) \cdot \vec{d}$$
Converted to combined matrix form, the AttriGoodRank score becomes:
$$\vec{r}_k = \vec{r}_{k-1} \cdot H$$
where $H$ is the combined matrix that takes the vote contribution values of the user statistical features into account, that is:
$$H = \alpha \cdot Z \cdot T_s + (1 - \alpha) \cdot \vec{d} \cdot e$$
Because multiplying the subgraph transition matrix $T_s$ by the diagonal matrix $Z$ only changes the random vote values by a certain proportion and does not remove any link relationship in the network, the combined matrix $H$ is still a stochastic matrix, so the AttriGoodRank computation is an ergodic Markov process. Therefore, given an initial vector, the result of the computation gradually converges after $n$ iterations.
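The effect of the vote contribution values can be sketched by scaling each user's outgoing vote by $z(i)$ before it is distributed. The numeric weights, the class labels, the value of $\alpha$ and the data layout below are hypothetical; only the descending order of the five classes comes from the text.

```python
# Hypothetical vote weights; the text only fixes their descending order.
VOTE_WEIGHT = {
    "near_normal": 1.0,
    "suspected_normal": 0.8,
    "uncertain": 0.5,
    "suspected_junk": 0.2,
    "near_junk": 0.0,
}


def attri_good_rank(followees, seeds, user_class, alpha=0.85, iterations=50):
    """Forward score in which user i's vote is scaled by z(i)."""
    nodes = sorted(followees)
    d = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    r = dict(d)
    for _ in range(iterations):
        nxt = {v: (1.0 - alpha) * d[v] for v in nodes}
        for i in nodes:
            targets = sorted(followees[i])
            if not targets:
                continue
            z_i = VOTE_WEIGHT[user_class[i]]          # vote contribution z(i)
            share = alpha * z_i * r[i] / len(targets)
            for j in targets:
                nxt[j] += share
        r = nxt
    return r


graph = {"a": {"b", "c"}, "b": {"c"}, "c": {"a"}}
all_normal = {"a": "near_normal", "b": "near_normal", "c": "near_normal"}
mixed = {"a": "near_normal", "b": "near_junk", "c": "uncertain"}
plain = attri_good_rank(graph, {"a"}, all_normal)
weighted = attri_good_rank(graph, {"a"}, mixed)
print(plain["c"], weighted["c"])
```

In this toy example, node `c` scores lower once the vote of its follower `b` is discounted, which is the intended effect of weighting votes by statistical class.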
Trust reverse propagation
Trust reverse propagation is similar in principle to trust forward propagation, except that some junk users are confirmed as seed nodes according to user statistical features. In addition, because it is based on the statistical rule that "a user who follows junk users is generally a junk user", trust propagates from a node toward its followers; the nodes reachable by propagation from the junk user seed nodes are the nodes that directly or indirectly follow those junk user seed nodes.
Likewise, in a more preferred embodiment, the reverse score of each node can be calculated with a random walk model. To guarantee that the transition matrix is stochastic, all nodes that cannot be reached by propagation from the junk user seed nodes are likewise pruned, which guarantees that every remaining node is reachable from a junk user seed node. The pruning algorithm is again converted into the seed node propagation algorithm Seed_Diffusion, which differs from the one used in the trust forward propagation described above in that, for each element $i$ of the seed node set $Set_{Seed}$ being processed in the iteration, all follower nodes $follower_i$ of element $i$ need to be obtained.
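A minimal sketch of this reverse variant is given below, under the assumption that the following relationships are stored as a dictionary from each user to the set of users it follows. The helper `followers_index` and the toy data are hypothetical; the only change relative to the forward direction is that each step collects the followers of the nodes being processed.

```python
def followers_index(followees):
    """Invert the follow graph: map each user to the set of its followers."""
    followers = {u: set() for u in followees}
    for u, outs in followees.items():
        for v in outs:
            followers.setdefault(v, set()).add(u)
    return followers


def reverse_seed_diffusion(followees, junk_seeds):
    """Return Set_Bad: junk seeds plus all users that follow them, directly or indirectly."""
    followers = followers_index(followees)
    set_bad = set(junk_seeds)
    set_seed = set(junk_seeds)
    while set_seed:
        set_temp = set()
        for i in set_seed:
            # follower_i: every user that follows element i
            set_temp |= followers.get(i, set()) - set_bad
        set_bad |= set_temp
        set_seed = set_temp
    return set_bad


# Toy graph (hypothetical): u follows junk seed s, v follows u, w follows v and n.
follow_graph = {"s": set(), "u": {"s"}, "v": {"u"}, "w": {"v", "n"}, "n": set()}
bad = reverse_seed_diffusion(follow_graph, ["s"])
print(sorted(bad))
```

User `n` is followed by `w` but follows no junk seed itself, so it stays outside the reachable set.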
The algorithm finally yields the set $Set_{Bad}$ of nodes reachable by propagation from the seed nodes. For the subgraph of the social network graph $G_d = (V_d, E_d)$ that contains the node set $Set_{Bad}$, the reverse score of each node can then be calculated as:
$$\vec{b}_k = \vec{b}_{k-1} \cdot M_b$$
where $M_b$ is the combined matrix of the subgraph that takes the jump factor into account, that is:
$$M_b = \alpha \cdot T_b + (1 - \alpha) \cdot \vec{d} \cdot e$$
where $T_b$ is the transition matrix of the subgraph, $\vec{d}$ is the seed node vector, and $e$ is the unit vector. Likewise, $M_b$ is a stochastic matrix.
In the calculation, users with different statistical features can be given different reverse vote contributions; for example, junk users have the largest vote contribution, uncertain users the next largest, and normal users the smallest. Let $x(i)$ be the reverse-propagation vote contribution value assigned to user $i$ according to its statistical features, and let the reverse-propagation vote contribution values $x(i)$ of all nodes in the node set $Set_{Bad}$ form the diagonal matrix $X = \mathrm{diag}(x)$. The reverse score AttriBadRank of a node is defined as follows:
$$\vec{b} = \alpha \cdot X \cdot T_b \cdot \vec{b} + (1 - \alpha) \cdot \vec{d}$$
Converted to combined matrix form, AttriBadRank becomes:
$$\vec{b}_k = \vec{b}_{k-1} \cdot B$$
where $B$ is the combined matrix that takes the reverse vote contribution values of the user statistical features into account, that is:
$$B = \alpha \cdot X \cdot T_b + (1 - \alpha) \cdot \vec{d} \cdot e$$
It can likewise be shown that the combined matrix $B$ is a stochastic matrix, so the AttriBadRank computation is an ergodic Markov process. Therefore, given an initial vector, the result of the computation gradually converges after $n$ iterations.
Experimental result
In one experiment on the junk user discovery method of the present invention, 3,901 users whose local triangle ratio is 0 were first extracted from 200,000 users, and the relationship between these users' account creation time (createTime) and the time of their last post (lastPostTime) was then analyzed. Specifically, let the time interval be Δt = lastPostTime - createTime; the distribution of Δt over the 3,901 users is shown in Fig. 6. Fig. 6 shows that most of these users published a small number of posts right after account creation and never posted again: about 59.4% of the users never posted again after posting within 1 hour of account creation, and about 83.7% never posted again after posting within 24 hours of account creation. In other words, most of these users published a few posts when the account was created and then stopped posting; they also did not rely on 2-hop relationships to find more valuable friends, and are therefore in a "not fully active" state. These experimental results show that the junk user discovery method based on user following relationships of the present application is effective.
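The analysis described above can be sketched as follows. The sketch assumes that the local triangle ratio is the number of local triangles divided by the maximum number that the user's followees could form, that a following relationship in either direction between two followees closes a triangle, and that the data is held in plain dictionaries. The toy users and timestamps are hypothetical and are not the experimental data.

```python
from datetime import datetime, timedelta
from itertools import combinations


def local_triangle_ratio(user, followees):
    """Local triangles through `user` divided by the maximum possible number."""
    outs = sorted(followees.get(user, set()) - {user})
    if len(outs) < 2:
        return 0.0  # fewer than two followees: no triangle can be formed

    def linked(a, b):
        # a following relationship in either direction closes the triangle
        return b in followees.get(a, set()) or a in followees.get(b, set())

    triangles = sum(1 for a, b in combinations(outs, 2) if linked(a, b))
    return triangles / (len(outs) * (len(outs) - 1) // 2)


def share_within(intervals, limit):
    """Fraction of time intervals that are no longer than `limit`."""
    return sum(1 for dt in intervals if dt <= limit) / len(intervals)


# Toy data (hypothetical): follow graph and (createTime, lastPostTime) pairs.
followees = {"u": {"a", "b", "c"}, "a": {"b"}, "b": set(), "c": set(),
             "p": {"a", "c"}, "q": {"b", "c"}, "s": {"c"}}
times = {
    "p": (datetime(2012, 1, 1, 8, 0), datetime(2012, 1, 1, 8, 30)),
    "q": (datetime(2012, 1, 1, 8, 0), datetime(2012, 1, 1, 13, 0)),
    "s": (datetime(2012, 1, 1, 8, 0), datetime(2012, 1, 4, 8, 0)),
}

# Keep the users whose ratio is 0, then compute delta t = lastPostTime - createTime.
zero_ratio = [u for u in times if local_triangle_ratio(u, followees) == 0.0]
intervals = [times[u][1] - times[u][0] for u in zero_ratio]
print(share_within(intervals, timedelta(hours=1)))
print(share_within(intervals, timedelta(hours=24)))
```

On real data, the two printed fractions would correspond to the 1-hour and 24-hour shares reported for Fig. 6.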
The preferred embodiments of the present invention have been described in detail above. It should be noted that, although the preferred embodiments are described in the context of a microblog application, those skilled in the art will understand that the method described herein can also be applied to network applications other than microblogs. In addition, the specific algorithms, formulas, parameter settings and the like mentioned in the embodiments of the present application are given by way of example only and do not limit the present invention. Those skilled in the art who understand the design idea and substance of the present invention may make suitable variations and substitutions to the above algorithms, formulas, parameters and the like, and such variations still fall within the scope of protection of the present application.

Claims (10)

1. A junk user discovery method based on user following relationships, comprising:
obtaining the following relationships between users;
for any first user, counting the number of local triangles of the first user based on the following relationships, wherein each of the local triangles is formed by the first user and two other users, the first user follows each of the two other users, and a following relationship also exists between the two other users;
calculating a local triangle ratio of the first user according to the number of local triangles of the first user; and
judging, at least in part based on the local triangle ratio of the first user, whether the first user is a junk user.
2. The method according to claim 1, wherein calculating the local triangle ratio of the first user according to the number of local triangles of the first user comprises:
calculating the local triangle ratio of the first user according to the number of local triangles of the first user and the maximum number of local triangles of the first user that can be formed between the first user and the other users it follows; or
calculating the local triangle ratio of the first user according to the number of local triangles of the first user and the number of other users that the first user follows.
3. The method according to claim 1, wherein judging, at least in part based on the local triangle ratio of the first user, whether the first user is a junk user comprises:
judging that the first user is a junk user if the local triangle ratio of the first user is lower than a predetermined threshold.
4. The method according to claim 1, wherein
judging whether the first user is a junk user is further based on a trust forward propagation process and/or a trust reverse propagation process between users.
5. The method according to claim 4, wherein
the trust forward propagation process comprises:
determining normal user seed nodes;
determining all nodes that the normal user seed nodes directly or indirectly follow, wherein the nodes that the normal user seed nodes directly or indirectly follow have a higher probability of being normal users;
the trust reverse propagation process comprises:
determining junk user seed nodes;
determining all nodes that directly or indirectly follow the junk user seed nodes, wherein the nodes that directly or indirectly follow the junk user seed nodes have a higher probability of being junk users.
6. A junk user discovery device based on user following relationships, comprising:
means for obtaining the following relationships between users;
means for counting, for any first user, the number of local triangles of the first user based on the following relationships, wherein each of the local triangles is formed by the first user and two other users, the first user follows each of the two other users, and a following relationship also exists between the two other users;
means for calculating a local triangle ratio of the first user according to the number of local triangles of the first user; and
means for judging, at least in part based on the local triangle ratio of the first user, whether the first user is a junk user.
7. The device according to claim 6, wherein the means for calculating the local triangle ratio of the first user according to the number of local triangles of the first user comprises:
means for calculating the local triangle ratio of the first user according to the number of local triangles of the first user and the maximum number of local triangles of the first user that can be formed between the first user and the other users it follows; or
means for calculating the local triangle ratio of the first user according to the number of local triangles of the first user and the number of other users that the first user follows.
8. The device according to claim 6, wherein the means for judging, at least in part based on the local triangle ratio of the first user, whether the first user is a junk user comprises:
means for judging that the first user is a junk user if the local triangle ratio of the first user is lower than a predetermined threshold.
9. The device according to claim 6, wherein
judging whether the first user is a junk user is further based on a trust forward propagation process and/or a trust reverse propagation process between users.
10. The device according to claim 9, wherein
the trust forward propagation process comprises:
determining normal user seed nodes;
determining all nodes that the normal user seed nodes directly or indirectly follow, wherein the nodes that the normal user seed nodes directly or indirectly follow have a higher probability of being normal users;
the trust reverse propagation process comprises:
determining junk user seed nodes;
determining all nodes that directly or indirectly follow the junk user seed nodes, wherein the nodes that directly or indirectly follow the junk user seed nodes have a higher probability of being junk users.
CN201310268949.5A 2012-11-02 2013-06-28 Junk user discovering method based on user following relationships Active CN103294833B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310268949.5A CN103294833B (en) 2012-11-02 2013-06-28 Junk user discovering method based on user following relationships

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN2012104334411 2012-11-02
CN201210433441.1 2012-11-02
CN201210433441 2012-11-02
CN201310268949.5A CN103294833B (en) 2012-11-02 2013-06-28 Junk user discovering method based on user following relationships

Publications (2)

Publication Number Publication Date
CN103294833A true CN103294833A (en) 2013-09-11
CN103294833B CN103294833B (en) 2016-12-28

Family

ID=49095695

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310268949.5A Active CN103294833B (en) 2012-11-02 2013-06-28 Junk user discovering method based on user following relationships

Country Status (1)

Country Link
CN (1) CN103294833B (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104090961A (en) * 2014-07-14 2014-10-08 福州大学 Social network garbage user filtering method based on machine study
CN104199981A (en) * 2014-09-24 2014-12-10 苏州大学 Method and system for classifying persons and mechanisms based on microblog texts
CN105357189A (en) * 2015-10-13 2016-02-24 精硕世纪科技(北京)有限公司 Zombie account detection method and device
CN106557983A (en) * 2016-11-18 2017-04-05 重庆邮电大学 A kind of microblogging junk user detection method based on fuzzy multiclass SVM
CN107229871A (en) * 2017-07-17 2017-10-03 梧州井儿铺贸易有限公司 A kind of safe information acquisition device
CN107315838A (en) * 2017-07-17 2017-11-03 深圳源广安智能科技有限公司 A kind of efficient network hotspot digging system
CN109214944A (en) * 2018-08-28 2019-01-15 北京费马科技有限公司 Junk user recognition methods and application based on social graph
TWI690191B (en) * 2018-03-14 2020-04-01 香港商阿里巴巴集團服務有限公司 Graph structure model training, garbage account identification method, device and equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050246420A1 (en) * 2004-04-28 2005-11-03 Microsoft Corporation Social network email filtering
CN102571485A (en) * 2011-12-14 2012-07-11 上海交通大学 Method for identifying robot user on micro-blog platform

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
许明: ""IMS网络垃圾通信识别过滤系统的研究与实现"", 《中国优秀硕士学位论文全文数据库(信息科技辑)》, no. 3, 15 March 2011 (2011-03-15) *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104090961A (en) * 2014-07-14 2014-10-08 福州大学 Social network garbage user filtering method based on machine study
CN104090961B (en) * 2014-07-14 2017-07-04 福州大学 A kind of social networks junk user filter method based on machine learning
CN104199981A (en) * 2014-09-24 2014-12-10 苏州大学 Method and system for classifying persons and mechanisms based on microblog texts
CN105357189A (en) * 2015-10-13 2016-02-24 精硕世纪科技(北京)有限公司 Zombie account detection method and device
CN105357189B (en) * 2015-10-13 2018-05-01 精硕科技(北京)股份有限公司 Corpse account detection method and device
CN106557983A (en) * 2016-11-18 2017-04-05 重庆邮电大学 A kind of microblogging junk user detection method based on fuzzy multiclass SVM
CN106557983B (en) * 2016-11-18 2020-11-17 重庆邮电大学 Microblog junk user detection method based on fuzzy multi-class SVM
CN107229871A (en) * 2017-07-17 2017-10-03 梧州井儿铺贸易有限公司 A kind of safe information acquisition device
CN107315838A (en) * 2017-07-17 2017-11-03 深圳源广安智能科技有限公司 A kind of efficient network hotspot digging system
TWI690191B (en) * 2018-03-14 2020-04-01 香港商阿里巴巴集團服務有限公司 Graph structure model training, garbage account identification method, device and equipment
CN109214944A (en) * 2018-08-28 2019-01-15 北京费马科技有限公司 Junk user recognition methods and application based on social graph
CN109214944B (en) * 2018-08-28 2022-03-11 北京蚂蚁云金融信息服务有限公司 Social graph-based junk user identification method and device

Also Published As

Publication number Publication date
CN103294833B (en) 2016-12-28

Similar Documents

Publication Publication Date Title
CN103294833A (en) Junk user discovering method based on user following relationships
Cheng et al. An epidemic model of rumor diffusion in online social networks
CN103150374B (en) Method and system for identifying abnormal microblog users
CN103179198B (en) Based on the topic influence individual method for digging of many relational networks
CN103617279A (en) Method for achieving microblog information spreading influence assessment model on basis of Pagerank method
CN104239385A (en) Method for estimating relationships between topics, and system
Pfeiffer et al. Fast generation of large scale social networks while incorporating transitive closures
CN104537096A (en) Microblog message influence measuring method based on microblog message propagation tree
CN105279187A (en) Edge clustering coefficient-based social network group division method
CN107276793B (en) Node importance measurement method based on probability jump random walk
CN105095419A (en) Method for maximizing influence of information to specific type of weibo users
CN103150678B (en) Method and device for discovering inter-user potential focus relationships on microblogs
CN103136331A (en) Micro blog network opinion leader identification method
CN105550275A (en) Microblog forwarding quantity prediction method
CN105760449A (en) Multi-source heterogeneous data cloud pushing method
Shakkottai et al. Evolution of the internet as-level ecosystem
CN109726319A (en) A kind of user force analysis method based on interactive relation
Sydney et al. Elasticity: topological characterization of robustness in complex networks
Song et al. Forward or ignore: User behavior analysis and prediction on microblogging
Zhao et al. A short-term trend prediction model of topic over Sina Weibo dataset
Xiao et al. A multi-agent simulation approach to rumor spread in virtual commnunity based on social network
Gao et al. Influence maximization based on activity degree in mobile social networks
Ahmad et al. Modeling spread of ideas in online social networks
Ghosh et al. Structure and evolution of online social networks
Shang et al. Limitation of degree information for analyzing the interaction evolution in online social networks

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant