CN110177094B - User group identification method and device, electronic equipment and storage medium - Google Patents

Info

Publication number
CN110177094B
Authority
CN
China
Prior art keywords
user
users
behavior
time period
vertex
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910431373.7A
Other languages
Chinese (zh)
Other versions
CN110177094A (en
Inventor
王璐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Douyu Network Technology Co Ltd
Original Assignee
Wuhan Douyu Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Douyu Network Technology Co Ltd filed Critical Wuhan Douyu Network Technology Co Ltd
Priority to CN201910431373.7A priority Critical patent/CN110177094B/en
Publication of CN110177094A publication Critical patent/CN110177094A/en
Application granted granted Critical
Publication of CN110177094B publication Critical patent/CN110177094B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/906Clustering; Classification
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00Arrangements for monitoring or testing data switching networks
    • H04L43/10Active monitoring, e.g. heartbeat, ping or trace-route
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00Arrangements for monitoring or testing data switching networks
    • H04L43/16Threshold monitoring
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00Network architectures or network communication protocols for network security
    • H04L63/14Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441Countermeasures against malicious traffic
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/50Network services
    • H04L67/535Tracking the activity of the user

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Cardiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The embodiment of the invention discloses a user group identification method, a user group identification device, electronic equipment and a storage medium, wherein the method comprises the following steps: constructing a user relation graph according to the specific online behaviors of the user; calculating the behavior similarity between every two users in the user relationship graph based on the times of the specific online behaviors of the users in a set time period; cutting the user relation graph according to the behavior similarity so as to delete the association relation between the users with the behavior similarity lower than a set threshold; and identifying a target user group based on the cut user relation graph according to the number of terminal equipment used by the user for performing online behaviors and the number of Internet Protocol (IP) addresses. By adopting the technical scheme, the identification precision of the target user group is improved.

Description

User group identification method and device, electronic equipment and storage medium
Technical Field
The embodiment of the invention relates to the field of computers, in particular to a user group identification method and device, electronic equipment and a storage medium.
Background
In order to obtain benefits, cheating behaviors such as bullet-screen brushing and attention brushing are common on live broadcast websites. Cheating on a platform such as a live broadcast website is mostly carried out by groups, and it can cause problems such as network congestion and excessive load on the platform's servers, which seriously harms the live broadcast ecosystem of the platform. Therefore, to reduce the negative influence of such cheating, it is of great significance to find user groups suspected of cheating by a reasonable method and to take appropriate preventive measures against them.
One cheating group identification method commonly used at present relies entirely on the topological relations among users, so false identification easily occurs once noise data exists. For example, if 10 normal users log in to a live broadcast website from the same internet cafe computer on one day, the method may identify these 10 normal users as a cheating group. Another method identifies cheating groups based on behavior consistency; it cannot exclude the influence of accidentally consistent behaviors among normal users, so its identification accuracy is not high.
Disclosure of Invention
The invention provides a user group identification method, a user group identification device, electronic equipment and a storage medium, which are used for improving the identification accuracy of a target user group.
In a first aspect, an embodiment of the present invention provides a user community identification method, where the method includes:
constructing a user relation graph according to the specific online behaviors of the user;
calculating the behavior similarity between every two users in the user relationship graph based on the times of the specific online behaviors of the users in a set time period;
cutting the user relation graph according to the behavior similarity so as to delete the association relation between the users with the behavior similarity lower than a set threshold;
and identifying a target user group based on the cut user relation graph according to the number of terminal equipment and the number of IP (Internet Protocol) addresses used by the user when the user performs online behaviors.
In a second aspect, an embodiment of the present invention provides an apparatus for identifying a user community, where the apparatus includes:
the building module is used for building a user relation graph according to the specific online behaviors of the user;
the calculation module is used for calculating the behavior similarity between every two users in the user relationship graph based on the times of the specific online behaviors of the users in a set time period;
the cutting module is used for cutting the user relation graph according to the behavior similarity so as to delete the association relation between the users with the behavior similarity lower than a set threshold value;
and the identification module is used for identifying the target user group based on the cut user relation graph according to the number of the terminal devices used by the user for online behavior and the number of the Internet protocol IP addresses.
In a third aspect, an embodiment of the present invention provides an electronic device, which includes a first memory, a first processor, and a computer program stored in the first memory and executable on the first processor, where the first processor executes the computer program to implement the user community identification method according to the first aspect.
In a fourth aspect, embodiments of the present invention provide a storage medium containing computer-executable instructions which, when executed by a computer processor, implement the user community identification method according to the first aspect described above.
The user group identification method provided by the embodiment of the invention constructs a user relationship graph according to the specific online behaviors of users, calculates the behavior similarity between every two users in the user relationship graph based on the number of times the specific online behaviors occur within a set time period, cuts the user relationship graph according to the behavior similarity, and finally identifies the target user group based on the cut user relationship graph according to the number of terminal devices and the number of Internet Protocol (IP) addresses used by the users for online behaviors. In this way, a target user group with explosive behavior characteristics is accurately identified. By jointly considering the terminal devices used, the IP addresses used, and the compactness of the user group to be identified, the method effectively avoids the influence of noise data and the misidentification of user groups whose behaviors are only accidentally consistent, thereby improving the identification precision of the target user group.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings used in the description of the embodiments of the present invention will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the contents of the embodiments of the present invention and the drawings without creative efforts.
Fig. 1 is a schematic flow chart of a user group identification method according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a user relationship according to an embodiment of the present invention;
fig. 3 is a schematic flow chart of a user group identification method according to a second embodiment of the present invention;
fig. 4 is a schematic structural diagram of a user group identification apparatus according to a third embodiment of the present invention;
fig. 5 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present invention.
Detailed Description
In order to make the technical problems solved, technical solutions adopted and technical effects achieved by the present invention clearer, the technical solutions of the embodiments of the present invention will be described in further detail below with reference to the accompanying drawings, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example one
Fig. 1 is a flowchart illustrating a user group identification method according to a first embodiment of the present invention. The user group identification method disclosed in this embodiment is applicable to identifying a user group engaged in online cheating behaviors, for example bullet-screen brushing or attention brushing in a live broadcast room. The method may be executed by a user group identification device, which may be implemented by software and/or hardware and is generally integrated in a terminal such as a smart phone or a computer. Referring specifically to Fig. 1, the method comprises the following steps:
Step 110, constructing a user relationship graph according to the specific online behaviors of the user.
The specific online behavior may be a positive behavior to be encouraged, such as online donation, or a negative behavior to be resisted, such as brushing bullet screens or brushing attention for the same anchor through a live broadcast platform. Negative behaviors often have adverse effects; for example, the bullet-screen brushing or attention brushing described above often causes network congestion and excessive load on the live broadcast platform's servers. Therefore, the user group identification method disclosed in this embodiment can be used either to identify a cheating group engaged in bullet-screen brushing or attention brushing, so that an alarm can be raised or other measures taken to stop it, or to identify a group engaged in public-welfare behaviors such as donation, so that the group can be commended and a good social atmosphere fostered. This embodiment is described using the example of identifying an online cheating group engaged in bullet-screen brushing or attention brushing.
The user relationship graph is a graph that reflects the association relationships between users. For example, each user is regarded as an independent vertex, and if two users are friends, the vertices corresponding to the two users are connected by an edge; the more edges there are between the current user and other users, the more users have a friend relationship with the current user. Of course, the association relationships between users may also be established along other dimensions.
Illustratively, the building the user relationship graph according to the specific online behaviors of the user includes:
determining all users performing specific online behaviors in a set time period;
taking each user in all the users as a vertex;
and connecting, by an edge, the vertices corresponding to users who perform the specific online behavior based on the same terminal device and/or the same IP address within the set time period, to generate an undirected user relationship graph.
The set time period may be a specific day, a specific week or a specific month. The specific online behavior may be, for example, logging in to or registering an account on the live broadcast platform, in which case the users appearing within the set time period include both the accounts that logged in to the live broadcast platform and the accounts that registered on it. Specifically, the undirected user relationship graph is generated by connecting, with an edge, the vertices corresponding to users who logged in to or registered on the same live broadcast platform within the set time period (a) from the same terminal device, or (b) from the same IP address, or (c) from the same terminal device using the same IP address.
Taking attention brushing based on the same device within a set time period as an example, refer to the user relationship graph shown in Fig. 2. Assume that user 1 and user 2 perform attention brushing on terminal device A within the set time period, and that user 1 and user 8 perform attention brushing on terminal device B within the set time period; vertex 1, corresponding to user 1, is therefore connected by edges to vertex 2, corresponding to user 2, and to vertex 8, corresponding to user 8. Assume that user 2 and user 3 perform attention brushing on terminal device C within the set time period, so vertex 2 and vertex 3 are connected by an edge, and that user 2 and user 5 perform attention brushing on terminal device D within the set time period, so vertex 2 and vertex 5 are connected by an edge. By analogy, the undirected graph shown in Fig. 2 is obtained. In this graph, because user 1 and user 2 used the same terminal device A for the same online behavior (attention brushing) within the set time period, users in such a relationship are referred to as neighbor users.
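The graph construction described above can be illustrated with a minimal Python sketch. The record layout `(user, device_id, ip)` and the function name `build_user_graph` are assumptions made for this example and do not come from the patent text; the sample records reproduce the four device-sharing pairs named in the Fig. 2 example.

```python
from collections import defaultdict
from itertools import combinations


def build_user_graph(records):
    """Return the undirected edges between users that share a device or an IP.

    records: iterable of (user, device_id, ip) tuples observed within the set
    time period. This tuple layout is an assumption made for the example.
    """
    users_by_key = defaultdict(set)
    for user, device_id, ip in records:
        if device_id is not None:
            users_by_key[("device", device_id)].add(user)
        if ip is not None:
            users_by_key[("ip", ip)].add(user)

    edges = set()
    for users in users_by_key.values():
        # Every pair of users seen on the same device or IP becomes one edge.
        for u, v in combinations(sorted(users), 2):
            edges.add((u, v))
    return edges


# Users 1 and 2 share device A, 1 and 8 share B, 2 and 3 share C, 2 and 5 share D.
records = [
    (1, "A", None), (2, "A", None),
    (1, "B", None), (8, "B", None),
    (2, "C", None), (3, "C", None),
    (2, "D", None), (5, "D", None),
]
edges = build_user_graph(records)
print(sorted(edges))
```

Each edge is stored once as a sorted pair, which is one simple way to represent an undirected graph.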
For example, the determining all users performing a specific online behavior within a set period of time includes:
collecting a user behavior log based on behavior dotting to determine the users performing a specific online behavior within a set time period;
for each user performing the specific online behavior, acquiring the network environment information used by the user so as to determine the IP address of the user; and/or
for each user performing the specific online behavior, acquiring the terminal device information used by the user so as to determine the device identification number used by the user.
Behavior dotting (event tracking) means that, in order to collect statistics on user behavior, tracking code is inserted at the places in a project where tracking is needed (such as click events and page jumps), so that the online behaviors of users are recorded in a user behavior log. By collecting the user behavior log and querying the user behaviors in it, the users who performed a specific online behavior can be determined, for example the specific users who sent bullet-screen messages to anchor A. The network environment information and the terminal device information used by the user when performing online behaviors are recorded in the user behavior log at the same time. On a mobile terminal (such as a smart phone), the user behavior log can be obtained directly through a data acquisition interface.
Preliminary mining of user groups is achieved by combining the terminal devices and the IP addresses used by users when performing the specific online behavior. Users belonging to the same group generally gather together to perform a specific online behavior, so users who perform the specific online behavior through the same terminal device and/or the same IP address may belong to the same group.
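As an illustration of how the users, IP addresses and device identifiers might be pulled out of a behavior log, the following Python sketch filters log entries by behavior and time window. The log schema (keys `user`, `behavior`, `time`, `ip`, `device_id`), the behavior label `bullet_screen` and the function name are all hypothetical; the patent does not specify a log format.

```python
from datetime import datetime


def users_with_behavior(log, behavior, start, end):
    """Collect, per user, the IPs and device IDs seen for `behavior` in [start, end)."""
    result = {}
    for entry in log:
        if entry["behavior"] != behavior:
            continue
        if not (start <= entry["time"] < end):
            continue
        info = result.setdefault(entry["user"], {"ips": set(), "devices": set()})
        info["ips"].add(entry["ip"])
        info["devices"].add(entry["device_id"])
    return result


# Hypothetical log entries; the field names are assumptions for this sketch.
log = [
    {"user": "u1", "behavior": "bullet_screen", "time": datetime(2019, 5, 1, 2, 5),
     "ip": "10.0.0.1", "device_id": "dev-A"},
    {"user": "u2", "behavior": "bullet_screen", "time": datetime(2019, 5, 1, 2, 7),
     "ip": "10.0.0.1", "device_id": "dev-A"},
    {"user": "u3", "behavior": "login", "time": datetime(2019, 5, 1, 9, 0),
     "ip": "10.0.0.9", "device_id": "dev-C"},
    {"user": "u1", "behavior": "bullet_screen", "time": datetime(2019, 5, 2, 2, 5),
     "ip": "10.0.0.2", "device_id": "dev-B"},
]
found = users_with_behavior(
    log, "bullet_screen", datetime(2019, 5, 1), datetime(2019, 5, 2)
)
print(sorted(found))
```

Only the first two entries fall inside the one-day window and match the behavior, so the result contains users `u1` and `u2`.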
Step 120, calculating the behavior similarity between every two users in the user relationship graph based on the times of the specific online behaviors of the users in a set time period.
Specifically, if the numbers of times two users perform the specific online behavior within the set time period are both high and relatively close, the behaviors of the two users are highly consistent, and the two users are more likely to be cheating as a group. For example, if user a sent 100 bullet screens between 2:00 and 3:00 early yesterday morning and user b sent 110 bullet screens in the same hour, the probability that user a and user b belong to one group brushing bullet screens is high.
Further, in order to fully mine the synchronous behaviors among users, the set time period can be divided into a plurality of smaller time periods of equal length, so that bursts of synchronous user behavior within a particular time period are fully reflected.
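The division of the set time period into equal-length sub-periods can be sketched as a simple bucketing of event times. In the Python sketch below, the input format (a list of hours at which one user performed the behavior), the function name `count_vector` and the default of 12 bins over 24 hours are assumptions chosen for illustration.

```python
def count_vector(event_hours, period_hours=24, n_bins=12):
    """Count one user's events per equal-length sub-period.

    event_hours: hours (floats in [0, period_hours)) at which the user
    performed the specific online behavior. This format is an assumption.
    """
    width = period_hours / n_bins
    counts = [0] * n_bins
    for h in event_hours:
        counts[int(h // width)] += 1
    return counts


# User a: 100 events shortly after 2:00; user b: 110 events around 2:30 plus one at 13:00.
user_a = count_vector([2.1] * 100)
user_b = count_vector([2.5] * 110 + [13.0])
print(user_a)
print(user_b)
```

With two-hour bins, both bursts land in the same bin (2:00 to 4:00), which is what makes the synchronous behavior visible in the count vectors.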
Step 130, cutting the user relation graph according to the behavior similarity so as to delete the association relation between the users with the behavior similarity lower than a set threshold value.
Cutting the user relationship graph according to the behavior similarity effectively avoids the influence of noise data on cheating group identification and prevents cheating groups from being misidentified because of accidentally consistent behaviors among users. For example, a similarity algorithm can be chosen under which users whose behaviors are only accidentally consistent receive a low similarity, so that the association relationships between users with low behavior similarity can be deleted from the user relationship graph.
Illustratively, the cutting the user relationship graph according to the behavior similarity to delete the association relationship between users whose behavior similarity is lower than a set threshold includes:
deleting the edge between the vertices corresponding to two users whose behavior similarity is lower than the set threshold. The set threshold can be derived in reverse by applying the identification method provided by this embodiment to the online behaviors of known cheating groups.
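The cutting step can be sketched in a few lines of Python. The edge representation (sorted pairs), the similarity lookup table and the threshold value 0.5 are hypothetical; only the rule itself (delete edges whose similarity is lower than the threshold) is taken from the text.

```python
def prune_edges(edges, similarity, threshold):
    """Keep only the edges whose endpoints have similarity >= threshold."""
    return {(u, v) for (u, v) in edges if similarity[(u, v)] >= threshold}


# Hypothetical edges and similarity values, for illustration only.
edges = {(1, 2), (1, 8), (2, 3)}
similarity = {(1, 2): 0.95, (1, 8): 0.10, (2, 3): 0.70}
kept = prune_edges(edges, similarity, threshold=0.5)
print(sorted(kept))
```

Here the edge between users 1 and 8 is deleted because its similarity falls below the threshold, while the other two edges remain.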
Step 140, identifying a target user group based on the cut user relation graph according to the number of the terminal devices used by the user for online behavior and the number of the Internet Protocol (IP) addresses.
Specifically, all connected subgraphs are found in the cut user relationship graph, and the users corresponding to the vertices of each connected subgraph are taken as one user group to be identified. For each user group to be identified, the confidence that it is a cheating user group is calculated from the number of terminal devices and the number of IP addresses used by all of its users when performing online behaviors, and whether the group is a cheating user group is then determined according to the confidence.
According to the user group identification method provided by this embodiment, the behavior similarity between every two users in the user relationship graph is calculated based on the number of times the users perform the specific online behavior within the set time period, so that users with explosive, consistent behaviors stand out. The user relationship graph is cut according to the behavior similarity, which avoids the influence of accidentally consistent behaviors among users on cheating group identification. The use of terminal devices, the use of IP addresses and the group size of the user group to be identified are considered together, so the identification perspective is relatively comprehensive and the identification precision for cheating groups is improved. In addition, deterrent measures of different severity can be adopted according to different confidence levels. This avoids applying light measures to a group with a high cheating risk, which would result in a "missed kill", and avoids applying heavy measures to a group with a low cheating risk, which would result in a "false kill".
Example two
Fig. 3 is a schematic flow chart of a user group identification method according to a second embodiment of the present invention. On the basis of the above embodiment, this embodiment provides specific implementations of the operation in step 120, "calculating the behavior similarity between every two users in the user relationship graph based on the number of times the specific online behavior of the users occurs within a set time period", and of the operation "calculating, according to the number of terminal devices and the number of Internet Protocol (IP) addresses used by the users for online behaviors, the confidence that each user group to be identified is a target user group". As shown in Fig. 3, the method includes:
Step 310, constructing a user relationship graph according to the specific online behaviors of the user.
Step 320, calculating the behavior similarity between every two users in the user relationship graph based on the times of the specific online behaviors of the users in a set time period.
Specifically, the behavior similarity between each two users is calculated according to the following formula:
[Formula for sim(u, v): equation image BDA0002069104110000091, not reproduced in this text]
wherein sim(u, v) represents the behavior similarity between user u and user v, u_i represents the number of times the specific online behavior of user u occurs in time period T_i, v_i represents the number of times the specific online behavior of user v occurs in time period T_i, and n represents the number of time periods T_i included in the set time period T. For example, if the set time period T is one day, i.e., 24 hours, the day may be divided into 12 time periods T_i: 0:00-2:00, 2:00-4:00, 4:00-6:00, 6:00-8:00, 8:00-10:00, 10:00-12:00, 12:00-14:00, 14:00-16:00, 16:00-18:00, 18:00-20:00, 20:00-22:00 and 22:00-24:00.
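The exact similarity formula is given in an equation image that is not available in this text, so it cannot be reproduced here. Purely as an illustration of how a similarity over the count vectors (u_1, ..., u_n) and (v_1, ..., v_n) could be computed, the following Python sketch substitutes cosine similarity. This choice is an assumption and should not be read as the formula claimed in the patent.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two per-period count vectors.

    ASSUMPTION: cosine similarity is a stand-in; the patent's own formula
    is contained in a figure that is not reproduced in this text.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a user with no events is treated as dissimilar to everyone
    return dot / (norm_u * norm_v)


# Two users whose activity bursts fall in the same sub-period.
u = [0, 100, 0, 0]
v = [0, 110, 0, 0]
print(cosine_similarity(u, v))
```

Any measure used in its place would need the same property the text relies on: users whose counts rise and fall in the same sub-periods score high, and users whose activity only occasionally coincides score low.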
Step 330, cutting the user relation graph according to the behavior similarity so as to delete the association relation between the users with the behavior similarity lower than a set threshold value.
Step 340, obtaining each user group to be identified based on the cut user relationship graph in a connected graph clustering mode.
Specifically, assume that the user relationship graph shown in Fig. 2 is a cut user relationship graph, and take Fig. 2 as an example to describe the process of obtaining each user group to be identified from the cut user relationship graph by connected-graph clustering. In order to mine the association relationships between non-neighbor users from the relationships between neighbor users, this embodiment uses the depth-first search (DFS) algorithm to obtain a plurality of connected subgraphs. The users corresponding to all vertices in each connected subgraph form one user group to be identified; to decide whether each user group to be identified is a target user group, a confidence must additionally be calculated for it.
Specifically, the DFS algorithm works as follows: starting from one vertex of the undirected graph, visit a vertex adjacent to it, then visit a vertex adjacent to that vertex, and so on, until a vertex is reached whose adjacent vertices have all been visited; then go back to the previous vertex and visit one of its unvisited adjacent vertices, and continue in this way until all vertices have been visited. All vertices visited in the search constitute a connected subgraph. The search process of the DFS algorithm is described with reference to Fig. 2:
(1) putting the vertex 1 into a stack, and marking the vertex 1 as traversed;
(2) accessing vertex 1 of the stack top from the stack, finding out a vertex adjacent to the vertex 1, wherein the vertex comprises a vertex 2 and a vertex 8, and selecting any one of the two vertexes, wherein a selection rule can be set, and if the selection is performed according to the sequence of numbers corresponding to the vertexes from small to large, the vertex 2 is selected at this time, the vertex 2 is marked as traversed, and the vertex 2 is placed in the stack;
(3) accessing vertex 2 of the stack top from the stack, finding out a vertex adjacent to vertex 2, wherein the vertex 1, the vertex 3 and the vertex 5 are three vertexes, and the vertex 1 is traversed, so that the vertex 1 is excluded, selecting the vertex 3 according to the selection rule, marking the vertex 3 as traversed, and putting the vertex 3 into the stack;
(4) accessing vertex 3 at the stack top from the stack, finding out the vertices adjacent to vertex 3, namely vertex 2 and vertex 4, wherein vertex 2 has been traversed and is therefore excluded, selecting vertex 4, marking vertex 4 as traversed, and putting vertex 4 into the stack;
(5) accessing a vertex 4 of the stack top from the stack, finding out a vertex adjacent to the vertex 4, wherein the vertex 3, the vertex 5 and the vertex 6 are provided, and the vertex 3 is traversed, so that the vertex 3 is excluded, selecting the vertex 5 according to the selection rule, marking the vertex 5 as traversed, and putting the vertex 5 into the stack;
(6) accessing vertex 5 at the stack top from the stack, finding a vertex adjacent to vertex 5, wherein the vertex 2 and the vertex 4 have both traversed, so that vertex 5 has no adjacent point which has not been traversed, giving up the access to vertex 5, and removing vertex 5 from the stack, which means that vertex 5 has not been accessed;
(7) accessing a vertex 4 of the stack top from the stack, finding out a vertex adjacent to the vertex 4, wherein the vertex 3, the vertex 5 and the vertex 6 are provided, and the vertex 3 and the vertex 5 are traversed, so that the vertex 6 is selected, the vertex 6 is marked as traversed, and the vertex 6 is placed in the stack;
(8) accessing a vertex 6 of the stack top from the stack, finding out a vertex adjacent to the vertex 6, wherein the vertex 4, the vertex 7 and the vertex 8 are provided, the vertex 4 is traversed, selecting the vertex 7 according to the selection rule, marking the vertex 7 as traversed, and placing the vertex 7 in the stack;
(9) accessing vertex 7 at the top of the stack from the stack, finding a vertex adjacent to vertex 7, wherein only vertex 6 is traversed, and therefore vertex 7 has no adjacent point which is not traversed, giving up the access to vertex 7, and removing vertex 7 from the stack, which means that vertex 7 is not visited;
(10) accessing a vertex 6 of the stack top from the stack, finding out a vertex adjacent to the vertex 6, wherein the vertex 4, the vertex 7 and the vertex 8 are included, the vertex 4 and the vertex 7 are traversed, selecting the vertex 8, marking the vertex 8 as traversed, and placing the vertex 8 in the stack;
(11) accessing a vertex 8 of the stack top from the stack, finding out a vertex adjacent to the vertex 8, wherein the vertex 1, the vertex 6 and the vertex 9 are provided, the vertex 1 and the vertex 6 are traversed, selecting the vertex 9, marking the vertex 9 as traversed, and placing the vertex 9 into the stack;
(12) accessing vertex 9 at the top of the stack from the stack, finding out a vertex adjacent to vertex 9, wherein only vertex 8 is traversed and vertex 8 is traversed, so that vertex 9 has no adjacent point which is not traversed, giving up access to vertex 9, removing vertex 9 from the stack, and at this time, indicating that vertex 9 is not visited, so far, all vertices are traversed, vertices in the stack have no adjacent point which is not traversed, and all vertices are removed from the stack, and finally the stack is empty, indicating that all vertices are traversed, and vertex 1, vertex 2, vertex 3, vertex 4, vertex 6 and vertex 8 visited in the traversal search process form a connected subgraph; and forming a user group to be identified by the user 1, the user 2, the user 3, the user 4, the user 6 and the user 8 corresponding to all the vertexes in the connected subgraph.
Next, one vertex is randomly selected from the vertices of the undirected graph that do not yet belong to any known connected subgraph and is used as the starting point of the next search. The traversal search is repeated (each traversal search yields one connected subgraph) until every vertex that does not belong to a known connected subgraph is an isolated vertex with no connection to any other vertex.
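The stack-based traversal walked through in the numbered steps above, repeated from a new starting vertex each time, can be sketched in Python as follows. This is a minimal sketch of a textbook depth-first connected-component search, not necessarily the exact procedure of this embodiment: it keeps every vertex reachable from the starting vertex, including dead-end vertices such as vertex 7 and vertex 9, which the worked example treats as not visited. The function name `connected_groups`, the edge-list input, the deterministic starting order, and the "smallest label first" selection rule are assumptions made for illustration.

```python
from collections import defaultdict


def connected_groups(edges):
    """Return the connected components of an undirected graph given as an edge list.

    Isolated vertices never appear in an edge list, so they never form a group.
    """
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    traversed = set()
    groups = []
    # Assumption: starting vertices are taken in sorted order rather than at random,
    # so that the result is reproducible.
    for start in sorted(adjacency):
        if start in traversed:
            continue
        traversed.add(start)
        stack = [start]
        component = [start]
        while stack:
            top = stack[-1]
            # Assumed selection rule: pick the smallest untraversed neighbor.
            candidates = sorted(adjacency[top] - traversed)
            if candidates:
                chosen = candidates[0]
                traversed.add(chosen)
                component.append(chosen)
                stack.append(chosen)
            else:
                # No untraversed neighbor is left: remove the vertex from the stack.
                stack.pop()
        groups.append(sorted(component))
    return groups


# Hypothetical toy graph with two components.
example_edges = [(1, 2), (2, 3), (4, 5)]
print(connected_groups(example_edges))
```

Each returned list corresponds to one user group to be identified.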
Step 350, respectively calculating the confidence that each user group to be identified is the target user group according to the number of terminal devices and the number of Internet Protocol (IP) addresses used by the users when performing online behaviors.
Specifically, the confidence coefficient that the user group to be identified is the target user group is calculated according to the following formula:
Figure BDA0002069104110000121
where f(G) represents the confidence that the user group G to be identified is the target user group; |G| represents the number of user members included in the user group G to be identified; IP(G) represents the total number of IP addresses used by all user members of the user group G to be identified when the specific online behavior occurs within the set time period; device(G) represents the total number of terminal devices used by all user members of the user group G to be identified when the specific online behavior occurs within the set time period; edge(G) represents the number of edges formed by the user group G to be identified in the clipped user relationship graph; and w1, w2 and w3 are weight coefficients with w1 + w2 + w3 = 1. According to business experience, the terminal devices used by the users indicate whether the current user group is a cheating group more strongly than the IP addresses used by the users or the group size do; therefore, in general, w2 > w3 > w1. The specific online behavior includes login, check-in, bullet-screen commenting, or following.
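The quantities that enter f(G) are all simple counts. The following Python sketch gathers them for one group and checks the weight guidance stated above; it does not reproduce the confidence formula itself, which appears only as a figure in this text. The record layout `(user, device_id, ip)`, the function names and the dictionary keys are assumptions made for illustration.

```python
import math


def group_statistics(members, records, clipped_edges):
    """Collect |G|, IP(G), device(G) and edge(G) for one user group.

    records: iterable of (user, device_id, ip) tuples, one per occurrence of the
             specific online behavior within the set time period (assumed layout).
    clipped_edges: iterable of (user_a, user_b) pairs from the clipped graph.
    """
    member_set = set(members)
    # Distinct IP addresses and devices used by members of the group.
    ips = {ip for user, _device, ip in records if user in member_set}
    devices = {device for user, device, _ip in records if user in member_set}
    # Edges whose two endpoints both belong to the group.
    edge_count = sum(
        1 for a, b in clipped_edges if a in member_set and b in member_set
    )
    return {
        "size": len(member_set),
        "ip_count": len(ips),
        "device_count": len(devices),
        "edge_count": edge_count,
    }


def weights_follow_guidance(w1, w2, w3):
    """Check w1 + w2 + w3 = 1 and the general ordering w2 > w3 > w1."""
    return math.isclose(w1 + w2 + w3, 1.0) and w2 > w3 > w1
```

For the weights 0.2, 0.5 and 0.3 used in the example of this embodiment, `weights_follow_guidance(0.2, 0.5, 0.3)` holds.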
Step 360, determining each user group to be identified whose confidence reaches the threshold as a target user group.
The threshold can be derived in reverse by applying the identification method of this embodiment to the online behaviors of known cheating groups.
For example, suppose that the current user group to be identified includes 10 user members, that these 10 user members form 20 edges in the user relationship graph, that 5 different IP addresses and 2 different terminal devices are used in total when the specific online behavior occurs within the set time period, that the weight coefficients w1, w2 and w3 are 0.2, 0.5 and 0.3 respectively, and that the threshold is 0.5. The confidence that the current user group to be identified is a cheating group is then:
Figure BDA0002069104110000131
Since 0.51 is greater than 0.5, the current user group to be identified is determined to be a cheating group.
After the confidence coefficient that the user group to be identified is the cheating user group is obtained, different deterrent measures can be taken for the corresponding user group according to different confidence coefficients, for example, the user group with higher confidence coefficient is added into a blacklist to limit subsequent cheating behaviors; and taking the user group with lower confidence as a suspected cheating group, continuously paying attention to the suspected cheating group, and calculating the confidence by using the identification method provided by the embodiment of the invention again based on the subsequent online behaviors so as to finally determine whether the suspected cheating group is a true cheating group.
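The tiered handling described above can be sketched as a simple mapping from confidence to measure. The two cut-off values and the measure labels below are hypothetical; the text states only that higher-confidence groups are added to a blacklist and that lower-confidence groups are kept under observation and scored again later.

```python
def choose_measure(confidence, blacklist_at=0.8, watch_at=0.5):
    """Map a confidence value to a deterrent measure.

    blacklist_at and watch_at are illustrative cut-offs, not values from the source.
    """
    if confidence >= blacklist_at:
        return "blacklist"  # restrict subsequent cheating behavior
    if confidence >= watch_at:
        return "watch"  # suspected group: keep observing and re-score later
    return "none"


# Hypothetical confidence values for three groups.
for group_id, confidence in {"G1": 0.9, "G2": 0.6, "G3": 0.2}.items():
    print(group_id, choose_measure(confidence))
```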
According to the user group identification method provided by this embodiment, the confidence that a user group to be identified is the target user group is calculated by combining the group's terminal device usage, IP address usage and group size, so that identification draws on several complementary signals and the accuracy of identifying cheating groups is improved. Deterrent measures of different severity can be applied according to the confidence, which avoids both "missed kills", where a group with a higher cheating risk receives too light a measure, and "false kills", where a user group with a lower cheating risk receives too heavy a measure.
Example three
Fig. 4 is a schematic structural diagram of a user group identification apparatus according to a third embodiment of the present invention. Referring to Fig. 4, the apparatus comprises a building module 410, a calculation module 420, a cutting module 430 and an identification module 440.
The building module 410 is configured to construct a user relationship graph according to a specific online behavior of users; the calculation module 420 is configured to calculate a behavior similarity between every two users in the user relationship graph based on the number of times the users perform the specific online behavior within a set time period; the cutting module 430 is configured to cut the user relationship graph according to the behavior similarity so as to delete the association relationship between users whose behavior similarity is lower than a set threshold; and the identification module 440 is configured to identify the target user group based on the cut user relationship graph according to the number of terminal devices and the number of Internet Protocol (IP) addresses used by the users when performing online behaviors.
Further, the building module 410 includes:
a determining unit, configured to determine all users who perform the specific online behavior within a set time period and to take each of these users as a vertex;
and a connecting unit, configured to connect, by an edge, the vertices corresponding to users who perform the specific online behavior using the same terminal device and/or the same IP address within the set time period, so as to generate an undirected user relationship graph.
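The two units above can be sketched together in Python: every user seen in the period becomes a vertex, and two users are joined by an edge when they share a terminal device and/or an IP address. The record layout `(user, device_id, ip)` and the function name `build_user_graph` are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_user_graph(records):
    """Build an undirected user relationship graph.

    records: iterable of (user, device_id, ip) tuples observed within the set
             time period (assumed layout).
    Returns (vertices, edges); each edge is a sorted (user_a, user_b) tuple.
    """
    vertices = set()
    users_by_key = defaultdict(set)
    for user, device_id, ip in records:
        vertices.add(user)
        # Keep devices and IP addresses in separate key spaces.
        users_by_key[("device", device_id)].add(user)
        users_by_key[("ip", ip)].add(user)

    edges = set()
    for users in users_by_key.values():
        # Every pair of users sharing this device or IP address gets one edge.
        for a, b in combinations(sorted(users), 2):
            edges.add((a, b))
    return vertices, edges


# Hypothetical records: u1 and u2 share a device, u2 and u3 share an IP address.
sample = [
    ("u1", "d1", "10.0.0.1"),
    ("u2", "d1", "10.0.0.2"),
    ("u3", "d9", "10.0.0.2"),
    ("u4", "d7", "10.0.0.3"),
]
vertices, edges = build_user_graph(sample)
print(sorted(vertices), sorted(edges))
```

Storing edges as sorted tuples in a set keeps the graph undirected and free of duplicate edges when two users share both a device and an IP address.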
Further, the cutting module 430 is specifically configured to:
delete the edge between the vertices corresponding to two users whose behavior similarity is lower than the set threshold.
Further, the calculation module 420 is specifically configured to:
calculate the behavior similarity between every two users according to the following formula:
Figure BDA0002069104110000151
where sim(u, v) represents the behavior similarity between user u and user v, u_i represents the number of times the specific online behavior of user u occurs in time period T_i, v_i represents the number of times the specific online behavior of user v occurs in time period T_i, and n represents the number of time periods T_i included in the set time period T.
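The similarity formula itself appears only as a figure in this text, so the following Python sketch substitutes cosine similarity over the per-period count vectors (u_1, ..., u_n) and (v_1, ..., v_n) purely as a stand-in; it should not be read as the formula of this embodiment. The sketch also shows the cutting step, which keeps an edge only when the similarity is not lower than the set threshold. The function names, the `counts` dictionary and the threshold value are assumptions made for illustration.

```python
import math


def cosine_similarity(u_counts, v_counts):
    """Stand-in similarity between two per-period count vectors of equal length."""
    dot = sum(a * b for a, b in zip(u_counts, v_counts))
    norm_u = math.sqrt(sum(a * a for a in u_counts))
    norm_v = math.sqrt(sum(b * b for b in v_counts))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a user with no activity is treated as dissimilar to everyone
    return dot / (norm_u * norm_v)


def prune_edges(edges, counts, threshold):
    """Delete edges between users whose similarity is lower than the threshold."""
    return {
        (a, b)
        for a, b in edges
        if cosine_similarity(counts[a], counts[b]) >= threshold
    }


# Hypothetical counts over n = 3 time periods.
counts = {"u1": [5, 0, 5], "u2": [4, 0, 6], "u3": [0, 7, 0]}
edges = {("u1", "u2"), ("u2", "u3")}
print(prune_edges(edges, counts, threshold=0.8))
```

In this toy data u1 and u2 are active in the same periods and keep their edge, while u2 and u3 are never active in the same period, so their edge is removed.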
Further, the identification module 440 includes:
a searching unit, configured to obtain each user group to be identified from the cut user relationship graph by means of connected-graph clustering;
a computing unit, configured to respectively calculate the confidence that each user group to be identified is the target user group according to the number of terminal devices and the number of Internet Protocol (IP) addresses used by the users when performing online behaviors;
and a determining unit, configured to determine each user group to be identified whose confidence reaches the threshold as a target user group.
Further, the computing unit is specifically configured to:
calculate the confidence that the user group to be identified is the target user group according to the following formula:
Figure BDA0002069104110000152
where f(G) represents the confidence that the user group G to be identified is the target user group; |G| represents the number of user members included in the user group G to be identified; IP(G) represents the total number of IP addresses used by all user members of the user group G to be identified when the specific online behavior occurs within the set time period; device(G) represents the total number of terminal devices used by all user members of the user group G to be identified when the specific online behavior occurs within the set time period; edge(G) represents the number of edges formed by the user group G to be identified in the clipped user relationship graph; and w1, w2 and w3 are weight coefficients with w1 + w2 + w3 = 1.
Further, the specific online behavior includes login, check-in, bullet-screen commenting, or following.
The user group identification apparatus provided by this embodiment calculates the behavior similarity between every two users in the user relationship graph based on the number of times the users perform the specific online behavior within a set time period, which highlights users whose behavior is consistent in bursts. It cuts the user relationship graph according to the behavior similarity, which prevents behavior that is only coincidentally consistent between users from affecting the identification of cheating groups. It then calculates the confidence that a user group to be identified is the target user group by combining the group's terminal device usage, IP address usage and group size, so that identification draws on several complementary signals and the accuracy of identifying cheating groups is improved. Deterrent measures of different severity can be applied according to the confidence, which avoids both "missed kills", where a group with a higher cheating risk receives too light a measure, and "false kills", where a user group with a lower cheating risk receives too heavy a measure.
Example four
Fig. 5 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present invention. Fig. 5 illustrates a block diagram of an exemplary device 12 suitable for implementing embodiments of the present invention. The device 12 shown in Fig. 5 is only an example and should not impose any limitation on the functionality or scope of use of the embodiments of the present invention.
As shown in FIG. 5, device 12 is in the form of a general purpose computing device. The components of device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 that couples various system components including the system memory 28 and the processing unit 16.
Bus 18 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA (EISA) bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
Device 12 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by device 12 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 30 and/or cache memory 32. Device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in Fig. 5, and commonly referred to as a "hard drive"). Although not shown in Fig. 5, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 18 by one or more data media interfaces. Memory 28 may include at least one program product having a set of program modules (e.g., the building module 410, the calculation module 420, the cutting module 430, and the identification module 440 of the user group identification apparatus) configured to perform the functions of the embodiments of the present invention.
A program/utility 40 having a set of program modules 42 (e.g., the building module 410, the calculation module 420, the cutting module 430, and the identification module 440 of the user group identification apparatus) may be stored, for example, in memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination thereof, may include an implementation of a network environment. Program modules 42 generally carry out the functions and/or methods of the described embodiments of the invention.
Device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), with one or more devices that enable a user to interact with device 12, and/or with any devices (e.g., network card, modem, etc.) that enable device 12 to communicate with one or more other computing devices. Such communication may be through an input/output (I/O) interface 22. Also, the device 12 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet) via the network adapter 20. As shown, the network adapter 20 communicates with the other modules of the device 12 via the bus 18. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
The processing unit 16 executes various functional applications and data processing by executing programs stored in the system memory 28, for example, implementing the user community identification method provided by the embodiment of the present invention.
Example five
An embodiment of the present invention further provides a storage medium containing computer-executable instructions, which when executed by a computer processor, perform a user community identification method, the method including:
constructing a user relation graph according to the specific online behaviors of the user;
calculating the behavior similarity between every two users in the user relationship graph based on the times of the specific online behaviors of the users in a set time period;
cutting the user relation graph according to the behavior similarity so as to delete the association relation between the users with the behavior similarity lower than a set threshold;
and identifying a target user group based on the cut user relation graph according to the number of terminal equipment used by the user for performing online behaviors and the number of Internet Protocol (IP) addresses.
Of course, the storage medium containing the computer-executable instructions provided by the embodiments of the present invention is not limited to the method operations described above, and may also perform the user community identification related operations provided by any embodiments of the present invention.
From the above description of the embodiments, it is obvious for those skilled in the art that the present invention can be implemented by software and necessary general hardware, and certainly, can also be implemented by hardware, but the former is a better embodiment in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a FLASH Memory (FLASH), a hard disk or an optical disk of a computer, and includes several instructions for enabling a computer device (which may be a personal computer, a storage medium, or a network device) to execute the embodiments of the present invention.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (9)

1. A user community identification method, comprising:
constructing a user relation graph according to the specific online behaviors of the user;
calculating the behavior similarity between every two users in the user relationship graph based on the times of the specific online behaviors of the users in a set time period;
cutting the user relation graph according to the behavior similarity so as to delete the association relation between the users with the behavior similarity lower than a set threshold;
identifying a target user group based on the cut user relation graph according to the number of terminal equipment used by the user for performing online behaviors and the number of Internet Protocol (IP) addresses;
wherein calculating the behavior similarity between every two users in the user relationship graph based on the times of the specific online behaviors of the users in a set time period comprises:
calculating the behavior similarity between every two users according to the following formula:
Figure FDA0003231242600000011
where sim(u, v) represents the behavior similarity between user u and user v, u_i represents the number of times the specific online behavior of user u occurs in time period T_i, v_i represents the number of times the specific online behavior of user v occurs in time period T_i, and n represents the number of time periods T_i included in the set time period T.
2. The method of claim 1, wherein constructing the user relationship graph according to the user's specific online behavior comprises:
determining all users performing specific online behaviors in a set time period;
taking each user in all the users as a vertex;
and connecting, by an edge, the vertices corresponding to users who perform the specific online behavior using the same terminal device and/or the same IP address within the set time period, so as to generate an undirected user relationship graph.
3. The method according to claim 2, wherein the step of clipping the user relationship graph according to the behavior similarity to delete the association relationship between users whose behavior similarity is lower than a set threshold comprises:
and deleting the edge line between the vertexes corresponding to the two users with the behavior similarity lower than the set threshold.
4. The method according to any one of claims 1 to 3, wherein identifying the target user group based on the cut user relationship graph according to the number of terminal devices and the number of internet protocol IP addresses used by the user for performing online behavior comprises:
obtaining each user group to be identified based on the cut user relationship graph in a connected graph clustering mode;
respectively calculating the confidence coefficient of each user group to be identified as a target user group according to the number of terminal equipment used by the user for online behavior and the number of Internet Protocol (IP) addresses;
and determining the user community to be identified with the confidence coefficient reaching the threshold value as a target user community.
5. The method of claim 4, wherein the step of calculating the confidence level of each user group to be identified as the target user group according to the number of terminal devices used by the user for performing online activities and the number of internet protocol IP addresses comprises:
calculating the confidence degree that the user group to be identified is the target user group according to the following formula:
Figure FDA0003231242600000021
where f(G) represents the confidence that the user group G to be identified is the target user group; |G| represents the number of user members included in the user group G to be identified; IP(G) represents the total number of IP addresses used by all user members of the user group G to be identified when the specific online behavior occurs within the set time period; device(G) represents the total number of terminal devices used by all user members of the user group G to be identified when the specific online behavior occurs within the set time period; edge(G) represents the number of edges formed by the user group G to be identified in the clipped user relationship graph; and w1, w2 and w3 are weight coefficients with w1 + w2 + w3 = 1.
6. The method of any of claims 1-3, wherein the specific online behavior comprises login, check-in, bullet-screen commenting, or following.
7. An apparatus for identifying a user community, the apparatus comprising:
the building module is used for building a user relation graph according to the specific online behaviors of the user;
the calculation module is used for calculating the behavior similarity between every two users in the user relationship graph based on the times of the specific online behaviors of the users in a set time period;
the cutting module is used for cutting the user relation graph according to the behavior similarity so as to delete the association relation between the users with the behavior similarity lower than a set threshold value;
the identification module is used for identifying a target user group based on the cut user relation graph according to the number of terminal equipment used by the user for online behavior and the number of Internet Protocol (IP) addresses;
wherein the calculation module is specifically configured to:
calculate the behavior similarity between every two users according to the following formula:
Figure FDA0003231242600000031
where sim(u, v) represents the behavior similarity between user u and user v, u_i represents the number of times the specific online behavior of user u occurs in time period T_i, v_i represents the number of times the specific online behavior of user v occurs in time period T_i, and n represents the number of time periods T_i included in the set time period T.
8. An electronic device comprising a first memory, a first processor and a computer program stored on the first memory and executable on the first processor, characterized in that the first processor implements the user community identification method according to any of claims 1-6 when executing the computer program.
9. A storage medium containing computer executable instructions which, when executed by a computer processor, implement the user community identification method of any one of claims 1-6.
CN201910431373.7A 2019-05-22 2019-05-22 User group identification method and device, electronic equipment and storage medium Active CN110177094B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910431373.7A CN110177094B (en) 2019-05-22 2019-05-22 User group identification method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910431373.7A CN110177094B (en) 2019-05-22 2019-05-22 User group identification method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110177094A CN110177094A (en) 2019-08-27
CN110177094B true CN110177094B (en) 2021-11-09

Family

ID=67691871

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910431373.7A Active CN110177094B (en) 2019-05-22 2019-05-22 User group identification method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110177094B (en)

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110852761B (en) * 2019-10-11 2023-07-04 支付宝(杭州)信息技术有限公司 Method and device for formulating anti-cheating strategy and electronic equipment
CN112651764B (en) * 2019-10-12 2023-03-31 武汉斗鱼网络科技有限公司 Target user identification method, device, equipment and storage medium
CN112788351B (en) * 2019-11-01 2022-08-05 武汉斗鱼鱼乐网络科技有限公司 Target live broadcast room identification method, device, equipment and storage medium
CN110929105B (en) * 2019-11-28 2022-11-29 广东云徙智能科技有限公司 User ID (identity) association method based on big data technology
CN111489190A (en) * 2020-03-16 2020-08-04 上海趣蕴网络科技有限公司 Anti-cheating method and system based on user relationship
CN111401959B (en) * 2020-03-18 2023-09-29 多点(深圳)数字科技有限公司 Risk group prediction method, apparatus, computer device and storage medium
CN113810341B (en) * 2020-06-12 2023-08-22 武汉斗鱼鱼乐网络科技有限公司 Method and system for identifying target network group, storage medium and equipment
CN114071196A (en) * 2020-08-03 2022-02-18 武汉斗鱼鱼乐网络科技有限公司 Method, system, medium and equipment for identifying target live broadcast room
CN112839027B (en) * 2020-12-16 2023-08-01 贝壳技术有限公司 User group identification method, device, electronic equipment and storage medium
CN113807862A (en) * 2021-01-29 2021-12-17 北京沃东天骏信息技术有限公司 Access security control method, device, equipment and storage medium
CN112995283B (en) * 2021-02-03 2023-03-14 杭州海康威视系统技术有限公司 Object association method and device and electronic equipment
CN112616074B (en) * 2021-03-08 2021-05-28 武汉斗鱼鱼乐网络科技有限公司 Target group identification method and electronic equipment
CN112800286B (en) * 2021-04-08 2021-07-23 北京轻松筹信息技术有限公司 User relationship chain construction method and device and electronic equipment
CN113205129B (en) * 2021-04-28 2023-04-07 五八有限公司 Cheating group identification method and device, electronic equipment and storage medium
CN113117338B (en) * 2021-05-08 2022-11-15 上海益世界信息技术集团有限公司广州分公司 Game cheating user identification method and device
CN113365113B (en) * 2021-05-31 2022-09-09 武汉斗鱼鱼乐网络科技有限公司 Target node identification method and device
CN113283908B (en) * 2021-06-09 2023-07-18 武汉斗鱼鱼乐网络科技有限公司 Target group identification method and device
CN113553498A (en) * 2021-06-18 2021-10-26 北京旷视科技有限公司 Community updating method, apparatus, electronic device and computer readable medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102571485A (en) * 2011-12-14 2012-07-11 上海交通大学 Method for identifying robot user on micro-blog platform
CN108898505A (en) * 2018-05-28 2018-11-27 武汉斗鱼网络科技有限公司 Recognition methods, corresponding medium and the electronic equipment of cheating clique
CN109255632A (en) * 2018-09-03 2019-01-22 武汉斗鱼网络科技有限公司 A kind of user community recognition methods, device, equipment and medium

Also Published As

Publication number Publication date
CN110177094A (en) 2019-08-27

Similar Documents

Publication Publication Date Title
CN110177094B (en) User group identification method and device, electronic equipment and storage medium
CN110213164B (en) Method and device for identifying network key propagator based on topology information fusion
US20140121559A1 (en) Detecting cognitive impairment indicators
CN104915351A (en) Picture sorting method and terminal
CN110060087B (en) Abnormal data detection method, device and server
WO2016145993A1 (en) Method and system for user device identification
US10664481B2 (en) Computer system programmed to identify common subsequences in logs
CN104679769A (en) Method and device for classifying usage scenario of product
CN110502697B (en) Target user identification method and device and electronic equipment
CN110727740B (en) Correlation analysis method and device, computer equipment and readable medium
CN106301979B (en) Method and system for detecting abnormal channel
CN112819611A (en) Fraud identification method, device, electronic equipment and computer-readable storage medium
CN106358220B (en) The detection method of abnormal contact information, apparatus and system
RU2612608C2 (en) Social circle formation system and method and computer data carrier
CN114143035A (en) Attack resisting method, system, equipment and medium for knowledge graph recommendation system
CN108076032B (en) Abnormal behavior user identification method and device
CN113360895A (en) Station group detection method and device and electronic equipment
CN110009056B (en) Method and device for classifying social account numbers
CN110688995A (en) Map query processing method, computer-readable storage medium and mobile terminal
CN110647595A (en) Method, device, equipment and medium for determining newly-added interest points
US20170060998A1 (en) Method and apparatus for mining maximal repeated sequence
CN113537806A (en) Abnormal user identification method and device, electronic equipment and readable storage medium
CN111178531B (en) Method, device and storage medium for acquiring relationship reasoning and relationship reasoning model
CN112261484B (en) Target user identification method and device, electronic equipment and storage medium
CN112003819A (en) Method, device, equipment and computer storage medium for identifying crawler

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant