CN106294508B - Brushing amount tool detection method and device - Google Patents

Publication number
CN106294508B
Authority
CN
China
Prior art keywords
user
users
behavior
classification
channel
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510317134.0A
Other languages
Chinese (zh)
Other versions
CN106294508A (en)
Inventor
贺海军
孔蓓蓓
熊健
熊焰
杨剑鸣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Tencent Computer Systems Co Ltd
Original Assignee
Shenzhen Tencent Computer Systems Co Ltd
Application filed by Shenzhen Tencent Computer Systems Co Ltd
Priority to CN201510317134.0A
Publication of CN106294508A
Application granted
Publication of CN106294508B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking

Abstract

The application discloses a brushing amount tool detection method and device. The method comprises: obtaining behavior data of at least one user corresponding to a channel to be tested; generating behavior characteristics of each user according to that user's behavior data; calculating a behavior characteristic value of each user from the user's behavior characteristics; classifying the users with the behavior characteristic values as the classification condition to obtain a plurality of classification groups; and determining that the channel to be tested uses a brushing amount tool when the number of users in the classification groups meets a preset judgment strategy. Because the behavior characteristic values of the users within each classification group are similar or identical, a very large number of users in one or more classification groups indicates that most of the users in those groups are false users, that is, that the channel to be tested uses a brushing amount tool. The method thereby overcomes the limitation of the existing method, which detects brushing amount tools based on the distribution of hardware access environment attribute parameters.

Description

Brushing amount tool detection method and device
Technical Field
The present application relates to the field of network technologies, and in particular, to a method and an apparatus for detecting a brushing amount tool.
Background
The popularization of smart terminals such as smart phones provides a carrier for various types of applications (APPs). To promote an application, the application provider presents it to users through various platforms that offer a download function, and each platform can be regarded as a channel for application promotion. In return, the application provider pays a promotion fee to a channel every time a new user registers an application account through that channel.
In practice, some channels (platforms) use a brushing amount tool to generate false new users in order to defraud the application provider of promotion fees. The process is roughly as follows: using a brushing amount tool, the channel generates, either randomly or based on an existing user data file, various parameters such as the terminal equipment number, i.e., the International Mobile Equipment Identity (IMEI), the International Mobile Subscriber Identity (IMSI), the Media Access Control (MAC) address, the screen resolution, the model, the Subscriber Identity Module (SIM) number, the mobile phone number, the operator number or name, and the mobile phone operating system (OS) version, and then registers with the generated parameters as the parameters of a new user. The application provider generally determines whether a user is new only by checking the IMEI: as long as the IMEI of the currently registering user has not been registered before, the user is considered new, and the promotion fee is paid to the corresponding channel.
To address the above problem, the prior art generally determines whether a channel uses a brushing amount tool by checking whether the distribution of hardware access environment attribute parameters, such as the IMSI, MAC address, screen resolution, and model, of the users newly registered through that channel is normal. Taking the screen resolution as an example, if the screen resolution distribution of the users of the current channel differs greatly from the screen resolution distribution over all channels, the current channel may be using a brushing amount tool. For example, among the users newly registered through channel 1 on a given day, 80% have screen resolution 1 and the remaining 20% have screen resolution 2 or screen resolution 3, whereas over the registered users of all channels 50% have screen resolution 2, 30% have screen resolution 3, and 20% have screen resolution 1. The screen resolution distribution of channel 1 clearly differs greatly from the overall distribution, which indicates that channel 1 may be using a brushing amount tool.
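As a rough illustration of the prior-art check described above, the following Python sketch compares the screen resolution distribution of one channel with the overall distribution. The distance measure (total variation distance), the 10%/10% split of the remaining 20%, and the alert threshold are assumptions made for illustration only; the source does not specify how the two distributions are compared.

```python
def total_variation_distance(p, q):
    """Half the sum of absolute differences between two discrete distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


# Shares taken from the example; the 10%/10% split of the remaining 20% is assumed.
channel_1 = {"resolution_1": 0.80, "resolution_2": 0.10, "resolution_3": 0.10}
overall = {"resolution_1": 0.20, "resolution_2": 0.50, "resolution_3": 0.30}

ALERT_THRESHOLD = 0.3  # hypothetical cut-off, not taken from the source

distance = total_variation_distance(channel_1, overall)
suspicious = distance > ALERT_THRESHOLD
print(f"distance={distance:.2f}, suspicious={suspicious}")
```

A check of this kind only looks at hardware attribute distributions, so it has nothing to flag when a tool deliberately samples those attributes from realistic distributions.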
However, the existing method of detecting a brushing amount tool based on the distribution of hardware access environment attribute parameters has a limitation: some high-quality brushing amount tools generate the hardware access environment attribute parameters according to the real-world distribution of each attribute, so that the distribution of each parameter is consistent with that of a normal channel. The prior art cannot detect channels that use such a brushing amount tool.
Disclosure of Invention
In view of this, the present application provides a method and an apparatus for detecting a brushing amount tool, which are used to solve the problem that the existing method for detecting the brushing amount tool based on the distribution of the hardware access environment attribute parameters has limitations.
In order to achieve the above object, the following solutions are proposed:
A brushing amount tool detection method, comprising:
acquiring behavior data of at least one user corresponding to a channel to be detected;
generating behavior characteristics of each user according to the behavior data of each user;
calculating the behavior characteristic value of each user by using the behavior characteristic of each user;
classifying each user by taking the behavior characteristic value as a classification condition to obtain a plurality of classification groups;
and when the number of the users in the classification group is determined to meet a preset judgment strategy, determining that the channel to be detected uses a brushing amount tool.
A brushing amount tool detection device, comprising:
the behavior data acquisition unit is used for acquiring behavior data of at least one user corresponding to the channel to be detected;
the behavior feature generation unit is used for generating behavior features of the users according to the behavior data of the users;
the behavior characteristic value calculation unit is used for calculating the behavior characteristic value of each user by using the behavior characteristic of each user;
the classification processing unit is used for classifying each user by taking the behavior characteristic value as a classification condition to obtain a plurality of classification groups;
and the brushing amount tool determining unit is used for determining that the channel to be tested uses a brushing amount tool when the number of the users in the classification group is determined to meet a preset judgment strategy.
It can be seen from the foregoing technical solutions that, in the detection method for a brushing amount tool provided in the embodiments of the present application, behavior data of at least one user corresponding to a channel to be tested is obtained, where the behavior data represents various operations performed after a user logs in an application, and further behavior characteristics of each user can be generated according to the behavior data of each user, then behavior characteristic values of each user are calculated by using the behavior characteristics of each user, and the users are classified by using the behavior characteristic values as classification conditions to obtain a plurality of classification groups. According to the method and the device, the principle that a plurality of false users generated by the brushing amount tool have certain similarity in behavior is utilized, the behavior characteristic values of the users are calculated, the users are clustered accordingly, and whether the brushing amount tool is used by a channel to be detected or not can be judged according to a preset strategy, so that the limitation problem existing in the existing method for detecting the brushing amount tool based on the distribution of hardware access environment attribute parameters is avoided.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
FIG. 1 is a flowchart of a brushing amount tool detection method disclosed in an embodiment of the present application;
FIG. 2 is a flowchart of a method for determining that a channel to be tested uses a brushing amount tool, disclosed in an embodiment of the present application;
FIG. 3 is a flowchart of another method for determining that a channel to be tested uses a brushing amount tool, disclosed in an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a brushing amount tool detection device disclosed in an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a brushing amount tool determining unit disclosed in an embodiment of the present application;
FIG. 6 is a schematic structural diagram of another brushing amount tool determining unit disclosed in an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a behavior feature value calculation unit disclosed in an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a classification processing unit disclosed in an embodiment of the present application;
FIG. 9 is a schematic diagram of a hardware structure of a server according to an embodiment of the present application.
Detailed Description
Before the concepts of the present application are introduced, a term used herein is first explained:
SimHash is a dimension reduction technique that maps a high-dimensional vector to a one-dimensional fingerprint, and which Google has applied to near-duplicate web page detection. The input of the SimHash algorithm is a vector, and the output is an f-bit fingerprint. For convenience of presentation, assume that the input is a set of behavior features of a user, each feature having a corresponding weight. The SimHash algorithm is then as follows:
1. initialize an f-dimensional vector V (here f is 64) to 0, and initialize an f-bit binary number S to 0;
2. for each feature: use a conventional hash algorithm to generate an f-bit fingerprint b for the feature; for i = 1 to f, if the ith bit of b is 1, add the weight of the feature to the ith element of V (here the weight is 1); otherwise, subtract the weight of the feature from the ith element of V (here the weight is 1);
3. for i = 1 to f, if the ith element of V is greater than 0, set the ith bit of S to 1, otherwise set it to 0;
4. output S as the fingerprint.
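The four steps above can be sketched in Python as follows. This is a minimal illustration, not the implementation used in the application: the choice of BLAKE2b as the "conventional hash algorithm", the feature strings, and the function names are assumptions, and bits are indexed from 0 (least significant) rather than from 1.

```python
import hashlib

F = 64  # fingerprint length in bits, as in step 1


def hash_feature(feature: str) -> int:
    """Map one feature to an F-bit integer (the hash choice is an assumption)."""
    digest = hashlib.blake2b(feature.encode("utf-8"), digest_size=F // 8).digest()
    return int.from_bytes(digest, "big")


def simhash(features, weight=1):
    """Compute the F-bit SimHash fingerprint of an iterable of feature strings."""
    v = [0] * F                      # step 1: vector V initialised to 0
    for feature in features:         # step 2: add or subtract the weight per bit
        b = hash_feature(feature)
        for i in range(F):
            if (b >> i) & 1:
                v[i] += weight
            else:
                v[i] -= weight
    s = 0                            # step 3: bit i of S is 1 where V[i] > 0
    for i in range(F):
        if v[i] > 0:
            s |= 1 << i
    return s                         # step 4: output the fingerprint


# Hypothetical feature sets for two users with identical quantised behavior.
user_a = ["access_count:1-5", "duration:0-60s", "clicks:0"]
user_b = ["access_count:1-5", "duration:0-60s", "clicks:0"]
print(simhash(user_a) == simhash(user_b))
```

Because the fingerprint depends only on the feature set, two users with identical behavior features always receive the same fingerprint, and users whose feature sets overlap heavily tend to receive fingerprints that differ in few bits.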
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to fig. 1, fig. 1 is a flowchart of a method for detecting a brushing amount tool disclosed in an embodiment of the present application.
As shown in fig. 1, the method includes:
Step S100, acquiring behavior data of at least one user corresponding to a channel to be tested;
A user corresponding to the channel to be tested is a user registered through that channel. For such users, the behavior data generated after they log in to the application is obtained.
Specifically, the application server records behavior data of each login user, and the behavior data carries a channel identifier indicating through which channel the current user is registered.
The behavior data may be of many types, and the specific types depend on the type of the application. For example, for an application that provides a download function, such as App Treasure, the behavior data may include the number of click-to-download actions in the popular software scene, the number of behavior records in the update push scene, the number of exposure actions in the competitive scene, and the like.
Of course, for most applications, the following behavior data are typically involved: the number of user access records, the user access duration, the start type, the login type, the number of different scenes of user participation, the number of user click actions, etc.
It should be noted that, it is not necessary to acquire behavior data of all users corresponding to the channel to be tested, and only behavior data of new users registered through the channel to be tested on the same day or within a period of time may be acquired.
Step S110, generating behavior characteristics of each user according to the behavior data of each user;
Specifically, after the behavior data of the users is acquired, the behavior characteristics of the users are generated according to a preset method. The behavior characteristics are a quantized representation of the behavior data. For example, the number of user access records is divided into a plurality of intervals in advance, and the interval into which a user's number of access records falls is one behavior feature.
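The quantization described above can be sketched as follows. The interval boundaries, the attribute name `access_records`, and the feature string format are hypothetical; the source states only that the value range is divided into intervals in advance.

```python
from bisect import bisect_right

# Hypothetical interval boundaries; the source does not list the actual ones.
ACCESS_COUNT_BOUNDS = [1, 5, 20, 100]


def interval_feature(name, value, bounds):
    """Return a feature string naming the interval that `value` falls into."""
    index = bisect_right(bounds, value)  # ranges from 0 to len(bounds)
    return f"{name}:interval_{index}"


def behavior_features(behavior_data, bounds_by_name):
    """Quantise each numeric behavior attribute into one interval feature."""
    return [
        interval_feature(name, value, bounds_by_name[name])
        for name, value in sorted(behavior_data.items())
    ]


features = behavior_features(
    {"access_records": 7},
    {"access_records": ACCESS_COUNT_BOUNDS},
)
print(features)
```

Quantizing first means that users whose raw counts differ slightly still produce the same feature, which is what allows near-identical scripted behavior to collapse into one group later.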
Step S120, calculating the behavior characteristic value of each user by using the behavior characteristic of each user;
Each user may have a plurality of behavior features, so an algorithm may be used to calculate the user's behavior feature value from the user's behavior feature set.
Preferably, a SimHash algorithm may be employed.
Step S130, classifying each user by taking the behavior characteristic value as a classification condition to obtain a plurality of classification groups;
Specifically, the characteristic values of the users in each classification group have a certain similarity, that is, users in the same classification group behave similarly.
Step S140, when it is determined that the number of users in the classification groups meets a preset judgment strategy, determining that the channel to be tested uses a brushing amount tool.
Specifically, the preset judgment strategy may take various forms, which are described in detail in the related section below. Because the false users generated by a brushing amount tool behave similarly, their behavior characteristic values are similar or identical, so whether the channel to be tested uses a brushing amount tool can be determined by analyzing the number of users in each classification group.
In the brushing amount tool detection method provided by the embodiment of the application, behavior data of at least one user corresponding to a channel to be tested is obtained, where the behavior data represents the various operations the user performs after logging in to the application. Behavior characteristics of each user are generated according to that user's behavior data, the behavior characteristic value of each user is calculated from the user's behavior characteristics, and the users are classified with the behavior characteristic values as the classification condition to obtain a plurality of classification groups. If the number of users contained in one or more classification groups is very large, most of the users in those groups are false users, and it can be determined that the channel to be tested uses a brushing amount tool. The application relies on the principle that the false users generated by a brushing amount tool behave similarly: the behavior characteristic values of the users are calculated, the users are clustered accordingly, and whether the channel to be tested uses a brushing amount tool is judged according to a preset strategy. This avoids the limitation of the existing method, which detects brushing amount tools based on the distribution of hardware access environment attribute parameters.
Optionally, when the users are classified with the behavior feature value as the classification condition, users with the same behavior feature value may be grouped into one class. The classification may also follow other rules. For example, when the behavior feature value is in binary form, users whose behavior feature values differ in fewer than a preset number of bits may be grouped into one class. If two behavior feature values are 1011 and 1111 and the preset number is 2, the two values differ only in the 2nd bit from the left, that is, in 1 bit, so the two users with behavior feature values 1011 and 1111 may be grouped into one class.
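Both classification rules can be illustrated with a short Python sketch. The greedy assignment to the first sufficiently close group is an assumption of this sketch; the source states only the bit-difference criterion, not how groups are formed from it. The user IDs are hypothetical.

```python
def differing_bits(a: int, b: int) -> int:
    """Number of bit positions in which two feature values differ."""
    return bin(a ^ b).count("1")


def group_users(feature_values, preset_number=None):
    """Group user IDs by behavior feature value.

    With preset_number=None, only identical values share a group. Otherwise a
    user joins the first group whose representative value differs in fewer
    than `preset_number` bits (a greedy rule assumed for this sketch).
    """
    limit = 1 if preset_number is None else preset_number
    groups = []  # list of (representative_value, [user_ids])
    for user_id, value in feature_values.items():
        for representative, members in groups:
            if differing_bits(representative, value) < limit:
                members.append(user_id)
                break
        else:
            groups.append((value, [user_id]))
    return [members for _, members in groups]


values = {"u1": 0b1011, "u2": 0b1111, "u3": 0b0000}
print(group_users(values))                   # identical values only
print(group_users(values, preset_number=2))  # fewer than 2 differing bits
```

With the values from the example, 1011 and 1111 differ in one bit, so the two users end up in the same group once the preset number is 2, while exact matching keeps them apart.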
In another embodiment of the present application, several different decision strategies are disclosed to determine whether the channel under test uses a brushing volume tool.
Referring to FIG. 2, FIG. 2 is a flowchart of a method for determining that a channel to be tested uses a brushing amount tool according to an embodiment of the present application.
As shown in fig. 2, the method includes:
Step S200, screening out, from all classification groups, the classification groups in which the number of users exceeds a first threshold;
Step S210, counting the total number of users contained in the screened classification groups;
The numbers of users in the classification groups obtained by screening are added together to obtain the total number of users.
Step S220, judging whether the ratio of this total number of users to the total number of users in all classification groups exceeds a second threshold, and if so, executing step S230;
Step S230, determining that the channel to be tested uses a brushing amount tool.
In this embodiment, users with higher similar behaviors are screened out from all users, whether the proportion of the users occupying the total number of the users exceeds a certain value or not is judged, and if yes, the current channel to be tested is considered to use the brushing amount tool.
For example, the channel A to be tested has 200 users in total, the first threshold is 20, and the second threshold is 50%. Classification yields 6 classification groups, numbered 1 to 6, containing 100, 80, 10, 5, 3 and 2 users respectively. The numbers of users in classification groups 1 and 2 exceed the first threshold, and their sum is 100 + 80 = 180. Since 180/200 = 90% > 50%, it is determined that the channel A to be tested uses a brushing amount tool.
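Steps S200 to S230 can be expressed as a small Python function, shown here with the numbers from the channel A example. The function and variable names are illustrative and do not come from the source.

```python
def uses_brushing_tool_by_threshold(group_sizes, first_threshold, second_threshold):
    """Steps S200-S230: share of users in groups larger than first_threshold."""
    total_users = sum(group_sizes)
    if total_users == 0:
        return False
    # S200/S210: keep only the large groups and add up their users.
    screened_total = sum(size for size in group_sizes if size > first_threshold)
    # S220/S230: compare their share of all users with the second threshold.
    return screened_total / total_users > second_threshold


sizes_channel_a = [100, 80, 10, 5, 3, 2]
print(uses_brushing_tool_by_threshold(sizes_channel_a, 20, 0.5))
```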
Referring to FIG. 3, FIG. 3 is a flowchart of another method for determining that a channel to be tested uses a brushing amount tool according to an embodiment of the present application.
As shown in fig. 3, the method includes:
Step S300, sorting the classification groups in descending order of the number of users they contain;
Step S310, counting the total number of users contained in the first n classification groups in the sorted order;
The value of n is a preset constant and may be set by the user to any value, such as 1 or 2; n must be less than the total number of classification groups.
Step S320, judging whether the ratio of this total number of users to the total number of users in all classification groups exceeds a third threshold, and if so, executing step S330;
Step S330, determining that the channel to be tested uses a brushing amount tool.
Unlike the previous embodiment, this embodiment selects the n classification groups with the most users and judges whether the ratio of the total number of users in these n classification groups to the total number of users in all classification groups exceeds a third threshold; if so, it is determined that the channel to be tested uses a brushing amount tool.
For example, the channel B to be tested has 200 users in total, and statistics over a large amount of data are used to set n to 3 and the third threshold to 60%. Classification yields 6 classification groups, and the number of users in each of the 6 classification groups is shown in Table 1 below:
[Table 1: number of users in each of the 6 classification groups (image BDA0000735077970000071)]
Sorting the classification groups in descending order of the number of users gives: 1-3-4-6-5-2. The first 3 classification groups, namely classification groups 1, 3 and 4, are selected, and the total number of users in these three groups is 120 + 50 + 15 = 185. Since 185/200 = 92.5% > 60%, it is determined that the channel B to be tested uses a brushing amount tool.
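Steps S300 to S330 can be sketched in Python with the channel B example. The sizes of classification groups 1, 3 and 4 (120, 50 and 15) come from the text; the sizes assumed here for groups 2, 5 and 6 (2, 5 and 8) are hypothetical values chosen only so that the total is 200 and the stated order 1-3-4-6-5-2 holds.

```python
def uses_brushing_tool_top_n(group_sizes, n, third_threshold):
    """Steps S300-S330: share of users held by the n largest groups."""
    total_users = sum(group_sizes.values())
    if total_users == 0:
        return False
    ranked = sorted(group_sizes.values(), reverse=True)  # S300: descending order
    top_total = sum(ranked[:n])                          # S310: users in first n groups
    return top_total / total_users > third_threshold     # S320/S330


# Groups 1, 3 and 4 are from the text; groups 2, 5 and 6 are assumed values.
sizes_channel_b = {1: 120, 2: 2, 3: 50, 4: 15, 5: 5, 6: 8}

order = sorted(sizes_channel_b, key=sizes_channel_b.get, reverse=True)
print(order)
print(uses_brushing_tool_top_n(sizes_channel_b, n=3, third_threshold=0.6))
```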
Next, the entire scheme is described by taking the application software as an example in this embodiment.
The behavior data of each user corresponding to the channel to be tested can be obtained through the application software server, and the behavior data comprises the attributes shown in the following table 2:
[Table 2: attributes of the behavior data (images BDA0000735077970000081 and BDA0000735077970000091)]
According to the behavior data of each user, behavior characteristics of each user are generated, and the attributes of the behavior characteristics are described in the following table 3:
[Table 3: attributes of the behavior characteristics (images BDA0000735077970000092 and BDA0000735077970000101)]
And calculating the behavior characteristic value of each user based on the behavior characteristic set of each user by using a SimHash algorithm, and gathering the users with the same behavior characteristic value into one class to obtain a plurality of classification groups. And selecting a classification group containing the largest number of users, dividing the number of the users in the classification group by the total number of the users in all the classification groups, judging whether the result is greater than a preset value, and if so, determining that the current channel to be detected uses a brushing amount tool.
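The final judgment in this example, which uses only the largest group of identical behavior feature values, can be sketched as follows. The feature values and the preset value of 0.5 are hypothetical, and the fingerprints are assumed to have been computed already.

```python
from collections import Counter

PRESET_VALUE = 0.5  # hypothetical threshold; the source does not give a number


def largest_group_share(feature_values):
    """Share of users who fall into the largest group of identical values."""
    counts = Counter(feature_values)
    if not counts:
        return 0.0
    largest = counts.most_common(1)[0][1]
    return largest / len(feature_values)


# Hypothetical fingerprints: seven users share one value, three are distinct.
feature_values = [0xA1] * 7 + [0xB2, 0xC3, 0xD4]

share = largest_group_share(feature_values)
print(share > PRESET_VALUE)
```

This is the top-n strategy with n equal to 1, applied to groups formed by exact matching of behavior feature values.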
The following describes the brushing amount tool detection device provided in the embodiments of the present application, and the brushing amount tool detection device described below and the brushing amount tool detection method described above may be referred to in correspondence with each other.
Referring to fig. 4, fig. 4 is a schematic structural view of a brushing amount tool detection device disclosed in the embodiment of the present application.
As shown in fig. 4, the apparatus includes:
a behavior data obtaining unit 41, configured to obtain behavior data of at least one user corresponding to a channel to be tested;
a behavior feature generation unit 42, configured to generate a behavior feature of each user according to the behavior data of each user;
a behavior feature value calculation unit 43 for calculating a behavior feature value of each user using the behavior feature of each user;
a classification processing unit 44, configured to classify each user by using the behavior feature value as a classification condition, so as to obtain a plurality of classification groups;
and a brushing amount tool determining unit 45, configured to determine that the channel to be tested uses a brushing amount tool when it is determined that the number of users in the classification group meets a preset determination policy.
Alternatively, fig. 5 illustrates an alternative structure of the brushing amount tool determining unit 45, and as shown in fig. 5, the brushing amount tool determining unit 45 may include:
a classification group screening unit 451 for screening out classification groups, of which the number of users exceeds a first threshold value, among all classification groups;
a first user number counting unit 452, configured to count the total number of users included in the screened classification group;
a first ratio judgment unit 453, configured to judge whether the ratio of this total number of users to the total number of users in all classification groups exceeds a second threshold, and if so, determine that the channel to be tested uses a brushing amount tool.
Alternatively, fig. 6 illustrates another optional structure of the brushing amount tool determining unit 45, and as shown in fig. 6, the brushing amount tool determining unit 45 may include:
a classification group sorting unit 454, configured to sort the classification groups in descending order of the number of users they contain;
a second user number counting unit 455, configured to count the total number of users contained in the first n classification groups in the sorted order;
a second ratio judgment unit 456, configured to judge whether the ratio of this total number of users to the total number of users in all classification groups exceeds a third threshold, and if so, determine that the channel to be tested uses a brushing amount tool.
Alternatively, fig. 7 illustrates an alternative structure of the behavior feature value calculation unit 43, and as shown in fig. 7, the behavior feature value calculation unit 43 may include:
a first behavior feature value calculation subunit 431, configured to calculate the behavior feature value of each user from the behavior feature set of each user by using the SimHash algorithm.
Optionally, fig. 8 illustrates an optional structure of the classification processing unit 44, and as shown in fig. 8, the classification processing unit 44 may include:
a first classification processing subunit 441, configured to group users with the same behavior feature value into one class.
The classification may also follow other rules. For example, when the behavior feature value is in binary form, users whose behavior feature values differ in fewer than a preset number of bits may be grouped into one class. If two behavior feature values are 1011 and 1111 and the preset number is 2, the two values differ only in the 2nd bit from the left, that is, in 1 bit, so the two users with behavior feature values 1011 and 1111 may be grouped into one class.
The brushing amount tool detection device provided by the embodiment of the application acquires behavior data of at least one user corresponding to the channel to be tested, where the behavior data represents the various operations the user performs after logging in to the application. Behavior characteristics of each user are generated according to that user's behavior data, the behavior characteristic value of each user is calculated from the user's behavior characteristics, and the users are classified with the behavior characteristic values as the classification condition to obtain a plurality of classification groups. The application relies on the principle that the false users generated by a brushing amount tool behave similarly: the behavior characteristic values of the users are calculated, the users are clustered accordingly, and whether the channel to be tested uses a brushing amount tool is judged according to a preset strategy. This avoids the limitation of the existing method, which detects brushing amount tools based on the distribution of hardware access environment attribute parameters.
The embodiment of the present application further provides a server, where the server may include the brushing amount tool detection device described above, and the description of the brushing amount tool detection device may refer to the description of the corresponding part above, which is not described herein again.
The hardware structure of the server provided in the embodiment of the present application is described below; for the parts of the following description related to brushing amount tool detection, reference may be made to the corresponding parts above. FIG. 9 is a schematic diagram of a hardware structure of a server according to an embodiment of the present application. Referring to FIG. 9, the server may include:
a processor 1, a communication interface 2, a memory 3, a communication bus 4, and a display screen 5;
the processor 1, the communication interface 2, the memory 3 and the display screen 5 are communicated with each other through a communication bus 4;
optionally, the communication interface 2 may be an interface of a communication module, such as an interface of a GSM module;
a processor 1 for executing a program;
a memory 3 for storing a program;
the program may include program code including operating instructions of the processor.
The processor 1 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present application.
The memory 3 may comprise a high-speed RAM, and may further comprise a non-volatile memory, such as at least one disk memory.
The program may specifically be used for:
acquiring behavior data of at least one user corresponding to a channel to be detected;
generating behavior characteristics of each user according to the behavior data of each user;
calculating the behavior characteristic value of each user by using the behavior characteristic of each user;
classifying each user by taking the behavior characteristic value as a classification condition to obtain a plurality of classification groups;
and when the number of the users in the classification group is determined to meet a preset judgment strategy, determining that the channel to be detected uses a brushing amount tool.
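The last two program steps can be sketched as follows. This is a minimal Python sketch, assuming that users with identical behavior characteristic values form one classification group and that the preset judgment strategy is one of the two ratio tests recited in the claims; the function names and all threshold values are hypothetical.

```python
from collections import Counter


def group_sizes(feature_values: list[str]) -> list[int]:
    """Group users with identical behavior characteristic values; return group sizes."""
    return list(Counter(feature_values).values())


def large_group_ratio(sizes: list[int], first_threshold: int) -> float:
    """Share of all users that fall in groups with more than `first_threshold` users."""
    total = sum(sizes)
    if total == 0:
        return 0.0
    return sum(s for s in sizes if s > first_threshold) / total


def top_n_ratio(sizes: list[int], n: int) -> float:
    """Share of all users that fall in the n largest groups."""
    total = sum(sizes)
    if total == 0:
        return 0.0
    return sum(sorted(sizes, reverse=True)[:n]) / total


# Hypothetical channel: ten users, six of whom share one characteristic value.
values = ["a"] * 6 + ["b"] * 3 + ["c"]
sizes = group_sizes(values)

# Strategy 1: groups larger than the first threshold (2) hold 9 of 10 users.
print(large_group_ratio(sizes, first_threshold=2) > 0.8)  # True, 0.8 is an assumed second threshold

# Strategy 2: the single largest group (n = 1) holds 6 of 10 users.
print(top_n_ratio(sizes, n=1) > 0.5)  # True, 0.5 is an assumed third threshold
```

Both tests flag a channel when a small number of large groups accounts for most of its users, which is the behavioral similarity the method relies on.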
Finally, it should also be noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (6)

1. A brushing amount tool detection method, comprising:
acquiring behavior data of at least one user corresponding to a channel to be detected, wherein the behavior data of the user comprises any one or more of the number of access records of the user, the access duration of the user, the starting type, the login type, the number of different scenes in which the user participates, and the number of click actions of the user;
generating behavior characteristics of each user according to the behavior data of each user;
calculating the behavior characteristic value of each user by using the behavior characteristic of each user;
clustering users by taking the behavior characteristic values as classification conditions to obtain a plurality of classification groups;
when it is determined that the number of users in the classification groups meets a preset judgment strategy, determining that the channel to be detected uses a brushing amount tool, wherein the determining comprises:
screening, from all classification groups, the classification groups whose number of users exceeds a first threshold, counting the total number of users contained in the screened classification groups, and determining that the channel to be detected uses a brushing amount tool if the ratio of the total number of users contained in the screened classification groups to the total number of users in all classification groups exceeds a second threshold;
or,
sorting the classification groups in descending order of the number of users, counting the total number of users in the first n classification groups in the sequence, and determining that the channel to be detected uses a brushing amount tool if the ratio of the total number of users in the first n classification groups to the total number of users in all classification groups exceeds a third threshold, where n is a preset constant less than the total number of classification groups.
2. The method according to claim 1, wherein each user has a plurality of behavior features, and the calculating the behavior feature value of each user by using the behavior features of each user comprises:
and calculating the behavior characteristic value of each user based on the behavior characteristic set of each user by adopting a SimHash algorithm.
3. The method according to claim 1, wherein the classifying the respective users by using the behavior feature value as a classification condition comprises:
and classifying users with the same behavior characteristic value into one class.
4. A brushing amount tool detection device, comprising:
the behavior data acquisition unit is used for acquiring behavior data of at least one user corresponding to the channel to be detected, wherein the behavior data of the user comprises any one or more of the number of user access records, the access duration time of the user, the starting type, the login type, the number of different scenes participated by the user and the number of click actions of the user;
the behavior feature generation unit is used for generating behavior features of the users according to the behavior data of the users;
the behavior characteristic value calculation unit is used for calculating the behavior characteristic value of each user by using the behavior characteristic of each user;
the classification processing unit is used for clustering the users by taking the behavior characteristic values as classification conditions to obtain a plurality of classification groups;
the brushing amount tool determining unit is used for determining that the channel to be detected uses a brushing amount tool when it is determined that the number of users in the classification groups meets a preset judgment strategy;
the brushing amount tool determining unit includes:
the classification group screening unit is used for screening out classification groups of which the number of users exceeds a first threshold value from all classification groups;
the first user number counting unit is used for counting the total number of users contained in the screened classification group;
the first ratio judgment unit is used for determining that the channel to be detected uses a brushing amount tool if the ratio of the total number of users in the screened classification groups to the total number of users in all classification groups exceeds a second threshold;
alternatively, the brushing amount tool determining unit includes:
the classification group sorting unit is used for sorting the classification groups in descending order of the number of users;
the second user number counting unit is used for counting the total number of users contained in the first n classification groups in the sequence;
the second ratio judging unit is used for determining that the channel to be detected uses a brushing amount tool if the ratio of the total number of users contained in the first n classification groups to the total number of users in all classification groups exceeds a third threshold, where n is a preset constant less than the total number of classification groups.
5. The apparatus according to claim 4, wherein a plurality of behavior characteristics are provided for each user, and the behavior characteristic value calculation unit includes:
and the first behavior feature value calculation subunit is used for calculating the behavior feature value of each user based on the behavior feature set of each user by adopting a SimHash algorithm.
6. A readable storage medium, having stored thereon a computer program, wherein the computer program, when executed by a processor, carries out the steps of the brushing amount tool detection method of any one of claims 1-3.
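Claims 2 and 5 name the SimHash algorithm for turning a set of behavior features into one behavior feature value. The following is a minimal Python sketch of an unweighted SimHash, assuming each behavior feature is encoded as a string and hashed with MD5; the feature encoding, the 16-bit width, and the choice of MD5 are illustrative assumptions and are not taken from the claims.

```python
import hashlib


def simhash(features: list[str], bits: int = 16) -> str:
    """Unweighted SimHash: each feature votes +1 or -1 on every bit position,
    and the sign of the vote total decides the output bit (at most 128 bits)."""
    votes = [0] * bits
    for feature in features:
        digest = hashlib.md5(feature.encode("utf-8")).digest()
        h = int.from_bytes(digest, "big")
        for i in range(bits):
            votes[i] += 1 if (h >> i) & 1 else -1
    # Emit the most significant bit first so the result reads as a binary number.
    return "".join("1" if v > 0 else "0" for v in reversed(votes))


# Hypothetical feature strings for two users with identical behavior.
user_a = ["access_count=3", "duration=10-20s", "login_type=1"]
user_b = ["login_type=1", "access_count=3", "duration=10-20s"]
print(simhash(user_a) == simhash(user_b))  # True: the value ignores feature order
```

Users with the same feature set always receive the same value, and users whose feature sets overlap heavily tend to receive values that differ in only a few bits, which is what makes the value usable as a classification condition.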
CN201510317134.0A 2015-06-10 2015-06-10 Brushing amount tool detection method and device Active CN106294508B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510317134.0A CN106294508B (en) 2015-06-10 2015-06-10 Brushing amount tool detection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510317134.0A CN106294508B (en) 2015-06-10 2015-06-10 Brushing amount tool detection method and device

Publications (2)

Publication Number Publication Date
CN106294508A CN106294508A (en) 2017-01-04
CN106294508B true CN106294508B (en) 2020-02-11

Family

ID=57659601

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510317134.0A Active CN106294508B (en) 2015-06-10 2015-06-10 Brushing amount tool detection method and device

Country Status (1)

Country Link
CN (1) CN106294508B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107124391B (en) * 2016-09-22 2021-11-16 北京星选科技有限公司 Abnormal behavior identification method and device
CN106844655A (en) * 2017-01-23 2017-06-13 广东欧珀移动通信有限公司 Information processing method, device and computer equipment
CN107193972A (en) * 2017-05-25 2017-09-22 山东浪潮云服务信息科技有限公司 A kind of sorted users method and device based on big data
CN110754076B (en) * 2017-08-30 2022-04-29 深圳市欢太科技有限公司 Method and device for determining brushing amount terminal
WO2019041200A1 (en) * 2017-08-30 2019-03-07 深圳市云中飞网络科技有限公司 Method and apparatus for determining resources for increasing downloading quantities
CN107707642B (en) * 2017-09-22 2019-08-13 Oppo广东移动通信有限公司 Brush amount terminal determines method and device
CN109284380B (en) * 2018-09-25 2023-04-25 平安科技(深圳)有限公司 Illegal user identification method and device based on big data analysis and electronic equipment
CN111368169B (en) * 2018-12-25 2023-08-01 卓望数码技术(深圳)有限公司 Method, device, equipment and storage medium for detecting brushing amount behavior
CN110189165B (en) * 2019-05-14 2021-07-23 微梦创科网络科技(中国)有限公司 Channel abnormal user and abnormal channel identification method and device
CN113383362B (en) * 2019-06-24 2022-05-13 深圳市欢太科技有限公司 User identification method and related product
CN110378112A (en) * 2019-07-08 2019-10-25 北京达佳互联信息技术有限公司 A kind of user identification method and device
CN111639681A (en) * 2020-05-09 2020-09-08 同济大学 Early warning method, system, medium and device based on education drive type fraud

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102521248A (en) * 2011-11-14 2012-06-27 北京亿赞普网络技术有限公司 Network user classification method and device
CN102769851A (en) * 2011-05-06 2012-11-07 中国移动通信集团广东有限公司 Method and system for monitoring service provider services
CN103793484A (en) * 2014-01-17 2014-05-14 五八同城信息技术有限公司 Fraudulent conduct identification system based on machine learning in classified information website
CN103853841A (en) * 2014-03-19 2014-06-11 北京邮电大学 Method for analyzing abnormal behavior of user in social networking site
CN103886068A (en) * 2014-03-20 2014-06-25 北京国双科技有限公司 Data processing method and device for Internet user behavior analysis
CN104281882A (en) * 2014-09-16 2015-01-14 中国科学院信息工程研究所 Method and system for predicting social network information popularity on basis of user characteristics
CN104298719A (en) * 2014-09-23 2015-01-21 新浪网技术(中国)有限公司 Method and system for conducting user category classification and advertisement putting based on social behavior
CN104424433A (en) * 2013-08-22 2015-03-18 腾讯科技(深圳)有限公司 Anti-cheating method and anti-cheating system of application program

Also Published As

Publication number Publication date
CN106294508A (en) 2017-01-04


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant