WO2017202336A1 - Advertisement anti-cheat method, device, and storage medium - Google Patents

Advertisement anti-cheat method, device, and storage medium

Info

Publication number
WO2017202336A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
sample
advertisement
cheat
application
Prior art date
Application number
PCT/CN2017/085687
Other languages
English (en)
French (fr)
Inventor
程权
李益群
王春辉
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Priority to JP2018543423A (patent JP6878450B2, ja)
Publication of WO2017202336A1 (patent WO2017202336A1, zh)
Priority to US15/971,614 (patent US10929879B2, en)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q30/0248 Avoiding fraud
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2155 Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q30/0277 Online advertisement

Definitions

  • the present application relates to Internet advertising technologies in the field of communications, and in particular, to an advertising anti-cheat method, device and storage medium.
  • The traffic party provides users with various Internet-based services (such as news, media playback, and online games), and the advertising system delivers advertisements to users while they use those services (for example, in the application the user is using or the webpage the user is visiting).
  • The click volume of advertisements is also called the advertisement traffic.
  • The traffic party thus consumes advertisement clicks through the advertisement resources it owns (such as ad slots in applications and ad slots in pages).
  • The traffic party may use cheating means to click the advertisements delivered to its ad slots, producing false clicks on the advertisements (also called false advertisement traffic), and there is no effective solution for accurately identifying cheating users so that false clicks can be filtered out of the traffic party's click volume.
  • the embodiment of the present application provides an advertisement anti-cheat method and device, which can accurately identify a cheating user who is cheating on an advertisement in the Internet.
  • an advertisement anti-cheat method where the method includes:
  • the advertisement anti-cheating device acquires a sample set, wherein at least one sample in the sample set includes a cheating user, and a click log of the cheating user clicking the advertisement;
  • the advertisement anti-cheating device extracts, from the sample of the sample set, features of at least one dimension corresponding to the level of the cheat user, wherein each cheat user corresponds to one level, and the features corresponding to the cheat users of different levels are different;
  • the advertisement anti-cheat device forms a positive sample from the cheating user and the features, in the at least one dimension, of the click log of the cheating user clicking the advertisement, and trains the cheating user identification model corresponding to the level of the cheating user based on at least the positive sample;
  • the advertisement anti-cheat device determines the features of the sample to be identified in the at least one dimension;
  • the advertisement anti-cheating device inputs the feature corresponding to the at least one dimension of the sample to be identified into the trained cheating user recognition model, and identifies a cheating user in the sample to be identified based on the output result.
  • an advertisement anti-cheat device where the device includes:
  • a sample module configured to acquire a sample set, wherein at least one sample in the sample set includes a cheating user, and a click log of the cheating user clicking an advertisement;
  • An extracting module configured to extract, from the samples of the sample set, features of at least one dimension corresponding to a level of the cheat user, wherein each cheat user corresponds to one level, and the features of the cheat users of different levels are different;
  • a model training module configured to form a positive sample from the cheating user and the features, in the at least one dimension, of the click log of the cheating user clicking the advertisement, and to train the cheating user identification model corresponding to the level of the cheating user based on at least the positive sample;
  • a model application module configured to determine the features of the sample to be identified in the at least one dimension, to input those features into the trained cheating user identification model, and to identify the cheating user in the sample to be identified based on the output result.
  • the embodiment of the present application provides a computer storage medium for storing computer software instructions used by the advertisement anti-cheating device, which includes steps for executing the above-described advertisement anti-cheat method.
  • the corresponding feature is extracted from the sample to train the corresponding level cheat user recognition model, so that the trained model can be used to target the cheating users at different levels.
  • FIG. 1-1 is an optional schematic structural diagram of an advertisement anti-cheat device in the embodiment of the present application.
  • FIG. 1-2 is an optional schematic structural diagram of an advertisement anti-cheat device in the embodiment of the present application.
  • FIG. 2 is an optional implementation diagram of an advertisement anti-cheat device for identifying a low-level cheat user in the embodiment of the present application.
  • FIG. 3-1 is an optional flowchart of identifying a low-level cheat user in the embodiment of the present application.
  • FIG. 4 is a schematic diagram of an optional implementation of training a middle-level cheat user identification model and identifying a middle-level cheat user using the middle-level cheat user identification model in the embodiment of the present application;
  • FIG. 5 is an optional schematic flowchart of identifying a high-level cheat user in the embodiment of the present application.
  • FIG. 6 is a schematic diagram of an optional implementation of training a high-level cheat user identification model and identifying a high-level cheat user using a high-level cheat user identification model in the embodiment of the present application;
  • FIG. 7 is an optional schematic diagram showing the cheating user identification of the anti-cheat system in the embodiment of the present application.
  • FIG. 8 is a schematic diagram of an optional functional architecture of an advertisement anti-cheat system in an embodiment of the present application.
  • FIG. 9 is an optional schematic diagram showing the cheating user identification of the anti-cheat system in the embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of an advertisement anti-cheat system according to an embodiment of the present application.
  • Ad exposure: an advertisement is displayed in an ad slot on the user side (such as an ad slot in a page the user visits or in an application the user uses); each display of the advertisement on the user side counts as one ad exposure.
  • Ad click: the user clicks an advertisement on a terminal (such as a smartphone or tablet) and is taken to the advertiser's page; each such click counts as one ad click.
  • Ad effect: after an advertisement is exposed, the user clicks it and then places an order or downloads an application on the advertiser's page; this is called the ad effect.
  • Click-through rate: the ratio of ad clicks to ad exposures.
  • Advertising cheating: in the stages of ad exposure, ad click, and ad effect, a user acts with some malicious purpose to inflate the number of ad exposures, ad clicks, or ad effects. Such malicious behavior by cheating users is called advertising cheating.
  • Advertising anti-cheat: checking the exposures, clicks, and effects of an advertisement to determine whether they were triggered by normal access on the user side or by the advertising cheating of cheating users.
  • Advertising anti-cheat system: a system that performs anti-cheat checks on ad exposures, ad clicks, and ad effects.
  • the anti-cheat system is a series of rules used to combat cheating. Each rule is called a strategy.
  • Advertising task platform: a platform that only provides paid tasks such as advertisement browsing, advertisement clicking, or application downloading.
  • Platform users earn points, redeemable for money or prizes, by completing paid tasks, and their advertisement click behavior is similar to that of cheating users.
  • High (first) level cheat users: professional cheating groups with a thorough understanding of the anti-cheat system. A group of high-level cheat users clicks a batch of applications (APPs). The applications they use are fake applications wrapped in a shell and designed specifically for cheating on advertisements, so that the behavior of any single cheating user is no different from that of a normal user; these users are mostly user groups forged by cheating software.
  • Middle (second) level cheat users: professional cheating users with some understanding of the anti-cheat system, who click advertisements in a scattered, intermittent way over a long period; mostly users of advertising task platforms or professional paid posters (the "water army").
  • Low (third) level cheat users: unorganized cheating users with little understanding of the anti-cheat system, who click a large number of advertisements in a short period; mostly people inside or close to the traffic party.
  • the anti-cheat system (in the embodiment of the present application, the anti-cheating device is implemented as an anti-cheat system as an example) needs to identify a cheating user and filter the cheating user's clicks on the advertisement.
  • The anti-cheat systems provided by the related art can identify obvious cheating behavior, but as cheating users change and refine their methods, some well-hidden cheating users are difficult to identify.
  • The embodiments of the present application provide an advertisement anti-cheat method and an advertisement anti-cheat device that applies the method. The device can be implemented in various ways, which are described below.
  • The advertisement anti-cheat device is implemented as an advertisement anti-cheat system (in practice, in the form of a server or a server cluster, optionally providing the advertisement anti-cheat service as a cloud service).
  • the advertisement anti-cheat system is connected with the advertisement system. The advertisement system is explained below.
  • The advertising system delivers an advertisement to the ad slot of the corresponding user's terminal according to the targeting conditions set by the advertiser (such as the age, region, group, and spending power of the advertisement audience) and, from the users' clicks on the advertisement, forms a click log for each statistical period (such as one week). The click log records information such as the click volume and click times of the users' clicks on the advertisement.
  • The advertising system also compiles an exposure log, which includes the objects (such as applications and merchandise) exposed by the advertisements the user clicked.
  • The advertising system also compiles an effect log for each application, which includes the effect achieved for the exposed object after the user clicked the advertisement.
  • The advertising system also records information about the device the user uses to click the advertisement, such as the device's hardware information and software information.
  • The advertisement anti-cheat system obtains from the advertising system the click log, effect log, and exposure log of users clicking advertisements, together with the users' device information, and processes at least one of these types of information to form models for identifying cheat users at different levels. It then uses the different models to identify cheat users at the corresponding levels, and can also filter the clicks of those cheat users to ensure the accuracy of the user-side ad click volume.
  • Alternatively, the advertisement anti-cheat device is coupled into the advertising system as a functional module of the advertising system shown in FIG. 1-1. The device obtains the click log, effect log, and exposure log of users clicking advertisements, together with the users' device information, from the advertising system, processes at least one of these types of information to form models for identifying cheat users at different levels, uses the different models to identify cheat users at the corresponding levels, and filters the clicks of those cheat users to ensure the accuracy of the user-side ad click volume.
  • FIG. 1-1 and FIG. 1-2 are merely illustrative; in practice, the advertisement anti-cheat device shown in FIG. 1-1 and FIG. 1-2 can readily be adapted and implemented in other ways.
  • the advertisement anti-cheat system is described below with reference to FIG. 1-1 for the identification of low-level cheating users, middle-level cheating users, and high-level cheating users.
  • the identification of low-level cheat users, middle-level cheat users, and high-level cheat users can be implemented by referring to the following description.
  • FIG. 2 shows an optional implementation in which the advertisement anti-cheat device identifies low-level cheat users by means of online real-time penalty and offline delayed re-judgment.
  • Online real-time penalty uses a blacklist strategy and a statistical strategy, and offline delayed re-judgment uses a statistical strategy; each is described below.
  • The advertisement anti-cheat system maintains in advance a blacklist of low-level cheat users, which contains the identifiers of low-level cheat users.
  • The advertisement anti-cheat system extracts the identifier of the user currently clicking an advertisement from the click log obtained in real time from the advertising system and matches it against the identifiers of low-level cheat users in the blacklist. If the match succeeds, the user currently clicking the advertisement is determined to be a low-level cheat user.
  • The identifier of a low-level cheat user is information that uniquely distinguishes the user, such as the user's mobile phone number or social platform account (such as a WeChat account or QQ account).
  • The type of identifier is not limited to these; any type of identifier, such as an Internet Protocol (IP) address or a media access control (MAC) address, can be used.
  • Two or more of the above identifiers may be used in combination to mark low-level cheat users.
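  • The blacklist match described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the record layout (a dict with a `user` key and optional `phone`, `account`, `ip`, and `mac` fields) and the function name `flag_blacklisted` are assumptions made for the example.

```python
# Hypothetical record layout: each click record is a dict with a "user" key
# plus optional identifier fields. None of these names come from the source.
ID_FIELDS = ("phone", "account", "ip", "mac")


def flag_blacklisted(click_records, blacklist):
    """Return the set of users having at least one identifier on the blacklist."""
    flagged = set()
    for record in click_records:
        # Collect whichever identifiers this record actually carries.
        identifiers = {record[f] for f in ID_FIELDS if record.get(f)}
        if identifiers & blacklist:
            flagged.add(record["user"])
    return flagged


records = [
    {"user": "u1", "ip": "10.0.0.1"},
    {"user": "u2", "ip": "10.0.0.2", "account": "acct-9"},
]
low_level_cheats = flag_blacklisted(records, blacklist={"acct-9"})
```

  • Matching on any one of several identifier types is only one way to combine identifiers; requiring two or more to match at once would be an equally valid reading of the text.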
  • The advertisement anti-cheat system counts each user's clicks on advertisements within the statistical period (such as 5 minutes or 1 hour, set according to the actual application). When the number of clicks on advertisements exceeds the click threshold, the user is identified as a low-level cheat user.
  • The advertisement anti-cheat system filters (penalizes) the clicks of low-level cheat users and feeds the result back to the advertising system, to prevent the inaccuracy that would result if the advertising system used the clicks of low-level cheat users.
  • The clicks that exceed the click threshold are filtered according to a predetermined ratio: the more the clicks exceed the click threshold, the larger the filtering ratio.
  • For the click volume (a-b) that exceeds the click threshold, the filtering ratio is selected according to the correspondence between the value range of (a-b) and the filtering ratio.
  • An example of the correspondence between the value range of (a-b) and the filtering ratio is shown in Table 1.
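  • A sketch of the online statistical strategy follows, reading a as the user's click volume in the statistical period and b as the click threshold (an assumption about the notation). Table 1 is not reproduced in this text, so the ranges and percentages below are hypothetical placeholders, chosen only so that a larger excess maps to a larger filtering ratio.

```python
from bisect import bisect_right

# Hypothetical stand-in for Table 1 (the real values are not in this text):
# lower bounds of the excess (a - b) and the percentage of the excess that is
# filtered in each range: [0, 10) -> 50%, [10, 50) -> 80%, [50, inf) -> 100%.
EXCESS_LOWER_BOUNDS = [0, 10, 50]
FILTER_PERCENTS = [50, 80, 100]


def penalize_online(a, b):
    """Return (is_low_level_cheat_user, clicks_kept) for click volume a and threshold b."""
    if a <= b:
        return False, a
    excess = a - b
    percent = FILTER_PERCENTS[bisect_right(EXCESS_LOWER_BOUNDS, excess) - 1]
    # Integer arithmetic avoids floating-point rounding surprises.
    kept_excess = excess * (100 - percent) // 100
    return True, b + kept_excess
```

  • Only the excess over the threshold is filtered here; the part within the threshold is left for the offline step that the text describes next.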
  • To further reduce the clicks generated by low-level cheat users within the click volume of advertisements, the anti-cheat system also employs delayed re-judgment.
  • The advertisement anti-cheat system analyzes the click logs obtained from the advertising system to count the number of times each user clicks advertisements in the statistical period (such as 5 minutes or 1 hour, set according to the actual application). When the number of clicks exceeds the click threshold, the user is identified as a low-level cheat user.
  • For a low-level cheat user, the clicks that do not exceed the click threshold are filtered according to a predetermined ratio, or are all filtered, that is, the low-level cheat user's clicks are cleared.
  • The predetermined ratio used in offline delayed re-judgment may be a fixed ratio, or may be determined dynamically for each low-level cheat user from a positive correlation (for example, direct proportion) with the user's click volume in the statistical period: the larger the user's click volume in the statistical period, the larger the predetermined ratio applied to the user's clicks that do not exceed the click threshold.
  • For example, with a click threshold of 20, the ratio at which a user's first 20 clicks are filtered is determined from the user's click volume during the 1-hour statistical period. If user A clicks 21 times within the hour and user B clicks 100 times, the filtering ratio for user A's first 20 clicks is lower than that for user B's first 20 clicks.
  • If, of a user's click volume a, the part that does not exceed the click threshold is b, then filtering b according to a predetermined ratio (for example, 70%) updates it to b*(1-70%); filtering b entirely clears the part of the user's click volume that does not exceed the click threshold.
  • In this way, online real-time penalty is applied to the part of a low-level cheat user's clicks that exceeds the click threshold (filtered proportionally), and offline delayed re-judgment is applied to the part that does not exceed the click threshold (filtered according to a fixed or dynamically adjusted predetermined ratio), so as to minimize the clicks of low-level cheat users in the click volume of advertisements.
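  • The offline delayed re-judgment can be sketched as below. The text only requires the dynamic ratio to grow with the user's click volume; the specific rule used here (click volume divided by a hypothetical `saturation` of 100, capped at 1) is an assumption for illustration, as are the function names.

```python
def rejudge_ratio(click_volume, fixed_ratio=None, saturation=100):
    """Filtering ratio for the clicks that fall within the click threshold.

    With fixed_ratio set, that ratio is used as-is. Otherwise the ratio grows
    in direct proportion to the click volume and is capped at 1.0. The
    `saturation` constant is a hypothetical tuning value.
    """
    if fixed_ratio is not None:
        return fixed_ratio
    return min(1.0, click_volume / saturation)


def rejudge_within_threshold(click_volume, threshold, fixed_ratio=None):
    """Clicks kept out of the first min(click_volume, threshold) clicks."""
    within = min(click_volume, threshold)
    ratio = rejudge_ratio(click_volume, fixed_ratio)
    return round(within * (1 - ratio))


# User A (21 clicks) is filtered more gently than user B (100 clicks).
kept_a = rejudge_within_threshold(21, 20)
kept_b = rejudge_within_threshold(100, 20)
```

  • With a fixed ratio of 70% and a threshold of 20, `rejudge_within_threshold(30, 20, fixed_ratio=0.7)` keeps 6 clicks, matching the b*(1-70%) rule in the text.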
  • An optional flowchart of identifying a low-level cheat user in the embodiment of the present application includes steps 101 to 106, which are described below.
  • Before the advertisement anti-cheat system uses the middle-level cheat user identification model to identify middle-level cheat users among users, usable samples need to be formed to train the model, so that its recognition accuracy reaches the preset accuracy.
  • the advertising anti-cheat system obtains a sample set from the advertising task platform (step 101) to form a sample that trains the middle level cheat user identification model.
  • the sample set includes samples corresponding to the middle-level cheat user.
  • An optional data structure in the sample is shown in Table 2:
  • The samples in the sample set include at least one middle-level cheat user and that user's click log in the statistical period (such as one week).
  • The click log includes the operation data of the middle-level cheat user clicking advertisements, such as the ID of each advertisement clicked and the time of each click.
  • Because platform users who complete advertisement tasks on the advertising task platform can be regarded as middle-level cheat users, the click logs of platform users completing advertisement tasks are obtained from the advertising task platform to form the sample set.
  • The aforementioned samples corresponding to middle-level cheat users are used by the advertisement anti-cheat system to form positive samples for training the middle-level cheat user identification model.
  • To further improve the accuracy with which the model identifies middle-level cheat users, the sample set obtained by the advertisement anti-cheat system further includes samples corresponding to non-cheating users, which the system uses to form negative samples for training the middle-level cheat user identification model.
  • The sample corresponding to a non-cheating user includes a user of a normal application (that is, an application known to have no cheating users), who is a non-cheating user, and the click log of that user clicking advertisements in the ad slots of the application. An optional data structure of the sample corresponding to the non-cheating user is shown in Table 3:
  • Application 1 is a normal application; user 3 and user 4 have both installed application 1 on their respective terminals, and both have clicked advertisements in application 1.
  • Based on the click log obtained from the advertising system, the advertisement anti-cheat system forms a sample for each non-cheating user (user 3 and user 4) of application 1.
  • After obtaining the sample set, the advertisement anti-cheat system parses the click logs in the sample set to obtain the operation data of users clicking advertisements, and extracts from the operation data the features associated with the operation of clicking advertisements (step 102).
  • The advertisement anti-cheat system parses the click log in the sample corresponding to each middle-level cheat user to determine the features associated with that user's operation of clicking advertisements.
  • The advertisement anti-cheat system also parses the click log in the sample corresponding to each non-cheating user to determine the features associated with that user's operation of clicking advertisements.
  • features associated with an operation of a user (middle level cheat user or non-cheating user) clicking an advertisement include features of at least one of the following dimensions:
  • The user's click volume in the statistical period: the total number of times the user clicks advertisements in any ad slot (such as ad slots in pages and ad slots in applications) during the statistical period. For example, if the corresponding numbers of clicks are 1, 2, and 3, the user's click volume in the statistical period is 6 (1+2+3).
  • The user's click volume for each advertisement in the statistical period: the total number of times the user clicks that advertisement. For example, if the user clicks advertisement 1, advertisement 2, and advertisement 3 once, twice, and three times respectively in the first time period of the statistical period, and later in the statistical period again clicks the three advertisements once, twice, and three times respectively, the user's click volumes for advertisement 1, advertisement 2, and advertisement 3 in the statistical period are 2 (1+1), 4 (2+2), and 6 (3+3).
  • The number of time periods in which the user clicks advertisements: the number of time periods within the statistical period in which the user clicked an advertisement.
  • the corresponding average value is (T2-T1)/2+(T3-T2)/2.
  • The historical ratio of the number of identified middle-level cheat users to the number of users who clicked the advertisement; this may be the ratio for the current statistical period or the average of the ratios over multiple statistical periods.
  • For example, suppose the statistical period is 1 day and each time period is 1 hour. If the user clicks advertisements in the 1st, 2nd, 4th, and 5th hours of the day, with 1, 2, 4, and 5 clicks respectively, the number of time periods in which the user clicked advertisements during the statistical period is 4, the click volume in the statistical period is 12 (1+2+4+5), and the average number of clicks over the 4 time periods is 3 (12/4).
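  • Several of the features listed above can be computed from a click log as in the following sketch. The input layout (a list of `(timestamp_seconds, ad_id)` tuples for one user in one statistical period) and all names are assumptions. Reading the (T2-T1)/2+(T3-T2)/2 expression as the mean gap between consecutive clicks is also an assumption, since the sentence defining it is incomplete in this text.

```python
from collections import Counter


def extract_click_features(clicks, period_seconds=3600):
    """Compute per-user click features for one statistical period.

    clicks: list of (timestamp_seconds, ad_id) tuples (hypothetical layout).
    """
    times = sorted(t for t, _ in clicks)
    clicks_per_ad = Counter(ad for _, ad in clicks)
    # Index of the time period (for example, the hour) each click falls in.
    active_periods = {t // period_seconds for t in times}
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return {
        "click_volume": len(clicks),
        "clicks_per_ad": dict(clicks_per_ad),
        "active_periods": len(active_periods),
        "avg_clicks_per_active_period": (
            len(clicks) / len(active_periods) if active_periods else 0.0
        ),
        "avg_click_interval": sum(gaps) / len(gaps) if gaps else 0.0,
    }


# The worked example from the text: 1, 2, 4, and 5 clicks in four hours of a day.
example = [
    (hour * 3600 + i, "ad1")
    for hour, count in [(1, 1), (2, 2), (4, 4), (5, 5)]
    for i in range(count)
]
features = extract_click_features(example)
```

  • For the worked example this gives a click volume of 12, 4 active time periods, and an average of 3 clicks per active period.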
  • Positive samples for training the middle-level cheat user identification model can then be formed: the advertisement anti-cheat system marks the cheating user, together with the features in at least one dimension of the click log of that user clicking advertisements, as a positive sample (step 103).
  • The advertisement anti-cheat system may also form negative samples for training the middle-level cheat user identification model. For example, referring to FIG. 3-2, an optional flowchart of identifying a low-level cheat user in the embodiment of the present application, the advertisement anti-cheat system marks the non-cheating user, together with the features in at least one dimension of the click log of that user clicking advertisements, as a negative sample (step 107).
  • After forming positive samples for training the middle-level cheat user identification model, the advertisement anti-cheat system inputs the positive samples into the model to train its model parameters.
  • When the advertisement anti-cheat system also forms negative samples for training the middle-level cheat user identification model, the negative samples are input into the model together with the positive samples for training, which improves the recognition accuracy of the model and shortens the training process.
  • Recognition result = f(a * feature 1 + b * feature 2);
  • Here, feature 1 and feature 2 are features of a training sample (a positive sample or a negative sample), and the model parameters a and b control the weights of feature 1 and feature 2. Training the middle-level cheat user identification model is the process of continuously optimizing the model parameters a and b.
  • The number of model parameters can be two or more, and the number of features used is not limited.
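  • The text does not say what the function f is. As one possible choice, the sketch below uses a logistic (sigmoid) function over the weighted sum, so the recognition result can be read as a probability; the 0.5 decision threshold and the function name are assumptions.

```python
import math


def recognize(features, weights, decision_threshold=0.5):
    """Return (probability, is_cheat_user) for one sample.

    Computes f(a * feature 1 + b * feature 2 + ...) with f assumed to be the
    logistic function; `weights` holds the model parameters a, b, ...
    """
    score = sum(w * x for w, x in zip(weights, features))
    probability = 1.0 / (1.0 + math.exp(-score))
    return probability, probability >= decision_threshold
```

  • Because the weighted sum accepts any number of terms, the same function covers the case of more than two features and parameters.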
  • The advertisement anti-cheat system can use a prior database (including cheating users, non-cheating users, and click log features) to test the accuracy (that is, the correct rate) with which the middle-level cheat user identification model identifies cheating users. When the recognition accuracy does not reach the preset accuracy, the model parameters are adjusted until the accuracy of the middle-level cheat user identification model reaches the preset accuracy.
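  • The train-and-test loop described above might look like the following sketch. It assumes a logistic model trained by stochastic gradient descent, which the text does not specify; the learning rate, epoch limit, bias term, and sample values are all hypothetical. Samples are `(features, label)` pairs with label 1 for a positive (cheating) sample and 0 for a negative one.

```python
import math


def predict(weights, bias, x):
    """Logistic output for one feature vector."""
    z = bias + sum(w * v for w, v in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def accuracy(weights, bias, samples):
    """Fraction of samples whose thresholded prediction matches the label."""
    hits = sum((predict(weights, bias, x) >= 0.5) == bool(y) for x, y in samples)
    return hits / len(samples)


def train(train_set, prior_set, preset_accuracy=0.95, lr=0.1, max_epochs=1000):
    """Adjust parameters until accuracy on the prior set reaches the preset value."""
    weights, bias = [0.0] * len(train_set[0][0]), 0.0
    for _ in range(max_epochs):
        if accuracy(weights, bias, prior_set) >= preset_accuracy:
            break
        for x, y in train_set:
            error = predict(weights, bias, x) - y  # gradient of the log loss
            weights = [w - lr * error * v for w, v in zip(weights, x)]
            bias -= lr * error
    return weights, bias


# Hypothetical, pre-scaled features (for example, click volume and active periods).
positives = [((0.9, 0.8), 1), ((0.8, 0.9), 1), ((0.7, 0.7), 1)]
negatives = [((0.1, 0.2), 0), ((0.2, 0.1), 0), ((0.1, 0.1), 0)]
prior = [((0.85, 0.75), 1), ((0.15, 0.15), 0)]
weights, bias = train(positives + negatives, prior)
```

  • Stopping as soon as the prior set reaches the preset accuracy mirrors the text; a production system would more likely also guard against overfitting to that set.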
  • the middle-level cheat user identification model after training can be used to identify the middle-level cheat user.
  • The advertisement anti-cheat system obtains the sample to be identified from the advertising system (step 105). The data structure of the sample to be identified can refer to Table 2 and Table 3 above and includes the user to be identified and that user's click log. The advertisement anti-cheat system extracts the features of the at least one dimension from the sample to be identified, inputs them into the trained middle-level cheat user identification model, and determines the middle-level cheat users in the sample to be identified based on the recognition result output by the model (whether the user is a middle-level cheat user) (step 106).
  • the click amount of the middle-level cheat user is also filtered (step 108), and the filter is filtered.
  • the middle-level cheating user's click volume is updated to the advertising system (step 109), so that the billing end of the advertising system uses the click volume of the updated advertisement in combination with the billing strategy for billing of the advertisement delivery, since the click volume in the advertisement has already been
  • the traffic of the middle-level cheating users is filtered to ensure that the click volume of the advertisement is formed by the user's normal click operation, ensuring the accuracy and authenticity of the advertisement traffic, and avoiding the inaccurate charging of the advertiser's advertisement. The problem.
  • the advertisement anti-cheat system to filter the clicks of the middle-level cheating users.
  • the following describes different filtering methods.
  • Filtering method 1: filter the clicks of the middle-level cheat user according to a predetermined ratio.
  • For example, if the click volume of the middle-level cheat user is a and the predetermined ratio is 70%, the click volume after filtering is updated to a*30%. In particular, when the predetermined ratio is 100%, the click volume of the middle-level cheat user is cleared.
  • Filtering method 2: filter the portion of the middle-level cheat user's clicks that does not exceed the click threshold, or clear the user's click volume entirely. The portion of the clicks that exceeds the click threshold is filtered according to a predetermined ratio, and the more the clicks exceed the click threshold, the larger the filtering ratio.
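  • The two filtering methods can be sketched as follows. Filtering method 1 follows the a*30% example directly. For filtering method 2 the text does not give the schedule by which the ratio grows, so `escalating_ratio` (a base ratio that rises by a fixed step per block of excess clicks) is purely a hypothetical choice, as are all function and parameter names. The ratio applied to the part under the threshold is passed in explicitly rather than assumed.

```python
def filter_by_ratio(clicks, ratio_pct):
    """Filtering method 1: remove ratio_pct percent of the clicks, return the rest."""
    if not 0 <= ratio_pct <= 100:
        raise ValueError("ratio_pct must be between 0 and 100")
    return clicks * (100 - ratio_pct) // 100

def escalating_ratio(excess, base_pct=50, step_pct=10, bucket=10):
    """Hypothetical schedule: the more clicks exceed the threshold, the larger the ratio."""
    return min(100, base_pct + step_pct * (excess // bucket))

def filter_with_threshold(clicks, threshold, under_pct):
    """Filtering method 2 (sketch): treat the parts under and over the threshold separately."""
    under = min(clicks, threshold)
    excess = max(0, clicks - threshold)
    kept_under = filter_by_ratio(under, under_pct)
    kept_excess = filter_by_ratio(excess, escalating_ratio(excess))
    return kept_under + kept_excess

print(filter_by_ratio(100, 70))   # 30, matching the a*30% example
print(filter_by_ratio(100, 100))  # 0, the click volume is cleared
```

  • Integer percentages are used so that the a*30% example yields an exact result instead of a floating-point approximation.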
  • FIG. 4 shows an optional implementation of the middle-level cheat user identification model and of using that model to identify middle-level cheat users, comprising two stages: model training and model use.
  • The positive samples for training come from the click logs of the advertising task platform, and the negative samples for training come from the click logs of normally functioning apps (apps known to have no cheat users).
  • Middle-level cheat users click advertisements intermittently, dispersed over a long period.
  • A logistic regression (Logistic Regression) model is trained to determine whether a user is a middle-level cheat user.
  • After the model parameters of the logistic regression model have been trained, features are extracted from the click log, obtained from the advertisement system, of the advertisements clicked within one week by the user to be identified. The above six features are selected and input into the logistic regression model, which outputs the cheat recognition result: whether the user to be identified is a middle-level cheat user or a normal user (non-cheating user).
  • The inventors found that high-level cheat users use (for example, develop) specific applications to generate false traffic. Such a specific application does not itself provide services to users (such as media services or social services). Its function is to use a packaged program to simulate different users clicking advertisements in the ad slots of a specific traffic party, thereby generating false traffic. In other words, the specific application is dedicated to generating false traffic, and its users are all high-level cheat users.
  • The inventors also found that when high-level cheat users use a specific application for cheating, the simulated users clicking advertisements are very close in many dimensions, that is, their correlation is very high, whereas normal users (non-cheating users) have dispersed features across dimensions, that is, their correlation is very low.
  • Therefore, when identifying high-level cheat users, the application is taken as the unit, and whether the users of the application are high-level cheat users is identified for the whole application at once: the similarity of all users of the application to be identified is judged in multiple dimensions. If the similarity is high, the application to be identified is identified as a specific application used by high-level cheat users, and correspondingly all users of the application to be identified are identified as high-level cheat users. This is explained below with reference to the flowchart.
  • FIG. 5 shows an optional flowchart of identifying high-level cheat users in an embodiment of the present application. The steps are described below.
  • The anti-cheat system uses a high-level cheat user identification model to identify high-level cheat users. To do so, it needs to form usable samples for training the model. As described above, high-level cheat users are identified with the application as the unit (whether the users of an application are high-level cheat users is identified at once). Accordingly, the advertisement anti-cheat system obtains a sample set composed of per-application samples (referred to as application samples).
  • Each application sample in the sample set corresponds to one application, and at least one application sample corresponds to an application known to have high-level cheat users, which the advertisement anti-cheat system uses to form positive samples for training the high-level cheat user identification model.
  • The sample set may further include application samples corresponding to applications for which it is unknown whether they have high-level cheat users. These are called unlabeled application samples.
  • The application sample includes various information about the corresponding application. An optional data structure of the application sample is shown in Table 4:
  • Each application sample corresponds to one application and includes at least one of the following kinds of information about that application:
  • The click log includes the following information:
  • The advertisements clicked by the user are distinguished by the serial number (ID) that the advertisement system assigns to each advertisement, or by the category label that the advertisement system assigns to the advertisement.
  • The advertisements clicked by a user during the statistical period may be a record of the advertisements the user clicked across all of the application's ad slots, recorded for example in the form: advertisement 1, advertisement 2, advertisement 3.
  • Alternatively, the advertisements clicked by the user during the statistical period may be a record of the advertisements the user clicked in each ad slot of the application, recorded for example in the form: ad slot 1: advertisement 1, advertisement 2; ad slot 2: advertisement 3, advertisement 4.
  • The click volume of the user on advertisements in the application's ad slots during the statistical period is the total number of times the user clicked advertisements in the application's ad slots during that period.
  • For example, if the user clicked advertisement 1, advertisement 2, and advertisement 3 with 2, 3, and 4 clicks respectively, the click volume in the statistical period is 9 (2+3+4).
  • The click volume may also be the total number of times the user clicked the same advertisement in the application's ad slots during the statistical period, that is, the sum, over each time period (shorter than the statistical period, such as one day or one hour) within the statistical period (such as one week), of the number of times the user clicked that advertisement.
  • For example, in the first time period of the statistical period the user clicked three advertisements (advertisement 1, advertisement 2, and advertisement 3) in the application's ad slots, with 2, 3, and 4 clicks respectively. In the second time period the user again clicked advertisement 1, advertisement 2, and advertisement 3, with 2, 3, and 4 clicks respectively. The user's click volumes for advertisement 1, advertisement 2, and advertisement 3 in the statistical period are then 4 (2+2), 6 (3+3), and 8 (4+4).
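  • The two ways of counting click volume can be reproduced with a short sketch. The dictionary layout (advertisement name mapped to click count, one dictionary per time period) and the function names are assumptions made for illustration; the numbers are the ones used in the examples above.

```python
from collections import Counter

def clicks_per_ad(periods):
    """Sum each advertisement's clicks over all time periods of the statistical period."""
    totals = Counter()
    for period in periods:
        totals.update(period)  # adds the counts of each advertisement
    return dict(totals)

def total_clicks(periods):
    """Total number of advertisement clicks in the statistical period."""
    return sum(clicks_per_ad(periods).values())

first = {"advertisement 1": 2, "advertisement 2": 3, "advertisement 3": 4}
second = {"advertisement 1": 2, "advertisement 2": 3, "advertisement 3": 4}

print(total_clicks([first]))           # 9
print(clicks_per_ad([first, second]))  # per-advertisement totals 4, 6 and 8
```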
  • The duration for which the user clicked advertisements in the application's ad slots during the statistical period is the total time the user spent clicking advertisements in the application's ad slots during that period.
  • For example, if the duration of the user clicking advertisements in ad slot 1 is T1 and the duration of the user clicking advertisements in ad slot 2 is T2, the total duration for which the user clicked advertisements in the application's ad slots during the statistical period is T1+T2.
  • The duration may also be recorded separately for each ad slot, such as the aforementioned duration T1 for ad slot 1 and duration T2 for ad slot 2.
  • Ad slots: taking the ad slots in an app as an example, the types of ad slots include:
  • The splash-screen (open-screen) ad slot is the position in the app's interface where an advertisement is displayed after the app is launched and before the app's content is loaded.
  • The interstitial ad slot is the position in the app's interface where an advertisement is inserted while the app's content is loading.
  • The exposure log records the objects exposed by the advertisements that each user of the application clicked in the application's ad slots, such as the name of an application, the name of a product, the address of a page, and the like.
  • The effect log includes the advertising effect achieved for the exposed object after each user clicks the advertisement in the app.
  • The advertising effect may be one of the following: the user starts downloading the application; the application download is completed; the application is installed on the user's device; the application is activated on the user's device; the application is deleted from the user's device.
  • The advertising effect recorded in the effect log may also be: the user places an order for the product; the user pays for the order; the user cancels the order.
  • The information of the device may be hardware information, such as the model of the device, the remaining space of the device, the remaining power of the device, and the like.
  • The information of the device may also be software information, such as the communication carrier used by the device, the operating system (type and version) used by the device, and the networking mode of the device.
  • The information of the device may also be information such as the location of the device (such as latitude and longitude), the moving speed of the device, and the like.
  • After the advertisement anti-cheat system obtains the sample set, for each application sample it parses the relevance of the features of any two users in the application sample in at least one dimension (step 202).
  • The dimensions used by the features are selected according to the types of information included in the application sample. Examples of the relevance of features in different dimensions follow.
  • The features of the user clicking advertisements in the application's ad slots may be, for example, the position (or frequency) of the user's clicks in the application, the number of times an application was exposed by the advertisements, and the number of times a web page was exposed by the advertisements.
  • The relevance of the devices used by the application's users may be the relevance in dimensions such as hardware information, software information, device location, and device moving speed.
  • For example, the relevance may be based on the difference in remaining space and the difference in remaining power between the devices.
  • If the application sample corresponds to an application known to have high-level cheat users, the similarity of any two users of the application in the above dimensions is 100%. If the application sample corresponds to an application for which it is unknown whether it has high-level cheat users, the similarity of any two users in the above dimensions is 0%.
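  • The text does not specify how the relevance of two users' device information is computed. As a hedged illustration only, the sketch below scores categorical fields by exact match and turns the remaining-space difference into a value between 0 and 1 with an exponential decay. The field names, the decay scale `space_scale_mb`, and the equal weighting are all assumptions.

```python
import math

def device_similarity(dev_a, dev_b, space_scale_mb=1024.0):
    """Similarity in [0, 1] of two users' device information (hypothetical scoring)."""
    categorical = ("model", "carrier", "network")
    scores = [1.0 if dev_a[key] == dev_b[key] else 0.0 for key in categorical]
    # Remaining-space difference: identical values score 1, large gaps approach 0.
    gap = abs(dev_a["free_space_mb"] - dev_b["free_space_mb"])
    scores.append(math.exp(-gap / space_scale_mb))
    return sum(scores) / len(scores)

user_1 = {"model": "X1", "carrier": "A", "network": "wifi", "free_space_mb": 2048}
user_2 = {"model": "X1", "carrier": "B", "network": "wifi", "free_space_mb": 2048}

print(device_similarity(user_1, user_1))  # 1.0: identical device information
print(device_similarity(user_1, user_2))  # 0.75: only the carrier differs
```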
  • After the advertisement anti-cheat system parses the relevance of any two users of each application sample in at least one dimension, it marks the application samples known to include high-level cheat users, together with the relevance of any two users of those application samples in the at least one dimension, as positive samples (step 203), and inputs the positive samples into the cheat user identification model to train the model parameters of the cheat user identification model (step 204).
  • The advertisement anti-cheat system also uses any two users in the unlabeled application samples in the sample set, together with the similarity (0%) of the two users in the above dimensions, to form unlabeled samples for training the high-level cheat user identification model. The unlabeled samples are input into the high-level cheat user identification model together with the positive samples (step 210), and unlabeled samples are iteratively marked as positive samples based on the high-level cheat user identification model, so as to increase the number of positive samples.
  • Once the number of positive samples is stable, the remaining unlabeled application samples in the sample set are marked as negative samples, in which the correlation between any two users is 0%.
  • The high-level cheat user identification model can be regarded as a series of functions whose purpose is to construct a mapping from an input application sample to the average relevance of that application sample. An optional example:
  • Here, feature 3 and feature 4 are features of a training sample (either a positive sample or a negative sample), and the model parameters c and d control the weights of feature 3 and feature 4. Training the high-level cheat user identification model is the process of continuously optimizing and adjusting the model parameters c and d so that the output average similarity becomes more precise.
  • The number of model parameters may be two or more, and the number of features used is not limited.
  • After the advertisement anti-cheat system trains the high-level cheat user identification model, the features of the application to be identified in the at least one dimension are input into the high-level cheat user identification model (step 205). The model outputs the relevance of any two users of the application to be identified in the at least one dimension, and the relevance values of the user pairs are averaged to obtain the average relevance of the application in the at least one dimension (step 206).
  • For example, if the similarities of the user pairs of application 1 in the device-information dimension are s1, s2, and s3, the average similarity of application 1 in the device-information dimension is (s1+s2+s3)/3.
  • High-level cheat users are identified based on the average relevance (step 207): the average relevance is compared with an average relevance threshold. If the output average relevance is higher than the threshold, the features of the users of the application to be identified are extremely close, so the application is determined to be one used by high-level cheat users for cheating, and all of its users are identified as high-level cheat users. In this way, whether the users of the application to be identified are high-level cheat users is decided efficiently in a single pass.
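  • Steps 206 and 207 amount to averaging a pairwise score over all user pairs and comparing the result with a threshold. A minimal sketch follows, in which the similarity function is supplied by the caller (in the described system it would come from the trained model). The threshold value 0.9, the toy similarity `1 - |u - v|`, and the function names are hypothetical.

```python
from itertools import combinations

def average_relevance(users, similarity):
    """Average the pairwise similarity over every pair of users of one application."""
    pairs = list(combinations(users, 2))
    if not pairs:
        return 0.0
    return sum(similarity(u, v) for u, v in pairs) / len(pairs)

def is_cheating_app(users, similarity, threshold=0.9):
    """Step 207: flag the application when the average relevance exceeds the threshold."""
    return average_relevance(users, similarity) > threshold

def toy_similarity(u, v):
    """Toy stand-in: each user is a single feature value in [0, 1]."""
    return 1.0 - abs(u - v)

print(is_cheating_app([0.5, 0.5, 0.5], toy_similarity))  # True: users are identical
print(is_cheating_app([0.0, 0.5, 1.0], toy_similarity))  # False: users are dispersed
```

  • With three users there are three pairs, so `average_relevance` reduces to the (s1+s2+s3)/3 form used in the example above.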
  • The click volume of the high-level cheat users is also filtered (step 208), and the filtered click volume is updated to the advertisement system (step 209), so that the billing end of the advertisement system bills the advertisement delivery using the updated click volume in combination with the billing strategy. Because the clicks of high-level cheat users have already been filtered out of the advertisement's click volume, the click volume is guaranteed to come from users' normal click operations, which ensures the accuracy and authenticity of the advertisement's click volume and prevents the billing data for advertisers' advertisement delivery from being affected by clicks generated by high-level cheat users.
  • The advertisement anti-cheat system can filter the clicks of high-level cheat users in various ways, for example according to a predetermined ratio. If the click volume of a high-level cheat user is a and the predetermined ratio is 70%, the click volume after filtering is updated to a*30%. In particular, when the predetermined ratio is 100%, the click volume of the high-level cheat user is cleared.
  • High-level cheat users are a group of users who forge cheating apps and use them to cheat on advertisements. High-level cheat users are usually concentrated in cheating apps: regular apps (such as social apps) have no high-level cheat users, while the users of a cheating app are all high-level cheat users. Since a single cheat user does not produce a large number of clicks, it is necessary to identify the feature relevance of the cheat user group. For an app with a high-level cheat user group, the most obvious characteristic is that its users are highly similar in device information and in exposure, click, and effect features.
  • Device-information-related features: for two users, the similarity of device model, the difference in remaining device space, latitude and longitude similarity, carrier similarity, network similarity, and so on.
  • A gradient boosting regression tree (Gradient Boosting Regression Tree) model calculates the average similarity of the users of the application to be identified in at least one dimension.
  • The positive samples for the initial training of the gradient boosting regression tree model come from the data (including exposure logs, click logs, effect logs, and user device information) of the apps in the sample set that are known to have high-level cheat users, and the unlabeled samples for the initial training come from the remaining apps in the sample set.
  • Using positive-unlabeled learning (Positive-Unlabeled Learning), the number of positive samples is increased through continued iteration. After the training result is stable, that is, the number of positive samples in the sample set is stable, the remaining unlabeled samples in the sample set are used as negative samples.
  • The gradient boosting regression tree model is trained using the positive and negative samples.
  • The trained model is used to obtain the similarity between the users of the application to be identified, and the average similarity of those users is used to determine whether the application to be identified has a high-level cheat user group.
  • The recognition result of the application to be identified can be updated into the sample set to continuously accumulate training samples, thereby achieving automatic correction of the gradient boosting regression tree model.
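  • The iterative labeling loop of positive-unlabeled learning can be sketched as follows. This is not the gradient boosting regression tree model itself: as a loudly stated simplification, each app is reduced to a single average-similarity value and the "model" is a midpoint cutoff between the mean of the positives and the mean of the unlabeled apps. The app names, values, and the function `pu_iterate` are hypothetical.

```python
def pu_iterate(positives, unlabeled, max_rounds=20):
    """Promote unlabeled apps to positive until the number of positives is stable.

    positives, unlabeled: dicts mapping app id -> average user similarity.
    Stand-in scorer (an assumption, not the source's model): the cutoff is the
    midpoint between the mean similarity of positives and of unlabeled apps.
    """
    pos, unl = dict(positives), dict(unlabeled)
    for _ in range(max_rounds):
        if not unl:
            break
        pos_mean = sum(pos.values()) / len(pos)
        unl_mean = sum(unl.values()) / len(unl)
        cutoff = (pos_mean + unl_mean) / 2
        promoted = {app: s for app, s in unl.items() if s >= cutoff}
        if not promoted:  # the number of positive samples is stable
            break
        pos.update(promoted)
        for app in promoted:
            del unl[app]
    return pos, unl  # the remaining unlabeled apps are used as negative samples

positives = {"app_a": 0.95, "app_b": 0.90}
unlabeled = {"app_c": 0.92, "app_d": 0.30, "app_e": 0.25, "app_f": 0.88}
pos, neg = pu_iterate(positives, unlabeled)
print(sorted(pos), sorted(neg))
```

  • In the described system, the positive and negative sets produced by such a loop would then be used to train the gradient boosting regression tree model.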
  • The advertisement anti-cheat device needs to identify cheat users of the different levels comprehensively. Accordingly, FIG. 7 shows an optional flowchart of cheat user identification by the advertisement anti-cheat system. Two main processes are involved:
  • The blacklist policy filters the clicks of users in the blacklist.
  • A blacklist of low-level cheat users, including the identifiers of the low-level cheat users, is maintained in advance.
  • The identifier of the user currently clicking an advertisement is extracted from the click log obtained from the advertisement system in real time and matched against the identifiers of the low-level cheat users in the blacklist. Once the match succeeds, the user currently clicking the advertisement is determined to be a low-level cheat user, and that user's click volume is filtered.
  • The advertisement anti-cheat system uses a statistical strategy to filter the portion of the low-level cheat user's clicks that does not exceed the click threshold.
  • The advertisement anti-cheat system uses the middle-level cheat user identification strategy to identify middle-level cheat users and filters the clicks of middle-level cheat users.
  • The advertisement anti-cheat system uses the high-level cheat user identification strategy to identify high-level cheat users and filters the clicks of high-level cheat users.
  • The advertisement anti-cheat system divides cheat users into three levels (low-level, middle-level, and high-level cheat users) according to their different cheating methods and abnormal behaviors, identifies the cheat users of each level in a corresponding way, and thus identifies cheat users hierarchically and comprehensively, without missed identifications.
  • The clicks of the identified cheat users on advertisements are filtered accordingly, which ensures that the measured advertising effect is true and reliable.
  • The advertisement anti-cheat device provided by the embodiments of the present application can be implemented on a single server, or distributed across a server cluster as an advertisement anti-cheat system. An optional functional architecture of the advertisement anti-cheat system, shown in the corresponding figure, includes a sample module 10, an extraction module 20, a model training module 30, a model application module 40, a statistics module 50, and a penalty module 60.
  • The following description refers to FIG. 9, a schematic diagram of the advertisement anti-cheat system identifying cheat users hierarchically.
  • The statistics module 50 maintains in advance a blacklist of low-level cheat users, including their identifiers, extracts the identifier of the user currently clicking an advertisement from the click log obtained from the advertisement system in real time, and matches it against the identifiers of the low-level cheat users in the blacklist. Once the match succeeds, the user currently clicking the advertisement is determined to be a low-level cheat user (low-level cheat result).
  • The statistics module 50 also compiles statistics on the click logs obtained from the advertisement system. When the number of times a user clicks advertisements exceeds the click threshold, the user is identified as a low-level cheat user.
  • The penalty module 60 filters the clicks of the low-level cheat users and feeds the result back to the advertisement system.
  • The clicks that exceed the click threshold are filtered by a predetermined ratio, and the more the clicks exceed the click threshold, the larger the filtering ratio.
  • The penalty module 60 filters the clicks of low-level cheat users that do not exceed the click threshold by a predetermined ratio, or clears the click volume of the low-level cheat users entirely. Generally, the predetermined ratio used in the offline delayed re-judgment method is larger than the predetermined ratio used in the statistical strategy. Clicks that do not exceed the click threshold are less likely to have been maliciously triggered than clicks that exceed it, so the clicks exceeding the click threshold are filtered more heavily.
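  • The two low-level identification paths of the statistics module 50 (blacklist matching and click-count statistics) can be sketched as follows. The click-log record layout with a `user_id` field, the threshold value, and the function name are assumptions for illustration.

```python
from collections import Counter

def identify_low_level(click_log, blacklist, click_threshold):
    """Return the user ids flagged by the blacklist policy or by the statistical policy."""
    counts = Counter(entry["user_id"] for entry in click_log)
    # Blacklist policy: the clicking user's identifier matches a blacklisted identifier.
    by_blacklist = {uid for uid in counts if uid in blacklist}
    # Statistical policy: the user's number of clicks exceeds the click threshold.
    by_statistics = {uid for uid, n in counts.items() if n > click_threshold}
    return by_blacklist | by_statistics

click_log = ([{"user_id": "u1"}]
             + [{"user_id": "u2"}] * 5
             + [{"user_id": "u3"}] * 2)
flagged = identify_low_level(click_log, blacklist={"u1"}, click_threshold=3)
print(sorted(flagged))  # ['u1', 'u2']
```

  • The flagged identifiers would then be handed to the penalty module 60, which filters the corresponding clicks.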
  • The sample module 10 obtains a sample set. At least one sample in the sample set includes a cheat user and the click log of that cheat user clicking advertisements.
  • The extraction module 20 extracts, from the samples of the sample set, features of at least one dimension corresponding to the level of the cheat user, where each cheat user corresponds to one level and the dimensions corresponding to cheat users of different levels are different.
  • The model training module 30 and the model application module 40 jointly implement the middle-level cheat user identification strategy.
  • The model training module 30 marks the cheat user, together with the features in at least one dimension corresponding to the cheat user's advertisement click log, as a positive sample, and trains the cheat user identification model corresponding to the level of the cheat user based at least on the positive sample.
  • The model application module 40 determines the features of the sample to be identified in the at least one dimension, inputs those features into the trained cheat user identification model, and identifies the cheat users in the sample to be identified (middle-level anti-cheat result).
  • The extraction module 20 parses the click logs in the sample set to obtain features associated with the operation of clicking advertisements.
  • The features associated with the operation of clicking advertisements include features of at least one of the following dimensions:
  • The model training module 30 performs training by inputting the positive samples into the cheat user identification model to train its model parameters, testing the accuracy with which the model identifies cheat users, and, when the recognition accuracy does not reach the preset accuracy, adjusting the model parameters until the accuracy of the cheat user identification model reaches the preset accuracy. Here, the cheat user's click log is the click log generated when the cheat user performs advertising tasks on the advertising task platform.
  • The model training module 30 can also train with negative samples in combination with the positive samples: a non-cheating user, together with the features in at least one dimension corresponding to the non-cheating user's advertisement click log, is marked as a negative sample, and the negative samples are input together with the positive samples into the cheat user identification model to train its model parameters. Here, at least one sample in the sample set includes a non-cheating user and the click log of that user clicking advertisements, and the non-cheating user's click log is the log generated when the non-cheating user clicks advertisements in an app.
  • The samples in the sample set formed by the sample module 10 are application samples corresponding to different applications. At least one application sample corresponds to an application that has high-level cheat users, and each application sample includes information in at least one of the following dimensions of the corresponding application:
  • the click log of the app's users clicking ads in the app;
  • the exposure log of the app's users clicking ads in the app;
  • the effect log of the app's users clicking ads in the app.
  • The extraction module 20 parses the relevance of any two users in the application sample in at least one dimension and determines the average relevance of the application sample in the at least one dimension, where the average relevance in a dimension is the average of the relevance of the features of every two users of the application sample in that dimension.
  • The model training module 30 marks the application sample that includes high-level cheat users, together with the average relevance of the application sample in at least one dimension, as a positive sample, and inputs the positive sample into the cheat user identification model to train the model parameters of the cheat user identification model.
  • The model training module 30 uses the unlabeled application samples in the sample set (application samples for which it is unknown whether they have high-level cheat users), together with the relevance of any two users of each unlabeled application sample in at least one dimension, as unlabeled samples. It inputs the unlabeled samples together with the positive samples into the cheat user identification model to train the model parameters until the number of input unlabeled samples that the model marks as positive samples is stable.
  • The model application module 40 obtains the relevance, output by the cheat user identification model, of any two users of the application to be identified in at least one dimension, and determines the average relevance of the users of the application to be identified in the at least one dimension. When the average relevance exceeds the average relevance threshold, the users of the application to be identified are determined to be high-level cheat users (high-level cheat recognition result).
  • Offline delayed re-judgment can cover the identification of cheat users at every level, thereby ensuring comprehensive filtering of cheat users' clicks.
  • FIG. 10 shows a schematic structural diagram of an advertisement anti-cheat system according to an embodiment of the present application, specifically:
  • the advertising anti-cheat system may include one or more processor 101 processing cores, memory 102 of one or more computer readable storage media, and the anti-cheat system structure illustrated in FIG. 10 does not constitute an anti-cheat system for advertisements.
  • the definitions may include more or fewer components than those illustrated, or some components may be combined, or different component arrangements. among them:
  • the processor 101 is a control center for the advertising anti-cheat system that performs various functions of the advertising anti-cheat system by running or executing software programs and/or modules stored in the memory 102, as well as invoking data stored in the memory 102. Process data to monitor the overall anti-cheat system.
  • the processor 101 may include one or more processing cores; preferably, the processor 101 may integrate an application processor and a modem processor, where the application processor mainly processes an operating system, a user interface, an application, and the like.
  • the modem processor primarily handles wireless communications. It can be understood that the above modem processor may not be integrated into the processor 51.
  • the memory 102 can be used to store software programs and modules, and the processor 101 executes various functional applications and data processing by running software programs and modules stored in the memory 102.
  • the memory 102 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application required for at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may be stored according to Data created by the use of the server, etc.
  • memory 102 can include high speed random access memory, and can also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device. Accordingly, memory 102 may also include a memory controller to provide processor 101 access to memory 102.
  • the advertisement anti-cheat system may further include an input device, an RF circuit, a power supply, a display unit, a camera, a Bluetooth module, and the like, and details are not described herein again.
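  • in this embodiment, the processor 101 in the advertisement anti-cheat system loads the executable files corresponding to the processes of one or more applications into the memory 102 according to the following instructions, and runs the applications stored in the memory 102 to implement these functions: acquiring a sample set, in which at least one sample includes a cheating user and the click log of that user's advertisement clicks; extracting from the samples features of at least one dimension corresponding to the cheating user level, where each cheating user corresponds to one level and different levels correspond to different features; forming positive samples from the cheating user and the features of the click log in the at least one dimension, and training, at least on the positive samples, the cheating user identification model corresponding to the cheating user's level; determining the features of the at least one dimension for a sample to be identified; and inputting those features into the trained model and identifying the cheating users in the sample to be identified from the output.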
  • the integrated modules described in the embodiments of the present application may also be stored in a computer readable storage medium if they are implemented in the form of software functional modules and sold or used as separate products. Based on such understanding, those skilled in the art will appreciate that embodiments of the present application can be provided as a method, system, or computer program product. Thus, the present application can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment in combination of software and hardware.
  • the application can take the form of a computer program product embodied on one or more computer-usable storage media containing computer usable program code, including but not limited to a USB flash drive, a mobile hard drive, a read only memory (ROM, Read-Only Memory), Random Access Memory (RAM), disk storage, CD-ROM, optical storage, and the like.
  • the computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising an instruction apparatus. The instruction apparatus implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • these computer program instructions can also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing. The instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Accounting & Taxation (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Marketing (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Business, Economics & Management (AREA)
  • Economics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Hardware Design (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

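An advertisement anti-cheat method and apparatus. The method includes: acquiring a sample set, in which at least one sample includes a cheating user and a click log of that user's advertisement clicks; extracting, from the samples of the sample set, features of at least one dimension corresponding to the level of the cheating user, where cheating users to be identified at different levels correspond to different features; forming positive samples from the cheating user and the features, in the at least one dimension, of the click log of that user's advertisement clicks, and training, at least on the basis of the positive samples, a cheating user identification model corresponding to the level of the cheating user to be identified; determining the features of the at least one dimension for a sample to be identified; and inputting those features into the trained cheating user identification model to identify the cheating users in the sample to be identified. The method can accurately identify cheating users who commit advertisement cheating on the Internet.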

Description

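Advertisement anti-cheat method, apparatus, and storage medium
This application claims priority to Chinese Patent Application No. 201610349338.7, entitled "Advertisement anti-cheat method and apparatus" and filed with the Chinese Patent Office on May 24, 2016, which is incorporated herein by reference in its entirety.
Technical field
This application relates to Internet advertising technologies in the field of communications, and in particular to an advertisement anti-cheat method, apparatus, and storage medium.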
背景技术
目前,广告主存在向用户推送广告以对产品或服务进行宣传的需求,伴随互联网用户尤其是移动互联网用户的快速增长,互联网广告成为广告投放的新的形式,互联网广告的投放量也呈现快速增长的趋势。
在互联网广告的生态系统中,流量方基于用户提供各种形式的基于互联网的服务(如提供新闻、媒体播放、在线游戏等各种形式),在用户使用服务的过程中广告系统向用户使用的服务中(如用户使用的应用,或用户访问的网页)投放广告,如果用户点击广告则使广告的点击量(也称为广告流量)增加,可见流量方基于自身所拥有的广告资源(如应用中的广告、网页中的广告位等)对广告的点击量进行消耗。
上述互联网广告的系统中存在以下问题:
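To increase the click count of the advertisements placed on the advertising resources it owns and thereby earn more revenue, a traffic provider may click those advertisements by cheating, producing false clicks (also called false advertisement traffic). The related art offers no effective solution for accurately identifying cheating users so that false clicks can be filtered out of the advertisement click count.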
发明内容
本申请实施例提供一种广告反作弊方法及装置,能够准确识别互联网中进行广告作弊的作弊用户。
本申请实施例的技术方案是这样实现的:
第一方面,本申请实施例提供一种广告反作弊方法,所述方法包括:
广告反作弊装置获取样本集合,其中,所述样本集合中的至少一个样本包括作弊用户、以及所述作弊用户点击广告的点击日志;
广告反作弊装置从所述样本集合的样本中提取与作弊用户层级对应的至少一个维度的特征,其中,每一个作弊用户对应一个层级,不同层级作弊用户所对应的特征不同;
广告反作弊装置基于所述作弊用户、所述作弊用户点击广告的点击日志在所述至少一个维度的特征形成正样本,至少基于所述正样本对与作弊用户的层级对应的作弊用户识别模型进行训练;
广告反作弊装置确定待识别的样本对应所述至少一个维度的特征;
广告反作弊装置将所述待识别样本对应所述至少一个维度的特征输入训练后的所述作弊用户识别模型,基于输出结果识别出所述待识别的样本中的作弊用户。
第二方面,本申请实施例提供一种广告反作弊装置,所述装置包括:
样本模块,用于获取样本集合,其中,所述样本集合中的至少一个样本包括作弊用户、以及所述作弊用户点击广告的点击日志;
提取模块,用于从所述样本集合的样本中提取与作弊用户的层级对应的至少一个维度的特征,其中,每一个作弊用户对应一个层级,不同层级的作弊用户所对应的特征不同;
模型训练模块,用于基于所述作弊用户、所述作弊用户点击广告的点击日志在所述至少一个维度的特征形成正样本,至少基于所述正样本对与待识别的作弊用户的层级对应的作弊用户识别模型进行训练;
模型应用模块,用于确定待识别的样本对应所述至少一个维度的特征;将所述待识别样本对应所述至少一个维度的特征输入训练后的所述作弊用户识别模型,基于输出结果识别出所述待识别的样本中的作弊用户。
第三方面,本申请实施例提供了一种计算机存储介质,用于储存为上述广告反作弊装置所用的计算机软件指令,其包含用于执行上述广告反作弊方法的步骤。
本申请实施例中,基于待识别的作弊用户的不同层级,从样本中提取相应的特征对相应层级作弊用户识别模型进行训练,从而可以利用训练后的模型对不同层级的作弊用户进行有针对性的全面的识别。
附图说明
图1-1为本申请实施例中广告反作弊装置的一个可选的架构示意图;
图1-2为本申请实施例中广告反作弊装置的一个可选的架构示意图;
图2为本申请实施例中广告反作弊装置识别低层级作弊用户的一个可选的实现示意图;
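FIG. 3-1 is an optional schematic flowchart of identifying middle-level cheating users in an embodiment of this application;
FIG. 3-2 is an optional schematic flowchart of identifying middle-level cheating users in an embodiment of this application;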
图4为本申请实施例中训练中层级作弊用户识别模型、以及利用中层级作弊用户识别模型识别中层级作弊用户的一个可选的实现示意图;
图5为本申请实施例中识别高层级作弊用户的一个可选的流程示意图;
图6为本申请实施例中训练高层级作弊用户识别模型、以及利用高层级作弊用户识别模型识别高层级作弊用户的一个可选的实现示意图;
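FIG. 7 is an optional schematic diagram of cheating user identification performed by the advertisement anti-cheat system in an embodiment of this application;
FIG. 8 is an optional schematic diagram of the functional architecture of the advertisement anti-cheat system in an embodiment of this application;
FIG. 9 is an optional schematic diagram of cheating user identification performed by the advertisement anti-cheat system in an embodiment of this application;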
图10为本申请实施例提供的广告反作弊系统的结构示意图。
具体实施方式
以下结合附图及实施例,对本申请进行进一步详细说明。应当理解,此处所提供的实施例仅仅用以解释本申请,并不用于限定本申请。另外,以下所提供的实施例是用于实施本申请的部分实施例,而非提供实施本申请的全部实施例,在本领域技术人员不付出创造性劳动的前提下,对以下实施例的技术方案进行重组所得的实施例、以及基于对申请所实施的其他实施例均属于本申请的保护范围。
需要说明的是,在本申请实施例中,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的方法或者装置不仅包括所明确记载的要素,而且还包括没有明确列出的其他要素,或者是还包括为实施方法或者装置所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括该要素的方法或者装置中还存在另外的相关要素(例如方法中的步骤或者装置中的单元)。
本申请实施例中涉及的名词和术语适用于如下的解释。
广告曝光:广告在用户侧的广告位(如用户访问的页面中的广告位、用户使用的应用中的广告位)展示,广告在用户侧展示一次称为一次广告曝光。
广告点击:用户在终端(如智能手机、平板电脑)通过点击广告而访问广告主的页面,用户点击一次广告而访问广告主的页面,称为广告点击。
广告效果:广告在被曝光后,用户点击广告从而在广告主的网页下单购买商品或下载应用,称为广告效果。
点击率:广告点击量与广告曝光次数的比值。
水军:受雇于网络公司通过点击广告、下载应用或发帖回帖等手段达到盈利或营造舆论等目的的网络人员,本申请实施例中也称为作弊用户。
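Advertisement cheating: in stages such as advertisement exposure, click, and effect, a user with some malicious purpose performs behavior that inflates the exposure count, click count, or effect of an advertisement. Such malicious behavior by a cheating user is called advertisement cheating.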
广告反作弊:对广告曝光、点击和效果等环节进行检查,判断广告曝光、广告点击、广告效果等是由于用户侧的正常访问触发,还是由于作弊用户通过广告作弊手段实现。
广告反作弊系统:对广告曝光、广告点击和广告效果等环节进行反作弊检查的系统。
广告反作弊策略:广告反作弊系统为打击作弊行为所使用的一系列规则,每种规则称为一种策略。
广告任务平台:仅提供广告浏览、广告点击或应用下载等有偿任务的平台,平台用户通过完成有偿任务获取积分来兑换钱或奖品,平台用户的广告点击行为与作弊用户的广告点击行为类似。
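High (first) level cheating users: a professional group of cheating users with a thorough understanding of the anti-cheat system. A group of high-level cheating users jointly clicks in a batch of applications (apps); the apps they use are fake shell apps dedicated to advertisement cheating, so that the behavior of any single cheating user looks no different from that of a normal user. Most are user groups forged by cheating software.
Middle (second) level cheating users: professional cheating users with some understanding of the anti-cheat system who click advertisements intermittently and in a dispersed way over a long period. Most are users of advertisement task platforms or professional paid posters.
Low (third) level cheating users: unorganized cheating users with little understanding of the anti-cheat system who click a large number of advertisements within a short time. Most are insiders of the traffic provider or people around them.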
在互联网广告的生态系统中,部分流量方为了获取更高的点击率和收入,会短期或长期自己内部或雇佣水军或诱导用户来点击自己流量上的广告。反作弊系统(本申请实施例中以反作弊装置实施为反作弊系统为例进行说明)需要识别出作弊用户并过滤作弊用户针对广告的点击量。
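The anti-cheat systems provided by the related art can identify fairly obvious cheating behavior by cheating users, but as cheating techniques change and become more sophisticated, some more deeply hidden cheating users are hard to identify.
To address this, the embodiments of this application provide an advertisement anti-cheat method and an advertisement anti-cheat apparatus that applies the method. The apparatus can be implemented in various ways, which are described below.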
在一个示例中,参见图1-1示出的广告反作弊装置的一个可选的架构示意图,广告反作弊装置实施为广告反作弊系统(实际应用中可以服务器或服务器集群的形式实现,可选地,以云服务的形式提供广告反作弊业务),广告反作弊系统与广告系统连接,下面对广告系统进行说明。
广告系统根据广告主设定的投放广告的定向条件(如广告受众的年龄、地域、群体、消费能力等信息)向相应的用户的终端的广告位投放广告,并根据用户对广告的点击情况,对应形成每个统计时段(如一周)的点击日志,点击日志用于记录用户针对广告的点击的各种信息如点击量、点击时间等。
另外,对于每个统计时段,广告系统还统计形成曝光日志,示例性地,曝光日志包括用户所点击的广告所曝光的对象如应用、商品等。
此外,对于每个统计时段,广告系统对应每个应用还统计形成效果日志,示例性地,效果日志包括用户点击广告后所达到的针对广告的曝光对象实现的效果。
除此之外,对于每个统计时段,广告系统对应统计用户点击广告所使用的设备的信息,如设备的硬件信息和软件信息等。
广告反作弊系统从广告系统获取用户点击广告的点击日志、效果日志、曝光日志、以及用户的设备信息等,基于上述至少一种信息进行处理形成用于识别不同层级的作弊用户的模型,进而利用不同的模型识别出不同层级的作弊用户,还可对作弊用户点击广告的点击量进行过滤处理,以确保统计到的用户侧的广告点击量的准确性。
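In another example, referring to FIG. 1-2, the advertisement anti-cheat apparatus is coupled into the advertisement system shown in FIG. 1-1 as one of its functional modules. The apparatus obtains from the advertisement system the click logs, effect logs, and exposure logs of users' advertisement clicks, together with the users' device information, processes at least one of these kinds of information to form models for identifying cheating users at different levels, and then uses the different models to identify cheating users at the corresponding levels. It can also filter the clicks of the cheating users to ensure that the counted user-side advertisement clicks are accurate.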
需要指出的是,图1-1和图1-2示出的广告反作弊处理装置可选的架构仅仅是示意性的,实际应用中可以根据图1-1和图1-2示出的广告反作弊处理装置进行轻易变换而以不同的方式实施。
下面结合图1-1对广告反作弊系统针对低层级作弊用户、中层级作弊用户和高层级作弊用户的识别进行说明,对于基于图1-2示出的广告反作弊装置对广告反作弊系统针对低层级作弊用户、中层级作弊用户和高层级作弊用户的识别,可以参照以下的记载而实施。
一、识别低层级作弊用户
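In some embodiments, referring to FIG. 2, an optional schematic diagram of how the advertisement anti-cheat apparatus identifies low-level cheating users, low-level cheating users are identified by online real-time penalty together with offline delayed re-judgment. For example, online real-time penalty includes a blacklist strategy and a statistical strategy, and offline delayed re-judgment includes a statistical strategy. Each is described below.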
1)线上实时判罚
1.1)黑名单策略
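As one example of online real-time penalty, the advertisement anti-cheat system maintains in advance a blacklist containing the identifiers of low-level cheating users. The system extracts the identifier of the user currently clicking an advertisement from the click log obtained in real time from the advertisement system and matches it against the identifiers in the blacklist. If the match succeeds, the user currently clicking the advertisement is determined to be a low-level cheating user.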
示例性地,低层级作弊用户的标识采用唯一区分用户的信息,如用户的手机号码、社交平台账号(如微信账号、QQ账号)等,当然低层级作弊用户的标识的类型不限于此,还可以采用网际协议(IP)地址、介质接入(MAC)地址等任意类型的标识。可选地,为了保证识别低层级作弊用户的准确性,可以将上述的标识的两种或多种结合使用来标定低层级作弊用户。
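The blacklist strategy described above can be sketched as follows. This is only a minimal illustration and not the applicant's implementation: the `ClickRecord` fields, the `is_blacklisted` helper, and the choice of a phone number plus an IP address as identifiers are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClickRecord:
    """One click-log entry; the field names are hypothetical."""
    phone: str
    ip: str
    ad_id: str


def is_blacklisted(record, blacklist, require_all=False):
    """Match a click's identifiers against blacklisted identifiers.

    blacklist maps an identifier type ("phone", "ip") to a set of values.
    With require_all=True every identifier type in the blacklist must match,
    which models combining two or more identifiers to mark a user.
    """
    hits = [getattr(record, kind) in values for kind, values in blacklist.items()]
    if require_all:
        return bool(hits) and all(hits)
    return any(hits)


blacklist = {"phone": {"13800000000"}, "ip": {"10.0.0.7"}}
click = ClickRecord(phone="13800000000", ip="192.168.1.5", ad_id="ad-1")
print(is_blacklisted(click, blacklist))                    # one identifier matches
print(is_blacklisted(click, blacklist, require_all=True))  # both must match
```

Requiring every identifier to match trades recall for precision, which corresponds to the option of combining two or more identifiers to improve accuracy.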
1.2)统计型策略
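As another example of online real-time penalty, the advertisement anti-cheat system uses the click log obtained from the advertisement system to count how many times a user clicks advertisements within a statistical period (for example 5 minutes or 1 hour, set as needed in practice). When the count exceeds a click threshold, the user is identified as a low-level cheating user. The system filters (penalizes) the clicks of low-level cheating users and feeds the result back to the advertisement system, so that the click counts used by the advertisement system are not made inaccurate by those users' clicks.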
作为对低层级作弊用户的点击量进行过滤的一个示例,对超过点击量阈值后的点击量按照预定比例过滤,超出点击量阈值的点击量越多,则过滤比例越大。
例如,设用户的点击量为a,点击量阈值为b,当a大于b时,对超出点击量阈值的点击量(a-b)按照(a-b)取值空间与过滤比例的对应关系选择相应的过滤比例进行过滤,(a-b)取值空间与过滤比例的对应关系的一个示例如表1所示,
Excess clicks (a-b)    1000    2000
Filtering ratio        50%     80%
Table 1
从表1中可以看出,超出点击量阈值的点击量越多,则相应的过滤比例越大,从而最大程度减少广告的点击量由低层级作弊用户产生的点击量。
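The tiered filtering of Table 1 can be sketched as below. The text does not say how the value ranges of (a-b) are bounded, so this sketch assumes that each column of Table 1 is an upper bound of a range and that any excess beyond the last bound uses the last ratio. The function name and the rounding are also assumptions.

```python
from bisect import bisect_left

# (upper bound of excess clicks, filtering ratio), taken from Table 1.
# Reading each column as an upper bound is an assumption of this sketch.
TIERS = [(1000, 0.5), (2000, 0.8)]


def filter_excess_clicks(clicks, threshold, tiers=TIERS):
    """Return the click count kept after online filtering.

    Clicks up to `threshold` are untouched here; only the excess (a - b)
    is reduced by the ratio of the tier it falls in.
    """
    excess = clicks - threshold
    if excess <= 0:
        return clicks
    bounds = [bound for bound, _ in tiers]
    index = min(bisect_left(bounds, excess), len(tiers) - 1)
    ratio = tiers[index][1]
    return threshold + round(excess * (1 - ratio))


print(filter_excess_clicks(520, 20))   # excess 500 falls in the 50% tier
print(filter_excess_clicks(1520, 20))  # excess 1500 falls in the 80% tier
```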
2)线下延迟重判
在一些实施例中,为了进一步减小广告的点击量中由低层级作弊用户产生的点击量,广告反作弊系统还采用延迟重判的方式。
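For example, the advertisement anti-cheat system uses the click log obtained from the advertisement system to count how many times a user clicks advertisements within a statistical period (a set interval such as 5 minutes or 1 hour, chosen as needed in practice). When the count exceeds the click threshold, the user is identified as a low-level cheating user. The portion of that user's clicks that does not exceed the click threshold is then filtered at a predetermined ratio, or filtered entirely, in which case this portion of the low-level cheating user's clicks is cleared to zero.
The predetermined ratio used in offline delayed re-judgment may be fixed, or it may be determined dynamically for each low-level cheating user from a positive correlation (for example, direct proportion) with the user's clicks in the statistical period: the more clicks the user makes in the statistical period, the larger the ratio applied to the portion of the user's clicks that does not exceed the click threshold.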
假设是1小时内的点击超过20次后的点击量开始过滤,延迟重判是对前20次没有超过阈值的点击量过滤,并不会再处理超过点击量阈值的部分点击量。同时,对前20次没有超过点击量阈值(20)的点击量过滤的比例基于用户在这1小时的点击量确定。假设用户A在1小时内点击了21次,那么对前20次的过滤的比例,低于用户B在1小时内点击了100次时对用户B的前20次点击的过滤比例。
例如,设用户的点击量为a,点击量阈值为b,当a大于b时,对点击量a未超出点击量阈值的点击量也就是点击量b按照预定比例(如70%)进行过滤处理,则用户的点击量为b*(1-70%),或者将点击量b全部过滤,则用户的点击量未超出点击量阈值的点击量b被清零。
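A minimal sketch of the delayed re-judgment follows, covering both the fixed ratio (70% in the example above) and a dynamically determined ratio. The linear rule `ratio_per_click * clicks`, its cap at 100%, and the function name are assumptions; the text only requires the ratio to grow with the user's click count.

```python
def rejudge_kept_clicks(clicks, threshold, fixed_ratio=0.7, ratio_per_click=None):
    """Offline delayed re-judgment of the clicks that did not exceed the threshold.

    Returns how many of the first `threshold` clicks are kept. Users whose
    clicks never exceeded the threshold were not flagged, so nothing is filtered.
    """
    if clicks <= threshold:
        return clicks
    if ratio_per_click is None:
        ratio = fixed_ratio
    else:
        # Assumed dynamic rule: ratio proportional to the click count, capped at 100%.
        ratio = min(1.0, ratio_per_click * clicks)
    return round(threshold * (1 - ratio))


print(rejudge_kept_clicks(100, 20))                   # fixed 70% ratio
print(rejudge_kept_clicks(100, 20, fixed_ratio=1.0))  # cleared to zero
# User A (21 clicks) keeps more of the first 20 clicks than user B (100 clicks).
print(rejudge_kept_clicks(21, 20, ratio_per_click=0.005),
      rejudge_kept_clicks(100, 20, ratio_per_click=0.005))
```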
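Combining the real-time penalty and delayed re-judgment for low-level cheating users shown in FIG. 2: the portion of a low-level cheating user's clicks that exceeds the click threshold is penalized in real time (filtered by ratio), and the portion that does not exceed the threshold is re-judged offline with a delay (filtered at a fixed or dynamically adjusted predetermined ratio). This minimizes the low-level cheating users' share of the advertisement click count, ensures that the click count finally recorded by the advertisement system is accurate and reliable, and ensures that accurate billing data are generated for the advertisers' placements.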
二、识别中层级作弊用户
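Referring to FIG. 3-1, an optional schematic flowchart of identifying middle-level cheating users in an embodiment of this application, the procedure includes steps 101 to 106, which are described below.
In the embodiments of this application, the advertisement anti-cheat system uses a middle-level cheating user identification model to identify middle-level cheating users among the users. To do so, the system needs to form usable samples to train the model so that its identification accuracy reaches a preset usable accuracy.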
在一些实施例中,广告反作弊系统从广告任务平台获取样本集合(步骤101)用以形成对中层级作弊用户识别模型进行训练的样本。样本集合包括与中层级作弊用户对应的样本,样本中的一个可选的数据结构如表2所示:
Sample      User identifier                 Click log
Sample 1    Middle-level cheating user 1    Click log 1
Sample 2    Middle-level cheating user 2    Click log 2
Table 2
参见表2,样本集合中的样本包括至少一个中层级作弊用户以及中层级作弊用户在统计时段(如一周)的点击日志,示例性地,点击日志包括中层级作弊用户点击广告的操作数据,如每次点击广告的ID、点击的时间等。
实际应用中,由于广告任务平台的用户与中层级作弊用户的点击行为最接近,因此可以将广告任务平台中完成广告任务的平台用户视为中层级作弊用户,相应地,从广告任务平台获取广告任务平台用户完成广告任务时所对应的点击日志形成样本集合。
继续对广告反作弊系统获取样本集合的处理进行说明,前述的与中层级作弊用户对应的样本用于供广告反作弊系统形成对中层级作弊用户识别模型进行训练的正样本,为了进一步提升中层级作弊用户识别模型识别中层级作弊用户的精度,在另一些实施例中,广告反作弊系统获取的样本集合中还包括与非作弊用户对应的样本,用于供广告反作弊系统形成用以训练中层级作弊用户识别模型的负样本,示例性地,非作弊用户对应的样本包括:正常应用(也就是已知未存在作弊用户的应用)的用户也即非作弊用户、以及用户在使用正常应用的过程中在应用的广告位中点击广告所对应的点击日志,与非作弊用户对应的样本的一个可选的数据结构如表3所示:
Figure PCTCN2017085687-appb-000001
表3
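Table 3 shows an optional data structure of the samples corresponding to non-cheating users. In Table 3, taking application 1 as a normal application, user 3 and user 4 have both installed application 1 on their terminals and have both clicked advertisements in its advertisement slots. Accordingly, the advertisement anti-cheat system forms, from the click logs obtained from the advertisement system, samples for each non-cheating user of application 1 (user 3 and user 4).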
接续对前述步骤进行说明,在广告反作弊系统获取到样本集合后,解析样本集合中的点击日志对应用户点击广告的操作数据,从操作数据中提取得到与用户点击广告的操作相关联的特征(步骤102)。
如前,在一些实施例中,当样本集合中仅包括与中层级作弊用户对应的样本时,则广告反作弊系统解析与中层级作弊用户对应样本中的点击日志,以确定与中层级作弊用户点击广告的操作关联的特征。在另一些实施例中,当样本集合中还包括与非作弊用户对应的样本时,则广告反作弊系统还解析与非作弊用户对应样本中的点击日志,以确定与非作弊用户点击广告的操作关联的特征。
在一些实施例中,与用户(中层级作弊用户或非作弊用户)点击广告的操作相关联的特征包括以下至少一个维度的特征:
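1) The user's click count in the statistical period.
For example, the click count is the total number of times the user clicks advertisements in any advertisement slot, such as slots in pages or in applications, during the statistical period.
For instance, if the user clicks three advertisements, advertisement 1, advertisement 2, and advertisement 3, once, twice, and three times respectively during the statistical period, the user's click count for the period is 6 (1+2+3).
Alternatively, the click count is the total number of times the user clicks the same advertisement during the statistical period. If the user clicks advertisement 1, advertisement 2, and advertisement 3 once, twice, and three times respectively in the first time cycle of the statistical period, and again once, twice, and three times respectively in the second time cycle, the user's click counts for advertisement 1, advertisement 2, and advertisement 3 in the statistical period are 2 (1+1), 4 (2+2), and 6 (3+3).
2) The number of time cycles in the statistical period in which the user clicked advertisements.
For example, this is the number of time cycles that contain at least one advertisement click by the user.
Taking a statistical period of one day and a time cycle of one hour, if the user clicked advertisements in the 1st, 2nd, 4th, and 5th hours of the day, the number of time cycles with clicks in that statistical period is 4.
3) The average interval between the user's advertisement clicks in the statistical period.
If the user clicked advertisements at times T1, T2, and T3 in the statistical period, the average is (T2-T1)/2+(T3-T2)/2.
4) The historical proportion of cheating users identified in statistical periods.
This is the ratio, in any statistical period before the current one, of the number of identified middle-level cheating users to the number of users who clicked advertisements (both middle-level cheating users and non-cheating users). The historical proportion may also be the average of the ratios over several statistical periods before the current one.
5) The user's average click count per time cycle in which the user clicked advertisements during the statistical period.
Again taking a statistical period of one day and a time cycle of one hour, suppose the user clicked advertisements in the 1st, 2nd, 4th, and 5th hours of the day, 1, 2, 4, and 5 times respectively. The number of time cycles with clicks is 4, the click count for the statistical period is 12 (1+2+4+5), and the average click count over the 4 time cycles is 3 (12/4).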
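Four of the five feature dimensions above can be computed directly from a user's click times, as in the sketch below. It assumes the click log has already been reduced to a list of timestamps in seconds from the start of the statistical period; the function name and dictionary keys are hypothetical. The historical cheating proportion (feature 4) is left out because it depends on earlier identification results rather than on the current log.

```python
def click_features(timestamps, cycle_seconds=3600):
    """Compute per-user features from click times within one statistical period.

    timestamps: click times in seconds from the start of the period.
    cycle_seconds: length of one time cycle (one hour by default).
    """
    ts = sorted(timestamps)
    clicks = len(ts)
    # Distinct time cycles that contain at least one click.
    active_cycles = len({int(t // cycle_seconds) for t in ts})
    # Gaps between consecutive clicks.
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    return {
        "clicks": clicks,
        "active_cycles": active_cycles,
        "mean_interval": sum(gaps) / len(gaps) if gaps else 0.0,
        "clicks_per_active_cycle": clicks / active_cycles if active_cycles else 0.0,
    }


# Hours 1, 2, 4 and 5 of a day with 1, 2, 4 and 5 clicks respectively.
clicks_per_hour = {1: 1, 2: 2, 4: 4, 5: 5}
timestamps = [hour * 3600 + i for hour, n in clicks_per_hour.items() for i in range(n)]
features = click_features(timestamps)
print(features["clicks"], features["active_cycles"], features["clicks_per_active_cycle"])
# prints: 12 4 3.0
```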
需要指出的是,在本申请实施例中使用的与用户点击广告的操作相关联的特征不仅限于以上所示,本领域的技术人员可以轻易对上述与用户点击广告的操作相关联的特征进行变形或延伸,从而实施出不同于上述与用户点击广告的操作相关联的特征。
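Continuing with the steps above: after the advertisement anti-cheat system extracts features of at least one dimension from the samples corresponding to middle-level cheating users, it can form positive samples for training the middle-level cheating user identification model. For example, the system labels the cheating users, together with the features of their click logs in the at least one dimension, as positive samples (step 103).
In some embodiments, if the system also extracts features of at least one dimension from the samples corresponding to non-cheating users, it can form negative samples for training the model. For example, referring to FIG. 3-2, an optional schematic flowchart of identifying middle-level cheating users in an embodiment of this application, the system labels the non-cheating users, together with the features of their click logs in the at least one dimension, as negative samples (step 107).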
接续对前述步骤进行说明,当广告反作弊系统形成用于训练中层级作弊用户识别模型的正样本后,将正样本输入中层级作弊用户识别模型以对中层级作弊用户识别模型的模型参数进行训练(步骤104)。在一些实施例中,若广告反作弊系统还形成了用于训练中层级作弊用户识别模型的负样本,则将负样本连同正样本共同输入待训练的中层级作弊用户识别模型,以提升中层级作弊用户识别模型的识别精度,缩短训练过程。
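Training of the middle-level cheating user identification model with samples (positive and negative) is described below. The model can be regarded as a mapping, formed by a series of functions, from the extracted features of at least one dimension to the identification result for a user (whether the user is a middle-level cheating user). An optional example is:
identification result = f(a*feature 1 + b*feature 2);
where feature 1 and feature 2 are features of a training sample (either a positive or a negative sample), and the model parameters a and b control the weights of feature 1 and feature 2. Training the model is the process of repeatedly optimizing the parameters a and b. In practice there may be two or more model parameters, and the number of features used is not limited.
In one embodiment, to verify whether the model's identification accuracy meets practical requirements, the advertisement anti-cheat system can use a prior database (containing cheating users, non-cheating users, and click-log features) to test the model's accuracy in identifying cheating users (that is, its correct rate). If the accuracy has not reached the preset accuracy, the model parameters are adjusted until it does.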
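The train-then-test loop described above can be illustrated with a small logistic model. This is a sketch under stated assumptions: f is taken to be the sigmoid function, a bias term is added to the weights a and b, parameters are adjusted by plain stochastic gradient descent, and the toy samples and the 0.9 target accuracy are invented for the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, features):
    """f(a*feature1 + b*feature2 + bias) with f chosen as the sigmoid."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


def accuracy(weights, bias, samples):
    """Share of samples whose predicted label (score >= 0.5) is correct."""
    correct = sum((predict(weights, bias, x) >= 0.5) == bool(y) for x, y in samples)
    return correct / len(samples)


def train(samples, holdout, target_accuracy=0.9, lr=0.1, max_epochs=1000):
    """Adjust the parameters until the preset accuracy is reached on `holdout`."""
    weights, bias = [0.0] * len(samples[0][0]), 0.0
    for _ in range(max_epochs):
        if accuracy(weights, bias, holdout) >= target_accuracy:
            break
        for x, y in samples:
            error = predict(weights, bias, x) - y
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


# Toy data: (scaled click count, scaled active hours) -> 1 for cheating, 0 for normal.
samples = [((0.9, 0.8), 1), ((0.8, 0.9), 1), ((0.1, 0.2), 0), ((0.2, 0.1), 0)]
weights, bias = train(samples, holdout=samples)
print(accuracy(weights, bias, samples))
```

In practice the holdout set would be the prior database mentioned above rather than the training samples themselves.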
接续对前述的步骤进行说明,在广告反作弊系统训练中层级作弊用户识别模型之后,则可以利用训练后的中层级作弊用户识别模型识别中层级作弊用户。广告反作弊系统从广告系统获取待识别的样本(步骤105),待识别的样本数据结构可以参照前述表2和表3,包括待识别用户以及待识别用户的点击日志,广告反作弊系统从待识别样本中提取对应前述至少一个维度的特征,输入训练后的作弊用户识别模型,基于中层级作弊用户识别模型输出的识别结果(是否为中层级作弊用户)确定待识别的样本中的中层级作弊用户(步骤106)。
在一些实施例中,参见图3-2,当广告反作弊系统从待识别样本中识别出中层级作弊用户后,还对中层级作弊用户的点击量进行过滤(步骤108),并将过滤后的中层级作弊用户的点击量更新至广告系统(步骤109),使广告系统的计费端利用更新后的广告的点击量结合计费策略进行广告投放的计费,由于在广告的点击量已经对中层级作弊用户的点击量进行了过滤,确保了广告的点击量是由用户的常规点击操作形成的,保证广告点击量的准确性和真实性,避免了对广告主的广告计费不准确的问题。
示例性地,广告反作弊系统对中层级作弊用户的点击量进行过滤时有多种方式,以下结合不同过滤方式进行说明。
过滤方式1)按照预定比例对中层级作弊用户的点击量进行过滤,以中层级作弊用户的点击量为a,预定比例为70%为例,则过滤后中层级作弊用户的点击量被更新为a*30%,特别地,当预定比例为100%时,中层级作弊用户的点击量被清零。
过滤方式2)将中层级作弊用户的点击量中未超出点击量阈值的点击量按照比例进行过滤,或者全部过滤也就是将中层级作弊用户的点击量清零;将中层级作弊用户的点击量中超过点击量阈值后的点击量按照预定比例过滤,超出点击量阈值的点击量越多,则过滤比例越大。
再结合图4示出的训练中层级作弊用户识别模型、以及利用中层级作弊用户识别模型识别中层级作弊用户的一个可选的实现示意图,包括模型训练和模型使用两个阶段,下面分别进行说明。
1)模型训练
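Positive training samples come from the click logs of the advertisement task platform, and negative training samples come from the click logs of apps with normal functions (apps known to contain no cheating users).
Middle-level cheating users click advertisements intermittently and in a dispersed way over a long period.
Based on this characteristic, six features are extracted from the positive and negative samples: the click count in one week, the number of days in the week with advertisement clicks, the number of hours in the week with advertisement clicks, the weekly average time difference between adjacent clicks, the proportion of cheating identified online during the week, and the ratio of the week's click count to the number of hours with advertisement clicks. A logistic regression model is trained on these six features to determine whether a user is a middle-level cheating user.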
2)模型使用
训练得到逻辑斯蒂回归模型的模型参数之后,基于从广告系统获取的待识别用户的一周内点击广告的点击日志,提取待识别用户的一周内点击广告的点击日志的特征,并选取出如上的6个特征输入逻辑斯蒂回归模型,逻辑斯蒂回归(Logistic Regression)模型输出待识别用户是中层级作弊用户还是正常用户(非作弊用户)的作弊识别结果。
三、识别高层级作弊用户
发明人在实施本申请实施例的过程中发现,高层级作弊用户使用(如开发)特定的应用来产生虚假的流量,该特定应用本身并不具有为用户提供服务(如媒体服务、社交服务)的功能,仅仅是利用自身封装的程序模拟不同的用户来点击特定流量方的广告位中的广告,以产生虚假的流量,也就是说该特定应用是专用于产生虚假流量的应用,其中的用户全部是高层级作弊用户。一旦能够识别出一个应用是高层级作弊用户所使用的特定应用,则可将该特定应用中的全部用户都识别为高层级作弊用户。
另外,发明人在实施本申请实施例的过程中发现,高层级作弊用户在使用特定应用进行广告作弊的过程中,所模拟的点击广告的用户在很多维度的特征非常接近,也就是相关度很高,而正常用户(非作弊用户)在不同维度的特征则具有离散的特点,也就是相关度很低。
基于此,本申请实施例中识别高层级作弊用户时以应用为单位,对应用中的用户是否为高层级作弊用户进行整体的一次性识别:对待识别的应用中的全部用户在多个维度的相似程度进行判断,一旦相似度较高则将该待识别的应用识别为高层级作弊用户所使用的特定应用,相应地,将该识别样本应用中的全部用户识别为高层级作弊用户,下面结合流程图进行说明。
参见图5示出的本申请实施例中识别高层级作弊用户的一个可选的流程示意图,以下对各步骤进行说明。
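In the embodiments of this application, the advertisement anti-cheat system uses a high-level cheating user identification model to identify high-level cheating users, so it needs to form usable samples to train this model. As noted above, high-level cheating users are identified per application (whether the users of an application are high-level cheating users is decided in one pass). Accordingly, the system acquires a sample set made up of per-application samples, called application samples for short (step 201). Each application sample in the set corresponds to one application, and at least one application sample corresponds to an application known to contain high-level cheating users, which the system uses to form positive samples for training the model. Optionally, the sample set may also include application samples for applications in which the presence of high-level cheating users is unknown; these are called unlabeled application samples.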
在一些实施例中,应用样本中包括与应用对应的各种信息,应用样本的一个可选的数据结构如表4所示:
Figure PCTCN2017085687-appb-000002
表4
如表4所示,示例性地,每个应用样本与一个应用对应,包括所对应应用的以下信息至少之一:
1)应用的每个用户在应用中点击广告的点击日志。
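The click log records, from different aspects, information about each user's advertisement clicks in the application. For example, the click log includes the following information: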
1.1)用户在统计时段在应用的广告位中点击的广告。
用户点击的广告以广告系统侧为广告分配的序列号(ID)来区分,或者,以广告系统侧的为广告分配的类别标签来区分。
示例性地,用户在统计时间段点击的广告,可以为用户在应用的所有广告位中的点击的广告的记录,如采用广告1、广告2、广告3这样的形式记录。
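Alternatively, the advertisements the user clicked in the statistical period are recorded per advertisement slot of the application, for example: slot 1 - advertisement 1 - advertisement 2, slot 2 - advertisement 3 - advertisement 4.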
1.2)用户在统计时段内在应用的广告位所点击广告的点击量。
示例性地,用户在统计时段内在应用的广告位中点击广告的点击量,为用户在统计时段内在应用的广告位点击广告的次数的总量。
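For example, if in the statistical period (such as one week) the user clicks three advertisements, advertisement 1, advertisement 2, and advertisement 3, in the application's advertisement slots 2, 3, and 4 times respectively, the click count for the statistical period is 9 (2+3+4).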
示例性地,用户在统计时段内在应用的广告位点击广告的点击量,还可以是用户在统计时段内在应用的广告位点击同一广告的次数的总量,又或者,为用户在统计时段(如一周)的各个时间周期(小于统计时段,如一天或一小时)内在应用的广告位点击同一广告的次数的总量。
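For example, if the user clicks advertisement 1, advertisement 2, and advertisement 3 in the application's advertisement slots 2, 3, and 4 times respectively in the first time cycle of the statistical period, and again 2, 3, and 4 times respectively in the second time cycle, the user's click counts for advertisement 1, advertisement 2, and advertisement 3 in the statistical period are 4 (2+2), 6 (3+3), and 8 (4+4).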
1.3)用户在统计时段内在应用中的广告位点击广告的时间。
示例性地,用户在统计时段内在应用中的广告位点击广告的时间,为用户在统计时段内在应用的广告位点击广告的总的时长。
例如,假设应用中具有广告位1和广告位2两个广告位,在统计时段内,用户在广告位1中点击广告的时长为T1,用户在广告位2中点击广告的时长为T2,则用户在统计时段内在应用的广告位点击广告的总的时长为T1+T2。
或者,用户在统计时段内在应用中的广告位点击广告的时间,也可以为用户在每个广告位点击广告的时长,如前述的用户在广告位1点击广告的时长T1,以及用户在广告位2点击广告的时长T2。
1.4)用户在统计时段内点击广告的广告位的类型。
以应用中的广告位为例,广告位的类型包括:
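Splash advertisement slot: the position in the application's interface used to display an advertisement after the application's launch screen appears and before the application's content is loaded.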
插屏广告位,在应用的内容加载的过程在应用的界面中插入广告的位置。
Banner广告位,应用中用户停留较久(停留时间超出停留时间阈值)的页面,或应用中用户访问比较频繁的页面中用于呈现广告的位置,如页面的边缘(顶部区域、底部区域等)。
2)应用的每个用户在应用的广告位中点击广告的曝光日志。
曝光日志用以记录应用的每个用户在应用的广告位点击的广告所曝光的对象,如应用的名称、商品的名称、页面的地址等。
3)应用的每个用户在应用中点击广告的效果日志。
如前,效果日志包括应用中的每个用户点击广告后针对广告的曝光对象所达到的广告效果。
以广告的曝光对象为应用为例,广告效果可以为以下之一:用户开始下载应用;应用下载完成;应用在用户的设备安装;应用在用户的设备激活使用;用户在用户的设备中删除了应用。
再以广告的曝光对象为在线销售的商品为例,效果日志中记录的针对广告的广告效果可以为:用户针对商品下订单;用户支付订单;用户撤销订单。
4)应用的每个用户所使用的设备的信息。
示例性地,设备的信息可以为设备的硬件信息如设备的型号、设备剩余空间、设备的剩余电量等。
当然,设备的软件信息可以为设备所使用的通信运营商、设备使用的操作系统(类型和型号)和设备的联网方式等信息。另外设备的信息还可以是设备的位置(如经纬度)等、设备的移动速度等信息。
需要指出的是,在本申请实施例中使用的与应用样本所包括的信息不仅限于以上所示,本领域的技术人员可以轻易对上述应用样本包括的信息进行变形或延伸,从而实施出不同于上述应用样本所包括的信息,这里不再一一说明。
接续对前述步骤进行说明,在广告反作弊系统获取到样本集合后,对于每个应用样本,广告反作弊系统解析出应用样本中任意两个用户在至少一个维度的特征的相关度(步骤202),特征所采用的维度根据应用样本中所包括的信息的类型选取,以下对不同维度的特征的相关度举例说明。
在一些实施例中,可以采用如下维度的特征的相关度:
1)应用中任意两个用户在应用中点击广告的操作的特征的相关度。
用户在应用的广告位中点击广告的特征可以采用如用户在应用中点击的位置(或频率)、下载广告所曝光应用的次数和访问广告所曝光网页的次数等。
2)应用中任意两个用户在应用样本中点击的广告所曝光的对象的相关度。
3)应用中任意两个用户点击广告所使用的设备的信息的相关度。
应用的用户所使用设备的相关度可以采用硬件信息、软件信息,设备的位置、设备的移动速度等维度的相关度。
以硬件信息的相关度为例,可以采用用户使用的设备在设备剩余空间、设备的剩余电量等方面的差值的相关度。
4)应用中任意两个用户点击所点击广告的广告效果的相关度。
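For the application samples in the sample set, if an application is known to contain high-level cheating users, the similarity of any two of its users in the above dimensions is taken as 100%. If it is unknown whether an application contains high-level cheating users, the similarity of any two of its users in the above dimensions is taken as 0%.
For example, given an app known to contain high-level cheating users, each combination of two users in the app, together with their similarity, is used as a positive sample, and the pairwise similarity between users in this app is always 100%. If the app has four users A, B, C, and D, there are six positive samples: (A,B: 100%; A,C: 100%; A,D: 100%; B,C: 100%; B,D: 100%; C,D: 100%).
For an app in which the presence of high-level cheating users is unknown, each pair of users in the app, together with their similarity, is used as an unlabeled sample, and the similarity between any two users in this app is always 0%. If the app has four users A, B, C, and D, there are six unlabeled samples: (A,B: 0%; A,C: 0%; A,D: 0%; B,C: 0%; B,D: 0%; C,D: 0%).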
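The pairwise samples in the two examples above can be generated mechanically. The sketch below is illustrative only; the function name and the tuple layout of a sample are assumptions.

```python
from itertools import combinations


def pairwise_samples(users, similarity):
    """Pair every two users of one app and attach the initial similarity label.

    similarity is 1.0 for an app known to contain high-level cheating users
    and 0.0 for an app whose status is unknown (an unlabeled app).
    """
    return [((u, v), similarity) for u, v in combinations(users, 2)]


positive = pairwise_samples(["A", "B", "C", "D"], 1.0)
unlabeled = pairwise_samples(["A", "B", "C", "D"], 0.0)
print(len(positive), positive[0])
```

An app with n users yields n*(n-1)/2 samples, which is 6 for the four users in the example.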
接续对前述步骤进行说明,对于每个样本应用,在广告反作弊系统解析出任意样本应用的任意两个用户在至少一个维度的特征的相关度之后,将已知包括有高层级作弊用户的应用样本、以及应用样本任意两个用户对应至少一个维度的相关度标记为正样本(步骤203),将正样本输入作弊用户识别模型以对作弊用户识别模型中的模型参数进行训练(步骤204)。
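In one embodiment, the advertisement anti-cheat system also uses any two users in each unlabeled application sample of the sample set, together with the similarity (0%) of those two users in the above dimensions, to form unlabeled samples for training the high-level cheating user identification model. The unlabeled samples are input into the model together with the positive samples (step 210). Using the model, unlabeled samples are iteratively selected and labeled as positive to increase the number of positive samples. When the number of application samples labeled as positive in the sample set is stable (it no longer increases after several iterations), the remaining unlabeled application samples in the sample set are labeled as negative samples, in which the correlation of any two users is 0%.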
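The high-level cheating user identification model can be regarded as formed by a series of functions whose purpose is to map an input application sample to the average correlation of that application sample. An optional example is:
average correlation of the application sample = f(c*feature 3 + d*feature 4);
where feature 3 and feature 4 are features of a training sample (either a positive or a negative sample), and the model parameters c and d control the weights of feature 3 and feature 4. Training the model is the process of repeatedly optimizing the parameters c and d so that the output average similarity becomes more accurate. In practice there may be two or more model parameters, and the number of features used is not limited.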
接续对前述步骤进行说明,在广告反作弊系统对高层级作弊用户识别模型训练完成后,将待识别应用上述至少一个维度的特征输入高层级作弊用户识别模型(步骤205),获取作弊用户识别模型输出的待识别应用中的用户与至少一个维度对应的相关度,将任意两个用户在至少一个维度的特征的相关度取平均值,得到应用样本与至少一个维度对应的平均相关度(步骤206)。
以应用1的用户包括用户1、用户2和用户3为例,在设备信息相似度维度任意两个用户的相似度设为s1、s2和s3,则应用1在设备信息相似维度的平均相似度为(s1+s2+s3)/3。
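High-level cheating users are identified on the basis of the average correlation (step 207): the average correlation is compared with an average correlation threshold. If the output average correlation is higher than the threshold, the features of the users in the application to be identified are extremely close; the application is judged to be one used by high-level cheating users for advertisement cheating, and all users in it are identified as high-level cheating users. Whether the users of the application to be identified are high-level cheating users is thus decided efficiently in one pass.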
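Steps 206 and 207 amount to averaging the pairwise scores of one app and comparing the result with a threshold, as sketched below. The pairwise scores would come from the trained model; here they are hypothetical values in a lookup table, and the function names and the 0.85 threshold are assumptions.

```python
from itertools import combinations
from statistics import mean


def average_similarity(users, pair_similarity):
    """Mean similarity over every unordered pair of users in one app."""
    scores = [pair_similarity(u, v) for u, v in combinations(users, 2)]
    return mean(scores) if scores else 0.0


def is_cheating_app(users, pair_similarity, threshold):
    """Flag the whole app when its average similarity exceeds the threshold."""
    return average_similarity(users, pair_similarity) > threshold


# Hypothetical similarities s1, s2, s3 for the three user pairs of application 1.
scores = {
    frozenset({"user1", "user2"}): 0.9,
    frozenset({"user1", "user3"}): 0.8,
    frozenset({"user2", "user3"}): 1.0,
}


def lookup(u, v):
    return scores[frozenset({u, v})]


users = ["user1", "user2", "user3"]
print(average_similarity(users, lookup))               # (s1 + s2 + s3) / 3
print(is_cheating_app(users, lookup, threshold=0.85))
```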
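In some embodiments, after the advertisement anti-cheat system identifies high-level cheating users in the sample to be identified, it also filters their clicks (step 208) and updates the filtered click counts to the advertisement system (step 209), so that the billing side of the advertisement system bills the advertisement placement using the updated click counts together with the billing policy. Because the clicks of high-level cheating users have already been filtered out of the advertisement click count, the count reflects the ordinary click operations of users, which ensures that it is accurate and genuine and prevents the clicks generated by high-level cheating users from affecting the accuracy of the billing data for advertisers' placements.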
示例性地,广告反作弊系统对高层级作弊用户的点击量进行过滤时有多种方式,例如,按照预定比例对高层级作弊用户的点击量进行过滤,以高层级作弊用户的点击量为a,预定比例为70%为例,则过滤后高层级作弊用户的点击量被更新为a*30%,特别地,当预定比例为100%时,高层级作弊用户的点击量被清零。
再结合图6示出的训练高层级作弊用户识别模型、以及利用高层级作弊用户识别模型识别高层级作弊用户的一个可选的实现示意图。
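High-level cheating users are a user group that forges cheating apps and uses them for advertisement cheating, so high-level cheating users are typically concentrated in cheating apps. Ordinary apps (such as social apps) contain no high-level cheating users, whereas all users in a cheating app are high-level cheating users. Because a single cheating user does not click many times, identification has to rely on the feature correlation of the cheating user group. The most obvious characteristic of an app containing a group of high-level cheating users is that the users in the app are highly similar in device information and in their exposure, click, and effect features. Based on this characteristic, all device information and all exposure logs, click logs, and effect logs of the users in the app to be identified are combined, and features in different dimensions are extracted to compute the feature similarity between users. Whether the app contains a group of high-level cheating users is then judged by comparing the average similarity of the users in the app with a preset average similarity threshold, which can be derived from the average feature similarity between users of ordinary apps.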
在计算任意两个用户之间的相似度时,使用的特征如下:
设备信息相关的特征:两个用户设备型号的相似度、设备剩余空间差值、经纬度相似度、运营商相似度、联网方式相似度等特征;
曝光、点击和效果相关的特征:两个用户曝光APP的相似度、曝光次数的差值、点击APP的相似度、点击次数的差值、点击坐标的相似度、下载APP的相似度、下载次数的差值等特征。
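The device-related pair features listed above could be derived as in the following sketch. The `Device` fields, the feature names, and the use of a plain coordinate difference for position similarity are assumptions for illustration; the source does not define how each similarity is measured.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Device:
    """Device information of one user; the field names are hypothetical."""
    model: str
    free_space_mb: int
    latitude: float
    longitude: float
    carrier: str
    network: str


def device_pair_features(a, b):
    """Build the device-related features for one pair of users."""
    return {
        "same_model": float(a.model == b.model),
        "free_space_diff_mb": abs(a.free_space_mb - b.free_space_mb),
        # Plain difference in degrees, not a geodesic distance.
        "position_diff_deg": hypot(a.latitude - b.latitude, a.longitude - b.longitude),
        "same_carrier": float(a.carrier == b.carrier),
        "same_network": float(a.network == b.network),
    }


d1 = Device("X1", 1000, 22.5, 114.0, "op-a", "wifi")
d2 = Device("X1", 1010, 22.5, 114.0, "op-a", "4g")
pair = device_pair_features(d1, d2)
print(pair)
```

Exposure, click, and effect features for a pair of users could be built the same way from the corresponding logs.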
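Based on the above features, a gradient boosting regression tree (Gradient Boosting Regression Tree) model is trained to compute the average similarity, in at least one dimension, of the users of the application to be identified.
The positive samples for the initial training of the gradient boosting regression tree model come from the data (exposure logs, click logs, effect logs, and user device information) of the apps in the sample set that are known to contain high-level cheating users; the unlabeled samples for the initial training come from the remaining apps in the sample set. Positive-unlabeled learning (Positive-Unlabeled Learning) is used to increase the number of positive samples iteratively. Once the training result is stable, that is, once the number of positive samples in the sample set is stable, the remaining unlabeled samples in the sample set are treated as negative samples. The gradient boosting regression tree model is then trained with the positive and negative samples.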
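The iterative positive-unlabeled labeling can be sketched as below. To stay short and dependency-free, the sketch swaps the gradient boosting regression tree for a much simpler scorer, the distance to the centroid of the current positive samples; that substitution, the `radius` parameter, and the toy vectors are assumptions, so the block only illustrates the loop "promote unlabeled samples until the positive set is stable, then treat the rest as negative".

```python
def centroid(vectors):
    """Component-wise mean of equal-length feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def distance(u, v):
    """Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5


def pu_label(positives, unlabeled, radius):
    """Promote unlabeled samples that look positive until nothing changes.

    Returns (positives, negatives): the samples left unlabeled once the
    positive set is stable are treated as negative samples.
    """
    positives, unlabeled = list(positives), list(unlabeled)
    while True:
        center = centroid(positives)  # stand-in for retraining the model
        promoted = [x for x in unlabeled if distance(x, center) <= radius]
        if not promoted:
            return positives, unlabeled
        positives += promoted
        unlabeled = [x for x in unlabeled if distance(x, center) > radius]


known_positive = [[1.0, 1.0], [0.9, 1.0]]
unknown = [[0.95, 0.9], [0.1, 0.0], [0.0, 0.2]]
positives, negatives = pu_label(known_positive, unknown, radius=0.3)
print(len(positives), len(negatives))
```

The loop terminates because every round either promotes at least one sample, shrinking the unlabeled set, or stops.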
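The trained model is used to compute the similarity between users of the application to be identified, and whether that application contains a group of high-level cheating users is judged from the average similarity of its users. The identification results for applications to be identified can be added to the sample set to keep accumulating training samples, so that the gradient boosting regression tree model is corrected automatically.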
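In some embodiments, because the advertisement anti-cheat apparatus needs to identify cheating users at different levels comprehensively, referring to FIG. 7, an optional schematic flowchart of cheating user identification by the advertisement anti-cheat system, there are two main procedures: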
1)线上实时判罚:黑名单策略过滤黑名单中用户的点击;
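A blacklist containing the identifiers of low-level cheating users is maintained in advance. The identifier of the user currently clicking an advertisement is extracted from the click log obtained in real time from the advertisement system and matched against the identifiers in the blacklist. If the match succeeds, the user is determined to be a low-level cheating user and the user's clicks are filtered.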
2)线下延迟重判:
2.1)如前第一部分章节,广告反作弊系统利用统计型策略对低层级作弊用户的点击量中未超出点击量阈值的部分进行过滤。
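2.2) As in Part 2 above, the advertisement anti-cheat system uses the middle-level cheating user identification strategy to identify middle-level cheating users and filters their clicks.
2.3) As in Part 3 above, the advertisement anti-cheat system uses the high-level cheating user identification strategy to identify high-level cheating users and filters their clicks.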
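As FIG. 7 shows, the advertisement anti-cheat system divides cheating users into three levels, low-level, middle-level, and high-level, according to their different cheating methods and abnormal behavior, and identifies the cheating users of each level in a corresponding way. Cheating users are thus identified level by level and comprehensively, avoiding missed identifications. In addition, the advertisement clicks of the identified cheating users are filtered in a corresponding way, which ensures that the recorded advertisement effect is genuine and reliable.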
本申请实施例提供的广告反作弊装置可以独立实施于服务器中,抑或是以广告反作弊系统的方式分散实施于服务器集群中,广告反作弊系统的一个可选的功能架构示意图如图8所示,包括:样本模块10、提取模块20、模型训练模块30、模型应用模块40、统计模块50和判罚模块60。
结合图9示出的广告反作弊系统分层级识别作弊用户的示意图进行说明。
一、低层级作弊用户识别
1)线上实时判罚
1.1)黑名单策略
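The statistics module 50 maintains in advance a blacklist containing the identifiers of low-level cheating users. It extracts the identifier of the user currently clicking an advertisement from the click log obtained in real time from the advertisement system and matches it against the identifiers in the blacklist. If the match succeeds, the user currently clicking the advertisement is determined to be a low-level cheating user (low-level cheating result).
1.2) Statistical strategy
The statistics module 50 counts users' advertisement clicks from the click log obtained from the advertisement system. When a user's click count exceeds the click threshold, the user is identified as a low-level cheating user.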
判罚模块60对低层级的作弊用户的点击量进行过滤并反馈至广告系统。在一个示例中,对超过点击量阈值后的点击量按照预定比例过滤,超出点击量阈值的点击量越多,则过滤比例越大。
2)线下延迟重判
判罚模块60对低层级的作弊用户的点击量中未超出点击量阈值的点击量按照预定比例进行过滤,或者全部过滤也就将低层级作弊用户的点击量清零;一般地,线下延迟重判方式中使用的预定比例大于统计型策略中所使用的预定比例,从而对低层级作弊用户的点击量中未超出点击量阈值的部分点击量(这部分点击量被恶意触发产生的概率较未超出点击量阈值的部分点击量更大)进行更大程度过滤。
二、中层级作弊用户识别
样本模块10获取样本集合,样本集合中的至少一个样本包括作弊用户、以及作弊用户点击广告的点击日志;
提取模块20从样本集合的样本中提取与作弊用户的层级对应的至少一个维度的特征,其中,每一个作弊用户对应一个层级,不同层级作弊用户所对应的维度不同。
模型训练模块30和模型应用模块40共同实施中层级作弊用户识别策略。
模型训练模块30将作弊用户、作弊用户点击广告的点击日志在至少一个维度的特征标记为正样本,至少基于正样本对与作弊用户的层级对应的作弊用户识别模型进行训练;
模型应用模块40确定待识别的样本对应至少一个维度的特征;将待识别样本对应至少一个维度的特征输入训练后的作弊用户识别模型,识别出待识别的样本中的作弊用户(中层级反作弊结果)。
提取模块20解析样本集合中的点击日志对应得到与点击广告的操作相关联的特征。
其中,与点击广告的操作相关联的特征包括以下至少一个维度的特征:
在统计时段内的点击量;
在统计时段内点击过广告的时间周期的数量;
在统计时段中点击广告的间隔时间的平均值;
统计时段内识别的作弊用户的历史比例;
统计时段内所点击过广告的时间周期中点击广告的平均点击量。
模型训练模块30采用如下方式进行训练:将正样本输入作弊用户识别模型以对作弊用户识别模型中的模型参数进行训练;测试作弊用户识别模型的识别作弊用户的精度,识别精度未达到预设精度时对模型参数进行调整处理,直至作弊用户识别模型的精度达到预设精度;其中,作弊用户的点击日志为作弊用户执行广告任务平台中广告任务所对应的点击日志。
模型训练模块30还可结合负样本与正样本共同训练:将非广告作弊用户、非作弊用户的点击广告的点击日志对应至少一个维度的特征标记为负样本;将负样本连同正样本输入作弊用户识别模型以对作弊用户识别模型中的模型参数进行训练;其中,样本集合中的至少一个样本包括非作弊用户、以及非作弊用户点击广告的点击日志,非作弊用户的点击日志为非作弊用户在应用中点击广告所对应的点击日志。
三、高层级作弊用户识别
样本模块10形成的样本集合中的样本为与不同的应用对应的应用样本,至少一个应用样本为已知存在高层级作弊用户的应用对应,每个应用样本包括所对应应用的以下维度的信息至少之一:
应用的用户在应用中点击广告的点击日志;
应用的用户在应用中点击广告的曝光日志;
应用的用户在应用中点击广告的效果日志;
应用的用户所使用的设备的信息。
提取模块20解析应用样本中任意两个用户对应至少一个维度的相关度;确定应用样本与至少一个维度对应的平均相关度,其中,与一个维度对应的平均相关度为应用样本的任意两个用户对应维度的特征的相关度的平均值。
模型训练模块30将已知包括有高层级作弊用户的应用样本、以及应用样本对应至少一个维度的平均相关度标记为正样本;将正样本输入作弊用户识别模型以对作弊用户识别模型中的模型参数进行训练。
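The model training module 30 uses the unlabeled application samples in the sample set (application samples for which it is unknown whether high-level cheating users exist), together with the correlation in at least one dimension of any two users in each unlabeled application sample, as unlabeled samples. It inputs the unlabeled samples together with the positive samples into the cheating user identification model to train the model parameters, until the number of input unlabeled samples that the model labels as positive becomes stable.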
模型应用模块40获取作弊用户识别模型输出的待识别应用中任意两个用户与至少一个维度对应的相关度,确定待识别应用中用户与至少一个维度对应的平均相关度;当平均相关度超出平均相关度阈值时判定待识别应用中的用户为高层级作弊用户(高层级作弊识别结果)。
综上,本申请实施例具有以下有益效果:
1)在线下采用延迟处理的方式对中层级作弊用户进行识别,即采用逻辑斯蒂回归模型在线下识别中层级作弊用户,并对中层级作弊用户的点击量进行过滤,确保统计的广告的点击量的准确性;
2)在线下采用延迟处理的方式对高层级作弊用户进行识别,即采用梯度提升回归树模型识别高层级作弊用户,并对高层级作弊用户的点击量进行过滤,确保统计的广告的点击量的准确性;
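3) Online real-time penalty can identify low-level cheating users in real time and filter their clicks in real time, so that in scenarios where advertisement click counts must be obtained in real time, the clicks of low-level cheating users are filtered effectively;
4) Offline delayed re-judgment covers cheating users at every level, ensuring that clicks from cheating users are filtered comprehensively.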
如图10所示,其示出了本申请实施例所涉及的广告反作弊系统的结构示意图,具体来讲:
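The advertisement anti-cheat system may include a processor 101 with one or more processing cores and a memory 102 comprising one or more computer-readable storage media. The structure shown in FIG. 10 does not limit the advertisement anti-cheat system, which may include more or fewer components than shown, combine some components, or use a different component arrangement. Specifically:
The processor 101 is the control center of the advertisement anti-cheat system. By running or executing software programs and/or modules stored in the memory 102 and invoking data stored in the memory 102, it performs the various functions of the system and processes data, thereby monitoring the system as a whole. Optionally, the processor 101 may include one or more processing cores. Preferably, the processor 101 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, the user interface, applications, and the like, and the modem processor mainly handles wireless communications. It can be understood that the modem processor may also not be integrated into the processor 101.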
存储器102可用于存储软件程序以及模块,处理器101通过运行存储在存储器102的软件程序以及模块,从而执行各种功能应用以及数据处理。存储器102可主要包括存储程序区和存储数据区,其中,存储程序区可存储操作系统、至少一个功能所需的应用程序(比如声音播放功能、图像播放功能等)等;存储数据区可存储根据服务器的使用所创建的数据等。此外,存储器102可以包括高速随机存取存储器,还可以包括非易失性存储器,例如至少一个磁盘存储器件、闪存器件、或其他易失性固态存储器件。相应地,存储器102还可以包括存储器控制器,以提供处理器101对存储器102的访问。
尽管未示出,广告反作弊系统还可以包括输入装置,RF电路,电源,显示单元,摄像头、蓝牙模块等,在此不再赘述。具体在本实施例中,广告反作弊系统中的处理器101会按照如下的指令,将一个或一个以上的应用程序的进程对应的可执行文件加载到存储器102中,并由处理器101来运行存储在存储器102中的应用程序,从而实现各种功能,如下:
获取样本集合,其中,所述样本集合中的至少一个样本包括作弊用户、以及所述作弊用户点击广告的点击日志;
从所述样本集合的样本中提取与作弊用户层级对应的至少一个维度的特征,其中,每一个作弊用户对应一个层级,不同层级作弊用户所对应的特征不同;
基于所述作弊用户、所述作弊用户点击广告的点击日志对应所述至少一个维度的特征形成正样本,至少基于所述正样本对与作弊用户的层级对应的作弊 用户识别模型进行训练;
确定待识别的样本对应所述至少一个维度的特征;
将所述待识别样本对应所述至少一个维度的特征输入训练后的所述作弊用户识别模型,基于输出结果识别出所述待识别的样本中的作弊用户。
以上各操作的实现方法具体可参见上述实施例,此处不再赘述。
本申请实施例所述集成的模块如果以软件功能模块的形式实现并作为独立的产品销售或使用时,也可以存储在一个计算机可读取存储介质中。基于这样的理解,本领域内的技术人员应明白,本申请的实施例可提供为方法、系统、或计算机程序产品。因此,本申请可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且,本申请可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质上实施的计算机程序产品的形式,所述存储介质包括但不限于U盘、移动硬盘、只读存储器(ROM,Read-Only Memory)、随机存取存储器(RAM,Random Access Memory)、磁盘存储器、CD-ROM、光学存储器等。
本申请是根据本申请实施例的方法、设备(系统)、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理设备的处理器以产生一个机器,使得通过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。
这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理设备以特定方式工作的计算机可读存储器中,使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品,该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。
这些计算机程序指令也可装载到计算机或其他可编程数据处理设备上,使 得在计算机或其他可编程设备上执行一系列操作步骤以产生计算机实现的处理,从而在计算机或其他可编程设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。
尽管已描述了本申请的实施例,但本领域内的技术人员一旦得知了基本创造性概念,则可对这些实施例做出另外的变更和修改。所以,所附权利要求意欲解释为包括实施例以及落入本申请范围的所有变更和修改。
以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以所述权利要求的保护范围为准。

Claims (19)

  1. 一种广告反作弊方法,其特征在于,所述方法包括:
    广告反作弊装置获取样本集合,其中,所述样本集合中的至少一个样本包括作弊用户、以及所述作弊用户点击广告的点击日志;
    广告反作弊装置从所述样本集合的样本中提取与作弊用户层级对应的至少一个维度的特征,其中,每一个作弊用户对应一个层级,不同层级作弊用户所对应的特征不同;
    广告反作弊装置基于所述作弊用户、所述作弊用户点击广告的点击日志对应所述至少一个维度的特征形成正样本,至少基于所述正样本对与作弊用户的层级对应的作弊用户识别模型进行训练;
    广告反作弊装置确定待识别的样本对应所述至少一个维度的特征;以及
    广告反作弊装置将所述待识别样本对应所述至少一个维度的特征输入训练后的所述作弊用户识别模型,基于输出结果识别出所述待识别的样本中的作弊用户。
  2. 根据权利要求要求1所述的方法,其特征在于,
    所述样本集合中的样本为与应用对应的应用样本,至少一个所述应用样本为已知存在第一层级作弊用户的应用,每个所述应用样本包括所对应应用的以下维度的信息至少之一:
    所述应用的用户在所述应用中点击广告的点击日志;
    所述应用的用户在所述应用中点击广告的曝光日志;
    所述应用的用户在所述应用中点击广告的效果日志;以及
    所述应用的用户所使用的设备的信息;
    所述从样本中提取与待识别的作弊用户的层级对应的至少一个维度的特征,包括:
    解析出所述应用样本中所包括的用户、以及所述应用样本中任意两个用户 在以下至少一个维度的相关度:
    所述应用中任意两个用户在应用中点击广告的操作的特征的相关度;
    所述应用中任意两个用户在所述应用中点击的广告所曝光对象的相关度;
    所述应用中任意两个用户点击广告所使用的设备的信息的相关度;以及
    所述应用中任意两个用户点击所点击广告的广告效果的相关度。
  3. 根据权利要求要求2所述的方法,其特征在于,所述基于所述作弊用户、所述作弊用户点击广告的点击日志在所述至少一个维度的特征形成所述正样本,至少基于所述正样本对与待识别的作弊用户的层级对应的作弊用户识别模型进行训练,包括:
    将已知包括有所述第一层级作弊用户的所述应用样本、以及所述应用样本中任意两个用户在所述至少一个维度的特征的相关度标记为所述正样本;以及
    将所述正样本输入第一层级作弊用户识别模型,基于输入的正样本对所述第一层级作弊用户识别模型中的模型参数进行训练。
  4. 根据权利要求要求2所述的方法,其特征在于,所述基于所述作弊用户、所述作弊用户点击广告的点击日志在所述至少一个维度的特征标记正样本,至少基于所述正样本对与待识别的作弊用户的层级对应的作弊用户识别模型进行训练,包括:
    基于所述样本集合中的无标记应用样本、所述无标记应用样本中任意两个用户对应至少一个所述维度的相关度形成无标记样本,基于所述无标记样本和所述正样本对第一层级作弊用户识别模型的模型参数进行训练,直至,输入所述第一层级作弊用户识别模型的所述无标记样本中被所述第一层级作弊用户识别模型标记为正样本的数量处于稳定状态;
    其中,所述无标记应用样本为所述样本集合中未知存在高层级作弊用户的应用样本。
  5. 根据权利要求要求2所述的方法,其特征在于,所述待识别样本为待识别应用;所述基于输出结果识别出所述待识别的样本中的作弊用户,包括:
    获取第一层级作弊用户识别模型输出的所述待识别应用中任意两个用户在 至少一个所述维度对应的相关度,确定待识别应用中用户在至少一个所述维度对应的平均相关度;以及
    当所述平均相关度超出平均相关度阈值时判定所述待识别应用中的用户为第一层级作弊用户。
  6. 根据权利要求要求1所述的方法,其特征在于,所述从样本集合的样本中提取与待识别的作弊用户的层级对应的至少一个维度的特征,包括:
    提取出所述样本集合中的点击日志中对应点击广告的操作数据;
    解析所提取的操作数据对应得到与点击广告的操作相关联的特征;
    其中,与点击广告的操作相关联的特征包括以下至少一个维度的特征:
    在统计时段内的点击量;
    在所述统计时段内点击过广告的时间周期的数量;
    在所述统计时段中点击广告的间隔时间的平均值;
    所述统计时段内识别的作弊用户的历史比例;以及
    所述统计时段内所点击过广告的时间周期中广告的平均点击量。
  7. 根据权利要求要求6所述的方法,其特征在于,所述至少基于所述正样本对与待识别的作弊用户的层级对应的作弊用户识别模型进行训练,包括:
    将所述正样本输入第二层级作弊用户识别模型,基于输入的正样本对所述第二层级作弊用户识别模型中的模型参数进行训练;以及
    测试所述第二层级作弊用户识别模型的识别作弊用户的精度,识别精度未达到预设精度时对所述模型参数进行调整处理,直至所述第二层级作弊用户识别模型的精度达到预设精度;
    其中,所述作弊用户的点击日志为所述作弊用户执行广告任务平台中广告任务所对应的点击日志。
  8. 根据权利要求要求7所述的方法,其特征在于,所述方法还包括:
    所述样本集合中的至少一个样本包括非作弊用户、以及所述非作弊用户点击广告的点击日志,所述非作弊用户的点击日志用于记录所述非作弊用户在应用中点击广告的操作;
    将所述非广告作弊用户、所述非作弊用户的点击广告的点击日志对应所述至少一个维度的特征标记为负样本;以及
    将所述负样本连同所述正样本输入所述第二层级作弊用户识别模型,基于输入的正样本和负样本对所述第二层级作弊用户识别模型中的模型参数进行训练。
  9. 根据权利要求要求1所述的方法,其特征在于,
    所述方法还包括:
    当点击广告的用户的标识与预设的第三层级作弊用户的标识匹配时,将所述点击广告的用户识别为所述第三层级作弊用户;
    或者,
    获取点击广告的用户在统计时段中点击广告的次数,当点击广告的次数超出点击量阈值时,将所述点击广告的用户识别为所述第三层级作弊用户;
    所述方法还包括:
    对所述作弊用户的点击量进行过滤,过滤方式包括以下至少之一:
    对所述作弊用户的点击量中超出点击量阈值之外的点击量进行过滤;
    对所述作弊用户的点击量中未超出点击量阈值的点击量进行过滤。
  10. 一种广告反作弊装置,其特征在于,所述装置包括:
    样本模块,用于获取样本集合,其中,所述样本集合中的至少一个样本包括作弊用户、以及所述作弊用户点击广告的点击日志;
    提取模块,用于从所述样本集合的样本中提取与作弊用户的层级对应的至少一个维度的特征,其中,每一个作弊用户对应一个层级,不同层级的作弊用户所对应的特征不同;
    模型训练模块,用于基于所述作弊用户、所述作弊用户点击广告的点击日志对应所述至少一个维度的特征形成正样本,至少基于所述正样本对与待识别的作弊用户的层级对应的作弊用户识别模型进行训练;以及
    模型应用模块,用于确定待识别的样本对应所述至少一个维度的特征;将所述待识别样本对应所述至少一个维度的特征输入训练后的所述作弊用户识别 模型,基于输出结果识别出所述待识别的样本中的作弊用户。
  11. 根据权利要求要求10所述的装置,其特征在于,
    所述样本集合中的样本为与应用对应的应用样本,至少一个所述应用样本为已知存在第一层级作弊用户的应用,每个所述应用样本包括所对应应用的以下维度的信息至少之一:
    所述应用的用户在所述应用中点击广告的点击日志;
    所述应用的用户在所述应用中点击广告的曝光日志;
    所述应用的用户在所述应用中点击广告的效果日志;以及
    所述应用的用户所使用的设备的信息;
    所述样本模块,还用于解析出所述应用样本中所包括的用户、以及所述应用样本中任意两个用户在以下至少一个维度的相关度:
    所述应用中任意两个用户在应用中点击广告的操作的特征的相关度;
    所述应用中任意两个用户在所述应用中点击的广告所曝光对象的相关度;
    所述应用中任意两个用户点击广告所使用的设备的信息的相关度;以及
    所述应用中任意两个用户点击所点击广告的广告效果的相关度。
  12. 根据权利要求要求11所述的装置,其特征在于,
    所述模型训练模块,还用于将已知包括有所述第一层级作弊用户的所述应用样本、以及所述应用样本中任意两个用户在所述至少一个维度的特征的相关度标记为所述正样本;将所述正样本输入第一层级作弊用户识别模型,基于输入的正样本对所述第一层级作弊用户识别模型中的模型参数进行训练。
  13. 根据权利要求要求11所述的装置,其特征在于,
    所述模型训练模块,还用于基于所述样本集合中的无标记应用样本、所述无标记应用样本中任意两个用户对应至少一个所述维度的相关度形成无标记样本,基于所述无标记样本和所述正样本对第一层级作弊用户识别模型的模型参数进行训练,直至,输入所述第一层级作弊用户识别模型的所述无标记样本中被所述第一层级作弊用户识别模型标记为正样本的数量处于稳定状态;
    其中,所述无标记应用样本为所述样本集合中未知存在高层级作弊用户的应用样本。
  14. 根据权利要求要求11所述的装置,其特征在于,
    所述模型应用模块,还用于获取第一层级作弊用户识别模型输出的待识别应用中任意两个用户在至少一个所述维度对应的相关度,确定待识别应用中用户在至少一个所述维度对应的平均相关度;以及
    当所述平均相关度超出平均相关度阈值时判定所述待识别应用中的用户为第一层级作弊用户。
  15. 根据权利要求要求10所述的装置,其特征在于,
    所述提取模块,还用于提取出所述样本集合中的点击日志中对应点击广告的操作数据;以及
    解析所提取的操作数据对应得到与点击广告的操作相关联的特征;
    其中,与点击广告的操作相关联的特征包括以下至少一个维度的特征:
    在统计时段内的点击量;
    在所述统计时段内点击过广告的时间周期的数量;
    在所述统计时段中点击广告的间隔时间的平均值;
    所述统计时段内识别的作弊用户的历史比例;以及
    所述统计时段内所点击过广告的时间周期中广告的平均点击量。
  16. 根据权利要求要求15所述的装置,其特征在于,
    所述模型训练模块,还用于将所述正样本输入第二层级作弊用户识别模型,基于输入的正样本对所述第二层级作弊用户识别模型中的模型参数进行训练;测试所述第二层级作弊用户识别模型的识别作弊用户的精度,识别精度未达到预设精度时对所述模型参数进行调整处理,直至所述第二层级作弊用户识别模型的精度达到预设精度;
    其中,所述作弊用户的点击日志为所述作弊用户执行广告任务平台中广告任务所对应的点击日志。
  17. 根据权利要求要求16所述的装置,其特征在于,
    所述模型训练模块,还用于将非广告作弊用户、所述非作弊用户的点击广告的点击日志对应所述至少一个维度的特征标记为负样本;以及
    将所述负样本连同所述正样本输入所述第二层级作弊用户识别模型,基于所述正样本和所述负样本对所述第二层级作弊用户识别模型中的模型参数进行训练;
    其中,所述样本集合中的至少一个样本包括所述非作弊用户、以及所述非作弊用户点击广告的点击日志,所述非作弊用户的点击日志用于记录所述非作弊用户在应用中点击广告的操作;
    其中,所述待识别样本为所述待识别应用。
  18. 根据权利要求要求10所述的装置,其特征在于,
    所述装置还包括:
    统计模块,用于当点击广告的用户的标识与预设的第三层级作弊用户的标识匹配时,将所述点击广告的用户识别为所述第三层级作弊用户;或者,获取点击广告的用户在统计时段中点击广告的次数,当点击广告的次数超出点击量阈值时,将所述点击广告的用户识别为所述第三层级作弊用户;
    所述装置还包括:
    判罚模块,用于对所述作弊用户的点击量进行过滤,过滤方式包括以下至少之一:
    对所述作弊用户的点击量中超出点击量阈值之外的点击量进行过滤;以及
    对所述作弊用户的点击量中未超出点击量阈值的点击量进行过滤。
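  19. A non-volatile storage medium storing one or more computer programs, wherein the computer programs comprise instructions executable by a processor having one or more memories, and the instructions, when executed by a computer, cause the computer to perform the following operations:
    obtaining a sample set, wherein at least one sample in the sample set includes a cheating user and a click log of the cheating user's ad clicks;
    extracting, from the samples in the sample set, a feature of at least one dimension corresponding to the level of the cheating user, wherein each cheating user corresponds to one level, and cheating users of different levels correspond to different features;
    forming a positive sample based on the cheating user and the feature of the at least one dimension corresponding to the click log of the cheating user's ad clicks, and training, based at least on the positive sample, a cheating user identification model corresponding to the level of the cheating user;
    determining the feature of the at least one dimension corresponding to a to-be-identified sample; and
    inputting the feature of the at least one dimension corresponding to the to-be-identified sample into the trained cheating user identification model, and identifying the cheating user in the to-be-identified sample based on the output result.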
PCT/CN2017/085687 2016-05-24 2017-05-24 Advertisement anti-cheating method, apparatus, and storage medium WO2017202336A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2018543423A JP6878450B2 (ja) 2016-05-24 2017-05-24 Method and device for preventing advertisement-related fraud, and storage medium
US15/971,614 US10929879B2 (en) 2016-05-24 2018-05-04 Method and apparatus for identification of fraudulent click activity

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610349338.7 2016-05-24
CN201610349338.7A CN106022834B (zh) 2016-05-24 2016-05-24 Advertisement anti-cheating method and apparatus

Related Child Applications (1)

Application Number Priority Date Filing Date Title
US15/971,614 Continuation US10929879B2 (en) 2016-05-24 2018-05-04 Method and apparatus for identification of fraudulent click activity

Publications (1)

Publication Number Publication Date
WO2017202336A1 (zh) 2017-11-30

Family

ID=57093146

Family Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2017/085687 WO2017202336A1 (zh) 2016-05-24 2017-05-24 Advertisement anti-cheating method, apparatus, and storage medium

Country Status (4)

Country Link
US (1) US10929879B2 (zh)
JP (1) JP6878450B2 (zh)
CN (1) CN106022834B (zh)
WO (1) WO2017202336A1 (zh)

Also Published As

Publication number Publication date
JP6878450B2 (ja) 2021-05-26
US20180253755A1 (en) 2018-09-06
JP2018536956A (ja) 2018-12-13
CN106022834B (zh) 2020-04-07
US10929879B2 (en) 2021-02-23
CN106022834A (zh) 2016-10-12

Legal Events

Date Code Title Description
WWE WIPO information: entry into national phase (Ref document number: 2018543423; Country of ref document: JP)
NENP Non-entry into the national phase (Ref country code: DE)
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 17802182; Country of ref document: EP; Kind code of ref document: A1)
122 EP: PCT application non-entry in European phase (Ref document number: 17802182; Country of ref document: EP; Kind code of ref document: A1)