CN109146574A - Ad click cheating monitoring method and device - Google Patents

Info

Publication number
CN109146574A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811040607.7A
Other languages
Chinese (zh)
Inventor
张舒虹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Papaya Mobile Technology Co Ltd
Original Assignee
Shenzhen Papaya Mobile Technology Co Ltd
Application filed by Shenzhen Papaya Mobile Technology Co Ltd
Priority to CN201811040607.7A
Publication of CN109146574A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00: Commerce
    • G06Q30/02: Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241: Advertisements
    • G06Q30/0248: Avoiding fraud

Abstract

The embodiments of the present application provide an ad click cheating monitoring method and device, relating to the field of data processing. The method includes: based on M click records of an advertisement, obtaining the various related data in the M click records, where M is a positive integer; associating the various related data in the M click records along identical dimensions and combining it across different dimensions to compute N items of feature data, where N is a positive integer; obtaining the information gain ratio of each of the N items of feature data; and calling a preset Gaussian model with the n items of feature data that have high information gain ratios as input to determine whether the M click records contain cheating clicks, where n is a positive integer not greater than N. Click cheating is thereby monitored without the resource cost of maintaining and updating a blacklist. When a new click cheating technique appears, it can still be identified by analyzing the features that have strong classification ability for that technique, which greatly improves the security of anti-click-cheating protection.

Description

Ad click cheating monitoring method and device
Technical field
This application relates to the field of data processing, and in particular to an ad click cheating monitoring method and device.
Background
With the widespread use of mobile devices, the advertising market has expanded rapidly. Traffic providers serve advertisements to users while they use their mobile terminals, delivering the conversions that advertisers want through user behaviors such as impressions, clicks, downloads and installs, activations, and purchases, while earning a profit for themselves. Mobile advertising cheating based on forged traffic has emerged as a result. Under the currently mainstream billing model, CPC (Cost Per Click, charging per click), anti-cheating measures focus mainly on identifying fake clicks.
At present, most anti-cheating techniques for ad clicks rely mainly on blacklists. For example, a blacklist is established to exclude all clicks from anonymous or proxy IPs and from high-risk or new device IDs, so that suspicious traffic is filtered at the access source. Click cheating is also identified by counting clicks that are too numerous or too concentrated for the same device model, UA, or IP. However, this approach requires the blacklist to be maintained and updated in real time, which consumes considerable resources. Once a new click cheating technique appears, the existing blacklist often cannot identify it, which can lead to heavy losses.
Summary of the invention
The present application provides an ad click cheating monitoring method and device to effectively remedy the above defects.
To achieve the above object, the embodiments of the present application are implemented as follows:
In a first aspect, an embodiment of the present application provides an ad click cheating monitoring method, the method comprising: based on M click records of an advertisement, obtaining the various related data in the M click records, where M is a positive integer; associating the various related data in the M click records along identical dimensions and combining it across different dimensions to compute N items of feature data, where N is a positive integer; obtaining the information gain ratio of each of the N items of feature data, where each information gain ratio indicates the classification ability of the corresponding item of feature data; and calling a preset Gaussian model with the n items of feature data that have high information gain ratios among the N items as input to determine whether click cheating is present in the M click records, where n is a positive integer not greater than N.
With reference to the first aspect, in some possible implementations, obtaining the information gain ratio of each of the N items of feature data comprises: applying a Box-Cox transformation to each of the N items of feature data to obtain the transformation result data of each item; and calculating the information gain ratio based on the transformation result data of each item of feature data to obtain the information gain ratio of each item.
With reference to the first aspect, in some possible implementations, performing feature selection based on the transformation result data of each item of feature data to obtain the information gain ratio of each item comprises: calculating the entropy of the transformation result data of each item of feature data, and calculating the conditional entropy of the transformation result data of each item with respect to the original class label, where the original class label is the ground-truth mark of whether a click record is click cheating; and obtaining the information gain ratio of each item of feature data from the entropy of that item, the conditional entropy, and the information entropy of the original class label.
With reference to the first aspect, in some possible implementations, calling the preset Gaussian model with the n items of feature data that have high information gain ratios as input to determine whether click cheating is present in the M click records comprises: determining the n items of feature data with high information gain ratios from the N items of feature data; calling the preset Gaussian model to calculate the probability density of each of the n items of feature data, and obtaining the probability density product corresponding to the n items; and determining, according to the probability density product, whether click cheating is present in the M click records.
With reference to the first aspect, in some possible implementations, determining, according to the probability density product, whether click cheating is present in the M click records comprises: obtaining, from the probability density product, the probability density product corresponding to each of the M click records; and determining whether each click record is click cheating according to whether its probability density product is less than a preset threshold, where a probability density product less than the preset threshold indicates that the click record is click cheating.
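The density-product decision described in the two implementations above can be sketched as follows. This is only an illustration under stated assumptions: the source does not say how the "preset" Gaussian model is parameterized, so the sketch estimates a mean and variance per feature from the data itself, and the names `fit_gaussian`, `density`, and `flag_cheating`, as well as the sample threshold, are hypothetical.

```python
import math

def fit_gaussian(column):
    """Estimate mean and variance of one feature (assumption: fitted from the data)."""
    n = len(column)
    mu = sum(column) / n
    var = sum((x - mu) ** 2 for x in column) / n
    return mu, var  # note: var must be > 0 for density() to be defined

def density(x, mu, var):
    """Univariate Gaussian probability density."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def flag_cheating(rows, threshold):
    """rows: one list of n selected feature values per click record.
    A record is flagged when the product of its per-feature densities
    falls below the threshold."""
    params = [fit_gaussian(col) for col in zip(*rows)]
    flags = []
    for row in rows:
        product = math.prod(density(x, mu, var) for x, (mu, var) in zip(row, params))
        flags.append(product < threshold)
    return flags

# Four similar records and one outlier; the threshold value is illustrative only.
rows = [[1.0, 2.0], [1.1, 2.1], [0.9, 1.9], [1.0, 2.0], [5.0, 9.0]]
flags = flag_cheating(rows, threshold=0.005)
```

Multiplying per-feature densities treats the selected features as independent, which is a simplification that the product form in the text already implies.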
In a second aspect, an embodiment of the present application provides an ad click cheating monitoring device, the device comprising: a data acquisition module, configured to obtain, based on M click records of an advertisement, the various related data in the M click records, where M is a positive integer; a feature extraction module, configured to associate the various related data in the M click records along identical dimensions and combine it across different dimensions to compute N items of feature data, where N is a positive integer; a feature selection module, configured to obtain the information gain ratio of each of the N items of feature data, where each information gain ratio indicates the classification ability of the corresponding item of feature data; and a click cheating determination module, configured to call a preset Gaussian model with the n items of feature data that have high information gain ratios among the N items as input to determine whether click cheating is present in the M click records, where n is a positive integer not greater than N.
With reference to the second aspect, in some possible implementations, the feature selection module is further configured to apply a Box-Cox transformation to each of the N items of feature data to obtain the transformation result data of each item, and to perform the feature selection calculation based on the transformation result data of each item to obtain the information gain ratio of each item of feature data.
With reference to the second aspect, in some possible implementations, the feature selection module is further configured to calculate the entropy of the transformation result data of each item of feature data, and to calculate the conditional entropy of the transformation result data of each item with respect to the original class label, where the original class label is the ground-truth mark of whether a click record is click cheating; and to obtain the information gain ratio of each item of feature data from the entropy of that item, the conditional entropy, and the information entropy of the original class label.
With reference to the second aspect, in some possible implementations, the click cheating determination module is further configured to determine the n items of feature data with high information gain ratios from the N items of feature data; to call the preset Gaussian model to calculate the probability density of each of the n items of feature data and obtain the probability density product corresponding to the n items; and to determine, according to the probability density product, whether click cheating is present in the M click records.
With reference to the second aspect, in some possible implementations, the click cheating determination module is further configured to obtain, from the probability density product, the probability density product corresponding to each of the M click records; and to determine whether each click record is click cheating according to whether its probability density product is less than a preset threshold, where a probability density product less than the preset threshold indicates that the click record is click cheating.
In a third aspect, an embodiment of the present application provides an electronic device comprising a processor, a memory, a bus, and a communication module. The processor, the communication module, and the memory are connected by the bus. The memory is configured to store a program. The processor is configured to execute, by calling the program stored in the memory, the ad click cheating monitoring method described in the first aspect or in any possible implementation of the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing non-volatile program code executable by a processor, the program code causing the processor to execute the ad click cheating monitoring method described in the first aspect or in any possible implementation of the first aspect.
The beneficial effects of the embodiments of the present application are as follows:
Features are extracted by parsing the M click records of an advertisement, associating identical dimensions, and combining different dimensions, so that N items of feature data are counted and computed from the M click records, and the information gain ratio of each item of feature data can then be calculated. Because the information gain ratio indicates the classification ability of a feature, calling the preset Gaussian model with the n items of feature data that have high information gain ratios among the N items as input makes it possible to determine whether click cheating is present in the M click records. Click cheating is thereby monitored without the resource cost of maintaining and updating a blacklist, and when a new click cheating technique appears, it can still be identified by analyzing the features that have strong classification ability for that technique, which greatly improves the security of anti-click-cheating protection.
To make the above objects, features, and advantages of the present application clearer and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present application more clearly, the drawings needed in the embodiments are briefly introduced below. It should be understood that the following drawings show only some embodiments of the present application and should therefore not be regarded as limiting its scope. Those of ordinary skill in the art can obtain other related drawings from these drawings without creative effort.
Fig. 1 shows a structural block diagram of an electronic device provided by the first embodiment of the present application;
Fig. 2 shows a flowchart of an ad click cheating monitoring method provided by the second embodiment of the present application;
Fig. 3 shows a kind of structural block diagram of ad click cheating monitoring method of the application 3rd embodiment offer.
Detailed description of the embodiments
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are evidently only some, not all, of the embodiments of the present application. The components of the embodiments of the present application that are generally described and illustrated in the drawings can be arranged and designed in a variety of different configurations. Therefore, the following detailed description of the embodiments provided in the drawings is not intended to limit the claimed scope of the present application, but merely represents selected embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.
It should also be noted that similar reference numerals and letters denote similar items in the following drawings. Once an item is defined in one drawing, it therefore does not need to be further defined or explained in subsequent drawings. The terms "first", "second", and so on are used only to distinguish descriptions and are not to be understood as indicating or implying relative importance.
First embodiment
Referring to Fig. 1, an embodiment of the present application provides an electronic device 10, which may include a memory 11, a communication module 12, a bus 13, and a processor 14. The processor 14, the communication module 12, and the memory 11 are connected by the bus 13. The processor 14 executes the executable modules stored in the memory 11, such as computer programs. The components and structure of the electronic device 10 shown in Fig. 1 are illustrative rather than restrictive; the electronic device 10 may have other components and structures as needed.
The memory 11 may include high-speed random access memory (RAM) and may also include non-volatile memory, for example at least one disk storage device. In this embodiment, the memory 11 stores the program required to execute the ad click cheating monitoring method.
The bus 13 may be an ISA bus, a PCI bus, an EISA bus, or the like. Buses can be divided into address buses, data buses, control buses, and so on. For ease of illustration, the bus is represented in Fig. 1 by a single double-headed arrow, but this does not mean that there is only one bus or only one type of bus.
The processor 14 may be an integrated circuit chip with signal processing capability. During implementation, each step of the above method can be completed by integrated logic circuits of hardware in the processor 14 or by instructions in software form. The processor 14 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. It can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of the present invention. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in the embodiments of the present invention may be executed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software modules may reside in storage media well established in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers.
The method performed by the flow or by the defined device disclosed in any embodiment of the present invention can be applied to, or implemented by, the processor 14. After receiving an execution instruction and calling the program stored in the memory 11 through the bus 13, the processor 14 controls the communication module 12 through the bus 13 and can then execute the flow of the ad click cheating monitoring method.
Second embodiment
This embodiment provides an ad click cheating monitoring method. It should be noted that the steps illustrated in the flowchart of the drawings can be executed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is shown in the flowchart, in some cases the steps shown or described may be performed in a different order. This embodiment is described in detail below.
Referring to Fig. 2, the ad click cheating monitoring method provided in this embodiment includes step S100, step S200, step S300, and step S400.
Step S100: based on M click records of an advertisement, obtain the various related data in the M click records, where M is a positive integer.
Step S200: associate the various related data in the M click records along identical dimensions and combine it across different dimensions to compute N items of feature data, where N is a positive integer.
Step S300: obtain the information gain ratio of each of the N items of feature data, where each information gain ratio indicates the classification ability of the corresponding item of feature data.
Step S400: call a preset Gaussian model with the n items of feature data that have high information gain ratios among the N items as input to determine whether click cheating is present in the M click records, where n is a positive integer not greater than N.
The scheme of the present application is described in detail below.
Step S100: based on M click records of an advertisement, obtain the various related data in the M click records, where M is a positive integer.
The electronic device can obtain M click records of an advertisement, where M is a positive integer. For example, if the advertisement is placed on a website, each click on the advertisement performed by a user on that website causes the server running the website to store a corresponding click record, and the electronic device can then obtain the M click records of the advertisement offline from the server. As a further example, the electronic device may obtain 1000 click records of the advertisement.
Among the M click records, each record may include data such as the ad click time, the user device UA (User Agent), the device IDFA (Identifier For Advertising), the user IP, and the referer (referring field). The raw data in each click record may not carry enough information. To enrich it, the electronic device can parse each click record and extract more informative related data from the raw data. For example, by parsing the user device UA in each click record, the electronic device can obtain the device brand, device model, device system and version, browser and version, device category, and similar information; by parsing the user IP in each click record, it can obtain the user's country, city, region, longitude and latitude, autonomous system, ISP (Internet Service Provider), organization, and similar information. For the user IP in each click record, the electronic device may also take the first 24 bits of the IP so that subsequent calculations can use the network address of that IP.
Obtaining the various related data of the M click records gives the M click records more information overall, which benefits the accuracy of subsequent calculations. For example, if the raw data of each click record contains 10 pieces of information, parsing may yield 500 pieces of information per click record, so the overall amount of information in the M click records grows from M*10 to M*500.
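Taking the first 24 bits of a user IP, as mentioned in the preceding paragraph, could look like the following minimal sketch. It assumes IPv4 addresses, which the source does not state explicitly, and the helper name `ipv4_prefix_24` is hypothetical; only the standard-library `ipaddress` module is used.

```python
import ipaddress

def ipv4_prefix_24(ip: str) -> str:
    """Return the network address formed by the first 24 bits of an IPv4 address."""
    # strict=False lets us pass a host address and have the host bits masked off.
    network = ipaddress.ip_network(f"{ip}/24", strict=False)
    return str(network.network_address)

prefix = ipv4_prefix_24("203.0.113.57")
```

Two clicks whose addresses differ only in the last octet map to the same prefix, which lets later features group clicks by network rather than by individual host.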
Step S200: associate the various related data in the M click records along identical dimensions and combine it across different dimensions to compute N items of feature data, where N is a positive integer.
In this embodiment, not every click record can be parsed into all of the information. For example, a device model can be parsed from some click records but is missing from others. After parsing, the electronic device therefore also needs to judge whether each piece of parsed information is usable. Specifically, since there are M click records in total, each piece of information should also have M values when nothing is missing. The electronic device can determine whether each piece of parsed information is usable by judging whether the ratio of the number of values of that information in the related data of the M click records to M is greater than a preset ratio, where the preset ratio may be 0.5.
If the ratio of the number of values of a piece of parsed information to M is greater than the preset ratio, the electronic device determines that the information is usable. In that case, a ratio of 1 means that no values are missing. For example, if the M click records yield M device system and version values, the device system and version information has no missing values, and the M values can be used directly. If the ratio is between 0.5 and 1, some values are missing, and the electronic device needs to fill in the missing values according to a preset rule. For example, if the M click records yield only 2/3M country values, the country information has missing values, and the electronic device can fill in the missing 1/3M countries according to the preset rule, for example by setting them to Other, so that all M country values can be used subsequently.
If the ratio of the number of values of a piece of parsed information to M is less than the preset ratio, the electronic device determines that the information is unusable and can discard it, so that later calculations based on it do not degrade the accuracy of the results.
It should be noted that, to simplify subsequent calculation, the click time in each click record can be expressed in units of periods of the day. For example, the 24 hours of a day can be divided into 4 periods according to people's daily routine: early morning, morning, afternoon, and evening. The click times in the M click records are then assigned to the corresponding periods for subsequent calculation.
As an optional approach, because the many pieces of information parsed from each click record are not convenient to calculate with directly, this embodiment can treat the M values of any one kind of related data as an information group, associate each information group with at least one other information group along the same dimension, and merge the associated groups to form feature data of different dimensions. By associating the related data in the M click records along identical dimensions in this way, the electronic device obtains N items of feature data across the dimensions, where N is also a positive integer.
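The usability check and fill rule described above can be sketched as follows. This is an illustration under assumptions: missing values are represented as `None`, the function name `clean_field` is hypothetical, and a ratio exactly equal to the preset ratio is treated as unusable because the source does not cover that case.

```python
def clean_field(values, min_ratio=0.5, fill="Other"):
    """values: the M parsed values of one kind of information, None where missing.

    Returns the values with gaps filled when the field is usable,
    or None when the field should be discarded.
    """
    m = len(values)
    present = sum(v is not None for v in values)
    # Assumption: a ratio equal to min_ratio is treated as unusable.
    if m == 0 or present / m <= min_ratio:
        return None
    return [fill if v is None else v for v in values]

countries = clean_field(["US", "CN", None])   # 2/3 present, so gaps are filled
models = clean_field(["X1", None, None])      # 1/3 present, so the field is dropped
```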
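A minimal sketch of the period assignment follows. The source names four periods but gives no boundaries, so the equal six-hour buckets and the labels in `PERIODS` are assumptions made for illustration.

```python
from datetime import datetime

# Assumed boundaries: 00-06, 06-12, 12-18, 18-24 (not specified in the source).
PERIODS = ("early_morning", "morning", "afternoon", "evening")

def period_of_day(ts: datetime) -> str:
    """Map a click timestamp to one of four periods of the day."""
    return PERIODS[ts.hour // 6]

label = period_of_day(datetime(2018, 9, 7, 13, 30))
```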
For example, the feature data of one dimension may be the number of distinct device UAs from the same user IP within one day; or the number of distinct device IDFAs from the same user IP within one day; or the ratio of distinct user IPs from the same country within one day, where the ratio of distinct user IPs = the number of distinct user IPs from the same country / the number of all clicks from the same country; or the variance of the number of distinct device models from the same user IP across the periods of one day; or the standard deviation of the number of distinct IDFAs from the same user IP across the periods of one day; or the share that the number of distinct device brands from the same user IP takes in each of the periods of one day.
In this way, by counting over the M click records to obtain N items of feature data across the dimensions, the original 10 attributes of 1000 clicks on an advertisement can be converted into feature data of 400 different dimensions, with each dimension containing 1000 values.
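Two of the example features above can be sketched with plain dictionaries. The record layout (keys `ip`, `ua`, `country`) and the function names are assumptions for illustration, and the sample records are invented toy data covering a single day.

```python
from collections import defaultdict

def distinct_count_per_key(clicks, key, value):
    """For each distinct clicks[key], count the distinct clicks[value] seen with it."""
    seen = defaultdict(set)
    for click in clicks:
        seen[click[key]].add(click[value])
    return {k: len(v) for k, v in seen.items()}

def distinct_ip_ratio_per_country(clicks):
    """Number of distinct user IPs from a country / number of all clicks from it."""
    ips = defaultdict(set)
    totals = defaultdict(int)
    for click in clicks:
        ips[click["country"]].add(click["ip"])
        totals[click["country"]] += 1
    return {country: len(ips[country]) / totals[country] for country in totals}

clicks = [
    {"ip": "1.1.1.1", "ua": "A", "country": "US"},
    {"ip": "1.1.1.1", "ua": "B", "country": "US"},
    {"ip": "2.2.2.2", "ua": "A", "country": "US"},
    {"ip": "3.3.3.3", "ua": "C", "country": "CN"},
]
ua_per_ip = distinct_count_per_key(clicks, "ip", "ua")
ip_ratio = distinct_ip_ratio_per_country(clicks)
```

Each click record would then receive the value computed for its own IP or country, which is how a per-group statistic becomes a column of M values.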
Step S300: obtain the information gain ratio of each of the N items of feature data, where each information gain ratio indicates the classification ability of the corresponding item of feature data.
This embodiment can calculate the information gain ratio of each item of feature data and use it to filter out the feature data with strong classification ability, so that subsequent calculations use only those features, achieving the desired effect while reducing the load on the electronic device.
Specifically, the subsequent calculation uses a Gaussian model, but each of the N items of feature data follows a long-tailed distribution that is not suitable for a Gaussian model. To facilitate the subsequent calculation, the electronic device can apply a Box-Cox transformation to each of the N items of feature data to obtain the transformation result data of each item. The transformation result data of each item of feature data is closer to a normal distribution and can therefore be used in the Gaussian model.
Optionally, the Box-Cox transformation, in its standard form, is given by Formula 1:

$$ y^{(\lambda)} = \begin{cases} \dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\ \ln y, & \lambda = 0 \end{cases} \tag{1} $$

where y is the feature data, y(λ) is the transformation result data of the feature data, and λ is the parameter of the Box-Cox transformation.
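A minimal sketch of the Box-Cox transformation in its standard form follows. The function name `boxcox` is hypothetical, λ is passed in rather than estimated because the source does not say how it is chosen, and the input is required to be strictly positive, which is a property of the standard transformation rather than something the source states.

```python
import math

def boxcox(values, lam):
    """Apply the standard Box-Cox transformation with parameter lam."""
    if any(v <= 0 for v in values):
        # The transformation is only defined for positive data; count features
        # that can be zero would need a shift first (an assumption, not in the source).
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]

transformed = boxcox([1.0, 4.0, 9.0], 0.5)
```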
In this embodiment, after the Box-Cox transformation, the electronic device can calculate the information gain ratio based on the transformation result data of each item of feature data to obtain the information gain ratio of each item, and can thus determine which feature data have high information gain ratios.
Optionally, the electronic device can calculate the entropy of the transformation result data of each item of feature data, and calculate the conditional entropy of the transformation result data of each item with respect to the original class label, where the original class label is the ground-truth mark of whether a click record is click cheating.
It should be understood that the conditional entropy of the transformation result data of each item of feature data with respect to the original class label measures how much that transformation result data affects the information content of the original class label. The greater the effect, the smaller the conditional entropy calculated for that item, and the stronger the classification ability of that item of feature data.
Optionally, the information entropy of the original class label has also been calculated in advance in the electronic device. Based on the entropy of each item of feature data, the conditional entropy of each item with respect to the original class label, and the information entropy of the original class label, the electronic device can then calculate the information gain ratio of each item of feature data.
Optionally, the information gain ratio of each item of feature data is obtained by Formula 2 and Formula 3:

$$ H(X) = -\sum_{i} p_i \log p_i \tag{2} $$

$$ g_R(D, A) = \frac{H(D) - H(D \mid A)}{H(A)} \tag{3} $$

where H(X) denotes the entropy of a random variable X; X can be the transformation result data of each item of feature data; p_i denotes the probability value of the discrete random variable; g_R(D, A) denotes the information gain ratio of each item of feature data; H(D) denotes the information entropy of the original class label; H(D | A) denotes the conditional entropy of the transformation result data of each item of feature data with respect to the original class label; and H(A) denotes the entropy of the transformation result data of each item of feature data.
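The entropy, conditional entropy, and gain ratio defined above can be sketched for discrete values as follows. This is an illustration under assumptions: the feature values are taken to be already discretized (the source does not describe how continuous transformed values are binned), base-2 logarithms are used, and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def entropy(items):
    """H(X) = -sum(p_i * log2(p_i)) over the observed values."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())

def conditional_entropy(labels, feature):
    """H(D | A): entropy of the labels within each feature value, weighted by frequency."""
    groups = defaultdict(list)
    for a, d in zip(feature, labels):
        groups[a].append(d)
    n = len(labels)
    return sum(len(g) / n * entropy(g) for g in groups.values())

def gain_ratio(labels, feature):
    """g_R(D, A) = (H(D) - H(D | A)) / H(A); defined as 0 when H(A) is 0."""
    h_a = entropy(feature)
    if h_a == 0:
        return 0.0
    return (entropy(labels) - conditional_entropy(labels, feature)) / h_a

labels = [1, 1, 0, 0]                      # 1 = cheating, 0 = normal (toy data)
separating = ["hi", "hi", "lo", "lo"]      # splits the labels perfectly
uninformative = ["x", "y", "x", "y"]       # tells us nothing about the labels
```

With this toy data, `gain_ratio(labels, separating)` is at the maximum of the scale and `gain_ratio(labels, uninformative)` is zero, which matches the reading of the ratio as a measure of classification ability.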
Step S400: Call a preset Gaussian model, input the n pieces of feature data with a high information gain ratio among the N pieces of feature data, and determine whether there is click cheating in the M pieces of click data, where n is a positive integer not greater than N.
Since each information gain ratio indicates the classification capacity of the corresponding feature data, the electronic device can screen the feature data based on the obtained information gain ratios, discarding feature data with low discriminative power and keeping feature data with strong classification capacity.
Optionally, a gain ratio threshold is preset in the electronic device, and the electronic device can adjust it dynamically. For example, the electronic device sets the gain ratio threshold according to its current recognition rate for ad click cheating: if the recognition rate is currently low, then based on that result and other reference factors the gain ratio threshold can be set higher; otherwise it is set lower. Using the gain ratio threshold, the electronic device determines, from the N pieces of feature data, the n pieces of feature data whose information gain ratio is greater than the threshold, and uses these n high-gain-ratio pieces of feature data in the subsequent calculation, where n is a positive integer not greater than N. For example, screening by the gain ratio threshold can reduce the feature data of 400 different dimensions for 1000 pieces of ad click data to high-gain-ratio feature data of 20 different dimensions for the same 1000 pieces of click data, in which case the n pieces of feature data number 1000 × 20.
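The threshold-based screening described above can be sketched as follows. The feature names, the gain ratio values and the threshold of 0.10 are hypothetical, and the dynamic adjustment of the threshold is left out because the text does not specify how it is computed.

```python
def select_features(gain_ratios, threshold):
    """Return names of features whose gain ratio exceeds the threshold, highest first."""
    kept = [(name, g) for name, g in gain_ratios.items() if g > threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in kept]

# Hypothetical gain ratios for four candidate features (N = 4).
gain_ratios = {
    "clicks_per_ip": 0.42,
    "ua_entropy": 0.05,
    "click_interval": 0.31,
    "hour_of_day": 0.01,
}
selected = select_features(gain_ratios, threshold=0.10)  # n = len(selected)
```

Only the selected columns would then be passed to the Gaussian model, so raising the threshold trades a smaller, more discriminative feature set against the risk of discarding a weak but useful signal.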
Optionally, the electronic device then calls the preset Gaussian model and inputs the n pieces of feature data to obtain the n probability densities corresponding to them, and from these obtains the probability density product of the n pieces of feature data.
Optionally, the Gaussian model calculates the probability density product of the n pieces of feature data as in formula 4:

$$p(x) = \prod_{j=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma_j} \exp\left(-\frac{(x_j - \mu_j)^2}{2\sigma_j^2}\right) \tag{4}$$

where μ_j and σ_j² denote the mean and variance of the normal distribution of the j-th piece of feature data, x_j is its value, and p(x) is the probability density product.
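The probability density product described above can be sketched as follows, assuming one independent univariate normal distribution per selected feature with mean and variance estimated from the data. The function names and sample values are hypothetical; the sum is accumulated in log space only to limit floating-point underflow, and the sketch assumes every feature has non-zero variance.

```python
import math

def fit_gaussian(columns):
    """Estimate (mean, variance) for each feature column."""
    params = []
    for col in columns:
        mu = sum(col) / len(col)
        var = sum((x - mu) ** 2 for x in col) / len(col)
        params.append((mu, var))
    return params

def density_product(row, params):
    """Product of the univariate normal densities of one click's n feature values."""
    log_p = 0.0
    for x, (mu, var) in zip(row, params):
        log_p += -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
    return math.exp(log_p)

# Two hypothetical selected features observed over three clicks.
columns = [[1.0, 2.0, 3.0], [10.0, 12.0, 14.0]]
params = fit_gaussian(columns)
typical = density_product([2.0, 12.0], params)   # close to both means
unusual = density_product([9.0, 30.0], params)   # far from both means
```

A click whose feature values sit near the estimated means yields a comparatively large product, while a click far from them yields a product close to zero.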
Optionally, based on the obtained probability density product of the n pieces of feature data, the electronic device can determine whether there is click cheating in the M pieces of click data.
In this embodiment, a preset threshold is also set in the electronic device in advance, and the electronic device can adjust it dynamically. For example, the electronic device sets the preset threshold according to its current recognition rate for ad click cheating: if the recognition rate is currently low, then based on that result and other reference factors the preset threshold can be set lower; otherwise the preset threshold is set higher. On this basis, the electronic device determines whether each piece of click data is click cheating according to whether its probability density product is less than the preset threshold: a probability density product less than the preset threshold indicates that the piece of click data is click cheating.
It will be understood that the smaller the probability density product, the more likely the corresponding click data is click cheating. Because the determination is made by comparing the probability density product with the preset threshold, it is probability-based, so newly emerging cheating techniques can also be identified, which allows the electronic device to detect zero-day attacks effectively.
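The comparison against the preset threshold can be sketched as follows. The product values and the threshold of 1e-4 are made up for illustration, and the function name `flag_click_cheating` is an assumption.

```python
def flag_click_cheating(density_products, threshold):
    """Return indices of clicks whose probability density product is below the threshold."""
    return [i for i, p in enumerate(density_products) if p < threshold]

# Hypothetical density products for M = 5 pieces of click data.
products = [0.12, 0.09, 3.0e-7, 0.15, 1.2e-9]
suspect = flag_click_cheating(products, threshold=1e-4)
has_cheating = bool(suspect)  # whether any of the M clicks is judged to be cheating
```

Because the rule depends only on how improbable a click is under the fitted model, it needs no blacklist of known cheating sources, which is the property the text relies on for detecting previously unseen techniques.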
3rd embodiment
Referring to Fig. 3, an embodiment of the present application provides an ad click cheating monitoring device 100. The ad click cheating monitoring device 100 is applied to an electronic device and includes:
A data obtaining module 110, configured to obtain, based on M pieces of click data of an advertisement, the various related data in the M pieces of click data, where M is a positive integer.
A feature extraction module 120, configured to associate the various related data in the M pieces of click data by identical dimension, combine them by different dimensions, and compute N pieces of feature data, where N is a positive integer.
A feature selection module 130, configured to obtain the information gain ratio of each piece of feature data in the N pieces of feature data, where each information gain ratio indicates the classification capacity of the corresponding feature data.
A click cheating determining module 140, configured to call a preset Gaussian model, input the n pieces of feature data with a high information gain ratio among the N pieces of feature data, and determine whether there is click cheating in the M pieces of click data, where n is a positive integer not greater than N.
The feature selection module 130 is further configured to apply a Box-Cox transformation to each piece of feature data in the N pieces of feature data to obtain the transformation result data of each piece of feature data, and to calculate the gain ratio based on the transformation result data of each piece of feature data to obtain its information gain ratio.
The feature selection module 130 is further configured to calculate the entropy of the transformation result data of each piece of feature data, and to calculate the conditional entropy of the transformation result data of each piece of feature data with respect to the original class label, where the original class label is the ground-truth mark of whether a piece of click data is click cheating; and to obtain the information gain ratio of each piece of feature data according to the entropy of each piece of feature data, the conditional entropy, and the information entropy of the original class label.
The click cheating determining module 140 is further configured to determine the n pieces of feature data with a high information gain ratio from the N pieces of feature data; to call the preset Gaussian model to calculate the probability density of each piece of feature data in the n pieces of feature data and obtain the probability density product corresponding to the n pieces of feature data; and to determine, according to the probability density product, whether there is click cheating in the M pieces of click data.
The click cheating determining module 140 is further configured to obtain, from the probability density product, the probability density product corresponding to each piece of click data in the M pieces of click data; and to determine whether each piece of click data is click cheating according to whether its probability density product is less than a preset threshold, where a probability density product less than the preset threshold indicates that the piece of click data is click cheating.
It should be noted that, as will be clear to those skilled in the art, for convenience and brevity of description, the specific working processes of the system, device and units described above may refer to the corresponding processes in the foregoing method embodiments, and are not repeated here.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the embodiments may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the embodiments may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical memory, etc.) containing computer-usable program code.
In summary, the embodiments of the present application provide an ad click cheating monitoring method and device. The method includes: obtaining, based on M pieces of click data of an advertisement, the various related data in the M pieces of click data, where M is a positive integer; associating the various related data in the M pieces of click data by identical dimension, combining them by different dimensions, and computing N pieces of feature data, where N is a positive integer; obtaining the information gain ratio of each piece of feature data in the N pieces of feature data, where each information gain ratio indicates the classification capacity of the corresponding feature data; and calling a preset Gaussian model, inputting the n pieces of feature data with a high information gain ratio among the N pieces of feature data, and determining whether the M pieces of click data contain click data that is click cheating, where n is a positive integer not greater than N.
Features are extracted by parsing the M pieces of click data of the advertisement and associating them by identical dimension, so that N pieces of feature data across the dimensions are obtained from the M pieces of click data, and the information gain ratio of each piece of feature data can then be calculated. Since the information gain ratio indicates the classification capacity of the feature data, calling the preset Gaussian model on the n pieces of feature data with a high information gain ratio among the N pieces of feature data makes it possible to determine accurately whether the M pieces of click data contain click data that is click cheating. Click cheating is thus monitored without the resource cost of maintaining and updating a blacklist. When a new click cheating technique appears, it can also be identified by analyzing the features of the new technique that have strong classification capacity, which greatly improves the security of protection against click cheating.
The above are only preferred embodiments of the present application and are not intended to limit it. For those skilled in the art, the present application may have various modifications and changes. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present application shall be included within its scope of protection. It should also be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it need not be further defined or explained in subsequent drawings.
The above are only specific embodiments of the present application, but the scope of protection of the present application is not limited thereto. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed in the present application shall be covered by the scope of protection of the present application. Therefore, the scope of protection of the present application shall be subject to the scope of protection of the claims.

Claims (10)

  1. An ad click cheating monitoring method, characterized in that the method comprises:
    obtaining, based on M pieces of click data of an advertisement, the various related data in the M pieces of click data, where M is a positive integer;
    associating the various related data in the M pieces of click data by identical dimension, combining them by different dimensions, and computing N pieces of feature data, where N is a positive integer;
    obtaining the information gain ratio of each piece of feature data in the N pieces of feature data, wherein each information gain ratio indicates the classification capacity of the corresponding feature data;
    calling a preset Gaussian model, inputting the n pieces of feature data with a high information gain ratio among the N pieces of feature data, and determining whether there is click cheating in the M pieces of click data, where n is a positive integer not greater than N.
  2. The ad click cheating monitoring method according to claim 1, characterized in that obtaining the information gain ratio of each piece of feature data in the N pieces of feature data comprises:
    applying a Box-Cox transformation to each piece of feature data in the N pieces of feature data to obtain the transformation result data of each piece of feature data;
    performing feature selection based on the transformation result data of each piece of feature data to obtain the information gain ratio of each piece of feature data.
  3. The ad click cheating monitoring method according to claim 2, characterized in that performing feature selection based on the transformation result data of each piece of feature data to obtain the information gain ratio of each piece of feature data comprises:
    calculating the entropy of the transformation result data of each piece of feature data, and calculating the conditional entropy of the transformation result data of each piece of feature data with respect to an original class label, wherein the original class label is the ground-truth mark of whether a piece of click data is click cheating;
    obtaining the information gain ratio of each piece of feature data according to the entropy of each piece of feature data, the conditional entropy, and the information entropy of the original class label.
  4. The ad click cheating monitoring method according to any one of claims 1-3, characterized in that calling the preset Gaussian model, inputting the n pieces of feature data with a high information gain ratio among the N pieces of feature data, and determining whether there is click cheating in the M pieces of click data comprises:
    determining the n pieces of feature data with a high information gain ratio from the N pieces of feature data;
    calling the preset Gaussian model to calculate the probability density of each piece of feature data in the n pieces of feature data, and obtaining the probability density product corresponding to the n pieces of feature data;
    determining, according to the probability density product, whether there is click cheating in the M pieces of click data.
  5. The ad click cheating monitoring method according to claim 4, characterized in that determining, according to the probability density product, whether there is click cheating in the M pieces of click data comprises:
    obtaining, from the probability density product, the probability density product corresponding to each piece of click data in the M pieces of click data;
    determining whether each piece of click data is click cheating according to whether the probability density product of that piece of click data is less than a preset threshold, wherein a probability density product less than the preset threshold indicates that the piece of click data is click cheating.
  6. An ad click cheating monitoring device, characterized in that the device comprises:
    a data obtaining module, configured to obtain, based on M pieces of click data of an advertisement, the various related data in the M pieces of click data, where M is a positive integer;
    a feature extraction module, configured to associate the various related data in the M pieces of click data by identical dimension, combine them by different dimensions, and compute N pieces of feature data, where N is a positive integer;
    a feature selection module, configured to obtain the information gain ratio of each piece of feature data in the N pieces of feature data, wherein each information gain ratio indicates the classification capacity of the corresponding feature data;
    a click cheating determining module, configured to call a preset Gaussian model, input the n pieces of feature data with a high information gain ratio among the N pieces of feature data, and determine whether there is click cheating in the M pieces of click data, where n is a positive integer not greater than N.
  7. The ad click cheating monitoring device according to claim 6, characterized in that:
    the feature selection module is further configured to apply a Box-Cox transformation to each piece of feature data in the N pieces of feature data to obtain the transformation result data of each piece of feature data, and to calculate the information gain ratio based on the transformation result data of each piece of feature data to obtain the information gain ratio of each piece of feature data.
  8. The ad click cheating monitoring device according to claim 7, characterized in that:
    the feature selection module is further configured to calculate the entropy of the transformation result data of each piece of feature data, and to calculate the conditional entropy of the transformation result data of each piece of feature data with respect to an original class label, wherein the original class label is the ground-truth mark of whether a piece of click data is click cheating; and to obtain the information gain ratio of each piece of feature data according to the entropy of each piece of feature data, the conditional entropy, and the information entropy of the original class label.
  9. The ad click cheating monitoring device according to any one of claims 6-8, characterized in that:
    the click cheating determining module is further configured to determine the n pieces of feature data with a high information gain ratio from the N pieces of feature data; to call the preset Gaussian model to calculate the probability density of each piece of feature data in the n pieces of feature data and obtain the probability density product corresponding to the n pieces of feature data; and to determine, according to the probability density product, whether there is click cheating in the M pieces of click data.
  10. The ad click cheating monitoring device according to claim 9, characterized in that:
    the click cheating determining module is further configured to obtain, from the probability density product, the probability density product corresponding to each piece of click data in the M pieces of click data; and to determine whether each piece of click data is click cheating according to whether the probability density product of that piece of click data is less than a preset threshold, wherein a probability density product less than the preset threshold indicates that the piece of click data is click cheating.
CN201811040607.7A 2018-09-06 2018-09-06 Ad click cheating monitoring method and device Pending CN109146574A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811040607.7A CN109146574A (en) 2018-09-06 2018-09-06 Ad click cheating monitoring method and device

Publications (1)

Publication Number Publication Date
CN109146574A true CN109146574A (en) 2019-01-04

Family

ID=64827532

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811040607.7A Pending CN109146574A (en) 2018-09-06 2018-09-06 Ad click cheating monitoring method and device

Country Status (1)

Country Link
CN (1) CN109146574A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103248955A (en) * 2013-04-22 2013-08-14 深圳Tcl新技术有限公司 Identity recognition method and device based on intelligent remote control system
CN106355431A (en) * 2016-08-18 2017-01-25 晶赞广告(上海)有限公司 Detection method, device and terminal for cheating traffic
CN106612216A (en) * 2015-10-27 2017-05-03 北京国双科技有限公司 Method and apparatus of detecting website access exception
CN106657141A (en) * 2017-01-19 2017-05-10 西安电子科技大学 Android malware real-time detection method based on network flow analysis
CN106815452A (en) * 2015-11-27 2017-06-09 苏宁云商集团股份有限公司 A kind of cheat detection method and device
CN107168854A (en) * 2017-06-01 2017-09-15 北京京东尚科信息技术有限公司 Detection method, device, equipment and readable storage medium storing program for executing are clicked in Internet advertising extremely
CN108389109A (en) * 2018-02-11 2018-08-10 中国民航信息网络股份有限公司 A kind of suspicious order feature extracting method of civil aviaton based on composite character selection algorithm

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109905738A (en) * 2019-03-26 2019-06-18 湖南快乐阳光互动娱乐传媒有限公司 Video ads show monitoring method and device, storage medium and electronic equipment extremely
CN109905738B (en) * 2019-03-26 2022-03-08 湖南快乐阳光互动娱乐传媒有限公司 Video advertisement abnormal display monitoring method and device, storage medium and electronic equipment
CN111612550A (en) * 2020-05-28 2020-09-01 北京学之途网络科技有限公司 Advertisement trigger cheating identification method and device, electronic equipment and storage medium
CN112202807A (en) * 2020-10-13 2021-01-08 北京明略昭辉科技有限公司 Grayscale replacement method and device for IP (Internet protocol) blacklist, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
US10929879B2 (en) Method and apparatus for identification of fraudulent click activity
CN107168854B (en) Internet advertisement abnormal click detection method, device, equipment and readable storage medium
CN108737535B (en) Message pushing method, storage medium and server
US8620746B2 (en) Scoring quality of traffic to network sites
CN106204108B (en) The anti-cheat method of advertisement and the anti-cheating device of advertisement
CN102831218B (en) Method and device for determining data in thermodynamic chart
CN109146574A (en) Ad click cheating monitoring method and device
CN103605714B (en) The recognition methods of website abnormal data and device
CN107330718B (en) Media anti-cheating method and device, storage medium and terminal
CN108509583A (en) A kind of information-pushing method, server and computer readable storage medium
CN108460627A (en) Marketing activity scheme method for pushing, device, computer equipment and storage medium
TW201826188A (en) Data processing method and system
CN111738770B (en) Advertisement abnormal flow detection method and device
WO2020257991A1 (en) User identification method and related product
CN109146581A (en) A kind of resource allocation methods, device and readable storage medium storing program for executing
CN105631708B (en) Information processing method and device
CN108694603A (en) A kind of method and apparatus of advertisement price
CN112101691A (en) Method and device for dynamically adjusting risk level and server
CN112650921A (en) Object recommendation method, device, equipment and storage medium
CN111563765A (en) Cheating user screening method, device and equipment and readable storage medium
CN101449284A (en) Scoring quality of traffic to network sites using interrelated traffic parameters
CN106878410A (en) The detection method and device of a kind of request of data
CN113011912A (en) Media information processing method, device, electronic equipment and storage medium
CN112468444A (en) Internet domain name abuse identification method and device independent of content analysis
CN113191800B (en) Method and device for counting advertisement click rate on APP

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190104
