CN109146574A - Ad click cheating monitoring method and device - Google Patents
- Publication number
- CN109146574A (application number CN201811040607.7A)
- Authority
- CN
- China
- Prior art keywords
- characteristic
- click
- data
- ratio
- cheating
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0241—Advertisements
- G06Q30/0248—Avoiding fraud
Abstract
Embodiments of the present application provide an ad click cheating monitoring method and device, relating to the field of data processing. The method includes: obtaining, based on M click data records of an advertisement, various related data in the M click data records, where M is a positive integer; associating the related data along identical dimensions and combining them across different dimensions to calculate N feature data items, where N is a positive integer; obtaining the information gain ratio of each of the N feature data items; and calling a preset Gaussian model with, as input, the n feature data items having high information gain ratios among the N feature data items, to determine whether any of the M click data records is a cheating click, where n is a positive integer not greater than N. The method monitors click cheating without the resource cost of maintaining and updating a blacklist. When a new click cheating technique appears, it can also be identified by analyzing the features with strong classification capability in the new cheating traffic, which greatly improves the security of anti-click-cheating protection.
Description
Technical field
This application relates to the field of data processing, and in particular to an ad click cheating monitoring method and device.
Background art
With the widespread use of mobile devices, the advertising market has expanded rapidly. Traffic providers deliver advertisements to users while they use mobile terminals, bringing advertisers the desired conversions through user behaviors such as exposure, clicks, download and installation, activation, and purchase, while earning profit for themselves. Consequently, mobile advertising cheating based on forged traffic has emerged. Under the currently mainstream CPC (Cost Per Click) billing model, anti-cheating measures mainly focus on identifying fake clicks.
Currently, most anti-cheating techniques for ad clicks rely on blacklists. For example, a blacklist is established to exclude all clicks from anonymous or proxy IPs, high-risk clicks, and clicks from new device IDs, filtering suspicious traffic at the access source; in addition, click cheating is identified by detecting clicks generated by the same device model, UA, or IP that are excessive or overly concentrated. However, such anti-cheating methods require the blacklist to be maintained and updated in real time, which consumes considerable resources. Moreover, once a new click cheating technique appears, the existing blacklist often fails to identify it, which can cause huge losses.
Summary of the invention
The purpose of this application is to provide an ad click cheating monitoring method and device that effectively remedy the above defects.
To achieve the above purpose, the embodiments of this application are implemented as follows:
In a first aspect, an embodiment of the present application provides an ad click cheating monitoring method. The method includes: obtaining, based on M click data records of an advertisement, various related data in the M click data records, where M is a positive integer; associating the various related data in the M click data records along identical dimensions and combining them across different dimensions to calculate N feature data items, where N is a positive integer; obtaining the information gain ratio of each of the N feature data items, where each information gain ratio indicates the classification capability of the corresponding feature data item; and calling a preset Gaussian model with, as input, the n feature data items having high information gain ratios among the N feature data items, to determine whether there is click cheating in the M click data records, where n is a positive integer not greater than N.
With reference to the first aspect, in some possible implementations, obtaining the information gain ratio of each of the N feature data items includes: applying a Box-Cox transformation to each of the N feature data items to obtain the transformed data of each feature data item; and calculating the information gain ratio based on the transformed data of each feature data item, to obtain the information gain ratio of each feature data item.
With reference to the first aspect, in some possible implementations, calculating the information gain ratio based on the transformed data of each feature data item to obtain its information gain ratio includes: calculating the entropy of the transformed data of each feature data item, and calculating the conditional entropy of the transformed data of each feature data item with respect to the original class label, where the original class label is the ground-truth label indicating whether a click data record is click cheating; and obtaining the information gain ratio of each feature data item from its entropy, the conditional entropy, and the information entropy of the original class label.
With reference to the first aspect, in some possible implementations, calling the preset Gaussian model with the n feature data items having high information gain ratios among the N feature data items as input, to determine whether there is a cheating click among the M click data records, includes: determining, from the N feature data items, the n feature data items with high information gain ratios; calling the preset Gaussian model to calculate the probability density of each of the n feature data items, to obtain the probability density product corresponding to the n feature data items; and determining, according to the probability density product, whether there is click cheating in the M click data records.
With reference to the first aspect, in some possible implementations, determining, according to the probability density product, whether there is click cheating in the M click data records includes: obtaining, from the probability density product, the probability density product corresponding to each of the M click data records; and determining whether each click data record is click cheating according to whether its probability density product is less than a preset threshold, where a probability density product below the preset threshold indicates that the click data record is click cheating.
In a second aspect, an embodiment of the present application provides an ad click cheating monitoring device. The device includes: a data obtaining module, configured to obtain, based on M click data records of an advertisement, various related data in the M click data records, where M is a positive integer; a feature extraction module, configured to associate the various related data in the M click data records along identical dimensions and combine them across different dimensions to calculate N feature data items, where N is a positive integer; a feature selection module, configured to obtain the information gain ratio of each of the N feature data items, where each information gain ratio indicates the classification capability of the corresponding feature data item; and a click cheating determining module, configured to call a preset Gaussian model with, as input, the n feature data items having high information gain ratios among the N feature data items, to determine whether there is click cheating in the M click data records, where n is a positive integer not greater than N.
With reference to the second aspect, in some possible implementations, the feature selection module is further configured to apply a Box-Cox transformation to each of the N feature data items to obtain the transformed data of each feature data item, and to calculate the information gain ratio based on the transformed data of each feature data item, obtaining the information gain ratio of each feature data item.
With reference to the second aspect, in some possible implementations, the feature selection module is further configured to calculate the entropy of the transformed data of each feature data item and the conditional entropy of the transformed data of each feature data item with respect to the original class label, where the original class label is the ground-truth label indicating whether a click data record is click cheating; and to obtain the information gain ratio of each feature data item from its entropy, the conditional entropy, and the information entropy of the original class label.
With reference to the second aspect, in some possible implementations, the click cheating determining module is further configured to determine, from the N feature data items, the n feature data items with high information gain ratios; to call the preset Gaussian model to calculate the probability density of each of the n feature data items, obtaining the probability density product corresponding to the n feature data items; and to determine, according to the probability density product, whether there is click cheating in the M click data records.
With reference to the second aspect, in some possible implementations, the click cheating determining module is further configured to obtain, from the probability density product, the probability density product corresponding to each of the M click data records, and to determine whether each click data record is click cheating according to whether its probability density product is less than a preset threshold, where a probability density product below the preset threshold indicates that the click data record is click cheating.
In a third aspect, an embodiment of the present application provides an electronic device including a processor, a memory, a bus, and a communication module. The processor, the communication module, and the memory are connected via the bus. The memory is configured to store a program. The processor is configured to call the program stored in the memory to execute the ad click cheating monitoring method according to the first aspect or any possible implementation of the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing non-volatile program code executable by a processor, where the program code causes the processor to execute the ad click cheating monitoring method according to the first aspect or any possible implementation of the first aspect.
The beneficial effects of the embodiments of the present application are as follows:
The M click data records of an advertisement are parsed, and features are extracted by associating data along identical dimensions and combining data across different dimensions, so that N feature data items are obtained statistically from the M click data records and the information gain ratio of each feature data item can then be calculated. Because the information gain ratio indicates the classification capability of a feature, calling the preset Gaussian model with the n feature data items having high information gain ratios among the N feature data items as input makes it possible to determine whether there is click cheating in the M click data records. Click cheating is thus monitored without the resource cost of maintaining and updating a blacklist. Moreover, when a new click cheating technique appears, it can be identified by analyzing the features with strong classification capability in the new cheating traffic, which greatly improves the security of anti-click-cheating protection.
To make the above objectives, features, and advantages of the present application clearer and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief description of the drawings
To describe the technical solutions of the embodiments of the present application more clearly, the drawings required in the embodiments are briefly introduced below. It should be understood that the following drawings show only some embodiments of the present application and therefore should not be regarded as limiting its scope; a person of ordinary skill in the art may derive other related drawings from these drawings without creative effort.
Fig. 1 is a structural block diagram of an electronic device according to a first embodiment of the present application;
Fig. 2 is a flowchart of an ad click cheating monitoring method according to a second embodiment of the present application;
Fig. 3 is a structural block diagram of an ad click cheating monitoring method according to a third embodiment of the present application.
Detailed description of the embodiments
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present application. The components of the embodiments generally described and illustrated in the drawings herein may be arranged and designed in a variety of different configurations. Therefore, the following detailed description of the embodiments provided in the drawings is not intended to limit the scope of the claimed application but merely represents selected embodiments. All other embodiments obtained by a person skilled in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.
It should also be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it need not be further defined or explained in subsequent drawings. The terms "first", "second", and the like are used only to distinguish descriptions and shall not be understood as indicating or implying relative importance.
First embodiment
Referring to Fig. 1, an embodiment of the present application provides an electronic device 10, which may include a memory 11, a communication module 12, a bus 13, and a processor 14. The processor 14, the communication module 12, and the memory 11 are connected via the bus 13. The processor 14 is configured to execute executable modules, such as computer programs, stored in the memory 11. The components and structure of the electronic device 10 shown in Fig. 1 are illustrative rather than restrictive; the electronic device 10 may also have other components and structures as needed.
The memory 11 may include high-speed random access memory (RAM) and may further include non-volatile memory, for example at least one magnetic disk storage. In this embodiment, the memory 11 stores the program required to execute the ad click cheating monitoring method.
The bus 13 may be an ISA bus, a PCI bus, an EISA bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one double-headed arrow is shown in Fig. 1, which does not mean that there is only one bus or one type of bus.
The processor 14 may be an integrated circuit chip with signal processing capability. During implementation, each step of the above method may be completed by an integrated logic circuit of hardware in the processor 14 or by instructions in the form of software. The processor 14 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of the present invention may be directly executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software module may reside in a storage medium mature in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or a register.
The method performed by the process-defined apparatus disclosed in any embodiment of the present invention may be applied to, or implemented by, the processor 14. After receiving an execution instruction via the bus 13 and calling the program stored in the memory 11, the processor 14 controls the communication module 12 via the bus 13 to execute the flow of the ad click cheating monitoring method.
Second embodiment
This embodiment provides an ad click cheating monitoring method. It should be noted that the steps shown in the flowchart of the drawings may be executed in a computer system, for example as a set of computer-executable instructions; and although a logical order is shown in the flowchart, in some cases the steps shown or described may be executed in an order different from that described herein. This embodiment is described in detail below.
Referring to Fig. 2, the ad click cheating monitoring method provided in this embodiment includes step S100, step S200, step S300, and step S400.
Step S100: based on M click data records of an advertisement, obtain the various related data in the M click data records, where M is a positive integer.
Step S200: associate the various related data in the M click data records along identical dimensions and combine them across different dimensions to calculate N feature data items, where N is a positive integer.
Step S300: obtain the information gain ratio of each of the N feature data items, where each information gain ratio indicates the classification capability of the corresponding feature data item.
Step S400: call a preset Gaussian model with, as input, the n feature data items having high information gain ratios among the N feature data items, to determine whether there is click cheating in the M click data records, where n is a positive integer not greater than N.
The solution of the present application is described in detail below.
Step S100: based on M click data records of an advertisement, obtain the various related data in the M click data records, where M is a positive integer.
The electronic device may obtain M click data records of an advertisement, where M is a positive integer. For example, if the advertisement is placed on a website, each user's click on the advertisement on that website causes the server running the website to store a corresponding click data record, and the electronic device may then obtain the M click data records of the advertisement offline from the server. As another example, the electronic device may obtain 1000 click data records of an advertisement.
Each of the M click data records may include data such as the ad click time, the user-device UA (User Agent), the device ID IDFA (Identifier for Advertisers), the user IP, and the referer (reference field). It can be understood that the raw data in each click data record may carry insufficient information. To enrich it, the electronic device may parse each click data record to extract, from its raw data, various related data carrying more information. For example, by parsing the UA in each click data record, the electronic device can obtain information such as the device brand, device model, operating system and version, browser and version, and device type; by parsing the user IP in each click data record, it can obtain information such as the user's country, city, region, latitude and longitude, autonomous system, ISP (Internet Service Provider), and organization. In addition, for the user IP in each click data record, the electronic device may take the first 24 bits of the IP, so that subsequent calculations can be performed on the network address of the user IP.
Obtaining the various related data of the M click data records gives the M records a larger amount of information overall, which benefits the accuracy of subsequent calculations. For example, if the raw data of each click data record contains 10 pieces of information, parsing can expand it to 500 pieces, so the total amount of information across the M click data records grows from M*10 to M*500.
Step S200: associate the various related data in the M click data records along identical dimensions and combine them across different dimensions to calculate N feature data items, where N is a positive integer.
In this embodiment, not every click data record yields every piece of information when parsed; for example, the device model can be parsed from some click data records but is missing from others. Therefore, after parsing, the electronic device also needs to judge whether each parsed piece of information is usable. Specifically, since there are M click data records in total, each piece of information should also total M when nothing is missing. The electronic device can therefore determine whether each parsed piece of information is usable by judging whether the ratio of its total count in the various related data of the M click data records to M is greater than a preset ratio, where the preset ratio may be 0.5.
When the ratio of the total count of a parsed piece of information to M is greater than the preset ratio, the electronic device may determine that this information is usable. Within this case, if the ratio is 1, the information is not missing at all. For example, if the M click data records yield M values of operating system and version, the operating system and version are not missing, and the M values can be used directly later. If the ratio is between 0.5 and 1, the information is partially missing, and the electronic device needs to supplement the missing values according to a preset rule. For example, if the M click data records yield only 2/3M country values, the country information is partially missing; the electronic device may supplement the 1/3M missing country values according to a preset rule, for example by filling them with Other, so that the M country values can be used later.
When the ratio of the total count of a parsed piece of information to M is less than the preset ratio, the electronic device may determine that this information is unusable and discard it, so that using it in subsequent calculations does not affect the accuracy of the results.
It should be noted that, to facilitate subsequent calculations, the click time in each click data record may also be expressed in units of periods within a day. For example, the 24 hours of a day may be divided into 4 periods according to people's daily routines: early morning, morning, afternoon, and evening. The click times in the M click data records are then mapped to these periods for subsequent calculations.
As an optional manner, since the many pieces of information in the various related data parsed from each click data record are not convenient for direct calculation, this embodiment may take the M values of any one category of the various related data as an information group, associate each information group of an identical dimension with at least one other information group, and merge these associations into feature data of different dimensions. In this way, by associating the various related data in the M click data records along identical dimensions, the electronic device obtains N feature data items across the dimensions, where N is also a positive integer.
For example, a feature of a certain dimension may be: within one day, the number of distinct device UAs from the same user IP. Or: within one day, the number of distinct device IDFAs from the same user IP. Or: within one day, the ratio of distinct user IPs from the same country, where this ratio = the number of distinct user IPs from the same country / the number of all clicks from the same country. Or: within one day, the variance of the number of distinct device models from the same user IP across different periods. Or: within one day, the standard deviation of the number of distinct IDFAs from the same user IP across different periods. Or: within one day, the distribution, as proportions over the multiple periods of the day, of the number of distinct device brands from the same user IP.
In this way, by computing statistics over the M click data records to obtain N feature data items of different dimensions, the original 10 attributes of 1000 clicks on an advertisement can, for example, be converted into feature data of 400 different dimensions, where the feature data of each dimension contains 1000 values.
Step S300: obtain the information gain ratio of each of the N feature data items, where each information gain ratio indicates the classification capability of the corresponding feature data item.
In this embodiment, the information gain ratio of each feature data item may be calculated and used to select the feature data with strong classification capability, so that subsequent calculations use only these features; this preserves the detection effect while reducing the load on the electronic device.
Specifically, the subsequent calculation uses a Gaussian model, but the feature data items among the N feature data items typically follow long-tailed distributions that are unsuitable for a Gaussian model. To facilitate the subsequent Gaussian-model calculation, the electronic device may apply a Box-Cox transformation to each of the N feature data items to obtain the transformed data of each feature data item. It can be understood that the transformed data of each feature data item is closer to a normal distribution and can therefore be used in the Gaussian model.
Optionally, the Box-Cox transformation is given by formula 1:

$$y(\lambda)=\begin{cases}\dfrac{y^{\lambda}-1}{\lambda}, & \lambda\neq 0\\[4pt] \ln y, & \lambda=0\end{cases}\qquad(1)$$

where y is the feature data, y(λ) is the transformed data of the feature data, and λ is the parameter of the Box-Cox transformation.
In this embodiment, using the transformed data of each feature data item after the Box-Cox transformation, the electronic device may calculate the information gain ratio of each feature data item from its transformed data, and thereby determine the feature data items with high information gain ratios.
Optionally, the electronic device may calculate the entropy of the transformed data of each feature data item, and the conditional entropy of the transformed data of each feature data item with respect to the original class label, where the original class label is the ground-truth label indicating whether a click data record is click cheating.
It can be understood that calculating this conditional entropy measures how strongly the transformed data of each feature data item determines the original class label: the stronger that influence, the smaller the calculated conditional entropy, and the stronger the classification capability of that feature data item.
Optionally, the electronic device also precomputes the information entropy of the original class label. From the entropy of each feature data item, its conditional entropy with respect to the original class label, and the information entropy of the original class label, the electronic device can then calculate the information gain ratio of each feature data item.
Optionally, the information gain ratio of each feature data item is obtained with formulas 2 and 3:

$$H(X)=-\sum_{i} p_i \log p_i \qquad(2)$$

$$g_R(D,A)=\frac{H(D)-H(D\mid A)}{H(A)} \qquad(3)$$

where H(X) denotes the entropy of a random variable X, which may be the transformed data of a feature data item; p_i denotes the probability of the i-th value of the discrete random variable; g_R(D, A) denotes the information gain ratio of the feature data item; H(D) denotes the information entropy of the original class label; H(D | A) denotes the conditional entropy of the original class label given the transformed data of the feature data item; and H(A) denotes the entropy of the transformed data of the feature data item.
Step S400: call a preset Gaussian model with, as input, the n feature data items having high information gain ratios among the N feature data items, to determine whether there is click cheating in the M click data records, where n is a positive integer not greater than N.
Since each information gain ratio indicates the classification capability of the corresponding feature data item, the electronic device can screen the feature data items based on their information gain ratios, discarding those with low information gain ratios and keeping those with strong classification capability.
Optionally, an information gain ratio threshold is preset in the electronic device. The electronic device can adjust this threshold dynamically by itself, for example according to its current recognition rate for ad click fraud: if the current recognition rate is low, then, based on that result and some other reference factors, the gain ratio threshold can be raised dynamically; otherwise it can be lowered. Based on this threshold, the electronic device determines, from the N pieces of feature data, the n pieces of feature data whose information gain ratio exceeds the threshold, and uses these n high-gain-ratio feature data in subsequent calculations, n being a positive integer not greater than N. For example, threshold-based screening can reduce the feature data of 1000 click records of an advertisement across 400 different dimensions to high-gain-ratio feature data across 20 dimensions for the same 1000 click records, in which case n is 1000 × 20.
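The screening step can be sketched as below, assuming the gain ratios are held in a dict keyed by feature name. The feature names, the target recognition rate and the step size in `adjust_threshold` are hypothetical; the text only states the direction of the adjustment (a lower recognition rate leads to a higher gain ratio threshold), not a concrete update rule.

```python
def select_features(gain_ratios, threshold):
    """Return the features whose information gain ratio exceeds threshold, highest first."""
    kept = [(name, g) for name, g in gain_ratios.items() if g > threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in kept]


def adjust_threshold(threshold, recognition_rate, target_rate, step=0.01):
    """Hypothetical update rule: raise the threshold when the fraud recognition rate
    is below target, otherwise lower it (never below zero)."""
    if recognition_rate < target_rate:
        return threshold + step
    return max(0.0, threshold - step)


ratios = {"ip_clicks": 0.42, "hour_of_day": 0.05, "ip_distinct_devices": 0.31}
selected = select_features(ratios, 0.1)  # ["ip_clicks", "ip_distinct_devices"]
```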
Optionally, the electronic device calls the preset Gaussian model with the n pieces of feature data as input to obtain the n probability densities corresponding to them, and then obtains the probability density product corresponding to the n pieces of feature data.
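Optionally, the Gaussian model calculates the probability density product of the n pieces of feature data with Formula 4:

$$p(x) = \prod_{j} \frac{1}{\sqrt{2\pi}\,\sigma_j} \exp\left(-\frac{(x_j - \mu_j)^2}{2\sigma_j^2}\right) \qquad \text{(Formula 4)}$$

where j indexes the selected features of a click record, x_j is the value of the j-th feature, μ_j and σ_j are the mean and standard deviation of the normal distribution for that feature (σ_j² being its variance), and p(x) is the probability density product.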
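A minimal sketch of the density product, assuming one independent normal distribution per selected feature whose mean and standard deviation are estimated from the click records themselves (the text does not say how μ and σ are fitted). `min_sigma` is a hypothetical floor that avoids dividing by zero for a constant feature. With many features, summing log densities instead of multiplying densities avoids floating-point underflow; that is a common alternative, not something the text specifies.

```python
import math


def fit_gaussian(column, min_sigma=1e-6):
    """Maximum-likelihood mean and standard deviation of one feature column."""
    n = len(column)
    mu = sum(column) / n
    var = sum((x - mu) ** 2 for x in column) / n
    return mu, max(math.sqrt(var), min_sigma)


def density_product(x, params):
    """p(x) = prod_j N(x_j; mu_j, sigma_j^2) for one click's selected features."""
    p = 1.0
    for x_j, (mu, sigma) in zip(x, params):
        p *= math.exp(-((x_j - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)
    return p


# Rows are click records, columns are the selected (already transformed) features.
rows = [[1.0, 2.0], [1.2, 2.1], [0.8, 1.9], [5.0, 9.0]]
params = [fit_gaussian(col) for col in zip(*rows)]
products = [density_product(r, params) for r in rows]
```

The last row lies far from the other three on both features, so it receives the smallest density product.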
Optionally, based on the probability density product obtained for the n pieces of feature data, the electronic device can determine whether there is click fraud in the M click records.
In this embodiment, a preset threshold is also set in the electronic device in advance. The electronic device can adjust the preset threshold dynamically by itself, for example according to its current recognition rate for ad click fraud: if the current recognition rate is low, then, based on that result and some other reference factors, the preset threshold can be lowered dynamically; otherwise it is raised. On this basis, the electronic device determines whether each click record is click fraud according to whether the probability density product of that click record is less than the preset threshold; a probability density product below the preset threshold indicates that the click record is click fraud.
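The decision rule in this step is a single comparison per click record. The sketch below adds one possible way to pick the preset threshold when some clicks carry ground-truth labels: choosing, among candidate thresholds, the one with the best F1 score. That tuning rule is a common choice for Gaussian anomaly detection and is an assumption here; the text only says the threshold is adjusted dynamically from the recognition rate and other reference factors. All function names are hypothetical.

```python
def flag_clicks(products, threshold):
    """Indices of click records whose probability density product is below the preset threshold."""
    return [i for i, p in enumerate(products) if p < threshold]


def f1_for_threshold(products, labels, threshold):
    """F1 score of flag_clicks against ground-truth labels (1 = fraudulent click)."""
    flagged = set(flag_clicks(products, threshold))
    tp = sum(1 for i in flagged if labels[i] == 1)
    fp = len(flagged) - tp
    fn = sum(1 for i, y in enumerate(labels) if y == 1 and i not in flagged)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def pick_threshold(products, labels, candidates):
    """Choose the candidate threshold with the best F1 (an assumed tuning rule)."""
    return max(candidates, key=lambda t: f1_for_threshold(products, labels, t))


products = [0.30, 0.25, 0.001, 0.28, 0.0005]
labels = [0, 0, 1, 0, 1]
best = pick_threshold(products, labels, [0.0001, 0.01, 0.26])  # 0.01
```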
It will be appreciated that the smaller the probability density product, the more likely the click record corresponding to that feature data is click fraud. Because fraud is decided by comparing the probability density product with the preset threshold, newly emerging fraud techniques can also be recognized on a probabilistic basis, so the electronic device can effectively detect zero-day attacks.
Third embodiment
Referring to Fig. 3, an embodiment of the present application provides an ad click cheating monitoring device 100. The ad click cheating monitoring device 100 is applied to an electronic device and comprises:
Data obtaining module 110, configured to obtain, based on M click records of an advertisement, the various related data in the M click records, M being a positive integer.
Feature extraction module 120, configured to associate the various related data in the M click records by identical dimensions, combine them across different dimensions, and compute statistics to obtain N pieces of feature data, N being a positive integer.
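As an illustration of associating related data on one dimension and combining dimensions, the sketch below derives three per-click statistics. The record fields (`ip`, `device_id`, `ad_id`) and the feature names are hypothetical; the text does not list which dimensions or statistics make up the N pieces of feature data.

```python
from collections import Counter, defaultdict


def extract_features(clicks):
    """Compute per-click statistical features from raw click records.

    Each click is a dict with the hypothetical keys 'ip', 'device_id' and 'ad_id'.
    """
    ip_count = Counter(c["ip"] for c in clicks)                   # association on one dimension
    ip_ad_count = Counter((c["ip"], c["ad_id"]) for c in clicks)  # combination of two dimensions
    devices_per_ip = defaultdict(set)
    for c in clicks:
        devices_per_ip[c["ip"]].add(c["device_id"])
    return [
        {
            "ip_clicks": ip_count[c["ip"]],
            "ip_ad_clicks": ip_ad_count[(c["ip"], c["ad_id"])],
            "ip_distinct_devices": len(devices_per_ip[c["ip"]]),
        }
        for c in clicks
    ]


clicks = [
    {"ip": "203.0.113.5", "device_id": "d1", "ad_id": "a"},
    {"ip": "203.0.113.5", "device_id": "d2", "ad_id": "a"},
    {"ip": "198.51.100.7", "device_id": "d3", "ad_id": "b"},
]
features = extract_features(clicks)
```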
Feature selection module 130, configured to obtain the information gain ratio of each piece of feature data in the N pieces of feature data, wherein each information gain ratio indicates the classification capability of the corresponding feature data.
Click fraud determining module 140, configured to call a preset Gaussian model, input the n pieces of feature data with high information gain ratios among the N pieces of feature data, and determine whether there is click fraud in the M click records, n being a positive integer not greater than N.
The feature selection module 130 is further configured to apply a Box-Cox transformation to each piece of feature data in the N pieces of feature data to obtain its transformation result data, and to perform the information gain ratio calculation on the transformation result data to obtain the information gain ratio of each piece of feature data.
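The Box-Cox transformation maps a strictly positive value x to (x^λ - 1)/λ for λ ≠ 0 and to ln x for λ = 0. The sketch below uses a fixed, hypothetical λ and then discretizes the result into equal-width bins so that entropies can be computed; both the fixed λ and the binning scheme are assumptions, since the text specifies neither. In practice λ is often estimated by maximum likelihood, for example with `scipy.stats.boxcox`, which does so when no `lmbda` is given. Features that can be zero (such as counts) would need a positive shift first.

```python
import math


def boxcox(values, lam):
    """Box-Cox transform with a fixed lambda; inputs must be strictly positive."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]


def equal_width_bins(values, k):
    """Assign each value to one of k equal-width bins between min and max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / k
    return [min(int((v - lo) / width), k - 1) for v in values]


clicks_per_ip = [1, 2, 3, 50]           # hypothetical raw feature values
transformed = boxcox(clicks_per_ip, 0)  # lambda = 0 -> natural log
bins = equal_width_bins(transformed, 4)
```

The log transform compresses the long right tail (50 versus 1 to 3), so the extreme value no longer dominates the bin layout as much as it would on the raw scale.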
The feature selection module 130 is further configured to calculate the entropy of the transformation result data of each piece of feature data and the conditional entropy of that transformation result data based on the original class labels, wherein an original class label is the ground-truth mark of whether a click record is click fraud; and to obtain the information gain ratio of each piece of feature data according to its entropy, the conditional entropy, and the information entropy of the original class labels.
The click fraud determining module 140 is further configured to determine the n pieces of feature data with high information gain ratios from the N pieces of feature data; to call the preset Gaussian model to calculate the probability density of each of the n pieces of feature data and obtain the corresponding probability density product; and to determine, according to the probability density product, whether there is click fraud in the M click records.
The click fraud determining module 140 is further configured to obtain, according to the probability density product, the probability density product corresponding to each click record among the M click records; and to determine whether each click record is click fraud according to whether its probability density product is less than a preset threshold, wherein a probability density product less than the preset threshold indicates that the click record is click fraud.
It should be noted that, as will be apparent to those skilled in the art, for convenience and brevity of description, the specific working processes of the systems, devices and units described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. Therefore, the embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the embodiments of the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM and optical storage) containing computer-usable program code.
In conclusion the embodiment of the present application provides a kind of ad click cheating monitoring method and device, method include: base
In M click data of advertisement, the various related datas in M click data are obtained, M is positive integer;It will be in M click data
Various related datas be associated by identical dimensional, be combined by different dimensions, N characteristic, N be calculated
For positive integer;Obtain the information gain-ratio of every characteristic in N characteristic, wherein each information gain-ratio is used for table
Show the size of corresponding each characteristic classification capacity;Preset Gauss model is called to input high information in N characteristic
N characteristic of ratio of profit increase determines whether there is the click data clicked and practised fraud in M click data, and n is just no more than N
Integer.
Feature extraction is performed by parsing the M click records of the advertisement and associating their data by identical dimensions, so that N pieces of feature data across the various dimensions are obtained from the M click records and the information gain ratio of each piece of feature data can then be calculated. Because the information gain ratio indicates the classification capability of the feature data, calling the preset Gaussian model on the n high-gain-ratio pieces of feature data among the N pieces of feature data makes it possible to determine accurately whether the M click records contain fraudulent clicks. This enables monitoring of click fraud and avoids the resource cost of maintaining and updating blacklists. Moreover, when new click fraud techniques appear, they can also be identified by analyzing the features with strong classification capability in the new fraudulent clicks, which greatly improves the security of click fraud protection.
The above are only preferred embodiments of the present application and are not intended to limit it; for those skilled in the art, the present application may have various changes and variations. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present application shall be included within the protection scope of the present application. It should also be noted that similar reference numerals and letters denote similar items in the drawings below; therefore, once an item is defined in one drawing, it need not be further defined or explained in subsequent drawings.
The above are only specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any change or replacement that a person skilled in the art can easily conceive within the technical scope disclosed in the present application shall be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (10)
- 1. An ad click cheating monitoring method, characterized in that the method comprises: obtaining, based on M click records of an advertisement, the various related data in the M click records, M being a positive integer; associating the various related data in the M click records by identical dimensions, combining them across different dimensions, and computing statistics to obtain N pieces of feature data, N being a positive integer; obtaining the information gain ratio of each piece of feature data in the N pieces of feature data, wherein each information gain ratio indicates the classification capability of the corresponding feature data; and calling a preset Gaussian model, inputting the n pieces of feature data with high information gain ratios among the N pieces of feature data, and determining whether there is click fraud in the M click records, n being a positive integer not greater than N.
- 2. The ad click cheating monitoring method according to claim 1, characterized in that obtaining the information gain ratio of each piece of feature data in the N pieces of feature data comprises: applying a Box-Cox transformation to each piece of feature data in the N pieces of feature data to obtain the transformation result data of each piece of feature data; and performing feature selection based on the transformation result data of each piece of feature data to obtain the information gain ratio of each piece of feature data.
- 3. The ad click cheating monitoring method according to claim 2, characterized in that performing feature selection based on the transformation result data of each piece of feature data to obtain the information gain ratio of each piece of feature data comprises: calculating the entropy of the transformation result data of each piece of feature data, and calculating the conditional entropy of the transformation result data of each piece of feature data based on the original class labels, wherein an original class label is the ground-truth mark of whether a click record is click fraud; and obtaining the information gain ratio of each piece of feature data according to its entropy, the conditional entropy, and the information entropy of the original class labels.
- 4. The ad click cheating monitoring method according to any one of claims 1 to 3, characterized in that calling a preset Gaussian model, inputting the n pieces of feature data with high information gain ratios among the N pieces of feature data, and determining whether there is click fraud in the M click records comprises: determining the n pieces of feature data with high information gain ratios from the N pieces of feature data; calling the preset Gaussian model to calculate the probability density of each of the n pieces of feature data to obtain the probability density product corresponding to the n pieces of feature data; and determining, according to the probability density product, whether there is click fraud in the M click records.
- 5. The ad click cheating monitoring method according to claim 4, characterized in that determining, according to the probability density product, whether there is click fraud in the M click records comprises: obtaining, according to the probability density product, the probability density product corresponding to each click record among the M click records; and determining whether each click record is click fraud according to whether its probability density product is less than a preset threshold, wherein a probability density product less than the preset threshold indicates that the click record is click fraud.
- 6. An ad click cheating monitoring device, characterized in that the device comprises: a data obtaining module, configured to obtain, based on M click records of an advertisement, the various related data in the M click records, M being a positive integer; a feature extraction module, configured to associate the various related data in the M click records by identical dimensions, combine them across different dimensions, and compute statistics to obtain N pieces of feature data, N being a positive integer; a feature selection module, configured to obtain the information gain ratio of each piece of feature data in the N pieces of feature data, wherein each information gain ratio indicates the classification capability of the corresponding feature data; and a click fraud determining module, configured to call a preset Gaussian model, input the n pieces of feature data with high information gain ratios among the N pieces of feature data, and determine whether there is click fraud in the M click records, n being a positive integer not greater than N.
- 7. The ad click cheating monitoring device according to claim 6, characterized in that the feature selection module is further configured to apply a Box-Cox transformation to each piece of feature data in the N pieces of feature data to obtain the transformation result data of each piece of feature data, and to perform the information gain ratio calculation based on the transformation result data of each piece of feature data to obtain the information gain ratio of each piece of feature data.
- 8. The ad click cheating monitoring device according to claim 7, characterized in that the feature selection module is further configured to calculate the entropy of the transformation result data of each piece of feature data, and to calculate the conditional entropy of the transformation result data of each piece of feature data based on the original class labels, wherein an original class label is the ground-truth mark of whether a click record is click fraud; and to obtain the information gain ratio of each piece of feature data according to its entropy, the conditional entropy, and the information entropy of the original class labels.
- 9. The ad click cheating monitoring device according to any one of claims 6 to 8, characterized in that the click fraud determining module is further configured to determine the n pieces of feature data with high information gain ratios from the N pieces of feature data; to call the preset Gaussian model to calculate the probability density of each of the n pieces of feature data to obtain the probability density product corresponding to the n pieces of feature data; and to determine, according to the probability density product, whether there is click fraud in the M click records.
- 10. The ad click cheating monitoring device according to claim 9, characterized in that the click fraud determining module is further configured to obtain, according to the probability density product, the probability density product corresponding to each click record among the M click records; and to determine whether each click record is click fraud according to whether its probability density product is less than a preset threshold, wherein a probability density product less than the preset threshold indicates that the click record is click fraud.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811040607.7A CN109146574A (en) | 2018-09-06 | 2018-09-06 | Ad click cheating monitoring method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811040607.7A CN109146574A (en) | 2018-09-06 | 2018-09-06 | Ad click cheating monitoring method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109146574A true CN109146574A (en) | 2019-01-04 |
Family
ID=64827532
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811040607.7A Pending CN109146574A (en) | 2018-09-06 | 2018-09-06 | Ad click cheating monitoring method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109146574A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109905738A (en) * | 2019-03-26 | 2019-06-18 | 湖南快乐阳光互动娱乐传媒有限公司 | Video advertisement abnormal display monitoring method and device, storage medium and electronic equipment |
CN111612550A (en) * | 2020-05-28 | 2020-09-01 | 北京学之途网络科技有限公司 | Advertisement trigger cheating identification method and device, electronic equipment and storage medium |
CN112202807A (en) * | 2020-10-13 | 2021-01-08 | 北京明略昭辉科技有限公司 | Grayscale replacement method and device for IP (Internet protocol) blacklist, electronic equipment and storage medium |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103248955A (en) * | 2013-04-22 | 2013-08-14 | 深圳Tcl新技术有限公司 | Identity recognition method and device based on intelligent remote control system |
CN106355431A (en) * | 2016-08-18 | 2017-01-25 | 晶赞广告(上海)有限公司 | Detection method, device and terminal for cheating traffic |
CN106612216A (en) * | 2015-10-27 | 2017-05-03 | 北京国双科技有限公司 | Method and apparatus of detecting website access exception |
CN106657141A (en) * | 2017-01-19 | 2017-05-10 | 西安电子科技大学 | Android malware real-time detection method based on network flow analysis |
CN106815452A (en) * | 2015-11-27 | 2017-06-09 | 苏宁云商集团股份有限公司 | A cheating detection method and device |
CN107168854A (en) * | 2017-06-01 | 2017-09-15 | 北京京东尚科信息技术有限公司 | Internet advertisement abnormal click detection method, device, equipment and readable storage medium |
CN108389109A (en) * | 2018-02-11 | 2018-08-10 | 中国民航信息网络股份有限公司 | A civil aviation suspicious order feature extraction method based on a hybrid feature selection algorithm |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109905738A (en) * | 2019-03-26 | 2019-06-18 | 湖南快乐阳光互动娱乐传媒有限公司 | Video advertisement abnormal display monitoring method and device, storage medium and electronic equipment |
CN109905738B (en) * | 2019-03-26 | 2022-03-08 | 湖南快乐阳光互动娱乐传媒有限公司 | Video advertisement abnormal display monitoring method and device, storage medium and electronic equipment |
CN111612550A (en) * | 2020-05-28 | 2020-09-01 | 北京学之途网络科技有限公司 | Advertisement trigger cheating identification method and device, electronic equipment and storage medium |
CN112202807A (en) * | 2020-10-13 | 2021-01-08 | 北京明略昭辉科技有限公司 | Grayscale replacement method and device for IP (Internet protocol) blacklist, electronic equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10929879B2 (en) | Method and apparatus for identification of fraudulent click activity | |
CN107168854B (en) | Internet advertisement abnormal click detection method, device, equipment and readable storage medium | |
CN108737535B (en) | Message pushing method, storage medium and server | |
US8620746B2 (en) | Scoring quality of traffic to network sites | |
CN106204108B (en) | Advertisement anti-cheating method and advertisement anti-cheating device | |
CN102831218B (en) | Method and device for determining data in thermodynamic chart | |
CN109146574A (en) | Ad click cheating monitoring method and device | |
CN103605714B (en) | Website abnormal data recognition method and device |
CN107330718B (en) | Media anti-cheating method and device, storage medium and terminal | |
CN108509583A (en) | An information pushing method, server and computer-readable storage medium |
CN108460627A (en) | Marketing activity scheme pushing method, device, computer equipment and storage medium |
TW201826188A (en) | Data processing method and system | |
CN111738770B (en) | Advertisement abnormal flow detection method and device | |
WO2020257991A1 (en) | User identification method and related product | |
CN109146581A (en) | A resource allocation method, device and readable storage medium |
CN105631708B (en) | Information processing method and device | |
CN108694603A (en) | Advertisement pricing method and apparatus |
CN112101691A (en) | Method and device for dynamically adjusting risk level and server | |
CN112650921A (en) | Object recommendation method, device, equipment and storage medium | |
CN111563765A (en) | Cheating user screening method, device and equipment and readable storage medium | |
CN101449284A (en) | Scoring quality of traffic to network sites using interrelated traffic parameters | |
CN106878410A (en) | Data request detection method and device |
CN113011912A (en) | Media information processing method, device, electronic equipment and storage medium | |
CN112468444A (en) | Internet domain name abuse identification method and device independent of content analysis | |
CN113191800B (en) | Method and device for counting advertisement click rate on APP |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190104 |