CN109034867B - Click traffic detection method and device and storage medium - Google Patents


Publication number
CN109034867B
Authority
CN
China
Prior art keywords
click
frequency
detected
flow
traffic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810644161.2A
Other languages
Chinese (zh)
Other versions
CN109034867A (en
Inventor
周忠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201810644161.2A priority Critical patent/CN109034867B/en
Publication of CN109034867A publication Critical patent/CN109034867A/en
Application granted granted Critical
Publication of CN109034867B publication Critical patent/CN109034867B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q30/0248 Avoiding fraud
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection

Abstract

The application discloses a click traffic detection method comprising the following steps: acquiring a click time sequence corresponding to click traffic to be detected, wherein the click time sequence comprises click volumes respectively corresponding to a plurality of preset statistical periods; performing time-frequency transformation on the click time sequence to obtain a frequency domain sequence, wherein the frequency domain sequence comprises amplitudes corresponding to a plurality of frequencies; and determining whether the click traffic corresponding to the click time sequence is suspicious according to the proportional relation between the amplitudes of the frequencies greater than a frequency threshold in the frequency domain sequence and the amplitudes of all frequencies in the frequency domain sequence. The application also provides a corresponding device and storage medium.

Description

Click traffic detection method, device and storage medium
Technical Field
The present application relates to the field of internet technologies, and in particular, to a click traffic detection method, device and storage medium.
Background
With the rapid growth of internet users, especially mobile internet users, internet advertising has emerged as a new form of advertisement placement, and the volume of internet advertisements placed is also growing rapidly. Most internet advertisements are billed by click volume. Driven by profit, some parties cheat by performing malicious operations on the advertisements delivered on their traffic in order to inflate click volume, which harms the interests of advertisers.
For example, in an internet advertising ecosystem, a traffic owner provides various forms of internet-based services (such as news, media playback, and online games) to users, and an advertising system delivers advertisements into the services the users are using (such as applications they use or web pages they access). When a user clicks on an advertisement, the click volume of the advertisement increases, and the traffic owner monetizes those clicks based on the advertisement resources it owns (such as advertisements in an application or advertisement slots in a web page). However, to increase the click volume of advertisements delivered on their own advertisement resources and obtain more advertising revenue, some traffic owners cheat by performing malicious operations on the delivered advertisements to inflate advertisement behavior indexes such as click volume.
Such advertisement clicks are performed by simulators, automated scripts, and the like. Because the motivation behind the clicks is false, they cannot produce any advertisement conversion, and the interests of advertisers are harmed.
Disclosure of Invention
An embodiment of the application provides a click traffic detection method, which comprises the following steps:
acquiring a click time sequence corresponding to click traffic to be detected, wherein the click time sequence comprises click volumes respectively corresponding to a plurality of preset statistical periods;
performing time-frequency transformation on the click time sequence to obtain a frequency domain sequence, wherein the frequency domain sequence comprises amplitudes corresponding to a plurality of frequencies;
and determining whether the click traffic corresponding to the click time sequence is suspicious according to the proportional relation between the amplitudes of the frequencies greater than a frequency threshold in the frequency domain sequence and the amplitudes of all frequencies in the frequency domain sequence.
An embodiment of the application provides a click traffic detection device, which comprises:
an acquisition unit, configured to acquire a click time sequence corresponding to the click traffic to be detected, wherein the click time sequence comprises click volumes respectively corresponding to a plurality of preset statistical periods;
a time-frequency transformation unit, configured to perform time-frequency transformation on the click time sequence to obtain a frequency domain sequence, wherein the frequency domain sequence comprises amplitudes corresponding to a plurality of frequencies;
and a determining unit, configured to determine whether the click traffic corresponding to the click time sequence is suspicious according to the proportional relation between the amplitudes of the frequencies greater than a frequency threshold in the frequency domain sequence and the amplitudes of all frequencies in the frequency domain sequence.
An embodiment of the application further provides a computer-readable storage medium storing computer-readable instructions that can cause at least one processor to perform the method described above.
With the scheme provided by the application, abnormal click traffic is detected more accurately.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a diagrammatic illustration of a system architecture to which some examples of the present application relate;
FIG. 2 is a schematic flow chart diagram of a click traffic detection method according to some embodiments of the present application;
FIG. 3 is a schematic flow chart diagram of a click traffic detection method according to some embodiments of the present application;
FIG. 4 is a schematic diagram of a structure of a click time series in some embodiments of the present application;
FIG. 5 is a schematic diagram of a structure of a frequency domain sequence in some embodiments of the present application;
FIG. 6 is a schematic flow chart diagram of a click traffic detection method according to some embodiments of the present application;
FIG. 7 is a schematic structural diagram of a click traffic detection device according to some embodiments of the present application; and
FIG. 8 is a schematic structural diagram of a computing device in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
For convenience of description, the following description will briefly describe terms involved in the respective embodiments.
Advertiser: an advertiser is a user or service provider who pays for advertisement clicks. Advertisers expect every click they pay for to be a valid click by a real user rather than a cheating click.
Traffic owner: a traffic owner is the carrier that provides user traffic, usually a media outlet, a website, or a piece of software. On the WeChat advertising platform, a traffic owner may be an official account (public number) with a certain number of followers. The traffic owner shares in the advertising revenue, and for the same advertisement exposure, a higher click rate brings higher revenue, so the traffic owner has a strong incentive to cheat in order to raise the advertisement click rate.
Machine cheating: advertisers are generally charged according to the number of exposures or clicks. Machine cheating refers to defrauding advertisers of fees through false advertisement exposures and clicks generated by technical means such as scripts and simulators.
In some examples, the underlying code is analyzed to determine whether a user's App has been implanted with malicious code segments that place it under machine control. In this scheme, the malicious code is difficult to obtain, and the underlying code must be decompiled into code a human can understand, so the manual verification cost is high. In other examples, machine users are identified by analyzing features for anomalies, for example by checking whether the distribution of features such as user gender, nickname, region, and device model is abnormal. A disadvantage of this scheme is that features are easily missed.
In order to detect machine cheating more effectively, the application provides a click traffic detection method, device, and storage medium. FIG. 1 is a block diagram of an operating environment 100 for click traffic detection in an embodiment of the invention. As shown in FIG. 1, traffic detection provider 102a provides a traffic detection server 112a. The traffic detection server 112a provides traffic detection services to a plurality of users operating their respective user devices 104 (e.g., user devices 104a-c) via one or more networks 106.
In some embodiments, each user connects to the traffic detection server 112a through a client application 108 (e.g., client applications 108a-c) executing on the user device 104. The client application 108 may be a social application, such as WeChat, QQ, or a microblog; a multimedia application, such as a video application or an article application; or a mailbox application. The advertisement delivery system delivers advertisements on the traffic of the client application 108. When an end user clicks on an advertisement presented in the client application 108, the client application 108 sends a click log to the traffic detection server 112a, which stores the click log in the log database 110a. The traffic detection server 112a detects machine cheating according to the saved click logs.
Examples of user equipment 104 include, but are not limited to, a palmtop computer, a wearable computing device, a Personal Digital Assistant (PDA), a tablet computer, a laptop computer, a desktop computer, a mobile phone, a smartphone, an Enhanced General Packet Radio Service (EGPRS) mobile phone, a media player, a navigation device, a game console, a television, or a combination of any two or more of these or other data processing devices.
Examples of the one or more networks 106 include a Local Area Network (LAN) and a Wide Area Network (WAN) such as the internet. Optionally, the one or more networks 106 may be implemented using any well-known network protocol, including various wired or wireless protocols such as Ethernet, Universal Serial Bus (USB), FIREWIRE, Global System for Mobile Communications (GSM), Enhanced Data GSM Environment (EDGE), Code Division Multiple Access (CDMA), Time Division Multiple Access (TDMA), Bluetooth, Wi-Fi, Voice over IP (VoIP), Wi-MAX, or any other suitable communication protocol.
The traffic detection server 112a may be implemented on one or more stand-alone data processing devices or a distributed computer network. In some embodiments, the traffic detection server 112a may also use services of various virtual appliances and/or third party service providers (e.g., third party cloud service providers) to provide underlying computing and/or infrastructure resources of the traffic detection server 112a.
Each user device 104 optionally includes one or more internal peripheral modules or may be connected to one or more peripherals (e.g., navigation systems, health monitors, climate controllers, smart sports equipment, bluetooth headsets, smart watches, etc.) via wires or wirelessly.
In some examples, the present application provides a click traffic detection method, which is performed by the traffic detection server 112a. As shown in fig. 2, the method comprises the following steps:
S201: Acquire a click time sequence corresponding to the click traffic to be detected, wherein the click time sequence comprises click volumes respectively corresponding to a plurality of preset statistical periods.
In some examples, click logs corresponding to click behaviors are obtained and saved, wherein each click log includes at least one of the following parameters: the click time corresponding to the click behavior, the user identifier corresponding to the click behavior, and the traffic owner identifier corresponding to the click behavior;
a plurality of click logs corresponding to the click traffic to be detected are determined;
and the click volume corresponding to each preset statistical period is determined according to the plurality of click logs, thereby determining the click time sequence.
In some examples, the click traffic to be detected is the click traffic of one traffic owner, and determining the plurality of click logs corresponding to the click traffic to be detected includes the steps of:
determining the traffic owner identifier to be detected;
selecting the click logs corresponding to the traffic owner identifier from the saved click logs;
and determining the click volume corresponding to each preset statistical period according to the plurality of click logs includes the step of:
for each preset statistical period, determining, among the click logs corresponding to the traffic owner identifier, the one or more click logs that fall within the statistical period, and taking the number of those click logs as the click volume corresponding to the statistical period.
In some examples, when the click traffic to be detected is the click traffic of one traffic owner, determining the plurality of click logs corresponding to the click traffic to be detected may alternatively include the steps of:
determining the traffic owner identifier to be detected;
selecting, from the click logs corresponding to each preset statistical period, the click logs corresponding to the traffic owner identifier;
and determining the click volume corresponding to each preset statistical period according to the plurality of click logs includes the step of:
for each preset statistical period, taking the number of the selected click logs corresponding to the traffic owner identifier as the click volume corresponding to the statistical period.
In some examples, the click traffic to be detected is the click traffic of one user, and determining the plurality of click logs corresponding to the click traffic to be detected includes the steps of:
determining the user identifier to be detected;
selecting the click logs corresponding to the user identifier from the saved click logs;
and determining the click volume corresponding to each preset statistical period according to the plurality of click logs includes the step of:
for each preset statistical period, determining, among the click logs corresponding to the user identifier, the one or more click logs that fall within the statistical period, and taking the number of those click logs as the click volume corresponding to the statistical period.
In some examples, when the click traffic to be detected is the click traffic of one user, determining the plurality of click logs corresponding to the click traffic to be detected may alternatively include the steps of:
determining the user identifier to be detected;
selecting, from the click logs corresponding to each preset statistical period, the click logs corresponding to the user identifier;
and determining the click volume corresponding to each preset statistical period according to the plurality of click logs includes the step of:
for each preset statistical period, taking the number of the selected click logs corresponding to the user identifier as the click volume corresponding to the statistical period.
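The two orderings described above (select by identifier and then count per period, or group by period and then select by identifier) can be sketched as follows. This is a minimal illustration rather than the claimed implementation: the `ClickLog` record, its field names, and the use of integer seconds for click time are assumptions made for the example.

```python
from collections import Counter
from typing import Dict, List, NamedTuple


class ClickLog(NamedTuple):
    # Hypothetical record layout; the field names are illustrative only.
    click_time: int  # seconds since an arbitrary start of observation
    user_id: str
    owner_id: str    # traffic owner identifier


def period_of(log: ClickLog, period_seconds: int) -> int:
    """Index of the preset statistical period that contains the click."""
    return log.click_time // period_seconds


def counts_filter_then_bucket(logs: List[ClickLog], key: str, value: str,
                              period_seconds: int) -> Dict[int, int]:
    """Select the logs for one identifier first, then count them per period."""
    selected = [log for log in logs if getattr(log, key) == value]
    return dict(Counter(period_of(log, period_seconds) for log in selected))


def counts_bucket_then_filter(logs: List[ClickLog], key: str, value: str,
                              period_seconds: int) -> Dict[int, int]:
    """Group all logs by period first, then count the matching logs in each."""
    by_period: Dict[int, List[ClickLog]] = {}
    for log in logs:
        by_period.setdefault(period_of(log, period_seconds), []).append(log)
    counts = {}
    for period, group in by_period.items():
        matching = sum(1 for log in group if getattr(log, key) == value)
        if matching:
            counts[period] = matching
    return counts


logs = [
    ClickLog(5, "u1", "ownerA"),
    ClickLog(30, "u2", "ownerA"),
    ClickLog(70, "u1", "ownerB"),
    ClickLog(125, "u1", "ownerA"),
]
by_owner = counts_filter_then_bucket(logs, "owner_id", "ownerA", 60)
by_user = counts_bucket_then_filter(logs, "user_id", "u1", 60)
```

Both functions return the same per-period counts for the same identifier; which order is cheaper in practice would depend on how the logs are stored and indexed.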
S202: Perform time-frequency transformation on the click time sequence to obtain a frequency domain sequence, wherein the frequency domain sequence comprises amplitudes corresponding to a plurality of frequencies.
In some examples, the time-frequency transformation is a discrete Fourier transform or a wavelet transform.
S203: Determine whether the click traffic corresponding to the click time sequence is suspicious according to the proportional relation between the amplitudes of the frequencies greater than a frequency threshold in the frequency domain sequence and the amplitudes of all frequencies in the frequency domain sequence.
In some examples, the ratio of the sum of the amplitudes of the frequencies greater than the frequency threshold to the sum of the amplitudes of all frequencies in the frequency domain sequence is determined;
and whether the click traffic is suspicious is determined according to the ratio.
In some examples, the ratio of the sum of squares of the amplitudes of the frequencies greater than the frequency threshold to the sum of squares of the amplitudes of all frequencies in the frequency domain sequence is determined;
and whether the click traffic is suspicious is determined according to the ratio.
In some examples, if the ratio is greater than a ratio threshold, the click traffic corresponding to the click time sequence is determined to be suspicious;
otherwise, the click traffic corresponding to the click time sequence is determined not to be suspicious.
With the click traffic detection method provided by the application, the click time sequence corresponding to the click traffic is subjected to time-frequency transformation to obtain a frequency domain sequence, and whether the click traffic is suspicious is determined according to the characteristics of the frequency domain sequence. Specifically, machine cheating is identified by detecting whether a high-frequency periodic component exists in the advertisement click time sequence over a period of time, so that abnormal clicks are detected more accurately.
The click traffic detection method provided by the application is based on the following principles:
(1) Principle of information asymmetry
Actions such as advertisement exposures and clicks usually occur randomly, so neither a traffic owner nor an end user can control the exposure or click volume of an advertisement at each moment, and neither can know the time distribution of advertisement clicks across the platform as a whole. Therefore, when a cheating traffic owner or end user clicks on advertisements through machine cheating, the resulting advertisement click time sequence may differ from the click time sequence of normal users.
(2) Principle of maximum profit
When a cheating traffic owner or end user clicks on advertisements through machine cheating, a large number of advertisement clicks are performed by automated means within a short time in order to maximize profit, so the advertisement click time sequence exhibits a certain high-frequency character.
According to these principles, abnormal traffic is detected by checking whether high-frequency periodic clicks exist in the advertisement click time sequence of the click traffic over a period of time.
FIG. 3 is a flowchart illustrating a click traffic detection method according to some embodiments of the present application, which is executed by the traffic detection server 112a. As shown in FIG. 3, the click traffic detection method includes the following steps:
S301: Acquire and store click logs corresponding to click behaviors, wherein each click log comprises at least one of the following parameters: the click time corresponding to the click behavior, the user identifier corresponding to the click behavior, and the traffic owner identifier corresponding to the click behavior.
When a user at the user device 104 clicks on an advertisement displayed in the client application 108, the client application 108 reports a click log to the traffic detection server 112a. The traffic detection server 112a collects click logs from multiple user devices 104 and stores them in the log database 110a. The format of the click log is: {current time; user ID; terminal device IP; media content ID; traffic owner ID}. The current time is the time of the click behavior; the user ID is a user identifier, such as the WeChat account of a WeChat user; the terminal device IP is the IP of the user device 104 used by the user; and the media content is the media carrying the advertisement, such as an article containing an advertisement. For example, when the client application 108 is the WeChat App, the traffic owner is a WeChat official account, and when a user clicks an advertisement in an article of a WeChat official account, the click log reported to the traffic detection server 112a includes: the user's WeChat account (the user ID), the user's device IP (the terminal device IP), the identifier of the article (the media content ID), and the identifier of the official account (the traffic owner ID).
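As an illustration of the five-field click log described above, the following sketch models one record and parses it from text. The semicolon-separated wire format, the ISO 8601 timestamp, the class and function names, and the sample values are all assumptions made for the example; the source specifies only which fields the log contains.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ClickLog:
    click_time: datetime   # "current time": when the click happened
    user_id: str           # e.g. a WeChat account
    device_ip: str         # IP of the user device
    media_content_id: str  # e.g. the article carrying the advertisement
    owner_id: str          # traffic owner, e.g. an official account


def parse_click_log(line: str) -> ClickLog:
    """Parse one semicolon-separated record (hypothetical wire format)."""
    fields = [part.strip() for part in line.strip("{} \n").split(";")]
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}")
    time_text, user_id, device_ip, media_id, owner_id = fields
    return ClickLog(datetime.fromisoformat(time_text), user_id, device_ip,
                    media_id, owner_id)


# Sample values are invented for illustration.
record = parse_click_log(
    "{2018-06-21T10:15:30; user-42; 203.0.113.7; article-9; account-3}"
)
```

Only the click time and one of the two identifiers are needed to build the click time sequence; the remaining fields are carried along unchanged.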
S302: and acquiring a click time sequence corresponding to the click flow to be detected.
The click time sequence comprises click volumes corresponding to a plurality of preset statistical periods. The preset statistical period may be one day, one hour, one minute, and so on, and is not limited in the scheme of the application. The format of the click time sequence may be $\{p_0, p_1, p_2, \dots, p_{N-1}\}$, where $p_i$ is the advertisement click volume in the i-th preset statistical period.
This example detects the click traffic of a traffic owner and determines whether it is suspicious. To obtain the click time sequence corresponding to the click traffic of the traffic owner, the click logs corresponding to the traffic owner can be selected from the stored logs, the click logs corresponding to each preset statistical period can then be determined from the selected click logs, and the number of click logs corresponding to each preset statistical period is taken as the click volume for that period. Alternatively, the click logs corresponding to each preset statistical period can first be selected from the stored logs; then, for each preset statistical period, the click logs corresponding to the traffic owner are determined from the click logs of that period, and the number of click logs so determined is taken as the click volume for that period.
In some examples, obtaining the click time series includes the steps of:
S3021: Determine a plurality of click logs corresponding to the click traffic to be detected.
In this example, the click traffic to be detected is the click traffic of a traffic owner. Determining whether the click traffic is suspicious means determining whether the click traffic of all media content corresponding to the traffic owner is suspicious, and thereby whether the traffic owner is cheating. For example, for a WeChat official account, the click logs corresponding to all articles under the official account are obtained, whether the official account is suspicious is determined according to the obtained click logs, and it is thereby determined whether the operator of the official account is cheating on click volume.
Determining the plurality of click logs corresponding to the click traffic of the traffic owner includes the following steps:
S30211: Determine the traffic owner identifier to be detected;
S30212: Select the click logs corresponding to the traffic owner identifier from the saved click logs.
Here, each click log includes the identifier of a traffic owner, and according to the identifier of the traffic owner to be detected, the click logs that include that identifier are looked up among the stored click logs uploaded by the user devices 104.
S3022: Determine the click volume corresponding to each preset statistical period according to the plurality of click logs.
The click time sequence comprises click volumes corresponding to a plurality of preset statistical periods, and determining the click volumes corresponding to the plurality of preset statistical periods includes the following step:
S30221: For each preset statistical period, determine, among the click logs corresponding to the traffic owner identifier, the one or more click logs that fall within the statistical period, and take the number of those click logs as the click volume corresponding to the statistical period.
Each of the click logs determined in step S3021 includes a click time. Each click log is assigned to a preset statistical period according to its click time, the number of click logs in each statistical period is determined, and that number is taken as the click volume corresponding to the statistical period.
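Step S30221 can be sketched as a small bucketing routine. The sketch assumes click times expressed as integer seconds, a fixed observation window starting at `start`, and zero counts for periods without clicks; the function and parameter names are hypothetical.

```python
from typing import List, Sequence


def click_time_sequence(click_times: Sequence[int], start: int,
                        period_seconds: int, num_periods: int) -> List[int]:
    """Return [p_0, ..., p_{N-1}], the number of clicks in each period.

    Period i covers the half-open interval
    [start + i * period_seconds, start + (i + 1) * period_seconds).
    Clicks outside the observation window are ignored.
    """
    counts = [0] * num_periods
    for t in click_times:
        index = (t - start) // period_seconds
        if 0 <= index < num_periods:
            counts[index] += 1
    return counts


# Six clicks, one-minute periods, four periods observed.
sequence = click_time_sequence([0, 10, 59, 60, 185, 400],
                               start=0, period_seconds=60, num_periods=4)
# sequence == [3, 1, 0, 1]; the click at t=400 falls outside the window.
```

Keeping empty periods as explicit zeros matters for the later transform, because the frequency domain sequence assumes evenly spaced samples.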
S303: Perform time-frequency transformation on the click time sequence to obtain a frequency domain sequence.
In some examples, the time-frequency transformation is a discrete Fourier transform or a wavelet transform. When the discrete Fourier transform is used, it is performed according to the following formula (1).
[Formula (1): equation image GDA0003826452800000101, not reproduced in the text]
where $x_k$ is the amplitude at frequency $2\pi k/N$. The discrete Fourier transform converts the click time sequence $\{p_0, p_1, p_2, \dots, p_{N-1}\}$ in the time domain into a frequency domain sequence, whose format may be $\{x_0, x_1, x_2, \dots, x_{N-1}\}$.
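Formula (1) itself is not reproduced in the text. One reading consistent with the surrounding definitions, assuming the standard unnormalized discrete Fourier transform and taking the amplitude as the modulus of each complex coefficient, is given below. The intermediate symbol $X_k$, the absence of a normalization factor, and the use of the modulus are assumptions and are not confirmed by the source.

```latex
X_k = \sum_{n=0}^{N-1} p_n \, e^{-i 2\pi k n / N},
\qquad x_k = \lvert X_k \rvert,
\qquad k = 0, 1, \dots, N-1
```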
In the click traffic detection method of this example, the click time sequence of the advertisements delivered on the resources of one traffic owner over a period of time is subjected to a discrete Fourier transform, which transforms the advertisement click time sequence from the time domain to the frequency domain. When regular machine clicking exists, that is, when most of the advertisement clicks attributed to a cheating traffic owner are machine cheating clicks, the energy (amplitude) of the high-frequency part of the frequency domain sequence after the Fourier transform is larger than that of a normal traffic owner. The relation between the energy of the high-frequency part and the energy of all frequencies in the frequency domain sequence is therefore determined, and whether the traffic of the traffic owner is abnormal is determined according to that relation.
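To make the time-to-frequency step concrete, the following dependency-free sketch computes the amplitude of each frequency with a naive discrete Fourier transform. It assumes the standard unnormalized transform and takes the amplitude as the modulus of each coefficient; a production system would more likely use an FFT library. The function name and the sample sequence are invented for the example.

```python
import cmath
from typing import List, Sequence


def dft_amplitudes(p: Sequence[float]) -> List[float]:
    """Amplitudes |X_k| of the unnormalized DFT of p, for k = 0..N-1.

    Naive O(N^2) evaluation, kept dependency-free for illustration.
    """
    n = len(p)
    amplitudes = []
    for k in range(n):
        coefficient = sum(
            p[j] * cmath.exp(-2j * cmath.pi * k * j / n) for j in range(n)
        )
        amplitudes.append(abs(coefficient))
    return amplitudes


# Clicks that alternate every period: a strictly regular, machine-like pattern.
regular = [10, 0, 10, 0, 10, 0, 10, 0]
amplitudes = dft_amplitudes(regular)

# Index of the largest amplitude, ignoring the zero-frequency (mean) term.
peak = max(range(1, len(regular)), key=lambda k: amplitudes[k])
```

For this alternating sequence, everything outside the zero-frequency term concentrates at index N/2, the fastest oscillation that the sampling period can represent.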
S304: Determine whether the click traffic corresponding to the click time sequence is suspicious according to the proportional relation between the amplitudes of the frequencies greater than a frequency threshold in the frequency domain sequence and the amplitudes of all frequencies in the frequency domain sequence.
In some examples, determining whether the click traffic is suspicious includes the following step:
S3041: Determine the ratio of the sum of the amplitudes of the frequencies greater than the frequency threshold to the sum of the amplitudes of all frequencies in the frequency domain sequence, and determine whether the click traffic is suspicious according to the ratio.
The ratio is determined according to the following formula (2):
[Formula (2): equation image GDA0003826452800000102, not reproduced in the text]
where T is a preset value, $2\pi(N-T)/N$ is the frequency threshold, and N is the number of amplitudes included in the frequency domain sequence.
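Formula (2) is not reproduced in the text. One reading consistent with step S3041 and with the definitions of T and N, assuming the frequencies above the threshold correspond to the last T indices of the frequency domain sequence, is given below. Whether the term at index $N-T$ itself is included is not stated in the text; it is included here as an assumption.

```latex
\lambda = \frac{\sum_{k=N-T}^{N-1} x_k}{\sum_{k=0}^{N-1} x_k}
```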
When determining whether the click traffic is suspicious according to the ratio: if the ratio is greater than a ratio threshold, the click traffic corresponding to the click behavior sequence is determined to be suspicious; otherwise, the click traffic corresponding to the click behavior sequence is not suspicious. For example, when λ is greater than θ, where θ is a preset ratio threshold, the click sequence of the traffic owner contains an abnormal periodic component, and the traffic of the traffic owner is determined to be suspicious.
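The decision rule can be sketched as follows. This is an illustration under assumptions, not the application's implementation: the function names, the choice that bins with index greater than N - T count as high-frequency, and the example values of `t` and `theta` are all hypothetical.

```python
def high_frequency_ratio(amplitudes, t):
    """Share of the total amplitude held by bins with index k > N - t."""
    n = len(amplitudes)
    total = sum(amplitudes)
    if total == 0:
        return 0.0  # no clicks at all: nothing to flag
    high = sum(amplitudes[k] for k in range(n) if k > n - t)
    return high / total


def is_suspicious(amplitudes, t, theta):
    """Apply the rule 'suspicious if lambda > theta'."""
    return high_frequency_ratio(amplitudes, t) > theta


# Hypothetical amplitudes x_0..x_4, with t = 3 and theta = 0.3 chosen for illustration.
example = [4.0, 1.0, 1.0, 2.0, 2.0]
ratio = high_frequency_ratio(example, 3)
flagged = is_suspicious(example, 3, 0.3)
```

Both `t` and `theta` would in practice be tuned on traffic that is known to be normal; the source only states that they are preset.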
For example, Fig. 4 shows a time series of clicks per minute in one flow cycle, and Fig. 5 shows the energy (amplitude) distribution in the frequency domain obtained after the discrete Fourier transform. If, for example, the frequency threshold is 0.05 Hz and the energy proportion of the high-frequency part is higher than the preset threshold, the click traffic of the traffic owner is determined to be cheating. In Fig. 5, the abscissa is frequency in Hz, and the ordinate is the amplitude of each frequency obtained by the discrete Fourier transform, expressed as a relative value.
In some examples, when determining whether click traffic is suspicious, the method may further include the steps of:
S3042: determining the ratio of the sum of squares of the amplitudes of the frequencies greater than a frequency threshold to the sum of squares of the amplitudes of all the frequencies in the frequency domain sequence; and determining whether the click traffic is suspicious according to the ratio.
In determining the energy proportion of the high frequency part, the ratio of the sum of squares of the amplitudes of the respective frequencies of the high frequency part to the sum of squares of the amplitudes of all the frequencies may also be used as the energy proportion. If the ratio is larger than a ratio threshold, determining that the click flow corresponding to the click behavior sequence is suspicious; otherwise, the click traffic corresponding to the click behavior sequence is not suspicious.
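The squared-amplitude variant can be written in the same notation as the amplitude ratio. This is a sketch only: the symbol λ₂ is introduced here for illustration, and the summation range assumes that bins with index greater than N - T form the high-frequency part.

```latex
\lambda_2 = \frac{\sum_{k=N-T+1}^{N-1} x_k^{2}}{\sum_{k=0}^{N-1} x_k^{2}}
```

Because the amplitudes are squared, a few dominant bins weigh more heavily in this ratio than in the unsquared one.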
Fig. 6 is a flowchart illustrating a click traffic detection method according to some embodiments of the present application, which is executed by the traffic detection server 112a. In this example, steps S601 to S604 are similar to steps S301 to S304, step S6022 is similar to step S3022, and steps S6041 to S6042 are similar to steps S3041 to S3042, respectively, and are not described again here. In this example, when step S6021 is executed to obtain the click time sequence corresponding to the click traffic to be detected, the method includes the following steps:
S60211: determining the user identifier to be detected.
S60212: selecting the click logs corresponding to the user identifier from the saved click logs.
When step S6022 is executed, the method includes the following step:
S60221: for each preset counting period, determining one or more click logs corresponding to the counting period among the click logs corresponding to the user identifier, and taking the number of the determined click logs as the click quantity corresponding to the counting period.
In this example, the click traffic of an end user is examined to detect whether the user is a cheating user. The user identifier may be an account, such as a WeChat account, with which the user logs in to the client application 108. The traffic detection server 112a searches the saved click logs for the click logs that include the user identifier. When the click traffic is finally determined to be suspicious according to these click logs, the user is indicated to be a cheating user.
When the click time sequence corresponding to the click traffic of the user is obtained, one or more click logs corresponding to the user can first be selected from the stored logs; the click logs corresponding to each preset statistical period are then determined from the selected logs, and the number of click logs corresponding to each preset statistical period is used as the click quantity for that period. Alternatively, the click logs corresponding to each preset statistical period can first be selected from the stored logs; for each preset statistical period, the click logs corresponding to the user are then determined from the click logs of that period, and the number of the determined click logs is used as the click quantity for that period.
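The two orderings described above (select by user and then count per period, or split by period and then count the user's logs) can be sketched as follows. The log layout `(click_time_in_seconds, user_id)`, the constants and the function names are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical click logs: (click_time_in_seconds, user_id)
logs = [(5, "u1"), (30, "u2"), (61, "u1"), (70, "u1"), (130, "u2"), (170, "u1")]
PERIOD = 60       # assumed length of one counting period, in seconds
NUM_PERIODS = 3   # assumed number of counting periods in the window


def series_filter_then_bucket(logs, user_id):
    """Select the user's logs first, then count them per period."""
    user_times = [t for t, uid in logs if uid == user_id]
    counts = Counter(t // PERIOD for t in user_times)
    return [counts.get(i, 0) for i in range(NUM_PERIODS)]


def series_bucket_then_filter(logs, user_id):
    """Split all logs by period first, then count the user's logs in each."""
    buckets = [[] for _ in range(NUM_PERIODS)]
    for t, uid in logs:
        idx = t // PERIOD
        if 0 <= idx < NUM_PERIODS:
            buckets[idx].append(uid)
    return [sum(1 for uid in bucket if uid == user_id) for bucket in buckets]


series_a = series_filter_then_bucket(logs, "u1")
series_b = series_bucket_then_filter(logs, "u1")
```

Both orderings produce the same click time sequence; which one is cheaper depends on how the stored logs are indexed.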
In some other examples, multiple click logs corresponding to a piece of media content (for example, an article under a WeChat public account) may be obtained, and whether the click traffic for the media content is suspicious may be determined from these click logs by using the click traffic detection method described above. In some other embodiments, multiple click logs corresponding to a terminal device IP address may also be obtained to detect whether the terminal device corresponding to that IP address is a cheating terminal device.
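Since the same counting procedure applies whether the logs are grouped by user, media content or terminal IP address, the grouping field can be passed as a parameter. The sketch below assumes click logs represented as dictionaries; the field names `time`, `content_id` and `ip` and the function name are hypothetical.

```python
from collections import defaultdict


def click_series_by(logs, key, period, num_periods):
    """Build one click time sequence per distinct value of log[key]."""
    series = defaultdict(lambda: [0] * num_periods)
    for log in logs:
        idx = log["time"] // period
        if 0 <= idx < num_periods:
            series[log[key]][idx] += 1
    return dict(series)


# Hypothetical click logs with times in seconds.
sample_logs = [
    {"time": 10, "content_id": "a1", "ip": "10.0.0.1"},
    {"time": 70, "content_id": "a1", "ip": "10.0.0.2"},
    {"time": 75, "content_id": "a2", "ip": "10.0.0.1"},
]
by_content = click_series_by(sample_logs, "content_id", 60, 2)
by_ip = click_series_by(sample_logs, "ip", 60, 2)
```

Each resulting sequence can then be passed through the same time-frequency transformation and ratio test as a traffic owner's sequence.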
The present application further provides a click traffic detection device 700, as shown in fig. 7, including:
an obtaining unit 701, configured to obtain a click time sequence corresponding to a click traffic to be detected, where the click time sequence includes click amounts corresponding to a plurality of preset statistical periods, respectively;
a time-frequency transformation unit 702, configured to perform time-frequency transformation on the click time sequence to obtain a frequency domain sequence, where the frequency domain sequence includes amplitudes corresponding to multiple frequencies;
the determining unit 703 is configured to determine whether the click traffic corresponding to the click time sequence is suspicious according to a proportional relationship between the amplitude of each frequency in the frequency domain sequence that is greater than the frequency threshold and the amplitude of each frequency in the frequency domain sequence.
By adopting the click traffic detection device provided by the present application, the click time sequence corresponding to the click traffic is subjected to time-frequency transformation to obtain a frequency domain sequence, and whether the click traffic is suspicious is determined according to the characteristics of the frequency domain sequence. Specifically, whether machine cheating exists is judged by detecting whether a high-frequency periodic component exists in the advertisement click time sequence within a period of time, so that abnormal click behavior is detected more accurately.
In some examples, the obtaining unit 701 is further configured to:
obtaining and storing a click log corresponding to a click behavior, wherein the click log comprises at least one of the following parameters: the click time corresponding to the click behavior, the user identifier corresponding to the click behavior, and the traffic owner identifier corresponding to the click behavior;
determining a plurality of click logs corresponding to the click traffic to be detected;
and determining the click quantity corresponding to each preset statistical period according to the plurality of click logs.
In some examples, the obtaining unit 701 is further configured to:
determining a traffic owner identifier to be detected;
selecting the click logs corresponding to the traffic owner identifier from the stored click logs;
and, for each preset counting period, determining one or more click logs corresponding to the counting period among the click logs corresponding to the traffic owner identifier, and taking the number of the determined click logs as the click quantity corresponding to the counting period.
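The selection and counting performed by the obtaining unit can be sketched with a small record type. The `ClickLog` fields mirror the parameters named in the text (click time, user identifier, traffic owner identifier); the class name, function name and sample values are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ClickLog:
    click_time: datetime
    user_id: str
    traffic_owner_id: str


def clicks_per_period(logs, owner_id, start, period, num_periods):
    """Count the clicks of one traffic owner in each counting period."""
    counts = [0] * num_periods
    for log in logs:
        if log.traffic_owner_id != owner_id:
            continue
        # Floor division of two timedeltas yields the integer period index.
        idx = (log.click_time - start) // period
        if 0 <= idx < num_periods:
            counts[idx] += 1
    return counts


start = datetime(2018, 6, 21, 0, 0, 0)
stored = [
    ClickLog(start + timedelta(seconds=10), "u1", "A"),
    ClickLog(start + timedelta(seconds=50), "u2", "A"),
    ClickLog(start + timedelta(seconds=65), "u1", "B"),
    ClickLog(start + timedelta(seconds=150), "u3", "A"),
]
owner_series = clicks_per_period(stored, "A", start, timedelta(minutes=1), 3)
```

The resulting list is the click time sequence that the time-frequency transformation unit would receive.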
In some examples, the obtaining unit 701 is further configured to:
determining a traffic owner identifier to be detected;
selecting the click logs corresponding to the traffic owner identifier from the click logs corresponding to each preset statistical period;
and, for each preset statistical period, taking the number of the selected click logs corresponding to the traffic owner identifier as the click quantity corresponding to the statistical period.
In some examples, the obtaining unit 701 is further configured to:
determining a user identifier to be detected;
selecting the click logs corresponding to the user identifier from the saved click logs;
and, for each preset counting period, determining one or more click logs corresponding to the counting period among the click logs corresponding to the user identifier, and taking the number of the determined click logs as the click quantity corresponding to the counting period.
In some examples, the obtaining unit 701 is further configured to:
determining a user identifier to be detected;
selecting the click logs corresponding to the user identifier from the click logs corresponding to each preset statistical period;
and, for each preset counting period, taking the number of the selected click logs corresponding to the user identifier as the click quantity corresponding to the counting period.
In some examples, the determining unit 703 is configured to:
determining the ratio of the sum of the amplitudes of the frequencies greater than a frequency threshold to the sum of the amplitudes of the frequencies in the frequency domain sequence;
and determining whether the click traffic is suspicious according to the ratio.
In some examples, the determining unit 703 is configured to:
determining a ratio of a sum of squares of amplitudes of frequencies greater than a frequency threshold to a sum of squares of amplitudes of frequencies in the frequency domain sequence; and determining whether the click traffic is suspicious according to the ratio.
In some examples, the determining unit 703 is further configured to:
if the ratio is greater than a ratio threshold, determining that the click traffic corresponding to the click behavior sequence is suspicious; otherwise, the click traffic corresponding to the click behavior sequence is not suspicious.
In some examples, the time-frequency transform is a discrete fourier transform or a wavelet transform.
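The source does not specify how a wavelet transform would be applied. As one possible illustration, a single level of the orthonormal Haar transform splits the click time sequence into a smooth part and a detail part, and the share of energy in the detail part can play the role of the high-frequency ratio. The function names and the use of a single Haar level are assumptions made for this sketch.

```python
import math


def haar_step(series):
    """One level of the orthonormal Haar transform (even length required)."""
    if len(series) % 2:
        raise ValueError("series length must be even")
    s = math.sqrt(2.0)
    approx = [(series[i] + series[i + 1]) / s for i in range(0, len(series), 2)]
    detail = [(series[i] - series[i + 1]) / s for i in range(0, len(series), 2)]
    return approx, detail


def detail_energy_ratio(series):
    """Fraction of the signal energy carried by the Haar detail coefficients."""
    approx, detail = haar_step(series)
    detail_energy = sum(d * d for d in detail)
    total = sum(a * a for a in approx) + detail_energy
    if total == 0:
        return 0.0
    return detail_energy / total


alternating = detail_energy_ratio([4, 0, 4, 0])  # rapid on/off pattern
constant = detail_energy_ratio([2, 2, 2, 2])     # no variation between periods
```

A series that alternates between adjacent periods puts half of its energy into the detail coefficients, while a constant series puts none there.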
The present application also provides a computer-readable storage medium storing computer-readable instructions that can cause at least one processor to perform the method as described above.
Fig. 8 is a block diagram showing a configuration of a computing device in which the click traffic detection device 700 is located. As shown in fig. 8, the computing device includes one or more processors (CPUs) 802, a communication module 804, a memory 806, a user interface 810, and a communication bus 808 for interconnecting these components.
The processor 802 may receive and transmit data via the communication module 804 to enable network communications and/or local communications.
User interface 810 includes one or more output devices 812 including one or more speakers and/or one or more visual displays. The user interface 810 also includes one or more input devices 814, including for example, a keyboard, a mouse, a voice command input unit or microphone, a touch screen display, a touch sensitive tablet, a gesture capture camera or other input buttons or controls, and the like.
The memory 806 may be high-speed random access memory, such as DRAM, SRAM, DDR RAM, or other random access solid state memory devices; or non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid-state storage devices.
The memory 806 stores a set of instructions executable by the processor 802, including:
an operating system 816, including programs for handling various basic system services and for performing hardware related tasks;
applications 818, including some or all of the elements or modules of click traffic detection device 700. At least one element of the click traffic detection device 700 may store machine executable instructions. The processor 802 may be capable of performing the functions of at least one of the units or modules described above by executing machine-executable instructions in at least one of the units in the memory 806.
It should be noted that not all steps and modules in the above flows and structures are necessary, and some steps or modules may be omitted according to actual needs. The execution order of the steps is not fixed and can be adjusted as required. The division of each module is only for convenience of describing adopted functional division, and in actual implementation, one module may be implemented by multiple modules, and the functions of multiple modules may also be implemented by the same module, and these modules may be located in the same device or in different devices.
The hardware modules in the embodiments may be implemented in hardware or a hardware platform plus software. The software includes machine-readable instructions stored on a non-volatile storage medium. Thus, embodiments may also be embodied as software products.
In various examples, the hardware may be implemented by specialized hardware or hardware executing machine-readable instructions. For example, the hardware may be specially designed permanent circuits or logic devices (e.g., special purpose processors, such as FPGAs or ASICs) for performing the specified operations. The hardware may also include programmable logic devices or circuits temporarily configured by software (e.g., including a general purpose processor or other programmable processor) to perform certain operations.
In addition, each example of the present application may be realized by a data processing program executed by a data processing apparatus such as a computer. It is clear that a data processing program constitutes the present application. Further, the data processing program, which is generally stored in one storage medium, is executed by directly reading the program out of the storage medium or by installing or copying the program into a storage device (such as a hard disk and/or a memory) of the data processing device. Such a storage medium therefore also constitutes the present application, which also provides a non-volatile storage medium in which a data processing program is stored, which data processing program can be used to carry out any one of the above-mentioned method examples of the present application.
The machine-readable instructions corresponding to the modules of Fig. 8 may cause an operating system or the like running on the computer to perform some or all of the operations described herein. The non-volatile computer-readable storage medium may be a memory provided in an expansion board inserted into the computer, or the instructions may be written to a memory provided in an expansion unit connected to the computer. A CPU or the like mounted on the expansion board or the expansion unit may perform part or all of the actual operations according to the instructions.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (12)

1. A click traffic detection method is characterized by comprising the following steps:
when a client application receives a click behavior made by a user on media content provided by a traffic owner, receiving a click log reported by the client application, wherein the click log comprises: the user identifier corresponding to the click behavior, the media content identifier corresponding to the click behavior, and the traffic owner identifier corresponding to the click behavior, and wherein the traffic owner has a plurality of users in the client application;
determining a traffic owner identifier to be detected;
determining a plurality of click logs corresponding to the traffic owner identifier to be detected;
acquiring a click time sequence corresponding to the traffic owner identifier to be detected according to the plurality of click logs;
performing time-frequency transformation on the click time sequence to obtain a frequency domain sequence, wherein the frequency domain sequence comprises amplitudes corresponding to a plurality of frequencies;
based on the principle that normal click behaviors occur randomly and machine click behaviors occur in large numbers within a short time, if the energy ratio between the amplitude of each frequency of the high-frequency part in the frequency domain sequence and the amplitude of each frequency in the frequency domain sequence is higher than a preset threshold, determining that machine click behaviors exist among the click behaviors on all media content corresponding to the traffic owner identifier to be detected, and that the traffic owner corresponding to the traffic owner identifier to be detected is a cheating traffic owner.
2. The method according to claim 1, wherein the determining a plurality of click logs corresponding to the traffic owner identifier to be detected comprises:
selecting the click logs corresponding to the traffic owner identifier to be detected from the stored click logs;
and the acquiring, according to the plurality of click logs, a click time sequence corresponding to the traffic owner identifier to be detected comprises:
for each preset counting period, determining one or more click logs corresponding to the counting period among the click logs corresponding to the traffic owner identifier to be detected, and taking the number of the determined click logs as the click quantity corresponding to the counting period.
3. The method according to claim 1, wherein the determining a plurality of click logs corresponding to the traffic owner identifier to be detected comprises:
selecting the click logs corresponding to the traffic owner identifier to be detected from the click logs corresponding to each preset statistical period;
and the acquiring, according to the plurality of click logs, a click time sequence corresponding to the traffic owner identifier to be detected comprises:
for each preset statistical period, taking the number of the selected click logs corresponding to the traffic owner identifier to be detected as the click quantity corresponding to the statistical period.
4. The method of claim 1, wherein the click log is formatted as: {current time; user identifier; terminal device IP address; media content identifier; traffic owner identifier}.
5. The method of claim 1, wherein the energy ratio between the amplitude of each frequency in the high frequency portion of the frequency domain sequence and the amplitude of each frequency in the frequency domain sequence is a ratio of the sum of the amplitudes of each frequency greater than a frequency threshold to the sum of the amplitudes of each frequency in the frequency domain sequence.
6. The method of claim 1, wherein the energy ratio between the amplitude of each frequency in the high frequency portion of the frequency domain sequence and the amplitude of each frequency in the frequency domain sequence is a ratio of a sum of squares of the amplitudes of each frequency greater than a frequency threshold to a sum of squares of the amplitudes of each frequency in the frequency domain sequence.
7. The method of claim 1, wherein the client application is a social application, a multimedia application, or a mailbox application.
8. The method of claim 1, wherein the time-frequency transform is a discrete fourier transform or a wavelet transform.
9. A click traffic detection device, comprising:
an acquisition unit, configured to: receive a click log reported by a client application when the client application receives a click behavior made by a user on media content provided by a traffic owner, wherein the click log comprises: the user identifier corresponding to the click behavior, the media content identifier corresponding to the click behavior, and the traffic owner identifier corresponding to the click behavior, and wherein the traffic owner has a plurality of users in the client application; determine a traffic owner identifier to be detected; determine a plurality of click logs corresponding to the traffic owner identifier to be detected; and acquire a click time sequence corresponding to the traffic owner identifier to be detected according to the plurality of click logs;
a time-frequency transformation unit, configured to perform time-frequency transformation on the click time sequence to obtain a frequency domain sequence, wherein the frequency domain sequence comprises amplitudes corresponding to a plurality of frequencies;
and a determining unit, configured to determine, based on the principle that normal click behaviors occur randomly and machine click behaviors occur in large numbers within a short time, that machine click behaviors exist among the click behaviors on all media content corresponding to the traffic owner identifier to be detected, and that the traffic owner corresponding to the traffic owner identifier to be detected is a cheating traffic owner, if the energy ratio between the amplitude of each frequency of the high-frequency part in the frequency domain sequence and the amplitude of each frequency in the frequency domain sequence is higher than a preset threshold.
10. The apparatus of claim 9, wherein the acquisition unit is configured to:
select the click logs corresponding to the traffic owner identifier to be detected from the stored click logs;
and, for each preset counting period, determine one or more click logs corresponding to the counting period among the click logs corresponding to the traffic owner identifier to be detected, and take the number of the determined click logs as the click quantity corresponding to the counting period.
11. A computer-readable storage medium storing computer-readable instructions that can cause at least one processor to perform the method of any one of claims 1-8.
12. A computing device, comprising: a processor and a memory; wherein the memory stores a program adapted to perform the method steps of any of claims 1 to 8 when executed by the processor.
CN201810644161.2A 2018-06-21 2018-06-21 Click traffic detection method and device and storage medium Active CN109034867B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810644161.2A CN109034867B (en) 2018-06-21 2018-06-21 Click traffic detection method and device and storage medium

Publications (2)

Publication Number Publication Date
CN109034867A (en) 2018-12-18
CN109034867B (en) 2022-10-25

Family

ID=64610246

Country Status (1)

Country Link
CN (1) CN109034867B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant