CN114491407A - Traffic cheating identification method, device, equipment and storage medium

Info

Publication number: CN114491407A
Authority: CN (China)
Prior art keywords: traffic, cheating, distribution, detected, probability
Legal status: Pending (as listed; the status is an assumption and not a legal conclusion)
Application number: CN202011275069.7A
Other languages: Chinese (zh)
Inventor: 秦莎
Current Assignee: Beijing Qihoo Technology Co Ltd (as listed; the list may be inaccurate)
Original Assignee: Beijing Qihoo Technology Co Ltd
Application filed by: Beijing Qihoo Technology Co Ltd
Priority: CN202011275069.7A
Publication: CN114491407A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00: Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10: Complex mathematical operations
    • G06F17/18: Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00: Network architectures or network communication protocols for network security
    • H04L63/14: Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408: Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425: Traffic logging, e.g. anomaly detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computer Security & Cryptography (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Signal Processing (AREA)
  • Algebra (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Operations Research (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Evolutionary Biology (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention belongs to the technical field of Internet big data analysis and discloses a traffic cheating identification method, device, equipment and storage medium. The method comprises: obtaining a natural traffic distribution probability and traffic distribution data of the traffic to be detected; determining a cheating score for the traffic to be detected from the traffic distribution data and the natural traffic distribution probability; and judging, from the cheating score, whether there is traffic cheating in the traffic to be detected. Because the traffic distribution data of the traffic to be detected is compared against the natural traffic distribution probability obtained from big data statistics, the resulting cheating score represents the degree of difference between the distribution of the traffic to be detected and that of natural traffic free of cheating. Whether traffic cheating exists can therefore be judged from the cheating score, which helps protect the legitimate rights and interests of traffic buyers.

Description

Traffic cheating identification method, device, equipment and storage medium

Technical field

The invention relates to the technical field of Internet big data analysis, and in particular to a traffic cheating identification method, device, equipment and storage medium.

Background

Traffic monetization now brings substantial revenue to enterprises. Driven by that revenue, traffic fraud has become increasingly rampant, and its forms and techniques increasingly sophisticated. Whatever form the cheating takes, the party that loses the most is the traffic buyer who pays for the traffic. Traffic buyers spend large budgets on traffic in order to acquire new users. Fake traffic brings neither new users who can actually be retained nor revenue, which seriously harms the buyers' legitimate rights and interests. Traffic anti-cheating both promotes healthy growth of an enterprise's business ecosystem and saves budget, protecting the legitimate rights and interests of traffic buyers, so it is urgently needed.

The above content is provided only to aid understanding of the technical solution of the invention and is not an admission that it constitutes prior art.

Summary of the invention

The main purpose of the invention is to provide a traffic cheating identification method, device, equipment and storage medium, aiming to solve the technical problem of how to detect whether traffic cheating exists so as to protect the legitimate rights and interests of traffic buyers.

To achieve the above purpose, the invention provides a traffic cheating identification method comprising the following steps:

Obtaining a natural traffic distribution probability and traffic distribution data of the traffic to be detected;

Determining a cheating score corresponding to the traffic to be detected according to the traffic distribution data and the natural traffic distribution probability;

Judging whether there is traffic cheating in the traffic to be detected according to the cheating score.

Optionally, before the step of obtaining the natural traffic distribution probability and the traffic distribution data of the traffic to be detected, the method further includes:

Determining user thinking durations according to the user operation information contained in the traffic to be detected;

Determining the traffic distribution data of the traffic to be detected according to the user thinking durations.

Optionally, the step of determining user thinking durations according to the user operation information contained in the traffic to be detected includes:

Obtaining the user operation information contained in the traffic to be detected and determining the operation times of adjacent user operations;

Determining the operation time difference between the adjacent user operations according to the operation times, and taking the operation time difference as the corresponding user thinking duration.
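
The derivation of user thinking durations from adjacent operation times can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `thinking_durations`, the "H:MM:SS" timestamp format, and the sample timestamps are all assumptions made for the example.

```python
from datetime import datetime


def thinking_durations(operation_times):
    """Return the gaps, in seconds, between consecutive user operations.

    `operation_times` is a list of "H:MM:SS" strings from one session.
    The string format is an assumption made for this sketch.
    """
    # Sort so that "adjacent" means adjacent in time, then difference each pair.
    times = sorted(datetime.strptime(t, "%H:%M:%S") for t in operation_times)
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]


# Three operations yield two thinking durations.
print(thinking_durations(["9:00:01", "9:00:03", "9:00:10"]))  # [2.0, 7.0]
```

A session with fewer than two operations produces an empty list, since no adjacent pair exists.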

Optionally, the step of determining the traffic distribution data of the traffic to be detected according to the user thinking durations includes:

Grouping the user thinking durations by preset time distribution intervals, and taking the number of user thinking durations in each preset time distribution interval as the corresponding traffic distribution quantity;

Determining the traffic distribution probability of each preset time distribution interval according to the traffic distribution quantity and the total number of user thinking durations;

Determining the traffic distribution data of the traffic to be detected according to the traffic distribution probability and the traffic distribution quantity.
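
The grouping step can be sketched as below. The function name `traffic_distribution`, the equal-width 3-second intervals, the number of intervals, and the choice to place overlong durations in the last interval are assumptions for illustration; the text only requires that preset intervals exist.

```python
def traffic_distribution(durations, bin_width=3.0, num_bins=4):
    """Group thinking durations into half-open intervals [k*w, (k+1)*w).

    Returns (counts, probabilities): the traffic distribution quantity and
    the traffic distribution probability of each interval.
    """
    counts = [0] * num_bins
    for d in durations:
        # Assumption: durations beyond the last interval are clamped into it.
        index = min(int(d // bin_width), num_bins - 1)
        counts[index] += 1
    total = len(durations)
    # Probability = count in the interval / total number of durations.
    probabilities = [c / total if total else 0.0 for c in counts]
    return counts, probabilities


counts, probabilities = traffic_distribution([2.0, 1.0, 10.0])
print(counts)  # [2, 0, 0, 1]
```

Returning both the counts and the probabilities mirrors the text, where the traffic distribution data carries both values for later use in the cheating score.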

Optionally, the step of determining the cheating score corresponding to the traffic to be detected according to the traffic distribution data and the natural traffic distribution probability includes:

Obtaining the traffic distribution probability and the traffic distribution quantity from the traffic distribution data;

Determining the cheating score corresponding to the traffic to be detected according to the traffic distribution probability, the traffic distribution quantity and the natural traffic distribution probability.

Optionally, the step of determining the cheating score corresponding to the traffic to be detected according to the traffic distribution probability, the traffic distribution quantity and the natural traffic distribution probability includes:

Determining the cheating score corresponding to the traffic to be detected through a cheating score calculation formula, according to the traffic distribution probability, the traffic distribution quantity and the natural traffic distribution probability;

The cheating score calculation formula is:

[Formula image BDA0002775874440000031, not reproduced in this text]

In the formula, score is the cheating score; P(organicBin_i) is the distribution probability of natural traffic in the i-th time distribution interval; P(channelBin_i) is the distribution probability of the traffic to be detected in the i-th time distribution interval; N is the distribution quantity of the traffic to be detected in the i-th time distribution interval; P(organicBin_j) is the distribution probability of natural traffic in the j-th time distribution interval; P(channelBin_j) is the distribution probability of the traffic to be detected in the j-th time distribution interval; and M is the distribution quantity of the traffic to be detected in the j-th time distribution interval.

Optionally, before the step of determining the cheating score corresponding to the traffic to be detected according to the traffic distribution data and the natural traffic distribution probability, the method further includes:

Calculating a relative entropy value according to the traffic distribution data and the natural traffic distribution probability;

When the relative entropy value satisfies the cheating score calculation condition, performing the step of determining the cheating score corresponding to the traffic to be detected according to the traffic distribution data and the natural traffic distribution probability.

Optionally, the step of calculating the relative entropy value according to the traffic distribution data and the natural traffic distribution probability includes:

Obtaining the traffic distribution probability from the traffic distribution data;

Calculating the relative entropy value according to the traffic distribution probability and the natural traffic distribution probability.

Optionally, the step of calculating the relative entropy value according to the traffic distribution probability and the natural traffic distribution probability includes:

Calculating the relative entropy value through a relative entropy calculation formula, according to the traffic distribution probability and the natural traffic distribution probability;

The relative entropy calculation formula is:

[Formula image BDA0002775874440000032, not reproduced in this text]

In the formula, D_KL(p||q) is the relative entropy value; p(x_i) is the distribution probability of natural traffic in the i-th time distribution interval; q(x_i) is the distribution probability of the traffic to be detected in the i-th time distribution interval; and N is the total number of time distribution intervals.
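
The formula image is not available here. Assuming it shows the standard definition of relative entropy (Kullback-Leibler divergence), D_KL(p||q) = sum over i = 1..N of p(x_i) * log(p(x_i) / q(x_i)), a minimal sketch is given below. The function name `relative_entropy`, the natural logarithm, and the `epsilon` floor used when the traffic to be detected has zero probability in an interval are assumptions that the text does not specify.

```python
import math


def relative_entropy(p, q, epsilon=1e-9):
    """Relative entropy D_KL(p || q) over matching time distribution intervals.

    p: natural traffic distribution probabilities.
    q: distribution probabilities of the traffic to be detected.
    epsilon: assumed floor for q, so an empty interval does not divide by zero.
    """
    if len(p) != len(q):
        raise ValueError("p and q must cover the same intervals")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            continue  # by convention, 0 * log(0 / q) contributes nothing
        total += p_i * math.log(p_i / max(q_i, epsilon))
    return total


print(relative_entropy([0.5, 0.5], [0.5, 0.5]))  # 0.0
```

Identical distributions give 0, and the value grows as the traffic to be detected departs from the natural distribution.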

Optionally, before the step of judging whether there is traffic cheating in the traffic to be detected according to the cheating score, the method further includes:

Calculating a relative entropy value according to the traffic distribution data and the natural traffic distribution probability;

The step of judging whether there is traffic cheating in the traffic to be detected according to the cheating score includes:

Judging whether there is traffic cheating in the traffic to be detected according to the cheating score and the relative entropy value.

Optionally, the step of judging whether there is traffic cheating in the traffic to be detected according to the cheating score and the relative entropy value includes:

When the cheating score is greater than a preset cheating threshold and the relative entropy value is greater than a preset relative entropy threshold, determining that there is traffic cheating in the traffic to be detected;

When the cheating score is not greater than the preset cheating threshold or the relative entropy value is not greater than the preset relative entropy threshold, determining that there is no traffic cheating in the traffic to be detected.
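
The two-threshold rule above reduces to a single conjunction, sketched below. The function name `has_traffic_cheating` and the threshold values in the usage line are hypothetical; the text leaves both thresholds as presets.

```python
def has_traffic_cheating(score, entropy, score_threshold, entropy_threshold):
    """Flag cheating only when BOTH values strictly exceed their thresholds.

    If either value is not greater than its threshold, the traffic is
    treated as free of cheating, matching the second branch of the rule.
    """
    return score > score_threshold and entropy > entropy_threshold


# Hypothetical thresholds, chosen only to exercise both branches.
print(has_traffic_cheating(0.5, 1.2, score_threshold=0.4, entropy_threshold=1.0))  # True
print(has_traffic_cheating(0.5, 0.8, score_threshold=0.4, entropy_threshold=1.0))  # False
```

Requiring both conditions makes the rule stricter than the score-only variant, so a high score alone does not trigger a cheating verdict.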

Optionally, the step of judging whether there is traffic cheating in the traffic to be detected according to the cheating score includes:

When the cheating score is greater than a preset cheating threshold, determining that there is traffic cheating in the traffic to be detected;

When the cheating score is less than or equal to the preset cheating threshold, determining that there is no traffic cheating in the traffic to be detected.

In addition, to achieve the above purpose, the invention further provides a traffic cheating identification device, characterized in that the traffic cheating identification device includes:

A data acquisition module, configured to obtain the natural traffic distribution probability and the traffic distribution data of the traffic to be detected;

A score calculation module, configured to determine the cheating score corresponding to the traffic to be detected according to the traffic distribution data and the natural traffic distribution probability;

A cheating identification module, configured to judge whether there is traffic cheating in the traffic to be detected according to the cheating score.

Optionally, the data acquisition module is further configured to determine user thinking durations according to the user operation information contained in the traffic to be detected, and to determine the traffic distribution data of the traffic to be detected according to the user thinking durations.

Optionally, the data acquisition module is further configured to obtain the user operation information contained in the traffic to be detected and determine the operation times of adjacent user operations; and to determine the operation time difference between the adjacent user operations according to the operation times, taking the operation time difference as the corresponding user thinking duration.

Optionally, the score calculation module is further configured to calculate a relative entropy value according to the traffic distribution data and the natural traffic distribution probability; and, when the relative entropy value satisfies the cheating score calculation condition, to perform the step of determining the cheating score corresponding to the traffic to be detected according to the traffic distribution data and the natural traffic distribution probability.

Optionally, the score calculation module is further configured to calculate a relative entropy value according to the traffic distribution data and the natural traffic distribution probability;

The cheating identification module is further configured to judge whether there is traffic cheating in the traffic to be detected according to the cheating score and the relative entropy value.

Optionally, the cheating identification module is further configured to determine that there is traffic cheating in the traffic to be detected when the cheating score is greater than a preset cheating threshold, and to determine that there is no traffic cheating in the traffic to be detected when the cheating score is less than or equal to the preset cheating threshold.

In addition, to achieve the above purpose, the invention further provides traffic cheating identification equipment, which includes a memory, a processor, and a traffic cheating identification program stored in the memory and executable on the processor. When executed by the processor, the traffic cheating identification program implements the steps of the traffic cheating identification method described in any of the above.

In addition, to achieve the above purpose, the invention further provides a computer-readable storage medium storing a traffic cheating identification program. When executed, the traffic cheating identification program implements the steps of the traffic cheating identification method described in any of the above.

The invention obtains the natural traffic distribution probability and the traffic distribution data of the traffic to be detected, determines the cheating score corresponding to the traffic to be detected according to the traffic distribution data and the natural traffic distribution probability, and judges whether there is traffic cheating in the traffic to be detected according to the cheating score. Because the traffic distribution data of the traffic to be detected is compared against the natural traffic distribution probability obtained from big data statistics, the resulting cheating score represents the degree of difference between the distribution of the traffic to be detected and that of natural traffic free of cheating. Whether traffic cheating exists can therefore be judged from the cheating score, which helps protect the legitimate rights and interests of traffic buyers.

Description of drawings

FIG. 1 is a schematic structural diagram of the electronic device in the hardware operating environment involved in embodiments of the invention;

FIG. 2 is a schematic flowchart of the first embodiment of the traffic cheating identification method of the invention;

FIG. 3 is a schematic flowchart of the second embodiment of the traffic cheating identification method of the invention;

FIG. 4 is a schematic flowchart of the third embodiment of the traffic cheating identification method of the invention;

FIG. 5 is a structural block diagram of the first embodiment of the traffic cheating identification device of the invention.

The realization of the purpose, the functional characteristics and the advantages of the invention are further described below with reference to the accompanying drawings and in conjunction with the embodiments.

Detailed description

It should be understood that the specific embodiments described here are intended only to explain the invention, not to limit it.

Referring to FIG. 1, FIG. 1 is a schematic structural diagram of the traffic cheating identification equipment in the hardware operating environment involved in embodiments of the invention.

As shown in FIG. 1, the electronic device may include a processor 1001, such as a central processing unit (CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 enables connection and communication among these components. The user interface 1003 may include a display and an input unit such as a keyboard; optionally, it may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a Wireless Fidelity (Wi-Fi) interface). The memory 1005 may be high-speed random access memory (RAM) or stable non-volatile memory (NVM), such as disk storage. Optionally, the memory 1005 may also be a storage device independent of the processor 1001.

Those skilled in the art will understand that the structure shown in FIG. 1 does not limit the electronic device, which may include more or fewer components than shown, combine certain components, or use a different arrangement of components.

As shown in FIG. 1, the memory 1005, as a storage medium, may include an operating system, a network communication module, a user interface module, and a traffic cheating identification program.

In the electronic device shown in FIG. 1, the network interface 1004 is mainly used for data communication with a network server, and the user interface 1003 is mainly used for data interaction with the user. The processor 1001 and the memory 1005 of the electronic device may be arranged in the traffic cheating identification equipment. Through the processor 1001, the electronic device calls the traffic cheating identification program stored in the memory 1005 and executes the traffic cheating identification method provided by embodiments of the invention.

An embodiment of the invention provides a traffic cheating identification method. Referring to FIG. 2, FIG. 2 is a schematic flowchart of the first embodiment of the traffic cheating identification method of the invention.

In this embodiment, the traffic cheating identification method includes the following steps:

Step S10: Obtain the natural traffic distribution probability and the traffic distribution data of the traffic to be detected.

It should be noted that the execution subject of this embodiment may be the traffic cheating identification equipment, which may be an electronic device such as a personal computer or a server, or any other device capable of the same or similar functions; this embodiment places no limitation on this. In this embodiment and the following embodiments, the traffic cheating identification method of the invention is described using the traffic cheating identification equipment as an example.

It should be noted that the natural traffic distribution probability may be the probability, obtained through big data statistics on normal traffic free of cheating, that user thinking durations fall within each preset time distribution interval. A user thinking duration is the operation time difference between adjacent user operations, and the distribution probability is the ratio of the number of user thinking durations that fall within a preset time distribution interval to the total number of user thinking durations. For example, if two adjacent user operations occur at 9:00:01 and 9:00:03, the operation time difference is 2 seconds, so the corresponding user thinking duration is 2 seconds. The preset time distribution intervals can be divided according to actual needs. For example, with intervals of 3 seconds each, the intervals are 0-3 seconds, 3-6 seconds, 6-9 seconds and so on, written as [0,3), [3,6), [6,9).

It should be noted that the traffic to be detected may be any traffic that needs to be checked for traffic cheating. The traffic distribution data of the traffic to be detected may include a traffic distribution probability and a traffic distribution quantity. The traffic distribution quantity is the number of user thinking durations of the traffic to be detected that fall within a preset time distribution interval, and the traffic distribution probability is the probability that a user thinking duration of the traffic to be detected falls within that interval.

Step S20: Determine the cheating score corresponding to the traffic to be detected according to the traffic distribution data and the natural traffic distribution probability.

It should be noted that the cheating score may be a quantitative score representing the likelihood that the traffic to be detected involves traffic cheating.

Further, to simplify calculation of the cheating score, step S20 of this embodiment may be:

Obtain the traffic distribution probability and the traffic distribution quantity from the traffic distribution data; determine the cheating score corresponding to the traffic to be detected according to the traffic distribution probability, the traffic distribution quantity and the natural traffic distribution probability.

In actual use, the cheating score corresponding to the traffic to be detected may be determined through a cheating score calculation formula, according to the traffic distribution probability, the traffic distribution quantity and the natural traffic distribution probability.

The cheating score calculation formula is:

[Formula image BDA0002775874440000081, not reproduced in this text]

In the formula, score is the cheating score; P(organicBin_i) is the distribution probability of natural traffic in the i-th time distribution interval; P(channelBin_i) is the distribution probability of the traffic to be detected in the i-th time distribution interval; N is the distribution quantity of the traffic to be detected in the i-th time distribution interval; P(organicBin_j) is the distribution probability of natural traffic in the j-th time distribution interval; P(channelBin_j) is the distribution probability of the traffic to be detected in the j-th time distribution interval; and M is the distribution quantity of the traffic to be detected in the j-th time distribution interval.

For example, suppose the natural traffic distribution probabilities are 1/10 for the first preset time distribution interval, 4/10 for the second, 2/10 for the third and 3/10 for the fourth. The traffic to be detected has a distribution probability of 1/3 and a distribution quantity of 1 in the first preset time distribution interval, and a distribution probability of 2/3 and a distribution quantity of 2 in the fourth. The cheating score is then score = [(1/3 - 1/10) * 1 + (2/3 - 3/10) * 2] / (1 + 2) ≈ 0.322.
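
The worked example can be reproduced with the sketch below. Since the formula image is unavailable, the form used here is inferred from the arithmetic of the example alone and is an assumption: a count-weighted sum of (P(channelBin_i) - P(organicBin_i)) divided by the total count. The real formula may apply additional conditions on which intervals enter each sum. The function name `cheating_score` is hypothetical.

```python
def cheating_score(organic_probs, channel_probs, channel_counts):
    """Count-weighted mean of (channel - organic) probability differences.

    Assumed form, inferred from the worked example:
        score = sum((P_channel_i - P_organic_i) * N_i) / sum(N_i)
    Intervals with a count of 0 contribute nothing to either sum.
    """
    weighted = sum((c - o) * n
                   for o, c, n in zip(organic_probs, channel_probs, channel_counts))
    total = sum(channel_counts)
    return weighted / total if total else 0.0


organic = [1 / 10, 4 / 10, 2 / 10, 3 / 10]  # natural traffic, intervals 1 to 4
channel = [1 / 3, 0.0, 0.0, 2 / 3]          # traffic to be detected
counts = [1, 0, 0, 2]                       # its thinking durations per interval
print(round(cheating_score(organic, channel, counts), 3))  # 0.322
```

Under this reading, intervals that the traffic to be detected over-populates relative to natural traffic push the score up, weighted by how many durations landed in them.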

Step S30: Determine, according to the cheating score, whether the traffic to be detected involves traffic cheating.

It should be noted that when a trial is repeated many times under unchanged conditions, the frequency of a random event approximates its probability. Once the purchased traffic reaches a sufficient volume, the distribution of its user think times should be consistent with that of natural traffic. A higher cheating score therefore indicates a larger gap between the distribution of the traffic to be detected and that of natural traffic, so the cheating score can be used to determine whether the traffic to be detected involves traffic cheating.

In practice, a cheating threshold can be preset. When the cheating score is greater than the preset cheating threshold, the traffic to be detected is determined to involve traffic cheating; when the cheating score is less than or equal to the preset cheating threshold, the traffic to be detected is determined not to involve traffic cheating.

For example, with a preset cheating threshold of 0.4, the traffic to be detected is determined to involve traffic cheating when the calculated cheating score is greater than 0.4, and not to involve traffic cheating when the calculated cheating score is less than or equal to 0.4.

In this embodiment, the natural traffic distribution probability and the traffic distribution data of the traffic to be detected are obtained; the cheating score corresponding to the traffic to be detected is determined according to the traffic distribution data and the natural traffic distribution probability; and whether the traffic to be detected involves traffic cheating is determined according to the cheating score. Because the traffic distribution data of the traffic to be detected is compared with the natural traffic distribution probability obtained from big-data statistics, the resulting cheating score expresses how far the distribution of the traffic to be detected deviates from the distribution of natural traffic that is free of cheating. The cheating score is therefore sufficient to determine whether traffic cheating exists, which helps protect the legitimate rights and interests of traffic buyers.

Referring to FIG. 3, FIG. 3 is a schematic flowchart of a second embodiment of the traffic cheating identification method of the present invention.

Based on the first embodiment above, the traffic cheating identification method of this embodiment further includes, before step S10:

Step S01: Determine user think times according to the user operation information contained in the traffic to be detected.

It should be noted that the user operation information may include the user operation type, the user operation time, and similar information.

Further, in order to determine the user think times, step S01 of this embodiment may be:

Obtain the user operation information contained in the traffic to be detected and determine the operation times of adjacent user operations; determine the operation time difference between the adjacent user operations according to the operation times, and take the operation time difference as the corresponding user think time.

For example, if the user operation information shows that user A performed three operations, at 9:00:00, 9:00:02, and 9:00:03, then there are two corresponding user think times: think time A of 2 seconds and think time B of 1 second.
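A minimal Python sketch of step S01 follows. It assumes the operation times arrive as `H:MM:SS` strings for a single user on a single day; the function name `think_times` is hypothetical, and sorting the timestamps first is an added assumption in case the log is not already in chronological order.

```python
from datetime import datetime


def think_times(operation_times):
    """Return the gaps, in seconds, between adjacent user operations."""
    # Assumption: all timestamps belong to one user and fall on the same day.
    parsed = sorted(datetime.strptime(t, "%H:%M:%S") for t in operation_times)
    return [(later - earlier).total_seconds()
            for earlier, later in zip(parsed, parsed[1:])]


print(think_times(["9:00:00", "9:00:02", "9:00:03"]))  # [2.0, 1.0]
```

Three operations yield two think times, matching the example for user A.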

Step S02: Determine the traffic distribution data of the traffic to be detected according to the user think times.

It can be understood that the traffic distribution data may include traffic distribution counts and traffic distribution probabilities. Grouping the user think times by the preset time distribution intervals yields the corresponding traffic distribution counts and traffic distribution probabilities, which together make up the traffic distribution data of the traffic to be detected.

In practice, the user think times can be grouped according to the preset time distribution intervals, and the number of user think times falling in each preset time distribution interval is taken as the corresponding traffic distribution count; the traffic distribution probability of each preset time distribution interval is determined from the traffic distribution count and the total number of user think times; and the traffic distribution data of the traffic to be detected is determined from the traffic distribution probabilities and the traffic distribution counts.

For example, the time distribution intervals are 3 seconds wide and there are four preset intervals: the first [0,3), the second [3,6), the third [6,9), and the fourth [9,12). There are six user operations, at 9:00:00, 9:00:02, 9:00:03, 9:00:07, 9:00:14, and 9:00:24, which give five user think times, denoted A, B, C, D, and E, of 2 seconds, 1 second, 4 seconds, 7 seconds, and 10 seconds respectively. The first interval then contains think times A and B, for a distribution count of 2; the second contains C, for a count of 1; the third contains D, for a count of 1; and the fourth contains E, for a count of 1. The distribution probabilities of the four intervals are therefore 2/5, 1/5, 1/5, and 1/5.
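The grouping in step S02 can be sketched in Python as below. The function name `distribution` and the defaults `bin_width=3` and `num_bins=4` are illustrative choices taken from the example; dropping think times that fall outside every preset interval is an assumption, since the text does not say how such values are handled.

```python
def distribution(think_times, bin_width=3, num_bins=4):
    """Return (counts, probabilities) per half-open interval [k*w, (k+1)*w)."""
    counts = [0] * num_bins
    for t in think_times:
        idx = int(t // bin_width)
        # Assumption: think times outside all preset intervals are ignored.
        if 0 <= idx < num_bins:
            counts[idx] += 1
    total = sum(counts)
    probs = [c / total for c in counts] if total else [0.0] * num_bins
    return counts, probs


counts, probs = distribution([2, 1, 4, 7, 10])
print(counts)  # [2, 1, 1, 1]
print(probs)   # [0.4, 0.2, 0.2, 0.2]
```

The five think times of 2, 1, 4, 7, and 10 seconds produce counts of 2, 1, 1, 1 and probabilities of 2/5, 1/5, 1/5, 1/5, in line with the example.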

Further, in order to decide whether the cheating score needs to be calculated, the following may also be performed before step S20 of this embodiment:

Calculate a relative entropy value according to the traffic distribution data and the natural traffic distribution probability; when the relative entropy value satisfies the cheating score calculation condition, perform the step of determining the cheating score corresponding to the traffic to be detected according to the traffic distribution data and the natural traffic distribution probability.

It should be noted that not all traffic requires a cheating score calculation. The relative entropy value of the traffic to be detected can be calculated first and used to estimate how likely the traffic is to involve cheating. When the relative entropy value satisfies the cheating score calculation condition, that is, when traffic cheating is more likely, the cheating score is calculated and used to determine whether the traffic to be detected involves traffic cheating.

In practice, a relative entropy threshold can be preset; when the calculated relative entropy value is greater than the preset relative entropy threshold, the relative entropy value is determined to satisfy the cheating score calculation condition.

For example, if the preset relative entropy threshold is 0.5 and the calculated relative entropy value is 0.6, the relative entropy value is determined to satisfy the cheating score calculation condition, and the cheating score can be calculated to determine whether the traffic to be detected involves traffic cheating.
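This gating step can be pictured as a small wrapper that only invokes the score calculation when the relative entropy exceeds the threshold. The following is a sketch under stated assumptions: `score_if_suspicious` and `compute_score` are hypothetical names, and returning `None` for traffic that is not scored is an illustrative convention rather than something the text specifies.

```python
def score_if_suspicious(relative_entropy_value, compute_score,
                        entropy_threshold=0.5):
    """Run the score calculation only when the relative entropy is high enough."""
    if relative_entropy_value > entropy_threshold:
        return compute_score()
    # Assumption: traffic below the threshold is simply not scored.
    return None


print(score_if_suspicious(0.6, lambda: 0.42))  # 0.42
print(score_if_suspicious(0.5, lambda: 0.42))  # None
```

Passing the score calculation as a callable keeps the potentially more expensive step from running for traffic that the relative entropy check has already cleared.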

Further, to simplify calculation of the relative entropy value, the step of calculating the relative entropy value according to the traffic distribution data and the natural traffic distribution probability in this embodiment may be:

Obtain the traffic distribution probability from the traffic distribution data; calculate the relative entropy value according to the traffic distribution probability and the natural traffic distribution probability.

In practice, the relative entropy value can be calculated from the traffic distribution probability and the natural traffic distribution probability using the relative entropy calculation formula.

The relative entropy calculation formula is:

$$D_{KL}(p\,\|\,q) = \sum_{i=1}^{N} p(x_i)\,\log\frac{p(x_i)}{q(x_i)}$$

In the formula, D_KL(p||q) is the relative entropy value, p(x_i) is the distribution probability of natural traffic in the i-th time distribution interval, q(x_i) is the distribution probability of the traffic to be detected in the i-th time distribution interval, and N is the total number of time distribution intervals.
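The symbols defined above match the standard relative entropy (Kullback-Leibler divergence), which can be sketched in Python as follows. The natural logarithm and the small floor `eps` applied to q(x_i) are assumptions: the text specifies neither the logarithm base nor the handling of intervals in which the traffic to be detected has zero probability.

```python
import math


def relative_entropy(p, q, eps=1e-12):
    """D_KL(p || q) with p = natural traffic, q = traffic to be detected."""
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            continue  # a term with p(x_i) = 0 contributes nothing
        # Assumption: floor q(x_i) at eps so that empty intervals do not
        # cause a division by zero.
        total += p_i * math.log(p_i / max(q_i, eps))
    return total


print(relative_entropy([0.5, 0.5], [0.5, 0.5]))    # 0.0
print(relative_entropy([0.5, 0.5], [0.25, 0.75]))
```

Identical distributions give a relative entropy of zero, and the value grows as the traffic to be detected departs from the natural distribution. Because of the floor, an interval that natural traffic occupies but the detected traffic leaves empty produces a large finite value rather than infinity.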

In this embodiment, user think times are determined according to the user operation information contained in the traffic to be detected, and the traffic distribution data of the traffic to be detected is determined according to the user think times. The data required for calculating the cheating score can thus be prepared in advance, which simplifies the calculation and improves its efficiency.

Referring to FIG. 4, FIG. 4 is a schematic flowchart of a third embodiment of the traffic cheating identification method of the present invention.

Based on the first embodiment above, the traffic cheating identification method of this embodiment further includes, before step S30:

Step S201: Calculate a relative entropy value according to the traffic distribution data and the natural traffic distribution probability.

Further, to simplify calculation of the relative entropy value, step S201 of this embodiment may be:

Obtain the traffic distribution probability from the traffic distribution data; calculate the relative entropy value according to the traffic distribution probability and the natural traffic distribution probability.

In practice, the relative entropy value can be calculated from the traffic distribution probability and the natural traffic distribution probability using the relative entropy calculation formula.

The relative entropy calculation formula is:

$$D_{KL}(p\,\|\,q) = \sum_{i=1}^{N} p(x_i)\,\log\frac{p(x_i)}{q(x_i)}$$

In the formula, D_KL(p||q) is the relative entropy value, p(x_i) is the distribution probability of natural traffic in the i-th time distribution interval, q(x_i) is the distribution probability of the traffic to be detected in the i-th time distribution interval, and N is the total number of time distribution intervals.

Correspondingly, step S30 in this embodiment includes:

Step S30': Determine, according to the cheating score and the relative entropy value, whether the traffic to be detected involves traffic cheating.

It can be understood that judging the traffic to be detected by the cheating score alone may lead to misjudgment. Calculating the relative entropy value as well, and using both the relative entropy value and the cheating score to determine whether the traffic to be detected involves traffic cheating, gives a more accurate result.

In practice, a cheating threshold and a relative entropy threshold can both be preset. When the cheating score is greater than the preset cheating threshold and the relative entropy value is greater than the preset relative entropy threshold, the traffic to be detected is determined to involve traffic cheating; when the cheating score is not greater than the preset cheating threshold or the relative entropy value is not greater than the preset relative entropy threshold, the traffic to be detected is determined not to involve traffic cheating.

For example, with a preset cheating threshold of 0.4 and a preset relative entropy threshold of 0.5: when the cheating score of the traffic to be detected is 0.6 and the relative entropy value is 0.7, the traffic is determined to involve traffic cheating; when the cheating score is 0.3 and the relative entropy value is 0.6, the traffic is determined not to involve traffic cheating; when the cheating score is 0.5 and the relative entropy value is 0.4, the traffic is determined not to involve traffic cheating; and when the cheating score is 0.3 and the relative entropy value is 0.3, the traffic is determined not to involve traffic cheating.
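The combined decision rule of step S30' reduces to a conjunction of two strict comparisons. A minimal Python sketch, with the hypothetical function name `is_cheating` and the thresholds from the example used as defaults:

```python
def is_cheating(score, relative_entropy, score_threshold=0.4,
                entropy_threshold=0.5):
    """Flag traffic only when BOTH values strictly exceed their thresholds."""
    return score > score_threshold and relative_entropy > entropy_threshold


# The four cases from the example, as (cheating score, relative entropy).
for score, entropy in [(0.6, 0.7), (0.3, 0.6), (0.5, 0.4), (0.3, 0.3)]:
    print(score, entropy, is_cheating(score, entropy))
```

Only the first case, with a cheating score of 0.6 and a relative entropy of 0.7, is flagged; a value exactly equal to its threshold does not count as exceeding it.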

In this embodiment, the relative entropy value is calculated according to the traffic distribution data and the natural traffic distribution probability, and whether the traffic to be detected involves traffic cheating is then determined according to the cheating score and the relative entropy value. Using the cheating score and the relative entropy value together makes the determination more accurate and harder for a cheating party to circumvent, which improves the accuracy and reliability of the traffic cheating identification method.

In addition, an embodiment of the present invention further provides a storage medium on which a traffic cheating identification program is stored; when the traffic cheating identification program is executed by a processor, the steps of the traffic cheating identification method described above are implemented.

Referring to FIG. 5, FIG. 5 is a structural block diagram of a first embodiment of the traffic cheating identification device of the present invention.

As shown in FIG. 5, the traffic cheating identification device provided by the embodiment of the present invention includes:

a data acquisition module 501, configured to obtain the natural traffic distribution probability and the traffic distribution data of the traffic to be detected;

a score calculation module 502, configured to determine the cheating score corresponding to the traffic to be detected according to the traffic distribution data and the natural traffic distribution probability;

a cheating identification module 503, configured to determine, according to the cheating score, whether the traffic to be detected involves traffic cheating.

In this embodiment, the natural traffic distribution probability and the traffic distribution data of the traffic to be detected are obtained; the cheating score corresponding to the traffic to be detected is determined according to the traffic distribution data and the natural traffic distribution probability; and whether the traffic to be detected involves traffic cheating is determined according to the cheating score. Because the traffic distribution data of the traffic to be detected is compared with the natural traffic distribution probability obtained from big-data statistics, the resulting cheating score expresses how far the distribution of the traffic to be detected deviates from the distribution of natural traffic that is free of cheating. The cheating score is therefore sufficient to determine whether traffic cheating exists, which helps protect the legitimate rights and interests of traffic buyers.

Further, the data acquisition module 501 is also configured to determine user think times according to the user operation information contained in the traffic to be detected, and to determine the traffic distribution data of the traffic to be detected according to the user think times.

Further, the data acquisition module 501 is also configured to obtain the user operation information contained in the traffic to be detected and determine the operation times of adjacent user operations; and to determine the operation time difference between the adjacent user operations according to the operation times and take the operation time difference as the corresponding user think time.

Further, the data acquisition module 501 is also configured to group the user think times according to the preset time distribution intervals and take the number of user think times falling in each preset time distribution interval as the corresponding traffic distribution count; to determine the traffic distribution probability of each preset time distribution interval from the traffic distribution count and the total number of user think times; and to determine the traffic distribution data of the traffic to be detected from the traffic distribution probabilities and the traffic distribution counts.

Further, the score calculation module 502 is also configured to obtain the traffic distribution probability and the traffic distribution count from the traffic distribution data, and to determine the cheating score corresponding to the traffic to be detected according to the traffic distribution probability, the traffic distribution count, and the natural traffic distribution probability.

Further, the score calculation module 502 is also configured to determine the cheating score corresponding to the traffic to be detected from the traffic distribution probability, the traffic distribution count, and the natural traffic distribution probability using the cheating score calculation formula.

The cheating score calculation formula is:

[Formula image BDA0002775874440000141: cheating score calculation formula; not reproduced in the extracted text]

In the formula, score is the cheating score; P(organicBin_i) is the distribution probability of natural traffic in the i-th time distribution interval; P(channelBin_i) is the distribution probability of the traffic to be detected in the i-th time distribution interval; N is the distribution count of the traffic to be detected in the i-th time distribution interval; P(organicBin_j) is the distribution probability of natural traffic in the j-th time distribution interval; P(channelBin_j) is the distribution probability of the traffic to be detected in the j-th time distribution interval; and M is the distribution count of the traffic to be detected in the j-th time distribution interval.

Further, the score calculation module 502 is also configured to calculate a relative entropy value according to the traffic distribution data and the natural traffic distribution probability, and, when the relative entropy value satisfies the cheating score calculation condition, to perform the step of determining the cheating score corresponding to the traffic to be detected according to the traffic distribution data and the natural traffic distribution probability.

Further, the score calculation module 502 is also configured to obtain the traffic distribution probability from the traffic distribution data and to calculate the relative entropy value according to the traffic distribution probability and the natural traffic distribution probability.

Further, the score calculation module 502 is also configured to calculate the relative entropy value from the traffic distribution probability and the natural traffic distribution probability using the relative entropy calculation formula.

The relative entropy calculation formula is:

$$D_{KL}(p\,\|\,q) = \sum_{i=1}^{N} p(x_i)\,\log\frac{p(x_i)}{q(x_i)}$$

In the formula, D_KL(p||q) is the relative entropy value, p(x_i) is the distribution probability of natural traffic in the i-th time distribution interval, q(x_i) is the distribution probability of the traffic to be detected in the i-th time distribution interval, and N is the total number of time distribution intervals.

Further, the score calculation module 502 is also configured to calculate a relative entropy value according to the traffic distribution data and the natural traffic distribution probability.

The cheating identification module 503 is also configured to determine, according to the cheating score and the relative entropy value, whether the traffic to be detected involves traffic cheating.

Further, the cheating identification module 503 is also configured to determine that the traffic to be detected involves traffic cheating when the cheating score is greater than the preset cheating threshold and the relative entropy value is greater than the preset relative entropy threshold, and to determine that the traffic to be detected does not involve traffic cheating when the cheating score is not greater than the preset cheating threshold or the relative entropy value is not greater than the preset relative entropy threshold.

Further, the cheating identification module 503 is also configured to determine that the traffic to be detected involves traffic cheating when the cheating score is greater than the preset cheating threshold, and to determine that the traffic to be detected does not involve traffic cheating when the cheating score is less than or equal to the preset cheating threshold.

It should be understood that the above are only examples and do not limit the technical solutions of the present invention in any way. In specific applications, those skilled in the art can configure them as needed, and the present invention places no limitation on this.

It should be noted that the workflows described above are merely illustrative and do not limit the protection scope of the present invention. In practical applications, those skilled in the art can select some or all of them according to actual needs to achieve the purpose of the solution of this embodiment, and no limitation is imposed here.

In addition, for technical details not described exhaustively in this embodiment, reference may be made to the traffic cheating identification method provided by any embodiment of the present invention; they are not repeated here.

Furthermore, it should be noted that, in this document, the terms "include", "comprise", and any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or system that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or system. Without further limitation, an element qualified by the phrase "including a ..." does not preclude the presence of additional identical elements in the process, method, article, or system that includes the element.

The serial numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.

From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a read-only memory (ROM)/RAM, a magnetic disk, or an optical disk) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present invention.

The above are only preferred embodiments of the present invention and do not thereby limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the description and drawings of the present invention, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present invention.

本发明公开了A1、一种流量作弊识别方法,所述流量作弊识别方法包括以下步骤:The present invention discloses A1, a method for identifying traffic cheating. The method for identifying traffic cheating includes the following steps:

获取自然流量分布概率及待检测流量的流量分布数据;Obtain the natural traffic distribution probability and the traffic distribution data of the traffic to be detected;

根据所述流量分布数据及所述自然流量分布概率确定所述待检测流量对应的作弊分值;Determine the cheating score corresponding to the traffic to be detected according to the traffic distribution data and the natural traffic distribution probability;

根据所述作弊分值判断所述待检测流量是否存在流量作弊。Determine whether there is traffic fraud in the traffic to be detected according to the fraud score.

A2、如A1所述的流量作弊识别方法,所述获取自然流量分布概率及待检测流量的流量分布数据的步骤之前,还包括:A2. The method for identifying traffic cheating according to A1, before the step of acquiring the natural traffic distribution probability and the traffic distribution data of the traffic to be detected, further comprising:

根据待检测流量中包含的用户操作信息确定用户思考时长;Determine the user's thinking time according to the user operation information contained in the traffic to be detected;

根据所述用户思考时长确定待检测流量的流量分布数据。The traffic distribution data of the traffic to be detected is determined according to the user's thinking time.

A3、如A2所述的流量作弊识别方法,所述根据待检测流量中包含的用户操作信息确定用户思考时长的步骤,包括:A3. The method for identifying traffic cheating according to A2, wherein the step of determining the user's thinking time according to the user operation information contained in the traffic to be detected includes:

获取待检测流量中包含的用户操作信息确定相邻用户操作的操作时间;Obtain the user operation information contained in the traffic to be detected to determine the operation time of adjacent user operations;

根据所述操作时间确定所述相邻用户操作的操作时间差,将所述操作时间差作为对应的用户思考时长。The operation time difference between adjacent user operations is determined according to the operation time, and the operation time difference is used as the corresponding user thinking time period.

A4. The traffic cheating identification method according to A2, wherein the step of determining the traffic distribution data of the traffic to be detected according to the user thinking duration includes:

Group the user thinking durations according to preset time distribution intervals, and use the number of user thinking durations falling in each preset time distribution interval as the corresponding traffic distribution quantity;

Determine the traffic distribution probability of each preset time distribution interval according to the traffic distribution quantity and the total number of user thinking durations;

Determine the traffic distribution data of the traffic to be detected according to the traffic distribution probability and the traffic distribution quantity.
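The grouping in A4 can be sketched as a histogram over preset intervals. The interval boundaries, the half-open interval convention, and the clamping of out-of-range values below are all assumptions made for illustration; the source does not specify them.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def traffic_distribution(
    durations: Sequence[float], bin_edges: Sequence[float]
) -> Tuple[List[int], List[float]]:
    """Return (distribution quantities, distribution probabilities) per interval.

    Assumptions: bin_edges are ascending boundaries, interval k covers
    [bin_edges[k], bin_edges[k + 1]), and values outside the range are
    clamped into the first or last interval so every duration is counted.
    """
    counts = [0] * (len(bin_edges) - 1)
    for d in durations:
        k = bisect_right(bin_edges, d) - 1
        k = min(max(k, 0), len(counts) - 1)
        counts[k] += 1
    total = len(durations)
    # Probability = quantity in the interval / total number of durations.
    probs = [c / total if total else 0.0 for c in counts]
    return counts, probs


counts, probs = traffic_distribution([0.5, 2.0, 3.0, 10.0], [0, 1, 5, 30])
print(counts, probs)
```

Returning both the counts and the probabilities mirrors the claim, in which the traffic distribution data carries both the distribution quantity and the distribution probability.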

A5. The traffic cheating identification method according to A1, wherein the step of determining the cheating score corresponding to the traffic to be detected according to the traffic distribution data and the natural traffic distribution probability includes:

Obtain the traffic distribution probability and the traffic distribution quantity from the traffic distribution data;

Determine the cheating score corresponding to the traffic to be detected according to the traffic distribution probability, the traffic distribution quantity and the natural traffic distribution probability.

A6. The traffic cheating identification method according to A5, wherein the step of determining the cheating score corresponding to the traffic to be detected according to the traffic distribution probability, the traffic distribution quantity and the natural traffic distribution probability includes:

Determine the cheating score corresponding to the traffic to be detected through a cheating score calculation formula according to the traffic distribution probability, the traffic distribution quantity and the natural traffic distribution probability;

The cheating score calculation formula is:

Figure BDA0002775874440000171

In the formula, score is the cheating score; P(organicBin_i) is the distribution probability of natural traffic in the i-th time distribution interval; P(channelBin_i) is the distribution probability of the traffic to be detected in the i-th time distribution interval; N is the distribution quantity of the traffic to be detected in the i-th time distribution interval; P(organicBin_j) is the distribution probability of natural traffic in the j-th time distribution interval; P(channelBin_j) is the distribution probability of the traffic to be detected in the j-th time distribution interval; and M is the distribution quantity of the traffic to be detected in the j-th time distribution interval.

A7. The traffic cheating identification method according to A1, further comprising, before the step of determining the cheating score corresponding to the traffic to be detected according to the traffic distribution data and the natural traffic distribution probability:

Calculate a relative entropy value according to the traffic distribution data and the natural traffic distribution probability;

When the relative entropy value satisfies the cheating score calculation condition, perform the step of determining the cheating score corresponding to the traffic to be detected according to the traffic distribution data and the natural traffic distribution probability.

A8. The traffic cheating identification method according to A7, wherein the step of calculating the relative entropy value according to the traffic distribution data and the natural traffic distribution probability includes:

Obtain the traffic distribution probability from the traffic distribution data;

Calculate the relative entropy value according to the traffic distribution probability and the natural traffic distribution probability.

A9. The traffic cheating identification method according to A8, wherein the step of calculating the relative entropy value according to the traffic distribution probability and the natural traffic distribution probability includes:

Calculate the relative entropy value through a relative entropy calculation formula according to the traffic distribution probability and the natural traffic distribution probability;

The relative entropy calculation formula is:

Figure BDA0002775874440000181

In the formula, D_KL(p||q) is the relative entropy value; p(x_i) is the distribution probability of natural traffic in the i-th time distribution interval; q(x_i) is the distribution probability of the traffic to be detected in the i-th time distribution interval; and N is the total number of time distribution intervals.
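The formula image is not reproduced in this text. The symbols defined in A9 match the standard definition of relative entropy (Kullback-Leibler divergence), D_KL(p||q) = sum over i = 1..N of p(x_i) * log(p(x_i) / q(x_i)). The sketch below implements that standard definition; the natural logarithm and the `eps` floor on q are assumptions, and the patent's own formula may differ in such details.

```python
import math
from typing import Sequence


def relative_entropy(
    p: Sequence[float], q: Sequence[float], eps: float = 1e-12
) -> float:
    """Standard KL divergence D_KL(p || q) with natural logarithm.

    p: natural traffic probability per time distribution interval.
    q: probability of the traffic to be detected per interval.
    Terms with p_i == 0 contribute 0. q_i is floored at eps (an assumption,
    not from the source) so an empty interval does not divide by zero.
    """
    if len(p) != len(q):
        raise ValueError("p and q must cover the same intervals")
    return sum(
        pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0
    )


print(relative_entropy([0.5, 0.5], [0.5, 0.5]))    # identical distributions
print(relative_entropy([0.5, 0.5], [0.25, 0.75]))  # diverging distributions
```

Identical distributions give a value of zero, and the value grows as the detected traffic's think-time distribution moves away from the natural one.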

A10. The traffic cheating identification method according to A1, further comprising, before the step of determining whether traffic cheating exists in the traffic to be detected according to the cheating score:

Calculate a relative entropy value according to the traffic distribution data and the natural traffic distribution probability;

The step of determining whether traffic cheating exists in the traffic to be detected according to the cheating score includes:

Determine, according to the cheating score and the relative entropy value, whether traffic cheating exists in the traffic to be detected.

A11. The traffic cheating identification method according to A10, wherein the step of determining whether traffic cheating exists in the traffic to be detected according to the cheating score and the relative entropy value includes:

When the cheating score is greater than a preset cheating threshold and the relative entropy value is greater than a preset relative entropy threshold, determine that traffic cheating exists in the traffic to be detected;

When the cheating score is not greater than the preset cheating threshold or the relative entropy value is not greater than the preset relative entropy threshold, determine that no traffic cheating exists in the traffic to be detected.
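The two-threshold rule in A11 reduces to a single conjunction. In this sketch the function name and the threshold values in the usage lines are hypothetical; the source only says the thresholds are preset.

```python
def has_traffic_cheating(
    cheating_score: float,
    relative_entropy_value: float,
    cheating_threshold: float,
    relative_entropy_threshold: float,
) -> bool:
    """Flag traffic only when both values strictly exceed their thresholds.

    If either value is not greater than its threshold, the traffic is
    judged free of cheating, as stated in A11.
    """
    return (
        cheating_score > cheating_threshold
        and relative_entropy_value > relative_entropy_threshold
    )


# Hypothetical thresholds chosen only for illustration.
print(has_traffic_cheating(0.9, 0.5, cheating_threshold=0.8, relative_entropy_threshold=0.3))
print(has_traffic_cheating(0.9, 0.2, cheating_threshold=0.8, relative_entropy_threshold=0.3))
```

Requiring both conditions means a high score alone does not flag a channel whose think-time distribution is still close to natural traffic.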

A12. The traffic cheating identification method according to any one of A1-A9, wherein the step of determining whether traffic cheating exists in the traffic to be detected according to the cheating score includes:

When the cheating score is greater than a preset cheating threshold, determine that traffic cheating exists in the traffic to be detected;

When the cheating score is less than or equal to the preset cheating threshold, determine that no traffic cheating exists in the traffic to be detected.

The present invention discloses B13, a traffic cheating identification device, the traffic cheating identification device comprising:

A data acquisition module, configured to acquire the natural traffic distribution probability and the traffic distribution data of the traffic to be detected;

A score calculation module, configured to determine the cheating score corresponding to the traffic to be detected according to the traffic distribution data and the natural traffic distribution probability;

A cheating identification module, configured to determine, according to the cheating score, whether traffic cheating exists in the traffic to be detected.

B14. The traffic cheating identification device according to B13, wherein the data acquisition module is further configured to determine the user thinking duration according to the user operation information contained in the traffic to be detected, and to determine the traffic distribution data of the traffic to be detected according to the user thinking duration.

B15. The traffic cheating identification device according to B13, wherein the data acquisition module is further configured to acquire the user operation information contained in the traffic to be detected and determine the operation times of adjacent user operations, to determine the operation time difference between the adjacent user operations according to the operation times, and to use the operation time difference as the corresponding user thinking duration.

B16. The traffic cheating identification device according to B13, wherein the score calculation module is further configured to calculate a relative entropy value according to the traffic distribution data and the natural traffic distribution probability, and, when the relative entropy value satisfies the cheating score calculation condition, to perform the step of determining the cheating score corresponding to the traffic to be detected according to the traffic distribution data and the natural traffic distribution probability.

B17. The traffic cheating identification device according to B13, wherein the score calculation module is further configured to calculate a relative entropy value according to the traffic distribution data and the natural traffic distribution probability;

The cheating identification module is further configured to determine, according to the cheating score and the relative entropy value, whether traffic cheating exists in the traffic to be detected.

B18. The traffic cheating identification device according to B13, wherein the cheating identification module is further configured to determine that traffic cheating exists in the traffic to be detected when the cheating score is greater than a preset cheating threshold, and to determine that no traffic cheating exists in the traffic to be detected when the cheating score is less than or equal to the preset cheating threshold.

The present invention discloses C19, a traffic cheating identification equipment, the traffic cheating identification equipment comprising: a memory, a processor, and a traffic cheating identification program stored in the memory and executable on the processor, wherein the traffic cheating identification program, when executed by the processor, implements the steps of the traffic cheating identification method described above.

The present invention discloses D20, a computer-readable storage medium, wherein a traffic cheating identification program is stored on the computer-readable storage medium, and the traffic cheating identification program, when executed, implements the steps of the traffic cheating identification method described above.

Claims (10)

1. A traffic cheating identification method, characterized in that the traffic cheating identification method comprises the following steps:
acquiring a natural traffic distribution probability and traffic distribution data of traffic to be detected;
determining a cheating score corresponding to the traffic to be detected according to the traffic distribution data and the natural traffic distribution probability;
determining, according to the cheating score, whether traffic cheating exists in the traffic to be detected.

2. The traffic cheating identification method according to claim 1, characterized in that, before the step of acquiring the natural traffic distribution probability and the traffic distribution data of the traffic to be detected, the method further comprises:
determining a user thinking duration according to user operation information contained in the traffic to be detected;
determining the traffic distribution data of the traffic to be detected according to the user thinking duration.

3. The traffic cheating identification method according to claim 2, characterized in that the step of determining the user thinking duration according to the user operation information contained in the traffic to be detected comprises:
acquiring the user operation information contained in the traffic to be detected and determining the operation times of adjacent user operations;
determining the operation time difference between the adjacent user operations according to the operation times, and using the operation time difference as the corresponding user thinking duration.

4. The traffic cheating identification method according to claim 2, characterized in that the step of determining the traffic distribution data of the traffic to be detected according to the user thinking duration comprises:
grouping the user thinking durations according to preset time distribution intervals, and using the number of user thinking durations falling in each preset time distribution interval as the corresponding traffic distribution quantity;
determining the traffic distribution probability of each preset time distribution interval according to the traffic distribution quantity and the total number of user thinking durations;
determining the traffic distribution data of the traffic to be detected according to the traffic distribution probability and the traffic distribution quantity.

5. The traffic cheating identification method according to claim 1, characterized in that the step of determining the cheating score corresponding to the traffic to be detected according to the traffic distribution data and the natural traffic distribution probability comprises:
obtaining the traffic distribution probability and the traffic distribution quantity from the traffic distribution data;
determining the cheating score corresponding to the traffic to be detected according to the traffic distribution probability, the traffic distribution quantity and the natural traffic distribution probability.

6. The traffic cheating identification method according to claim 5, characterized in that the step of determining the cheating score corresponding to the traffic to be detected according to the traffic distribution probability, the traffic distribution quantity and the natural traffic distribution probability comprises:
determining the cheating score corresponding to the traffic to be detected through a cheating score calculation formula according to the traffic distribution probability, the traffic distribution quantity and the natural traffic distribution probability;
the cheating score calculation formula is:
Figure FDA0002775874430000021
In the formula, score is the cheating score; P(organicBin_i) is the distribution probability of natural traffic in the i-th time distribution interval; P(channelBin_i) is the distribution probability of the traffic to be detected in the i-th time distribution interval; N is the distribution quantity of the traffic to be detected in the i-th time distribution interval; P(organicBin_j) is the distribution probability of natural traffic in the j-th time distribution interval; P(channelBin_j) is the distribution probability of the traffic to be detected in the j-th time distribution interval; and M is the distribution quantity of the traffic to be detected in the j-th time distribution interval.

7. The traffic cheating identification method according to claim 1, characterized in that, before the step of determining the cheating score corresponding to the traffic to be detected according to the traffic distribution data and the natural traffic distribution probability, the method further comprises:
calculating a relative entropy value according to the traffic distribution data and the natural traffic distribution probability;
when the relative entropy value satisfies the cheating score calculation condition, performing the step of determining the cheating score corresponding to the traffic to be detected according to the traffic distribution data and the natural traffic distribution probability.

8. A traffic cheating identification device, characterized in that the traffic cheating identification device comprises:
a data acquisition module, configured to acquire the natural traffic distribution probability and the traffic distribution data of the traffic to be detected;
a score calculation module, configured to determine the cheating score corresponding to the traffic to be detected according to the traffic distribution data and the natural traffic distribution probability;
a cheating identification module, configured to determine, according to the cheating score, whether traffic cheating exists in the traffic to be detected.

9. A traffic cheating identification equipment, characterized in that the traffic cheating identification equipment comprises: a memory, a processor, and a traffic cheating identification program stored in the memory and executable on the processor, wherein the traffic cheating identification program, when executed by the processor, implements the steps of the traffic cheating identification method according to any one of claims 1-7.

10. A computer-readable storage medium, characterized in that a traffic cheating identification program is stored on the computer-readable storage medium, and the traffic cheating identification program, when executed, implements the steps of the traffic cheating identification method according to any one of claims 1-7.
CN202011275069.7A 2020-11-12 2020-11-12 Traffic cheating identification method, device, equipment and storage medium Pending CN114491407A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011275069.7A CN114491407A (en) 2020-11-12 2020-11-12 Traffic cheating identification method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011275069.7A CN114491407A (en) 2020-11-12 2020-11-12 Traffic cheating identification method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114491407A true CN114491407A (en) 2022-05-13

Family

ID=81490861

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011275069.7A Pending CN114491407A (en) 2020-11-12 2020-11-12 Traffic cheating identification method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114491407A (en)

Similar Documents

Publication Publication Date Title
US10270785B2 (en) Method and apparatus for identifying malicious account
CN112529575B (en) Risk early warning method, equipment, storage medium and device
CN107067319A (en) Loan limit measuring method and device
US20170140472A1 (en) Method and system for assessing auditing likelihood
CN105824805B (en) Identification method and device
CN110992135B (en) Risk identification method and device, electronic equipment and storage medium
CN115375177A (en) User value evaluation method and device, electronic equipment and storage medium
CN113114631B (en) Method, device, equipment and medium for evaluating trust degree of nodes of Internet of things
CN108171537B (en) User experience assessment method and device, electronic equipment and storage medium
CN112347457A (en) Abnormal account detection method and device, computer equipment and storage medium
CN112085240A (en) Method and system for processing transaction data of first-order commodities
CN111563765A (en) Cheating user screening method, device and equipment and readable storage medium
JP7059160B2 (en) Providing equipment, providing method and providing program
CN114757757A (en) Wind control method
CN107871213B (en) Transaction behavior evaluation method, device, server and storage medium
WO2025103135A1 (en) Data adjustment method and apparatus, model training method and apparatus, device, medium, and product
CN114491407A (en) Traffic cheating identification method, device, equipment and storage medium
CN109711984B (en) Pre-loan risk monitoring method and device based on collection urging
CN111833141A (en) Information push processing method, device, equipment and storage medium
CN115119197B (en) Wireless network risk analysis method, device, equipment and medium based on big data
CN114417007B (en) Financial product recommendation method and related equipment
CN115150100A (en) Scene-based verification code verification method and device
CN114285896A (en) Information pushing method, device, equipment, storage medium and program product
CN112529690A (en) Financial wind control management method, device, equipment and medium for bulk commodity supply chain
CN111951024A (en) Reporting information processing method, device and computer readable medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination