CN114491407A - Flow cheating identification method, device, equipment and storage medium

Info

Publication number
CN114491407A
Authority
CN
China
Prior art keywords
flow
cheating
detected
distribution
traffic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011275069.7A
Other languages
Chinese (zh)
Inventor
秦莎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qihoo Technology Co Ltd
Original Assignee
Beijing Qihoo Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Qihoo Technology Co Ltd
Priority to CN202011275069.7A
Publication of CN114491407A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computer Security & Cryptography (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Algebra (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Operations Research (AREA)
  • Evolutionary Biology (AREA)
  • Computing Systems (AREA)
  • Computer Hardware Design (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention belongs to the technical field of internet big data analysis and discloses a flow cheating identification method, device, equipment, and storage medium. The method comprises: obtaining a natural flow distribution probability and flow distribution data of the flow to be detected; determining a cheating score corresponding to the flow to be detected according to the flow distribution data and the natural flow distribution probability; and judging whether flow cheating exists in the flow to be detected according to the cheating score. The flow distribution data of the flow to be detected is compared with the natural flow distribution probability obtained from big data statistics to calculate the cheating score. Because the cheating score represents the degree of difference between the flow distribution of the flow to be detected and that of natural flow without flow cheating, it can be used to judge whether flow cheating exists, thereby protecting the legitimate rights and interests of flow buyers.

Description

Flow cheating identification method, device, equipment and storage medium
Technical Field
The invention relates to the technical field of internet big data analysis, and in particular to a flow cheating identification method, a flow cheating identification device, flow cheating identification equipment, and a storage medium.
Background
At present, traffic monetization brings substantial revenue to enterprises. Driven by that revenue, traffic counterfeiting has become increasingly rampant, and its forms and technical means have become increasingly sophisticated. Whatever form the cheating takes, the party that ultimately bears the loss is the traffic buyer who pays for the traffic. Traffic buyers spend large budgets on traffic in order to acquire new users, but fake traffic brings neither users who are actually retained nor revenue, which seriously damages the buyers' legitimate rights and interests. Traffic anti-cheating promotes healthy business growth, saves budget, and protects the legitimate rights and interests of traffic buyers, so it is urgently needed.
The above is only for the purpose of assisting understanding of the technical aspects of the present invention, and does not represent an admission that the above is prior art.
Disclosure of Invention
The invention mainly aims to provide a method, a device, equipment and a storage medium for identifying flow cheating, and aims to solve the technical problem of how to detect whether the flow cheating exists so as to protect the legitimate rights and interests of a flow buyer.
In order to achieve the above object, the present invention provides a traffic cheating identification method, comprising the steps of:
acquiring natural flow distribution probability and flow distribution data of the flow to be detected;
determining a cheating score corresponding to the flow to be detected according to the flow distribution data and the natural flow distribution probability;
and judging whether the flow to be detected has flow cheating according to the cheating score.
Optionally, before the step of obtaining the natural flow distribution probability and the flow distribution data of the flow to be detected, the method further includes:
determining a user thinking duration according to user operation information contained in the flow to be detected;
and determining flow distribution data of the flow to be detected according to the user thinking duration.
Optionally, the step of determining the user thinking duration according to the user operation information included in the flow to be detected includes:
acquiring user operation information contained in the flow to be detected to determine the operation time of adjacent user operations;
and determining the operation time difference of the adjacent user operation according to the operation time, and taking the operation time difference as the corresponding user thinking duration.
Optionally, the step of determining flow distribution data of the flow to be detected according to the user thinking duration includes:
grouping the user thinking durations according to preset time distribution intervals, and taking the number of the user thinking durations corresponding to each preset time distribution interval as the corresponding flow distribution number;
determining the flow distribution probability of each preset time distribution interval according to the flow distribution quantity and the total number of user thinking durations;
and determining flow distribution data of the flow to be detected according to the flow distribution probability and the flow distribution quantity.
Optionally, the step of determining the cheating score corresponding to the traffic to be detected according to the traffic distribution data and the natural traffic distribution probability includes:
acquiring the flow distribution probability and the flow distribution quantity in the flow distribution data;
and determining a cheating score corresponding to the flow to be detected according to the flow distribution probability, the flow distribution quantity and the natural flow distribution probability.
Optionally, the step of determining the cheating score corresponding to the traffic to be detected according to the traffic distribution probability, the traffic distribution quantity, and the natural traffic distribution probability includes:
determining a cheating score corresponding to the flow to be detected through a cheating score calculation formula according to the flow distribution probability, the flow distribution quantity and the natural flow distribution probability;
the cheating score calculation formula is as follows:
Figure BDA0002775874440000031
in the formula, score is the cheating score; P(organicBin_i) is the distribution probability of the natural flow in the i-th time distribution interval; P(channelBin_i) is the distribution probability of the flow to be detected in the i-th time distribution interval; N is the distribution number of the flow to be detected in the i-th time distribution interval; P(organicBin_j) is the distribution probability of the natural flow in the j-th time distribution interval; P(channelBin_j) is the distribution probability of the flow to be detected in the j-th time distribution interval; and M is the distribution number of the flow to be detected in the j-th time distribution interval.
Optionally, before the step of determining the cheating score corresponding to the traffic to be detected according to the traffic distribution data and the natural traffic distribution probability, the method further includes:
calculating a relative entropy value according to the flow distribution data and the natural flow distribution probability;
and when the relative entropy meets the computation condition of the cheating score, executing the step of determining the cheating score corresponding to the flow to be detected according to the flow distribution data and the natural flow distribution probability.
Optionally, the step of calculating a relative entropy value according to the flow distribution data and the natural flow distribution probability includes:
acquiring the flow distribution probability in the flow distribution data;
and calculating a relative entropy value according to the flow distribution probability and the natural flow distribution probability.
Optionally, the step of calculating a relative entropy value according to the flow distribution probability and the natural flow distribution probability includes:
calculating a relative entropy value through a relative entropy calculation formula according to the flow distribution probability and the natural flow distribution probability;
the relative entropy calculation formula is as follows:
$$D_{KL}(p \,\|\, q) = \sum_{i=1}^{N} p(x_i) \log \frac{p(x_i)}{q(x_i)}$$
in the formula, D_KL(p || q) is the relative entropy value; p(x_i) is the distribution probability of the natural flow in the i-th time distribution interval; q(x_i) is the distribution probability of the flow to be detected in the i-th time distribution interval; and N is the total number of time distribution intervals.
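As a minimal sketch of the relative entropy step, the following Python function applies the standard definition of relative entropy to two per-interval probability lists, with p taken as the natural flow distribution and q as the distribution of the flow to be detected, as in the symbol definitions above. The function name, the natural logarithm, and the small `eps` floor used when the flow to be detected has an empty interval are assumptions of this sketch; the text does not state a log base or how zero probabilities are handled.

```python
import math

def relative_entropy(p, q, eps=1e-12):
    """Relative entropy D_KL(p || q) over matching time distribution intervals.

    p: natural flow distribution probabilities, one per interval.
    q: distribution probabilities of the flow to be detected.
    eps: floor for q to avoid division by zero (an assumption of this sketch).
    """
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            continue  # a term with p_i = 0 contributes 0 by convention
        total += p_i * math.log(p_i / max(q_i, eps))
    return total

# Identical distributions have zero relative entropy.
same = relative_entropy([0.5, 0.5], [0.5, 0.5])
# A skewed distribution to be detected yields a positive value.
skewed = relative_entropy([0.5, 0.5], [0.25, 0.75])
print(same, round(skewed, 4))
```

The value grows as the distribution of the flow to be detected departs from the natural one, which is why it can serve as a screening signal.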
Optionally, before the step of determining whether the flow to be detected has the flow cheating according to the cheating score, the method further includes:
calculating a relative entropy value according to the flow distribution data and the natural flow distribution probability;
the step of judging whether the flow to be detected has flow cheating according to the cheating score comprises the following step:
and judging whether the flow to be detected has flow cheating according to the cheating score and the relative entropy.
Optionally, the step of determining whether the flow to be detected has a cheating flow according to the cheating score and the relative entropy includes:
when the cheating score is larger than a preset cheating threshold value and the relative entropy value is larger than a preset relative entropy threshold value, judging that the flow to be detected has flow cheating;
and when the cheating score is not greater than the preset cheating threshold or the relative entropy value is not greater than the preset relative entropy threshold, judging that no flow cheating exists in the flow to be detected.
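The two-condition judgment can be sketched as a single predicate. This is only an illustration: the function and parameter names are hypothetical, and the threshold values in the usage lines are example values, not values prescribed for every deployment.

```python
def has_flow_cheating(score, relative_entropy, score_threshold, entropy_threshold):
    """Return True only when BOTH values exceed their preset thresholds.

    If either value is not greater than its threshold, the flow to be
    detected is judged to have no flow cheating.
    """
    return score > score_threshold and relative_entropy > entropy_threshold

# Illustrative thresholds (assumed for this example): 0.4 and 0.5.
print(has_flow_cheating(0.45, 0.6, 0.4, 0.5))  # both above threshold
print(has_flow_cheating(0.45, 0.5, 0.4, 0.5))  # relative entropy not greater
```

Requiring both conditions makes the judgment stricter than using the cheating score alone, since a flow is flagged only when the two measures agree.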
Optionally, the step of judging whether the flow to be detected has cheating according to the cheating score includes:
when the cheating score is larger than a preset cheating threshold value, judging that the flow to be detected has cheating;
and when the cheating score is smaller than or equal to the preset cheating threshold value, judging that the flow to be detected has no cheating.
In order to achieve the above object, the present invention also provides a traffic cheating recognition apparatus, including:
the data acquisition module is used for acquiring the natural flow distribution probability and the flow distribution data of the flow to be detected;
the score calculation module is used for determining a cheating score corresponding to the flow to be detected according to the flow distribution data and the natural flow distribution probability;
and the cheating identification module is used for judging whether the flow to be detected has flow cheating according to the cheating score.
Optionally, the data obtaining module is further configured to determine a user thinking duration according to user operation information included in the flow to be detected; and determining flow distribution data of the flow to be detected according to the user thinking duration.
Optionally, the data obtaining module is further configured to obtain user operation information included in the traffic to be detected to determine operation time of an adjacent user operation; and determining the operation time difference of the adjacent user operation according to the operation time, and taking the operation time difference as the corresponding user thinking duration.
Optionally, the score calculating module is further configured to calculate a relative entropy value according to the flow distribution data and the natural flow distribution probability; and when the relative entropy satisfies a cheating score calculation condition, executing the step of determining the cheating score corresponding to the to-be-detected flow according to the flow distribution data and the natural flow distribution probability.
Optionally, the score calculating module is further configured to calculate a relative entropy value according to the flow distribution data and the natural flow distribution probability;
and the cheating identification module is also used for judging whether the flow to be detected has cheating according to the cheating score and the relative entropy.
Optionally, the cheating identification module is further configured to determine that the flow to be detected has flow cheating when the cheating score is greater than a preset cheating threshold, and to determine that the flow to be detected has no flow cheating when the cheating score is smaller than or equal to the preset cheating threshold.
In addition, to achieve the above object, the present invention further provides traffic cheating identification equipment, comprising: a memory, a processor, and a traffic cheating identification program that is stored on the memory and can run on the processor, wherein the steps of the traffic cheating identification method described above are implemented when the traffic cheating identification program is executed by the processor.
In addition, in order to achieve the above object, the present invention further provides a computer-readable storage medium, in which a traffic cheating identifying program is stored, and the traffic cheating identifying program, when executed, implements the steps of the traffic cheating identifying method according to any one of the above items.
The invention obtains the natural flow distribution probability and the flow distribution data of the flow to be detected; determines a cheating score corresponding to the flow to be detected according to the flow distribution data and the natural flow distribution probability; and judges whether the flow to be detected has flow cheating according to the cheating score. The flow distribution data of the flow to be detected is compared with the natural flow distribution probability obtained from big data statistics to calculate the cheating score. Because the cheating score represents the degree of difference between the flow distribution of the flow to be detected and that of natural flow without flow cheating, it can be used to judge whether flow cheating exists, thereby protecting the legitimate rights and interests of flow buyers.
Drawings
Fig. 1 is a schematic structural diagram of an electronic device in a hardware operating environment according to an embodiment of the present invention;
Fig. 2 is a schematic flowchart of a first embodiment of the traffic cheating identification method according to the present invention;
Fig. 3 is a schematic flowchart of a second embodiment of the traffic cheating identification method according to the present invention;
Fig. 4 is a schematic flowchart of a third embodiment of the traffic cheating identification method according to the present invention;
Fig. 5 is a block diagram of a first embodiment of the traffic cheating identification device according to the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Referring to fig. 1, fig. 1 is a schematic structural diagram of a traffic cheating identification device in a hardware operating environment according to an embodiment of the present invention.
As shown in Fig. 1, the electronic device may include: a processor 1001, such as a Central Processing Unit (CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 enables connection and communication between these components. The user interface 1003 may include a display screen and an input unit such as a keyboard, and may optionally also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wireless-Fidelity (Wi-Fi) interface). The memory 1005 may be a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as a disk memory. The memory 1005 may alternatively be a storage device separate from the processor 1001.
Those skilled in the art will appreciate that the configuration shown in fig. 1 does not constitute a limitation of the electronic device and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.
As shown in fig. 1, a memory 1005, which is a storage medium, may include therein an operating system, a network communication module, a user interface module, and a traffic cheating recognition program.
In the electronic apparatus shown in fig. 1, the network interface 1004 is mainly used for data communication with a network server; the user interface 1003 is mainly used for data interaction with a user; the processor 1001 and the memory 1005 in the electronic device according to the present invention may be provided in a traffic cheating recognition device, and the electronic device calls the traffic cheating recognition program stored in the memory 1005 through the processor 1001 and executes the traffic cheating recognition method according to the embodiment of the present invention.
An embodiment of the present invention provides a traffic cheating identification method, and referring to fig. 2, fig. 2 is a schematic flow diagram of a first embodiment of a traffic cheating identification method according to the present invention.
In this embodiment, the traffic cheating identification method includes the following steps:
Step S10: acquiring the natural flow distribution probability and the flow distribution data of the flow to be detected.
It should be noted that the execution main body of this embodiment may be the traffic cheating identification device, and the traffic cheating identification device may be an electronic device such as a personal computer, a server, and the like, or may also be another device that can implement the same or similar functions.
It should be noted that the natural flow distribution probability may be the probability that user thinking durations fall into each preset time distribution interval, obtained through big data statistics on normal flow without cheating. A user thinking duration is the operation time difference between adjacent user operations, and the distribution probability is the ratio of the number of user thinking durations in a preset time distribution interval to the total number of user thinking durations. For example, if two adjacent user operations occur at 9:00:01 and 9:00:03, the operation time difference is 2 seconds, and the corresponding user thinking duration is 2 seconds. The preset time distribution intervals may be divided according to actual requirements. For example, when the intervals are divided in periods of 3 seconds, the corresponding time distribution intervals are 0-3 seconds, 3-6 seconds, 6-9 seconds, and so on, which can be written as [0,3), [3,6), and [6,9).
It should be noted that the flow to be detected may be any flow that needs to be checked for flow cheating. The flow distribution data of the flow to be detected may include a flow distribution probability and a flow distribution number. The flow distribution number is the number of user thinking durations of the flow to be detected that fall into a preset time distribution interval, and the flow distribution probability is the probability that a user thinking duration of the flow to be detected falls into that preset time distribution interval.
Step S20: determining a cheating score corresponding to the flow to be detected according to the flow distribution data and the natural flow distribution probability.
It should be noted that the cheating score may be a quantitative score indicating the likelihood that flow cheating exists in the flow to be detected.
Further, in order to calculate the cheating score, step S20 of this embodiment may be:
acquiring the flow distribution probability and the flow distribution quantity in the flow distribution data; and determining a cheating score corresponding to the flow to be detected according to the flow distribution probability, the flow distribution quantity and the natural flow distribution probability.
In actual use, the cheating score corresponding to the flow to be detected can be determined through a cheating score calculation formula according to the flow distribution probability, the flow distribution quantity and the natural flow distribution probability.
The cheating score calculation formula is as follows:
Figure BDA0002775874440000081
in the formula, score is the cheating score; P(organicBin_i) is the distribution probability of the natural flow in the i-th time distribution interval; P(channelBin_i) is the distribution probability of the flow to be detected in the i-th time distribution interval; N is the distribution number of the flow to be detected in the i-th time distribution interval; P(organicBin_j) is the distribution probability of the natural flow in the j-th time distribution interval; P(channelBin_j) is the distribution probability of the flow to be detected in the j-th time distribution interval; and M is the distribution number of the flow to be detected in the j-th time distribution interval.
For example, assume the natural flow distribution probability is as follows: the distribution probability of the first preset time distribution interval is 1/10, that of the second is 4/10, that of the third is 2/10, and that of the fourth is 3/10. If the flow to be detected has a distribution probability of 1/3 and a distribution number of 1 in the first preset time distribution interval, and a distribution probability of 2/3 and a distribution number of 2 in the fourth preset time distribution interval, then the cheating score is [(1/3 - 1/10) × 1 + (2/3 - 3/10) × 2] / (1 + 2) ≈ 0.322.
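The formula image itself is not reproduced in this text, so the following Python sketch only follows the arithmetic of the worked example: each interval's probability difference (flow to be detected minus natural flow) is weighted by that interval's distribution number, and the sum is divided by the total distribution number. The function name is hypothetical, and treating the second and third intervals as having zero observations is an assumption drawn from the example's probabilities summing to 1.

```python
def cheating_score(channel_probs, channel_counts, organic_probs):
    """Count-weighted mean of (channel probability - natural probability).

    This mirrors the worked example only; it is a sketch, not a
    transcription of the formula figure in the patent.
    """
    total = sum(channel_counts)
    if total == 0:
        raise ValueError("the flow to be detected has no observations")
    weighted = sum(
        (p_channel - p_organic) * count
        for p_channel, p_organic, count
        in zip(channel_probs, organic_probs, channel_counts)
    )
    return weighted / total

organic = [1 / 10, 4 / 10, 2 / 10, 3 / 10]   # natural flow, four intervals
channel_probs = [1 / 3, 0.0, 0.0, 2 / 3]     # flow to be detected
channel_counts = [1, 0, 0, 2]                # distribution numbers
score = cheating_score(channel_probs, channel_counts, organic)
print(round(score, 3))  # 0.322
```

Intervals with a distribution number of zero contribute nothing to the numerator, so including them does not change the result of the example.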
Step S30: judging whether the flow to be detected has flow cheating according to the cheating score.
It should be noted that when an experiment is repeated many times under unchanged conditions, the frequency of a random event approaches its probability. When the purchased flow reaches a sufficient magnitude, the distribution of user thinking durations in genuine flow should therefore be consistent with the distribution in natural flow. The higher the cheating score, the larger the difference between the distribution of the flow to be detected and that of the natural flow, so whether flow cheating exists in the flow to be detected can be judged according to the cheating score.
In actual use, a cheating threshold value can be preset, and when the cheating score is larger than the preset cheating threshold value, the fact that the flow cheating exists in the flow to be detected is judged; and when the cheating score is smaller than or equal to a preset cheating threshold value, judging that the flow to be detected has no cheating.
For example: the preset cheating threshold value is 0.4, when the calculated cheating score is larger than 0.4, the fact that the flow cheating exists in the flow to be detected is judged, and when the calculated cheating score is smaller than or equal to 0.4, the fact that the flow cheating does not exist in the flow to be detected is judged.
In this embodiment, the natural flow distribution probability and the flow distribution data of the flow to be detected are obtained; a cheating score corresponding to the flow to be detected is determined according to the flow distribution data and the natural flow distribution probability; and whether the flow to be detected has flow cheating is judged according to the cheating score. The flow distribution data of the flow to be detected is compared with the natural flow distribution probability obtained from big data statistics to calculate the cheating score. Because the cheating score represents the degree of difference between the flow distribution of the flow to be detected and that of natural flow without flow cheating, it can be used to judge whether flow cheating exists, thereby protecting the legitimate rights and interests of flow buyers.
Referring to fig. 3, fig. 3 is a flowchart illustrating a traffic cheating identification method according to a second embodiment of the present invention.
Based on the first embodiment, before the step S10, the method for identifying cheating on the flow rate in this embodiment further includes:
Step S01: determining the user thinking duration according to the user operation information contained in the flow to be detected.
It should be noted that the user operation information may include information such as a user operation type and a user operation time.
Further, in order to determine the user thinking duration, step S01 in this embodiment may be:
acquiring user operation information contained in the flow to be detected to determine the operation time of adjacent user operations; and determining the operation time difference of the adjacent user operation according to the operation time, and taking the operation time difference as the corresponding user thinking duration.
For example, suppose the user operation information shows that user A performed three operations, at 9:00:00, 9:00:02, and 9:00:03. There are then two corresponding user thinking durations: user thinking duration A, 2 seconds, and user thinking duration B, 1 second.
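The example above can be sketched in Python as follows. The function name, the "H:MM:SS" string format, and the sorting of timestamps before differencing are assumptions of this sketch; the text only states that the time difference between adjacent operations is taken as the user thinking duration.

```python
from datetime import datetime

def thinking_durations(operation_times):
    """Return the gaps, in seconds, between adjacent operations of one user."""
    # Parse clock strings such as "9:00:02" (format assumed for illustration)
    # and sort them so that adjacent entries are adjacent operations.
    parsed = sorted(datetime.strptime(t, "%H:%M:%S") for t in operation_times)
    return [
        (later - earlier).total_seconds()
        for earlier, later in zip(parsed, parsed[1:])
    ]

durations = thinking_durations(["9:00:00", "9:00:02", "9:00:03"])
print(durations)  # [2.0, 1.0]
```

Three operations give two durations, matching user thinking durations A and B in the example.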
Step S02: determining flow distribution data of the flow to be detected according to the user thinking duration.
It can be understood that the flow distribution data may include the flow distribution quantity and the flow distribution probability. Both can be calculated after grouping the user thinking durations according to the preset time distribution intervals, and the flow distribution data of the flow to be detected can then be determined by combining them.
In actual use, the user thinking durations can be grouped according to the preset time distribution intervals, and the number of user thinking durations corresponding to each preset time distribution interval is taken as the corresponding flow distribution quantity; the flow distribution probability of each preset time distribution interval is determined according to the flow distribution quantity and the total number of user thinking durations; and the flow distribution data of the flow to be detected is determined according to the flow distribution probability and the flow distribution quantity.
For example, with time distribution intervals divided in periods of 3 seconds, there are 4 preset time distribution intervals: a first time distribution interval [0,3), a second time distribution interval [3,6), a third time distribution interval [6,9), and a fourth time distribution interval [9,12). There are 6 user operations in total, at 9:00:00, 9:00:02, 9:00:03, 9:00:07, 9:00:14, and 9:00:24, so there are 5 user thinking durations, A, B, C, D, and E, of 2 seconds, 1 second, 4 seconds, 7 seconds, and 10 seconds respectively. The first time distribution interval contains user thinking durations A and B, so its distribution number is 2; the second contains C, so its distribution number is 1; the third contains D, so its distribution number is 1; and the fourth contains E, so its distribution number is 1. The distribution probabilities of the four time distribution intervals are therefore 2/5, 1/5, 1/5, and 1/5.
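A minimal Python sketch of this grouping step, using the five user thinking durations from the example (2, 1, 4, 7, and 10 seconds). The function name and its parameters are hypothetical, and ignoring durations that fall outside the preset intervals is an assumption, since the text does not say how such values are treated.

```python
from collections import Counter

def flow_distribution(durations, bin_width=3, num_bins=4):
    """Group durations into half-open intervals [k*w, (k+1)*w).

    Returns (quantities, probabilities), one entry per preset interval.
    Probabilities divide by the total number of user thinking durations.
    """
    counts = Counter()
    for d in durations:
        index = int(d // bin_width)
        if 0 <= index < num_bins:  # out-of-range durations are skipped (assumption)
            counts[index] += 1
    total = len(durations)
    quantities = [counts[k] for k in range(num_bins)]
    probabilities = [q / total if total else 0.0 for q in quantities]
    return quantities, probabilities

quantities, probabilities = flow_distribution([2, 1, 4, 7, 10])
print(quantities)     # [2, 1, 1, 1]
print(probabilities)  # [0.4, 0.2, 0.2, 0.2]
```

The half-open intervals mean that a duration of exactly 3 seconds belongs to the second interval, consistent with the [0,3), [3,6) notation.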
Further, in order to determine whether to calculate the cheating score, before step S20, the method may further include:
calculating a relative entropy value according to the flow distribution data and the natural flow distribution probability; and when the relative entropy meets the computation condition of the cheating score, executing the step of determining the cheating score corresponding to the flow to be detected according to the flow distribution data and the natural flow distribution probability.
It should be noted that not all flows need to undergo cheating score calculation. The relative entropy value corresponding to the flow to be detected may be calculated first, and the possibility of flow cheating in the flow to be detected is evaluated through the relative entropy value.
In actual use, a relative entropy threshold value can be preset, and when the calculated relative entropy value is larger than the preset relative entropy threshold value, the relative entropy value is judged to meet the cheating score calculation condition.
For example, if the preset relative entropy threshold is 0.5 and the calculated relative entropy value is 0.6, the relative entropy value is judged to meet the cheating score calculation condition, and the cheating score can then be calculated to judge whether the flow to be detected has flow cheating.
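The pre-check above amounts to a simple gate in front of the cheating score calculation. The following Python sketch shows one way to express it; the function name `score_if_needed` and the `compute_score` callback are hypothetical, and the callback stands in for whatever cheating score calculation is actually used.

```python
from typing import Callable, Optional

def score_if_needed(
    relative_entropy_value: float,
    entropy_threshold: float,
    compute_score: Callable[[], float],
) -> Optional[float]:
    """Run the (possibly expensive) score calculation only when the
    relative entropy value exceeds the preset threshold."""
    if relative_entropy_value > entropy_threshold:
        return compute_score()
    return None  # calculation condition not met: no cheating score is computed

# With the threshold 0.5 from the example, a relative entropy of 0.6 triggers scoring.
result = score_if_needed(0.6, 0.5, compute_score=lambda: 0.9)  # 0.9 is a dummy score
skipped = score_if_needed(0.4, 0.5, compute_score=lambda: 0.9)
```

Passing the score calculation as a callback keeps the gate independent of the scoring formula.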
Further, in order to facilitate calculating the relative entropy, the step of calculating the relative entropy according to the flow distribution data and the natural flow distribution probability in this embodiment may be:
acquiring the flow distribution probability in the flow distribution data; and calculating a relative entropy value according to the flow distribution probability and the natural flow distribution probability.
In actual use, a relative entropy value can be calculated through a relative entropy calculation formula according to the flow distribution probability and the natural flow distribution probability;
the relative entropy calculation formula is as follows:
$$D_{KL}(p \,\|\, q) = \sum_{i=1}^{N} p(x_i)\,\log\frac{p(x_i)}{q(x_i)}$$
in the formula, $D_{KL}(p \,\|\, q)$ is the relative entropy value, $p(x_i)$ is the distribution probability of the natural flow in the ith time distribution interval, $q(x_i)$ is the distribution probability of the flow to be detected in the ith time distribution interval, and N is the total number of the time distribution intervals.
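As an illustration of the relative entropy calculation, the following Python sketch computes the standard Kullback-Leibler divergence with the natural flow distribution as p and the distribution of the flow to be detected as q, matching the symbol definitions above. The function name, the natural logarithm, and the handling of zero probabilities are assumptions; the text does not specify a logarithm base or how empty intervals are treated.

```python
import math
from typing import Sequence

def relative_entropy(natural_probs: Sequence[float], channel_probs: Sequence[float]) -> float:
    """D_KL(p || q) with p = natural flow, q = flow to be detected."""
    if len(natural_probs) != len(channel_probs):
        raise ValueError("both distributions must cover the same time intervals")
    total = 0.0
    for p, q in zip(natural_probs, channel_probs):
        if p == 0.0:
            continue          # a term with p = 0 contributes nothing
        if q == 0.0:
            return math.inf   # assumption: p > 0 with q = 0 is treated as infinite divergence
        total += p * math.log(p / q)
    return total

print(relative_entropy([0.5, 0.5], [0.5, 0.5]))  # 0.0 for identical distributions
```

In practice a small smoothing constant is often added to empty intervals instead of returning infinity; whether that is appropriate here depends on the chosen thresholds.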
In the embodiment, the user thinking duration is determined according to the user operation information contained in the flow to be detected, and the flow distribution data of the flow to be detected is determined according to the user thinking duration. Various data required for calculating the cheating scores can be constructed in advance, the cheating scores can be calculated conveniently, and the efficiency of calculating the cheating scores is improved.
Referring to fig. 4, fig. 4 is a flowchart illustrating a traffic cheating identification method according to a third embodiment of the present invention.
Based on the first embodiment, before the step S30, the method for identifying a traffic cheating in this embodiment further includes:
step S201: calculating a relative entropy value according to the flow distribution data and the natural flow distribution probability;
further, in order to facilitate calculating the relative entropy, step S201 of this embodiment may be:
acquiring the flow distribution probability in the flow distribution data; and calculating a relative entropy value according to the flow distribution probability and the natural flow distribution probability.
In actual use, a relative entropy value can be calculated through a relative entropy calculation formula according to the flow distribution probability and the natural flow distribution probability;
the relative entropy calculation formula is as follows:
$$D_{KL}(p \,\|\, q) = \sum_{i=1}^{N} p(x_i)\,\log\frac{p(x_i)}{q(x_i)}$$
in the formula, $D_{KL}(p \,\|\, q)$ is the relative entropy value, $p(x_i)$ is the distribution probability of the natural flow in the ith time distribution interval, $q(x_i)$ is the distribution probability of the flow to be detected in the ith time distribution interval, and N is the total number of the time distribution intervals.
Accordingly, in this embodiment, the step S30 includes:
step S30': and judging whether the flow to be detected has flow cheating according to the cheating score and the relative entropy.
It can be understood that judging the flow to be detected by the cheating score alone may lead to misjudgment. The relative entropy value can therefore also be calculated, and the relative entropy value and the cheating score are used together to judge whether the flow to be detected has flow cheating, which makes the judgment more accurate.
In actual use, a cheating threshold and a relative entropy threshold can be preset. When the cheating score is greater than the preset cheating threshold and the relative entropy value is greater than the preset relative entropy threshold, the flow to be detected is judged to have flow cheating; when the cheating score is not greater than the preset cheating threshold or the relative entropy value is not greater than the preset relative entropy threshold, the flow to be detected is judged to have no flow cheating.
For example, the preset cheating threshold is 0.4 and the preset relative entropy threshold is 0.5. When the cheating score of the flow to be detected is 0.6 and the relative entropy value is 0.7, the flow to be detected is judged to have flow cheating. When the cheating score is 0.3 and the relative entropy value is 0.6, when the cheating score is 0.5 and the relative entropy value is 0.4, or when the cheating score is 0.3 and the relative entropy value is 0.3, the flow to be detected is judged to have no flow cheating.
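The two-threshold rule above reduces to a single boolean expression. A minimal Python sketch follows; the function name `has_traffic_cheating` is hypothetical and the default thresholds are simply the values used in the example.

```python
def has_traffic_cheating(
    cheating_score: float,
    relative_entropy_value: float,
    cheating_threshold: float = 0.4,
    entropy_threshold: float = 0.5,
) -> bool:
    """Cheating is reported only when BOTH values exceed their thresholds."""
    return (cheating_score > cheating_threshold
            and relative_entropy_value > entropy_threshold)

# The four cases from the example: (cheating score, relative entropy value)
for score, entropy in [(0.6, 0.7), (0.3, 0.6), (0.5, 0.4), (0.3, 0.3)]:
    print(score, entropy, has_traffic_cheating(score, entropy))
```

Only the first case, where both values exceed their thresholds, is reported as cheating.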
In this embodiment, a relative entropy value is calculated according to the flow distribution data and the natural flow distribution probability, and whether flow cheating exists in the flow to be detected is then judged according to the cheating score and the relative entropy value. Because the cheating score and the relative entropy value are used together, the flow cheating judgment is more accurate and is harder for a flow cheating party to circumvent, which improves the accuracy and reliability of the traffic cheating identification method.
In addition, an embodiment of the present invention further provides a storage medium, where a traffic cheating recognition program is stored on the storage medium, and the traffic cheating recognition program, when executed by a processor, implements the steps of the traffic cheating recognition method described above.
Referring to fig. 5, fig. 5 is a block diagram illustrating a first embodiment of a traffic cheating-identifying device according to the present invention.
As shown in fig. 5, the traffic cheating recognition apparatus according to the embodiment of the present invention includes:
a data obtaining module 501, configured to obtain a natural flow distribution probability and flow distribution data of a flow to be detected;
a score calculating module 502, configured to determine a cheating score corresponding to the traffic to be detected according to the traffic distribution data and the natural traffic distribution probability;
and the cheating identification module 503 is configured to determine whether the flow to be detected has cheating according to the cheating score.
In this embodiment, the natural flow distribution probability and the flow distribution data of the flow to be detected are obtained; a cheating score corresponding to the flow to be detected is determined according to the flow distribution data and the natural flow distribution probability; and whether the flow to be detected has flow cheating is judged according to the cheating score. The flow distribution data of the flow to be detected is compared with the natural flow distribution probability obtained from big-data statistics to calculate the corresponding cheating score. Because the cheating score represents the degree of difference between the flow distribution of the flow to be detected and that of natural flow without flow cheating, whether flow cheating exists can be judged according to the cheating score, and the legitimate rights and interests of flow buyers can be protected.
Further, the data obtaining module 501 is further configured to determine a user thinking duration according to user operation information included in the flow to be detected; and determining flow distribution data of the flow to be detected according to the user thinking duration.
Further, the data obtaining module 501 is further configured to obtain user operation information included in the traffic to be detected to determine operation time of an adjacent user operation; and determining the operation time difference of the adjacent user operation according to the operation time, and taking the operation time difference as the corresponding user thinking duration.
Further, the data obtaining module 501 is further configured to group the user thinking durations according to preset time distribution intervals, and use the number of the user thinking durations corresponding to each preset time distribution interval as the corresponding flow distribution number; determining the flow distribution probability of each preset time distribution interval according to the flow distribution quantity and the total number of the user thinking time; and determining flow distribution data of the flow to be detected according to the flow distribution probability and the flow distribution quantity.
Further, the score calculating module 502 is further configured to obtain a traffic distribution probability and a traffic distribution quantity in the traffic distribution data; and determining a cheating score corresponding to the flow to be detected according to the flow distribution probability, the flow distribution quantity and the natural flow distribution probability.
Further, the score calculating module 502 is further configured to determine a cheating score corresponding to the flow to be detected through a cheating score calculating formula according to the flow distribution probability, the flow distribution quantity, and the natural flow distribution probability;
the cheating score calculation formula is as follows:
Figure BDA0002775874440000141
in the formula, score is the cheating score, $P(\mathrm{organicBin}_i)$ is the distribution probability of the natural flow in the ith time distribution interval, $P(\mathrm{channelBin}_i)$ is the distribution probability of the flow to be detected in the ith time distribution interval, N is the distribution number of the flow to be detected in the ith time distribution interval, $P(\mathrm{organicBin}_j)$ is the distribution probability of the natural flow in the jth time distribution interval, $P(\mathrm{channelBin}_j)$ is the distribution probability of the flow to be detected in the jth time distribution interval, and M is the distribution number of the flow to be detected in the jth time distribution interval.
Further, the score calculating module 502 is further configured to calculate a relative entropy value according to the flow distribution data and the natural flow distribution probability; and when the relative entropy meets the computation condition of the cheating score, executing the step of determining the cheating score corresponding to the flow to be detected according to the flow distribution data and the natural flow distribution probability.
Further, the score calculating module 502 is further configured to obtain a traffic distribution probability in the traffic distribution data; and calculating a relative entropy value according to the flow distribution probability and the natural flow distribution probability.
Further, the score calculating module 502 is further configured to calculate a relative entropy value according to the flow distribution probability and the natural flow distribution probability through a relative entropy calculation formula;
the relative entropy calculation formula is as follows:
$$D_{KL}(p \,\|\, q) = \sum_{i=1}^{N} p(x_i)\,\log\frac{p(x_i)}{q(x_i)}$$
in the formula, $D_{KL}(p \,\|\, q)$ is the relative entropy value, $p(x_i)$ is the distribution probability of the natural flow in the ith time distribution interval, $q(x_i)$ is the distribution probability of the flow to be detected in the ith time distribution interval, and N is the total number of the time distribution intervals.
Further, the score calculating module 502 is further configured to calculate a relative entropy value according to the flow distribution data and the natural flow distribution probability;
the cheating identifying module 503 is further configured to determine whether the flow to be detected has cheating according to the cheating score and the relative entropy.
Further, the cheating identifying module 503 is further configured to determine that the flow to be detected has cheating when the cheating score is greater than a preset cheating threshold and the relative entropy is greater than a preset relative entropy threshold; and when the cheating score is not greater than the preset cheating threshold or the relative entropy value is not greater than the preset relative entropy threshold, judging that no flow cheating exists in the flow to be detected.
Further, the cheating identifying module 503 is further configured to determine that the flow to be detected has flow cheating when the cheating score is greater than a preset cheating threshold; and to determine that the flow to be detected has no flow cheating when the cheating score is less than or equal to the preset cheating threshold.
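The division of work among modules 501, 502, and 503 can be sketched as three small Python classes. This is only an illustrative arrangement under stated assumptions: all class and function names are hypothetical, the cheating score formula is not reproduced here, so the scoring function is injected by the caller, and `stand_in_score` is a placeholder (total absolute difference) that is NOT the formula of this document.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

@dataclass
class FlowDistribution:
    counts: List[int]    # flow distribution quantity per time interval
    probs: List[float]   # flow distribution probability per time interval

# Hypothetical signature: (channel probs, channel counts, natural probs) -> score
ScoreFn = Callable[[Sequence[float], Sequence[int], Sequence[float]], float]

class DataObtainingModule:
    """Module 501: holds the natural distribution and bins thinking durations."""
    def __init__(self, natural_probs: Sequence[float], bin_width: float):
        self.natural_probs = list(natural_probs)
        self.bin_width = bin_width

    def obtain(self, durations: Sequence[float]) -> FlowDistribution:
        counts = [0] * len(self.natural_probs)
        for d in durations:
            k = int(d // self.bin_width)
            if 0 <= k < len(counts):  # assumption: out-of-range durations are skipped
                counts[k] += 1
        total = len(durations)
        probs = [c / total for c in counts] if total else [0.0] * len(counts)
        return FlowDistribution(counts, probs)

class ScoreCalculatingModule:
    """Module 502: delegates to an injected scoring function."""
    def __init__(self, score_fn: ScoreFn):
        self.score_fn = score_fn

    def score(self, dist: FlowDistribution, natural_probs: Sequence[float]) -> float:
        return self.score_fn(dist.probs, dist.counts, natural_probs)

class CheatingIdentifyingModule:
    """Module 503: single-threshold decision on the cheating score."""
    def __init__(self, threshold: float):
        self.threshold = threshold

    def identify(self, score: float) -> bool:
        return score > self.threshold

def stand_in_score(probs, counts, natural_probs):
    # Placeholder only, not the document's formula: total absolute difference.
    return sum(abs(p - q) for p, q in zip(probs, natural_probs))
```

Injecting the scoring function keeps the sketch usable with whichever cheating score calculation is actually adopted.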
It should be understood that the above is only an example, and the technical solution of the present invention is not limited in any way, and in a specific application, a person skilled in the art may set the technical solution as needed, and the present invention is not limited thereto.
It should be noted that the above-described work flows are only exemplary, and do not limit the scope of the present invention, and in practical applications, a person skilled in the art may select some or all of them to achieve the purpose of the solution of the embodiment according to actual needs, and the present invention is not limited herein.
In addition, the technical details that are not described in detail in this embodiment may refer to the traffic cheating identification method provided in any embodiment of the present invention, and are not described herein again.
Further, it is to be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the description of the foregoing embodiments, it is clear to those skilled in the art that the method of the foregoing embodiments may be implemented by software plus a necessary general hardware platform, and certainly may also be implemented by hardware, but in many cases, the former is a better implementation. Based on such understanding, the technical solution of the present invention or portions thereof that contribute to the prior art may be embodied in the form of a software product, where the computer software product is stored in a storage medium (e.g. Read Only Memory (ROM)/RAM, magnetic disk, optical disk), and includes several instructions for enabling a terminal device (e.g. a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention, and all equivalent structures or equivalent processes performed by the present invention or directly or indirectly applied to other related technical fields are also included in the scope of the present invention.
The invention discloses A1, a traffic cheating identification method, comprising the following steps:
acquiring natural flow distribution probability and flow distribution data of the flow to be detected;
determining a cheating score corresponding to the flow to be detected according to the flow distribution data and the natural flow distribution probability;
and judging whether the flow to be detected has flow cheating according to the cheating score.
A2, the traffic cheating identification method according to A1, wherein before the step of acquiring the natural flow distribution probability and the flow distribution data of the flow to be detected, the method further comprises:
determining the thinking time of a user according to user operation information contained in the flow to be detected;
and determining flow distribution data of the flow to be detected according to the user thinking duration.
A3, the traffic cheating identification method according to A2, wherein the step of determining the user thinking duration according to the user operation information contained in the flow to be detected comprises:
acquiring user operation information contained in the flow to be detected to determine the operation time of adjacent user operations;
and determining the operation time difference of the adjacent user operation according to the operation time, and taking the operation time difference as the corresponding user thinking duration.
A4, the traffic cheating identification method according to A2, wherein the step of determining the flow distribution data of the flow to be detected according to the user thinking duration comprises:
grouping the user thinking durations according to preset time distribution intervals, and taking the number of the user thinking durations corresponding to each preset time distribution interval as the corresponding flow distribution number;
determining the flow distribution probability of each preset time distribution interval according to the flow distribution quantity and the total number of the user thinking time;
and determining flow distribution data of the flow to be detected according to the flow distribution probability and the flow distribution quantity.
A5, the traffic cheating identification method according to A1, wherein the step of determining the cheating score corresponding to the flow to be detected according to the flow distribution data and the natural flow distribution probability comprises:
acquiring the flow distribution probability and the flow distribution quantity in the flow distribution data;
and determining a cheating score corresponding to the flow to be detected according to the flow distribution probability, the flow distribution quantity and the natural flow distribution probability.
A6, the traffic cheating identification method according to A5, wherein the step of determining the cheating score corresponding to the flow to be detected according to the flow distribution probability, the flow distribution quantity and the natural flow distribution probability comprises:
determining a cheating score corresponding to the flow to be detected through a cheating score calculation formula according to the flow distribution probability, the flow distribution quantity and the natural flow distribution probability;
the cheating score calculation formula is as follows:
Figure BDA0002775874440000171
in the formula, score is the cheating score, $P(\mathrm{organicBin}_i)$ is the distribution probability of the natural flow in the ith time distribution interval, $P(\mathrm{channelBin}_i)$ is the distribution probability of the flow to be detected in the ith time distribution interval, N is the distribution number of the flow to be detected in the ith time distribution interval, $P(\mathrm{organicBin}_j)$ is the distribution probability of the natural flow in the jth time distribution interval, $P(\mathrm{channelBin}_j)$ is the distribution probability of the flow to be detected in the jth time distribution interval, and M is the distribution number of the flow to be detected in the jth time distribution interval.
A7, the traffic cheating identification method according to A1, wherein before the step of determining the cheating score corresponding to the flow to be detected according to the flow distribution data and the natural flow distribution probability, the method further comprises:
calculating a relative entropy value according to the flow distribution data and the natural flow distribution probability;
and when the relative entropy meets the computation condition of the cheating score, executing the step of determining the cheating score corresponding to the flow to be detected according to the flow distribution data and the natural flow distribution probability.
A8, the traffic cheating identification method according to A7, wherein the step of calculating a relative entropy value according to the flow distribution data and the natural flow distribution probability comprises:
acquiring the flow distribution probability in the flow distribution data;
and calculating a relative entropy value according to the flow distribution probability and the natural flow distribution probability.
A9, the method for identifying cheating on traffic as in A8, wherein the step of calculating a relative entropy value according to the probability of traffic distribution and the probability of natural traffic distribution comprises:
calculating a relative entropy value through a relative entropy calculation formula according to the flow distribution probability and the natural flow distribution probability;
the relative entropy calculation formula is as follows:
$$D_{KL}(p \,\|\, q) = \sum_{i=1}^{N} p(x_i)\,\log\frac{p(x_i)}{q(x_i)}$$
in the formula, $D_{KL}(p \,\|\, q)$ is the relative entropy value, $p(x_i)$ is the distribution probability of the natural flow in the ith time distribution interval, $q(x_i)$ is the distribution probability of the flow to be detected in the ith time distribution interval, and N is the total number of the time distribution intervals.
A10, the method for identifying cheating on flow rate as in A1, wherein before the step of judging whether the flow rate to be detected has cheating on flow rate according to the cheating score, the method further comprises:
calculating a relative entropy value according to the flow distribution data and the natural flow distribution probability;
the step of judging whether the flow to be detected has flow cheating according to the cheating score comprises the following steps:
and judging whether the flow to be detected has flow cheating according to the cheating score and the relative entropy.
A11, the traffic cheating identification method according to A10, wherein the step of judging whether the traffic to be detected has traffic cheating according to the cheating score and the relative entropy includes:
when the cheating score is greater than a preset cheating threshold and the relative entropy value is greater than a preset relative entropy threshold, judging that the flow to be detected has flow cheating;
and when the cheating score is not greater than the preset cheating threshold or the relative entropy value is not greater than the preset relative entropy threshold, judging that no flow cheating exists in the flow to be detected.
A12, the traffic cheating identification method according to any one of A1-A9, wherein the step of judging whether the flow to be detected has flow cheating according to the cheating score comprises:
when the cheating score is larger than a preset cheating threshold value, judging that the flow to be detected has cheating;
and when the cheating score is smaller than or equal to the preset cheating threshold value, judging that the flow to be detected has no cheating.
The invention discloses B13, a flow cheating recognition device, comprising:
the data acquisition module is used for acquiring the natural flow distribution probability and the flow distribution data of the flow to be detected;
the score calculation module is used for determining a cheating score corresponding to the flow to be detected according to the flow distribution data and the natural flow distribution probability;
and the cheating identification module is used for judging whether the flow to be detected has cheating according to the cheating value.
B14, the flow cheating recognition device according to B13, the data acquisition module is further configured to determine a user thinking duration according to user operation information included in the flow to be detected; and determining flow distribution data of the flow to be detected according to the user thinking duration.
B15, the traffic cheating recognition apparatus according to B13, the data obtaining module is further configured to obtain user operation information included in the traffic to be detected to determine operation time of an adjacent user operation; and determining the operation time difference of the adjacent user operation according to the operation time, and taking the operation time difference as the corresponding user thinking duration.
B16, the flow cheating identifying apparatus according to B13, wherein the score calculating module is further configured to calculate a relative entropy value according to the flow distribution data and the natural flow distribution probability; and when the relative entropy meets the computation condition of the cheating score, executing the step of determining the cheating score corresponding to the flow to be detected according to the flow distribution data and the natural flow distribution probability.
B17, the flow cheating identifying apparatus according to B13, wherein the score calculating module is further configured to calculate a relative entropy value according to the flow distribution data and the natural flow distribution probability;
and the cheating identification module is also used for judging whether the flow to be detected has cheating according to the cheating score and the relative entropy.
B18, the traffic cheating recognition apparatus according to B13, wherein the cheating recognition module is further configured to determine that the traffic to be detected has traffic cheating when the cheating score is greater than a preset cheating threshold; and when the cheating score is smaller than or equal to the preset cheating threshold value, judging that the flow to be detected has no cheating.
The invention discloses C19, a traffic cheating recognition device, comprising: a memory, a processor, and a traffic cheating identification program that is stored on the memory and can run on the processor, wherein the steps of the traffic cheating identification method described above are implemented when the traffic cheating identification program is executed by the processor.
The invention discloses D20, a computer-readable storage medium, wherein a traffic cheating identification program is stored on the computer-readable storage medium, and the traffic cheating identification program implements the steps of the traffic cheating identification method described above when executed.

Claims (10)

1. A traffic cheating identification method is characterized by comprising the following steps:
acquiring natural flow distribution probability and flow distribution data of the flow to be detected;
determining a cheating score corresponding to the flow to be detected according to the flow distribution data and the natural flow distribution probability;
and judging whether the flow to be detected has flow cheating according to the cheating score.
2. The method for identifying cheating on flow rate according to claim 1, wherein the step of obtaining the probability of natural flow rate distribution and the flow rate distribution data of the flow rate to be detected is preceded by the step of:
determining the thinking time of a user according to user operation information contained in the flow to be detected;
and determining flow distribution data of the flow to be detected according to the user thinking duration.
3. The traffic cheating identification method according to claim 2, wherein the step of determining a user's thinking duration according to user operation information contained in the traffic to be detected comprises:
acquiring user operation information contained in the flow to be detected to determine the operation time of adjacent user operations;
and determining the operation time difference of the adjacent user operation according to the operation time, and taking the operation time difference as the corresponding user thinking duration.
4. The method for identifying cheating on flow rate according to claim 2, wherein said step of determining flow rate distribution data of the flow rate to be detected according to said user's thinking duration comprises:
grouping the user thinking durations according to preset time distribution intervals, and taking the number of the user thinking durations corresponding to each preset time distribution interval as the corresponding flow distribution number;
determining the flow distribution probability of each preset time distribution interval according to the flow distribution quantity and the total number of the user thinking time;
and determining flow distribution data of the flow to be detected according to the flow distribution probability and the flow distribution quantity.
5. The method for identifying cheating on flow according to claim 1, wherein the step of determining the cheating score corresponding to the flow to be detected according to the flow distribution data and the natural flow distribution probability comprises:
acquiring the flow distribution probability and the flow distribution quantity in the flow distribution data;
and determining cheating scores corresponding to the to-be-detected flow according to the flow distribution probability, the flow distribution quantity and the natural flow distribution probability.
6. The method for identifying cheating on traffic flow according to claim 5, wherein the step of determining the cheating score corresponding to the traffic flow to be detected according to the traffic flow distribution probability, the traffic flow distribution quantity and the natural traffic flow distribution probability comprises the following steps:
determining a cheating score corresponding to the flow to be detected through a cheating score calculation formula according to the flow distribution probability, the flow distribution quantity and the natural flow distribution probability;
the cheating score calculation formula is as follows:
Figure FDA0002775874430000021
in the formula, score is the cheating score, $P(\mathrm{organicBin}_i)$ is the distribution probability of the natural flow in the ith time distribution interval, $P(\mathrm{channelBin}_i)$ is the distribution probability of the flow to be detected in the ith time distribution interval, N is the distribution number of the flow to be detected in the ith time distribution interval, $P(\mathrm{organicBin}_j)$ is the distribution probability of the natural flow in the jth time distribution interval, $P(\mathrm{channelBin}_j)$ is the distribution probability of the flow to be detected in the jth time distribution interval, and M is the distribution number of the flow to be detected in the jth time distribution interval.
7. The flow cheating identification method according to claim 1, wherein before the step of determining the cheating score corresponding to the flow to be detected according to the flow distribution data and the natural flow distribution probability, the method further comprises:
calculating a relative entropy value according to the flow distribution data and the natural flow distribution probability;
and when the relative entropy value satisfies a cheating score calculation condition, executing the step of determining the cheating score corresponding to the flow to be detected according to the flow distribution data and the natural flow distribution probability.
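The relative entropy check can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed method: the claim does not say in which direction the relative entropy (Kullback-Leibler divergence) is taken or what the cheating score calculation condition is, so the direction D(channel || organic), the `eps` floor, and the `threshold` value are all hypothetical.

```python
import math


def relative_entropy(channel_probs, organic_probs, eps=1e-12):
    """KL divergence D(channel || organic) over the time distribution intervals.

    eps floors the natural-flow probability to avoid division by zero
    (an assumption made for this sketch).
    """
    if len(channel_probs) != len(organic_probs):
        raise ValueError("distributions must cover the same intervals")
    total = 0.0
    for p, q in zip(channel_probs, organic_probs):
        if p > 0:  # intervals with zero channel probability contribute nothing
            total += p * math.log(p / max(q, eps))
    return total


def should_score(channel_probs, organic_probs, threshold=0.1):
    """Assumed condition: compute a cheating score only if the divergence is large."""
    return relative_entropy(channel_probs, organic_probs) > threshold
```

A flow whose distribution matches the natural flow yields a relative entropy of zero and is skipped; a flow concentrated in a single interval diverges strongly and proceeds to scoring.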
8. A traffic cheating recognition device, comprising:
the data acquisition module is used for acquiring the natural flow distribution probability and the flow distribution data of the flow to be detected;
the score calculation module is used for determining a cheating score corresponding to the flow to be detected according to the flow distribution data and the natural flow distribution probability;
and the cheating identification module is used for judging whether the flow to be detected involves cheating according to the cheating score.
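The division into modules might be sketched as below. This is only an assumed structure: the class name `FlowCheatingDevice`, the pluggable `score_fn`, and the rule "score above threshold means cheating" are hypothetical, and `total_variation_score` is a plainly labeled stand-in, not the claimed cheating score formula. The inputs are assumed to have been gathered already by the data acquisition module.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical signature: (channel_probs, channel_counts, organic_probs) -> score
ScoreFn = Callable[[Sequence[float], Sequence[int], Sequence[float]], float]


def total_variation_score(channel_probs, channel_counts, organic_probs):
    """Stand-in score (total variation distance); NOT the claimed formula."""
    return 0.5 * sum(abs(p - q) for p, q in zip(channel_probs, organic_probs))


@dataclass
class FlowCheatingDevice:
    score_fn: ScoreFn  # plays the role of the score calculation module
    threshold: float   # assumed decision rule: score > threshold means cheating

    def identify(self, channel_probs, channel_counts, organic_probs) -> bool:
        """Cheating identification module: judge the flow from its score."""
        score = self.score_fn(channel_probs, channel_counts, organic_probs)
        return score > self.threshold


device = FlowCheatingDevice(score_fn=total_variation_score, threshold=0.25)
```

Passing the score function in as a parameter keeps the sketch independent of the actual formula, which the text gives only as an image.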
9. Traffic cheating recognition equipment, comprising: a memory, a processor and a traffic cheating recognition program stored on said memory and executable on said processor, said traffic cheating recognition program, when executed by said processor, implementing the steps of the traffic cheating recognition method according to any one of claims 1-7.
10. A computer-readable storage medium, characterized in that a traffic cheating identification program is stored on said computer-readable storage medium, which when executed implements the steps of the traffic cheating identification method according to any one of claims 1-7.
CN202011275069.7A 2020-11-12 2020-11-12 Flow cheating identification method, device, equipment and storage medium Pending CN114491407A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011275069.7A CN114491407A (en) 2020-11-12 2020-11-12 Flow cheating identification method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011275069.7A CN114491407A (en) 2020-11-12 2020-11-12 Flow cheating identification method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114491407A true CN114491407A (en) 2022-05-13

Family

ID=81490861

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011275069.7A Pending CN114491407A (en) 2020-11-12 2020-11-12 Flow cheating identification method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114491407A (en)

Similar Documents

Publication Publication Date Title
CN107437223A (en) Credit information checking method, device and equipment
CN112507936B (en) Image information auditing method and device, electronic equipment and readable storage medium
CN110992167A (en) Bank client business intention identification method and device
CN107958382A (en) Abnormal behaviour recognition methods, device, electronic equipment and storage medium
CN106127505A (en) The single recognition methods of a kind of brush and device
CN108399565A (en) Financial product recommendation apparatus, method and computer readable storage medium
CN110659807B (en) Risk user identification method and device based on link
CN108876545A (en) Order recognition methods, device and readable storage medium storing program for executing
WO2019179030A1 (en) Product purchasing prediction method, server and storage medium
CN107220867A (en) object control method and device
CN113205403A (en) Method and device for calculating enterprise credit level, storage medium and terminal
CN111967948A (en) Bank product recommendation method and device, server and storage medium
CN112529575A (en) Risk early warning method, equipment, storage medium and device
CN113673870A (en) Enterprise data analysis method and related components
CN109670934A (en) Personal identification method, equipment, storage medium and device based on user behavior
CN109462582B (en) Text recognition method, text recognition device, server and storage medium
CN115203496A (en) Project intelligent prediction and evaluation method and system based on big data and readable storage medium
CN107622397A (en) Trade mark monitoring method and system
CN109450963B (en) Message pushing method and terminal equipment
CN112347457A (en) Abnormal account detection method and device, computer equipment and storage medium
CN113469519A (en) Attribution analysis method and device of business event, electronic equipment and storage medium
CN111222566B (en) User attribute identification method, device and storage medium
Sangaralingam et al. Takeoff and sustained success of apps in hypercompetitive mobile platform ecosystems: an empirical analysis
CN114491407A (en) Flow cheating identification method, device, equipment and storage medium
CN115982653A (en) Abnormal account identification method and device, electronic equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination