WO2015062485A1 - Method and apparatus for detecting webpage traffic cheating - Google Patents

Method and apparatus for detecting webpage traffic cheating

Info

Publication number
WO2015062485A1
Authority
WO
WIPO (PCT)
Prior art keywords
access
amount
target webpage
visit
set threshold
Prior art date
Application number
PCT/CN2014/089724
Inventor
祁国晟
吴充
马燕龙
杨韬
戴飞
余德乐
Original Assignee
北京国双科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京国双科技有限公司
Publication of WO2015062485A1
Priority to US15/139,096 (US20160239864A1)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q30/0248 Avoiding fraud
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/02 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
    • H04L67/025 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP] for remote control or remote monitoring of applications
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/018 Certifying business or products
    • G06Q30/0185 Product, service or business identity fraud
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/16 Threshold monitoring
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L61/00 Network arrangements, protocols or services for addressing or naming
    • H04L61/50 Address allocation
    • H04L61/5007 Internet protocol [IP] addresses
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/16 Implementing security features at a particular protocol layer
    • H04L63/168 Implementing security features at a particular protocol layer above the transport layer
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/02 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/535 Tracking the activity of the user

Definitions

  • The present invention relates to the field of the Internet, and in particular to a method and apparatus for detecting cheating in webpage traffic.
  • Internet advertising cheating: cheating by a media outlet (a website such as Sina that, as the website owner, carries out the advertising campaign) in order to inflate advertising traffic.
  • Advertiser: the publisher of an advertising campaign, a merchant that sells or promotes its products and services online, and the provider of affiliate marketing ads. Any merchant that promotes or sells its products or services can act as an advertiser. The advertiser publishes the advertising campaign and pays the website owner according to the total marketing results the website owner delivers and the unit price per result specified in the campaign.
  • Embedding an inline frame (iframe) in a web page is the most common technique for cheating in Internet advertising.
  • With this method, the website owner generally embeds an iframe of size 0x0 or 1x1, that is, an iframe invisible to the user, in its own webpage. Other pages are opened through the iframe, so the user loads a web page that they did not intend to visit and cannot see, which inflates the traffic.
  • Traditional anti-cheating methods have difficulty identifying cheating that relies on “human wave tactics” and embedded iframes, which makes it difficult to suppress fraudulent clicks effectively.
  • Internet advertising cheating is carried out by the website owner in order to inflate traffic. Detection of cheating on advertisement webpages by an authoritative third-party testing agency can therefore effectively protect the interests of the advertiser.
  • In the prior art, few solutions can identify cheating in webpage traffic.
  • The main object of the present invention is to provide a method and a device for detecting webpage traffic cheating, so as to solve the problem of inaccurate identification of webpage traffic cheating in the prior art.
  • A method for detecting webpage traffic cheating includes: obtaining the visit amount of the target webpage; determining whether the visit amount satisfies a predetermined condition; if the visit amount satisfies the predetermined condition, obtaining access source information of the target webpage; and determining, according to the access source information, whether the visit amount of the target webpage is cheated.
  • Obtaining the visit amount of the target webpage includes obtaining a historical visit amount and a current visit amount of the target webpage, and determining whether the visit amount satisfies the predetermined condition includes: obtaining a ratio of the historical visit amount and the current visit amount; determining whether the ratio exceeds a first set threshold; if the ratio exceeds the first set threshold, determining that the visit amount satisfies the predetermined condition; and if the ratio does not exceed the first set threshold, determining that the visit amount does not satisfy the predetermined condition.
  • Alternatively, obtaining the visit amount of the target webpage includes obtaining the historical visit amount and the current visit amount of the target webpage, and determining whether the visit amount satisfies the predetermined condition includes: obtaining a difference between the historical visit amount and the current visit amount; determining whether the difference exceeds a second set threshold; if the difference exceeds the second set threshold, determining that the visit amount satisfies the predetermined condition; and if the difference does not exceed the second set threshold, determining that the visit amount does not satisfy the predetermined condition.
  • Obtaining the access source information of the target webpage includes: obtaining the source code of the target webpage; adding detection code to the source code to obtain the access IP addresses of the target webpage; and using the access IP addresses as the access source information.
  • Determining whether the visit amount of the target webpage is cheated according to the access source information includes: obtaining a first visit amount of a first access IP address among the access IP addresses, where the first access IP address is the access IP address that has visited the target webpage the most times; calculating a ratio of the first visit amount to the visit amount; determining whether the ratio of the first visit amount to the visit amount exceeds a third set threshold; if the ratio exceeds the third set threshold, determining that the visit amount of the target webpage is cheated; and if the ratio does not exceed the third set threshold, determining that the visit amount of the target webpage is not cheated.
  • Determining that the visit amount of the target webpage is cheated includes: obtaining the visit dwell time of the first access IP address; determining whether the dwell time exceeds a fourth set threshold; if the dwell time does not exceed the fourth set threshold, determining that the visit amount of the target webpage is cheated; and if the dwell time exceeds the fourth set threshold, determining that the visit amount of the target webpage is not cheated.
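  • The access-source check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `is_traffic_cheated`, the `(ip, dwell_seconds)` record layout, the default values 0.5 and 5 seconds, and the use of the most frequent IP's average dwell time are all hypothetical. The source only refers to a third and a fourth set threshold.

```python
from collections import Counter


def is_traffic_cheated(visits, share_threshold=0.5, dwell_threshold_s=5.0):
    """Return True if the visit records look cheated.

    visits: list of (ip, dwell_seconds) tuples for the target webpage.
    share_threshold: stands in for the "third set threshold" (assumed value).
    dwell_threshold_s: stands in for the "fourth set threshold" (assumed value).
    """
    if not visits:
        return False

    # Find the access IP address that visited the page the most times.
    counts = Counter(ip for ip, _ in visits)
    top_ip, top_count = counts.most_common(1)[0]

    # Ratio of that IP's visit amount to the total visit amount.
    share = top_count / len(visits)
    if share <= share_threshold:
        return False

    # Assumption: summarize the top IP's dwell time by its average.
    dwell_times = [dwell for ip, dwell in visits if ip == top_ip]
    average_dwell = sum(dwell_times) / len(dwell_times)

    # A short stay by the dominant IP is treated as cheating.
    return average_dwell <= dwell_threshold_s
```

  • For instance, eight one-second visits from a single IP out of ten visits in total would be flagged, while the same concentration with two-minute stays would not.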
  • Before obtaining the visit amount of the target webpage, the method for detecting webpage traffic cheating further includes: obtaining the source code of the target webpage; detecting whether the source code contains an inline frame (iframe) of size 0*0 or 1*1; and, if no such iframe exists in the source code, obtaining the visit amount of the target webpage.
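  • The iframe check can be illustrated with Python's standard `html.parser` module. This is a hedged sketch: the names `HiddenIframeFinder`, `_to_pixels`, and `has_hidden_iframe` are hypothetical, and the sketch inspects only the `width` and `height` HTML attributes, so an iframe hidden through CSS would not be found.

```python
from html.parser import HTMLParser


def _to_pixels(value):
    """Parse an attribute such as "0" or "1px" into an int, else None."""
    if value is None:
        return None
    text = value.strip().lower()
    if text.endswith("px"):
        text = text[:-2]
    return int(text) if text.isdigit() else None


class HiddenIframeFinder(HTMLParser):
    """Collects the src of every iframe declared as 0x0 or 1x1."""

    def __init__(self):
        super().__init__()
        self.hidden_iframes = []

    def handle_starttag(self, tag, attrs):
        if tag != "iframe":
            return
        attributes = dict(attrs)
        size = (_to_pixels(attributes.get("width")),
                _to_pixels(attributes.get("height")))
        if size in ((0, 0), (1, 1)):
            self.hidden_iframes.append(attributes.get("src"))


def has_hidden_iframe(source_code):
    """Return True if the page source declares a 0x0 or 1x1 iframe."""
    finder = HiddenIframeFinder()
    finder.feed(source_code)
    return bool(finder.hidden_iframes)
```

  • In the flow described above, the visit amount would be obtained only when `has_hidden_iframe` returns False for the target webpage's source code.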
  • A device for detecting webpage traffic cheating includes: a first obtaining unit, configured to obtain the visit amount of the target webpage; a first judging unit, configured to determine whether the visit amount satisfies a predetermined condition; a second obtaining unit, configured to obtain the access source information of the target webpage when the visit amount satisfies the predetermined condition; and a second judging unit, configured to determine, according to the access source information, whether the visit amount of the target webpage is cheated.
  • The first obtaining unit is further configured to obtain a historical visit amount and a current visit amount of the target webpage, and the first judging unit includes: a first obtaining module, configured to obtain a ratio of the historical visit amount and the current visit amount; a first judging module, configured to determine whether the ratio exceeds a first set threshold; and a first determining module, configured to determine that the visit amount satisfies the predetermined condition when the ratio exceeds the first set threshold, and to determine that the visit amount does not satisfy the predetermined condition when the ratio does not exceed the first set threshold.
  • Alternatively, the first obtaining unit is further configured to obtain a historical visit amount and a current visit amount of the target webpage, and the first judging unit includes: a second obtaining module, configured to obtain a difference between the historical visit amount and the current visit amount; a second judging module, configured to determine whether the difference exceeds a second set threshold; and a second determining module, configured to determine that the visit amount satisfies the predetermined condition when the difference exceeds the second set threshold, and to determine that the visit amount does not satisfy the predetermined condition when the difference does not exceed the second set threshold.
  • The second obtaining unit includes: a third obtaining module, configured to obtain the source code of the target webpage; a fourth obtaining module, configured to add detection code to the source code to obtain the access IP addresses of the target webpage; and a generating module, configured to use the access IP addresses as the access source information.
  • The second judging unit includes: a fifth obtaining module, configured to obtain a first visit amount of a first access IP address among the access IP addresses, where the first access IP address is the access IP address that has visited the target webpage the most times; a calculation module, configured to calculate a ratio of the first visit amount to the visit amount; a third judging module, configured to determine whether the ratio of the first visit amount to the visit amount exceeds a third set threshold; and a third determining module, configured to determine that the visit amount of the target webpage is cheated when the ratio exceeds the third set threshold, and to determine that the visit amount of the target webpage is not cheated when the ratio does not exceed the third set threshold.
  • The third determining module includes: an obtaining submodule, configured to obtain the visit dwell time of the first access IP address; a judging submodule, configured to determine whether the dwell time exceeds a fourth set threshold; and a determining submodule, configured to determine that the visit amount of the target webpage is cheated when the dwell time does not exceed the fourth set threshold, and to determine that the visit amount of the target webpage is not cheated when the dwell time exceeds the fourth set threshold.
  • The device for detecting webpage traffic cheating further includes: a third obtaining unit, configured to obtain the source code of the target webpage before the visit amount of the target webpage is obtained; a detecting unit, configured to detect whether the source code contains an inline frame (iframe) of size 0*0 or 1*1; and a determining unit, configured to obtain the visit amount of the target webpage when no such iframe exists in the source code.
  • According to the present invention, the method for detecting webpage traffic cheating includes: obtaining the visit amount of the target webpage; determining whether the visit amount satisfies the predetermined condition; if the visit amount satisfies the predetermined condition, obtaining the access source information of the target webpage; and determining, according to the access source information, whether the visit amount of the target webpage is cheated. By determining whether the obtained visit amount satisfies the preset condition, the visit amount of the target webpage is determined to be suspected of cheating when the condition is satisfied; the access source information of the target webpage is then obtained, and whether the visit amount is cheated is further determined based on it.
  • Analyzing the access source information of the target webpage improves the accuracy of cheating detection, which solves the problem of inaccurate identification of webpage traffic cheating and in turn achieves the effect of accurately identifying cheating in the visit amount of the target webpage.
  • FIG. 1 is a schematic structural diagram of a device for detecting webpage traffic cheating according to a first embodiment of the present invention.
  • FIG. 2 is a schematic structural diagram of a device for detecting webpage traffic cheating according to a second embodiment of the present invention.
  • FIG. 3 is a schematic structural diagram of a device for detecting webpage traffic cheating according to a third embodiment of the present invention.
  • FIG. 4 is a schematic structural diagram of a device for detecting webpage traffic cheating according to a fourth embodiment of the present invention.
  • FIG. 5 is a schematic structural diagram of a device for detecting webpage traffic cheating according to a fifth embodiment of the present invention.
  • FIG. 6 is a schematic structural diagram of a device for detecting webpage traffic cheating according to a sixth embodiment of the present invention.
  • FIG. 7 is a flowchart of a method for detecting webpage traffic cheating according to a first embodiment of the present invention.
  • FIG. 8 is a flowchart of a method for detecting webpage traffic cheating according to a second embodiment of the present invention.
  • FIG. 9 is a flowchart of a method for detecting webpage traffic cheating according to a third embodiment of the present invention.
  • FIG. 10 is a flowchart of a method for detecting webpage traffic cheating according to a fourth embodiment of the present invention.
  • FIG. 11 is a flowchart of a method for detecting webpage traffic cheating according to a fifth embodiment of the present invention.
  • FIG. 12 is a flowchart of a method for detecting webpage traffic cheating according to a sixth embodiment of the present invention.
  • An embodiment of the invention provides a device for detecting webpage traffic cheating; the device implements its functions through a computer device.
  • The device for detecting webpage traffic cheating includes: a first obtaining unit 10, a first judging unit 20, a second obtaining unit 30, and a second judging unit 40.
  • The first obtaining unit 10 is configured to obtain the visit amount of the target webpage.
  • The visit amount obtained by the first obtaining unit 10 is the total visit amount of the target webpage.
  • The target webpage is a webpage whose visit amount needs to be checked for cheating.
  • The webpage may be any page of any website; it may be a webpage on which the advertiser places advertisements, or a product page that the advertiser is marketing.
  • The visit amount obtained for the webpage can be made known to the advertiser.
  • The visit amount may be visit traffic or click volume.
  • The visit amount may be a historical visit amount, which represents the visit amount of the target webpage over a certain period of time in the past.
  • The visit amount may also be a current visit amount, which represents the visit amount of the target webpage within a certain current period of time.
  • The visit amount may also comprise both the historical visit amount and the current visit amount.
  • The first obtaining unit 10 may obtain the visit amount by using detection code in the target webpage to measure visit amount information such as visit traffic or click volume, or by reading such information directly from the log file of the target webpage.
  • The first judging unit 20 is configured to determine whether the visit amount satisfies a predetermined condition, based on the visit amount of the target webpage obtained by the first obtaining unit 10.
  • The predetermined condition may be a rule governing changes in the visit amount.
  • For example, the predetermined condition is a threshold marking an abrupt change in the visit amount. When the visit amount exceeds the threshold, it is considered to satisfy the predetermined condition.
  • In that case an abrupt change in the visit amount is identified, that is, the current visit amount has changed abruptly compared with the historical visit amount. Such a change may be a rapid increase or a rapid decrease in the current visit amount.
  • Here, the abrupt-change state refers to the case in which the current visit amount increases rapidly.
  • The first judging unit 20 determines whether the visit amount satisfies the predetermined condition in order to determine whether the visit amount is suspected of cheating.
  • When the visit amount increases rapidly, for example when the current day's visit amount is much larger than the previous day's, the visit amount of the target webpage can be considered suspected of cheating.
  • The second obtaining unit 30 is configured to obtain the access source information of the target webpage when the visit amount satisfies the predetermined condition. When the visit amount of the target webpage satisfies the predetermined condition, the visit amount is determined to be suspected of cheating, and the second obtaining unit 30 then obtains the access source information of the target webpage.
  • The access source information may be the IP (Internet Protocol) address of the visitor, or path information of the visit. For example, a visit may reach the target webpage through a hyperlink on another webpage.
  • The second obtaining unit 30 can obtain the path information of a visit, and also the IP address of the visitor, by adding detection code to the source code of the target webpage. Obtaining the access source information makes it possible to judge whether the visit amount of the target webpage is cheated.
  • The second judging unit 40 is configured to determine, according to the access source information, whether the visit amount of the target webpage is cheated. Since the visit amount of the target webpage is already suspected of cheating at this point, once the access source information has been obtained it can be used to determine whether the visit amount is cheated. For example, the access source information may show that most visits come from non-mainstream websites or from a website that few people use (that is, visitors arrive through such websites).
  • In this embodiment, by determining whether the visit amount of the target webpage obtained by the first obtaining unit 10 satisfies a preset condition, the visit amount is determined to be suspected of cheating when the condition is satisfied; the access source information of the target webpage is then obtained, and whether the visit amount of the target webpage is cheated is further determined according to it. This improves the accuracy of cheating detection for the target webpage and achieves the effect of accurately identifying cheating in its visit amount.
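  • The two-stage flow of this first embodiment can be sketched as a small function that wires four callables together, one per unit. The function name `detect_cheating` and its parameter names are hypothetical, and the concrete obtaining and judging logic is deliberately left to the caller; this is an illustration of the control flow, not the patented implementation.

```python
from typing import Callable, Sequence


def detect_cheating(
    get_visit_amount: Callable[[], int],
    meets_condition: Callable[[int], bool],
    get_access_sources: Callable[[], Sequence[str]],
    judge_sources: Callable[[Sequence[str]], bool],
) -> bool:
    """Return True if the target webpage's visit amount is judged cheated."""
    # Role of the first obtaining unit 10: obtain the visit amount.
    visit_amount = get_visit_amount()

    # Role of the first judging unit 20: is the visit amount suspicious?
    if not meets_condition(visit_amount):
        return False

    # Role of the second obtaining unit 30: only now fetch access sources.
    sources = get_access_sources()

    # Role of the second judging unit 40: decide based on the sources.
    return judge_sources(sources)
```

  • Because the access source information is fetched only after the visit amount satisfies the predetermined condition, the more expensive source analysis is skipped for pages whose traffic looks normal.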
  • FIG. 2 is a schematic structural diagram of a device for detecting webpage traffic cheating according to a second embodiment of the present invention.
  • The device of this embodiment can serve as a preferred implementation of the above embodiment.
  • The device for detecting webpage traffic cheating includes a first obtaining unit 10, a first judging unit 20, a second obtaining unit 30, and a second judging unit 40, wherein the first judging unit 20 includes a first obtaining module 201, a first judging module 202, and a first determining module 203.
  • The second obtaining unit 30 and the second judging unit 40 have the same functions as the second obtaining unit 30 and the second judging unit 40 shown in FIG. 1, and are not described again here.
  • The first obtaining unit 10 is further configured to obtain a historical visit amount and a current visit amount of the target webpage.
  • The historical visit amount and the current visit amount are both visit amounts of the target webpage.
  • The historical visit amount indicates the visit amount of the target webpage in one past unit of time.
  • The current visit amount indicates the visit amount of the target webpage in the current unit of time.
  • The past unit of time and the current unit of time are of the same length.
  • For example, the current visit amount may be the visit amount of the target webpage on the current day.
  • The historical visit amount may then be the visit amount of the target webpage on the previous day.
  • The historical visit amount and the current visit amount of the target webpage can be obtained by adding detection code to the source code of the target webpage.
  • The first obtaining module 201 is configured to obtain a ratio of the historical visit amount and the current visit amount, that is, to compare the two values to get a ratio.
  • For example, if the current visit amount of the target webpage is that day's visit amount, the historical visit amount may be the previous day's visit amount; the visit amount may be visit traffic or click volume, and the two values are compared to obtain a ratio.
  • The ratio can be the current visit amount divided by the historical visit amount, or the historical visit amount divided by the current visit amount.
  • The ratio can also be the proportion by which the current visit amount exceeds the historical visit amount. The ratio shows the trend of the visit amount.
  • For example, when the ratio is the current visit amount divided by the historical visit amount, a ratio greater than 1 means that the current visit amount is greater than the historical visit amount, and the larger the ratio, the more sharply the current visit amount has increased.
  • The first judging module 202 is configured to determine whether the ratio exceeds a first set threshold.
  • The first set threshold can be set according to the actual situation. For example, when the ratio is the current visit amount divided by the historical visit amount, the first set threshold may be set to 1.5, in which case determining whether the ratio exceeds the first set threshold means determining whether the current visit amount exceeds 1.5 times the historical visit amount; the first set threshold may also be set to 2, in which case it means determining whether the current visit amount exceeds twice the historical visit amount. When the ratio is the proportion by which the current visit amount exceeds the historical visit amount, the first set threshold may be set to 30%, in which case determining whether the ratio exceeds the first set threshold means determining whether the growth rate of the current visit amount relative to the historical visit amount exceeds 30%.
  • The first determining module 203 is configured to determine that the visit amount satisfies the predetermined condition when the ratio exceeds the first set threshold, and to determine that the visit amount does not satisfy the predetermined condition when the ratio does not exceed the first set threshold.
  • When the ratio exceeds the first set threshold, an alarm is raised, the visit amount is determined to satisfy the preset condition, and the step of obtaining the access source information of the target webpage is performed.
  • For example, the first set threshold may be set to 1.5, so that determining whether the ratio exceeds the first set threshold means determining whether the current visit amount exceeds 1.5 times the historical visit amount.
  • When the ratio exceeds the first set threshold of 1.5, the visit amount is determined to satisfy the predetermined condition: the current visit amount shows an abrupt or rapid increase, some suspicion of cheating can be established, and the next step of analysis, obtaining the access source information, is carried out.
  • As another example, the first set threshold may be set to 30%, so that determining whether the ratio exceeds the first set threshold means determining whether the growth rate of the current visit amount relative to the historical visit amount exceeds 30%; when the growth rate exceeds 30%, the visit amount is determined to satisfy the predetermined condition.
  • In that case the current visit amount shows an abrupt or rapid increase, some suspicion of cheating can be established, and the next step of analysis is carried out.
  • When the ratio does not exceed the first set threshold, for example the threshold of 1.5 in the above example, the visit amount is determined not to satisfy the predetermined condition; the visit amount shows no abnormality, and it can be determined that the visit amount of the target webpage is not cheated.
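  • The two ratio variants of this embodiment can be written out as a short sketch. The function names `ratio_exceeds` and `growth_exceeds` are hypothetical; the defaults 1.5 and 0.30 are the example values given in the text, and raising `ValueError` for a non-positive historical visit amount is an assumption, since the source does not say how that case is handled.

```python
def ratio_exceeds(current, historical, threshold=1.5):
    """Variant 1: current visit amount divided by historical visit amount."""
    if historical <= 0:
        # Assumption: the source does not cover an empty history.
        raise ValueError("historical visit amount must be positive")
    return current / historical > threshold


def growth_exceeds(current, historical, threshold=0.30):
    """Variant 2: proportion by which current exceeds historical."""
    if historical <= 0:
        raise ValueError("historical visit amount must be positive")
    return (current - historical) / historical > threshold
```

  • With a previous-day visit amount of 1000, a current-day visit amount of 3000 satisfies the first variant, while 1400 does not; under the second variant 1400 represents 40% growth and does satisfy the 30% threshold.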
  • FIG. 3 is a schematic structural diagram of a device for detecting webpage traffic cheating according to a third embodiment of the present invention.
  • The device of this embodiment can serve as a preferred implementation of the above embodiment.
  • The device for detecting webpage traffic cheating includes the first obtaining unit 10, the first judging unit 20, the second obtaining unit 30, and the second judging unit 40, wherein the first judging unit 20 includes a second obtaining module 204, a second judging module 205, and a second determining module 206.
  • The second obtaining unit 30 and the second judging unit 40 have the same functions as the second obtaining unit 30 and the second judging unit 40 shown in FIG. 1, and are not described again here.
  • The first obtaining unit 10 is further configured to obtain a historical visit amount and a current visit amount of the target webpage.
  • The historical visit amount and the current visit amount are both visit amounts of the target webpage.
  • The historical visit amount indicates the visit amount of the target webpage in one past unit of time.
  • The current visit amount indicates the visit amount of the target webpage in the current unit of time.
  • The past unit of time and the current unit of time are of the same length.
  • For example, the current visit amount may be the visit amount of the target webpage on the current day.
  • The historical visit amount may then be the visit amount of the target webpage on the previous day.
  • The historical visit amount and the current visit amount of the target webpage can be obtained by adding detection code to the source code of the target webpage.
  • The second obtaining module 204 is configured to obtain a difference between the historical visit amount and the current visit amount.
  • The historical visit amount and the current visit amount are subtracted to obtain a difference.
  • For example, if the current visit amount of the target webpage is that day's visit amount, the historical visit amount may be the previous day's visit amount; the visit amount may be visit traffic or click volume, and the two values are subtracted to obtain a difference, which may be the current visit amount minus the historical visit amount, or the historical visit amount minus the current visit amount.
  • The difference shows the trend of the visit amount.
  • For example, when the difference is the current visit amount minus the historical visit amount, a positive difference means that the current visit amount is greater than the historical visit amount, and the larger the difference, the more the current visit amount has increased.
  • The second judging module 205 is configured to judge whether the difference exceeds a second set threshold.
  • The second set threshold can be set according to the actual situation. For example, when the difference is the current visit amount minus the historical visit amount, judging whether the difference exceeds the second set threshold amounts to judging whether the current visit amount exceeds the historical visit amount by more than the second set threshold.
  • The second determining module 206 is configured to determine that the visit amount satisfies the predetermined condition when the difference exceeds the second set threshold, and that it does not satisfy the predetermined condition when the difference does not exceed the second set threshold.
  • When the difference exceeds the second set threshold, that is, when the current visit amount exceeds the historical visit amount by more than the second set threshold, an alarm is raised, the visit amount is determined to satisfy the predetermined condition, and step S306 is performed.
  • In this case the current visit amount shows a sudden change or a rapidly increasing trend, cheating can be suspected, and the next step of the analysis is to obtain the access source information.
  • When the difference does not exceed the second set threshold, the visit amount shows no abnormality, and it can be determined that the visit amount of the target webpage is not cheated.
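The difference-based screening described above can be sketched as follows. This is only an illustrative sketch: the function name, the daily counts, and the threshold value of 500 are assumptions made for the example and do not come from the source text.

```python
def exceeds_difference_threshold(current_visits: int,
                                 historical_visits: int,
                                 second_threshold: int) -> bool:
    """Return True when (current - historical) exceeds the second set threshold."""
    # The difference is taken as current minus historical, so a positive
    # value means the visit amount has grown.
    difference = current_visits - historical_visits
    return difference > second_threshold


# Hypothetical daily counts: 1,000 visits yesterday, 1,800 visits today.
suspected = exceeds_difference_threshold(1800, 1000, second_threshold=500)
# Hypothetical quiet day: only 100 more visits than the day before.
normal = exceeds_difference_threshold(1100, 1000, second_threshold=500)
```

A result of `True` only marks the visit amount as suspected; the final decision still depends on the access source information.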
  • FIG. 4 is a schematic structural diagram of a device for detecting cheating in webpage visit amount according to a fourth embodiment of the present invention.
  • The device of this embodiment can serve as a preferred implementation of the above embodiment.
  • The device includes a first obtaining unit 10, a first judging unit 20, a second obtaining unit 30, and a second judging unit 40, wherein the second obtaining unit 30 includes a third obtaining module 301, a fourth obtaining module 302, and a generating module 303, and the second judging unit 40 includes a fifth obtaining module 401, a calculating module 402, a third judging module 403, and a third determining module 404.
  • The first obtaining unit 10 and the first judging unit 20 have the same functions as the first obtaining unit 10 and the first judging unit 20 shown in FIG. 1, and are not described again here.
  • The third obtaining module 301 is configured to obtain the source code of the target webpage.
  • When the second obtaining unit 30 acquires the access source information of the target webpage, the third obtaining module 301 first obtains the source code of the target webpage, which is then used to obtain the access source information.
  • The fourth obtaining module 302 is configured to add a detection code to the source code to obtain the access IP addresses of the target webpage.
  • The detection code is used to detect the access source information of the target webpage; here the access source information is the access IP address.
  • An access IP address is the IP address of a visitor, and adding the detection code to the source code makes it possible to obtain all access IP addresses of the target webpage. For example, when the target webpage is visited three times, the detection code yields the visitor IP address of each of the three visits; these three access IP addresses may be the same or different.
  • The generating module 303 is configured to use the access IP addresses as the access source information.
  • The IP address of a visitor identifies the source of a visit and indicates that the target webpage was indeed accessed by the visitor with that IP address.
  • Using the access IP addresses as the access source information facilitates further analysis of the visit amount of the target webpage.
  • The fifth obtaining module 401 is configured to obtain a first visit amount of a first access IP address among the access IP addresses, where the first access IP address is the access IP address that has visited the target webpage the most. The access IP addresses obtained through the detection code include multiple IP addresses, each of which contributes a certain visit amount to the target webpage.
  • The first access IP address is the one among them that visits the target webpage most often. For example, when the detection code detects three IP addresses accessing the target webpage and one of them accesses the target webpage the most, that IP address is the first access IP address.
  • The first visit amount is the visit amount that the first access IP address contributes to the target webpage, and it is larger than the visit amount of any other access IP address.
  • The calculating module 402 is configured to calculate the ratio of the first visit amount to the visit amount.
  • The visit amount here is the total visit amount of the target webpage; calculating the ratio of the first visit amount to the total visit amount determines the share of the total that the first visit amount accounts for.
  • The third judging module 403 is configured to judge whether the ratio of the first visit amount to the visit amount exceeds a third set threshold.
  • The third set threshold may be set as needed. For example, when the third set threshold is 0.5, judging whether the ratio exceeds the third set threshold amounts to judging whether the first visit amount exceeds half of the total visit amount.
  • The third determining module 404 is configured to determine that the visit amount of the target webpage is cheated when the ratio of the first visit amount to the visit amount exceeds the third set threshold, and to determine that the visit amount of the target webpage is not cheated when the ratio does not exceed the third set threshold.
  • For example, with a third set threshold of 0.5, a ratio above 0.5 means that the first visit amount exceeds half of the total visit amount. The visit amount of the target webpage can then be considered to have been achieved through some cheating method, and the likelihood that it is cheated is relatively high.
  • With the same threshold, a ratio that does not exceed 0.5 means that the first visit amount does not exceed half of the total visit amount. The visit amount of the target webpage can then be considered normal, and it can basically be determined that it is not cheated.
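The share check performed by modules 401 to 404 can be illustrated with a short sketch. It assumes that the detection code has already produced one IP address string per visit; the function names and the sample addresses (taken from the documentation range 192.0.2.0/24) are hypothetical.

```python
from collections import Counter
from typing import Sequence, Tuple


def top_ip_share(access_ips: Sequence[str]) -> Tuple[str, float]:
    """Return the most frequent IP and its share of all visits."""
    counts = Counter(access_ips)
    # most_common(1) yields the (ip, count) pair with the highest count.
    first_ip, first_visits = counts.most_common(1)[0]
    return first_ip, first_visits / len(access_ips)


def is_cheated_by_ip_share(access_ips: Sequence[str],
                           third_threshold: float = 0.5) -> bool:
    """True when the first access IP accounts for more than the threshold."""
    if not access_ips:
        return False  # assumption: no visits means nothing to flag
    _, share = top_ip_share(access_ips)
    return share > third_threshold


# Hypothetical visit log: one address produces 7 of 10 visits.
visits = ["192.0.2.1"] * 7 + ["192.0.2.2"] * 2 + ["192.0.2.3"]
first_ip, share = top_ip_share(visits)
```

With the default threshold of 0.5 from the example in the text, the sample log is flagged because a single address accounts for 70% of the visits.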
  • FIG. 5 is a schematic structural diagram of a device for detecting cheating in webpage visit amount according to a fifth embodiment of the present invention.
  • The device of this embodiment can serve as a preferred implementation of the above embodiment.
  • The device includes a first obtaining unit 10, a first judging unit 20, a second obtaining unit 30, and a second judging unit 40, wherein the second obtaining unit 30 includes a third obtaining module 301, a fourth obtaining module 302, and a generating module 303; the second judging unit 40 includes a fifth obtaining module 401, a calculating module 402, a third judging module 403, and a third determining module 404; and the third determining module 404 includes an obtaining submodule 4041, a judging submodule 4042, and a determining submodule 4043.
  • the first obtaining unit 10, the first determining unit 20, and the second obtaining unit 30 have the same functions as the first obtaining unit 10, the first determining unit 20, and the second obtaining unit 30 shown in FIG.
  • the fifth obtaining module 401, the calculating module 402, and the third determining module 403 have the same functions as the fifth obtaining module 401, the calculating module 402, and the third determining module 403 shown in FIG. 4, and details are not described herein.
  • The obtaining submodule 4041 is configured to obtain the access dwell time of the first access IP address.
  • The access dwell time is the time a visitor stays on the target webpage during a visit. Because the first access IP address has visited the target webpage many times, there are multiple access dwell times, one per visit; obtaining the access dwell time of the first access IP address means obtaining the dwell time of each of its visits.
  • The judging submodule 4042 is configured to judge whether the access dwell time exceeds a fourth set threshold.
  • The fourth set threshold is a time value and can be set as needed. Because there are multiple access dwell times, judging whether the access dwell time exceeds the fourth set threshold means judging, for each visit, whether its dwell time exceeds the fourth set threshold. For example, when the fourth set threshold is 3 s, it is judged whether each access dwell time of the first access IP address exceeds 3 s.
  • The determining submodule 4043 is configured to determine that the visit amount of the target webpage is cheated when the access dwell time does not exceed the fourth set threshold.
  • Here, the access dwell time not exceeding the fourth set threshold refers to the dwell times of the multiple visits of the first access IP address: if the dwell time of most visits in the first visit amount does not exceed the fourth set threshold, the visit amount of the target webpage is considered cheated. For example, with a fourth set threshold of 3 s, if most of the visits of the first access IP address last less than 3 s, most of the first visit amount is abnormal and has probably been produced by traffic brushing; since such visits are contrary to common sense, the visit amount of the target webpage is considered cheated.
  • Conversely, if the dwell time of most of these visits exceeds the fourth set threshold, the first visit amount consists of normal visits, so the visit amount of the target webpage can be considered not cheated.
  • "Most" here may mean more than a predetermined proportion of the visits in the first visit amount; for example, the predetermined proportion may be 60%.
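A minimal sketch of the dwell-time judgment follows, using the 3 s threshold and the 60% proportion given as examples in the text. The function name and the sample dwell times are assumptions, and the input is taken to be the list of per-visit dwell times of the first access IP address in seconds.

```python
from typing import Sequence


def mostly_short_dwell(dwell_seconds: Sequence[float],
                       fourth_threshold: float = 3.0,
                       predetermined_proportion: float = 0.6) -> bool:
    """True when most visits stay no longer than the fourth set threshold."""
    if not dwell_seconds:
        return False  # assumption: without visits there is nothing to flag
    # Count the visits whose dwell time does not exceed the threshold.
    short_visits = sum(1 for t in dwell_seconds if t <= fourth_threshold)
    # "Most" is read as "more than the predetermined proportion".
    return short_visits / len(dwell_seconds) > predetermined_proportion


# Hypothetical dwell times: four of five visits last under 3 s.
brushed = mostly_short_dwell([0.5, 1.0, 2.0, 1.5, 40.0])
# Hypothetical dwell times of ordinary reading sessions.
ordinary = mostly_short_dwell([10.0, 25.0, 2.0, 60.0, 45.0])
```

Keeping both the threshold and the proportion as parameters reflects the statement that each can be set as needed.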
  • FIG. 6 is a schematic structural diagram of a device for detecting cheating in webpage visit amount according to a sixth embodiment of the present invention.
  • The device of this embodiment can serve as a preferred implementation of the above embodiment.
  • The device includes a first obtaining unit 10, a first judging unit 20, a second obtaining unit 30, a second judging unit 40, a third obtaining unit 50, a detecting unit 60, and a determining unit 70.
  • The first obtaining unit 10, the first judging unit 20, the second obtaining unit 30, and the second judging unit 40 have the same functions as the identically numbered units of the above embodiment and are not described again here.
  • The third obtaining unit 50 is configured to obtain the source code of the target webpage before the visit amount of the target webpage is obtained.
  • The source code of the target webpage can be fetched by a crawler program or obtained by other means, so that the structure of the target webpage is known and the target webpage can be inspected.
  • The detecting unit 60 is configured to detect whether the source code contains an inline frame (iframe) of size 0*0 or 1*1. An iframe of size 0*0 or 1*1 is effectively invisible. Other pages opened through such an iframe are loaded without the user intending to visit them, so traffic or clicks are inflated without the user seeing anything. An analysis program can be written to determine from the source code whether an iframe of size 0*0 or 1*1 exists.
  • The determining unit 70 is configured to obtain the visit amount of the target webpage when no such iframe exists in the source code. Because an iframe of size 0*0 or 1*1 inflates the visit amount without the visitor's knowledge, detecting such an iframe in the source code of the target webpage is enough to determine that the visit amount of the target webpage is cheated. When no such iframe exists in the source code, the next judgment is made by obtaining the visit amount of the target webpage.
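The analysis program mentioned above could look roughly like the following sketch, which uses the standard-library `html.parser` module. It is an assumption-laden illustration: the class and function names are hypothetical, and only the `width` and `height` attributes are inspected, so iframes hidden through CSS or scripts would not be found.

```python
from html.parser import HTMLParser

# Sizes treated as invisible, following the 0*0 and 1*1 cases in the text.
HIDDEN_SIZES = {("0", "0"), ("1", "1")}


class HiddenIframeFinder(HTMLParser):
    """Sets `found` when an iframe with a 0*0 or 1*1 size is encountered."""

    def __init__(self) -> None:
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lower case.
        if tag != "iframe":
            return
        attr = dict(attrs)
        size = (str(attr.get("width", "")).strip(),
                str(attr.get("height", "")).strip())
        if size in HIDDEN_SIZES:
            self.found = True


def has_hidden_iframe(source_code: str) -> bool:
    """Return True if the page source contains an invisible iframe."""
    finder = HiddenIframeFinder()
    finder.feed(source_code)
    return finder.found


# Hypothetical page source with an invisible frame.
page = '<html><body><iframe src="http://example.com" width="0" height="0"></iframe></body></html>'
flagged = has_hidden_iframe(page)
```

If `has_hidden_iframe` returns `True`, the page would be treated as cheated directly; otherwise the visit-amount checks would run next.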
  • The modules or steps of the present invention described above can be implemented by a general-purpose computing device; they can be centralized on a single computing device or distributed across a network of multiple computing devices. Alternatively, they may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device, or they may each be fabricated into an individual integrated circuit module, or several of the modules or steps may be fabricated into a single integrated circuit module. Thus, the invention is not limited to any specific combination of hardware and software.
  • An embodiment of the invention further provides a method for detecting cheating in webpage visit amount.
  • The method can run on a computer device. It should be noted that the method of the embodiment of the present invention can be performed by the detecting device of the embodiment of the present invention, and that the detecting device can likewise be used to perform the method.
  • FIG. 7 is a flowchart of a method for detecting cheating in webpage visit amount according to a first embodiment of the present invention. As shown in FIG. 7, the method includes the following steps:
  • Step S101: obtain the visit amount of the target webpage.
  • The visit amount obtained is the total visit amount of the target webpage.
  • The target webpage is the webpage whose visit amount is to be checked for cheating.
  • It may be any webpage of any website, for example a webpage on which an advertiser places advertisements or a product webpage that the advertiser is marketing.
  • When the target webpage is a webpage on which an advertiser places advertisements, the visit amount that the advertiser's placement brings to the webpage can be known.
  • The visit amount may be measured as access traffic or as click volume.
  • The visit amount may be a historical visit amount, which represents the visit amount of the target webpage over a certain period in the past.
  • It may also be a current visit amount, which represents the visit amount of the target webpage over a certain current period.
  • It may also comprise both the historical visit amount and the current visit amount.
  • The first obtaining unit 10 may obtain the visit amount by using a detection code in the target webpage to detect visit information such as access traffic or click volume, or by reading such visit information directly from the log files of the target webpage.
  • Step S102: judge whether the visit amount satisfies a predetermined condition.
  • The first judging unit 20 judges, based on the visit amount, whether it satisfies the predetermined condition.
  • The predetermined condition may be a rule describing how the visit amount changes.
  • For example, the predetermined condition may be a threshold that marks an abrupt change in the visit amount. When the visit amount exceeds the threshold, it is considered to satisfy the predetermined condition and to have changed abruptly, that is, the current visit amount has changed abruptly relative to the historical visit amount. An abrupt change may be a rapid increase or a rapid decrease in the current visit amount.
  • Here, the abrupt change of interest is the case in which the current visit amount increases rapidly.
  • The first judging unit 20 judges whether the visit amount satisfies the predetermined condition in order to determine whether the visit amount is suspected of cheating.
  • When the visit amount increases rapidly, for example when the current day's visit amount is much larger than that of the previous day, the visit amount of the target webpage can be considered suspected of cheating.
  • Step S103: if the visit amount satisfies the predetermined condition, obtain the access source information of the target webpage.
  • If the visit amount satisfies the predetermined condition, the visit amount of the target webpage is determined to be suspected of cheating.
  • In this case the second obtaining unit 30 obtains the access source information of the target webpage.
  • The access source information may be the access IP (Internet Protocol) address of the visitor, or the path through which the visitor arrived. For example, a visit may reach the target webpage through a hyperlink on another webpage.
  • In that case the URL of the linking webpage can be obtained, as can the access IP address of the visitor.
  • Obtaining the access source information makes it possible to judge whether the visit amount of the target webpage is cheated. If the visit amount does not satisfy the predetermined condition, the visit amount of the target webpage can be considered not cheated, and detection of whether the visit amount satisfies the predetermined condition continues.
  • Step S104: determine, according to the access source information, whether the visit amount of the target webpage is cheated. Because the visit amount is already suspected of cheating at this point, once the access source information has been obtained it can be used to determine whether the visit amount is cheated. For example, when most of the access source information points to non-mainstream websites or websites that few people visit, or to the target webpage itself, it can be determined that the visit amount of the target webpage was to a large extent produced by cheating, for instance by driving visits through links on such websites or by constantly refreshing the target webpage. Cheating is then highly likely, and the visit amount of the target webpage can be considered cheated.
  • In the embodiment of the present invention, it is judged whether the visit amount of the target webpage obtained by the first obtaining unit 10 satisfies a predetermined condition; when it does, the visit amount is determined to be suspected of cheating, the access source information of the target webpage is then obtained, and whether the visit amount is cheated is further determined according to that information. This improves the accuracy of detecting cheating in the visit amount of the target webpage and achieves the effect of accurately identifying such cheating.
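The two-stage flow of steps S101 to S104 can be summarized in a short sketch. The function and parameter names are hypothetical, and the three callables stand in for whichever concrete condition, source collection, and source analysis an implementation chooses.

```python
from typing import Callable, Sequence


def detect_cheating(
    visit_amount: int,
    meets_condition: Callable[[int], bool],
    get_sources: Callable[[], Sequence[str]],
    sources_indicate_cheating: Callable[[Sequence[str]], bool],
) -> bool:
    """Two-stage check mirroring steps S101 to S104."""
    # S102: screen on the visit amount alone.
    if not meets_condition(visit_amount):
        return False  # not suspected, so source information is never fetched
    # S103: collect access source information only for suspected pages.
    sources = get_sources()
    # S104: make the final decision from the source information.
    return sources_indicate_cheating(sources)


# Hypothetical usage: flag when visits exceed 1,000 and one IP dominates.
calls = []

def fake_sources() -> Sequence[str]:
    calls.append(1)
    return ["192.0.2.1"] * 9 + ["192.0.2.2"]

quiet = detect_cheating(100, lambda v: v > 1000, fake_sources, lambda s: True)
```

The ordering matters in this sketch: source information is gathered only after the visit amount has raised suspicion, which matches the sequence of steps in the text.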
  • FIG. 8 is a flowchart of a method for detecting cheating in webpage visit amount according to a second embodiment of the present invention.
  • The method of this embodiment can serve as a preferred implementation of the method of the above embodiment.
  • The method includes the following steps:
  • Step S201: obtain the historical visit amount and the current visit amount of the target webpage.
  • The historical visit amount and the current visit amount are both visit amounts of the target webpage.
  • The historical visit amount is the visit amount of the target webpage in a past unit time, and the current visit amount is its visit amount in the current unit time.
  • The past unit time and the current unit time are of the same length.
  • For example, the current visit amount may be the visit amount of the target webpage on the current day, and the historical visit amount may be its visit amount on the previous day.
  • The historical visit amount and the current visit amount of the target webpage can be obtained by adding a detection code to the source code of the target webpage.
  • Step S202: obtain the ratio of the historical visit amount to the current visit amount. The historical visit amount is compared with the current visit amount to obtain a ratio.
  • For example, the current visit amount of the target webpage may be the visit amount of the current day and the historical visit amount that of the previous day, where the visit amount may be measured as access traffic or as click volume; the two are compared to obtain a ratio.
  • The ratio can be the current visit amount divided by the historical visit amount, or the historical visit amount divided by the current visit amount.
  • The ratio can also be the proportion by which the current visit amount exceeds the historical visit amount. Obtaining the ratio shows the trend of the visit amount.
  • For example, when the ratio is the current visit amount divided by the historical visit amount, a ratio greater than 1 means that the current visit amount is greater than the historical visit amount, and the larger the ratio, the more sharply the current visit amount has increased.
  • Step S203: judge whether the ratio exceeds a first set threshold.
  • The first set threshold can be set according to the actual situation. For example, when the ratio is the current visit amount divided by the historical visit amount, the first set threshold may be set to 1.5, so that judging whether the ratio exceeds the first set threshold amounts to judging whether the current visit amount exceeds 1.5 times the historical visit amount; the first set threshold may also be set to 2, in which case the judgment is whether the current visit amount exceeds twice the historical visit amount. When the ratio expresses the proportion by which the current visit amount exceeds the historical visit amount, the first set threshold may be set to 30%, and judging whether the ratio exceeds the first set threshold amounts to judging whether the growth rate of the current visit amount relative to the historical visit amount exceeds 30%.
  • Step S204: if the ratio exceeds the first set threshold, determine that the visit amount satisfies the predetermined condition.
  • When the ratio exceeds the first set threshold, an alarm is raised, the visit amount is determined to satisfy the predetermined condition, and step S206 is performed.
  • For example, with a first set threshold of 1.5, when the ratio exceeds 1.5 the visit amount is determined to satisfy the predetermined condition: the current visit amount shows a sudden change or a rapidly increasing trend, cheating can be suspected, and the next step of the analysis is to obtain the access source information.
  • Alternatively, with a first set threshold of 30%, when the growth rate of the current visit amount relative to the historical visit amount exceeds 30% the visit amount is determined to satisfy the predetermined condition: the current visit amount shows a sudden change or a rapidly increasing trend, and cheating can be suspected.
  • Step S205: if the ratio does not exceed the first set threshold, determine that the visit amount does not satisfy the predetermined condition.
  • When the ratio does not exceed the first set threshold, for example the threshold of 1.5 in the example above, the visit amount is determined not to satisfy the predetermined condition; it shows no abnormality, and it can be determined that the visit amount of the target webpage is not cheated.
  • Step S206: if the visit amount satisfies the predetermined condition, obtain the access source information of the target webpage.
  • The second obtaining unit 30 obtains the access source information of the target webpage.
  • The access source information may be the access IP address of the visitor or the URL of the page from which the visit came. For example, a visit may reach the target webpage through a hyperlink on another webpage; by adding a detection code to the source code of the target webpage, the URL of the linking webpage can be obtained, as can the access IP address of the visitor. Obtaining the access source information makes it possible to judge whether the visit amount of the target webpage is cheated.
  • Step S207: determine, based on the access source information, whether the visit amount of the target webpage is cheated. Because the visit amount is already suspected of cheating at this point, once the access source information has been obtained it can be used to determine whether the visit amount is cheated. For example, when most of the access source information points to non-mainstream websites or websites that few people visit, or to the target webpage itself, it can be considered that the visit amount of the target webpage was to a large extent produced by cheating, for instance by brushing traffic through links on such websites or by constantly refreshing the target webpage. Cheating is then highly likely, and the visit amount of the target webpage can be considered cheated.
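The two readings of the ratio in steps S202 to S205 can be sketched as below. The function names are hypothetical, the thresholds 1.5 and 30% are the examples given in the text, and rejecting a non-positive historical visit amount is an assumption added for the example, since the text does not say how that case is handled.

```python
def visit_ratio(current_visits: int, historical_visits: int) -> float:
    """Current visit amount divided by historical visit amount."""
    if historical_visits <= 0:
        # Assumption: the text does not cover an empty baseline.
        raise ValueError("historical visit amount must be positive")
    return current_visits / historical_visits


def growth_rate(current_visits: int, historical_visits: int) -> float:
    """Proportion by which the current amount exceeds the historical one."""
    return visit_ratio(current_visits, historical_visits) - 1.0


def satisfies_condition(current_visits: int, historical_visits: int,
                        first_threshold: float = 1.5) -> bool:
    """S203 to S205: the condition holds when the ratio exceeds the threshold."""
    return visit_ratio(current_visits, historical_visits) > first_threshold


# Hypothetical daily counts: 1,000 visits yesterday, 1,800 visits today.
suspected = satisfies_condition(1800, 1000)           # ratio 1.8 against 1.5
grew_over_30_percent = growth_rate(1400, 1000) > 0.30  # growth of about 40%
```

The two forms are equivalent up to a shift by 1: a growth-rate threshold of 30% corresponds to a ratio threshold of 1.3.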
  • FIG. 9 is a flowchart of a method for detecting cheating in webpage visit amount according to a third embodiment of the present invention.
  • The method of this embodiment can serve as a preferred implementation of the method of the above embodiment.
  • The method includes the following steps:
  • Step S301: obtain the historical visit amount and the current visit amount of the target webpage.
  • The historical visit amount and the current visit amount are both visit amounts of the target webpage.
  • The historical visit amount is the visit amount of the target webpage in a past unit time, and the current visit amount is its visit amount in the current unit time.
  • The past unit time and the current unit time are of the same length.
  • For example, the current visit amount may be the visit amount of the target webpage on the current day, and the historical visit amount may be its visit amount on the previous day.
  • The historical visit amount and the current visit amount of the target webpage can be obtained by adding a detection code to the source code of the target webpage.
  • Step S302: obtain the difference between the historical visit amount and the current visit amount.
  • The historical visit amount and the current visit amount are subtracted to obtain a difference. For example, the current visit amount of the target webpage may be the visit amount of the current day and the historical visit amount that of the previous day, where the visit amount may be measured as access traffic or as click volume.
  • The difference may be the current visit amount minus the historical visit amount, or the historical visit amount minus the current visit amount.
  • The difference shows the trend of the visit amount. For example, when the difference is the current visit amount minus the historical visit amount, a positive difference means that the current visit amount is greater than the historical visit amount, and the larger the difference, the more the current visit amount has increased.
  • Step S303: judge whether the difference exceeds a second set threshold.
  • The second set threshold can be set according to the actual situation. For example, when the difference is the current visit amount minus the historical visit amount, judging whether the difference exceeds the second set threshold amounts to judging whether the current visit amount exceeds the historical visit amount by more than the second set threshold.
  • Step S304: if the difference exceeds the second set threshold, determine that the visit amount satisfies the predetermined condition.
  • When the difference exceeds the second set threshold, that is, when the current visit amount exceeds the historical visit amount by more than the second set threshold, an alarm is raised, the visit amount is determined to satisfy the predetermined condition, and step S306 is performed.
  • In this case the current visit amount shows a sudden change or a rapidly increasing trend, cheating can be suspected, and the next step of the analysis is to obtain the access source information.
  • Step S305: if the difference does not exceed the second set threshold, determine that the visit amount does not satisfy the predetermined condition.
  • When the difference does not exceed the second set threshold, the visit amount shows no abnormality, and it can be determined that the visit amount of the target webpage is not cheated.
  • Step S306: if the visit amount satisfies the predetermined condition, obtain the access source information of the target webpage.
  • The second obtaining unit 30 obtains the access source information of the target webpage.
  • The access source information may be the access IP address of the visitor or the URL of the page from which the visit came. For example, a visit may reach the target webpage through a hyperlink on another webpage; by adding a detection code to the source code of the target webpage, the URL of the linking webpage can be obtained, as can the access IP address of the visitor. Obtaining the access source information makes it possible to judge whether the visit amount of the target webpage is cheated.
  • Step S307: determine, based on the access source information, whether the visit amount of the target webpage is cheated. Because the visit amount is already suspected of cheating at this point, once the access source information has been obtained it can be used to determine whether the visit amount is cheated. For example, when most of the access source information points to non-mainstream websites or websites that few people visit, or to the target webpage itself, it can be determined that the visit amount of the target webpage was to a large extent produced by cheating, for instance by brushing traffic through links on such websites or by constantly refreshing the target webpage. Cheating is then highly likely, and the visit amount of the target webpage can be considered cheated.
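The source-based judgment in step S307 might be approximated as in the sketch below. The text does not define how a "non-mainstream" website is recognized, so the allowlist of mainstream hosts, the function name, and the example URLs are all assumptions; a visit counts as coming from the target webpage itself when the referring host and path match those of the target.

```python
from typing import Collection, Sequence
from urllib.parse import urlsplit


def suspicious_source_share(referrer_urls: Sequence[str],
                            target_url: str,
                            mainstream_hosts: Collection[str]) -> float:
    """Share of visits that come from non-mainstream hosts or the page itself."""
    if not referrer_urls:
        return 0.0
    target = urlsplit(target_url)
    target_host = (target.hostname or "").lower()
    suspicious = 0
    for url in referrer_urls:
        ref = urlsplit(url)
        host = (ref.hostname or "").lower()
        # A self-referral is read here as a sign of repeated refreshing.
        is_self = host == target_host and ref.path == target.path
        if is_self or host not in mainstream_hosts:
            suspicious += 1
    return suspicious / len(referrer_urls)


# Hypothetical referrers for a hypothetical landing page.
referrers = [
    "http://tiny-site.example/a",
    "http://tiny-site.example/b",
    "http://shop.example/landing",
    "http://search.example/q",
]
share = suspicious_source_share(referrers, "http://shop.example/landing",
                                mainstream_hosts={"search.example"})
```

Comparing `share` against a chosen cut-off, for example more than half, would give one possible reading of "most of the access source information".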
  • FIG. 10 is a flowchart of a method for detecting cheating on web page access amount according to a fourth embodiment of the present invention.
  • The method of this embodiment can serve as a preferred implementation of the detection method of the above embodiments.
  • As shown in FIG. 10, the method includes the following steps:
  • Step S401: obtain the access amount of the target webpage.
  • The target webpage is the webpage whose access amount is to be checked for cheating.
  • It may be any webpage of any website, for example a webpage on which an advertiser places advertisements or a product page that the advertiser is marketing.
  • For example, when the target webpage carries the advertiser's advertisements, obtaining its access amount reveals how many times those advertisements were viewed.
  • The access amount may be visit traffic or click count.
  • The access amount may be a historical access amount, which is the access amount of the target webpage over a certain period in the past.
  • The access amount may also be a current access amount, which is the access amount of the target webpage over a certain current period.
  • The access amount may also comprise both the historical and the current access amount.
  • The first obtaining unit 10 may obtain the access amount by adding detection code to the target webpage to record access information such as visit traffic or click count, or by reading that information directly from the log file of the target webpage.
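Reading the access amount from a log file, the second option in step S401, might look like the sketch below. The description does not specify a log format, so the sketch assumes lines in the widely used Common Log Format; the regular expression, the function name, and the sample path are all illustrative.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed layout: ip ident user [timestamp] "METHOD path protocol" ...
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+) [^"]*"'
)


def daily_visits(log_lines, target_path):
    """Count requests for target_path per calendar day; unparsable lines are skipped."""
    counts = Counter()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match is None or match.group("path") != target_path:
            continue
        stamp = datetime.strptime(match.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
        counts[stamp.date()] += 1
    return counts
```

The per-day counts returned here can serve as the historical and current access amounts when one day is chosen as the unit of time.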
  • Step S402: determine whether the access amount satisfies a predetermined condition. The first judging unit 20 makes this determination based on the access amount of the target webpage acquired by the first obtaining unit 10.
  • The predetermined condition may be a pattern of change in the access amount.
  • For example, the predetermined condition may be a threshold that marks an abrupt change in the access amount. When the access amount exceeds the threshold, it is considered to satisfy the predetermined condition and an abrupt change is deemed to have occurred; that is, the current access amount has changed abruptly relative to the historical access amount. The abrupt change may be a rapid increase or a rapid decrease in the current access amount.
  • This embodiment treats a rapid increase in the current access amount as the abrupt state.
  • The first judging unit 20 determines whether the access amount satisfies the predetermined condition in order to decide whether cheating is suspected. When the access amount increases rapidly, for example when the current day's access amount is far larger than the previous day's, the access amount of the target webpage is suspected of cheating; otherwise it is considered not to involve cheating.
  • Step S403: if the access amount satisfies the predetermined condition, the source code of the target webpage is obtained.
  • When the access amount satisfies the predetermined condition, the access source information of the target webpage is to be obtained; this first requires obtaining the source code of the target webpage, which is then used to collect the access source information. If the access amount does not satisfy the predetermined condition, the access amount of the target webpage so far can be considered free of cheating, and monitoring of whether it satisfies the predetermined condition continues.
  • Step S404: add detection code to the source code to obtain the access IP addresses of the target webpage.
  • The detection code detects the access source information of the target webpage, which here is the access IP address.
  • An access IP address is the IP address of a visitor; adding detection code to the source code makes it possible to obtain all access IP addresses of the target webpage. For example, when there are three visits to the target webpage, the detection code yields the visitor's IP address for each of the three visits; the three access IP addresses may be identical or different.
  • Step S405: use the access IP addresses as the access source information.
  • A visitor's IP address identifies the source of a visit and shows that the target webpage was in fact accessed by the visitor with that IP address.
  • Using the access IP addresses as the access source information allows the access amount of the target webpage to be examined in more detail.
  • Step S406: obtain the first access amount of the first access IP address, where the first access IP address is the one among the access IP addresses that accesses the target webpage most often. The access IP addresses obtained through the detection code include multiple IP addresses, each of which contributes some access amount to the target webpage.
  • The first access IP address is the IP address of the visitor that accesses the target webpage the most. For example, when the detection code detects three IP addresses accessing the target webpage and one of them accesses it the most times, that IP address is the first access IP address.
  • The first access amount is the access amount contributed by the first access IP address; its share of the total access amount is larger than that of any other access IP address.
  • Step S407: calculate the ratio of the first access amount to the access amount.
  • Here the access amount is the total access amount of the target webpage; the ratio gives the share of the total contributed by the first access amount.
  • Step S408: determine whether the ratio of the first access amount to the access amount exceeds a third set threshold.
  • The third set threshold may be set as needed. For example, with a third set threshold of 0.5, this step determines whether the first access amount exceeds half of the total access amount.
  • Step S409: if the ratio of the first access amount to the access amount exceeds the third set threshold, it is determined that the access amount of the target webpage is the result of cheating.
  • As above, with a third set threshold of 0.5, a ratio above 0.5 means that the first access amount exceeds half of the total access amount. In that case the access amount of the target webpage can be considered to have been achieved by cheating, the likelihood of cheating being relatively high.
  • Step S410: if the ratio of the first access amount to the access amount does not exceed the third set threshold, it is determined that the access amount of the target webpage is not the result of cheating.
  • As above, with a third set threshold of 0.5, a ratio not above 0.5 means that the first access amount does not exceed half of the total access amount. In that case the access amount of the target webpage can be considered normal, and it can essentially be determined that no cheating has occurred.
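Steps S406 to S410 amount to finding the most frequent access IP address and comparing its share of all visits with the third set threshold. The sketch below illustrates this under the assumption that one IP address is recorded per visit; the function names are hypothetical, and 0.5 is the example threshold given in the description.

```python
from collections import Counter


def first_access_share(visit_ips):
    """Return (first access IP, its share of all visits) for a non-empty list of IPs."""
    if not visit_ips:
        raise ValueError("at least one visit is required")
    counts = Counter(visit_ips)
    first_ip, first_amount = counts.most_common(1)[0]  # S406: most frequent IP
    return first_ip, first_amount / len(visit_ips)     # S407: ratio to total


def is_cheating_by_first_ip(visit_ips, third_threshold=0.5):
    """S408 to S410: cheating is determined when the share exceeds the threshold."""
    _, share = first_access_share(visit_ips)
    return share > third_threshold
```

A share exactly equal to the threshold does not count as exceeding it, so a page where one address accounts for precisely half of the visits is not flagged.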
  • FIG. 11 is a flowchart of a method for detecting cheating on web page access amount according to a fifth embodiment of the present invention.
  • The method of this embodiment can serve as a preferred implementation of the detection method of the above embodiments.
  • As shown in FIG. 11, the method includes the following steps:
  • Step S501: obtain the access amount of the target webpage.
  • The target webpage is the webpage whose access amount is to be checked for cheating.
  • It may be any webpage of any website, for example a webpage on which an advertiser places advertisements or a product page that the advertiser is marketing.
  • For example, when the target webpage carries the advertiser's advertisements, obtaining its access amount reveals how many times those advertisements were viewed.
  • The access amount may be visit traffic or click count.
  • The access amount may be a historical access amount, which is the access amount of the target webpage over a certain period in the past.
  • The access amount may also be a current access amount, which is the access amount of the target webpage over a certain current period.
  • The access amount may also comprise both the historical and the current access amount.
  • The first obtaining unit 10 may obtain the access amount by adding detection code to the target webpage to record access information such as visit traffic or click count, or by reading that information directly from the log file of the target webpage.
  • Step S502: determine whether the access amount satisfies the predetermined condition.
  • The first judging unit 20 makes this determination based on the access amount of the target webpage acquired by the first obtaining unit 10.
  • The predetermined condition may be a pattern of change in the access amount.
  • For example, the predetermined condition may be a threshold that marks an abrupt change in the access amount. When the access amount exceeds the threshold, it is considered to satisfy the predetermined condition and an abrupt change is deemed to have occurred; that is, the current access amount has changed abruptly relative to the historical access amount. The abrupt change may be a rapid increase or a rapid decrease in the current access amount.
  • This embodiment treats a rapid increase in the current access amount as the abrupt state.
  • The first judging unit 20 determines whether the access amount satisfies the predetermined condition in order to decide whether cheating is suspected.
  • When the access amount increases rapidly, for example when the current day's access amount is far larger than the previous day's, the access amount of the target webpage is suspected of cheating.
  • Step S503: if the access amount satisfies the predetermined condition, the source code of the target webpage is obtained.
  • When the access amount satisfies the predetermined condition, the access source information of the target webpage is to be obtained; this first requires obtaining the source code of the target webpage, which is then used to collect the access source information. If the access amount does not satisfy the predetermined condition, the access amount of the target webpage so far can be considered free of cheating, and monitoring of whether it satisfies the predetermined condition continues.
  • Step S504: add detection code to the source code to obtain the access IP addresses of the target webpage.
  • The detection code detects the access source information of the target webpage, which here is the access IP address.
  • An access IP address is the IP address of a visitor; adding detection code to the source code makes it possible to obtain all access IP addresses of the target webpage. For example, when there are three visits to the target webpage, the detection code yields the visitor's IP address for each of the three visits; the three access IP addresses may be identical or different. The access IP addresses constitute the access source information of the target webpage.
  • Step S505: use the access IP addresses as the access source information.
  • A visitor's IP address identifies the source of a visit and shows that the target webpage was in fact accessed by the visitor with that IP address.
  • Using the access IP addresses as the access source information allows the access amount of the target webpage to be examined in more detail.
  • Step S506: obtain the first access amount of the first access IP address, where the first access IP address is the one among the access IP addresses that accesses the target webpage most often. The access IP addresses obtained through the detection code include multiple IP addresses, each of which contributes some access amount to the target webpage.
  • The first access IP address is the IP address of the visitor that accesses the target webpage the most. For example, when the detection code detects three IP addresses accessing the target webpage and one of them accesses it the most times, that IP address is the first access IP address.
  • The first access amount is the access amount contributed by the first access IP address; its share of the total access amount is larger than that of any other access IP address.
  • Step S507: calculate the ratio of the first access amount to the access amount.
  • Here the access amount is the total access amount of the target webpage; the ratio gives the share of the total contributed by the first access amount.
  • Step S508: determine whether the ratio of the first access amount to the access amount exceeds a third set threshold.
  • The third set threshold may be set as needed. For example, with a third set threshold of 0.5, this step determines whether the first access amount exceeds half of the total access amount.
  • Step S509: if the ratio of the first access amount to the access amount exceeds the third set threshold, obtain the access stay time of the first access IP address.
  • The access stay time is how long a visitor stays on the target webpage during a visit. Because the first access IP address has visited the target webpage many times, there are multiple access stay times.
  • Obtaining the access stay time of the first access IP address therefore means obtaining the stay time of each of its visits.
  • Step S510: determine whether the access stay time exceeds a fourth set threshold.
  • The fourth set threshold is a time value and can be set as needed. Because there are multiple access stay times, this step determines whether the stay time of each visit exceeds the fourth set threshold. For example, with a fourth set threshold of 3 s, it determines whether each access stay time of the first access IP address exceeds 3 s.
  • Step S511: if the access stay time does not exceed the fourth set threshold, it is determined that the access amount of the target webpage is the result of cheating.
  • Here "does not exceed" refers to the stay times of the multiple visits from the first access IP address: if the stay times of most of the visits in the first access amount do not exceed the fourth set threshold, the access amount of the target webpage is considered to be the result of cheating. For example, with a fourth set threshold of 3 s, if most of the visits in the first access amount last less than 3 s, most of that access amount consists of abnormal visits, very likely produced by some form of click inflation. Such behavior is contrary to normal use, so the access amount of the target webpage is considered to be the result of cheating.
  • Step S512: if the access stay time exceeds the fourth set threshold, it is determined that the access amount of the target webpage is not the result of cheating. Likewise, if the stay times of most of the visits in the first access amount exceed the fourth set threshold, the first access amount consists of normal visits, and the access amount of the target webpage can be considered free of cheating.
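The stay-time test of steps S509 to S512 can be sketched as below. The 3 s threshold is the example from the description; the 60% figure is the example the description of the apparatus gives for the predetermined proportion that defines "most". Function names are hypothetical, and the sketch assumes one stay time in seconds per visit from the first access IP address.

```python
def short_stay_fraction(stay_seconds, fourth_threshold=3.0):
    """Fraction of visits whose stay time does not exceed the fourth set threshold."""
    short = sum(1 for seconds in stay_seconds if seconds <= fourth_threshold)
    return short / len(stay_seconds)


def is_cheating_by_stay_time(stay_seconds, fourth_threshold=3.0, majority=0.6):
    """True when more than `majority` of the visits are short stays (S511)."""
    if not stay_seconds:
        return False  # no visits recorded, nothing to judge
    return short_stay_fraction(stay_seconds, fourth_threshold) > majority
```

The description leaves open what to conclude when neither short nor long stays reach the predetermined proportion; this sketch simply reports no cheating in that case, which is a design choice rather than something the source prescribes.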
  • FIG. 12 is a flowchart of a method for detecting cheating on web page access amount according to a sixth embodiment of the present invention.
  • The method of this embodiment can serve as a preferred implementation of the detection method of the above embodiments.
  • As shown in FIG. 12, the method includes the following steps:
  • Step S601: obtain the source code of the target webpage.
  • The source code can be fetched by a crawler program or obtained in other ways. The source code reveals the structure of the target webpage, which makes the webpage easier to inspect.
  • Step S602: detect whether the source code contains an inline frame (iframe) of size 0×0 or 1×1. An iframe of size 0×0 or 1×1 is invisible to the user. Opening other pages through such an iframe makes the user load webpages he or she did not intend to visit, inflating traffic or access amount without the user seeing it. An analysis program can be written to check the source code for an iframe of size 0×0 or 1×1.
  • Step S603: if no such iframe exists in the source code, the access amount of the target webpage is obtained.
  • In that case the next determination is made from the access amount of the target webpage. If such an iframe does exist in the source code, it is determined that the access amount of the target webpage is the result of cheating: because a 0×0 or 1×1 iframe is used to obtain access amount fraudulently, without the visitor's knowledge, detecting one in the source code of the target webpage is enough to conclude that cheating has taken place.
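The "analysis program" mentioned in step S602 could be as small as the sketch below, which uses Python's standard `html.parser` module. It is an illustration under stated assumptions: it only recognizes sizes given through `width`/`height` attributes or an inline `style`, so iframes sized by external CSS or created by scripts would be missed. All names are hypothetical.

```python
import re
from html.parser import HTMLParser

HIDDEN_SIZES = {(0, 0), (1, 1)}


def _to_pixels(value):
    """Parse '0', '1px' and similar into an int; return None otherwise (e.g. '100%')."""
    if value is None:
        return None
    match = re.fullmatch(r"\s*(\d+)\s*(px)?\s*", value, flags=re.IGNORECASE)
    return int(match.group(1)) if match else None


def _style_dimension(style, name):
    """Read an integer pixel value for `name` ('width' or 'height') from inline CSS."""
    pattern = rf"(?:^|;)\s*{name}\s*:\s*(\d+)\s*(?:px)?\s*(?:;|$)"
    match = re.search(pattern, style, flags=re.IGNORECASE)
    return int(match.group(1)) if match else None


class _IframeFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        if tag != "iframe":
            return
        attr = dict(attrs)
        style = attr.get("style") or ""
        width = _to_pixels(attr.get("width"))
        height = _to_pixels(attr.get("height"))
        if width is None:
            width = _style_dimension(style, "width")
        if height is None:
            height = _style_dimension(style, "height")
        if (width, height) in HIDDEN_SIZES:
            self.found = True


def has_hidden_iframe(source_code: str) -> bool:
    """Return True when the page source contains a 0x0 or 1x1 iframe."""
    finder = _IframeFinder()
    finder.feed(source_code)
    return finder.found
```

When `has_hidden_iframe` returns True the method concludes cheating directly; when it returns False the method proceeds to obtain the access amount as in step S603.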
  • Step S604: determine whether the access amount satisfies a predetermined condition.
  • Step S605: if the access amount satisfies the predetermined condition, obtain the access source information of the target webpage.
  • Step S606: determine, based on the access source information, whether the access amount of the target webpage is the result of cheating.
  • Steps S603, S604, S605, and S606 are the same as steps S101, S102, S103, and S104 of the detection method shown in FIG. 7 and are not described again here.

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Finance (AREA)
  • Strategic Management (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Game Theory and Decision Science (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

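The present invention discloses a method and an apparatus for detecting cheating on web page access amount. The method includes: obtaining the access amount of a target webpage; determining whether the access amount satisfies a predetermined condition; if the access amount satisfies the predetermined condition, obtaining access source information of the target webpage; and determining, according to the access source information, whether the access amount of the target webpage is the result of cheating. By determining whether the obtained access amount of the target webpage satisfies the preset condition, the access amount is deemed to involve cheating when the condition is satisfied. The invention solves the problem of inaccurate identification of cheating on web page access amount and thereby achieves accurate identification of such cheating for the target webpage.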

Description

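Method and Apparatus for Detecting Cheating on Web Page Access Amount
Technical Field
The present invention relates to the field of the Internet, and in particular to a method and an apparatus for detecting cheating on web page access amount.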
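Background
As more and more advertisers choose the Internet for advertising, spending on online advertising grows year by year, and quantitative evaluation of the effect of Internet advertising, together with authoritative third-party measurement, has become a firm requirement of advertisers. Unlike the traditional media industry, however, the Internet advertising industry has a higher technical threshold, more complex data structures, more dimensions of evaluation metrics, and more demanding delivery technology. As a result, when Internet advertising cheating occurs it is hard to identify, which harms advertisers' interests.
Some of the terms used above are introduced below:
Internet advertising cheating: cheating carried out by a media outlet (a website such as Sina, acting as the publisher that delivers the advertising campaign) in order to inflate advertising traffic.
Advertiser: the party that releases an advertising campaign, that is, a merchant that sells or promotes its own products and services online and that supplies the advertisements in affiliate marketing. Any merchant that promotes or sells its products or services can be an advertiser. The advertiser releases the campaign and pays the publisher according to the total number of marketing results specified in the campaign that the publisher completes and the price per result.
At present, click cheating is common in the bid-based advertising and search-ranking services run by web search providers. Industry insiders estimate that more than 20% of all clicks on search-engine advertisements are fictitious. Click cheating methods generally fall into two classes, "automatic" and "manual". The former typically uses "robots" (scripts that automatically perform a loop of clicks and page refreshes) to click continuously on banner advertisements appearing on websites and search result pages. The latter relies on sheer manpower, hiring cheap labor at low cost to click manually on all kinds of advertising links. This approach, which is hard to detect by technical means, is on the rise, and some widely publicized cases of cheating in online polls were in fact related to it.
Embedding an inline frame (iframe) in a webpage is the most common technique for Internet advertising cheating. The cheater typically embeds in its own webpage an iframe of size 0×0 or 1×1, that is, an iframe invisible to the user. Other pages are opened through the iframe, so that the user opens webpages he or she did not intend to visit, and traffic is inflated without the user seeing it. Traditional anti-cheating methods have difficulty identifying cheating that relies on manual clicking or embedded iframes, so click cheating is hard to curb.
Internet advertising cheating ultimately comes down to publishers cheating to inflate access amounts, so detection by an authoritative third-party measurement body of access-amount inflation on advertising webpages can effectively protect advertisers' interests. In the prior art, however, there are few solutions capable of identifying cheating on web page access amount.
No effective solution has yet been proposed for the problem in the prior art of inaccurate identification of cheating on web page access amount.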
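Summary of the Invention
The main object of the present invention is to provide a method and an apparatus for detecting cheating on web page access amount, so as to solve the problem in the prior art of inaccurate identification of such cheating.
To achieve the above object, according to one aspect of the present invention, a method for detecting cheating on web page access amount is provided. The method includes: obtaining the access amount of a target webpage; determining whether the access amount satisfies a predetermined condition; if the access amount satisfies the predetermined condition, obtaining access source information of the target webpage; and determining, according to the access source information, whether the access amount of the target webpage is the result of cheating.
Further, obtaining the access amount of the target webpage includes obtaining a historical access amount and a current access amount of the target webpage, and determining whether the access amount satisfies the predetermined condition includes: obtaining the ratio of the historical access amount and the current access amount; determining whether the ratio exceeds a first set threshold; if the ratio exceeds the first set threshold, determining that the access amount satisfies the predetermined condition; and if the ratio does not exceed the first set threshold, determining that the access amount does not satisfy the predetermined condition.
Further, obtaining the access amount of the target webpage includes obtaining a historical access amount and a current access amount of the target webpage, and determining whether the access amount satisfies the predetermined condition includes: obtaining the difference between the historical access amount and the current access amount; determining whether the difference exceeds a second set threshold; if the difference exceeds the second set threshold, determining that the access amount satisfies the predetermined condition; and if the difference does not exceed the second set threshold, determining that the access amount does not satisfy the predetermined condition.
Further, obtaining the access source information of the target webpage includes: obtaining the source code of the target webpage; adding detection code to the source code to obtain the access IP addresses of the target webpage; and using the access IP addresses as the access source information. Determining, according to the access source information, whether the access amount of the target webpage is the result of cheating includes: obtaining a first access amount of a first access IP address, the first access IP address being the one among the access IP addresses that accesses the target webpage most often; calculating the ratio of the first access amount to the access amount; determining whether this ratio exceeds a third set threshold; if the ratio exceeds the third set threshold, determining that the access amount of the target webpage is the result of cheating; and if the ratio does not exceed the third set threshold, determining that the access amount of the target webpage is not the result of cheating.
Further, determining that the access amount of the target webpage is the result of cheating includes: obtaining the access stay time of the first access IP address; determining whether the access stay time exceeds a fourth set threshold; if the access stay time does not exceed the fourth set threshold, determining that the access amount of the target webpage is the result of cheating; and if the access stay time exceeds the fourth set threshold, determining that the access amount of the target webpage is not the result of cheating.
Further, before obtaining the access amount of the target webpage, the method also includes: obtaining the source code of the target webpage; detecting whether the source code contains an inline frame (iframe) of size 0×0 or 1×1; and, if no such iframe exists in the source code, obtaining the access amount of the target webpage.
To achieve the above object, according to another aspect of the present invention, an apparatus for detecting cheating on web page access amount is provided. The apparatus includes: a first obtaining unit configured to obtain the access amount of a target webpage; a first judging unit configured to determine whether the access amount satisfies a predetermined condition; a second obtaining unit configured to obtain access source information of the target webpage when the access amount satisfies the predetermined condition; and a second judging unit configured to determine, according to the access source information, whether the access amount of the target webpage is the result of cheating.
Further, the first obtaining unit is also configured to obtain a historical access amount and a current access amount of the target webpage, and the first judging unit includes: a first obtaining module configured to obtain the ratio of the historical access amount and the current access amount; a first judging module configured to determine whether the ratio exceeds a first set threshold; and a first determining module configured to determine that the access amount satisfies the predetermined condition when the ratio exceeds the first set threshold and that it does not satisfy the predetermined condition when the ratio does not exceed the first set threshold.
Further, the first obtaining unit is also configured to obtain a historical access amount and a current access amount of the target webpage, and the first judging unit includes: a second obtaining module configured to obtain the difference between the historical access amount and the current access amount; a second judging module configured to determine whether the difference exceeds a second set threshold; and a second determining module configured to determine that the access amount satisfies the predetermined condition when the difference exceeds the second set threshold and that it does not satisfy the predetermined condition when the difference does not exceed the second set threshold.
Further, the second obtaining unit includes: a third obtaining module configured to obtain the source code of the target webpage; a fourth obtaining module configured to add detection code to the source code to obtain the access IP addresses of the target webpage; and a generating module configured to use the access IP addresses as the access source information. The second judging unit includes: a fifth obtaining module configured to obtain a first access amount of a first access IP address, the first access IP address being the one among the access IP addresses that accesses the target webpage most often; a calculating module configured to calculate the ratio of the first access amount to the access amount; a third judging module configured to determine whether this ratio exceeds a third set threshold; and a third determining module configured to determine that the access amount of the target webpage is the result of cheating when the ratio exceeds the third set threshold and that it is not the result of cheating when the ratio does not exceed the third set threshold.
Further, the third determining module includes: an obtaining submodule configured to obtain the access stay time of the first access IP address; a judging submodule configured to determine whether the access stay time exceeds a fourth set threshold; and a determining submodule configured to determine that the access amount of the target webpage is the result of cheating when the access stay time does not exceed the fourth set threshold and that it is not the result of cheating when the access stay time exceeds the fourth set threshold.
Further, the apparatus also includes: a third obtaining unit configured to obtain the source code of the target webpage before the access amount of the target webpage is obtained; a detecting unit configured to detect whether the source code contains an inline frame (iframe) of size 0×0 or 1×1; and a determining unit configured to obtain the access amount of the target webpage when no such iframe exists in the source code.
According to the present invention, the detection method includes: obtaining the access amount of a target webpage; determining whether the access amount satisfies a predetermined condition; if the access amount satisfies the predetermined condition, obtaining access source information of the target webpage; and determining, according to the access source information, whether the access amount of the target webpage is the result of cheating. When the obtained access amount satisfies the preset condition, the access amount of the target webpage is deemed to be suspected of cheating; the access source information of the target webpage is then obtained and used to decide whether cheating has in fact occurred. Analyzing and judging the source information of the target webpage improves the precision of detection, solves the problem of inaccurate identification of cheating on web page access amount, and thereby achieves accurate identification of such cheating for the target webpage.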
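Brief Description of the Drawings
The accompanying drawings, which form a part of this application, are provided for further understanding of the present invention. The illustrative embodiments of the present invention and their description serve to explain the invention and do not unduly limit it. In the drawings:
FIG. 1 is a schematic structural diagram of an apparatus for detecting cheating on web page access amount according to a first embodiment of the present invention;
FIG. 2 is a schematic structural diagram of an apparatus for detecting cheating on web page access amount according to a second embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an apparatus for detecting cheating on web page access amount according to a third embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an apparatus for detecting cheating on web page access amount according to a fourth embodiment of the present invention;
FIG. 5 is a schematic structural diagram of an apparatus for detecting cheating on web page access amount according to a fifth embodiment of the present invention;
FIG. 6 is a schematic structural diagram of an apparatus for detecting cheating on web page access amount according to a sixth embodiment of the present invention;
FIG. 7 is a flowchart of a method for detecting cheating on web page access amount according to a first embodiment of the present invention;
FIG. 8 is a flowchart of a method for detecting cheating on web page access amount according to a second embodiment of the present invention;
FIG. 9 is a flowchart of a method for detecting cheating on web page access amount according to a third embodiment of the present invention;
FIG. 10 is a flowchart of a method for detecting cheating on web page access amount according to a fourth embodiment of the present invention;
FIG. 11 is a flowchart of a method for detecting cheating on web page access amount according to a fifth embodiment of the present invention; and
FIG. 12 is a flowchart of a method for detecting cheating on web page access amount according to a sixth embodiment of the present invention.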
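Detailed Description
It should be noted that, where no conflict arises, the embodiments in this application and the features in those embodiments may be combined with one another. The present invention is described in detail below with reference to the drawings and in conjunction with the embodiments.
An embodiment of the present invention provides an apparatus for detecting cheating on web page access amount; the apparatus performs its functions by means of a computer device.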
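FIG. 1 is a schematic structural diagram of an apparatus for detecting cheating on web page access amount according to a first embodiment of the present invention. As shown in FIG. 1, the apparatus includes a first obtaining unit 10, a first judging unit 20, a second obtaining unit 30, and a second judging unit 40. The first obtaining unit 10 is configured to obtain the access amount of the target webpage; the access amount it obtains is the total access amount of the target webpage. The target webpage is the webpage whose access amount is to be checked for cheating. It may be any webpage of any website, for example a webpage on which an advertiser places advertisements or a product page that the advertiser is marketing. For example, when the target webpage carries the advertiser's advertisements, obtaining its access amount reveals how many times those advertisements were viewed. The access amount may be visit traffic or click count. It may be a historical access amount, which is the access amount of the target webpage over a certain period in the past; a current access amount, which is the access amount of the target webpage over a certain current period; or both. The first obtaining unit 10 may obtain the access amount by adding detection code to the target webpage to record access information such as visit traffic or click count, or by reading that information directly from the log file of the target webpage.
The first judging unit 20 is configured to determine whether the access amount satisfies a predetermined condition, using the access amount of the target webpage obtained by the first obtaining unit 10 as the basis for the determination. The predetermined condition may be a pattern of change in the access amount. For example, it may be a threshold that marks an abrupt change: when the access amount exceeds the threshold, it is considered to satisfy the predetermined condition and an abrupt change is deemed to have occurred, that is, the current access amount has changed abruptly relative to the historical access amount. The abrupt change may be a rapid increase or a rapid decrease in the current access amount; this embodiment treats a rapid increase as the abrupt state. The first judging unit 20 determines whether the access amount satisfies the predetermined condition in order to decide whether cheating is suspected. When the access amount increases rapidly, for example when the current day's access amount is far larger than the previous day's, the access amount of the target webpage is suspected of cheating.
The second obtaining unit 30 is configured to obtain the access source information of the target webpage when the access amount satisfies the predetermined condition. When the access amount of the target webpage satisfies the predetermined condition, it is deemed to be suspected of cheating, and the second obtaining unit 30 then obtains the access source information of the target webpage. The access source information may be the visitor's IP (Internet Protocol) address or the access path information; for example, a visit may reach the target webpage through a hyperlink on another webpage. By adding detection code to the source code of the target webpage, the second obtaining unit 30 can obtain the access path information of a visit as well as the visitor's IP address. Obtaining the access source information makes it possible to judge whether the access amount of the target webpage is the result of cheating.
The second judging unit 40 is configured to determine, according to the access source information, whether the access amount of the target webpage is the result of cheating. Because the access amount is at this point only suspected of cheating, the access source information obtained for the target webpage is used to decide. For example, if the access paths in most of the access source information come from non-mainstream websites or a website that few people visit (that is, visitors reached the target webpage through links on such websites), or from the target webpage itself, the access amount was very likely inflated by cheating, either through links placed on such obscure websites or by repeatedly refreshing the target webpage. The likelihood of cheating is high, and the access amount of the target webpage can be determined to be the result of cheating.
According to this embodiment of the present invention, it is determined whether the access amount of the target webpage obtained by the first obtaining unit 10 satisfies the preset condition. When it does, the access amount is deemed to be suspected of cheating, and the access source information of the target webpage is obtained and used to decide whether cheating has in fact occurred. Analyzing and judging the source information of the target webpage improves the precision of detection and achieves accurate identification of cheating on the access amount of the target webpage.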
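FIG. 2 is a schematic structural diagram of an apparatus for detecting cheating on web page access amount according to a second embodiment of the present invention. The apparatus of this embodiment can serve as a preferred implementation of the above embodiment. As shown in FIG. 2, the apparatus includes a first obtaining unit 10, a first judging unit 20, a second obtaining unit 30, and a second judging unit 40, where the first judging unit 20 includes a first obtaining module 201, a first judging module 202, and a first determining module 203. The second obtaining unit 30 and the second judging unit 40 have the same functions as those shown in FIG. 1 and are not described again here.
The first obtaining unit 10 is also configured to obtain the historical access amount and the current access amount of the target webpage, both of which are access amounts of the target webpage. The historical access amount is the access amount of the target webpage within one past unit of time, and the current access amount is the access amount within the current unit of time, the two units of time being of the same length. For example, with one day as the unit of time, the current access amount may be the access amount of the target webpage on the current day and the historical access amount that of the previous day. Both can be obtained by, for example, adding detection code to the source code of the target webpage.
The first obtaining module 201 is configured to obtain the ratio of the historical access amount and the current access amount by comparing the two. For example, if the current access amount of the target webpage is that of the current day, the historical access amount may be that of the previous day. The access amount may be visit traffic or click count, and the corresponding quantities are compared to yield a ratio. The ratio may be the current access amount divided by the historical access amount, the historical access amount divided by the current access amount, or the proportion by which the current access amount exceeds the historical access amount. The ratio shows the trend of the access amount. For example, when the ratio is the current access amount divided by the historical access amount, a ratio greater than 1 means that the current access amount is larger than the historical access amount, and the larger the ratio, the more sharply the current access amount has surged.
The first judging module 202 is configured to determine whether the ratio exceeds a first set threshold, which can be set according to the actual situation. For example, when the ratio is the current access amount divided by the historical access amount, the first set threshold may be set to 1.5, in which case the determination is whether the current access amount exceeds 1.5 times the historical access amount; it may also be set to 2, in which case the determination is whether the current access amount exceeds twice the historical access amount. When the ratio is the proportion by which the current access amount exceeds the historical access amount, the first set threshold may be set to 30%, in which case the determination is whether the growth rate of the current access amount relative to the historical access amount exceeds 30%.
The first determining module 203 is configured to determine that the access amount satisfies the predetermined condition when the ratio exceeds the first set threshold and that it does not satisfy the predetermined condition when the ratio does not exceed the first set threshold. When the ratio exceeds the first set threshold, an alarm is raised, the access amount is determined to satisfy the preset condition, and the access source information of the target webpage is obtained. For example, when the ratio is the current access amount divided by the historical access amount and the first set threshold is 1.5, a ratio above 1.5 means that the access amount satisfies the predetermined condition: the current access amount shows an abrupt change or rapid increase, cheating is suspected, and the next stage of analysis, obtaining the access source information, is carried out. When the ratio is the proportion by which the current access amount exceeds the historical access amount and the first set threshold is 30%, a growth rate above 30% likewise means that the access amount satisfies the predetermined condition, cheating is suspected, and the next stage of analysis is carried out. When the ratio does not exceed the first set threshold, for instance when it does not exceed 1.5 in the example above, the access amount does not satisfy the predetermined condition, no abnormality has occurred, and the access amount of the target webpage can be determined not to be the result of cheating.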
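The two ratio variants described for modules 201 to 203 can be sketched as follows. The thresholds 1.5 and 30% are the examples given in the description; the function names are hypothetical, and the sketch assumes a positive historical access amount so that the division is defined.

```python
def visit_ratio(current: int, historical: int) -> float:
    """Current access amount divided by historical access amount."""
    if historical <= 0:
        raise ValueError("historical access amount must be positive")
    return current / historical


def growth_rate(current: int, historical: int) -> float:
    """Proportion by which the current access amount exceeds the historical one."""
    if historical <= 0:
        raise ValueError("historical access amount must be positive")
    return (current - historical) / historical


def meets_ratio_condition(current: int, historical: int, first_threshold: float = 1.5) -> bool:
    """True when current/historical exceeds the first set threshold."""
    return visit_ratio(current, historical) > first_threshold


def meets_growth_condition(current: int, historical: int, first_threshold: float = 0.30) -> bool:
    """True when the growth rate exceeds the first set threshold."""
    return growth_rate(current, historical) > first_threshold
```

The two variants are not interchangeable for a given threshold value: a ratio threshold of 1.5 corresponds to a growth-rate threshold of 50%, not 30%, so the threshold has to be chosen for the variant in use.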
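FIG. 3 is a schematic structural diagram of an apparatus for detecting cheating on web page access amount according to a third embodiment of the present invention. The apparatus of this embodiment can serve as a preferred implementation of the above embodiment. As shown in FIG. 3, the apparatus includes a first obtaining unit 10, a first judging unit 20, a second obtaining unit 30, and a second judging unit 40, where the first judging unit 20 includes a second obtaining module 204, a second judging module 205, and a second determining module 206. The second obtaining unit 30 and the second judging unit 40 have the same functions as those shown in FIG. 1 and are not described again here.
The first obtaining unit 10 is also configured to obtain the historical access amount and the current access amount of the target webpage, both of which are access amounts of the target webpage. The historical access amount is the access amount of the target webpage within one past unit of time, and the current access amount is the access amount within the current unit of time, the two units of time being of the same length. For example, with one day as the unit of time, the current access amount may be the access amount of the target webpage on the current day and the historical access amount that of the previous day. Both can be obtained by, for example, adding detection code to the source code of the target webpage.
The second obtaining module 204 is configured to obtain the difference between the historical access amount and the current access amount by subtracting one from the other. For example, if the current access amount of the target webpage is that of the current day, the historical access amount may be that of the previous day. The access amount may be visit traffic or click count, and the corresponding quantities are subtracted to yield a difference. The difference may be the current access amount minus the historical access amount, or the historical access amount minus the current access amount. The difference shows the trend of the access amount. For example, when the difference is the current access amount minus the historical access amount, a positive difference means that the current access amount is larger than the historical access amount, and the larger the difference, the more sharply the current access amount has surged.
The second judging module 205 is configured to determine whether the difference exceeds a second set threshold, which can be set according to the actual situation. For example, when the difference is the current access amount minus the historical access amount, the determination is whether the amount by which the current access amount exceeds the historical access amount is greater than the second set threshold.
The second determining module 206 is configured to determine that the access amount satisfies the predetermined condition when the difference exceeds the second set threshold and that it does not satisfy the predetermined condition when the difference does not exceed the second set threshold. A difference exceeding the second set threshold means that the amount by which the current access amount exceeds the historical access amount is greater than the second set threshold. In that case an alarm is raised, the access amount is determined to satisfy the preset condition, and step S306 is executed. A difference above the second set threshold indicates an abrupt change or rapid increase in the current access amount, so cheating is suspected and the next stage of analysis, obtaining the access source information, is carried out. When the difference does not exceed the second set threshold, no abnormality has occurred in the access amount, and the access amount of the target webpage can be determined not to be the result of cheating.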
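FIG. 4 is a schematic structural diagram of an apparatus for detecting cheating on web page access amount according to a fourth embodiment of the present invention. The apparatus of this embodiment can serve as a preferred implementation of the above embodiment. As shown in FIG. 4, the apparatus includes a first obtaining unit 10, a first judging unit 20, a second obtaining unit 30, and a second judging unit 40, where the second obtaining unit 30 includes a third obtaining module 301, a fourth obtaining module 302, and a generating module 303, and the second judging unit 40 includes a fifth obtaining module 401, a calculating module 402, a third judging module 403, and a third determining module 404. The first obtaining unit 10 and the first judging unit 20 have the same functions as those shown in FIG. 1 and are not described again here.
The third obtaining module 301 is configured to obtain the source code of the target webpage. When the access amount satisfies the predetermined condition, the second obtaining unit 30 obtains the access source information of the target webpage; to do so, the source code of the target webpage is first obtained by the third obtaining module 301, and that source code is then used to collect the access source information.
The fourth obtaining module 302 is configured to add detection code to the source code to obtain the access IP addresses of the target webpage. The detection code detects the access source information of the target webpage, which here is the access IP address. An access IP address is the IP address of a visitor, and adding detection code to the source code makes it possible to obtain all access IP addresses of the target webpage. For example, when there are three visits to the target webpage, the detection code yields the visitor's IP address for each of the three visits; the three access IP addresses may be identical or different.
The generating module 303 is configured to use the access IP addresses as the access source information. A visitor's IP address identifies the source of a visit and shows that the target webpage was in fact accessed by the visitor with that IP address. Using the access IP addresses as the access source information allows the access amount of the target webpage to be examined in more detail.
The fifth obtaining module 401 is configured to obtain the first access amount of the first access IP address, where the first access IP address is the one among the access IP addresses that accesses the target webpage most often. The access IP addresses obtained through the detection code include multiple IP addresses, each of which contributes some access amount to the target webpage. The first access IP address is the IP address of the visitor that accesses the target webpage the most. For example, when the detection code detects three IP addresses accessing the target webpage and one of them accesses it the most times, that IP address is the first access IP address. The first access amount is the access amount contributed by the first access IP address; its share of the total access amount is larger than that of any other access IP address.
The calculating module 402 is configured to calculate the ratio of the first access amount to the access amount. Here the access amount is the total access amount of the target webpage; the ratio gives the share of the total contributed by the first access amount.
The third judging module 403 is configured to determine whether the ratio of the first access amount to the access amount exceeds a third set threshold. The third set threshold may be set as needed. For example, with a third set threshold of 0.5, the determination is whether the first access amount exceeds half of the total access amount.
The third determining module 404 is configured to determine that the access amount of the target webpage is the result of cheating when the ratio of the first access amount to the access amount exceeds the third set threshold, and that it is not the result of cheating when the ratio does not exceed the third set threshold. As above, with a third set threshold of 0.5, a ratio above 0.5 means that the first access amount exceeds half of the total access amount; the access amount of the target webpage can then be considered to have been achieved by cheating, the likelihood of cheating being relatively high. A ratio not above 0.5 means that the first access amount does not exceed half of the total access amount; the access amount of the target webpage can then be considered normal, and it can essentially be determined that no cheating has occurred.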
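FIG. 5 is a schematic structural diagram of an apparatus for detecting cheating on web page access amount according to a fifth embodiment of the present invention. The apparatus of this embodiment can serve as a preferred implementation of the above embodiment. As shown in FIG. 5, the apparatus includes a first obtaining unit 10, a first judging unit 20, a second obtaining unit 30, and a second judging unit 40, where the second obtaining unit 30 includes a third obtaining module 301, a fourth obtaining module 302, and a generating module 303; the second judging unit 40 includes a fifth obtaining module 401, a calculating module 402, a third judging module 403, and a third determining module 404; and the third determining module 404 includes an obtaining submodule 4041, a judging submodule 4042, and a determining submodule 4043. The first obtaining unit 10, the first judging unit 20, and the second obtaining unit 30 have the same functions as those shown in FIG. 4, as do the fifth obtaining module 401, the calculating module 402, and the third judging module 403 of the second judging unit 40; they are not described again here.
The obtaining submodule 4041 is configured to obtain the access stay time of the first access IP address. The access stay time is how long a visitor stays on the target webpage during a visit. Because the first access IP address has visited the target webpage many times, there are multiple access stay times, and obtaining the access stay time of the first access IP address means obtaining the stay time of each of its visits.
The judging submodule 4042 is configured to determine whether the access stay time exceeds a fourth set threshold. The fourth set threshold is a time value and can be set as needed. Because there are multiple access stay times, the determination is whether the stay time of each visit exceeds the fourth set threshold. For example, with a fourth set threshold of 3 s, it is determined whether each access stay time of the first access IP address exceeds 3 s.
The determining submodule 4043 is configured to determine that the access amount of the target webpage is the result of cheating when the access stay time does not exceed the fourth set threshold, and that it is not the result of cheating when the access stay time exceeds the fourth set threshold. Here "does not exceed" refers to the stay times of the multiple visits from the first access IP address: if the stay times of most of the visits in the first access amount do not exceed the fourth set threshold, the access amount of the target webpage is considered to be the result of cheating. For example, with a fourth set threshold of 3 s, if most of the visits in the first access amount last less than 3 s, most of that access amount consists of abnormal visits, very likely produced by some form of click inflation; such behavior is contrary to normal use, so the access amount of the target webpage is considered to be the result of cheating. Likewise, if the stay times of most of the visits in the first access amount exceed the fourth set threshold, the first access amount consists of normal visits, and the access amount of the target webpage can be considered free of cheating. In this embodiment of the present invention, "most of the visits" may mean the visits that make up more than a predetermined proportion of the access amount; for example, the predetermined proportion may be 60%.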
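FIG. 6 is a schematic structural diagram of an apparatus for detecting cheating on web page access amount according to a sixth embodiment of the present invention. The apparatus of this embodiment can serve as a preferred implementation of the above embodiment. As shown in FIG. 6, the apparatus includes a first obtaining unit 10, a first judging unit 20, a second obtaining unit 30, a second judging unit 40, a third obtaining unit 50, a detecting unit 60, and a determining unit 70. The first obtaining unit 10, the first judging unit 20, the second obtaining unit 30, and the second judging unit 40 have the same functions as those shown in FIG. 1 and are not described again here.
The third obtaining unit 50 is configured to obtain the source code of the target webpage before the access amount of the target webpage is obtained. The source code can be fetched by a crawler program or obtained in other ways. The source code reveals the structure of the target webpage, which makes the webpage easier to inspect.
The detecting unit 60 is configured to detect whether the source code contains an inline frame (iframe) of size 0×0 or 1×1. An iframe of size 0×0 or 1×1 is invisible to the user. Opening other pages through such an iframe makes the user load webpages he or she did not intend to visit, inflating traffic or access amount without the user seeing it. An analysis program can be written to check the source code for an iframe of size 0×0 or 1×1.
The determining unit 70 is configured to obtain the access amount of the target webpage when no such iframe exists in the source code. Because a 0×0 or 1×1 iframe is used to obtain access amount fraudulently, without the visitor's knowledge, detecting one in the source code of the target webpage is enough to conclude that cheating has taken place and that the access amount of the target webpage is the result of cheating. When no such iframe exists in the source code, the next determination is made from the access amount of the target webpage.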
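Obviously, those skilled in the art should understand that the modules or steps of the present invention described above can be implemented with general-purpose computing devices. They can be concentrated on a single computing device or distributed over a network of multiple computing devices. Optionally, they can be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; alternatively, they can each be made into a separate integrated circuit module, or several of the modules or steps can be made into a single integrated circuit module. The present invention is thus not limited to any specific combination of hardware and software.
An embodiment of the present invention also provides a method for detecting cheating on web page access amount. The method can run on a computer device. It should be noted that the method of the embodiments of the present invention can be performed by the apparatus of the embodiments of the present invention, and that the apparatus can likewise be used to perform the method.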
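Fig. 7 is a flowchart of a method for detecting web page visit-volume cheating according to a first embodiment of the present invention. As shown in Fig. 7, the method includes the following steps: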
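Step S101: obtain the visit volume of the target web page. The visit volume obtained is the total visit volume of the target web page. The target web page is the page to be checked for visit-volume cheating. It may be any page of any website, for example a page on which an advertiser places advertisements or a product page that an advertiser is marketing. When the target web page carries an advertiser's advertisement, for instance, its visit volume indicates how many times the advertisement was viewed. The visit volume may be visit traffic or a click count. It may be a historical visit volume, meaning the visit volume of the target web page over a past period, a current visit volume, meaning the visit volume over the current period, or both. The first acquisition unit 10 may obtain the visit volume by adding detection code to the target web page to measure visit traffic, click counts or similar information, or by reading such information directly from the log files of the target web page.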
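Step S102: judge whether the visit volume satisfies a predetermined condition. The first judging unit 20 takes the visit volume obtained by the first acquisition unit 10 as the basis for the judgment. The predetermined condition may describe how the visit volume changes. For example, it may be a threshold that marks an abrupt change: when the visit volume exceeds the threshold, the visit volume is considered to satisfy the predetermined condition and an abrupt change is deemed to have occurred, that is, the current visit volume has changed abruptly relative to the historical visit volume. The abrupt change may be a rapid increase or a rapid decrease in the current visit volume. This embodiment takes a rapid increase in the current visit volume as the abrupt change. The first judging unit 20 judges whether the visit volume satisfies the predetermined condition in order to decide whether cheating is suspected. When the visit volume rises sharply, for example when the visit volume of the current day is far greater than that of the previous day, the visit volume of the target web page can be regarded as suspected of cheating.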
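Step S103: if the visit volume satisfies the predetermined condition, obtain the visit source information of the target web page. When the visit volume satisfies the predetermined condition, the visit volume of the target web page is regarded as suspected of cheating, and the second acquisition unit 30 obtains the visit source information of the target web page. The visit source information may be the visitor's IP (Internet Protocol) address or the path information of the visit. A given visit may, for example, have reached the target web page through a hyperlink on another page. By adding detection code to the source code of the target web page, the URL of the referring page can be obtained, as can the visitor's IP address. The visit source information is obtained so that it can be judged whether the visit volume of the target web page involves cheating. If the visit volume does not satisfy the predetermined condition, the visit volume of the target web page so far can be considered free of cheating, and the check of whether the visit volume satisfies the predetermined condition continues.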
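Step S104: judge, according to the visit source information, whether the visit volume of the target web page involves cheating. Because the visit volume is at this point only suspected of cheating, the visit source information is used to decide whether cheating has actually occurred. For example, if most of the obtained visit source information points to a single non-mainstream or rarely visited website, or to the target web page itself, the visit volume of the target web page has in all likelihood been inflated by cheating, either through inbound links from non-mainstream or rarely visited websites or through repeated refreshing of the target web page. The likelihood of cheating is high, and the visit volume of the target web page can be determined to involve cheating.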
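According to this embodiment of the present invention, it is judged whether the visit volume of the target web page obtained by the first acquisition unit 10 satisfies the predetermined condition. When it does, the visit volume of the target web page is regarded as suspected of cheating, the visit source information of the target web page is obtained, and the visit source information is used to judge whether the visit volume actually involves cheating. Analyzing the source information of the target web page improves the precision of cheating detection and achieves accurate identification of visit-volume cheating on the target web page.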
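The source-based judgment of step S104 can be illustrated with a short Python sketch. Everything below is an assumption made for illustration: the text does not define how a "non-mainstream" website is recognized, so the sketch uses a caller-supplied set `known_hosts` of referrer hosts regarded as ordinary, reads "most" as a share above `majority`, and treats a referrer equal to the target URL as a visit originating from the target web page itself. The names `sources_indicate_cheating` and `_source_key` are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit

SELF = "<self>"  # marker for visits referred by the target web page itself


def _source_key(referrer, target_url):
    """Group a referrer URL by host, or mark it as a self-referral."""
    if referrer.rstrip("/") == target_url.rstrip("/"):
        return SELF
    return urlsplit(referrer).hostname


def sources_indicate_cheating(referrers, target_url, known_hosts, majority=0.5):
    """Return True when most visits come from one unfamiliar site or from the page itself."""
    if not referrers:
        return False
    counts = Counter(_source_key(r, target_url) for r in referrers)
    key, count = counts.most_common(1)[0]
    if count / len(referrers) <= majority:
        # No single source dominates the visits.
        return False
    return key == SELF or key not in known_hosts
```

With eight of ten visits referred by a host outside `known_hosts`, the function reports cheating, while a dominant host that appears in `known_hosts` does not trigger it.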
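Fig. 8 is a flowchart of a method for detecting web page visit-volume cheating according to a second embodiment of the present invention. The method of this embodiment may serve as a preferred implementation of the method of the foregoing embodiment. As shown in Fig. 8, the method includes the following steps: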
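Step S201: obtain the historical visit volume and the current visit volume of the target web page. Both are visit volumes of the target web page. The historical visit volume is the visit volume of the target web page within a past unit of time, and the current visit volume is the visit volume within the current unit of time, where the past unit and the current unit have the same length. For example, with one day as the unit of time, the current visit volume may be the visit volume of the target web page on the current day, and the historical visit volume may be that of the previous day. The historical and current visit volumes can be obtained, for example, by adding detection code to the source code of the target web page.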
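Step S202: obtain the ratio between the historical visit volume and the current visit volume. The two visit volumes are compared to obtain a ratio. For example, if the current visit volume is that of the current day, the historical visit volume may be that of the previous day. The visit volume may be visit traffic or a click count, and the two values of traffic or clicks are compared. The ratio may be the current visit volume divided by the historical visit volume, the historical visit volume divided by the current visit volume, or the proportion by which the current visit volume exceeds the historical visit volume. The ratio shows the trend of the visit volume. For example, when the ratio is the current visit volume divided by the historical visit volume, a value greater than 1 means that the current visit volume is greater than the historical visit volume, and the larger the ratio, the more sharply the current visit volume has risen.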
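Step S203: judge whether the ratio exceeds a first set threshold. The first set threshold can be set according to the actual situation. For example, when the ratio is the current visit volume divided by the historical visit volume, the first set threshold may be set to 1.5, in which case the judgment is whether the current visit volume exceeds 1.5 times the historical visit volume. The first set threshold may also be set to 2, in which case the judgment is whether the current visit volume exceeds twice the historical visit volume. When the ratio is the proportion by which the current visit volume exceeds the historical visit volume, the first set threshold may be set to 30%, in which case the judgment is whether the growth rate of the current visit volume relative to the historical visit volume exceeds 30%.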
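Step S204: if the ratio exceeds the first set threshold, determine that the visit volume satisfies the predetermined condition. When the ratio exceeds the first set threshold, an alert is raised, the visit volume is determined to satisfy the predetermined condition, and step S206 is performed. For example, when the ratio is the current visit volume divided by the historical visit volume and the first set threshold is 1.5, a ratio above 1.5 means that the current visit volume has changed abruptly or is rising rapidly. Cheating is then suspected and the next stage of analysis, obtaining the visit source information, is carried out. When the ratio is the proportion by which the current visit volume exceeds the historical visit volume and the first set threshold is 30%, a growth rate above 30% likewise means that the visit volume satisfies the predetermined condition, cheating is suspected, and the next stage of analysis is carried out.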
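Step S205: if the ratio does not exceed the first set threshold, determine that the visit volume does not satisfy the predetermined condition. In the example above, if the ratio does not exceed the first set threshold of 1.5, the visit volume does not satisfy the predetermined condition, no anomaly has occurred in the visit volume, and the visit volume of the target web page can be deemed not to involve cheating.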
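Step S206: if the visit volume satisfies the predetermined condition, obtain the visit source information of the target web page. When the visit volume satisfies the predetermined condition, the visit volume of the target web page is regarded as suspected of cheating, and the second acquisition unit 30 obtains the visit source information of the target web page. The visit source information may be the visitor's IP address or the URL of the referring page. A given visit may, for example, have reached the target web page through a hyperlink on another page. By adding detection code to the source code of the target web page, the URL of the referring page can be obtained, as can the visitor's IP address. The visit source information is obtained so that it can be judged whether the visit volume of the target web page involves cheating.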
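Step S207: judge, according to the visit source information, whether the visit volume of the target web page involves cheating. Because the visit volume is at this point only suspected of cheating, the visit source information is used to decide whether cheating has actually occurred. For example, if most of the obtained visit source information points to a single non-mainstream or rarely visited website, or to the target web page itself, the visit volume of the target web page has in all likelihood been inflated by cheating, either through inbound links from non-mainstream or rarely visited websites or through repeated refreshing of the target web page. The likelihood of cheating is high, and the visit volume of the target web page can be determined to involve cheating.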
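The ratio test of steps S202 to S205 can be sketched in Python as follows. The function names `ratio_exceeds` and `growth_exceeds` are hypothetical, the thresholds 1.5 and 30% are the example values from the text, and rejecting a non-positive historical visit volume with an exception is an assumption, since the text does not say how a zero baseline should be handled.

```python
def ratio_exceeds(current, historical, threshold=1.5):
    """Ratio variant: is current / historical above the first set threshold?"""
    if historical <= 0:
        # Assumption: a ratio is undefined without a positive baseline.
        raise ValueError("historical visit volume must be positive")
    return current / historical > threshold


def growth_exceeds(current, historical, threshold=0.30):
    """Growth-rate variant: is (current - historical) / historical above the threshold?"""
    if historical <= 0:
        raise ValueError("historical visit volume must be positive")
    return (current - historical) / historical > threshold
```

With 100 visits on the previous day and 300 on the current day the ratio is 3, so `ratio_exceeds(300, 100)` is true and the flow would continue with step S206. With 120 visits on the current day neither variant flags the page.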
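Fig. 9 is a flowchart of a method for detecting web page visit-volume cheating according to a third embodiment of the present invention. The method of this embodiment may serve as a preferred implementation of the method of the foregoing embodiments. As shown in Fig. 9, the method includes the following steps: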
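Step S301: obtain the historical visit volume and the current visit volume of the target web page. Both are visit volumes of the target web page. The historical visit volume is the visit volume of the target web page within a past unit of time, and the current visit volume is the visit volume within the current unit of time, where the past unit and the current unit have the same length. For example, with one day as the unit of time, the current visit volume may be the visit volume of the target web page on the current day, and the historical visit volume may be that of the previous day. The historical and current visit volumes can be obtained, for example, by adding detection code to the source code of the target web page.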
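Step S302: obtain the difference between the historical visit volume and the current visit volume. One visit volume is subtracted from the other to obtain a difference. For example, if the current visit volume is that of the current day, the historical visit volume may be that of the previous day. The visit volume may be visit traffic or a click count, and the two values of traffic or clicks are subtracted. The difference may be the current visit volume minus the historical visit volume or the historical visit volume minus the current visit volume. The difference shows the trend of the visit volume. For example, when the difference is the current visit volume minus the historical visit volume, a positive value means that the current visit volume is greater than the historical visit volume, and the larger the difference, the more sharply the current visit volume has risen.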
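Step S303: judge whether the difference exceeds a second set threshold. The second set threshold can be set according to the actual situation. For example, when the difference is the current visit volume minus the historical visit volume, judging whether the difference exceeds the second set threshold means judging whether the amount by which the current visit volume exceeds the historical visit volume is greater than the second set threshold.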
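Step S304: if the difference exceeds the second set threshold, determine that the visit volume satisfies the predetermined condition. The difference exceeding the second set threshold means that the amount by which the current visit volume exceeds the historical visit volume is greater than the second set threshold. In that case an alert is raised, the visit volume is determined to satisfy the predetermined condition, and step S306 is performed. A difference above the second set threshold indicates that the current visit volume has changed abruptly or is rising rapidly, so cheating is suspected and the next stage of analysis, obtaining the visit source information, is carried out.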
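Step S305: if the difference does not exceed the second set threshold, determine that the visit volume does not satisfy the predetermined condition. When the difference does not exceed the second set threshold, no anomaly has occurred in the visit volume, and the visit volume of the target web page can be deemed not to involve cheating.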
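Step S306: if the visit volume satisfies the predetermined condition, obtain the visit source information of the target web page. When the visit volume satisfies the predetermined condition, the visit volume of the target web page is regarded as suspected of cheating, and the second acquisition unit 30 obtains the visit source information of the target web page. The visit source information may be the visitor's IP address or the URL of the referring page. A given visit may, for example, have reached the target web page through a hyperlink on another page. By adding detection code to the source code of the target web page, the URL of the referring page can be obtained, as can the visitor's IP address. The visit source information is obtained so that it can be judged whether the visit volume of the target web page involves cheating.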
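Step S307: judge, according to the visit source information, whether the visit volume of the target web page involves cheating. Because the visit volume is at this point only suspected of cheating, the visit source information is used to decide whether cheating has actually occurred. For example, if most of the obtained visit source information points to a single non-mainstream or rarely visited website, or to the target web page itself, the visit volume of the target web page has in all likelihood been inflated by cheating, either through inbound links from non-mainstream or rarely visited websites or through repeated refreshing of the target web page. The likelihood of cheating is high, and the visit volume of the target web page can be determined to involve cheating.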
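The difference test of steps S302 to S305 can be applied to a series of daily visit volumes, as in the Python sketch below. The function name `days_with_sudden_increase` is hypothetical, and comparing each day with the day immediately before it follows the one-day example in the text. The second set threshold has no example value in the text, so the caller must supply one.

```python
def days_with_sudden_increase(daily_volumes, threshold):
    """Return the indexes of days whose increase over the previous day exceeds the threshold.

    daily_volumes: visit volume per day, oldest first.
    threshold:     second set threshold, expressed in visits.
    """
    flagged = []
    for day in range(1, len(daily_volumes)):
        difference = daily_volumes[day] - daily_volumes[day - 1]  # current minus historical
        if difference > threshold:
            flagged.append(day)
    return flagged
```

For the series `[100, 110, 900, 950]` and a threshold of 500 visits, only the third day (index 2) is flagged, because its visit volume exceeds that of the previous day by 790.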
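Fig. 10 is a flowchart of a method for detecting web page visit-volume cheating according to a fourth embodiment of the present invention. The method of this embodiment may serve as a preferred implementation of the method of the foregoing embodiments. As shown in Fig. 10, the method includes the following steps: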
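Step S401: obtain the visit volume of the target web page. The target web page is the page to be checked for visit-volume cheating. It may be any page of any website, for example a page on which an advertiser places advertisements or a product page that an advertiser is marketing. When the target web page carries an advertiser's advertisement, for instance, its visit volume indicates how many times the advertisement was viewed. The visit volume may be visit traffic or a click count. It may be a historical visit volume, meaning the visit volume of the target web page over a past period, a current visit volume, meaning the visit volume over the current period, or both. The first acquisition unit 10 may obtain the visit volume by adding detection code to the target web page to measure visit traffic, click counts or similar information, or by reading such information directly from the log files of the target web page.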
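Step S402: judge whether the visit volume satisfies a predetermined condition. The first judging unit 20 takes the visit volume obtained by the first acquisition unit 10 as the basis for the judgment. The predetermined condition may describe how the visit volume changes. For example, it may be a threshold that marks an abrupt change: when the visit volume exceeds the threshold, the visit volume is considered to satisfy the predetermined condition and an abrupt change is deemed to have occurred, that is, the current visit volume has changed abruptly relative to the historical visit volume. The abrupt change may be a rapid increase or a rapid decrease in the current visit volume. This embodiment takes a rapid increase in the current visit volume as the abrupt change. The first judging unit 20 judges whether the visit volume satisfies the predetermined condition in order to decide whether cheating is suspected. When the visit volume rises sharply, for example when the visit volume of the current day is far greater than that of the previous day, the visit volume of the target web page can be regarded as suspected of cheating. Otherwise, the visit volume of the target web page can be considered free of cheating.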
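Step S403: if the visit volume satisfies the predetermined condition, obtain the source code of the target web page. When the visit volume satisfies the predetermined condition, the visit source information of the target web page is to be obtained, and doing so first requires the source code of the target web page, which is then used to obtain the visit source information. If the visit volume does not satisfy the predetermined condition, the visit volume of the target web page so far can be considered free of cheating, and the check of whether the visit volume satisfies the predetermined condition continues.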
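Step S404: add detection code to the source code to obtain the visiting IP addresses of the target web page. The detection code detects the visit source information of the target web page, which here is the visiting IP address, that is, the IP address of the visitor. Adding the detection code to the source code makes it possible to obtain all visiting IP addresses of the target web page. For example, when the target web page receives three visits, the detection code yields the IP address of the visitor for each of the three visits. The three visiting IP addresses may be the same or different.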
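Step S405: use the visiting IP addresses as the visit source information. A visitor's IP address identifies where a visit came from and shows that the target web page was indeed visited by a visitor with that IP address. The visiting IP addresses are used as the visit source information so that the visit volume of the target web page can be examined in more detail.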
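Step S406: obtain the first visit volume of a first visiting IP address among the visiting IP addresses, where the first visiting IP address is the visiting IP address that has visited the target web page the most. The visiting IP addresses obtained by the detection code include multiple IP addresses, each of which contributes some visit volume to the target web page. The first visiting IP address may be the IP address of the visitor who visited the target web page the most. For example, when the detection code finds three IP addresses visiting the target web page and one of them has the most visits, that IP address is the first visiting IP address. The first visit volume is the visit volume that the first visiting IP address contributes to the target web page, and its share of the total visit volume is larger than that of any other visiting IP address.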
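Step S407: calculate the ratio of the first visit volume to the visit volume. Here the visit volume is the total visit volume of the target web page. The ratio of the first visit volume to the total visit volume is calculated to establish the weight of the first visit volume within the total.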
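Step S408: judge whether the ratio of the first visit volume to the visit volume exceeds a third set threshold. The third set threshold can be set as required. For example, when the third set threshold is 0.5, the judgment is whether the first visit volume exceeds half of the total visit volume.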
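Step S409: if the ratio of the first visit volume to the visit volume exceeds the third set threshold, determine that the visit volume of the target web page involves cheating. As above, when the third set threshold is 0.5 and the ratio exceeds 0.5, the first visit volume is more than half of the total visit volume. The visit volume of the target web page can then be considered to have been achieved through some form of cheating, and the likelihood of visit-volume cheating is relatively high.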
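Step S410: if the ratio of the first visit volume to the visit volume does not exceed the third set threshold, determine that the visit volume of the target web page does not involve cheating. As above, when the third set threshold is 0.5 and the ratio does not exceed 0.5, the first visit volume is no more than half of the total visit volume. The visit volume of the target web page can then be considered normal, and it can essentially be concluded that no cheating is involved.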
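Steps S406 to S410 amount to finding the most frequent visiting IP address and comparing its share of all visits with the third set threshold. A minimal Python sketch follows. The function names `top_ip_share` and `top_ip_indicates_cheating` are hypothetical, the input is assumed to be one IP address string per recorded visit, and 0.5 is the example threshold from the text. The addresses used in the usage note come from ranges reserved for documentation.

```python
from collections import Counter


def top_ip_share(visitor_ips):
    """Return (first visiting IP address, its share of the total visit volume)."""
    counts = Counter(visitor_ips)
    first_ip, first_volume = counts.most_common(1)[0]
    return first_ip, first_volume / len(visitor_ips)


def top_ip_indicates_cheating(visitor_ips, threshold=0.5):
    """Return True when the first visit volume exceeds `threshold` of all visits."""
    if not visitor_ips:
        # Assumption: with no visits recorded there is nothing to flag.
        return False
    _, share = top_ip_share(visitor_ips)
    return share > threshold
```

If seven of ten recorded visits come from `203.0.113.7`, the share is 0.7 and the page is flagged. If the busiest address accounts for only four of ten visits, it is not.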
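Fig. 11 is a flowchart of a method for detecting web page visit-volume cheating according to a fifth embodiment of the present invention. The method of this embodiment may serve as a preferred implementation of the method of the foregoing embodiments. As shown in Fig. 11, the method includes the following steps: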
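Step S501: obtain the visit volume of the target web page. The target web page is the page to be checked for visit-volume cheating. It may be any page of any website, for example a page on which an advertiser places advertisements or a product page that an advertiser is marketing. When the target web page carries an advertiser's advertisement, for instance, its visit volume indicates how many times the advertisement was viewed. The visit volume may be visit traffic or a click count. It may be a historical visit volume, meaning the visit volume of the target web page over a past period, a current visit volume, meaning the visit volume over the current period, or both. The first acquisition unit 10 may obtain the visit volume by adding detection code to the target web page to measure visit traffic, click counts or similar information, or by reading such information directly from the log files of the target web page.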
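Step S502: judge whether the visit volume satisfies a predetermined condition. The first judging unit 20 takes the visit volume obtained by the first acquisition unit 10 as the basis for the judgment. The predetermined condition may describe how the visit volume changes. For example, it may be a threshold that marks an abrupt change: when the visit volume exceeds the threshold, the visit volume is considered to satisfy the predetermined condition and an abrupt change is deemed to have occurred, that is, the current visit volume has changed abruptly relative to the historical visit volume. The abrupt change may be a rapid increase or a rapid decrease in the current visit volume. This embodiment takes a rapid increase in the current visit volume as the abrupt change. The first judging unit 20 judges whether the visit volume satisfies the predetermined condition in order to decide whether cheating is suspected. When the visit volume rises sharply, for example when the visit volume of the current day is far greater than that of the previous day, the visit volume of the target web page can be regarded as suspected of cheating.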
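Step S503: if the visit volume satisfies the predetermined condition, obtain the source code of the target web page. When the visit volume satisfies the predetermined condition, the visit source information of the target web page is to be obtained, and doing so first requires the source code of the target web page, which is then used to obtain the visit source information. If the visit volume does not satisfy the predetermined condition, the visit volume of the target web page so far can be considered free of cheating, and the check of whether the visit volume satisfies the predetermined condition continues.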
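Step S504: add detection code to the source code to obtain the visiting IP addresses of the target web page. The detection code detects the visit source information of the target web page, which here is the visiting IP address, that is, the IP address of the visitor. Adding the detection code to the source code makes it possible to obtain all visiting IP addresses of the target web page. For example, when the target web page receives three visits, the detection code yields the IP address of the visitor for each of the three visits. The three visiting IP addresses may be the same or different, and they constitute the visit source information of the target web page.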
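Step S505: use the visiting IP addresses as the visit source information. A visitor's IP address identifies where a visit came from and shows that the target web page was indeed visited by a visitor with that IP address. The visiting IP addresses are used as the visit source information so that the visit volume of the target web page can be examined in more detail.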
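Step S506: obtain the first visit volume of a first visiting IP address among the visiting IP addresses, where the first visiting IP address is the visiting IP address that has visited the target web page the most. The visiting IP addresses obtained by the detection code include multiple IP addresses, each of which contributes some visit volume to the target web page. The first visiting IP address may be the IP address of the visitor who visited the target web page the most. For example, when the detection code finds three IP addresses visiting the target web page and one of them has the most visits, that IP address is the first visiting IP address. The first visit volume is the visit volume that the first visiting IP address contributes to the target web page, and its share of the total visit volume is larger than that of any other visiting IP address.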
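Step S507: calculate the ratio of the first visit volume to the visit volume. Here the visit volume is the total visit volume of the target web page. The ratio of the first visit volume to the total visit volume is calculated to establish the weight of the first visit volume within the total.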
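Step S508: judge whether the ratio of the first visit volume to the visit volume exceeds a third set threshold. The third set threshold can be set as required. For example, when the third set threshold is 0.5, the judgment is whether the first visit volume exceeds half of the total visit volume.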
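Step S509: if the ratio of the first visit volume to the visit volume exceeds the third set threshold, obtain the dwell time of the first visiting IP address. The dwell time is the time a visitor stays on the target web page during a visit. Because the first visiting IP address has visited the target web page many times, there are multiple dwell times, and obtaining the dwell time of the first visiting IP address means obtaining the dwell time of each of its visits.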
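Step S510: judge whether the dwell time exceeds a fourth set threshold. The fourth set threshold is a time value and can be set as required. Because there are multiple dwell times, judging whether the dwell time exceeds the fourth set threshold means judging, for each visit, whether its dwell time exceeds the fourth set threshold. For example, when the fourth set threshold is 3 s, each visit by the first visiting IP address is checked for a dwell time longer than 3 s.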
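Step S511: if the dwell time does not exceed the fourth set threshold, determine that the visit volume of the target web page involves cheating. The condition is evaluated over the many visits of the first visiting IP address: if the dwell times of most of the visits in the first visit volume do not exceed the fourth set threshold, the visit volume of the target web page is deemed to involve cheating. For example, when the fourth set threshold is 3 s and most of the visits in the first visit volume last less than 3 s, most of the first visit volume consists of abnormal visits, very likely produced by artificially inflating page clicks, which is implausible for genuine visitors, so the visit volume of the target web page is deemed to involve cheating.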
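Step S512: if the dwell time exceeds the fourth set threshold, determine that the visit volume of the target web page does not involve cheating. Likewise, if the dwell times of most of the visits in the first visit volume exceed the fourth set threshold, the first visit volume is regarded as normal, and the visit volume of the target web page can be deemed not to involve cheating.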
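Steps S506 to S512 chain two checks: the share of the busiest IP address, and then the dwell times of that address's visits. The Python sketch below shows one possible arrangement. The function name `judge_visits` and the representation of each visit as an `(ip, dwell_seconds)` pair are hypothetical, the thresholds 0.5 and 3 s are the example values from the text, and reading "most" as more than 60% of the visits is an assumption borrowed from the example proportion given for the apparatus embodiment.

```python
from collections import Counter


def judge_visits(visits, share_threshold=0.5, dwell_threshold_s=3.0, majority=0.6):
    """Return True when the visit volume is judged to involve cheating.

    visits: list of (ip, dwell_seconds) pairs, one per recorded visit.
    """
    if not visits:
        return False
    # S506-S508: find the first visiting IP address and its share of all visits.
    counts = Counter(ip for ip, _ in visits)
    first_ip, first_volume = counts.most_common(1)[0]
    if first_volume / len(visits) <= share_threshold:
        return False
    # S509-S512: inspect the dwell time of every visit made by that address.
    dwell_times = [dwell for ip, dwell in visits if ip == first_ip]
    short_visits = sum(1 for dwell in dwell_times if dwell <= dwell_threshold_s)
    return short_visits / len(dwell_times) > majority
```

An address that contributes 80% of the visits with one-second dwell times is flagged, while an address with the same share but minute-long dwell times is treated as a genuine heavy reader.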
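Fig. 12 is a flowchart of a method for detecting web page visit-volume cheating according to a sixth embodiment of the present invention. The method of this embodiment may serve as a preferred implementation of the method of the foregoing embodiments. As shown in Fig. 12, the method includes the following steps: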
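Step S601: obtain the source code of the target web page. The source code may be fetched by a crawler program or obtained in another way. The source code reveals the structure of the target web page, which makes it possible to inspect the page.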
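Step S602: detect whether the source code contains an inline frame (iframe) whose size is 0*0 or 1*1. An iframe of size 0*0 or 1*1 is an invisible iframe. Such an iframe opens another page, so that the user loads a page he or she never intended to visit, and traffic or visit volume is inflated without the user seeing it. An analysis program can be written to check the source code for an iframe of size 0*0 or 1*1.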
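Step S603: if the source code does not contain the iframe, obtain the visit volume of the target web page. In that case the visit volume of the target web page is obtained for the next judgment. If the source code does contain the iframe, the visit volume of the target web page is determined to involve cheating. Because an iframe of size 0*0 or 1*1 is used to obtain visit volume fraudulently, inflating it without the visitor's knowledge, detecting such an iframe in the source code of the target web page means that a cheating technique has been used, and the visit volume of the target web page can be determined to involve cheating.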
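Step S604: judge whether the visit volume satisfies a predetermined condition.
Step S605: if the visit volume satisfies the predetermined condition, obtain the visit source information of the target web page.
Step S606: judge, according to the visit source information, whether the visit volume of the target web page involves cheating.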
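Obtaining the visit volume of the target web page in step S603, and steps S604, S605 and S606, are the same as steps S101, S102, S103 and S104 of the method shown in Fig. 7 of the present invention and are not described again here.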
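The ordering of steps S601 to S606 can be summarized in a short Python sketch in which the hidden-iframe check short-circuits the rest of the flow. The function name `detect_cheating` and the idea of passing each stage in as a callable are assumptions made purely to show the control flow. How each stage is actually carried out is left to the caller.

```python
def detect_cheating(has_hidden_iframe, get_visit_volume, meets_condition,
                    get_sources, sources_indicate_cheating):
    """Run the staged checks of Fig. 12 and return True when cheating is determined.

    has_hidden_iframe:         result of the S602 source-code check (bool).
    get_visit_volume:          callable returning the visit volume (S603).
    meets_condition:           callable judging the predetermined condition (S604).
    get_sources:               callable returning visit source information (S605).
    sources_indicate_cheating: callable judging the source information (S606).
    """
    if has_hidden_iframe:
        # A 0*0 or 1*1 iframe settles the matter without looking at traffic.
        return True
    volume = get_visit_volume()
    if not meets_condition(volume):
        return False
    return sources_indicate_cheating(get_sources())
```

Because the later stages are passed as callables, they are not evaluated when an earlier stage already decides the outcome, which mirrors the flowchart: the visit volume is obtained only when no hidden iframe is found.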
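The foregoing describes only preferred embodiments of the present invention and is not intended to limit it. Those skilled in the art may make various modifications and variations to the present invention. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.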

Claims (12)

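  1. A method for detecting web page visit-volume cheating, characterized by comprising:
    obtaining a visit volume of a target web page;
    judging whether the visit volume satisfies a predetermined condition;
    if the visit volume satisfies the predetermined condition, obtaining visit source information of the target web page; and
    judging, according to the visit source information, whether the visit volume of the target web page involves cheating.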
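  2. The method for detecting web page visit-volume cheating according to claim 1, characterized in that obtaining the visit volume of the target web page comprises obtaining a historical visit volume and a current visit volume of the target web page, and judging whether the visit volume satisfies the predetermined condition comprises:
    obtaining a ratio between the historical visit volume and the current visit volume;
    judging whether the ratio exceeds a first set threshold;
    if the ratio exceeds the first set threshold, determining that the visit volume satisfies the predetermined condition; and
    if the ratio does not exceed the first set threshold, determining that the visit volume does not satisfy the predetermined condition.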
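  3. The method for detecting web page visit-volume cheating according to claim 1, characterized in that obtaining the visit volume of the target web page comprises obtaining a historical visit volume and a current visit volume of the target web page, and judging whether the visit volume satisfies the predetermined condition comprises:
    obtaining a difference between the historical visit volume and the current visit volume;
    judging whether the difference exceeds a second set threshold;
    if the difference exceeds the second set threshold, determining that the visit volume satisfies the predetermined condition; and
    if the difference does not exceed the second set threshold, determining that the visit volume does not satisfy the predetermined condition.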
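  4. The method for detecting web page visit-volume cheating according to claim 1, characterized in that
    obtaining the visit source information of the target web page comprises: obtaining source code of the target web page; adding detection code to the source code to obtain visiting IP addresses of the target web page; and using the visiting IP addresses as the visit source information;
    judging, according to the visit source information, whether the visit volume of the target web page involves cheating comprises: obtaining a first visit volume of a first visiting IP address among the visiting IP addresses, the first visiting IP address being the visiting IP address that has visited the target web page the most;
    calculating a ratio of the first visit volume to the visit volume;
    judging whether the ratio of the first visit volume to the visit volume exceeds a third set threshold;
    if the ratio of the first visit volume to the visit volume exceeds the third set threshold, determining that the visit volume of the target web page involves cheating; and
    if the ratio of the first visit volume to the visit volume does not exceed the third set threshold, determining that the visit volume of the target web page does not involve cheating.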
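  5. The method for detecting web page visit-volume cheating according to claim 4, characterized in that determining that the visit volume of the target web page involves cheating comprises:
    obtaining a dwell time of the first visiting IP address;
    judging whether the dwell time exceeds a fourth set threshold;
    if the dwell time does not exceed the fourth set threshold, determining that the visit volume of the target web page involves cheating; and
    if the dwell time exceeds the fourth set threshold, determining that the visit volume of the target web page does not involve cheating.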
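  6. The method for detecting web page visit-volume cheating according to claim 1, characterized in that, before the visit volume of the target web page is obtained, the method further comprises:
    obtaining source code of the target web page;
    detecting whether the source code contains an inline frame (iframe) of size 0*0 or 1*1; and
    if the source code does not contain the iframe, obtaining the visit volume of the target web page.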
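  7. An apparatus for detecting web page visit-volume cheating, characterized by comprising:
    a first acquisition unit, configured to obtain a visit volume of a target web page;
    a first judging unit, configured to judge whether the visit volume satisfies a predetermined condition;
    a second acquisition unit, configured to obtain visit source information of the target web page when the visit volume satisfies the predetermined condition; and
    a second judging unit, configured to judge, according to the visit source information, whether the visit volume of the target web page involves cheating.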
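  8. The apparatus for detecting web page visit-volume cheating according to claim 7, characterized in that the first acquisition unit is further configured to obtain a historical visit volume and a current visit volume of the target web page, wherein the first judging unit comprises:
    a first acquisition module, configured to obtain a ratio between the historical visit volume and the current visit volume;
    a first judging module, configured to judge whether the ratio exceeds a first set threshold; and
    a first determining module, configured to determine that the visit volume satisfies the predetermined condition when the ratio exceeds the first set threshold, and that the visit volume does not satisfy the predetermined condition when the ratio does not exceed the first set threshold.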
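  9. The apparatus for detecting web page visit-volume cheating according to claim 7, characterized in that the first acquisition unit is further configured to obtain a historical visit volume and a current visit volume of the target web page, wherein the first judging unit comprises:
    a second acquisition module, configured to obtain a difference between the historical visit volume and the current visit volume;
    a second judging module, configured to judge whether the difference exceeds a second set threshold; and
    a second determining module, configured to determine that the visit volume satisfies the predetermined condition when the difference exceeds the second set threshold, and that the visit volume does not satisfy the predetermined condition when the difference does not exceed the second set threshold.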
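  10. The apparatus for detecting web page visit-volume cheating according to claim 7, characterized in that
    the second acquisition unit comprises:
    a third acquisition module, configured to obtain source code of the target web page;
    a fourth acquisition module, configured to add detection code to the source code to obtain visiting IP addresses of the target web page;
    a generation module, configured to use the visiting IP addresses as the visit source information;
    the second judging unit comprises:
    a fifth acquisition module, configured to obtain a first visit volume of a first visiting IP address among the visiting IP addresses, the first visiting IP address being the visiting IP address that has visited the target web page the most;
    a calculation module, configured to calculate a ratio of the first visit volume to the visit volume;
    a third judging module, configured to judge whether the ratio of the first visit volume to the visit volume exceeds a third set threshold; and
    a third determining module, configured to determine that the visit volume of the target web page involves cheating when the ratio of the first visit volume to the visit volume exceeds the third set threshold, and that the visit volume of the target web page does not involve cheating when the ratio does not exceed the third set threshold.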
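  11. The apparatus for detecting web page visit-volume cheating according to claim 10, characterized in that the third determining module comprises:
    an acquisition submodule, configured to obtain a dwell time of the first visiting IP address;
    a judging submodule, configured to judge whether the dwell time exceeds a fourth set threshold; and
    a determining submodule, configured to determine that the visit volume of the target web page involves cheating when the dwell time does not exceed the fourth set threshold, and that the visit volume of the target web page does not involve cheating when the dwell time exceeds the fourth set threshold.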
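  12. The apparatus for detecting web page visit-volume cheating according to claim 7, characterized in that the apparatus further comprises:
    a third acquisition unit, configured to obtain source code of the target web page before the visit volume of the target web page is obtained;
    a detection unit, configured to detect whether the source code contains an inline frame (iframe) of size 0*0 or 1*1; and
    a determining unit, configured to obtain the visit volume of the target web page when the source code does not contain the iframe.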
PCT/CN2014/089724 2013-10-29 2014-10-28 网页访问量作弊的检测方法和装置 WO2015062485A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US15/139,096 US20160239864A1 (en) 2013-10-29 2016-04-26 Method and apparatus for detecting cheat on page views of web page

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201310523151.0A CN103593415B (zh) 2013-10-29 2013-10-29 网页访问量作弊的检测方法和装置
CN201310523151.0 2013-10-29

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/139,096 Continuation-In-Part US20160239864A1 (en) 2013-10-29 2016-04-26 Method and apparatus for detecting cheat on page views of web page

Publications (1)

Publication Number Publication Date
WO2015062485A1 true WO2015062485A1 (zh) 2015-05-07

Family

ID=50083556

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/089724 WO2015062485A1 (zh) 2013-10-29 2014-10-28 网页访问量作弊的检测方法和装置

Country Status (3)

Country Link
US (1) US20160239864A1 (zh)
CN (1) CN103593415B (zh)
WO (1) WO2015062485A1 (zh)

Families Citing this family (36)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103593415B (zh) * 2013-10-29 2017-08-01 北京国双科技有限公司 网页访问量作弊的检测方法和装置
CN106301980B (zh) * 2015-05-28 2020-06-05 腾讯科技(深圳)有限公司 一种刷量工具检测方法和装置
CN106445796B (zh) * 2015-08-04 2021-01-19 腾讯科技(深圳)有限公司 作弊渠道的自动检测方法及装置
CN106469383A (zh) * 2015-08-14 2017-03-01 北京国双科技有限公司 广告投放质量的检测方法和装置
CN106547420B (zh) 2015-09-23 2020-06-02 阿里巴巴集团控股有限公司 一种页面处理方法和装置
CN105279674A (zh) * 2015-10-13 2016-01-27 精硕世纪科技(北京)有限公司 移动广告投放设备作弊行为的判断方法和装置
CN106611346A (zh) * 2015-10-22 2017-05-03 北京国双科技有限公司 访客筛选方法和装置
CN106611348A (zh) * 2015-10-23 2017-05-03 北京国双科技有限公司 异常流量的检测方法和装置
CN106934627B (zh) * 2015-12-28 2021-03-30 中国移动通信集团公司 一种电商行业作弊行为的检测方法及装置
CN105677221A (zh) * 2015-12-30 2016-06-15 广州优视网络科技有限公司 一种提高应用程序数据检测准确性的方法、装置及设备
CN106933905B (zh) * 2015-12-31 2019-12-24 北京国双科技有限公司 网页访问数据的监测方法和装置
CN107169769A (zh) * 2016-03-08 2017-09-15 广州市动景计算机科技有限公司 应用程序的刷量识别方法、装置
CN105975379A (zh) * 2016-05-25 2016-09-28 北京比邻弘科科技有限公司 一种虚假移动设备的识别方法及识别系统
CN106097000B (zh) * 2016-06-02 2022-07-26 腾讯科技(深圳)有限公司 一种信息处理方法及服务器
CN106355431B (zh) * 2016-08-18 2020-01-07 晶赞广告(上海)有限公司 作弊流量检测方法、装置及终端
CN108255879B (zh) * 2016-12-29 2021-10-08 北京国双科技有限公司 网页浏览流量作弊的检测方法及装置
CN106603554B (zh) * 2016-12-29 2019-11-15 北京奇艺世纪科技有限公司 一种自适应实时视频数据的反作弊方法及装置
CN106651458B (zh) * 2016-12-29 2020-07-07 腾讯科技(深圳)有限公司 一种广告反作弊方法和装置
CN109150928A (zh) * 2017-06-15 2019-01-04 北京京东尚科信息技术有限公司 用于处理请求的方法和装置
CN107454441B (zh) * 2017-06-30 2019-12-03 武汉斗鱼网络科技有限公司 一种检测直播间刷人气行为的方法、直播平台服务器及计算机可读存储介质
CN107566897B (zh) * 2017-07-19 2019-10-15 北京奇艺世纪科技有限公司 一种视频刷量的鉴别方法、装置及电子设备
CN107578263B (zh) * 2017-07-21 2021-01-05 北京奇艺世纪科技有限公司 一种广告异常访问的检测方法、装置和电子设备
CN109586990B (zh) * 2017-09-29 2021-11-02 北京国双科技有限公司 一种识别作弊流量的方法及装置
CN108009844B (zh) * 2017-11-20 2021-06-29 北京智钥科技有限公司 确定广告作弊行为的方法、装置及云服务器
CN110097389A (zh) * 2018-01-31 2019-08-06 上海甚术网络科技有限公司 一种广告流量反作弊方法
CN110381375B (zh) * 2018-04-13 2022-06-21 武汉斗鱼网络科技有限公司 一种确定盗刷数据的方法、客户端及服务器
CN108810947B (zh) * 2018-05-29 2021-05-11 每日互动股份有限公司 基于ip地址的鉴别真实流量的服务器
CN111222938A (zh) * 2018-11-27 2020-06-02 北京京东尚科信息技术有限公司 目标对象信息识别方法、装置、电子设备及可读存储介质
CN109905738B (zh) * 2019-03-26 2022-03-08 湖南快乐阳光互动娱乐传媒有限公司 视频广告异常展现监测方法及装置、存储介质和电子设备
CN110365672B (zh) * 2019-07-09 2022-02-22 葛晓滨 一种电子商务异常攻击的检测方法
CN110290400B (zh) * 2019-07-29 2022-06-03 北京奇艺世纪科技有限公司 可疑刷量视频的识别方法、真实播放量预估方法及装置
CN112529605B (zh) * 2019-09-17 2023-12-22 北京互娱数字科技有限公司 一种广告异常曝光识别系统及方法
CN111861568A (zh) * 2020-07-23 2020-10-30 上海志窗信息科技有限公司 互联网广告监控系统及其方法
CN112188291B (zh) * 2020-09-24 2022-11-29 北京明略昭辉科技有限公司 广告位异常的识别方法和装置
CN113657924B (zh) * 2021-07-21 2023-10-31 安徽赤兔马传媒科技有限公司 基于机器学习的线下智慧屏广告反作弊系统及报警器
CN117217830B (zh) * 2023-11-07 2024-02-27 深圳市豪斯莱科技有限公司 一种广告刷单监控识别方法、系统及可读存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101083510A (zh) * 2002-03-08 2007-12-05 艾威尔公司 用于高速率正交频分复用通信的系统和方法
CN102254265A (zh) * 2010-05-18 2011-11-23 北京首家通信技术有限公司 一种富媒体互联网广告内容匹配、效果评估方法
CN102693501A (zh) * 2012-05-31 2012-09-26 刘志军 一种网络广告推广效果分析方法
CN103200262A (zh) * 2013-04-02 2013-07-10 亿赞普(北京)科技有限公司 一种基于移动网络的广告调度方法、装置及系统
CN103593415A (zh) * 2013-10-29 2014-02-19 北京国双科技有限公司 网页访问量作弊的检测方法和装置

Family Cites Families (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001084351A2 (en) * 2000-04-28 2001-11-08 Inceptor, Inc. Method of and system for enhanced web page delivery
US6963874B2 (en) * 2002-01-09 2005-11-08 Digital River, Inc. Web-site performance analysis system and method utilizing web-site traversal counters and histograms
US7734502B1 (en) * 2005-08-11 2010-06-08 A9.Com, Inc. Ad server system with click fraud protection
US20070129999A1 (en) * 2005-11-18 2007-06-07 Jie Zhou Fraud detection in web-based advertising
US20080288303A1 (en) * 2006-03-17 2008-11-20 Claria Corporation Method for Detecting and Preventing Fraudulent Internet Advertising Activity
US20070255821A1 (en) * 2006-05-01 2007-11-01 Li Ge Real-time click fraud detecting and blocking system
US7657626B1 (en) * 2006-09-19 2010-02-02 Enquisite, Inc. Click fraud detection
US20080114624A1 (en) * 2006-11-13 2008-05-15 Microsoft Corporation Click-fraud protector
US8880541B2 (en) * 2006-11-27 2014-11-04 Adobe Systems Incorporated Qualification of website data and analysis using anomalies relative to historic patterns
US20080281606A1 (en) * 2007-05-07 2008-11-13 Microsoft Corporation Identifying automated click fraud programs
CN100565526C (zh) * 2007-07-25 2009-12-02 北京搜狗科技发展有限公司 一种针对网页作弊的反作弊方法及系统
US8219549B2 (en) * 2008-02-06 2012-07-10 Microsoft Corporation Forum mining for suspicious link spam sites detection
US8311876B2 (en) * 2009-04-09 2012-11-13 Sas Institute Inc. Computer-implemented systems and methods for behavioral identification of non-human web sessions
US9576303B2 (en) * 2011-06-17 2017-02-21 Google Inc. Advertisements in view
CN103049456B (zh) * 2011-10-14 2016-03-16 腾讯科技(深圳)有限公司 一种筛选网页的方法及装置
US20130110648A1 (en) * 2011-10-31 2013-05-02 Simon Raab System and method for click fraud protection
US20140278947A1 (en) * 2011-10-31 2014-09-18 Pureclick Llc System and method for click fraud protection
US20130198203A1 (en) * 2011-12-22 2013-08-01 John Bates Bot detection using profile-based filtration
CN103294686B (zh) * 2012-02-24 2018-04-17 腾讯科技(深圳)有限公司 一种网页作弊用户、作弊网页的识别方法及系统
US10043197B1 (en) * 2012-06-14 2018-08-07 Rocket Fuel Inc. Abusive user metrics

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111611520A (zh) * 2020-05-28 2020-09-01 北京学之途网络科技有限公司 一种流量作弊的监测方法、装置、电子设备及存储介质
CN111611521A (zh) * 2020-05-28 2020-09-01 北京学之途网络科技有限公司 一种流量作弊的监测方法、装置、电子设备及存储介质
CN111611521B (zh) * 2020-05-28 2023-11-03 北京学之途网络科技有限公司 一种流量作弊的监测方法、装置、电子设备及存储介质
CN111611520B (zh) * 2020-05-28 2024-03-08 北京明略昭辉科技有限公司 一种流量作弊的监测方法、装置、电子设备及存储介质
CN114172725A (zh) * 2021-12-07 2022-03-11 百度在线网络技术(北京)有限公司 非法网站的处理方法、装置、电子设备和存储介质
CN114172725B (zh) * 2021-12-07 2023-11-14 百度在线网络技术(北京)有限公司 非法网站的处理方法、装置、电子设备和存储介质

Also Published As

Publication number Publication date
CN103593415B (zh) 2017-08-01
CN103593415A (zh) 2014-02-19
US20160239864A1 (en) 2016-08-18

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14857611

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205N DATED 160816)

122 Ep: pct application non-entry in european phase

Ref document number: 14857611

Country of ref document: EP

Kind code of ref document: A1