WO2015143956A1 - Method and apparatus for intercepting advertisements in a webpage - Google Patents

Method and apparatus for intercepting advertisements in a webpage

Info

Publication number
WO2015143956A1
WO2015143956A1 PCT/CN2015/072515 CN2015072515W WO2015143956A1 WO 2015143956 A1 WO2015143956 A1 WO 2015143956A1 CN 2015072515 W CN2015072515 W CN 2015072515W WO 2015143956 A1 WO2015143956 A1 WO 2015143956A1
Authority
WO
WIPO (PCT)
Prior art keywords
advertisement
webpage
suspected
webpage data
window
Prior art date
Application number
PCT/CN2015/072515
Other languages
English (en)
French (fr)
Inventor
朱佳来
陈亮
Original Assignee
北京金山网络科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京金山网络科技有限公司
Publication of WO2015143956A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/51 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems at application loading time, e.g. accepting, rejecting, starting or inhibiting executable software based on integrity or source reliability
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/957 Browsing optimisation, e.g. caching or content distillation
    • G06F16/9577 Optimising the visualization of content, e.g. distillation of HTML documents
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/21 Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/2119 Authenticating web pages, e.g. with suspicious links

Definitions

  • The present application relates to the field of webpage identification technologies, and in particular to a method and apparatus for intercepting advertisements in webpages.
  • Advertisements on a webpage, such as advertisements at the top of the page, floating-window advertisements on both sides, and advertisements in fixed advertisement spaces, are generally published by the website operator or by its customers, so the website operator will not block them. However, such advertisements may disturb users. If the user browses the webpage on a mobile terminal such as a mobile phone, these advertisements also consume data traffic.
  • The inventor of the present application found that how to identify an advertisement in a webpage, so that advertisements can be blocked, has become a technical problem to be solved.
  • The embodiments of the present application provide a method and an apparatus for intercepting advertisements in a webpage, which automatically screen out suspected advertisements, quickly identify advertisements, and automatically generate interception rules, making it convenient to block advertisements.
  • An embodiment of the present application provides a method for intercepting an advertisement in a webpage, including:
  • obtaining webpage data corresponding to a preset webpage;
  • analyzing the webpage data to determine a suspected advertisement;
  • determining whether the suspected advertisement is an actual advertisement;
  • if the suspected advertisement is an actual advertisement, generating a corresponding advertisement interception rule;
  • blocking advertisements in the webpage according to the generated advertisement interception rule.
  • The step of analyzing the webpage data to determine a suspected advertisement includes: obtaining an attribute identifier of a webpage element in a source file of the webpage data; determining whether the value of the attribute identifier includes a feature character of an advertisement; and determining a webpage element whose attribute identifier includes the feature character as a suspected advertisement.
  • The step of analyzing the webpage data to determine a suspected advertisement includes: determining, according to the webpage data, whether a window whose size falls within a preset size interval exists at a preset location in the webpage; if so, the webpage data corresponding to the window is determined to be a suspected advertisement.
  • The step of analyzing the webpage data to determine a suspected advertisement includes: determining, according to the webpage data, whether there is a full-screen display window that matches the screen size and is placed on the top layer, the full-screen display window containing no more than a first preset number of pictures and no more than a second preset number of buttons; if so, the webpage data corresponding to the full-screen display window is determined to be a suspected advertisement.
  • The step of analyzing the webpage data to determine a suspected advertisement includes: determining whether the uniform resource locator (URL) of a window webpage element in the webpage data is a URL containing a feature character of an advertisement; if so, the webpage data corresponding to the window webpage element is determined to be a suspected advertisement.
  • The step of determining whether the suspected advertisement is an actual advertisement comprises: if the color difference between the fill color of the suspected advertisement portion and the fill color of the webpage reaches a preset threshold, determining that the suspected advertisement is an actual advertisement; or determining whether the suspected advertisement is an actual advertisement according to the color histogram change rate of the suspected advertisement, where the suspected advertisement is determined to be an actual advertisement if its color histogram change rate is greater than or equal to a preset threshold.
  • An embodiment of the present application provides an apparatus for intercepting an advertisement in a webpage, including:
  • an obtaining module configured to obtain webpage data corresponding to the preset webpage;
  • an analyzing module configured to analyze the webpage data to determine a suspected advertisement;
  • a determining module configured to determine whether the suspected advertisement is an actual advertisement;
  • a generating module configured to generate a corresponding advertisement interception rule if the suspected advertisement is an actual advertisement;
  • an intercepting module configured to block advertisements in the webpage according to the generated advertisement interception rule.
  • The analyzing module is configured to obtain an attribute identifier of a webpage element in a source file of the webpage data, determine whether the value of the attribute identifier includes a feature character of an advertisement, and determine a webpage element whose attribute identifier includes the feature character as a suspected advertisement.
  • The analyzing module is configured to determine, according to the webpage data, whether a window whose size falls within a preset size interval exists at a preset location in the webpage; if so, the webpage data corresponding to the window is determined to be a suspected advertisement.
  • The analyzing module is configured to determine, according to the webpage data, whether there is a full-screen display window that matches the screen size and is placed on the top layer, the full-screen display window containing no more than a first preset number of pictures and no more than a second preset number of buttons; if so, the webpage data corresponding to the full-screen display window is determined to be a suspected advertisement.
  • The analyzing module is configured to determine whether the uniform resource locator (URL) of a window webpage element in the webpage data is a URL containing a feature character of an advertisement; if so, the webpage data corresponding to the window webpage element is determined to be a suspected advertisement.
  • The determining module is configured to determine that the suspected advertisement is an actual advertisement if the color difference between the fill color of the suspected advertisement portion and the fill color of the webpage reaches a preset threshold; or to determine whether the suspected advertisement is an actual advertisement according to its color histogram change rate, where the suspected advertisement is determined to be an actual advertisement if the rate is greater than or equal to a preset threshold.
  • An embodiment of the present application further discloses a terminal, where the terminal includes:
  • a processor, a memory, a communication interface, and a bus;
  • the processor, the memory, and the communication interface are connected by the bus and communicate with each other over it;
  • the memory stores executable program code;
  • the processor reads the executable program code stored in the memory and runs the corresponding program in order to:
  • obtain webpage data corresponding to a preset webpage;
  • analyze the webpage data to determine a suspected advertisement;
  • determine whether the suspected advertisement is an actual advertisement;
  • if the suspected advertisement is an actual advertisement, generate a corresponding advertisement interception rule;
  • block advertisements in the webpage according to the generated advertisement interception rule.
  • An embodiment of the present application further discloses an application program that, at runtime, executes the method for intercepting an advertisement in a webpage according to an embodiment of the present application.
  • An embodiment of the present application further discloses a storage medium for storing an application, where the application is used to execute the method for intercepting an advertisement in a webpage according to an embodiment of the present application.
  • The technical solution provided by the embodiments of the present application may include the following beneficial effects: a suspected advertisement is obtained by analyzing the webpage data corresponding to the preset webpage; when the suspected advertisement is an actual advertisement, a corresponding advertisement interception rule is generated, and advertisements in the webpage are blocked according to the generated rule. Suspected advertisements are thus screened out automatically, advertisements are identified quickly, and interception rules are generated automatically, which makes blocking advertisements convenient.
  • FIG. 1 is a main flowchart of a method for intercepting an advertisement in a webpage according to an embodiment of the present application
  • FIG. 2 is a flowchart of a first preferred embodiment of a method for intercepting advertisements in a webpage according to an embodiment of the present application
  • FIG. 3 is a flowchart of a second preferred embodiment of a method for intercepting an advertisement in a webpage according to an embodiment of the present application
  • FIG. 4 is a flowchart of a third preferred embodiment of a method for intercepting advertisements in a webpage according to an embodiment of the present application
  • FIG. 5 is a flowchart of a fourth preferred embodiment of a method for intercepting an advertisement in a webpage according to an embodiment of the present application
  • FIG. 6 is a schematic structural diagram of an apparatus for intercepting advertisements in a webpage according to an embodiment of the present application.
  • In the embodiments of the present application, the webpage data corresponding to the preset webpage is analyzed to determine a suspected advertisement; when the suspected advertisement is an actual advertisement, a corresponding advertisement interception rule is generated, and advertisements in the webpage are blocked according to the generated rule. This automatically screens out suspected advertisements, quickly identifies advertisements, and automatically generates interception rules, making it convenient to block advertisements.
  • The embodiments of the present application intercept advertisements in a webpage in a more targeted and accurate manner.
  • The main process of the method for intercepting advertisements in a webpage in the embodiments of the present application includes obtaining webpage data (101), analyzing the webpage data to determine a suspected advertisement (102), determining whether the suspected advertisement is an actual advertisement (103), generating a corresponding advertisement interception rule, and blocking advertisements in the webpage according to that rule.
  • The local client can send an access request to the network side according to a preset web address; the network side returns the webpage data according to the access request, and the local client thereby obtains the webpage data.
  • The local client can maintain a list of URLs in which one or more preset URLs are stored.
  • The list of URLs can be updated manually or automatically by the system.
  • The webpage data may exist in the source file of the webpage. Source files of a webpage may include a HyperText Markup Language (HTML) source file, an Extensible HyperText Markup Language (XHTML) source file, and the like.
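  • The obtaining step can be sketched as follows. This is only an illustration under stated assumptions: the function names `fetch_over_http` and `obtain_webpage_data` are hypothetical, and the application does not specify how the access request is sent. The fetcher is passed in as a parameter so that the URL list logic can be exercised without network access.

```python
from typing import Callable, Dict, Iterable
from urllib.request import urlopen


def fetch_over_http(url: str, timeout: float = 10.0) -> str:
    """Send an access request to the network side and return the page source.

    Hypothetical helper: the source text only says that the client sends a
    request and receives webpage data.
    """
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def obtain_webpage_data(
    preset_urls: Iterable[str],
    fetch: Callable[[str], str] = fetch_over_http,
) -> Dict[str, str]:
    """Map every preset URL in the client's list to its HTML/XHTML source."""
    return {url: fetch(url) for url in preset_urls}
```

  • Calling `obtain_webpage_data(["http://xx.com"])` would use the default HTTP fetcher; a stub fetcher can be supplied instead when testing the later analysis steps offline.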
  • Step 102 above may be implemented in the following manners:
  • Manner A1: obtain the attribute identifier of a webpage element in the source file of the webpage data; determine whether the value of the attribute identifier includes a feature character of an advertisement; if it does, determine the corresponding webpage element as a suspected advertisement.
  • the value of the attribute identifier Tagname includes "AD"
  • the value of the attribute identifier class includes "
  • Manner A2: determine, according to the webpage data, whether a window whose size falls within a preset size interval exists at a preset position in the webpage; if so, the webpage data corresponding to the window is determined to be a suspected advertisement.
  • The preset position may include the top, the bottom, the left and right sides, and the like.
  • The preset size interval is [30×100, 100×350] pixels, and it can be determined according to the screen size of the terminal. In this way, advertisements in the fixed advertisement spaces of a webpage can be identified in a targeted manner.
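  • The position and size check can be sketched as below. The reading of [30×100, 100×350] as "height between 30 and 100 pixels, width between 100 and 350 pixels" is an assumption; it is consistent with the 90-pixel-high, 320-pixel-wide example given in the second preferred implementation. The `Window` type and the position names are hypothetical.

```python
from dataclasses import dataclass

# Assumed reading of the preset size interval [30x100, 100x350] pixels.
HEIGHT_RANGE = (30, 100)
WIDTH_RANGE = (100, 350)
# Preset positions named in the description (top, bottom, left and right sides).
PRESET_POSITIONS = {"top", "bottom", "left", "right"}


@dataclass
class Window:
    position: str  # where the window sits in the page, e.g. "top"
    height: int    # rendered height in pixels
    width: int     # rendered width in pixels


def is_suspected_by_size(window: Window) -> bool:
    """Return True when the window is at a preset position and within the size interval."""
    in_position = window.position in PRESET_POSITIONS
    in_height = HEIGHT_RANGE[0] <= window.height <= HEIGHT_RANGE[1]
    in_width = WIDTH_RANGE[0] <= window.width <= WIDTH_RANGE[1]
    return in_position and in_height and in_width
```

  • Because the interval may depend on the terminal's screen size, a real implementation would probably compute the two ranges per device rather than hard-code them.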
  • Manner A3: determine, according to the webpage data, whether there is a full-screen display window that matches the screen size and is placed on the top layer, in which there are no more than a first preset number of pictures and no more than a second preset number of buttons; if such a window exists, the webpage data corresponding to it is determined to be a suspected advertisement.
  • A full-screen display window may be a general webpage or an advertisement. The inventor of the present application found that a general webpage contains many pictures and buttons, whereas an advertisement window contains few pictures, generally one, and few buttons. Therefore, the first preset number may take a value in the range [1, 3], and the second preset number may take a value in the range [1, 4].
  • If these conditions are not met, the webpage data corresponding to the window is determined not to be a suspected advertisement.
  • A full-screen display window placed on the top layer may refer to a full-screen display window whose position attribute is set to the top.
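  • Manner A3 reduces to a simple predicate once the window's size, layer, and element counts are known. The sketch below assumes those values have already been extracted from the page; the `FullScreenCandidate` type is hypothetical, and the default limits of 3 pictures and 4 buttons are just the upper ends of the ranges quoted above.

```python
from dataclasses import dataclass


@dataclass
class FullScreenCandidate:
    width: int           # window width in pixels
    height: int          # window height in pixels
    on_top_layer: bool   # True when the window is placed on the top layer
    picture_count: int   # number of pictures inside the window
    button_count: int    # number of buttons inside the window


def is_suspected_full_screen(
    window: FullScreenCandidate,
    screen_width: int,
    screen_height: int,
    max_pictures: int = 3,  # first preset number, assumed from the range [1, 3]
    max_buttons: int = 4,   # second preset number, assumed from the range [1, 4]
) -> bool:
    """Return True for a top-layer, screen-sized window with few pictures and buttons."""
    fills_screen = (window.width, window.height) == (screen_width, screen_height)
    return (
        fills_screen
        and window.on_top_layer
        and window.picture_count <= max_pictures
        and window.button_count <= max_buttons
    )
```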
  • Manner A4: determine, according to the webpage data, whether the Uniform Resource Locator (URL) of a window webpage element in the webpage is a URL containing a feature character of an advertisement; if so, the webpage data corresponding to the window webpage element is determined to be a suspected advertisement.
  • A window webpage element is typically located on a portion of the webpage, which distinguishes it from the full-screen display window in manner A3 above.
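  • A possible sketch of the URL check in manner A4 follows. The application only says that the URL "contains" a feature character; matching whole tokens of the URL rather than raw substrings is a design choice made here (an assumption) so that words such as "download" are not flagged merely because they contain "ad". The token set is the lowercase form of the feature characters listed in the description.

```python
import re
from urllib.parse import urlparse

# Lowercased feature characters from the description.
AD_FEATURE_TOKENS = {"advertising", "ad", "adv", "advert", "advertisement"}


def url_has_ad_feature(url: str) -> bool:
    """Return True when a host, path, or query token equals an advertisement feature."""
    parsed = urlparse(url)
    text = f"{parsed.netloc}{parsed.path}?{parsed.query}".lower()
    tokens = re.split(r"[^a-z0-9]+", text)  # split on dots, slashes, dashes, etc.
    return any(token in AD_FEATURE_TOKENS for token in tokens)
```

  • For instance, a window element loaded from a hypothetical address such as `http://ad.xx.com/banner.html` would be treated as a suspected advertisement, while `http://xx.com/download/readme.html` would not.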
  • Step 103 above may determine whether the suspected advertisement is an actual advertisement by recognizing a color difference. For example, if the fill color of the suspected advertisement portion differs markedly from the fill color of the webpage, that is, the color difference reaches a preset threshold, the suspected advertisement is determined to be an actual advertisement.
  • Other automatic identification methods can also be used to determine whether the suspected advertisement is an actual advertisement.
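  • The color-difference test can be illustrated as below. The application does not say how the color difference is measured; Euclidean distance in RGB space and the threshold value of 100 are assumptions chosen only to make the sketch concrete.

```python
import math
from typing import Tuple

RGB = Tuple[int, int, int]


def color_difference(first: RGB, second: RGB) -> float:
    """Euclidean distance between two RGB colors (an assumed metric)."""
    return math.dist(first, second)


def is_actual_ad_by_color(ad_fill: RGB, page_fill: RGB, threshold: float = 100.0) -> bool:
    """Treat the suspected advertisement as actual when the difference reaches the threshold."""
    return color_difference(ad_fill, page_fill) >= threshold
```

  • A perceptual metric computed in a color space such as CIELAB could be swapped in for `color_difference` without changing the thresholding logic.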
  • If the suspected advertisement is an actual advertisement, a corresponding advertisement interception rule is generated.
  • For example, an interception rule specifying that the content at the top of the home page of URL B is to be blocked is generated.
  • The system can then automatically block the content at the top of the home page of URL B according to the interception rule.
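  • Rule generation and blocking can be sketched as follows. The rule format is not defined in the application, so the `InterceptionRule` type (a page URL plus a named page region), the region-keyed page model, and the address used in the usage note are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class InterceptionRule:
    page_url: str  # page the rule applies to
    region: str    # region of that page to block, e.g. "top"


def generate_rule(page_url: str, region: str, is_actual_ad: bool) -> Optional[InterceptionRule]:
    """Generate a rule only when the suspected advertisement is an actual advertisement."""
    return InterceptionRule(page_url, region) if is_actual_ad else None


def apply_rules(
    page_url: str,
    regions: Dict[str, str],
    rules: Iterable[InterceptionRule],
) -> Dict[str, str]:
    """Return the page regions that remain after blocking those named by matching rules."""
    blocked = {rule.region for rule in rules if rule.page_url == page_url}
    return {name: content for name, content in regions.items() if name not in blocked}
```

  • With a rule generated for the top region of a placeholder page `http://b.example/`, `apply_rules` drops the `"top"` entry for that page and leaves other pages untouched.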
  • A first preferred implementation of the method for intercepting an advertisement in a webpage in the embodiments of the present application includes:
  • The local client can maintain a list of URLs storing one or more preset URLs, such as http://xx.com. The local client sends an access request to the network side according to the URL, the network side returns the webpage data according to the access request, and the local client thereby obtains the webpage data corresponding to the webpage.
  • The webpage data may exist in the source file of the webpage; for example, the attribute identifier of a webpage element in the HTML source file of the webpage data may be Tagname, ID, or class.
  • The feature characters of an advertisement are, for example, "advertising", "AD", "Adv", "Advert", or "Advertisement".
  • For example, the local client sends an access request to the network side according to the web address, the network side returns the webpage data according to the access request, and the local client obtains the webpage data corresponding to the webpage, which includes the following HTML code.
  • Suspected advertisements may be identified by determining whether the value of the attribute identifier class includes a feature character of an advertisement (such as "advertisement", "AD", "Adv", "Advert", or "Advertisement").
  • Here the value of the attribute identifier class is "advertise", which contains a feature character of an advertisement, so the webpage element corresponding to this attribute identifier, that is, the webpage element marked by the above HTML code, is determined to be a suspected advertisement.
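  • The attribute check of this implementation can be sketched with Python's standard `html.parser`. Several things are assumptions: the class and function names are hypothetical, the sample markup in the test is invented because the original HTML snippet is not reproduced here, and the pattern matches "ad" only at the start of a word (all listed feature characters begin with "ad") so that values such as "header" are not flagged. Values such as "address" would still match, which is one reason a second confirmation step is needed.

```python
import re
from html.parser import HTMLParser
from typing import List, Tuple

# "ad" not preceded by a letter; covers AD, Adv, Advert, Advertisement, advertising.
AD_PATTERN = re.compile(r"(?<![a-z])ad", re.IGNORECASE)


class SuspectedAdFinder(HTMLParser):
    """Collect (tag, attribute name, value) for elements that look like advertisements."""

    def __init__(self) -> None:
        super().__init__()
        self.suspected: List[Tuple[str, str, str]] = []

    def handle_starttag(self, tag, attrs):
        # Check the tag name plus the id and class attribute identifiers.
        candidates = [("tagname", tag)]
        candidates += [(name, value or "") for name, value in attrs if name in ("id", "class")]
        for name, value in candidates:
            if AD_PATTERN.search(value):
                self.suspected.append((tag, name, value))
                break  # one matching identifier is enough for this element


def find_suspected_ads(html_source: str) -> List[Tuple[str, str, str]]:
    """Parse the page source and return the suspected advertisement elements."""
    finder = SuspectedAdFinder()
    finder.feed(html_source)
    finder.close()
    return finder.suspected
```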
  • In 205 and 206 above, whether the suspected advertisement is an actual advertisement can also be determined by checking whether the color difference between the fill color of the suspected advertisement portion and the fill color of the webpage reaches a preset threshold: if the color difference reaches the preset threshold, the suspected advertisement is determined to be an actual advertisement; otherwise, it is determined not to be an actual advertisement.
  • If the webpage element marked by the HTML code is an actual advertisement, a rule for intercepting the webpage element marked by the HTML code can be generated.
  • According to the generated rule, the webpage element marked by the HTML code (that is, the actual advertisement) is intercepted in the webpage.
  • In this implementation, the attribute identifier of a webpage element in the source file of the webpage data is obtained, and it is determined whether the value of the attribute identifier includes a feature character of an advertisement; if so, the corresponding webpage element is determined to be a suspected advertisement. Whether the suspected advertisement is an actual advertisement is then determined according to its color histogram change rate, a corresponding advertisement interception rule is generated, and advertisements in the webpage are blocked according to the generated rule. This is more targeted and more accurate, and makes blocking advertisements convenient.
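  • The application does not define how the color histogram change rate is computed. One possible reading, used in the sketch below purely as an assumption, compares the color histograms of the suspected advertisement region captured at two moments and takes half the L1 distance between the normalized histograms, which yields a value between 0 (no change) and 1 (completely different colors). The bin count and the 0.5 threshold are likewise illustrative.

```python
from collections import Counter
from typing import Iterable, Tuple

RGB = Tuple[int, int, int]


def color_histogram(pixels: Iterable[RGB], bins_per_channel: int = 4) -> Counter:
    """Count pixels per quantized (r, g, b) bin."""
    step = 256 // bins_per_channel
    return Counter((r // step, g // step, b // step) for r, g, b in pixels)


def histogram_change_rate(before: Counter, after: Counter) -> float:
    """Half the L1 distance between the two normalized histograms, in [0, 1]."""
    total_before = sum(before.values())
    total_after = sum(after.values())
    if total_before == 0 or total_after == 0:
        return 0.0
    bins = set(before) | set(after)
    l1 = sum(abs(before[b] / total_before - after[b] / total_after) for b in bins)
    return l1 / 2.0


def is_actual_ad_by_histogram(
    before_pixels: Iterable[RGB],
    after_pixels: Iterable[RGB],
    threshold: float = 0.5,  # assumed preset threshold
) -> bool:
    """Treat the suspected advertisement as actual when the change rate reaches the threshold."""
    rate = histogram_change_rate(color_histogram(before_pixels), color_histogram(after_pixels))
    return rate >= threshold
```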
  • A second preferred implementation of the method for intercepting advertisements in a webpage in the embodiments of the present application includes:
  • The local client can maintain a list of URLs storing one or more preset URLs, such as http://m.xx.com. The local client sends an access request to the network side according to the URL, the network side returns the webpage data according to the access request, and the local client thereby obtains the webpage data corresponding to the webpage.
  • 302. Determine, according to the webpage data, whether a window whose size falls within the preset size interval exists at a preset position in the webpage; if so, continue to 303; otherwise, end the current process.
  • The preset position may include the top, the bottom, the left and right sides, and the like.
  • The preset size interval is [30×100, 100×350] pixels, and it can be determined according to the screen size of the terminal.
  • For example, the local client sends an access request to the network side according to the web address, the network side returns the webpage data according to the access request, and the local client obtains the webpage data corresponding to the webpage; the top position in this webpage data has the following HTML window webpage element.
  • The window webpage element is actually 90 pixels high and 320 pixels wide (the same width as the terminal screen) and is located at the top of the page. Therefore, the webpage data corresponding to the window can be considered a suspected advertisement.
  • In 304 and 305 above, whether the suspected advertisement is an actual advertisement can also be determined by checking whether the color difference between the fill color of the suspected advertisement portion and the fill color of the webpage reaches a preset threshold: if the color difference reaches the preset threshold, the suspected advertisement is determined to be an actual advertisement; otherwise, it is determined not to be an actual advertisement.
  • In this implementation, it is determined, according to the webpage data, whether a window whose size falls within the preset size interval exists at a preset position in the webpage; if so, the webpage data corresponding to the window is determined to be a suspected advertisement. Whether the suspected advertisement is an actual advertisement is then determined according to its color histogram change rate, a corresponding advertisement interception rule is generated, and advertisements in the webpage are intercepted according to the generated rule. Advertisements in the fixed advertisement spaces of a webpage can thus be identified in a targeted manner, which makes blocking advertisements easier.
  • A third preferred implementation of the method for intercepting advertisements in a webpage in the embodiments of the present application includes:
  • The local client can maintain a list of URLs storing one or more preset URLs, such as http://wk.xx.com. The local client sends an access request to the network side according to the URL, the network side returns the webpage data according to the access request, and the local client thereby obtains the webpage data corresponding to the webpage.
  • The inventor of the present application found that a general webpage contains many pictures and buttons, whereas an advertisement window contains few pictures, generally one, and few buttons. Therefore, if a full-screen display window that matches the screen size and is placed on the top layer contains no more than the first preset number of pictures and no more than the second preset number of buttons, the webpage data corresponding to that window may be determined to be a suspected advertisement.
  • A full-screen display window placed on the top layer may refer to a full-screen display window whose position attribute is set to the top.
  • The first preset number may take a value in the range [1, 3].
  • The second preset number may take a value in the range [1, 4].
  • For example, the local client sends an access request to the network side according to the address http://wk.xx.com, the network side returns the webpage data according to the access request, and the local client obtains the webpage data corresponding to the webpage. The HTML source file of the webpage data contains an element that satisfies the conditions: one full-screen large picture (the background of the <div>) with two buttons (<a>) placed on it.
  • The webpage data corresponding to the full-screen display window is therefore determined to be a suspected advertisement.
  • In 404 and 405 above, whether the suspected advertisement is an actual advertisement can also be determined by checking whether the color difference between the fill color of the suspected advertisement portion and the fill color of the webpage reaches a preset threshold: if the color difference reaches the preset threshold, the suspected advertisement is determined to be an actual advertisement; otherwise, it is determined not to be an actual advertisement.
  • In this implementation, it is determined, according to the webpage data, whether there is a full-screen display window that matches the screen size and is placed on the top layer and that contains no more than the first preset number of pictures and no more than the second preset number of buttons. If such a window exists, the webpage data corresponding to it is determined to be a suspected advertisement, and whether the suspected advertisement is an actual advertisement is then determined according to its color histogram change rate. A corresponding advertisement interception rule is generated, and advertisements in the webpage are intercepted according to the generated rule. Full-screen window advertisements in a webpage are thus identified in a more targeted and accurate manner, which makes blocking advertisements convenient.
  • A fourth preferred implementation of the method for intercepting advertisements in a webpage in the embodiments of the present application includes:
  • The feature characters of an advertisement are, for example, "advertising", "AD", "Adv", "Advert", or "Advertisement".
  • In 504 and 505 above, whether the suspected advertisement is an actual advertisement can also be determined by checking whether the color difference between the fill color of the suspected advertisement portion and the fill color of the webpage reaches a preset threshold: if the color difference reaches the preset threshold, the suspected advertisement is determined to be an actual advertisement; otherwise, it is determined not to be an actual advertisement.
  • The implementation of the method for intercepting advertisements in a webpage has been described above. The process can be implemented by an apparatus, whose internal structure and functions are introduced below.
  • An apparatus for intercepting advertisements in a webpage in the embodiments of the present application includes: an obtaining module 601, an analyzing module 602, a determining module 603, a generating module 604, and an intercepting module 605.
  • The obtaining module 601 is configured to obtain webpage data corresponding to the preset webpage.
  • The analyzing module 602 is configured to analyze the webpage data to obtain a suspected advertisement.
  • The determining module 603 is configured to determine whether the suspected advertisement is an actual advertisement.
  • The generating module 604 is configured to generate a corresponding advertisement interception rule when the suspected advertisement is an actual advertisement.
  • The intercepting module 605 is configured to block advertisements in the webpage according to the generated advertisement interception rule.
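  • The way the five modules cooperate can be sketched as a single function whose parameters stand in for modules 601 to 605. The function name, the callable signatures, and the representation of pages, suspects, and rules as strings are assumptions made only for illustration.

```python
from typing import Callable, Iterable, List


def intercept_ads(
    url: str,
    obtain: Callable[[str], str],              # obtaining module 601
    analyze: Callable[[str], Iterable[str]],   # analyzing module 602
    is_actual_ad: Callable[[str], bool],       # determining module 603
    generate_rule: Callable[[str, str], str],  # generating module 604
    block: Callable[[str, List[str]], str],    # intercepting module 605
) -> str:
    """Run the modules in order and return the page with advertisements blocked."""
    page = obtain(url)
    suspects = analyze(page)
    actual_ads = [suspect for suspect in suspects if is_actual_ad(suspect)]
    rules = [generate_rule(url, ad) for ad in actual_ads]
    return block(page, rules)
```

  • Passing the modules in as callables keeps each detection manner (attribute identifier, window size, full-screen window, URL) interchangeable behind the same analyzing interface.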
  • The analyzing module 602 is configured to obtain an attribute identifier of a webpage element in the source file of the webpage data, determine whether the value of the attribute identifier includes a feature character of an advertisement, and determine a webpage element whose attribute identifier includes the feature character as a suspected advertisement.
  • The analyzing module 602 is configured to determine, according to the webpage data, whether a window whose size falls within the preset size interval exists at a preset location in the webpage; when such a window exists, the webpage data corresponding to the window is determined to be a suspected advertisement.
  • The analyzing module 602 is configured to determine, according to the webpage data, whether there is a full-screen display window that matches the screen size and is placed on the top layer, the full-screen display window containing no more than the first preset number of pictures and no more than the second preset number of buttons; when such a window exists, the webpage data corresponding to the full-screen display window is determined to be a suspected advertisement.
  • The analyzing module 602 is configured to determine whether the uniform resource locator (URL) of a window webpage element in the webpage data is a URL containing a feature character of an advertisement; when it is, the webpage data corresponding to the window webpage element is determined to be a suspected advertisement.
  • The determining module 603 is configured to determine that the suspected advertisement is an actual advertisement if the color difference between the fill color of the suspected advertisement portion and the fill color of the webpage reaches a preset threshold; or to determine whether the suspected advertisement is an actual advertisement according to its color histogram change rate, where the suspected advertisement is determined to be an actual advertisement when the rate is greater than or equal to the preset threshold.
  • An embodiment of the present application provides a terminal, where the terminal includes:
  • a processor, a memory, a communication interface, and a bus;
  • the processor, the memory, and the communication interface are connected by the bus and communicate with each other;
  • the memory stores executable program code;
  • the processor runs a program corresponding to the executable program code by reading the executable program code stored in the memory, in order to perform the following:
  • when the suspected advertisement is an actual advertisement, generating a corresponding advertisement interception rule;
  • blocking advertisements in the webpage according to the generated advertisement interception rule.
  • An embodiment of the present application provides an application program that, at runtime, executes the method for intercepting advertisements in a webpage provided by the embodiments of the present application.
  • The method for intercepting advertisements in a webpage includes:
  • when the suspected advertisement is an actual advertisement, generating a corresponding advertisement interception rule;
  • blocking advertisements in the webpage according to the generated advertisement interception rule.
  • An embodiment of the present application provides a storage medium for storing an application program, the application program being used to execute the method for intercepting advertisements in a webpage provided by the embodiments of the present application.
  • The method for intercepting advertisements in a webpage includes:
  • when the suspected advertisement is an actual advertisement, generating a corresponding advertisement interception rule;
  • blocking advertisements in the webpage according to the generated advertisement interception rule.
  • In the embodiments of the present application, a suspected advertisement is obtained by analyzing the webpage data corresponding to a preset URL; when the suspected advertisement is an actual advertisement, a corresponding advertisement interception rule is generated, and advertisements in the webpage are intercepted according to that rule.
  • This automatically filters out suspected advertisements, quickly identifies advertisements, and automatically generates interception rules, which makes it easier to block advertisements.
  • The embodiments of the present application identify advertisements in a webpage in a more targeted and more accurate manner.
  • Embodiments of the present application can be provided as a method, a system, or a computer program product.
  • The present application can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware.
  • The present application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage and optical storage) that contain computer-usable program code.
  • The computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising an instruction device, the instruction device implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing device, such that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Development Economics (AREA)
  • Computer Hardware Design (AREA)
  • Accounting & Taxation (AREA)
  • Economics (AREA)
  • Finance (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Transfer Between Computers (AREA)
  • Data Mining & Analysis (AREA)

Abstract

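A method and apparatus for intercepting advertisements in a webpage, used to automatically screen out suspected advertisements, quickly identify advertisements, and automatically generate interception rules, thereby making it easier to block advertisements. The method includes: obtaining webpage data corresponding to a preset URL (101); analyzing the webpage data to obtain a suspected advertisement (102); determining whether the suspected advertisement is an actual advertisement (103); when the suspected advertisement is an actual advertisement, generating a corresponding advertisement interception rule (104); and intercepting advertisements in the webpage according to the generated advertisement interception rule (105).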

Description

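Method and apparatus for intercepting advertisements in a webpage
This application claims priority to Chinese Patent Application No. 201410124030.3, filed with the Chinese Patent Office on March 28, 2014 and entitled "Method and apparatus for intercepting advertisements in a webpage", which is incorporated herein by reference in its entirety.
Technical Field
The present application relates to the field of webpage recognition technology, and in particular to a method and apparatus for intercepting advertisements in a webpage.
Background
With the continuous progress of Internet technology and the growing number of Internet users, the e-commerce industry has flourished, and advertisements that used to be delivered through television, building displays, and similar channels are increasingly moving to the Internet.
Advertisements in webpages, such as advertisements at the top of a page, advertisements in floating windows on both sides, and advertisements in fixed advertising slots, are generally published by the website operator or by affiliated customers, so the website operator does not block them. Such advertisements nevertheless disturb users, and if a user browses webpages on a mobile terminal such as a mobile phone, they also consume a certain amount of data traffic.
Therefore, the inventors of the present application found that how to identify advertisements in webpages, so as to make it easier to block them, has become a technical problem that urgently needs to be solved.
Summary
To overcome the problems in the related art, embodiments of the present application provide a method and apparatus for intercepting advertisements in a webpage, used to automatically screen out suspected advertisements, quickly identify advertisements, and automatically generate interception rules, thereby making it easier to block advertisements.
In one aspect, an embodiment of the present application provides a method for intercepting advertisements in a webpage, including:
obtaining webpage data corresponding to a preset URL;
analyzing the webpage data to determine a suspected advertisement;
determining whether the suspected advertisement is an actual advertisement;
when the suspected advertisement is an actual advertisement, generating a corresponding advertisement interception rule;
intercepting advertisements in the webpage according to the generated advertisement interception rule.
According to a specific implementation of the present application, the step of analyzing the webpage data to determine a suspected advertisement includes: obtaining attribute identifiers of webpage elements in the source file of the webpage data; determining whether the value of an attribute identifier contains a feature character of an advertisement; and determining the webpage element corresponding to an attribute identifier that contains a feature character of an advertisement as a suspected advertisement.
According to a specific implementation of the present application, the step of analyzing the webpage data to determine a suspected advertisement includes: determining, according to the webpage data, whether a window within a preset size interval exists at a preset position in the webpage; and if so, determining the webpage data corresponding to the window as a suspected advertisement.
According to a specific implementation of the present application, the step of analyzing the webpage data to determine a suspected advertisement includes: determining, according to the webpage data, whether there is a full-screen window that matches the screen size and is placed on the top layer, the full-screen window containing no more than a first preset number of pictures and no more than a second preset number of buttons; and if so, determining the webpage data corresponding to the full-screen window as a suspected advertisement.
According to a specific implementation of the present application, the step of analyzing the webpage data to determine a suspected advertisement includes: determining whether the uniform resource locator (URL) of a window webpage element in the webpage data is a URL that contains a feature character of an advertisement; and if so, determining the webpage data corresponding to the window webpage element as a suspected advertisement.
According to a specific implementation of the present application, the step of determining whether the suspected advertisement is an actual advertisement includes: if the color difference between the fill color of the suspected advertisement portion and the fill color of the webpage reaches a preset threshold, determining that the suspected advertisement is an actual advertisement; or determining whether the suspected advertisement is an actual advertisement according to its color histogram change rate, and if the color histogram change rate of the suspected advertisement is greater than or equal to a preset threshold, determining that the suspected advertisement is an actual advertisement.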
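In another aspect, an embodiment of the present application provides an apparatus for intercepting advertisements in a webpage, including:
an obtaining module, configured to obtain webpage data corresponding to a preset URL;
an analysis module, configured to analyze the webpage data to determine a suspected advertisement;
a determining module, configured to determine whether the suspected advertisement is an actual advertisement;
a generating module, configured to generate a corresponding advertisement interception rule when the suspected advertisement is an actual advertisement;
an intercepting module, configured to intercept advertisements in the webpage according to the generated advertisement interception rule.
According to a specific implementation of the present application, the analysis module is configured to obtain attribute identifiers of webpage elements in the source file of the webpage data; determine whether the value of an attribute identifier contains a feature character of an advertisement; and determine the webpage element corresponding to an attribute identifier that contains a feature character of an advertisement as a suspected advertisement.
According to a specific implementation of the present application, the analysis module is configured to determine, according to the webpage data, whether a window within a preset size interval exists at a preset position in the webpage; and if so, determine the webpage data corresponding to the window as a suspected advertisement.
According to a specific implementation of the present application, the analysis module is configured to determine, according to the webpage data, whether there is a full-screen window that matches the screen size and is placed on the top layer, the full-screen window containing no more than a first preset number of pictures and no more than a second preset number of buttons; and if so, determine the webpage data corresponding to the full-screen window as a suspected advertisement.
According to a specific implementation of the present application, the analysis module is configured to determine whether the uniform resource locator (URL) of a window webpage element in the webpage data is a URL that contains a feature character of an advertisement; and if so, determine the webpage data corresponding to the window webpage element as a suspected advertisement.
According to a specific implementation of the present application, the determining module is configured to determine that the suspected advertisement is an actual advertisement if the color difference between the fill color of the suspected advertisement portion and the fill color of the webpage reaches a preset threshold; or to determine whether the suspected advertisement is an actual advertisement according to its color histogram change rate, and if the color histogram change rate of the suspected advertisement is greater than or equal to a preset threshold, determine that the suspected advertisement is an actual advertisement.
To achieve the above objective, an embodiment of the present application further discloses a terminal, the terminal including:
a processor, a memory, a communication interface, and a bus;
the processor, the memory, and the communication interface are connected by the bus and communicate with each other;
the memory stores executable program code;
the processor runs a program corresponding to the executable program code by reading the executable program code stored in the memory, in order to:
obtain webpage data corresponding to a preset URL;
analyze the webpage data to obtain a suspected advertisement;
determine whether the suspected advertisement is an actual advertisement;
when the suspected advertisement is an actual advertisement, generate a corresponding advertisement interception rule;
intercept advertisements in the webpage according to the generated advertisement interception rule.
An embodiment of the present application further discloses an application program that, at runtime, executes the method for intercepting advertisements in a webpage described in the embodiments of the present application.
An embodiment of the present application further discloses a storage medium for storing an application program, the application program being used to execute the method for intercepting advertisements in a webpage described in the embodiments of the present application.
The technical solutions provided by the embodiments of the present application may have the following beneficial effects: webpage data corresponding to a preset URL is analyzed to obtain a suspected advertisement; when the suspected advertisement is an actual advertisement, a corresponding advertisement interception rule is generated, and advertisements in the webpage are intercepted according to the generated rule. This automatically screens out suspected advertisements, quickly identifies advertisements, and automatically generates interception rules, thereby making it easier to block advertisements.
Other features and advantages of the present application will be set forth in the description that follows, and in part will become apparent from the description or be understood by practicing the present application. The objectives and other advantages of the present application can be realized and obtained by the structures particularly pointed out in the written description, the claims, and the accompanying drawings. It should be understood that the above general description and the following detailed description are merely exemplary and do not limit the present application.
The technical solutions of the present application are described in further detail below with reference to the accompanying drawings and embodiments.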
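Brief Description of the Drawings
To describe the technical solutions of the embodiments of the present application and of the prior art more clearly, the accompanying drawings needed for the embodiments and the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is the main flowchart of a method for intercepting advertisements in a webpage according to an embodiment of the present application;
FIG. 2 is a flowchart of a first preferred implementation of a method for intercepting advertisements in a webpage according to an embodiment of the present application;
FIG. 3 is a flowchart of a second preferred implementation of a method for intercepting advertisements in a webpage according to an embodiment of the present application;
FIG. 4 is a flowchart of a third preferred implementation of a method for intercepting advertisements in a webpage according to an embodiment of the present application;
FIG. 5 is a flowchart of a fourth preferred implementation of a method for intercepting advertisements in a webpage according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of an apparatus for intercepting advertisements in a webpage according to an embodiment of the present application.
Detailed Description
To make the objectives, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. Obviously, the described embodiments are only some rather than all of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.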
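In the embodiments of the present application, webpage data corresponding to a preset URL is analyzed to determine a suspected advertisement; when the suspected advertisement is an actual advertisement, a corresponding advertisement interception rule is generated, and advertisements in the webpage are intercepted according to the generated rule. This automatically screens out suspected advertisements, quickly identifies advertisements, and automatically generates interception rules, thereby making it easier to block advertisements. The embodiments of the present application intercept advertisements in webpages in a more targeted and more accurate manner.
Referring to FIG. 1, the main flow of a method for intercepting advertisements in a webpage according to an embodiment of the present application includes:
101. Obtain webpage data corresponding to a preset URL.
The local client may send an access request to the network side according to the preset URL; the network side returns webpage data in response to the access request, and the local client thereby obtains the webpage data.
To make URLs easy to look up, the local client may maintain a URL list that stores one or more preset URLs. The list may be updated manually or automatically by the system. The webpage data may reside in the source file of the webpage, which may be a HyperText Markup Language (HTML) source file, an Extensible HyperText Markup Language (XHTML) source file, or the like.
102. Analyze the webpage data to obtain a suspected advertisement.
Preferably, step 102 may be implemented in any of the following ways:
In mode A1, attribute identifiers of webpage elements in the source file of the webpage data are obtained; it is determined whether the value of an attribute identifier contains a feature character of an advertisement; if so, the corresponding webpage element is determined to be a suspected advertisement. For example, attribute identifiers of webpage elements are obtained from the HTML source file of the webpage data, where an attribute identifier may be a tag name (Tagname), an identity (ID), or a class, for instance Tagname="XXX-AD", ID="XX-BJ", class="广告" (the Chinese word for "advertisement"). It is then determined whether the value of each attribute identifier contains a feature character of an advertisement, such as "广告", "AD", "Adv", "Advert", or "Advertisement"; if so, the corresponding webpage element is determined to be a suspected advertisement. Here the value of Tagname contains "AD" and the value of class contains "广告", so the webpage elements corresponding to Tagname and class are determined to be suspected advertisements.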
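The attribute check of mode A1 can be sketched in Python roughly as follows. This is only an illustrative sketch, not the applicant's implementation: the names `AD_FEATURES`, `CHECKED_ATTRS`, and `find_suspected_ads` are hypothetical, matching is assumed to be a case-insensitive substring test (which is what lets a value such as `advertise` match), and only `id` and `class` are inspected because a case-insensitive "ad" would also match ordinary tag names such as `head`.

```python
from html.parser import HTMLParser

# Feature characters listed in the description; lowercase matching is an assumption.
AD_FEATURES = ("广告", "ad", "adv", "advert", "advertisement")
# Attribute identifiers inspected in this sketch (hypothetical choice).
CHECKED_ATTRS = ("id", "class")


def contains_ad_feature(value):
    """Return True if the attribute value contains any feature character."""
    lowered = value.lower()
    return any(feature in lowered for feature in AD_FEATURES)


class SuspectedAdFinder(HTMLParser):
    """Collects (tag, attribute name, attribute value) for suspected ads."""

    def __init__(self):
        super().__init__()
        self.suspects = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in CHECKED_ATTRS and value and contains_ad_feature(value):
                self.suspects.append((tag, name, value))


def find_suspected_ads(html):
    finder = SuspectedAdFinder()
    finder.feed(html)
    finder.close()
    return finder.suspects
```

A plain substring test is deliberately permissive and will flag some ordinary elements; in the described flow such false positives are expected to be filtered out by the later check of whether a suspected advertisement is an actual advertisement.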
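In mode A2, it is determined according to the webpage data whether a window within a preset size interval exists at a preset position in the webpage; if so, the webpage data corresponding to the window is determined to be a suspected advertisement. For example, the preset position may include the top, the bottom, or the left and right sides of the page. The preset size interval may be, for example, [30×100, 100×350] pixels, and may be determined according to the screen size of the terminal. In this way, advertisements in fixed advertising slots of a webpage can be identified in a targeted manner.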
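A possible Python sketch of the position and size test in mode A2 is shown below. Several things here are assumptions rather than facts from the description: the interval [30×100, 100×350] is read as a height of 30 to 100 pixels and a width of 100 to 350 pixels, the `Window` type and the 10-pixel `margin` are hypothetical, and a window counts as being at the side only if it is narrower than half the page.

```python
from dataclasses import dataclass


@dataclass
class Window:
    """Hypothetical geometry of a window element, in pixels."""
    x: int
    y: int
    width: int
    height: int


def in_preset_size(win, min_h=30, max_h=100, min_w=100, max_w=350):
    """Assumed reading of the preset size interval [30x100, 100x350]."""
    return min_h <= win.height <= max_h and min_w <= win.width <= max_w


def at_preset_position(win, page_w, page_h, margin=10):
    """True if the window sits at the top, bottom, or a side of the page."""
    at_top = win.y <= margin
    at_bottom = win.y + win.height >= page_h - margin
    narrow = win.width < page_w / 2  # side slots are assumed to be narrow
    at_left = narrow and win.x <= margin
    at_right = narrow and win.x + win.width >= page_w - margin
    return at_top or at_bottom or at_left or at_right


def is_suspected_fixed_slot_ad(win, page_w, page_h):
    return in_preset_size(win) and at_preset_position(win, page_w, page_h)
```

For instance, a 320×90 window at the very top of a 320-pixel-wide page passes both tests, while the same window in the middle of the page does not.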
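In mode A3, it is determined according to the webpage data whether there is a full-screen window that matches the screen size and is placed on the top layer, and that contains no more than a first preset number of pictures and no more than a second preset number of buttons; if a full-screen window meeting these conditions exists, the webpage data corresponding to it is determined to be a suspected advertisement. A full-screen window may be an ordinary webpage or an advertisement. The inventors of the present application found that an ordinary webpage usually contains many pictures and many buttons, whereas an advertisement window contains few pictures (typically one) and few buttons. The first preset number may therefore take a value in the range [1, 3], and the second preset number a value in the range [1, 4].
If no full-screen window exists, or the full-screen window is not placed on the top layer, or the full-screen window contains no pictures or buttons, or the number of pictures or buttons in it exceeds the preset number, the webpage data corresponding to the window is determined not to be a suspected advertisement.
Here, a full-screen window placed on the top layer may be one whose position attribute is set to topmost.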
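The conditions of mode A3 can be written down as a small predicate. The following Python sketch assumes that the window geometry, top-layer flag, and picture and button counts have already been extracted from the webpage data; the `Overlay` type is hypothetical, and the defaults of 3 pictures and 4 buttons are simply the upper ends of the ranges given above.

```python
from dataclasses import dataclass


@dataclass
class Overlay:
    """Hypothetical summary of a window extracted from the webpage data."""
    width: int
    height: int
    on_top: bool        # position attribute is "topmost"
    image_count: int
    button_count: int


def is_suspected_fullscreen_ad(win, screen_w, screen_h,
                               max_images=3, max_buttons=4):
    """Mode A3: full-screen, top-layer window with few pictures and buttons."""
    fills_screen = win.width == screen_w and win.height == screen_h
    # At least one picture and one button must exist, but not too many.
    few_images = 1 <= win.image_count <= max_images
    few_buttons = 1 <= win.button_count <= max_buttons
    return fills_screen and win.on_top and few_images and few_buttons
```

Requiring at least one picture and one button follows the statement that a full-screen window without pictures or buttons is not a suspected advertisement.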
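In mode A4, it is determined according to the webpage data whether the uniform resource locator (URL) of a window webpage element in the webpage contains a feature character of an advertisement; if so, the webpage data corresponding to that window webpage element is determined to be a suspected advertisement. A window webpage element usually occupies only part of the webpage and is distinct from the full-screen window of mode A3.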
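A minimal Python sketch of the URL test in mode A4 follows. It assumes a case-insensitive substring match on the percent-decoded URL, and the example URLs in the usage note are invented for illustration; `url_has_ad_feature` and `suspected_window_elements` are hypothetical names.

```python
from urllib.parse import unquote

# Feature characters from the description; lowercase matching is an assumption.
AD_FEATURES = ("广告", "ad", "adv", "advert", "advertisement")


def url_has_ad_feature(url):
    """True if the decoded URL contains any advertisement feature character."""
    decoded = unquote(url).lower()
    return any(feature in decoded for feature in AD_FEATURES)


def suspected_window_elements(windows):
    """windows: iterable of (element_id, url) pairs for window elements."""
    return [element_id for element_id, url in windows
            if url_has_ad_feature(url)]
```

With this sketch, a window element loading `http://ads.example.com/banner.html` would be flagged and one loading `http://example.com/news/index.html` would not. As with mode A1, the substring test is coarse and relies on the later actual-advertisement check to discard false positives.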
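103. Determine whether the suspected advertisement is an actual advertisement; if so, proceed to 104; otherwise, end the current flow.
Preferably, step 103 may determine whether the suspected advertisement is an actual advertisement by detecting a color difference. For example, if the fill color of the suspected advertisement portion differs noticeably from the fill color of the webpage, that is, if the color difference reaches a preset threshold, the suspected advertisement is determined to be an actual advertisement. Alternatively, the determination may be based on changes in the color histogram: the color histogram change rate of the suspected advertisement is evaluated, and when it is greater than or equal to a preset threshold, the suspected advertisement is determined to be an actual advertisement. Other automatic recognition methods may of course also be used.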
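The description does not define how the color difference or the color histogram change rate is computed, so the Python sketch below fills both in with explicit assumptions: the color difference is taken as the Euclidean distance between two RGB fill colors, and the change rate as half the L1 distance between normalized, coarsely binned RGB histograms of two snapshots of the suspected region (a value between 0 and 1). The thresholds are placeholders, and all function names are hypothetical.

```python
import math
from collections import Counter


def color_difference(rgb_a, rgb_b):
    """Assumed metric: Euclidean distance between two RGB triples."""
    return math.dist(rgb_a, rgb_b)


def color_histogram(pixels, bins_per_channel=4):
    """Normalized histogram over coarse RGB buckets; pixels must be non-empty."""
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = len(pixels)
    return {bucket: n / total for bucket, n in counts.items()}


def histogram_change_rate(pixels_before, pixels_after):
    """Assumed definition: half the L1 distance between the two histograms."""
    h1 = color_histogram(pixels_before)
    h2 = color_histogram(pixels_after)
    buckets = set(h1) | set(h2)
    return 0.5 * sum(abs(h1.get(b, 0.0) - h2.get(b, 0.0)) for b in buckets)


def is_actual_ad_by_color(ad_fill, page_fill, threshold=100.0):
    return color_difference(ad_fill, page_fill) >= threshold


def is_actual_ad_by_histogram(pixels_before, pixels_after, threshold=0.5):
    return histogram_change_rate(pixels_before, pixels_after) >= threshold
```

The two predicates are kept separate because the text presents them as alternatives; either one could play the role of the check in step 103.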
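104. When the suspected advertisement is an actual advertisement, generate a corresponding advertisement interception rule.
Specifically, when the suspected advertisement is an actual advertisement, a corresponding advertisement interception rule is generated.
105. Intercept advertisements in the webpage according to the generated advertisement interception rule.
For example, if the suspected advertisement at the top of the home page of website B is determined to be an actual advertisement, an interception rule is generated that blocks the content at the top of the home page of website B. When a user opens the home page of website B, the system can automatically block that content according to the rule.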
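The description leaves the format of an interception rule open. One possible representation, given here purely as an assumption, is a (page URL, attribute name, attribute value) triple that identifies the confirmed advertisement element; applying the rules then means dropping matching elements when that page is opened. `BlockRule`, `generate_rule`, and `apply_rules` are hypothetical names, and the URL in the usage note is invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockRule:
    """Hypothetical rule: block elements on page_url whose attribute matches."""
    page_url: str
    attr_name: str
    attr_value: str


def generate_rule(page_url, attr_name, attr_value):
    """Step 104: build a rule for an element confirmed as an actual ad."""
    return BlockRule(page_url, attr_name, attr_value)


def apply_rules(page_url, elements, rules):
    """Step 105: return the elements of the page that are not blocked.

    elements: list of dicts mapping attribute names to values.
    """
    active = [rule for rule in rules if rule.page_url == page_url]
    return [
        element for element in elements
        if not any(element.get(rule.attr_name) == rule.attr_value
                   for rule in active)
    ]
```

For example, a rule generated for `class="advertise"` on a hypothetical `http://b.example/` would remove that element when the page is opened, while leaving the same class untouched on other pages.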
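The above describes several ways of implementing each step of the embodiment shown in FIG. 1. The implementation process is described in detail below through several embodiments.
Referring to FIG. 2, a first preferred implementation of a method for intercepting advertisements in a webpage according to an embodiment of the present application includes:
201. Obtain webpage data corresponding to a preset URL.
To make URLs easy to look up, the local client may maintain a URL list that stores one or more preset URLs, for example http://xx.com. The local client may send an access request to the network side according to this URL; the network side returns webpage data in response to the access request, and the local client thereby obtains the webpage data corresponding to the URL.
202. Obtain the attribute identifiers of webpage elements in the source file of the webpage data.
The webpage data may reside in the source file of the webpage. For example, the attribute identifiers of webpage elements are obtained from the HTML source file of the webpage data; an attribute identifier may be Tagname, ID, class, or the like.
203. Determine whether the value of an attribute identifier contains a feature character of an advertisement; if so, proceed to 204; otherwise, end the current flow.
Feature characters of an advertisement include, for example, "广告" (Chinese for "advertisement"), "AD", "Adv", "Advert", and "Advertisement".
204. Determine the webpage element corresponding to the attribute identifier that contains a feature character of an advertisement as a suspected advertisement.
For example, the local client sends an access request to the network side according to the URL, the network side returns webpage data in response, and the local client thereby obtains the webpage data corresponding to the URL, which contains the following HTML code. With the technical solution provided by this embodiment, the suspected advertisement can be identified by determining whether the value of the attribute identifier class contains a feature character of an advertisement (such as "广告", "AD", "Adv", "Advert", or "Advertisement").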
<h3 class="advertise"><span>
<a href="http://sax.xx.com.cn/click?type=3&amp;
t=MjAxNC0wMy0xMSAxMDo0MToyOAkyMjAuMTgxLjQyLjE5NQlhMjhmYjEzZDQ4NTE5NDEwOGIzMjQwZjYwMTIwNTI5OQlodHRwOi8vc2luYS5jbj9yZWY9aHR0cCUzQSUyRiUyRnd3dy5zaW5hLmNvbSUyRiZmcm9tPXRvd2FwJnZ0PTQJUERQUzAwMDAwMDAzNzg5MAllNzMwNTk1Ny0wYTBiLTQyYjktODQyYS1kNWFjMjAxMmNiZjkJQ0E1NzgzREY2MTE4CUNBNTc4M0RGNjExOAktCS0JMzAyMDAwfDMwMjAwMAlDQTU3ODNERjYxMTgJTkIxMzEwMDEyNwkJQ0E1NzgzREY2MTE4CVdBUAktCTI3CS0JLQktCS0JLQktCS0JLQky&amp;url=http%3a%2f%2fds.xi-ge.net&amp;pos=108&amp;vt=4">流落民间宫廷滋补秘方(必看)</a>
<img src="http://sax.xx.com.cn/view?type=3&amp;
t=MjAxNC0wMy0xMSAxMDo0MToyOAkyMjAuMTgxLjQyLjE5NQlhMjhmYjEzZDQ4NTE5NDEwOGIzMjQwZjYwMTIwNTI5OQlodHRwOi8vc2luYS5jbj9yZWY9aHR0cCUzQSUyRiUyRnd3dy5zaW5hLmNvbSUyRiZmcm9tPXRvd2FwJnZ0PTQJUERQUzAwMDAwMDAzNzg5MAllNzMwNTk1Ny0wYTBiLTQyYjktODQyYS1kNWFjMjAxMmNiZjkJQ0E1NzgzREY2MTE4CUNBNTc4M0RGNjExOAktCS0JMzAyMDAwfDMwMjAwMAlDQTU3ODNERjYxMTgJTkIxMzEwMDEyNwkJQ0E1NzgzREY2MTE4CVdBUAktCTI3CS0JLQktCS0JLQktCS0JLQky" alt="pv_monitor" style="display:none;"></span>
</h3>
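The check shows that the attribute identifier is class="advertise", which contains a feature character of an advertisement, so the webpage element corresponding to the attribute identifier class, namely the element marked up by the HTML code above, is determined to be a suspected advertisement.
205. Determine the color histogram change rate of the suspected advertisement.
206. Determine whether the color histogram change rate of the suspected advertisement is greater than or equal to a preset threshold; if so, proceed to 207; otherwise, end the current flow.
Preferably, the determination made in steps 205 and 206 of whether the suspected advertisement is an actual advertisement may instead be made by checking whether the color difference between the fill color of the suspected advertisement portion and the fill color of the webpage reaches a preset threshold: if it does, the suspected advertisement is determined to be an actual advertisement; if it does not, the suspected advertisement is determined not to be an actual advertisement.
207. Determine that the suspected advertisement is an actual advertisement, and generate a corresponding advertisement interception rule.
208. Intercept advertisements in the webpage according to the generated advertisement interception rule.
If the color histogram change rate of the webpage element marked up by the HTML code above is greater than or equal to the preset threshold, that element is an actual advertisement, and a rule for intercepting it is generated; the element (that is, the actual advertisement) can then be intercepted in the webpage according to the generated rule.
In this embodiment, the attribute identifiers of webpage elements in the source file of the webpage data are obtained, and it is determined whether the value of an attribute identifier contains a feature character of an advertisement; if so, the corresponding webpage element is determined to be a suspected advertisement, and whether the suspected advertisement is an actual advertisement is further determined according to its color histogram change rate, so that a corresponding advertisement interception rule is generated and advertisements in the webpage are intercepted according to it. This is more targeted and more accurate, and makes it easier to block advertisements.
Referring to FIG. 3, a second preferred implementation of a method for intercepting advertisements in a webpage according to an embodiment of the present application includes:
301. Obtain webpage data corresponding to a preset URL.
To make URLs easy to look up, the local client may maintain a URL list that stores one or more preset URLs, for example http://m.xx.com. The local client may send an access request to the network side according to this URL; the network side returns webpage data in response to the access request, and the local client thereby obtains the webpage data corresponding to the URL.
302. Determine, according to the webpage data, whether a window within a preset size interval exists at a preset position in the webpage; if so, proceed to 303; otherwise, end the current flow.
The preset position may include the top, the bottom, or the left and right sides of the page. The preset size interval may be, for example, [30×100, 100×350] pixels, and may be determined according to the screen size of the terminal.
303. Determine the webpage data corresponding to the window as a suspected advertisement.
For example, the local client sends an access request to the network side according to the URL, the network side returns webpage data in response, and the local client thereby obtains the webpage data corresponding to the URL. At the top of that webpage data there is the following HTML window webpage element. The check shows that the window webpage element has an actual height of 90 pixels and a width of 320 pixels (the same width as the terminal screen) and is located at the top of the page, so the webpage data corresponding to the window can be regarded as a suspected advertisement.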
Figure PCTCN2015072515-appb-000001
Figure PCTCN2015072515-appb-000002
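304. Determine the color histogram change rate of the suspected advertisement.
305. Determine whether the color histogram change rate of the suspected advertisement is greater than or equal to a preset threshold; if so, proceed to 306; otherwise, end the current flow.
Preferably, the determination made in steps 304 and 305 of whether the suspected advertisement is an actual advertisement may instead be made by checking whether the color difference between the fill color of the suspected advertisement portion and the fill color of the webpage reaches a preset threshold: if it does, the suspected advertisement is determined to be an actual advertisement; if it does not, the suspected advertisement is determined not to be an actual advertisement.
306. Determine that the suspected advertisement is an actual advertisement, and generate a corresponding advertisement interception rule.
307. Intercept advertisements in the webpage according to the generated advertisement interception rule.
In this embodiment, it is determined according to the webpage data whether a window within a preset size interval exists at a preset position in the webpage; if so, the webpage data corresponding to the window is determined to be a suspected advertisement, and whether the suspected advertisement is an actual advertisement is further determined according to its color histogram change rate, so that a corresponding advertisement interception rule is generated and advertisements in the webpage are intercepted according to it. In this way, advertisements in fixed advertising slots of a webpage can be identified in a targeted manner, making it easier to block advertisements.
Referring to FIG. 4, a third preferred implementation of a method for intercepting advertisements in a webpage according to an embodiment of the present application includes:
401. Obtain webpage data corresponding to a preset URL.
To make URLs easy to look up, the local client may maintain a URL list that stores one or more preset URLs, for example http://wk.xx.com. The local client may send an access request to the network side according to this URL; the network side returns webpage data in response to the access request, and the local client thereby obtains the webpage data corresponding to the URL.
402. Determine, according to the webpage data, whether there is a full-screen window that matches the screen size and is placed on the top layer, and that contains no more than a first preset number of pictures and no more than a second preset number of buttons; if a full-screen window meeting these conditions exists, proceed to 403; otherwise, end the current flow.
The inventors of the present application found that an ordinary webpage usually contains many pictures and many buttons, whereas an advertisement window contains few pictures (typically one) and few buttons. Therefore, when a full-screen window that matches the screen size and is placed on the top layer contains no more than the first preset number of pictures and no more than the second preset number of buttons, the webpage data corresponding to that full-screen window can be determined to be a suspected advertisement.
Here, a full-screen window placed on the top layer may be one whose position attribute is set to topmost. The first preset number may take a value in the range [1, 3], and the second preset number a value in the range [1, 4].
403. Determine the webpage data corresponding to the full-screen window as a suspected advertisement.
For example, the local client sends an access request to the network side according to the URL http://wk.xx.com, the network side returns webpage data in response, and the local client thereby obtains the webpage data corresponding to the URL. The HTML source file of the webpage data contains the following element, which satisfies the condition of one large full-screen picture (the background of the <div>) with two buttons (<a>) placed on it, labeled "Download now" and "Maybe later" in Chinese. With the technical solution provided by this embodiment, the webpage data corresponding to this full-screen window is determined to be a suspected advertisement.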
<div id="h_native" class="yd_na" style="height:568px;">
  <div class="YDna" id="nativeRcmd" style="background-image:url(http://img.xx.com/img/iknow/wenku/1136x640.jpg);background-size:320px auto;">
    <div class="btnCon" style="padding-top:150px">
      <a class="dlrn" href="http://yuedu.xx.com/apps?fr=1024" bind-fun="closeForever">立即下载</a>
      <a class="downloadLater" bind-fun="closeDay">以后再说</a>
    </div>
  </div>
</div>
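404. Determine the color histogram change rate of the suspected advertisement.
405. Determine whether the color histogram change rate of the suspected advertisement is greater than or equal to a preset threshold; if so, proceed to 406; otherwise, end the current flow.
Preferably, the determination made in steps 404 and 405 of whether the suspected advertisement is an actual advertisement may instead be made by checking whether the color difference between the fill color of the suspected advertisement portion and the fill color of the webpage reaches a preset threshold: if it does, the suspected advertisement is determined to be an actual advertisement; if it does not, the suspected advertisement is determined not to be an actual advertisement.
406. Determine that the suspected advertisement is an actual advertisement, and generate a corresponding advertisement interception rule.
407. Intercept advertisements in the webpage according to the generated advertisement interception rule.
In this embodiment, it is determined according to the webpage data whether there is a full-screen window that matches the screen size and is placed on the top layer, and that contains no more than a first preset number of pictures and no more than a second preset number of buttons; if a full-screen window meeting these conditions exists, the webpage data corresponding to it is determined to be a suspected advertisement, and whether the suspected advertisement is an actual advertisement is further determined according to its color histogram change rate, so that a corresponding advertisement interception rule is generated and advertisements in the webpage are intercepted according to it. This identifies full-screen window advertisements in webpages in a more targeted and more accurate manner and makes it easier to block advertisements.
Referring to FIG. 5, a fourth preferred implementation of a method for intercepting advertisements in a webpage according to an embodiment of the present application includes:
501. Obtain webpage data corresponding to a preset URL.
502. Determine, according to the webpage data, whether the URL of a window webpage element in the webpage is a URL that contains a feature character of an advertisement; if so, proceed to 503; otherwise, end the current flow.
Feature characters of an advertisement include, for example, "广告" (Chinese for "advertisement"), "AD", "Adv", "Advert", and "Advertisement".
503. Determine the webpage data corresponding to the window webpage element as a suspected advertisement.
504. Determine the color histogram change rate of the suspected advertisement.
505. Determine whether the color histogram change rate of the suspected advertisement is greater than or equal to a preset threshold; if so, proceed to 506; otherwise, end the current flow.
Preferably, the determination made in steps 504 and 505 of whether the suspected advertisement is an actual advertisement may instead be made by checking whether the color difference between the fill color of the suspected advertisement portion and the fill color of the webpage reaches a preset threshold: if it does, the suspected advertisement is determined to be an actual advertisement; if it does not, the suspected advertisement is determined not to be an actual advertisement.
506. Determine that the suspected advertisement is an actual advertisement, and generate a corresponding advertisement interception rule.
507. Intercept advertisements in the webpage according to the generated advertisement interception rule.
In this embodiment, it is determined according to the webpage data whether the URL of a window webpage element in the webpage is a URL that contains a feature character of an advertisement; if so, the webpage data corresponding to the window webpage element is determined to be a suspected advertisement, and whether the suspected advertisement is an actual advertisement is further determined according to its color histogram change rate, so that a corresponding advertisement interception rule is generated and advertisements in the webpage are intercepted according to it.
It should be noted that, in practical applications, all of the optional implementations above may be combined in any manner to form optional embodiments of the present application, which are not described one by one here.
The above describes how the method for intercepting advertisements in a webpage is carried out. This process may be implemented by an apparatus, whose internal structure and functions are described below.
Based on the same inventive concept and referring to FIG. 6, an apparatus for intercepting advertisements in a webpage according to an embodiment of the present application includes an obtaining module 601, an analysis module 602, a determining module 603, a generating module 604, and an intercepting module 605.
The obtaining module 601 is configured to obtain webpage data corresponding to a preset URL;
the analysis module 602 is configured to analyze the webpage data to obtain a suspected advertisement;
the determining module 603 is configured to determine whether the suspected advertisement is an actual advertisement;
the generating module 604 is configured to generate a corresponding advertisement interception rule when the suspected advertisement is an actual advertisement;
the intercepting module 605 is configured to intercept advertisements in the webpage according to the generated advertisement interception rule.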
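How the five modules fit together can be illustrated with a short Python sketch. It is only one possible arrangement under stated assumptions: the class `AdInterceptor` and its method names are hypothetical, the obtaining, analysis, and determining modules are passed in as plain callables, and a rule is simplified to the identifier of a confirmed advertisement element on a given URL.

```python
class AdInterceptor:
    """Hypothetical wiring of modules 601 to 605 from FIG. 6."""

    def __init__(self, fetch, analyze, judge):
        self.fetch = fetch      # obtaining module 601: url -> webpage data
        self.analyze = analyze  # analysis module 602: data -> suspected ads
        self.judge = judge      # determining module 603: suspect -> bool
        self.rules = {}         # url -> set of blocked element identifiers

    def learn(self, url):
        """Run steps 101 to 104 for one preset URL."""
        data = self.fetch(url)
        for suspect in self.analyze(data):
            if self.judge(suspect):
                # generating module 604: record an interception rule
                self.rules.setdefault(url, set()).add(suspect)

    def intercept(self, url, element_ids):
        """Intercepting module 605: drop elements covered by a rule."""
        blocked = self.rules.get(url, set())
        return [e for e in element_ids if e not in blocked]
```

Passing the modules in as callables keeps the sketch independent of any particular analysis mode (A1 to A4) or actual-advertisement check, which the description allows to be combined freely.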
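Preferably, the analysis module 602 is configured to obtain attribute identifiers of webpage elements in the source file of the webpage data; determine whether the value of an attribute identifier contains a feature character of an advertisement; and determine the webpage element corresponding to an attribute identifier that contains a feature character of an advertisement as a suspected advertisement.
Preferably, the analysis module 602 is configured to determine, according to the webpage data, whether a window within a preset size interval exists at a preset position in the webpage; and when such a window exists, determine the webpage data corresponding to the window as a suspected advertisement.
Preferably, the analysis module 602 is configured to determine, according to the webpage data, whether there is a full-screen window that matches the screen size and is placed on the top layer, the full-screen window containing no more than a first preset number of pictures and no more than a second preset number of buttons; and when the determination is yes, determine the webpage data corresponding to the full-screen window as a suspected advertisement.
Preferably, the analysis module 602 is configured to determine whether the uniform resource locator (URL) of a window webpage element in the webpage data is a URL that contains a feature character of an advertisement; and when it is, determine the webpage data corresponding to the window webpage element as a suspected advertisement.
Preferably, the determining module 603 is configured to determine that the suspected advertisement is an actual advertisement if the color difference between the fill color of the suspected advertisement portion and the fill color of the webpage reaches a preset threshold; or to determine whether the suspected advertisement is an actual advertisement according to its color histogram change rate, and when the color histogram change rate of the suspected advertisement is greater than or equal to a preset threshold, determine that the suspected advertisement is an actual advertisement.
In addition, an embodiment of the present application provides a terminal, the terminal including:
a processor, a memory, a communication interface, and a bus;
the processor, the memory, and the communication interface are connected by the bus and communicate with each other;
the memory stores executable program code;
the processor runs a program corresponding to the executable program code by reading the executable program code stored in the memory, in order to:
obtain webpage data corresponding to a preset URL;
analyze the webpage data to obtain a suspected advertisement;
determine whether the suspected advertisement is an actual advertisement;
when the suspected advertisement is an actual advertisement, generate a corresponding advertisement interception rule;
intercept advertisements in the webpage according to the generated advertisement interception rule.
An embodiment of the present application provides an application program that, at runtime, executes the method for intercepting advertisements in a webpage provided by the embodiments of the present application. The method for intercepting advertisements in a webpage includes:
obtaining webpage data corresponding to a preset URL;
analyzing the webpage data to obtain a suspected advertisement;
determining whether the suspected advertisement is an actual advertisement;
when the suspected advertisement is an actual advertisement, generating a corresponding advertisement interception rule;
intercepting advertisements in the webpage according to the generated advertisement interception rule.
An embodiment of the present application provides a storage medium for storing an application program, the application program being used to execute the method for intercepting advertisements in a webpage provided by the embodiments of the present application. The method for intercepting advertisements in a webpage includes:
obtaining webpage data corresponding to a preset URL;
analyzing the webpage data to obtain a suspected advertisement;
determining whether the suspected advertisement is an actual advertisement;
when the suspected advertisement is an actual advertisement, generating a corresponding advertisement interception rule;
intercepting advertisements in the webpage according to the generated advertisement interception rule.
In the embodiments of the present application, webpage data corresponding to a preset URL is analyzed to obtain a suspected advertisement; when the suspected advertisement is an actual advertisement, a corresponding advertisement interception rule is generated, and advertisements in the webpage are intercepted according to the generated rule. This automatically screens out suspected advertisements, quickly identifies advertisements, and automatically generates interception rules, thereby making it easier to block advertisements. The embodiments of the present application identify advertisements in webpages in a more targeted and more accurate manner.
A person skilled in the art should understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. Therefore, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage and optical storage) that contain computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to the embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device, the instruction device implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, such that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Obviously, a person skilled in the art can make various changes and variations to the present application without departing from its spirit and scope. Thus, if these modifications and variations of the present application fall within the scope of the claims of the present application and their technical equivalents, the present application is also intended to include them.
The above are merely preferred embodiments of the present application and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall fall within the protection scope of the present application.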

Claims (12)

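  1. A method for intercepting advertisements in a webpage, characterized by comprising:
    obtaining webpage data corresponding to a preset URL;
    analyzing the webpage data to determine a suspected advertisement;
    determining whether the suspected advertisement is an actual advertisement;
    when the suspected advertisement is an actual advertisement, generating a corresponding advertisement interception rule;
    intercepting advertisements in the webpage according to the generated advertisement interception rule.
  2. The method according to claim 1, characterized in that the step of analyzing the webpage data to determine a suspected advertisement comprises:
    obtaining attribute identifiers of webpage elements in the source file of the webpage data;
    determining whether the value of an attribute identifier contains a feature character of an advertisement;
    determining the webpage element corresponding to an attribute identifier that contains a feature character of an advertisement as a suspected advertisement.
  3. The method according to claim 1, characterized in that the step of analyzing the webpage data to determine a suspected advertisement comprises:
    determining, according to the webpage data, whether a window within a preset size interval exists at a preset position in the webpage;
    if so, determining the webpage data corresponding to the window as a suspected advertisement.
  4. The method according to claim 1, characterized in that the step of analyzing the webpage data to determine a suspected advertisement comprises:
    determining, according to the webpage data, whether there is a full-screen window that matches the screen size and is placed on the top layer, the full-screen window containing no more than a first preset number of pictures and no more than a second preset number of buttons;
    if so, determining the webpage data corresponding to the full-screen window as a suspected advertisement.
  5. The method according to claim 1, characterized in that the step of analyzing the webpage data to determine a suspected advertisement comprises:
    determining whether the uniform resource locator (URL) of a window webpage element in the webpage data is a URL that contains a feature character of an advertisement;
    if so, determining the webpage data corresponding to the window webpage element as a suspected advertisement.
  6. The method according to any one of claims 1 to 5, characterized in that the step of determining whether the suspected advertisement is an actual advertisement comprises:
    if the color difference between the fill color of the suspected advertisement portion and the fill color of the webpage reaches a preset threshold, determining that the suspected advertisement is an actual advertisement; or
    determining whether the suspected advertisement is an actual advertisement according to the color histogram change rate of the suspected advertisement, and if the color histogram change rate of the suspected advertisement is greater than or equal to a preset threshold, determining that the suspected advertisement is an actual advertisement.
  7. An apparatus for intercepting advertisements in a webpage, characterized by comprising:
    an obtaining module, configured to obtain webpage data corresponding to a preset URL;
    an analysis module, configured to analyze the webpage data to determine a suspected advertisement;
    a determining module, configured to determine whether the suspected advertisement is an actual advertisement;
    a generating module, configured to generate a corresponding advertisement interception rule when the suspected advertisement is an actual advertisement;
    an intercepting module, configured to intercept advertisements in the webpage according to the generated advertisement interception rule.
  8. The apparatus according to claim 7, characterized in that the analysis module is configured to obtain attribute identifiers of webpage elements in the source file of the webpage data; determine whether the value of an attribute identifier contains a feature character of an advertisement; and determine the webpage element corresponding to an attribute identifier that contains a feature character of an advertisement as a suspected advertisement.
  9. The apparatus according to claim 7, characterized in that the analysis module is configured to determine, according to the webpage data, whether a window within a preset size interval exists at a preset position in the webpage; and if so, determine the webpage data corresponding to the window as a suspected advertisement.
  10. The apparatus according to claim 7, characterized in that the analysis module is configured to determine, according to the webpage data, whether there is a full-screen window that matches the screen size and is placed on the top layer, the full-screen window containing no more than a first preset number of pictures and no more than a second preset number of buttons; and if so, determine the webpage data corresponding to the full-screen window as a suspected advertisement.
  11. The apparatus according to claim 7, characterized in that the analysis module is configured to determine whether the uniform resource locator (URL) of a window webpage element in the webpage data is a URL that contains a feature character of an advertisement; and if so, determine the webpage data corresponding to the window webpage element as a suspected advertisement.
  12. The apparatus according to claim 7, characterized in that the determining module is configured to determine that the suspected advertisement is an actual advertisement if the color difference between the fill color of the suspected advertisement portion and the fill color of the webpage reaches a preset threshold; or to determine whether the suspected advertisement is an actual advertisement according to the color histogram change rate of the suspected advertisement, and if the color histogram change rate of the suspected advertisement is greater than or equal to a preset threshold, determine that the suspected advertisement is an actual advertisement.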
PCT/CN2015/072515 2014-03-28 2015-02-09 Method and apparatus for intercepting advertisements in a webpage WO2015143956A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410124030.3A CN103886088B (zh) 2014-03-28 2014-03-28 Method and apparatus for intercepting advertisements in a webpage
CN201410124030.3 2014-03-28

Publications (1)

Publication Number Publication Date
WO2015143956A1 true WO2015143956A1 (zh) 2015-10-01

Family

ID=50954980

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/072515 WO2015143956A1 (zh) 2014-03-28 2015-02-09 Method and apparatus for intercepting advertisements in a webpage

Country Status (2)

Country Link
CN (1) CN103886088B (zh)
WO (1) WO2015143956A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11210331B2 (en) 2019-05-23 2021-12-28 Google Llc Cross-platform content muting

Families Citing this family (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
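CN103886088B (zh) * 2014-03-28 2017-05-17 北京金山网络科技有限公司 Method and apparatus for intercepting advertisements in a webpage
CN104572798A (zh) * 2014-07-25 2015-04-29 上海二三四五网络科技有限公司 Method, device, and system for processing webpages
CN104239422B (zh) * 2014-08-21 2018-05-08 小米科技有限责任公司 Advertisement identification method and apparatus, and electronic device
CN104199934B (zh) * 2014-09-05 2017-07-04 北京奇虎科技有限公司 Method and apparatus for intercepting advertisements of application programs
CN104965838B (zh) * 2014-09-11 2018-03-16 腾讯科技(深圳)有限公司 Page element processing method and page element processing apparatus
CN104202346A (zh) * 2014-09-29 2014-12-10 联想(北京)有限公司 Method and apparatus for processing network connection requests
CN104462284B (zh) * 2014-11-27 2018-04-13 百度在线网络技术(北京)有限公司 Method and system for determining webpage quality
CN104731868B (zh) * 2015-02-28 2019-02-12 小米科技有限责任公司 Method and apparatus for intercepting advertisements
CN104780153B (zh) * 2015-03-11 2018-06-19 小米科技有限责任公司 Information filtering method and apparatus
CN106033450B (zh) * 2015-03-17 2020-02-14 中兴通讯股份有限公司 Advertisement interception method, apparatus, and browser
CN106202101B (zh) * 2015-05-06 2020-04-03 腾讯科技(深圳)有限公司 Advertisement identification method and apparatus
CN106326316B (zh) * 2015-07-08 2022-11-29 腾讯科技(深圳)有限公司 Method and apparatus for filtering webpage advertisements
CN105549975A (zh) * 2015-12-15 2016-05-04 北京金山安全软件有限公司 Method and apparatus for processing prompt-type advertisement windows
CN106209889B (zh) * 2016-07-25 2019-07-05 北京小米移动软件有限公司 Method and apparatus for detecting hijacking information in webpages
WO2018058330A1 (zh) * 2016-09-27 2018-04-05 中兴通讯股份有限公司 Advertisement interception method, apparatus, browser, and computer storage medium
CN107562864A (zh) * 2017-08-30 2018-01-09 努比亚技术有限公司 Advertisement blocking method, mobile terminal, and computer-readable storage medium
CN107871017B (zh) * 2017-11-27 2023-05-09 腾讯数码(天津)有限公司 Method and apparatus for detecting an information filtering function
CN108009232A (zh) * 2017-11-29 2018-05-08 北京小米移动软件有限公司 Advertisement blocking method and apparatus
CN109214864A (zh) * 2018-08-27 2019-01-15 河南丰泰光电科技有限公司 Advertisement identification method and apparatus, and electronic device
CN109344350A (zh) * 2018-09-30 2019-02-15 珠海市君天电子科技有限公司 Information processing method and device
CN110457597A (zh) * 2019-08-08 2019-11-15 中科鼎富(北京)科技发展有限公司 Advertisement identification method and apparatus
CN115379270B (zh) * 2022-08-03 2023-07-14 深圳乐播科技有限公司 Video screen-casting method, apparatus, cloud device, and storage medium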

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
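CN1760901A (zh) * 2005-11-03 2006-04-19 上海交通大学 E-mail filtering system
CN102332028A (zh) * 2011-10-15 2012-01-25 西安交通大学 Webpage-oriented method for identifying undesirable Web content
CN103530560A (zh) * 2013-09-29 2014-01-22 北京金山网络科技有限公司 Advertisement interception method, apparatus, and client
CN103593354A (zh) * 2012-08-15 2014-02-19 腾讯科技(深圳)有限公司 Method, apparatus, server, and system for filtering webpage advertisements
CN103886088A (zh) * 2014-03-28 2014-06-25 北京金山网络科技有限公司 Method and apparatus for intercepting advertisements in a webpage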

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
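CN102768664A (zh) * 2011-05-06 2012-11-07 李超 Method and system for distributed webpage advertisement interception
CN103605688B (zh) * 2013-11-01 2017-05-10 北京奇虎科技有限公司 Webpage advertisement interception method, apparatus, and browser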

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11210331B2 (en) 2019-05-23 2021-12-28 Google Llc Cross-platform content muting
US11586663B2 (en) 2019-05-23 2023-02-21 Google Llc Cross-platform content muting

Also Published As

Publication number Publication date
CN103886088B (zh) 2017-05-17
CN103886088A (zh) 2014-06-25

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15769066

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205N DATED 06/12/2016)

122 Ep: pct application non-entry in european phase

Ref document number: 15769066

Country of ref document: EP

Kind code of ref document: A1