WO2018171189A1 - Browser advertisement interception method, device and terminal - Google Patents

Browser advertisement interception method, device and terminal

Info

Publication number
WO2018171189A1
WO2018171189A1 PCT/CN2017/107605 CN2017107605W WO2018171189A1 WO 2018171189 A1 WO2018171189 A1 WO 2018171189A1 CN 2017107605 W CN2017107605 W CN 2017107605W WO 2018171189 A1 WO2018171189 A1 WO 2018171189A1
Authority
WO
WIPO (PCT)
Prior art keywords
box
pop
popup
advertisement
observable feature
Prior art date
Application number
PCT/CN2017/107605
Other languages
English (en)
French (fr)
Inventor
曹刚
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2018171189A1

Definitions

  • the present invention relates to the field of terminal browser application technologies, and in particular, to a browser advertisement intercepting method, device and terminal.
  • Conventional techniques mainly use two methods. The first establishes an interception blacklist of the URLs (Uniform Resource Locators) of sub-resources that correspond to a pop-up box advertisement, such as pictures and JS (JavaScript) script files, or of pop-up box IDs or CLASSes. When the browser detects these URLs, it stops the network loading, or it hides the pop-up box according to its ID and CLASS.
  • The second relies on features of the pop-up box itself, such as CSS (Cascading Style Sheets) features: the user defines the features and rules on the server, and the server determines whether the pop-up box should be filtered and notifies the terminal to intercept it.
  • the common feature of the above two methods is that pre-selected features and rules are used to intercept pop-up advertisements.
  • the main drawbacks are:
  • The external information and internal features of pop-up boxes are ever-changing; for example, the ID and CLASS information itself changes dynamically. Insufficient or improper selection of CSS features and rules causes new advertisement pop-up boxes to be missed or wrongly intercepted.
  • Embodiments of the present invention provide a browser advertisement interception method, device, and terminal that overcome the above drawbacks of the prior-art technical solutions for intercepting advertisement pop-up boxes.
  • a browser advertisement intercepting method including:
  • for any pop-up box, determining, according to the value of the recognition function corresponding to that pop-up box, whether it is an advertisement pop-up box, and if so, intercepting it.
  • the manner of obtaining the pop-up box includes:
  • node elements whose cascading style sheet position attribute in the tree structure of the web page is a fixed attribute, together with advertisement pop-up boxes deleted by the user in the web page and/or intercepted advertisement pop-up boxes restored by the user, are collected as pop-up boxes.
  • the observable feature value of the pop-up box is trained to obtain a recognition function, including:
  • a recognition function is determined based on weights of the observable feature values of the popup.
  • the observable feature value of the pop-up box is trained to obtain a recognition function, including:
  • a recognition function is determined based on weights of the respective observable feature values of the popup.
  • the effective observable feature value of the pop-up box is filtered based on the weights of the observable feature values of the pop-up box, including:
  • the weights of the observable feature values of the pop-up frame are compared with the set weight thresholds, and the observable feature values whose weights are greater than the set weight threshold are selected as the effective observable feature values.
  • setting the identifier indicating whether the pop-up box is an advertisement pop-up box includes:
  • the identifier indicating whether the pop-up box is an advertisement pop-up box is set by marking and/or by a clustering algorithm.
  • the training is performed by using an artificial neural network method
  • the recognition function is a step activation function.
  • the observable feature value of the popup box includes at least one of the following:
  • the relative height of the pop-up box, in the layer direction of the web page, with respect to the web page in which the pop-up box is located; the position of the pop-up box relative to the start coordinates of that web page; the ratio of the area of the pop-up box to the area of the terminal window; the relevance value between the URL corresponding to the pop-up box and the domain name of the web page; and the relevance value between the text rendered by the pop-up box and the title content of the web page.
  • determining whether any of the pop-up boxes is an advertisement pop-up box according to the value of the identification function corresponding to any one of the pop-up boxes includes:
  • any of the pop-up boxes is an advertisement pop-up box, otherwise determining that any of the pop-up boxes is not an advertisement pop-up box.
  • a browser advertisement intercepting apparatus including:
  • a training module configured to train an observable feature value of the popup box to obtain a recognition function
  • the intercepting module is configured to determine, according to the value of the identification function corresponding to any one of the pop-up boxes, whether the pop-up box is an advertisement pop-up box, and if so, intercepting.
  • the device further includes:
  • a collection module configured to collect, as a pop-up box, a node element whose cascading style sheet position attribute in the tree structure of the web page is a fixed attribute;
  • node elements whose cascading style sheet position attribute in the tree structure of the web page is a fixed attribute, together with advertisement pop-up boxes deleted by the user in the web page and/or intercepted advertisement pop-up boxes restored by the user, are collected as pop-up boxes.
  • the training module includes:
  • a module configured to set whether the pop-up box is an identifier of an advertisement pop-up box
  • the weight determining module is configured to train the observable feature value of the pop-up box based on the identifier, and obtain the weight of each observable feature value of the pop-up box;
  • a function determining module is configured to determine a recognition function based on weights of the observable feature values of the popup.
  • a terminal comprising a processor and a memory storing the processor-executable instructions, when the instructions are executed by the processor, performing the following operations:
  • for any pop-up box, determining, according to the value of the recognition function corresponding to that pop-up box, whether it is an advertisement pop-up box, and if so, intercepting it.
  • the operation performed by the processor specifically includes: collecting, as a pop-up box, a node element whose cascading style sheet position attribute in the tree structure of the web page is a fixed attribute; or
  • node elements whose cascading style sheet position attribute in the tree structure of the web page is a fixed attribute, together with advertisement pop-up boxes deleted by the user in the web page and/or intercepted advertisement pop-up boxes restored by the user, are collected as pop-up boxes.
  • the method specifically includes the following operations:
  • a recognition function is determined based on weights of the observable feature values of the popup.
  • the method specifically includes the following operations:
  • a recognition function is determined based on weights of the respective observable feature values of the popup.
  • the processor specifically includes the following operations:
  • the identifier indicating whether the pop-up box is an advertisement pop-up box is set by marking and/or by a clustering algorithm.
  • the training is performed by using an artificial neural network method
  • the recognition function is a step activation function.
  • the observable feature value of the popup box includes at least one of the following:
  • the relative height of the pop-up box, in the layer direction of the web page, with respect to the web page in which the pop-up box is located; the position of the pop-up box relative to the start coordinates of that web page; the ratio of the area of the pop-up box to the area of the terminal window; the relevance value between the URL corresponding to the pop-up box and the domain name of the web page; and the relevance value between the text rendered by the pop-up box and the title content of the web page.
  • when performing the step of determining, according to the value of the recognition function corresponding to any pop-up box, whether that pop-up box is an advertisement pop-up box, the processor specifically performs the following operations:
  • any of the pop-up boxes is an advertisement pop-up box, otherwise determining that any of the pop-up boxes is not an advertisement pop-up box.
  • the embodiment of the present invention has at least the following advantages:
  • The browser advertisement interception method, device and terminal provided by the embodiments of the present invention mainly adhere to the purpose of “letting the advertisement pop-up box data speak for itself” and provide a machine-learning-based method for intercepting browser advertisement pop-up boxes that draws on artificial intelligence and big data technology.
  • The method uses machine learning to automatically select features and rules, which makes the interception system more intelligent and better at generalizing, makes up for the shortcomings of conventional methods, and thereby improves the user experience.
  • FIG. 1 is a main flowchart of a browser advertisement interception method according to first, second, and third embodiments of the present invention
  • FIG. 2 is a flowchart of step S102 in a browser advertisement interception method according to a second embodiment of the present invention
  • FIG. 3 is a flowchart of step S102 in a browser advertisement interception method according to a third embodiment of the present invention
  • FIG. 4 is a schematic diagram showing the main components of a browser advertisement intercepting apparatus according to fourth, fifth, and sixth embodiments of the present invention.
  • FIG. 5 is a schematic structural diagram of a modeling module according to a fifth embodiment of the present invention.
  • FIG. 6 is a schematic structural diagram of a modeling module according to a sixth embodiment of the present invention.
  • FIG. 7 is a schematic diagram of a workflow of a machine learning-based browser advertisement pop-up block intercepting apparatus according to an eighth embodiment of the present invention.
  • FIG. 8 is a flowchart of main processes of pop-up box sample training according to an eighth embodiment of the present invention.
  • FIG. 9 is a flowchart of main processing for performing advertisement pop-up block interception in the real-time detection phase according to the eighth embodiment of the present invention.
  • The method, device and terminal for intercepting advertisement pop-up boxes proposed by the embodiments of the present invention mainly adhere to the purpose of “letting the advertisement pop-up box data speak for itself” and use machine learning to automatically select the features and rules required for intercepting advertisement pop-up boxes. They mainly consist of the following key technical steps:
  • (1) pre-processing detection: according to the CSS attribute values of each tag element in the web page DOM, the tag elements whose style position is the fixed attribute (position value equal to fixed) are taken as training samples and as the candidate advertisement pop-up boxes to be detected next;
  • the real-time detection phase: the observed feature values of a candidate advertisement pop-up box obtained in (1) are input to the model trained in (2) to obtain an actual output value (which may be a Boolean value, but is not limited to this and may be, for example, a real-valued probability), which determines whether the candidate is an advertisement pop-up box on which to perform an interception operation (such as hiding or deleting the tag element in the web page DOM);
  • In this way the features and rules of advertisement pop-up boxes are obtained automatically, so that the browser can accurately intercept advertisement pop-up boxes of various forms in any complicated web page.
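The real-time detection phase described above can be sketched as follows. This is a minimal Python illustration, not the patent's implementation: the `Candidate` dictionaries, the `recognize` callback and the 0.5 threshold are assumptions introduced here, and hiding is modelled by setting a `display` entry instead of touching a real DOM.

```python
from typing import Any, Callable, Dict, List

# Hypothetical stand-in for a DOM element: a dict holding the candidate's
# observed feature values and a mutable "style" dict.
Candidate = Dict[str, Any]


def intercept_candidates(
    candidates: List[Candidate],
    recognize: Callable[[List[float]], float],
    threshold: float = 0.5,
) -> List[Candidate]:
    """Hide every candidate that the trained model flags as an advertisement.

    `recognize` may return a Boolean or a probability; either is converted
    to a float and compared against `threshold` (an assumed cut-off).
    """
    intercepted = []
    for cand in candidates:
        score = float(recognize(cand["features"]))
        if score >= threshold:
            # Interception is modelled as hiding; deleting the element is
            # the other option the text mentions.
            cand["style"]["display"] = "none"
            intercepted.append(cand)
    return intercepted
```

Because the output is only compared against a threshold, the same loop works for a Boolean step output and for a probability-valued model.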
  • a first embodiment of the present invention includes the following specific steps:
  • Step S101 collecting a popup box in the webpage.
  • Collecting pop-up boxes amounts to an initial screening of all node elements in the web page, using an attribute that every pop-up box must have. Having this attribute does not by itself mean that an element is an advertisement pop-up box; the recognition function used to identify advertisement pop-up boxes is determined by subsequent training. The pop-up boxes obtained by the initial screening can therefore be regarded as suspected advertisement pop-up boxes, and the advertisement pop-up boxes are necessarily among those collected.
  • collecting the pop-up boxes in the web page includes:
  • a node element whose cascading style sheet position attribute (CSS position) in the tree structure of the web page is the fixed attribute is determined to be a pop-up box, and the pop-up box is collected.
  • The tree structure of the web page is usually a DOM (Document Object Model) tree structure, and the fixed attribute may be, for example, a position value equal to fixed.
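The collection step can be illustrated with a small tree walk. The sketch below is written under stated assumptions: the DOM is represented by nested Python dictionaries with hypothetical `tag`, `style` and `children` keys, and `style` is assumed to hold already-computed CSS values. A browser would read these values from its own DOM and style engine.

```python
from typing import Any, Dict, Iterator, List

Node = Dict[str, Any]  # hypothetical node shape: {"tag", "style", "children"}


def iter_nodes(root: Node) -> Iterator[Node]:
    """Depth-first traversal of a DOM-like tree in document order."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        # reversed() keeps document order when popping from the stack
        stack.extend(reversed(node.get("children", [])))


def collect_popups(root: Node) -> List[Node]:
    """Return the nodes whose CSS position value equals "fixed"."""
    return [
        node
        for node in iter_nodes(root)
        if node.get("style", {}).get("position") == "fixed"
    ]
```

The result is a list of suspected advertisement pop-up boxes; deciding which of them really are advertisements is left to the trained recognition function.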
  • Step S102 training the observable feature value of the popup box to obtain a recognition function.
  • the observable feature value of the pop-up frame is trained by using an artificial neural network method.
  • the observable feature value of the popup includes at least one of the following:
  • the relative height of the pop-up box, in the layer direction of the web page, with respect to the web page in which the pop-up box is located; the position of the pop-up box relative to the start coordinates of that web page; the ratio of the area of the pop-up box to the area of the terminal window; the relevance value between the URL corresponding to the pop-up box and the domain name of the web page; and the relevance value between the text rendered by the pop-up box and the title content of the web page.
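The text does not say how each observable feature value is computed, so the following Python sketch only shows one plausible way to turn the listed features into a numeric vector. Every field name (`z_index`, `max_z_index`, `left`, `top`, `window_width`, and so on) is hypothetical, and the two relevance measures (host equality and word-set overlap) are deliberately crude assumptions.

```python
from typing import Dict, List
from urllib.parse import urlparse


def area_ratio(popup_w: float, popup_h: float, win_w: float, win_h: float) -> float:
    """Ratio of the pop-up area to the terminal window area."""
    return (popup_w * popup_h) / (win_w * win_h)


def domain_relevance(popup_url: str, page_url: str) -> float:
    """Assumed measure: 1.0 when both URLs share a host name, else 0.0."""
    same = urlparse(popup_url).hostname == urlparse(page_url).hostname
    return 1.0 if same else 0.0


def text_relevance(popup_text: str, page_title: str) -> float:
    """Assumed measure: Jaccard overlap of the lower-cased word sets."""
    a, b = set(popup_text.lower().split()), set(page_title.lower().split())
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def feature_vector(popup: Dict, page: Dict) -> List[float]:
    """Build the observable feature values for one candidate pop-up."""
    return [
        popup["z_index"] / max(page["max_z_index"], 1),  # relative layer height
        popup["left"] / page["width"],                   # relative x position
        popup["top"] / page["height"],                   # relative y position
        area_ratio(popup["width"], popup["height"],
                   page["window_width"], page["window_height"]),
        domain_relevance(popup["url"], page["url"]),
        text_relevance(popup["text"], page["title"]),
    ]
```

Scaling each feature to a ratio keeps the values comparable across pages and screen sizes, which is one reason the features are described as relative values.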
  • Step S103: for any pop-up box, determine, according to the value of the recognition function corresponding to that pop-up box, whether it is an advertisement pop-up box, and if so, intercept it.
  • A node element whose CSS position attribute in the tree structure of the web page is the fixed attribute is determined to be a pop-up box. Some node elements that are not advertisement pop-up boxes may be included, but it is guaranteed that no node element of an advertisement pop-up box is missed.
  • This step can be regarded as a preliminary screening based on a necessary feature of advertisement pop-up boxes. Subsequently, as many observable feature values related to advertisement pop-up boxes as possible are selected and trained with an artificial neural network method, yielding an artificial neural network model for determining whether a pop-up box is an advertisement pop-up box.
  • Because the selection of observable feature values in the embodiment of the present invention is comprehensive and accurate, the artificial neural network model, that is, the recognition function, can be determined accurately, so the judgment of advertisement pop-up boxes is relatively accurate.
  • The embodiment of the invention does not need to maintain a blacklist and feature table as in the prior art, which reduces the cost of intercepting advertisement pop-up boxes, and it can adapt to changes in the external information and internal features of pop-up boxes and still intercept advertisement pop-up boxes accurately.
  • a second embodiment of the present invention includes the following specific steps:
  • Step S101 collecting a popup box in the webpage.
  • collecting the pop-up boxes in the web page includes:
  • a node element whose cascading style sheet position attribute (CSS position) in the tree structure of the web page is the fixed attribute is determined to be a pop-up box, and the pop-up box is collected.
  • The tree structure of the web page is usually a DOM (Document Object Model) tree structure, and the fixed attribute may be, for example, a position value equal to fixed.
  • collecting the pop-up boxes in the web page further includes: collecting, as pop-up boxes, advertisement pop-up boxes deleted by the user in the web page and/or intercepted advertisement pop-up boxes restored by the user.
  • The embodiment of the present invention can thus also use advertisement pop-up boxes deleted by the user and/or intercepted advertisement pop-up boxes restored by the user as pop-up boxes, which adds to the sample types of pop-up boxes.
  • A user-deleted advertisement pop-up box is one that the interception missed and that the user deleted manually after seeing it. Such samples are well worth recording: once added to the training, they improve the artificial neural network model, that is, the recognition function, for judging whether a pop-up box is an advertisement pop-up box.
  • An intercepted advertisement pop-up box restored by the user is a pop-up box that the user does not regard as an advertisement or is willing to see. Such samples are equally worth recording.
  • With these samples, the artificial neural network model for judging whether a pop-up box is an advertisement pop-up box can be improved. Compared with the first embodiment, the interception effect is closer to the real needs of the user, and the user experience is improved.
  • Step S102 training an observable feature value of the popup box to obtain a recognition function
  • step S102 includes:
  • A1: setting an identifier indicating whether the pop-up box is an advertisement pop-up box
  • step A1 includes:
  • The identifier indicating whether the pop-up box is an advertisement pop-up box is set by marking.
  • The marking can be manual or automatic. The identifier is in effect the desired determination result.
  • For a pop-up box determined in step S101 according to the cascading style sheet position attribute (CSS position), the identifier is manually marked as yes or no.
  • When an advertisement pop-up box deleted by the user is used as a pop-up box, its identifier is manually or automatically marked as yes; this is a case of missed interception. While the user is actually using the browser, adding this pop-up box to the training corrects the selection of the feature values and the weights.
  • When an intercepted advertisement pop-up box restored by the user is used as a pop-up box, its identifier is manually or automatically marked as no; this is a case of false interception, and adding this pop-up box to the training likewise corrects the selection of the feature values and the weights.
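The automatic marking of user-feedback samples can be sketched as below. The `Sample` record and the event names `user_deleted` and `user_restored` are hypothetical; only the mapping (deleted means advertisement, restored means not an advertisement) comes from the text.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Sample:
    features: List[float]  # observable feature values of the pop-up
    is_ad: bool            # identifier: the desired determination result
    source: str            # where the label came from


def sample_from_feedback(features: Sequence[float], event: str) -> Sample:
    """Turn a user action into a labelled training sample."""
    if event == "user_deleted":
        # Missed interception: the user removed an ad that got through.
        return Sample(list(features), True, event)
    if event == "user_restored":
        # False interception: the user brought back a blocked pop-up.
        return Sample(list(features), False, event)
    raise ValueError(f"unknown feedback event: {event!r}")
```

Samples produced this way would simply be appended to the training set before the next training run, so that the weights reflect what the user actually wants blocked.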
  • A2: training the observable feature values of the pop-up box based on the identifier to obtain the weight of each observable feature value of the pop-up box;
  • the observable feature value of the pop-up box includes:
  • the relative height of the pop-up box, in the layer direction of the web page, with respect to the web page in which the pop-up box is located; the position of the pop-up box relative to the start coordinates of that web page; the ratio of the area of the pop-up box to the area of the terminal window; the relevance value between the URL corresponding to the pop-up box and the domain name of the web page; and the relevance value between the text rendered by the pop-up box and the title content of the web page.
  • the recognition function is determined based on the weights of the observable feature values of the popup.
  • the recognition function may be a step activation function based on a single layer artificial neural network model or a multi-layer artificial neural network model.
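As an illustration of a single-layer model with a step activation function, the sketch below trains a perceptron on labelled feature vectors and returns the learned weights. The classic perceptron update rule, the learning rate and the epoch count are assumptions chosen for brevity; the text does not specify the training procedure.

```python
from typing import List, Sequence, Tuple


def step(x: float) -> int:
    """Step activation: 1 when the weighted sum is non-negative, else 0."""
    return 1 if x >= 0 else 0


def train_perceptron(
    samples: Sequence[Sequence[float]],
    labels: Sequence[int],
    epochs: int = 20,
    lr: float = 0.1,
) -> Tuple[List[float], float]:
    """Learn one weight per observable feature value, plus a bias."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            pred = step(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = y - pred  # -1, 0 or +1
            if err:
                weights = [w + lr * err * xi for w, xi in zip(weights, x)]
                bias += lr * err
    return weights, bias


def recognize(weights: Sequence[float], bias: float, x: Sequence[float]) -> int:
    """Recognition function: 1 means advertisement pop-up, 0 means not."""
    return step(sum(w * xi for w, xi in zip(weights, x)) + bias)
```

A single-layer model of this kind only separates classes that are linearly separable in the feature space; the multi-layer variant mentioned above would be needed for anything more complex.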
  • Step S103: for any pop-up box, determine, according to the value of the recognition function corresponding to that pop-up box, whether it is an advertisement pop-up box, and if so, intercept it.
  • Step S103 includes:
  • any of the pop-up boxes is an advertisement pop-up box, otherwise determining that any of the pop-up boxes is not an advertisement pop-up box.
  • a third embodiment of the present invention includes the following specific steps:
  • Step S101 collecting a popup box in the webpage.
  • collecting the pop-up boxes in the web page includes:
  • The tree structure of the web page is usually a DOM (Document Object Model) tree structure, and the fixed attribute may be, for example, a position value equal to fixed.
  • collecting the pop-up boxes in the web page further includes: collecting, as pop-up boxes, advertisement pop-up boxes deleted by the user in the web page and/or intercepted advertisement pop-up boxes restored by the user.
  • Step S102 training an observable feature value of the popup box to obtain a recognition function
  • step S102 includes:
  • step B1 includes:
  • the identifiers of some of the pop-up boxes are set by marking, and the identifiers of the remaining pop-up boxes are set by a clustering algorithm.
  • The marking can be manual or automatic.
  • Where marking is used, for a pop-up box determined in step S101 based on the cascading style sheet position attribute (CSS position), the identifier is manually marked as yes or no.
  • When an advertisement pop-up box deleted by the user is used as a pop-up box, its identifier is manually or automatically marked as yes; this is a case of missed interception. While the user is actually using the browser, adding this pop-up box to the training corrects the selection of the feature values and the weights.
  • When an intercepted advertisement pop-up box restored by the user is used as a pop-up box, its identifier is manually or automatically marked as no; this is a case of false interception, and adding this pop-up box to the training likewise corrects the selection of the feature values and the weights.
  • With a clustering algorithm such as the K-means method, the identifier indicating whether a pop-up box is an advertisement pop-up box can be set automatically, which reduces labor cost.
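One way the mixed marking and clustering scheme could work is sketched below: K-means with two clusters groups all pop-up feature vectors, and each cluster inherits the majority label of the marked pop-ups that fall into it. The majority-vote rule, the deterministic initialisation from the first `k` points and the fixed iteration count are assumptions made for this illustration.

```python
from typing import Dict, List, Sequence


def kmeans(points: Sequence[Sequence[float]], k: int = 2, iters: int = 20) -> List[int]:
    """Plain K-means; returns the cluster index assigned to each point."""
    centroids = [list(p) for p in points[:k]]  # deterministic initialisation
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        assign = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def propagate_labels(points: Sequence[Sequence[float]],
                     known: Dict[int, bool]) -> List[bool]:
    """Label unmarked pop-ups with the majority label of their cluster.

    `known` maps the index of each marked pop-up to its identifier.
    A cluster with no marked members defaults to "not an advertisement".
    """
    assign = kmeans(points)
    cluster_is_ad = {}
    for c in set(assign):
        votes = [label for i, label in known.items() if assign[i] == c]
        cluster_is_ad[c] = sum(votes) * 2 > len(votes)
    return [known.get(i, cluster_is_ad[assign[i]]) for i in range(len(points))]
```

Only the marked subset needs human attention here, which is where the labor saving comes from.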
  • the observable feature value of the pop-up box includes:
  • the relative height of the pop-up box, in the layer direction of the web page, with respect to the web page in which the pop-up box is located; the position of the pop-up box relative to the start coordinates of that web page; the ratio of the area of the pop-up box to the area of the terminal window; the relevance value between the URL corresponding to the pop-up box and the domain name of the web page; and the relevance value between the text rendered by the pop-up box and the title content of the web page.
  • step B3 includes:
  • the weights of the observable feature values of the pop-up frame are compared with the set weight thresholds, and the observable feature values whose weights are greater than the set weight threshold are selected as the effective observable feature values.
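The threshold comparison in this step is small enough to show directly. In the sketch below the feature names and the threshold value are made up for illustration; the only rule taken from the text is that a feature is kept when its weight is greater than the set threshold.

```python
from typing import Dict, List


def select_effective_features(weights: Dict[str, float], threshold: float) -> List[str]:
    """Keep the feature names whose weight is greater than the threshold."""
    return [name for name, weight in weights.items() if weight > threshold]


# Hypothetical weights obtained from a first training pass.
example_weights = {"area_ratio": 0.8, "domain_relevance": 0.05, "text_relevance": 0.4}
effective = select_effective_features(example_weights, threshold=0.1)
```

With these example numbers, `domain_relevance` falls below the threshold and is dropped, so a later training pass would use only the two remaining features.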
  • Step S103: for any pop-up box, determine, according to the value of the recognition function corresponding to that pop-up box, whether it is an advertisement pop-up box, and if so, intercept it.
  • Step S103 includes:
  • any of the pop-up boxes is an advertisement pop-up box, otherwise determining that any of the pop-up boxes is not an advertisement pop-up box.
  • the fourth embodiment of the present invention corresponds to the first embodiment.
  • This embodiment introduces a browser advertisement intercepting device. As shown in FIG. 4, the following components are included:
  • a collection module 401 configured to collect popups in a web page.
  • the collecting module 401 is configured to:
  • a node element in which the cascading style sheet position attribute in the tree structure of the web page is a fixed attribute is collected as a popup.
  • The tree structure of a web page is usually a DOM tree structure, and the fixed attribute may be, for example, a position value equal to fixed.
  • the training module 402 is configured to train the observable feature value of the pop-up box to obtain a recognition function
  • the observable feature value of the popup includes at least one of the following:
  • the relative height of the pop-up box, in the layer direction of the web page, with respect to the web page in which the pop-up box is located; the position of the pop-up box relative to the start coordinates of that web page; the ratio of the area of the pop-up box to the area of the terminal window; the relevance value between the URL corresponding to the pop-up box and the domain name of the web page; and the relevance value between the text rendered by the pop-up box and the title content of the web page.
  • the intercepting module 403 is configured to determine, according to the value of the identification function corresponding to any one of the pop-up boxes, whether the pop-up box is an advertisement pop-up box, and if so, intercepting.
  • the fifth embodiment of the present invention corresponds to the second embodiment.
  • This embodiment introduces a browser advertisement intercepting device. As shown in FIG. 4, the following components are included:
  • a collection module 401 configured to collect popups in a web page.
  • the collecting module 401 is configured to:
  • a node element in which the cascading style sheet position attribute in the tree structure of the web page is a fixed attribute is collected as a popup.
  • The tree structure of a web page is usually a DOM tree structure, and the fixed attribute may be, for example, a position value equal to fixed.
  • the collecting module 401 is further configured to: collect, as pop-up boxes, advertisement pop-up boxes deleted by the user in the web page and/or intercepted advertisement pop-up boxes restored by the user.
  • the training module 402 is configured to train the observable feature value of the pop-up box to obtain a recognition function
  • the training module 402 includes:
  • the setting module 10 is configured to set an identifier indicating whether the pop-up box is an advertisement pop-up box
  • the setting module 10 is configured to: set, by marking, the identifier indicating whether the pop-up box is an advertisement pop-up box;
  • When an advertisement pop-up box deleted by the user is used as a pop-up box, its identifier is manually or automatically marked as yes; this is a case of missed interception. While the user is actually using the browser, adding this pop-up box to the training corrects the selection of the feature values and the weights.
  • When an intercepted advertisement pop-up box restored by the user is used as a pop-up box, its identifier is manually or automatically marked as no; this is a case of false interception, and adding this pop-up box to the training likewise corrects the selection of the feature values and the weights.
  • the weight determining module 20 is configured to train the observable feature values of the pop-up box based on the identifier to obtain the weights of the observable feature values of the pop-up box;
  • observable feature values of the popup include:
  • the relative height of the pop-up box, in the layer direction of the web page, with respect to the web page in which the pop-up box is located; the position of the pop-up box relative to the start coordinates of that web page; the ratio of the area of the pop-up box to the area of the terminal window; the relevance value between the URL corresponding to the pop-up box and the domain name of the web page; and the relevance value between the text rendered by the pop-up box and the title content of the web page.
  • the function determination module 30 is configured to determine a recognition function based on weights of the observable feature values of the popup.
  • the intercepting module 403 is configured to determine, according to the value of the identification function corresponding to any one of the pop-up boxes, whether the pop-up box is an advertisement pop-up box, and if so, intercepting.
  • the intercepting module 403 is configured to:
  • any of the pop-up boxes is an advertisement pop-up box, otherwise determining that any of the pop-up boxes is not an advertisement pop-up box.
  • the sixth embodiment of the present invention corresponds to the third embodiment.
  • This embodiment introduces a browser advertisement intercepting device. As shown in FIG. 4, the following components are included:
  • a collection module 401 configured to collect popups in a web page.
  • the collecting module 401 is configured to:
  • a node element in which the cascading style sheet position attribute in the tree structure of the web page is a fixed attribute is collected as a popup.
  • The tree structure of a web page is usually a DOM tree structure, and the fixed attribute may be, for example, a position value equal to fixed.
  • the collecting module 401 is further configured to: collect, as pop-up boxes, advertisement pop-up boxes deleted by the user in the web page and/or intercepted advertisement pop-up boxes restored by the user.
  • the training module 402 is configured to train the observable feature value of the pop-up box to obtain a recognition function
  • the training module 402 includes:
  • the setting module 10 is configured to set an identifier indicating whether the pop-up box is an advertisement pop-up box
  • the setting module 10 is configured to: set the identifier indicating whether the pop-up box is an advertisement pop-up box by a clustering algorithm; or set the identifiers of some of the pop-up boxes by marking and set the identifiers of the remaining pop-up boxes by the clustering algorithm.
  • The marking may be manual or automatic.
  • When an advertisement pop-up box deleted by the user is used as a pop-up box, its identifier is manually or automatically marked as yes; this is a case of missed interception, and adding this pop-up box to the training corrects the selection of the feature values and the weights.
  • When an intercepted advertisement pop-up box restored by the user is used as a pop-up box, its identifier is manually or automatically marked as no; this is a case of false interception, and adding this pop-up box to the training likewise corrects the selection of the feature values and the weights.
  • the weight determining module 20 is configured to train the observable feature values of the pop-up box based on the identifier to obtain the weights of the observable feature values of the pop-up box;
  • observable feature values of the popup include:
  • the height of the pop-up box in the layer direction of the web page relative to the web page where it is located; the relative value of the start coordinate position of the pop-up box with respect to that web page; the area ratio of the pop-up box to the terminal window; the relevance value between the URL corresponding to the pop-up box and the domain name of that web page; and the relevance value between the text rendered by the pop-up box and the title content of that web page.
  • the screening module 40 is configured to compare the weight of each observable feature value of the pop-up box with the set weight threshold, and to select the observable feature values whose weights are greater than the threshold as effective observable feature values.
  • the function determination module 30 is configured to determine the recognition function based on the weights of the effective observable feature values of the pop-up box.
  • the intercepting module 403 is configured to determine, according to the value of the identification function corresponding to any one of the pop-up boxes, whether the pop-up box is an advertisement pop-up box, and if so, intercepting.
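  • optionally, the intercepting module 403 is configured to:
  • if the value of the recognition function corresponding to a pop-up box is greater than the set recognition threshold, determine that the pop-up box is an advertisement pop-up box; otherwise, determine that it is not an advertisement pop-up box.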
  • the seventh embodiment of the present invention is a terminal, which may be understood as a physical device such as a mobile phone or a server. It includes a processor and a memory storing instructions executable by the processor. When the instructions are executed by the processor, the following operations are performed:
  • collecting pop-up boxes in a web page;
  • training on the observable feature values of the pop-up boxes to obtain a recognition function;
  • for any pop-up box, determining, according to the value of the recognition function corresponding to that pop-up box, whether it is an advertisement pop-up box, and if so, intercepting it.
  • optionally, the operations performed by the processor further include:
  • collecting, as pop-up boxes, the node elements in the tree structure of the web page whose CSS position attribute is the fixed attribute; or
  • collecting, as pop-up boxes, those node elements together with the advertisement pop-up boxes deleted by the user in the web page and/or the intercepted advertisement pop-up boxes that the user has restored to display.
  • the tree structure of a web page is usually a DOM tree structure, and the fixed attribute means, for example, that the CSS position value equals fixed.
  • as one optional technical solution, when performing the step of training on the observable feature values of the pop-up boxes to obtain a recognition function, the processor performs the following operations:
  • setting, for each pop-up box, an identifier indicating whether it is an advertisement pop-up box;
  • training on the observable feature values of the pop-up boxes based on the identifiers to obtain the weight of each observable feature value;
  • determining the recognition function based on the weights of the observable feature values.
  • as another optional technical solution, when performing the same training step, the processor performs the following operations:
  • setting, for each pop-up box, an identifier indicating whether it is an advertisement pop-up box;
  • training on the observable feature values of the pop-up boxes based on the identifiers to obtain the weight of each observable feature value;
  • screening out the effective observable feature values of the pop-up boxes based on those weights;
  • determining the recognition function based on the weights of the effective observable feature values.
  • optionally, when screening out the effective observable feature values, the processor performs the following operation:
  • comparing the weight of each observable feature value with the set weight threshold, and selecting the observable feature values whose weights are greater than the threshold as effective observable feature values.
  • optionally, when setting the identifier, the processor performs the following operation:
  • setting the identifier indicating whether a pop-up box is an advertisement pop-up box by marking and/or by a clustering algorithm.
  • optionally, the training is performed by using an artificial neural network method;
  • the recognition function is a step activation function.
  • optionally, the observable feature values of the pop-up box include at least one of the following:
  • the height of the pop-up box in the layer direction of the web page relative to the web page where it is located; the relative value of the start coordinate position of the pop-up box with respect to that web page; the area ratio of the pop-up box to the terminal window; the relevance value between the URL corresponding to the pop-up box and the domain name of that web page; and the relevance value between the text rendered by the pop-up box and the title content of that web page.
  • optionally, when determining, according to the value of the recognition function corresponding to a pop-up box, whether that pop-up box is an advertisement pop-up box, the processor performs the following operation:
  • if the value of the recognition function corresponding to the pop-up box is greater than the set recognition threshold, determining that the pop-up box is an advertisement pop-up box; otherwise, determining that it is not an advertisement pop-up box.
  • the eighth embodiment of the present invention builds on the above embodiments and describes an application example of the present invention with reference to FIGS. 7 to 9.
  • this embodiment provides a machine-learning-based method for intercepting browser advertisement pop-up boxes.
  • the apparatus implementing the method is shown in FIG. 7 and includes: an advertisement pop-up box candidate detection sub-module, a feature and decision learning sub-module, an advertisement pop-up box judgment sub-module, and an advertisement box filtering sub-module.
  • the candidate detection sub-module is a pre-processing module. It traverses every node in the DOM tree of the current web page and checks whether the node's CSS-POSITION attribute is FIXED (that is, the position is fixed: such pop-up boxes stay in place while the page is scrolled up and down and do not disappear as the page scrolls) in order to pre-judge whether the region corresponding to the node is a candidate pop-up advertisement region. The candidates are passed as learning samples to the feature and decision learning sub-module, and as detection samples to the advertisement pop-up box judgment sub-module.
  • the filtering sub-module merely hides, via CSS, or directly deletes the DOM node corresponding to each advertisement pop-up box to be intercepted, according to the judgment result.
  • the core of this embodiment is to train on advertisement pop-up box samples with a machine learning method so that feature options and rule parameters are obtained automatically, and to use the training result to discriminate advertisement pop-up boxes in real time. Both are described in detail in this embodiment.
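The candidate detection pre-processing described above can be sketched as a tree traversal that keeps the nodes whose CSS position is fixed. This is a minimal illustration only: the `Node` class, its `style` dictionary and the function names are hypothetical stand-ins for a browser's DOM and computed styles, not part of the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass
class Node:
    """Hypothetical stand-in for a DOM node with its computed CSS style."""
    tag: str
    style: Dict[str, str] = field(default_factory=dict)
    children: List["Node"] = field(default_factory=list)


def walk(node: Node) -> Iterator[Node]:
    """Yield every node of the tree in document (pre-order) order."""
    yield node
    for child in node.children:
        yield from walk(child)


def collect_candidates(root: Node) -> List[Node]:
    """Return the nodes whose CSS position is fixed (candidate pop-up boxes)."""
    return [
        n for n in walk(root)
        if n.style.get("position", "").strip().lower() == "fixed"
    ]


# A tiny example page: two fixed-position nodes, two others.
page = Node("body", children=[
    Node("div", {"position": "static"}, [Node("a", {"position": "fixed"})]),
    Node("section", {"position": "FIXED"}),
    Node("div", {"position": "relative"}),
])
candidates = collect_candidates(page)
print([n.tag for n in candidates])
```

Every fixed-position node is only a candidate at this stage; whether it is an advertisement is decided later by the trained model.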
  • FIG. 8 is the main processing flowchart for training on advertisement pop-up box samples in this embodiment. The detailed steps are as follows:
  • Step 100: the advertisement pop-up box candidate detection sub-module first obtains pop-up box samples from the web pages of current mainstream websites.
  • for example, the tags corresponding to these samples include <DIV>, <SECTION>, <A>, and so on; pop-up boxes with these different tags come from different web pages.
  • Step 110: extract all observable feature values corresponding to each pop-up box sample to form the input vector required for training.
  • here, "all observable feature values" means, as far as possible, every feature value related to advertisement pop-up boxes.
  • for example: the relative Z-INDEX height of the pop-up box, that is, its height in the layer direction of the web page relative to the web page where it is located (denoted x1); the relative value of the start coordinate position of the pop-up box with respect to that web page (denoted x2); the area ratio of the pop-up box to the terminal window (denoted x3); the relevance value between the URL corresponding to the pop-up box and the domain name of that web page (denoted x4); the relevance value between the text rendered by the pop-up box and the title content of that web page (denoted x5); and so on.
  • the input vector of one pop-up box sample is then: Xi = {x1, x2, x3, x4, x5, ..., xm},
  • where i = 1, 2, ..., N (N is the number of samples) and m is the dimension of the input vector, that is, the maximum number of features taken during training; in this embodiment, m is 5.
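The text names the features but does not say how each is computed. The sketch below shows one plausible way to obtain x3, x4 and x5. All three formulas are assumptions made for illustration (exact host or subdomain match for x4, word-overlap Jaccard similarity for x5), and the function names are hypothetical.

```python
from urllib.parse import urlparse


def area_ratio(popup_w: float, popup_h: float,
               window_w: float, window_h: float) -> float:
    """x3: area of the pop-up box divided by the area of the terminal window."""
    return (popup_w * popup_h) / (window_w * window_h)


def domain_relevance(popup_url: str, page_url: str) -> float:
    """x4 (assumed definition): 1.0 if the pop-up URL's host equals the page's
    host or is one of its subdomains, otherwise 0.0."""
    popup_host = urlparse(popup_url).hostname or ""
    page_host = urlparse(page_url).hostname or ""
    if not popup_host or not page_host:
        return 0.0
    same = popup_host == page_host or popup_host.endswith("." + page_host)
    return 1.0 if same else 0.0


def text_relevance(popup_text: str, page_title: str) -> float:
    """x5 (assumed definition): Jaccard overlap of lower-cased word sets."""
    a = set(popup_text.lower().split())
    b = set(page_title.lower().split())
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


x3 = area_ratio(300, 250, 1000, 750)
x4 = domain_relevance("https://ads.example.com/x.png", "https://example.com/news")
x5 = text_relevance("win a free phone", "daily news free")
print(x3, x4, x5)
```

A real implementation would read the geometry, URL and text from the rendered DOM node; the relevance measures in particular could be replaced by any scoring the trainer prefers.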
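  • Step 120: determine by manual marking whether each pop-up box sample is an advertisement pop-up box, which forms the expected output. It can be recorded, for example, as: Yi = 1 if the sample is an advertisement pop-up box and Yi = 0 if it is not, where i = 1, 2, ..., N (N is the number of samples). This is the expected output required by supervised machine learning training.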
  • Step 130: from the input feature vectors and expected outputs of all pop-up box samples, train the weight of each input feature item with an artificial neural network method; this realizes the automatic selection of the decision parameters.
  • the training process is illustrated with the simplest single-layer neural network model, whose actual output is defined as: output = f(w1*x1 + w2*x2 + ... + wm*xm)
  • here each value in W = (w1, w2, ..., wm) is the weight of the corresponding input feature value and may be initialized to any value (generally 0), and f is a step activation function.
  • the weight training procedure is given in figures PCTCN2017107605-appb-000001 and PCTCN2017107605-appb-000002 of the original publication, which are not reproduced in this text.
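Because the training figures are not reproduced here, the following is only a sketch of what such a procedure could look like, using the classic perceptron learning rule as a stand-in. It follows the structure the text mentions elsewhere (an outer loop of at most K passes, an inner loop over the samples, and a `change` counter that stops training when no weight was updated). The learning rate `lr`, the firing threshold `theta` and the sample data are assumptions, not values from the patent.

```python
from typing import List, Sequence


def step(z: float, theta: float = 0.5) -> int:
    """Step activation: fire (1) when the weighted sum exceeds theta."""
    return 1 if z > theta else 0


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_perceptron(X: Sequence[Sequence[float]], Y: Sequence[int],
                     lr: float = 0.1, max_epochs: int = 200,
                     theta: float = 0.5) -> List[float]:
    """Train single-layer weights with the perceptron rule (assumed rule)."""
    w = [0.0] * len(X[0])              # weights initialised to 0
    for _ in range(max_epochs):        # outer loop: at most K passes
        change = 0
        for x, y in zip(X, Y):         # inner loop: one pass over the samples
            err = y - step(dot(w, x), theta)
            if err != 0:
                w = [wi + lr * err * xi for wi, xi in zip(w, x)]
                change += 1
        if change == 0:                # no update in this pass: converged
            break
    return w


# Hypothetical samples: three feature values each, label 1 = advertisement.
X = [[0.1, 0.9, 0.8], [0.2, 0.8, 0.9], [0.9, 0.1, 0.0], [0.8, 0.0, 0.1]]
Y = [1, 1, 0, 0]
w = train_perceptron(X, Y)
print(w, [step(dot(w, x)) for x in X])
```

The perceptron rule only converges when the samples are linearly separable under the chosen threshold, which is why the cap on the number of passes matters in practice.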
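  • Step 140: after training ends, remove the feature values that have small weights; this realizes the automatic selection of the feature values of advertisement pop-up boxes. Continuing the example of step 130:
  • training ends when, after one pass of the inner (second-level) loop in step 130, the value of Change equals 0, that is, no weight needed updating in that iteration, or when the outer (first-level) loop has run to completion, that is, k = K (for example 200). The updated weights W = (w1, w2, ..., wm) are then available. For example, with m = 5, training might yield the weight vector W = (0.0086, 0.0078, 0.0183, 0.062, 0.072), and the feature values are selected automatically by the following test:
  • if the weight of a feature value satisfies w > 0.01, the feature value is selected; otherwise it is removed.
  • feature values x1 and x2 are therefore removed, and x3, x4 and x5 (their meanings are given in the example of step 110) are selected automatically to form the feature vector the model needs for actual detection.
  • Step 150: preload, or update online, the remaining feature values (for example x3, x4, x5) and their weights (for example W = (0.0183, 0.062, 0.072)) into the single-layer neural network model f(w3*x3 + w4*x4 + w5*x5) used for real-time detection.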
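The weight-threshold selection in this example is small enough to check directly. The sketch below applies the w > 0.01 test to the weight vector quoted in the text; the function name is hypothetical.

```python
from typing import List, Sequence


def select_features(weights: Sequence[float], threshold: float = 0.01) -> List[int]:
    """Return the 1-based indices of the features whose weight exceeds threshold."""
    return [i for i, w in enumerate(weights, start=1) if w > threshold]


W = [0.0086, 0.0078, 0.0183, 0.062, 0.072]   # example weights from the text
kept = select_features(W)
kept_weights = [W[i - 1] for i in kept]
print(kept, kept_weights)
```

Here x1 and x2 fall below the threshold, so only x3, x4 and x5 and their weights are carried into the detection model.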
  • FIG. 9 is the main processing flowchart for intercepting advertisement pop-up boxes in the real-time detection phase of this embodiment. The detailed steps are as follows:
  • Step 200: obtain a candidate pop-up box through the pre-processing of the advertisement pop-up box candidate detection sub-module. For example, take a node in the web page whose tag is <DIV> and check whether its CSS-POSITION attribute is FIXED; if so, the node is a candidate pop-up box.
  • Step 210: extract the node's observed feature values for the feature items obtained by training to form the input vector.
  • continuing the example above, the input vector constructed here is:
  • X = {x3, x4, x5}, where the meanings of x3, x4 and x5 are given in the example of step 110 and each value is the actual feature value extracted at this node.
  • Step 220: substitute the input feature vector and the trained weights into the single-layer neural network model f(w3*x3 + w4*x4 + w5*x5). With the weights obtained and selected in the training example above, W = (0.0183, 0.062, 0.072), the result value is Output = f(0.0183*x3 + 0.062*x4 + 0.072*x5).
  • Step 230: judge from the result value whether the node is an advertisement pop-up box. For example, if Output > 0.5, the candidate pop-up box is judged to be an advertisement pop-up box and step 240 is performed; otherwise the process ends.
  • Step 240: hide or delete the advertisement pop-up box.
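Steps 210 to 240 can be sketched as a single decision function. The weights are the ones quoted in the text; the firing threshold of the step function (`STEP_THETA`) is not given in the text and is a hypothetical value chosen only so that the example produces both outcomes. The feature values passed in are likewise made up.

```python
from typing import Dict

# Weights from the worked example in the text (features x3, x4, x5).
WEIGHTS: Dict[str, float] = {"x3": 0.0183, "x4": 0.062, "x5": 0.072}

# Hypothetical firing threshold of the step function; the text does not give one.
STEP_THETA = 0.1


def step(z: float, theta: float = STEP_THETA) -> int:
    """Step activation: 1 when the weighted sum exceeds theta, else 0."""
    return 1 if z > theta else 0


def output(features: Dict[str, float]) -> int:
    """Output = f(w3*x3 + w4*x4 + w5*x5) for one candidate node."""
    z = sum(WEIGHTS[name] * features[name] for name in WEIGHTS)
    return step(z)


def handle_candidate(features: Dict[str, float]) -> str:
    """Step 230/240: 'hide' the node if Output > 0.5, otherwise 'keep' it."""
    return "hide" if output(features) > 0.5 else "keep"


print(handle_candidate({"x3": 0.4, "x4": 0.9, "x5": 0.8}))
print(handle_candidate({"x3": 0.05, "x4": 0.1, "x5": 0.1}))
```

Since a step function only returns 0 or 1, the Output > 0.5 comparison simply tests whether the unit fired; with a real-valued activation the same comparison would act as a probability cut-off.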
  • the supervised learning training algorithm and the single-layer artificial neural network model used in training steps 120 to 130 can both be replaced by other machine learning techniques.
  • for example, the supervised training algorithm can be replaced by an unsupervised or semi-supervised training method.
  • unsupervised training means that the expected output of each sample need not be labeled manually; instead, a clustering algorithm (such as the K-means method) labels the samples automatically.
  • semi-supervised training lies between the two: some samples are labeled by supervision, and the remaining samples are labeled by the unsupervised method.
  • the single-layer artificial neural network model can be replaced by a multi-layer artificial neural network model.
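The semi-supervised variant can be sketched as follows: a two-cluster K-means groups the samples, and a few manually marked samples give each cluster its label. The initialisation (first and last sample as starting centroids), the sample data and the function names are assumptions made for illustration; the text only names K-means as an example of a clustering algorithm.

```python
from typing import Dict, List, Optional, Sequence


def dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans2(points: Sequence[Sequence[float]], iters: int = 50) -> List[int]:
    """Two-cluster K-means; returns the cluster index (0 or 1) of each point.
    Assumes at least one point; centroids start at the first and last point."""
    centroids = [list(points[0]), list(points[-1])]
    assign: Optional[List[int]] = None
    for _ in range(iters):
        new_assign = [
            0 if dist2(p, centroids[0]) <= dist2(p, centroids[1]) else 1
            for p in points
        ]
        if new_assign == assign:       # assignments stable: converged
            break
        assign = new_assign
        for k in (0, 1):               # move each centroid to its cluster mean
            members = [p for p, a in zip(points, assign) if a == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return assign if assign is not None else []


def label_samples(points: Sequence[Sequence[float]],
                  marked: Dict[int, int]) -> List[int]:
    """Label every sample with the mark of a marked sample in its cluster.
    `marked` maps sample index -> 1 (ad) or 0 (not ad); each cluster needs
    at least one marked sample, otherwise a KeyError is raised."""
    assign = kmeans2(points)
    cluster_label = {assign[i]: y for i, y in marked.items()}
    return [cluster_label[a] for a in assign]


samples = [[0.4, 0.9], [0.5, 0.8], [0.45, 0.85],
           [0.05, 0.1], [0.02, 0.2], [0.03, 0.15]]
labels = label_samples(samples, marked={0: 1, 3: 0})
print(labels)
```

Clustering alone yields groups, not meanings, so something must still say which group is the advertisement group; using a handful of marked samples for that is one way to read the semi-supervised option.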
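  • it should also be noted that the whole training process (steps 100 to 150) in this embodiment is not limited to offline training by the browser vendor with the resulting parameters preset in the product. It can be extended so that, when a user of the browser encounters a new advertisement pop-up box that was not filtered, or a false interception, the user can use an online filter or restore function that starts online training and updates the parameters in real time; when the web page is opened again, that pop-up advertisement or false interception no longer occurs. The user can thus steer the training result by their own judgment, which further highlights the benefit over the prior art.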
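One way to picture the online correction is a single weight update per user action: a pop-up box the user deletes is a missed advertisement (label 1), and a pop-up box the user restores is a false interception (label 0). The sketch below reuses the perceptron rule as an assumed update rule; the event names, `lr` and `theta` are hypothetical.

```python
from typing import List, Sequence


def label_from_feedback(event: str) -> int:
    """Map a user action to the expected output for that pop-up box."""
    if event == "user_deleted":      # missed interception: it was an ad
        return 1
    if event == "user_restored":     # false interception: it was not an ad
        return 0
    raise ValueError(f"unknown feedback event: {event}")


def online_update(w: Sequence[float], x: Sequence[float], event: str,
                  lr: float = 0.1, theta: float = 0.5) -> List[float]:
    """Apply one perceptron-style correction for a single feedback sample."""
    y = label_from_feedback(event)
    out = 1 if sum(wi * xi for wi, xi in zip(w, x)) > theta else 0
    err = y - out
    if err == 0:                     # the model already agrees with the user
        return list(w)
    return [wi + lr * err * xi for wi, xi in zip(w, x)]


w_before = [0.06, 0.34, 0.34]
x_missed = [0.1, 0.5, 0.3]           # features of an ad the model let through
w_after = online_update(w_before, x_missed, "user_deleted")
print(w_after)
```

Repeating the update over accumulated feedback samples, rather than applying each one once, would correspond more closely to restarting the training loop online.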
  • pop-up advertisement boxes often appear while browsing web pages and make for a very poor browsing experience, especially for mobile phone users.
  • conventional pop-up advertisement interception relies on blacklists and feature tables, which take an enormous amount of work and cost to maintain.
  • in addition, the external information and internal features of advertisement pop-up boxes are ever-changing. For example, ID and CLASS information changes dynamically and frequently, and insufficient or improper selection of an advertisement's CSS features leads to missed or false interception of advertisement pop-up boxes.
  • drawing on current developments in artificial intelligence and big data, the invention selects features and rules automatically through machine learning (training can be online or offline). This compensates for the defects of the conventional methods, so that the various forms of advertisement boxes can be intercepted cleanly and accurately, which improves the user experience.
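  • in the ninth embodiment of the present invention, the flow of the browser advertisement pop-up box interception method is the same as in the first, second or third embodiment. The difference lies in engineering implementation: this embodiment can be implemented by software plus the necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation.
  • on this understanding, the method of the embodiments of the present invention may be embodied as a computer software product that is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disk) and includes a number of instructions that cause a device (which may be a mobile phone, a server, or the like) to perform the method described in the embodiments of the present invention.
  • the description of the specific embodiments should give a deeper and more concrete understanding of the technical means and effects adopted by the present invention to achieve its intended purpose; the accompanying drawings, however, are provided for reference and illustration only and are not intended to limit the present invention.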

Landscapes

  • Information Transfer Between Computers (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

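A browser advertisement interception method, apparatus and terminal. The method includes: training on observable feature values of pop-up boxes to obtain a recognition function (S102); and, for any pop-up box, determining, according to the value of the recognition function corresponding to that pop-up box, whether it is an advertisement pop-up box, and if so, intercepting it (S103). The method requires no maintenance of blacklists or feature tables, which lowers cost, and it can intercept advertisement pop-up boxes accurately even as the external information and internal features of pop-up boxes keep changing.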

Description

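Browser advertisement interception method, apparatus and terminal
Technical Field
The present invention relates to the field of terminal browser application technologies, and in particular to a browser advertisement interception method, apparatus and terminal.
Background
With the rapid development of wireless communication and Internet technologies, more and more users browse the web on mobile terminals. As the entry point to the mobile Internet, the browser's importance is self-evident. How to improve the browser user experience on mobile terminals, and thereby stand out and gain share in a fiercely competitive market, is the current research focus of self-developed browser technology.
Web pages usually carry advertisements from all kinds of merchants. Pop-up advertisements that float above the page in particular seriously affect the reading experience, especially for mobile phone users. Browsers have therefore competed to introduce their own advertisement interception functions for such pop-up boxes. Conventional techniques mainly use two methods. The first builds an interception blacklist of the URL (Uniform Resource Locator) addresses of sub-resources associated with pop-up advertisements, such as images and JS (JavaScript) script files, and of pop-up box IDs or CLASSes; when the browser detects these URL addresses it stops the network load, or it hides the pop-up box according to its ID and CLASS. The second relies on features of the pop-up box itself, such as CSS (Cascading Style Sheets) features: a user defines the features and rules on a server, and the server side decides whether a pop-up box is to be filtered out and notifies the terminal to intercept it. What both methods have in common is that they intercept pop-up advertisements with features and rules selected in advance. Their main drawbacks are:
1. Maintaining the blacklist and feature table is a huge amount of work and extremely costly.
2. The external information and internal features of pop-up boxes are ever-changing. For example, ID and CLASS information changes dynamically and frequently, and insufficient or improper selection of CSS features and rules causes new advertisement pop-up boxes to be missed or wrongly intercepted.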
发明内容
本发明实施例提供一种浏览器广告拦截方法、装置及终端,克服现有技术对广告 弹出框拦截的技术方案的上述缺陷。
根据本发明的一个实施例,提供了一种浏览器广告拦截方法,包括:
对弹出框的可观察特征值进行训练得到识别函数;
针对任一弹出框,根据所述任一弹出框对应的所述识别函数的数值,判断所述任一弹出框是否为广告弹出框,若是,则进行拦截。
进一步的,所述弹出框的获取方式,包括:
将网页的树形结构中层叠样式表位置属性为固定属性的节点元素作为弹出框进行收集;或者,
将网页的树形结构中层叠样式表位置属性为固定属性的节点元素、以及:网页中由用户删除的广告弹出框和/或由用户恢复显示的已拦截广告弹出框作为弹出框进行收集。
进一步的,作为一种可选的技术方案,所述对所述弹出框的可观察特征值进行训练,得到识别函数,包括:
设置所述弹出框是否为广告弹出框的标识;
基于所述标识对所述弹出框的可观察特征值进行训练,得到所述弹出框的各可观察特征值的权值;
基于所述弹出框的各可观察特征值的权值确定出识别函数。
进一步的,作为另一种可选的技术方案,所述对所述弹出框的可观察特征值进行训练,得到识别函数,包括:
设置所述弹出框是否为广告弹出框的标识;
基于所述标识对所述弹出框的可观察特征值进行训练,得到所述弹出框的各可观察特征值的权值;
基于所述弹出框的各可观察特征值的权值筛选出所述弹出框的有效可观察特征值;
基于所述弹出框的各有效可观察特征值的权值确定出识别函数。
进一步的,所述基于所述弹出框的各可观察特征值的权值筛选出所述弹出框的有效可观察特征值,包括:
将所述弹出框的各可观察特征值的权值与设定的权值阈值进行比较,筛选出权值大于设定的权值阈值的可观察特征值作为有效可观察特征值。
进一步的,所述设置所述弹出框是否为广告弹出框的标识,包括:
通过标记的方式和/或聚类算法设置所述弹出框是否为广告弹出框的标识。
进一步的,所述训练是采用人工神经网络方法进行训练的;
所述识别函数为阶跃激活函数。
进一步的,所述弹出框的可观察特征值,包括以下至少之一:
弹出框在网页层方向上相对于弹出框所在的网页的高度、弹出框相对于弹出框所在的网页的起始坐标位置相对值、弹出框与终端窗口的面积比、弹出框对应的网址与弹出框所在的网页的域名的相关性值、弹出框所呈现的文本与弹出框所在的网页的标题内容的相关性值。
进一步的,针对任一弹出框,根据所述任一弹出框对应的所述识别函数的数值判断所述任一弹出框是否为广告弹出框,包括:
若所述任一弹出框对应的所述识别函数的数值大于设定的识别阈值,则判定所述任一弹出框为广告弹出框,否则判定所述任一弹出框不是广告弹出框。
根据本发明的另一实施例,还提供了一种浏览器广告拦截装置,包括:
训练模块,设置为对弹出框的可观察特征值进行训练得到识别函数;
拦截模块,设置为针对任一弹出框,根据所述任一弹出框对应的所述识别函数的数值,判断所述任一弹出框是否为广告弹出框,若是,则进行拦截。
进一步的,所述装置还包括:
收集模块,设置为将网页的树形结构中层叠样式表位置属性为固定属性的节点元素作为弹出框进行收集;或者,
将网页的树形结构中层叠样式表位置属性为固定属性的节点元素、以及:网页中由用户删除的广告弹出框和/或由用户恢复显示的已拦截广告弹出框作为弹出框进行收集。
进一步的,所述训练模块,包括:
设置模块,设置为设置所述弹出框是否为广告弹出框的标识;
权值确定模块,设置为基于所述标识对所述弹出框的可观察特征值进行训练,得到所述弹出框的各可观察特征值的权值;
函数确定模块,设置为基于所述弹出框的各可观察特征值的权值确定出识别函数。
根据本发明的又一实施例,还提供了一种终端,包括处理器以及存储有所述处理器可执行指令的存储器,当所述指令被处理器执行时,执行如下操作:
对弹出框的可观察特征值进行训练得到识别函数;
针对任一弹出框,根据所述任一弹出框对应的所述识别函数的数值,判断所述任一弹出框是否为广告弹出框,若是,则进行拦截。
进一步的,所述处理器执行的操作具体还包括:将网页的树形结构中层叠样式表位置属性为固定属性的节点元素作为弹出框进行收集;或者,
将网页的树形结构中层叠样式表位置属性为固定属性的节点元素、以及:网页中由用户删除的广告弹出框和/或由用户恢复显示的已拦截广告弹出框作为弹出框进行收集。
进一步的,作为一种可选的技术方案,所述处理器在执行所述对所述弹出框的可观察特征值进行训练得到识别函数的步骤时,具体包括如下操作:
设置所述弹出框是否为广告弹出框的标识;
基于所述标识对所述弹出框的可观察特征值进行训练,得到所述弹出框的各可观察特征值的权值;
基于所述弹出框的各可观察特征值的权值确定出识别函数。
进一步的,作为另一种可选的技术方案,所述处理器在执行所述对所述弹出框的可观察特征值进行训练得到识别函数的步骤时,具体包括如下操作:
设置所述弹出框是否为广告弹出框的标识;
基于所述标识对所述弹出框的可观察特征值进行训练,得到所述弹出框的各可观察特征值的权值;
基于所述弹出框的各可观察特征值的权值筛选出所述弹出框的有效可观察特征值;
基于所述弹出框的各有效可观察特征值的权值确定出识别函数。
进一步的,所述处理器在执行所述基于所述弹出框的各可观察特征值的权值筛选出所述弹出框的有效可观察特征值的步骤时,具体包括如下操作:
将所述弹出框的各可观察特征值的权值与设定的权值阈值进行比较,筛选出权值 大于设定的权值阈值的可观察特征值作为有效可观察特征值。
进一步的,所述处理器在执行所述设置所述弹出框是否为广告弹出框的标识的步骤时,具体包括如下操作:
通过标记的方式和/或聚类算法设置所述弹出框是否为广告弹出框的标识。
进一步的,所述训练是采用人工神经网络方法进行训练的;
所述识别函数为阶跃激活函数。
进一步的,所述弹出框的可观察特征值,包括以下至少之一:
弹出框在网页层方向上相对于弹出框所在的网页的高度、弹出框相对于弹出框所在的网页的起始坐标位置相对值、弹出框与终端窗口的面积比、弹出框对应的网址与弹出框所在的网页的域名的相关性值、弹出框所呈现的文本与弹出框所在的网页的标题内容的相关性值。
进一步的,所述处理器在执行根据任一弹出框对应的所述识别函数的数值判断所述任一弹出框是否为广告弹出框的步骤时,具体包括如下操作:
若所述任一弹出框对应的所述识别函数的数值大于设定的识别阈值,则判定所述任一弹出框为广告弹出框,否则判定所述任一弹出框不是广告弹出框。
通过采用上述技术方案,本发明实施例至少具有下列优点:
本发明实施例提供的所述浏览器广告拦截方法、装置及终端,主要秉承“让广告弹出框数据自己说话”的宗旨,基于人工智能和大数据技术进行机器学习的浏览器广告弹出框广告拦截办法,用机器学习的方法自动选择特征和规则,因此让拦截系统更加具有智能化和泛化性,有效地弥补常规方法的缺陷,从而取得极佳的用户体验。
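Brief Description of the Drawings
FIG. 1 is the main flowchart of the browser advertisement interception method of the first, second and third embodiments of the present invention;
FIG. 2 is a flowchart of step S102 in the browser advertisement interception method of the second embodiment of the present invention;
FIG. 3 is a flowchart of step S102 in the browser advertisement interception method of the third embodiment of the present invention;
FIG. 4 is a schematic diagram of the main components of the browser advertisement interception apparatus of the fourth, fifth and sixth embodiments of the present invention;
FIG. 5 is a schematic diagram of the composition of the modeling (training) module of the fifth embodiment of the present invention;
FIG. 6 is a schematic diagram of the composition of the modeling (training) module of the sixth embodiment of the present invention;
FIG. 7 is a schematic diagram of the workflow of the machine-learning-based browser advertisement pop-up box interception apparatus of the eighth embodiment of the present invention;
FIG. 8 is the main processing flowchart of pop-up box sample training in the eighth embodiment of the present invention;
FIG. 9 is the main processing flowchart of advertisement pop-up box interception in the real-time detection phase of the eighth embodiment of the present invention.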
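Detailed Description
To further explain the technical means and effects adopted by the present invention to achieve its intended purpose, the invention is described in detail below with reference to the accompanying drawings and preferred embodiments.
The method, apparatus and terminal for intercepting advertisement pop-up boxes proposed in the embodiments of the present invention follow the principle of "letting the advertisement pop-up box data speak for itself": machine learning automatically selects the features and rules needed to intercept advertisement pop-up boxes. The main key technical steps are as follows:
(1) First, pre-processing detection is performed. According to the CSS attribute values of the tag elements in the web page DOM, the tag elements whose style position is the fixed attribute (position value equal to fixed) are taken as training samples and as candidate advertisement pop-up boxes for the next detection step;
(2) In the training phase, a large number of pop-up advertisement box tag element samples are extracted from mainstream websites by the method in (1). All relevant observable feature values of these tag elements serve as the multi-dimensional input, and the manual mark of each sample serves as the expected output, which together form the supervised learning samples needed for machine learning training (semi-supervised or unsupervised learning methods are not excluded). Training can use the artificial neural network model structure and learning algorithm commonly used in machine learning (other machine learning model structures and learning methods are not excluded). Once training converges, the weight of each observable feature value has been computed automatically (this is the automatic determination of the detection rules). To improve efficiency in real-time detection, the feature items with very low weights can be removed, which at the same time realizes the automatic selection of features;
(3) In the real-time detection phase, the observed feature values of each candidate pop-up advertisement box obtained in (1) are fed into the model trained in (2) to obtain the actual output value (which may be a Boolean value, or another value such as a real-valued probability). This value decides whether the candidate is an advertisement pop-up box to be intercepted (for example, by hiding or deleting the tag element in the web page DOM);
With these key technical steps, the features and rules of advertisement pop-up boxes are obtained automatically, so that the browser can intercept the various forms of advertisement pop-up boxes accurately even in complex web pages.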
本发明第一实施例,一种浏览器广告拦截方法,如图1所示,包括以下具体步骤:
步骤S101,收集网页中的弹出框。
在本步骤中,弹出框是通过对网页中所有的节点元素进行初筛,具体是利用弹出框必然具备的属性进行初筛,但并不是说具备该属性的一定就是广告弹出框,后续还需要经过训练才能用于确定出用于识别广告弹出框的识别函数。因此,初筛得到的弹出框可以认为是疑似广告弹出框。广告弹出框必然包含于收集到的弹出框中。
可选的,所述收集网页中的弹出框,包括:
将网页的树形结构中层叠样式表位置属性CSS-position为固定属性的节点元素判定为弹出框,对所述弹出框进行收集。
网页的树形结构通常为DOM(Document Object Model,文档对象模型)树结构,固定属性可以是position值、fixed值等。
步骤S102,对所述弹出框的可观察特征值进行训练得到识别函数。
可选的,采用人工神经网络方法对所述弹出框的可观察特征值进行训练。
可选的,所述弹出框的可观察特征值,包括以下至少之一:
弹出框在网页层方向上相对于弹出框所在的网页的高度、弹出框相对于弹出框所在的网页的起始坐标位置相对值、弹出框与终端窗口的面积比、弹出框对应的网址与弹出框所在的网页的域名的相关性值、弹出框所呈现的文本与弹出框所在的网页的标题内容的相关性值。
步骤S103,针对任一弹出框,根据所述任一弹出框对应的所述识别函数的数值,判断所述任一弹出框是否为广告弹出框,若是,则进行拦截。
本发明实施例将网页的树形结构中CSS-position属性为固定属性的节点元素判定为弹出框,虽然可能会将一些不是广告弹出框的节点元素纳入其中,但是能够保证是广告弹出框的节点元素不会漏掉,这一步可以认为是基于广告弹出框的必要特征进行的初筛,后续选择尽可能多的与广告弹出框相关的可观察特征值,采用人工神经网络方法对这些可观察特征值进行训练以得到用于判断所述弹出框是否为广告弹出框的人工神经网络模型。由于本发明实施例对于可观察特征值选择的全面准确,可以得到较 为准确的该人工神经网络模型即识别函数,因此,对广告弹出框的判断也比较准确。本发明实施例无需像现有技术那样维护黑名单和特征表,降低了广告弹出框的拦截成本,且能够适应弹出框的外部信息和内部特征千变万化的情况而进行广告弹出框的准确拦截。
本发明第二实施例,一种浏览器广告拦截方法,如图1所示,包括以下具体步骤:
步骤S101,收集网页中的弹出框。
可选的,所述收集网页中的弹出框,包括:
将网页的树形结构中层叠样式表位置属性CSS-position为固定属性的节点元素判定为弹出框,对所述弹出框进行收集。
网页的树形结构通常为DOM(Document Object Model,文档对象模型)树结构,固定属性可以是position值、fixed值等。
可选的,所述收集网页中的弹出框,还包括:将网页中由用户删除的广告弹出框和/或由用户恢复显示的已拦截广告弹出框也作为弹出框进行收集。
本发明实施例与第一实施例相比,还可以将网页中由用户删除的广告弹出框和/或由用户恢复显示的已拦截广告弹出框也作为弹出框,增加了弹出框的样本种类,用户删除的广告弹出框说明经过拦截还有一些漏掉的广告弹出框没有识别出来,被用户看到后由用户手动删除了,对于这种样本非常有记录的意义,经过训练后可以对用于判断所述弹出框是否为广告弹出框的人工神经网络模型即识别函数进行完善;由用户恢复显示的已拦截广告弹出框说明拦截了一些用户不认为是广告弹出框的或者用户愿意看到的弹出框,对于这种样本也非常有记录的意义,经过训练后可以对用于判断所述弹出框是否为广告弹出框的人工神经网络模型进行完善。与第一实施例的拦截效果相比更加贴近用户的真实需求,提升了用户体验。
步骤S102,对弹出框的可观察特征值进行训练得到识别函数;
如图2所示,可选的,步骤S102包括:
A1:设置所述弹出框是否为广告弹出框的标识;
可选的,步骤A1包括:
通过标记的方式设置所述弹出框是否为广告弹出框的标识。所述标记可以是人工 标记也可以是自动标记。所述弹出框是否为广告弹出框的标识实际上就是期望判定结果。
在步骤S101中,根据层叠样式表位置属性CSS-position判断出的弹出框是否为广告弹出框的标识是被人工标记为是或者否。
由用户删除的广告弹出框作为弹出框时,该弹出框是否为广告弹出框的标识被人工或者自动标记为是,属于漏拦截,在用户实际使用浏览器的过程中,通过该弹出框的加入训练对特征值的选取和权值予以修正;由用户恢复显示的已拦截广告弹作为弹出框时,该弹出框是否为广告弹出框的标识被人工或者自动标记为否,属于误拦截,通过该弹出框的加入训练对特征值的选取和权值予以修正。
A2:基于所述标识对所述弹出框的可观察特征值进行训练,得到所述弹出框的各可观察特征值的权值;
可选的,所述弹出框的可观察特征值,包括:
弹出框在网页层方向上相对于弹出框所在的网页的高度、弹出框相对于弹出框所在的网页的起始坐标位置相对值、弹出框与终端窗口的面积比、弹出框对应的网址与弹出框所在的网页的域名的相关性值、弹出框所呈现的文本与弹出框所在的网页的标题内容的相关性值。
A3:基于所述弹出框的各可观察特征值的权值确定出识别函数。该识别函数可以是基于单层人工神经网络模型或者多层人工神经网络模型的阶跃激活函数。
步骤S103,针对任一弹出框,根据所述任一弹出框对应的所述识别函数的数值,判断所述任一弹出框是否为广告弹出框,若是,则进行拦截。
可选的,在步骤S103中,包括:
若所述任一弹出框对应的所述识别函数的数值大于设定的识别阈值,则判定所述任一弹出框为广告弹出框,否则判定所述任一弹出框不是广告弹出框。
本发明第三实施例,一种浏览器广告拦截方法,如图1所示,包括以下具体步骤:
步骤S101,收集网页中的弹出框。
可选的,所述收集网页中的弹出框,包括:
将网页的树形结构中层叠样式表位置属性CSS-position为固定属性的节点元素判 定为弹出框,对所述弹出框进行收集。
网页的树形结构通常为DOM(Document Object Model,文档对象模型)树结构,固定属性可以是position值、fixed值等。
可选的,所述收集网页中的弹出框,还包括:将网页中由用户删除的广告弹出框和/或由用户恢复显示的已拦截广告弹出框也作为弹出框进行收集。
步骤S102,对弹出框的可观察特征值进行训练得到识别函数;
如图3所示,可选的,步骤S102包括:
B1:设置所述弹出框是否为广告弹出框的标识;
可选的,步骤B1包括:
通过聚类算法设置所述弹出框是否为广告弹出框的标识;
或者,通过标记的方式设置一部分所述弹出框是否为广告弹出框的标识、且通过聚类算法设置其余的所述弹出框是否为广告弹出框的标识。所述标记可以是人工标记也可以是自动标记。
在步骤S101中,在采用标记方式的情况下,根据层叠样式表位置属性CSS-position判断出的弹出框是否为广告弹出框的标识被人工标记为是或者否。
由用户删除的广告弹出框作为弹出框时,该弹出框是否为广告弹出框的标识被人工或者自动标记为是,属于漏拦截,在用户实际使用浏览器的过程中,通过该弹出框的加入训练对特征值的选取和权值予以修正;由用户恢复显示的已拦截广告弹作为弹出框时,该弹出框是否为广告弹出框的标识被人工或者自动标记为否,属于误拦截,通过该弹出框的加入训练对特征值的选取和权值予以修正。
本发明实施例与第二实施例的区别在于,可以全部或者部分的通过聚类算法,比如:K均值方法,设置所述弹出框是否为广告弹出框的标识,减少人工成本。
B2:基于所述标识对所述弹出框的可观察特征值进行训练,得到所述弹出框的各可观察特征值的权值;
可选的,所述弹出框的可观察特征值,包括:
弹出框在网页层方向上相对于弹出框所在的网页的高度、弹出框相对于弹出框所在的网页的起始坐标位置相对值、弹出框与终端窗口的面积比、弹出框对应的网址与弹出框所在的网页的域名的相关性值、弹出框所呈现的文本与弹出框所在的网页的标 题内容的相关性值。
B3:基于所述弹出框的各可观察特征值的权值筛选出所述弹出框的有效可观察特征值;
可选的,步骤B3包括:
将所述弹出框的各可观察特征值的权值与设定的权值阈值进行比较,筛选出权值大于设定的权值阈值的可观察特征值作为有效可观察特征值。
B4:基于所述弹出框的各有效可观察特征值的权值确定出识别函数。
步骤S103,针对任一弹出框,根据所述任一弹出框对应的所述识别函数的数值,判断所述任一弹出框是否为广告弹出框,若是,则进行拦截。
可选的,在步骤S103中,包括:
若所述任一弹出框对应的所述识别函数的数值大于设定的识别阈值,则判定所述任一弹出框为广告弹出框,否则判定所述任一弹出框不是广告弹出框。
本发明第四实施例,与第一实施例对应,本实施例介绍一种浏览器广告拦截装置,如图4所示,包括以下组成部分:
1)收集模块401,设置为收集网页中的弹出框。
可选的,收集模块401,设置为:
将网页的树形结构中层叠样式表位置属性为固定属性的节点元素作为弹出框进行收集。
网页的树形结构通常为DOM树结构,固定属性可以是position值、fixed值等。
2)训练模块402,设置为对弹出框的可观察特征值进行训练得到识别函数;
可选的,所述弹出框的可观察特征值,包括以下至少之一:
弹出框在网页层方向上相对于弹出框所在的网页的高度、弹出框相对于弹出框所在的网页的起始坐标位置相对值、弹出框与终端窗口的面积比、弹出框对应的网址与弹出框所在的网页的域名的相关性值、弹出框所呈现的文本与弹出框所在的网页的标题内容的相关性值。
3)拦截模块403,设置为针对任一弹出框,根据所述任一弹出框对应的所述识别函数的数值,判断所述任一弹出框是否为广告弹出框,若是,则进行拦截。
本发明第五实施例,与第二实施例对应,本实施例介绍一种浏览器广告拦截装置,如图4所示,包括以下组成部分:
1)收集模块401,设置为收集网页中的弹出框。
可选的,收集模块401,设置为:
将网页的树形结构中层叠样式表位置属性为固定属性的节点元素作为弹出框进行收集。
网页的树形结构通常为DOM树结构,固定属性可以是position值、fixed值等。
可选的,收集模块401,还设置为:将网页中由用户删除的广告弹出框和/或由用户恢复显示的已拦截广告弹出框也作为弹出框进行收集。
2)训练模块402,设置为对弹出框的可观察特征值进行训练得到识别函数;
如图5所示,可选的,训练模块402,包括:
设置模块10,设置为设置所述弹出框是否为广告弹出框的标识;
可选的,设置模块10,设置为:通过标记的方式设置所述弹出框是否为广告弹出框的标识;
可选的,根据层叠样式表位置属性CSS-position判断出的弹出框是否为广告弹出框的标识被人工标记为是或者否。
由用户删除的广告弹出框作为弹出框时,该弹出框是否为广告弹出框的标识被人工或者自动标记为是,属于漏拦截,在用户实际使用浏览器的过程中,通过该弹出框的加入训练对特征值的选取和权值予以修正;由用户恢复显示的已拦截广告弹作为弹出框时,该弹出框是否为广告弹出框的标识被人工或者自动标记为否,属于误拦截,通过该弹出框的加入训练对特征值的选取和权值予以修正。
权值确定模块20,设置为基于所述标识对所述弹出框的可观察特征值进行训练,得到所述弹出框的各可观察特征值的权值;
进一步的,所述弹出框的可观察特征值,包括:
弹出框在网页层方向上相对于弹出框所在的网页的高度、弹出框相对于弹出框所在的网页的起始坐标位置相对值、弹出框与终端窗口的面积比、弹出框对应的网址与弹出框所在的网页的域名的相关性值、弹出框所呈现的文本与弹出框所在的网页的标 题内容的相关性值。
函数确定模块30,设置为基于所述弹出框的各可观察特征值的权值确定出识别函数。
3)拦截模块403,设置为针对任一弹出框,根据所述任一弹出框对应的所述识别函数的数值,判断所述任一弹出框是否为广告弹出框,若是,则进行拦截。
可选的,拦截模块403,设置为:
若所述任一弹出框对应的所述识别函数的数值大于设定的识别阈值,则判定所述任一弹出框为广告弹出框,否则判定所述任一弹出框不是广告弹出框。
本发明第六实施例,与第三实施例对应,本实施例介绍一种浏览器广告拦截装置,如图4所示,包括以下组成部分:
1)收集模块401,设置为收集网页中的弹出框。
可选的,收集模块401,设置为:
将网页的树形结构中层叠样式表位置属性为固定属性的节点元素作为弹出框进行收集。
网页的树形结构通常为DOM树结构,固定属性可以是position值、fixed值等。
可选的,收集模块401,还设置为:将网页中由用户删除的广告弹出框和/或由用户恢复显示的已拦截广告弹出框也作为弹出框进行收集。
2)训练模块402,设置为对弹出框的可观察特征值进行训练得到识别函数;
如图6所示,可选的,建模模块402,包括:
设置模块10,设置为设置所述弹出框是否为广告弹出框的标识;
可选的,设置模块10,设置为:通过聚类算法设置所述弹出框是否为广告弹出框的标识;或者,通过标记的方式设置一部分所述弹出框是否为广告弹出框的标识、且通过聚类算法设置其余的所述弹出框是否为广告弹出框的标识。所述标记的方式包括人工标记或自动标记。
在采用标记方式的情况下,可选的,根据层叠样式表位置属性CSS-position判断出的弹出框是否为广告弹出框的标识被人工标记为是或者否。
由用户删除的广告弹出框作为弹出框时,该弹出框是否为广告弹出框的标识被人 工或者自动标记为是,属于漏拦截,在用户实际使用浏览器的过程中,通过该弹出框的加入训练对特征值的选取和权值予以修正;由用户恢复显示的已拦截广告弹作为弹出框时,该弹出框是否为广告弹出框的标识被人工或者自动标记为否,属于误拦截,通过该弹出框的加入训练对特征值的选取和权值予以修正。
权值确定模块20,设置为基于所述标识对所述弹出框的可观察特征值进行训练,得到所述弹出框的各可观察特征值的权值;
进一步的,所述弹出框的可观察特征值,包括:
弹出框在网页层方向上相对于弹出框所在的网页的高度、弹出框相对于弹出框所在的网页的起始坐标位置相对值、弹出框与终端窗口的面积比、弹出框对应的网址与弹出框所在的网页的域名的相关性值、弹出框所呈现的文本与弹出框所在的网页的标题内容的相关性值。
筛选模块40,设置为将所述弹出框的各可观察特征值的权值与设定的权值阈值进行比较,筛选出权值大于设定的权值阈值的可观察特征值作为有效可观察特征值。
函数确定模块30,设置为基于所述弹出框的各有效可观察特征值的权值确定出识别函数。
3)拦截模块403,设置为针对任一弹出框,根据所述任一弹出框对应的所述识别函数的数值,判断所述任一弹出框是否为广告弹出框,若是,则进行拦截。
可选的,拦截模块403,用于:
若所述任一弹出框对应的所述识别函数的数值大于设定的识别阈值,则判定所述任一弹出框为广告弹出框,否则判定所述任一弹出框不是广告弹出框。
本发明第七实施例,一种终端,可以作为实体装置手机或者服务器来理解,包括处理器以及存储有所述处理器可执行指令的存储器,当所述指令被处理器执行时,执行如下操作:
收集网页中的弹出框;
对弹出框的可观察特征值进行训练得到识别函数;
针对任一弹出框,根据所述任一弹出框对应的所述识别函数的数值,判断所述任一弹出框是否为广告弹出框,若是,则进行拦截。
可选的,所述处理器执行的操作具体还包括:
将网页的树形结构中层叠样式表位置属性为固定属性的节点元素作为弹出框进行收集;或者,
将网页的树形结构中层叠样式表位置属性为固定属性的节点元素、以及:网页中由用户删除的广告弹出框和/或由用户恢复显示的已拦截广告弹出框作为弹出框进行收集。
网页的树形结构通常为DOM树结构,固定属性可以是position值、fixed值等。
可选的,作为一种可选的技术方案,所述处理器在执行所述对所述弹出框的可观察特征值进行训练得到识别函数的步骤时,具体包括如下操作:
设置所述弹出框是否为广告弹出框的标识;
基于所述标识对所述弹出框的可观察特征值进行训练,得到所述弹出框的各可观察特征值的权值;
基于所述弹出框的各可观察特征值的权值确定出识别函数。
可选的,作为另一种可选的技术方案,所述处理器在执行所述对所述弹出框的可观察特征值进行训练得到识别函数的步骤时,具体包括如下操作:
设置所述弹出框是否为广告弹出框的标识;
基于所述标识对所述弹出框的可观察特征值进行训练,得到所述弹出框的各可观察特征值的权值;
基于所述弹出框的各可观察特征值的权值筛选出所述弹出框的有效可观察特征值;
基于所述弹出框的各有效可观察特征值的权值确定出识别函数。
可选的,所述处理器在执行所述基于所述弹出框的各可观察特征值的权值筛选出所述弹出框的有效可观察特征值的步骤时,具体包括如下操作:
将所述弹出框的各可观察特征值的权值与设定的权值阈值进行比较,筛选出权值大于设定的权值阈值的可观察特征值作为有效可观察特征值。
可选的,所述处理器在执行所述设置所述弹出框是否为广告弹出框的标识的步骤时,具体包括如下操作:
通过标记的方式和/或聚类算法设置所述弹出框是否为广告弹出框的标识。
可选的,所述训练是采用人工神经网络方法进行训练的;
所述识别函数为阶跃激活函数。
可选的,所述弹出框的可观察特征值,包括以下至少之一:
弹出框在网页层方向上相对于弹出框所在的网页的高度、弹出框相对于弹出框所在的网页的起始坐标位置相对值、弹出框与终端窗口的面积比、弹出框对应的网址与弹出框所在的网页的域名的相关性值、弹出框所呈现的文本与弹出框所在的网页的标题内容的相关性值。
可选的,所述处理器在执行根据任一弹出框对应的所述识别函数的数值判断所述任一弹出框是否为广告弹出框的步骤时,具体包括如下操作:
若所述任一弹出框对应的所述识别函数的数值大于设定的识别阈值,则判定所述任一弹出框为广告弹出框,否则判定所述任一弹出框不是广告弹出框。
本发明第八实施例,本实施例是在上述实施例的基础上,结合附图7~9介绍一个本发明的应用实例。
本发明实施例提供了一种基于机器学习的浏览器广告弹出框拦截方法,实现该拦截方法的装置如图7所示,在该装置中包括:广告弹出框候选检测子模块、特征和决策学习子模块、广告弹出框判断子模块和广告框滤除处理子模块。其中,广告弹出框候选检测子模块是一个预处理模块,该预处理模块主要设置为通过遍历当前网页DOM树结构中的每一个节点,检查其CSS-POSITION属性是否为FIXED(即位置固定不变,因为在网页上下滚动过程中,这些弹出框的位置都是固定不变的,即它并不随网页滚动而消失)来预先判断该节点对应区域是否为弹出框广告候选区域,并一方面作为学习样本输入给特征和决策学习子模块,以及另一方面作为检测样本输入给广告弹出框判断子模块进行处理。广告弹出框滤除处理子模块只是根据判断结果对须拦截的广告弹出框对应的DOM节点进行CSS隐藏或直接删除的处理。本发明实施例的核心处理过程是使用机器学习的方法对广告拦截框的样本进行训练从而自动获取特征选项和规则参数,以及使用训练结果进行实时的广告弹出框判别的处理,将在本实施例中予以详细介绍。
如图8是本发明实施例中广告弹出框样本训练的主要处理流程图,其详细步骤主要处理如下:
步骤100:首先需要通过广告弹出框候选检测子模块对当前主流网站的各个网页的弹出框样本进行获取。比如:这些弹出框样本对应的标签有<DIV>、<SECTION>、<A>等,这些不同标签的弹出框来自不同的网页。
步骤110:提取出与弹出框样本对应的所有可观察特征值构成训练需要的输入向量。这里与弹出框样本对应的所有可观察特征值是指尽量所有和广告弹出框相关的特征值,举个例子比如有:弹出框Z-INDEX高度相对值即弹出框在网页层方向上相对于弹出框所在的网页的高度(记为x1)、弹出框相对于弹出框所在的网页的起始坐标位置相对值(记为x2)、弹出框与终端窗口的面积比(记为x3)、弹出框对应的网址与弹出框所在的网页的域名的相关性值(记为x4)、弹出框所呈现的文本与弹出框所在的网页的标题内容的相关性值(记为x5)等等。于是一个弹出框样本对应的输入向量值则为:
Xi={x1,x2,x3,x4,x5..xm},
其中i=1,2,…N(N表示样本的个数),m为输入向量的维度,即训练时所取的最大特征数值,本实施例中取m为5;
步骤120:通过人工标记的方式确定各弹出框样本是否是广告弹出框,从而构成期望输出,比如可记为:
Yi=1(如果样本为广告弹出框),Yi=0(如果样本为非广告弹出框),其中,i=1,2,…N(N表示样本的个数)。
这是采用监督式机器学习训练所需要的期望输出。
步骤130:在所有弹出框样本输入特征向量和期望输出的基础上采用人工神经网络方法训练出各个输入特征项对应的权值,这里就实现了决策参数的自动选取。具体以最简单的单层神经网络模型为例说明这个训练过程,该模型的实际输出output定义如下:output=f(w1*x1+w2*x2+…+wm*xm)
这里W=(w1,w2,..wm)中每个值为输入的特征值对应的权值,初始化可以设置为任意值(一般设为0),f为阶跃激活函数,其权值的训练过程具体如下:
Figure PCTCN2017107605-appb-000001
Figure PCTCN2017107605-appb-000002
步骤140:训练结束后将较小权值对应的特征值去掉,这里就实现了广告弹出框特征值的自动选取,这里仍然以步骤130中的例子继续具体说明:
训练结束是指步骤130中第二层中的一个循环结束后Change的值等于0即本次迭代没有权值需要更新,或者第一层循环全部结束即k=K(如200),这时可以得到训练更新后的权值W=(w1,w2,..wm),比如m=5时经过训练得到一个权值向量W=(0.0086,0.0078,0.0183,0.062,0.072),则通过下面一个判断来自动选取特征值:
如果一个特征值的权值w>0.01,则该特征值为选取的特征项,否则去掉该特征值。
于是特征值x1和x2被去掉,x3和x4、x5(各自对应的特征意义见步骤110中的举例)被自动选取出来,构成以后模型实际检测需要的特征向量;
步骤150:将剩余特征值(如x3,x4,x5)和对应权值(如W=(0.0183,0.062,0.072))预装或在线更新到实时检测的单层神经网络模型f(w3*x3+w4*x4+w5*x5)中。
如图9是本发明实施例中实时检测阶段进行广告弹出框拦截的主要处理流程图,其详细步骤如下:
步骤200:通过广告弹出框候选检测子模块的预处理得到一个候选弹出框,比如得到网页中的一个标签为<DIV>的节点,判断该节点的CSS-POSITION属性为FIXED,若是则判定该节点为候选弹出框;
步骤210:按照训练得到的特征项提取该节点各观察特征值构成输入向量。仍以 上面的例子为例,比如这里构成的输入向量则为:
X={x3,x4,x5},其中,x3,x4,x5各自对应的特征值意义见步骤110中的举例,该值是在这个节点提取的对应实际特征值。
步骤220:将输入的特征向量和训练得到各个权值代入单层神经网络模型f(w3*x3+w4*x4+w5*x5)中,以得到确定的单层神经网络模型。比如:这里使用上面训练中得到并选择后的各个权值则为:W=(0.0183,0.062,0.072),则判断结果值Output=f(0.0183*x3+0.062*x4+0.072*x5)。
步骤230:根据结果值判断对应该节点是否为广告弹出框,比如:如果Output>0.5,则判定该候选弹出框为广告弹出框,执行步骤240,否则流程结束;
步骤240:隐藏或删除该广告弹出框。
训练步骤120-130中用到可监督学习训练算法和单层人工神经网络模型都可以找到其他机器学习的技术方法替换。比如可监督学习训练算法可以用非监督或半监督训练方法,非监督训练是指不需要人工标注每个样本的期望输出,而是通过一些聚类算法(如K均值方法)来自动实现样本的标注,半监督则是处于监督和非监督之间,即部分样本用监督,部分样本用非监督的方法标注。而单层人工神经网络模型可以用多层人工神经网络模型来替代。
另外需要说明的是实施例中的整个训练(步骤100至150)过程不一定局限在浏览器厂商离线训练后将参数预置到产品上,也可以扩展到使用该浏览器用户在实际使用中遇到新的没有滤除的广告弹出框或遇到误拦截情况时,可以选择在线滤除和恢复功能即启动在线的训练来实时更新参数,以后再打开该网页就不会出现该弹出广告框了或误拦截了。即用户可以根据自己判断干预训练的效果,从而更能突出相对于现有技术所能获得的有益效果。
浏览网页时经常出现弹出式广告框,对用户(特别是手机用户)的网页浏览体验非常差。常规弹出式广告拦截方法使用黑名单和特征表维护工作巨大,成本极高。另外广告弹出框的外部信息和内部特征千变万化,如ID和CLASS信息本身经常不断动态变化,自身的CSS特征选取不足或不当都会造成广告弹出框的漏拦截或误拦截。本发明根据目前人工智能和大数据技术发展,通过机器学习的方法自动进行特征和规则的选取(训练可在线或离线),能很有效地弥补常规方法的缺陷,从而能干净准确地拦 截各种形式的广告框,因此能带来极佳的用户体验。
本发明第九实施例,本实施例的浏览器广告弹出框拦截方法的流程与第一、二或三实施例相同,区别在于,在工程实现上,本实施例可借助软件加必需的通用硬件平台的方式来实现,当然也可以通过硬件,但很多情况下前者是更佳的实施方式。基于这样的理解,本发明实施例的所述方法可以以计算机软件产品的形式体现出来,该计算机软件产品存储在一个存储介质(如ROM/RAM、磁碟、光盘)中,包括若干指令用以使得一台设备(可以是手机、服务器等设备)执行本发明实施例所述的方法。
通过具体实施方式的说明,应当可对本发明为达成预定目的所采取的技术手段及功效得以更加深入且具体的了解,然而所附图示仅是提供参考与说明之用,并非用来对本发明加以限制。

Claims (22)

  1. 一种浏览器广告拦截方法,包括:
    对弹出框的可观察特征值进行训练得到识别函数;
    针对任一弹出框,根据所述任一弹出框对应的所述识别函数的数值,判断所述任一弹出框是否为广告弹出框,若是,则进行拦截。
  2. 根据权利要求1所述的浏览器广告拦截方法,其中,所述弹出框的获取方式,包括:
    将网页的树形结构中层叠样式表位置属性为固定属性的节点元素作为弹出框进行收集;或者,
    将网页的树形结构中层叠样式表位置属性为固定属性的节点元素、以及:网页中由用户删除的广告弹出框和/或由用户恢复显示的已拦截广告弹出框作为弹出框进行收集。
  3. 根据权利要求1所述的浏览器广告拦截方法,其中,所述对所述弹出框的可观察特征值进行训练,得到识别函数,包括:
    设置所述弹出框是否为广告弹出框的标识;
    基于所述标识对所述弹出框的可观察特征值进行训练,得到所述弹出框的各可观察特征值的权值;
    基于所述弹出框的各可观察特征值的权值确定出识别函数。
  4. 根据权利要求1所述的浏览器广告拦截方法,其中,所述对所述弹出框的可观察特征值进行训练,得到识别函数,包括:
    设置所述弹出框是否为广告弹出框的标识;
    基于所述标识对所述弹出框的可观察特征值进行训练,得到所述弹出框的各可观察特征值的权值;
    基于所述弹出框的各可观察特征值的权值筛选出所述弹出框的有效可观察特征值;
    基于所述弹出框的各有效可观察特征值的权值确定出识别函数。
  5. 根据权利要求4所述的浏览器广告拦截方法,其中,所述基于所述弹出框的各可观察特征值的权值筛选出所述弹出框的有效可观察特征值,包括:
    将所述弹出框的各可观察特征值的权值与设定的权值阈值进行比较,筛选出权值大于设定的权值阈值的可观察特征值作为有效可观察特征值。
  6. 根据权利要求3或4所述的浏览器广告拦截方法,其中,所述设置所述弹出框是否为广告弹出框的标识,包括:
    通过标记的方式和/或聚类算法设置所述弹出框是否为广告弹出框的标识。
  7. 根据权利要求3或4所述的浏览器广告拦截方法,其中,所述训练是采用人工神经网络方法进行训练的;
    所述识别函数为阶跃激活函数。
  8. 根据权利要求3或4所述的浏览器广告拦截方法,其中,所述弹出框的可观察特征值,包括以下至少之一:
    弹出框在网页层方向上相对于弹出框所在的网页的高度、弹出框相对于弹出框所在的网页的起始坐标位置相对值、弹出框与终端窗口的面积比、弹出框对应的网址与弹出框所在的网页的域名的相关性值、弹出框所呈现的文本与弹出框所在的网页的标题内容的相关性值。
  9. 根据权利要求1所述的浏览器广告拦截方法,其中,针对任一弹出框,根据所述任一弹出框对应的所述识别函数的数值判断所述任一弹出框是否为广告弹出框,包括:
    若所述任一弹出框对应的所述识别函数的数值大于设定的识别阈值,则判定所述任一弹出框为广告弹出框,否则判定所述任一弹出框不是广告弹出框。
  10. 一种浏览器广告拦截装置,包括:
    训练模块,设置为对弹出框的可观察特征值进行训练得到识别函数;
    拦截模块,设置为针对任一弹出框,根据所述任一弹出框对应的所述识别函数的数值,判断所述任一弹出框是否为广告弹出框,若是,则进行拦截。
  11. 根据权利要求10所述的浏览器广告拦截装置,其中,所述装置还包括:
    收集模块,设置为将网页的树形结构中层叠样式表位置属性为固定属性的节点元素作为弹出框进行收集;或者,
    将网页的树形结构中层叠样式表位置属性为固定属性的节点元素、以及:网页中由用户删除的广告弹出框和/或由用户恢复显示的已拦截广告弹出框作为弹出框进行收集。
  12. 根据权利要求10所述的浏览器广告拦截装置,其中,所述训练模块,包括:
    设置模块,设置为设置所述弹出框是否为广告弹出框的标识;
    权值确定模块,设置为基于所述标识对所述弹出框的可观察特征值进行训练,得到所述弹出框的各可观察特征值的权值;
    函数确定模块,设置为基于所述弹出框的各可观察特征值的权值确定出识别函数。
  13. 一种终端,包括处理器以及存储有所述处理器可执行指令的存储器,当所述指令被处理器执行时,执行如下操作:
    对弹出框的可观察特征值进行训练得到识别函数;
    针对任一弹出框,根据所述任一弹出框对应的所述识别函数的数值,判断所述任一弹出框是否为广告弹出框,若是,则进行拦截。
  14. 根据权利要求13所述的终端,其中,所述处理器执行的操作具体还包括:将网页的树形结构中层叠样式表位置属性为固定属性的节点元素作为弹出框进行收集;或者,
    将网页的树形结构中层叠样式表位置属性为固定属性的节点元素、以及:网页中由用户删除的广告弹出框和/或由用户恢复显示的已拦截广告弹出框作为弹出框进行收集。
  15. 根据权利要求13所述的终端,其中,所述处理器在执行所述对所述弹出框的可观察特征值进行训练得到识别函数的步骤时,具体包括如下操作:
    设置所述弹出框是否为广告弹出框的标识;
    基于所述标识对所述弹出框的可观察特征值进行训练,得到所述弹出框的各可观察特征值的权值;
    基于所述弹出框的各可观察特征值的权值确定出识别函数。
  16. 根据权利要求13所述的终端,其中,所述处理器在执行所述对所述弹出框的可观察特征值进行训练得到识别函数的步骤时,具体包括如下操作:
    设置所述弹出框是否为广告弹出框的标识;
    基于所述标识对所述弹出框的可观察特征值进行训练,得到所述弹出框的各可观察特征值的权值;
    基于所述弹出框的各可观察特征值的权值筛选出所述弹出框的有效可观察特征值;
    基于所述弹出框的各有效可观察特征值的权值确定出识别函数。
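  17. The terminal according to claim 16, wherein, when performing the step of screening out the effective observable feature values of the pop-up box based on the weights of the observable feature values of the pop-up box, the processor specifically performs the following operations:
    comparing the weight of each observable feature value of the pop-up box with a set weight threshold, and selecting the observable feature values whose weights are greater than the set weight threshold as the effective observable feature values.
  18. The terminal according to claim 15 or 16, wherein, when performing the step of setting the identifier indicating whether the pop-up box is an advertisement pop-up box, the processor specifically performs the following operations:
    setting the identifier indicating whether the pop-up box is an advertisement pop-up box by means of marking and/or a clustering algorithm.
  19. The terminal according to claim 15 or 16, wherein the training is performed using an artificial neural network method;
    the recognition function is a step activation function.
  20. The terminal according to claim 15 or 16, wherein the observable feature values of the pop-up box comprise at least one of the following:
    the height of the pop-up box, in the web page layer direction, relative to the web page in which the pop-up box is located; the relative value of the starting coordinate position of the pop-up box with respect to the web page in which the pop-up box is located; the area ratio of the pop-up box to the terminal window; the correlation value between the URL corresponding to the pop-up box and the domain name of the web page in which the pop-up box is located; and the correlation value between the text presented by the pop-up box and the title content of the web page in which the pop-up box is located.
  21. The terminal according to claim 13, wherein, when performing the step of determining whether any pop-up box is an advertisement pop-up box according to the value of the recognition function corresponding to the pop-up box, the processor specifically performs the following operations:
    if the value of the recognition function corresponding to the pop-up box is greater than a set recognition threshold, determining that the pop-up box is an advertisement pop-up box; otherwise, determining that the pop-up box is not an advertisement pop-up box.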
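  22. A computer-readable storage medium storing a computer program, wherein, when the computer program is run, the method according to any one of claims 1 to 9 is performed.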
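
The training, weight screening, and threshold decision recited in the claims can be illustrated with a minimal sketch. It is not the applicant's implementation: it assumes a single-layer perceptron as the "artificial neural network method", treats the recognition function as a weighted sum of the effective features passed through a step comparison, and uses hypothetical names throughout (`train_weights`, `select_effective_features`, `make_recognition_function`, and feature keys such as `area_ratio` and `z_height`). The learning rate, epoch count, and thresholds are arbitrary illustrative values.

```python
from typing import Callable, Dict, List, Tuple

# One training sample: (observable feature values, label), where the label
# is 1 for an advertisement pop-up box and 0 otherwise. Labels would come
# from marking and/or clustering; here they are simply given.
Sample = Tuple[Dict[str, float], int]


def train_weights(
    samples: List[Sample],
    feature_names: List[str],
    epochs: int = 20,
    learning_rate: float = 0.1,
) -> Tuple[Dict[str, float], float]:
    """Perceptron-style training (an assumed choice of ANN method).

    Returns one weight per observable feature plus a bias term.
    """
    weights = {name: 0.0 for name in feature_names}
    bias = 0.0
    for _ in range(epochs):
        for features, label in samples:
            activation = bias + sum(weights[n] * features[n] for n in feature_names)
            predicted = 1 if activation > 0 else 0  # step activation
            error = label - predicted
            if error != 0:
                # Move each weight toward the correct side of the boundary.
                for n in feature_names:
                    weights[n] += learning_rate * error * features[n]
                bias += learning_rate * error
    return weights, bias


def select_effective_features(
    weights: Dict[str, float], weight_threshold: float
) -> Dict[str, float]:
    """Keep only features whose weight exceeds the set weight threshold."""
    return {name: w for name, w in weights.items() if w > weight_threshold}


def make_recognition_function(
    effective_weights: Dict[str, float],
    bias: float = 0.0,
    recognition_threshold: float = 0.0,
) -> Callable[[Dict[str, float]], bool]:
    """Build a step-style recognition function over the effective features."""

    def is_advertisement(features: Dict[str, float]) -> bool:
        score = bias + sum(w * features[n] for n, w in effective_weights.items())
        # A value above the recognition threshold means "advertisement pop-up box".
        return score > recognition_threshold

    return is_advertisement
```

In use, a browser would compute the feature values for each collected pop-up box, call the returned function, and hide the element when it returns `True`. Note that screening by "weight greater than threshold", as the claims word it, also discards features with large negative weights; whether that is intended is not stated in the claims.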
PCT/CN2017/107605 2017-03-21 2017-10-25 Browser advertisement intercepting method, device and terminal WO2018171189A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710168060.8 2017-03-21
CN201710168060.8A CN108628888A (zh) 2017-03-21 2017-03-21 Browser advertisement intercepting method, device and terminal

Publications (1)

Publication Number Publication Date
WO2018171189A1 true WO2018171189A1 (zh) 2018-09-27

Family

ID=63584041

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/107605 WO2018171189A1 (zh) 2017-03-21 2017-10-25 Browser advertisement intercepting method, device and terminal

Country Status (2)

Country Link
CN (1) CN108628888A (zh)
WO (1) WO2018171189A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111897606A (zh) * 2019-05-06 2020-11-06 北京奇虎科技有限公司 Method and device for processing a pop-up box

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
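CN102591983A (zh) * 2012-01-10 2012-07-18 凤凰在线(北京)信息技术有限公司 Advertisement filtering system and filtering method thereof
CN104778405A (zh) * 2015-03-11 2015-07-15 小米科技有限责任公司 Advertisement interception method and device
CN105516941A (zh) * 2014-10-13 2016-04-20 中兴通讯股份有限公司 Method and device for intercepting spam short messages
CN105653550A (zh) * 2014-11-14 2016-06-08 腾讯科技(深圳)有限公司 Web page filtering method and device
CN106033450A (zh) * 2015-03-17 2016-10-19 中兴通讯股份有限公司 Advertisement interception method, device and browser
KR20160142075A (ko) * 2015-06-02 2016-12-12 엘지전자 주식회사 Display device and method for blocking broadcast content thereof
CN106354836A (zh) * 2016-08-31 2017-01-25 南威软件股份有限公司 Method and device for predicting advertisement pages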

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9922131B2 (en) * 2013-11-06 2018-03-20 Hipmunk, Inc. Graphical user interface machine to present a window
CN104346457A (zh) * 2014-10-31 2015-02-11 北京奇虎科技有限公司 Method for intercepting business objects and browser client

Also Published As

Publication number Publication date
CN108628888A (zh) 2018-10-09

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 17902250; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 17902250; Country of ref document: EP; Kind code of ref document: A1)