CN110489636A - A web advertisement blocking method based on code analysis and image processing - Google Patents

A web advertisement blocking method based on code analysis and image processing

Info

Publication number
CN110489636A
Authority
CN
China
Prior art keywords
image
advertising
back end
node
sent
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810485860.7A
Other languages
Chinese (zh)
Inventor
许蕾
汪睿
李言辉
徐宝文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University
Priority to CN201810485860.7A
Publication of CN110489636A
Legal status: Pending

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention is a web advertisement blocking method based on code analysis and image processing. A front-end script listens for the DOMContentLoaded event and obtains the DOM tree of the page that triggered it. The script then recursively traverses the DOM tree to identify nodes that may contain an advertising logo. Filtering rules keep ordinary page images from being sent to the back end; only images that may contain an advertising logo are sent to the back end through AJAX. The back end listens for front-end requests, judges each requested image with an image text recognition model, and returns the result to the front-end script. Finally, the advertising area is located from the advertising logo and blocked.

Description

A web advertisement blocking method based on code analysis and image processing
Technical field
The invention belongs to the field of computer technology, in particular software engineering. It blocks web advertisements using a method based on code analysis and image processing. The method eliminates the high labor and time cost of maintaining a blocking rule list and can effectively reduce the false positive and false negative rates of web advertisement blocking.
Background art
With the rapid development and growing popularity of the Internet, web pages have become an important source of information. While providing useful information, web pages are also flooded with commercial advertisements. These advertisements may consume system resources, interfere with the display of page content, and lure users to harmful pages; they degrade the user experience and in turn reduce user stickiness.
The advertisements considered here are display advertisements, that is, advertisements loaded into the page automatically without the user clicking anything. Such advertisements are currently very common and account for a large share of online advertising. They are delivered over the Web architecture and involve publishers, advertisers, ad networks, ad exchanges, and users. Publishers are website owners; they mainly publish the site's regular content, but may also sell ad slots to advertisers and earn revenue based on page views or clicks. Advertisers create advertisements and are the source of revenue for online advertising. During delivery, ad networks match publishers with advertisers. Large ad networks (such as the Google Display Network) provide a platform on which advertisers can choose publishers and target users. Ad networks can also resell their ad slots through ad exchanges. When a user browses a page, the advertisement is displayed; when the user clicks it, the browser jumps to the corresponding advertisement page.
Several web advertisement blocking tools are in wide use, such as Adblock Plus, Adguard, AdSafe, and AdMon. Most of them block advertisements according to a rule list; Adblock Plus, for example, uses the EasyList list. Adblock Plus works in two ways: network control and page processing. Network control means that when a site issues an HTTP request, Adblock Plus checks whether the requested URL matches its rule list EasyList and, if so, blocks the request. Page processing means that when the id, class, or other attribute values of a page element match a rule in EasyList, the element is removed or hidden. This rule matching blocks some advertisements, but it also produces false positives and false negatives. In particular, a rule list must be maintained continuously based on user feedback, which costs a great deal of time and labor. Moreover, methods based on filter rule matching fail once web page randomization techniques are used. Finally, developers may unintentionally use values from the filter list when defining an element's id or class attribute, causing such methods to block normal page content by mistake.
Other methods perform static program analysis on JavaScript source code to identify the scripts that load and display advertisements. They extract features from JavaScript code to build a classifier, use the trained classifier to judge whether a given JavaScript file is advertisement-related, and block the advertisement-related scripts. The advantage is that no filter list has to be maintained, which gives considerable flexibility and scalability. However, because JavaScript code keeps changing as web technology evolves, new features must be extracted and suitable classifiers rebuilt continually to keep the results good.
Still other methods do not extract features from the loaded JavaScript; instead they extract features from EasyList, the filter list used by Adblock Plus, train a classifier on these features, and use it to judge new content, reducing manual intervention and cost. But the precision of a classifier trained from a rule list depends entirely on the original list, so its effectiveness leaves room for improvement.
Article 7 of China's Interim Measures for the Administration of Internet Advertising provides that "Internet advertisements shall be identifiable and conspicuously marked as 'advertisement' so that consumers can recognize them as advertisements." Based on this provision and on the shortcomings of existing methods, the invention combines web page analysis with image text recognition and proposes a method that identifies the advertising logo inside an advertising area. When the DOMContentLoaded event fires, the method recursively traverses the DOM tree and handles the nodes that may contain an advertising logo: text nodes, img element nodes, nodes with a backgroundImage attribute, and nodes with particular fixed attribute values. For advertising logos that may be introduced through an image, the invention sends the image URL to a back-end server, where an image classifier judges whether the image contains an advertising logo; the result determines whether the region containing the image is blocked.
Summary of the invention
Building on existing work, the invention addresses the following problem: using code analysis and image processing to identify the advertising logo in a page region and thereby identify the advertising area, so as to overcome both the blocking failures that web page randomization causes for existing methods and the high false positive and false negative rates of existing methods, keeping pages clean and safe.
The technical solution is as follows: the front-end script obtains the DOM tree of the page that triggered the DOMContentLoaded event; it recursively traverses the DOM tree to identify nodes that may be advertising logos; filtering rules keep ordinary page images from being sent to the back end; the back end listens for front-end requests, judges each image sent with an image text recognition model, and returns the result to the front-end script; the advertising area is identified from the advertising logo and blocked.
The invention comprises the following steps:
1) The front-end script listens for the DOMContentLoaded event and obtains the DOM tree;
2) The DOM tree is traversed recursively to identify possible advertising logos;
3) Filtering rules keep ordinary page images from being sent to the back end; images that may contain an advertising logo are sent to the back end through AJAX for judgment;
4) The back end listens for front-end requests, judges the requested image with the image text recognition model, and returns the result to the front-end script;
5) The advertising area is identified from the advertising logo and blocked.
Step 1) listens for the DOMContentLoaded event. A web page consists mainly of HTML, CSS, and JavaScript (JS). HTML forms the skeleton of the page and gives the content a basic presentation; CSS gives the page a good layout and makes it look better; JS responds to events triggered by the user, for example by checking user input and prompting when the input is invalid. The first step in analyzing a page is therefore to convert it into a DOM tree. The DOMContentLoaded event fires once the initial HTML document has been completely loaded and parsed, so the invention obtains the page's DOM tree by listening for this event.
Step 2) identifies possible advertising logos by recursively traversing the DOM tree and handling the following node types:
(1) Text nodes. The value of such a node may contain the string "advertisement".
(2) img nodes, that is, image element nodes. Such a node can contain an advertising logo in two ways. In the first, the logo is embedded in the advertising image as a watermark, so the image and the logo exist as a whole in one img element; the invention calls this an img-element watermark advertising logo. In the second, the logo is a separate small image (width 20-50px, height 12-30px) positioned over a corner of the original image with CSS; the invention calls this an img-element small-image advertising logo.
(3) Nodes with a backgroundImage attribute. Here the advertising logo is also introduced as an image, but unlike the img nodes above, the image is introduced by setting the element's background to an image, that is, by adding the backgroundImage attribute to the element's CSS style. The logo is then positioned over a corner of the original image with CSS. The invention calls a logo on this kind of node a backImg-attribute advertising logo (backImg abbreviates backgroundImage).
After obtaining the page's DOM tree, the invention traverses it recursively. On reaching a text node, it reads the node's value and checks whether it contains the string "advertisement"; if so, further checks follow, because some text nodes containing "advertisement" belong to the page's normal content, and blocking them would be a false positive.
Step 3) sets filtering rules so that ordinary page images are not sent to the back end; images that may contain an advertising logo are sent to the back end through AJAX.
A site can contain many img elements and backImg-attribute nodes. Sending all of their image URLs to the back end would severely hurt the method's performance, so the invention uses rules to decide which image URLs are sent to the back-end server.
(1) Rules for img element nodes. An advertising logo introduced by an img element is either a watermark logo or a small-image logo, and the two cases are handled differently. A small-image logo is judged by the width and height attribute values of the img element. A watermark logo is judged by whether the region containing the image also contains text nodes with descriptive text.
(2) Rules for backImg-attribute nodes. Such a node appears in the page only as a small advertising logo, so it is judged solely by its own width and height attribute values.
After these rules are applied, images that may contain an advertising logo are sent to the back end through AJAX.
Step 4) The back end listens for front-end requests, judges the requested image with the image text recognition model, and returns the result to the front-end script. On receiving a request, the back end extracts the image URL from it, downloads the image from that URL, and judges it with the packaged image text recognition model. For an image that may carry a watermark advertising logo, the model first crops the top-left, bottom-left, top-right, and bottom-right corners, then binarizes each crop, extracts features, and applies the classification model. The back end then returns the model's result to the front-end script.
Step 5) identifies the advertising area from the advertising logo and blocks it. Finding an advertising logo in the page is not enough: blocking only the logo leaves the rest of the advertising area in place. The advertising area containing the logo must be found and then blocked. For a watermark logo, it suffices to find the block-level element containing the watermarked image and block that element.
For a small-image logo or a backImg-attribute logo, the method obtains the logo's block-level parent and uses a dedicated function to check recursively whether that element contains a specific advertisement structure. If it does, the advertising area has been found and is blocked. Otherwise the method moves up to the next block-level parent and checks again. The upward search covers at most two levels; if no advertisement structure is found after two levels, the search is abandoned.
With the above technical solution, the invention has the following advantages:
1. Currently popular web advertisement blocking methods all rely on a blocking rule list, and the list needs constant user feedback to remain effective. The invention instead provides a new method that blocks web advertisements according to the advertising logos in the page. No rule list has to be maintained, which saves the time and labor of maintaining one.
2. Combining code analysis with image processing sidesteps web page randomization and is unaffected by developers' misuse of element id and class attribute values. In addition, the method identifies advertising logos in images through image text recognition, and a trained image text recognition model can be used as is without continual retraining.
Brief description of the drawings
Fig. 1 is a structural diagram of the invention.
Fig. 2 is a flowchart of locating the advertising area from an advertising logo.
Detailed description of the embodiments
The invention combines code analysis and image processing to block advertisements in web pages: it listens for the DOMContentLoaded event and obtains the DOM tree of the page that triggered it; recursively traverses the DOM tree to find advertising logos that may exist in the page; applies filtering rules so that ordinary page images are not sent to the back end; sends the URLs of candidate image logo nodes to the back end through AJAX, where the image text recognition model judges them; and locates the advertising area from the advertising logo in order to block the advertisement.
The structure of the invention is shown in Fig. 1 and comprises the following five steps.
Step 1: The front-end script listens for the DOMContentLoaded event and obtains the DOM tree of the page that triggered it.
The DOM tree is made up of nodes, such as the head node, the body node, and p nodes. The order of the nodes is determined by the order in which they appear in the page. Except that the root node has no parent and leaf nodes have no children, every node has a parent and child nodes. The invention is mainly concerned with text nodes, img element nodes, and nodes with a backgroundImage attribute.
Step 2: Recursively traverse the DOM tree to identify possible advertising logos. During traversal, the invention handles text nodes, img element nodes, and nodes with a backgroundImage attribute. The pseudocode of the traversal algorithm is as follows:
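The traversal listing itself does not appear in this text. As a stand-in, the following is a minimal Python sketch of such a traversal, not the patent's own pseudocode. The `Node` class is a hypothetical model of a DOM node rather than a browser API, and the label strings in `AD_STRINGS` are assumptions: the text only says that a text node's value is checked for the "advertisement" string.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Assumed label strings; the source only speaks of the "advertisement" string.
AD_STRINGS = ("广告", "advertisement")


@dataclass
class Node:
    """Hypothetical stand-in for a DOM node (not a browser API)."""
    kind: str                   # "text" or "element"
    tag: str = ""               # e.g. "img" or "div" for element nodes
    value: str = ""             # text content for text nodes
    src: str = ""               # image URL for img elements
    background_image: str = ""  # URL taken from the backgroundImage style, if any
    children: List["Node"] = field(default_factory=list)


def find_candidates(node: Node,
                    found: Optional[List[Tuple[str, Node]]] = None
                    ) -> List[Tuple[str, Node]]:
    """Recursively collect (category, node) pairs that may carry an advertising logo."""
    if found is None:
        found = []
    if node.kind == "text":
        # Text node: candidate only if its value contains a label string.
        if any(s in node.value for s in AD_STRINGS):
            found.append(("text", node))
    elif node.tag == "img":
        # img element: watermark or small-image logo, decided later by the rules.
        found.append(("img", node))
    elif node.background_image:
        # Element whose style sets backgroundImage ("backImg" type).
        found.append(("backImg", node))
    for child in node.children:
        find_candidates(child, found)
    return found
```

The function only collects candidates. Whether an image candidate is actually sent to the back end is decided by the filtering rules of step 3.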
Step 3: Set filtering rules so that ordinary page images are not sent to the back end.
(1) Rules for img element nodes
Small-image advertising logos of the img element type: an earlier manual review of websites in which the advertising logo is a small image showed that the image's width lies between 20 and 50px and its height between 12 and 30px. If an image's width and height fall within these ranges, the image is sent directly to the back-end server for judgment. The invention abstracts this as rule 1.
An image outside these ranges may be an ordinary image shown as site content and is filtered with the following rules. Images smaller than the range are ignored outright: most such images are icons, and an advertising logo that small would be hard to see. The invention abstracts this as rule 2.
Images larger than the range are judged with the method for watermark advertising logos. Watermark advertising logos of the img element type: an image that serves as an advertising area mostly stands alone, and its region contains no text node with descriptive text. The method therefore first finds the enclosing block-level element of the img element, then recursively traverses that block-level region looking for text node values. If any are found, the region is considered normal site content and the image is not sent to the back end; otherwise the image is sent to the back end for judgment. The invention abstracts this as rule 3.
(2) Rules for backImg-attribute nodes
A backImg-attribute node appears in the page only as a small advertising logo, positioned over a corner of the actual advertising image with relative positioning, so its width and height are neither very large nor very small. The invention considers only backImg-attribute nodes whose width is between 20 and 50px and whose height is between 12 and 30px. When a node's width and height meet this condition, the node's image is sent to the back-end server for judgment; otherwise it is not processed. The invention abstracts this as rule 4.
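Rules 1 to 4 can be sketched as two small decision functions. This is an illustrative Python sketch under stated assumptions, not the patent's implementation. The size ranges come from the text. The text does not say whether the bounds are inclusive or how an image is classified when only one dimension is out of range, so the sketch assumes inclusive bounds and treats an image as "smaller" when it fits within the upper bounds but is not in range, and as "larger" otherwise. The function names and the "send"/"ignore" return values are hypothetical.

```python
SMALL_WIDTH = (20, 50)   # px, width range of a small-image logo (from the text)
SMALL_HEIGHT = (12, 30)  # px, height range of a small-image logo (from the text)


def in_small_range(width: int, height: int) -> bool:
    """True if both dimensions fall inside the small-logo ranges (bounds assumed inclusive)."""
    return (SMALL_WIDTH[0] <= width <= SMALL_WIDTH[1]
            and SMALL_HEIGHT[0] <= height <= SMALL_HEIGHT[1])


def decide_img(width: int, height: int, region_has_text: bool) -> str:
    """Return "send" or "ignore" for an img element node (rules 1 to 3)."""
    if in_small_range(width, height):
        return "send"        # rule 1: possible small-image logo
    if width <= SMALL_WIDTH[1] and height <= SMALL_HEIGHT[1]:
        return "ignore"      # rule 2: smaller than the range, most likely an icon
    # Rule 3: larger image. Text in the enclosing block-level region suggests
    # normal content; a stand-alone image may carry a watermark logo.
    return "ignore" if region_has_text else "send"


def decide_back_img(width: int, height: int) -> str:
    """Return "send" or "ignore" for a backImg-attribute node (rule 4)."""
    return "send" if in_small_range(width, height) else "ignore"
```

For the 310*130 image of the worked example later in the text, `decide_img(310, 130, False)` falls under rule 3 and returns "send".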
After these rules are applied, for images that may contain an advertising logo the invention creates an XMLHttpRequest object and sends the image URL to the back end through AJAX. The code that constructs the XMLHttpRequest object and sends the request is as follows:
Step 4: The back end listens for front-end requests, judges the requested image with the image text recognition model, and returns the result to the front-end script. When the back-end server receives a request from the front-end script, it first extracts the image URL from the request URL and downloads the image. It then runs the image text recognition model on the image.
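The hand-off of the image URL between front end and back end can be modeled as follows. This is a minimal Python sketch, not the patent's code. In the browser the request would be issued with XMLHttpRequest, and the worked example later in the text suggests the image URL is appended to the back-end address. The sketch instead assumes the image URL travels as a percent-encoded query parameter; the endpoint `http://backend.example/check` and the parameter name `url` are hypothetical.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "http://backend.example/check"  # hypothetical back-end address


def build_request_url(image_url: str) -> str:
    """Front-end side: embed the image URL as an encoded query parameter."""
    return ENDPOINT + "?" + urlencode({"url": image_url})


def extract_image_url(request_url: str) -> Optional[str]:
    """Back-end side: recover the image URL from the request URL, or None if absent."""
    params = parse_qs(urlsplit(request_url).query)
    values = params.get("url")
    return values[0] if values else None
```

Encoding the image URL keeps its own `:`, `/`, and `?` characters from being confused with those of the request URL.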
The design relies on three properties of advertising logo characters: they are usually a single color, the background color varies smoothly, and the boundary between characters and background is sharp. In the image text recognition model, the RGB image is first converted to a grayscale image; the grayscale image is then binarized using either information entropy or Canny edge detection; features are extracted from the binary image, either HOG features or features learned by a CNN; finally a trained support vector machine (SVM) or multilayer perceptron (MLP) classifies the image, completing the text recognition. Combining the choices at each stage gives three candidate methods: information entropy + HOG features + SVM, Canny operator + HOG features + SVM, and Canny operator + CNN. Experiments showed that Canny operator + HOG features + SVM performs best, so it is used for recognizing text in images for web advertisement blocking. After the model has judged the image, the back end returns the result to the front end for further processing.
For an image that may carry a watermark advertising logo, the invention first crops the top-left, bottom-left, top-right, and bottom-right corners of the image and enlarges each crop. A sliding window is then moved across each corner crop, and each window is judged by Canny binarization, HOG feature extraction, and the SVM classification model. The result for each window is added to a result set (masks), and the set determines whether the image contains an advertising logo. The pseudocode for handling such images is as follows:
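The listing for this step is likewise absent from this text. The sketch below, in Python, illustrates only the geometry of the step: computing the four corner boxes and sliding a window over each. It is not the patent's pseudocode. The 50*30 crop size is taken from the worked example later in the text; the window size, the step, and the function names are assumptions. The enlargement of the crops is omitted, and the Canny + HOG + SVM judgment is represented by a caller-supplied `classify` function.

```python
from typing import Callable, Iterator, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom); right and bottom exclusive


def corner_boxes(width: int, height: int, crop_w: int = 50, crop_h: int = 30) -> List[Box]:
    """Boxes for the top-left, bottom-left, top-right, and bottom-right corners."""
    cw, ch = min(crop_w, width), min(crop_h, height)
    return [
        (0, 0, cw, ch),                            # top-left
        (0, height - ch, cw, height),              # bottom-left
        (width - cw, 0, width, ch),                # top-right
        (width - cw, height - ch, width, height),  # bottom-right
    ]


def sliding_windows(box: Box, win_w: int, win_h: int, step: int) -> Iterator[Box]:
    """Yield every window of the given size that fits inside the box."""
    left, top, right, bottom = box
    for y in range(top, bottom - win_h + 1, step):
        for x in range(left, right - win_w + 1, step):
            yield (x, y, x + win_w, y + win_h)


def image_has_logo(width: int, height: int, classify: Callable[[Box], bool],
                   win_w: int = 25, win_h: int = 15, step: int = 5) -> bool:
    """Collect per-window results in a set and report whether any window matched."""
    masks = set()
    for corner in corner_boxes(width, height):
        for window in sliding_windows(corner, win_w, win_h, step):
            masks.add(classify(window))  # stand-in for Canny + HOG + SVM
    return True in masks
```

Restricting the search to the four corners keeps the number of classifier calls small compared with scanning the whole image.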
Step 5: Locate the advertising area from the advertising logo and block the advertisement. Finding an advertising logo in the page is not enough, because blocking at that point removes only the logo, not the whole advertising area that contains it. The following describes how to find the smallest advertising area that contains the logo so that the advertisement is actually blocked.
Case 1: the advertising logo is part of the advertising image, that is, it appears in the advertising area as a watermark. Once a watermark logo is recognized in an advertising image, the method looks upward for the block-level element containing that image and blocks the entire content of that element.
Case 2: the advertising logo is not a watermark but a separate small logo positioned inside the advertising area with CSS. After finding the logo, the method obtains its parent node and passes the parent to a function named isIncludeAdRegion(). The function recursively traverses the nodes under the parent until it finds an iframe, embed, or object node, or an img element whose parent is an a element. It then stops and sets a flag indicating that the smallest advertising area has been found; otherwise the search ends when the configured number of upward levels is reached. The iframe, embed, and object elements and a elements containing an img element are chosen because these are the elements that introduce the main body of an advertisement.
In this way the method keeps moving up through the ancestors of the advertising logo until it finds a qualifying advertising area or the number of levels searched exceeds the limit (2 levels in this implementation). This finds an advertising area that contains both the logo and the body of the advertisement, so the advertisement is blocked effectively without causing false positives. The whole process is shown in Fig. 2.
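The upward search can be sketched as follows in Python. This is an illustration under stated assumptions, not the patent's implementation: `Element` is a hypothetical stand-in for a DOM element, the block-level check on parents is omitted, and the logo node itself is skipped during the downward search (an assumption, so that a logo wrapped in a link is not mistaken for the advertisement body). The function `is_include_ad_region` mirrors the role of isIncludeAdRegion() described above.

```python
from dataclasses import dataclass, field
from typing import List, Optional

AD_CONTAINERS = {"iframe", "embed", "object"}  # elements named in the text


@dataclass(eq=False)  # identity comparison; avoids recursing through parent links
class Element:
    """Hypothetical stand-in for a DOM element with a parent link."""
    tag: str
    children: List["Element"] = field(default_factory=list)
    parent: Optional["Element"] = field(default=None, repr=False)

    def add(self, child: "Element") -> "Element":
        child.parent = self
        self.children.append(child)
        return child


def is_include_ad_region(node: Element, skip: Optional[Element] = None) -> bool:
    """True if the subtree holds an iframe/embed/object, or an img directly inside an a."""
    for child in node.children:
        if child is skip:
            continue  # do not treat the logo itself as advertisement body
        if child.tag in AD_CONTAINERS:
            return True
        if child.tag == "img" and node.tag == "a":
            return True
        if is_include_ad_region(child, skip):
            return True
    return False


def find_ad_region(logo: Element, max_levels: int = 2) -> Optional[Element]:
    """Walk up at most max_levels ancestors and return the first advertising area found."""
    current = logo.parent
    for _ in range(max_levels):
        if current is None:
            return None
        if is_include_ad_region(current, skip=logo):
            return current
        current = current.parent
    return None
```

Limiting the climb to two levels keeps the method from blocking a large container, such as the page body, merely because an iframe exists somewhere inside it.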
The invention is illustrated below with the advertising area of a web page. The invention is not limited to this example. The operations on this page are as follows:
1. The front-end script listens for the DOMContentLoaded event and obtains the DOM tree.
When the example page loads, the front-end script listens for the DOMContentLoaded event. After the event fires, the script obtains the page's DOM tree for the subsequent steps.
2. The DOM tree is traversed recursively to identify possible advertising logos.
The DOM tree obtained is traversed recursively, and different nodes are handled differently. In this example, an img element node is handled during the traversal.
3. Filtering rules keep ordinary page images from being sent to the back end; images that may contain an advertising logo are sent to the back end through AJAX for judgment.
For the img element in this example, the invention uses window.getComputedStyle() to obtain the element's width and height, which are 310 and 130 (310*130). Applying the rules shows that the image is not an ordinary image of the page, so an XMLHttpRequest object is created and the image URL is sent to the back end through AJAX. The request link is http: // 127.0.0.1: //http://s1.cncnimg.cn/224134.jpg.
4. The back end listens for front-end requests, judges the requested image with the image text recognition model, and returns the result to the front-end script.
The back end responds to the request, extracts the image URL http://s1.cncnimg.cn/224134.jpg from the request URL, and downloads the image. The image's width and height indicate that it may carry a watermark advertising logo, so the image text recognition module first crops 50*30 regions from the top-left, bottom-left, top-right, and bottom-right corners. The four crops then go through a series of steps: conversion from RGB to grayscale, binarization of the grayscale image with Canny edge detection, and extraction of HOG features from the binary image. Finally the trained SVM model classifies the four corner crops, which recognizes the text in the image. The module finds that the image contains an advertising logo, and the result is returned to the front end.
5. The advertising area is identified from the advertising logo and blocked.
After the back end's judgment, the front-end script receives the result; in this example, the result is that the image contains an advertising logo. The front-end script then applies the advertising area search to find the region containing the logo, here the div element whose class attribute value is "banner", and finally removes this advertising area with removeChild().
In summary, the invention combines code analysis and image processing to block advertisements in web pages. It proposes a new approach of blocking advertisements according to advertising logos, gives an algorithm for recursively traversing the DOM tree, defines a set of rules for filtering out ordinary page images, and designs an effective image text recognition model, together achieving effective blocking of web advertisements. Compared with traditional web advertisement blocking methods, the invention is more targeted, its objective is more precisely defined, and its blocking is more comprehensive and efficient.

Claims (7)

1. A web advertisement shielding method based on code analysis and image processing, characterized in that: a front-end script listens for the DOMContentLoaded event and obtains the DOM tree that triggered the event; the DOM tree is then traversed recursively to identify possible advertising logos; when a node containing an image is encountered, rules are applied to avoid sending normal web page images to the back end for judgment, and nodes that are not filtered out by the rules are sent to the back end using AJAX; the back end listens for and responds to front-end requests, judges the requested image using an image text recognition model, and returns the result to the front-end script; according to the returned result, the front-end script locates the advertising area from the advertising logo and shields the advertising area.
2. The web advertisement shielding method based on code analysis and image processing according to claim 1, characterized by comprising the following steps:
1) the front-end script listens for the DOMContentLoaded event and obtains the DOM tree;
2) the DOM tree is traversed recursively to identify possible advertising logos;
3) rules are applied to avoid sending normal web page images to the back end for judgment, and images that may contain an advertising logo are sent to the back end for judgment using AJAX;
4) the back end listens for front-end requests, judges the requested image using the image text recognition model, and returns the result to the front-end script;
5) the advertising area is identified according to the advertising logo, and the advertising area is shielded.
3. The web advertisement shielding method based on code analysis and image processing according to claim 2, characterized in that in step 1) the front-end script listens for the DOMContentLoaded event and obtains the DOM tree that triggered the event, so that the subsequent steps can process the tree further.
4. The web advertisement shielding method based on code analysis and image processing according to claim 2, characterized in that in step 2) the DOM tree is traversed recursively to identify possible advertising logos; the traversal analyzes the whole page and handles different types of node differently, namely: when a text node is encountered, the node's value attribute is read and the node is judged to be an advertising logo if the value contains the "advertisement" string; when an img element node or a node with a backgroundImage attribute (abbreviated as backImg) is encountered, the node is sent to the back end, where the image text recognition model judges whether it is an advertising logo.
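The recursive traversal in this claim can be sketched as below. This is an illustrative model only: the `Node` class is a hypothetical stand-in for a browser DOM node, the function name is invented, and the keyword is passed as a parameter because the translated text only gives it as "advertisement".

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """Hypothetical stand-in for a DOM node (not a browser API)."""
    kind: str                      # "text" or "element"
    tag: str = ""                  # e.g. "img", "div" (elements only)
    value: str = ""                # text content (text nodes only)
    background_image: str = ""     # backgroundImage value, "" if unset
    children: List["Node"] = field(default_factory=list)


def traverse(root: Node, keyword: str = "advertisement"
             ) -> Tuple[List[Node], List[Node]]:
    """Walk the tree depth-first and split findings by node type.

    Returns (text_logos, image_candidates): text nodes whose value
    contains the keyword, and img / backgroundImage nodes that would
    go on to the filtering rules and the back end.
    """
    text_logos: List[Node] = []
    image_candidates: List[Node] = []

    def visit(node: Node) -> None:
        if node.kind == "text":
            if keyword in node.value:
                text_logos.append(node)
        elif node.tag == "img" or node.background_image:
            image_candidates.append(node)
        for child in node.children:
            visit(child)

    visit(root)
    return text_logos, image_candidates
```

Text nodes are decided locally, while image-bearing nodes are only collected here, since the claim leaves their judgment to the back-end model.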
5. The web advertisement shielding method based on code analysis and image processing according to claim 2, characterized in that in step 3) rules are applied to avoid sending normal web page images to the back end for judgment, which reduces the load on the server; nodes containing abnormal images, which do need back-end judgment, are sent to the back end using AJAX by constructing an XMLHttpRequest object; the rules designed for filtering normal web page images are:
Rule 1: for an img element node, when the image's width attribute value is between 20 and 50 px and its height attribute value is between 12 and 30 px, the image is sent directly to the back-end server for judgment;
Rule 2: for an img element node, when the image's width attribute value is less than 20 or its height attribute value is less than 12, the image is ignored and not processed;
Rule 3: for an img element node, when the image's width attribute value is greater than 50 or its height attribute value is greater than 30, the enclosing block-level element that contains the img element is found first, and that block-level region is then traversed recursively to check whether it contains a text node value; if it does, the region is regarded as normal content of the website and is not sent to the back end for judgment; otherwise it is sent to the back end for judgment;
Rule 4: for a node with a backImg attribute, when its backgroundImage attribute value is not empty, its width attribute value is between 20 and 50 px, and its height attribute value is between 12 and 30 px, the node is sent to the back-end server for judgment.
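The four rules can be written as a small decision function, sketched below. Several points are assumptions rather than statements from the claim: the range bounds are treated as inclusive, the rules are checked in the order 1, 2, 3 (so an image that is both narrower than 20 and taller than 30 is ignored), a backImg node that fails Rule 4 is not sent, and `block_has_text` is a hypothetical flag standing in for the recursive text search of Rule 3.

```python
SEND = "send"      # forward to the back end for judgment
IGNORE = "ignore"  # treat as a normal page image, do not send


def in_logo_range(width: int, height: int) -> bool:
    """Size window shared by Rules 1 and 4 (bounds assumed inclusive)."""
    return 20 <= width <= 50 and 12 <= height <= 30


def decide_img(width: int, height: int, block_has_text: bool) -> str:
    """Rules 1-3 for img element nodes.

    block_has_text: whether the enclosing block-level element contains
    a text node value (only consulted for Rule 3).
    """
    if in_logo_range(width, height):          # Rule 1
        return SEND
    if width < 20 or height < 12:             # Rule 2
        return IGNORE
    # Rule 3: reaching here means width > 50 or height > 30.
    return IGNORE if block_has_text else SEND


def decide_background(background_image: str, width: int, height: int) -> str:
    """Rule 4 for nodes with a backgroundImage attribute.

    The claim does not say what happens when the rule does not match;
    this sketch assumes such nodes are not sent.
    """
    if background_image and in_logo_range(width, height):
        return SEND
    return IGNORE
```

For example, `decide_img(40, 20, False)` falls under Rule 1 and yields `"send"`, while a 300 by 200 image inside a block that also contains text yields `"ignore"` under Rule 3.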
6. The web advertisement shielding method based on code analysis and image processing according to claim 2, characterized in that in step 4) the back end listens for and responds to front-end requests; after the image is downloaded it is converted to a grayscale image, binarized using the Canny operator, and HOG features are extracted from the result; an SVM classification model then judges the features, which completes the image text recognition, and the result of the image text recognition model is returned to the front-end script.
7. The web advertisement shielding method based on code analysis and image processing according to claim 2, characterized in that in step 5), starting from the advertising logo, the enclosing block-level regions are searched upward to check whether they contain particular advertising structure code; if one does, the search stops and the advertising area is shielded; otherwise the search stops once two layers of block-level elements have been searched upward; an advertising area that is found is shielded using the removeChild() function.
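The upward search in this claim can be sketched as follows. The `Element` class is a hypothetical stand-in for a DOM element, and `remove_child` only mirrors the role that removeChild() plays in the browser. The set of block-level tags is an assumption, and the "particular advertising structure code" is modeled as a class-name hint, with "banner" taken from the worked example in the description. The claim does not say what is shielded when nothing matches within two layers, so the sketch returns None in that case.

```python
from dataclasses import dataclass, field
from typing import List, Optional

BLOCK_TAGS = {"div", "p", "ul", "ol", "li", "table", "section"}  # assumed set
AD_CLASS_HINTS = ("banner",)  # "banner" comes from the description's example


@dataclass(eq=False)  # identity comparison, so parent links cause no recursion
class Element:
    """Hypothetical stand-in for a DOM element."""
    tag: str
    class_name: str = ""
    parent: Optional["Element"] = field(default=None, repr=False)
    children: List["Element"] = field(default_factory=list)

    def append_child(self, child: "Element") -> "Element":
        child.parent = self
        self.children.append(child)
        return child

    def remove_child(self, child: "Element") -> "Element":
        self.children.remove(child)
        child.parent = None
        return child


def find_ad_region(logo: Element, max_layers: int = 2) -> Optional[Element]:
    """Search upward from the logo through at most max_layers block-level
    ancestors and return the first one that looks like an ad container."""
    layers = 0
    node = logo.parent
    while node is not None and layers < max_layers:
        if node.tag in BLOCK_TAGS:
            layers += 1
            if any(hint in node.class_name for hint in AD_CLASS_HINTS):
                return node
        node = node.parent
    return None


def shield(region: Element) -> bool:
    """Detach the region from its parent; False if it has no parent."""
    if region.parent is None:
        return False
    region.parent.remove_child(region)
    return True
```

Non-block ancestors such as an `a` element are skipped without counting toward the two-layer limit, which is one reading of "block-level element layers" in the claim.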

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810485860.7A CN110489636A (en) 2018-05-15 2018-05-15 A kind of web advertisement screen method based on code analysis and image procossing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810485860.7A CN110489636A (en) 2018-05-15 2018-05-15 A kind of web advertisement screen method based on code analysis and image procossing

Publications (1)

Publication Number Publication Date
CN110489636A true CN110489636A (en) 2019-11-22

Family

ID=68545324

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810485860.7A Pending CN110489636A (en) 2018-05-15 2018-05-15 A kind of web advertisement screen method based on code analysis and image procossing

Country Status (1)

Country Link
CN (1) CN110489636A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103605688A (en) * 2013-11-01 2014-02-26 北京奇虎科技有限公司 Intercept method and intercept device for homepage advertisements and browser
CN106326316A (en) * 2015-07-08 2017-01-11 腾讯科技(深圳)有限公司 Web page advertisement filtering method and device
CN107562864A (en) * 2017-08-30 2018-01-09 努比亚技术有限公司 A kind of advertisement screen method, mobile terminal and computer-readable recording medium

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111353112A (en) * 2020-02-27 2020-06-30 百度在线网络技术(北京)有限公司 Page processing method and device, electronic equipment and computer readable medium
KR20210040449A (en) * 2020-02-27 2021-04-13 바이두 온라인 네트웍 테크놀러지 (베이징) 캄파니 리미티드 Page processing methods, devices, electronic devices, and computer-readable media
EP3851981A4 (en) * 2020-02-27 2021-12-29 Baidu Online Network Technology (Beijing) Co., Ltd Page processing method and apparatus, electronic device and computer readable medium
JP2022512056A (en) * 2020-02-27 2022-02-02 バイドゥ オンライン ネットワーク テクノロジー(ペキン) カンパニー リミテッド Page processing methods, devices, electronic devices and computer readable storage media
JP7212771B2 (en) 2020-02-27 2023-01-25 バイドゥ オンライン ネットワーク テクノロジー(ペキン) カンパニー リミテッド Page processing method, device, electronic device and computer readable storage medium
KR102565950B1 (en) * 2020-02-27 2023-08-10 바이두 온라인 네트웍 테크놀러지 (베이징) 캄파니 리미티드 Page processing method, device, electronic device and computer readable medium
CN116562270A (en) * 2023-07-07 2023-08-08 天津亿科科技有限公司 Natural language processing system supporting multi-mode input and method thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20191122
