CN110489636A - Web advertisement blocking method based on code analysis and image processing - Google Patents
- Publication number
- CN110489636A (application number CN201810485860.7A)
- Authority
- CN
- China
- Prior art keywords
- image
- advertising
- back end
- node
- sent
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Information Transfer Between Computers (AREA)
Abstract
The present invention is a web advertisement blocking method based on code analysis and image processing. A front-end script listens for the DOMContentLoaded event and obtains the DOM tree that triggered it. It then recursively traverses the DOM tree and identifies nodes that may contain an advertising logo. Rules keep normal web page images from being sent to the back end for judgment. Images that may contain an advertising logo are sent to the back end via AJAX. The back end listens for front-end requests, judges the requested image with an image text recognition model, and returns the result to the front-end script. Finally, the advertising area is located from the advertising logo and blocked.
Description
Technical field
The invention belongs to the field of computer technology, in particular software engineering. It blocks web advertisements using code analysis and image processing. This removes the high labor and time cost of maintaining blocking rule lists. It can also effectively reduce the false-positive and false-negative rates of web ad blocking.
Background art
With the rapid development and growing popularity of the Internet, web pages have become an important source of information. While providing users with useful information, web pages are also flooded with commercial advertisements. These advertisements may occupy system resources and interfere with the display of page content. They may also lure users into visiting harmful web pages. All of this degrades the user experience and in turn reduces user stickiness.
"Advertisements" here means display advertisements, i.e. advertisements that are loaded into the web page automatically, without any user click. Such advertisements are currently very popular and account for a high proportion of online ads. They are delivered over the Web and involve several roles: publishers, advertisers, advertising networks, ad exchanges and users.

Publishers are website owners. They mainly publish the site's normal content, but may also sell ad slots to advertisers and earn revenue from users' page views or clicks. Advertisers create the advertisements and are the source of online advertising revenue. In the delivery of an advertisement, the advertising network matches publishers with advertisers. Large advertising networks (such as the Google Display Network) provide a platform on which advertisers can choose publishers and target users. Advertising networks can also resell their ad slots through ad exchanges.

When a user browses a web page, the advertisement is shown to the user. Clicking it takes the user to the corresponding advertisement page.
Several web ad blocking tools are currently in wide use, for example Adblock Plus, Adguard, AdSafe and AdMon. Most of them block advertisements according to a specific rule list; Adblock Plus, for instance, relies on the EasyList list.

Adblock Plus works in two ways: network control and page processing.
- With network control, whenever the website issues an HTTP request, Adblock Plus checks whether the requested URL matches its rule list EasyList. If it does, the request is blocked and the advertisement is never loaded.
- With page processing, when an attribute value of a page element, such as its id or class, matches a rule in EasyList, the element is removed or hidden.

Such rule matching can block some advertisements, but it also produces false positives and false negatives. It has further drawbacks:
- Rule-list-based blocking must be maintained continually from user feedback, at great cost in time and labor.
- Filter-rule matching fails once web page randomization techniques are used (for example, randomly generated id or class values).
- Developers may unintentionally use values that appear in the filter list when defining element id or class attributes, so such methods can mistakenly block normal page content.
Other methods perform static program analysis on JavaScript source code to identify the scripts that load and display advertisements. Specifically, they extract features from JavaScript code and build a classifier. The trained classifier then decides whether a given JavaScript file is ad-related, and the ad-related scripts are blocked.

The advantage of this approach is that no filter list has to be maintained, which gives it considerable flexibility and scalability. However, JavaScript keeps changing along with web technologies. New features must therefore be extracted constantly, and suitable classifiers rebuilt, to keep the results good.
Still other methods extract features not from the loaded JavaScript but from Adblock Plus's filter list EasyList. They train a classifier on these features and use it to judge new content, which reduces manual intervention and cost. However, because such a classifier is trained from the rule list, its precision depends entirely on that list, so its effectiveness leaves room for improvement.
Article 7 of China's "Interim Measures for the Administration of Internet Advertising" states: "Internet advertisements shall be identifiable and clearly marked as 'advertisement', so that consumers can recognize them as advertisements." Building on this regulation and on the shortcomings of existing methods, the present invention combines web page analysis with image text recognition. It proposes a method that identifies an advertising area through the advertising logo the area contains.

When the DOMContentLoaded event is triggered, the method recursively traverses the DOM tree. It processes the parts of the tree that may carry an advertising logo: text nodes, img elements, nodes with a backgroundImage attribute, and nodes with specific attribute values. An advertising logo may be introduced as an image. In that case the image URL is sent to a back-end server, where an image classifier judges whether the image contains an advertising logo. Based on the classifier's result, the front end decides whether to block the region containing the image.
Summary of the invention
Building on existing work, the invention uses code analysis and image processing to recognize advertising logos in web page regions, and from them to identify web advertising areas. This addresses two problems. First, it sidesteps the web page randomization techniques that existing ad blocking methods cannot cope with. Second, it lowers the high false-positive and false-negative rates of existing methods. Together these keep web pages clean and safe.
The technical solution of the invention is as follows. A front-end script obtains the DOM tree when the DOMContentLoaded event is triggered. It recursively traverses the DOM tree to identify nodes that may be advertising logos. Rules keep normal web page images from being sent to the back end for judgment. The back end listens for front-end requests, judges the submitted image with an image text recognition model, and returns the result to the front-end script. Finally, the advertising area is identified from the advertising logo and blocked.
The invention specifically comprises the following steps:
1) the front-end script listens for the DOMContentLoaded event and obtains the DOM tree;
2) the DOM tree is traversed recursively to identify possible advertising logos;
3) rules keep normal web page images from being sent to the back end for judgment, and images that may contain an advertising logo are sent to the back end via AJAX;
4) the back end listens for front-end requests, judges the requested image with an image text recognition model, and returns the result to the front-end script;
5) the advertising area is identified from the advertising logo and blocked.
Step 1) listens for the DOMContentLoaded event. A web page consists mainly of HTML, CSS and JavaScript (JS):
- HTML provides the page's basic structure, so that its content can be displayed;
- CSS gives the page a good layout and makes it look better;
- JS responds to events triggered by the user, for example validating user input and prompting the user when the input does not meet requirements.

The first step of parsing a page is therefore converting it into a DOM tree. The DOMContentLoaded event fires once the initial HTML document has been completely loaded and parsed. The invention therefore obtains the page's DOM tree by listening for this event.
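The following is a minimal JavaScript sketch of step 1. It assumes the content script may run either while the HTML is still being parsed or after parsing has finished. The function name whenDomReady is hypothetical. The document is passed in as a parameter (normally the browser's document object) so the logic can also be exercised with a stand-in object.

```javascript
// Calls onReady with the root of the parsed DOM tree once it is available.
function whenDomReady(doc, onReady) {
  if (doc.readyState === "loading") {
    // HTML is still being parsed: wait for DOMContentLoaded (handled once).
    doc.addEventListener("DOMContentLoaded", () => onReady(doc.documentElement), { once: true });
  } else {
    // Parsing has already finished ("interactive" or "complete"), so the
    // tree can be used immediately.
    onReady(doc.documentElement);
  }
}

// In a content script: whenDomReady(document, root => { /* traverse root */ });
```

Checking readyState first matters when a script is injected late. Without the check, the script could wait forever for an event that has already fired.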
Step 2) identifies possible advertising logos by recursively traversing the DOM tree and handling several kinds of nodes:
(1) Text nodes: the value of such a node may contain the string "advertisement".
(2) img nodes, i.e. image element nodes, which can contain an advertising logo in two ways. In the first, the advertising logo is embedded in the advertising image as a watermark, so the image and the logo form a single whole inside one img element. The invention calls a logo of this form an img-element watermark advertising logo. In the second, the advertising logo exists as a separate small image (width 20-50 px, height 12-30 px) that is positioned in a corner of the advertising image with CSS. The invention calls a logo of this form an img-element small-image advertising logo.
(3) Nodes with a backgroundImage attribute: here too the advertising logo is introduced as an image. Unlike the img nodes above, however, the image is introduced as an element's background, by adding a backgroundImage property to the element's CSS style. It is then positioned in a corner of the advertising image with CSS. The invention calls a logo of this form an advertising logo of the backImg-attribute type (backImg is short for backgroundImage).
After obtaining the page's DOM tree, the invention traverses it recursively. For a text node, it first obtains the node's value and checks whether the value contains the string "advertisement". If it does, the node is examined further. This is necessary because some text nodes containing that string belong to the page's normal content, and blocking them would be a false positive.
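The text-node check can be sketched as follows. The keyword list is an assumption: the Chinese original presumably matches the label 广告 ("advertisement"), and the English word is included for illustration. A match only marks the node as a candidate for the further check described above. The function name is hypothetical.

```javascript
// Assumed keyword list: the Chinese label plus the English word.
const AD_KEYWORDS = ["广告", "advertisement"];

// True when a text node's value contains one of the keywords. A hit only makes
// the node a candidate, since normal page text can contain the same word.
function isCandidateTextNode(node) {
  if (node.nodeType !== 3) return false;        // 3 === Node.TEXT_NODE
  const value = (node.nodeValue || "").toLowerCase();
  return AD_KEYWORDS.some(k => value.includes(k.toLowerCase()));
}
```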
Step 3) applies rules that keep normal web page images from being sent to the back end for judgment, and sends images that may contain an advertising logo to the back end via AJAX.
A website may contain many img elements and backImg-attribute nodes. Sending all of their image URLs to the back end would severely hurt the performance of the method. The invention therefore uses rules to decide which image URLs are sent to the back-end server for judgment.
(1) Rules for img elements. An advertising logo introduced through an img element is either a watermark advertising logo or a small-image advertising logo, and the two cases are handled differently.
- A small-image advertising logo is judged from the img element's width and height values.
- A watermark advertising logo is judged by whether the region around it contains text nodes with descriptive text.
(2) Rules for nodes of the backImg-attribute type. A node of this type only ever appears on the page as a small advertising logo, so it only needs to be judged by its own width and height values.
After these rules are applied, images that may contain an advertising logo are sent to the back end via AJAX for judgment.
Step 4) the back end listens for front-end requests, judges the requested image with the image text recognition model, and returns the result to the front-end script. When the back end receives a front-end request, it first extracts the image URL from the request. It then downloads the image from that URL and judges it with the packaged image text recognition model. Within the model, an image that may carry a watermark advertising logo is first cut into its four corners: upper left, lower left, upper right and lower right. The four corner images are then binarized, features are extracted, and a classification model makes the judgment. Finally, the back end returns the model's result to the front-end script.
Step 5) identifies the advertising area from the advertising logo and blocks it. Finding an advertising logo on the page is not enough, because blocking only the logo would leave the rest of the advertising area that contains it. The advertising area containing the logo must therefore be located from the logo and then blocked.

For a watermark advertising logo, it suffices to find the block-level element containing the logo and block that element.

Small-image advertising logos and backImg-attribute logos are handled as follows:
- Once a logo is found, its block-level parent element is obtained.
- A dedicated function recursively checks whether that block-level element contains a specific advertising structure. If it does, the advertising area has been found and is blocked.
- Otherwise the search moves up to the next block-level parent element and repeats the check.
- The upward search covers at most two levels. If no specific advertising code structure is found after two levels, the search is abandoned.
By adopting the above technical solution, the invention has the following advantages:
1. Popular web ad blocking methods block advertisements with blocking rule lists, and these lists need constant user feedback to block web advertisements effectively. Starting from this practical situation, the invention provides a new method that blocks web advertisements according to the advertising logos in the page. It requires no blocking rule list to be maintained, which saves the time and labor cost of maintaining one.
2. Combining code analysis with image processing for web ad blocking effectively sidesteps web page randomization techniques. It is also unaffected by developers misusing element id and class values. In addition, the method recognizes image advertising logos with image text recognition. Once trained, the image text recognition model can be reused without continually retraining the classification model.
Description of the drawings
Fig. 1 is a structural diagram of the invention.
Fig. 2 is a flowchart of locating the advertising area from an advertising logo.
Specific embodiment
The invention combines code analysis and image processing to block advertisements in web pages, as follows:
- it listens for the DOMContentLoaded event to obtain the DOM tree that triggered it;
- it recursively traverses the DOM tree to find advertising logos that may exist in the page;
- it applies rules to keep normal web page images from being sent to the back end for judgment;
- it sends the URLs of image advertising-logo nodes to the back end via AJAX, where they are judged with the image text recognition model;
- it locates the advertising area from the advertising logo and blocks the advertisement.
The structure of the invention is shown in Fig. 1 and comprises the following five steps.
Step 1: the front-end script listens for the DOMContentLoaded event and obtains the DOM tree that triggered it.
The DOM tree is made up of nodes, such as the head node, the body node and p nodes. The order of the nodes determines the order in which they are displayed on the page. The root node has no parent and leaf nodes have no children; every other node has both a parent node and child nodes. The invention is mainly concerned with text nodes, img element nodes and nodes with a backgroundImage attribute.
Step 2: recursively traverse the DOM tree to identify possible advertising logos. During the traversal, the invention processes text nodes, img elements and nodes with a backgroundImage attribute. The pseudocode of the traversal algorithm is as follows:
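As an illustration only, and not the patent's own pseudocode, the traversal can be sketched in JavaScript as below. It sorts nodes into the three candidate groups named above. The keyword test and the size rules are left to later steps, and nodes with specific attribute values are not covered. The background-image lookup is passed in as a parameter; in a browser it could be el => getComputedStyle(el).backgroundImage, which reports "none" when no background image is set. The function name is hypothetical.

```javascript
// Hypothetical sketch of the recursive traversal. Nodes follow the DOM shape
// (nodeType, tagName, childNodes); getBackgroundImage is injected.
function collectCandidates(node, getBackgroundImage, out = { text: [], img: [], backImg: [] }) {
  if (node.nodeType === 3) {                    // Node.TEXT_NODE
    // Skip whitespace-only text nodes, which are common between tags.
    if ((node.nodeValue || "").trim() !== "") out.text.push(node);
  } else if (node.nodeType === 1) {             // Node.ELEMENT_NODE
    if (node.tagName === "IMG") {
      out.img.push(node);
    } else {
      const bg = getBackgroundImage(node);
      if (bg && bg !== "none") out.backImg.push(node);
    }
  }
  for (const child of node.childNodes || []) {
    collectCandidates(child, getBackgroundImage, out);
  }
  return out;
}

// In a page: collectCandidates(document.body, el => getComputedStyle(el).backgroundImage);
```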
Step 3: apply rules to keep normal web page images from being sent to the back end for judgment.
(1) Rules for img element nodes
Small-image advertising logos of the img element type: a manual review of websites whose advertising logo was a small image showed that the logo image's width lies between 20 and 50 px and its height between 12 and 30 px. If an image's width and height are both within these ranges, the image is sent directly to the back-end server for judgment. The invention abstracts this as rule 1.
An image outside this range is considered possibly a normal image displayed as web content, and is filtered with the following rules. Images smaller than this range are simply ignored, because most of them are icons. Used as an advertising logo, such an image would be too small to be legible, so this case can be ignored. The invention abstracts this as rule 2.
Images larger than this range are judged with the watermark advertising logo method below.

Processing of img-element watermark advertising logo nodes: an image that forms an advertising area usually stands alone, and its region contains no text node with descriptive text. The method therefore works as follows:
- The upper-level block element containing the img element is found first.
- That block region is traversed recursively to check whether it contains any text node value.
- If it does, the region is considered normal website content and the image is not sent to the back end. Otherwise the image is sent to the back end for judgment.

The invention abstracts this as rule 3.
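Rules 1 to 3 for img elements can be sketched as follows, under stated assumptions:
- The size bounds are treated as inclusive.
- "Smaller than the range" is read as "no larger than the window in either dimension".
- Anything larger in some dimension goes to rule 3.
- A block-level test is passed in as isBlock; in a page it might examine getComputedStyle(el).display.

All function names are hypothetical.

```javascript
// Size window for small-image advertising logos, as stated in the text.
const LOGO_W = [20, 50];
const LOGO_H = [12, 30];

// Rules 1 and 2 by rendered size; the "smaller" reading is an assumption.
function classifyImgBySize(width, height) {
  const inW = width >= LOGO_W[0] && width <= LOGO_W[1];
  const inH = height >= LOGO_H[0] && height <= LOGO_H[1];
  if (inW && inH) return "send";                                  // rule 1
  if (width <= LOGO_W[1] && height <= LOGO_H[1]) return "ignore"; // rule 2: icon-sized
  return "checkRegion";                                           // larger: apply rule 3
}

// Nearest block-level ancestor, or null if there is none.
function nearestBlockAncestor(node, isBlock) {
  let cur = node.parentNode;
  while (cur && !(cur.nodeType === 1 && isBlock(cur))) cur = cur.parentNode;
  return cur || null;
}

// True if the subtree contains a text node with non-blank content.
function hasDescriptiveText(node) {
  if (node.nodeType === 3) return (node.nodeValue || "").trim().length > 0;
  for (const child of node.childNodes || []) {
    if (hasDescriptiveText(child)) return true;
  }
  return false;
}

// Rule 3: a large image whose block also holds text is treated as normal
// content; a block without text is sent to the back end.
function shouldSendLargeImage(img, isBlock) {
  const block = nearestBlockAncestor(img, isBlock);
  if (block === null) return true; // assumption: no enclosing block, so send it
  return !hasDescriptiveText(block);
}
```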
(2) Rules for nodes of the backImg-attribute type
A node of the backImg-attribute type is only ever displayed on the page as a small advertising logo, placed by relative positioning in a corner of the actual advertising image. The element's width and height are therefore neither very large nor very small. The invention only considers backImg-attribute nodes whose width is between 20 and 50 px and whose height is between 12 and 30 px. When a node's width and height meet this condition, it is sent to the back-end server for judgment; all other cases are left unprocessed. The invention abstracts this as rule 4.
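Rule 4, together with pulling the image URL out of the computed backgroundImage value, might look like the sketch below. The box argument is assumed to be an object with width and height, such as the result of el.getBoundingClientRect(). The bounds are treated as inclusive, and the function names are hypothetical.

```javascript
// Rule 4: send a backImg-type node only when its rendered box lies in the
// 20-50 px by 12-30 px window (inclusive bounds assumed).
function shouldSendBackgroundImage(backgroundImage, box) {
  if (!backgroundImage || backgroundImage === "none") return false;
  const inW = box.width >= 20 && box.width <= 50;
  const inH = box.height >= 12 && box.height <= 30;
  return inW && inH;
}

// Extracts the first URL from a computed value such as 'url("https://x/y.png")',
// accepting single, double or no quotes; returns null if there is none.
function backgroundImageUrl(backgroundImage) {
  const m = /url\(\s*(['"]?)(.*?)\1\s*\)/.exec(backgroundImage || "");
  return m ? m[2] : null;
}
```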
After these rules are applied, the invention handles each image that may contain an advertising logo by creating an XMLHttpRequest object and sending the image URL to the back end via AJAX. The code that constructs the XMLHttpRequest object and sends the request via AJAX is as follows:
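A sketch of such a request follows, under stated assumptions. The text only shows that requests go to a local back end at 127.0.0.1, so the port (8000), path (/check), query parameter (url) and JSON response shape ({"hasAdLogo": true}) are all hypothetical. The XMLHttpRequest factory is a parameter so the function can be tested with a stand-in object.

```javascript
// Hypothetical back-end address; port, path and parameter name are assumptions.
const CHECK_ENDPOINT = "http://127.0.0.1:8000/check";

// Sends one candidate image URL to the back end and passes the verdict
// (true = advertising logo found) to onResult.
function sendImageForCheck(imageUrl, onResult, createXhr = () => new XMLHttpRequest()) {
  const xhr = createXhr();
  xhr.open("GET", CHECK_ENDPOINT + "?url=" + encodeURIComponent(imageUrl), true); // async
  xhr.onreadystatechange = () => {
    if (xhr.readyState !== 4) return;             // 4 === XMLHttpRequest.DONE
    if (xhr.status === 200) {
      onResult(JSON.parse(xhr.responseText).hasAdLogo === true);
    } else {
      onResult(false);                            // assumption: on error, do not block
    }
  };
  xhr.send();
}
```

Because the request goes from the page to a different origin, the back end would also need to allow cross-origin requests for the response to be readable.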
Step 4: the back end listens for front-end requests, judges the requested image with the image text recognition model, and returns the result to the front-end script. When the back-end server receives a request from the front-end script, it first extracts the image URL from the request URL and downloads the image. It then examines the image with the image text recognition model.
The invention relies on three properties of advertising logo characters:
- they are usually monochrome;
- the background color changes smoothly;
- the boundary between the characters and the background is very clear.

The image text recognition model works in four stages:
1. The RGB image is converted to a grayscale image.
2. The grayscale image is binarized using either information entropy or Canny edge detection.
3. Features are extracted from the binary image, either HOG features or features learned by a CNN.
4. A trained support vector machine (SVM) or multilayer perceptron (MLP) model classifies the image, which completes the text recognition.

Combining the options at each stage, three methods were proposed: information entropy + HOG + SVM, Canny + HOG + SVM, and Canny + CNN. In comparative experiments, Canny + HOG + SVM performed best, so it was chosen as the image text recognition method for web ad blocking. After the model has classified the image, the back end returns the result to the front end for subsequent processing.
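The first two stages can be sketched in JavaScript as below. Two assumptions apply. First, the text does not give the grayscale weights, so the ITU-R BT.601 luma weights are assumed. Second, binarization uses a Sobel gradient with a fixed threshold, a much simpler stand-in for the Canny detector named in the text, because it has no smoothing, non-maximum suppression or hysteresis. A real back end would more likely call a library implementation such as OpenCV's Canny. The function names and the threshold of 100 are hypothetical.

```javascript
// Converts an RGBA buffer (as in ImageData.data) to grayscale (BT.601 weights).
function toGray(rgba, width, height) {
  const gray = new Float64Array(width * height);
  for (let i = 0; i < width * height; i++) {
    gray[i] = 0.299 * rgba[4 * i] + 0.587 * rgba[4 * i + 1] + 0.114 * rgba[4 * i + 2];
  }
  return gray;
}

// Marks a pixel as edge (1) when its Sobel gradient magnitude exceeds the
// threshold; border pixels are left as 0.
function edgeBinarize(gray, width, height, threshold = 100) {
  const out = new Uint8Array(width * height);
  const at = (x, y) => gray[y * width + x];
  for (let y = 1; y < height - 1; y++) {
    for (let x = 1; x < width - 1; x++) {
      const gx = at(x + 1, y - 1) + 2 * at(x + 1, y) + at(x + 1, y + 1)
               - at(x - 1, y - 1) - 2 * at(x - 1, y) - at(x - 1, y + 1);
      const gy = at(x - 1, y + 1) + 2 * at(x, y + 1) + at(x + 1, y + 1)
               - at(x - 1, y - 1) - 2 * at(x, y - 1) - at(x + 1, y - 1);
      out[y * width + x] = Math.hypot(gx, gy) > threshold ? 1 : 0;
    }
  }
  return out;
}
```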
For an image that may carry a watermark advertising logo, the processing is as follows:
1. The image's upper-left, lower-left, upper-right and lower-right corners are cut out, and the four corner images are enlarged.
2. A sliding window is moved over each corner image.
3. Each window crop is binarized with the Canny operator, HOG features are extracted from it, and the SVM classification model judges it.
4. The judgment for each crop is added to a masks set, and whether the image contains an advertising logo is decided from the masks set.

The pseudocode for processing such images is as follows:
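A hedged sketch of the corner and sliding-window logic follows; it is not the patent's pseudocode. Several assumptions apply:
- The corner size of 50 × 30 px follows the worked example in this section.
- The enlargement step is omitted.
- The window size (40 × 20 px) and stride (5 px) are hypothetical.
- classifyWindow stands in for the Canny + HOG + SVM judgment.
- The text does not say how the masks set is combined, so the sketch reports a logo if any window is positive.

The image is assumed to be at least 50 × 30 px.

```javascript
// Copies a w x h rectangle starting at (x0, y0) from a row-major grayscale image.
function crop(gray, width, x0, y0, w, h) {
  const out = new Float64Array(w * h);
  for (let y = 0; y < h; y++) {
    for (let x = 0; x < w; x++) out[y * w + x] = gray[(y0 + y) * width + (x0 + x)];
  }
  return out;
}

// Cuts the four corners of the image, each cw x ch pixels.
function fourCorners(gray, width, height, cw = 50, ch = 30) {
  return [
    crop(gray, width, 0, 0, cw, ch),                    // upper left
    crop(gray, width, 0, height - ch, cw, ch),          // lower left
    crop(gray, width, width - cw, 0, cw, ch),           // upper right
    crop(gray, width, width - cw, height - ch, cw, ch), // lower right
  ];
}

// Slides a winW x winH window over one corner and records each verdict.
function cornerMasks(corner, cw, ch, winW, winH, stride, classifyWindow) {
  const masks = [];
  for (let y = 0; y + winH <= ch; y += stride) {
    for (let x = 0; x + winW <= cw; x += stride) {
      masks.push(classifyWindow(crop(corner, cw, x, y, winW, winH), winW, winH));
    }
  }
  return masks;
}

// Collects the masks of all four corners; any positive window counts as a logo.
function imageHasWatermarkLogo(gray, width, height, classifyWindow, opts = {}) {
  const { cw = 50, ch = 30, winW = 40, winH = 20, stride = 5 } = opts;
  const masks = fourCorners(gray, width, height, cw, ch)
    .flatMap(corner => cornerMasks(corner, cw, ch, winW, winH, stride, classifyWindow));
  return masks.some(Boolean);
}
```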
Step 5: locate the advertising area from the advertising logo and block the advertisement. Finding an advertising logo on the page is not enough, because blocking the logo alone would leave the rest of the advertising area that contains it. The invention therefore describes how to find the smallest advertising area containing the advertising logo, so that the whole advertisement is blocked.
In the first case, the advertising logo is part of the advertising image, i.e. it exists in the advertising area as a watermark. When a watermark advertising logo is recognized in an advertising image, the search first moves up to the block-level element containing the image. The content of that entire block-level element is then blocked.
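The watermark case can be sketched as follows. The function name and the injected isBlock test are hypothetical. The sketch removes the block with removeChild, which mirrors the worked example in this section, where the region found is a div with class "banner".

```javascript
// Climbs from the <img> to the nearest block-level element and removes it.
// Returns the removed element, or null when no removable block is found.
function blockWatermarkAd(img, isBlock) {
  let cur = img.parentNode;
  while (cur && !(cur.nodeType === 1 && isBlock(cur))) cur = cur.parentNode;
  if (!cur || !cur.parentNode) return null;
  cur.parentNode.removeChild(cur);
  return cur;
}
```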
In the second case, the advertising logo is not a watermark. Instead it is a small separate logo, positioned inside the advertising area with CSS. After the advertising logo is found, its parent node is obtained and passed to the isIncludeAdRegion() function. The function recursively traverses every node under that parent. It stops the traversal as soon as it encounters an iframe, embed or object node, or an img element whose parent is an a element. It then sets a flag to indicate that the smallest advertising area has been found. Otherwise the search continues upward until the preset maximum number of levels is reached.

iframe, embed and object elements, and a elements containing img elements, were chosen because they are the main elements used to bring in the body of an advertisement.
With the above method, the parent elements of the advertising logo are searched upward. The search stops when a qualifying advertising area is found or when the number of levels searched exceeds a limit (2 levels in practice). This method finds an advertising area that both contains the advertising logo and includes the advertisement's main content. The advertisement is therefore blocked effectively, with little risk of false positives. The overall flow is shown in Fig. 2.
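The upward search can be sketched as below. The name isIncludeAdRegion comes from the text, but this body is only an assumed reading of it. The helper findAdRegion and the injected isBlock test are hypothetical, and tag names are compared in upper case, as the DOM reports them for HTML elements.

```javascript
// True if the subtree contains an <iframe>, <embed> or <object>, or an <img>
// whose parent is an <a>: the ad-body elements named in the text.
function isIncludeAdRegion(node) {
  for (const child of node.childNodes || []) {
    if (child.nodeType !== 1) continue;
    const tag = child.tagName;
    if (tag === "IFRAME" || tag === "EMBED" || tag === "OBJECT") return true;
    if (tag === "IMG" && node.tagName === "A") return true;
    if (isIncludeAdRegion(child)) return true;
  }
  return false;
}

// Checks at most maxLevels block-level ancestors of a small logo and returns
// the first one that contains an ad body, or null (search abandoned).
function findAdRegion(logo, isBlock, maxLevels = 2) {
  let cur = logo.parentNode;
  for (let level = 0; level < maxLevels && cur; level++) {
    while (cur && !(cur.nodeType === 1 && isBlock(cur))) cur = cur.parentNode;
    if (!cur) break;
    if (isIncludeAdRegion(cur)) return cur;
    cur = cur.parentNode;
  }
  return null;
}
```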
The invention is illustrated below with a concrete example: an advertising area in a particular web page. The invention is not limited to this example. The operations performed on this web page are as follows:
1. The front-end script listens for the DOMContentLoaded event and obtains the DOM tree.
When the example page loads, the front-end script listens for the DOMContentLoaded event. Once the event fires, the script obtains the DOM tree that triggered it for use in the subsequent steps.
2. Recursively traverse the DOM tree to identify possible advertising logos.
The obtained DOM tree is traversed recursively, and different kinds of nodes are processed differently. In this example, the node processed during the traversal is an img element.
3. Apply rules to keep normal web page images from being sent to the back end for judgment, and send images that may contain an advertising logo to the back end via AJAX.
For the img element in this example, the invention uses window.getComputedStyle() to obtain its width and height, which are 310 × 130 px. Applying the relevant rules shows that the element is not a normal page image. An XMLHttpRequest object is therefore created and the image is sent to the back end via AJAX. The request link is http: // 127.0.0.1: //http://s1.cncnimg.cn/224134.jpg, i.e. a local back-end address followed by the image URL.
4. The back end listens for front-end requests, judges the requested image with the image text recognition model, and returns the result to the front-end script.
The back end responds to the request: it extracts the image URL http://s1.cncnimg.cn/224134.jpg from the request URL and downloads the image. Based on the image's width and height, the back end determines that the image may carry a watermarked advertising logo. The back-end image text recognition module therefore first crops four 50*30 regions from the image's top-left, bottom-left, top-right, and bottom-right corners. These four corner images then undergo a series of processing steps: conversion from RGB to grayscale, binarization of the grayscale image with the Canny edge detector, and extraction of HOG features from the binarized image. Finally, a trained SVM model classifies the four corner images, which recognizes the text in the image. In this example, the image text recognition module of the invention finds that the image contains an advertising logo, and the result is returned to the front end.
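The first stages of the back-end processing in step 4 can be sketched as below, under stated assumptions: the request carries the image URL appended to the back-end address, as in the example request; the downloaded image has already been decoded into an RGBA byte array (decoding is outside this sketch); and the RGB-to-grayscale conversion uses the ITU-R BT.601 luma weights, a common choice that the source does not specify. Canny edge detection, HOG extraction, and SVM classification are omitted, as they would normally come from an image-processing library. All function names are hypothetical.

```typescript
// Hypothetical helper: the example request appends the image URL to the
// back-end address, e.g. http://127.0.0.1://http://s1.cncnimg.cn/224134.jpg.
function extractImageUrl(requestUrl: string): string | null {
  const match = /^https?:\/\/.*?\/+(https?:\/\/.+)$/.exec(requestUrl);
  return match ? match[1] : null;
}

interface Rect {
  x: number;
  y: number;
  w: number;
  h: number;
}

// The four corner crops (50*30 by default, as in the example), clamped to the image size.
function cornerRects(imgW: number, imgH: number, cw = 50, ch = 30): Rect[] {
  const w = Math.min(cw, imgW);
  const h = Math.min(ch, imgH);
  return [
    { x: 0, y: 0, w, h }, // top-left
    { x: 0, y: imgH - h, w, h }, // bottom-left
    { x: imgW - w, y: 0, w, h }, // top-right
    { x: imgW - w, y: imgH - h, w, h }, // bottom-right
  ];
}

// Crop one rectangle from a row-major RGBA buffer (4 bytes per pixel) and
// convert it to grayscale with BT.601 weights (an assumption, not from the source).
function cropToGray(rgba: ArrayLike<number>, imgW: number, r: Rect): Float64Array {
  const gray = new Float64Array(r.w * r.h);
  for (let y = 0; y < r.h; y++) {
    for (let x = 0; x < r.w; x++) {
      const i = ((r.y + y) * imgW + (r.x + x)) * 4;
      gray[y * r.w + x] = 0.299 * rgba[i] + 0.587 * rgba[i + 1] + 0.114 * rgba[i + 2];
    }
  }
  return gray;
}

// The four grayscale corner images that later stages (Canny, HOG, SVM) would consume.
function grayCorners(rgba: ArrayLike<number>, imgW: number, imgH: number): Float64Array[] {
  return cornerRects(imgW, imgH).map((r) => cropToGray(rgba, imgW, r));
}
```

For the 310*130 example image, `cornerRects(310, 130)` places the four 50*30 crops at (0, 0), (0, 100), (260, 0) and (260, 100).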
5. Identify the advertising area from the advertising logo and block it.
After the back-end judgement, the front-end script receives the result provided by the back end. In this example, the front end receives a result stating that the image contains an advertising logo. The front-end script then applies the advertising-area search method to find the region containing this logo; here, the region found is the div element whose class attribute is "banner". Finally, the removeChild() function is used to block this advertising area.
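The upward search for the advertising area (detailed in claim 7) and the final `removeChild()` call can be sketched as follows. The `ElementLike` interface, `findAdRegion`, `blockRegion`, and the `isBlock` callback are hypothetical. The "particular advertising structural code" is modeled only as a class name matching `banner`, taken from this example, and when no match is found within two block-level levels the sketch treats the outermost block it reached as the advertising area, which is one reading of claim 7.

```typescript
// Minimal view of a DOM element; real HTML elements provide these members.
interface ElementLike {
  className: string;
  parentElement: ElementLike | null;
  removeChild(child: ElementLike): unknown;
}

// Walk up from the logo through at most `maxLevels` block-level ancestors.
// Returns the first block whose class matches `adPattern`; otherwise the
// outermost block reached (an interpretation of claim 7), or null if none.
function findAdRegion(
  logo: { parentElement: ElementLike | null },
  isBlock: (el: ElementLike) => boolean,
  adPattern = /\bbanner\b/, // hypothetical: only "banner" comes from the example
  maxLevels = 2,
): ElementLike | null {
  let levels = 0;
  let outermost: ElementLike | null = null;
  for (let el = logo.parentElement; el !== null && levels < maxLevels; el = el.parentElement) {
    if (!isBlock(el)) continue; // inline ancestors do not count as a level
    levels++;
    outermost = el;
    if (adPattern.test(el.className)) return el; // structural ad code found: stop here
  }
  return outermost;
}

// Block the region by detaching it from its parent with removeChild().
function blockRegion(region: ElementLike): boolean {
  const parent = region.parentElement;
  if (parent === null) return false;
  parent.removeChild(region);
  return true;
}
```

In a browser, `isBlock` could test `getComputedStyle(el).display === "block"` on the real element.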
In short, the present invention combines code analysis and image processing techniques to block advertisements in web pages. It proposes a new approach that blocks web advertisements according to their advertising logos, gives an algorithm for recursively traversing the DOM tree, proposes a series of rules for filtering out normal web page images, and designs an effective image text recognition model, finally achieving effective blocking of web advertisements. Compared with traditional web advertisement blocking methods, the present invention is strongly targeted, with more clearly and precisely defined targets, and the resulting blocking is more comprehensive and efficient.
Claims (7)
1. A web advertisement blocking method based on code analysis and image processing proposed by the invention, characterized in that a front-end script, by listening for the DOMContentLoaded event, obtains the DOM tree that triggered the event and then recursively traverses the DOM tree to identify possible advertising logos; when a node containing an image is encountered, rules are formulated to prevent normal web page images from being sent to the back end for judgement, and nodes that do not satisfy the filtering rules are sent to the back end for judgement using AJAX; the back end listens for and responds to front-end requests, judges the requested image using an image text recognition model, and returns the result to the front-end script; and according to the returned result, the front-end script locates the advertising area from the advertising logo and blocks the advertising area.
2. The web advertisement blocking method based on code analysis and image processing according to claim 1, characterized by comprising the following steps:
1) the front-end script listens for the DOMContentLoaded event and obtains the DOM tree;
2) the DOM tree is traversed recursively to identify possible advertising logos;
3) rules are formulated to prevent normal web page images from being sent to the back end for judgement, and images that may contain an advertising logo are sent to the back end for judgement using AJAX;
4) the back end listens for front-end requests, judges the requested image using the image text recognition model, and returns the result to the front-end script;
5) the advertising area is identified from the advertising logo and blocked.
3. The web advertisement blocking method based on code analysis and image processing according to claim 2, characterized in that in step 1) the front-end script, by listening for the DOMContentLoaded event, obtains the DOM tree that triggered the event so that the subsequent steps can process it further.
4. The web advertisement blocking method based on code analysis and image processing according to claim 2, characterized in that in step 2) the DOM tree is traversed recursively to identify possible advertising logos; this step analyzes the entire page and processes different node types differently: when a text node is encountered, whether the node is an advertising logo is judged by whether the node's value attribute contains the string "advertisement"; when an img element node or a node with a backgroundImage attribute (abbreviated backImg) is encountered, the node is sent to the back end, where the image text recognition model judges whether it is an advertising logo.
5. The web advertisement blocking method based on code analysis and image processing according to claim 2, characterized in that in step 3) corresponding rules prevent normal web page images from being sent to the back end for judgement, which reduces the server load; nodes containing abnormal images that must be judged by the back end are sent to it using AJAX by constructing an XMLHttpRequest object. The designed rules for filtering normal web page images are:
Rule 1: for an img element node, when the image width is between 20 and 50 px and the image height is between 12 and 30 px, the image is sent directly to the back-end server for judgement;
Rule 2: for an img element node, when the image width is less than 20 or the image height is less than 12, the node is ignored and not processed;
Rule 3: for an img element node, when the image width is greater than 50 or the image height is greater than 30, the enclosing block-level element of the img element is located first, and that block-level region is then traversed recursively to check whether it contains text node values; if it does, the region is considered normal site content and is not sent to the back end; otherwise it is sent to the back end for judgement;
Rule 4: for a node with a backImg attribute, when its backgroundImage value is not empty, its width is between 20 and 50 px, and its height is between 12 and 30 px, the node is sent to the back-end server for judgement.
6. The web advertisement blocking method based on code analysis and image processing according to claim 2, characterized in that in step 4) the back end listens for and responds to front-end requests: after downloading the image, it converts the image to grayscale, binarizes it with the Canny operator, extracts HOG features from the result, and finally classifies it with the SVM model, completing the image text recognition; the result of the image text recognition model is then returned to the front-end script.
7. The web advertisement blocking method based on code analysis and image processing according to claim 2, characterized in that in step 5), starting from the advertising logo, the enclosing block-level regions are searched upward for particular advertising structure code; if such code is found, the search stops and the advertising area is blocked; otherwise the search stops once it has gone up two block-level element layers. The advertising area found is blocked using the removeChild() function.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810485860.7A CN110489636A (en) | 2018-05-15 | 2018-05-15 | A kind of web advertisement screen method based on code analysis and image procossing |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110489636A true CN110489636A (en) | 2019-11-22 |
Family
ID=68545324
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810485860.7A Pending CN110489636A (en) | 2018-05-15 | 2018-05-15 | A kind of web advertisement screen method based on code analysis and image procossing |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110489636A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103605688A (en) * | 2013-11-01 | 2014-02-26 | 北京奇虎科技有限公司 | Intercept method and intercept device for homepage advertisements and browser |
CN106326316A (en) * | 2015-07-08 | 2017-01-11 | 腾讯科技(深圳)有限公司 | Web page advertisement filtering method and device |
CN107562864A (en) * | 2017-08-30 | 2018-01-09 | 努比亚技术有限公司 | A kind of advertisement screen method, mobile terminal and computer-readable recording medium |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111353112A (en) * | 2020-02-27 | 2020-06-30 | 百度在线网络技术(北京)有限公司 | Page processing method and device, electronic equipment and computer readable medium |
KR20210040449A (en) * | 2020-02-27 | 2021-04-13 | 바이두 온라인 네트웍 테크놀러지 (베이징) 캄파니 리미티드 | Page processing methods, devices, electronic devices, and computer-readable media |
EP3851981A4 (en) * | 2020-02-27 | 2021-12-29 | Baidu Online Network Technology (Beijing) Co., Ltd | Page processing method and apparatus, electronic device and computer readable medium |
JP2022512056A (en) * | 2020-02-27 | 2022-02-02 | バイドゥ オンライン ネットワーク テクノロジー(ペキン) カンパニー リミテッド | Page processing methods, devices, electronic devices and computer readable storage media |
JP7212771B2 (en) | 2020-02-27 | 2023-01-25 | バイドゥ オンライン ネットワーク テクノロジー(ペキン) カンパニー リミテッド | Page processing method, device, electronic device and computer readable storage medium |
KR102565950B1 (en) * | 2020-02-27 | 2023-08-10 | 바이두 온라인 네트웍 테크놀러지 (베이징) 캄파니 리미티드 | Page processing method, device, electronic device and computer readable medium |
CN116562270A (en) * | 2023-07-07 | 2023-08-08 | 天津亿科科技有限公司 | Natural language processing system supporting multi-mode input and method thereof |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20191122 |