The present application is a divisional application of Chinese invention patent application No. 201110457654.3, filed on December 30, 2011 and entitled "Method and device for detecting page tampering".
Summary of the Invention
The application provides a method for detecting page tampering that improves the efficiency and accuracy of detection while keeping manual intervention to a minimum, especially when a large number of pages must be detected and a large amount of black chain feature data must be matched.
The application also provides a device for detecting page tampering, to ensure that the above method can be applied and implemented in practice.
To solve the above problems, the application discloses a method for detecting page tampering, including:
generating a black chain feature database and deploying the black chain feature database on multiple servers, the black chain feature database including black chain feature data;
obtaining feature information of the page currently under detection;
determining a corresponding target server according to the feature information of the page;
matching the black chain feature database on the target server against the page currently under detection, and judging whether the page contains black chain feature data from the black chain feature database; if so, judging the current page to be a tampered page.
Preferably, each server has a server identifier, the feature information includes page classification information, and the step of determining the corresponding target server according to the feature information of the page includes:
extracting the server identifier corresponding to the classification information of the current page, according to a preset correspondence between page classification information and server identifiers;
determining the server corresponding to that server identifier to be the target server.
Preferably, the feature information includes the URL of the page, each server has a numerical identifier, and the step of determining the corresponding target server according to the feature information of the page includes:
converting the URL of the page currently under detection into a numerical value using a preset algorithm;
determining the server whose numerical identifier corresponds to that numerical value to be the target server.
Preferably, the page classification information includes content classification information of the page, type classification information of the page, and attribute classification information of the page.
Preferably, the step of generating the black chain feature database includes:
searching, using existing black chain feature data, for pages that contain the black chain feature data, and taking those pages as feature pages;
analyzing the layout of the black chain feature data in a feature page and, when the layout is found to be abnormal, extracting from that feature page the page element containing the black chain feature data;
generating a black chain rule according to the page element, matching the black chain rule against other feature pages, and extracting new black chain feature data from the feature pages that match;
saving the black chain feature data to form the black chain feature database.
Preferably, the black chain feature data includes tampering keywords and black chain URLs.
Preferably, the step of analyzing the layout of the black chain feature data in the feature page includes:
judging whether the position of the page element containing the black chain feature data falls within a preset threshold range and, if so, judging the layout of the black chain feature data in the feature page to be abnormal;
and/or
judging whether an attribute of the page element containing the black chain feature data is an invisible attribute and, if so, judging the layout of the black chain feature data in the feature page to be abnormal;
and/or
judging whether an attribute of the page element containing the black chain feature data is an attribute that hides the element from the browser and, if so, judging the layout of the black chain feature data in the feature page to be abnormal.
Preferably, the step of generating a black chain rule according to the page element is:
abstracting a regular expression, as the black chain rule, from the page element containing the tampering keyword and/or black chain URL.
Preferably, the method further includes:
updating the black chain feature database at a preset time interval.
The application also discloses a device for detecting page tampering, including:
a database generation module, for generating a black chain feature database, the black chain feature database including black chain feature data;
a database deployment module, for deploying the black chain feature database on multiple servers;
a feature information acquisition module, for obtaining feature information of the page currently under detection;
a target server determining module, for determining a corresponding target server according to the feature information of the page;
a tampering detection module, for matching the black chain feature database on the target server against the page currently under detection, and judging whether the page contains black chain feature data from the black chain feature database; if so, judging the current page to be a tampered page.
Preferably, each server has a server identifier, the feature information includes page classification information, and the target server determining module includes:
an identifier extraction submodule, for extracting the server identifier corresponding to the classification information of the current page, according to a preset correspondence between page classification information and server identifiers;
an identifier locating submodule, for determining the server corresponding to that server identifier to be the target server.
Preferably, the feature information includes the URL of the page, each server has a numerical identifier, and the target server determining module includes:
a URL conversion submodule, for converting the URL of the page currently under detection into a numerical value using a preset algorithm;
an identifier mapping submodule, for determining the server whose numerical identifier corresponds to that numerical value to be the target server.
Preferably, the database generation module includes:
a feature page search submodule, for searching, using existing black chain feature data, for pages that contain the black chain feature data, and taking those pages as feature pages;
a layout analysis submodule, for analyzing the layout of the black chain feature data in a feature page;
a page element extraction submodule, for extracting from the feature page, when the layout is found to be abnormal, the page element containing the black chain feature data;
a black chain rule generation submodule, for generating a black chain rule according to the page element;
a black chain feature data extraction submodule, for matching the black chain rule against other feature pages, extracting new black chain feature data from the feature pages that match, and saving the black chain feature data to form the black chain feature database.
Preferably, the layout analysis submodule further includes:
a first judging unit, for judging whether the position of the page element containing the black chain feature data falls within a preset threshold range and, if so, judging the layout of the black chain feature data in the feature page to be abnormal;
and/or
a second judging unit, for judging whether an attribute of the page element containing the black chain feature data is an invisible attribute and, if so, judging the layout of the black chain feature data in the feature page to be abnormal;
and/or
a third judging unit, for judging whether an attribute of the page element containing the black chain feature data is an attribute that hides the element from the browser and, if so, judging the layout of the black chain feature data in the feature page to be abnormal.
Preferably, the black chain feature data includes tampering keywords and black chain URLs, and the black chain rule generation submodule includes:
a regular expression extraction unit, for abstracting a regular expression, as the black chain rule, from the page element containing the tampering keyword and/or black chain URL.
Preferably, the device further includes:
a database update module, for updating the black chain feature database at a preset time interval.
Compared with the prior art, the application has the following advantages:
By deploying the generated black chain feature database on multiple servers, the application spreads the processing load that would otherwise fall on a single server or client. When multiple concurrent page tampering detection requests are received, the server that handles each detection is determined according to the feature information of the requested page, and that server carries out the actual tampering detection. The efficiency and accuracy of page tampering detection are thereby effectively improved when a large number of pages must be detected and a large amount of black chain feature data must be matched.
Furthermore, the application judges, according to the black chain feature database, whether the page currently under detection contains black chain feature data, and determines a page containing black chain feature data to be a tampered page. In the embodiments of the application, the black chain features in the black chain feature database need not all be collected manually; they can be collected automatically as follows. Starting from known black chain feature data and using search engine technology, a web crawler captures pages containing that black chain feature data as feature pages. The layout of the black chain feature data in these feature pages is analyzed and, if the layout is abnormal, the page element containing the black chain feature data is extracted from the abnormal feature page. A general regular expression is formed from it as a black chain rule, the black chain rule is matched against other feature pages, and new black chain feature data is extracted from the feature pages that match. Collecting black chain feature data in this way requires no manual intervention and is very fast, and the collected black chain feature data is highly accurate, so that using it in page tampering detection effectively improves the efficiency and accuracy of detection.
In addition, in the embodiments of the application, a web crawler combined with search engine technology captures pages containing given black chain feature data; the layout of each such page is then analyzed to judge whether the page has been tampered with; the page element containing the black chain feature data is extracted from the tampered page; and a general regular expression is finally formed as a black chain rule. The application requires no additional system and no manual intervention. Matching regular expressions against pages as black chain rules, in order to extract more black chain feature data and train more black chain rules, is well suited to the current industrialization of black chains: it not only reduces cost but also finds more tampered pages faster, effectively improving the efficiency of page tampering detection. Moreover, because the implementation is based on web crawler technology and the isolation sandbox technology of a browser kernel, the security, credibility and accuracy of page tampering detection are also effectively guaranteed.
Detailed Description
To make the above objects, features and advantages of the application clearer and easier to understand, the application is described in further detail below with reference to the accompanying drawings and specific embodiments.
Black chains are also called "network psoriasis". As is well known, search engines rank websites: a website that the search engine regards as good is placed higher in the search results and, correspondingly, receives a higher click-through rate. A search engine measures the quality of a website by many indicators, one of the most important being the website's external links. If a website has good external links, its ranking in the search engine improves accordingly.
For example, a newly launched website initially ranks very low in a search engine. If a high-weight website (one with a good ranking and high quality) then links to this new website, the search engine concludes that, since the new website can obtain a link from a website of such weight, its own weight cannot be low, and so its ranking rises. If several high-weight websites all link to it, its ranking rises very quickly.
Conversely, a newly launched website with no background and no relationships will not have a high weight, so the search engine will not rank it highly and it will appear far down in the search results. To exploit this characteristic of search engines, some tools now provide black chain technology: high-weight websites are intruded upon and, after a successful intrusion, links to the promoted website are inserted into pages of the intruded website, which achieves the effect of a link. The inserted links are hidden, so that a person viewing the pages of the intruded website sees no link at all.
However, a considerable share of the websites that currently use black chain technology to raise their search ranking are dangerous websites, such as game private-server websites, account-stealing Trojan websites, phishing websites and advertising websites. Search engines would not normally rank such websites highly, but with black chains their ranking can become very high. In that case users of the search engine are very likely to click on and open these websites, and a user without security protection can easily be infected by viruses on them.
Having recognized the seriousness of this problem, the inventors propose the following as one of the core ideas of the embodiments of the application. The generated black chain feature database is deployed on multiple servers to spread the processing load that would otherwise fall on a single server or client. When multiple concurrent page tampering detection requests are received, the server that handles each detection is determined according to the feature information of the requested page, and that server carries out the actual tampering detection, so that the efficiency and accuracy of page tampering detection are effectively improved when a large number of pages must be detected and a large amount of black chain feature data must be matched. In addition, in the embodiments of the application, the black chain features in the black chain feature database need not all be collected manually; they can be collected automatically as follows. Starting from known black chain feature data and using search engine technology, a web crawler captures pages containing that black chain feature data as feature pages. The layout of the black chain feature data in these feature pages is analyzed and, if the layout is abnormal, the page element containing the black chain feature data is extracted from the abnormal feature page. A general regular expression is formed from it as a black chain rule, the black chain rule is matched against other feature pages, and new black chain feature data is extracted from the feature pages that match. Collecting black chain feature data in this way requires no manual intervention and is very fast, and the collected black chain feature data is highly accurate, so that using it in page tampering detection effectively improves the efficiency and accuracy of detection.
Referring to Fig. 1, a flow chart of the steps of an embodiment of a method for detecting page tampering according to the application is shown. The method may specifically include the following steps:
Step 11, generating a black chain feature database and deploying the black chain feature database on multiple servers, the black chain feature database including black chain feature data;
In a specific implementation, the black chain feature data can include tampering keywords and black chain URLs, for example the tampering keyword "legend private server release" and the black chain URL "http://www.45u.com".
In a preferred embodiment of the application, the black chain feature database can be generated by the following sub-steps:
Sub-step 111, searching, using existing black chain feature data, for pages that contain the black chain feature data, and taking those pages as feature pages;
Sub-step 112, analyzing the layout of the black chain feature data in a feature page and, when the layout is found to be abnormal, extracting from that feature page the page element containing the black chain feature data;
Sub-step 113, generating a black chain rule according to the page element, matching the black chain rule against other feature pages, and extracting new black chain feature data from the feature pages that match;
Sub-step 114, saving the black chain feature data to form the black chain feature database.
In a specific implementation, the existing black chain feature data can include tampering keywords and black chain URLs. Based on the existing black chain feature data, a web crawler captures pages that contain the black chain feature data, and these pages are taken as feature pages.
As is well known, a search engine automatically fetches web pages from the World Wide Web by means of a web crawler. A web crawler, also called a Web Spider, finds web pages through the link addresses they contain. Starting from some page of a website (usually the home page), it reads the content of the page, finds the other link addresses in it, and follows these link addresses to the next pages, repeating the cycle until all pages of the website have been captured. If the whole Internet is regarded as one website, a Web Spider can capture all web pages on the Internet by this principle.
Current web crawlers can be divided into general crawlers and focused crawlers. A general crawler is based on breadth-first search: it starts from the URLs (Uniform Resource Locators) of one or several initial pages, obtains the URLs on those pages and, while capturing pages, keeps extracting new URLs from the current page and putting them into a queue until some stop condition of the system is met. A focused crawler is a program that downloads web pages automatically in order to capture targeted, related page resources. According to a set capture target, it selectively visits pages and related links on the World Wide Web to obtain the required information. Unlike a general crawler, a focused crawler does not pursue broad coverage; its goal is to capture pages related to a particular topic, preparing data resources for topic-oriented user queries.
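The breadth-first strategy of a general crawler described above can be sketched as follows. This is a minimal illustration only: the in-memory `LINKS` table, the `example.test` URLs and the `max_pages` stop condition are assumptions made for the example, and a real crawler would download and parse pages instead of looking them up in a dictionary.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> URLs linked from that page.
LINKS = {
    "http://example.test/": ["http://example.test/a", "http://example.test/b"],
    "http://example.test/a": ["http://example.test/b", "http://example.test/c"],
    "http://example.test/b": ["http://example.test/"],
    "http://example.test/c": [],
}


def crawl_bfs(seeds, get_links, max_pages=100):
    """Visit pages breadth-first from the seed URLs.

    New URLs found on the current page are appended to a queue; crawling
    stops when the queue is empty or max_pages pages have been visited
    (the assumed stop condition).
    """
    queue = deque(seeds)
    seen = set(seeds)
    order = []
    while queue and len(order) < max_pages:
        url = queue.popleft()
        order.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order


visited = crawl_bfs(["http://example.test/"], lambda u: LINKS.get(u, []))
```

A focused crawler would differ mainly in `get_links`: instead of returning every link, it would keep only links judged relevant to the capture target, for instance pages containing known black chain feature data.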
In existing black chain techniques, links are hidden using certain fixed tricks. For example, because search engines do not recognize javascript well, a hidden div is output through javascript. In this way a person cannot see the links directly on the page, while the search engine still treats them as valid. The code works as follows: the opening part of the div is first written by javascript, with display set to none; a table containing the black chains to be planted is then output; finally, the closing part of the div is output by javascript.
Using the isolation sandbox technology of a browser kernel, tampered pages can be discovered quickly and efficiently. Specifically, the isolation sandbox technology builds a safe virtual execution environment for a browser kernel such as IE or firefox. Any disk write made by the user through the browser is redirected to a specific temporary folder. Thus, even if a web page contains malicious programs such as viruses, Trojans or adware that install themselves forcibly, they are installed only into the temporary folder and do not harm the user equipment. The browser kernel is responsible for interpreting web page syntax (such as HTML and JavaScript) and rendering (displaying) the web page. The browser kernel is therefore the engine that downloads, parses, executes and renders the page, and it determines how the browser displays the content of the page and its format information.
Given these operating characteristics of the browser kernel, the isolation sandbox technology makes it possible to analyze safely whether the layout of black chain feature data in a feature page is abnormal. Specifically, this can be judged by analyzing the position and attributes of the page element containing the black chain feature data: for example, judging whether the position of the page element is outside a preset threshold range, whether the page element has an invisible attribute, and/or whether the page element has an attribute that hides it from the browser; if so, the layout of the black chain feature data in the feature page is judged to be abnormal. For example, if a hyperlink in a page is detected to be invisible, or the length, width or height of some html tag element in the page is negative, the layout of the page can be judged abnormal and the page judged to have been tampered with.
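The layout checks described above can be illustrated with a small sketch that inspects only an element's inline style string. It is a simplification made under stated assumptions: the text performs the analysis on the page as rendered by a sandboxed browser kernel, whereas this example parses the `style` attribute directly; the function names, the list of offset properties and the threshold of -1000 are hypothetical.

```python
import re

# Assumed threshold: offsets at or below this value push the element off-screen.
MAX_NEGATIVE_OFFSET = -1000
OFFSET_PROPS = ("left", "top", "margin-left", "margin-top", "text-indent")


def parse_style(style):
    """Parse an inline CSS style string into a {property: value} dict."""
    props = {}
    for decl in style.split(";"):
        if ":" in decl:
            name, value = decl.split(":", 1)
            props[name.strip().lower()] = value.strip().lower()
    return props


def leading_int(value):
    """Return the leading integer of a CSS value such as '-10000px', or None."""
    m = re.match(r"-?\d+", value)
    return int(m.group()) if m else None


def layout_is_abnormal(style):
    """Apply the position and visibility checks to one element's inline style."""
    props = parse_style(style)
    # Invisible attribute.
    if props.get("display") == "none" or props.get("visibility") == "hidden":
        return True
    # Negative width or height.
    for name in ("width", "height"):
        n = leading_int(props.get(name, ""))
        if n is not None and n < 0:
            return True
    # Position far outside the visible area.
    for name in OFFSET_PROPS:
        n = leading_int(props.get(name, ""))
        if n is not None and n <= MAX_NEGATIVE_OFFSET:
            return True
    return False
```

With these assumptions, a style such as `position:absolute;left:-10000px;` or `margin-left:-83791;` is reported as abnormal, while an ordinary style such as `color:red; left:10px` is not.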
When the layout is found to be abnormal, the page element containing the tampering keyword and/or black chain URL is extracted from the feature page with the abnormal layout; a regular expression is then abstracted from that page element as the black chain rule.
As is well known, a regular expression is a tool for text matching, usually composed of ordinary characters and metacharacters. Ordinary characters include upper-case and lower-case letters and digits, while metacharacters have special meanings. Regular expression matching can be understood as finding, in a given character string, the parts that match a given regular expression. A string may contain more than one part that satisfies the regular expression, in which case each such part is called a match. The word "match" is used here in three senses: as an adjective, as when a string is a match for an expression; as a verb, as when a regular expression is matched in a string; and as a noun, meaning "a part of the string that satisfies the given regular expression", as just mentioned.
The rules for constructing regular expressions are illustrated below by example.
Suppose we want to find hi. The regular expression hi can then be used. This regular expression matches exactly the following string: two characters, the first being h and the second i. In practice, regular expressions can ignore case. Many words contain the two consecutive characters hi, such as him, history and high, so a search with hi will also find the hi inside those words. To find exactly the word hi, \bhi\b should be used. Here \b is a metacharacter of regular expressions that represents the start or end of a word, that is, a word boundary. Although English words are usually separated by spaces, punctuation marks or line breaks, \b does not match any of these separator characters; it matches only a position. If what is sought is hi followed, not far behind, by Lucy, then \bhi\b.*\bLucy\b should be used. Here . is another metacharacter, which matches any character except a newline. * is also a metacharacter, but it represents a quantity: it specifies that the content preceding * may be repeated any number of times for the whole expression to match. The meaning of \bhi\b.*\bLucy\b is now clear: first the word hi, then any number of arbitrary characters (but no newline), and finally the word Lucy.
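The behavior of these expressions can be checked with Python's standard `re` module. The sample sentence below is made up for illustration.

```python
import re

text = "him said hi to history; hi there, Lucy"

# Plain "hi" also matches inside "him" and "history".
plain_hits = re.findall(r"hi", text)

# \b restricts the match to the standalone word "hi".
word_hits = re.findall(r"\bhi\b", text)

# "hi", then any characters except newline, then the word "Lucy".
m = re.search(r"\bhi\b.*\bLucy\b", text)
span_text = m.group() if m else None

# "." does not match a newline, so this search finds nothing.
across_lines = re.search(r"\bhi\b.*\bLucy\b", "hi\nLucy")

# Case can be ignored with the IGNORECASE flag.
ci_hits = re.findall(r"\bhi\b", "Hi HI hi", re.IGNORECASE)
```

Here `plain_hits` contains four matches and `word_hits` two, which is the difference between hi and \bhi\b described above.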
For example, from the html fragment of page A, whose page layout is abnormal, the following page element containing black chain feature data is extracted:
<script>document.write('<D'+'iv st'+'yle'+'=" po'+'si'+'tio'+'n:a'+'
bso'+'lu'+'te;l'+'ef'+'t:'+'-'+'10'+'00'+'0'+'p'+'x;'+″″+'>')>××××<script
>document.write('<'+'/d'+'i'+'v>');</script>
From the above page element, the following regular expression is generated as the black chain rule:
<script.*>document\.write.*\(.*\+.*\+.*\+.*\+.*\+.*\).*</script>([\S\s]+)</div>
As another example, from the html fragment of page B, whose page layout is abnormal, the following page element containing black chain feature data is extracted:
<a href="http://www.45u.com" style="margin-left:-83791;">
From the above page element, the following regular expression is generated as the black chain rule:
<a\s*href\s*=["'].+["']\s*style=["'][\w+\-]+:-[0-9]+.*["\'].*>.*</a>
Of course, the above ways of generating black chain rules are only examples; those skilled in the art may adopt any way of generating black chain rules according to actual conditions, and the application places no limitation on this.
By matching black chain rules against other feature pages, more black chain feature data can be extracted and more black chain rules trained, eventually forming a black chain feature database that covers black chains across the whole network.
Because planting black chains has now become an industrial chain, the same tampering keywords and/or black chain URLs appear in large numbers in other tampered pages. Matching regular expressions against pages as black chain rules, in order to extract more black chain feature data and train more black chain rules, is therefore well suited to the current industrialization of black chains: more tampered pages are found faster, and the efficiency of page tampering detection is effectively improved.
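As an illustration, the rule for the page B fragment can be applied with Python's `re` module as sketched below. The pattern assumes that the `\s` and `\w` escapes are part of the rule as written, and the name `RULE_B`, the `re.IGNORECASE` flag and the sample fragments (including the `example.test` address) are assumptions for the example.

```python
import re

# Page B rule, compiled case-insensitively (assumption) so that both
# <a ...> and <A ...> tags are covered.
RULE_B = re.compile(
    r"""<a\s*href\s*=["'].+["']\s*style=["'][\w+\-]+:-[0-9]+.*["'].*>.*</a>""",
    re.IGNORECASE,
)

# A link pushed off-screen with a large negative margin (hypothetical anchor text).
hidden = '<a href="http://www.45u.com" style="margin-left:-83791;">xxxx</a>'

# An ordinary link with a positive margin (hypothetical).
normal = '<a href="http://example.test/" style="margin-left:10px;">ok</a>'

hidden_matches = RULE_B.search(hidden) is not None
normal_matches = RULE_B.search(normal) is not None
```

The rule matches the hidden link because its style contains a property followed by `:-` and digits, and it does not match the ordinary link, whose margin is positive.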
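The step of matching a black chain rule against other pages and extracting new black chain feature data from the matches can be sketched as follows. The rule below is a hypothetical variant of the page B rule with named groups added so that the URL and anchor text can be captured; the function name `harvest`, the sample pages and the `example.test` addresses are likewise assumptions.

```python
import re

# Hypothetical rule with named groups: a link whose inline style contains
# a negative numeric value, capturing the link URL and the anchor text.
RULE = re.compile(
    r"""<a\s*href\s*=["'](?P<url>[^"']+)["']\s*"""
    r"""style=["'][\w+\-]+:-[0-9]+[^"']*["'][^>]*>"""
    r"""(?P<text>.*?)</a>""",
    re.IGNORECASE | re.DOTALL,
)


def harvest(pages, known_urls, known_keywords):
    """Return black chain URLs and keywords found by RULE that are not yet known."""
    new_urls, new_keywords = set(), set()
    for html in pages:
        for m in RULE.finditer(html):
            url = m.group("url")
            text = m.group("text").strip()
            if url not in known_urls:
                new_urls.add(url)
            if text and text not in known_keywords:
                new_keywords.add(text)
    return new_urls, new_keywords


pages = [
    '<p>news</p><a href="http://www.45u.com" style="margin-left:-83791;">promo one</a>',
    '<a href="http://hidden.example.test/" style="text-indent:-9000px;">promo two</a>',
    '<a href="http://ok.example.test/" style="margin-left:10px;">normal link</a>',
]
new_urls, new_keywords = harvest(pages, {"http://www.45u.com"}, set())
```

Only the second page contributes a new URL: the first page's URL is already known, and the third page's link is not hidden, so the rule does not match it. Newly found data would then be saved to the black chain feature database and could seed further searches.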
To handle the case where a large number of pages must be detected and a large amount of black chain feature data must be matched, the embodiments of the application deploy the generated black chain feature database on multiple servers, for example on 10 back-end servers, with identical black chain feature database content on every server.
In a specific implementation, because black chain feature data is time-sensitive, an update of the black chain feature database can be initiated at a preset time interval; specifically, the update can be completed by repeating sub-steps 111 to 114 above.
Step 12, obtaining feature information of the page currently under detection;
Step 13, determining a corresponding target server according to the feature information of the page;
In a specific implementation, a server identifier can be set for each server on which the black chain feature database is deployed. The identifier can follow any rule and format, such as numeric ordering or character ordering; the application places no restriction on this.
As one example of a concrete application of the embodiments of the application, the feature information can include page classification information. In this case, step 13 can specifically include the following sub-steps:
Sub-step S311, extracting the server identifier corresponding to the classification information of the current page, according to a preset correspondence between page classification information and server identifiers;
Sub-step S312, determining the server corresponding to that server identifier to be the target server.
In a specific implementation, the page classification information can be content classification information of the page. For example, pages are divided by content into a game class, film class, novel class, video class, music class, shopping class, mailbox class, life class, bank class, travel class and so on, and each of these content classes is preset to correspond to a server identifier as shown in the following table:
With reference to the table above, if the content class obtained for the page currently under detection is the game class, the target server is determined to be the server identified by aaa; if the content class obtained is the travel class, the target server is determined to be the server identified by kkk.
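The lookup from page classification to server identifier in sub-steps S311 and S312 amounts to a table lookup, sketched below. Only the two pairs stated in the text (game class to aaa, travel class to kkk) are included; the dictionary and function names are hypothetical, and the remaining rows of the correspondence table are not reproduced here.

```python
# Preset correspondence between content class and server identifier.
# Only the two pairs mentioned in the text are filled in.
CATEGORY_TO_SERVER = {
    "game": "aaa",
    "travel": "kkk",
}


def target_server_for(category, mapping=CATEGORY_TO_SERVER):
    """Return the server identifier preset for a page content class."""
    try:
        return mapping[category]
    except KeyError:
        raise LookupError(f"no server preset for content class {category!r}")


game_server = target_server_for("game")
travel_server = target_server_for("travel")
```

The same lookup applies unchanged when the key is a page type or another kind of page classification information.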
In a concrete application, the page classification information can also be type classification information of the page. For example, pages are divided by page type into: HTML-type home page, Flash-type home page, imported block in a home page, HTML-type first-level page, second-level page corresponding to an HTML-type page, third-level page corresponding to block content in an HTML-type page, general first-level page, general second-level page, list first-level page and list second-level page. Each of these page types is preset to correspond to a server identifier as shown in the following table:
With reference to the table above, if the type obtained for the page currently under detection is a general first-level page, the target server is determined to be the server identified by 777; if the type obtained is an HTML-type home page, the target server is determined to be the server identified by 111.
In practice, those skilled in the art may use any page classification information, for example attribute classification information of the page or tag classification information of the page; the embodiments of the application place no limitation on this.
In another preferred embodiment of the present application, the characteristic information may include
the URL of the page, and the servers have numerical identifiers. In this case, the step 103 may
specifically include the following sub-steps:
Sub-step S321, converting the URL of the currently detected page into a numerical value using a preset
algorithm;
Sub-step S322, extracting the server whose numerical identifier corresponds to the numerical value as
the target server.
For example, assume that the current black chain feature database is deployed on n servers. When the
URL (Uniform Resource Locator, i.e. web page address) of the currently detected page is obtained, the
URL is taken as input to a hashing algorithm such as MD5, which yields a character string (for
example, a 32-character string). The character string is then mapped to a numerical value using some
mapping rule, and that numerical value is taken as the identifier of the corresponding server among
the n servers. If the numerical value obtained is 2, for instance, the server identifier obtained is
2, and the target server is determined to be the server identified as 2.
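As a minimal sketch of sub-steps S321 and S322, the conversion from URL to numerical identifier might
look like the following. The text does not specify the mapping rule; reading the MD5 hex string as an
integer and reducing it modulo n is an assumption made here for illustration, as are the function name
and the numbering of servers from 0 to n - 1.

```python
import hashlib


def url_to_numerical_value(url: str, n_servers: int) -> int:
    """Convert a URL into a numerical value in the range 0 .. n_servers - 1."""
    if n_servers <= 0:
        raise ValueError("n_servers must be positive")
    # MD5 yields a 128-bit digest, written here as a 32-character hex string.
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    # Assumed mapping rule: read the string as an integer, reduce modulo n.
    return int(digest, 16) % n_servers
```

Because the same URL always yields the same value, repeated detection requests for one page are
always routed to the same server, while different URLs spread across the n servers.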
Of course, the above methods of determining the corresponding target server according to the
characteristic information of the page are merely examples. Those skilled in the art may use any
method according to actual conditions, such as converting the tag string of the page into a fixed
value; the present application is not limited in this respect.
Step 14, matching the black chain feature database in the target server against the currently
detected page, and judging whether the currently detected page contains black chain feature data from
the black chain feature database; if so, the current page is judged to be a tampered page.
In practice, if the currently detected page does not contain any black chain feature data from the
black chain feature database, it can be determined that the current page has not been tampered with.
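The matching in this step can be illustrated with a short sketch. The text does not say how the
matching is carried out; a case-insensitive substring test of each feature data item against the page
source is assumed here, and the function name is_tampered is hypothetical.

```python
from typing import Iterable


def is_tampered(page_source: str, black_chain_features: Iterable[str]) -> bool:
    """Return True if the page contains any black chain feature data item."""
    haystack = page_source.lower()
    # Assumed matching rule: case-insensitive substring test; empty items are skipped.
    return any(f.lower() in haystack for f in black_chain_features if f)
```

A page for which the function returns False contains none of the feature data and is treated as not
tampered with.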
By adopting an architecture in which the black chain feature database is processed and applied in a
distributed manner, the embodiment of the present invention can effectively spread the detection load
across servers when there are concurrent detection requests for multiple pages, thereby effectively
saving system resources.
It should be noted that, for brevity, the method embodiments are described as a series of combined
actions. Those skilled in the art should understand, however, that the present application is not
limited by the described order of actions, because according to the present application some steps
may be performed in other orders or simultaneously. Those skilled in the art should also understand
that the embodiments described in the specification are all preferred embodiments, and that the
actions and modules involved are not necessarily required by the present application.
Referring to Fig. 2, which shows a structural block diagram of an embodiment of a device for detecting
page tampering according to the present application, the device may specifically include the
following modules:
Database generation module 21, for generating a black chain feature database, the black chain feature
database including black chain feature data;
Database deployment module 22, for deploying the black chain feature database on multiple servers;
Characteristic information acquisition module 23, for obtaining the characteristic information of the
currently detected page;
Target server determining module 24, for determining the corresponding target server according to the
characteristic information of the page;
Tampering detection module 25, for matching the black chain feature database in the target server
against the currently detected page, and judging whether the currently detected page contains black
chain feature data from the black chain feature database; if so, the current page is judged to be a
tampered page.
In a preferred embodiment of the present application, the servers have server identifiers and the
characteristic information includes page classification information. In this case, the target server
determining module 24 may include the following submodules:
Identifier extraction submodule, for extracting the server identifier corresponding to the
classification information of the current page according to the preset correspondence between page
classification information and server identifiers;
Identifier locating submodule, for determining the server corresponding to the server identifier as
the target server.
In another preferred embodiment of the present application, the characteristic information includes
the URL of the page and the servers have numerical identifiers. In this case, the target server
determining module 24 may include the following submodules:
URL conversion submodule, for converting the URL of the currently detected page into a numerical value
using a preset algorithm;
Identifier correspondence submodule, for extracting the server whose numerical identifier corresponds
to the numerical value as the target server.
In a specific implementation, the embodiment of the present application may also include a database
update module, for updating the black chain feature database at preset time intervals.
In a preferred embodiment of the present application, the database generation module 21 may include
the following submodules:
Feature page search submodule, for using existing black chain feature data to search for pages that
contain the black chain feature data, such pages being feature pages;
Layout analysis submodule, for analyzing the layout of the black chain feature data in a feature page;
Page element extraction submodule, for extracting, when the layout is found to be abnormal, the page
element containing the black chain feature data from the feature page;
Black chain rule generation submodule, for generating a black chain rule according to the page
element;
Black chain feature data extraction submodule, for matching the black chain rule against other feature
pages, extracting new black chain feature data from the matching feature pages, and saving the black
chain feature data to form the black chain feature database.
In a specific implementation, the black chain feature data may include tampering keywords and black
chain URLs.
As an example of a concrete application of the embodiment of the present application, the layout
analysis submodule may include the following units:
First judging unit, for judging whether the position of the page element containing the black chain
feature data is within a preset threshold range; if so, the layout of the black chain feature data in
the feature page is judged to be abnormal;
And/or
Second judging unit, for judging whether an attribute of the page element containing the black chain
feature data is an invisible attribute; if so, the layout of the black chain feature data in the
feature page is judged to be abnormal;
And/or
Third judging unit, for judging whether an attribute of the page element containing the black chain
feature data is an attribute hidden from the browser; if so, the layout of the black chain feature
data in the feature page is judged to be abnormal.
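The three judging units can be sketched as three independent checks whose results are combined with a
logical OR. Everything concrete below is an assumption made for illustration: the text does not give
the threshold range, nor does it say which attributes count as invisible or as hidden from the
browser, so the ElementLayout fields, the off-screen threshold of -500 pixels, and the zero-size test
are hypothetical stand-ins.

```python
from dataclasses import dataclass

# Hypothetical threshold: offsets at or below this value count as off-screen.
OFFSCREEN_THRESHOLD_PX = -500.0


@dataclass
class ElementLayout:
    """Assumed layout properties of the element holding the feature data."""
    left_px: float = 0.0
    top_px: float = 0.0
    display: str = "block"
    visibility: str = "visible"
    width_px: float = 100.0
    height_px: float = 20.0
    font_size_px: float = 14.0


def position_in_threshold_range(e: ElementLayout) -> bool:
    # First judging unit: element positioned inside the preset (off-screen) range.
    return e.left_px <= OFFSCREEN_THRESHOLD_PX or e.top_px <= OFFSCREEN_THRESHOLD_PX


def has_invisible_attribute(e: ElementLayout) -> bool:
    # Second judging unit: element explicitly marked as not visible.
    return e.display == "none" or e.visibility == "hidden"


def hidden_from_browser(e: ElementLayout) -> bool:
    # Third judging unit (assumed reading): element rendered with zero size.
    return e.width_px == 0 or e.height_px == 0 or e.font_size_px == 0


def layout_is_abnormal(e: ElementLayout) -> bool:
    """Abnormal if any one of the three judging units fires."""
    return (position_in_threshold_range(e)
            or has_invisible_attribute(e)
            or hidden_from_browser(e))
```

Keeping the checks separate mirrors the "and/or" wording: an implementation may enable any subset of
the units.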
In a particular application, the black chain rule generation submodule may include the following unit:
Regular expression extraction unit, for abstracting a regular expression, as the black chain rule,
from the page element containing the tampering keyword and/or the black chain URL.
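One possible way to abstract a regular expression from a page element is sketched below. The text does
not describe how the expression is derived; the approach assumed here keeps the surrounding markup of
the element as a literal pattern and replaces the known tampering keyword and black chain URL with
named capture groups, so that matching the rule against other feature pages yields new feature data.
The function name build_black_chain_rule and the group names url and keyword are hypothetical.

```python
import re
from typing import Optional


def build_black_chain_rule(element: str,
                           keyword: Optional[str] = None,
                           url: Optional[str] = None) -> re.Pattern:
    """Generalize one page element into a regular expression (black chain rule)."""
    # Treat the whole element as literal text first.
    pattern = re.escape(element)
    # re.escape works character by character, so the escaped URL or keyword
    # appears verbatim inside the escaped element and can be swapped out.
    if url:
        pattern = pattern.replace(re.escape(url), r"(?P<url>.+?)", 1)
    if keyword:
        pattern = pattern.replace(re.escape(keyword), r"(?P<keyword>.+?)", 1)
    return re.compile(pattern, re.DOTALL)
```

Searching another feature page with the compiled rule returns a match object whose url and keyword
groups hold candidate new black chain feature data, which can then be saved to the black chain
feature database.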
Because the device embodiment essentially corresponds to the method embodiment shown in Fig. 1 above,
for any part not described in detail in this embodiment, reference may be made to the related
description in the preceding embodiment, which is not repeated here.
The present application can be used in numerous general-purpose or special-purpose computing system
environments or configurations, for example: personal computers, server computers, handheld or
portable devices, laptop devices, multiprocessor systems, microprocessor-based systems, set-top boxes,
programmable consumer electronics devices, network PCs, minicomputers, mainframe computers,
distributed computing environments that include any of the above systems or devices, and so on.
The present application can be described in the general context of computer-executable instructions
executed by a computer, such as program modules. Generally, program modules include routines,
programs, objects, components, data structures and the like that perform particular tasks or
implement particular abstract data types. The present application can also be practiced in
distributed computing environments, in which tasks are performed by remote processing devices that
are connected through a communication network. In a distributed computing environment, program
modules may be located in both local and remote computer storage media, including storage devices.
Finally, it should also be noted that, herein, relational terms such as first and second are used
merely to distinguish one entity or operation from another entity or operation, and do not
necessarily require or imply any such actual relationship or order between these entities or
operations. Moreover, the terms "include", "comprise" or any other variant thereof are intended to
cover a non-exclusive inclusion, so that a process, method, article or device that includes a series
of elements includes not only those elements but also other elements not expressly listed, or also
includes elements inherent to such a process, method, article or device. Without further limitation,
an element defined by the statement "including a ..." does not exclude the existence of other
identical elements in the process, method, article or device that includes the element.
The method for detecting page tampering and the device for detecting page tampering provided by the
present application have been described in detail above. Specific examples are used herein to
explain the principles and embodiments of the present application, and the description of the above
embodiments is intended only to help in understanding the method of the present application and its
core idea. Meanwhile, for those of ordinary skill in the art, changes may be made to the specific
embodiments and the scope of application in accordance with the idea of the present application. In
summary, the content of this description should not be construed as limiting the present application.