This patent application is a divisional application of Chinese invention patent application No. 201110457654.3, filed on December 30, 2011, and entitled "Method and Device for Detecting Page Tampering".
Summary of the Invention
The present application provides a method for detecting page tampering. The method improves the efficiency and accuracy of tampering detection while keeping manual intervention to a minimum. It is particularly useful when a large number of pages must be checked and a large amount of black link feature data must be matched.
The present application also provides a device for detecting page tampering, so that the above method can be applied and implemented in practice.
To solve the above problems, the present application discloses a method for detecting page tampering, comprising:
Generating a black link feature database containing black link feature data, and deploying the black link feature database on multiple servers;
Obtaining feature information of the page currently being detected;
Determining a corresponding target server according to the feature information of the page;
Matching the current page against the black link feature database on the target server to judge whether the current page contains black link feature data from the database, and if so, determining that the current page is a tampered page.
Preferably, each server has a server identifier, the feature information includes page classification information, and the step of determining the corresponding target server according to the feature information of the page comprises:
Extracting, according to a preset correspondence between page classification information and server identifiers, the server identifier corresponding to the classification information of the current page;
Determining the server corresponding to that server identifier as the target server.
Preferably, the feature information includes the URL of the page, each server has a numerical identifier, and the step of determining the corresponding target server according to the feature information of the page comprises:
Converting the URL of the current page into a numerical value using a preset algorithm;
Determining the server whose numerical identifier corresponds to that value as the target server.
Preferably, the page classification information includes content category information, type category information, and attribute category information of the page.
Preferably, the step of generating the black link feature database comprises:
Using existing black link feature data to search for pages containing that data, and taking those pages as feature pages;
Analyzing the layout of the black link feature data in the feature pages and, when the layout is found to be abnormal, extracting from the feature page the page elements containing the black link feature data;
Generating black link rules from those page elements, matching the rules against other feature pages, and extracting new black link feature data from the matched feature pages;
Saving the black link feature data to form the black link feature database.
Preferably, the black link feature data include tampering keywords and black link URLs.
Preferably, the step of analyzing the layout of the black link feature data in the feature pages comprises:
Judging whether the position of the page element containing the black link feature data falls within a preset threshold range, and if so, determining that the layout of the black link feature data in the feature page is abnormal;
And/or
Judging whether the attribute of that page element is an invisible attribute, and if so, determining that the layout of the black link feature data in the feature page is abnormal;
And/or
Judging whether the attribute of that page element is an attribute that hides it from the browser, and if so, determining that the layout of the black link feature data in the feature page is abnormal.
Preferably, the step of generating black link rules from the page elements is:
Abstracting regular expressions, as black link rules, from the page elements containing the tampering keywords and/or black link URLs.
Preferably, the method further comprises:
Updating the black link feature database at preset time intervals.
The present application further discloses a device for detecting page tampering, comprising:
A database generation module, configured to generate a black link feature database containing black link feature data;
A database deployment module, configured to deploy the black link feature database on multiple servers;
A feature information acquisition module, configured to obtain feature information of the page currently being detected;
A target server determining module, configured to determine a corresponding target server according to the feature information of the page;
A tampering detection module, configured to match the current page against the black link feature database on the target server, judge whether the current page contains black link feature data from the database, and if so, determine that the current page is a tampered page.
Preferably, each server has a server identifier, the feature information includes page classification information, and the target server determining module comprises:
An identifier extraction submodule, configured to extract, according to a preset correspondence between page classification information and server identifiers, the server identifier corresponding to the classification information of the current page;
An identifier locating submodule, configured to determine the server corresponding to that server identifier as the target server.
Preferably, the feature information includes the URL of the page, each server has a numerical identifier, and the target server determining module comprises:
A URL conversion submodule, configured to convert the URL of the current page into a numerical value using a preset algorithm;
An identifier mapping submodule, configured to determine the server whose numerical identifier corresponds to that value as the target server.
Preferably, the database generation module comprises:
A feature page search submodule, configured to use existing black link feature data to search for pages containing that data and take them as feature pages;
A layout analysis submodule, configured to analyze the layout of the black link feature data in the feature pages;
A page element extraction submodule, configured to extract, when the layout is found to be abnormal, the page elements containing the black link feature data from the feature page;
A black link rule generation submodule, configured to generate black link rules from the page elements;
A black link feature data extraction submodule, configured to match the black link rules against other feature pages, extract new black link feature data from the matched feature pages, and save the black link feature data to form the black link feature database.
Preferably, the layout analysis submodule further comprises:
A first judging unit, configured to judge whether the position of the page element containing the black link feature data falls within a preset threshold range, and if so, determine that the layout of the black link feature data in the feature page is abnormal;
And/or
A second judging unit, configured to judge whether the attribute of that page element is an invisible attribute, and if so, determine that the layout of the black link feature data in the feature page is abnormal;
And/or
A third judging unit, configured to judge whether the attribute of that page element is an attribute that hides it from the browser, and if so, determine that the layout of the black link feature data in the feature page is abnormal.
Preferably, the black link feature data include tampering keywords and black link URLs, and the black link rule generation submodule comprises:
A regular expression extraction unit, configured to abstract regular expressions, as black link rules, from the page elements containing the tampering keywords and/or black link URLs.
Preferably, the device further comprises:
A database update module, configured to update the black link feature database at preset time intervals.
Compared with the prior art, the present application has the following advantages:
By deploying the generated black link feature database on multiple servers, the present application spreads a processing load that would otherwise fall on a single server or client. When multiple concurrent page tampering detection requests are received, the server that handles each detection is chosen from the feature information of the requested page, and that server performs the actual tampering detection. As a result, the efficiency and accuracy of tampering detection improve even when many pages must be checked and much black link feature data must be matched.
Furthermore, the present application uses the black link feature database to judge whether the current page contains black link feature data, and treats a page that contains such data as a tampered page. In the embodiments of the present application, the black link features in the database do not all have to be collected manually. They can be collected automatically as follows:
1. Known black link feature data are combined with search engine technology, and a web crawler fetches the pages containing that data as feature pages.
2. The layout of the black link feature data in these feature pages is analyzed.
3. If the layout is abnormal, the page elements containing the black link feature data are extracted from the abnormal feature page and abstracted into a set of general regular expressions, which serve as black link rules.
4. The rules are matched against other feature pages, and new black link feature data are extracted from the matched pages.
Collecting black link feature data in this way needs no manual intervention, is very fast, and produces highly accurate data. When these data are used for page tampering detection, both the efficiency and the accuracy of detection improve.
In addition, the embodiments of the present application use a web crawler combined with search engine technology to fetch pages containing known black link feature data. The layout of the black link feature data in these pages is then analyzed to judge whether each page has been tampered with. The page elements containing the black link feature data are extracted from the tampered pages, and a set of general regular expressions is finally formed as black link rules. The application needs neither manual intervention nor any additional system setup.

Regular expressions are used as black link rules to match pages, which yields more black link feature data and trains more black link rules. This approach suits the current industrialized state of black links well. It reduces cost, finds tampered pages faster and more comprehensively, and improves the efficiency of page tampering detection. Moreover, the implementation is based on web crawler technology and browser-kernel isolation sandbox technology, which helps ensure the security, reliability, and accuracy of page tampering detection.
Detailed Description of the Embodiments
To make the above objects, features, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and specific embodiments.
Black links are also known as "network psoriasis". Search engines have a ranking system. A website that a search engine considers good is ranked higher in the search results and therefore receives a higher click-through rate. Search engines use various indicators to measure the quality of a website, and one very important indicator is the website's external links, meaning links from other sites that point to it. If a website has good external links, its ranking in the search engine improves accordingly.
For example, suppose a newly launched website ranks very low in a search engine. Later, some high-weight websites (those with good rankings and high quality) link to it. The search engine then reasons that a site linked from high-weight websites cannot have a low weight itself, and the new site's ranking rises. If several high-weight websites all link to it, its ranking rises very quickly.
Conversely, a newly launched website with no background and no links from other sites will not have a high weight. The search engine will not rank it highly, and it will appear far down in the search results. Some tools exploit this characteristic of search engines by providing black link techniques. They break into high-weight websites and, once the intrusion succeeds, insert links to their own websites into the pages of the compromised sites so that they benefit from those links. The inserted URLs are hidden, so people viewing the compromised pages cannot see any of the links.
However, a considerable proportion of the websites that currently use black links to boost their search rankings are dangerous websites, such as private game server sites, account-stealing Trojan sites, phishing sites, and advertising sites. A search engine would not normally rank such sites highly. Through black links, however, their rankings become very high. Users of the search engine are then very likely to click through to these sites, and users without adequate security protection can easily be infected by viruses hosted on them.
The inventors of the present application recognized the seriousness of this problem. One core idea of the embodiments is to deploy the generated black link feature database on multiple servers, which spreads the processing load of a single server or client. When multiple concurrent page tampering detection requests are received, the server that processes each detection is chosen from the feature information of the requested page, and that server performs the actual tampering detection. This improves the efficiency and accuracy of detection when many pages must be checked and much black link feature data must be matched.

In addition, the black link features in the database do not all have to be collected manually. They can be collected automatically. Known black link feature data are combined with search engine technology, and a web crawler fetches the pages containing the data as feature pages. The layout of the data in those pages is then analyzed. If the layout is abnormal, the page elements containing the data are extracted from the abnormal feature page and abstracted into a set of general regular expressions, which serve as black link rules. The rules are matched against other feature pages, and new black link feature data are extracted from the matched pages. Collecting black link feature data in this way needs no manual intervention, is very fast, and produces highly accurate data, which improves both the efficiency and the accuracy of page tampering detection.
Fig. 1 is a flowchart of the steps of an embodiment of the method for detecting page tampering according to the present application. The method may specifically include the following steps:
Step 11: generate a black link feature database containing black link feature data, and deploy the black link feature database on multiple servers;
In a specific implementation, the black link feature data may include tampering keywords and black link URLs. Examples are the tampering keyword "Legend private server release" and the black link URL "http://www.45u.com".
In a preferred embodiment of the present application, the black link feature database may be generated through the following sub-steps:
Sub-step 111: use existing black link feature data to search for pages containing that data, and take those pages as feature pages;
Sub-step 112: analyze the layout of the black link feature data in the feature pages and, when the layout is found to be abnormal, extract from the feature page the page elements containing the black link feature data;
Sub-step 113: generate black link rules from the page elements, match the rules against other feature pages, and extract new black link feature data from the matched feature pages;
Sub-step 114: save the black link feature data to form the black link feature database.
In a specific implementation, the existing black link feature data may include tampering keywords and black link URLs. Based on this data, a web crawler fetches the pages that contain it, and these pages are taken as feature pages.
It is well known that the function that search engine automatically extracts webpage from WWW is realized by web crawlers.Net
Network reptile is also known as Web Spider, i.e. Web Spider, and Web Spider is to find webpage by the chained address of webpage, from net
Some page (being typically homepage) of standing starts, and reads the content of webpage, finds other chained addresses in webpage, then lead to
It crosses these chained addresses and finds next webpage, cycle is gone down always in this way, is all captured until all webpages in this website
Until complete.If a website is treated as in entire internet, Web Spider can be with this principle institute on internet
Some webpages all capture.
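The link-following loop described above can be sketched as a breadth-first traversal over a FIFO queue. This is a minimal illustration only. The function name `bfs_crawl`, the `fetch_links` callback, the `max_pages` stop condition, and the in-memory `SITE` dictionary (which stands in for real HTTP fetching and HTML parsing) are all hypothetical.

```python
from collections import deque
from typing import Callable, Iterable, List


def bfs_crawl(seeds: Iterable[str],
              fetch_links: Callable[[str], List[str]],
              max_pages: int) -> List[str]:
    """Breadth-first crawl: visit pages in the order their links are discovered."""
    seeds = list(seeds)
    seen = set(seeds)        # URLs already queued, so each page is fetched once
    queue = deque(seeds)     # FIFO queue gives breadth-first order
    visited: List[str] = []
    # Stop condition (assumed): queue exhausted or page budget used up.
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in fetch_links(url):   # links found on the current page
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


# Hypothetical in-memory site: page -> links on that page.
SITE = {"/": ["/a", "/b"], "/a": ["/c", "/"], "/b": ["/c"], "/c": []}
print(bfs_crawl(["/"], lambda u: SITE.get(u, []), max_pages=10))
```

The `seen` set keeps the spider from looping on cyclic links such as `/a -> /`. A real crawler would also need politeness delays, robots.txt handling, and URL normalization, none of which are shown here.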
Current web crawlers fall into two kinds: general-purpose crawlers and focused crawlers. A general-purpose crawler is based on breadth-first search. It starts from the URLs (Uniform Resource Locators) of one or more initial pages and obtains the URLs on those pages. While fetching pages, it keeps extracting new URLs from the current page and putting them into a queue until a system stop condition is met. A focused crawler is an automatic web-page downloading program that fetches topic-related page resources in a targeted way. Following a set crawling goal, it selectively visits pages and related links on the World Wide Web to obtain the required information. Unlike a general-purpose crawler, a focused crawler does not aim for broad coverage. Its goal is to fetch pages related to a specific topic, which prepares data resources for topic-oriented user queries.
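A focused crawler can be sketched by adding a relevance test to the same queue-based loop. Here, only pages judged relevant are kept, and only their links are followed. This pruning strategy is an assumption chosen for illustration, as is the keyword-containment test used as `is_relevant`. In the setting of sub-step 111, the "topic" would be a known tampering keyword. The names `focused_crawl` and `PAGES` are hypothetical, and `PAGES` is an in-memory stand-in for fetched pages.

```python
from collections import deque
from typing import Callable, Dict, List, Tuple

Page = Tuple[str, List[str]]  # (page text, outgoing links)


def focused_crawl(seeds: List[str],
                  fetch: Callable[[str], Page],
                  is_relevant: Callable[[str], bool],
                  max_pages: int) -> List[str]:
    """Collect relevant pages; links are followed only from relevant pages."""
    seen = set(seeds)
    queue = deque(seeds)
    relevant: List[str] = []
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        text, links = fetch(url)
        fetched += 1
        if not is_relevant(text):
            continue                 # prune: do not expand off-topic pages
        relevant.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return relevant


PAGES: Dict[str, Page] = {
    "/": ("legend private server portal", ["/x", "/y"]),
    "/x": ("legend private server release", ["/z"]),
    "/y": ("weather report", ["/w"]),
    "/z": ("unrelated", []),
    "/w": ("legend private server", []),
}
keyword = "legend private server"
print(focused_crawl(["/"], lambda u: PAGES.get(u, ("", [])),
                    lambda text: keyword in text, max_pages=10))
```

Page `/w` is relevant but is never reached, because its only parent `/y` is off-topic. This shows the trade-off a focused crawler makes: it gives up coverage to avoid fetching unrelated pages.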
Existing black link techniques use some fixed tricks to hide links. For example, search engines do not recognize JavaScript well, so a hidden div can be output through JavaScript. People viewing the page then cannot see these links directly, while the search engine still treats them as valid. The code works in three parts:
1. JavaScript writes the opening part of the div, with display set to none.
2. A table containing the black links to be planted is output.
3. JavaScript outputs the closing part of the div.
Browser-kernel isolation sandbox technology can discover tampered pages quickly and effectively. Specifically, the isolation sandbox builds a secure virtual execution environment around a browser kernel, such as that of IE or Firefox. Any disk write made through the browser is redirected to a specific temporary folder. Even if a web page contains rogue programs such as viruses, Trojans, or adware, a forced installation only places them in the temporary folder, so they cannot damage the user's device. The browser kernel interprets web page syntax (such as HTML and JavaScript) and renders (displays) the page. In other words, the browser kernel is the engine that downloads, parses, executes, and renders a page, and it determines how the browser displays the page content and its formatting.
Given these operating characteristics of the browser kernel, the isolation sandbox can be used to analyze safely whether the layout of the black link feature data in a feature page is abnormal. The analysis looks at the position and attributes of the page element containing the black link feature data. The layout is judged abnormal if any of the following holds:
1. The position of the page element falls outside a preset threshold range.
2. The page element has an invisible attribute.
3. The page element has an attribute that hides it from the browser.
For example, if a hyperlink in a page is detected to be invisible, or the length, width, or height of some HTML tag element in the page is negative, the layout of the page can be judged abnormal and the page judged to be tampered.
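The layout checks above can be sketched as a predicate over an element's rendered geometry and visibility. The text does not give the actual threshold values or say how the sandboxed kernel reports geometry. The `ElementBox` and `Bounds` models below, and their default numbers, are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ElementBox:
    """Geometry and visibility of one rendered element (hypothetical model)."""
    left: float
    top: float
    width: float
    height: float
    visible: bool = True               # False for e.g. CSS visibility:hidden
    hidden_from_browser: bool = False  # True for e.g. CSS display:none


@dataclass(frozen=True)
class Bounds:
    """Assumed range of normal positions; real thresholds are not given in the text."""
    min_x: float = -100.0
    max_x: float = 5000.0
    min_y: float = -100.0
    max_y: float = 100000.0


def layout_is_abnormal(el: ElementBox, bounds: Bounds = Bounds()) -> bool:
    """True if the element holding black link data is positioned or styled to be hidden."""
    out_of_range = not (bounds.min_x <= el.left <= bounds.max_x
                        and bounds.min_y <= el.top <= bounds.max_y)
    negative_size = el.width < 0 or el.height < 0  # e.g. a negative width or height
    return out_of_range or negative_size or not el.visible or el.hidden_from_browser


# An element pushed to left:-10000px, a common way of hiding a black link.
print(layout_is_abnormal(ElementBox(left=-10000, top=0, width=300, height=20)))  # True
print(layout_is_abnormal(ElementBox(left=10, top=40, width=300, height=20)))     # False
```

The checks are combined with a logical OR, which matches the "and/or" wording: any single signal is enough to flag the layout.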
When the layout is found to be abnormal, the page elements containing the tampering keywords and/or black link URLs are extracted from the feature page with the abnormal layout. Regular expressions are then abstracted from these page elements to serve as black link rules.
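The text does not say exactly how a regular expression is "abstracted" from a page element. One minimal way, assumed here purely for illustration, is to keep the element literally (escaped) but generalize the parts that tend to vary between planted copies. In this sketch, whitespace runs become `\s*` and negative integers become `-[0-9]+`. The black link URL itself is kept literal. The function name `abstract_rule` is hypothetical.

```python
import re


def abstract_rule(element: str) -> str:
    """Generalize one concrete tampered element into a regex rule (assumed scheme)."""
    # Split on whitespace runs and negative integers, keeping the separators.
    pieces = re.split(r"(\s+|-\d+)", element)
    out = []
    for piece in pieces:
        if not piece:
            continue
        if piece.isspace():
            out.append(r"\s*")              # tolerate any spacing
        elif re.fullmatch(r"-\d+", piece):
            out.append(r"-[0-9]+")          # any negative offset, e.g. margin-left:-83791
        else:
            out.append(re.escape(piece))    # tags, attribute names, URL kept literal
    return "".join(out)


rule = abstract_rule('<A href="http://www.45u.com" style="margin-left:-83791;">')
print(bool(re.fullmatch(rule, '<A  href="http://www.45u.com" style="margin-left:-120;">')))  # True
```

Because the URL stays literal, the rule finds other planted copies of the same black link with different offsets. Harvesting new URLs would need a rule that also generalizes the `href` value.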
It is well known that regular expression is the tool for carrying out text matches, usually by some general characters and some
Metacharacter (metacharacters) forms.General character includes the letter and number of capital and small letter, and metacharacter is then with special
Meaning.The matching of regular expression is found and given regular expression phase it is to be understood that in given character string
The part matched.It is possible that there is more than one part to meet given regular expression in character string, at this moment each such portion
Divide and is referred to as a matching.Matching may include three kinds of meanings in this paper:One is Adjective, such as a character
One expression formula of String matching;One is verb characters, such as regular expression is matched in character string;Also one is nouns
Property, it is exactly " part for meeting given regular expression in character string " just mentioned.
The construction of regular expressions is illustrated below with examples.
Suppose we want to find hi. The regular expression hi can be used, and it matches exactly a string of two characters: first h, then i. In practice, regular expressions can be made case-insensitive.

Many words contain the two consecutive characters hi, such as him, history, and high. A search with hi also finds the hi inside these words. To search for the word hi only, use \bhi\b. Here \b is a regular expression metacharacter that marks the beginning or end of a word, that is, a word boundary. English words are usually separated by spaces, punctuation, or line breaks, but \b does not match any of these separator characters. It matches only a position.

To find hi followed somewhere later by Lucy, use \bhi\b.*\bLucy\b. Here, . is another metacharacter that matches any character except a newline. The * is also a metacharacter, but it denotes quantity: the element before it may repeat any number of times in a row (including zero) for the whole expression to match. The meaning of \bhi\b.*\bLucy\b is now clear: first the word hi, then any number of arbitrary characters (but no line breaks), and finally the word Lucy.
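The three expressions discussed above behave as follows in Python's standard `re` module. This is shown only as a convenient demonstration, since the text does not name a regular expression engine. Raw strings are used so that `\b` reaches the regex engine instead of being read as a backspace escape.

```python
import re

text = "him history high"
print(re.findall("hi", text))                        # ['hi', 'hi', 'hi'], inside each word
print(re.findall(r"\bhi\b", text))                   # [], never a whole word here
print(re.findall(r"\bhi\b", "say hi to him"))        # ['hi']

m = re.search(r"\bhi\b.*\bLucy\b", "well hi there, this is Lucy.")
print(m.group(0))                                    # hi there, this is Lucy
print(re.search(r"\bhi\b.*\bLucy\b", "hi\nLucy"))    # None, because . does not match a newline
print(bool(re.search(r"\bhi\b", "Hi Lucy", re.IGNORECASE)))  # True with case ignored
```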
For example, in the HTML fragment of page A, whose layout is abnormal, the extracted page element containing the black link feature data is as follows:
<script>document.write('<D'+'iv st'+'yle'+'=" po'+'si'+'tio'+'n:a'+'bso'+'lu'+'te;l'+'ef'+'t:'+'-'+'10'+'00'+'0'+'p'+'x;'+'"'+'>')</script>××××<script>document.write('<'+'/d'+'i'+'v>');</script>
The regular expression generated from this page element as a black link rule is:
<script.*>document\.write.*\(.*\+.*\+.*\+.*\+.*\+.*\).*</script>([\S\s]+)</div>
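The page A fragment hides its markup by splitting it into many short string literals joined with `+`. Reassembling the literals shows what the script actually writes. The sketch below does this with two regular expressions, under the simplifying assumption that every `document.write` argument is a `+`-joined list of single-quoted literals with no `)` inside them. Real obfuscation can be far more varied, which is why the text renders pages in a sandboxed browser kernel instead. `decode_writes` and `SNIPPET` are hypothetical names.

```python
import re

# The page A fragment; "××××" stands for the hidden black link content.
SNIPPET = (
    "<script>document.write('<D'+'iv st'+'yle'+'=\" po'+'si'+'tio'+'n:a'"
    "+'bso'+'lu'+'te;l'+'ef'+'t:'+'-'+'10'+'00'+'0'+'p'+'x;'+'\"'+'>')"
    "</script>××××<script>document.write('<'+'/d'+'i'+'v>');</script>"
)

WRITE_CALL = re.compile(r"document\.write\((.*?)\)")  # argument of each call
LITERAL = re.compile(r"'([^']*)'")                    # single-quoted pieces


def decode_writes(html: str) -> list:
    """Join the string pieces passed to each document.write call."""
    return ["".join(LITERAL.findall(arg)) for arg in WRITE_CALL.findall(html)]


for written in decode_writes(SNIPPET):
    print(written)
# <Div style=" position:absolute;left:-10000px;">
# </div>
```

The decoded markup shows a div moved 10000 pixels off-screen with `position:absolute`, which is the kind of abnormal position the layout analysis looks for.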
As another example, in the HTML fragment of page B, whose layout is abnormal, the extracted page element containing the black link feature data is as follows:
<A href="http://www.45u.com" style="margin-left:-83791;">
The regular expression generated from this page element as a black link rule is:
<A\s*href\s*=["'].+["']\s*style=["'][\w+\-]+:-[0-9]+.*["'].*>.*</a>
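The page B rule can be tried directly with Python's `re` module. The rule opens with `<A` but closes with `</a>`, so this sketch assumes it is applied case-insensitively. The two sample anchors are invented for illustration.

```python
import re

# Page B black link rule; re.IGNORECASE is an assumption (the rule mixes <A and </a>).
RULE_B = re.compile(
    r"""<A\s*href\s*=["'].+["']\s*style=["'][\w+\-]+:-[0-9]+.*["'].*>.*</a>""",
    re.IGNORECASE,
)

hidden = '<A href="http://www.45u.com" style="margin-left:-83791;">legend private server</A>'
normal = '<a href="http://example.com/">home</a>'
print(bool(RULE_B.search(hidden)))  # True: negative margin pushes the link off-screen
print(bool(RULE_B.search(normal)))  # False: no style with a negative offset
```

The greedy `.+` and `.*` in the rule can stretch across several anchors on the same line. That is harmless for a yes/no tampering check, but it matters when the matched text is used to extract new URLs.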
Of course, the above ways of generating black link rules are only examples. Those skilled in the art may generate black link rules in any way that suits actual conditions, and the present application places no limitation on this.
Matching the black link rules against other feature pages extracts more black link feature data and trains more black link rules. Over time, this can form a black link feature database that covers black links across the entire web.
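The feedback loop of sub-step 113 (apply rules to other feature pages, then pull new black link URLs and keywords out of each match) can be sketched as below. The rule in the usage example is a hypothetical, tighter variant that stops at the first `</a>`, so each match covers one anchor. The `href` and anchor-text extractors are also assumptions; the text does not say how new feature data are pulled out of a match.

```python
import re
from typing import Iterable, List, Pattern, Set, Tuple

HREF = re.compile(r"""href\s*=\s*["']([^"']+)["']""", re.IGNORECASE)
ANCHOR_TEXT = re.compile(r">([^<]+)</a>", re.IGNORECASE)


def harvest(pages: Iterable[str],
            rules: List[Pattern[str]]) -> Tuple[Set[str], Set[str]]:
    """Apply black link rules to feature pages; return (new URLs, new keywords)."""
    urls: Set[str] = set()
    keywords: Set[str] = set()
    for html in pages:
        for rule in rules:
            for match in rule.finditer(html):
                element = match.group(0)
                urls.update(HREF.findall(element))              # candidate black link URLs
                keywords.update(t.strip() for t in ANCHOR_TEXT.findall(element)
                                if t.strip())                   # candidate tampering keywords
    return urls, keywords


# Hypothetical rule: an anchor whose style contains a negative offset.
RULES = [re.compile(r"""<a\s[^>]*style\s*=\s*["'][^"']*:\s*-[0-9]+[^>]*>[^<]*</a>""",
                    re.IGNORECASE)]
PAGES = [
    '<p>news</p><a href="http://www.45u.com" style="margin-left:-83791;">legend private server</a>',
    '<a href="http://spam.example" style="left:-9999px">cheap game gold</a><a href="/about">About</a>',
]
print(harvest(PAGES, RULES))
```

The ordinary `/about` link is not collected, because it has no style with a negative offset and the rule therefore never matches it.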
Planting black links has now become an industrial chain, so the same tampering keywords and/or black link URLs appear in large numbers across other tampered pages. Using regular expressions as black link rules to match pages therefore yields more black link feature data and trains more black link rules. This fits the current industrialization of black links well. It finds tampered pages faster and more comprehensively and improves the efficiency of page tampering detection.
To handle cases where many pages must be checked and much black link feature data must be matched, the embodiments of the present application deploy the generated black link feature database on multiple servers, for example on 10 back-end servers. Every server receives identical database content.
In a specific implementation, black link feature data are time-sensitive, so updates to the black link feature database can be started at preset time intervals. Specifically, the database can be updated by repeating sub-steps 111 to 114 above.
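A periodic update can be sketched as a small scheduler that re-runs the database build once the preset interval has passed. The class name `FeatureDbUpdater`, the `rebuild` callback (which would re-run sub-steps 111 to 114), and the injectable `clock` are hypothetical. Injecting the clock is a design choice that lets the schedule be checked without waiting in real time.

```python
import time
from typing import Callable, Optional


class FeatureDbUpdater:
    """Rebuild the black link feature database once every `interval_s` seconds."""

    def __init__(self, rebuild: Callable[[], None], interval_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._rebuild = rebuild      # e.g. re-runs sub-steps 111 to 114
        self._interval = interval_s
        self._clock = clock          # injectable so the schedule can be tested
        self._last: Optional[float] = None

    def tick(self) -> bool:
        """Call periodically; rebuilds and returns True when the interval has elapsed."""
        now = self._clock()
        if self._last is None or now - self._last >= self._interval:
            self._rebuild()
            self._last = now
            return True
        return False


# Usage with a fake clock: one rebuild at t=0, none at t=1800, another at t=3600.
fake_now = [0.0]
builds = []
updater = FeatureDbUpdater(lambda: builds.append(fake_now[0]), 3600, clock=lambda: fake_now[0])
for t in (0.0, 1800.0, 3600.0):
    fake_now[0] = t
    updater.tick()
print(builds)  # [0.0, 3600.0]
```

After each rebuild, the new database would still need to be redeployed to every server so that all replicas stay identical.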
Step 12: obtain feature information of the page currently being detected;
Step 13: determine a corresponding target server according to the feature information of the page;
In a specific implementation, each server on which the black link feature database is deployed can be given a server identifier. The identifiers may follow any rule or form, such as numeric ordering or character ordering, and the present application places no restriction on this.
As an example of a specific application of the embodiments, the feature information may include page classification information. In this case, Step 13 may specifically include the following sub-steps:
Sub-step S311: according to a preset correspondence between page classification information and server identifiers, extract the server identifier corresponding to the classification information of the current page;
Sub-step S312: determine the server corresponding to that server identifier as the target server.
In a specific implementation, the page classification information may be content category information of the page. For example, pages may be classified by content into game, film, novel, video, music, shopping, email, lifestyle, banking, travel, and other categories. The preset correspondence between these content categories and server identifiers is shown in the following table:
Referring to the above table, if the content category of the current page is game, the target server is the server identified as aaa. If the content category of the current page is travel, the target server is the server identified as kkk.
In a particular application, the page classification information may also be page type information. For example, pages may be classified by type into: HTML-type homepage, Flash-type homepage, HTML-type first-level page, second-level page corresponding to an imported block in an HTML-type page, third-level page corresponding to block content in an HTML-type page, general first-level page, general second-level page, list first-level page, and list second-level page. The preset correspondence between these page types and server identifiers is shown in the following table:
Referring to the above table, if the type of the current page is general first-level page, the target server is the server identified as 777. If the type of the current page is HTML-type homepage, the target server is the server identified as 111.
In practice, those skilled in the art may use any page classification information, for example attribute category information or tag information of the page. The embodiments of the present application place no limitation on this.
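Sub-steps S311 and S312 amount to a table lookup from a page's category to a server identifier. The full correspondence tables are not reproduced in this text. The sketch therefore fills in only the four pairs named above, and the other entries are left out rather than guessed. Returning `None` for an unknown category is an assumption, since the text does not say how unlisted categories are handled.

```python
from typing import Dict, Optional

# Only the pairs stated in the text; the complete tables are not reproduced here.
CONTENT_CATEGORY_TO_SERVER: Dict[str, str] = {"game": "aaa", "travel": "kkk"}
PAGE_TYPE_TO_SERVER: Dict[str, str] = {
    "general first-level page": "777",
    "HTML-type homepage": "111",
}


def target_server(category: str, table: Dict[str, str]) -> Optional[str]:
    """S311: look up the server identifier; S312: that server becomes the target."""
    return table.get(category)


print(target_server("game", CONTENT_CATEGORY_TO_SERVER))           # aaa
print(target_server("HTML-type homepage", PAGE_TYPE_TO_SERVER))    # 111
```

A lookup table keeps pages of the same kind on the same server. With a skewed category mix, however, some servers would receive far more requests than others.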
In another preferred embodiment of the present application, the feature information may include the URL of the page, and each server has a numerical identifier. In this case, Step 13 may specifically include the following sub-steps:
Sub-step S321: convert the URL of the current page into a numerical value using a preset algorithm;
Sub-step S322: determine the server whose numerical identifier corresponds to that value as the target server.
For example, suppose the black chain feature database is deployed in portions on n servers. When the URL (Uniform Resource Locator, i.e. the web page address) of the current page under detection is obtained, the URL is used as the input to a hash algorithm such as MD5, which yields a fixed-length string (for MD5, a 32-character hexadecimal string). The string is then mapped to a numerical value using a given mapping rule, and that value is taken as the identifier of the corresponding server among the n servers. If the value obtained is 2, for instance, the destination server is the server identified as 2.
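Sub-steps S321 and S322 can be sketched as below. The text leaves the mapping rule open, so this sketch assumes the rule "interpret the MD5 hex digest as an integer, reduce it modulo n and add 1", which yields identifiers from 1 to n; that rule and the function name `url_to_server_id` are illustrative assumptions.

```python
import hashlib


def url_to_server_id(url: str, n_servers: int) -> int:
    """Map a page URL to a numerical server identifier in 1..n_servers.

    S321: hash the URL with MD5, giving a 32-character hexadecimal string.
    S322: map that string to a number (assumed rule: integer value mod n, plus 1).
    """
    if n_servers < 1:
        raise ValueError("n_servers must be positive")
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()  # 32 hex characters
    return int(digest, 16) % n_servers + 1


# The same URL always yields the same identifier for a given n.
server_id = url_to_server_id("http://www.example.com/index.html", 4)
print(server_id)  # some value between 1 and 4
```

Because the digest depends only on the URL, repeated detections of the same page reach the same server. A plain modulo rule reassigns most URLs whenever n changes; if servers are added or removed often, a consistent-hashing scheme may be preferable.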
Of course, the above ways of determining the destination server from the characteristic information of the page are only examples. Those skilled in the art may use any method according to actual conditions, for example converting a tag string of the page into a fixed value; the present application imposes no limitation in this respect.
Step 14: match the current page under detection against the black chain feature database in the destination server, and judge whether the page contains any black chain feature data from that database; if so, judge the current page to be a tampered page.
In practice, if the current page under detection does not contain any black chain feature data from the database, the current page can be judged not to have been tampered with.
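The judgment in step 14 reduces to a containment test. The sketch below assumes plain case-insensitive substring matching, hypothetical feature strings, and a hypothetical function name `is_tampered`; a deployment could instead apply the regular-expression rules produced by the rule generation submodule.

```python
def is_tampered(page_html: str, black_chain_features: set[str]) -> bool:
    """Judge the page tampered if it contains any black chain feature
    (tampering keyword or black chain URL) held by the destination server.
    Case-insensitive substring matching is an assumption of this sketch."""
    text = page_html.lower()
    return any(feature.lower() in text for feature in black_chain_features)


# Hypothetical partition of the feature database on the destination server.
features_on_destination_server = {"http://spam.example/", "replica watches"}
print(is_tampered('<a href="http://spam.example/">x</a>', features_on_destination_server))  # True
print(is_tampered("<p>ordinary news page</p>", features_on_destination_server))  # False
```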
By deploying the black chain feature database in a distributed architecture and processing detection requests in a distributed manner, the embodiment of the present invention can effectively spread the detection load across the servers when detection requests for multiple pages arrive concurrently, thereby saving system resources.
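The following sketch illustrates how concurrent detection requests might be spread over the servers. It is a simulation under stated assumptions: each server is modelled by a worker thread and an in-memory feature set, the routing function (for example a URL hash or a classification lookup) is passed in as a parameter, and `detect_batch` is a hypothetical name.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def detect_batch(
    pages: dict[str, str],            # URL -> page content
    route: Callable[[str], int],      # URL -> destination server identifier
    partitions: dict[int, set[str]],  # server identifier -> black chain features
) -> dict[str, bool]:
    """Group concurrent detection requests by destination server, then let each
    (simulated) server check only its own pages against its own partition."""
    groups: dict[int, list[str]] = defaultdict(list)
    for url in pages:
        groups[route(url)].append(url)

    def run_on_server(server_id: int) -> dict[str, bool]:
        features = partitions.get(server_id, set())
        return {url: any(f in pages[url] for f in features) for url in groups[server_id]}

    results: dict[str, bool] = {}
    # One worker per destination server stands in for the separate machines.
    with ThreadPoolExecutor(max_workers=max(1, len(groups))) as pool:
        for partial in pool.map(run_on_server, groups):
            results.update(partial)
    return results
```

Each simulated server sees only the requests routed to it, which is the load-spreading effect described above; in a real deployment the groups would be sent to different machines rather than threads.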
It should be noted that for embodiment of the method, for simple description, therefore it is all expressed as a series of action group
It closes, but those skilled in the art should understand that, the application is not limited by the described action sequence, because according to this Shen
Please, certain steps can be performed in other orders or simultaneously.Next, those skilled in the art should also know that, specification
Described in embodiment belong to preferred embodiment, necessary to involved action and module not necessarily the application.
Referring to FIG. 2, which shows a structural block diagram of an embodiment of a device for detecting page tampering according to the present application, the device may specifically include the following modules:
Database generation module 21, configured to generate a black chain feature database, the database containing black chain feature data;
Database deployment module 22, configured to deploy the black chain feature database on multiple servers;
Characteristic information acquisition module 23, configured to obtain the characteristic information of the current page under detection;
Destination server determination module 24, configured to determine the corresponding destination server according to the characteristic information of the page;
Tampering detection module 25, configured to match the current page under detection against the black chain feature database in the destination server and judge whether the page contains any black chain feature data from that database; if so, the current page is judged to be a tampered page.
In a preferred embodiment of the present application, each server has a server identifier, and the characteristic information includes page classification information. In this case, the destination server determination module 24 may include the following submodules:
Identifier extraction submodule, configured to extract the server identifier corresponding to the classification information of the current page according to the preset correspondence between page classification information and server identifiers;
Identifier locating submodule, configured to determine the server corresponding to that server identifier as the destination server.
In another preferred embodiment of the present application, the characteristic information includes the URL of the page, and each server has a numerical identifier. In this case, the destination server determination module 24 may include the following submodules:
URL conversion submodule, configured to convert the URL of the current page under detection into a numerical value using a preset algorithm;
Identifier mapping submodule, configured to take the server whose numerical identifier corresponds to that value as the destination server.
In a specific implementation, the embodiment of the present application may further include a database update module, configured to update the black chain feature database at preset time intervals.
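A database update module that refreshes at preset intervals could look like the sketch below. The class name `PeriodicDatabaseUpdater`, the `rebuild` callback standing in for the database generation module, and the injectable clock (used so the behaviour can be checked without waiting) are all assumptions.

```python
import time
from typing import Callable


class PeriodicDatabaseUpdater:
    """Hypothetical database update module: rebuild the black chain feature
    database once the preset interval has elapsed since the last update."""

    def __init__(
        self,
        interval_seconds: float,
        rebuild: Callable[[], set[str]],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.interval = interval_seconds
        self.rebuild = rebuild   # stands in for database generation module 21
        self.clock = clock
        self.features = rebuild()  # initial build
        self.last_update = clock()

    def maybe_update(self) -> bool:
        """Refresh the database if it is due; return True when a refresh happened."""
        now = self.clock()
        if now - self.last_update >= self.interval:
            self.features = self.rebuild()
            self.last_update = now
            return True
        return False
```

`maybe_update` could be called from a scheduler loop or before each detection batch; in a multi-server deployment the rebuilt database would then be redeployed to the servers.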
In a preferred embodiment of the present application, the database generation module 21 may include the following submodules:
Feature page search submodule, configured to use existing black chain feature data to search for pages containing that data, which are taken as feature pages;
Topological analysis submodule, configured to analyze the layout of the black chain feature data in the feature pages;
Page element extraction submodule, configured to extract, when the layout is found to be abnormal, the page elements containing the black chain feature data from that feature page;
Black chain rule generation submodule, configured to generate black chain rules from those page elements;
Black chain feature extraction submodule, configured to match the other feature pages using the black chain rules, extract new black chain feature data from the matching feature pages, and save the black chain feature data to form the black chain feature database.
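The loop formed by these submodules can be sketched as follows. Several simplifications are assumed: a page is a list of elements whose layout verdict is already known, a black chain rule is the literal markup of an abnormal element with each URL replaced by a capture group, and only URLs are harvested as new features. `Element`, `make_rule` and `expand_features` are hypothetical names.

```python
import re
from dataclasses import dataclass

# Assumed (simplified) syntax of a black chain URL.
URL_RE = r"https?://[^\s\"'<>]+"


@dataclass(frozen=True)
class Element:
    text: str       # markup of one page element
    abnormal: bool  # verdict of the layout analysis for this element


def make_rule(element_text: str) -> re.Pattern:
    """Black chain rule: the element's literal markup, with every URL
    position turned into a capture group that accepts any URL."""
    literal_parts = re.split(URL_RE, element_text)
    return re.compile(f"({URL_RE})".join(re.escape(p) for p in literal_parts))


def expand_features(pages: list[list[Element]], seed: set[str]) -> set[str]:
    features = set(seed)
    while True:
        # Feature page search: pages that contain any known feature.
        feature_pages = [p for p in pages if any(f in el.text for el in p for f in features)]
        # Element extraction and rule generation from abnormal elements.
        rules = [make_rule(el.text) for p in feature_pages for el in p if el.abnormal]
        # Feature extraction: apply every rule to every element of the feature pages.
        new: set[str] = set()
        for p in feature_pages:
            for el in p:
                for rule in rules:
                    m = rule.search(el.text)
                    if m:
                        new.update(g for g in m.groups() if g not in features)
        if not new:
            return features
        features |= new
```

Each round can only add URLs that occur in the finite set of pages, so the loop terminates. Matching rules against all elements, not only the abnormal ones, lets a rule recover black chain links whose hiding technique the layout analysis missed.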
In a specific implementation, the black chain feature data may include tampering keywords and black chain URLs.
As an example of a specific application of the embodiment of the present application, the topological analysis submodule may include the following units:
First judging unit, configured to judge whether the position of the page element containing the black chain feature data falls within a preset threshold range; if so, the layout of the black chain feature data in the feature page is judged to be abnormal;
And/or
Second judging unit, configured to judge whether the attribute of the page element containing the black chain feature data is an invisible attribute; if so, the layout of the black chain feature data in the feature page is judged to be abnormal;
And/or
Third judging unit, configured to judge whether the attribute of the page element containing the black chain feature data is an attribute that hides the element from the browser; if so, the layout of the black chain feature data in the feature page is judged to be abnormal.
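The three judging units can be expressed as independent predicates combined with OR, matching the "and/or" wording above. The text does not specify what counts as an invisible attribute or an attribute hidden from the browser, so the CSS properties, the HTML `hidden` attribute and the off-screen threshold used below are illustrative assumptions, as are all names; element coordinates are assumed to come from a rendering engine.

```python
from dataclasses import dataclass

# Hypothetical preset threshold: elements placed this far off-screen count as abnormal.
OFFSCREEN_THRESHOLD = -1000


@dataclass
class ElementLayout:
    x: int                     # left offset in pixels (assumed from a renderer)
    y: int                     # top offset in pixels
    style: dict[str, str]      # computed CSS properties, e.g. {"display": "none"}
    hidden_attr: bool = False  # whether the HTML `hidden` attribute is set


def first_unit_position(el: ElementLayout) -> bool:
    # Position falls within the preset (off-screen) range.
    return el.x <= OFFSCREEN_THRESHOLD or el.y <= OFFSCREEN_THRESHOLD


def second_unit_invisible(el: ElementLayout) -> bool:
    # Rendered but invisible: assumed to mean visibility:hidden or opacity 0.
    return el.style.get("visibility") == "hidden" or el.style.get("opacity") == "0"


def third_unit_hidden_from_browser(el: ElementLayout) -> bool:
    # Not rendered at all: assumed to mean display:none or the `hidden` attribute.
    return el.style.get("display") == "none" or el.hidden_attr


def layout_is_abnormal(el: ElementLayout) -> bool:
    return (first_unit_position(el)
            or second_unit_invisible(el)
            or third_unit_hidden_from_browser(el))
```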
In a specific application, the black chain rule generation submodule may include the following unit:
Regular expression extraction unit, configured to extract a regular expression, as a black chain rule, from the page elements containing tampering keywords and/or black chain URLs.
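One possible regular expression extraction unit is sketched below. The generalization policy is an assumption, not taken from the text: keywords are matched literally and case-insensitively, and each black chain URL is widened to any path on the same host (but not to other hosts that merely share a prefix). `extract_rule` is a hypothetical name.

```python
import re
from urllib.parse import urlsplit


def extract_rule(keywords: list[str], black_chain_urls: list[str]) -> re.Pattern:
    """Build one black chain rule from the tampering keywords and/or black chain
    URLs found in a page element."""
    alternatives = [re.escape(k) for k in keywords]
    for url in black_chain_urls:
        host = urlsplit(url).hostname
        if host:
            # Same host, any path; the lookahead stops "a.example" matching "a.example.org".
            alternatives.append(
                r"https?://" + re.escape(host) + r"(?![\w.-])(?:[/?#][^\s\"'<>]*)?"
            )
    if not alternatives:
        raise ValueError("need at least one keyword or URL")
    return re.compile("|".join(alternatives), re.IGNORECASE)


rule = extract_rule(["replica watches"], ["http://spam1.example/a?b=1"])
print(bool(rule.search('<a href="http://spam1.example/other">x</a>')))  # True
print(bool(rule.search("http://good.example/")))  # False
```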
Since the device embodiment essentially corresponds to the method embodiment shown in FIG. 1, for details not described in this embodiment, reference may be made to the related description of the previous embodiment, which is not repeated here.
The present application can be used in numerous general-purpose or special-purpose computing system environments or configurations, for example: personal computers, server computers, handheld or portable devices, laptop devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, and distributed computing environments including any of the above systems or devices.
The present application can be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures and the like that perform specific tasks or implement specific abstract data types. The present application can also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
Finally, it should be noted that, herein, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" and any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or further includes elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article or device that includes the element.
The method and device for detecting page tampering provided by the present application have been described in detail above. Specific examples have been used herein to explain the principles and implementations of the present application; the description of the above embodiments is only intended to help understand the method of the present application and its core idea. Meanwhile, those of ordinary skill in the art may make changes to the specific implementation and scope of application according to the idea of the present application. In conclusion, the contents of this specification should not be construed as limiting the present application.