CN102790700B - Method and device for recognizing webpage crawler - Google Patents

Info

Publication number
CN102790700B
Authority
CN
China
Prior art keywords
web
time interval
adjacent
web page
client
Legal status
Expired - Fee Related
Application number
CN201110130432.0A
Other languages
Chinese (zh)
Other versions
CN102790700A
Inventor
Ye Runguo (叶润国)
Xiao Xiaojian (肖小剑)
Current Assignee
Beijing Venus Information Security Technology Co Ltd
Venus Info Tech Inc
Beijing Venus Information Technology Co Ltd
Original Assignee
Beijing Venus Information Security Technology Co Ltd
Beijing Venus Information Technology Co Ltd
Application filed by Beijing Venus Information Security Technology Co Ltd and Beijing Venus Information Technology Co Ltd
Priority to CN201110130432.0A
Publication of CN102790700A
Application granted
Publication of CN102790700B
Expired - Fee Related

Abstract

The invention discloses a method and a device for recognizing web crawlers, belonging to the technical field of network security. The method comprises: computing the average response time of a Web server over all web page requests; acquiring the web page requests sent by a Web client to the Web server over a period of time; measuring the time interval between adjacent web page requests and the response time of each web page request; correcting the time interval between adjacent web page requests according to the response times; determining whether each corrected time interval is greater than or equal to a predetermined adjacent-request time interval threshold δ; and determining, according to whether the determination results meet a preset condition, whether the operation of the Web client is that of a web crawler. The method provided by the embodiments of the invention can detect stealthy web crawlers simply and rapidly, is highly adaptable, and gains valuable response time for subsequent security responses.

Description

Method and device for identifying web crawlers
Technical field
The present invention relates to the technical field of network security, and in particular to a method and device for identifying web crawlers.
Background
Because Web services are convenient and easy to use, more and more network services have moved from the traditional dedicated-client/dedicated-server model (C/S model) to a model that uses a standard web browser as the client together with a Web server (B/S model). Network services built on the B/S model are commonly called Web application systems. While Web application systems bring convenience, they also bring many security problems; common ones include web page Trojans, SQL injection attacks, and XSS attacks. The root cause of these problems is mostly defects in the program code of the Web application system itself, which introduce Web security vulnerabilities that hackers can exploit.
When an attacker targets a Web application system (sometimes also called a Web site), the attacker first scans the whole system for vulnerabilities to find an exploitable Web security hole, and then attacks that hole to achieve the malicious goal. For a previously unknown Web application system, the attacker uses web crawler technology to scan the system, finds the web pages that may have security problems, and then attempts attacks on those pages to confirm whether they are vulnerable.
Studies of various common Web attacks show that the attack tools used in many of them behave like Web crawlers. These include:
CC attacks (DDoS): multiple proxies are used to concurrently access the most resource-intensive web pages on a Web server, causing a denial of service;
Zombie DDoS: a group of zombie hosts running Web crawlers crawl the Web server continuously, leaving the server no time to handle other Web requests;
Web vulnerability scanning (including SQL injection tools): hackers use common vulnerability scanners to scan the Web server for vulnerabilities.
From the perspective of Web server defense, if these malicious web crawlers can be identified early and monitored continuously, traffic control can be applied to them early, thereby safeguarding the Web server.
Current common methods for identifying web crawlers monitor a series of web page requests sent by a Web client to judge whether it is a crawler. The basic idea is: if the client is a crawler, the time interval between two consecutive web page requests it sends is more likely to be small; if the client is a normal user, that interval is more likely to be large. By monitoring the time intervals of n consecutive web page requests sent by the client and applying hypothesis testing, one can decide at a given confidence level whether the client is a web crawler or a human browsing manually.
Summary of the invention
The technical problem solved by the present invention is to provide a method and device for identifying web crawlers that can detect more stealthy CC clients, thereby gaining valuable response time for subsequent HTTP traffic control.
To solve this problem, the invention provides a method for identifying web crawlers, comprising the following steps:
A0. Compute the average response time η of the Web server over all web page requests;
A1. Collect the web page request sequence W sent by a Web client to the Web server over a period of time;
A2. Calculate the time interval Vt_i between each pair of adjacent web page requests in W, and the response time μ_i of each web page request;
A3. Based on each response time μ_i and the average response time η, correct each adjacent-request time interval Vt_i to Vt'_i using the rule: if μ_i is greater than η, Vt'_i is the product of Vt_i and a penalty factor k less than 1 (otherwise Vt_i is kept unchanged);
A4. For each corrected interval Vt'_i, determine whether it is greater than or equal to the preset adjacent-request time interval threshold δ; if so, set the corresponding event element e_i to 0, otherwise set it to 1. The event elements of all intervals form an elementary event sequence E;
A5. Match the elementary event sequence E against hypotheses H0 and H1, where H0 denotes that the operation of the Web client is normal web browsing and H1 denotes that it is a web crawler. If the degree to which E matches H1 exceeds the degree to which E matches H0 by more than a degree threshold, the operation of the Web client is judged to be a web crawler; otherwise it is judged to be normal web browsing.
Preferably, in the above method, the penalty factor k (less than 1) in step A3 is k = 1 - (μ_i - η)/μ_i.
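Because η and μ_i are both positive response times, the preferred penalty factor simplifies to a plain ratio. The short derivation below is only an algebraic rewrite of the formula above, not an additional claim of the patent; it shows that the correction scales the interval by η/μ_i.

```latex
k = 1 - \frac{\mu_i - \eta}{\mu_i} = \frac{\mu_i - (\mu_i - \eta)}{\mu_i} = \frac{\eta}{\mu_i},
\qquad 0 < k < 1 \text{ whenever } \mu_i > \eta > 0,
\qquad Vt'_i = k \, Vt_i = \frac{\eta}{\mu_i} \, Vt_i .
```

For example, a request answered in twice the average time (μ_i = 2η) gets k = 1/2, so its interval is halved.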
Preferably, in the above method, step A5 comprises:
A51. Propose two hypotheses H0 and H1, where H0 denotes that the operation of the Web client is normal web browsing and H1 denotes that it is a web crawler;
A52. Let the probability that the interval between two adjacent web page requests produced during normal web browsing is greater than or equal to δ be Pr[e_i = 0 | H0] = θ0, so that the probability that it is less than δ is Pr[e_i = 1 | H0] = 1 - θ0; likewise, let the probability that the interval between two adjacent web page requests produced by a web crawler is greater than or equal to δ be Pr[e_i = 0 | H1] = θ1, so that the probability that it is less than δ is Pr[e_i = 1 | H1] = 1 - θ1. Here θ0 > θ1, and the conditional random variables e_i | H_j are independent and identically distributed;
A53. Calculate the likelihood ratio V(E) of the elementary event sequence E under the two hypotheses H1 and H0:
$$V(E) = \frac{\Pr[E \mid H_1]}{\Pr[E \mid H_0]} = \prod_{i=1}^{n-1} \frac{\Pr[e_i \mid H_1]}{\Pr[e_i \mid H_0]}$$
A54. Compare V(E) with two fixed thresholds η0 and η1, where η0 < η1: if V(E) ≥ η1, the operation of the Web client is judged to be a web crawler; if V(E) ≤ η0, it is judged to be normal web browsing.
Preferably, in step A54 of the above method:
When m consecutive web page requests from the Web client to the Web server all have adjacent-request time intervals greater than or equal to the threshold δ, the threshold η0 is obtained as:
$$\eta_0 = \prod_{i=1}^{m-1} \frac{\Pr[e_i = 0 \mid H_1]}{\Pr[e_i = 0 \mid H_0]} = \left(\frac{\Pr[e_i = 0 \mid H_1]}{\Pr[e_i = 0 \mid H_0]}\right)^{m-1}$$
When m consecutive web page requests from the Web client to the Web server all have adjacent-request time intervals less than the threshold δ, the threshold η1 is obtained as:
$$\eta_1 = \prod_{i=1}^{m-1} \frac{\Pr[e_i = 1 \mid H_1]}{\Pr[e_i = 1 \mid H_0]} = \left(\frac{\Pr[e_i = 1 \mid H_1]}{\Pr[e_i = 1 \mid H_0]}\right)^{m-1}$$
where m is a positive integer.
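Substituting the probabilities defined in step A52 (Pr[e_i = 0 | H0] = θ0, Pr[e_i = 0 | H1] = θ1) gives the two thresholds in closed form. This is an algebraic restatement under the i.i.d. assumption; the final inequality additionally assumes m ≥ 2, since for m = 1 both thresholds equal 1.

```latex
\eta_0 = \left(\frac{\theta_1}{\theta_0}\right)^{m-1}, \qquad
\eta_1 = \left(\frac{1-\theta_1}{1-\theta_0}\right)^{m-1}, \qquad
\theta_0 > \theta_1 \;\Rightarrow\; \frac{\theta_1}{\theta_0} < 1 < \frac{1-\theta_1}{1-\theta_0}
\;\Rightarrow\; \eta_0 < 1 < \eta_1 \quad (m \ge 2).
```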
Preferably, in the above method:
δ is 1 second, 2 seconds, or 3 seconds;
when δ is 3 seconds, θ0 and θ1 are 0.6 and 0.4, respectively.
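A minimal Python sketch of the threshold computation, assuming the closed forms η0 = (θ1/θ0)^(m-1) and η1 = ((1 - θ1)/(1 - θ0))^(m-1) that follow from step A52. The function name `sprt_thresholds` and the requirement m ≥ 2 are illustrative choices, not part of the patent.

```python
from __future__ import annotations


def sprt_thresholds(theta0: float, theta1: float, m: int) -> tuple[float, float]:
    """Return (eta0, eta1) for m consecutive requests, i.e. m - 1 intervals.

    theta0 = Pr[e_i = 0 | H0], theta1 = Pr[e_i = 0 | H1], with theta0 > theta1.
    """
    if not 0.0 < theta1 < theta0 < 1.0:
        raise ValueError("expected 0 < theta1 < theta0 < 1")
    if m < 2:
        # With m = 1 there are no intervals and eta0 == eta1 == 1 (treated as invalid here).
        raise ValueError("m must be at least 2")
    # Lower threshold: V(E) after m - 1 long intervals (all e_i = 0).
    eta0 = (theta1 / theta0) ** (m - 1)
    # Upper threshold: V(E) after m - 1 short intervals (all e_i = 1).
    eta1 = ((1.0 - theta1) / (1.0 - theta0)) ** (m - 1)
    return eta0, eta1


if __name__ == "__main__":
    eta0, eta1 = sprt_thresholds(0.6, 0.4, m=5)
    print(f"eta0 = {eta0:.4f}, eta1 = {eta1:.4f}")
```

With δ = 3 seconds, θ0 = 0.6, θ1 = 0.4 and a hypothetical m = 5, this gives η0 = (2/3)^4 ≈ 0.198 and η1 = 1.5^4 = 5.0625.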
The invention also discloses a device for identifying web crawlers, comprising:
an acquiring unit, configured to obtain the web page request sequence W sent by a Web client to the Web server over a period of time, and to compute the average response time η of the Web server over all web page requests;
a correction processing unit, configured to measure each adjacent-request time interval and each request's response time, correct the adjacent-request time intervals according to the response times, and judge whether each corrected interval is greater than or equal to a predetermined adjacent-request time interval threshold δ; the correction processing unit comprises a computing module, a correcting module, and a recording module;
the computing module, configured to measure, in the web page requests obtained by the acquiring unit, each adjacent-request time interval and each request's response time;
the correcting module, configured to correct each adjacent-request time interval according to the computed average response time and the request's response time, using the rule: if the response time μ_i is greater than the average response time η, the corrected interval Vt'_i is the product of Vt_i and a penalty factor k less than 1;
the recording module, configured to judge whether each time interval is greater than or equal to the preset threshold δ; if so, the event element e_i for that interval is set to 0, otherwise to 1; the event elements of all intervals form an elementary event sequence E;
a recognition unit, configured to match the elementary event sequence E against hypotheses H0 and H1, where H0 denotes that the operation of Web client r is normal web browsing and H1 denotes that it is a web crawler; if the degree to which E matches H1 exceeds the degree to which E matches H0 by more than a degree threshold, the operation of Web client r is judged to be a web crawler; otherwise it is judged to be normal web browsing.
Preferably, in the above device, the penalty factor k less than 1 is k = 1 - (μ_i - η)/μ_i.
Preferably, in the above device, the recognition unit comprises:
a hypothesis module, configured to propose two hypotheses H0 and H1, where H0 denotes that the operation of Web client r is normal web browsing and H1 denotes that it is a web crawler;
a setting module, configured to set the probability that the interval between two adjacent web page requests produced during normal web browsing is greater than or equal to δ as Pr[e_i = 0 | H0] = θ0 and the probability that it is less than δ as Pr[e_i = 1 | H0] = 1 - θ0, and to set the probability that the interval between two adjacent web page requests produced by a web crawler is greater than or equal to δ as Pr[e_i = 0 | H1] = θ1 and the probability that it is less than δ as Pr[e_i = 1 | H1] = 1 - θ1; here θ0 > θ1, and the conditional random variables e_i | H_j are independent and identically distributed;
a likelihood ratio computing module, configured to calculate the likelihood ratio V(E) of the elementary event sequence E under the two hypotheses H1 and H0:
$$V(E) = \frac{\Pr[E \mid H_1]}{\Pr[E \mid H_0]} = \prod_{i=1}^{n-1} \frac{\Pr[e_i \mid H_1]}{\Pr[e_i \mid H_0]}$$
a judging module, configured to compare V(E) with two fixed thresholds η0 and η1, where η0 < η1: if V(E) ≥ η1, the operation of Web client r is judged to be a web crawler; if V(E) ≤ η0, it is judged to be normal web browsing.
Preferably, in the above device, the recognition unit further comprises:
a threshold setting module, configured to set the fixed thresholds η0 and η1: when m consecutive web page requests from Web client r to Web server s all have adjacent-request time intervals greater than or equal to the threshold δ, the threshold η0 is obtained as:
$$\eta_0 = \prod_{i=1}^{m-1} \frac{\Pr[e_i = 0 \mid H_1]}{\Pr[e_i = 0 \mid H_0]} = \left(\frac{\Pr[e_i = 0 \mid H_1]}{\Pr[e_i = 0 \mid H_0]}\right)^{m-1}$$
When m consecutive web page requests from Web client r to Web server s all have adjacent-request time intervals less than the threshold δ, the threshold η1 is obtained as:
$$\eta_1 = \prod_{i=1}^{m-1} \frac{\Pr[e_i = 1 \mid H_1]}{\Pr[e_i = 1 \mid H_0]} = \left(\frac{\Pr[e_i = 1 \mid H_1]}{\Pr[e_i = 1 \mid H_0]}\right)^{m-1}$$
where m is a positive integer.
Preferably, in the above device:
δ is 1 second, 2 seconds, or 3 seconds;
when δ is 3 seconds, θ0 and θ1 are 0.6 and 0.4, respectively.
The embodiments of the invention propose identifying the preparatory activity that precedes an attack, so that defenses can be deployed, or the preparation stopped, before the attack is actually launched, thereby enhancing the security and reliability of the network. To identify this preparatory activity, the embodiments correct the intervals between consecutive web page requests according to the requests' response times, and can therefore identify CC clients that attack the Web server while mimicking the access frequency of normal Web clients, that is, they identify stealthy web crawlers. The preferred scheme uses a rigorous mathematical model, detects stealthy web crawlers simply and rapidly, is highly adaptable, and gains valuable response time for subsequent security responses.
Description of drawings
Fig. 1 is a flowchart of the method for identifying web crawlers in Embodiment 1;
Fig. 2 is a flowchart of step A5 of the method for identifying web crawlers in Embodiment 1.
Detailed description of embodiments
The technical solution of the present invention is described in further detail below with reference to the drawings and specific embodiments. It should be noted that, where no conflict arises, the embodiments of this application and the features in them may be combined with one another arbitrarily.
From the Web defender's perspective, if the abnormal behavior of an attacker using web crawler technology to scan a Web application system can be identified at the initial stage of the scan, a timely response to the hacker's attack becomes possible, for example stopping the hacker's web crawler from further scanning the system, or recording its subsequent web accesses and defending against the Web attacks it launches.
A web crawler is a running software module that automatically downloads web pages from a Web application system, automatically analyzes the hyperlinks in those pages, and then automatically fetches the next level of pages through the extracted hyperlinks, until all pages of the Web application system have been downloaded. Because a web crawler fully imitates human web browsing, identifying it accurately is very difficult.
In some technical schemes, whether the operation of a Web client r is that of a web crawler is judged by observing the sequence of web page requests between r and a Web server s over a period of time; normal web browsing and automatic crawling are distinguished by analyzing the time interval between two adjacent web page requests. The main basis is that when a person browses manually, switching from one web page to another takes a relatively long time, generally more than 2 seconds, whereas a web crawler switches pages automatically, so its switching time is markedly shorter than a human's.
Based on the collected sequence of web page requests from Web client r to the Web server, the page-switching behavior of r is analyzed with a sequential hypothesis testing method: first, two hypotheses H0 and H1 are proposed, where H0 denotes that the operation of r is normal web browsing and H1 denotes that it is a web crawler; then the observed page-switching behavior is used to test which hypothesis holds, and when H1 is found to hold, the operation of r is judged to be a web crawler.
However, the applicant found that the above scheme, which judges whether a Web client is a web crawler by monitoring the series of web page requests it sends, has a limitation: it cannot detect a kind of stealthy CC client that is now common. Such a CC client mimics the request frequency of a normal Web client, but the pages it requests consume relatively large amounts of the Web server's computing resources (for example, full-text search pages), so that the server's computing resources are exhausted and it can no longer respond to the access requests of legitimate users.
Therefore, in addition to monitoring the time intervals of the consecutive web page requests sent by a Web client, the present application also monitors the Web server's response times for those requests and corrects the intervals between consecutive requests according to the response times, so that CC clients that attack the Web server while mimicking the access frequency of normal Web clients can be detected accurately.
Embodiment 1
Based on the above idea, this embodiment provides a method for identifying web crawlers, comprising:
computing the average response time of the Web server over all web page requests; monitoring a series of web page requests sent by a Web client to the Web server within a period of time (i.e., a set time window); measuring each adjacent-request time interval and each request's response time; correcting the adjacent-request time intervals according to the response times; judging whether each corrected interval is greater than or equal to a predetermined adjacent-request time interval threshold δ; and, according to whether the judgment results meet a preset condition, judging whether the operation of the Web client is that of a web crawler.
As shown in Fig. 1, the method specifically comprises the following steps:
A0. Compute the average response time of the Web server over all web page requests;
In this embodiment, this average can be taken as the average response time of the Web server over all web page requests within a period of time. In other application scenarios, it may also be a preset value.
A1. Collect the sequence of web page requests sent by Web client r to Web server s within a period of time;
A2. Calculate the time interval Vt_i between each pair of adjacent web page requests in this sequence and the response time μ_i of each web page request;
Specifically, for the collected web page request sequence W containing n web page requests from r to s (each element is denoted w_i, where i takes every integer from 1 to n inclusive), the time interval between each pair of adjacent requests is calculated, yielding an adjacent-request time interval sequence T with n - 1 elements (each element is denoted t_i, i.e., the interval Vt_i, where i takes every integer from 1 to n - 1 inclusive);
A3. Based on each response time μ_i and the average response time η, correct each adjacent-request time interval Vt_i to Vt'_i using the rule: if μ_i is greater than η, Vt'_i is the product of Vt_i and a penalty factor k less than 1;
A4. For each corrected interval Vt'_i, determine whether it is greater than or equal to the preset adjacent-request time interval threshold δ; if so, set the corresponding event element e_i to 0, otherwise set it to 1. The event elements of all intervals form an elementary event sequence E;
A5. Match the elementary event sequence E against hypotheses H0 and H1, where H0 denotes that the operation of the Web client is normal web browsing and H1 denotes that it is a web crawler. If the degree to which E matches H1 exceeds the degree to which E matches H0 by more than a degree threshold, the operation of the Web client is judged to be a web crawler; otherwise it is judged to be normal web browsing.
In the condition "the degree to which E matches H1 exceeds the degree to which E matches H0 by more than a degree threshold", the degree may be a probability, a similarity, and so on, and the gap may be a ratio, a difference, and so on.
In practice, web crawlers can also be identified directly from the individual judgment results. For example, the preset condition may be that the number of elements e_i equal to 0 in E is greater than the number of elements equal to 1: when the judgment results meet this condition, the operation of Web client r is judged to be that of a normal host, otherwise that of a web crawler. As another example, the preset condition may be that the ratio of the number of elements e_i equal to 0 to the total number of elements in E is less than a proportion threshold: when the judgment results meet this condition, the operation of r is judged to be that of a web crawler, otherwise that of a normal host.
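The two example preset conditions can be sketched in Python as follows. The function names and the `ratio_threshold` parameter are hypothetical, and the input is assumed to be the list of 0/1 event elements e_i of E.

```python
from typing import Sequence


def _count_zeros(events: Sequence[int]) -> int:
    """Number of long (human-like) intervals, i.e. elements e_i == 0."""
    return sum(1 for e in events if e == 0)


def majority_rule_is_crawler(events: Sequence[int]) -> bool:
    """First example condition: normal host if zeros outnumber ones,
    otherwise web crawler (so a tie counts as crawler)."""
    zeros = _count_zeros(events)
    ones = len(events) - zeros
    return zeros <= ones


def ratio_rule_is_crawler(events: Sequence[int], ratio_threshold: float) -> bool:
    """Second example condition: web crawler if the share of zeros in E
    is below ratio_threshold, otherwise normal host."""
    if len(events) == 0:
        raise ValueError("event sequence E is empty")
    return _count_zeros(events) / len(events) < ratio_threshold
```

Both rules are simpler than the hypothesis test of step A5 and never return an "undecided" result, so every sequence is classified immediately.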
In this embodiment, step A1 collects all successful web page requests from Web client r to Web server s within a period of time. A successful web page request here means: r first sends a web page request message to s, asking for a specified web page; after receiving the message, s retrieves the requested page and sends it to r. If the requested page is a dynamic web page, s must first execute the corresponding external program to produce the page requested by r.
Note that most web pages today are multimedia pages containing both text and images, so one successful web page request involves retrieving one HTML file object and several related image objects. One successful web page request therefore comprises multiple HTTP request messages and responses between r and s (some of these request messages may be sent concurrently), but only one of them is used to obtain the HTML file object.
Therefore, when collecting web page requests from r to s, the method cannot simply treat each single HTTP request and its response as one successful web page request; it must inspect the Content-Type field of the HTTP response header to determine the type of object obtained. According to the HTTP protocol specification, if an HTTP request obtains an HTML file object, the Content-Type field of the corresponding response header is "text/html". Therefore, in this embodiment, only HTTP request/response pairs whose response Content-Type is "text/html" are regarded as successful web page requests, so that fetches of image objects from s by r are not counted as web page requests.
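A minimal sketch of this filtering step. It assumes a hypothetical log record type `HttpExchange` (the field names are illustrative, not from the patent). It also assumes that a Content-Type value may carry parameters such as `; charset=utf-8`, which are stripped before comparison, and that the media type is compared case-insensitively; the patent itself only states that the value is "text/html".

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class HttpExchange:
    """One HTTP request/response pair taken from a hypothetical traffic log."""
    client: str           # e.g. address of Web client r
    server: str           # e.g. address of Web server s
    sent_at: float        # time the request was sent, in seconds
    response_time: float  # server response time for this request, in seconds
    content_type: str     # raw value of the response's Content-Type header


def page_requests(log: Iterable[HttpExchange], client: str, server: str) -> List[HttpExchange]:
    """Keep only client->server exchanges that fetched an HTML document."""
    pages = []
    for ex in log:
        if ex.client != client or ex.server != server:
            continue
        # Drop parameters such as "; charset=utf-8" and compare case-insensitively.
        media_type = ex.content_type.split(";", 1)[0].strip().lower()
        if media_type == "text/html":
            pages.append(ex)
    # Order by send time so that adjacent list elements are adjacent requests w_i, w_{i+1}.
    return sorted(pages, key=lambda ex: ex.sent_at)
```

Image, stylesheet and script fetches fall out of the sequence, so a page with many embedded objects still contributes a single element w_i.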
Suppose that in step A1, n web page requests from r to s were collected within the specified time window; they form the web page request sequence W (each element is denoted w_i, where i takes every integer from 1 to n inclusive). According to step A2, the adjacent-request time interval sequence T is then computed from W as follows: let the time at which each web page request w_i occurs be τ_i; the time interval between adjacent requests w_i and w_{i+1} is then τ_{i+1} - τ_i, so each element of T is t_i = τ_{i+1} - τ_i, where i takes every integer from 1 to n - 1 inclusive. Likewise, after the n web page requests are collected, their response times μ_i are collected as well, forming a response time sequence.
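The interval computation of step A2 can be sketched in Python as below, assuming the request times τ_i are available as timestamps in seconds, already ordered by time (the function name is hypothetical).

```python
from typing import List, Sequence


def request_intervals(times: Sequence[float]) -> List[float]:
    """Return T = [t_1, ..., t_{n-1}] with t_i = tau_{i+1} - tau_i.

    times holds the send times tau_1..tau_n of the n page requests in W.
    """
    if any(later < earlier for earlier, later in zip(times, times[1:])):
        raise ValueError("request times must be in non-decreasing order")
    return [later - earlier for earlier, later in zip(times, times[1:])]
```

For requests at 0, 2.5, 3 and 7 seconds this yields the three intervals 2.5, 0.5 and 4 seconds; a single request yields an empty T.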
In this embodiment, step A3 corrects each adjacent-request time interval Vt_i computed in step A2 based on the response time μ_i and the average response time η. Specifically, when μ_i is greater than η, the request is considered to have been answered slowly, so the interval is penalized: Vt_i is multiplied by a penalty factor k less than 1, and the product is the corrected interval Vt'_i, which is smaller than the originally computed interval. The penalty factor k may be 1 - (μ_i - η)/μ_i, where μ_i is the request's response time and η is the average response time over all web page requests.
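A minimal sketch of steps A0 and A3 together. It assumes that η is the arithmetic mean of the server's response times over the window, and it pairs interval Vt_i with response time μ_i by the same index, exactly as the text writes them; the function names are hypothetical.

```python
from statistics import fmean
from typing import List, Sequence


def average_response_time(all_response_times: Sequence[float]) -> float:
    """Step A0: eta, the mean server response time over all page requests."""
    return fmean(all_response_times)


def corrected_intervals(intervals: Sequence[float],
                        response_times: Sequence[float],
                        eta: float) -> List[float]:
    """Step A3: Vt'_i = k * Vt_i with k = 1 - (mu_i - eta) / mu_i when mu_i > eta.

    intervals[i] is Vt_i and response_times[i] is mu_i (same index, as in the text).
    """
    if len(response_times) < len(intervals):
        raise ValueError("need a response time mu_i for every interval Vt_i")
    corrected = []
    for vt, mu in zip(intervals, response_times):
        if mu > eta:
            k = 1.0 - (mu - eta) / mu  # equals eta / mu, so 0 < k < 1 for eta > 0
            corrected.append(vt * k)
        else:
            corrected.append(vt)       # not slower than average: no penalty
    return corrected
```

For instance, a hypothetical CC client that spaces its requests 4 seconds apart (enough to pass a δ = 3 s test) but whose pages take 0.5 s against a server average of 0.25 s has each interval corrected to 4 × 0.25 / 0.5 = 2 seconds, so those intervals are now counted as crawler-like.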
In this embodiment, step A4 generates the elementary event sequence E from the corrected adjacent-request interval sequence T. This requires presetting the adjacent-request time interval threshold δ, which is used to judge whether two adjacent web page requests were sent by a web crawler or by normal web browsing. δ is obtained from empirical data: observation of the intervals between adjacent web page requests in normal browsing shows that they are usually 3 to 8 seconds, whereas observation of the crawlers in common web site scanning tools shows that their adjacent-request intervals are usually less than 1 second. Therefore, when implementing the method, δ may be set to 1 second, 2 seconds, or 3 seconds.
Once δ is determined, step A4 generates E from the corrected sequence T as follows: for each element t_i of the corrected T, if t_i ≥ δ, the corresponding element of E is e_i = 0; otherwise e_i = 1.
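The mapping of step A4 is a single comparison per interval; a Python sketch, assuming the corrected intervals and δ are in seconds (the function name is hypothetical):

```python
from typing import List, Sequence


def event_sequence(corrected: Sequence[float], delta: float) -> List[int]:
    """Step A4: e_i = 0 if the corrected interval is >= delta (human-like gap), else 1."""
    if delta <= 0:
        raise ValueError("delta must be positive")
    return [0 if t >= delta else 1 for t in corrected]
```

With δ = 3 s, corrected intervals of 3, 2, 5.5 and 0.4 seconds give E = [0, 1, 0, 1]; an interval exactly equal to δ counts as human-like, matching "greater than or equal to".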
In this embodiment, step A5 analyzes the elementary event sequence E with the sequential hypothesis testing method to judge whether the operation of Web client r is that of a web crawler. As shown in Fig. 2, the specific steps are:
A51. Propose two hypotheses H0 and H1, where H0 denotes that the operation of r is normal web browsing and H1 denotes that it is a web crawler;
A52. Let the probability that the interval between two adjacent web page requests produced during normal web browsing is greater than or equal to δ be θ0, i.e., Pr[e_i = 0 | H0] = θ0, and the probability that it is less than δ be 1 - θ0, i.e., Pr[e_i = 1 | H0] = 1 - θ0; let the probability that the interval between two adjacent web page requests produced by a web crawler is greater than or equal to δ be θ1, i.e., Pr[e_i = 0 | H1] = θ1, and the probability that it is less than δ be 1 - θ1, i.e., Pr[e_i = 1 | H1] = 1 - θ1. Assume θ0 > θ1, and that the conditional random variables e_i | H_j are independent and identically distributed;
A53. Calculate the likelihood ratio V(E) of the elementary event sequence E under the two hypotheses H1 and H0:
$$V(E) = \frac{\Pr[E \mid H_1]}{\Pr[E \mid H_0]} = \prod_{i=1}^{n-1} \frac{\Pr[e_i \mid H_1]}{\Pr[e_i \mid H_0]}$$
A54. Given two fixed thresholds η0 and η1 (where η0 < η1), compare V(E) with them: if V(E) ≥ η1, the operation of Web client r is judged to be a web crawler; if V(E) ≤ η0, it is judged to be normal web browsing; if η0 < V(E) < η1, further web page requests from r to Web server s must be observed before a decision can be made: collect web page requests for another period of time, combine them with those collected earlier, and return to step A2.
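A sketch of steps A53 and A54 in Python, assuming the i.i.d. model of step A52 so that V(E) depends only on how many elements of E are 0 or 1. It works in log space to avoid underflow on long sequences and adds a small tolerance `tol` (an assumption, not from the patent) so that a sequence landing exactly on η0 or η1 is not misclassified by floating-point rounding. The function name and the returned strings are hypothetical.

```python
import math
from typing import Sequence


def sprt_decision(events: Sequence[int], theta0: float, theta1: float,
                  eta0: float, eta1: float, tol: float = 1e-9) -> str:
    """Steps A53-A54: return "crawler", "normal" or "continue".

    theta0 = Pr[e_i = 0 | H0], theta1 = Pr[e_i = 0 | H1]; eta0 < eta1.
    """
    if not 0.0 < theta1 < theta0 < 1.0:
        raise ValueError("expected 0 < theta1 < theta0 < 1")
    if not 0.0 < eta0 < eta1:
        raise ValueError("expected 0 < eta0 < eta1")
    n0 = sum(1 for e in events if e == 0)  # long intervals
    n1 = len(events) - n0                  # short intervals
    # ln V(E) under the i.i.d. model; summing logs avoids underflow/overflow.
    log_v = n0 * math.log(theta1 / theta0) + n1 * math.log((1 - theta1) / (1 - theta0))
    if log_v >= math.log(eta1) - tol:
        return "crawler"                   # V(E) >= eta1: accept H1
    if log_v <= math.log(eta0) + tol:
        return "normal"                    # V(E) <= eta0: accept H0
    return "continue"                      # collect more requests, redo from step A2
```

With θ0 = 0.6, θ1 = 0.4, η0 = (2/3)^4 and η1 = 1.5^4, four short intervals give "crawler", four long ones give "normal", and an alternating sequence such as [0, 1, 0, 1] leaves V(E) near 1 and returns "continue".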
In the present embodiment, in steps A 51, propose two hypothesis H 0, and H 1, wherein H 0represent that web client r's is operating as normal web page browsing behavior, H 1represent that web client r's is operating as spiders.Then by viewed elementary event sequence E, the present embodiment judges that the possibility which hypothesis is set up is larger.
In step A52 of the present embodiment, the assumption θ0 > θ1 means that an interval greater than or equal to δ between two adjacent web-page requests is more likely during normal web browsing than during crawling; this is the key point on which the present embodiment distinguishes normal browsing behavior from crawler behavior. The values of θ0 and θ1 can be determined from empirical values or tests, taking the size of δ into account; when δ takes different values, the values of θ0 and θ1 change accordingly.
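The text leaves the estimation of θ0 and θ1 to experience or testing. One possible approach, assumed here rather than taken from the text, is to measure corrected intervals on traffic already known to come from human browsers and from crawlers, and to take the fraction of intervals at or above δ as the estimate. The helper name `estimate_theta` and its optional smoothing parameter are hypothetical.

```python
from typing import Sequence


def estimate_theta(corrected_intervals_ms: Sequence[float], delta_ms: float,
                   pseudo_count: float = 0.0) -> float:
    """Estimate Pr[e_i = 0 | H]: the share of corrected adjacent-request
    intervals >= delta in a sample labeled with hypothesis H.

    pseudo_count > 0 adds additive (Laplace-style) smoothing so the estimate
    never reaches exactly 0 or 1, where a likelihood-ratio factor would be
    zero or undefined.
    """
    n = len(corrected_intervals_ms)
    if n == 0:
        raise ValueError("need at least one interval")
    long_gaps = sum(1 for t in corrected_intervals_ms if t >= delta_ms)
    return (long_gaps + pseudo_count) / (n + 2 * pseudo_count)


print(estimate_theta([1000, 3000, 5000, 2000, 4000], 3000))  # prints: 0.6
```

Called once on a human-labeled sample for θ0 and once on a crawler-labeled sample for θ1, with the same δ, the estimates should satisfy θ0 > θ1 for the test of step A52 to separate the two behaviors; if δ changes, both should be re-estimated.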
When step A53 calculates the likelihood ratio V(E) of the elementary event sequence E under hypotheses H0 and H1, the formula above is used. It rests mainly on the assumption that the conditional random variables e_i | H_j are independent and identically distributed, so that Pr[E | H_j] factors into the product of the individual Pr[e_i | H_j].
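Under the same i.i.d. assumption, and because each e_i takes only the values 0 and 1, the product can be written in closed form. Here n_0 and n_1 are notation introduced for convenience: the numbers of elements of E equal to 0 and to 1, with n_0 + n_1 = n - 1.

$$V(E) = \prod_{i=1}^{n-1} \frac{\Pr[e_i \mid H_1]}{\Pr[e_i \mid H_0]} = \left(\frac{\theta_1}{\theta_0}\right)^{n_0} \left(\frac{1-\theta_1}{1-\theta_0}\right)^{n_1}$$

Since θ0 > θ1, the first factor is below 1 and the second is above 1, so each interval of at least δ pulls V(E) toward H0 and each shorter interval pushes it toward H1.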
Step A54 requires two fixed thresholds η0 and η1 (where η0 < η1) to be given in advance. The lower threshold η0 is used to judge whether the behavior of Web client r is normal web browsing, and the upper threshold η1 is used to judge whether it is web-crawler behavior.
In a specific implementation, the lower threshold η0 and the upper threshold η1 can be estimated as follows. Suppose that the behavior of Web client r can be judged to be normal web browsing as soon as m consecutive web-page requests observed from Web client r to Web server s all have adjacent-request intervals greater than or equal to the interval threshold δ. Then η0 can be taken as:
$$\eta_0 = \prod_{i=1}^{m-1} \frac{\Pr[e_i = 0 \mid H_1]}{\Pr[e_i = 0 \mid H_0]} = \left(\frac{\Pr[e_i = 0 \mid H_1]}{\Pr[e_i = 0 \mid H_0]}\right)^{m-1} = \left(\frac{\theta_1}{\theta_0}\right)^{m-1}$$
Likewise, suppose that the behavior of Web client r can be judged to be that of a web crawler as soon as m consecutive web-page requests observed from Web client r to Web server s all have adjacent-request intervals less than the interval threshold δ. Then η1 can be taken as:
$$\eta_1 = \prod_{i=1}^{m-1} \frac{\Pr[e_i = 1 \mid H_1]}{\Pr[e_i = 1 \mid H_0]} = \left(\frac{\Pr[e_i = 1 \mid H_1]}{\Pr[e_i = 1 \mid H_0]}\right)^{m-1} = \left(\frac{1-\theta_1}{1-\theta_0}\right)^{m-1}$$
Here m is a positive integer whose value can be set according to actual conditions; the same or different values of m may be used when deriving η0 and η1. Alternatively, η0 and η1 can be determined directly from practical experience or tests.
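The two threshold formulas reduce to powers of θ-ratios. The sketch below assumes Python and a hypothetical function `sprt_thresholds` that takes the number of consecutive intervals directly; to follow the formulas above, pass m - 1 for m consecutive requests.

```python
from typing import Tuple


def sprt_thresholds(theta0: float, theta1: float, n_intervals: int) -> Tuple[float, float]:
    """Return (eta0, eta1) for n_intervals consecutive intervals.

    eta0 = (Pr[e_i = 0 | H1] / Pr[e_i = 0 | H0]) ** n = (theta1 / theta0) ** n
    eta1 = (Pr[e_i = 1 | H1] / Pr[e_i = 1 | H0]) ** n = ((1 - theta1) / (1 - theta0)) ** n
    For m consecutive requests the formulas in the text use n = m - 1.
    """
    if n_intervals < 1:
        raise ValueError("n_intervals must be a positive integer")
    if not 0 < theta1 < theta0 < 1:
        raise ValueError("expected 0 < theta1 < theta0 < 1")
    eta0 = (theta1 / theta0) ** n_intervals
    eta1 = ((1 - theta1) / (1 - theta0)) ** n_intervals
    return eta0, eta1


eta0, eta1 = sprt_thresholds(0.6, 0.4, 5)
print(round(eta0, 3), round(eta1, 2))  # prints: 0.132 7.59
```

With θ0 = 0.6, θ1 = 0.4 and an exponent of 5, this gives about 0.132 and 7.59, the threshold values used in the worked example that follows.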
The following concrete examples illustrate the method further:
In these examples, suppose that the adjacent-request interval threshold δ used to distinguish manual page navigation from automated page switching is 3 seconds (3000 milliseconds). Suppose that during normal web browsing the probability that the interval between two adjacent web-page requests is greater than or equal to 3 seconds is 0.6, so that the probability that it is less than 3 seconds is 0.4; and suppose that for a web crawler the probability that the interval between two adjacent web-page requests is greater than or equal to 3 seconds is 0.4, so that the probability that it is less than 3 seconds is 0.6. Suppose further that the behavior of Web client r can be judged to be normal web browsing as soon as 5 consecutive web-page requests observed from Web client r to Web server s all satisfy the condition "adjacent-request interval is greater than or equal to the interval threshold δ" (i.e. m = 5); then the threshold η0 is set to (0.4/0.6)^5 ≈ 0.132. Likewise, if the behavior of Web client r can be judged to be that of a web crawler as soon as 5 consecutive web-page requests all satisfy the condition "adjacent-request interval is less than the interval threshold δ", then the threshold η1 is set to (0.6/0.4)^5 ≈ 7.59.
For example, suppose that, following step A1 of the crawler identification method, 10 web-page requests sent by a certain CC client to the protected Web server s have been collected. The initiation times and response times of these 10 requests are shown in Table 1.
Table 1: The 10 web-page requests sent by a CC client to the protected Web server s
Suppose the measured average response time of the Web server over all web-page requests is 10 milliseconds. Following steps A2 and A3 of the crawler identification method (computing and then correcting the adjacent-request intervals), the corrected adjacent-request interval sequence T, which has 9 elements, is obtained as shown in Table 2 (assuming the correction factor k = 1 - (μ_i - η)/μ_i).
Table 2: Corrected adjacent-request intervals for the 10 web-page requests of Table 1
According to step A4 of the crawler identification method and the preset adjacent-request interval threshold δ = 3000 milliseconds, the elementary event sequence E shown in Table 3 is obtained.
Table 3: Elementary event sequence
Element number i       1  2  3  4  5  6  7  8  9
Elementary event e_i   1  1  1  1  1  1  0  1  1
According to step A5 of the crawler identification method, with the preset lower threshold η0 = 0.132 and upper threshold η1 = 7.59, first calculate the likelihood ratio of the elementary event sequence E:
V(E) = (0.6/0.4) × (0.6/0.4) × (0.6/0.4) × (0.6/0.4) × (0.6/0.4) × (0.6/0.4) × (0.4/0.6) × (0.6/0.4) × (0.6/0.4) = (0.6/0.4)^7 ≈ 17.09, which is greater than the upper threshold η1 = 7.59. Therefore the behavior of Web client r is judged to be that of a web crawler.
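The arithmetic above can be checked with a few lines of Python. The snippet only re-evaluates the product for the sequence E of Table 3 with the example's θ0 = 0.6, θ1 = 0.4 and preset thresholds; the variable and function names are illustrative.

```python
from math import prod

# Table 3: elementary event sequence E (e_i = 1 means corrected interval < 3000 ms).
E = [1, 1, 1, 1, 1, 1, 0, 1, 1]
theta0, theta1 = 0.6, 0.4   # Pr[e_i = 0 | H0], Pr[e_i = 0 | H1] from the example
eta0, eta1 = 0.132, 7.59    # preset lower and upper thresholds from the example


def ratio(e: int) -> float:
    """Per-element factor Pr[e_i | H1] / Pr[e_i | H0]."""
    return theta1 / theta0 if e == 0 else (1 - theta1) / (1 - theta0)


V = prod(ratio(e) for e in E)   # eight factors of 1.5 and one of 2/3, i.e. 1.5 ** 7
print(f"V(E) = {V:.2f}")        # prints: V(E) = 17.09
decision = "crawler" if V >= eta1 else "normal" if V <= eta0 else "continue"
print(decision)                 # prints: crawler
```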
Had the adjacent-request intervals not been corrected according to each request's response time, the traditional (existing) decision method might have judged this client's behavior to be normal web browsing.
Embodiment 2
This embodiment introduces a device for identifying web crawlers, which can implement the crawler identification method of Embodiment 1. The device comprises:
An acquiring unit, for obtaining the web-page requests sent by a Web client to a Web server within a period of time, and for computing the average response time of the Web server over all web-page requests;
A correction processing unit, for measuring each adjacent-request interval and each request's response time, correcting the adjacent-request intervals according to the response times, and judging whether each corrected adjacent-request interval is greater than or equal to a predetermined adjacent-request interval threshold δ;
A recognition unit, for judging whether the behavior of the Web client is that of a web crawler according to whether the judgment results satisfy a preset condition.
In the present embodiment, the correction processing unit specifically comprises:
A computing module, for calculating, in the web-page request sequence W obtained by the acquiring unit, the interval between each pair of adjacent requests and the response time of each request;
A correcting module, for correcting the adjacent-request intervals according to the computed average response time and each request's response time. The correction rule is: if the response time μ_i is greater than the average response time η, the corrected interval Δt'_i is the product of Δt_i and a penalty factor k less than 1;
Specifically, when the correcting module finds that a request's response time μ_i is greater than the average response time η, the response is regarded as slow. Because a slow response can lengthen the observed gap before the client's next request, the computed interval Δt_i is multiplied by a penalty factor k less than 1, and the product is taken as the corrected interval Δt'_i, which is therefore shorter than the measured interval. The penalty factor may be k = 1 - (μ_i - η)/μ_i, where μ_i is the response time of the request and η is the average response time of all requests; this simplifies to k = η/μ_i, which lies strictly between 0 and 1 whenever μ_i > η > 0.
A logging module, for judging whether each corrected interval is greater than or equal to the preset adjacent-request interval threshold δ; if so, the event element e_i corresponding to that interval is recorded as 0, otherwise as 1. The event elements e_i of all intervals form an elementary event sequence E;
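The computing, correcting and logging modules can be sketched as three small Python functions. This is an illustration under stated assumptions, not the device's actual code: the function names are hypothetical, intervals are assumed to be measured between the start times of consecutive requests, and the caller is assumed to pair each interval with one response time (for instance that of the earlier request of the pair), since the text does not fix that pairing.

```python
from typing import List, Sequence


def adjacent_intervals(start_times_ms: Sequence[float]) -> List[float]:
    """Computing module: Delta t_i between consecutive request start times (assumed)."""
    return [b - a for a, b in zip(start_times_ms, start_times_ms[1:])]


def correct_intervals(intervals_ms: Sequence[float],
                      response_times_ms: Sequence[float],
                      avg_response_ms: float) -> List[float]:
    """Correcting module: if mu_i > eta, Delta t'_i = k * Delta t_i with
    k = 1 - (mu_i - eta) / mu_i (equivalently eta / mu_i); otherwise unchanged."""
    if len(intervals_ms) != len(response_times_ms):
        raise ValueError("need exactly one response time per interval")
    corrected = []
    for dt, mu in zip(intervals_ms, response_times_ms):
        if mu > avg_response_ms:
            k = 1 - (mu - avg_response_ms) / mu  # 0 < k < 1 when mu > eta > 0
            corrected.append(dt * k)
        else:
            corrected.append(dt)
    return corrected


def encode_events(corrected_ms: Sequence[float], delta_ms: float) -> List[int]:
    """Logging module: e_i = 0 if the corrected interval >= delta, else 1."""
    return [0 if t >= delta_ms else 1 for t in corrected_ms]


# Hypothetical data: four requests, eta = 10 ms, delta = 3000 ms.
starts = [0, 4000, 8000, 9000]
mus = [40, 5, 10]                        # one response time per interval (hypothetical pairing)
dts = adjacent_intervals(starts)         # [4000, 4000, 1000]
fixed = correct_intervals(dts, mus, 10)  # first gap scaled by k = 0.25
print(encode_events(fixed, 3000))        # prints: [1, 0, 1]
```

In this toy trace the first 4-second gap follows a response four times slower than average, so it is shrunk to 1 second and counted as a short, crawler-like interval.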
That the recognition unit judges whether the behavior of the Web client is that of a web crawler according to whether the judgment results satisfy a preset condition means:
The recognition unit matches the elementary event sequence E against hypotheses H0 and H1, where H0 states that the behavior of Web client r is normal web browsing and H1 states that it is that of a web crawler. If the degree to which E matches H1 exceeds the degree to which E matches H0 by more than a degree threshold, the behavior of Web client r is judged to be that of a web crawler; otherwise it is judged to be normal web browsing.
In the present embodiment, the recognition unit comprises:
A hypothesis module, for proposing two hypotheses H0 and H1, where H0 states that the behavior of Web client r is normal web browsing and H1 states that it is that of a web crawler;
A setting module, for setting the probability Pr[e_i = 0 | H0] that the interval between two adjacent web-page requests produced during normal web browsing is greater than or equal to δ to θ0, and the probability Pr[e_i = 1 | H0] that it is less than δ to 1 - θ0; and for setting the probability Pr[e_i = 0 | H1] that the interval between two adjacent web-page requests produced by a web crawler is greater than or equal to δ to θ1, and the probability Pr[e_i = 1 | H1] that it is less than δ to 1 - θ1; where θ0 > θ1 and the conditional random variables e_i | H_j are independent and identically distributed;
A likelihood ratio computing module, for calculating the likelihood ratio V(E) of producing the elementary event sequence E under hypothesis H1 versus hypothesis H0:
$$V(E) = \frac{\Pr[E \mid H_1]}{\Pr[E \mid H_0]} = \prod_{i=1}^{n-1} \frac{\Pr[e_i \mid H_1]}{\Pr[e_i \mid H_0]}$$
A judging module, for comparing V(E) with two fixed thresholds η0 and η1, where η0 < η1: if V(E) ≥ η1, the behavior of Web client r is judged to be that of a web crawler; if V(E) ≤ η0, it is judged to be normal web browsing.
In the present embodiment, the recognition unit further comprises:
A threshold setting module, for setting the fixed thresholds η0 and η1. When m consecutive web-page requests from Web client r to Web server s all have adjacent-request intervals greater than or equal to the interval threshold δ, the threshold η0 is obtained as:
$$\eta_0 = \prod_{i=1}^{m-1} \frac{\Pr[e_i = 0 \mid H_1]}{\Pr[e_i = 0 \mid H_0]} = \left(\frac{\Pr[e_i = 0 \mid H_1]}{\Pr[e_i = 0 \mid H_0]}\right)^{m-1}$$
When m consecutive web-page requests from Web client r to Web server s all have adjacent-request intervals less than the interval threshold δ, the threshold η1 is obtained as:
$$\eta_1 = \prod_{i=1}^{m-1} \frac{\Pr[e_i = 1 \mid H_1]}{\Pr[e_i = 1 \mid H_0]} = \left(\frac{\Pr[e_i = 1 \mid H_1]}{\Pr[e_i = 1 \mid H_0]}\right)^{m-1}$$
Here m is a positive integer. The threshold setting module may obtain web-page requests through the acquiring unit, have them judged by the judging unit, and count the judgment results; if m consecutive web-page requests all have adjacent-request intervals less than (or greater than or equal to) the interval threshold δ, the threshold η1 (or η0) is calculated. The module may of course also obtain and judge web-page requests directly.
Other implementation details are the same as in Embodiment 1.
Of course, the present invention may have various other embodiments. Without departing from the spirit and essence of the present invention, those of ordinary skill in the art can make various corresponding changes and variations according to the present invention, but all such changes and variations shall fall within the protection scope of the claims of the present invention.

Claims (8)

1. A method for identifying web crawlers, characterized in that the method comprises the following steps:
A0. computing the average response time η of the Web server over all web-page requests;
A1. collecting the sequence W of web-page requests sent by a Web client to the Web server within a period of time;
A2. calculating the interval Δt_i between each pair of adjacent web-page requests in the sequence W and the response time μ_i of each web-page request;
A3. correcting each adjacent-request interval Δt_i to Δt'_i based on each response time μ_i and the average response time η, wherein the correction rule is: if the response time μ_i is greater than the average response time η, the corrected interval Δt'_i is the product of Δt_i and a penalty factor k less than 1;
A4. judging whether each corrected interval Δt'_i is greater than or equal to a preset adjacent-request interval threshold δ, and if so recording the event element e_i corresponding to that corrected interval as 0, otherwise as 1, the event elements e_i corresponding to all intervals forming an elementary event sequence E;
A5. matching the elementary event sequence E against hypotheses H0 and H1, wherein H0 states that the behavior of the Web client is normal web browsing and H1 states that the behavior of the Web client is that of a web crawler; if the degree to which E matches H1 exceeds the degree to which E matches H0 by more than a degree threshold, judging that the behavior of the Web client is that of a web crawler, and otherwise judging it to be normal web browsing.
2. the method for claim 1, is characterized in that,
The penalty factor k being less than 1 in described steps A 3 is 1-(μ i-η)/μ i.
3. The method of claim 1 or 2, characterized in that step A5 comprises:
A51. proposing two hypotheses H0 and H1, wherein H0 states that the behavior of the Web client is normal web browsing and H1 states that the behavior of the Web client is that of a web crawler;
A52. setting the probability Pr[e_i = 0 | H0] that the interval between two adjacent web-page requests produced during normal web browsing is greater than or equal to δ to θ0, and the probability Pr[e_i = 1 | H0] that it is less than δ to 1 - θ0; setting the probability Pr[e_i = 0 | H1] that the interval between two adjacent web-page requests produced by a web crawler is greater than or equal to δ to θ1, and the probability Pr[e_i = 1 | H1] that it is less than δ to 1 - θ1; wherein θ0 > θ1 and the conditional random variables e_i | H_j are independent and identically distributed;
A53. calculating the likelihood ratio V(E) of producing the elementary event sequence E under hypothesis H1 versus hypothesis H0:
$$V(E) = \frac{\Pr[E \mid H_1]}{\Pr[E \mid H_0]} = \prod_{i=1}^{n-1} \frac{\Pr[e_i \mid H_1]}{\Pr[e_i \mid H_0]}$$
A54. comparing V(E) with two fixed thresholds η0 and η1, wherein η0 < η1: if V(E) ≥ η1, judging that the behavior of the Web client is that of a web crawler; if V(E) ≤ η0, judging that the behavior of the Web client is normal web browsing.
4. The method of claim 3, characterized in that, in step A54:
when m consecutive web-page requests from the Web client to the Web server all have adjacent-request intervals greater than or equal to the interval threshold δ, the threshold η0 is obtained as:
$$\eta_0 = \prod_{i=1}^{m-1} \frac{\Pr[e_i = 0 \mid H_1]}{\Pr[e_i = 0 \mid H_0]} = \left(\frac{\Pr[e_i = 0 \mid H_1]}{\Pr[e_i = 0 \mid H_0]}\right)^{m-1}$$
when m consecutive web-page requests from the Web client to the Web server all have adjacent-request intervals less than the interval threshold δ, the threshold η1 is obtained as:
$$\eta_1 = \prod_{i=1}^{m-1} \frac{\Pr[e_i = 1 \mid H_1]}{\Pr[e_i = 1 \mid H_0]} = \left(\frac{\Pr[e_i = 1 \mid H_1]}{\Pr[e_i = 1 \mid H_0]}\right)^{m-1}$$
where m is a positive integer.
5. A device for identifying web crawlers, characterized in that it comprises:
an acquiring unit, for obtaining the sequence W of web-page requests sent by a Web client to a Web server within a period of time, and for computing the average response time η of the Web server over all web-page requests;
a correction processing unit, for measuring each adjacent-request interval and each request's response time, correcting the adjacent-request intervals according to the response times, and judging whether each corrected adjacent-request interval is greater than or equal to a predetermined adjacent-request interval threshold δ; wherein the correction processing unit comprises a computing module, a correcting module and a logging module;
the computing module, for measuring each adjacent-request interval and each request's response time in the web-page requests obtained by the acquiring unit;
the correcting module, for correcting the adjacent-request intervals according to the average response time and each request's response time, wherein the correction rule is: if the response time μ_i is greater than the average response time η, the corrected interval Δt'_i is the product of Δt_i and a penalty factor k less than 1;
the logging module, for judging whether each interval is greater than or equal to the preset adjacent-request interval threshold δ, and if so recording the event element e_i corresponding to that interval as 0, otherwise as 1, thereby obtaining an elementary event sequence E comprising the event elements e_i corresponding to all intervals;
a recognition unit, for matching the elementary event sequence E against hypotheses H0 and H1, wherein H0 states that the behavior of the Web client is normal web browsing and H1 states that the behavior of the Web client is that of a web crawler; if the degree to which E matches H1 exceeds the degree to which E matches H0 by more than a degree threshold, judging that the behavior of Web client r is that of a web crawler, and otherwise judging it to be normal web browsing.
6. The device of claim 5, characterized in that
the penalty factor k less than 1 is k = 1 - (μ_i - η)/μ_i.
7. The device of claim 5 or 6, characterized in that the recognition unit comprises:
a hypothesis module, for proposing two hypotheses H0 and H1, wherein H0 states that the behavior of Web client r is normal web browsing and H1 states that the behavior of Web client r is that of a web crawler;
a setting module, for setting the probability Pr[e_i = 0 | H0] that the interval between two adjacent web-page requests produced during normal web browsing is greater than or equal to δ to θ0, and the probability Pr[e_i = 1 | H0] that it is less than δ to 1 - θ0, and for setting the probability Pr[e_i = 0 | H1] that the interval between two adjacent web-page requests produced by a web crawler is greater than or equal to δ to θ1, and the probability Pr[e_i = 1 | H1] that it is less than δ to 1 - θ1; wherein θ0 > θ1 and the conditional random variables e_i | H_j are independent and identically distributed;
a likelihood ratio computing module, for calculating the likelihood ratio V(E) of producing the elementary event sequence E under hypothesis H1 versus hypothesis H0:
$$V(E) = \frac{\Pr[E \mid H_1]}{\Pr[E \mid H_0]} = \prod_{i=1}^{n-1} \frac{\Pr[e_i \mid H_1]}{\Pr[e_i \mid H_0]}$$
a judging module, for comparing V(E) with two fixed thresholds η0 and η1, wherein η0 < η1: if V(E) ≥ η1, judging that the behavior of Web client r is that of a web crawler; if V(E) ≤ η0, judging that the behavior of Web client r is normal web browsing.
8. The device of claim 7, characterized in that the recognition unit further comprises:
a threshold setting module, for setting the fixed thresholds η0 and η1; when m consecutive web-page requests from Web client r to Web server s all have adjacent-request intervals greater than or equal to the interval threshold δ, the threshold η0 is obtained as:
$$\eta_0 = \prod_{i=1}^{m-1} \frac{\Pr[e_i = 0 \mid H_1]}{\Pr[e_i = 0 \mid H_0]} = \left(\frac{\Pr[e_i = 0 \mid H_1]}{\Pr[e_i = 0 \mid H_0]}\right)^{m-1}$$
when m consecutive web-page requests from Web client r to Web server s all have adjacent-request intervals less than the interval threshold δ, the threshold η1 is obtained as:
$$\eta_1 = \prod_{i=1}^{m-1} \frac{\Pr[e_i = 1 \mid H_1]}{\Pr[e_i = 1 \mid H_0]} = \left(\frac{\Pr[e_i = 1 \mid H_1]}{\Pr[e_i = 1 \mid H_0]}\right)^{m-1}$$
where m is a positive integer.
CN201110130432.0A 2011-05-19 2011-05-19 Method and device for recognizing webpage crawler Expired - Fee Related CN102790700B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110130432.0A CN102790700B (en) 2011-05-19 2011-05-19 Method and device for recognizing webpage crawler

Publications (2)

Publication Number Publication Date
CN102790700A CN102790700A (en) 2012-11-21
CN102790700B true CN102790700B (en) 2015-06-10

Family

ID=47156007

Country Status (1)

Country Link
CN (1) CN102790700B (en)

Families Citing this family (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103036746B (en) * 2012-12-21 2015-07-08 中国科学院计算技术研究所 Passive measurement method and passive measurement system of web page responding time based on network intermediate point
CN103902912B (en) * 2012-12-26 2017-09-19 深圳市腾讯计算机系统有限公司 The detection method and device of webpage leak
CN103051722B (en) * 2012-12-26 2015-10-14 新浪网技术(中国)有限公司 A kind ofly determine the method whether page is held as a hostage and relevant device
CN104113525A (en) * 2014-05-23 2014-10-22 中国电子技术标准化研究院 Method and apparatus for defending resource consumption type Web attacks
CN104320400B (en) * 2014-10-31 2017-10-03 北京神州绿盟信息安全科技股份有限公司 Web vulnerability scanning method and devices
CN106156055B (en) * 2015-03-27 2019-10-15 阿里巴巴集团控股有限公司 The identification of search engine crawler, processing method and processing device
CN106202077B (en) * 2015-04-30 2020-01-21 华为技术有限公司 Task distribution method and device
CN106294368B (en) * 2015-05-15 2019-11-05 阿里巴巴集团控股有限公司 Web spider identification method and device
CN106294364B (en) * 2015-05-15 2020-04-10 阿里巴巴集团控股有限公司 Method and device for realizing web crawler to capture webpage
CN106936778B (en) * 2015-12-29 2020-05-05 北京国双科技有限公司 Method and device for detecting abnormal website traffic
CN106021552A (en) * 2016-05-30 2016-10-12 深圳市华傲数据技术有限公司 Internet creeper concurrency data collection method and system based on crowd behavior simulation
CN106027564B (en) * 2016-07-08 2019-05-21 携程计算机技术(上海)有限公司 Detect the method and device of anti-crawler security policy
CN106534062B (en) * 2016-09-23 2019-05-10 南京途牛科技有限公司 A kind of method of anti-crawler
CN108429721B (en) * 2017-02-15 2020-08-04 腾讯科技(深圳)有限公司 Identification method and device for web crawler
CN107147640B (en) * 2017-05-09 2019-12-31 网宿科技股份有限公司 Method and system for identifying web crawler
CN107395564A (en) * 2017-06-15 2017-11-24 公安部交通管理科学研究所 Internet preselects the anti-snatch method and system of automotive number plate
CN109145185B (en) * 2018-02-02 2019-07-02 北京数安鑫云信息技术有限公司 It identifies web crawlers and extracts the method and device of web crawlers feature
CN109067780B (en) * 2018-09-17 2023-02-28 平安科技(深圳)有限公司 Crawler user detection method and device, computer equipment and storage medium
CN109189660A (en) * 2018-09-30 2019-01-11 北京诸葛找房信息技术有限公司 A kind of crawler recognition methods based on user's mouse interbehavior
CN111355728B (en) * 2020-02-27 2023-01-03 紫光云技术有限公司 Malicious crawler protection method
CN111431942B (en) * 2020-06-10 2020-09-15 杭州圆石网络安全技术有限公司 CC attack detection method and device and network equipment

Citations (4)

Publication number Priority date Publication date Assignee Title
CN101630327A (en) * 2009-08-14 2010-01-20 昆明理工大学 Design method of theme network crawler system
CN101739427A (en) * 2008-11-10 2010-06-16 中国移动通信集团公司 Crawler capturing method and device thereof
CN101902438A (en) * 2009-05-25 2010-12-01 北京启明星辰信息技术股份有限公司 Method and device for automatically identifying web crawlers
CN101969445A (en) * 2010-11-03 2011-02-09 中国电信股份有限公司 Method and device for defensing DDoS (Distributed Denial of Service) and CC (Connections Flood) attacks

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US8433785B2 (en) * 2008-09-16 2013-04-30 Yahoo! Inc. System and method for detecting internet bots

Also Published As

Publication number Publication date
CN102790700A (en) 2012-11-21

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20150610

Termination date: 20200519
