CN109714313A - The method of anti-crawler - Google Patents

The method of anti-crawler

Info

Publication number
CN109714313A
Authority
CN
China
Prior art keywords
request
url
crawler
built
hiding
Legal status
Pending
Application number
CN201811381554.5A
Other languages
Chinese (zh)
Inventor
赵俊池
陈四强
刘天翔
Current Assignee
Yuanjiang Shengbang (Beijing) Network Security Polytron Technologies Inc
Original Assignee
Yuanjiang Shengbang (Beijing) Network Security Polytron Technologies Inc
Priority date
2018-11-20
Filing date
2018-11-20
Publication date
2019-05-03
2018-11-20 Application filed by Yuanjiang Shengbang (Beijing) Network Security Polytron Technologies Inc
2018-11-20 Priority to CN201811381554.5A
2019-05-03 Publication of CN109714313A
Status: Pending

Landscapes

  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention proposes an anti-crawler method comprising: step S1, judging whether the source IP address of a received request is in a preset blacklist, and if so executing step S4, otherwise executing step S2; step S2, judging the URL link of the received request, and if it is the built-in hidden URL, adding the source IP address of the request to the preset blacklist and executing step S4, otherwise executing step S3; step S3, adding a hidden URL link to the response body returned for the received request and executing step S5; step S4, intercepting the request; step S5, allowing the request. The invention judges whether a request comes from a crawler by the URL that the request accesses, which fundamentally avoids false positives and solves the problems caused in the prior art by intercepting crawlers based on access frequency.

Description

The method of anti-crawler
Technical field
The present invention relates to the field of network security technology, and in particular to an anti-crawler method.
Background art
Existing anti-crawler methods mainly judge whether a user's requests are legitimate from the user's access frequency. Once the access frequency exceeds a set threshold, malicious crawlers are stopped either by temporarily preventing the user from accessing the site or by requiring the user to enter a verification code.
In these prior-art methods, if the access-frequency threshold is set too high, protection is insufficient: a malicious crawler can keep crawling site information as long as it stays within the threshold. If the threshold is set too low, normal user access to the website may be affected.
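The frequency-based prior art criticized above can be sketched as a per-IP counter over fixed time windows. This illustrates the prior art only, not the invention; the class name, the 60-second window and the example thresholds are hypothetical.

```python
import time
from collections import defaultdict


class FixedWindowLimiter:
    """Prior-art style limiter: count requests per IP in fixed time windows.

    Hypothetical sketch; threshold and window length are illustrative only.
    Old windows are never evicted here, which a real deployment would need.
    """

    def __init__(self, threshold, window_seconds=60.0, clock=time.monotonic):
        self.threshold = threshold
        self.window = window_seconds
        self.clock = clock
        self.counts = defaultdict(int)  # (ip, window index) -> requests seen

    def allow(self, ip):
        window_index = int(self.clock() // self.window)
        self.counts[(ip, window_index)] += 1
        # Block once the per-window count exceeds the threshold.
        return self.counts[(ip, window_index)] <= self.threshold
```

With a threshold of 100 requests per minute, a crawler that paces itself at 90 per minute is never blocked; with a threshold of 5, a person who opens six pages within one minute is. This is the trade-off the background describes.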
Therefore, a new anti-crawler method is needed to provide better protection against crawlers.
Summary of the invention
The present invention aims to solve at least one of the technical deficiencies described above.
To this end, an object of the present invention is to propose an anti-crawler method.
To achieve the above object, an embodiment of the present invention provides an anti-crawler method comprising the following steps:
Step S1: judge whether the source IP address of a received request is in a preset blacklist; if so, execute step S4; otherwise, execute step S2;
Step S2: judge the URL link of the received request; if it is the built-in hidden URL, add the source IP address of the request to the preset blacklist and execute step S4; if it is not the built-in hidden URL, execute step S3;
Step S3: add a hidden URL link to the response body returned for the received request, and execute step S5;
Step S4: intercept the request;
Step S5: allow the request.
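The branching of steps S1 to S5 can be traced with a small function that returns the steps a request passes through. This is a sketch under assumptions: the function name, the use of a Python set as the blacklist, and the example IP addresses are hypothetical, and the /misc.php path is borrowed from the forum example later in the description.

```python
def trace_steps(source_ip, url, blacklist, hidden_url):
    """Return the path a request takes through steps S1-S5 of Fig. 1.

    `blacklist` is any mutable set-like object and `hidden_url` is the
    current built-in hidden URL; both are assumptions of this sketch.
    """
    steps = ["S1"]                      # S1: is the source IP blacklisted?
    if source_ip in blacklist:
        return steps + ["S4"]           # yes -> S4: intercept
    steps.append("S2")                  # S2: is this the hidden URL?
    if url == hidden_url:
        blacklist.add(source_ip)        # record the crawler's IP
        return steps + ["S4"]           # -> S4: intercept
    # S3: the caller adds the hidden link to the response body; then S5: allow.
    return steps + ["S3", "S5"]
```

Usage: a first request for an ordinary page follows S1, S2, S3, S5; a request for the hidden URL follows S1, S2, S4 and blacklists the IP; any later request from that IP stops at S1, S4.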
Further, each IP address recorded in the preset blacklist is given an expiry time. Once the set expiry time has passed, the record of that IP address is automatically deleted from the blacklist, and the IP address can access the website normally again.
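The expiry rule can be sketched as a blacklist that stores an expiry timestamp per IP and deletes a record lazily, the first time it is looked up after expiring. The description only requires that expired records be deleted automatically; lazy deletion, the class name and the injectable clock are choices of this sketch.

```python
import time


class ExpiringBlacklist:
    """Blacklist whose IP records expire automatically after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._expires = {}  # IP address -> expiry timestamp

    def add(self, ip):
        # (Re)list the IP; any earlier expiry is replaced.
        self._expires[ip] = self.clock() + self.ttl

    def __contains__(self, ip):
        expiry = self._expires.get(ip)
        if expiry is None:
            return False
        if self.clock() >= expiry:
            del self._expires[ip]  # expired: delete the record, IP may access again
            return False
        return True
```

Calling add again for an IP that is already listed pushes its expiry forward. A deployment might instead use a data store with built-in key expiry; the description does not specify the storage.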
Further, in step S2, the built-in hidden URL has the format <a href="URL_LINK"></a>.
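A minimal sketch of generating that markup and placing it in an HTML response body follows. Placing the anchor just before the closing </body> tag, escaping the URL, and the function name are assumptions; the description only says the link is added to the response body.

```python
import html


def inject_hidden_link(body: str, hidden_url: str) -> str:
    """Insert an empty anchor (no link text, so nothing visible) into an HTML body.

    The anchor goes just before the last "</body>" if one is present
    (case-sensitive match in this sketch); otherwise it is appended.
    """
    anchor = f'<a href="{html.escape(hidden_url, quote=True)}"></a>'
    head, sep, tail = body.rpartition("</body>")
    if not sep:  # no closing body tag found
        return body + anchor
    return head + anchor + sep + tail
```

For example, inject_hidden_link("<html><body><p>x</p></body></html>", "/misc.php") returns '<html><body><p>x</p><a href="/misc.php"></a></body></html>'.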
Further, the built-in hidden URL is invisible in a browser, so a user accessing the website normally will not click the URL link. If the URL link is requested, it is judged that the request is not a manual operation but a crawler crawling the website.
Further, the built-in hidden URL is updated periodically.
In the anti-crawler method according to embodiments of the present invention, whether a request comes from a crawler is judged by the URL that the request accesses. This fundamentally avoids false positives and solves the problems caused in the prior art by intercepting crawlers based on access frequency.
Additional aspects and advantages of the present invention will be given in part in the following description, will in part become apparent from it, or will be learned through practice of the invention.
Description of the drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of the embodiments in conjunction with the accompanying drawings, in which:
Fig. 1 is the flow chart according to the anti-crawler method of the embodiment of the present invention.
Detailed description of the embodiments
Embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals denote, throughout, the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary, are intended to explain the present invention, and are not to be construed as limiting it.
As shown in Fig. 1, the anti-crawler method of the embodiment of the present invention comprises the following steps:
Step S1: judge whether the source IP address of the received request is in the preset blacklist; if so, execute step S4; otherwise, execute step S2.
In one embodiment of the invention, each IP address recorded in the preset blacklist is given an expiry time. Once the set expiry time has passed, the record of that IP address is automatically deleted from the blacklist, and the IP address can access the website normally again.
Step S2: judge the URL link of the received request; if it is the built-in hidden URL, add the source IP address of the request to the preset blacklist and execute step S4; if it is not the built-in hidden URL, execute step S3.
In step S2, the built-in hidden URL is a URL link that is invisible in the browser, with the format <a href="URL_LINK"></a>.
Specifically, a link in the form of the built-in hidden URL is invisible in a browser, so a user accessing the website normally will not click it. A crawler, by contrast, traverses the website by following the links in <a> tags. Therefore, if the URL link is requested, it can be determined that the request is not a manual operation but a crawler crawling the website.
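Why a crawler reaches the hidden URL while a person does not can be seen by parsing an injected page the way a simple link-following crawler might. The sketch below uses Python's standard html.parser to record each anchor's href together with its visible text; the page content and the LinkCollector name are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect [href, visible text] pairs, as a naive link-following crawler might."""

    def __init__(self):
        super().__init__()
        self.links = []       # list of [href, link text]
        self._current = None  # index of the open <a>, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append([href, ""])
                self._current = len(self.links) - 1

    def handle_endtag(self, tag):
        if tag == "a":
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self.links[self._current][1] += data


page = '<p>Welcome</p><a href="/thread/1">Thread 1</a><a href="/misc.php"></a>'
collector = LinkCollector()
collector.feed(page)
collector.close()
print(collector.links)  # [['/thread/1', 'Thread 1'], ['/misc.php', '']]
```

The hidden link is collected like any other href, but it carries no text, so the rendered page offers nothing for a person to click.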
It should be noted that the built-in hidden URL should avoid conspicuous or unusual names as far as possible, so that it is not discovered by the crawler's author.
In one embodiment of the invention, the built-in hidden URL is updated periodically. This prevents the method from being defeated when a crawler author discovers the hidden URL and updates the crawler script to avoid it.
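Inconspicuous naming and periodic updating can be combined by deriving the hidden URL from a server-side secret and the current time period, so every server computes the same path without shared state and the path changes on schedule. HMAC-SHA256, the stem list, the .php suffix, the 24-hour default period and the acceptance of the previous period's URL are all assumptions of this sketch; the description only says the URL should avoid conspicuous names and be updated periodically.

```python
import hashlib
import hmac
import time

# Hypothetical pool of ordinary-looking path stems, so the trap does not stand out.
STEMS = ("misc", "archive", "help", "member", "static/page")


def trap_path(secret: bytes, period: int) -> str:
    """Derive the hidden URL for a given rotation period from a server-side secret."""
    digest = hmac.new(secret, str(period).encode(), hashlib.sha256).hexdigest()
    stem = STEMS[int(digest[:8], 16) % len(STEMS)]
    return f"/{stem}/{digest[8:16]}.php"


class RotatingTrap:
    def __init__(self, secret: bytes, period_seconds: float = 24 * 3600, clock=time.time):
        self.secret = secret
        self.period = period_seconds
        self.clock = clock

    def _period(self) -> int:
        return int(self.clock() // self.period)

    def current(self) -> str:
        """Hidden URL to embed in responses right now (step S3)."""
        return trap_path(self.secret, self._period())

    def is_trap(self, path: str) -> bool:
        """Step S2 check; also accepts the previous period's URL, because pages
        served just before a rotation still contain it."""
        p = self._period()
        return path in (trap_path(self.secret, p), trap_path(self.secret, p - 1))
```

Accepting the previous period's path means a crawler that fetched a page shortly before a rotation is still caught when it follows the old link. Under the description's premise that people never request the hidden link, this grace window should not add false positives.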
Step S3: add a hidden URL link to the response body returned for the received request, and execute step S5;
Step S4: intercept the request;
Step S5: allow the request.
The technical solution of the present invention is described below, taking a forum website as an example:
When a request arrives at the forum's web interface, the following method can be performed:
S1: Judge the source IP of the received request: if it is in the blacklist, proceed directly to step S4; otherwise, proceed to step S2;
S2: Assuming the forum website's current built-in hidden URL is /misc.php, judge the URL of the received request: if it is /misc.php, add the source IP to the blacklist and proceed to step S4; otherwise, proceed to step S3;
S3: For the received request, add <a href="/misc.php"></a> to the returned response body, then proceed to step S5;
S4: Intercept the request;
S5: Allow the request.
Further, an expiry time can be set for each record in the IP blacklist. After the set expiry time, the record of the IP address is automatically deleted from the blacklist, and the IP address can access the website normally again. For example, with the automatic expiry time set to 2 hours, once a request is judged to come from a crawler, all requests that IP sends to the forum website within the next 2 hours are intercepted; after 2 hours, the IP can again send normal requests to the forum website.
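To tie the forum example together, including the 2-hour expiry, the steps can be sketched as WSGI middleware wrapped around the forum application. This is an illustration under assumptions: the class name, the 403 response, injecting only into text/html responses, and reading the client address from REMOTE_ADDR are choices of this sketch (behind a reverse proxy, REMOTE_ADDR holds the proxy's address).

```python
import time


class AntiCrawlerMiddleware:
    """Hypothetical WSGI middleware applying steps S1-S5 for the forum example."""

    def __init__(self, app, hidden_url="/misc.php", ttl_seconds=2 * 3600, clock=time.time):
        self.app = app
        self.hidden_url = hidden_url
        self.ttl = ttl_seconds
        self.clock = clock
        self.blacklist = {}  # source IP -> expiry timestamp

    def _blocked(self, ip):
        expiry = self.blacklist.get(ip)
        if expiry is not None and self.clock() >= expiry:
            del self.blacklist[ip]  # record expired: the IP may access the site again
            expiry = None
        return expiry is not None

    @staticmethod
    def _forbid(start_response):
        start_response("403 Forbidden", [("Content-Type", "text/plain"), ("Content-Length", "9")])
        return [b"Forbidden"]

    def __call__(self, environ, start_response):
        ip = environ.get("REMOTE_ADDR", "")
        if self._blocked(ip):                                  # S1 -> S4
            return self._forbid(start_response)
        if environ.get("PATH_INFO", "") == self.hidden_url:    # S2 -> S4
            self.blacklist[ip] = self.clock() + self.ttl
            return self._forbid(start_response)

        captured, chunks = {}, []

        def capture(status, headers, exc_info=None):
            captured["status"], captured["headers"] = status, headers
            return chunks.append  # WSGI write() callable

        result = self.app(environ, capture)
        try:
            chunks.extend(result)  # consuming the iterable also triggers capture()
        finally:
            if hasattr(result, "close"):
                result.close()
        body = b"".join(chunks)
        headers = [(k, v) for k, v in captured["headers"] if k.lower() != "content-length"]
        if any(k.lower() == "content-type" and v.startswith("text/html") for k, v in headers):
            body += f'<a href="{self.hidden_url}"></a>'.encode()  # S3
        headers.append(("Content-Length", str(len(body))))
        start_response(captured["status"], headers)
        return [body]                                          # S5
```

Buffering the whole response keeps the sketch simple and lets Content-Length be recomputed after injection; a streaming deployment would need a different approach.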
In the anti-crawler method according to embodiments of the present invention, whether a request comes from a crawler is judged by the URL that the request accesses. This fundamentally avoids false positives and solves the problems caused in the prior art by intercepting crawlers based on access frequency.
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "example", "specific example", "some examples" and the like means that specific features, structures, materials or characteristics described in connection with that embodiment or example are included in at least one embodiment or example of the present invention. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present invention have been shown and described above, it is to be understood that they are exemplary and are not to be construed as limiting the invention. Those of ordinary skill in the art may make changes, modifications, substitutions and variations to the above embodiments within the scope of the invention without departing from its principles and spirit. The scope of the invention is defined by the appended claims and their equivalents.

Claims (5)

1. An anti-crawler method, characterized by comprising the following steps:
Step S1: judging whether the source IP address of a received request is in a preset blacklist; if so, executing step S4; otherwise, executing step S2;
Step S2: judging the URL link of the received request; if it is the built-in hidden URL, adding the source IP address of the request to the preset blacklist and executing step S4; if it is not the built-in hidden URL, executing step S3;
Step S3: adding a hidden URL link to the response body returned for the received request, and executing step S5;
Step S4: intercepting the request;
Step S5: allowing the request.
2. The anti-crawler method according to claim 1, characterized in that each IP address recorded in the preset blacklist is given an expiry time, and after the set expiry time has passed, the record of the IP address is automatically deleted from the blacklist so that the IP address can access the website normally again.
3. The anti-crawler method according to claim 1, characterized in that, in step S2, the format of the built-in hidden URL is: <a href="URL_LINK"></a>.
4. The anti-crawler method according to claim 3, characterized in that the built-in hidden URL is invisible in a browser, so a user accessing the website normally will not click the URL link; if the URL link is requested, it is judged that the request is not a manual operation but a crawler crawling the website.
5. The anti-crawler method according to claim 3 or 4, characterized in that the built-in hidden URL is updated periodically.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811381554.5A CN109714313A (en) 2018-11-20 2018-11-20 The method of anti-crawler

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811381554.5A CN109714313A (en) 2018-11-20 2018-11-20 The method of anti-crawler

Publications (1)

Publication Number Publication Date
CN109714313A (en) 2019-05-03

Family

ID=66254954

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811381554.5A Pending CN109714313A (en) 2018-11-20 2018-11-20 The method of anti-crawler

Country Status (1)

Country Link
CN (1) CN109714313A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110351248A (en) * 2019-06-14 2019-10-18 北京纵横无双科技有限公司 A kind of safety protecting method and device based on intellectual analysis and intelligent current limliting
CN111355728A (en) * 2020-02-27 2020-06-30 紫光云技术有限公司 Malicious crawler protection method
CN111614652A (en) * 2020-05-15 2020-09-01 广东科徕尼智能科技有限公司 Crawler identification interception method, equipment and storage medium

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1874303A (en) * 2006-03-04 2006-12-06 华为技术有限公司 Method for implementing black sheet
CN101188612A (en) * 2007-12-10 2008-05-28 中兴通讯股份有限公司 A blacklist real time management method and device
CN101227467A (en) * 2008-01-08 2008-07-23 中兴通讯股份有限公司 Apparatus and method for managing black list
CN102413105A (en) * 2010-09-25 2012-04-11 杭州华三通信技术有限公司 Method and device for preventing attack of challenge collapsar (CC)
CN103279516A (en) * 2013-05-27 2013-09-04 百度在线网络技术(北京)有限公司 Web spider identification method
CN103475637A (en) * 2013-04-24 2013-12-25 携程计算机技术(上海)有限公司 Network access control method and system based on IP access behaviors
CN103825900A (en) * 2014-02-28 2014-05-28 广州云宏信息科技有限公司 Website access method and device and filter form downloading and updating method and system
US9049117B1 (en) * 2009-10-21 2015-06-02 Narus, Inc. System and method for collecting and processing information of an internet user via IP-web correlation
CN105827619A (en) * 2016-04-25 2016-08-03 无锡中科富农物联科技有限公司 Crawler blocking method under large visitor volume condition

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Li Min et al., Network Security Technology and Examples (《网络安全技术与实例》), Shanghai: Fudan University Press, 31 August 2013 *

Similar Documents

Publication Publication Date Title
CN103607385B (en) Method and apparatus for security detection based on browser
KR100684986B1 (en) Online dangerous information screening system and method
CN109714313A (en) The method of anti-crawler
CN102546576B (en) A kind of web page horse hanging detects and means of defence, system and respective code extracting method
JP6408395B2 (en) Blacklist management method
CN104601540B (en) A kind of cross site scripting XSS attack defence method and Web server
CN105306465B (en) Web portal security accesses implementation method and device
CN109688097A (en) Website protection method, website protective device, website safeguard and storage medium
CN105827608B (en) Distributed API service abnormal user identifying and analyzing method and reverse proxy gateway
CN104219316A (en) Method and device for processing call request in distributed system
KR101907392B1 (en) Method and system for inspecting malicious link addree listed on email
CN104182685B (en) A kind of XSS defence methods and component for JAVA WEB applications
CN103023905B (en) A kind of equipment, method and system for detection of malicious link
CN104468546B (en) A kind of web information processing method and firewall device, system
CN104378255B (en) The detection method and device of web malicious users
JP2008116998A (en) Terminal device management system, data relay device, inter-network connection device, and method for quarantining terminal device
CN107911355A (en) A kind of website back door based on attack chain utilizes event recognition method
US20080235800A1 (en) Systems And Methods For Determining Anti-Virus Protection Status
KR100961149B1 (en) Method for detecting malicious site, method for gathering information of malicious site, apparatus, system, and recording medium having computer program recorded
JP5805585B2 (en) Relay server and proxy access method
CN110998577A (en) Safety diagnosis device and safety diagnosis method
CN107992745A (en) Kidnap countermeasure in a kind of interface based on Android platform
KR101372906B1 (en) Method and system to prevent malware code
CN102404331A (en) Method for judging whether website is maliciously tampered
KR101234066B1 (en) Web / email for distributing malicious code through the automatic control system and how to manage them

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190503