CN109714313A - The method of anti-crawler - Google Patents
- Publication number
- CN109714313A (application number CN201811381554.5A)
- Authority
- CN
- China
- Prior art keywords
- request
- url
- crawler
- built
- hiding
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The invention proposes an anti-crawler method comprising: step S1, determining whether the source IP address of a received request is in a preset blacklist, and if so executing step S4, otherwise executing step S2; step S2, checking the URL of the received request, and if it is the built-in hidden URL, adding the source IP address of the request to the preset blacklist and executing step S4, otherwise executing step S3; step S3, executing step S5 for the received request and adding a hidden URL link to the returned response body; step S4, intercepting the request; step S5, allowing the request. Because the invention decides whether a request comes from a crawler by the URL it requests, it fundamentally avoids false positives and solves the problems caused in the prior art by intercepting crawlers based on access frequency.
Description
Technical field
The present invention relates to the technical field of network security, and in particular to an anti-crawler method.
Background
Prior-art anti-crawler methods mainly judge whether a user's requests are legitimate by the user's access frequency: once the access frequency exceeds a configured threshold, malicious crawlers are stopped by temporarily blocking the user or by requiring the user to enter a verification code (CAPTCHA).
In these prior-art methods, if the access-frequency threshold is set too high, protection is insufficient, because a malicious crawler can keep crawling site information while staying under the threshold; if the threshold is set too low, normal user access to the website may be affected.
A new anti-crawler method is therefore needed to provide better protection against crawlers.
Summary of the invention
The present invention aims to solve at least one of the technical deficiencies described above.
To this end, an object of the invention is to propose an anti-crawler method.
To achieve the above object, an embodiment of the present invention provides an anti-crawler method comprising the following steps:
Step S1: determine whether the source IP address of the received request is in the preset blacklist; if so, execute step S4, otherwise execute step S2;
Step S2: check the URL of the received request; if it is the built-in hidden URL, add the source IP address of the request to the preset blacklist and execute step S4; otherwise execute step S3;
Step S3: execute step S5 for the received request, and add a hidden URL link to the returned response body;
Step S4: intercept the request;
Step S5: allow the request.
Further, each IP address recorded in the preset blacklist has an expiration time; once the expiration time has passed, the record for that IP address is automatically deleted from the blacklist, so that the IP address can access the website normally again.
Further, in step S2, the format of the built-in hidden URL is: <a href="URL_LINK"></a>.
Further, the built-in hidden URL is invisible in a browser, so a user browsing the website normally will not click it; if the URL is requested, the request is judged to come not from a human but from a crawler crawling the website.
Further, the built-in hidden URL is updated periodically.
The anti-crawler method according to the embodiments of the present invention decides whether a request comes from a crawler by the URL it requests, which fundamentally avoids false positives and solves the problems caused in the prior art by intercepting crawlers based on access frequency.
Additional aspects and advantages of the invention will be set forth in part in the description that follows, and in part will become apparent from that description or be learned through practice of the invention.
Brief description of the drawings
The above and/or additional aspects and advantages of the invention will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a flowchart of the anti-crawler method according to an embodiment of the present invention.
Detailed description of the embodiments
Embodiments of the present invention are described in detail below, with examples shown in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar elements, or elements with the same or similar functions. The embodiments described below with reference to the drawings are exemplary and intended to explain the invention; they are not to be construed as limiting it.
As shown in Fig. 1, the anti-crawler method of the embodiment of the present invention comprises the following steps:
Step S1: determine whether the source IP address of the received request is in the preset blacklist; if so, execute step S4, otherwise execute step S2.
In one embodiment of the invention, each IP address recorded in the preset blacklist has an expiration time; once the expiration time has passed, the record for that IP address is automatically deleted from the blacklist, so that the IP address can access the website normally again.
Step S2: check the URL of the received request; if it is the built-in hidden URL, add the source IP address of the request to the preset blacklist and execute step S4; otherwise execute step S3.
In step S2, the built-in hidden URL is a URL link that is invisible in the browser, with the format: <a href="URL_LINK"></a>.
Specifically, a link in the built-in hidden URL form is invisible in the browser because the anchor element has no text content, so a user browsing the website normally will not click it. A crawler, by contrast, traverses the website by following the links in <a> tags. Therefore, if the URL is requested, the request can be determined to come not from a human but from a crawler crawling the website.
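The link format above can be illustrated with a minimal Python sketch that adds an empty anchor to an HTML page. The function name `inject_hidden_link` and the choice to place the link just before the closing `</body>` tag are assumptions made for illustration; the patent only states that the link is added to the returned response body.

```python
import re
from html import escape

# Matches a closing body tag in any letter case.
_BODY_CLOSE = re.compile(r"</body\s*>", re.IGNORECASE)


def inject_hidden_link(page: str, hidden_url: str) -> str:
    """Return the page with an empty anchor pointing at hidden_url added.

    The anchor has no text content, so a browser displays nothing for it.
    Placement before </body> is an assumption, not part of the patent.
    """
    link = f'<a href="{escape(hidden_url, quote=True)}"></a>'
    matches = list(_BODY_CLOSE.finditer(page))
    if not matches:
        # No closing body tag found: append the link at the end.
        return page + link
    pos = matches[-1].start()
    return page[:pos] + link + page[pos:]
```

A crawler that extracts every `href` from the page will find the injected link, while the rendered page shows no text for it.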
Note that the built-in hidden URL should be given an inconspicuous name; distinctive names should be avoided as far as possible so that crawler authors do not notice it.
In one embodiment of the invention, the built-in hidden URL is updated periodically. This prevents the method from being defeated when a crawler author discovers the hidden URL and updates the crawler script to avoid it.
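The patent does not say how the periodic update is carried out. One possible approach, sketched below under stated assumptions, derives the hidden URL from a secret key and the current time window, and picks from a list of ordinary-looking names so the path does not stand out. `SECRET`, `ROTATION_SECONDS`, `ORDINARY_NAMES`, and both function names are hypothetical.

```python
import hashlib
import hmac

# Hypothetical values chosen for illustration; none come from the patent.
SECRET = b"change-me"
ROTATION_SECONDS = 24 * 60 * 60  # rotate once a day
ORDINARY_NAMES = ["misc", "archive", "topic", "member", "notice"]


def hidden_url_for(now: float, offset: int = 0) -> str:
    """Derive the hidden URL for the time window containing `now`."""
    window = int(now // ROTATION_SECONDS) + offset
    digest = hmac.new(SECRET, str(window).encode(), hashlib.sha256).digest()
    name = ORDINARY_NAMES[digest[0] % len(ORDINARY_NAMES)]
    suffix = digest[1:3].hex()  # four hex characters
    return f"/{name}_{suffix}.php"


def is_hidden_url(path: str, now: float) -> bool:
    """Check a requested path against the current and previous hidden URLs.

    The previous window is accepted so that links embedded in pages served
    shortly before a rotation still act as traps.
    """
    return path in {hidden_url_for(now), hidden_url_for(now, -1)}
```

Deriving the path from the time window means every server instance that shares the key computes the same hidden URL without coordination; a stored value changed by a scheduled job would serve equally well.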
Step S3: execute step S5 for the received request, and add a hidden URL link to the returned response body;
Step S4: intercept the request;
Step S5: allow the request.
The technical solution of the invention is illustrated below using a forum website as an example.
When the forum's web interface receives a request, the following method is executed:
S1: check the source IP of the received request: if it is in the blacklist, go directly to step S4; if it is not in the blacklist, go to step S2;
S2: assuming the forum's current built-in hidden URL is /misc.php, check the URL of the received request: if it is /misc.php, add the source IP to the blacklist and go to step S4; if it is not /misc.php, go to step S3;
S3: for the received request, go to step S5 and add <a href="/misc.php"></a> to the returned response body;
S4: intercept the request;
S5: allow the request.
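The forum flow above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the `Response` and `AntiCrawler` types, the `render` callback, and the use of HTTP status 403 for interception are all assumptions, since the patent does not specify how a request is intercepted.

```python
from dataclasses import dataclass, field
from typing import Callable

HIDDEN_URL = "/misc.php"  # the built-in hidden URL assumed in the forum example


@dataclass
class Response:
    status: int
    body: str = ""


@dataclass
class AntiCrawler:
    blacklist: set = field(default_factory=set)

    def handle(self, source_ip: str, url: str,
               render: Callable[[str], str]) -> Response:
        # S1: a blacklisted source IP is intercepted immediately.
        if source_ip in self.blacklist:
            return self._intercept()
        # S2: a request for the hidden URL marks the source IP as a crawler.
        if url == HIDDEN_URL:
            self.blacklist.add(source_ip)
            return self._intercept()
        # S3 and S5: allow the request and add the hidden link to the body.
        body = render(url) + f'<a href="{HIDDEN_URL}"></a>'
        return Response(200, body)

    def _intercept(self) -> Response:
        # S4: interception is modelled as a 403 response (an assumption).
        return Response(403)
```

For example, a first request from an IP for `/forum.php` is allowed and its body carries the hidden link; once that IP requests `/misc.php`, that request and every later one from the same IP are intercepted.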
In addition, an expiration time can be set on records in the IP blacklist; once it has passed, the record for the IP address is automatically deleted from the blacklist and the IP address can access the website normally again. For example, with the automatic expiration time set to 2 hours, if a request is judged to come from a crawler, all requests from that IP to the forum website are intercepted for the next 2 hours; after 2 hours, the IP can again make normal requests to the forum website.
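The expiring blacklist can be sketched as a small Python class, assuming an in-memory store and lazy deletion on lookup. The class name `ExpiringBlacklist` and the injectable `clock` parameter are hypothetical; the 2-hour default follows the example above.

```python
import time
from typing import Callable, Dict


class ExpiringBlacklist:
    """IP blacklist whose records are dropped after a fixed expiration time."""

    def __init__(self, ttl_seconds: float = 2 * 60 * 60,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._expires_at: Dict[str, float] = {}

    def add(self, ip: str) -> None:
        # Record the moment at which this IP may access the site again.
        self._expires_at[ip] = self._clock() + self._ttl

    def __contains__(self, ip: str) -> bool:
        expiry = self._expires_at.get(ip)
        if expiry is None:
            return False
        if self._clock() >= expiry:
            # The record has expired: delete it so the IP is allowed again.
            del self._expires_at[ip]
            return False
        return True
```

Expired records are removed when they are next looked up rather than by a background task. A store with built-in key expiry would give the same behaviour across several servers.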
The anti-crawler method according to the embodiments of the present invention decides whether a request comes from a crawler by the URL it requests, which fundamentally avoids false positives and solves the problems caused in the prior art by intercepting crawlers based on access frequency.
In this specification, descriptions referring to the terms "one embodiment", "some embodiments", "an example", "a specific example", or "some examples" mean that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the invention. Such terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the invention have been shown and described above, it should be understood that they are exemplary and are not to be construed as limiting the invention; those of ordinary skill in the art may make changes, modifications, substitutions, and variations to the above embodiments within the scope of the invention without departing from its principle and purpose. The scope of the invention is defined by the appended claims and their equivalents.
Claims (5)
1. An anti-crawler method, characterized by comprising the following steps:
Step S1: determine whether the source IP address of the received request is in the preset blacklist; if so, execute step S4, otherwise execute step S2;
Step S2: check the URL of the received request; if it is the built-in hidden URL, add the source IP address of the request to the preset blacklist and execute step S4; otherwise execute step S3;
Step S3: execute step S5 for the received request, and add a hidden URL link to the returned response body;
Step S4: intercept the request;
Step S5: allow the request.
2. The anti-crawler method according to claim 1, characterized in that each IP address recorded in the preset blacklist has an expiration time; once the expiration time has passed, the record for that IP address is automatically deleted from the blacklist, so that the IP address can access the website normally again.
3. The anti-crawler method according to claim 1, characterized in that, in step S2, the format of the built-in hidden URL is: <a href="URL_LINK"></a>.
4. The anti-crawler method according to claim 3, characterized in that the built-in hidden URL is invisible in a browser, so a user browsing the website normally will not click it; if the URL is requested, the request is judged to come not from a human but from a crawler crawling the website.
5. The anti-crawler method according to claim 3 or 4, characterized in that the built-in hidden URL is updated periodically.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811381554.5A CN109714313A (en) | 2018-11-20 | 2018-11-20 | The method of anti-crawler |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109714313A (en) | 2019-05-03 |
Family
ID=66254954
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811381554.5A (Pending, CN109714313A) | The method of anti-crawler | 2018-11-20 | 2018-11-20 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109714313A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110351248A (en) * | 2019-06-14 | 2019-10-18 | 北京纵横无双科技有限公司 | A kind of safety protecting method and device based on intellectual analysis and intelligent current limliting |
CN111355728A (en) * | 2020-02-27 | 2020-06-30 | 紫光云技术有限公司 | Malicious crawler protection method |
CN111614652A (en) * | 2020-05-15 | 2020-09-01 | 广东科徕尼智能科技有限公司 | Crawler identification interception method, equipment and storage medium |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1874303A (en) * | 2006-03-04 | 2006-12-06 | 华为技术有限公司 | Method for implementing black sheet |
CN101188612A (en) * | 2007-12-10 | 2008-05-28 | 中兴通讯股份有限公司 | A blacklist real time management method and device |
CN101227467A (en) * | 2008-01-08 | 2008-07-23 | 中兴通讯股份有限公司 | Apparatus and method for managing black list |
CN102413105A (en) * | 2010-09-25 | 2012-04-11 | 杭州华三通信技术有限公司 | Method and device for preventing attack of challenge collapsar (CC) |
CN103279516A (en) * | 2013-05-27 | 2013-09-04 | 百度在线网络技术(北京)有限公司 | Web spider identification method |
CN103475637A (en) * | 2013-04-24 | 2013-12-25 | 携程计算机技术(上海)有限公司 | Network access control method and system based on IP access behaviors |
CN103825900A (en) * | 2014-02-28 | 2014-05-28 | 广州云宏信息科技有限公司 | Website access method and device and filter form downloading and updating method and system |
US9049117B1 (en) * | 2009-10-21 | 2015-06-02 | Narus, Inc. | System and method for collecting and processing information of an internet user via IP-web correlation |
CN105827619A (en) * | 2016-04-25 | 2016-08-03 | 无锡中科富农物联科技有限公司 | Crawler blocking method under large visitor volume condition |
Non-Patent Citations (1)
Title |
---|
李敏等: "《网络安全技术与实例》", 31 August 2013, 上海:复旦大学出版社 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103607385B (en) | Method and apparatus for security detection based on browser | |
KR100684986B1 (en) | Online dangerous information screening system and method | |
CN109714313A (en) | The method of anti-crawler | |
CN102546576B (en) | A kind of web page horse hanging detects and means of defence, system and respective code extracting method | |
JP6408395B2 (en) | Blacklist management method | |
CN104601540B (en) | A kind of cross site scripting XSS attack defence method and Web server | |
CN105306465B (en) | Web portal security accesses implementation method and device | |
CN109688097A (en) | Website protection method, website protective device, website safeguard and storage medium | |
CN105827608B (en) | Distributed API service abnormal user identifying and analyzing method and reverse proxy gateway | |
CN104219316A (en) | Method and device for processing call request in distributed system | |
KR101907392B1 (en) | Method and system for inspecting malicious link addree listed on email | |
CN104182685B (en) | A kind of XSS defence methods and component for JAVA WEB applications | |
CN103023905B (en) | A kind of equipment, method and system for detection of malicious link | |
CN104468546B (en) | A kind of web information processing method and firewall device, system | |
CN104378255B (en) | The detection method and device of web malicious users | |
JP2008116998A (en) | Terminal device management system, data relay device, inter-network connection device, and method for quarantining terminal device | |
CN107911355A (en) | A kind of website back door based on attack chain utilizes event recognition method | |
US20080235800A1 (en) | Systems And Methods For Determining Anti-Virus Protection Status | |
KR100961149B1 (en) | Method for detecting malicious site, method for gathering information of malicious site, apparatus, system, and recording medium having computer program recorded | |
JP5805585B2 (en) | Relay server and proxy access method | |
CN110998577A (en) | Safety diagnosis device and safety diagnosis method | |
CN107992745A (en) | Kidnap countermeasure in a kind of interface based on Android platform | |
KR101372906B1 (en) | Method and system to prevent malware code | |
CN102404331A (en) | Method for judging whether website is maliciously tampered | |
KR101234066B1 (en) | Web / email for distributing malicious code through the automatic control system and how to manage them |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190503 |