CN109474629A - A kind of honey jar design and implementation methods of anti-web crawlers - Google Patents


Info

Publication number
CN109474629A
Authority
CN
China
Prior art keywords
user
honey jar
web crawlers
implementation methods
access
Legal status
Pending
Application number
CN201811617670.2A
Other languages
Chinese (zh)
Inventor
仝兴舜
谢坚
Current Assignee
Shenzhen Zhuyun Science & Technology Co Ltd
Original Assignee
Shenzhen Zhuyun Science & Technology Co Ltd
Priority date
2018-12-28
Filing date
2018-12-28
Publication date
2019-03-15
Application filed by Shenzhen Zhuyun Science & Technology Co Ltd
Priority to CN201811617670.2A
Publication of CN109474629A


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • H04L63/1491 Countermeasures against malicious traffic using deception as countermeasure, e.g. honeypots, honeynets, decoys or entrapment
    • H04L2463/00 Additional details relating to network architectures or network communication protocols for network security covered by H04L63/00
    • H04L2463/121 Timestamp

Abstract

The present invention relates to a honeypot design and implementation method for defending against web crawlers. The method is an anti-crawler technique based on hidden fields: it can effectively prevent unauthorized access by crawlers while avoiding false positives against ordinary users, and it can effectively reduce the access load on the server.

Description

A honeypot design and implementation method for defending against web crawlers
Technical field
The present invention relates to a honeypot design and implementation method, and more specifically to a honeypot design and implementation method for defending against web crawlers.
Background art
With the development of the Internet, the number of crawlers on the Internet keeps growing. Crawlers can forge user behavior and access network services continuously, which increases the load on web servers. In particular, when crawler requests exceed the maximum load a web server can bear, they can bring down normal network services. Crawlers can also collect information illegally, which poses a serious risk to personal privacy.
The first prior-art approach counts accesses on the back end and blocks a single IP once its access count exceeds a threshold. Although blocking by per-IP access count works reasonably well, it has two defects. First, it easily produces false positives against ordinary users: a user may genuinely use the same website service at high frequency, so if the threshold is poorly chosen, ordinary users get blocked. Second, changing IP addresses is cheap: a few tens of yuan may buy hundreds of thousands of IPs, and some network software makes it easy to disguise the IP address and evade the block. Overall, this approach is therefore of limited value.
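The first prior-art scheme amounts to a per-IP request counter with a threshold. The minimal sketch below is illustrative only (the fixed counting window, the class name, and the parameter values are assumptions, not taken from the text), but it makes both defects easy to see: every user behind one shared IP draws from the same count, and the count is keyed only on an IP address that a client can cheaply change.

```python
import time
from typing import Dict, Optional, Set


class IpThresholdBlocker:
    """Hypothetical prior-art scheme: block an IP once it sends more than
    `threshold` requests within a fixed window of `window_seconds`."""

    def __init__(self, threshold: int, window_seconds: float) -> None:
        self.threshold = threshold
        self.window_seconds = window_seconds
        self.counts: Dict[str, int] = {}
        self.window_start: Dict[str, float] = {}
        self.blocked: Set[str] = set()

    def allow(self, ip: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        if ip in self.blocked:
            return False  # blocks are permanent in this sketch
        start = self.window_start.get(ip)
        if start is None or now - start >= self.window_seconds:
            # Open a fresh counting window for this IP.
            self.window_start[ip] = now
            self.counts[ip] = 0
        self.counts[ip] += 1
        if self.counts[ip] > self.threshold:
            self.blocked.add(ip)
            return False
        return True
```

With a threshold of 3 per 60 seconds, the fourth request from one IP in the same window is refused, whether it comes from a crawler or from a busy legitimate user sharing that address.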
The second prior-art approach is an anti-crawler strategy based on the headers of the user's request. Because normal users visit a website through a browser, the target website usually checks the User-Agent field in the request headers and rejects requests that do not carry a normal User-Agent. Some websites also check the Referer field in the request headers for hotlink protection. A crawler facing this kind of mechanism can bypass it easily: its author simply adds headers to the crawler, copying the browser's User-Agent into the crawler's headers, and, after capturing and analyzing the browser's requests, sets the Referer value to the target website's domain name.
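A header check of the kind described above can be sketched as follows. This is a minimal illustration, not any particular site's rule: the "Mozilla/5.0" marker and the function name are assumptions, and header keys are matched exactly for brevity (real web frameworks expose case-insensitive header mappings). Every value it inspects is chosen by the client, which is why copying a browser's User-Agent and Referer is enough to pass.

```python
from typing import Mapping
from urllib.parse import urlparse

# Assumption: any User-Agent containing this marker counts as "browser-like".
BROWSER_UA_MARKER = "Mozilla/5.0"


def passes_header_check(headers: Mapping[str, str], site_domain: str,
                        check_referer: bool = False) -> bool:
    """Hypothetical prior-art check on request headers."""
    if BROWSER_UA_MARKER not in headers.get("User-Agent", ""):
        return False
    if check_referer:
        host = urlparse(headers.get("Referer", "")).hostname
        # Hotlink protection: the request must come from one of our own pages.
        if host is None or not (host == site_domain
                                or host.endswith("." + site_domain)):
            return False
    return True
```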
Summary of the invention
The technical problem to be solved by the present invention is, in view of the defects of the prior art, to provide a honeypot design and implementation method for defending against web crawlers, which solves the problem of unauthorized access to current networks by web crawlers other than search engines.
The technical solution adopted by the present invention to solve this problem is to construct a honeypot design and implementation method for defending against web crawlers: an anti-crawler method based on hidden fields, which can effectively prevent unauthorized access by crawlers.
In the honeypot design and implementation method for defending against web crawlers of the present invention, the steps are as follows:
S1. First, pre-designed text fields are added to the source code of the website page. A new timestamp is computed as the current timestamp plus a specific time interval of 8 hours, and the fields are set as <input type="hidden" name="m_ts" value="1495940330"> and <input class="_56gb_4u9z_5ruq" name="l_tk" value="78a6d9e35ec647c185ea2bcb7a77e8f2">, where l_tk is a unique value computed by the back-end server;
S2. These elements are hidden from the user in different ways:
m_ts (1495940330) is a hidden field; the l_tk element (78a6d9e35ec647c185ea2bcb7a77e8f2) is moved 50000 pixels to the right and the scroll bar is hidden.
S3. Based on the submitted form, the server judges whether the request is illegal crawler access. If it is illegal access, the user's IP address and user ID are added to a blacklist, and the user is forbidden from accessing our network service again. If it is a normal user request, the user is allowed to continue accessing our network service.
S4. This technique is not limited to website forms: it can also be applied to links, images, files, and other content that web crawlers can read but that ordinary users cannot see in a web browser. If a visitor accesses such "hidden" content on the website, a server script is triggered to ban that user.
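Steps S1 and S2 can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names are hypothetical, l_tk is generated with secrets.token_hex because the text does not say how the server computes it, and the CSS (an absolutely positioned input inside a clipping wrapper) is only one possible way to move the element 50000 pixels to the right without a visible scroll bar. The server would also need to remember each issued (l_tk, m_ts) pair so that S3 can check it.

```python
import html
import secrets
import time
from typing import Dict, Optional

EIGHT_HOURS = 8 * 60 * 60  # S1: disguised timestamp = current time + 8 hours


def issue_honeypot_fields(now: Optional[float] = None) -> Dict[str, str]:
    """Create the m_ts and l_tk values for one page render (hypothetical helper)."""
    now = time.time() if now is None else now
    return {
        "m_ts": str(int(now) + EIGHT_HOURS),
        # Assumption: the text does not say how l_tk is computed; a random
        # 128-bit token (32 hex characters, like the example) stands in here.
        "l_tk": secrets.token_hex(16),
    }


def render_honeypot_inputs(fields: Dict[str, str]) -> str:
    """S2: emit the two inputs, each hidden from the user in a different way."""
    m_ts = html.escape(fields["m_ts"], quote=True)
    l_tk = html.escape(fields["l_tk"], quote=True)
    return (
        # m_ts: a plain hidden input.
        f'<input type="hidden" name="m_ts" value="{m_ts}">\n'
        # l_tk: an ordinary input pushed 50000px to the right; the clipping
        # wrapper keeps it from producing a scroll bar. tabindex="-1" keeps
        # keyboard users from tabbing into it (an added precaution).
        '<div style="position:relative; overflow:hidden;">'
        f'<input class="_56gb_4u9z_5ruq" name="l_tk" value="{l_tk}" '
        'style="position:absolute; left:50000px;" tabindex="-1" autocomplete="off">'
        '</div>'
    )
```

For example, issue_honeypot_fields(now=1495911530) yields m_ts "1495940330", the example value in S1 (1495911530 + 28800 seconds).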
Implementing the honeypot design and implementation method for defending against web crawlers of the present invention has the following advantages: the anti-crawler approach based on this honeypot technique can effectively prevent unauthorized access by crawlers, avoids false positives against ordinary users, and can effectively reduce the access load on the server.
Brief description of the drawings
The present invention is further explained below with reference to the accompanying drawings and embodiments. In the drawings:
Fig. 1 is a flowchart of the honeypot design and implementation method for defending against web crawlers of the present invention.
Specific embodiment
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is further elaborated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here only illustrate the present invention and are not intended to limit it.
As shown in Fig. 1, the steps of the honeypot design and implementation method for defending against web crawlers are as follows:
S1. First, pre-designed text fields are added to the source code of the website page. A new timestamp is computed as the current timestamp plus a specific time interval of 8 hours, and the fields are set as <input type="hidden" name="m_ts" value="1495940330"> and <input class="_56gb_4u9z_5ruq" name="l_tk" value="78a6d9e35ec647c185ea2bcb7a77e8f2">, where l_tk is a unique value computed by the back-end server;
S2. These elements are hidden from the user in different ways:
m_ts (1495940330) is a hidden field; the l_tk element (78a6d9e35ec647c185ea2bcb7a77e8f2) is moved 50000 pixels to the right and the scroll bar is hidden.
S3. Based on the submitted form, the server judges whether the request is illegal crawler access. If it is illegal access, the user's IP address and user ID are added to a blacklist, and the user is forbidden from accessing our network service again. If it is a normal user request, the user is allowed to continue accessing our network service. Ordinary users cannot fill in these fields when visiting the website, because the fields are not displayed in the browser. If a field has been filled in, the access is probably from a crawler, so the request can be judged invalid: the user's IP and user ID are banned directly, the illegal user is kicked out of the system, and the user is added to the website's blacklist.
S4. This technique is not limited to website forms: it can also be applied to links, images, files, and other content that web crawlers can read but that ordinary users cannot see in a web browser. If a visitor accesses such "hidden" content on the website, a server script is triggered to ban that user.
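S3 leaves open exactly what counts as "filling in" a field. A minimal sketch, assuming the server stores each issued (l_tk, m_ts) pair and treats any submission whose pair does not match exactly (missing, altered, or unknown) as crawler access, is shown below. The class and method names are hypothetical, and this sketch does not enforce one-time use of l_tk.

```python
from dataclasses import dataclass, field
from typing import Dict, Mapping, Set


@dataclass
class HoneypotGuard:
    """Hypothetical S3 check with an IP and user-ID blacklist."""

    issued: Dict[str, str] = field(default_factory=dict)  # l_tk -> m_ts as rendered
    blacklisted_ips: Set[str] = field(default_factory=set)
    blacklisted_users: Set[str] = field(default_factory=set)

    def record_issued(self, l_tk: str, m_ts: str) -> None:
        """Remember the values placed in the page so S3 can compare them."""
        self.issued[l_tk] = m_ts

    def is_blacklisted(self, ip: str, user_id: str) -> bool:
        return ip in self.blacklisted_ips or user_id in self.blacklisted_users

    def check_submission(self, form: Mapping[str, str], ip: str, user_id: str) -> bool:
        """Return True for a normal request; blacklist the IP and user ID otherwise."""
        if self.is_blacklisted(ip, user_id):
            return False
        # A browser posts the hidden values back untouched; a form-filling
        # crawler that overwrites or omits them no longer matches.
        if self.issued.get(form.get("l_tk", "")) != form.get("m_ts", ""):
            self.blacklisted_ips.add(ip)
            self.blacklisted_users.add(user_id)
            return False
        return True
```

Blacklisting both the IP and the user ID follows S3; keying on the user ID as well is what keeps a crawler from escaping simply by rotating IP addresses while staying logged in.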
Further, the honeypot design and implementation method for defending against web crawlers can be applied not only to website form pages but also to the protection of various multimedia resources, such as images, videos, and files.
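The same trigger can guard links, images, videos, and files, as described in S4 and the paragraph above. In this minimal sketch the trap paths and names are hypothetical. Pages embed a link that is never displayed, and any request for it blacklists the requester's IP and user ID. Because the invention targets crawlers other than search engines, a deployment would probably also list the trap paths as Disallow rules in robots.txt so that well-behaved search-engine crawlers stay away. That is an added assumption, not part of the text.

```python
from typing import Set

# Hypothetical trap URLs: linked from pages but never visible to users.
TRAP_PATHS: Set[str] = {"/static/hp/catalog-full.pdf", "/img/hp/pixel-archive.png"}


def render_trap_link(path: str) -> str:
    """A link that a crawler parsing the HTML sees but a browser user does not."""
    return f'<a href="{path}" style="display:none" tabindex="-1" rel="nofollow">archive</a>'


class TrapResourceGuard:
    """Hypothetical S4 handler: requesting a hidden resource bans the requester."""

    def __init__(self) -> None:
        self.blacklisted_ips: Set[str] = set()
        self.blacklisted_users: Set[str] = set()

    def handle(self, path: str, ip: str, user_id: str) -> int:
        """Return an HTTP status code for the request."""
        if ip in self.blacklisted_ips or user_id in self.blacklisted_users:
            return 403
        if path in TRAP_PATHS:
            # Normally only a client following links from the raw HTML gets here.
            self.blacklisted_ips.add(ip)
            self.blacklisted_users.add(user_id)
            return 403
        return 200
```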
Further, the m_ts field is a disguised timestamp: it is not the current timestamp but a timestamp generated by shifting the current timestamp 8 hours later. The l_tk field is a unique value generated by the server to identify whether the network request has already been consumed; once it has been consumed, the same request cannot be used to access the network service again.
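The one-time use of l_tk described above can be sketched as a small token store. This is a minimal illustration with hypothetical names. In particular, the text does not say how the server checks m_ts; treating the shifted timestamp as an expiry time is an assumption made here.

```python
from typing import Dict

EIGHT_HOURS = 8 * 60 * 60


class OneTimeTokenStore:
    """Tracks issued l_tk values; each can be consumed at most once."""

    def __init__(self) -> None:
        self._issued: Dict[str, int] = {}  # l_tk -> disguised m_ts

    def issue(self, l_tk: str, now: float) -> int:
        """Record l_tk and return its m_ts (current time shifted 8 hours later)."""
        m_ts = int(now) + EIGHT_HOURS
        self._issued[l_tk] = m_ts
        return m_ts

    def consume(self, l_tk: str, m_ts: int, now: float) -> bool:
        """Accept l_tk once; unknown, mismatched, expired, or reused tokens fail."""
        if self._issued.get(l_tk) != m_ts:
            return False
        # Consumed (or expired): the same request can never be replayed.
        del self._issued[l_tk]
        # Assumption: m_ts doubles as an expiry time (not stated in the text).
        return now <= m_ts
```

Deleting the entry on use, instead of keeping a separate set of consumed tokens, keeps the store from growing without bound.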
Although the present invention has been disclosed through the above embodiments, its scope of protection is not limited to them. Modifications, substitutions, and the like made to the above components without departing from the concept of the present invention all fall within the scope of the claims of the present invention.

Claims (3)

1. A honeypot design and implementation method for defending against web crawlers, characterized in that the steps of the method are as follows:
S1. First, pre-designed text fields are added to the source code of the website page. A new timestamp is computed as the current timestamp plus a specific time interval of 8 hours, and the fields are set as <input type="hidden" name="m_ts" value="1495940330"> and <input class="_56gb_4u9z_5ruq" name="l_tk" value="78a6d9e35ec647c185ea2bcb7a77e8f2">, where l_tk is a unique value computed by the back-end server;
S2. These elements are hidden from the user in different ways:
m_ts (1495940330) is a hidden field; the l_tk element (78a6d9e35ec647c185ea2bcb7a77e8f2) is moved 50000 pixels to the right and the scroll bar is hidden.
S3. Based on the submitted form, the server judges whether the request is illegal crawler access. If it is illegal access, the user's IP address and user ID are added to a blacklist, and the user is forbidden from accessing our network service again. If it is a normal user request, the user is allowed to continue accessing our network service.
2. The honeypot design and implementation method for defending against web crawlers according to claim 1, characterized in that the method can be applied not only to website form pages but also to the protection of various multimedia resources.
3. The honeypot design and implementation method for defending against web crawlers according to claim 1, characterized in that the m_ts field is a disguised timestamp, not the current timestamp, generated by shifting the current timestamp 8 hours later; the l_tk field is a unique value generated by the server to identify whether the network request has been consumed, and once it has been consumed, the same request cannot be used to access the network service again.
CN201811617670.2A 2018-12-28 2018-12-28 A kind of honey jar design and implementation methods of anti-web crawlers Pending CN109474629A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811617670.2A CN109474629A (en) 2018-12-28 2018-12-28 A kind of honey jar design and implementation methods of anti-web crawlers

Publications (1)

Publication Number Publication Date
CN109474629A (en) 2019-03-15

Family

ID=65678033

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811617670.2A Pending CN109474629A (en) 2018-12-28 2018-12-28 A kind of honey jar design and implementation methods of anti-web crawlers

Country Status (1)

Country Link
CN (1) CN109474629A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101190261B1 (en) * 2010-12-21 2012-10-12 한국인터넷진흥원 Hybrid interaction client honeypot system and its operation method
CN103559235A (en) * 2013-10-24 2014-02-05 中国科学院信息工程研究所 Online social network malicious webpage detection and identification method
CN104967628A (en) * 2015-07-16 2015-10-07 浙江大学 Deceiving method of protecting web application safety

Similar Documents

Publication Publication Date Title
Hong et al. How you get shot in the back: A systematical study about cryptojacking in the real world
Gupta et al. Hunting for DOM-Based XSS vulnerabilities in mobile cloud-based online social network
Shar et al. Automated removal of cross site scripting vulnerabilities in web applications
US8515918B2 (en) Method, system and computer program product for comparing or measuring information content in at least one data stream
US9712560B2 (en) Web page and web browser protection against malicious injections
Borders et al. Quantifying information leaks in outbound web traffic
CN101356535B (en) A method and apparatus for detecting and preventing unsafe behavior of javascript programs
Son et al. The Postman Always Rings Twice: Attacking and Defending postMessage in HTML5 Websites.
US9584543B2 (en) Method and system for web integrity validator
US8812959B2 (en) Method and system for delivering digital content
Tang et al. Fortifying web-based applications automatically
CN102045319B (en) Method and device for detecting SQL (Structured Query Language) injection attack
CN105359156B (en) Unauthorized access detecting system and unauthorized access detection method
Tran et al. Tracking the trackers: Fast and scalable dynamic analysis of web content for privacy violations
Schmucker Web tracking
US11503072B2 (en) Identifying, reporting and mitigating unauthorized use of web code
CN102957705B (en) A kind of method and device of webpage tamper protection
Mitropoulos et al. How to train your browser: Preventing XSS attacks using contextual script fingerprints
Kaur et al. Browser fingerprinting as user tracking technology
Kim et al. Inferring browser activity and status through remote monitoring of storage usage
CN104079531A (en) Hotlinking detection method, system and device
Bugliesi et al. Automatic and robust client-side protection for cookie-based sessions
Li et al. Mash-IF: Practical information-flow control within client-side mashups
Franken et al. Exposing cookie policy flaws through an extensive evaluation of browsers and their extensions
Shahriar et al. Proclick: a framework for testing clickjacking attacks in web applications

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190315
