CN109474629A - Honeypot design and implementation method for countering web crawlers - Google Patents
- Publication number
- CN109474629A (application number CN201811617670.2A)
- Authority
- CN
- China
- Prior art keywords
- user
- honeypot
- web crawlers
- implementation method
- access
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1441—Countermeasures against malicious traffic
- H04L63/1491—Countermeasures against malicious traffic using deception as countermeasure, e.g. honeypots, honeynets, decoys or entrapment
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L2463/00—Additional details relating to network architectures or network communication protocols for network security covered by H04L63/00
- H04L2463/121—Timestamp
Abstract
The present invention relates to a honeypot design and implementation method for countering web crawlers. The method is an anti-crawler technique based on hidden fields: it can effectively block unauthorized crawler access, avoids mistakenly blocking ordinary users, and can effectively reduce the access load on the server.
Description
Technical field
The present invention relates to a honeypot design and implementation method, and more specifically to a honeypot design and implementation method for countering web crawlers.
Background art
With the development of the Internet, the volume of crawler traffic keeps growing. Crawlers can forge user behavior and access network services continuously, which increases the access load on the web server. In particular, when crawler requests exceed the maximum number of accesses the server can carry, normal network service can be brought down. Crawlers can also harvest information illegally, which poses a serious risk to personal privacy.
The first prior-art approach counts accesses on the back end and blocks any single IP whose access count exceeds a threshold. Although this scheme works reasonably well, it has two defects. First, it easily blocks ordinary users by mistake: users may genuinely use the same website service at high frequency, so a poorly chosen threshold will penalize them. Second, changing IP addresses is cheap: a few tens of yuan may buy hundreds of thousands of IPs, and network software makes it easy to disguise the IP address and evade the block. On balance, the approach is not worthwhile.
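To make the two defects concrete, here is a minimal sketch of per-IP threshold blocking. The class name `IpRateLimiter`, the sliding-window design, and the numbers used are assumptions for illustration; the patent does not specify how the prior-art counter is implemented.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class IpRateLimiter:
    """Hypothetical sketch: block an IP once it exceeds a request threshold."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of recent requests
        self.blocked = set()

    def allow(self, ip: str, now: Optional[float] = None) -> bool:
        if ip in self.blocked:
            return False
        now = time.time() if now is None else now
        hits = self._hits[ip]
        hits.append(now)
        # Drop requests that have fallen out of the sliding window.
        while hits and now - hits[0] > self.window_seconds:
            hits.popleft()
        if len(hits) > self.max_requests:
            self.blocked.add(ip)  # the block is permanent in this sketch
            return False
        return True
```

With `max_requests=3`, a legitimate user who sends a fourth request within the window is blocked just like a crawler, while a crawler that rotates to a fresh IP for each request is never counted against any single address.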
The second prior-art approach is an anti-crawler strategy based on the Header of the user's request. Because normal users access a website through a browser, the target website usually checks the User-Agent field in the Header when it receives a request, and rejects any request that does not carry normal User-Agent information. Some websites also check the Referer field in the request Header for hotlink protection. A crawler can bypass this kind of mechanism easily: the crawler's author adds a Header directly and copies a browser's User-Agent into it, and, after analyzing captured request packets, sets the Referer value to the target website's domain name.
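The weakness of Header checks can be illustrated with a short sketch. The function `passes_header_check`, the marker list, and the rule that a missing Referer is tolerated are all assumptions made for this example, not details taken from the patent.

```python
from urllib.parse import urlparse

# Substrings that commonly appear in browser User-Agent strings (assumed list).
BROWSER_UA_MARKERS = ("Mozilla/", "Chrome/", "Safari/", "Firefox/")


def passes_header_check(headers: dict, site_host: str) -> bool:
    """Hypothetical sketch of a User-Agent plus Referer check."""
    user_agent = headers.get("User-Agent", "")
    if not any(marker in user_agent for marker in BROWSER_UA_MARKERS):
        return False
    referer = headers.get("Referer")
    # Hotlink protection: if a Referer is sent, it must point at our own host.
    if referer is not None and urlparse(referer).hostname != site_host:
        return False
    return True
```

A crawler that copies a browser's User-Agent string and sets `Referer` to the target site's own domain passes this check unchanged, which is exactly the bypass described above.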
Summary of the invention
The technical problem to be solved by the present invention is to address the above defects of the prior art by providing a honeypot design and implementation method for countering web crawlers, which solves the problem of unauthorized access by web crawlers other than search engines on today's networks.
The technical solution adopted by the present invention is to construct a honeypot design and implementation method for countering web crawlers: an anti-crawler method based on hidden fields that can effectively block unauthorized crawler access.
In the honeypot design and implementation method for countering web crawlers according to the present invention, the steps are as follows:
S1. First, add some pre-designed text fields to the source code of the website page. A new timestamp is calculated as the current timestamp plus a specific time interval of 8 hours, and the following fields are set: <input type="hidden" name="m_ts" value="1495940330"> and <input class="_56gb_4u9z_5ruq" name="l_tk" value="78a6d9e35ec647c185ea2bcb7a77e8f2">, where the latter is a unique value calculated by the back-end server;
S2. The above elements are hidden from the user in different ways: m_ts (1495940330) is a hidden field; for l_tk (78a6d9e35ec647c185ea2bcb7a77e8f2), the element is shifted 50000 pixels to the right and the scroll bar is hidden.
S3. Based on the form submitted with the network request, the server judges whether the access is an illegal crawler access. If it is, the user's IP address and user ID are put on a blacklist and the user is forbidden from accessing our network service again; if it is a normal user request, the user is allowed to continue accessing our network service.
S4. This technique can be applied not only to website forms but also to links, pictures, files, and other content that a web crawler can read but an ordinary user cannot see in a web browser. If a visitor accesses such "hidden" content on the website, a server-side script is triggered to ban that user.
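Steps S1 and S2 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the use of `secrets.token_hex` for the unique value, and the inline CSS that shifts the element are assumptions. Hiding the scroll bar would additionally need something like `overflow-x: hidden` on the containing element, which is also an assumption.

```python
import html
import secrets
import time
from typing import Optional

OFFSET_SECONDS = 8 * 60 * 60  # the 8-hour interval named in S1


def make_honeypot_fields(now: Optional[float] = None) -> dict:
    """Build the two field values: a shifted timestamp and a unique token."""
    now = time.time() if now is None else now
    return {
        "m_ts": str(int(now) + OFFSET_SECONDS),
        "l_tk": secrets.token_hex(16),  # 32 hex characters (assumed generator)
    }


def render_honeypot_html(fields: dict) -> str:
    """Render m_ts as a hidden input and l_tk as an off-screen input."""
    m_ts = html.escape(fields["m_ts"], quote=True)
    l_tk = html.escape(fields["l_tk"], quote=True)
    return (
        f'<input type="hidden" name="m_ts" value="{m_ts}">\n'
        f'<input class="_56gb_4u9z_5ruq" name="l_tk" value="{l_tk}" '
        'style="position:absolute; left:50000px;" tabindex="-1" autocomplete="off">'
    )
```

Under these assumptions, a page generated at Unix time 1495911530 would carry the `m_ts` value 1495940330 used in the example above.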
Implementing the honeypot design and implementation method for countering web crawlers according to the present invention has the following advantages: the honeypot-based anti-crawler approach can effectively block unauthorized crawler access, avoids mistakenly blocking ordinary users, and can effectively reduce the access load on the server.
Brief description of the drawings
The present invention is further described below with reference to the accompanying drawing and an embodiment. In the drawing:
Fig. 1 is a flowchart of the honeypot design and implementation method for countering web crawlers according to the present invention.
Detailed description of the embodiment
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawing and an embodiment. It should be understood that the specific embodiment described here only illustrates the present invention and is not intended to limit it.
As shown in Fig. 1, the steps of the honeypot design and implementation method for countering web crawlers are as follows:
S1. First, add some pre-designed text fields to the source code of the website page. A new timestamp is calculated as the current timestamp plus a specific time interval of 8 hours, and the following fields are set: <input type="hidden" name="m_ts" value="1495940330"> and <input class="_56gb_4u9z_5ruq" name="l_tk" value="78a6d9e35ec647c185ea2bcb7a77e8f2">, where the latter is a unique value calculated by the back-end server;
S2. The above elements are hidden from the user in different ways: m_ts (1495940330) is a hidden field; for l_tk (78a6d9e35ec647c185ea2bcb7a77e8f2), the element is shifted 50000 pixels to the right and the scroll bar is hidden.
S3. Based on the form submitted with the network request, the server judges whether the access is an illegal crawler access. If it is, the user's IP address and user ID are put on a blacklist and the user is forbidden from accessing our network service again; if it is a normal user request, the user is allowed to continue accessing our network service. An ordinary user cannot fill in these fields when visiting the website, because they are not displayed in the browser. A request in which such a field has been filled in is therefore likely to come from a crawler. The access can then be judged invalid: the user's IP and user ID are banned immediately, and the illegal user is kicked out of the system and added to the website's blacklist.
S4. This technique can be applied not only to website forms but also to links, pictures, files, and other content that a web crawler can read but an ordinary user cannot see in a web browser. If a visitor accesses such "hidden" content on the website, a server-side script is triggered to ban that user.
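The server-side judgment in step S3 might look like the sketch below. The patent does not state the exact validation rule, so this example assumes that any submitted honeypot value differing from the value issued with the page counts as "filled in" by a crawler. `HoneypotGuard` and its methods are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class HoneypotGuard:
    """Hypothetical sketch of the S3 check and blacklist."""

    issued: dict = field(default_factory=dict)  # l_tk -> m_ts sent with the page
    blacklist_ips: set = field(default_factory=set)
    blacklist_users: set = field(default_factory=set)

    def issue(self, l_tk: str, m_ts: str) -> None:
        self.issued[l_tk] = m_ts

    def check_form(self, ip: str, user_id: str, form: dict) -> bool:
        """Return True for a normal request, False for a suspected crawler."""
        if ip in self.blacklist_ips or user_id in self.blacklist_users:
            return False
        expected_m_ts = self.issued.get(form.get("l_tk"))
        # A browser resubmits the hidden values untouched; anything else is
        # treated as an illegal access (assumed rule).
        if expected_m_ts is None or form.get("m_ts") != expected_m_ts:
            self.blacklist_ips.add(ip)
            self.blacklist_users.add(user_id)
            return False
        return True
```

In this sketch a blacklisted IP or user ID stays blocked even if a later request carries valid field values, matching the "forbidden from accessing our network service again" behavior in S3.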
Further, the honeypot design and implementation method for countering web crawlers can be applied not only to website form pages but also to the protection of various multimedia resources, such as pictures, videos, and files.
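The extension to links and multimedia resources could be sketched as a set of trap URLs that are referenced in the page source but never rendered for a human visitor. The paths in `TRAP_PATHS`, the `ResourceTrap` class, and the choice to ban by IP only are assumptions for illustration.

```python
# Hypothetical decoy resources: linked in the HTML but invisible in a browser.
TRAP_PATHS = {"/static/img/_hp_banner.png", "/download/_hp_report.pdf"}


class ResourceTrap:
    """Hypothetical sketch: ban any client that fetches a decoy resource."""

    def __init__(self, trap_paths):
        self.trap_paths = set(trap_paths)
        self.banned = set()

    def handle(self, ip: str, path: str) -> int:
        """Return the HTTP status code the server would answer with."""
        if ip in self.banned:
            return 403
        if path in self.trap_paths:
            self.banned.add(ip)  # only a crawler follows an invisible link
            return 403
        return 200
```

Well-behaved search-engine crawlers would need to be kept away from the decoys, for example through `robots.txt` rules; the patent does not describe how that is handled.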
Further, the m_ts field is a disguised timestamp: it is not the current timestamp but one generated by shifting the current timestamp 8 hours later. The l_tk field is a unique field generated by the server that identifies whether the network request has already been consumed by the back end; once it has been consumed, the same value cannot be used to request access to the network service again.
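The single-use behavior of l_tk and the 8-hour shift of m_ts can be illustrated with a small in-memory store. `TokenStore`, `real_issue_time`, and the decision to keep pending tokens in a dictionary are assumptions; a real deployment would presumably use shared storage with expiry.

```python
OFFSET_SECONDS = 8 * 60 * 60  # the disguised timestamp is shifted 8 hours later


class TokenStore:
    """Hypothetical sketch: each l_tk may be consumed by the back end once."""

    def __init__(self):
        self._pending = {}  # l_tk -> disguised timestamp issued with it

    def issue(self, l_tk: str, issued_at: int) -> int:
        m_ts = issued_at + OFFSET_SECONDS
        self._pending[l_tk] = m_ts
        return m_ts

    def consume(self, l_tk: str, m_ts: int) -> bool:
        """Accept the pair once; replays and mismatches are rejected."""
        if self._pending.get(l_tk) != m_ts:
            return False
        del self._pending[l_tk]  # mark as consumed
        return True


def real_issue_time(m_ts: int) -> int:
    """Recover the actual issue time from the disguised timestamp."""
    return m_ts - OFFSET_SECONDS
```

A crawler that records one valid form and replays it fails on the second `consume` call, and a crawler that substitutes its own current timestamp for `m_ts` fails the comparison with the issued value.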
Although the present invention has been disclosed through the above embodiment, its scope of protection is not limited thereto. Without departing from the inventive concept, modifications, replacements, and the like made to the above components fall within the scope of the claims of the present invention.
Claims (3)
1. A honeypot design and implementation method for countering web crawlers, characterized in that the steps of the method are as follows:
S1. First, add some pre-designed text fields to the source code of the website page. A new timestamp is calculated as the current timestamp plus a specific time interval of 8 hours, and the following fields are set: <input type="hidden" name="m_ts" value="1495940330"> and <input class="_56gb_4u9z_5ruq" name="l_tk" value="78a6d9e35ec647c185ea2bcb7a77e8f2">, where the latter is a unique value calculated by the back-end server;
S2. The above elements are hidden from the user in different ways: m_ts (1495940330) is a hidden field; for l_tk (78a6d9e35ec647c185ea2bcb7a77e8f2), the element is shifted 50000 pixels to the right and the scroll bar is hidden.
S3. Based on the form submitted with the network request, the server judges whether the access is an illegal crawler access. If it is, the user's IP address and user ID are put on a blacklist and the user is forbidden from accessing our network service again; if it is a normal user request, the user is allowed to continue accessing our network service.
2. The honeypot design and implementation method for countering web crawlers according to claim 1, characterized in that the method can be applied not only to website form pages but also to the protection of various multimedia resources.
3. The honeypot design and implementation method for countering web crawlers according to claim 1, characterized in that the m_ts field is a disguised timestamp: it is not the current timestamp but one generated by shifting the current timestamp 8 hours later; the l_tk field is a unique field generated by the server that identifies whether the network request has already been consumed by the back end, and once it has been consumed, the same value cannot be used to request access to the network service again.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811617670.2A CN109474629A (en) | 2018-12-28 | 2018-12-28 | A kind of honey jar design and implementation methods of anti-web crawlers |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811617670.2A CN109474629A (en) | 2018-12-28 | 2018-12-28 | A kind of honey jar design and implementation methods of anti-web crawlers |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109474629A true CN109474629A (en) | 2019-03-15 |
Family
ID=65678033
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811617670.2A Pending CN109474629A (en) | 2018-12-28 | 2018-12-28 | A kind of honey jar design and implementation methods of anti-web crawlers |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109474629A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR101190261B1 (en) * | 2010-12-21 | 2012-10-12 | 한국인터넷진흥원 | Hybrid interaction client honeypot system and its operation method |
CN103559235A (en) * | 2013-10-24 | 2014-02-05 | 中国科学院信息工程研究所 | Online social network malicious webpage detection and identification method |
CN104967628A (en) * | 2015-07-16 | 2015-10-07 | 浙江大学 | Deceiving method of protecting web application safety |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190315 |