CN105577701B - Method and system for identifying web crawlers - Google Patents

Method and system for identifying web crawlers

Info

Publication number
CN105577701B
Authority
CN
China
Prior art keywords
key value
server
client
encrypted
sent
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610134556.9A
Other languages
Chinese (zh)
Other versions
CN105577701A (en)
Inventor
崔广宇
李巍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ctrip Computer Technology Shanghai Co Ltd
Original Assignee
Ctrip Computer Technology Shanghai Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ctrip Computer Technology Shanghai Co Ltd
Priority to CN201610134556.9A
Publication of CN105577701A
Application granted
Publication of CN105577701B
Legal status: Active
Anticipated expiration

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00: Network architectures or network communication protocols for network security
    • H04L63/14: Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441: Countermeasures against malicious traffic
    • H04L63/1491: Countermeasures against malicious traffic using deception as countermeasure, e.g. honeypots, honeynets, decoys or entrapment

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Transfer Between Computers (AREA)
  • Computer And Data Communications (AREA)

Abstract

The invention discloses a method and system for identifying web crawlers. The method comprises: a client sends a request for a preset URL link to a server; the server generates a first key value according to the request and sends the encrypted first key value, together with a generated JS decryption script, to the client; the client generates a second key value according to the encrypted first key value and the JS decryption script and sends it to the server; the server judges whether a first key value identical to the second key value exists, and if not, determines that the user corresponding to the client is a web crawler. Compared with the prior art, the invention allows a web crawler to be identified on its first access to the server without extensive access-frequency detection, which saves CPU resources and improves identification efficiency. In addition, the server does not need to store first key values for requests for different URL links, which greatly reduces the storage space required on the server.

Description

Method and system for identifying web crawlers
Technical field
The present invention relates to the field of computer technology, and in particular to a method and system for identifying web crawlers.
Background
With the development of the Internet, the volume of crawler traffic keeps growing. Crawlers can forge user behavior and access servers continuously to obtain information. This can slow a server down significantly, especially when the requested links require heavy computation to serve, and it also carries the risk that information is harvested in bulk.
The mainstream anti-crawler approach at present is to restrict client IP addresses that access a site at high frequency, based on access frequency. This approach has the following drawbacks. First, a large amount of data must be recorded, which places high demands on server storage. Second, detection lags: a crawler must be observed for some time before it can be identified as a crawler, and by then it may already have obtained enough information. Finally, the computation is continuous: it must keep running even when no crawler is present, which puts heavy pressure on the server.
Summary of the invention
The technical problem to be solved by the present invention is to overcome the defects of prior-art crawler identification based on access frequency, namely that it occupies server storage, lags, and requires continuous computation, by providing a method and system for identifying web crawlers that can identify a crawler on its first access and save server storage.
The present invention solves the above technical problem through the following technical solutions:
A method for identifying web crawlers, characterized in that it comprises the following steps:
S1: a client sends a request for a preset URL (Uniform Resource Locator) link to a server;
S2: the server generates a first key value according to the request for the preset URL link, generates a JS (JavaScript) decryption script in the course of encrypting the first key value, and sends the encrypted first key value and the JS decryption script to the client, wherein the JS decryption script is used to decrypt the encrypted first key value;
S3: the client generates a second key value according to the encrypted first key value and the JS decryption script, and sends the second key value to the server;
S4: the server judges whether a first key value identical to the second key value exists, and if not, determines that the user corresponding to the client is a web crawler.
In this solution, the user corresponding to a client that sends a request to the server may be a normal user or a web crawler.
In step S3, when the user corresponding to the client is a normal user, the client runs the JS decryption script on the encrypted first key value to generate the second key value, and the second key value is identical to the first key value. When the user corresponding to the client is a web crawler, the client does not recognize the JS decryption script, so the second key value it generates either differs from the first key value generated by the server or is empty. This shows that the request for the preset URL link sent by the client to the server was forged.
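Steps S1 to S4 can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration rather than the patent's implementation: the first key value is derived from the URL with an HMAC under a hypothetical server secret, the "encryption" is a simple XOR mask plus Base64, and a Python closure stands in for the generated JS decryption script.

```python
import base64
import hashlib
import hmac
import secrets

SERVER_SECRET = secrets.token_bytes(16)  # hypothetical server-side secret


def make_first_key(url: str) -> str:
    """S2: regenerate the first key value from the preset URL (nothing is stored)."""
    return hmac.new(SERVER_SECRET, url.encode(), hashlib.sha256).hexdigest()[:16]


def server_challenge(url: str):
    """S2: return the encrypted first key value and a matching decryption script."""
    mask = secrets.randbelow(255) + 1  # per-response XOR mask in 1..255
    masked = bytes(b ^ mask for b in make_first_key(url).encode())
    ciphertext = base64.b64encode(masked).decode()

    def decrypt_script(data: str) -> str:
        # Stand-in for the generated JS decryption script.
        return bytes(b ^ mask for b in base64.b64decode(data)).decode()

    return ciphertext, decrypt_script


def browser_client(ciphertext: str, script) -> str:
    """S3, normal user: the script is executed, yielding the second key value."""
    return script(ciphertext)


def naive_crawler(ciphertext: str, script) -> str:
    """S3, crawler: the script is not executed; the raw payload is echoed back."""
    return ciphertext


def is_crawler(url: str, second_key: str) -> bool:
    """S4: compare the regenerated first key value with the second key value."""
    return not hmac.compare_digest(make_first_key(url), second_key)
```

A client that runs the script passes the check, while one that skips it, or returns an empty value, is flagged on its first request.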
This solution compares the first key value generated by the server with the second key value generated by the client and identifies web crawlers according to the comparison result, so that a web crawler can be identified on its first access to the server. Compared with the prior art, no extensive access-frequency detection is needed, which saves CPU (central processing unit) resources and improves identification efficiency.
In this solution, the requests for the preset URL link received by the server may be the same or different each time, but each time the server receives such a request it regenerates the first key value based on the preset URL link. The server therefore does not need to store first key values for requests for different URL links, which greatly reduces the storage space required on the server.
The preset URL link can be set as needed, for example to /domestic/cas/key. Specifically, in an MVC website this requires adding a new controller named cas with an action named key.
Preferably, step S2 is replaced with step S2':
S2': the server generates a first key value at intervals of a time period according to the request for the preset URL link, generates a JS decryption script in the course of encrypting the current first key value, and sends the encrypted first key value and the JS decryption script to the client, wherein the JS decryption script is used to decrypt the encrypted first key value.
In this solution, to avoid misidentifying normal users because of network delay, the server generates multiple first key values, and in step S4 the server judges whether any of the multiple first key values is identical to the second key value. The time period can be set according to the actual condition of the network. For example, when the network is fast, the time period can be set longer, such as 5 minutes; when the network is slow, the time period can be set shorter, such as 3 minutes.
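The periodic variant S2' can be sketched as follows. This is only one possible reading: the key for each time window is derived from the URL and the window index with an HMAC under a hypothetical secret, and the number of windows accepted (`WINDOWS_ACCEPTED`) is an assumed parameter that the patent does not specify.

```python
import hashlib
import hmac

PERIOD_SECONDS = 180        # one first key value every 3 minutes
WINDOWS_ACCEPTED = 3        # assumption: how many recent keys remain valid
SECRET = b"demo-secret"     # hypothetical server-side secret


def key_for_window(url: str, window: int) -> str:
    """First key value for one time window, recomputed on demand."""
    msg = f"{url}|{window}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()[:16]


def valid_keys(url: str, now: float) -> list:
    """The multiple first key values that step S4 compares against."""
    current = int(now // PERIOD_SECONDS)
    return [key_for_window(url, current - i) for i in range(WINDOWS_ACCEPTED)]


def matches_any(url: str, second_key: str, now: float) -> bool:
    """S4: true if any recent first key value equals the second key value."""
    return any(hmac.compare_digest(k, second_key) for k in valid_keys(url, now))
```

A key issued shortly before a window boundary is still accepted after the boundary, which is how the scheme tolerates network delay; once it falls out of the accepted windows it no longer matches.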
Preferably, step S4 further comprises: when the server judges that no first key value identical to the second key value exists, it intercepts all requests sent by the client, or sends deceptive information to the client.
In this solution, when the server judges that no first key value identical to the second key value exists, it determines that the user corresponding to the client is a web crawler. At this point it can intercept all requests sent by the client, or send deceptive information to the client so as to lure the crawler in the manner of a honeypot. The deceptive information can be set as needed.
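The two responses to an identified crawler, interception and deception, can be sketched as a small policy object. The class name, the policy strings, the status codes and the payloads are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field

REAL_DATA = {"/price": "real price payload"}     # hypothetical target information
DECOY_DATA = {"/price": "decoy price payload"}   # hypothetical deceptive information


@dataclass
class CrawlerResponder:
    policy: str = "intercept"                    # "intercept" or "deceive"
    flagged: set = field(default_factory=set)    # clients identified as crawlers

    def flag(self, client_id: str) -> None:
        """Record that the key comparison failed for this client."""
        self.flagged.add(client_id)

    def handle(self, client_id: str, path: str):
        """Serve real data to normal clients; block or deceive flagged ones."""
        if client_id not in self.flagged:
            return 200, REAL_DATA[path]
        if self.policy == "intercept":
            return 403, None
        return 200, DECOY_DATA[path]
```

With the "deceive" policy the crawler still receives a normal-looking response, so it has less reason to suspect that it has been detected.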
Preferably, step S3 further comprises: the client sends a request for a target URL link to the server;
Step S4 further comprises: when the server judges that a first key value identical to the second key value exists, it looks up the target information corresponding to the target URL link according to the request for the target URL link, and sends the target information to the client.
In this solution, to obtain target information from the server, the client needs to send a request for the corresponding target URL link to the server. Only when the server determines that the user corresponding to the client is not a web crawler does it send the target information corresponding to the target URL link to the client, which effectively prevents information leakage from the server.
Preferably, the first key value and the second key value are pseudorandom values. In this solution, the server can use a random function to generate the first key value, and the client can likewise use a random function to generate the second key value. Because a computer's random function produces its values by a deterministic algorithm, the result is determined and predictable, and the probability of this predictable result occurring is 100%. The values produced by a computer's random function are therefore not truly random but pseudorandom. That is, the first key value and the second key value in this solution are pseudorandom values.
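The determinism of a pseudorandom generator can be shown in a few lines of Python. The function name and the key format are assumptions; the point is only that the same seed always reproduces the same value.

```python
import random


def pseudo_random_key(seed: int, length: int = 16) -> str:
    """Generate a hex key from a seeded pseudorandom generator."""
    rng = random.Random(seed)  # deterministic: same seed, same sequence
    return "".join(rng.choice("0123456789abcdef") for _ in range(length))
```

Calling `pseudo_random_key` twice with the same seed returns the same string. Note that Python's `random` module is not intended for security purposes; its documentation points to the `secrets` module for security-sensitive values.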
The present invention also provides a system for identifying web crawlers, characterized in that it comprises a client and a server;
The client is configured to send a request for a preset URL link to the server;
The server is configured to generate a first key value according to the request for the preset URL link, generate a JS decryption script in the course of encrypting the first key value, and send the encrypted first key value and the JS decryption script to the client, wherein the JS decryption script is used to decrypt the encrypted first key value;
The client is configured to generate a second key value according to the encrypted first key value and the JS decryption script, and send the second key value to the server;
The server is configured to judge whether a first key value identical to the second key value exists, and if not, determine that the user corresponding to the client is a web crawler.
Preferably, the server is further configured to generate a first key value at intervals of a time period according to the request for the preset URL link, generate a JS decryption script in the course of encrypting the current first key value, and send the encrypted first key value and the JS decryption script to the client, wherein the JS decryption script is used to decrypt the encrypted first key value.
Preferably, the server is further configured to, when judging that no first key value identical to the second key value exists, intercept all requests sent by the client, or send deceptive information to the client.
Preferably, the client is further configured to send a request for a target URL link to the server;
The server is further configured to, when judging that a first key value identical to the second key value exists, look up the target information corresponding to the target URL link according to the request for the target URL link, and send the target information to the client.
Preferably, the first key value and the second key value are pseudorandom values.
On the basis of common knowledge in the art, the above preferred conditions can be combined arbitrarily to obtain the preferred embodiments of the present invention.
The positive effect of the present invention is as follows. Compared with the prior art, the present invention compares the first key value generated by the server with the second key value generated by the client and identifies web crawlers according to the comparison result, so that a web crawler can be identified on its first access to the server without extensive access-frequency detection, which saves CPU resources and improves identification efficiency. In addition, the server does not need to store first key values for requests for different URL links, which greatly reduces the storage space required on the server.
Description of the drawings
Fig. 1 is a flowchart of the web crawler identification method of Embodiment 1 of the present invention.
Fig. 2 is a flowchart of the web crawler identification method of Embodiment 2 of the present invention.
Detailed description
The present invention is further illustrated below by way of embodiments, but is not thereby limited to the scope of the described embodiments.
Embodiment 1
This embodiment provides a method for identifying web crawlers which, as shown in Fig. 1, comprises the following steps:
Step 101: a client sends a request for a preset URL link to a server;
Step 102: the server generates a first key value according to the request for the preset URL link, generates a JS decryption script in the course of encrypting the first key value, and sends the encrypted first key value and the JS decryption script to the client, wherein the JS decryption script is used to decrypt the encrypted first key value;
Step 103: the client generates a second key value according to the encrypted first key value and the JS decryption script, and sends the second key value and a request for a target URL link to the server;
Step 104: the server judges whether a first key value identical to the second key value exists; if so, step 105 is executed, and if not, step 106 is executed;
Step 105: the server looks up the target information corresponding to the target URL link according to the request for the target URL link, sends the target information to the client, and the flow ends;
Step 106: the user corresponding to the client is determined to be a web crawler.
In this embodiment, the first key value and the second key value are pseudorandom values generated using a random function.
A concrete example is given below to illustrate the web crawler identification method of this embodiment.
After receiving the /domestic/cas/key request sent by client A, the server generates a key according to the request information, encrypts the key multiple times, and dynamically generates a JS decryption script for the client. Sample code: Response.Write(encode(md5(DateTime.Now))); where the implementation of encode includes the corresponding decryption method for the client to use;
Client A generates key1 according to the received encrypted key and the JS decryption script, and sends key1 together with the /price request to the server. Sample code: var evalCode = window.eval;
The server judges whether key1 is identical to the key issued for /domestic/cas/key. Sample code: var checkResult = (Request["key"] == md5(DateTime.Now)); where checkResult is the check result. Both key and key1 are randomly generated values.
If they are judged identical, the server looks up the corresponding price according to the /price request and sends it to client A. If they are judged different, this shows that the /domestic/cas/key request sent by client A was forged, and the user corresponding to client A is determined to be a web crawler.
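The /price check in this example can be sketched in Python. The sample fragments compare the submitted key with md5(DateTime.Now) recomputed at check time, which can only match if the timestamp is truncated to a coarse granularity; the sketch assumes minute granularity and passes the timestamp in as a string. The function names, status codes and price payload are hypothetical, and MD5 is used only to mirror the sample.

```python
import hashlib

PRICES = {"/price": "hypothetical price payload"}


def issue_key(minute_stamp: str) -> str:
    """Key issued for /domestic/cas/key: MD5 of a minute-granularity timestamp."""
    return hashlib.md5(minute_stamp.encode()).hexdigest()


def handle_price(request_key: str, minute_stamp: str):
    """Serve /price only if key1 equals the key recomputed for the same minute."""
    check_result = request_key == issue_key(minute_stamp)
    if check_result:
        return 200, PRICES["/price"]
    return 403, "crawler suspected"
```

A request carrying the key issued within the same minute receives the price; a request with a missing or wrong key is treated as coming from a crawler.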
In the above example, the key generated by the server is compared with the key1 generated by client A, and web crawlers are identified according to the comparison result, so that a web crawler can be identified on its first access to the server. Compared with the prior art, no extensive access-frequency detection is needed, which saves CPU resources and improves identification efficiency. In addition, the server does not need to store first key values for requests for different URL links, which greatly reduces the storage space required on the server.
This embodiment also provides a system for identifying web crawlers, comprising a client and a server.
The client is configured to send a request for a preset URL link to the server;
The server is configured to generate a first key value according to the request for the preset URL link, generate a JS decryption script in the course of encrypting the first key value, and send the encrypted first key value and the JS decryption script to the client, wherein the JS decryption script is used to decrypt the encrypted first key value;
The client is configured to generate a second key value according to the encrypted first key value and the JS decryption script, and send the second key value and a request for a target URL link to the server;
The server is configured to judge whether a first key value identical to the second key value exists; if not, it determines that the user corresponding to the client is a web crawler; if so, it looks up the target information corresponding to the target URL link according to the request for the target URL link and sends the target information to the client.
Embodiment 2
This embodiment provides a method for identifying web crawlers that differs from the method of Embodiment 1 as follows: as shown in Fig. 2, step 102 of Embodiment 1 is replaced with the following step 202, and step 106 of Embodiment 1 is replaced with the following step 206:
Step 202: the server generates a first key value every 3 minutes according to the request for the preset URL link, generates a JS decryption script in the course of encrypting the current first key value, and sends the encrypted first key value and the JS decryption script to the client, wherein the JS decryption script is used to decrypt the encrypted first key value;
Step 206: the user corresponding to the client is determined to be a web crawler, and all requests sent by the client are intercepted.
In this embodiment, to avoid misidentifying normal users because of network delay, the server generates multiple first key values, and in step 104 the server needs to judge whether any of the multiple first key values is identical to the second key value. For example, if by the time step 104 is executed the server has generated three first key values in total, key10, key11 and key12, the server compares each of them with the second key value, key20; if none is identical, the user corresponding to the client is determined to be a web crawler.
In addition, after the user corresponding to the client is determined to be a web crawler, the interception of step 206 may be used, or the crawler may instead be deceived by sending deceptive information to the client.
This embodiment also provides a system for identifying web crawlers in which, on the basis of Embodiment 1, the server is further configured to generate a first key value at intervals of a time period according to the request for the preset URL link, generate a JS decryption script in the course of encrypting the current first key value, and send the encrypted first key value and the JS decryption script to the client; and, when judging that no first key value identical to the second key value exists, to intercept all requests sent by the client or send deceptive information to the client.
Although specific embodiments of the present invention have been described above, those skilled in the art will understand that they are merely illustrative and that the protection scope of the present invention is defined by the appended claims. Those skilled in the art can make various changes or modifications to these embodiments without departing from the principle and substance of the present invention, and all such changes and modifications fall within the protection scope of the present invention.

Claims (8)

1. A method for identifying web crawlers, characterized in that it comprises the following steps:
S1: a client sends a request for a preset URL link to a server;
S2: the server generates a first key value according to the request for the preset URL link, generates a JS decryption script in the course of encrypting the first key value, and sends the encrypted first key value and the JS decryption script to the client, wherein the JS decryption script is used to decrypt the encrypted first key value;
S3: the client generates a second key value according to the encrypted first key value and the JS decryption script, and sends the second key value to the server;
S4: the server judges whether a first key value identical to the second key value exists, and if not, determines that the user corresponding to the client is a web crawler;
wherein the first key value and the second key value are pseudorandom values generated using a random function.
2. The identification method according to claim 1, characterized in that step S2 is replaced with step S2':
S2': the server generates a first key value at intervals of a time period according to the request for the preset URL link, generates a JS decryption script in the course of encrypting the current first key value, and sends the encrypted first key value and the JS decryption script to the client, wherein the JS decryption script is used to decrypt the encrypted first key value.
3. The identification method according to claim 1 or 2, characterized in that step S4 further comprises: when the server judges that no first key value identical to the second key value exists, it intercepts all requests sent by the client, or sends deceptive information to the client.
4. The identification method according to claim 3, characterized in that
step S3 further comprises: the client sends a request for a target URL link to the server;
step S4 further comprises: when the server judges that a first key value identical to the second key value exists, it looks up the target information corresponding to the target URL link according to the request for the target URL link, and sends the target information to the client.
5. A system for identifying web crawlers, characterized in that it comprises a client and a server;
the client is configured to send a request for a preset URL link to the server;
the server is configured to generate a first key value according to the request for the preset URL link, generate a JS decryption script in the course of encrypting the first key value, and send the encrypted first key value and the JS decryption script to the client, wherein the JS decryption script is used to decrypt the encrypted first key value;
the client is configured to generate a second key value according to the encrypted first key value and the JS decryption script, and send the second key value to the server;
the server is configured to judge whether a first key value identical to the second key value exists, and if not, determine that the user corresponding to the client is a web crawler;
wherein the first key value and the second key value are pseudorandom values generated using a random function.
6. The identification system according to claim 5, characterized in that the server is further configured to generate a first key value at intervals of a time period according to the request for the preset URL link, generate a JS decryption script in the course of encrypting the current first key value, and send the encrypted first key value and the JS decryption script to the client, wherein the JS decryption script is used to decrypt the encrypted first key value.
7. The identification system according to claim 5 or 6, characterized in that the server is further configured to, when judging that no first key value identical to the second key value exists, intercept all requests sent by the client, or send deceptive information to the client.
8. The identification system according to claim 7, characterized in that
the client is further configured to send a request for a target URL link to the server;
the server is further configured to, when judging that a first key value identical to the second key value exists, look up the target information corresponding to the target URL link according to the request for the target URL link, and send the target information to the client.
CN201610134556.9A 2016-03-09 2016-03-09 Method and system for identifying web crawlers Active CN105577701B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610134556.9A CN105577701B (en) 2016-03-09 2016-03-09 Method and system for identifying web crawlers

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610134556.9A CN105577701B (en) 2016-03-09 2016-03-09 Method and system for identifying web crawlers

Publications (2)

Publication Number Publication Date
CN105577701A CN105577701A (en) 2016-05-11
CN105577701B true CN105577701B (en) 2018-11-09

Family

ID=55887356

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610134556.9A Active CN105577701B (en) 2016-03-09 2016-03-09 Method and system for identifying web crawlers

Country Status (1)

Country Link
CN (1) CN105577701B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106534193A (en) * 2016-12-16 2017-03-22 携程计算机技术(上海)有限公司 Request data processing method and request data processing system
CN106960158A (en) * 2017-03-22 2017-07-18 福建中金在线信息科技有限公司 A kind of method and apparatus for preventing blog from being retrieved by web crawlers
CN107426148B (en) * 2017-03-30 2020-07-31 成都优易数据有限公司 Crawler-resisting method and system based on running environment feature recognition
CN107800684B (en) * 2017-09-20 2018-09-18 贵州白山云科技有限公司 A kind of low frequency reptile recognition methods and device
CN108429785A (en) * 2018-01-17 2018-08-21 广东智媒云图科技股份有限公司 A kind of generation method, reptile recognition methods and the device of reptile identification encryption string
CN108429757A (en) * 2018-03-26 2018-08-21 成都睿码科技有限责任公司 A kind of the counter of guarding website resource climbs method
CN108769037B (en) * 2018-06-04 2020-11-10 厦门集微科技有限公司 Data processing method and device, computer storage medium and terminal
CN110012023B (en) * 2019-04-15 2020-06-09 重庆天蓬网络有限公司 Poison-throwing type anti-climbing method, system, terminal and medium
CN115037526B (en) * 2022-05-19 2024-04-19 咪咕文化科技有限公司 Anticreeper method, device, equipment and computer storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103279516A (en) * 2013-05-27 2013-09-04 百度在线网络技术(北京)有限公司 Web spider identification method
CN105306473A (en) * 2015-11-05 2016-02-03 北京奇虎科技有限公司 Method, client, server and system for preventing injection attacks

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090094372A1 (en) * 2007-10-05 2009-04-09 Nyang Daehun Secret user session managing method and system under web environment, recording medium recorded program executing it

Also Published As

Publication number Publication date
CN105577701A (en) 2016-05-11

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant