CN110020512A - Anti-crawler method, apparatus, device and storage medium - Google Patents

Anti-crawler method, apparatus, device and storage medium

Info

Publication number
CN110020512A
Authority
CN
China
Prior art keywords
user
access
abnormal
crawler
parameter set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910294378.XA
Other languages
Chinese (zh)
Inventor
孟凡杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Tianpeng Network Co Ltd
Original Assignee
Chongqing Tianpeng Network Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Tianpeng Network Co Ltd
Priority to CN201910294378.XA
Publication of CN110020512A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30: Authentication, i.e. establishing the identity or authorisation of security principals

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses an anti-crawler method, apparatus, device and storage medium. An access log is queried to determine whether abnormal access has occurred; once abnormal access is detected, the access log is analyzed according to the preset rules in a preset rule base to determine an abnormal user parameter set; the abnormal user parameter set is added to a blacklist; and the abnormal user is blocked according to the abnormal user parameter set in the blacklist. Crawler users are thereby identified and blocked automatically, which reduces the server overload caused by heavy crawler traffic, protects site resources, and improves both user experience and the operating efficiency of the website.

Description

Anti-crawler method, apparatus, device and storage medium
Technical field
The present invention relates to the field of information security technology, and in particular to an anti-crawler method, apparatus, device and storage medium.
Background
A web crawler is a computer program or script that runs automatically. It visits World Wide Web sites according to set rules and grabs their page information. Such crawling usually covers an entire site, or an entire element branch of it, in bulk, and frequently updated pages are crawled repeatedly at high density. Heavy crawler traffic can exhaust a site's server resources and so degrade access for normal users, and the large volume of site information that crawlers collect may be put to improper commercial use. The common anti-crawler approach today is for site maintenance staff to manually monitor the access behavior of the IP addresses visiting the site and judge whether each one is a crawler. Because this approach relies on manual monitoring, it is inefficient, so a method that identifies and blocks crawlers automatically is needed.
Summary of the invention
In view of the defects in the prior art, the present invention provides an anti-crawler method, apparatus, device and storage medium for identifying and blocking crawlers automatically.
In one aspect, the present invention provides an anti-crawler method, comprising: querying an access log to determine whether abnormal access has occurred; after determining that abnormal access has occurred, analyzing the access log according to the preset rules in a preset rule base to determine an abnormal user parameter set; adding the abnormal user parameter set to a blacklist; and blocking the abnormal user according to the abnormal user parameter set in the blacklist.
Preferably, determining whether abnormal access has occurred by querying the access log comprises: querying the access log according to a trigger condition; and determining whether abnormal access has occurred according to the access log and a first threshold of a first preset condition within a first preset time interval.
Preferably, the preset rules in the preset rule base include at least one of the following rules: whether the access behavior exceeds a second threshold of a second preset condition within a second preset time interval; whether the access request header contains a keyword from a preset field set; whether the access request header contains a keyword from a request-header whitelist; whether the IP address of the access request is in an IP address whitelist.
Preferably, the preset field set includes at least one of the following keywords:
Java, Python, C++, C#, PHP, Perl and Go.
Preferably, before the abnormal user is added to the blacklist, the method further comprises: adding the abnormal user parameter set to a re-verification list, so that the user becomes a re-verification user; receiving a page request from the re-verification user; sending a manual verification page to the re-verification user; receiving the result returned by the re-verification user for the manual verification page; and determining, according to the returned result, whether to add the re-verification user to the blacklist.
Preferably, the abnormal user parameter set includes at least an IP address.
Preferably, the fields in the request-header whitelist include search engine request-header fields.
In another aspect, the present invention provides an anti-crawler apparatus, comprising: a query module, an analysis module, a blacklist module and a filtering module. The query module is configured to query the access log to determine whether abnormal access has occurred; the analysis module is configured to analyze the access log according to the preset rules in the preset rule base to determine an abnormal user parameter set; the blacklist module is configured to add the abnormal user parameter set to a blacklist; and the filtering module is configured to block the abnormal user according to the abnormal user parameter set in the blacklist.
In another aspect, the present invention provides an anti-crawler device, comprising: at least one processor and a memory storing computer instructions that can run on the processor, wherein the processor runs the computer instructions to implement the above anti-crawler method.
In another aspect, the present invention provides a storage medium storing computer instructions, wherein the computer instructions, when executed by a processor, implement the above anti-crawler method.
The beneficial effects of the present invention are as follows: the access log is queried to determine whether abnormal access has occurred; the access log is analyzed according to the preset rules in the preset rule base to determine an abnormal user parameter set; the abnormal user parameter set is added to a blacklist; and the abnormal user is blocked according to the abnormal user parameter set in the blacklist. Crawler users are thereby identified and blocked automatically, which reduces the server overload caused by heavy crawler traffic, protects site resources, and improves both user experience and the operating efficiency of the website.
Brief description of the drawings
To illustrate the specific embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed for describing the specific embodiments or the prior art are briefly introduced below. Throughout the drawings, similar elements or parts are generally identified by similar reference numerals. The elements or parts in the drawings are not necessarily drawn to scale.
Fig. 1 is a flow diagram of an anti-crawler method provided by Embodiment One of the present invention;
Fig. 2 is a flow diagram of determining whether abnormal access has occurred by querying the access log in the anti-crawler method provided by Embodiment One of the present invention;
Fig. 3 is a flow diagram of re-verifying an abnormal user in the anti-crawler method provided by Embodiment One of the present invention;
Fig. 4 is a module diagram of an anti-crawler apparatus provided by Embodiment Two of the present invention.
Detailed description of the embodiments
Embodiments of the technical solution of the present invention are described in detail below with reference to the drawings. The following embodiments serve only to illustrate the technical solution more clearly; they are examples and do not limit the scope of protection of the invention.
It should be noted that, unless otherwise indicated, the technical and scientific terms used in this application have the ordinary meaning understood by a person of ordinary skill in the art to which the invention belongs.
As shown in Fig. 1, Embodiment One of the present invention provides an anti-crawler method, comprising:
S110: querying the access log to determine whether abnormal access has occurred;
S120: after determining that abnormal access has occurred, analyzing the access log according to the preset rules in the preset rule base to determine an abnormal user parameter set;
S130: adding the abnormal user parameter set to a blacklist;
S140: blocking the abnormal user according to the abnormal user parameter set in the blacklist.
In a specific embodiment, an automatically running script queries the access log and finds abnormal access in it, for example an excessive number of visits to a particular page. The access log is then analyzed further according to the preset rules in the preset rule base. For example, if the same IP address has accessed the particular page more than 100 times within the past 10 minutes, the user is considered an abnormal user. Information about that user, including the IP address and user name, is determined to be the user's abnormal user parameter set, and this parameter set is added to the blacklist. When the abnormal user issues a page request again, the user's parameter set is first checked against the blacklist; if it is in the blacklist, the page request is refused, which blocks the abnormal user. By these technical means, crawler users are identified and blocked automatically, which reduces the server overload caused by heavy crawler traffic, protects site resources, and improves both user experience and the operating efficiency of the website.
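Steps S110 to S140 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the log record layout, the function names `find_abnormal_users` and `handle_request`, the sample addresses, and the use of HTTP status 403 for a refused request are all hypothetical. Only the 10-minute window and the limit of 100 visits come from the example above.

```python
from collections import Counter
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=10)  # look-back window from the example above
MAX_HITS = 100                  # per-IP limit from the example above

def find_abnormal_users(log, now, page):
    """S120: build a parameter set for every IP that exceeds MAX_HITS on `page`."""
    recent = [r for r in log if r["page"] == page and now - r["time"] <= WINDOW]
    hits = Counter(r["ip"] for r in recent)
    return {
        r["ip"]: {"ip": r["ip"], "username": r["username"]}
        for r in recent
        if hits[r["ip"]] > MAX_HITS
    }

def handle_request(ip, blacklist):
    """S140: refuse blacklisted clients (assumed 403), serve everyone else (200)."""
    return 403 if ip in blacklist else 200

# Hypothetical log: one client hammering /item, one ordinary visitor.
now = datetime(2024, 1, 1, 12, 0, 0)
log = [{"time": now - timedelta(seconds=i), "ip": "203.0.113.7",
        "username": "u1", "page": "/item"} for i in range(150)]
log.append({"time": now - timedelta(minutes=1), "ip": "198.51.100.2",
            "username": "u2", "page": "/item"})

blacklist = {}
blacklist.update(find_abnormal_users(log, now, "/item"))  # S130
print(handle_request("203.0.113.7", blacklist))   # the heavy client
print(handle_request("198.51.100.2", blacklist))  # the ordinary visitor
```

Keying the blacklist by IP address keeps the lookup in S140 cheap; the rest of the parameter set (here the user name) travels with the entry.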
As shown in Fig. 2, in a preferred embodiment of the anti-crawler method provided by the present invention, determining whether abnormal access has occurred by querying the access log comprises:
S111: querying the access log according to a trigger condition;
S112: determining whether abnormal access has occurred according to the access log and the first threshold of the first preset condition within the first preset time interval.
In a specific embodiment, the trigger condition is the expiry of a timer, so that an automatic script queries the access log at regular intervals. The first preset time interval is 10 minutes, the first preset condition is the number of times the same IP address accesses a page or the website per unit time, and the first threshold is 100 times. That is, the access log is queried periodically, and abnormal access is considered to have occurred if the same IP address has accessed a particular page more than 100 times within 10 minutes.
In another specific embodiment, the trigger condition is that the server load exceeds a preset threshold, which triggers the script to query the access log. If the script finds that the website or a particular page has been accessed more than 3000 times in total by different users within the past 30 minutes, abnormal access is considered to have occurred.
Those skilled in the art will understand that the above specific embodiments do not limit the technical solution of the present invention. The trigger condition, the first preset time interval, the first preset condition and the first threshold can be set according to the actual situation in order to query the access log automatically and identify abnormal access.
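The two trigger conditions and the two example thresholds can be sketched as below. The tuple layout `(timestamp, ip, page)`, the function names, and the load limit of 0.8 are assumptions made for illustration; the windows and counts (10 minutes and 100, 30 minutes and 3000) are the ones given in the two embodiments above.

```python
from collections import Counter
from datetime import datetime, timedelta

def should_query(timer_expired, server_load, load_limit=0.8):
    """S111 trigger: a timer tick, or load above a (hypothetical) limit."""
    return timer_expired or server_load > load_limit

def per_ip_abnormal(log, now, window=timedelta(minutes=10), threshold=100):
    """S112, first example: one IP hit one page more than `threshold` times."""
    hits = Counter((ip, page) for ts, ip, page in log if now - ts <= window)
    return any(count > threshold for count in hits.values())

def total_abnormal(log, now, page, window=timedelta(minutes=30), threshold=3000):
    """S112, second example: `page` was hit more than `threshold` times overall."""
    total = sum(1 for ts, _ip, p in log if p == page and now - ts <= window)
    return total > threshold

# Hypothetical data: a burst from one address and light traffic from fifty.
now = datetime(2024, 1, 1, 12, 0, 0)
burst = [(now - timedelta(seconds=i), "203.0.113.7", "/item") for i in range(101)]
quiet = [(now - timedelta(seconds=i), f"198.51.100.{i}", "/item") for i in range(50)]

if should_query(timer_expired=True, server_load=0.2):
    print(per_ip_abnormal(burst + quiet, now))
    print(total_abnormal(burst + quiet, now, "/item"))
```

Passing the window and threshold as parameters mirrors the remark that these values are meant to be tuned per site.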
In a preferred embodiment of the anti-crawler method provided by the present invention, the preset rules of the preset rule base include at least one of the following rules:
Whether the access behavior exceeds the second threshold of the second preset condition within the second preset time interval; whether the access request header contains a keyword from the preset field set; whether the access request header contains a keyword from the request-header whitelist; whether the IP address of the access request is in the IP address whitelist. Here the request header is the HTTP request header.
In a specific embodiment, the access log is analyzed with a single preset rule. For example, a particular page is known to have been accessed more than 1000 times in the past 30 minutes, so abnormal access is considered to have occurred. The preset rule "whether the access behavior exceeds the second threshold of the second preset condition within the second preset time interval", here whether the same IP address has accessed the particular page more than 100 times in the past 10 minutes, is applied to all access requests, and the user parameter sets (such as the IP addresses) of the access requests that satisfy the rule are extracted. In another feasible embodiment, the rule may instead be that the same user has performed the same operation on the same page more than 100 times within 10 minutes.
In another specific embodiment of the present invention, the access log is analyzed with a combination of two or more rules. For example, a particular page is known to have been accessed more than 1000 times in the past 30 minutes, so abnormal access is considered to have occurred. The rule on the second threshold of the second preset condition within the second preset time interval identifies the IP addresses or users whose visits exceeded 100 within 10 minutes. The access request header is then checked for keywords in the whitelist, such as the keywords of common search engines, and access by common search engines is let through. The IP address of the access request is also checked against the IP address whitelist, and access from whitelisted IP addresses is let through. The IP addresses in the IP address whitelist are known, trusted addresses, for example the IP addresses of the company's branch offices or of trusted users. For the remaining IP addresses or users, the user parameter sets are obtained and added to the blacklist as abnormal user parameter sets.
In another specific embodiment, the access log is likewise analyzed with a combination of two or more rules. For example, a particular page is known to have been accessed more than 1000 times in the past 30 minutes, so abnormal access is considered to have occurred. The rule on whether the access request header contains a field from the preset field set identifies the IP addresses or users whose request headers carry a crawler-specific keyword, such as Java, Python or PHP, in the User-Agent field; these IP addresses or users are provisionally suspected of being abnormal users. The access request header is then checked for keywords in the whitelist, such as the keywords of common search engines, and access by common search engines is let through. The IP address of the access request is also checked against the IP address whitelist, and access from whitelisted IP addresses is let through. The IP addresses in the IP address whitelist are known, trusted addresses, for example the IP addresses of the company's branch offices or of trusted users. For the remaining IP addresses or users, the user parameter sets are obtained and added to the blacklist as abnormal user parameter sets.
As the above specific embodiments show, analyzing the access log automatically with a single preset rule to determine the abnormal user parameter set filters out abnormal users quickly and efficiently so that they can be blocked. Analyzing the access log with a combination of several preset rules avoids affecting normal users as far as possible and improves the accuracy with which crawlers are blocked. Those skilled in the art will understand that the preset rules in the rule base can be combined according to the actual situation so that crawlers are identified and blocked quickly and automatically without interfering with the operations of normal users.
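One way to combine the rate rule with the two whitelists is sketched below. The function name `is_abnormal`, the constant names and the trusted address are hypothetical, and only two of the search engine keywords mentioned in this document are used. The order of the checks (rate rule first, then the two pass-through rules) follows the combined-rule embodiment above.

```python
SEARCH_ENGINE_KEYWORDS = ("googlebot", "baiduspider")  # subset, for illustration
IP_WHITELIST = {"192.0.2.10"}   # hypothetical trusted address
SECOND_THRESHOLD = 100          # visits per 10-minute window, from the example

def is_abnormal(ip, user_agent, hits_in_window):
    """Return True if the client's parameter set should go to the blacklist."""
    if hits_in_window <= SECOND_THRESHOLD:
        return False                     # rate rule not triggered
    ua = user_agent.lower()
    if any(keyword in ua for keyword in SEARCH_ENGINE_KEYWORDS):
        return False                     # request-header whitelist: let through
    if ip in IP_WHITELIST:
        return False                     # IP address whitelist: let through
    return True

print(is_abnormal("203.0.113.7", "python-requests/2.31.0", 500))
print(is_abnormal("203.0.113.8", "Mozilla/5.0 (compatible; Googlebot/2.1)", 500))
print(is_abnormal("192.0.2.10", "internal-monitor", 500))
```

Each pass-through rule only ever removes clients from the suspect set, which is why adding rules lowers the chance of blocking a normal user.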
Preferably, the preset field set includes at least one of the following keywords: Java, Python, C++, C#, PHP, Perl and Go. When the User-Agent field of the HTTP request header contains one of these keywords, the request was most likely not issued by a browser, so the user who issued it is most likely a crawler user.
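A hedged sketch of the User-Agent keyword check follows. The text only says that the field "contains" a keyword; matching on whole tokens rather than raw substrings is an assumption made here, because a plain substring test for "go" would also match "google". The function name and the sample User-Agent strings are illustrative.

```python
import re

# Keywords of the preset field set, lower-cased for comparison.
CRAWLER_KEYWORDS = {"java", "python", "c++", "c#", "php", "perl", "go"}

def suspicious_user_agent(user_agent):
    """True if any token of the User-Agent equals a crawler keyword."""
    # Split on everything except letters, digits, '+' and '#',
    # so that "c++" and "c#" survive as single tokens.
    tokens = re.split(r"[^a-z0-9+#]+", user_agent.lower())
    return any(token in CRAWLER_KEYWORDS for token in tokens)

print(suspicious_user_agent("python-requests/2.31.0"))
print(suspicious_user_agent("Go-http-client/1.1"))
print(suspicious_user_agent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0"))
```

A match is only grounds for suspicion: the whitelist rules and the optional manual verification step decide whether the client is actually blocked.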
As shown in Fig. 3, in a preferred embodiment of the anti-crawler method provided by the present invention, before the abnormal user is added to the blacklist, the method further comprises:
S131: adding the abnormal user parameter set to a re-verification list, so that the user becomes a re-verification user;
S132: receiving a page request from the re-verification user;
S133: sending a manual verification page to the re-verification user;
S134: receiving the result returned by the re-verification user for the manual verification page;
S135: determining, according to the returned result, whether to add the re-verification user to the blacklist.
In a specific embodiment, the abnormal user parameter set is added to the re-verification list and the user becomes a re-verification user. When responding to a page request from the re-verification user, a manual verification page containing a manual verification method is sent to that user, for example a page with a verification code, a page with a slider check, or a page with an image recognition check. The result returned by the re-verification user for the manual verification page is then received. If verification succeeded, the re-verification user is removed from the re-verification list; if verification failed, the user parameter set of the re-verification user is added to the blacklist, and the user is blocked through the blacklist.
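Steps S131 to S135 amount to a small state machine, sketched below. The dictionaries, function names and the string return values ("blocked", "challenge", "page") are hypothetical; how the challenge itself is rendered and checked is outside this sketch.

```python
reverify_list = {}  # ip -> parameter set of users awaiting manual verification
blacklist = {}      # ip -> parameter set of blocked users

def mark_suspect(params):
    """S131: put the abnormal user parameter set on the re-verification list."""
    reverify_list[params["ip"]] = params

def handle_page_request(ip):
    """S132/S133: decide what to send back for a page request."""
    if ip in blacklist:
        return "blocked"
    if ip in reverify_list:
        return "challenge"   # e.g. a verification code or slider page
    return "page"

def handle_challenge_result(ip, passed):
    """S134/S135: clear the user on success, blacklist on failure."""
    params = reverify_list.pop(ip, None)
    if params is not None and not passed:
        blacklist[ip] = params

mark_suspect({"ip": "203.0.113.7", "username": "u1"})
mark_suspect({"ip": "198.51.100.2", "username": "u2"})
print(handle_page_request("203.0.113.7"))
handle_challenge_result("203.0.113.7", passed=False)
handle_challenge_result("198.51.100.2", passed=True)
print(handle_page_request("203.0.113.7"), handle_page_request("198.51.100.2"))
```

Removing the entry from the re-verification list in both outcomes keeps the two lists disjoint, so a client is never both challenged and blocked.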
Preferably, the abnormal user parameter set includes at least an IP address. It may also include a user name, a user ID, the user's mobile phone number, or other information that can be used to identify the user.
Preferably, the fields in the request-header whitelist include search engine request-header fields. For example, if the User-Agent field of the HTTP request header contains one of the following whitelist fields: googlebot, mediapartners-google, baiduspider, sogou spider, sogou web sosospider, 360spider, yahoo, msn, msnbot, Sohu, yodaoBot, twiceler, ia_archiver, iaarchiver, slurp, bot, then the access is considered to come from a search engine and is not treated as a malicious crawler.
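The whitelist check can be sketched as a case-insensitive substring test. The function name is hypothetical and only a subset of the fields listed above is used. Because the User-Agent value is supplied by the client, this check by itself cannot prove that a request really comes from a search engine.

```python
# Subset of the whitelist fields named in the text, lower-cased.
SEARCH_ENGINE_FIELDS = (
    "googlebot", "mediapartners-google", "baiduspider",
    "sogou spider", "360spider", "msnbot", "slurp",
)

def from_search_engine(user_agent):
    """True if the User-Agent contains one of the whitelist fields."""
    ua = user_agent.lower()
    return any(field in ua for field in SEARCH_ENGINE_FIELDS)

print(from_search_engine("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))
print(from_search_engine("python-requests/2.31.0"))
```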
Embodiment Two of the present invention provides an anti-crawler apparatus, comprising: a query module M110, an analysis module M120, a blacklist module M130 and a filtering module M140.
The query module M110 is configured to query the access log to determine whether abnormal access has occurred. The analysis module M120 is configured to analyze the access log, after abnormal access has been determined, according to the preset rules in the preset rule base to determine an abnormal user parameter set. The blacklist module M130 is configured to add the abnormal user parameter set to a blacklist. The filtering module M140 is configured to block the abnormal user according to the abnormal user parameter set in the blacklist.
In a specific embodiment, an automatically running script of the query module M110 queries the access log and finds abnormal access in it, for example an excessive number of visits to a particular page. The access log is then analyzed further by the analysis module M120 according to the preset rules in the preset rule base. For example, if the same IP address has accessed the particular page more than 100 times within the past 10 minutes, the user is considered an abnormal user, and information about that user, including the IP address and user name, is determined to be the user's abnormal user parameter set. The blacklist module M130 adds this parameter set to the blacklist. When the abnormal user issues a page request again, the filtering module M140 first checks the user's parameter set against the blacklist; if it is in the blacklist, the page request is refused, which blocks the abnormal user. By these technical means, crawler users are identified and blocked automatically, which reduces the server overload caused by heavy crawler traffic, protects site resources, and improves both user experience and the operating efficiency of the website.
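The division of work among the four modules might look like the following sketch. The class and method names are hypothetical, and the input is assumed to be a precomputed mapping from IP address to the number of visits in the last 10 minutes rather than a raw access log.

```python
class QueryModule:                      # M110
    def __init__(self, threshold=100):
        self.threshold = threshold
    def has_abnormal_access(self, hits_by_ip):
        return any(n > self.threshold for n in hits_by_ip.values())

class AnalysisModule:                   # M120
    def __init__(self, threshold=100):
        self.threshold = threshold
    def abnormal_params(self, hits_by_ip):
        return [{"ip": ip} for ip, n in hits_by_ip.items() if n > self.threshold]

class BlacklistModule:                  # M130
    def __init__(self):
        self.entries = {}
    def add(self, params):
        self.entries[params["ip"]] = params

class FilterModule:                     # M140
    def __init__(self, blacklist):
        self.blacklist = blacklist
    def allow(self, ip):
        return ip not in self.blacklist.entries

# Hypothetical visit counts for the last 10 minutes.
hits = {"203.0.113.7": 150, "198.51.100.2": 3}
query, analysis = QueryModule(), AnalysisModule()
blacklist_module = BlacklistModule()
request_filter = FilterModule(blacklist_module)

if query.has_abnormal_access(hits):
    for params in analysis.abnormal_params(hits):
        blacklist_module.add(params)

print(request_filter.allow("203.0.113.7"), request_filter.allow("198.51.100.2"))
```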
In a preferred embodiment of the anti-crawler apparatus provided by the present invention, the query module includes:
A trigger module, configured to query the access log according to a trigger condition;
A judgment module, configured to determine whether abnormal access has occurred according to the access log and the first threshold of the first preset condition within the first preset time interval.
In a specific embodiment, the trigger condition of the trigger module is the expiry of a timer, so that an automatic script queries the access log at regular intervals. For the judgment module, the first preset time interval is 10 minutes, the first preset condition is the number of times the same IP address accesses a page or the website per unit time, and the first threshold is 100 times. That is, the access log is queried periodically, and abnormal access is considered to have occurred if the same IP address has accessed a particular page more than 100 times within 10 minutes.
In another specific embodiment, the trigger condition of the trigger module is that the server load exceeds a preset threshold, which triggers the script to query the access log. If the script finds that the website or a particular page has been accessed more than 300 times in total by different users within the past 30 minutes, the judgment module determines by this rule that abnormal access has occurred.
Those skilled in the art will understand that the above specific embodiments do not limit the technical solution of the present invention. The trigger condition, the first preset time interval, the first preset condition and the first threshold can be set according to the actual situation in order to query the access log automatically and identify abnormal access.
In a preferred embodiment of the anti-crawler apparatus provided by the present invention, the preset rules of the preset rule base include at least one of the following rules:
Whether the access behavior exceeds the second threshold of the second preset condition within the second preset time interval; whether the access request header contains a keyword from the preset field set; whether the access request header contains a keyword from the request-header whitelist; whether the IP address of the access request is in the IP address whitelist.
In a specific embodiment, the analysis module analyzes the access log with a single preset rule. For example, a particular page is known to have been accessed more than 1000 times in the past 30 minutes, so abnormal access is considered to have occurred. The preset rule "whether the access behavior exceeds the second threshold of the second preset condition within the second preset time interval", here whether the same IP address has accessed the particular page more than 100 times in the past 10 minutes, is applied to all access requests, and the user parameter sets (such as the IP addresses) of the access requests that satisfy the rule are extracted. In another feasible embodiment, the rule may instead be that the same user has performed the same operation on the same page more than 100 times within 10 minutes.
In another specific embodiment of the present invention, the analysis module analyzes the access log with a combination of two or more rules. For example, a particular page is known to have been accessed more than 1000 times in the past 30 minutes, so abnormal access is considered to have occurred. The rule on the second threshold of the second preset condition within the second preset time interval identifies the IP addresses or users whose visits exceeded 100 within 10 minutes. The access request header is then checked for keywords in the whitelist, such as the keywords of common search engines, and access by common search engines is let through. The IP address of the access request is also checked against the IP address whitelist, and access from whitelisted IP addresses is let through. The IP addresses in the IP address whitelist are known, trusted addresses, for example the IP addresses of the company's branch offices or of trusted users. For the remaining IP addresses or users, the user parameter sets are obtained and added to the blacklist as abnormal user parameter sets.
In another specific embodiment, the analysis module likewise analyzes the access log with a combination of two or more rules. For example, a particular page is known to have been accessed more than 1000 times in the past 30 minutes, so abnormal access is considered to have occurred. The rule on whether the access request header contains a field from the preset field set identifies the IP addresses or users whose request headers carry a crawler-specific keyword, such as Java, Python or PHP, in the User-Agent field; these IP addresses or users are provisionally suspected of being abnormal users. The access request header is then checked for keywords in the whitelist, such as the keywords of common search engines, and access by common search engines is let through. The IP address of the access request is also checked against the IP address whitelist, and access from whitelisted IP addresses is let through. The IP addresses in the IP address whitelist are known, trusted addresses, for example the IP addresses of the company's branch offices or of trusted users. For the remaining IP addresses or users, the user parameter sets are obtained and added to the blacklist as abnormal user parameter sets.
As the above specific embodiments show, analyzing the access log automatically with a single preset rule to determine the abnormal user parameter set filters out abnormal users quickly and efficiently so that they can be blocked. Analyzing the access log with a combination of several preset rules avoids affecting normal users as far as possible and improves the accuracy with which crawlers are blocked.
Preferably, the preset field set includes at least one of the following keywords: Java, Python, C++, C#, PHP, Perl and Go. When the User-Agent field of the HTTP request header contains one of these keywords, the request was most likely not issued by a browser, so the user who issued it is most likely a crawler user.
In a preferred embodiment of the anti-crawler apparatus provided by the present invention, the anti-crawler apparatus further includes:
A verification list module, configured to add the abnormal user parameter set to a re-verification list, so that the user becomes a re-verification user;
A receiving module, configured to receive a page request from the re-verification user;
A verification module, configured to send a manual verification page to the re-verification user;
The receiving module is further configured to receive the result returned by the re-verification user for the manual verification page;
The verification module is further configured to determine, according to the returned result, whether to add the re-verification user to the blacklist.
In a specific embodiment, the verification list module adds the abnormal user parameter set to the re-verification list and the user becomes a re-verification user. When the receiving module responds to a page request from the re-verification user, the verification module sends that user a manual verification page containing a manual verification method, for example a page with a verification code, a page with a slider check, or a page with an image recognition check. The verification module receives the result returned by the re-verification user for the manual verification page. If verification succeeded, the re-verification user is removed from the re-verification list; if verification failed, the user parameter set of the re-verification user is added to the blacklist, and the user is blocked through the blacklist.
Preferably, the abnormal user parameter set includes at least an IP address. It may also include a user name, a user ID, the user's mobile phone number, or other information that can be used to identify the user.
Preferably, the fields in the request-header whitelist include search engine request-header fields. For example, if the User-Agent field of the HTTP request header contains one of the following whitelist fields: googlebot, mediapartners-google, baiduspider, sogou spider, sogou web sosospider, 360spider, yahoo, msn, msnbot, Sohu, yodaoBot, twiceler, ia_archiver, iaarchiver, slurp, bot, then the access is considered to come from a search engine and is not treated as a malicious crawler.
Embodiment three of the present invention provides an anti-crawler equipment, comprising: at least one processor and a memory storing computer instructions that can run on the processor; wherein the processor runs the computer instructions to implement the above anti-crawler method. The processor may be a single centralized processor, multiple processor clusters, or multiple distributed processors; the processor includes any one of a central processing unit (CPU), a microcontroller, a programmable logic controller (PLC), a programmable device, other processing equipment, or a combination thereof. As a non-limiting example, the processor may include an application-specific integrated circuit (ASIC), a system on chip (SOC), a logic gate array, a programmable gate array (for example, a field-programmable gate array (FPGA)), other hardware elements, or a combination thereof. The processor is used to execute the computer-readable instructions stored in the memory. The memory may include a non-transitory computer-readable storage medium. As a non-limiting example, the memory includes volatile storage (for example, random access memory (RAM)), non-volatile storage (for example, read-only memory (ROM)), or a combination thereof. As a non-limiting example, the memory may include dynamic RAM (DRAM), electrically programmable read-only memory (EPROM), a hard disk drive, a solid state drive, a flash drive, a magnetic disk, removable media (a memory card, thumb drive, optical disc, etc.), or other storage devices. Those skilled in the art will understand that the above embodiments and the specific implementations within them can be applied to this embodiment.
Embodiment four of the present invention provides a storage medium in which computer instructions are stored; the above anti-crawler method is implemented when the computer instructions are executed by a processor. The storage medium may include a non-transitory computer-readable storage medium. As a non-limiting example, it may include dynamic RAM (DRAM), electrically programmable read-only memory (EPROM), a hard disk drive, a solid state drive, a flash drive, a magnetic disk, removable media (a memory card, thumb drive, optical disc, etc.), a remote storage device, a cloud storage device, or other storage devices. Those skilled in the art will understand that the above embodiments and the specific implementations within them can be applied to this embodiment.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention, and shall all fall within the scope of the claims and the description of the present invention.

Claims (10)

1. An anti-crawler method, characterized by comprising:
Querying an access log to determine whether abnormal access has occurred;
After determining that abnormal access has occurred, analyzing the access log according to the preset rules in a preset rule library to determine an abnormal user parameter set;
Adding the abnormal user parameter set to a blacklist;
Blocking the abnormal user according to the abnormal user parameter set in the blacklist.
2. The anti-crawler method according to claim 1, characterized in that determining whether abnormal access has occurred by querying the access log comprises:
Querying the access log according to a trigger condition;
Determining whether abnormal access has occurred according to the access log and a first threshold of a first preset condition within a first preset time interval.
3. The anti-crawler method according to claim 1, characterized in that the preset rules in the preset rule library of the anti-crawler method include at least one of the following rules:
Whether the access behavior exceeds a second threshold of a second preset condition within a second preset time interval;
Whether the access request header contains a keyword in a preset field set;
Whether the access request header contains a keyword in the request header whitelist;
Whether the IP address of the access request is in the IP address whitelist.
4. The anti-crawler method according to claim 3, characterized in that the preset field set includes at least one of the following keywords:
Java, Python, C++, C#, PHP, Perl, PHP and GO.
5. The anti-crawler method according to claim 1, characterized in that, before adding the abnormal user to the blacklist, the method further comprises:
Adding the abnormal user parameter set to a re-verification list as a re-verification user;
Receiving a page request from the re-verification user;
Sending a manual verification page to the re-verification user;
Receiving the return result of the manual verification page from the re-verification user;
Determining, according to the return result, whether to add the re-verification user to the blacklist.
6. The anti-crawler method according to claim 1, characterized in that the abnormal user parameter set includes at least the user's IP address.
7. The anti-crawler method according to claim 1, characterized in that the fields in the request header whitelist include search engine request header fields.
8. An anti-crawler apparatus, characterized by comprising: a query module, an analysis module, a blacklist module and a filtering module;
The query module is configured to determine whether abnormal access has occurred by querying an access log;
The analysis module is configured to analyze the access log according to the preset rules in a preset rule library to determine an abnormal user parameter set;
The blacklist module is configured to add the abnormal user parameter set to a blacklist;
The filtering module is configured to block the abnormal user according to the abnormal user parameter set in the blacklist.
9. An anti-crawler equipment, characterized by comprising: at least one processor and a memory storing computer instructions that can run on the processor;
Wherein the processor runs the computer instructions to implement the method of claims 1-7.
10. A storage medium in which computer instructions are stored, characterized in that the method of claims 1-7 is implemented when the computer instructions are executed by a processor.
CN201910294378.XA 2019-04-12 2019-04-12 A kind of method, apparatus, equipment and the storage medium of anti-crawler Pending CN110020512A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910294378.XA CN110020512A (en) 2019-04-12 2019-04-12 A kind of method, apparatus, equipment and the storage medium of anti-crawler

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910294378.XA CN110020512A (en) 2019-04-12 2019-04-12 A kind of method, apparatus, equipment and the storage medium of anti-crawler

Publications (1)

Publication Number Publication Date
CN110020512A true CN110020512A (en) 2019-07-16

Family

ID=67191220

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910294378.XA Pending CN110020512A (en) 2019-04-12 2019-04-12 A kind of method, apparatus, equipment and the storage medium of anti-crawler

Country Status (1)

Country Link
CN (1) CN110020512A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013185612A1 (en) * 2012-06-13 2013-12-19 腾讯科技(深圳)有限公司 Method and device for determining security information of unknown file in cloud security system
CN104601601A (en) * 2015-02-25 2015-05-06 小米科技有限责任公司 Web crawler detecting method and device
CN106657057A (en) * 2016-12-20 2017-05-10 北京金堤科技有限公司 Anti-crawler system and method
CN107590227A (en) * 2017-09-05 2018-01-16 成都知道创宇信息技术有限公司 A kind of log analysis method of combination reptile
CN109145185A (en) * 2018-02-02 2019-01-04 北京数安鑫云信息技术有限公司 It identifies web crawlers and extracts the method and device of web crawlers feature
CN109298987A (en) * 2017-07-25 2019-02-01 北京国双科技有限公司 A kind of method and device detecting web crawlers operating status

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111064745A (en) * 2019-12-30 2020-04-24 厦门市美亚柏科信息股份有限公司 Self-adaptive back-climbing method and system based on abnormal behavior detection
CN111064745B (en) * 2019-12-30 2022-06-03 厦门市美亚柏科信息股份有限公司 Self-adaptive back-climbing method and system based on abnormal behavior detection
CN112165445A (en) * 2020-08-13 2021-01-01 杭州数梦工场科技有限公司 Method, device, storage medium and computer equipment for detecting network attack
CN112688919A (en) * 2020-12-11 2021-04-20 杭州安恒信息技术股份有限公司 APP interface-based crawler-resisting method, device and medium
CN113810358A (en) * 2021-02-05 2021-12-17 京东科技控股股份有限公司 Access limiting method, device, computer equipment and storage medium
CN114553541A (en) * 2022-02-17 2022-05-27 苏州良医汇网络科技有限公司 Method, device and equipment for verifying crawler prevention in grading manner and storage medium
CN114553541B (en) * 2022-02-17 2024-02-06 苏州良医汇网络科技有限公司 Method, device, equipment and storage medium for checking anti-crawlers in grading mode

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190716