CN110020512A - Anti-crawler method, apparatus, device, and storage medium - Google Patents
- Publication number
- CN110020512A (application number CN201910294378.XA)
- Authority
- CN
- China
- Prior art keywords
- user
- access
- abnormal
- crawler
- parameter set
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/30—Authentication, i.e. establishing the identity or authorisation of security principals
Landscapes
- Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Theoretical Computer Science (AREA)
- Computer Hardware Design (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Debugging And Monitoring (AREA)
Abstract
The invention discloses an anti-crawler method, apparatus, device, and storage medium. An access log is queried to determine whether abnormal access has occurred; once abnormal access is determined, the access log is analyzed according to preset rules in a preset rule base to determine an abnormal-user parameter set; the abnormal-user parameter set is added to a blacklist; and the abnormal user is blocked according to the abnormal-user parameter set in the blacklist. Crawler users are thereby identified and blocked automatically, which reduces the excessive consumption of server resources caused by heavy crawler traffic, protects site resources, and improves both user experience and website operating efficiency.
Description
Technical field
The present invention relates to the field of information security technology, and in particular to an anti-crawler method, apparatus, device, and storage medium.
Background art
A web crawler is an automatically executed computer program or script that visits World Wide Web sites according to set rules and captures page information from them. Such crawling usually captures an entire site, or entire branches of it, in bulk, and pages that are updated frequently are crawled repeatedly at high density. Heavy crawler traffic may exhaust a site's server resources and thereby affect access by normal users, and the large amount of site information captured by crawlers may be put to improper commercial use. The anti-crawler method in common use today relies on site maintenance staff manually monitoring the access behavior of the IP addresses that visit the site to decide whether a visitor is a crawler. Because this requires manual monitoring, it is inefficient, so a method that identifies and blocks crawlers automatically is needed.
Summary of the invention
In view of the defects in the prior art, the present invention provides an anti-crawler method, apparatus, device, and storage medium that address the problem of automatically identifying and blocking crawlers.
In one aspect, the present invention provides an anti-crawler method, comprising: querying an access log to determine whether abnormal access has occurred; after determining that abnormal access has occurred, analyzing the access log according to preset rules in a preset rule base to determine an abnormal-user parameter set; adding the abnormal-user parameter set to a blacklist; and blocking the abnormal user according to the abnormal-user parameter set in the blacklist.
Preferably, determining whether abnormal access has occurred by querying the access log comprises: querying the access log according to a trigger condition; and determining whether abnormal access has occurred according to the access log and a first threshold of a first preset condition within a first preset time interval.
Preferably, the preset rules in the preset rule base of the anti-crawler method include at least one of the following rules: whether the access behavior exceeds a second threshold of a second preset condition within a second preset time interval; whether the access request header contains a keyword from a preset field set; whether the access request header contains a keyword from a request-header whitelist; whether the IP address of the access request is in an IP address whitelist.
Preferably, the preset field set includes at least one of the following keywords:
Java, Python, C++, C#, PHP, Perl and GO.
Preferably, before the abnormal user is added to the blacklist, the method further comprises: adding the abnormal-user parameter set to a re-verification list, so that the user becomes a user to be re-verified; receiving a page request from the user to be re-verified; sending a manual verification page to the user to be re-verified; receiving the result returned from the manual verification page by the user to be re-verified; and determining, according to the returned result, whether to add the user to be re-verified to the blacklist.
Preferably, the abnormal-user parameter set includes at least an IP address.
Preferably, the fields in the request-header whitelist include search engine request header fields.
In another aspect, the present invention provides an anti-crawler apparatus, comprising: a query module, an analysis module, a blacklist module, and a filtering module. The query module is configured to determine whether abnormal access has occurred by querying the access log; the analysis module is configured to analyze the access log according to the preset rules in the preset rule base to determine the abnormal-user parameter set; the blacklist module is configured to add the abnormal-user parameter set to the blacklist; and the filtering module is configured to block the abnormal user according to the abnormal-user parameter set in the blacklist.
In another aspect, the present invention provides an anti-crawler device, comprising: at least one processor and a memory storing computer instructions that can run on the processor, wherein the processor is configured to run the computer instructions to implement the above anti-crawler method.
In another aspect, the present invention provides a storage medium in which computer instructions are stored, characterized in that the computer instructions, when executed by a processor, implement the above anti-crawler method.
The beneficial effects of the present invention are as follows: the access log is queried to determine whether abnormal access has occurred; the access log is analyzed according to the preset rules in the preset rule base to determine the abnormal-user parameter set; the abnormal-user parameter set is added to the blacklist; and the abnormal user is blocked according to the abnormal-user parameter set in the blacklist. Crawler users are thereby identified and blocked automatically, which reduces the excessive consumption of server resources caused by heavy crawler traffic, protects site resources, and improves both user experience and website operating efficiency.
Brief description of the drawings
To illustrate the specific embodiments of the invention or the technical solutions in the prior art more clearly, the drawings needed for describing the specific embodiments or the prior art are briefly introduced below. Throughout the drawings, similar elements or parts are generally identified by similar reference numerals. The elements or parts in the drawings are not necessarily drawn to scale.
Fig. 1 is a flow diagram of the anti-crawler method provided by Embodiment 1 of the present invention;
Fig. 2 is a flow diagram of determining whether abnormal access has occurred by querying the access log, in the anti-crawler method provided by Embodiment 1 of the present invention;
Fig. 3 is a flow diagram of re-verifying an abnormal user, in the anti-crawler method provided by Embodiment 1 of the present invention;
Fig. 4 is a module diagram of the anti-crawler apparatus provided by Embodiment 2 of the present invention.
Detailed description of the embodiments
Embodiments of the technical solution of the present invention are described in detail below with reference to the drawings. The following embodiments serve only to illustrate the technical solution more clearly; they are examples and must not be used to limit the scope of protection of the invention.
It should be noted that, unless otherwise stated, the technical or scientific terms used in this application have the ordinary meaning understood by those of ordinary skill in the art to which the invention belongs.
As shown in Fig. 1, Embodiment 1 of the present invention provides an anti-crawler method, comprising:
S110, querying the access log to determine whether abnormal access has occurred;
S120, after determining that abnormal access has occurred, analyzing the access log according to the preset rules in the preset rule base to determine the abnormal-user parameter set;
S130, adding the abnormal-user parameter set to the blacklist;
S140, blocking the abnormal user according to the abnormal-user parameter set in the blacklist.
In a specific embodiment, an automatically running script queries the access log and finds abnormal access in it, for example an excessive number of visits to a specific page. The access log is then analyzed further according to the preset rules in the preset rule base. For example, if the same IP has accessed the specific page more than 100 times in the past 10 minutes, that user is considered an abnormal user; the information about the user, including the user's IP address, user name and so on, is determined to be the user's abnormal-user parameter set, and this parameter set is added to the blacklist. When the abnormal user initiates a page request again, the user's parameter set is first checked against the blacklist; if it is in the blacklist, the page request is refused, which blocks the abnormal user. By these technical means, crawler users are identified and blocked automatically, which reduces the excessive consumption of server resources caused by heavy crawler traffic, protects site resources, and improves both user experience and website operating efficiency.
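Steps S120 to S140 of this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: the log is assumed to be already restricted to the last 10 minutes and reduced to `(ip, page)` pairs, the names `find_abnormal_ips` and `respond` are hypothetical, and answering blocked clients with HTTP status 403 is an assumed choice (the text only says the request is refused).

```python
from collections import Counter

PER_IP_LIMIT = 100  # "more than 100 times", taken from the example in the text


def find_abnormal_ips(log_entries, limit=PER_IP_LIMIT):
    """S120: return the IPs that hit one page more than `limit` times.

    `log_entries` is assumed to hold (ip, page) pairs from the last 10 minutes.
    """
    hits = Counter(log_entries)  # key: (ip, page), value: number of requests
    return {ip for (ip, _page), count in hits.items() if count > limit}


def respond(ip, blacklist):
    """S140: refuse blacklisted clients (status 403 is an assumption)."""
    return 403 if ip in blacklist else 200


# Hypothetical log: one client makes 101 requests, another makes 5.
access_log = [("198.51.100.7", "/item/1")] * 101 + [("203.0.113.9", "/item/1")] * 5

blacklist = set()
blacklist |= find_abnormal_ips(access_log)  # S130: add abnormal users to the blacklist
```

Here the parameter set is reduced to the IP address alone; a fuller sketch would store user name or user ID alongside it, as the text allows.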
As shown in Fig. 2, in a preferred embodiment of the anti-crawler method provided by the invention, determining whether abnormal access has occurred by querying the access log comprises:
S111, querying the access log according to a trigger condition;
S112, determining whether abnormal access has occurred according to the access log and the first threshold of the first preset condition within the first preset time interval.
In a specific embodiment, the trigger condition is the expiry of a timer: an automatic script queries the access log at regular intervals. The first preset time interval is 10 minutes, the first preset condition is the number of times the same IP address accesses one page or site per unit time, and the first threshold is 100. That is, the access log is queried periodically, and abnormal access is considered to have occurred if the same IP address accessed a specific page more than 100 times within 10 minutes.
In another specific embodiment, the trigger condition is that the server load exceeds a preset threshold, which triggers the script to query the access log. If the script finds that the site or a specific page has been accessed more than 3000 times by different users in the past 30 minutes, abnormal access is considered to have occurred.
Those skilled in the art will understand that the above specific embodiments do not limit the technical solution of the present invention. The trigger condition, the first preset time interval, the first preset condition, and the first threshold can be set according to the actual situation to query the access log automatically and identify abnormal access.
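The two example checks above (per-IP count in a 10-minute window, and total visits to a page in a 30-minute window) could look roughly as follows. This is a minimal sketch: the log format `(timestamp_seconds, ip, page)` and all function names are assumptions made for illustration, and the trigger itself (timer or load monitor) is left out.

```python
from collections import Counter


def in_window(entries, now, window_s):
    """Keep the (ip, page) pairs whose timestamp lies in [now - window_s, now]."""
    return [(ip, page) for ts, ip, page in entries if now - window_s <= ts <= now]


def per_ip_exceeded(entries, now, window_s=600, threshold=100):
    """First example: the same IP hit the same page more than `threshold` times."""
    hits = Counter(in_window(entries, now, window_s))
    return any(count > threshold for count in hits.values())


def page_total_exceeded(entries, now, window_s=1800, threshold=3000):
    """Second example: a page was hit more than `threshold` times by all users."""
    hits = Counter(page for _ip, page in in_window(entries, now, window_s))
    return any(count > threshold for count in hits.values())
```

A timer-driven script would call `per_ip_exceeded` on every tick, while a load-triggered script would call `page_total_exceeded` once the load threshold is crossed; both defaults mirror the numbers used in the examples and are meant to be configurable.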
In a preferred embodiment of the anti-crawler method provided by the invention, the preset rules in the preset rule base include at least one of the following rules:
whether the access behavior exceeds the second threshold of the second preset condition within the second preset time interval; whether the access request header contains a keyword from the preset field set; whether the access request header contains a keyword from the request-header whitelist; whether the IP address of the access request is in the IP address whitelist. Here the request header is the HTTP request header.
In a specific embodiment, the access log is analyzed with a single preset rule. For example, a specific page is known to have been accessed more than 1000 times in the past 30 minutes, so abnormal access is considered to have occurred. The preset rule "whether the access behavior exceeds the second threshold of the second preset condition within the second preset time interval", here whether the same IP address accessed the specific page more than 100 times in the past 10 minutes, is applied to all access requests, and the user parameter sets (IP address and so on) of the requests that satisfy the rule are extracted. In another feasible embodiment, the rule may instead be that the same user performed the same operation on the same page more than 100 times within 10 minutes.
In another specific embodiment, the access log is analyzed with a combination of two or more rules. For example, a specific page is known to have been accessed more than 1000 times in the past 30 minutes, so abnormal access is considered to have occurred. The rule on whether the access behavior exceeds the second threshold of the second preset condition within the second preset time interval identifies the IP addresses or users whose number of accesses within 10 minutes exceeded 100. The request header is then checked for keywords from the whitelist, such as the keywords of common search engines, and accesses from common search engines are let through. The IP address of the request is checked against the IP address whitelist, and accesses from whitelisted IP addresses are let through. The IP addresses in the IP address whitelist are known, trusted addresses, for example the IP addresses of the company's branch offices or of trusted users. For the remaining IP addresses or users, the user parameter set is obtained and added to the blacklist as an abnormal-user parameter set.
In another specific embodiment, the access log is analyzed with a combination of two or more rules. For example, a specific page is known to have been accessed more than 1000 times in the past 30 minutes, so abnormal access is considered to have occurred. The rule on whether the request header contains a keyword from the preset field set finds the IP addresses or users whose User-Agent request header field contains a keyword characteristic of a crawler, for example Java, Python, or PHP; such an IP address or user is provisionally suspected of being an abnormal user. Next, the request header is checked for keywords from the whitelist, such as the keywords of common search engines, and accesses from common search engines are let through. The IP address of the request is checked against the IP address whitelist, and accesses from whitelisted IP addresses are let through. The IP addresses in the IP address whitelist are known, trusted addresses, for example the IP addresses of the company's branch offices or of trusted users. For the remaining IP addresses or users, the user parameter set is obtained and added to the blacklist as an abnormal-user parameter set.
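The rule combinations described above can be sketched as one decision function. The sketch below is an assumption-laden illustration: the function name `is_abnormal`, the sample whitelist entries, and the choice to flag a request when either the rate rule or the User-Agent rule fires are not specified by the text, which describes the two combinations separately. Matching is done by plain lowercase substring search.

```python
# Hypothetical sample data; a real deployment would load these from the rule base.
CRAWLER_KEYWORDS = ("java", "python", "php")          # subset of the preset field set
SEARCH_ENGINE_KEYWORDS = ("googlebot", "baiduspider")  # subset of the header whitelist
IP_WHITELIST = {"192.0.2.10"}                          # e.g. a branch-office address


def is_abnormal(ip, user_agent, hits_in_window, limit=100):
    """Apply the whitelist rules first, then the rate and User-Agent rules."""
    ua = user_agent.lower()
    if ip in IP_WHITELIST:
        return False  # trusted address: let through
    if any(keyword in ua for keyword in SEARCH_ENGINE_KEYWORDS):
        return False  # common search engine: let through
    too_many_hits = hits_in_window > limit
    crawler_like_ua = any(keyword in ua for keyword in CRAWLER_KEYWORDS)
    return too_many_hits or crawler_like_ua
```

Checking the whitelists before the suspicion rules keeps the effect the text aims for: a trusted address or a search engine is never blacklisted, however many requests it makes.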
As the above specific embodiments show, analyzing the access log automatically with a single preset rule to determine the abnormal-user parameter set filters out abnormal users quickly and efficiently, so that they can be blocked. Analyzing the access log with a combination of several preset rules avoids affecting normal users as far as possible and improves the accuracy of crawler blocking. Those skilled in the art will understand that the preset rules in the rule base can be combined according to the actual situation to identify and block crawlers quickly and automatically while avoiding interference with the operations of normal users.
Preferably, the preset field set includes at least one of the following keywords: Java, Python, C++, C#, PHP, Perl and GO. When the User-Agent field of the HTTP request header contains one of these keywords, the request was most likely not issued by a browser, so the user who issued it is most likely a crawler user.
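One way to test a User-Agent value against the preset field set is sketched below. The text does not say how matching is performed; the sketch assumes case-insensitive matching on whole tokens rather than substrings, so that the keyword GO does not match inside a word such as "Googlebot". The names `PRESET_FIELD_SET` and `crawler_keywords_in` are hypothetical.

```python
import re

# Lowercased keywords from the preset field set listed in the text.
PRESET_FIELD_SET = {"java", "python", "c++", "c#", "php", "perl", "go"}


def crawler_keywords_in(user_agent):
    """Return the preset keywords that appear as whole tokens in the User-Agent."""
    # Split on anything that is not a letter, digit, '+' or '#',
    # so "c++" and "c#" survive as single tokens.
    tokens = re.split(r"[^a-z0-9+#]+", user_agent.lower())
    return PRESET_FIELD_SET.intersection(tokens)
```

For example, the default User-Agent strings of common HTTP libraries, such as `python-requests/2.31.0` or `Go-http-client/1.1`, yield a non-empty result, while a typical browser string yields an empty set.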
As shown in Fig. 3, in a preferred embodiment of the anti-crawler method provided by the invention, before the abnormal user is added to the blacklist, the method further comprises:
S131, adding the abnormal-user parameter set to the re-verification list, so that the user becomes a user to be re-verified;
S132, receiving a page request from the user to be re-verified;
S133, sending a manual verification page to the user to be re-verified;
S134, receiving the result returned from the manual verification page by the user to be re-verified;
S135, determining, according to the returned result, whether to add the user to be re-verified to the blacklist.
In a specific embodiment, the abnormal-user parameter set is added to the re-verification list, making the user a user to be re-verified. When responding to a page request from the user to be re-verified, a manual verification page containing a manual verification method is sent to that user, for example a page with a verification code, a page with a slider check, or a page with an image recognition check. The result returned from the manual verification page by the user to be re-verified is then received. If verification succeeds, the user is removed from the re-verification list; if verification fails, the user's parameter set is added to the blacklist, and the user is blocked through the blacklist.
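The re-verification flow S131 to S135 amounts to a small state transition between two lists. The sketch below illustrates it with plain Python sets; the function names, the string return values, and the use of the IP address as the user key are assumptions, and the manual verification page itself (verification code, slider, image recognition) is outside the sketch.

```python
def route_request(user_key, reverify_list, blacklist):
    """Decide how to answer a page request from `user_key`."""
    if user_key in blacklist:
        return "blocked"      # refuse the request
    if user_key in reverify_list:
        return "challenge"    # S133: send the manual verification page
    return "serve"            # normal user


def apply_challenge_result(user_key, passed, reverify_list, blacklist):
    """S134/S135: act on the result returned by the manual verification page."""
    if passed:
        reverify_list.discard(user_key)  # verified: remove from the re-verification list
    else:
        blacklist.add(user_key)          # failed: add to the blacklist
```

A suspected user therefore gets one chance to prove that a person is behind the requests before being blocked, which limits the impact of a false positive from the rule base.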
Preferably, the abnormal-user parameter set includes at least an IP address. It may also include a user name, a user ID, the user's mobile phone number, and other information that can be used to identify the user.
Preferably, the fields in the request-header whitelist include search engine request header fields. For example, if the User-Agent field of the HTTP request header contains one of the following whitelist fields: googlebot, mediapartners-google, baiduspider, sogou spider, sogou web sosospider, 360spider, yahoo, msn, msnbot, sohu, yodaoBot, twiceler, ia_archiver, iaarchiver, slurp, bot, then the access is considered to come from a search engine and is not treated as a malicious crawler.
Embodiment 2 of the present invention provides an anti-crawler apparatus, comprising: a query module M110, an analysis module M120, a blacklist module M130, and a filtering module M140.
The query module M110 is configured to query the access log to determine whether abnormal access has occurred; the analysis module M120 is configured to analyze the access log, after abnormal access has been determined, according to the preset rules in the preset rule base to determine the abnormal-user parameter set; the blacklist module M130 is configured to add the abnormal-user parameter set to the blacklist; and the filtering module M140 is configured to block the abnormal user according to the abnormal-user parameter set in the blacklist.
In a specific embodiment, an automatically running script of the query module M110 queries the access log and finds abnormal access in it, for example an excessive number of visits to a specific page. The access log is then analyzed further by the analysis module M120 according to the preset rules in the preset rule base. For example, if the same IP has accessed the specific page more than 100 times in the past 10 minutes, that user is considered an abnormal user; the information about the user, including the user's IP address, user name and so on, is determined to be the user's abnormal-user parameter set, and the blacklist module M130 adds this parameter set to the blacklist. When the abnormal user initiates a page request again, the filtering module M140 first checks the user's parameter set against the blacklist; if it is in the blacklist, the page request is refused, which blocks the abnormal user. By these technical means, crawler users are identified and blocked automatically, which reduces the excessive consumption of server resources caused by heavy crawler traffic, protects site resources, and improves both user experience and website operating efficiency.
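The division into four modules can be pictured as four small cooperating classes. The sketch below is one possible arrangement, not the structure claimed by the patent: the class names, method names, and the `run_once` helper are hypothetical, the log is assumed to be a list of `(ip, page)` pairs already limited to the relevant time window, and the thresholds are passed in by the caller.

```python
from collections import Counter


class QueryModule:  # M110: decide whether abnormal access has occurred
    def __init__(self, page_limit):
        self.page_limit = page_limit

    def abnormal(self, log):
        pages = Counter(page for _ip, page in log)
        return any(count > self.page_limit for count in pages.values())


class AnalysisModule:  # M120: find the abnormal users (here: IP addresses)
    def __init__(self, ip_limit):
        self.ip_limit = ip_limit

    def abnormal_ips(self, log):
        hits = Counter(log)  # key: (ip, page)
        return {ip for (ip, _page), count in hits.items() if count > self.ip_limit}


class BlacklistModule:  # M130: hold the blacklist
    def __init__(self):
        self.entries = set()

    def add(self, ips):
        self.entries |= ips


class FilterModule:  # M140: block requests from blacklisted users
    def __init__(self, blacklist):
        self.blacklist = blacklist

    def allow(self, ip):
        return ip not in self.blacklist.entries


def run_once(log, query, analysis, blacklist):
    """One pass: analysis runs only after the query module reports abnormal access."""
    if query.abnormal(log):
        blacklist.add(analysis.abnormal_ips(log))
```

Keeping the blacklist in its own object lets the filtering module be called on every request while the query and analysis modules run only when triggered.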
In a preferred embodiment of the anti-crawler apparatus provided by the invention, the query module comprises:
a trigger module, configured to query the access log according to a trigger condition;
a judgment module, configured to determine whether abnormal access has occurred according to the access log and the first threshold of the first preset condition within the first preset time interval.
In a specific embodiment, the trigger condition of the trigger module is the expiry of a timer: an automatic script queries the access log at regular intervals. For the judgment module, the first preset time interval is 10 minutes, the first preset condition is the number of times the same IP address accesses one page or site per unit time, and the first threshold is 100. That is, the access log is queried periodically, and abnormal access is considered to have occurred if the same IP address accessed a specific page more than 100 times within 10 minutes.
In another specific embodiment, the trigger condition of the trigger module is that the server load exceeds a preset threshold, which triggers the script to query the access log. If the script finds that the site or a specific page has been accessed more than 300 times by different users in the past 30 minutes, the judgment module determines that abnormal access has occurred.
Those skilled in the art will understand that the above specific embodiments do not limit the technical solution of the present invention. The trigger condition, the first preset time interval, the first preset condition, and the first threshold can be set according to the actual situation to query the access log automatically and identify abnormal access.
In a preferred embodiment of the anti-crawler apparatus provided by the invention, the preset rules in the preset rule base include at least one of the following rules:
whether the access behavior exceeds the second threshold of the second preset condition within the second preset time interval; whether the access request header contains a keyword from the preset field set; whether the access request header contains a keyword from the request-header whitelist; whether the IP address of the access request is in the IP address whitelist.
In a specific embodiment, the analysis module analyzes the access log with a single preset rule. For example, a specific page is known to have been accessed more than 1000 times in the past 30 minutes, so abnormal access is considered to have occurred. The preset rule "whether the access behavior exceeds the second threshold of the second preset condition within the second preset time interval", here whether the same IP address accessed the specific page more than 100 times in the past 10 minutes, is applied to all access requests, and the user parameter sets (IP address and so on) of the requests that satisfy the rule are extracted. In another feasible embodiment, the rule may instead be that the same user performed the same operation on the same page more than 100 times within 10 minutes.
In another specific embodiment, the analysis module analyzes the access log with a combination of two or more rules. For example, a specific page is known to have been accessed more than 1000 times in the past 30 minutes, so abnormal access is considered to have occurred. The rule on whether the access behavior exceeds the second threshold of the second preset condition within the second preset time interval identifies the IP addresses or users whose number of accesses within 10 minutes exceeded 100. The request header is then checked for keywords from the whitelist, such as the keywords of common search engines, and accesses from common search engines are let through. The IP address of the request is checked against the IP address whitelist, and accesses from whitelisted IP addresses are let through. The IP addresses in the IP address whitelist are known, trusted addresses, for example the IP addresses of the company's branch offices or of trusted users. For the remaining IP addresses or users, the user parameter set is obtained and added to the blacklist as an abnormal-user parameter set.
In another specific embodiment, the analysis module analyzes the access log with a combination of two or more rules. For example, a specific page is known to have been accessed more than 1000 times in the past 30 minutes, so abnormal access is considered to have occurred. The rule on whether the request header contains a keyword from the preset field set finds the IP addresses or users whose User-Agent request header field contains a keyword characteristic of a crawler, for example Java, Python, or PHP; such an IP address or user is provisionally suspected of being an abnormal user. Next, the request header is checked for keywords from the whitelist, such as the keywords of common search engines, and accesses from common search engines are let through. The IP address of the request is checked against the IP address whitelist, and accesses from whitelisted IP addresses are let through. The IP addresses in the IP address whitelist are known, trusted addresses, for example the IP addresses of the company's branch offices or of trusted users. For the remaining IP addresses or users, the user parameter set is obtained and added to the blacklist as an abnormal-user parameter set.
As the above specific embodiments show, analyzing the access log automatically with a single preset rule to determine the abnormal-user parameter set filters out abnormal users quickly and efficiently, so that they can be blocked. Analyzing the access log with a combination of several preset rules avoids affecting normal users as far as possible and improves the accuracy of crawler blocking.
Preferably, the preset field set includes at least one of the following keywords: Java, Python, C++, C#, PHP, Perl and GO. When the User-Agent field of the HTTP request header contains one of these keywords, the request was most likely not issued by a browser, so the user who issued it is most likely a crawler user.
In a preferred embodiment of the anti-crawler apparatus provided by the invention, the anti-crawler apparatus further comprises:
a verification list module, configured to add the abnormal-user parameter set to the re-verification list, so that the user becomes a user to be re-verified;
a receiving module, configured to receive a page request from the user to be re-verified;
a verification module, configured to send a manual verification page to the user to be re-verified;
the receiving module being further configured to receive the result returned from the manual verification page by the user to be re-verified;
the verification module being further configured to determine, according to the returned result, whether to add the user to be re-verified to the blacklist.
In a specific embodiment, the verification list module adds the abnormal-user parameter set to the re-verification list, making the user a user to be re-verified. When the receiving module responds to a page request from the user to be re-verified, the verification module sends that user a manual verification page containing a manual verification method, for example a page with a verification code, a page with a slider check, or a page with an image recognition check. The verification module receives the result returned from the manual verification page by the user to be re-verified. If verification succeeds, the user is removed from the re-verification list; if verification fails, the user's parameter set is added to the blacklist, and the user is blocked through the blacklist.
Preferably, the abnormal-user parameter set includes at least an IP address. It may also include a user name, a user ID, the user's mobile phone number, and other information that can be used to identify the user.
Preferably, the fields in the request-header whitelist include search engine request header fields. For example, if the User-Agent field of the HTTP request header contains one of the following whitelist fields: googlebot, mediapartners-google, baiduspider, sogou spider, sogou web sosospider, 360spider, yahoo, msn, msnbot, sohu, yodaoBot, twiceler, ia_archiver, iaarchiver, slurp, bot, then the access is considered to come from a search engine and is not treated as a malicious crawler.
Embodiment 3 of the present invention provides an anti-crawler device, comprising: at least one processor and a memory storing computer instructions that can run on the processor, wherein the processor is configured to run the computer instructions to implement the above anti-crawler method. The processor may be a single centralized processor, a cluster of processors, or multiple distributed processors. The processor includes any one of a central processing unit (CPU), a microcontroller, a programmable logic controller (PLC), a programmable device, other processing equipment, or a combination thereof. As non-limiting examples, the processor may include an application-specific integrated circuit (ASIC), a system on chip (SOC), a logic gate array, a programmable gate array (for example, a field-programmable gate array (FPGA)), other hardware elements, or a combination thereof. The processor executes the computer-readable instructions stored in the memory. The memory may include a non-transitory computer-readable storage medium. As non-limiting examples, the memory includes volatile storage (for example, random access memory (RAM)), non-volatile storage (for example, read-only memory (ROM)), or a combination thereof. As non-limiting examples, the memory may include dynamic RAM (DRAM), electrically programmable read-only memory (EPROM), a hard disk drive, a solid state drive, a flash drive, a magnetic disk, removable media (memory card, thumb drive, optical disc, etc.), or other storage devices. Those skilled in the art will understand that the above embodiments and the specific embodiments within them can be applied to this embodiment.
Embodiment four of the present invention provides a storage medium storing computer instructions which, when executed by a processor, implement the anti-crawler method described above. The memory may include a non-transitory computer-readable storage medium. As non-limiting examples, the memory may include dynamic RAM (DRAM), electrically programmable read-only memory (EPROM), a hard disk drive, a solid-state drive, a flash drive, a magnetic disk, removable media (memory card, thumb drive, optical disc, etc.), remote storage devices, cloud storage devices, or other storage devices. Those skilled in the art will understand that the specific implementations in the above embodiments can be applied to this embodiment.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention, and they shall all fall within the scope of the claims and the description of the present invention.
Claims (10)
1. An anti-crawler method, characterized by comprising:
querying an access log to determine whether abnormal access has occurred;
after determining that abnormal access has occurred, analyzing the access log according to preset rules in a preset rule base to determine an abnormal-user parameter set;
adding the abnormal-user parameter set to a blacklist;
blocking the abnormal user according to the abnormal-user parameter set in the blacklist.
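The four steps of this claim can be sketched end to end in Python. This is a minimal illustration under stated assumptions, not the claimed implementation: the log is a list of dictionaries with an `ip` key, "abnormal access" is taken to mean that the total request count exceeds `total_limit`, the only rule applied is a per-IP request count above `per_ip_limit`, and all function names are hypothetical.

```python
from collections import Counter


def has_abnormal_access(log, total_limit):
    """Step 1: decide from the access log whether abnormal access occurred."""
    return len(log) > total_limit


def abnormal_parameter_sets(log, per_ip_limit):
    """Step 2: apply a (single, assumed) rule to find abnormal users by IP."""
    counts = Counter(entry["ip"] for entry in log)
    return {ip for ip, n in counts.items() if n > per_ip_limit}


def process(log, blacklist, total_limit, per_ip_limit):
    """Steps 1-3: analyze only when abnormal access is seen, then blacklist."""
    if has_abnormal_access(log, total_limit):
        blacklist.update(abnormal_parameter_sets(log, per_ip_limit))
    return blacklist


def allow(request_ip, blacklist):
    """Step 4: block any request whose IP is on the blacklist."""
    return request_ip not in blacklist
```

For example, with five requests from one address and one from another, `process(log, set(), total_limit=3, per_ip_limit=2)` would blacklist only the first address, and `allow` would then reject it.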
2. The anti-crawler method according to claim 1, characterized in that determining, by querying the access log, whether abnormal access has occurred comprises:
querying the access log according to a trigger condition;
determining whether abnormal access has occurred according to the access log and a first threshold of a first preset condition within a first preset time interval.
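One way to read the threshold test in this claim is as a count of requests inside a recent time window. The Python sketch below assumes that the "first preset condition" is the number of requests, that timestamps are numbers in seconds, and that the trigger condition (for example, a timer) is handled by the caller; the function names are hypothetical.

```python
def count_in_window(timestamps, now, window_seconds):
    """Count requests whose timestamp lies in [now - window_seconds, now]."""
    return sum(1 for t in timestamps if now - window_seconds <= t <= now)


def is_abnormal_access(timestamps, now, window_seconds, threshold):
    """Abnormal if the request count in the window exceeds the threshold."""
    return count_in_window(timestamps, now, window_seconds) > threshold
```

Whether the comparison should be strict (`>`) or inclusive (`>=`) is not specified in the text; the strict form here is an arbitrary choice.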
3. The anti-crawler method according to claim 1, characterized in that the preset rules in the preset rule base of the anti-crawler method include at least one of the following rules:
whether the access behavior exceeds a second threshold of a second preset condition within a second preset time interval;
whether the access request header contains a keyword in a preset field set;
whether the access request header contains a keyword in the request-header whitelist;
whether the IP address of the access request is in the IP address whitelist.
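The four rules can be combined in many ways, and the claim does not fix an order. The Python sketch below assumes one plausible ordering: the two whitelists exempt a request first, then a keyword match or an exceeded rate marks it abnormal. The class name `RuleBase`, its fields, and the default keyword and whitelist values are hypothetical, and the keywords are matched against the User-Agent header only.

```python
from dataclasses import dataclass


@dataclass
class RuleBase:
    rate_threshold: int                      # assumed: max requests per window
    ua_keywords: tuple = ("java", "python", "php", "perl")
    ua_whitelist: tuple = ("googlebot", "baiduspider")
    ip_whitelist: frozenset = frozenset()

    def is_abnormal(self, ip, user_agent, requests_in_window):
        ua = user_agent.lower()
        # Whitelist rules: exempt trusted IPs and search-engine headers.
        if ip in self.ip_whitelist:
            return False
        if any(token in ua for token in self.ua_whitelist):
            return False
        # Keyword rule: header names a scripting or HTTP-client language.
        if any(keyword in ua for keyword in self.ua_keywords):
            return True
        # Rate rule: too many requests inside the time window.
        return requests_in_window > self.rate_threshold
```

Checking the whitelists first means a whitelisted search engine is never flagged, even at a high request rate; a stricter deployment might prefer the opposite priority.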
4. The anti-crawler method according to claim 3, characterized in that the preset field set includes at least one of the following keywords:
Java, Python, C++, C#, PHP, Perl, PHP and GO.
5. The anti-crawler method according to claim 1, characterized in that, before the abnormal user is added to the blacklist, the method further comprises:
adding the abnormal-user parameter set to a re-verification list as a re-verification user;
receiving a page request from the re-verification user;
sending a manual verification page to the re-verification user;
receiving the result returned from the re-verification user's manual verification page;
determining, according to the returned result, whether to add the re-verification user to the blacklist.
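The re-verification flow in this claim can be sketched as a small state holder in Python. The class `Reverifier` and its method names are hypothetical, users are identified by IP address only, and removing a failed user from the re-verification list is an assumption; the description only states that a successful user is removed from that list and a failed user is added to the blacklist.

```python
class Reverifier:
    """Tracks users awaiting manual verification and those blacklisted."""

    def __init__(self):
        self.pending = set()    # the re-verification list
        self.blacklist = set()

    def flag(self, ip):
        """Add an abnormal user to the re-verification list."""
        self.pending.add(ip)

    def needs_challenge(self, ip):
        """True if the next page request should get a manual verification page."""
        return ip in self.pending

    def handle_result(self, ip, passed):
        """Apply the result returned by the manual verification page."""
        self.pending.discard(ip)        # assumption: removed in both cases
        if not passed:
            self.blacklist.add(ip)
```

A caller would check `needs_challenge` on each page request, serve the verification page when it returns True, and pass the outcome to `handle_result`.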
6. The anti-crawler method according to claim 1, characterized in that the abnormal-user parameter set includes at least the user's IP address.
7. The anti-crawler method according to claim 1, characterized in that the fields in the request-header whitelist include search-engine request-header fields.
8. An anti-crawler device, characterized by comprising: a query module, an analysis module, a blacklist module, and a filtering module;
the query module is configured to determine whether abnormal access has occurred by querying an access log;
the analysis module is configured to analyze the access log according to preset rules in a preset rule base to determine an abnormal-user parameter set;
the blacklist module is configured to add the abnormal-user parameter set to a blacklist;
the filtering module is configured to block the abnormal user according to the abnormal-user parameter set in the blacklist.
9. Anti-crawler equipment, characterized by comprising: at least one processor and a memory storing computer instructions that can run on the processor;
wherein the processor runs the computer instructions to implement the method of any one of claims 1-7.
10. A storage medium storing computer instructions, characterized in that the computer instructions, when executed by a processor, implement the method of any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910294378.XA CN110020512A (en) | 2019-04-12 | 2019-04-12 | A kind of method, apparatus, equipment and the storage medium of anti-crawler |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110020512A (en) | 2019-07-16 |
Family
ID=67191220
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910294378.XA Pending CN110020512A (en) | 2019-04-12 | 2019-04-12 | A kind of method, apparatus, equipment and the storage medium of anti-crawler |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110020512A (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2013185612A1 (en) * | 2012-06-13 | 2013-12-19 | 腾讯科技(深圳)有限公司 | Method and device for determining security information of unknown file in cloud security system |
CN104601601A (en) * | 2015-02-25 | 2015-05-06 | 小米科技有限责任公司 | Web crawler detecting method and device |
CN106657057A (en) * | 2016-12-20 | 2017-05-10 | 北京金堤科技有限公司 | Anti-crawler system and method |
CN109298987A (en) * | 2017-07-25 | 2019-02-01 | 北京国双科技有限公司 | A kind of method and device detecting web crawlers operating status |
CN107590227A (en) * | 2017-09-05 | 2018-01-16 | 成都知道创宇信息技术有限公司 | A kind of log analysis method of combination reptile |
CN109145185A (en) * | 2018-02-02 | 2019-01-04 | 北京数安鑫云信息技术有限公司 | It identifies web crawlers and extracts the method and device of web crawlers feature |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111064745A (en) * | 2019-12-30 | 2020-04-24 | 厦门市美亚柏科信息股份有限公司 | Self-adaptive back-climbing method and system based on abnormal behavior detection |
CN111064745B (en) * | 2019-12-30 | 2022-06-03 | 厦门市美亚柏科信息股份有限公司 | Self-adaptive back-climbing method and system based on abnormal behavior detection |
CN112165445A (en) * | 2020-08-13 | 2021-01-01 | 杭州数梦工场科技有限公司 | Method, device, storage medium and computer equipment for detecting network attack |
CN112688919A (en) * | 2020-12-11 | 2021-04-20 | 杭州安恒信息技术股份有限公司 | APP interface-based crawler-resisting method, device and medium |
CN113810358A (en) * | 2021-02-05 | 2021-12-17 | 京东科技控股股份有限公司 | Access limiting method, device, computer equipment and storage medium |
CN114553541A (en) * | 2022-02-17 | 2022-05-27 | 苏州良医汇网络科技有限公司 | Method, device and equipment for verifying crawler prevention in grading manner and storage medium |
CN114553541B (en) * | 2022-02-17 | 2024-02-06 | 苏州良医汇网络科技有限公司 | Method, device, equipment and storage medium for checking anti-crawlers in grading mode |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190716 |