CN106657057B - Anti-crawler system and method - Google Patents

Info

Publication number
CN106657057B
Authority
CN
China
Prior art keywords
access
time
access behavior
behavior
threshold
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201611183559.8A
Other languages
Chinese (zh)
Other versions
CN106657057A (en)
Inventor
柳超
梁双
闫肃
任靓
毕可
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jindi Technology Co Ltd
Original Assignee
Beijing Jindi Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jindi Technology Co Ltd
Priority to CN201611183559.8A
Publication of CN106657057A
Application granted
Publication of CN106657057B
Active (current legal status)
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/10 Network architectures or network communication protocols for network security for controlling access to devices or network resources
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/02 Network architectures or network communication protocols for network security for separating internal from external traffic, e.g. firewalls
    • H04L63/0227 Filtering policies
    • H04L63/0236 Filtering by address, protocol, port number or service, e.g. IP-address or URL
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/10 Network architectures or network communication protocols for network security for controlling access to devices or network resources
    • H04L63/105 Multiple levels of security
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1433 Vulnerability analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/20 Network architectures or network communication protocols for network security for managing network security; network security policies in general
    • H04L63/205 Network architectures or network communication protocols for network security for managing network security; network security policies in general involving negotiation or determination of the one or more network security mechanisms to be used, e.g. by negotiation between the client and the server or between peers or by selection according to the capabilities of the entities involved

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses an anti-crawler system comprising: an analysis module that judges whether an access behavior is normal; an acquisition module that acquires the access logs of access behaviors judged abnormal by the analysis module; a learning module that has an updatable blacklist rule base and extracts new blacklist rules from the access logs of abnormal access behaviors so as to update the blacklist rule base; and a filtering module that prohibits access behaviors covered by the blacklist rule base. The invention also provides an anti-crawler method. According to the invention, on the one hand, crawlers can be countered based on the analysis performed by the analysis module; on the other hand, the learning module can extract new blacklist rules from the access logs of abnormal access behaviors and continuously update the blacklist rule base to assist anti-crawling, so that both accuracy and speed are achieved.

Description

Anti-crawler system and method
Technical Field
The invention relates to the technical field of anti-crawling. More particularly, the present invention relates to a self-learning anti-crawler system and method.
Background
Data crawling refers to capturing the data on a page with a software program that simulates human operation, without the permission of the server owner. There are two common anti-crawling methods: 1) Setting verification codes on pages, which are difficult for a computer to recognize. However, some computers can still recognize the codes, and others have them recognized by hired people, so the problem is not fundamentally solved. 2) Monitoring IP addresses for abnormal behavior, for example a client at a certain IP address that does not call through a browser, calls too fast, or makes too many calls, and then setting rules to block these IPs. However, the reaction is slow: after an abnormality is found, a rule must be set manually before crawling can be countered. Moreover, a user can crawl data through many disguised IPs by means of IP proxies. Therefore, it is necessary to devise a system and method that can learn anti-crawling rules on its own.
Disclosure of Invention
An object of the present invention is to provide a system and method capable of extracting new blacklist rules from the access logs of abnormal access behaviors, so as to continuously update a blacklist rule base for further anti-crawling.
To achieve these objects and other advantages in accordance with the purpose of the invention, there is provided an anti-crawler system comprising:
an analysis module that judges whether an access behavior is normal;
an acquisition module that acquires the access log of an access behavior judged abnormal by the analysis module;
a learning module that has an updatable blacklist rule base and extracts a new blacklist rule from the access log of the abnormal access behavior so as to update the blacklist rule base;
a filtering module that prohibits access behaviors covered by the blacklist rule base.
Preferably, in the anti-crawler system, the filtering module stores an updatable IP blacklist, and the filtering module adds an IP address corresponding to an access behavior included in the blacklist rule base to the IP blacklist and prohibits the access behavior of the IP address.
Preferably, in the anti-crawler system, if an access behavior is not included in the blacklist rule base, the analysis module is called to analyze the access behavior, and if the access behavior is abnormal, the filtering module prohibits the access behavior; and if the access behavior is included by the blacklist rule base, the analysis module is not called to continue analyzing the access behavior.
Preferably, in the anti-crawler system, the method for determining whether the access behavior is normal by the analysis module includes:
acquiring the number of accesses of the access behavior in a first preset time period, and detecting whether mouse behavior exists;
and if the number of accesses in the first preset time period exceeds a preset threshold and no mouse behavior is detected, judging that the access behavior is abnormal.
Preferably, in the anti-crawler system, the method for extracting a new blacklist rule by the learning module according to the access log of the abnormal access behavior includes:
calculating the number of accesses per unit time of the abnormal access behavior in a first preset time period and the preset threshold per unit time; the newly extracted blacklist rule is: the number of accesses per unit time of an access behavior is higher than the preset threshold per unit time.
Preferably, in the anti-crawler system, the method for determining whether the access behavior is normal by the analysis module includes:
acquiring the number of accesses of the access behavior in each second preset time period and the time points corresponding to those accesses, dividing the second preset time period into N sub-periods, and calculating the access frequency of each sub-period;
if the access frequency of the N sub-periods is lower than a first threshold, dividing the next second preset time period into N/2 sub-periods, and then calculating the access frequency of each sub-period;
if the access frequency of the N sub-periods is higher than a second threshold, dividing the next second preset time period into 2N sub-periods, and then calculating the access frequency of each sub-period;
if the access frequency of the N sub-periods is higher than the first threshold and lower than the second threshold, dividing the next second preset time period into N sub-periods, and then calculating the access frequency of each sub-period;
wherein, if the access frequency of the access behavior in any sub-period is higher than a frequency threshold, the access behavior is abnormal;
wherein N is greater than or equal to 10;
the first threshold is 1/4 of the frequency threshold, and the second threshold is 3/4 of the frequency threshold.
An anti-crawler method comprising:
judging whether the access behavior is normal or not;
obtaining an access log judged to be abnormal access behavior;
extracting a new blacklist rule according to the access log of the abnormal access behavior so as to update a blacklist rule base;
forbidding the access behavior included by the blacklist rule base.
Preferably, the anti-crawler method further comprises:
and adding the IP address corresponding to the access behavior included in the blacklist rule base into the IP blacklist, and forbidding the access behavior of the IP address.
Preferably, the anti-crawler method further comprises:
if an access behavior is not included in the blacklist rule base, analyzing the access behavior, and if the access behavior is abnormal, forbidding the access behavior; and if the access behavior is included by the blacklist rule base, not continuously analyzing the access behavior.
Preferably, in the anti-crawler method, determining whether the access behavior is normal includes:
acquiring the number of accesses of the access behavior in a first preset time period, and detecting whether mouse behavior exists;
and if the number of accesses in the first preset time period exceeds a preset threshold and no mouse behavior is detected, judging that the access behavior is abnormal.
The invention at least comprises the following beneficial effects:
According to the anti-crawler system and method of the invention, on the one hand, crawlers can be countered based on the analysis of the analysis module; on the other hand, the learning module can extract new blacklist rules from the access logs of abnormal access behaviors and continuously update the blacklist rule base, so that access behaviors can be prohibited directly according to the blacklist rule base. This combines anti-crawling speed with anti-crawling accuracy, and the anti-crawling speed is significantly higher than when the analysis module alone is used.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention.
Detailed Description
The present invention is further described in detail below with reference to examples to enable those skilled in the art to practice the invention with reference to the description.
The present invention provides an anti-crawler system, comprising:
an analysis module that judges whether an access behavior is normal;
an acquisition module that acquires the access log of an access behavior judged abnormal by the analysis module;
a learning module that has an updatable blacklist rule base and extracts a new blacklist rule from the access log of the abnormal access behavior so as to update the blacklist rule base;
a filtering module that prohibits access behaviors covered by the blacklist rule base.
In this technical scheme, the analysis module, the acquisition module, the learning module and the filtering module are each an independent server group, so that high analysis performance and efficiency are ensured at the hardware level. The analysis module analyzes access behavior in the same way as prior-art analysis methods, for example by judging whether an access behavior is normal according to its IP, access time and number of accesses: if one IP accesses a page more than 3000 times over more than three hours and no mouse activity is detected, the access is considered abnormal. The acquisition module acquires the access log of the client. The learning module extracts a new blacklist rule from the abnormal accesses identified by the analysis module; the new blacklist rule may be identical to the analysis module's rule for judging abnormal access, or may be an improvement of that rule. The analysis module, the acquisition module and the learning module thus work in a repeating cycle, and the blacklist rule base is continuously updated. The filtering module compares each subsequent access behavior with the blacklist rules in the blacklist rule base and prohibits the access behavior if it matches one of the rules.
In another example, in the anti-crawler system, the filtering module stores an updatable IP blacklist; the filtering module adds the IP address corresponding to an access behavior covered by the blacklist rule base to the IP blacklist and prohibits access from that IP address. Here, the IP blacklist is continuously enriched according to the blacklist rule base, so that the system can block IPs that have already made abnormal accesses directly from the IP blacklist, without comparing the access behavior with the blacklist rule base.
In another example, if an access behavior is not covered by the blacklist rule base, the anti-crawler system calls the analysis module to analyze the access behavior, and if the access behavior is abnormal, the filtering module prohibits it; if the access behavior is covered by the blacklist rule base, the analysis module is not called to analyze it further. Here, invocation of the analysis module is optimized: an access behavior that matches a blacklist rule is blocked without invoking the analysis module, and an access behavior that matches no blacklist rule is passed to the analysis module for analysis.
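The order of checks described in the two examples above can be sketched roughly as follows. This is only an illustrative Python sketch, not the patent's implementation: the names `Access`, `Filter`, `handle`, the `visits_per_hour` field and the returned status strings are all hypothetical, and the analysis module is stood in for by a plain callable.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class Access:
    """Hypothetical summary of one access behavior."""
    ip: str
    visits_per_hour: float


# A blacklist rule is modelled here as a predicate over an access behavior.
Rule = Callable[[Access], bool]


@dataclass
class Filter:
    """Hypothetical filtering module holding a rule base and an IP blacklist."""
    rules: List[Rule] = field(default_factory=list)
    ip_blacklist: Set[str] = field(default_factory=set)

    def handle(self, access: Access, analyze: Callable[[Access], bool]) -> str:
        # 1. An IP already on the IP blacklist is blocked directly,
        #    without comparing against the rule base.
        if access.ip in self.ip_blacklist:
            return "blocked:ip"
        # 2. An access covered by a blacklist rule is blocked, its IP is
        #    recorded, and the analysis module is not called.
        if any(rule(access) for rule in self.rules):
            self.ip_blacklist.add(access.ip)
            return "blocked:rule"
        # 3. Only accesses matching no rule reach the analysis module.
        if analyze(access):
            return "blocked:analysis"
        return "allowed"
```

In this sketch the comparatively expensive `analyze` callable runs only for traffic that neither list already covers, which is the speed benefit the text attributes to the blacklist rule base.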
In another example, in the anti-crawler system, the method by which the analysis module determines whether the access behavior is normal includes:
acquiring the number of accesses of the access behavior in a first preset time period, and detecting whether mouse behavior exists;
and if the number of accesses in the first preset time period exceeds a preset threshold and no mouse behavior is detected, judging that the access behavior is abnormal.
This technical scheme provides a feasible method for judging whether an access behavior is normal: it checks whether the number of accesses in a certain time period exceeds a preset threshold, and if the threshold is exceeded and no mouse behavior is detected, the access behavior is judged abnormal. For example, the judgment rule of the analysis module may be that one IP accesses a page more than 3000 times over more than three hours and no mouse is detected, in which case the access is considered abnormal.
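As a rough illustration of the judgment rule above, the following Python sketch counts the accesses that fall inside one preset time period and combines the count with a mouse-activity flag. The function name, the use of timestamps in seconds and the way mouse detection is passed in as a boolean are assumptions made for the example; the patent does not specify how mouse behavior is detected.

```python
from typing import Sequence


def is_abnormal(
    timestamps: Sequence[float],  # access times of one IP, in seconds (assumed unit)
    window_start: float,          # start of the first preset time period
    window_seconds: float,        # length of the first preset time period
    threshold: int,               # preset threshold on the number of accesses
    mouse_detected: bool,         # result of mouse detection (method not specified)
) -> bool:
    """Return True if the access behavior is judged abnormal."""
    window_end = window_start + window_seconds
    # Count only the accesses that fall inside the preset time period.
    count = sum(1 for t in timestamps if window_start <= t < window_end)
    # Abnormal only when the count exceeds the threshold AND no mouse is seen.
    return count > threshold and not mouse_detected
```

With the figures used in the text, `threshold` would be 3000 and `window_seconds` would be three hours (10800 seconds).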
In another example, in the anti-crawler system, the method by which the learning module extracts a new blacklist rule from the access log of an abnormal access behavior includes:
calculating the number of accesses per unit time of the abnormal access behavior in a first preset time period and the preset threshold per unit time; the newly extracted blacklist rule is: the number of accesses per unit time of an access behavior is higher than the preset threshold per unit time.
This technical scheme provides a method for extracting a new blacklist rule from an abnormal access behavior: the number of accesses and the preset threshold are each divided by the first preset time period to obtain the number of accesses per unit time and the preset threshold per unit time, and the new blacklist rule is: the number of accesses per unit time of an access behavior is higher than the preset threshold per unit time, and no mouse behavior is detected. That is, compared with the judgment rule of the analysis module, the extracted blacklist rule is more flexible and simpler. For example, when the analysis module detects a mouse but the IP accesses a page more than 3000 times over more than three hours, and the analysis module finally determines that this access is also abnormal, the analysis module sends the access logs to the learning module as negative samples (the learning module may obtain both positive samples and negative samples for learning). The learning module extracts a new blacklist rule from the negative samples for subsequent use. For example, the next day another IP accesses a page as often as one thousand times per hour but for less than three hours; the analysis module does not identify this access as abnormal, but the blacklist rule base of the learning module already covers it.
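The normalization step described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: `RateRule`, `extract_rule` and the choice of hours as the unit of time are hypothetical, and the rule checks only the per-unit-time rate, following the wording of the rule statement rather than the variant that also requires the absence of mouse behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RateRule:
    """Hypothetical blacklist rule: accesses per hour above a limit."""
    max_per_hour: float

    def matches(self, access_count: int, hours: float) -> bool:
        # Normalize the observed count to a per-hour rate before comparing,
        # so the rule applies to observation windows of any length.
        return access_count / hours > self.max_per_hour


def extract_rule(preset_threshold: int, period_hours: float) -> RateRule:
    # Divide the preset threshold by the first preset time period to obtain
    # the preset threshold per unit time.
    return RateRule(max_per_hour=preset_threshold / period_hours)
```

For instance, `extract_rule(3000, 3)` yields a limit of 1000 accesses per hour, so an IP making 1100 accesses within a single hour matches the rule even though it never accumulates three hours of traffic.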
In another example, in the anti-crawler system, the method by which the analysis module determines whether the access behavior is normal includes:
acquiring the number of accesses of the access behavior in each second preset time period and the time points corresponding to those accesses, dividing the second preset time period into N sub-periods, and calculating the access frequency of each sub-period;
if the access frequency of the N sub-periods is lower than a first threshold, dividing the next second preset time period into N/2 sub-periods, and then calculating the access frequency of each sub-period;
if the access frequency of the N sub-periods is higher than a second threshold, dividing the next second preset time period into 2N sub-periods, and then calculating the access frequency of each sub-period;
if the access frequency of the N sub-periods is higher than the first threshold and lower than the second threshold, dividing the next second preset time period into N sub-periods, and then calculating the access frequency of each sub-period;
wherein, if the access frequency of the access behavior in any sub-period is higher than a frequency threshold, the access behavior is abnormal;
wherein N is greater than or equal to 10;
the first threshold is 1/4 of the frequency threshold, and the second threshold is 3/4 of the frequency threshold.
This technical scheme provides a method by which the analysis module judges whether an access behavior is normal: the second preset time period is divided into several sub-periods, the number of accesses in each sub-period is divided by the length of the sub-period to obtain the access frequency of that sub-period, each access frequency is compared with the frequency threshold, and the access behavior is judged abnormal if any one access frequency is higher than the frequency threshold. Crawlers can thereby be identified more accurately, and a crawler is prevented from exploiting a loophole in the system by alternately raising and lowering its access frequency. To reduce the amount of calculation, the invention also reduces the number of access-frequency calculations when the access frequency of each sub-period is lower than the first threshold, and increases the number of access-frequency calculations when the access frequency is higher than the second threshold, to prevent crawlers from being missed. Preferred values for the first threshold, the second threshold and N are also provided.
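The adaptive subdivision described above might be sketched in Python as below. Several points are assumptions made only for this example: the function names are hypothetical, the comparison against the first and second thresholds uses the highest sub-period frequency (so "lower than the first threshold" means every sub-period is lower, and "higher than the second threshold" means at least one is higher), and the halved N is clamped so that it never drops below 10.

```python
from typing import List, Sequence, Tuple


def slice_frequencies(
    timestamps: Sequence[float], start: float, period: float, n: int
) -> List[float]:
    """Split [start, start + period) into n sub-periods and return the
    access frequency (accesses per second) of each sub-period."""
    width = period / n
    counts = [0] * n
    for t in timestamps:
        if start <= t < start + period:
            # Guard against float rounding pushing the index to n.
            idx = min(int((t - start) / width), n - 1)
            counts[idx] += 1
    return [c / width for c in counts]


def analyze_period(
    timestamps: Sequence[float],
    start: float,
    period: float,
    n: int,
    freq_threshold: float,
    min_n: int = 10,  # assumption: keep N >= 10 after halving
) -> Tuple[bool, int]:
    """Return (abnormal, n_for_next_period)."""
    freqs = slice_frequencies(timestamps, start, period, n)
    # Abnormal if any single sub-period exceeds the frequency threshold.
    abnormal = any(f > freq_threshold for f in freqs)
    first = freq_threshold / 4        # first threshold: 1/4 of the limit
    second = 3 * freq_threshold / 4   # second threshold: 3/4 of the limit
    peak = max(freqs)
    if peak < first:
        next_n = max(min_n, n // 2)   # quiet traffic: fewer calculations
    elif peak > second:
        next_n = 2 * n                # busy traffic: finer sub-periods
    else:
        next_n = n                    # in between: keep the same N
    return abnormal, next_n
```

A caller would feed the returned `next_n` into the call for the following second preset time period, so the granularity tracks how close recent traffic came to the frequency threshold.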
The invention also provides an anti-crawler method, which comprises the following steps:
judging whether the access behavior is normal or not;
obtaining an access log judged to be abnormal access behavior;
extracting a new blacklist rule according to the access log of the abnormal access behavior so as to update a blacklist rule base;
forbidding the access behavior included by the blacklist rule base.
In the above technical solution, access behavior is analyzed in the same way as in prior-art analysis methods, for example by judging whether an access behavior is normal according to its IP, access time and number of accesses: if one IP accesses a page more than 3000 times over more than three hours and no mouse is detected, the access is considered abnormal. The access log of the client is then obtained. A new blacklist rule is then extracted from the abnormal access; the new blacklist rule may be identical to the rule for judging abnormal access, or may be an improvement of that rule. The blacklist rule base can thus be continuously updated; each subsequent access behavior is compared with the blacklist rules in the blacklist rule base, and if it matches one of the rules, the access behavior is prohibited.
In another example, the anti-crawler method further comprises:
adding the IP address corresponding to an access behavior covered by the blacklist rule base to an IP blacklist, and prohibiting access from that IP address. Here, the IP blacklist is continuously enriched according to the blacklist rule base, so that the system can block IPs that have already made abnormal accesses directly from the IP blacklist, without comparing the access behavior with the blacklist rule base.
In another example, the anti-crawler method further comprises:
if an access behavior is not covered by the blacklist rule base, analyzing the access behavior, and if the access behavior is abnormal, prohibiting it; and if the access behavior is covered by the blacklist rule base, not analyzing it further. Here, invocation of the analysis is optimized: if an access behavior matches a blacklist rule, it is blocked without further analysis, and if it matches no blacklist rule, analysis continues to determine whether it should be blocked.
In another example, in the anti-crawler method, determining whether the access behavior is normal includes:
acquiring the number of accesses of the access behavior in a first preset time period, and detecting whether mouse behavior exists;
and if the number of accesses in the first preset time period exceeds a preset threshold and no mouse behavior is detected, judging that the access behavior is abnormal. This technical scheme provides a feasible method for judging whether an access behavior is normal: it checks whether the number of accesses in a certain time period exceeds a preset threshold, and if the threshold is exceeded and no mouse behavior is detected, the access behavior is judged abnormal. For example, the rule may be that an IP accesses a page more than 3000 times in three hours or more and no mouse is detected; if the rule is satisfied, the access is judged abnormal.
While embodiments of the invention have been described above, the invention is not limited to the applications set forth in the description and the embodiments; it is fully applicable to the various fields suited to it, and additional modifications will readily occur to those skilled in the art. The invention is therefore not limited to the specific details and examples shown and described herein, provided there is no departure from the general concept defined by the claims and their equivalents.

Claims (7)

1. An anti-crawler system, comprising:
an analysis module that judges whether an access behavior is normal;
an acquisition module that acquires the access log of an access behavior judged abnormal by the analysis module;
a learning module that has an updatable blacklist rule base and extracts a new blacklist rule from the access log of the abnormal access behavior so as to update the blacklist rule base;
a filtering module that prohibits access behaviors covered by the blacklist rule base;
wherein the method by which the analysis module judges whether the access behavior is normal comprises the following steps:
acquiring the number of accesses of the access behavior in each second preset time period and the time points corresponding to those accesses, dividing the second preset time period into N sub-periods, and calculating the access frequency of each sub-period;
if the access frequency of the N sub-periods is lower than a first threshold, dividing the next second preset time period into N/2 sub-periods, and then calculating the access frequency of each sub-period;
if the access frequency of the N sub-periods is higher than a second threshold, dividing the next second preset time period into 2N sub-periods, and then calculating the access frequency of each sub-period;
if the access frequency of the N sub-periods is higher than the first threshold and lower than the second threshold, dividing the next second preset time period into N sub-periods, and then calculating the access frequency of each sub-period;
wherein, if the access frequency of the access behavior in any sub-period is higher than a frequency threshold, the access behavior is abnormal;
and wherein the method by which the learning module extracts the new blacklist rule from the access log of the abnormal access behavior comprises the following steps:
calculating the number of accesses per unit time of the abnormal access behavior in a first preset time period and the preset threshold per unit time; the newly extracted blacklist rule is: the number of accesses per unit time of an access behavior is higher than the preset threshold per unit time.
2. The anti-crawler system of claim 1, wherein the filter module stores an updatable IP blacklist, wherein the filter module adds an IP address corresponding to an access behavior included in the blacklist rule base to the IP blacklist and prohibits the access behavior of the IP address.
3. The anti-crawler system of claim 2, wherein if an access behavior is not included in the blacklist rule base, the analysis module is invoked to analyze the access behavior, and if the access behavior is abnormal, the filtering module prohibits the access behavior; and if the access behavior is included by the blacklist rule base, the analysis module is not called to continue analyzing the access behavior.
4. The anti-crawler system of claim 1, wherein N ≧ 10; the first threshold is 1/4 of the frequency threshold and the second threshold is 3/4 of the frequency threshold.
5. An anti-crawler method, comprising:
judging whether the access behavior is normal or not;
obtaining an access log judged to be abnormal access behavior;
extracting a new blacklist rule according to the access log of the abnormal access behavior so as to update a blacklist rule base;
forbidding the access behavior included by the blacklist rule base;
wherein judging whether the access behavior is normal comprises the following steps:
acquiring the number of accesses of the access behavior in each second preset time period and the time points corresponding to those accesses, dividing the second preset time period into N sub-periods, and calculating the access frequency of each sub-period;
if the access frequency of the N sub-periods is lower than a first threshold, dividing the next second preset time period into N/2 sub-periods, and then calculating the access frequency of each sub-period;
if the access frequency of the N sub-periods is higher than a second threshold, dividing the next second preset time period into 2N sub-periods, and then calculating the access frequency of each sub-period;
if the access frequency of the N sub-periods is higher than the first threshold and lower than the second threshold, dividing the next second preset time period into N sub-periods, and then calculating the access frequency of each sub-period;
and if the access frequency of the access behavior in any sub-period is higher than a frequency threshold, the access behavior is abnormal.
6. The anti-crawler method as recited in claim 5, further comprising:
adding the IP address corresponding to an access behavior included in the blacklist rule base to an IP blacklist, and forbidding access from that IP address.
7. The anti-crawler method as recited in claim 6, further comprising:
if an access behavior is not included in the blacklist rule base, analyzing the access behavior, and if the access behavior is abnormal, forbidding the access behavior; and if the access behavior is included in the blacklist rule base, not analyzing the access behavior further.
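The control flow of claims 5 to 7 (check the blacklist first, analyze only what is not yet blacklisted, and feed abnormal accesses back into the rule base) might be sketched as follows. This is a hypothetical illustration, not the patent's implementation: the class name `AntiCrawlerFilter`, the request dictionary keys, and the choice of a user-agent string as the extracted "blacklist rule" are all assumptions, since the claims do not specify how a rule is represented.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

Request = Dict[str, str]  # assumed shape: {"ip": ..., "user_agent": ...}


@dataclass
class AntiCrawlerFilter:
    # Analysis step (e.g. a frequency check); supplied by the caller.
    is_abnormal: Callable[[Request], bool]
    # Blacklist rule base; a rule is modeled here as a user-agent string (assumption).
    rule_base: Set[str] = field(default_factory=set)
    ip_blacklist: Set[str] = field(default_factory=set)
    abnormal_log: List[Request] = field(default_factory=list)

    def handle(self, request: Request) -> str:
        # Claim 6: an IP already on the IP blacklist is forbidden outright.
        if request["ip"] in self.ip_blacklist:
            return "blocked"
        # Claims 6 and 7: behavior included in the rule base is forbidden
        # without further analysis, and its IP joins the IP blacklist.
        if request["user_agent"] in self.rule_base:
            self.ip_blacklist.add(request["ip"])
            return "blocked"
        # Claim 7: only behavior not included in the rule base is analyzed.
        if self.is_abnormal(request):
            # Claim 5: log the abnormal access and extract a new rule from it.
            self.abnormal_log.append(request)
            self.rule_base.add(request["user_agent"])
            return "blocked"
        return "allowed"
```

Checking the blacklists before calling the analysis step means that repeat offenders cost only a set lookup, which is the saving claim 7 describes.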

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611183559.8A CN106657057B (en) 2016-12-20 2016-12-20 Anti-crawler system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611183559.8A CN106657057B (en) 2016-12-20 2016-12-20 Anti-crawler system and method

Publications (2)

Publication Number Publication Date
CN106657057A (en) 2017-05-10
CN106657057B (en) 2020-09-29

Family

ID=58833462

Family Applications (1)

Application Number Status Publication Priority Date Filing Date Title
CN201611183559.8A Active CN106657057B (en) 2016-12-20 2016-12-20 Anti-crawler system and method

Country Status (1)

Country Link
CN (1) CN106657057B (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109246064B (en) * 2017-07-11 2021-09-03 阿里巴巴集团控股有限公司 Method, device and equipment for generating security access control and network access rule
CN107196968B (en) * 2017-07-12 2020-10-20 深圳市活力天汇科技股份有限公司 Crawler identification method
CN107547548B (en) * 2017-09-05 2020-06-30 北京京东尚科信息技术有限公司 Data processing method and system
CN108133140A (en) * 2017-12-08 2018-06-08 成都数聚城堡科技有限公司 Dynamic anti-crawler method
CN109241733A (en) * 2018-08-07 2019-01-18 北京神州绿盟信息安全科技股份有限公司 Crawler behavior recognition method and device based on web access logs
CN109246141B (en) * 2018-10-26 2021-03-12 电子科技大学 SDN-based excessive crawler prevention method
CN109818949A (en) * 2019-01-17 2019-05-28 济南浪潮高新科技投资发展有限公司 Neural-network-based anti-crawler method
CN110020512A (en) * 2019-04-12 2019-07-16 重庆天蓬网络有限公司 Anti-crawler method, apparatus, device, and storage medium
CN110781366A (en) * 2019-09-09 2020-02-11 深圳壹账通智能科技有限公司 Webpage data processing method and device, computer equipment and storage medium
CN111355728B (en) * 2020-02-27 2023-01-03 紫光云技术有限公司 Malicious crawler protection method
CN111625700B (en) * 2020-05-25 2023-04-07 北京世纪家天下科技发展有限公司 Anti-grabbing method, device, equipment and computer storage medium
CN112003833A (en) * 2020-07-30 2020-11-27 瑞数信息技术(上海)有限公司 Abnormal behavior detection method and device
CN112688919A (en) * 2020-12-11 2021-04-20 杭州安恒信息技术股份有限公司 APP interface-based crawler-resisting method, device and medium
CN113536301A (en) * 2021-07-19 2021-10-22 北京计算机技术及应用研究所 Behavior characteristic analysis-based anti-crawling method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103107948A (en) * 2011-11-15 2013-05-15 阿里巴巴集团控股有限公司 Flow control method and flow control device
CN103297435A (en) * 2013-06-06 2013-09-11 中国科学院信息工程研究所 Abnormal access behavior detection method and system on basis of WEB logs
CN103475637A (en) * 2013-04-24 2013-12-25 携程计算机技术(上海)有限公司 Network access control method and system based on IP access behaviors
CN104902008A (en) * 2015-04-26 2015-09-09 成都创行信息科技有限公司 Crawler data processing method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104239758B (en) * 2013-06-13 2018-04-27 阿里巴巴集团控股有限公司 Human-machine identification method and corresponding human-machine identification system

Also Published As

Publication number Publication date
CN106657057A (en) 2017-05-10

Similar Documents

Publication Publication Date Title
CN106657057B (en) Anti-crawler system and method
CN107645503B (en) Rule-based method for detecting DGA family to which malicious domain name belongs
CN110351280B (en) Method, system, equipment and readable storage medium for extracting threat information
CN107241296B (en) Webshell detection method and device
CN103279710B (en) Method and system for detecting malicious codes of Internet information system
Murtaza et al. A host-based anomaly detection approach by representing system calls as states of kernel modules
CN108063759B (en) Web vulnerability scanning method
CN106055980A (en) Rule-based JavaScript security testing method
CN108959071B (en) RASP-based PHP deformation webshell detection method and system
WO2017040957A1 (en) Process launch, monitoring and execution control
US9992216B2 (en) Identifying malicious executables by analyzing proxy logs
CN105959316A (en) Network security authentication system
US20130318609A1 (en) Method and apparatus for quantifying threat situations to recognize network threat in advance
CN113992340B (en) User abnormal behavior identification method, device, equipment and storage medium
CN107103237A (en) A kind of detection method and device of malicious file
CN110096872A (en) The detection method and server of homepage invasion script attack tool
CN110135162A (en) The recognition methods of the back door WEBSHELL, device, equipment and storage medium
US10417422B2 (en) Method and apparatus for detecting application
CN110598959A (en) Asset risk assessment method and device, electronic equipment and storage medium
CN116305155A (en) Program safety detection protection method, device, medium and electronic equipment
US11423099B2 (en) Classification apparatus, classification method, and classification program
Zuo Defense of Computer Network Viruses Based on Data Mining Technology.
CN109918901A (en) The method that real-time detection is attacked based on Cache
CN112966264A (en) XSS attack detection method, device, equipment and machine-readable storage medium
KR101608221B1 (en) System and method of sensing cyber threat using database access pattern

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant