CN111209566A - Intelligent anti-crawler system and method for multi-layer threat interception

Info

Publication number: CN111209566A
Application number: CN201911368288.7A
Authority: CN
Inventors: 陈博; 陈国庆; 谢强
Original assignee: Wuhan Jiyi Network Technology Co ltd
Current assignee: Wuhan Jiyi Network Technology Co ltd
Priority date: 2019-12-26
Filing date: 2019-12-26
Publication date: 2020-05-29

Abstract

The invention provides an intelligent anti-crawler system and method for multi-layer threat interception. The system comprises an information acquisition module, a risk discrimination module and a risk handling module. The information acquisition module collects the user's browser running environment information and click track information. Based on the collected information, the risk discrimination module jointly evaluates the user's browser environment, network information, IP information and user behavior, and classifies the accessing user as a malicious user, a high-risk user or a normal user. According to that classification, the risk handling module intercepts malicious users and pushes a verification code to high-risk users. The beneficial effects of the invention are that it provides multi-dimensional detection, which greatly improves real-time performance and accuracy, and that it adopts intelligent interception, which significantly reduces the false-block rate.

Description

Intelligent anti-crawler system and method for multi-layer threat interception

Technical Field

The invention relates to the technical field of internet security, in particular to an intelligent anti-crawler system and method for multi-layer threat interception.

Background

Crawlers originated with search engines. A crawler, also called a web spider or network robot, is a program that automatically fetches information from the internet according to certain rules. Data resources are now increasingly valuable. Crawlers can be divided by function into web-page crawlers and interface crawlers, and by authorization into legitimate crawlers and malicious crawlers. Anti-crawler technology was developed to prevent data leakage.

At present, most anti-crawler schemes concentrate on User-Agent and IP interception, blocking crawlers by access frequency and by blacklists and whitelists. This approach has some effect, but black-market operators can bypass it easily: they only need to control a large pool of IP proxy resources and rotate User-Agents continuously. Moreover, IP addresses are currently cheap, so a large number of websites are plagued by crawlers, and some websites lack even a basic anti-crawler solution.

In general, existing anti-crawler countermeasures are too one-dimensional: the IP lists must be updated continuously and are therefore difficult to maintain, the false-block rate is high, few factors are used for discrimination, and the systems are not flexible enough.

Disclosure of Invention

In view of the above, the invention provides an intelligent anti-crawler system and method for multi-layer threat interception. It makes a joint judgment from multi-dimensional data such as browser environment information, network information, IP information and a CNN model, and intercepts malicious crawlers intelligently: clearly malicious visitors are intercepted directly, while potentially high-risk visitors are judged further through interactive means such as verification codes, and normal users are allowed to continue accessing the service. This significantly reduces the false-block rate and improves the accuracy of crawler judgment.

The invention provides an intelligent anti-crawler system for multi-layer threat interception, which comprises an information acquisition module, a risk discrimination module and a risk handling module. The information acquisition module collects the user's browser running environment information and click track information. Based on the collected information, the risk discrimination module jointly evaluates the user's browser environment, network information, IP information and user behavior, and judges whether the accessing user is a malicious user, a high-risk user or a normal user. According to the judgment result of the risk discrimination module, the risk handling module intercepts the current accessing user when that user is judged to be malicious, and pushes a verification code to the current accessing user when that user is judged to be high-risk.

Further, the browser running environment information includes the operating system type, running hardware information, graphics card information, the browser plug-in list, the browser window size, picture loading information, IP information, and user mouse track information.
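As an illustration of how the collected items might be represented and checked on the server side, the following minimal sketch models the listed fields as a record and reports which ones are missing. The class name `BrowserEnvironment`, the field names, the `max_missing` parameter and the rule "a normal browser reports every field" are assumptions made for this example; the text above only lists the items that are collected.

```python
from dataclasses import dataclass, fields
from typing import List, Optional, Tuple


@dataclass
class BrowserEnvironment:
    """One collected snapshot. Field names are illustrative, not from the patent."""
    os_type: Optional[str] = None
    hardware_info: Optional[str] = None            # e.g. CPU and memory summary
    graphics_card: Optional[str] = None
    plugins: Optional[List[str]] = None            # an empty list still counts as reported
    window_size: Optional[Tuple[int, int]] = None  # (width, height) in pixels
    images_loaded: Optional[bool] = None
    ip_address: Optional[str] = None
    mouse_track: Optional[List[Tuple[int, int, int]]] = None  # (x, y, timestamp_ms)


def missing_fields(env: BrowserEnvironment) -> List[str]:
    """Return the names of all fields the client did not report."""
    return [f.name for f in fields(env) if getattr(env, f.name) is None]


def looks_like_normal_browser(env: BrowserEnvironment, max_missing: int = 0) -> bool:
    """Assumed rule: tolerate at most `max_missing` unreported fields."""
    return len(missing_fields(env)) <= max_missing
```

A script-driven client that reports only an operating system and an IP address would fail this check, while a snapshot with every field populated would pass.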

Further, the risk discrimination module further comprises a browser environment discrimination module, a network discrimination module, an IP discrimination module, and an intelligent behavior discrimination module, wherein:

the browser environment discrimination module judges whether the browser operated by the user is a normal browser according to the browser running environment information; the network discrimination module judges whether the browser operated by the user has been tampered with according to the network information; the IP discrimination module judges the risk of the user's IP according to the user's IP information; and the intelligent behavior discrimination module judges whether the user's behavior is machine-simulated behavior according to the user's click track information.

Further, the network information refers to the HTTP protocol information used by the user in the web service, and mainly includes protocol header information. The network discrimination module collects the protocol header information used by different browsers to build a complete sample library, and then judges from the sample library whether the browser operated by the user has been tampered with.
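The sample-library comparison can be sketched as follows. This is a minimal example assuming that the library is keyed by browser family and stores the header-name orders observed for that family, and that the family is inferred from the User-Agent string. The names `SAMPLE_LIBRARY`, `claimed_family` and `is_tampered` are hypothetical, and the header orders shown are placeholders rather than measurements from real browsers.

```python
from typing import Dict, List, Optional, Set, Tuple

Headers = List[Tuple[str, str]]  # (name, value) pairs in the order received

# Hypothetical sample library: browser family -> header-name orders seen for it.
# The orders below are illustrative placeholders only.
SAMPLE_LIBRARY: Dict[str, Set[Tuple[str, ...]]] = {
    "chrome": {("host", "connection", "user-agent", "accept",
                "accept-encoding", "accept-language")},
    "firefox": {("host", "user-agent", "accept", "accept-language",
                 "accept-encoding", "connection")},
}


def claimed_family(user_agent: str) -> Optional[str]:
    """Very rough mapping from a User-Agent string to a library key."""
    ua = user_agent.lower()
    if "firefox" in ua:
        return "firefox"
    if "chrome" in ua:
        return "chrome"
    return None


def is_tampered(headers: Headers) -> bool:
    """True when the header order does not match any sample for the claimed browser."""
    user_agent = next((v for n, v in headers if n.lower() == "user-agent"), "")
    family = claimed_family(user_agent)
    if family is None:
        return True  # assumption: browsers unknown to the library are treated as suspect
    order = tuple(name.lower() for name, _ in headers)
    return order not in SAMPLE_LIBRARY[family]
```

A client that copies a Chrome User-Agent but sends its headers in a different order would be flagged, which is the kind of mismatch that rotating the User-Agent alone does not hide.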

Furthermore, the IP discrimination module builds an IP risk library by recording users' access behavior, wherein the IP risk library is used to track the historical behavior of user access and to determine the risk level of an IP from the frequency of access from that IP;

for a new accessing user, the IP risk library is used to determine the attribute information of the IP from the user's IP and thereby judge the risk of the user's IP; the attribute information of the IP comprises its authenticity, the organization it belongs to, its geographic location and its IP type.
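A minimal sketch of such an IP risk library is given below, assuming a sliding time window for the frequency check and a simple attribute lookup for new visitors. The class and field names, the window length, the request threshold, the two-level "high"/"low" result and the `"data_center"` type label are all assumptions made for illustration; the rule that machine-room IPs default to high risk follows the embodiment described later in this document.

```python
import time
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Deque, Dict, Optional


@dataclass
class IpAttributes:
    authenticity: float  # hypothetical 0.0 .. 1.0 scale
    organization: str
    location: str
    ip_type: str         # e.g. "home_user", "data_center", "mobile"


class IpRiskLibrary:
    def __init__(self, window_seconds: float = 60.0, max_requests: int = 100):
        self.window_seconds = window_seconds
        self.max_requests = max_requests
        self._history: Dict[str, Deque[float]] = defaultdict(deque)
        self._attributes: Dict[str, IpAttributes] = {}

    def set_attributes(self, ip: str, attrs: IpAttributes) -> None:
        self._attributes[ip] = attrs

    def record_access(self, ip: str, now: Optional[float] = None) -> None:
        """Store one access and drop entries older than the window."""
        now = time.time() if now is None else now
        history = self._history[ip]
        history.append(now)
        while history and now - history[0] > self.window_seconds:
            history.popleft()

    def risk(self, ip: str) -> str:
        """Return "high" or "low" from IP type and recent access frequency."""
        attrs = self._attributes.get(ip)
        if attrs is not None and attrs.ip_type == "data_center":
            return "high"  # machine-room IPs default to high risk
        if len(self._history.get(ip, ())) > self.max_requests:
            return "high"  # too many accesses inside the window
        return "low"
```

Passing `now` explicitly keeps the class easy to test; in a live deployment the default wall-clock time would be used.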

Furthermore, the intelligent behavior discrimination module collects a large volume of click track data from normal users, trains a CNN model on the collected data to obtain a behavior feature library of normal users, and uses the behavior feature library to judge the click track information of each new accessing user.
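The text does not say how a click track is presented to the CNN. One common option, assumed here purely for illustration, is to rasterize the track into a small fixed-size grid and treat it as an image. The sketch below shows that preprocessing step together with a single "valid" 2D convolution (cross-correlation) with a hand-picked kernel, which is the basic operation a convolutional layer applies. It is not a trained model, and the function names and grid size are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Grid = List[List[float]]


def rasterize(track: List[Point], width: float, height: float, grid: int = 8) -> Grid:
    """Mark every grid cell that the track passes through with 1.0."""
    image = [[0.0] * grid for _ in range(grid)]
    for x, y in track:
        col = min(grid - 1, max(0, int(x / width * grid)))
        row = min(grid - 1, max(0, int(y / height * grid)))
        image[row][col] = 1.0
    return image


def conv2d_valid(image: Grid, kernel: Grid) -> Grid:
    """Slide the kernel over the image without padding and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Usage: a perfectly straight diagonal track responds strongly to a diagonal kernel.
track = [(5.0 + 10 * k, 5.0 + 10 * k) for k in range(8)]
image = rasterize(track, width=80.0, height=80.0)
feature_map = conv2d_valid(image, [[1.0, 0.0], [0.0, 1.0]])
```

In a trained CNN the kernels are learned from the collected normal-user tracks rather than chosen by hand, and several such layers feed a classifier.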

Further, the specific process of pushing the verification code is as follows: the risk handling module pushes a verification code to the high-risk user; if the high-risk user passes the verification code, the user is allowed to continue accessing the service; otherwise the verification code is pushed again, and if the user fails the verification code multiple times, the user is intercepted.
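The retry process described above can be sketched as a short loop. The text says only that the user is intercepted after failing "multiple times", so the limit of three attempts, the function name `handle_high_risk_user` and the `"allow"`/`"intercept"` return values are assumptions for this example.

```python
from typing import Callable


def handle_high_risk_user(push_verification_code: Callable[[], bool],
                          max_attempts: int = 3) -> str:
    """Push a verification code up to `max_attempts` times.

    `push_verification_code` is a caller-supplied callback that shows one
    challenge and returns True when the user passes it.
    """
    for _ in range(max_attempts):
        if push_verification_code():
            return "allow"      # passed: the user may continue accessing
    return "intercept"          # failed every attempt: block the user
```

Keeping the challenge behind a callback means the same loop works with any verification-code provider.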

The invention also provides an intelligent anti-crawler method for multilayer threat interception, which comprises the following steps:

S1, when a user makes an access request with a web browser, the information acquisition module collects the user's browser running environment information and click track information and sends them to the risk discrimination module;

S2, the risk discrimination module jointly evaluates the user's browser environment, network information, IP information and user behavior according to the information collected in step S1, judges whether the accessing user is a malicious user, a high-risk user or a normal user, and sends the judgment result to the risk handling module;

S3, according to the judgment result of step S2, the risk handling module directly intercepts malicious users; for a high-risk user, the risk handling module pushes a verification code to the user; if the user passes the verification code, the user is allowed to continue accessing the service; otherwise the verification code is pushed again, and if the user fails the verification code multiple times, the user is intercepted.

Further, the specific process of step S2 is as follows:

S21, the browser environment discrimination module judges whether the user's browser is a normal browser according to the browser running environment information;

S22, the network discrimination module judges whether the user's browser has been tampered with by using the network information;

S23, the IP discrimination module judges whether the user's IP is risky by using the user's IP information;

S24, the intelligent behavior discrimination module judges whether the user's behavior is machine-simulated behavior according to the user's click track information;

S25, the risk discrimination module integrates the results of steps S21, S22, S23 and S24, determines whether the user is a malicious user, a high-risk user or a normal user, and sends the judgment result to the risk handling module.
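The steps above do not state how the four results are combined in S25. The following sketch assumes a simple counting rule, in which several independent warning signs mean a malicious user and a single warning sign means a high-risk user. The names `Signals`, `Verdict` and `integrate`, and the threshold of three, are hypothetical and chosen only to make the three-way outcome concrete.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    NORMAL = "normal"
    HIGH_RISK = "high_risk"
    MALICIOUS = "malicious"


@dataclass
class Signals:
    abnormal_browser: bool   # result of S21
    tampered_headers: bool   # result of S22
    risky_ip: bool           # result of S23
    machine_behavior: bool   # result of S24


def integrate(signals: Signals, malicious_threshold: int = 3) -> Verdict:
    """Assumed rule: count the warning signs and compare with a threshold."""
    hits = sum([signals.abnormal_browser, signals.tampered_headers,
                signals.risky_ip, signals.machine_behavior])
    if hits >= malicious_threshold:
        return Verdict.MALICIOUS
    if hits >= 1:
        return Verdict.HIGH_RISK
    return Verdict.NORMAL
```

Under this rule a visitor from a risky IP with otherwise normal signals is challenged with a verification code rather than blocked outright, which is how the intermediate high-risk tier can lower the false-block rate.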

The technical scheme provided by the invention has the following beneficial effects: it provides multi-dimensional detection based on browser environment information, network information, IP information, a CNN model and the like, which greatly improves real-time performance and accuracy; and by adopting intelligent interception, it significantly reduces the false-block rate and improves the accuracy of crawler judgment.

Drawings

Fig. 1 is a block diagram of an intelligent anti-crawler system for multi-layer threat interception according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be further described with reference to the accompanying drawings.

Referring to Fig. 1, an embodiment of the present invention provides an intelligent anti-crawler system for multi-layer threat interception, including an information acquisition module 1, a risk discrimination module 2 and a risk handling module 3. The information acquisition module 1 uses JavaScript code to collect the user's browser running environment information and click track information. The risk discrimination module 2 jointly evaluates the user's browser environment, network information, IP information and user behavior according to the information collected by the information acquisition module 1, and determines whether the accessing user is a malicious user, a high-risk user or a normal user; the risk discrimination module 2 further comprises a browser environment discrimination module 21, a network discrimination module 22, an IP discrimination module 23 and an intelligent behavior discrimination module 24. According to the judgment result of the risk discrimination module 2, the risk handling module 3 intercepts malicious users and pushes verification codes to high-risk users.

Specifically, the browser environment discrimination module 21 is configured to judge whether the browser operated by the user is a normal browser; the network discrimination module 22 is configured to judge whether the browser operated by the user has been tampered with; the IP discrimination module 23 is configured to judge whether the user's IP is risky; and the intelligent behavior discrimination module 24 is configured to judge whether the user's behavior is machine-simulated behavior.

The embodiment also provides an intelligent anti-crawler method for multi-layer threat interception, which comprises the following steps:

S1, when a user makes an access request with a web browser, the information acquisition module 1 collects the user's browser running environment information and click track information and sends them to the risk discrimination module 2; the browser running environment information comprises the operating system type, running hardware information (CPU, memory), graphics card information, the browser plug-in list, the browser window size, whether pictures can be loaded, IP information, user mouse track information and the like;

S2, the risk discrimination module 2 jointly evaluates the user's browser running environment, network information, IP information and user behavior according to the information collected in step S1, judges whether the accessing user is a malicious user, a high-risk user or a normal user, and sends the judgment result to the risk handling module 3; specifically, step S2 includes:

S21, the browser environment discrimination module 21 judges whether the browser operated by the user is a normal browser according to the completeness of the browser running environment information;

S22, the network discrimination module 22 judges whether the user's browser has been tampered with by using the network information, wherein the network information refers to the HTTP protocol information used by the user in the web service and mainly includes protocol header information; it should be noted that different browsers generally send different headers and send them in a different order, so the network discrimination module 22 collects the protocol header information used by different browsers, builds a complete sample library, and checks the browser operated by the user against the sample library;

S23, the IP discrimination module 23 judges whether the user's IP is risky by using the user's IP information. The IP discrimination module 23 builds a large IP risk library by recording the behavior observed during each user access, such as script access, access from cloud server IPs, and emulator access. The IP risk library is used to track the historical behavior of user access and to determine the risk level of an IP from the frequency of access from that IP. For a new accessing user, the IP risk library determines the attribute information of the IP from the user's IP, such as its authenticity, the organization it belongs to, its geographic location and its IP type (ordinary user, machine room, large-scale egress, backbone network, mobile user, and the like). Specifically, normal users never or only rarely access services from a machine room, so IPs from cloud machine rooms are treated as high risk by default;

S24, the intelligent behavior discrimination module 24 judges whether the user's behavior is machine-simulated behavior according to the user's click track information; specifically, the intelligent behavior discrimination module 24 collects a large volume of click track data from normal users and trains a CNN model on the collected data to obtain a stable behavior feature library, and when a new user accesses the service, uses the behavior feature library to judge that user's click track information;

S25, the risk discrimination module 2 integrates the results of steps S21, S22, S23 and S24, determines whether the user is a malicious user, a high-risk user or a normal user, and sends the judgment result to the risk handling module 3.

S3, according to the judgment result of step S2, the risk handling module 3 directly intercepts malicious users; for a high-risk user, the risk handling module 3 pushes a verification code to the user; if the user passes the verification code, the user is allowed to continue accessing the service; otherwise the verification code is pushed again, and if the user fails the verification code multiple times, the user is intercepted.

In this document, the terms front, back, upper and lower are used to define the components in the drawings and the positions of the components relative to each other, and are used for clarity and convenience of the technical solution. It is to be understood that the use of the directional terms should not be taken to limit the scope of the claims.

The embodiments described above, and the features of those embodiments, may be combined with each other where they do not conflict.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims

1. An intelligent anti-crawler system for multi-layer threat interception, characterized by comprising an information acquisition module, a risk discrimination module and a risk handling module, wherein the information acquisition module collects the user's browser running environment information and click track information; the risk discrimination module jointly evaluates the user's browser environment, network information, IP information and user behavior according to the information collected by the information acquisition module, and judges whether the accessing user is a malicious user, a high-risk user or a normal user; and the risk handling module, according to the judgment result of the risk discrimination module, intercepts the current accessing user when that user is judged to be a malicious user, and pushes a verification code to the current accessing user when that user is judged to be a high-risk user.

2. The intelligent anti-crawler system for multi-layer threat interception according to claim 1, wherein the browser running environment information comprises the operating system type, running hardware information, graphics card information, the browser plug-in list, the browser window size, picture loading information, IP information, and user mouse track information.

3. The intelligent anti-crawler system for multi-layered threat interception according to claim 1 or 2, wherein the risk discrimination module further comprises a browser environment discrimination module, a network discrimination module, an IP discrimination module, and an intelligent behavior discrimination module, wherein:

4. The intelligent anti-crawler system for multi-layer threat interception according to claim 3, wherein the network information refers to the HTTP protocol information used by a user in the web service, and mainly comprises protocol header information; the network discrimination module collects the protocol header information used by different browsers to build a complete sample library, and judges from the sample library whether the browser operated by the user has been tampered with.

5. The multi-layer threat interception intelligent anti-crawler system according to claim 3, wherein the IP discrimination module establishes an IP risk library by recording access behaviors of users, the IP risk library is used for tracking historical behaviors of user access and determining the risk degree of the IP according to frequency information of IP access;

6. The intelligent anti-crawler system for multi-layer threat interception according to claim 3, wherein the intelligent behavior discrimination module collects a large volume of click track data from normal users, trains a CNN model on the collected data to obtain a behavior feature library of normal users, and uses the behavior feature library to judge the click track information of each new accessing user.

7. The intelligent anti-crawler system for multi-layer threat interception according to claim 1, wherein the specific process of pushing the verification code is as follows: the risk handling module pushes a verification code to the high-risk user; if the high-risk user passes the verification code, the user is allowed to continue accessing the service; otherwise the verification code is pushed again, and if the user fails the verification code multiple times, the user is intercepted.

8. An intelligent anti-crawler method for multi-layer threat interception, which adopts the system as claimed in any one of claims 1 to 7, and is characterized by comprising the following steps:

9. The intelligent anti-crawler method for multi-layer threat interception according to claim 8, wherein the specific process of step S2 is: