CN111209566A - Intelligent anti-crawler system and method for multi-layer threat interception - Google Patents


Info

Publication number
CN111209566A
Authority
CN
China
Legal status
Pending
Application number
CN201911368288.7A
Other languages
Chinese (zh)
Inventor
陈博
陈国庆
谢强
Current Assignee
Wuhan Jiyi Network Technology Co ltd
Original Assignee
Wuhan Jiyi Network Technology Co ltd
Priority date
2019-12-26
Filing date
2019-12-26
Publication date
2020-05-29
2019-12-26 Application filed by Wuhan Jiyi Network Technology Co ltd
2019-12-26 Priority to CN201911368288.7A
2020-05-29 Publication of CN111209566A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/566 Dynamic detection, i.e. detection performed at run-time, e.g. emulation, suspicious activities
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Abstract

The invention provides an intelligent anti-crawler system and method for multi-layer threat interception. The system comprises an information acquisition module, a risk discrimination module and a risk handling module. The information acquisition module collects the user's browser runtime environment information and click-trajectory information. Based on the collected information, the risk discrimination module jointly evaluates the user's browser environment, network information, IP information and behavior, and classifies the visiting user as a malicious user, a high-risk user or a normal user. According to this classification, the risk handling module intercepts malicious users and pushes a verification code (CAPTCHA) to high-risk users. Beneficial effects of the invention: it provides multi-dimensional detection capability that greatly improves real-time performance and accuracy, and its intelligent interception significantly reduces the false-block rate, i.e. the rate at which legitimate users are wrongly blocked.

Description

Intelligent anti-crawler system and method for multi-layer threat interception
Technical Field
The invention relates to the technical field of internet security, and in particular to an intelligent anti-crawler system and method for multi-layer threat interception.
Background
A crawler, also called a web spider or web robot, originated with search engines: it is a program that automatically captures information from the internet according to certain rules. Data resources are becoming ever more valuable. By function, crawlers can be divided into web crawlers and interface (API) crawlers; by authorization, into legitimate crawlers and malicious crawlers. Anti-crawler technology was developed to prevent data leakage.
At present, most anti-crawler schemes focus on User-Agent and IP interception, blocking crawlers by request frequency and by blacklists and whitelists. This approach has some effect, but black-market operators can easily bypass it simply by controlling a large pool of IP proxies and continually rotating User-Agent strings. Moreover, IP addresses are currently cheap, so many websites are burdened by crawlers, and some lack even basic anti-crawler protection.
In general, existing anti-crawler countermeasures rely on too few signals and are hard to maintain: IP lists must be updated continually, the false-block rate is high, the discrimination factors are few, and the system is inflexible.
Disclosure of Invention
In view of the above, the invention provides an intelligent anti-crawler system and method for multi-layer threat interception, which make a joint judgment using multidimensional data such as browser environment information, network information, IP information and a CNN model, and intercept malicious crawlers intelligently: explicitly malicious visitors are intercepted directly, potentially high-risk visitors are further screened through interactive means such as verification codes, and normal users are allowed to continue accessing the service. This significantly reduces the false-block rate and improves the accuracy of crawler identification.
The invention provides an intelligent anti-crawler system for multi-layer threat interception, comprising an information acquisition module, a risk discrimination module and a risk handling module. The information acquisition module collects the user's browser runtime environment information and click-trajectory information. Based on the collected information, the risk discrimination module jointly evaluates the user's browser environment, network information, IP information and behavior, and determines whether the visiting user is a malicious user, a high-risk user or a normal user. According to this determination, the risk handling module intercepts the current user if the user is judged malicious, and pushes a verification code to the current user if the user is judged high-risk.
Further, the browser runtime environment information includes the operating system type, hardware information, graphics card information, the browser plug-in list, the browser window size, image-loading information, IP information, and the user's mouse-track information.
Further, the risk discrimination module comprises a browser environment discrimination module, a network discrimination module, an IP discrimination module and an intelligent behavior discrimination module, wherein:
the browser environment discrimination module judges, from the browser runtime environment information, whether the browser the user is running is a normal browser; the network discrimination module judges, from the network information, whether that browser has been tampered with; the IP discrimination module assesses the risk of the user's IP from the user's IP information; and the intelligent behavior discrimination module judges, from the user's click-trajectory information, whether the user's behavior is machine-simulated.
Further, the network information refers to the HTTP protocol information the user sends to the web service, chiefly the protocol headers. The network discrimination module builds a complete sample library by collecting the protocol headers used by different browsers, and judges against this sample library whether the user's browser has been tampered with.
Further, the IP discrimination module builds an IP risk library by recording users' access behavior. The IP risk library tracks the history of user accesses and determines the risk level of an IP from the frequency of accesses from that IP;
for a new visiting user, the IP risk library determines the attributes of the user's IP and uses them to assess its risk; the IP attributes comprise authenticity, owning organization, geographic location and IP type.
Further, the intelligent behavior discrimination module collects a large volume of click-trajectory data from normal users, trains a CNN (convolutional neural network) model on this data to obtain a behavior feature library of normal users, and uses this feature library to evaluate the click-trajectory information of new visiting users.
Further, the verification code is pushed as follows: the risk handling module pushes a verification code to the high-risk user; if the user passes it, the user is allowed to continue accessing the service; otherwise a verification code is pushed again, and if the user fails repeatedly, the user is intercepted.
The invention also provides an intelligent anti-crawler method for multi-layer threat interception, comprising the following steps:
S1: when a user makes an access request with a web browser, the information acquisition module collects the user's browser runtime environment information and click-trajectory information and sends it to the risk discrimination module;
S2: based on the information collected in step S1, the risk discrimination module jointly evaluates the user's browser environment, network information, IP information and behavior, determines whether the visiting user is a malicious user, a high-risk user or a normal user, and sends the result to the risk handling module;
S3: according to the result of step S2, the risk handling module directly intercepts a malicious user; for a high-risk user, it pushes a verification code; if the user passes, the user is allowed to continue accessing the service; otherwise a verification code is pushed again, and if the user fails repeatedly, the user is intercepted.
Further, step S2 specifically comprises:
S21: the browser environment discrimination module judges, from the browser runtime environment information, whether the user's browser is a normal browser;
S22: the network discrimination module judges, from the network information, whether the user's browser has been tampered with;
S23: the IP discrimination module judges, from the user's IP information, whether the user's IP is risky;
S24: the intelligent behavior discrimination module judges, from the user's click-trajectory information, whether the user's behavior is machine-simulated;
S25: the risk discrimination module combines the results of steps S21 to S24, determines whether the user is a malicious user, a high-risk user or a normal user, and sends the result to the risk handling module.
The technical solution provided by the invention has the following beneficial effects: it provides multi-dimensional detection based on browser environment information, network information, IP information, a CNN model and the like, greatly improving real-time performance and accuracy; and its intelligent interception significantly reduces the false-block rate and improves the accuracy of crawler identification.
Drawings
Fig. 1 is a block diagram of an intelligent anti-crawler system for multi-layer threat interception according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be further described with reference to the accompanying drawings.
Referring to Fig. 1, an embodiment of the invention provides an intelligent anti-crawler system for multi-layer threat interception, comprising an information acquisition module 1, a risk discrimination module 2 and a risk handling module 3. The information acquisition module 1 uses JavaScript code to collect the user's browser runtime environment information and click-trajectory information. Based on the information collected by module 1, the risk discrimination module 2 jointly evaluates the user's browser environment, network information, IP information and behavior, and determines whether the visiting user is a malicious user, a high-risk user or a normal user; it comprises a browser environment discrimination module 21, a network discrimination module 22, an IP discrimination module 23 and an intelligent behavior discrimination module 24. According to the result from module 2, the risk handling module 3 intercepts malicious users and pushes verification codes to high-risk users.
Specifically, the browser environment discrimination module 21 judges whether the browser the user is running is a normal browser; the network discrimination module 22 judges whether that browser has been tampered with; the IP discrimination module 23 judges whether the user's IP is risky; and the intelligent behavior discrimination module 24 judges whether the user's behavior is machine-simulated.
This embodiment also provides an intelligent anti-crawler method for multi-layer threat interception, comprising the following steps:
S1: when a user makes an access request with a web browser, the information acquisition module 1 collects the user's browser runtime environment information and click-trajectory information and sends it to the risk discrimination module 2. The browser runtime environment information includes the operating system type, hardware information (CPU, memory), graphics card information, the browser plug-in list, the browser window size, whether images can be loaded, IP information, the user's mouse-track information, and the like;
S2: based on the information collected in step S1, the risk discrimination module 2 jointly evaluates the user's browser runtime environment, network information, IP information and behavior, determines whether the visiting user is a malicious user, a high-risk user or a normal user, and sends the result to the risk handling module 3. Specifically, step S2 comprises:
S21: the browser environment discrimination module 21 judges whether the user's browser is a normal browser according to the completeness of the browser runtime environment information;
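Step S21 only says the decision rests on the completeness of the collected environment information. A minimal sketch of such a completeness check, assuming the collector's JavaScript reports each field from step S1 (or leaves it empty), might look like the following. The class and function names, the `max_missing` tolerance and the zero-sized-window rule are illustrative assumptions, not part of the patent.

```python
from dataclasses import dataclass, fields
from typing import List, Optional, Tuple


@dataclass
class BrowserEnvironment:
    """Hypothetical container for the environment fields listed in step S1."""
    os_type: Optional[str] = None
    hardware: Optional[str] = None                  # e.g. CPU / memory summary
    graphics_card: Optional[str] = None
    plugins: Optional[List[str]] = None
    window_size: Optional[Tuple[int, int]] = None   # (width, height)
    images_loadable: Optional[bool] = None
    ip: Optional[str] = None
    mouse_track: Optional[List[Tuple[int, int, int]]] = None  # (x, y, t_ms)


def missing_fields(env: BrowserEnvironment) -> List[str]:
    """Names of fields the collector script left empty or could not read."""
    empty = (None, "", [], ())
    return [f.name for f in fields(env) if getattr(env, f.name) in empty]


def is_normal_browser(env: BrowserEnvironment, max_missing: int = 0) -> bool:
    """Assumed rule: too many missing fields suggests a headless or scripted client."""
    if len(missing_fields(env)) > max_missing:
        return False
    # A zero-sized window is treated as another (assumed) headless indicator.
    if env.window_size is not None and min(env.window_size) <= 0:
        return False
    return True
```

Raising `max_missing` above zero trades some detection for fewer false blocks, since privacy-focused browsers may restrict fields such as the plug-in list.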
S22: the network discrimination module 22 judges whether the user's browser has been tampered with using the network information, i.e. the HTTP protocol information the user sends to the web service, chiefly the protocol headers. Different browsers generally send different headers, in a different order, so the network discrimination module 22 collects the protocol headers used by different browsers, builds a complete sample library, and checks the user's browser against it;
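The header-order comparison in step S22 can be sketched as below. This is a minimal illustration under stated assumptions: the two entries in `SAMPLE_LIBRARY` are placeholder orderings invented for the example rather than measured browser fingerprints, the claimed browser family is inferred naively from the User-Agent string, and headers absent from a family's sample (cookies, custom headers) are ignored.

```python
from typing import List, Optional, Tuple

Header = Tuple[str, str]

# Hypothetical sample library: browser family -> expected order of common headers.
# Placeholder values for illustration only; a real library would be built from traffic.
SAMPLE_LIBRARY = {
    "chrome": ("host", "connection", "user-agent", "accept",
               "accept-encoding", "accept-language"),
    "firefox": ("host", "user-agent", "accept", "accept-language",
                "accept-encoding", "connection"),
}


def claimed_family(headers: List[Header]) -> Optional[str]:
    """Browser family the client claims to be, judged from its User-Agent."""
    ua = next((v for n, v in headers if n.lower() == "user-agent"), "").lower()
    if "firefox" in ua:
        return "firefox"
    if "chrome" in ua:
        return "chrome"
    return None


def is_tampered(headers: List[Header]) -> bool:
    """True if the header order does not match the sample for the claimed browser."""
    family = claimed_family(headers)
    if family is None:
        return True  # assumption: an unrecognised User-Agent matches no sample
    expected = SAMPLE_LIBRARY[family]
    # HTTP header names are case-insensitive; keep only names the sample covers.
    observed = tuple(n.lower() for n, _ in headers if n.lower() in expected)
    return observed != expected
```

A client whose User-Agent claims Chrome but whose headers arrive in another browser's order is flagged, which is the kind of inconsistency this step targets when User-Agent strings are rotated.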
S23: the IP discrimination module 23 judges whether the user's IP is risky using the user's IP information. The IP discrimination module 23 builds a large IP risk library by recording the behavior observed in each user's access, such as script access, access from cloud-server IPs and access from emulators; the library tracks the history of user accesses and determines the risk level of an IP from the frequency of accesses from that IP. For a new visiting user, the IP risk library determines the attributes of the user's IP, such as authenticity, owning organization, geographic location and IP type (ordinary user, data center, large-scale egress, backbone network, mobile user, etc.). In particular, normal users rarely or never access services from data centers, so IPs from cloud data centers are treated as high-risk by default;
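A minimal sketch of an IP risk library along the lines of step S23, assuming a sliding time window for access frequency and a precomputed attribute table. The window length, request limit, authenticity cut-off, the four risk labels and all names are hypothetical; only the "data-center IPs are high-risk by default" rule comes from the text.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Deque, Dict


@dataclass(frozen=True)
class IpAttributes:
    """Hypothetical IP attributes named in step S23."""
    authenticity: float  # 0.0 (likely spoofed/proxied) .. 1.0 (likely genuine)
    organization: str
    location: str
    ip_type: str         # e.g. "residential", "datacenter", "mobile", "backbone"


class IpRiskLibrary:
    """Sliding-window access frequency plus attribute lookup (illustrative thresholds)."""

    def __init__(self, window_seconds: float = 60.0, max_requests: int = 30,
                 min_authenticity: float = 0.5) -> None:
        self.window_seconds = window_seconds
        self.max_requests = max_requests
        self.min_authenticity = min_authenticity
        self.history: Dict[str, Deque[float]] = defaultdict(deque)
        self.attributes: Dict[str, IpAttributes] = {}

    def record(self, ip: str, timestamp: float) -> None:
        """Log one access and drop entries that have left the time window."""
        hits = self.history[ip]
        hits.append(timestamp)
        while hits and hits[0] <= timestamp - self.window_seconds:
            hits.popleft()

    def risk(self, ip: str) -> str:
        """'high', 'medium', 'low' or 'unknown', as of the latest record() call."""
        attrs = self.attributes.get(ip)
        if attrs is not None and attrs.ip_type == "datacenter":
            return "high"  # data-center IPs are high-risk by default, per step S23
        if len(self.history.get(ip, ())) > self.max_requests:
            return "high"  # too many accesses inside the window
        if attrs is None:
            return "unknown"
        if attrs.authenticity < self.min_authenticity:
            return "medium"
        return "low"
```

A production library would persist this history and refresh the attribute table from IP intelligence data; the in-memory dictionaries here only show the shape of the lookup.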
S24: the intelligent behavior discrimination module 24 judges, from the user's click-trajectory information, whether the user's behavior is machine-simulated. Specifically, module 24 collects a large volume of click-trajectory data from normal users and trains a CNN model on it to obtain a stable behavior feature library; when a new user visits, it evaluates that user's click-trajectory information against this feature library;
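The patent does not disclose the CNN architecture, its input encoding or the form of the behavior feature library. As a hedged illustration of the input side only, the sketch below turns a mouse trajectory of `(x, y, t_ms)` samples into per-step speed and direction features (the kind of sequence a 1-D CNN could consume) and applies one hand-picked convolution filter. A trained CNN would learn many such filters from normal-user data; the `[1, -2, 1]` kernel and the `speed_irregularity` score are assumptions chosen for readability.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, t_ms), as assumed for the collector


def trajectory_features(points: Sequence[Point]) -> List[Tuple[float, float]]:
    """Per-step (speed in px/ms, direction in radians) between consecutive samples."""
    rows = []
    for (x0, y0, t0), (x1, y1, t1) in zip(points, points[1:]):
        dt = max(t1 - t0, 1e-6)  # guard against duplicate timestamps
        dx, dy = x1 - x0, y1 - y0
        rows.append((math.hypot(dx, dy) / dt, math.atan2(dy, dx)))
    return rows


def conv1d(sequence: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """'Valid' 1-D convolution written as cross-correlation, as CNN layers compute it."""
    k = len(kernel)
    return [sum(sequence[i + j] * kernel[j] for j in range(k))
            for i in range(len(sequence) - k + 1)]


def speed_irregularity(points: Sequence[Point]) -> float:
    """Mean absolute response of a second-difference filter over the speed channel.

    A perfectly uniform scripted move scores 0.0; human movements usually vary.
    """
    speeds = [speed for speed, _ in trajectory_features(points)]
    responses = conv1d(speeds, [1.0, -2.0, 1.0])
    return sum(abs(r) for r in responses) / max(len(responses), 1)
```

A single fixed filter like this is easy to defeat by adding random jitter; learning the filters from normal-user trajectories, as the patent describes, can in principle capture subtler patterns.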
S25: the risk discrimination module 2 combines the results of steps S21 to S24, determines whether the user is a malicious user, a high-risk user or a normal user, and sends the result to the risk handling module 3.
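Step S25 does not say how the four results are combined. One simple possibility, shown here purely as an assumption, is to count how many sub-modules raised a flag: two or more flags mark the user as malicious, exactly one as high-risk, none as normal. The threshold and the equal weighting are illustrative; a deployment might weight signals differently or use a learned model.

```python
from enum import Enum


class Verdict(Enum):
    NORMAL = "normal"
    HIGH_RISK = "high-risk"
    MALICIOUS = "malicious"


def fuse(browser_normal: bool, headers_tampered: bool,
         ip_risky: bool, machine_like: bool,
         malicious_threshold: int = 2) -> Verdict:
    """Combine the S21-S24 outcomes by counting red flags (hypothetical rule)."""
    flags = sum([not browser_normal, headers_tampered, ip_risky, machine_like])
    if flags >= malicious_threshold:
        return Verdict.MALICIOUS
    if flags >= 1:
        return Verdict.HIGH_RISK
    return Verdict.NORMAL
```

Routing single-signal users to a verification code rather than blocking them reflects the patent's aim of lowering the false-block rate: one noisy detector alone cannot lock out a legitimate user.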
S3: according to the result of step S2, the risk handling module 3 directly intercepts malicious users. For a high-risk user, the risk handling module 3 pushes a verification code; if the user passes, the user is allowed to continue accessing the service; otherwise a verification code is pushed again, and if the user fails repeatedly, the user is intercepted.
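The handling logic of step S3 can be sketched as follows. The patent only says the user is intercepted after failing "multiple times", so `max_attempts=3` is an assumed value, and `solve_captcha` is a hypothetical callback standing in for pushing a verification code and reporting whether the user passed it.

```python
from typing import Callable


def handle(verdict: str, solve_captcha: Callable[[int], bool],
           max_attempts: int = 3) -> str:
    """Return 'allow' or 'block' for a verdict of 'normal', 'high-risk' or 'malicious'."""
    if verdict == "malicious":
        return "block"  # explicitly malicious users are intercepted directly
    if verdict == "normal":
        return "allow"
    if verdict != "high-risk":
        raise ValueError(f"unknown verdict: {verdict!r}")
    # High-risk: push a verification code, re-push on failure, block after max_attempts.
    for attempt in range(1, max_attempts + 1):
        if solve_captcha(attempt):
            return "allow"
    return "block"
```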
In this document, the terms front, back, upper and lower are used to define the components in the drawings and the positions of the components relative to each other, and are used for clarity and convenience of the technical solution. It is to be understood that the use of the directional terms should not be taken to limit the scope of the claims.
The embodiments described above and their features may be combined with one another provided they do not conflict.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (9)

1. An intelligent anti-crawler system for multi-layer threat interception, characterized by comprising an information acquisition module, a risk discrimination module and a risk handling module, wherein the information acquisition module collects a user's browser runtime environment information and click-trajectory information; the risk discrimination module jointly evaluates the user's browser environment, network information, IP information and behavior based on the information collected by the information acquisition module, and determines whether the visiting user is a malicious user, a high-risk user or a normal user; and, according to the determination of the risk discrimination module, the risk handling module intercepts the current user when the user is judged malicious and pushes a verification code to the current user when the user is judged high-risk.
2. The intelligent anti-crawler system for multi-layer threat interception according to claim 1, wherein the browser runtime environment information comprises the operating system type, hardware information, graphics card information, browser plug-in list, browser window size, image-loading information, IP information and the user's mouse-track information.
3. The intelligent anti-crawler system for multi-layer threat interception according to claim 1 or 2, wherein the risk discrimination module further comprises a browser environment discrimination module, a network discrimination module, an IP discrimination module and an intelligent behavior discrimination module, wherein:
the browser environment discrimination module judges, from the browser runtime environment information, whether the browser the user is running is a normal browser; the network discrimination module judges, from the network information, whether that browser has been tampered with; the IP discrimination module assesses the risk of the user's IP from the user's IP information; and the intelligent behavior discrimination module judges, from the user's click-trajectory information, whether the user's behavior is machine-simulated.
4. The intelligent anti-crawler system for multi-layer threat interception according to claim 3, wherein the network information refers to the HTTP protocol information used by the user in the web service, chiefly the protocol headers; the network discrimination module builds a complete sample library by collecting the protocol headers used by different browsers, and judges against the sample library whether the browser run by the user has been tampered with.
5. The intelligent anti-crawler system for multi-layer threat interception according to claim 3, wherein the IP discrimination module builds an IP risk library by recording users' access behavior, the IP risk library being used to track the history of user accesses and to determine the risk level of an IP from the frequency of accesses from that IP;
for a new visiting user, the IP risk library is used to determine the attributes of the user's IP and thereby assess its risk; the IP attributes comprise authenticity, owning organization, geographic location and IP type.
6. The intelligent anti-crawler system for multi-layer threat interception according to claim 3, wherein the intelligent behavior discrimination module obtains a behavior feature library of normal users by collecting click-trajectory data from a large number of normal users and training a CNN (convolutional neural network) model on the collected data, and uses the behavior feature library to evaluate the click-trajectory information of new visiting users.
7. The intelligent anti-crawler system for multi-layer threat interception according to claim 1, wherein the verification code is pushed as follows: the risk handling module pushes a verification code to the high-risk user; if the user passes it, the user is allowed to continue accessing the service; otherwise a verification code is pushed again, and if the user fails repeatedly, the user is intercepted.
8. An intelligent anti-crawler method for multi-layer threat interception, using the system according to any one of claims 1 to 7, characterized by comprising the following steps:
S1: when a user makes an access request with a web browser, the information acquisition module collects the user's browser runtime environment information and click-trajectory information and sends it to the risk discrimination module;
S2: based on the information collected in step S1, the risk discrimination module jointly evaluates the user's browser environment, network information, IP information and behavior, determines whether the visiting user is a malicious user, a high-risk user or a normal user, and sends the result to the risk handling module;
S3: according to the result of step S2, the risk handling module directly intercepts a malicious user; for a high-risk user, it pushes a verification code; if the user passes, the user is allowed to continue accessing the service; otherwise a verification code is pushed again, and if the user fails repeatedly, the user is intercepted.
9. The intelligent anti-crawler method for multi-layer threat interception according to claim 8, wherein step S2 specifically comprises:
S21: the browser environment discrimination module judges, from the browser runtime environment information, whether the user's browser is a normal browser;
S22: the network discrimination module judges, from the network information, whether the user's browser has been tampered with;
S23: the IP discrimination module judges, from the user's IP information, whether the user's IP is risky;
S24: the intelligent behavior discrimination module judges, from the user's click-trajectory information, whether the user's behavior is machine-simulated;
S25: the risk discrimination module combines the results of steps S21 to S24, determines whether the user is a malicious user, a high-risk user or a normal user, and sends the result to the risk handling module.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911368288.7A CN111209566A (en) 2019-12-26 2019-12-26 Intelligent anti-crawler system and method for multi-layer threat interception

Publications (1)

Publication Number Publication Date
CN111209566A 2020-05-29

Family

ID=70785227

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911368288.7A Pending CN111209566A (en) 2019-12-26 2019-12-26 Intelligent anti-crawler system and method for multi-layer threat interception

Country Status (1)

Country Link
CN (1) CN111209566A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112003833A (en) * 2020-07-30 2020-11-27 瑞数信息技术(上海)有限公司 Abnormal behavior detection method and device
CN116032540A (en) * 2022-12-05 2023-04-28 杭州思律舟到科技有限公司 Network security management method and system based on data processing

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102945340A (en) * 2012-10-23 2013-02-27 北京神州绿盟信息安全科技股份有限公司 Information object detection method and system
CN108777687A (en) * 2018-06-05 2018-11-09 掌阅科技股份有限公司 Reptile hold-up interception method, electronic equipment, storage medium based on user behavior portrait
CN109391620A (en) * 2018-10-22 2019-02-26 武汉极意网络科技有限公司 Method for building up, system, server and the storage medium of abnormal behaviour decision model
CN109862562A (en) * 2019-01-02 2019-06-07 武汉极意网络科技有限公司 A kind of dynamic verification code choosing method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200529