CN107426148B - Crawler-resisting method and system based on running environment feature recognition - Google Patents


Info

Publication number
CN107426148B
Authority
CN
China
Legal status
Active
Application number
CN201710203203.4A
Other languages
Chinese (zh)
Other versions
CN107426148A
Inventor
夏珺峥
蒋平川
Current Assignee
Chengdu Youe Data Co ltd
Original Assignee
Chengdu Youe Data Co ltd
Priority date
2017-03-30
Filing date
2017-03-30
Application filed by Chengdu Youe Data Co ltd
Priority to CN201710203203.4A
Publication of CN107426148A
Application granted
Publication of CN107426148B

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/10 Network architectures or network communication protocols for network security for controlling access to devices or network resources
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • H04L63/20 Network architectures or network communication protocols for network security for managing network security; network security policies in general

Abstract

The invention discloses an anti-crawler method and system based on running environment feature recognition. It belongs to the technical field of anti-crawler measures and addresses the problems of identifying crawler programs and enforcing anti-crawler policies. A new policy package and a selection code for running it are generated; the server's current policy package is updated with the new one, and a whitelist of feature categories for the running-feature data of the new policy package is built. The selection code is sent to the client, followed by a response request. The client selectively runs the policy package according to the selection code to respond to the server, obtains the feature data corresponding to the selection code, and returns it to the server. The server analyzes the selection code and its feature data, calculates the client's feature category, determines whether the client belongs to the feature-category whitelist, and applies access control to clients that do not.

Description

Crawler-resisting method and system based on running environment feature recognition
Technical Field
The invention relates to the technical field of crawler identification and anti-crawler measures, and in particular to an anti-crawler method and system based on running environment feature recognition.
Background
With the advent of the big data era, data is becoming increasingly important, and extracting valuable data first requires a large amount of raw data. Because data on the Internet is open, massive, and widely accessible, it has attracted the attention of businesses and individuals, and many web crawlers have been developed to collect it. However, web crawlers can have many negative effects. A crawler can send a large number of requests to a server in a short time, degrading server performance; some crawlers maliciously collect large amounts of public data, aggregate and sell it, and infringe copyright. Because the data on some websites has a high value density, or because enterprises do not want the information they publish to be easily harvested by crawlers, various anti-crawler measures are deployed, such as CAPTCHAs, session checks, and access-frequency limits, to distinguish crawlers from real human users. However, crawlers are diverse and use many techniques to defeat these measures: recognizing CAPTCHAs by machine or through human CAPTCHA-solving services, bypassing session checks by splicing together access requests, and simulating multiple users through address proxies. Identifying features of the crawler's running environment makes it possible to recognize web crawlers effectively and prevent data leakage.
Disclosure of Invention
In view of the prior art, the invention aims to provide an anti-crawler method and system based on running environment feature recognition, solving the prior-art problems in which a crawler operator continuously accesses a server and obtains large amounts of information, slowing the server down and allowing information resources to be harvested and stolen in bulk.
To achieve this purpose, the invention adopts the following technical solution:
an anti-crawler method, comprising the following steps:
Step 1: generate a new policy package and a selection code for running it, update the server's current policy package with the new one, and build a feature-category whitelist for the running-feature data of the new policy package; the generation can be performed by a standalone server or by the local server;
Step 2: send the selection code to the client, followed by a response request; the sending and receiving can be performed by a standalone server or by the local server;
Step 3: the client selectively runs the policy package according to the selection code to respond to the server, obtains the feature data corresponding to the selection code, and returns it to the server;
Step 4: analyze the selection code and its corresponding feature data, calculate the client's feature category, determine whether the client belongs to the feature-category whitelist, and apply access control to clients that do not.
In the above method, the new policy package and the selection code for running it are generated periodically in step 1.
In the above method, step 4 comprises the following steps:
Step 4.1: the storage module receives the feature data within a preset time interval;
Step 4.2: the server processing module accesses the storage module, analyzes the feature data within the time interval, calculates the client's feature category, and determines whether the client belongs to the feature-category whitelist;
Step 4.2.1: mark clients that belong to the feature-category whitelist as legal, then return to step 1;
Step 4.2.2: apply access control to clients that do not belong to the feature-category whitelist.
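Steps 4.1 to 4.2.2 amount to a windowed classification loop. The sketch below is a minimal illustration under stated assumptions: the `Record` layout, the `classify` callback, and the split into "legal" and "controlled" lists are hypothetical and not taken from the patent; `classify` stands in for whatever analysis the data analysis module performs.

```python
from dataclasses import dataclass


@dataclass
class Record:
    client_id: str
    selection_code: str
    runtimes: list[float]  # feature data: per-function running times from the client
    received_at: float     # server-side receive timestamp, in seconds


def records_in_window(records, now, interval):
    """Step 4.1: keep only the feature data received within the preset interval."""
    return [r for r in records if now - interval <= r.received_at <= now]


def run_step4(records, now, interval, classify, whitelist):
    """Steps 4.2 to 4.2.2: classify each client seen in the window.

    `classify` is a hypothetical callback standing in for the analysis step:
    it maps a client's list of runtime vectors to a feature category.
    Returns (legal_clients, controlled_clients).
    """
    by_client = {}
    for r in records_in_window(records, now, interval):
        by_client.setdefault(r.client_id, []).append(r.runtimes)

    legal, controlled = [], []
    for client_id, vectors in by_client.items():
        category = classify(vectors)
        # Step 4.2.1: whitelisted -> legal (the caller then returns to step 1).
        # Step 4.2.2: everything else -> access control.
        (legal if category in whitelist else controlled).append(client_id)
    return legal, controlled
```

In a deployment, a non-empty `legal` list would trigger regeneration of the policy package (back to step 1), and `controlled` would be handed to whatever access-control mechanism the site uses.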
An anti-crawler method based on running environment feature recognition comprises the following steps:
Step 1: at the server, periodically generate a function matrix whose elements are program functions with different running times, together with a selector that maps character strings to different elements of the function matrix; set a whitelist of running environment feature categories; update the server's current function matrix with the new one; generate a random character string at the server as the selection code; and send the selection code and a response request to the client;
Step 2: at the client, the selection code determines what the selector chooses; the client obtains the running times of the function-matrix elements chosen by the selector and sends the selection code and those running times to the server;
Step 3: at the server, analyze the selection codes and running times, calculate the client's running environment feature category, identify clients whose category is not in the whitelist, and apply an access control policy to them.
In the above method, step 1 further comprises periodically generating a selector feature code containing the text features of the selector.
In the above method, step 3 comprises the following steps:
Step 3.1: query the server's current selector feature code to determine the function matrix currently in use and the initially set selector values used for the analysis;
Step 3.2: select all selection codes received from clients within a preset time interval, together with the running times of the corresponding elements of the current function matrix, and calculate the client's running environment feature category with a mean-value clustering algorithm or a machine learning algorithm;
Step 3.3: identify clients whose running environment feature category is not in the whitelist, mark them as illegal, and apply the access control policy.
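Step 3.2 names mean-value clustering without giving details. The sketch below assumes it means ordinary k-means, seeded with a deterministic farthest-point initialization so the result is reproducible. How a cluster is matched to the whitelist is also an assumption here: a cluster counts as whitelisted when its centroid lies within a `tolerance` of a whitelisted reference vector (`whitelist_refs` and `tolerance` are hypothetical parameters).

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means on runtime vectors, with deterministic farthest-point init."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def mark_clients(client_ids, points, k, whitelist_refs, tolerance):
    """Step 3.3 sketch: clients in a cluster whose centroid is close to a
    whitelisted reference vector are legal; all other clients are illegal."""
    labels, centroids = kmeans(points, k)
    allowed = {
        c for c, cen in enumerate(centroids)
        if any(math.dist(cen, ref) <= tolerance for ref in whitelist_refs)
    }
    return {cid: ("legal" if lab in allowed else "illegal")
            for cid, lab in zip(client_ids, labels)}
```

The same interface would accept any other clustering or learned classifier, which matches the text's "or a machine learning algorithm".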
In the above method, during step 3 the storage module of the server receives, within the preset time interval, the selection codes from the client and the running times of the corresponding elements.
An anti-crawler system based on running environment feature recognition comprises:
a server comprising a feature algorithm module, a data interface module, a storage module, a data analysis module, and an access processing module, wherein the feature algorithm module outputs a policy package to the server for updating and execution;
a client, which receives the selection code generated by the feature algorithm module through the data interface module;
the client executes the policy package according to the selection code;
the server receives, through the data interface module, the feature data that the client returns after executing the policy package for the selection code; the data interface module also outputs the feature data corresponding to the selection code to the storage module; the data analysis module performs calculations on that feature data in the storage module and feeds information back to the access processing module according to the result; and the access processing module executes a preset policy on the client based on that feedback.
In the above scheme, the policy package comprises a function matrix whose elements are program functions with different running times, a selector that maps character strings to different elements of the function matrix, and a selector feature code containing the features of the selector.
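The three parts of a policy package can be modeled as one record. This is a minimal sketch: the class and field names are hypothetical, and SHA-256 is an assumed choice, since the claims only say the selector's program text is hashed to obtain its feature code.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable


@dataclass
class PolicyPackage:
    """Function matrix + selector + selector feature code (field names are illustrative)."""
    function_matrix: list   # n x n grid of zero-argument callables
    selector: Callable      # (selection_code, n) -> list of (row, col) cells
    selector_source: str    # program text of the selector, e.g. a JS snippet

    @property
    def selector_feature_code(self) -> str:
        # The selector's program text is hashed to give its feature code.
        # SHA-256 is an assumption; the patent only says "a Hash algorithm".
        return hashlib.sha256(self.selector_source.encode("utf-8")).hexdigest()
```

Because the feature code depends only on the selector text, the server can use it to tell which package version produced a given batch of stored running times.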
A server with an anti-crawler function based on running environment feature recognition comprises:
a feature algorithm module, which periodically generates a function matrix whose elements are program functions with different running times, a selector that maps character strings to different elements of the function matrix, and a selector feature code containing the features of the selector script;
a data interface module, which exchanges data with the client, outputs the selection codes generated by the feature algorithm module to the client, and receives from the client the running times corresponding to the selection codes;
a storage module, which receives from the data interface module the running-time data for the selection codes received within a preset time interval;
a data analysis module, which performs calculations on the data in the storage module for a preset time interval and determines the client's running environment feature category from the result;
and an access processing module, which executes a preset policy on the client according to the running environment feature category calculated by the data analysis module.
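The module structure above can be wired together as a few small classes. This is a hedged sketch, not the patented implementation: the class names mirror the module names, but the counter-based selection codes, the row tuple layout, the 60-second default interval, and the `classify` callback are all hypothetical simplifications.

```python
import time


class FeatureAlgorithmModule:
    """Issues selection codes (a counter stands in for real random codes)."""
    def __init__(self):
        self.counter = 0

    def new_selection_code(self):
        self.counter += 1
        return f"code-{self.counter}"


class StorageModule:
    """Keeps (timestamp, client_id, selection_code, runtimes) rows."""
    def __init__(self):
        self.rows = []

    def put(self, row):
        self.rows.append(row)

    def window(self, now, interval):
        return [r for r in self.rows if now - interval <= r[0] <= now]


class DataAnalysisModule:
    def __init__(self, classify):
        self.classify = classify  # list of runtime vectors -> environment category

    def category(self, rows, client_id):
        vectors = [r[3] for r in rows if r[1] == client_id]
        return self.classify(vectors) if vectors else None


class AccessProcessingModule:
    def __init__(self, whitelist):
        self.whitelist = whitelist

    def decide(self, category):
        return "allow" if category in self.whitelist else "deny"


class DataInterfaceModule:
    """Client-facing entry point that routes data between the other modules."""
    def __init__(self, algo, storage, analysis, access, interval=60.0):
        self.algo, self.storage = algo, storage
        self.analysis, self.access = analysis, access
        self.interval = interval

    def challenge(self):
        return self.algo.new_selection_code()

    def receive(self, client_id, selection_code, runtimes, now=None):
        now = time.time() if now is None else now
        self.storage.put((now, client_id, selection_code, runtimes))
        rows = self.storage.window(now, self.interval)
        return self.access.decide(self.analysis.category(rows, client_id))
```

Keeping each module behind a narrow method (`put`, `window`, `category`, `decide`) lets the classifier or storage backend be swapped without touching the client-facing interface.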
Compared with the prior art, the invention has the following beneficial effects:
because the selection code is random, the program functions of the function matrix that are run are decided at random; the running times of these functions over a given time interval are collected, classified according to their clustering features, and then judged legitimate or not. This markedly increases the probability of detecting a crawler client, and the whitelist misjudgment rate can fall noticeably as more data is collected;
in the invention, the selector feature code is generated randomly and the function matrix is variable, so the function-matrix operations the client must execute differ every time, which makes the scheme much harder for a crawler to crack.
Drawings
FIG. 1 is a block diagram of a server according to an embodiment of the invention;
FIG. 2 is a flowchart of an embodiment of the anti-crawler verification method according to the invention.
Detailed Description
All features disclosed in this specification, and all steps of any method or process so disclosed, may be combined in any way, except for features and/or steps that are mutually exclusive.
The running environment is the environment in which the client's local application that accesses the server's service runs; for example, the running environment of a browser accessing a Web page, or of the WeChat app accessing a WeChat page.
The whitelist is a set of feature records that have been identified, from recorded data, as normal client running environments. For example, a browser accessing a Web page can be tested on different operating systems and hardware platforms; the running times are collected in the manner described in this specification, machine learning, pattern recognition, and neural network training are applied, and the results are classified according to the algorithm features to form a classified set.
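As a minimal illustration of turning lab measurements into whitelist entries, the sketch below reduces each known-good environment's runtime vectors to their per-dimension mean. The environment labels and numbers are hypothetical, and the simple mean stands in for the machine learning, pattern recognition, or neural network step the text mentions.

```python
from statistics import fmean


def build_whitelist(lab_runs):
    """Reduce labelled lab measurements to one reference vector per environment.

    lab_runs maps an environment label (e.g. "chrome-windows") to runtime
    vectors measured by running the function matrix on that known-good setup.
    The per-dimension mean is a simple stand-in for the learning step.
    """
    return {
        label: [fmean(column) for column in zip(*vectors)]
        for label, vectors in lab_runs.items()
    }


whitelist = build_whitelist({
    "chrome-windows": [[1.0, 3.0], [3.0, 5.0]],
    "firefox-linux": [[2.0, 2.0]],
})
print(whitelist)  # {'chrome-windows': [2.0, 4.0], 'firefox-linux': [2.0, 2.0]}
```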
The selection code can be a random string generated when the client accesses the network, for example by taking the current time, adding a random integer, and applying a hash algorithm.
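The recipe in the previous paragraph (current time plus a random integer, then a hash) can be sketched in a few lines. SHA-256, the nanosecond clock, and the 64-bit range of the random integer are assumptions; the text only says "a hash algorithm". The optional arguments exist so the function can be pinned for testing.

```python
import hashlib
import secrets
import time


def make_selection_code(now_ns=None, nonce=None):
    """Selection code = hash(current time + random integer), as a hex string.

    SHA-256 and the nanosecond timestamp are assumptions, not from the patent.
    """
    if now_ns is None:
        now_ns = time.time_ns()
    if nonce is None:
        nonce = secrets.randbelow(2**64)  # cryptographically strong random integer
    seed = now_ns + nonce                 # "adding a random integer" to the time
    return hashlib.sha256(str(seed).encode("ascii")).hexdigest()
```

Using `secrets` rather than `random` keeps the codes unpredictable to a crawler that observes earlier codes.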
The selector is a program. For example, a specific algorithm can be used to solve a system of multivariate equations; the solutions are sorted in ascending order, taken modulo the number of columns of the function matrix, and used in turn to pick the function to run from each column. The text of the selector program is hashed, and the hash value and the file name of the corresponding selector program are recorded in a database.
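The text does not specify which equations the selector solves. The sketch below assumes a hypothetical scheme: the SHA-256 digest of the selection code supplies coefficients for a lower-triangular linear system with a nonzero diagonal (so it always has exactly one solution), which is solved exactly with rational arithmetic. The solutions are then sorted, reduced modulo the column count, and used as one row index per column, following the steps described above.

```python
import hashlib
import itertools
import math
from fractions import Fraction


def select_cells(selection_code: str, n: int) -> list[tuple[int, int]]:
    """Hypothetical selector for an n x n function matrix.

    1. Derive a lower-triangular system A x = b from the digest bytes.
    2. Solve it exactly by forward substitution.
    3. Sort the solutions ascending, reduce each modulo n, and use the
       k-th value as the row index for column k.
    """
    stream = itertools.cycle(hashlib.sha256(selection_code.encode("utf-8")).digest())

    x = []
    for i in range(n):
        off_diag = [next(stream) - 128 for _ in range(i)]  # A[i][0..i-1], signed
        diag = next(stream) % 15 + 1                      # A[i][i] in 1..15, never zero
        rhs = next(stream) * 7                            # b[i]
        s = sum(a * xj for a, xj in zip(off_diag, x))
        x.append(Fraction(rhs - s, diag))

    rows = [math.floor(v) % n for v in sorted(x)]
    return [(row, col) for col, row in enumerate(rows)]
```

Under this scheme each column contributes exactly one function, and the same selection code always yields the same cells, so the server can recompute the client's choice when it analyzes the returned running times.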
Each value of the function matrix is an index to a program, so the program to run can be located from the value read from the function matrix. The program can be, for example, JavaScript on a Web page; when it runs, the time taken to execute its code is recorded, and the running environment of the client program is then determined by analyzing these running-time records.
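In a browser the timing would typically be done in JavaScript, for example with `performance.now()`. The Python sketch below mirrors the same idea: matrix entries are indices into a program table, and each selected program is run and timed. `PROGRAMS`, `FUNCTION_MATRIX`, and the `busy_sum` workload are hypothetical placeholders.

```python
import time


def busy_sum(k):
    """A CPU-bound stand-in for one program in the function matrix."""
    total = 0
    for i in range(k):
        total += i * i
    return total


# Hypothetical program table; the function matrix stores indices into it.
PROGRAMS = [lambda k=k: busy_sum(k) for k in (1_000, 5_000, 20_000, 80_000)]

FUNCTION_MATRIX = [
    [0, 1, 2],
    [1, 2, 3],
    [3, 0, 1],
]


def run_and_time(cells):
    """Run the program referenced by each selected (row, col) cell.

    Returns the elapsed wall-clock time of each run, in seconds.
    """
    timings = []
    for i, j in cells:
        program = PROGRAMS[FUNCTION_MATRIX[i][j]]
        start = time.perf_counter()
        program()
        timings.append(time.perf_counter() - start)
    return timings
```

`time.perf_counter()` is used because it is a monotonic, high-resolution clock intended for measuring short intervals.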
The invention is further described below with reference to the accompanying drawings:
The anti-crawler method and system based on running environment feature recognition specifically comprise the following steps:
S1: the server periodically generates a function matrix and a selector feature code, updates the server program, and sets a whitelist of running environment feature categories permitted to access;
S2: the server generates a selection code, sends it to the client, and requests a response from the client;
S3: the client selects and runs functions in the function matrix according to the received selection code and the selector, then sends the selection code and the running time of each function to the server;
S4: the server stores the selection code and the running time of each function in the data storage module;
S5: the server analyzes the stored selection codes and per-function running-time data and calculates the feature category of the current client's running environment;
S6: according to the configured policy, an access control policy is applied to accesses whose category is not in the whitelist;
in this method, the generation of the function matrix, the selection code, and the selector by the server's feature algorithm module mainly comprises:
S11: the function matrix is an N × N matrix of program functions with different execution times, denoted f(i, j);
S12: the selector is a program that maps a character string to several elements f(i, j);
S13: the selector feature code is the text feature of the selector program;
S14: the server program is updated, the old versions of the function matrix and the selector are replaced, and the current selector feature code is recorded;
S15: the running environment whitelist is the preset set of client running environments that are permitted to access;
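Step S11 needs N × N functions whose execution times differ. One hedged way to sketch this is to give each cell a loop workload of a distinct, randomly assigned size, so that a fresh matrix (S14) reorders which cell is slow or fast. The workload body, the step of 1,000 iterations, and the `iterations` attribute are assumptions made for illustration.

```python
import random


def make_workload(iterations: int):
    """Return a function whose running time grows with `iterations`."""
    def f():
        acc = 0
        for i in range(iterations):
            acc = (acc * 31 + i) % 1_000_003
        return acc
    f.iterations = iterations  # kept so the assigned workload size can be inspected
    return f


def generate_function_matrix(n: int, rng: random.Random):
    """S11: build an n x n matrix f(i, j) of functions with distinct workload sizes.

    The sizes 1000, 2000, ..., 1000 * n * n are shuffled across the cells, so each
    regeneration places the slow and fast functions in different positions.
    """
    sizes = rng.sample(range(1_000, 1_000 + 1_000 * n * n, 1_000), n * n)
    return [[make_workload(sizes[i * n + j]) for j in range(n)] for i in range(n)]
```

Passing an explicit `random.Random` instance keeps generation reproducible in tests; a production generator would more plausibly draw from `random.SystemRandom`.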
in this method, the server's generation of the selection code, sending it to the client, and requesting a response mainly comprises:
S21: the selection code is a randomly generated character string;
S22: whether the server requests a response from the client is configured on the server side;
in this method, the server's analysis of the stored selection codes and per-function running-time data to calculate the feature category of the current client's running environment mainly comprises:
S51: query the selector feature code currently recorded by the server to determine the function matrix and selector currently in use;
S52: select the stored data for a given time interval that corresponds to the current function matrix, and calculate the feature category of the current client's running environment using recognition algorithms such as clustering or machine learning;
S53: mark accesses from clients whose running environment feature category does not match the whitelist;
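S51 and S52 imply that only data produced under the current function matrix should be analyzed, because timings from an old matrix are not comparable. A minimal sketch, assuming each stored row is tagged with the selector feature code that was in force when it was produced (the `StoredRow` layout is hypothetical):

```python
from dataclasses import dataclass


@dataclass
class StoredRow:
    client_id: str
    feature_code: str   # selector feature code in force when the data was produced
    selection_code: str
    runtimes: list
    received_at: float  # seconds


def rows_for_analysis(rows, current_feature_code, now, interval):
    """S51-S52: keep rows produced under the currently recorded selector feature
    code and received within [now - interval, now]."""
    return [
        r for r in rows
        if r.feature_code == current_feature_code
        and now - interval <= r.received_at <= now
    ]
```

The filtered rows would then be passed to the clustering or classification step.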
the system specifically comprises a feature algorithm generation module, a data acquisition module, a data interface module, a data storage module, a data analysis module, and an access processing module. The feature algorithm module mainly generates the function matrix, the selector feature code, and the selection code; the data interface module of the data acquisition module mainly sends and receives data; the data storage module mainly stores the selection codes and function-matrix values returned by clients under different feature settings; the data analysis module mainly analyzes the stored data and calculates the running environment feature category; and the access processing module mainly applies access control according to the configured policy.
Example 1
As shown in FIG. 1, the server includes a data acquisition module, a data interface module 1, a feature algorithm generation module 2, a data storage module 3, a data analysis module 4, and an access processing module 5.
The functions of the functional modules are described below:
Data acquisition module: the data interface module 1 is used to send and receive data.
The feature algorithm generation module 2 is used to generate the function matrix, the selector feature code, and the selector.
The data storage module 3 is used to store the selection codes and function-matrix running-time data returned by clients under different feature settings.
When function-matrix running-time data is received from a client, the data analysis module 4 analyzes the data in the data storage module and uses it to calculate the running environment feature category, in order to verify whether the server is being accessed by a crawler.
The access processing module 5 is used to apply access control to the client according to the configured policy.
As shown in FIG. 2, this embodiment provides an anti-crawler verification method, implemented with the above anti-crawler system, comprising the following steps:
Step 101: the server generates the function matrix f(i, j), the selector feature code, and the selector, and updates the server program and the running environment feature category whitelist;
Step 102: the server generates a selection code, sends it to the client, and requests a response from the client;
Step 103: the client receives the selection code, and the selector selects functions in the function matrix;
Step 104: the client runs the selected functions and collects their running-time data;
Step 105: the client sends the selection code and the running-time data to the server;
Step 106: the server stores the selection code and running-time data from the client;
Step 107: the server analyzes the stored selection codes and function running-time data and calculates the feature category of the current client's running environment;
Step 108: according to the configured policy, the server proceeds to step 109 if the client is in the running environment whitelist, and to step 110 if it is not;
Step 109: the server allows the client to access, and the process ends;
Step 110: the server prohibits the client from accessing, and the process ends.
To help those skilled in the art better understand the technical solution of the invention, a specific example is given below:
the server is configured with a function matrix f(3, 3); that is, the matrix stores 9 functions: f(1,1), f(1,2), f(1,3), f(2,1), f(2,2), f(2,3), f(3,1), f(3,2), and f(3,3).
The server randomly generates a selection code S, sends it to the client, and waits for the client's response.
After receiving the selection code S, the client parses it, and the selector selects the functions f(1,1), f(2,1), f(2,1), and f(3,1) according to the result.
The client runs the functions f(1,1), f(2,1), f(2,1), and f(3,1), producing the running times T1, T2, T3, and T4, respectively. If the server has requested a client response, the client sends the running-time data T1, T2, T3, T4 and the selection code to the server.
After receiving T1, T2, T3, T4 and the selection code from the client, the server stores the data and triggers the data analysis module to analyze it.
Specifically, the server compares T1, T2, T3, and T4 with the historical data in the data storage module to calculate the feature class M1 of the current client's running environment.
After obtaining the feature class, the data analysis module sends the feature class M1 to the access control module.
After obtaining the feature class M1, the access control module checks the running environment whitelist: if M1 is in the whitelist, the server allows the client to continue accessing the server; otherwise, the server prohibits the client from continuing to access the server.
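The worked example can be traced with a small sketch. The text only says T1 to T4 are compared with historical data; here that comparison is assumed to be a nearest-centroid match, and the historical centroids for classes "M1" and "M2", their millisecond values, and the whitelist contents are all hypothetical.

```python
import math

# Hypothetical historical centroids: typical (T1, T2, T3, T4) in milliseconds
# for two environment classes built from earlier stored data.
HISTORY = {
    "M1": [2.0, 4.0, 4.0, 8.0],
    "M2": [0.5, 1.0, 1.0, 2.0],
}
WHITELIST = {"M1"}


def classify(runtimes):
    """Data analysis module: nearest historical centroid (Euclidean distance)."""
    return min(HISTORY, key=lambda label: math.dist(runtimes, HISTORY[label]))


def access_decision(runtimes):
    """Access control module: allow further access only for whitelisted classes."""
    feature_class = classify(runtimes)
    return feature_class, ("continue" if feature_class in WHITELIST else "prohibit")


print(access_decision([2.1, 3.9, 4.2, 7.8]))  # ('M1', 'continue')
```

Note that T2 and T3 come from the same function f(2,1) in the example, so a genuine environment should report similar values for them.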
In this verification process, the selection code is generated randomly by the server, and the functions to run are then selected by the selector, which greatly increases the difficulty for a crawler to crack the scheme.
The above is only an embodiment of the invention, and the scope of protection of the invention is not limited to it; any change or substitution that those skilled in the art could readily conceive within the technical scope disclosed by the invention falls within the scope of protection of the invention.

Claims (8)

1. An anti-crawler method, comprising the following steps:
step 1: generating a new policy package and a selection code for running the new policy package, updating the server's current policy package with the new policy package, and constructing a feature-category whitelist for the running-feature data of the new policy package;
wherein the policy package comprises a function matrix whose elements are program functions with different running times, a selector that maps character strings to different elements of the function matrix, and a selector feature code containing the features of the selector script;
the feature data comprises program-function running-time data;
step 2: sending the selection code to the client, and then sending a response request to the client;
step 3: the client selectively running program functions of the policy package according to the selection code to respond to the server, obtaining the feature data corresponding to the selection code, and returning the feature data to the server;
step 4: analyzing the selection code and its corresponding feature data, calculating the client's feature category, determining whether the client belongs to the feature-category whitelist, and applying access control to clients that do not belong to the feature-category whitelist;
wherein step 4 comprises the following steps:
step 4.1: receiving the feature data by the storage module within a preset time interval;
step 4.2: accessing the storage module through the server processing module, analyzing the feature data within the time interval, calculating the client's feature category, and determining whether the client belongs to the feature-category whitelist;
step 4.2.1: marking clients that belong to the feature-category whitelist as legal, and then returning to step 1;
step 4.2.2: applying access control to clients that do not belong to the feature-category whitelist.
2. The anti-crawler method according to claim 1, wherein in step 1 the new policy package and the selection code for running the new policy package are generated periodically.
3. An anti-crawler method based on running environment feature recognition, comprising the following steps:
step 1: at the server, periodically generating a function matrix whose elements are program functions with different running times and a selector that maps character strings to different elements of the function matrix; calculating, with a hash algorithm, a feature code of the selector's program text as the selector feature code; setting a whitelist of running environment feature categories; updating the server's current function matrix with the generated function matrix; generating a random character string at the server as the selection code; and sending the selection code and a response request to the client;
step 2: the client selecting and running functions in the function matrix according to the received selection code and the selector, and then sending the selection code and the running time of each function to the server;
step 3: at the server, analyzing the selection codes and running times, calculating the client's running environment feature category, identifying clients whose category is not in the whitelist, and applying an access control policy to them.
4. The anti-crawler method based on running environment feature recognition according to claim 3, wherein in step 1 a selector feature code containing the text features of the selector is generated periodically.
5. The anti-crawler method based on running environment feature recognition according to claim 4, wherein step 3 comprises the following steps:
step 3.1: querying the server's current selector feature code to determine the function matrix currently in use and the initially set selector values used for the analysis;
step 3.2: selecting all selection codes received from the client within a preset time interval, together with the running times of the corresponding elements of the current function matrix, and calculating the client's running environment feature category by means of a clustering algorithm, a machine learning algorithm, a pattern recognition algorithm, and a deep neural network algorithm;
step 3.3: identifying clients whose running environment feature category is not in the whitelist, marking them as illegal, and applying the access control policy.
6. The anti-crawler method based on running environment feature recognition according to any one of claims 3 to 5, wherein in step 3 the storage module of the server receives, within a preset time interval, the selection codes from the client and the running times of the corresponding elements.
7. An anti-crawler system based on operating environment feature recognition is characterized by comprising
The server comprises a characteristic algorithm module, a data interface module, a storage module, a data analysis module and an access processing module, wherein the characteristic algorithm module outputs a strategy package for updating and executing to the server;
the strategy package comprises a function matrix with different running time program functions as elements, a selector for mapping character strings to different elements in the function matrix and a selector feature code containing selector script features;
the client receives the selection code generated by the characteristic algorithm module through the data interface module;
the client selects and operates the functions in the function matrix according to the received selection codes and the selector, and then sends the selection codes and the operation time of each function to the server;
the server receives the characteristic data responded by the client executing the strategy packet through the data interface module corresponding to the selection code, the data interface module also outputs the characteristic data corresponding to the selection code to the storage module, the data analysis module calculates the characteristic data corresponding to the selection code in the storage module and feeds back information to the access processing module according to the calculation result, and the access processing module executes a preset strategy on the client according to the feedback information;
the characteristic data includes the running-time data of the program functions.
8. A server with an anti-crawler function based on running environment feature recognition, characterized by comprising:
a characteristic algorithm module, which periodically generates a function matrix whose elements are program functions with differing running times, a selector that maps character strings to different elements of the function matrix, and a selector feature code containing the features of the selector script;
a data interface module, which exchanges data with the client, outputting the selection codes generated by the characteristic algorithm module to the client and receiving from the client the running times of the program functions corresponding to those selection codes;
a storage module, which receives from the data interface module the running-time data for the corresponding selection codes received within a preset time interval;
a data analysis module, which performs calculations on the data held in the storage module for the preset time interval and determines the client's running environment characteristic category from the result;
and an access processing module, which executes a preset strategy on the client according to the client's running environment characteristic category determined by the data analysis module.
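The claims leave the concrete form of the strategy package open. The following Python sketch shows one possible way to model the characteristic algorithm module's periodic output; it is an illustration under stated assumptions, not the patentee's implementation. The names `StrategyPackage`, `new_selection_code` and `build_package` are hypothetical, string identifiers stand in for real program functions, and treating the selector feature code as a SHA-256 digest of the selector script is an assumption.

```python
import hashlib
import secrets
import string
import time
from dataclasses import dataclass, field
from typing import List


@dataclass
class StrategyPackage:
    """Hypothetical shape of the strategy package issued to clients."""
    function_ids: List[List[str]]  # function matrix, stored as function identifiers
    selector_script: str           # source of the selector shipped to the client
    selector_feature_code: str     # assumption: SHA-256 digest of the selector script
    issued_at: float = field(default_factory=time.time)


def new_selection_code(length: int = 8) -> str:
    """Hypothetical selection-code generator: a random alphanumeric string."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


def build_package(function_ids: List[str], cols: int, selector_script: str) -> StrategyPackage:
    """Shuffle the functions into a fresh rectangular matrix layout on every
    (periodic) rotation and fingerprint the selector script."""
    if cols <= 0 or len(function_ids) % cols:
        raise ValueError("function count must be a positive multiple of cols")
    ids = list(function_ids)
    secrets.SystemRandom().shuffle(ids)  # new element positions each period
    matrix = [ids[i:i + cols] for i in range(0, len(ids), cols)]
    digest = hashlib.sha256(selector_script.encode("utf-8")).hexdigest()
    return StrategyPackage(matrix, selector_script, digest)
```

One plausible reading of the design is that rotating the matrix layout and issuing fresh selection codes prevents a crawler from replaying previously recorded running times; the claims themselves do not state this rationale.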
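On the client side, claims 7 and 8 describe mapping a selection code through the selector to elements of the function matrix, running those functions, and reporting the selection code together with each function's running time. A minimal Python sketch, assuming a hypothetical selector that hashes each character of the selection code to a (row, column) cell and four placeholder functions; a real deployment would presumably run equivalent code in the browser's script engine rather than in Python.

```python
import hashlib
import time
from typing import Callable, Dict, List, Tuple


# Hypothetical matrix elements: small functions whose running time depends on
# the runtime executing them (interpreter, JIT, headless browser, emulator).
def _loop_sum(n: int) -> int:
    return sum(range(n))


def _string_concat(n: int) -> int:
    s = ""
    for i in range(n):
        s += str(i)
    return len(s)


def _list_sort(n: int) -> int:
    data = [(i * 7919) % n for i in range(n)]
    data.sort()
    return data[0]


def _dict_fill(n: int) -> int:
    d = {i: i * i for i in range(n)}
    return len(d)


FunctionMatrix = List[List[Callable[[int], int]]]

FUNCTION_MATRIX: FunctionMatrix = [
    [_loop_sum, _string_concat],
    [_list_sort, _dict_fill],
]


def select_elements(selection_code: str, matrix: FunctionMatrix) -> List[Tuple[int, int]]:
    """Hypothetical selector: map each character of the selection code to a
    (row, col) cell of the function matrix via SHA-256."""
    rows, cols = len(matrix), len(matrix[0])
    cells = []
    for ch in selection_code:
        h = int(hashlib.sha256(ch.encode("utf-8")).hexdigest(), 16)
        cells.append((h % rows, (h // rows) % cols))
    return cells


def run_selected(selection_code: str, matrix: FunctionMatrix, n: int = 20_000) -> Dict[str, object]:
    """Run each selected function, time it in seconds, and build the report
    (selection code plus per-element running times) sent back to the server."""
    timings = []
    for row, col in select_elements(selection_code, matrix):
        start = time.perf_counter()
        matrix[row][col](n)
        timings.append(time.perf_counter() - start)
    return {"selection_code": selection_code, "timings": timings}
```

For example, `run_selected("a1Z", FUNCTION_MATRIX)` runs three selected functions and returns the selection code with three measured running times.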
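On the server side, the storage module keeps running-time data received within a preset time interval, the data analysis module derives a running environment characteristic category from it, and clients outside the whitelisted categories are marked illegal (step 3.3). The claims do not specify the calculation, so this Python sketch assumes the median running time per matrix element as the feature and a relative-tolerance match against hypothetical whitelist profiles. It also assumes the server has already resolved each selection code to its matrix cells using the same selector it issued. `IntervalStore`, `feature_vector`, `classify` and `access_decision` are illustrative names only.

```python
import statistics
from collections import defaultdict, deque
from typing import Deque, Dict, List, Optional, Tuple

Cell = Tuple[int, int]  # (row, col) of an element in the function matrix


class IntervalStore:
    """Storage-module sketch: keeps (timestamp, cell, running_time) samples per
    client and discards samples older than the preset interval."""

    def __init__(self, interval_s: float) -> None:
        self.interval_s = interval_s
        self._samples: Dict[str, Deque[Tuple[float, Cell, float]]] = defaultdict(deque)

    def add(self, client_id: str, cells: List[Cell], timings: List[float], now: float) -> None:
        q = self._samples[client_id]
        for cell, t in zip(cells, timings):
            q.append((now, cell, t))
        self._evict(q, now)

    def window(self, client_id: str, now: float) -> List[Tuple[Cell, float]]:
        q = self._samples[client_id]
        self._evict(q, now)
        return [(cell, t) for _, cell, t in q]

    def _evict(self, q: Deque[Tuple[float, Cell, float]], now: float) -> None:
        while q and now - q[0][0] > self.interval_s:
            q.popleft()


def feature_vector(samples: List[Tuple[Cell, float]]) -> Dict[Cell, float]:
    """Data-analysis sketch: median running time per matrix element."""
    per_cell: Dict[Cell, List[float]] = defaultdict(list)
    for cell, t in samples:
        per_cell[cell].append(t)
    return {cell: statistics.median(ts) for cell, ts in per_cell.items()}


def classify(features: Dict[Cell, float], whitelist: Dict[str, Dict[Cell, float]],
             rel_tol: float = 0.5) -> Optional[str]:
    """Return the first whitelisted category whose expected running times match
    every observed element within rel_tol; None means no whitelisted match."""
    if not features:
        return None
    for category, profile in whitelist.items():
        if all(cell in profile and abs(t - profile[cell]) <= rel_tol * profile[cell]
               for cell, t in features.items()):
            return category
    return None


def access_decision(category: Optional[str]) -> str:
    """Access-processing sketch: clients outside the whitelist are blocked."""
    return "allow" if category is not None else "block"
```

The tolerance-based profile match is only one option; any classifier over the interval's running-time data would fit the claim language equally well.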

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710203203.4A CN107426148B (en) 2017-03-30 2017-03-30 Crawler-resisting method and system based on running environment feature recognition

Publications (2)

Publication Number Publication Date
CN107426148A CN107426148A (en) 2017-12-01
CN107426148B true CN107426148B (en) 2020-07-31

Family

ID=60423364

Country Status (1)

Country Link
CN (1) CN107426148B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108133140A (en) * 2017-12-08 2018-06-08 成都数聚城堡科技有限公司 Dynamic anti-crawler method
CN108521428B (en) * 2018-04-20 2020-09-01 武汉极意网络科技有限公司 Implementation method and system for preventing crawlers on a public network based on Jenkins
CN109815380A (en) * 2018-12-20 2019-05-28 山东中创软件工程股份有限公司 Information crawling method, apparatus, device and computer-readable storage medium
CN109818949A (en) * 2019-01-17 2019-05-28 济南浪潮高新科技投资发展有限公司 Neural-network-based anti-crawler method
CN110096266B (en) * 2019-05-13 2023-12-22 度小满科技(北京)有限公司 Feature processing method and device
CN112312152B (en) * 2020-10-27 2022-11-04 浙江集享电子商务有限公司 Data processing system in network live broadcast

Citations (4)

Publication number Priority date Publication date Assignee Title
CN104008339A (en) * 2014-06-05 2014-08-27 东南大学 Active technology based malicious code capture method
CN104539053A (en) * 2014-12-31 2015-04-22 国家电网公司 Power dispatching automation polling robot and method based on crawler technology
CN104618132A (en) * 2014-12-16 2015-05-13 北京神州绿盟信息安全科技股份有限公司 Generation method and generation device for application program recognition rule
CN105871850A (en) * 2016-04-05 2016-08-17 携程计算机技术(上海)有限公司 Crawler detection method and crawler detection system

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
US9552489B1 (en) * 2013-09-19 2017-01-24 Imdb.Com, Inc. Restricting network spidering
CN105743901B (en) * 2016-03-07 2019-04-09 携程计算机技术(上海)有限公司 Server, anti-crawler system and anti-crawler verification method
CN105577701B (en) * 2016-03-09 2018-11-09 携程计算机技术(上海)有限公司 Web crawler recognition method and system
CN105812366B (en) * 2016-03-14 2019-09-24 携程计算机技术(上海)有限公司 Server, anti-crawler system and anti-crawler verification method
CN106534062B (en) * 2016-09-23 2019-05-10 南京途牛科技有限公司 Anti-crawler method

Similar Documents

Publication Publication Date Title
CN107426148B (en) Crawler-resisting method and system based on running environment feature recognition
CN111565205B (en) Network attack identification method and device, computer equipment and storage medium
CN110177108B (en) Abnormal behavior detection method, device and verification system
Pan et al. Anomaly based web phishing page detection
CN106209488B (en) Method and device for detecting website attack
US8549645B2 (en) System and method for detection of denial of service attacks
CN110602029B (en) Method and system for identifying network attack
Nelms et al. {ExecScent}: Mining for New {C&C} Domains in Live Networks with Adaptive Control Protocol Templates
KR101001132B1 (en) Method and System for Determining Vulnerability of Web Application
Marchal et al. Proactive discovery of phishing related domain names
US9118704B2 (en) Homoglyph monitoring
CN109831459B (en) Method, device, storage medium and terminal equipment for secure access
CN109981664A (en) Website logging method, device and the realization device of page end
CN111104579A (en) Identification method and device for public network assets and storage medium
CN104202291A (en) Anti-phishing method based on multi-factor comprehensive assessment method
US20210029154A1 (en) Automated security testing system and method
CN112131507A (en) Website content processing method, device, server and computer-readable storage medium
CN109450880A (en) Detection method for phishing site, device and computer equipment based on decision tree
Zaimi et al. Survey paper: Taxonomy of website anti-phishing solutions
CN107231364A (en) A kind of website vulnerability detection method and device, computer installation and storage medium
Roy et al. A large-scale analysis of phishing websites hosted on free web hosting domains
Ghourabi et al. Characterization of attacks collected from the deployment of Web service honeypot
WO2016173327A1 (en) Method and device for detecting website attack
Lin et al. An automatic scheme to categorize user sessions in modern HTTP traffic
Kovacevic et al. Predicting vulnerabilities in web applications based on website security model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant