CN108628722A - A distributed Web component and service detection system - Google Patents


Info

Publication number
CN108628722A
Authority
CN
China
Prior art keywords
web
module
component
identification
task
Legal status
Pending
Application number
CN201810446405.6A
Other languages
Chinese (zh)
Inventor
李瑞轩
彭城易
李玉华
辜希武
龚晶
许武奎
刘冰
Current Assignee
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Application filed by Huazhong University of Science and Technology
Priority to CN201810446405.6A
Publication of CN108628722A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G06F11/3051 Monitoring arrangements for monitoring the configuration of the computing system or of the computing system component, e.g. monitoring the presence of processing resources, peripherals, I/O links, software programs
    • G06F11/3089 Monitoring arrangements determined by the means or processing involved in sensing the monitored data, e.g. interfaces, connectors, sensors, probes, agents
    • G06F11/3093 Configuration details thereof, e.g. installation, enabling, spatial arrangement of the probes
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/57 Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
    • G06F21/577 Assessing vulnerabilities and evaluating computer system security
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
    • G06F9/5083 Techniques for rebalancing the load in a distributed system
    • G06F9/5088 Techniques for rebalancing the load in a distributed system involving task migration
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/50 Indexing scheme relating to G06F9/50
    • G06F2209/5017 Task decomposition

Abstract

The present invention discloses a distributed Web component and service detection system, comprising: a distributed scheduling module, which submits and manages detection jobs, tracks task progress, manages the resources of every node in the system, splits each job into task fragments, dispatches them to the computing nodes in a load-balanced way, and provides fault tolerance for computing tasks; a crawler module, which crawls the HTML page content of websites for the task fragments in the task queue; a Web server identification module, which detects and identifies the type and version of the Web server used by a Web site; a Web component fingerprint identification module, which identifies the type, name, and version of the application components used by a website; and a host service detection module, which detects and identifies the operating system type and version of the site's host, the common port services it has open, and similar information. The invention can effectively guard against attacks that exploit vulnerabilities in Web components and host services, helping to maintain Internet security.

Description

A distributed Web component and service detection system
Technical field
The present invention relates to the field of Internet security technology, and more particularly to a distributed Web component and service detection system.
Background art
With the rapid development of the Internet, the scale, number, and diversity of Web sites are growing at an astonishing rate. Massive numbers of Web services, a wealth of open-source components, and large-scale servers have brought great convenience to people's lives. However, these Web components and servers also contain vulnerabilities and security risks; like time bombs, they pose a serious threat to cyberspace security, and to Web security in particular.
Technologies and means are therefore needed to scan and detect Web application components and servers comprehensively, so that timely protective countermeasures can be taken, damage to systems caused by vulnerabilities in service components and servers can be reduced, and the security of the Internet can be maintained. Many Internet companies have proposed solutions to this problem. Abroad, for example, Shodan can query IP addresses and detect network devices worldwide. In China, ZoomEye, created by the Internet security vendor Knownsec, is a search engine for hardware devices and Web services in cyberspace. These tools focus on discovering network devices and Web services; they solve part of the problem well and also highlight the importance and urgency of detecting network resources.
Pure Web application component identification mainly takes a single website domain name as input and then detects which specific application components the site uses. The most commonly used Web component identification tools are WhatWeb and Wappalyzer. They mainly capture the content of a single page, analyze its banner information and important keywords, and perform regular-expression matching against a component library to identify the site's Web server and components. However, because banner information and keywords are easily tampered with, recognition accuracy is low and many components are missed.
Host service scanning mainly probes IP addresses to obtain basic information about hosts. The most commonly used tool at present is ZMap, an Internet-wide IPv4 address scanner whose single-machine scanning speed is more than 1300 times that of Nmap: from a single machine it can scan the entire IPv4 address space of the Internet in 45 minutes, with 98% coverage.
However, existing systems and tools have the following problems:
(1) Web server identification relies mainly on the banner information in response messages. When the banner has been modified, disguised, or removed, it is difficult to identify the type and version of the Web server accurately.
(2) Web application component fingerprinting relies mainly on static keyword matching. Although fast, it cannot identify components accurately when their keywords have been modified or are missing, and it has difficulty detecting lower-level Web development frameworks, so many components are misidentified or missed.
(3) Traditional detection focuses on either Web components or host services, but not both. In fact, a Web site consists of two parts, the server and the Web application, and a vulnerability in either host services or Web components can seriously affect the site's security. Existing systems do not combine the two kinds of information well enough to reflect a Web site comprehensively and accurately.
(4) Most current systems and tools run on a single machine with limited computing efficiency and capacity. There is not yet a customizable, efficient distributed computing framework that can detect and analyze the components of a massive number of Web sites within a limited time, statistically mine the potentially valuable information they contain, and produce meaningful reports to support administrators' maintenance and decision-making.
Summary of the invention
In view of the above defects of the prior art, the object of the present invention is to solve the technical problems of existing Web component and service detection systems: they cannot accurately identify Web servers and application components, they ignore the interaction and mutual influence between Web components and host services, and they cannot detect and analyze the component information of massive numbers of Web sites in parallel.
To achieve the above object, the present invention provides a distributed Web component and service detection system, comprising: a distributed scheduling module, a crawler module, a Web server identification module, a Web component fingerprint identification module, and a host service detection module;
The distributed scheduling module is used to submit and manage detection jobs, track task progress, manage the resources of each node in the whole system, split jobs into task fragments, and dispatch them to the computing nodes in a balanced way, while tolerating abnormal conditions of nodes and tasks. The crawler module is used to crawl page content for the task fragments in the task queue. The Web server identification module is used to detect and identify the type and version of the Web server used by a Web site. The Web component fingerprint identification module is used to identify the type, name, and version of the application components used by a website. The host service detection module is used to detect and identify the operating system type and version of the site's host and the common port services it has open.
Optionally, the distributed scheduling module manages and schedules the life cycle of all detection jobs in the system. It runs on a cluster of ordinary computing nodes and uses the processing capacity of multiple working nodes to detect and identify the components and host information of massive numbers of Web sites. The module receives multiple detection jobs submitted by users, manages all job queues, and tracks the execution of each job and task fragment in real time. It monitors the resources of each computing node, dynamically divides a job into multiple task fragments according to node load, assigns them to specific computing nodes for execution, and schedules tasks promptly through heartbeat communication. Exceptions that occur during task execution are captured promptly and handled with retry and fault-tolerance mechanisms, ensuring that the system runs efficiently and stably.
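The load-aware split-and-assign step described above can be sketched as follows. This is a minimal illustration, not the patented scheduler: it assumes load is measured as the number of seed URLs a node currently holds, uses a fixed fragment size, and gives each fragment to the currently least-loaded node via a heap. The names `Node`, `split_job`, and `assign_fragments` are hypothetical.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Node:
    """A computing node; `load` counts the seed URLs it currently holds (hypothetical load metric)."""
    name: str
    load: int = 0


def split_job(seed_urls, fragment_size):
    """Split one detection job into task fragments of at most `fragment_size` seed URLs."""
    if fragment_size < 1:
        raise ValueError("fragment_size must be at least 1")
    return [seed_urls[i:i + fragment_size] for i in range(0, len(seed_urls), fragment_size)]


def assign_fragments(fragments, nodes):
    """Give each fragment to whichever node is least loaded at that moment."""
    # Heap entries are (load, index, node); the unique index breaks ties
    # so Node objects themselves are never compared.
    heap = [(node.load, i, node) for i, node in enumerate(nodes)]
    heapq.heapify(heap)
    plan = {node.name: [] for node in nodes}
    for fragment in fragments:
        load, i, node = heapq.heappop(heap)
        plan[node.name].append(fragment)
        node.load = load + len(fragment)
        heapq.heappush(heap, (node.load, i, node))
    return plan
```

A real scheduler would refresh node loads from heartbeat reports rather than only counting what it has assigned, and might size fragments from cluster capacity instead of using a constant.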
Optionally, the task fragment is a seed URL. The crawler module organizes the seed URLs to be crawled, dynamically crawls all site data within an organization's scope, automatically filters out off-site links, analyzes and extracts sibling links, and uses dynamic proxy technology to collect the page content of the websites in the URL list assigned by the distributed scheduling module. The crawling process is as follows: using breadth-first search, start from the seed URL, download the page content, and add the valid links within the same scope or organization to the queue to be crawled. During crawling, proxies and an optimized crawling strategy are applied dynamically, and the collection frequency, timing, and source IP are adjusted according to the server's responses, so as to avoid various anti-crawler mechanisms and ensure that the Web pages corresponding to the seed URL are crawled accurately and completely.
Here, a sibling link refers to a link to another website of the same organization.
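The distinction between in-site, sibling, and off-site links can be illustrated with a small classifier. This is a sketch under one stated assumption: "same organization" is approximated as "same registered domain or a subdomain of it", with the organization domain supplied by the caller. The function name `classify_link` and its category labels are hypothetical.

```python
from urllib.parse import urljoin, urlparse


def classify_link(page_url, href, org_domain):
    """Classify a link found on page_url relative to an organization's domain.

    Returns (kind, absolute_url) where kind is:
      "internal" - same host as the page,
      "sibling"  - another host under org_domain (same organization),
      "external" - any other host (filtered out by the crawler),
      "skip"     - non-HTTP(S) links such as mailto: or javascript:.
    """
    url = urljoin(page_url, href)
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return "skip", url
    host = (parsed.hostname or "").lower()
    page_host = (urlparse(page_url).hostname or "").lower()
    org = org_domain.lower().strip(".")
    if host == page_host:
        return "internal", url
    # Require a dot boundary so that "evilexample.org" does not match "example.org".
    if host == org or host.endswith("." + org):
        return "sibling", url
    return "external", url
```

The crawler would keep "internal" and "sibling" links and drop the rest.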
Optionally, the Web server identification module uses active probing and infers the specific Web server type and version from the characteristic behavior of the server's HTTP response messages. Its identification process is as follows: send a TCP probe to the home page address of the Web site; if the connection succeeds, send a variety of HTTP probe requests and collect the responses; extract characteristic information from the responses to generate fingerprints, and match them statically against the fingerprint database; count the matching fingerprints for each Web server, choose the server with the highest match count as the site's Web server, and report a corresponding confidence level.
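The match-counting step can be sketched as a simple vote. The fingerprint database layout (server label → probe name → fingerprint) and the confidence formula (share of probes that matched the winner) are assumptions for illustration; the text only says a confidence is reported. `identify_server` is a hypothetical name.

```python
from collections import Counter


def identify_server(probe_fingerprints, fingerprint_db):
    """Vote for the Web server whose stored fingerprints match the most probe responses.

    probe_fingerprints: {probe_name: fingerprint} built from the site's responses.
    fingerprint_db:     {server_label: {probe_name: fingerprint}} (hypothetical layout).
    Returns (server_label, confidence), where confidence is the share of probes
    that matched the winner; (None, 0.0) when nothing matches.
    """
    if not probe_fingerprints:
        return None, 0.0
    scores = Counter()
    for server, known in fingerprint_db.items():
        for probe, fingerprint in probe_fingerprints.items():
            if known.get(probe) == fingerprint:
                scores[server] += 1
    if not scores:
        return None, 0.0
    server, hits = scores.most_common(1)[0]
    return server, hits / len(probe_fingerprints)
```

Ties are resolved in favor of the server listed first in the database; a production system would likely break them more carefully.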
Optionally, the Web component fingerprint identification module takes the site's Web page content from the crawler module and also constructs a variety of malformed HTTP requests. From the server's responses it extracts keywords, static files, special file structures, cookies, and abnormal-page information; these five feature categories form the site's component fingerprint, which is matched statically against the component fingerprint database. The application components whose fingerprints are hit are taken as the application components used by the Web site.
Optionally, the host service detection module takes the site's IP address from the crawler module, sends test connections to the IP's list of common ports, and collects the response messages. It computes a hash value for each response and compares these hashes against the host service fingerprint database to identify the host's operating system type and version and the common port services it has open. The identified data are then combined with the site's component information, giving a comprehensive identification of both components and hosts.
Optionally, the system further comprises a component fingerprint database module for storing all Web component fingerprints.
Optionally, the system further comprises a host service fingerprint database module for storing all host and service fingerprint information.
In general, compared with the prior art, the above technical solutions conceived by the present invention have the following beneficial effects:
(1) The present invention is a distributed Web component and service detection system that integrates Web server identification, Web component identification, host service detection, statistical analysis, and other functions.
(2) It identifies Web servers accurately. Using active probing, it can still infer and identify server information accurately when the banner is forged, missing, or tampered with.
(3) It identifies Web components more accurately. By collecting multiple kinds of fingerprint features, it addresses the large numbers of misidentified and missed components that single-feature methods produce when information is missing or modified.
(4) It innovatively combines component identification with host service detection, analyzing Web sites more comprehensively, which helps to discover and fix site security vulnerabilities and strengthens Internet security protection.
(5) The Web component fingerprint database is built from features that are easy to extract and collect, so it can be expanded quickly, further increasing the range and accuracy of the system's detection.
(6) The distributed scheduling framework supports large-scale reconnaissance of Web components and hosts, can complete the detection of massive numbers of websites, and scales out horizontally with ease. Compared with traditional single-machine Web component detection, it greatly improves the system's speed, efficiency, and detection range.
Brief description of the drawings
Fig. 1 is a schematic structural diagram of a distributed Web component and service detection system provided by an embodiment of the present invention;
Fig. 2 is an overall framework diagram of a distributed scheduling module provided by an embodiment of the present invention;
Fig. 3 is a working mechanism diagram of a distributed scheduling module provided by an embodiment of the present invention;
Fig. 4 is a structural diagram of a crawler module provided by an embodiment of the present invention;
Fig. 5 is a schematic diagram of a Web server detection and identification module provided by an embodiment of the present invention;
Fig. 6 is an architecture diagram of a Web server detection and identification module provided by an embodiment of the present invention;
Fig. 7 is a hierarchical architecture diagram of Web components provided by an embodiment of the present invention;
Fig. 8 is a schematic diagram of a Web component fingerprint identification module provided by an embodiment of the present invention;
Fig. 9 is a Web component fingerprint model diagram provided by an embodiment of the present invention;
Fig. 10 is a schematic diagram of a host service detection and identification module provided by an embodiment of the present invention.
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the present invention clearer, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein merely illustrate the present invention and are not intended to limit it. In addition, the technical features involved in the embodiments described below may be combined with each other as long as they do not conflict.
The invention discloses a distributed Web component and service detection system for accurately obtaining details such as the type and version of Web servers, Web application components, and host services. This is important for the security protection of Web sites: it can effectively guard against attacks that exploit vulnerabilities in Web components and host services, and it helps maintain Internet security. To address problems such as easily tampered banner information, replaced or deleted component keywords, and missing host information, a new detection system is proposed that can identify the components of Web sites across a wide range comprehensively and accurately. A user submits a detection job (seed URLs) to the system; the distributed scheduling module manages all submitted jobs and distributes and schedules task fragments to the working nodes. On each working node, the crawler module crawls the page content of each website according to the assigned task; the Web server identification module analyzes the site's various response messages to infer its Web server information; the application component fingerprint identification module generates Web component fingerprints from the crawled site content and matches them statically against the component fingerprint database to identify component details; and the host service detection module sends a variety of probe messages to the host at the IP address resolved by the crawler module, analyzing the characteristics of the responses to identify the host's operating system, port services, and similar information.
Fig. 1 is a schematic structural diagram of a distributed Web component and service detection system provided by an embodiment of the present invention. As shown in Fig. 1, the system comprises a distributed scheduling module 100, a crawler module 200, a Web server identification module 300, a Web component fingerprint identification module 400, an application component fingerprint database 500, a host service detection module 600, and a host service fingerprint database 700.
The distributed scheduling module 100 is used to submit and manage detection jobs, track task progress, manage the resources of each node in the whole system, split jobs into task fragments, and dispatch them to the computing nodes in a balanced way, while tolerating abnormal conditions of nodes and tasks;
The crawler module 200 is used to crawl page content for the task fragments (seed URLs) in the task queue;
The Web server identification module 300 is used to accurately detect and identify the type and version of the Web server used by a Web site;
The Web component fingerprint identification module 400 is used to identify the type, name, and version of the application components used by a website accurately and comprehensively;
The component fingerprint database module 500 is used to store all Web component fingerprints;
The host service detection module 600 is used to detect and identify the operating system type and version of the site's host, the common port services it has open, and the applications and their versions;
The host service fingerprint database module 700 is used to store all host and service fingerprint information.
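The fault tolerance attributed to the distributed scheduling module 100 (heartbeat monitoring plus retries) can be sketched in memory as follows. This is a minimal illustration under assumptions: a node silent for longer than a heartbeat timeout is declared dead, its running fragments are requeued, and a fragment is abandoned after a retry limit. The class name, the timeout, and the retry limit are hypothetical; the source does not give values.

```python
import time
from dataclasses import dataclass, field


@dataclass
class FaultTolerantScheduler:
    """In-memory sketch; the timeout and retry limit are hypothetical defaults."""
    heartbeat_timeout: float = 30.0
    max_retries: int = 3
    pending: list = field(default_factory=list)    # fragment ids waiting to run
    running: dict = field(default_factory=dict)    # fragment id -> node name
    attempts: dict = field(default_factory=dict)   # fragment id -> times dispatched
    last_beat: dict = field(default_factory=dict)  # node name -> last heartbeat time
    failed: list = field(default_factory=list)     # fragments that exhausted retries

    def heartbeat(self, node, now=None):
        self.last_beat[node] = time.monotonic() if now is None else now

    def dispatch(self, fragment, node):
        if fragment in self.pending:
            self.pending.remove(fragment)
        self.running[fragment] = node
        self.attempts[fragment] = self.attempts.get(fragment, 0) + 1

    def report_failure(self, fragment):
        """Called when a Slave reports an exception, or when its node is declared dead."""
        self.running.pop(fragment, None)
        if self.attempts.get(fragment, 0) < self.max_retries:
            self.pending.append(fragment)  # retry, possibly on another node
        else:
            self.failed.append(fragment)

    def reap_dead_nodes(self, now=None):
        """Treat nodes silent for longer than heartbeat_timeout as failed."""
        now = time.monotonic() if now is None else now
        dead = {n for n, t in self.last_beat.items() if now - t > self.heartbeat_timeout}
        for fragment, node in list(self.running.items()):
            if node in dead:
                self.report_failure(fragment)
        for node in dead:
            del self.last_beat[node]
        return dead
```

The optional `now` argument exists only so the behavior can be exercised deterministically; in normal use the monotonic clock is read.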
Fig. 2 is an overall framework diagram of a distributed scheduling module provided by an embodiment of the present invention. As shown in Fig. 2, it is a typical master-slave distributed architecture. The Master is responsible for task scheduling and cluster resource management and maintains task states. It divides each job into four stages: CRAWL (crawler), SERVER_ANY (Web server detection), COMPONENTS_ANY (component detection), and HOST_SCAN (host scanning). It assigns tasks to Slave nodes for execution according to the working nodes' load and the task execution status, and communicates with the Slaves to monitor task execution and scheduling in real time. Each Slave node manages its own node's resources and completes the detection tasks distributed by the Master: according to the specific task fragment information, it starts the corresponding detection task (crawler, Web server, Web component, or host detection), reports the current task's execution state to the Master in real time, and finally stores the detection and identification results in a MongoDB database. In addition, Redis stores all job queue data, which is maintained and accessed jointly by the Master and the Slaves.
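The four job stages can be modeled as a small state machine. This sketch assumes, beyond what the text states, that the stages run strictly in the listed order and that a job advances only when every fragment of the current stage has finished. `Stage`, `next_stage`, and `advance_job` are hypothetical names, and the dict-based job record stands in for whatever the Master actually keeps in Redis.

```python
from enum import Enum


class Stage(Enum):
    CRAWL = "crawler"
    SERVER_ANY = "Web server detection"
    COMPONENTS_ANY = "component detection"
    HOST_SCAN = "host scanning"


# Assumed execution order: the stages as listed in the text, run one after another.
STAGE_ORDER = [Stage.CRAWL, Stage.SERVER_ANY, Stage.COMPONENTS_ANY, Stage.HOST_SCAN]


def next_stage(stage):
    """Return the stage after `stage`, or None once HOST_SCAN is finished."""
    i = STAGE_ORDER.index(stage)
    return STAGE_ORDER[i + 1] if i + 1 < len(STAGE_ORDER) else None


def advance_job(job):
    """job = {"stage": Stage or None, "fragments": {fragment_id: done_flag}}.

    The job moves to the next stage only when every fragment of the current
    stage has finished; the fragments are then reset for the new stage.
    A stage of None means the whole job is complete.
    """
    if job["stage"] is not None and all(job["fragments"].values()):
        job["stage"] = next_stage(job["stage"])
        if job["stage"] is not None:
            job["fragments"] = {f: False for f in job["fragments"]}
    return job
```

If server and component detection were allowed to run in parallel after crawling, `STAGE_ORDER` would become a dependency graph instead of a list.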
Fig. 3 is a working mechanism diagram of a distributed scheduling module provided by an embodiment of the present invention. As shown in Fig. 3, the workflow can be simplified to the following steps: the user submits a detection job → the Master manages the task queue → tasks are scheduled to Slaves → the Slaves execute the detection tasks → execution results are returned → the task completes. The functions of each part are as follows:
(1) Client: submits jobs and queries detection results.
(2) Master: provides the functional interface for the Client, manages the task queue, communicates with the Slaves, schedules and distributes tasks to the Slaves for execution, and manages the Slave nodes.
(3) Slave: maintains communication with the Master, receives the task fragments distributed by the Master, executes detection tasks, and reports task execution status to the Master.
(4) Data store: saves detection results, system work logs, and so on.
The crawler module 200 obtains the specific HTML page content according to the tasks (seed URLs) distributed by the scheduling module; it is the basis on which subsequent detection tasks start. It is essentially a distributed crawler that can crawl massive numbers of Web sites. It uses a breadth-first traversal algorithm with a limit on crawl depth, and during crawling it dynamically applies various customized strategies and dynamic proxy technology to avoid anti-crawler mechanisms. As shown in Fig. 4, a computing node receives a task (seed URL) distributed by the Master, parses the organization's domain name, crawls the HTML page, and extracts new site links. It judges whether each link is relevant (an in-site link or a link to an associated site), filters out irrelevant off-site URLs, and adds the rest to the queue to be crawled after deduplication. When each page is crawled, its access depth is checked, and pages beyond the limit are not processed further. This cycle repeats until all site pages within the organization's scope have been collected.
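The breadth-first crawl with deduplication, scope filtering, and a depth limit can be sketched with the standard library alone. To keep the sketch self-contained and testable, page download is delegated to a caller-supplied `fetch(url)` function (a hypothetical interface); the proxies, rate adaptation, and anti-crawler strategies mentioned above would live behind it and are not shown. Scope is assumed to mean the organization domain and its subdomains.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def in_scope(url, org_domain):
    """True for HTTP(S) URLs on org_domain or one of its subdomains (same organization)."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    return parsed.scheme in ("http", "https") and (
        host == org_domain or host.endswith("." + org_domain)
    )


def crawl(seed_url, org_domain, fetch, max_depth=2):
    """Breadth-first crawl from seed_url, staying inside org_domain.

    fetch(url) -> HTML string or None is injected by the caller (hypothetical
    interface). Returns {url: html} for every page fetched.
    """
    seen = {seed_url}                # deduplication set
    queue = deque([(seed_url, 0)])   # (url, depth) pairs, FIFO for BFS
    pages = {}
    while queue:
        url, depth = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        if depth >= max_depth:
            continue                 # keep the page but do not follow its links
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            link, _fragment = urldefrag(urljoin(url, href))
            if link not in seen and in_scope(link, org_domain):
                seen.add(link)
                queue.append((link, depth + 1))
    return pages
```

Passing a dictionary's `get` method as `fetch` is enough to exercise the traversal offline.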
The Web server identification module 300 is used to identify a website's server information accurately. It mainly solves the difficult problem that, when banner information is missing, hidden, or tampered with, existing tools cannot reliably determine the site's Web server. Its workflow is shown in Fig. 5: it sends 8 specific HTTP requests (GET/Exist, GET/Too Long, GET/Not Exist, GET/Attack, HEAD, OPTION, DELETE, TEST) and obtains the responses, converts the header information of each response into a fingerprint in a fixed format, matches the fingerprints one by one against the data in the fingerprint database, and takes the Web server fingerprint with the most hits as the final identification result.
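The conversion of response headers into a fixed-format fingerprint might look like the sketch below. The eight probe labels come from the text, but the concrete methods and paths behind them, and the fingerprint format itself (status code, header names in arrival order, normalized `Allow` list), are illustrative assumptions rather than the patent's definitions. Header values such as `Server` are deliberately left out, since the point of the module is to work when the banner is forged.

```python
# Hypothetical request details for the eight probe labels named in the text;
# the labels come from the document, the methods/paths are illustrative guesses.
PROBES = {
    "GET/Exist": ("GET", "/"),
    "GET/Too Long": ("GET", "/" + "A" * 8192),
    "GET/Not Exist": ("GET", "/no-such-page-0000.html"),
    "GET/Attack": ("GET", "/../../etc/passwd"),
    "HEAD": ("HEAD", "/"),
    "OPTION": ("OPTIONS", "/"),
    "DELETE": ("DELETE", "/"),
    "TEST": ("TEST", "/"),  # a non-standard method; servers differ in how they reject it
}


def response_fingerprint(status, headers):
    """Turn one response into a fixed-format string: '<status>|<header names in order>|<Allow>'.

    `headers` is a list of (name, value) pairs in the order the server sent them.
    Header values such as Server and Date are ignored, since the Server banner
    can be forged and Date changes on every request.
    """
    names = ",".join(name.lower() for name, _ in headers)
    allow = ""
    for name, value in headers:
        if name.lower() == "allow":
            allow = ",".join(sorted(m.strip().upper() for m in value.split(",") if m.strip()))
    return f"{status}|{names}|{allow}"
```

Header order and status codes for odd requests tend to differ between server implementations, which is why they make a more robust signal than the banner string.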
The overall framework of the Web server identification module is shown in Fig. 6, which illustrates the module's internal principle. Each probe corresponds to one HTTP test request, and different Web servers respond to it differently; by collecting, organizing, and classifying these differences, the Web server can be identified efficiently and accurately.
The Web component fingerprint identification module 400 is used to detect and identify the component information used by a Web site, which is particularly important for the site's security. As shown in Fig. 7, Web components in the present invention include front-end frameworks, application frameworks, back-end frameworks, languages, Web servers, and so on.
The workflow of the module is shown in Fig. 8 and consists mainly of 3 steps:
(1) HTTP requests: construct normal GET requests and malformed HTTP requests (e.g. DELETE, or a URL that does not exist) and obtain the response content.
(2) Web component fingerprint collection: construct the Web fingerprint from the content of the HTTP responses, mainly keywords, special structures, static resource files, cookies, and abnormal pages; these 5 kinds of information make up the site fingerprint.
(3) Web component identification: traverse the component fingerprint database and match each entry statically against the site fingerprint to identify the Web components in use.
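Step (3) can be sketched as a traversal that records which of the five feature types hit for each component. The field names are hypothetical, and treating any single hit as a match follows the text's "components whose fingerprints are hit" loosely; a real system might weight or threshold the evidence. The WordPress and Django entries in the usage check below are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class ComponentFingerprint:
    """One entry of the component fingerprint database (field names are hypothetical)."""
    name: str
    keywords: list = field(default_factory=list)           # HTML keywords
    special_paths: list = field(default_factory=list)      # paths that exist only with this component
    static_md5: list = field(default_factory=list)         # MD5 of known static resource files
    cookies: list = field(default_factory=list)            # cookie names the component sets
    error_page_hashes: list = field(default_factory=list)  # hashes of its abnormal pages


@dataclass
class SiteFingerprint:
    """The five kinds of evidence collected for one site."""
    html: str = ""
    existing_paths: set = field(default_factory=set)
    static_md5: set = field(default_factory=set)
    cookies: set = field(default_factory=set)
    error_page_hash: str = ""


def match_components(site, database):
    """Traverse the database; return {component name: [feature kinds that hit]}."""
    results = {}
    for entry in database:
        kinds = []
        if any(k in site.html for k in entry.keywords):
            kinds.append("keyword")
        if site.existing_paths.intersection(entry.special_paths):
            kinds.append("special_path")
        if site.static_md5.intersection(entry.static_md5):
            kinds.append("static_md5")
        if site.cookies.intersection(entry.cookies):
            kinds.append("cookie")
        if site.error_page_hash and site.error_page_hash in entry.error_page_hashes:
            kinds.append("error_page")
        if kinds:
            results[entry.name] = kinds
    return results
```

Recording which feature types hit makes it possible to report how well supported each identification is, which helps when a single keyword has been removed or forged.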
The application component fingerprint database 500 stores component fingerprint information; its data structure is shown in Fig. 9. Each entry first holds overall description information: the component name, category, description, and icon. This is followed by the specific fingerprint data, which mainly comprise 5 types: HTML keywords, special file paths, MD5 values of static resource files, cookies, and abnormal-page hash values. The fingerprint database consists of such entries.
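A database record with the layout described above, together with helpers for the two hash-based feature types, could be sketched as follows. The key names are hypothetical. The normalization applied before hashing an abnormal page (replacing the echoed request path with a placeholder and collapsing whitespace) is an assumption: error pages often echo the requested URL, which would otherwise change the hash on every probe.

```python
import hashlib
import json

FEATURE_KEYS = ("html_keywords", "special_paths", "static_md5", "cookies", "error_page_hashes")


def static_resource_md5(content: bytes) -> str:
    """MD5 of a static resource file (e.g. a bundled .js or .png)."""
    return hashlib.md5(content).hexdigest()


def error_page_hash(html: str, requested_path: str) -> str:
    """Hash an abnormal page after removing request-specific noise.

    Assumed normalization: the echoed request path is replaced by a
    placeholder, and whitespace runs are collapsed.
    """
    if requested_path:
        html = html.replace(requested_path, "{path}")
    normalized = " ".join(html.split())
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def make_entry(name, category, description, icon, **features):
    """Build one database record: description fields first, then the five fingerprint lists."""
    unknown = set(features) - set(FEATURE_KEYS)
    if unknown:
        raise ValueError(f"unknown feature types: {sorted(unknown)}")
    return {
        "name": name,
        "category": category,
        "description": description,
        "icon": icon,
        "fingerprints": {key: list(features.get(key, [])) for key in FEATURE_KEYS},
    }
```

Records built this way are plain JSON-serializable dictionaries, so they could be stored in a document database such as the MongoDB instance the system already uses for results.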
The host service detection module 600 is used to detect and identify information such as the operating system of the Web site's host and the port services it has open; its structure is shown in Fig. 10. The basic detection process is as follows: resolve the IP addresses of the sites from the crawler module; send TCP, ICMP, UDP, and other probe messages to common ports (21, 80, 443, 8080, etc.) of these IP addresses; obtain the response content of these requests and compute a series of feature values (hashes) with a hash algorithm; then match them against the fingerprint database to identify the required information. Finally, the host information obtained is integrated with the site's component data through the mapping between IP addresses and site domain names, forming complete and comprehensive information about the Web site.
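The hash-and-match step and the final IP-to-domain merge can be sketched as below. The text does not name the hash algorithm, so SHA-1 here is an assumption, as are the function names and the `(protocol, port)` keys. Note that exact hashing of raw responses only works for responses that do not vary between requests; banners that embed timestamps or session data would need normalization first.

```python
import hashlib


def response_hash(data: bytes) -> str:
    """Feature value for one probe response; SHA-1 is an assumed choice of hash."""
    return hashlib.sha1(data).hexdigest()


def identify_host(responses, fingerprint_db):
    """responses: {(protocol, port): raw response bytes}; fingerprint_db: {hash: label}.

    Returns {(protocol, port): label} for every response whose hash is known.
    """
    found = {}
    for key, data in responses.items():
        label = fingerprint_db.get(response_hash(data))
        if label is not None:
            found[key] = label
    return found


def merge_with_sites(host_info_by_ip, ip_by_domain, components_by_domain):
    """Join host results to per-site component results through the IP <-> domain mapping."""
    return {
        domain: {
            "ip": ip,
            "components": components_by_domain.get(domain, []),
            "host": host_info_by_ip.get(ip, {}),
        }
        for domain, ip in ip_by_domain.items()
    }
```

Because several domains can resolve to one IP, the merge is keyed by domain, so each site report carries a copy of its host's service results.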
The host service fingerprint database 700 stores the fingerprint information of all operating systems and port services; it is essentially a huge set of hash values.
The distributed Web component and service detection system provided by the present invention is a self-developed, high-performance distributed system that can be deployed entirely on a real physical cluster. Its master-slave design, with active and standby central nodes, guards well against single points of failure. The system can support the detection of Web site information on a large scale (a province, a country, or even the whole world), greatly strengthening the maintenance and protection of cyberspace security.
Those skilled in the art will readily understand that the foregoing describes merely preferred embodiments of the present invention and is not intended to limit it; any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (8)

1. A distributed Web component service detection system, characterized in that it comprises: a distributed scheduling module, a crawler module, a Web server identification module, a Web component fingerprint identification module, and a host service detection module;
The distributed scheduling module is used to submit and manage detection jobs, track task progress, and manage the resources of every node in the system; it splits jobs into task fragments, distributes them to the compute nodes with balanced scheduling, and provides fault tolerance for abnormal conditions of nodes and tasks;
The crawler module is used to crawl Website page content for the task fragments in the task queue;
The Web server identification module is used to detect and identify the type and version of the Web server used by a Web site;
The Web component fingerprint identification module is used to identify the type, name, and version of the application components used by a website;
The host service detection module is used to detect and identify the operating system type and version of the site's host and the common port services it has open.
2. The distributed Web component service detection system according to claim 1, characterized in that the distributed scheduling module manages and schedules the life cycle of all detection jobs in the system; it runs in a cluster of ordinary compute nodes and uses the processing capacity of multiple worker nodes to detect and identify massive numbers of Web components and host information;
The distributed scheduling module receives multiple detection jobs submitted by users, manages all job queues, and obtains the execution status of each job and task fragment in real time. It monitors the resources of each compute node, dynamically divides a job into multiple task fragments according to node load, assigns them to specific compute nodes for execution, and communicates with the nodes by heartbeat to schedule tasks promptly. Exceptions that occur during task execution are captured in time and handled with retry and fault-tolerance mechanisms, ensuring that the system runs efficiently and stably.
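The load-aware splitting and heartbeat-based scheduling in claim 2 can be sketched as follows. This is an illustration under stated assumptions, not the patented scheduler: a job is a list of seed URLs cut into fixed-size fragments, node load is modeled by a hypothetical `capacity`/`running` slot count, and a node whose last heartbeat is older than `heartbeat_timeout` seconds is treated as dead. Fragments that cannot be placed are returned as `pending` for a later retry round.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Node:
    """A compute node as seen by the scheduler (hypothetical model)."""
    name: str
    capacity: int  # maximum concurrent task fragments
    running: int = 0
    last_heartbeat: float = field(default_factory=time.monotonic)

    def free(self) -> int:
        return max(self.capacity - self.running, 0)


def split_job(seed_urls: list, fragment_size: int) -> list:
    """Cut one detection job into fixed-size task fragments of seed URLs."""
    return [seed_urls[i:i + fragment_size]
            for i in range(0, len(seed_urls), fragment_size)]


def assign(fragments: list, nodes: list, now: float, heartbeat_timeout: float = 30.0):
    """Give each fragment to the live node with the most free slots."""
    live = [n for n in nodes if now - n.last_heartbeat <= heartbeat_timeout]
    plan, pending = {}, []
    for frag in fragments:
        candidates = [n for n in live if n.free() > 0]
        if not candidates:
            pending.append(frag)  # keep for a later retry round
            continue
        node = max(candidates, key=Node.free)
        node.running += 1
        plan.setdefault(node.name, []).append(frag)
    return plan, pending
```

A typical call would be `assign(split_job(urls, 50), nodes, time.monotonic())`. Greedily choosing the node with the most free slots is one simple balancing rule; the text does not specify which rule is used.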
3. The distributed Web component service detection system according to claim 1, characterized in that the task fragments are seed URLs;
The crawler module takes the seed URLs of the organizations to be crawled and dynamically crawls all site data within each organization's scope; it automatically filters out off-site links, analyzes and extracts sibling links, and uses dynamic proxy technology to collect the page content of the websites corresponding to the URL list distributed by the scheduling module;
The crawling process of the crawler module is: starting from a seed URL, page content is downloaded using breadth-first search, and valid links within the same scope or organization are analyzed and added to the queue of URLs to be crawled. During crawling, proxies and an optimized crawling strategy are applied dynamically: according to the server's responses, the crawl frequency, timing, and source IP are adjusted so as to effectively evade various anti-crawler mechanisms and ensure that the Web pages corresponding to the seed URL are crawled accurately and completely.
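The breadth-first, same-scope crawl in claim 3 can be sketched as below. As a simplifying assumption, "same scope" is taken to mean the seed's exact host, and the download itself (including proxy rotation and rate adjustment) is hidden behind a caller-supplied `fetch_links` function; both names are hypothetical.

```python
from collections import deque
from urllib.parse import urljoin, urlparse


def bfs_crawl(seed_url, fetch_links, max_pages=100):
    """Breadth-first crawl from seed_url, staying on the seed's host.

    fetch_links(url) -> list of href strings found on that page (assumed to
    wrap the real HTTP download, proxy rotation, and rate limiting).
    """
    scope = urlparse(seed_url).netloc
    queue = deque([seed_url])
    seen = {seed_url}
    order = []
    while queue and len(order) < max_pages:
        url = queue.popleft()
        order.append(url)
        for href in fetch_links(url):
            link = urljoin(url, href).split("#", 1)[0]  # resolve and drop fragment
            parsed = urlparse(link)
            if parsed.scheme not in ("http", "https") or parsed.netloc != scope:
                continue  # off-site or non-HTTP link: filtered out
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order
```

For organizations that span several subdomains, the `parsed.netloc != scope` test would be replaced by a check against the organization's domain list.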
4. The distributed Web component service detection system according to claim 1, characterized in that the Web server identification module uses active probing and determines the specific Web server type and version from the characteristic behavior of the server's HTTP response messages;
The identification process of the Web server identification module is: send a TCP probe to the home page address of the Web site; if a connection can be established, send a variety of HTTP probe messages and obtain the response messages; analyze and extract feature information from the responses to generate a fingerprint; match it against the static data of the fingerprint base; count the number of matching fingerprints for each Web server; select the Web server with the largest match count as the site's Web server; and report a corresponding confidence level.
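The counting and selection step of claim 4 amounts to a vote. The sketch assumes each extracted feature is a `(probe, observation)` pair and that confidence is the share of observed features explained by the winning server; both the feature representation and that confidence formula are hypothetical, since the text does not define them.

```python
from collections import Counter


def identify_server(observed: set, fingerprint_base: dict):
    """Pick the Web server whose fingerprint matches the most observed features.

    observed: set of (probe, observation) pairs extracted from responses.
    fingerprint_base: dict server name -> set of (probe, observation) pairs.
    Returns (server, confidence) or (None, 0.0) when nothing matches.
    """
    counts = Counter({server: len(observed & features)
                      for server, features in fingerprint_base.items()})
    if not counts or max(counts.values()) == 0:
        return None, 0.0
    server, hits = counts.most_common(1)[0]
    confidence = hits / len(observed)  # hypothetical confidence measure
    return server, confidence
```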
5. The distributed Web component service detection system according to claim 1, characterized in that the Web component fingerprint identification module retrieves the Web page content of the website from the crawler module and at the same time constructs a variety of malformed HTTP requests; from the server's response messages it extracts keywords, static files, special file structures, Cookies, and abnormal-page information; it builds these five categories of features into the component fingerprint of the website, performs static matching against the component fingerprint base, and takes the application components whose fingerprints match the website as the application components used by the Web site.
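The static matching step of claim 5 might look like the following sketch. It assumes that a component counts as "hit" when at least one of its five feature categories matches, that HTML keywords are matched as substrings of the crawled page, and that the other four categories are matched by exact set membership. `ComponentFingerprint`, `match_components`, and the example component names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ComponentFingerprint:
    """One entry of a (hypothetical) component fingerprint base."""
    name: str
    keywords: frozenset = frozenset()
    paths: frozenset = frozenset()
    static_md5: frozenset = frozenset()
    cookies: frozenset = frozenset()
    error_hash: frozenset = frozenset()


SET_CATEGORIES = ("paths", "static_md5", "cookies", "error_hash")


def match_components(html: str, site: dict, base: list) -> dict:
    """Return {component name: [matched categories]} for every component hit.

    html: page content from the crawler; site: category -> set of values
    extracted from responses (including responses to malformed requests).
    """
    hits = {}
    for comp in base:
        matched = ["keywords"] if any(kw in html for kw in comp.keywords) else []
        for cat in SET_CATEGORIES:
            if getattr(comp, cat) & site.get(cat, set()):
                matched.append(cat)
        if matched:
            hits[comp.name] = matched
    return hits
```

Returning the matched categories, rather than a bare yes/no, leaves room for a later step to weigh strong evidence (a static-file MD5) above weak evidence (a generic keyword).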
6. The distributed Web component service detection system according to claim 1, characterized in that the host service detection module retrieves the website's IP address from the crawler module, sends connection probes to the list of well-known ports on that IP, and obtains the response messages; it then computes a hash value for each response message and compares these hash values against the host service fingerprint base to identify the host's operating system type and version and the well-known port services it has open; the identified data are then combined with the website's component information to achieve comprehensive identification of components and hosts.
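The final combination step joins host results (keyed by IP) to component results (keyed by domain) through the domain-to-IP mapping obtained when the sites were resolved. A minimal sketch, with hypothetical names and record layout:

```python
def merge_site_and_host(domain_to_ip: dict, components: dict, hosts: dict) -> dict:
    """Build one combined record per website.

    domain_to_ip: domain -> resolved IP address
    components:   domain -> list of identified application components
    hosts:        IP -> dict of host facts (OS, open port services, ...)
    """
    merged = {}
    for domain, ip in domain_to_ip.items():
        merged[domain] = {
            "ip": ip,
            "components": components.get(domain, []),
            "host": hosts.get(ip, {}),
        }
    return merged
```

Because several virtual-hosted domains can resolve to the same IP, they all receive the same host record while keeping their own component lists.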
7. The distributed Web component service detection system according to any one of claims 1 to 6, characterized in that it further comprises a component fingerprint base module;
The component fingerprint base module is used to store all Web component fingerprints.
8. The distributed Web component service detection system according to any one of claims 1 to 6, characterized in that it further comprises a host service fingerprint base module;
The host service fingerprint base module is used to store all host and service fingerprint information.
CN201810446405.6A 2018-05-11 2018-05-11 A kind of distributed Web Component services detection system Pending CN108628722A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810446405.6A CN108628722A (en) 2018-05-11 2018-05-11 A kind of distributed Web Component services detection system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810446405.6A CN108628722A (en) 2018-05-11 2018-05-11 A kind of distributed Web Component services detection system

Publications (1)

Publication Number Publication Date
CN108628722A true CN108628722A (en) 2018-10-09

Family

ID=63692725

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810446405.6A Pending CN108628722A (en) 2018-05-11 2018-05-11 A kind of distributed Web Component services detection system

Country Status (1)

Country Link
CN (1) CN108628722A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150082438A1 (en) * 2013-11-23 2015-03-19 Universidade Da Coruña System and server for detecting web page changes
CN106888194A (en) * 2015-12-16 2017-06-23 国家电网公司 Intelligent grid IT assets security monitoring systems based on distributed scheduling
US20170214771A1 (en) * 2012-02-01 2017-07-27 Aol Advertising Inc. Systems and methods for identifying a returning web client
CN107679085A (en) * 2017-09-01 2018-02-09 广州大学 Data grabber algorithm based on search and spiders

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
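Wang Yongjie et al. (王永杰等): "Web Service Detection Technology Based on Fingerprint Analysis" (基于指纹分析的Web服务探测技术), Computer Engineering (《计算机工程》) *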

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109495466A (en) * 2018-11-06 2019-03-19 郑州云海信息技术有限公司 A kind of recognition methods and system of unknown miniport service
EP3654219A1 (en) * 2018-11-14 2020-05-20 Baden-Württemberg Stiftung gGmbH Determining version information of a network service
WO2020099485A1 (en) * 2018-11-14 2020-05-22 Baden-Württemberg Stiftung Ggmbh Determining version information of a network service
CN109617728A (en) * 2018-12-14 2019-04-12 中国电子科技网络信息安全有限公司 A kind of distributed IP grade network topology probe method based on multi-protocols
CN109766176A (en) * 2018-12-29 2019-05-17 北京威努特技术有限公司 A kind of scan progress calculation method and device based on large scale network space exploration
CN110198309A (en) * 2019-05-14 2019-09-03 北京墨云科技有限公司 A kind of Web server recognition methods, device, terminal and storage medium
CN110233774A (en) * 2019-05-28 2019-09-13 华中科技大学 A kind of Distributed probing method and system of Socks proxy server
CN110233774B (en) * 2019-05-28 2020-12-29 华中科技大学 Detection method, distributed detection method and system for Socks proxy server
CN111212153A (en) * 2019-12-26 2020-05-29 成都烽创科技有限公司 IP address checking method, device, terminal equipment and storage medium
CN111475464A (en) * 2020-03-19 2020-07-31 重庆邮电大学 Method for automatically discovering and mining fingerprints of Web component
CN111475464B (en) * 2020-03-19 2023-04-25 重庆邮电大学 Method for automatically finding and mining fingerprints of Web component
CN111597053A (en) * 2020-05-29 2020-08-28 广州万灵数据科技有限公司 Cooperative operation and self-adaptive distributed computing engine
CN111638964A (en) * 2020-06-09 2020-09-08 武汉虹旭信息技术有限责任公司 Centralized internet data acquisition system and acquisition method
CN113946566A (en) * 2021-12-20 2022-01-18 北京大学 Web system fingerprint database construction method and device and electronic equipment
CN113946566B (en) * 2021-12-20 2022-03-18 北京大学 Web system fingerprint database construction method and device and electronic equipment

Similar Documents

Publication Publication Date Title
CN108628722A (en) A kind of distributed Web Component services detection system
US10977154B2 (en) Method and system for automatic real-time causality analysis of end user impacting system anomalies using causality rules and topological understanding of the system to effectively filter relevant monitoring data
US11182434B2 (en) Cardinality of time series
US9866426B2 (en) Methods and apparatus for analyzing system events
CN107087001B (en) distributed internet important address space retrieval system
Chen et al. Causeinfer: Automatic and distributed performance diagnosis with hierarchical causality graph in large distributed systems
US8051163B2 (en) Synthetic transactions based on system history and load
US20090125496A1 (en) Network device and method for monitoring of backend transactions in data centers
US11159542B2 (en) Cloud view detection of virtual machine brute force attacks
US9740991B2 (en) Calculating in-flight metrics for non-interruptible business transactions
US10673868B2 (en) Risk based priority processing of data
CN108108288A (en) A kind of daily record data analytic method, device and equipment
CN109074454A (en) Malware is grouped automatically based on artefact
CN110213207A (en) A kind of network security defence method and equipment based on log analysis
Chen et al. Invariants based failure diagnosis in distributed computing systems
Natu et al. Holistic performance monitoring of hybrid clouds: Complexities and future directions
US11792157B1 (en) Detection of DNS beaconing through time-to-live and transmission analyses
CN106453320A (en) Malicious sample identification method and device
Ramachandran et al. Determining configuration parameter dependencies via analysis of configuration data from multi-tiered enterprise applications
Zou et al. Improving log-based fault diagnosis by log classification
US7653742B1 (en) Defining and detecting network application business activities
CN114969450B (en) User behavior analysis method, device, equipment and storage medium
CN114422341B (en) Industrial control asset identification method and system based on fingerprint characteristics
CN113572781A (en) Method for collecting network security threat information
Kalamatianos et al. Domain independent event analysis for log data reduction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20181009