CN108628722A - A distributed Web component service detection system
- Publication number: CN108628722A
- Application number: CN201810446405.6A
- Authority: CN (China)
- Prior art keywords: web, module, component, identification, task
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3003—Monitoring arrangements specially adapted to the computing system or computing system component being monitored
- G06F11/3006—Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3051—Monitoring arrangements for monitoring the configuration of the computing system or of the computing system component, e.g. monitoring the presence of processing resources, peripherals, I/O links, software programs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3089—Monitoring arrangements determined by the means or processing involved in sensing the monitored data, e.g. interfaces, connectors, sensors, probes, agents
- G06F11/3093—Configuration details thereof, e.g. installation, enabling, spatial arrangement of the probes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/57—Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
- G06F21/577—Assessing vulnerabilities and evaluating computer system security
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
- G06F9/505—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5083—Techniques for rebalancing the load in a distributed system
- G06F9/5088—Techniques for rebalancing the load in a distributed system involving task migration
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/50—Indexing scheme relating to G06F9/50
- G06F2209/5017—Task decomposition
Abstract
The present invention discloses a distributed Web component service detection system, comprising: a distributed scheduling module, for submitting and managing detection jobs, viewing task progress, managing the resources of each node in the system, splitting jobs into task shards, scheduling and distributing the shards evenly across the compute nodes, and ensuring the fault tolerance of computing tasks; a crawler module, for crawling the HTML page content of websites for the task shards in the task queue; a Web server identification module, for detecting and identifying the type and version information of the Web server used by a Web site; a Web component fingerprint identification module, for identifying the type, name and version information of the application components used by a website; and a host service detection module, for detecting and identifying information such as the operating system type and version of the site host and the common port services it exposes. The present invention can effectively guard against attacks that exploit vulnerabilities in Web components and host services, thereby safeguarding Internet security.
Description
Technical field
The present invention relates to the technical field of Internet security, and more particularly to a distributed Web component service detection system.
Background art
With the rapid development of the Internet, the scale, quantity and diversity of Web sites are growing at an astonishing rate. Massive numbers of Web services, abundant open-source components and large-scale servers bring great convenience to people's lives. However, these Web components and servers also contain vulnerabilities and security risks; like time bombs, they pose a great threat to the security of cyberspace, and to Web security in particular.
Technologies and means are therefore needed to comprehensively scan and inspect Web application components and servers, so that protective countermeasures can be taken in time, the harm that vulnerabilities in service components and servers cause to systems can be reduced, and the security of the Internet can be safeguarded. Many Internet companies have proposed solutions to this problem. For example, Shodan, from abroad, can perform IP address queries and network device detection worldwide. ZoomEye, created by the domestic Internet security vendor Knownsec, and FOFA are search engines for hardware devices and Web services in cyberspace. They focus on discovering network devices and Web services, solve part of the problem well, and also highlight the importance and urgency of network resource detection.
For pure Web application component identification, a single website domain name is typically provided, and detection then finds which specific application components the site uses. The two more commonly used Web component identification tools are WhatWeb and Wappalyzer. They mainly fetch the content of a single page, analyze its Banner information and important keyword information, and perform regular-expression matching against a component library to identify the website's Web server and components. However, because Banner information and keyword information are easily tampered with, identification accuracy is low and many components are missed.
For host service scanning, basic host information is mainly obtained by probing IP addresses. A commonly used tool at present is ZMap, an Internet-wide IPv4 address scanner. Its single-machine scanning speed is more than 1300 times faster than that of Nmap; it can scan the entire IPv4 address space of the Internet from one machine in 45 minutes, and its scan results reach 98% coverage.
However, existing systems and tools have the following problems:
(1) Detection and identification of a Web site's server relies mainly on the Banner information in the response message. When the Banner information has been modified, disguised or removed, it is difficult to accurately identify the type and version information of the Web server.
(2) Fingerprint identification of Web application components relies mainly on static matching of keyword information. Although this is fast, it cannot make an accurate determination when component keywords have been modified or are missing, and lower-level Web development frameworks are also hard to identify, so many components are misidentified or missed.
(3) Traditional inspection focuses on either Web components or host services, but not both. In fact, a Web site consists of two parts, a server and Web applications, and vulnerabilities in either host services or Web components can seriously affect the security of the site. Existing systems do not combine the two kinds of information well enough to reflect the information of a Web site comprehensively and accurately.
(4) Current systems and tools are mostly single-machine, with relatively limited computing efficiency and capacity. There is not yet a customized distributed computing framework that can efficiently detect and analyze the components of a massive number of Web sites within a limited time, mine the statistics for their potential value, and produce meaningful reports to support administrators in maintenance and decision-making.
Summary of the invention
In view of the defects of the prior art, the object of the present invention is to solve the technical problems that existing Web component service detection systems cannot accurately identify Web servers and application components, do not consider the interaction and mutual influence between Web components and host services, and cannot detect and analyze the component information of a massive number of Web sites in parallel.
To achieve the above object, the present invention provides a distributed Web component service detection system, comprising: a distributed scheduling module, a crawler module, a Web server identification module, a Web component fingerprint identification module and a host service detection module;
The distributed scheduling module is used for submitting and managing detection jobs, viewing task progress, managing the resources of each node of the whole system, splitting jobs into task shards, and scheduling and distributing the shards evenly to the compute nodes, while tolerating abnormal conditions of nodes and tasks; the crawler module is used for crawling page content for the task shards in the task queue; the Web server identification module is used for detecting and identifying the type and version information of the Web server used by a Web site; the Web component fingerprint identification module is used for identifying the type, name and version information of the application components used by a website; the host service detection module is used for detecting and identifying the operating system type and version of the site host and the common port service information it exposes.
Optionally, the distributed scheduling module is used to manage and schedule the life cycle of the detection jobs of the whole system. It runs in a cluster composed of ordinary compute nodes and uses the processing capacity of multiple worker nodes to detect and identify massive amounts of Web component and host information. The distributed scheduling module receives multiple detection jobs submitted by users, manages all job queues, and obtains the execution status of each job and task shard in real time. It monitors the resources of each compute node, dynamically divides a job into multiple task shards according to node load, assigns them to specific compute nodes for execution, and schedules tasks in a timely manner through heartbeat communication. Exceptions that occur during task execution are captured promptly, and retry and fault-tolerance mechanisms ensure that the system runs efficiently and stably.
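As a rough illustration of the load-based sharding and assignment described above, the following Python sketch splits a job into task shards and hands each shard to the least-loaded node. The `Node` class, the use of the assigned URL count as the load metric, and the fixed `shard_size` are assumptions made for the example; the patent does not specify them.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    load: int = 0  # number of URLs currently assigned (assumed load metric)
    shards: list = field(default_factory=list)


def split_job(seed_urls, shard_size):
    """Split one job (a list of seed URLs) into shards of at most shard_size URLs."""
    return [seed_urls[i:i + shard_size] for i in range(0, len(seed_urls), shard_size)]


def assign_shards(shards, nodes):
    """Give each shard to whichever node currently has the lowest load."""
    for shard in shards:
        target = min(nodes, key=lambda n: n.load)
        target.shards.append(shard)
        target.load += len(shard)
    return nodes


nodes = [Node("slave-1", load=4), Node("slave-2"), Node("slave-3", load=1)]
job = [f"http://site{i}.example" for i in range(6)]
assign_shards(split_job(job, shard_size=2), nodes)
for node in nodes:
    print(node.name, node.load, len(node.shards))
```

In this run the already busy node receives no new shards, while the two lighter nodes share the three shards between them.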
Optionally, the task shard is a seed URL. For the seed URLs of the organization to be crawled, the crawler module dynamically crawls all site data within the scope of the organization, automatically filters out off-site links, analyzes and extracts sibling links, and uses dynamic proxy technology to collect the page content of the websites corresponding to the URL list assigned by the distributed scheduling module. The crawling process of the crawler module is as follows: using a breadth-first search algorithm and starting from the seed URL, it downloads page content and parses out the valid links belonging to the same scope or organization, adding them to the queue to be crawled. During crawling, it dynamically uses proxies and optimized crawling strategies, adjusting the collection frequency, timing and access IP according to the server's responses, so as to effectively evade various anti-crawler mechanisms and ensure that the Web pages corresponding to the seed URL are crawled accurately and completely.
Here, a sibling link may refer to a website of the same organization.
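The adjustment of collection frequency and access IP according to server responses could look something like the sketch below. The status codes treated as a sign of blocking, the doubling and 0.9 factors, and the proxy names are all assumptions chosen for illustration; the patent does not state a concrete policy.

```python
class CrawlThrottle:
    """Adapts request delay and proxy choice to server responses (assumed policy)."""

    def __init__(self, proxies, delay=1.0, min_delay=0.5, max_delay=60.0):
        self.proxies = list(proxies)
        self.proxy_index = 0
        self.delay = delay
        self.min_delay = min_delay
        self.max_delay = max_delay

    @property
    def proxy(self):
        return self.proxies[self.proxy_index]

    def update(self, status_code):
        if status_code in (403, 429, 503):
            # Probably rate limited or blocked: back off and switch access IP.
            self.delay = min(self.delay * 2, self.max_delay)
            self.proxy_index = (self.proxy_index + 1) % len(self.proxies)
        elif 200 <= status_code < 300:
            # Healthy response: slowly return to a faster crawl rate.
            self.delay = max(self.delay * 0.9, self.min_delay)
        return self.delay


throttle = CrawlThrottle(["proxy-a", "proxy-b", "proxy-c"])
for code in (200, 429, 429, 200):
    throttle.update(code)
print(throttle.delay, throttle.proxy)
```

After two consecutive 429 responses the delay has grown to roughly 3.24 seconds and the crawler has rotated to the third proxy.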
Optionally, the Web server identification module uses active probing and analyzes the specific Web server type and version information from the characteristic behavior of the server's HTTP response messages. The identification process of the Web server identification module is as follows: a TCP probe message is sent to the home page address of the Web site; if a connection can be established, multiple HTTP probe messages are sent and the response messages are obtained; characteristic information is extracted from the response messages to generate fingerprints, which are statically matched against the data in the fingerprint library; the number of matching fingerprints is counted for each Web server, the Web server with the largest match count is chosen as the Web server of the site, and a corresponding confidence level is given.
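The counting step can be sketched as follows. The fingerprint library entries are made up for the example, and defining confidence as matched probes divided by probes sent is an assumption; the patent only says that a confidence level is given.

```python
from collections import Counter

# Hypothetical fingerprint library: server -> {probe name: expected fingerprint}.
FINGERPRINT_DB = {
    "Apache/2.4": {
        "GET_EXIST": "200|date,server,content-type",
        "HEAD": "200|date,server",
        "DELETE": "405|date,server,allow",
    },
    "nginx/1.14": {
        "GET_EXIST": "200|server,date,content-type",
        "HEAD": "200|server,date",
        "DELETE": "405|server,date",
    },
}


def identify_server(observed, db=FINGERPRINT_DB):
    """Return (best matching server, confidence) for the observed probe fingerprints."""
    hits = Counter()
    for server, expected in db.items():
        hits[server] = sum(1 for probe, fp in observed.items() if expected.get(probe) == fp)
    if not hits:
        return None, 0.0
    server, matched = hits.most_common(1)[0]
    confidence = matched / len(observed) if observed else 0.0
    return server, confidence


observed = {
    "GET_EXIST": "200|server,date,content-type",
    "HEAD": "200|server,date",
    "DELETE": "405|date,server,allow",
}
server, confidence = identify_server(observed)
print(server, round(confidence, 2))
```

Here two of the three probes match the nginx entry and one matches the Apache entry, so nginx is chosen with a confidence of about 0.67.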
Optionally, the Web component fingerprint identification module takes the Web page content of the website from the crawler module and at the same time constructs a variety of malformed HTTP requests. From the server's response messages it extracts keywords, static files, special file structures, Cookies and error page information, builds these 5 kinds of features into the component fingerprint of the website, and statically matches it against the component fingerprint library; the application components whose fingerprints the website hits are taken as the application components used by the Web site.
Optionally, the host service detection module takes the IP address of the website from the crawler module, sends test connections to the list of common ports on that IP, and obtains the response messages. It then computes a hash value for each response message and compares these hash values against the host service fingerprint library to identify the operating system type and version of the host and the common port service information it exposes. The identified data is combined with the component information of the website, achieving comprehensive identification of components and host.
Optionally, the system further comprises a component fingerprint library module, for storing all Web component fingerprints.
Optionally, the system further comprises a host service fingerprint library module, for storing all host and service fingerprint information.
In general, compared with the prior art, the above technical solutions conceived by the present invention have the following beneficial effects:
(1) The present invention is a distributed Web component service detection system that integrates Web server identification, Web component identification, host service detection, statistical analysis and other functions.
(2) It can identify Web servers accurately. By using active probing, it can still accurately infer and identify server information in scenarios where the Banner information is forged, missing or tampered with.
(3) It can identify Web components more accurately. Its novel collection of multiple fingerprint features addresses the large number of misidentified and missed components that result when a single feature is missing or modified.
(4) It innovatively combines component identification with host service detection, analyzing the composition of a Web site more comprehensively, facilitating the discovery and remediation of site security vulnerabilities, and strengthening the security protection of the Internet.
(5) The Web component fingerprint library is built from features that are easy to extract and collect, so it can be expanded quickly, further increasing the scope and accuracy of the system's detection and identification.
(6) The distributed scheduling framework supports detection tasks for large numbers of Web components and hosts, completes the detection of a massive number of websites, and can conveniently scale horizontally. Compared with traditional single-machine Web component detection, the system offers great improvements in speed, efficiency and detection scope.
Description of the drawings
Fig. 1 is a schematic structural diagram of a distributed Web component service detection system provided by an embodiment of the present invention;
Fig. 2 is an overall architecture diagram of a distributed scheduling module provided by an embodiment of the present invention;
Fig. 3 is a working mechanism diagram of a distributed scheduling module provided by an embodiment of the present invention;
Fig. 4 is a structure diagram of a crawler module provided by an embodiment of the present invention;
Fig. 5 is a schematic diagram of a Web server detection and identification module provided by an embodiment of the present invention;
Fig. 6 is an architecture diagram of a Web server detection and identification module provided by an embodiment of the present invention;
Fig. 7 is a layered architecture diagram of Web components provided by an embodiment of the present invention;
Fig. 8 is a schematic diagram of a Web component fingerprint identification module provided by an embodiment of the present invention;
Fig. 9 is a Web component fingerprint structure diagram provided by an embodiment of the present invention;
Fig. 10 is a schematic diagram of a host service detection and identification module provided by an embodiment of the present invention.
Detailed description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely intended to explain the present invention and are not intended to limit it. In addition, the technical features involved in the various embodiments of the present invention described below can be combined with each other as long as they do not conflict.
The present invention discloses a distributed Web component service detection system for accurately obtaining detailed information such as the type and version of Web servers, Web application components and host services. This is of great significance for the security protection of Web sites: it can effectively guard against attacks that exploit vulnerabilities in Web components and host services, safeguarding Internet security. To address problems such as Banner information being easily tampered with, component keywords being replaced or deleted, and host information being missing, a new detection system is proposed that can perform comprehensive and accurate component identification on Web sites across a wide range. A user submits a detection job (seed URLs) to the system. The distributed scheduling module manages all submitted jobs, and distributes and schedules task shards to worker nodes. The crawler module on each worker node crawls the page content of each website according to the assigned tasks. The Web server identification module analyzes the website's various response messages to infer and identify the website's Web server information. The application component fingerprint identification module generates Web component fingerprints from the crawled website content and statically matches them against the component fingerprint library to identify the details of the components. The host service detection module sends multiple probe messages to the host at the IP address resolved by the crawler module and analyzes the characteristic information of the response messages to identify information such as the host's operating system and port services.
Fig. 1 is a schematic structural diagram of a distributed Web component service detection system provided by an embodiment of the present invention. As shown in Fig. 1, the distributed Web component detection system provided by the present invention comprises a distributed scheduling module 100, a crawler module 200, a Web server identification module 300, a Web component fingerprint identification module 400, an application component fingerprint library 500, a host service detection module 600 and a host service fingerprint library 700.
The distributed scheduling module 100 is used for submitting and managing detection jobs, viewing task progress, managing the resources of each node of the whole system, splitting jobs into task shards, and scheduling and distributing the shards evenly to the compute nodes, while tolerating abnormal conditions of nodes and tasks.
The crawler module 200 is used for crawling page content for the task shards (seed URLs) in the task queue;
The Web server identification module 300 is used for accurately detecting and identifying the type and version information of the Web server used by a Web site;
The Web component fingerprint identification module 400 is used for accurately and comprehensively identifying the type, name and version information of the application components used by a website;
The component fingerprint library module 500 is used for storing all Web component fingerprints;
The host service detection module 600 is used for detecting and identifying the operating system type and version of the site host, the common port services it exposes, and the applications and their version information;
The host service fingerprint library module 700 is used for storing all host and service fingerprint information.
Fig. 2 is an overall architecture diagram of a distributed scheduling module provided by an embodiment of the present invention. As shown in Fig. 2, it is a typical master-slave distributed architecture. The Master is responsible for task scheduling and cluster resource management and maintains task state. It divides a job into 4 stages, CRAWL (crawling), SERVER_ANY (Web server detection), COMPONENTS_ANY (component detection) and HOST_SCAN (host scanning), assigns tasks to Slave nodes for execution according to worker node load and task execution state, and communicates with the Slaves to monitor task execution and scheduling in real time. Each Slave node is responsible for managing the resources of its own node and completing the detection tasks assigned by the Master: according to the specific task shard information it starts the corresponding detection task (crawler, Web server, Web component or host detection), reports the execution state of the current task to the Master in real time, and finally stores the detection and identification results in a MongoDB database. In addition, Redis is responsible for storing all job queue data, which the Master and Slaves jointly maintain and access.
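The four job stages can be modeled as a small state machine. The sketch below assumes the stages run strictly in the order listed and adds a terminal `DONE` state; both are assumptions for illustration. An in-memory `deque` stands in for the Redis job queue so the example runs without any external service.

```python
from collections import deque
from enum import Enum


class Stage(Enum):
    CRAWL = 1
    SERVER_ANY = 2
    COMPONENTS_ANY = 3
    HOST_SCAN = 4
    DONE = 5  # terminal state added for the sketch


def next_stage(stage):
    """Advance a task shard to the following stage; DONE stays DONE."""
    return stage if stage is Stage.DONE else Stage(stage.value + 1)


queue = deque([("shard-1", Stage.CRAWL)])  # stand-in for the Redis job queue
history = []
while queue:
    shard, stage = queue.popleft()
    if stage is Stage.DONE:
        continue
    history.append(stage.name)  # a Slave would run the matching detection task here
    queue.append((shard, next_stage(stage)))
print(history)
```

A real deployment would persist the stage alongside the shard in the shared queue, so that the Master can resume a job from its last completed stage.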
Fig. 3 is a working mechanism diagram of a distributed scheduling module provided by an embodiment of the present invention. As shown in Fig. 3, the process can be simplified into the following steps: the user submits a detection job → the Master manages the task queue → tasks are scheduled to Slaves → the Slaves execute the detection tasks → execution results are returned → the task is complete. The function of each part is as follows:
(1) Client: submits jobs and queries detection results.
(2) Master: provides functional interfaces for the Client, manages the task queue, communicates with the Slaves, schedules and assigns tasks to the Slaves for execution, and manages the Slave nodes.
(3) Slave: maintains communication with the Master, receives the task shards assigned by the Master, executes detection tasks, and reports task execution status to the Master.
(4) Data storage: saves detection results, system work logs, and so on.
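One way the Master might combine heartbeat monitoring with retries is sketched below. The `Master` class, the timeout value and the retry limit are hypothetical; timestamps are passed in explicitly so the example is deterministic.

```python
class Master:
    """Tracks Slave heartbeats and requeues tasks from Slaves that go silent."""

    def __init__(self, heartbeat_timeout=30.0, max_retries=3):
        self.heartbeat_timeout = heartbeat_timeout
        self.max_retries = max_retries
        self.last_seen = {}  # slave name -> time of last heartbeat
        self.running = {}    # task id -> slave name
        self.retries = {}    # task id -> retries so far
        self.pending = []    # tasks waiting to be rescheduled
        self.failed = []     # tasks that exhausted their retries

    def heartbeat(self, slave, now):
        self.last_seen[slave] = now

    def assign(self, task, slave):
        self.running[task] = slave

    def reap(self, now):
        """Requeue tasks whose Slave missed its heartbeat deadline."""
        dead = {s for s, t in self.last_seen.items() if now - t > self.heartbeat_timeout}
        for task, slave in list(self.running.items()):
            if slave in dead:
                del self.running[task]
                self.retries[task] = self.retries.get(task, 0) + 1
                retry = self.retries[task] <= self.max_retries
                (self.pending if retry else self.failed).append(task)
        return dead


master = Master(heartbeat_timeout=30, max_retries=1)
master.heartbeat("slave-1", now=0)
master.heartbeat("slave-2", now=0)
master.assign("t1", "slave-1")
master.assign("t2", "slave-2")
master.heartbeat("slave-2", now=40)  # slave-1 stays silent
dead = master.reap(now=45)
print(dead, master.pending, master.running)
```

Task `t1` returns to the pending list for rescheduling, while `t2` keeps running on the Slave that is still reporting.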
The crawler module 200 is used to obtain specific HTML page content according to the tasks (seed URLs) assigned by the scheduling module; this module is the basis on which the subsequent detection tasks start. It is essentially a distributed crawler that can crawl a massive number of Web sites. It uses a breadth-first traversal algorithm and limits the crawl depth; during crawling it dynamically applies various customized strategies and dynamic proxy technology to effectively evade anti-crawler mechanisms. As shown in Fig. 4, a compute node receives the tasks (seed URLs) assigned by the Master, parses the organization's domain name, crawls HTML pages and extracts new site links, and determines whether each is a relevant link (an in-site link or a link to an associated site). Irrelevant off-site URLs are filtered out, and the remaining links are deduplicated and added to the queue to be crawled. When each page is crawled, the access depth of the current page is checked, and pages beyond the limit are not processed. This cycle repeats until all site pages within the scope of the organization have been collected.
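The crawl loop described above (breadth-first traversal, depth limit, off-site filtering, deduplication) can be sketched in a few lines. Here `fetch_links` is a hypothetical stand-in for downloading a page and extracting its links, and the organization domain is passed in directly rather than derived from the seed URL; a tiny in-memory link table replaces the network.

```python
from collections import deque
from urllib.parse import urlparse


def same_organization(url, org_domain):
    """True for the organization's own domain and any of its subdomains."""
    host = urlparse(url).hostname or ""
    return host == org_domain or host.endswith("." + org_domain)


def crawl(seed_url, org_domain, fetch_links, max_depth=2):
    """Breadth-first crawl; fetch_links(url) returns the links found on that page."""
    seen = {seed_url}
    queue = deque([(seed_url, 0)])
    visited = []
    while queue:
        url, depth = queue.popleft()
        visited.append(url)
        if depth >= max_depth:
            continue  # depth limit reached: do not expand this page
        for link in fetch_links(url):
            if link not in seen and same_organization(link, org_domain):
                seen.add(link)  # deduplicate before enqueueing
                queue.append((link, depth + 1))
    return visited


# Tiny in-memory "web" so the sketch runs without network access.
PAGES = {
    "http://example.org/": ["http://example.org/a", "http://news.example.org/", "http://other.test/"],
    "http://example.org/a": ["http://example.org/b", "http://example.org/"],
    "http://news.example.org/": [],
    "http://example.org/b": ["http://example.org/c"],
}
result = crawl("http://example.org/", "example.org", lambda u: PAGES.get(u, []), max_depth=2)
print(result)
```

The off-site URL is dropped, the subdomain is kept as a sibling site, and the page at depth 3 is never fetched because its parent sits at the depth limit.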
The Web server identification module 300 is used to accurately identify the server information of a website. It mainly solves the thorny problem that, when Banner information is missing, disguised or tampered with, existing tools cannot accurately determine the website's Web server information. Its workflow is shown in Fig. 5: 8 specific HTTP requests (GET/Exist, GET/Too Long, GET/Not Exist, GET/Attack, HEAD, OPTION, DELETE, TEST) are sent and the response messages are obtained; the Header information of each response message is converted into a fingerprint of a fixed format; these are matched one by one against the data in the fingerprint library; and the Web server fingerprint with the most hits is taken as the final identification result.
The overall architecture of the Web server identification module is shown in Fig. 6, from which the internal principle of the module can be clearly seen. For each HTTP test request, the response messages of different Web servers differ; by collecting, organizing and classifying these features, Web servers can be identified efficiently and accurately.
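The patent does not give the fixed fingerprint format. One plausible choice, shown here purely as an assumption, is to combine the status code with the order of the response header names, since both tend to differ between server implementations while being unaffected by a rewritten `Server` banner.

```python
def header_fingerprint(status_code, headers):
    """Build a fixed-format fingerprint from the status code and header order.

    headers is a list of (name, value) pairs in the order the server sent them.
    Header values are ignored, so a forged Server banner does not change the result.
    """
    names = [name.strip().lower() for name, _ in headers]
    return f"{status_code}|{','.join(names)}"


real = header_fingerprint(
    405, [("Server", "nginx"), ("Date", "Mon, 01 Jan 2018 00:00:00 GMT"), ("Allow", "GET, HEAD")]
)
forged = header_fingerprint(
    405, [("Server", "Apache"), ("Date", "Mon, 01 Jan 2018 00:00:00 GMT"), ("Allow", "GET, HEAD")]
)
print(real, real == forged)
```

One such fingerprint would be produced per probe request, and the resulting set compared with the library entries for each known server.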
The Web component fingerprint identification module 400 is used to detect and identify the component information used by a Web site, which is particularly important for the security of a website. The definition of Web components in the present invention is shown in Fig. 7 and includes: front-end frameworks, application frameworks, back-end frameworks, languages, Web servers, and so on.
The workflow of the module is shown in Fig. 8 and is mainly divided into 3 steps:
(1) HTTP requests: construct normal GET requests and malformed HTTP requests (DELETE, a nonexistent URL), and obtain the response content.
(2) Collect Web component fingerprints: construct the Web fingerprint from the content of the HTTP responses. It mainly includes keywords, special structures, static resource files, Cookies and error pages; the information from these 5 aspects makes up the website fingerprint.
(3) Identify Web components: traverse the component fingerprint library, statically match it against the website fingerprint, and identify the Web components used.
The application component fingerprint library 500 is used to store component fingerprint information. Its specific data structure is shown in Fig. 9. First comes the overall description information, including the component name, category, component introduction and icon. Next comes the specific fingerprint data, which mainly includes 5 types: HTML keywords, special file paths, static resource file MD5 values, Cookies and error page hash values. The fingerprint library is made up of such elements.
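A fingerprint library entry with the description fields and five fingerprint types listed above might be represented as follows. The class names, the sample library entries and the rule that a single hit in any feature type counts as a match are assumptions for the sketch; `ExampleShop` is an invented component.

```python
from dataclasses import dataclass, field


@dataclass
class ComponentFingerprint:
    # Overall description (icon omitted for brevity).
    name: str
    category: str
    description: str = ""
    # The five fingerprint types named in the text.
    html_keywords: set = field(default_factory=set)
    special_paths: set = field(default_factory=set)
    static_file_md5: set = field(default_factory=set)
    cookies: set = field(default_factory=set)
    error_page_hashes: set = field(default_factory=set)


@dataclass
class SiteFingerprint:
    html: str = ""
    paths: set = field(default_factory=set)
    static_file_md5: set = field(default_factory=set)
    cookies: set = field(default_factory=set)
    error_page_hashes: set = field(default_factory=set)


def match_components(site, library):
    """Return the names of components for which at least one feature type hits."""
    matched = []
    for fp in library:
        hit = (
            any(keyword in site.html for keyword in fp.html_keywords)
            or fp.special_paths & site.paths
            or fp.static_file_md5 & site.static_file_md5
            or fp.cookies & site.cookies
            or fp.error_page_hashes & site.error_page_hashes
        )
        if hit:
            matched.append(fp.name)
    return matched


LIBRARY = [
    ComponentFingerprint("WordPress", "CMS", html_keywords={"wp-content"}, special_paths={"/wp-login.php"}),
    ComponentFingerprint("jQuery", "front-end", html_keywords={"jquery"}),
    ComponentFingerprint("ExampleShop", "application framework", cookies={"exampleshop_session"}),
]
site = SiteFingerprint(
    html='<link href="/wp-content/themes/x.css"><script src="/js/jquery.min.js"></script>',
    cookies={"PHPSESSID"},
)
matched = match_components(site, LIBRARY)
print(matched)
```

Because five independent feature types are checked, a component can still be found when one of them, for example the HTML keyword, has been stripped from the page.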
The host service detection module 600 is used to detect and identify information such as the operating system of the Web site host and its open port services; its structure is shown in Fig. 10. The basic detection process is as follows: the IP addresses of the sites are resolved by the crawler module; TCP, ICMP, UDP and other probe messages are sent to the common ports (21, 80, 443, 8080, etc.) of these IP addresses; the response content of these requests is obtained; a series of feature values (hashes) is computed with a hash algorithm and then matched against the fingerprint library to identify the required information. Finally, the analyzed host information is integrated with the website's component data through the mapping between IP addresses and website domain names, forming complete and comprehensive information about the Web site.
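The hash-and-lookup step, followed by the merge with component data, can be sketched as below. SHA-256 is an assumption (the patent only says a hash algorithm is used), and the sample banners, IP address and function names are illustrative. The fingerprint library is built by hashing known responses so that no hash values need to be hard-coded.

```python
import hashlib


def response_hash(payload: bytes) -> str:
    """Feature value of one probe response (SHA-256 is an assumed choice)."""
    return hashlib.sha256(payload).hexdigest()


# Hypothetical library built from known responses: hash -> service label.
KNOWN_RESPONSES = {
    b"SSH-2.0-OpenSSH_7.4\r\n": "OpenSSH 7.4",
    b"220 (vsFTPd 3.0.3)\r\n": "vsftpd 3.0.3",
}
HOST_FINGERPRINTS = {response_hash(raw): label for raw, label in KNOWN_RESPONSES.items()}


def identify_services(responses):
    """Map {port: raw response bytes} to {port: service label or 'unknown'}."""
    return {
        port: HOST_FINGERPRINTS.get(response_hash(raw), "unknown")
        for port, raw in responses.items()
    }


def merge_site_report(domain, ip, components, services):
    """Join host results with component results via the domain-to-IP mapping."""
    return {"domain": domain, "ip": ip, "components": components, "services": services}


services = identify_services({
    21: b"220 (vsFTPd 3.0.3)\r\n",
    22: b"SSH-2.0-OpenSSH_7.4\r\n",
    8080: b"unrecognized banner",
})
report = merge_site_report("example.org", "192.0.2.10", ["WordPress"], services)
print(report)
```

Exact hash matching only succeeds when a response is byte-for-byte identical to a library entry, so responses containing variable fields such as timestamps would need to be normalized before hashing.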
The host service fingerprint library 700 is used to store the fingerprint information of all operating systems and port services; it is mainly a very large set of hash values.
The distributed Web component service detection system provided by the present invention is a self-developed, high-performance distributed Web component service detection system that can be deployed entirely on a real physical cluster. Its master-slave structure, together with an active/standby design for the central node, effectively prevents single points of failure. The system can support the detection of Web site information on a large scale (a whole province, a whole country, or even the whole world), greatly strengthening the maintenance and protection of cyberspace security.
Those skilled in the art will readily understand that the above are merely preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.
Claims (8)
1. A distributed Web component service detection system, characterized by comprising: a distributed scheduling module, a crawler module, a Web server identification module, a Web component fingerprint identification module and a host service detection module;
the distributed scheduling module is used for submitting and managing detection jobs, viewing task progress, managing the resources of each node of the whole system, splitting jobs into task shards, and scheduling and distributing the shards evenly to the compute nodes, while tolerating abnormal conditions of nodes and tasks;
the crawler module is used for crawling website page content for the task shards in the task queue;
the Web server identification module is used for detecting and identifying the type and version information of the Web server used by a Web site;
the Web component fingerprint identification module is used for identifying the type, name and version information of the application components used by a website;
the host service detection module is used for detecting and identifying the operating system type and version of the site host and the common port service information it exposes.
2. The distributed Web component service detection system according to claim 1, characterized in that the distributed scheduling module manages and schedules the life cycle of all detection jobs in the system, runs on a cluster of commodity compute nodes, and uses the processing capacity of multiple worker nodes to detect and identify massive amounts of Web component and host information;
the distributed scheduling module receives multiple detection jobs submitted by users, manages all job queues, and obtains the execution status of each job and task shard in real time; it monitors the resources of each compute node, dynamically divides a job into multiple task shards according to node load, assigns the shards to specific compute nodes for execution, and schedules tasks promptly through heartbeat communication; exceptions that occur during task execution are captured promptly and handled with retry and fault-tolerance mechanisms, ensuring that the system runs efficiently and stably.
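As an illustration only, the load-aware assignment of task shards described in claim 2 might be sketched as follows. The claim does not specify a scheduling algorithm; the greedy least-loaded strategy, the function name `assign_shards`, and the per-shard `cost` value are all assumptions made for this sketch.

```python
import heapq


def assign_shards(shards, node_loads):
    """Greedy least-loaded assignment (an assumed strategy, not the patent's).

    shards: list of (shard_id, cost) pairs, cost being a hypothetical work estimate.
    node_loads: dict mapping node name -> current load.
    Returns a dict mapping node name -> list of assigned shard ids.
    """
    if not node_loads:
        raise ValueError("at least one compute node is required")
    # Min-heap keyed on load, so the least-loaded node is always popped first.
    heap = [(load, node) for node, load in node_loads.items()]
    heapq.heapify(heap)
    plan = {node: [] for node in node_loads}
    # Place the most expensive shards first so they spread across nodes.
    for shard_id, cost in sorted(shards, key=lambda s: s[1], reverse=True):
        load, node = heapq.heappop(heap)
        plan[node].append(shard_id)
        heapq.heappush(heap, (load + cost, node))
    return plan
```

Heartbeats, retries, and fault tolerance are left out of the sketch; a real scheduler would refresh `node_loads` from heartbeat reports before each assignment round.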
3. The distributed Web component service detection system according to claim 1, characterized in that each task shard is a seed URL;
the crawler module, for the seed URL of an organization to be crawled, dynamically crawls all site data within the scope of that organization, automatically filters out off-site links, analyzes and extracts sibling links, and uses dynamic proxy technology to collect the page content of the sites in the URL list distributed by the scheduling module;
the crawling process of the crawler module is: starting from the seed URL, download page content using breadth-first search, analyze it for valid links within the same scope or organization, and add those links to the queue to be crawled; during crawling, proxies and an optimized crawling strategy are applied dynamically, and the request frequency, timing, and source IP are adjusted according to the server's responses, so as to evade anti-crawler mechanisms and ensure that the Web pages corresponding to the seed URL are fetched accurately and completely.
4. The distributed Web component service detection system according to claim 1, characterized in that the Web server identification module uses active probing and determines the specific Web server type and version from the characteristic behavior of the server's HTTP response messages;
the identification process of the Web server identification module is: send a TCP probe packet to the home page address of the Web site; if a connection can be established, send several kinds of HTTP probe messages and obtain the response messages; analyze the responses and extract feature information to generate fingerprints; statically match them against the data in the fingerprint library; count the number of matching fingerprints for each Web server; select the Web server with the largest match count as the Web server of the site; and report a corresponding confidence level.
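As an illustration only, the match-counting step of claim 4 might be sketched as follows. The feature strings, the layout of the fingerprint library, and in particular the confidence formula (matched features divided by observed features) are assumptions; the claim does not say how the confidence level is computed.

```python
def identify_server(observed, fingerprint_db):
    """Pick the Web server whose known fingerprints match the most observed ones.

    observed: set of feature strings extracted from the probe responses.
    fingerprint_db: dict mapping a server label -> set of known feature strings.
    Returns (server_label, confidence), or (None, 0.0) when nothing matches.
    """
    # Count how many observed fingerprints each candidate server matches.
    hits = {server: len(observed & known) for server, known in fingerprint_db.items()}
    if not hits:
        return None, 0.0
    server, count = max(hits.items(), key=lambda item: item[1])
    if count == 0:
        return None, 0.0
    # Assumed confidence measure: share of observed features explained.
    return server, count / len(observed)
```

A possible call, using made-up feature strings:

```python
db = {
    "nginx/1.14": {"hdr-order:Server,Date", "404:nginx-body"},
    "Apache/2.4": {"hdr-order:Date,Server", "404:apache-body", "options:allow"},
}
seen = {"hdr-order:Date,Server", "404:apache-body", "etag:inode"}
server, confidence = identify_server(seen, db)
```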
5. The distributed Web component service detection system according to claim 1, characterized in that the Web component fingerprint identification module takes the Web page content of the site from the crawler module and also constructs several kinds of malformed HTTP requests; from the server's response messages it extracts keywords, static files, special file structures, cookies, and abnormal page information; these five categories of features are assembled into the component fingerprint of the site, which is statically matched against the component fingerprint library; the application components whose fingerprints the site hits are taken as the application components used by the Web site.
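As an illustration only, the static matching of claim 5 might be sketched as below. The category names, the library layout, and the rule that a component is hit only when every feature in its signature is present are assumptions; the claim does not define what counts as a hit.

```python
def match_components(site_features, component_db):
    """Return the sorted names of components whose fingerprint the site hits.

    site_features: dict mapping a feature category (e.g. "keyword", "cookie")
        to the set of strings observed for the site.
    component_db: dict mapping a component name to its signature, itself a
        dict of category -> set of strings that must all be present.
    """
    matched = []
    for name, signature in component_db.items():
        # An empty signature is ignored so it cannot match every site.
        if signature and all(
            required <= site_features.get(category, set())
            for category, required in signature.items()
        ):
            matched.append(name)
    return sorted(matched)
```

Requiring every listed feature keeps false positives down at the cost of missing partially customized installations; a scoring threshold would be an equally plausible reading of the claim.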
6. The distributed Web component service detection system according to claim 1, characterized in that the host service detection module takes the IP address of the site from the crawler module, sends test connections to the list of common ports on that IP, and obtains the response messages; it then computes a hash value for each response message and compares these hash values against the host service fingerprint library to identify the operating system type and version of the host and the services on its open common ports; the identified data are combined with the component information of the site, achieving comprehensive identification of components and host.
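As an illustration only, the hash comparison of claim 6 might be sketched as follows. The claim names no hash function, so SHA-256 is an assumption, as are the function names and the shape of the fingerprint library (hash string mapped to a label). Collecting the responses from the ports is left out.

```python
import hashlib


def response_hash(data: bytes) -> str:
    """Hash one raw response message (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


def identify_services(responses, host_fingerprints):
    """Look up each port's response hash in the host service fingerprint library.

    responses: dict mapping port number -> raw response bytes.
    host_fingerprints: dict mapping a hash string -> service or OS label.
    Returns a dict mapping port number -> label, "unknown" when there is no match.
    """
    return {
        port: host_fingerprints.get(response_hash(data), "unknown")
        for port, data in responses.items()
    }
```

Exact hashing only matches byte-identical responses, so banners containing timestamps or session identifiers would need normalizing before hashing.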
7. The distributed Web component service detection system according to any one of claims 1 to 6, characterized by further comprising: a component fingerprint library module;
the component fingerprint library module is used to store all Web component fingerprints.
8. The distributed Web component service detection system according to any one of claims 1 to 6, characterized by further comprising: a host service fingerprint library module;
the host service fingerprint library module is used to store all host and service fingerprint information.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810446405.6A CN108628722A (en) | 2018-05-11 | 2018-05-11 | A kind of distributed Web Component services detection system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810446405.6A CN108628722A (en) | 2018-05-11 | 2018-05-11 | A kind of distributed Web Component services detection system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108628722A true CN108628722A (en) | 2018-10-09 |
Family
ID=63692725
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810446405.6A Pending CN108628722A (en) | 2018-05-11 | 2018-05-11 | A kind of distributed Web Component services detection system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108628722A (en) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109495466A (en) * | 2018-11-06 | 2019-03-19 | 郑州云海信息技术有限公司 | A kind of recognition methods and system of unknown miniport service |
CN109617728A (en) * | 2018-12-14 | 2019-04-12 | 中国电子科技网络信息安全有限公司 | A kind of distributed IP grade network topology probe method based on multi-protocols |
CN109766176A (en) * | 2018-12-29 | 2019-05-17 | 北京威努特技术有限公司 | A kind of scan progress calculation method and device based on large scale network space exploration |
CN110198309A (en) * | 2019-05-14 | 2019-09-03 | 北京墨云科技有限公司 | A kind of Web server recognition methods, device, terminal and storage medium |
CN110233774A (en) * | 2019-05-28 | 2019-09-13 | 华中科技大学 | A kind of Distributed probing method and system of Socks proxy server |
EP3654219A1 (en) * | 2018-11-14 | 2020-05-20 | Baden-Württemberg Stiftung gGmbH | Determining version information of a network service |
CN111212153A (en) * | 2019-12-26 | 2020-05-29 | 成都烽创科技有限公司 | IP address checking method, device, terminal equipment and storage medium |
CN111475464A (en) * | 2020-03-19 | 2020-07-31 | 重庆邮电大学 | Method for automatically discovering and mining fingerprints of Web component |
CN111597053A (en) * | 2020-05-29 | 2020-08-28 | 广州万灵数据科技有限公司 | Cooperative operation and self-adaptive distributed computing engine |
CN111638964A (en) * | 2020-06-09 | 2020-09-08 | 武汉虹旭信息技术有限责任公司 | Centralized internet data acquisition system and acquisition method |
CN113946566A (en) * | 2021-12-20 | 2022-01-18 | 北京大学 | Web system fingerprint database construction method and device and electronic equipment |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150082438A1 (en) * | 2013-11-23 | 2015-03-19 | Universidade Da Coruña | System and server for detecting web page changes |
CN106888194A (en) * | 2015-12-16 | 2017-06-23 | 国家电网公司 | Intelligent grid IT assets security monitoring systems based on distributed scheduling |
US20170214771A1 (en) * | 2012-02-01 | 2017-07-27 | Aol Advertising Inc. | Systems and methods for identifying a returning web client |
CN107679085A (en) * | 2017-09-01 | 2018-02-09 | 广州大学 | Data grabber algorithm based on search and spiders |
2018-05-11: CN201810446405.6A filed (published as CN108628722A); status: Pending
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170214771A1 (en) * | 2012-02-01 | 2017-07-27 | Aol Advertising Inc. | Systems and methods for identifying a returning web client |
US20150082438A1 (en) * | 2013-11-23 | 2015-03-19 | Universidade Da Coruña | System and server for detecting web page changes |
CN106888194A (en) * | 2015-12-16 | 2017-06-23 | 国家电网公司 | Intelligent grid IT assets security monitoring systems based on distributed scheduling |
CN107679085A (en) * | 2017-09-01 | 2018-02-09 | 广州大学 | Data grabber algorithm based on search and spiders |
Non-Patent Citations (1)
Title |
---|
王永杰等: "基于指纹分析的Web服务探测技术", 《计算机工程》 * |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109495466A (en) * | 2018-11-06 | 2019-03-19 | 郑州云海信息技术有限公司 | A kind of recognition methods and system of unknown miniport service |
EP3654219A1 (en) * | 2018-11-14 | 2020-05-20 | Baden-Württemberg Stiftung gGmbH | Determining version information of a network service |
WO2020099485A1 (en) * | 2018-11-14 | 2020-05-22 | Baden-Württemberg Stiftung Ggmbh | Determining version information of a network service |
CN109617728A (en) * | 2018-12-14 | 2019-04-12 | 中国电子科技网络信息安全有限公司 | A kind of distributed IP grade network topology probe method based on multi-protocols |
CN109766176A (en) * | 2018-12-29 | 2019-05-17 | 北京威努特技术有限公司 | A kind of scan progress calculation method and device based on large scale network space exploration |
CN110198309A (en) * | 2019-05-14 | 2019-09-03 | 北京墨云科技有限公司 | A kind of Web server recognition methods, device, terminal and storage medium |
CN110233774A (en) * | 2019-05-28 | 2019-09-13 | 华中科技大学 | A kind of Distributed probing method and system of Socks proxy server |
CN110233774B (en) * | 2019-05-28 | 2020-12-29 | 华中科技大学 | Detection method, distributed detection method and system for Socks proxy server |
CN111212153A (en) * | 2019-12-26 | 2020-05-29 | 成都烽创科技有限公司 | IP address checking method, device, terminal equipment and storage medium |
CN111475464A (en) * | 2020-03-19 | 2020-07-31 | 重庆邮电大学 | Method for automatically discovering and mining fingerprints of Web component |
CN111475464B (en) * | 2020-03-19 | 2023-04-25 | 重庆邮电大学 | Method for automatically finding and mining fingerprints of Web component |
CN111597053A (en) * | 2020-05-29 | 2020-08-28 | 广州万灵数据科技有限公司 | Cooperative operation and self-adaptive distributed computing engine |
CN111638964A (en) * | 2020-06-09 | 2020-09-08 | 武汉虹旭信息技术有限责任公司 | Centralized internet data acquisition system and acquisition method |
CN113946566A (en) * | 2021-12-20 | 2022-01-18 | 北京大学 | Web system fingerprint database construction method and device and electronic equipment |
CN113946566B (en) * | 2021-12-20 | 2022-03-18 | 北京大学 | Web system fingerprint database construction method and device and electronic equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108628722A (en) | A kind of distributed Web Component services detection system | |
US10977154B2 (en) | Method and system for automatic real-time causality analysis of end user impacting system anomalies using causality rules and topological understanding of the system to effectively filter relevant monitoring data | |
US11182434B2 (en) | Cardinality of time series | |
US9866426B2 (en) | Methods and apparatus for analyzing system events | |
CN107087001B (en) | distributed internet important address space retrieval system | |
Chen et al. | Causeinfer: Automatic and distributed performance diagnosis with hierarchical causality graph in large distributed systems | |
US8051163B2 (en) | Synthetic transactions based on system history and load | |
US20090125496A1 (en) | Network device and method for monitoring of backend transactions in data centers | |
US11159542B2 (en) | Cloud view detection of virtual machine brute force attacks | |
US9740991B2 (en) | Calculating in-flight metrics for non-interruptible business transactions | |
US10673868B2 (en) | Risk based priority processing of data | |
CN108108288A (en) | A kind of daily record data analytic method, device and equipment | |
CN109074454A (en) | Malware is grouped automatically based on artefact | |
CN110213207A (en) | A kind of network security defence method and equipment based on log analysis | |
Chen et al. | Invariants based failure diagnosis in distributed computing systems | |
Natu et al. | Holistic performance monitoring of hybrid clouds: Complexities and future directions | |
US11792157B1 (en) | Detection of DNS beaconing through time-to-live and transmission analyses | |
CN106453320A (en) | Malicious sample identification method and device | |
Ramachandran et al. | Determining configuration parameter dependencies via analysis of configuration data from multi-tiered enterprise applications | |
Zou et al. | Improving log-based fault diagnosis by log classification | |
US7653742B1 (en) | Defining and detecting network application business activities | |
CN114969450B (en) | User behavior analysis method, device, equipment and storage medium | |
CN114422341B (en) | Industrial control asset identification method and system based on fingerprint characteristics | |
CN113572781A (en) | Method for collecting network security threat information | |
Kalamatianos et al. | Domain independent event analysis for log data reduction |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20181009 |