CN114745390B

CN114745390B - Network target information acquisition system and method based on micro-service architecture

Info

Publication number: CN114745390B
Application number: CN202210659009.8A
Authority: CN
Inventors: 王国金; 谢峥; 高庆官
Original assignee: Nanjing Cyber Peace Technology Co Ltd
Current assignee: Nanjing Cyber Peace Technology Co Ltd
Priority date: 2022-06-13
Filing date: 2022-06-13
Publication date: 2022-10-04
Anticipated expiration: 2042-06-13
Also published as: CN114745390A

Abstract

The invention discloses a network target information acquisition system and method based on a micro-service architecture. Multiple engine tools that perform the actual information acquisition on network targets are deployed in a single engine service module; the engine service module is deployed as a micro-service in a Pod of a Kubernetes cluster, and the operation of the Pod replicas is managed by the cluster's replication controller. Each engine tool corresponds to an independent engine execution queue, which is created by the management module when the engine tool is registered and is used to transmit the engine tool's execution information. The management module plans an acquisition flow according to the acquisition task and calls the engine tools involved in the flow in sequence, and balanced scheduling is achieved among the same engine tools in different engine services. The invention can improve the stability, pressure resistance, and maintainability of the system.

Description

Network target information acquisition system and method based on micro-service architecture

Technical Field

The invention relates to a network target information acquisition system and method based on a micro-service architecture, belonging to the technical field of network security and computer software.

Background

A network target information acquisition system uses computer technology to retrieve and collect information about the IP addresses and domain names of devices in a network and analyzes that data, providing effective technical support for device security inspection and network protection. The system's acquisition engine tools consume considerable resources and are strongly affected by network fluctuation, so single-machine deployment has a performance bottleneck; a multi-machine distributed deployment scheme can effectively improve the system's throughput and stability.

Fig. 1 is a schematic diagram of the deployment structure of a current information acquisition system, which mainly includes a Web application module, a management module, message middleware, an engine service module, and the like. The Web application module interacts with the user to manage acquisition tasks. The management module is mainly responsible for task management, engine scheduling, and data analysis. The message middleware carries messages between the management module and the engine service module. The engine service module consists of a large number of independently deployed engine tools running in Docker containers; it is the actual execution environment for the engines called by tasks and is responsible for invoking specific engine tools and returning their results.

The acquisition task scheduling and execution process mainly comprises the following steps:

1. A user enters information such as the task target and type and creates an acquisition task.

2. The management module breaks down the acquisition task, determines which engines the task needs, and arranges the execution steps of the acquisition engines.

3. The management module queries the currently registered engine tools and, according to the registration information, issues engine execution information to the execution queue of the corresponding engine in RabbitMQ.

4. The engine service module consumes the execution queue messages, executes the engine, and sends the execution result to the RabbitMQ result queue.

5. The management module consumes the messages in the execution result queue and analyzes and processes the result data for the user to view.

The existing scheme has the following defects: 1. There are many engine tools, each deployed separately, which makes them difficult to maintain; uninterrupted service cannot be provided, and the system as a whole cannot be monitored well. 2. Engine tool execution places high demands on device resources and network quality; when a process terminates because of an abnormality in either, manual intervention is required and automatic recovery is not possible. 3. The system's fault tolerance is low, load cannot be balanced when traffic surges, and pressure resistance is poor.

Disclosure of Invention

The purpose of the invention is as follows: in view of the problems in the prior art, the present invention aims to provide a network target information acquisition system and method based on a microservice architecture, so as to improve the stability, pressure resistance and maintainability of the system.

The technical scheme is as follows: in order to realize the purpose of the invention, the invention adopts the following technical scheme:

a network target information acquisition system based on micro-service architecture comprises:

the Web application module is used for interacting with a user to complete acquisition task management;

the management module is used for planning an acquisition flow according to an acquisition task created by a user and realizing the scheduling of an engine tool according to the acquisition flow;

a plurality of engine service modules, wherein each engine service module is provided with at least one engine tool; the engine tool is used for realizing information acquisition of a network target;

and, a plurality of message queues comprising:

the task issuing queue is used for transmitting and collecting task related information, a producer of the task issuing queue is a Web application module, and a consumer is a management module;

the engine registration queue is used for transmitting engine tool registration information, a producer of the engine registration queue is an engine service module, and a consumer is a management module; the engine tool registration information at least comprises an engine tool number and an engine service number of an engine service where the engine tool is located;

the engine execution queue is established by the management module when the engine tools are registered and is used for transmitting the execution information of the engine tools, a producer of the engine execution queue is the management module, and a consumer is the engine tool corresponding to the queue;

the task result queue is used for transmitting acquisition task result information, a producer of the task result queue is a data analysis module for processing data acquired by the engine tool, and a consumer is a Web application module;

the management module, the engine service module and the data analysis module are deployed in Pods of the Kubernetes cluster in a micro-service mode, and the replication controller of the Kubernetes cluster is used for managing the running of the Pod replicas.

Preferably, when registering the engine tools, the management module generates a registration identifier for the engine tool of each engine service and sends the registration identifier to the corresponding engine service module, and the engine service module stores the registration identifier, sends heartbeat information at regular time, and maintains the available state of the engine tool.

Preferably, the management module judges whether the engine service where the engine tool is located is alive before issuing the execution task to the engine tool, and issues the task when at least one engine service is alive.

Preferably, the management module, the engine service module and the data analysis module are registered in an apiserver of a Kubernetes cluster and are uniformly monitored and managed by the cluster.

Preferably, the collection process planned according to the collection task comprises the sequence of engine tool calls, the condition for triggering the sub-process and the condition for terminating the sub-process.

Preferably, the Kubernetes cluster automatically scales Pods based on the horizontal Pod autoscaling technique or on request counts.

Preferably, the system optimizes the engine tool combinations in the engine service by counting the frequency of use of the engine tools, and the engine tools with similar frequencies of use are combined into one or more engine services.

Preferably, the engine service module is provided with a probe for checking a network state and a process state, and notifying the cluster master node to perform exception recovery when an exception occurs.

Preferably, the Web application module and the database for storing the acquisition task information and the task result information are deployed on a server outside the Kubernetes cluster, and the Web application module accesses a message queue inside the cluster through the Kubernetes Proxy API and sends the acquisition task to the cluster for processing.

The network target information acquisition method realized by the network target information acquisition system based on the micro service architecture comprises the following steps:

integrating one or more engine tools into different engine service modules, deploying a management module, the engine service module and a data analysis module in Pods of a Kubernetes cluster in a micro-service mode, and configuring the number of Pod replicas or the automatic Pod scaling rules of each module;

after being successfully started, each micro service is registered in the apiserver of the Kubernetes cluster; after the engine service is started, the installed engine tools are scanned, and the engine tool numbers and the engine service numbers are sent to the management module through an engine registration queue;

after receiving the engine tool registration information, the management module creates an engine execution queue corresponding to the engine tool number of each engine tool for transmitting the execution information of the engine tool;

the Web application module receives an acquisition task created by a user and sends the acquisition task to the management module through a task issuing queue;

the management module plans an acquisition flow according to the acquisition task created by the user, arranges the scheduling sequence of the engine tools in the acquisition flow, and writes the execution information of each engine tool into the engine execution queue corresponding to that engine tool's number in the acquisition flow; an idle consumer of the engine execution queue obtains the execution information, performs information acquisition on the network target, and stores the acquired original result data;

and the data analysis module processes the acquired original result data and sends the task result information to the Web application module through the task result queue.

Advantageous effects: compared with the prior art, the invention has the following advantages:

1. The invention manages each engine tool based on a k8s cluster architecture and can provide uninterrupted service. Moreover, based on the HPA and KPA techniques, the cluster's Pods can be elastically scaled out and in from several angles, according to the system's resource usage and the number of service requests, ensuring the stability and pressure resistance of system operation.

2. According to the invention, all functional modules are decoupled and run in the cluster in a micro-service mode, and the number of Pod copies can be increased and the system performance can be improved aiming at module services with higher use frequency and higher pressure.

3. The invention integrates a plurality of engine tools into one engine service, is convenient for maintenance, reduces the number of micro-services to be created, and ensures that the engine service is always operated through the monitoring and the assurance of the apiserver; and the load balance among the same engine tools in a plurality of engine services is realized through the message queue technology.

Drawings

Fig. 1 is a schematic diagram of a conventional network target information acquisition system deployment.

Fig. 2 is a schematic deployment diagram of a network target information acquisition system in an embodiment of the present invention.

FIG. 3 is a flow chart illustrating engine services and engine tool registration in an embodiment of the present invention.

Fig. 4 is a schematic diagram illustrating a collection task creating and implementing process in the embodiment of the present invention.

Fig. 5 is a schematic diagram illustrating task detection flow planning in the embodiment of the present invention.

Detailed Description

The technical solution of the present invention will be clearly and completely described below with reference to the accompanying drawings and specific embodiments.

As shown in fig. 2, the network target information acquisition system based on the micro-service architecture disclosed in the embodiment of the present invention mainly includes a Web application module, a management module, a plurality of engine service modules, and a plurality of message queues. The Web application module interacts with the user to complete acquisition task management. The management module plans an acquisition flow according to an acquisition task created by the user and schedules the engine tools according to that flow. Each engine service is provided with at least one engine tool, and at least one engine service is provided with a plurality of engine tools. The task issuing queue transmits acquisition task information; its producer is the Web application module and its consumer is the management module. The engine registration queue transmits engine tool registration information; its producer is an engine service module and its consumer is the management module, and each engine tool is registered with the management module through this queue at startup. Each engine tool corresponds to one engine execution queue, which is created by the management module when the engine tool is registered and transmits the engine tool's execution information; its producer is the management module and its consumers are the engine tools corresponding to the queue. One queue may have multiple consumers, and an idle consumer of the queue is selected when execution information is issued. The task result queue transmits acquisition task result information; its producer is the data analysis module, which processes the data acquired by the engine tools, and its consumer is the Web application module.
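By way of illustration only, the producer and consumer roles of the four queue types described above can be written down as plain data. This is a minimal sketch: the identifiers (`QueueRole`, `QUEUE_ROLES`, the module names) are hypothetical and do not appear in the embodiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueueRole:
    """One queue type with the modules that write to and read from it."""
    name: str
    producer: str
    consumer: str
    payload: str


# Hypothetical names; the roles themselves follow the description above.
QUEUE_ROLES = [
    QueueRole("task_issuing", "web_application", "management",
              "acquisition task information"),
    QueueRole("engine_registration", "engine_service", "management",
              "engine tool registration information"),
    QueueRole("engine_execution", "management", "engine_tool",
              "engine tool execution information"),
    QueueRole("task_result", "data_analysis", "web_application",
              "task result information"),
]


def queues_consumed_by(module: str) -> list[str]:
    """Return the names of the queue types a module reads from."""
    return [q.name for q in QUEUE_ROLES if q.consumer == module]


def queues_produced_by(module: str) -> list[str]:
    """Return the names of the queue types a module writes to."""
    return [q.name for q in QUEUE_ROLES if q.producer == module]
```

Note that the engine execution queue is a queue type: one such queue exists per registered engine tool.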

In the embodiment of the invention, the modules other than the Web application module are deployed in a distributed manner, as micro-services, in Pods (the smallest unit of deployment and control) of a Kubernetes cluster (k8s for short); their service information is registered in the apiserver, and they are uniformly monitored and managed by the cluster. A replication controller manages the operation of the Pod replicas and ensures that a specified number of Pods is running in the cluster at all times, which keeps the system running stably.

The engine tools realize information acquisition of network targets and are mainly of two types: command-type tools and local tools. A command-type tool is independent of the environment and can be executed in any Docker container. A local tool must be built into a Docker image, because its installation process is complex or because it must run in a specific operating system (Windows, Linux) or language environment (Java, Python, Go, and the like); it is executed in a container created from that image. In the embodiment of the invention, multiple engine tools are combined and installed in one or more engine services according to their operating environment (Windows, Linux, etc.) and their use frequency (the use frequency of each tool is counted, and tools with similar use frequencies are combined), so that resources are allocated reasonably and the total number of micro-services that must be started is reduced.
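The combination rule above (same operating environment, similar use frequency) can be sketched as follows. The embodiment does not specify how "similar" is measured; this sketch assumes the simplest rule, ranking tools by use count within each environment and cutting the ranking into fixed-size groups. The function name, the `max_per_service` parameter, and the tool names in the usage note are all hypothetical.

```python
from collections import defaultdict


def group_engine_tools(tools, max_per_service=3):
    """Group tools into engine services.

    tools: iterable of (tool_id, environment, use_count).
    Returns a list of (environment, [tool_id, ...]), one entry per engine service.
    Assumption: tools adjacent in the use-count ranking count as "similar".
    """
    by_env = defaultdict(list)
    for tool_id, env, uses in tools:
        by_env[env].append((uses, tool_id))

    services = []
    for env in sorted(by_env):
        ranked = sorted(by_env[env], reverse=True)  # most used first
        for i in range(0, len(ranked), max_per_service):
            chunk = ranked[i:i + max_per_service]
            services.append((env, [tool_id for _, tool_id in chunk]))
    return services
```

For example, with `max_per_service=2`, four Linux tools used 900, 850, 40 and 35 times end up in two engine services, one holding the two heavily used tools and one holding the two rarely used ones, so the heavily used service can be given more Pod replicas.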

The embodiment of the invention moves the Web application module outside the cluster, so that multiple Web clients in different domains can access the services by remote calls, which suits scenarios requiring data isolation among multiple clients. The Web application together with MySQL (which stores acquisition task information; another database may be used instead) is deployed on a server outside the cluster, accesses RabbitMQ (another message queue may be used instead) inside the cluster through the Kubernetes Proxy API, and sends tasks to the cluster for processing.

A k8s cluster is built with one server as the master node and several servers as worker nodes, and IPVS is used for request forwarding. All modules of the system are stored in the cluster as Docker images, and Pod replicas are created and started from those images when the system is used. The number of running Pod replicas is controlled by the replicas parameter in the configuration file; the replication controller manages the started Pods, closes abnormal Pods, and creates new ones, so that a specified number of Pod replicas is always running in the cluster.
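The behavior attributed to the replication controller above (close abnormal Pods, create new ones, keep the configured count running) amounts to a reconciliation step. The following is a simplified model for illustration, not the Kubernetes implementation; the function name and the Pod naming scheme are assumptions.

```python
def reconcile(pods, desired_replicas):
    """One reconciliation pass.

    pods: dict mapping pod name -> True if the pod is healthy.
    Returns (kept, created): healthy pods that keep running and the names of
    replacement pods to start so that the total equals desired_replicas.
    Assumption: new pods are named "pod-<n>", numbered after existing ones.
    """
    healthy = sorted(name for name, ok in pods.items() if ok)
    kept = healthy[:desired_replicas]          # surplus healthy pods are dropped
    missing = desired_replicas - len(kept)
    first_new = len(pods)
    created = [f"pod-{first_new + i}" for i in range(missing)]
    return kept, created
```

With `replicas` set to 3 and one of three Pods abnormal, a pass keeps the two healthy Pods and starts one replacement.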

Each module in the embodiment of the invention can adopt the following scheme to perform Pod automatic expansion and contraction according to actual conditions.

Based on HPA (Horizontal Pod Autoscaler), the horizontal Pod autoscaling technique: scaling is performed automatically by monitoring whether certain metrics of a Pod (CPU utilization, disk usage, and the like) cross a specified threshold. For example, upper and lower limits on the number of Pod replicas and a threshold range for CPU utilization or disk utilization are set; the Pods are scaled out when the metric rises above the range and scaled in when it falls below it.
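For reference, the replica calculation documented for the Kubernetes Horizontal Pod Autoscaler is desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), skipped when the ratio is within a tolerance of 1 and bounded by the configured minimum and maximum. The sketch below restates that formula; the function and parameter names are illustrative, and the 0.1 tolerance is the documented default rather than a value taken from the embodiment.

```python
import math


def hpa_desired_replicas(current_replicas, current_metric, target_metric,
                         min_replicas, max_replicas, tolerance=0.1):
    """Return the replica count suggested by the HPA scaling formula."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas                 # close enough: do not scale
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))
```

For example, four replicas averaging 90% CPU against a 60% target scale out to six; the same four replicas at 20% CPU scale in to the configured minimum.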

Based on KPA (Knative Pod Autoscaler): this is suitable for scaling out when the usage of individual engine tools in an engine service rises suddenly. The process is similar to HPA: upper and lower limits on the number of Pod replicas and a threshold on the number of requests are set.
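A request-count rule of this kind can be sketched as dividing the observed number of in-flight requests by a per-Pod target and bounding the result. This is a simplified illustration of the idea only; it omits the averaging windows and panic mode of the actual Knative autoscaler, and all names are hypothetical.

```python
import math


def request_based_replicas(in_flight_requests, target_per_pod,
                           min_replicas, max_replicas):
    """Return the replica count needed to keep each pod at or under its target."""
    if target_per_pod <= 0:
        raise ValueError("target_per_pod must be positive")
    wanted = math.ceil(in_flight_requests / target_per_pod)
    return max(min_replicas, min(max_replicas, wanted))
```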

A startup probe (StartProbe) is added to each Pod replica to check the network state, the service process state, and the like, and to notify the master node promptly when an exception occurs so that the abnormal Pod is recovered.

The invention can also construct physical domains and logical domains. A physical domain means that the system is deployed on multiple clusters, for example using domestic clusters and foreign clusters to perform information acquisition or differential comparison on domestic and foreign targets. A logical domain means that one cluster is split into multiple logical clusters that are isolated from one another, for example isolating the public network from a company intranet or isolating the intranets of different companies from one another, which improves the diversity of use and the security of the system.

The method for realizing network target information acquisition based on the above system mainly comprises the following steps: integrating one or more engine tools into different engine service modules, deploying the management module, the engine service modules and the data analysis module in Pods of a Kubernetes cluster in a micro-service mode, and configuring the number of Pod replicas or the automatic Pod scaling rules of each module; after being successfully started, each micro-service is registered in the apiserver of the Kubernetes cluster; after an engine service is started, the installed engine tools are scanned, and the engine tool numbers and the engine service number are sent to the management module through the engine registration queue; after receiving the engine tool registration information, the management module creates an engine execution queue corresponding to the engine tool number of each engine tool for transmitting the execution information of the engine tool;

the Web application module receives an acquisition task created by a user and sends the acquisition task to the management module through the task issuing queue; the management module plans an acquisition flow according to the acquisition task, arranges the scheduling sequence of the engine tools in the acquisition flow, and writes the execution information of each engine tool into the engine execution queue corresponding to that engine tool's number in the acquisition flow; an idle consumer of the engine execution queue obtains the execution information, performs information acquisition on the network target, and stores the acquired original result data; and the data analysis module processes the acquired original result data and sends the task result information to the Web application module through the task result queue.

The following describes in detail the registration process, the creation of the information collection task, and the implementation process involved in the embodiments of the present invention with reference to fig. 3 to 5.

In the embodiment of the invention, the registration management of the engine service and the engine tool is completed by adopting the k8s native apiserver and combining the rabbitMQ message queue, and the specific registration is as shown in FIG. 3.

Engine service registration: multiple engine tools with the same running environment are integrated and installed into one engine service image, a Pod replica is created and run from that image, and the Pod replica is registered in the apiserver after it starts successfully.

Engine tool registration: engine tool registration begins after engine service registration is complete. The engine service scans the engine tools installed in its environment and sends the relevant information, such as each engine tool number and the engine service number of the engine service where the tool is located (automatically generated, without repetition, each time an engine service starts), to the management module through the RabbitMQ engine registration queue. The management module generates a registration identifier and creates an independent engine execution queue for each engine tool; the queue name may be formed from the engine tool number and a random identifier (automatically generated at each registration). The management module stores the registration information in memory; the stored content includes the engine tool number, engine service number, engine execution queue name, registration identifier, and the like. The management module sends the registration identifier and the engine execution queue information back to the engine service, and the engine service stores the registration identifier and sends heartbeat information periodically to maintain the available state of the registered engine tool.
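The management-module side of this registration exchange can be sketched as a small in-memory registry. The sketch rests on several assumptions not stated in the embodiment: an engine tool registered from a second engine service reuses the execution queue created at its first registration (so that the queue can have several consumers), a registration counts as alive if its last heartbeat is newer than a fixed timeout, and the class, method and parameter names are hypothetical. No message broker is involved here.

```python
import time
import uuid


class EngineRegistry:
    """In-memory record of engine tool registrations and heartbeats."""

    def __init__(self, heartbeat_timeout=30.0, clock=time.monotonic):
        self._timeout = heartbeat_timeout
        self._clock = clock
        self._queues = {}    # engine tool number -> execution queue name
        self._entries = {}   # registration identifier -> stored record

    def register(self, tool_no, service_no):
        """Store a registration; return (registration identifier, queue name)."""
        if tool_no not in self._queues:
            # Queue name rule: engine tool number plus a random identifier.
            self._queues[tool_no] = f"{tool_no}.{uuid.uuid4().hex[:8]}"
        reg_id = uuid.uuid4().hex
        self._entries[reg_id] = {
            "tool_no": tool_no,
            "service_no": service_no,
            "queue": self._queues[tool_no],
            "last_seen": self._clock(),
        }
        return reg_id, self._queues[tool_no]

    def heartbeat(self, reg_id):
        """Refresh the liveness timestamp of one registration."""
        self._entries[reg_id]["last_seen"] = self._clock()

    def live_services(self, tool_no):
        """Engine service numbers that run tool_no and have a recent heartbeat."""
        now = self._clock()
        return sorted(
            e["service_no"] for e in self._entries.values()
            if e["tool_no"] == tool_no and now - e["last_seen"] <= self._timeout
        )
```

The `clock` parameter exists only so that the timeout logic can be exercised without waiting in real time.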

FIG. 4 illustrates a network object information collection task creation and implementation process. The method specifically comprises the following steps:

(1) Acquisition task creation and distribution

In the embodiment of the invention, one or more types of network target information, such as IP addresses, domain names, and enterprise keywords, can be acquired. When a user creates a task on the system page, in addition to the default engine tools associated with each task type (such as whois and domain name/IP reverse lookup), the user can select further engine tools to be used in the scanning process, for example engine tools that acquire geographic location information, operating system information, vulnerability information, open ports and protocols, and the like. After the user confirms creation of the task, the task information is sent to the management module through the RabbitMQ task issuing queue.

(2) Generating a probing flow

The default engine tools and the engine tools selected by the user are associated according to the task type, and planning of the detection flow is completed before the task starts to execute; the plan includes the order of engine calls, the conditions for triggering a sub-flow, the conditions for terminating a sub-flow, and the like. As shown in fig. 5, an acquisition task is created for network target X: engine tool 1 collects a result data set A and, if there is a result, engine tool 2 and engine tool 4 are applied to A to perform the next round of information acquisition. In addition, more intelligent detection flow planning can be supported: users' behavior habits with engine tools, such as how they screen detection results, are analyzed, and the task's detection flow is arranged through guided selection, reducing the steps the user must enter manually.
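A detection flow of this shape can be represented as a small tree in which each node names an engine tool, an optional trigger condition for its sub-flow, and an optional termination condition. The sketch below walks such a tree; the dictionary layout, the `run_flow` and `run_tool` names, and the default trigger ("any non-empty result") are assumptions made for illustration.

```python
def run_flow(step, target, run_tool):
    """Walk a detection flow and return the ordered list of (tool, input) calls.

    step: dict with key "tool" and optional keys "trigger", "stop", "children".
    run_tool: callable (tool, input) -> list of results; it stands in for an
    engine tool execution.
    """
    calls = []

    def visit(node, inputs):
        trigger = node.get("trigger", bool)          # default: non-empty result
        stop = node.get("stop", lambda results: False)
        for item in inputs:
            calls.append((node["tool"], item))
            results = run_tool(node["tool"], item)
            if stop(results) or not trigger(results):
                continue                             # sub-flow not entered
            for child in node.get("children", []):
                visit(child, results)

    visit(step, [target])
    return calls
```

In the fig. 5 example, the root node would name engine tool 1 with engine tools 2 and 4 as children: if engine tool 1 returns results for target X, each result is passed to engine tool 2 and then to engine tool 4; if it returns nothing, the flow ends after the first call.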

(3) Engine execution task issuing

Before the task is issued, the management module splits the task detection flow into execution information for multiple independent engine tools, comprising acquisition task information (acquisition target, priority, and the like) and engine tool execution information (engine tool number, input parameters, and the like).

When a task is issued, the engine execution information produced in the preceding step is issued item by item. The management module first checks the running state of the engine services that host the engine tool; when at least one such engine service is alive, the execution information of the engine tool is sent, through the tool's independent engine execution queue, to one of the live engine services running that engine tool.
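The two parts of step (3), splitting the flow into per-tool execution messages and issuing each message only when a live engine service exists, can be sketched as follows. The message fields mirror the items named above; the function names, the `live_services` and `publish` callables, and the choice to hold back messages that cannot be issued are assumptions.

```python
def build_execution_messages(task, flow_tools):
    """Split a task into one execution message per engine tool.

    task: dict with "id", "target" and optional "priority".
    flow_tools: ordered list of (engine tool number, input parameters).
    """
    return [
        {
            "task_id": task["id"],
            "target": task["target"],
            "priority": task.get("priority", 0),
            "tool_no": tool_no,
            "params": params,
        }
        for tool_no, params in flow_tools
    ]


def issue(messages, live_services, publish):
    """Publish each message whose engine tool has a live engine service.

    live_services: callable tool_no -> list of live engine service numbers.
    publish: callable (tool_no, message) that writes to that tool's queue.
    Returns the messages that were held back.
    """
    held = []
    for msg in messages:
        if live_services(msg["tool_no"]):
            publish(msg["tool_no"], msg)
        else:
            held.append(msg)
    return held
```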

(4) Load balancing scheduling

Load balancing can be divided into two aspects, system module load balancing and engine tool load balancing. The apiserver is responsible for load balancing of the system's modules: for example, the management module and the data analysis module are each load balanced across multiple Pod replicas with the same function. RabbitMQ is responsible for load balancing among the same engine tools inside multiple engine services. For example, if engine service 1 runs engine tool a and engine tool b, and engine service 2 runs engine tool a and engine tool c, balanced scheduling of engine tool a across the two engine services is achieved through its independent engine execution queue.
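The effect of several consumers sharing one engine execution queue can be modeled without a broker. The sketch below is a deliberately simplified model, assuming every consumer finishes its message before its next turn, so that the next idle consumer is always the one that has waited longest; the function name and the service labels are hypothetical.

```python
from collections import Counter, deque


def dispatch(messages, consumers):
    """Assign each message to the next idle consumer of a shared queue.

    Returns a list of (message, consumer) pairs in delivery order.
    """
    idle = deque(consumers)
    assignment = []
    for msg in messages:
        consumer = idle.popleft()        # longest-idle consumer takes the message
        assignment.append((msg, consumer))
        idle.append(consumer)            # assumed to finish before its next turn
    return assignment


def load_per_consumer(assignment):
    """Count how many messages each consumer received."""
    return Counter(consumer for _, consumer in assignment)
```

In the example above, engine tool a has one queue with two consumers, one in engine service 1 and one in engine service 2, so four execution messages for engine tool a are split two and two between the services.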

(5) Data storage

Data storage in the embodiment uses an Elasticsearch cluster plus FastDFS, with the acquisition task id and the engine tool id as the primary keys;

the high-performance key-value database Redis is used as a cache and stores data that changes very rarely and is used very frequently, such as the default detection flow of a task type and user dictionary table data. It also holds temporary process data that occupies considerable resources or affects the system's subsequent scheduling and is therefore unsuited to storage in local memory or in the database, such as records of how many of a task's engine tools have finished executing.
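Recording how many of a task's engine tools have finished can be sketched as a pair of counters per task. In this illustration a plain dictionary stands in for the key-value store so that the example runs on its own; the key layout and the class and method names are assumptions, not part of the embodiment.

```python
class ProgressTracker:
    """Track per-task engine tool completion in a key-value store."""

    def __init__(self, store=None):
        # A dict stands in for the key-value cache in this sketch.
        self._store = {} if store is None else store

    def start(self, task_id, total_tools):
        """Record how many engine tool executions the task consists of."""
        self._store[f"task:{task_id}:total"] = total_tools
        self._store[f"task:{task_id}:done"] = 0

    def tool_finished(self, task_id):
        """Count one finished execution; return True when the task is complete."""
        done_key = f"task:{task_id}:done"
        self._store[done_key] += 1
        return self._store[done_key] >= self._store[f"task:{task_id}:total"]

    def progress(self, task_id):
        """Return (finished, total) for the task."""
        return (self._store[f"task:{task_id}:done"],
                self._store[f"task:{task_id}:total"])
```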

(6) Data analysis

Data analysis is performed by a Java program, which converts and cleans the original result data acquired by the engine tools to obtain the data the user requires; the data is sent to the Web application module through the RabbitMQ task result queue and stored in the MySQL database.

For scenarios with large data volumes, more efficient distributed data processing components can be used, such as Kafka in place of RabbitMQ and HDFS in place of FastDFS, and the data analysis module can be developed on a distributed data processing engine such as Spark or Flink, improving processing performance for large data volumes.

Claims

1. A network target information acquisition system based on micro-service architecture is characterized by comprising:

a plurality of engine service modules, wherein each engine service module is provided with at least one engine tool; the engine tool is used for realizing information acquisition of a network target; a plurality of engine tools with the same operating environment are integrated and installed into an engine service Docker image, and a Kubernetes cluster Pod replica is created based on the engine service Docker image; the system optimizes engine tool combinations in the engine services by counting the use frequency of the engine tools, and the engine tools with similar use frequencies are combined into one or more engine services;

and, a plurality of message queues comprising:

the engine execution queue is established by the management module when the engine tools are registered and is used for transmitting the execution information of the engine tools, the producer of the engine execution queue is the management module, and the consumer is the engine tool corresponding to the queue; load balancing among the same engine tools in a plurality of engine services is realized through an engine execution queue;

2. The microservice-architecture-based network target information acquisition system according to claim 1, wherein the management module generates a registration identifier for each engine tool of the engine service when performing engine tool registration, and sends the registration identifier to the corresponding engine service module, and the engine service module stores the registration identifier, and sends heartbeat information periodically to keep the available state of the engine tool.

3. The micro-service architecture based network target information collection system of claim 1, wherein the management module determines whether engine services of the engine tool are alive before issuing the execution task to the engine tool, and issues the task when at least one engine service is alive.

4. The micro-service architecture based network target information acquisition system according to claim 1, wherein the management module, the engine service module and the data analysis module are registered in the apiserver of the Kubernetes cluster and are uniformly monitored and managed by the cluster.

5. The micro-service architecture based network target information acquisition system according to claim 1, wherein the acquisition flow planned according to the acquisition task includes a sequence of engine tool calls, a condition for triggering the sub-flow, and a condition for terminating the sub-flow.

6. The micro-service architecture based network target information acquisition system according to claim 1, wherein the Kubernetes cluster automatically scales Pods based on the horizontal Pod autoscaling technique or on the number of requests.

7. The micro-service architecture based network target information acquisition system according to claim 1, wherein the engine service module is provided with a probe for checking a network status and a process status, and notifying the cluster master node of an exception recovery when an exception occurs.

8. The micro-service architecture based network target information acquisition system according to claim 1, wherein the Web application module and the database storing the acquisition task information and the task result information are deployed on a server outside the Kubernetes cluster, and the Web application module accesses a message queue inside the cluster through the Kubernetes Proxy API to send the acquisition task to the cluster for processing.

9. The network target information collection method realized based on the network target information collection system based on the micro-service architecture according to any one of claims 1 to 8, characterized by comprising the following steps:

integrating one or more engine tools into different engine service modules, deploying a management module, an engine service module and a data analysis module in Pods of a Kubernetes cluster in a micro-service mode, and configuring the number of Pod replicas or the automatic Pod scaling rules of each module;