Disclosure of Invention
The invention aims to provide a cross-domain operation and maintenance system based on a unidirectional gatekeeper environment, so as to solve the problem of monitoring resource status across network ends of different security levels during the comprehensive operation and maintenance of a large-scale enterprise data center.
The technical solution that achieves this purpose is as follows: a cross-domain operation and maintenance system based on a unidirectional gatekeeper environment comprises n subnet operation and maintenance management systems (n being a natural number), a unidirectional gatekeeper, and a core network operation and maintenance management system;
the subnet operation and maintenance management system is used for sending operation and maintenance data outwards through the unidirectional gatekeeper;
the subnet operation and maintenance management system is also used for monitoring the operation and maintenance state of the equipment in its own data center, and for receiving operation and maintenance monitoring data from the data aggregation centers (a data aggregation center can be understood as a machine room) that belong to that data center but are located in different geographical positions;
the core network operation and maintenance management system is used for monitoring the operation and maintenance state of the equipment in its own data center and for receiving operation and maintenance monitoring data of other data centers through the unidirectional gatekeeper;
the operation and maintenance data of the data centers monitored by the subnet ends can be viewed in the core network operation and maintenance management system, whereas each subnet operation and maintenance management system can view only the operation and maintenance data of the data center monitored by its own network end;
the unidirectional gatekeeper permits communication in one direction only and is responsible for receiving the operation and maintenance monitoring data of the various resources of each subnet operation and maintenance management system and uploading that data to the core network operation and maintenance management system.
The system can run on a single server or on m servers, where m is a natural number greater than or equal to 2;
when the system is deployed on a single server, it monitors the operation and maintenance state of a single data center;
when m corresponding systems are deployed on the m servers, they monitor the operation and maintenance states of two or more independent data centers.
A data center is a collection of large-scale IT resources (computing devices, storage devices, network devices, etc.) and may be understood as a collection of multiple machine rooms.
The security level of the subnet operation and maintenance management system is lower than that of the core network operation and maintenance management system.
Each subnet operation and maintenance management system comprises a data acquisition module and an alarm management module;
the subnet operation and maintenance management system manages itself in a regional autonomy mode: its data acquisition module receives the basic information, state information and operation and maintenance data of the various resource devices of each data aggregation center within the scope of the subnet, merges the received data and converts it to a uniform format, extracts the state information of the resources, forms the comprehensive operation and maintenance situation of the subnet and displays it graphically, and pushes the data both inwards and outwards;
the alarm management module is used for defining alarm thresholds and sending alarm information.
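The merging and uniform formatting performed by the data acquisition module might be sketched as follows. This is a minimal illustration only: the alias table, the key names (`cpu`, `cpu_pct`, `cpu_utilization`, and so on) and the assumption that different data aggregation centers report the same metric under different names are hypothetical and are not specified by the invention.

```python
from typing import Any

# Hypothetical aliases: different aggregation centers may name the same metric differently.
FIELD_ALIASES = {
    "cpu": "cpu_utilization",
    "cpu_pct": "cpu_utilization",
    "mem": "memory_utilization",
    "disk": "disk_utilization",
}


def normalize(record: dict[str, Any], center: str) -> dict[str, Any]:
    """Rename known aliases to one canonical key and tag the record with its source center."""
    out = {FIELD_ALIASES.get(key, key): value for key, value in record.items()}
    out["aggregation_center"] = center
    return out


def merge(reports: dict[str, list[dict[str, Any]]]) -> list[dict[str, Any]]:
    """Merge the per-center record lists into one uniformly formatted list."""
    return [normalize(rec, center) for center, recs in reports.items() for rec in recs]
```

Tagging each record with its data aggregation center at this stage keeps the origin available later, when an alarm source has to be located.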
The various resource devices comprise hardware servers, virtual machines, storage devices, switches, routers, middleware, containers, software and application programs;
the basic information comprises the equipment model, configuration information, purchase and usage information, production usage information and location information of the resources;
the state information comprises the running state of the resource (normal/abnormal) and its alarm state (whether an alarm is raised, and the alarm level);
the operation and maintenance data comprise CPU utilization, disk utilization, memory utilization and port traffic.
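The three kinds of information listed above could be represented by a record structure such as the one below. It is a sketch under stated assumptions: the class and field names, the convention that alarm level 0 means "no alarm", and the units are all hypothetical, and only a subset of the basic information fields is shown.

```python
from dataclasses import dataclass
from enum import Enum


class RunState(Enum):
    NORMAL = "normal"
    ABNORMAL = "abnormal"


@dataclass
class BasicInfo:
    model: str
    configuration: str
    location: str


@dataclass
class StateInfo:
    run_state: RunState
    alarm_level: int = 0  # assumption: 0 means no alarm, higher means more severe

    @property
    def alarming(self) -> bool:
        return self.alarm_level > 0


@dataclass
class OpsData:
    cpu_utilization: float     # percent
    disk_utilization: float    # percent
    memory_utilization: float  # percent
    port_traffic: float        # unit is an assumption, e.g. Mbit/s


@dataclass
class ResourceRecord:
    resource_type: str  # e.g. "hardware server", "switch", "virtual machine"
    basic: BasicInfo
    state: StateInfo
    ops: OpsData
```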
Pushing data inwards means that the data acquisition module of the subnet operation and maintenance management system pushes the processed operation and maintenance data to the alarm management module, where the data is compared with the alarm thresholds defined by that module; for data outside the threshold range, alarm information is sent automatically and the alarm source is located, that is, pinpointed to the position of the faulty equipment, which assists unified resource planning within the scope of the subnet.
An alarm threshold is a value range defined for a monitoring index item of a resource: if the actual value collected by monitoring lies within the defined range, the system judges the resource state to be normal; if the actual value lies outside the defined range, the system judges the resource state to be abnormal and raises an alarm.
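The range comparison described above can be illustrated with a short sketch. The `Threshold` class, the `check` function and the metric names are hypothetical; treating the range bounds as inclusive is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    low: float
    high: float

    def is_normal(self, value: float) -> bool:
        # Assumption: the bounds themselves count as normal.
        return self.low <= value <= self.high


def check(metrics: dict[str, float], thresholds: dict[str, Threshold]) -> list[str]:
    """Return the monitored items whose collected value falls outside the defined range."""
    return [
        name
        for name, value in metrics.items()
        if name in thresholds and not thresholds[name].is_normal(value)
    ]
```

For instance, with a CPU range of 0 to 85 percent, a collected value of 91 percent would be reported as abnormal, while a value of exactly 85 percent would not.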
Pushing data outwards means that the subnet operation and maintenance management system pushes the processed operation and maintenance data to the unidirectional gatekeeper, through which the data finally reaches the core network operation and maintenance management system.
The core network operation and maintenance management system collects the formatted state information and operation and maintenance data of the various resource devices reported by each subnet operation and maintenance management system, extracts the state information of the resources, forms the comprehensive operation and maintenance situation of the whole network, displays it graphically, and provides an alarm function.
The core network operation and maintenance management system comprises a data acquisition module and an alarm management center module;
the data acquisition module of the core network operation and maintenance management system pushes operation and maintenance data to the alarm management center module, where the data is compared with the alarm thresholds defined by that module; for data outside the threshold range, alarm information is sent automatically and the alarm source is located, that is, pinpointed to a specific data aggregation center under a specific network, which assists unified resource planning across the whole network.
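Locating an alarm source down to a data aggregation center under a network might be sketched as a simple grouping step at the core network end. The report fields (`subnet`, `aggregation_center`, `device`, `run_state`) are hypothetical names chosen for illustration.

```python
from collections import defaultdict


def locate_alarm_sources(reports: list[dict]) -> dict[tuple[str, str], list[str]]:
    """Group abnormal devices by (subnet, data aggregation center)."""
    sources: dict[tuple[str, str], list[str]] = defaultdict(list)
    for report in reports:
        if report["run_state"] == "abnormal":
            key = (report["subnet"], report["aggregation_center"])
            sources[key].append(report["device"])
    return dict(sources)
```

The resulting keys identify which aggregation center of which subnet should be flagged on the whole-network situation display.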
The system accomplishes cross-domain operation and maintenance by executing the following steps:
step 1, the cross-domain operation and maintenance system is installed and deployed at the high-security core network end and at each low-security subnet end, wherein the core network operation and maintenance management system opens an operation and maintenance situation data receiving interface, the subnet operation and maintenance management system opens an operation and maintenance situation pushing interface, and the two are linked through the unidirectional gatekeeper;
step 2, after the system deployment is finished, monitoring items and alarm thresholds are configured according to the type of resource being monitored; for example, the monitoring items configured for computing resources are CPU utilization, memory utilization, disk utilization and the like, and the alarm thresholds are set to 85% for CPU utilization, 80% for memory utilization and 90% for disk utilization; when the system observes a collected CPU utilization greater than 85%, a memory utilization greater than 80%, or a disk utilization greater than 90%, an alarm is generated;
step 3, after the system is put into operation, the subnet operation and maintenance management system collects the resource state information of each data aggregation center within its jurisdiction, compares it with the alarm thresholds defined by the alarm management module, automatically sends alarm information for data outside the threshold range, and locates the alarm source, that is, pinpoints the position of the faulty equipment; at the same time it forms the resource situation of each site, organized by data aggregation center, and displays it graphically;
step 4, the subnet operation and maintenance management system extracts the basic information and the state information of the monitored resource equipment and sends them to the unidirectional gatekeeper; after receiving the data, the unidirectional gatekeeper writes it to a subnet operation and maintenance information file, encrypts the file and stores it;
step 5, the core network operation and maintenance management system accesses the subnet operation and maintenance information file at the location designated by the unidirectional gatekeeper, decrypts and parses the file, and then displays all the data graphically to form a whole-network comprehensive operation and maintenance situation map; at the same time, it cooperatively locates the operation and maintenance faults of each subnet end, that is, pinpoints them to a specific data aggregation center under a specific network, and shows exception prompts on the whole-network comprehensive operation and maintenance situation map.
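The file handoff in steps 4 and 5 can be sketched as below. The text does not name a file format or an encryption algorithm, so JSON is assumed here purely for illustration, and the cipher is passed in as a callable rather than invented; all function names are hypothetical. A real deployment would supply a vetted encryption routine.

```python
import json
from pathlib import Path
from typing import Callable

# A cipher is any function from bytes to bytes; the actual algorithm is not specified.
Cipher = Callable[[bytes], bytes]


def gatekeeper_store(records: list[dict], path: Path, encrypt: Cipher) -> None:
    """Step 4: serialize the subnet records, encrypt them, and store the file."""
    plaintext = json.dumps(records).encode("utf-8")
    path.write_bytes(encrypt(plaintext))


def core_load(path: Path, decrypt: Cipher) -> list[dict]:
    """Step 5: read the file from the designated location, then decrypt and parse it."""
    plaintext = decrypt(path.read_bytes())
    return json.loads(plaintext.decode("utf-8"))
```

Because the only shared artifact is the stored file, the core network end never opens a connection towards the subnet end, which matches the one-way data flow the gatekeeper enforces.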
To address the problems in the prior art, namely that the data of data centers in different geographical positions and at different security levels cannot be accessed directly and that whole-network resource monitoring cannot be performed from a single network end, the invention provides a cross-domain operation and maintenance system based on a unidirectional gatekeeper environment. A unidirectional gatekeeper is placed between networks of different security levels to restrict the direction of data flow, and the low-security network end transmits data to the high-security network end through the unidirectional gatekeeper. The high-security network end analyzes and displays the data, sends fault handling schemes and comprehensive resource planning schemes for each subnet to the corresponding subnet operation and maintenance management system, and hands them to the operation and maintenance personnel at the subnet end for execution. The system achieves comprehensive monitoring of resources across security levels and across domains; it covers a wider range of monitored resources, resolves faults more accurately and efficiently, and allocates resources more reasonably, and it can promote the intelligent and unattended operation and maintenance of enterprise data centers.
Advantageous effects: compared with the prior art, the invention has the following notable advantages: the network domain covered by resource monitoring is enlarged; a comprehensive operation and maintenance management system that could originally monitor only the resources of its own network is extended to monitor resources in different regions and in networks of different security levels; the accuracy of fault location and the rationality of resource allocation are improved; and the intelligent development of operation and maintenance management is promoted.
Examples
Fig. 1 is a system composition diagram of a cross-domain operation and maintenance system based on a unidirectional gatekeeper environment according to the present invention.
When a cloud data center intends to adopt the operation and maintenance system of the invention, the number of its network ends and the operating state of its equipment are not known in advance. This example therefore assumes 3 network ends (a core network end, subnet end A and subnet end B) and uses 4 typical cases for illustration.
Communication among the 3 network ends is established, and cross-domain operation and maintenance systems are deployed at the high-security core network end and at the low-security subnet ends respectively, with the core network end enabling the whole-network comprehensive operation and maintenance situation function, that is, the operation and maintenance situation data receiving interface. Monitoring is configured at each network end for resource equipment such as computing, storage, network and security devices.
Case 1: after the data center has been in use for some time, equipment monitored by the core network end fails; for example, a computing resource server goes down and cannot restart automatically. The core network end operation and maintenance management system sends detailed alarm information, including the basic information of the computing resource, the time of the failure, the failure type, the failure duration and the failure state. At the same time, the whole-network comprehensive operation and maintenance situation monitoring interface raises an alarm indicating that a computing resource server at the core network end has failed. The alarm is not released until the operation and maintenance personnel at the core network end confirm that the fault has been resolved.
Case 2: after the data center has been in use for some time, equipment monitored by subnet end A fails; for example, a switch goes down and cannot be used. The operation and maintenance management system of subnet end A sends detailed alarm information, including the basic information of the switch, the time of the failure, the failure type, the scope of impact, the failure duration and the failure state. At the same time, the whole-network comprehensive operation and maintenance situation monitoring interface at the core network end raises an alarm indicating that a switch at subnet end A has failed, and the operation and maintenance personnel at the core network end can view the exact position and the scope of impact of the fault on the topology map. When the fault at subnet end A has been resolved and operation and maintenance personnel have confirmed this in the system, the alarms at both subnet end A and the core network end are released; if the alarm remains unreleased beyond a certain time threshold, the operation and maintenance management system of the core network end sends a service notice to the operation and maintenance management system of subnet end A, which in turn sends a service reminder to the responsible operation and maintenance personnel.
Case 3: after the data center has been in use for some time, equipment monitored by subnet end A fails; for example, a virtual machine becomes unusable, but the problem can be solved simply by restarting it. The operation and maintenance management system of subnet end A sends detailed alarm information, including the basic information of the virtual machine, the time of the failure, the failure type, the scope of impact, the failure duration and the failure state. The operation and maintenance management system of subnet end A automatically selects a fault solution according to the failure type and sends an instruction to the corresponding service management system, which automatically executes the corresponding script according to the received instruction. The core network end also reflects and records the course of the fault from occurrence to resolution.
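The automatic selection of a fault solution in Case 3 might be sketched as a lookup from failure type to remediation script. The failure type key, the script name and the instruction fields below are hypothetical; the text does not specify how solutions are stored or how instructions are encoded.

```python
from typing import Optional

# Hypothetical mapping from failure type to the script the service management system runs.
REMEDIATIONS = {
    "vm_unresponsive": "restart_vm.sh",
}


def select_remediation(fault_type: str) -> Optional[str]:
    """Return the script for a known failure type, or None if manual handling is needed."""
    return REMEDIATIONS.get(fault_type)


def build_instruction(fault: dict) -> Optional[dict]:
    """Build the instruction sent to the service management system, if one applies."""
    script = select_remediation(fault["type"])
    if script is None:
        return None
    return {"target": fault["device"], "script": script}
```

Failure types without an entry fall through to `None`, leaving them to operation and maintenance personnel as in Case 2.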
Case 4: after the data center has been in use for some time, the utilization of equipment resources monitored by subnet end B becomes too high and resources run short; for example, the disk utilization of a storage device C reaches 90%. The operation and maintenance management system of subnet end B sends early warning information, including the basic information of the storage device, its resource utilization and the scope of impact. At the same time, the whole-network comprehensive operation and maintenance situation monitoring interface at the core network end issues an early warning indicating that the disk space of storage device C at subnet end B is running short. Once the resource shortage at subnet end B has been resolved, the early warnings at both subnet end B and the core network end are released; if the early warning remains unreleased beyond a certain time threshold, the operation and maintenance management system of the core network end sends a service notice to the operation and maintenance management system of subnet end B together with a planning suggestion, and the operation and maintenance management system of subnet end B sends a service reminder to the responsible operation and maintenance personnel.
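The time-threshold check that triggers a service notice in Cases 2 and 4 could be expressed as follows. This is a sketch only: the `Alarm` fields and the function name are hypothetical, and the length of the time threshold is left to the caller because the text does not give a value.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Alarm:
    subnet: str
    device: str
    raised_at: datetime
    released_at: Optional[datetime] = None  # None while the alarm is still active


def needs_notice(alarm: Alarm, now: datetime, limit: timedelta) -> bool:
    """True if the alarm is still unreleased after the time limit has elapsed."""
    return alarm.released_at is None and now - alarm.raised_at > limit
```

The core network end would evaluate this periodically for each active alarm and send a service notice to the owning subnet when it returns true.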
The invention provides a cross-domain operation and maintenance system based on a unidirectional gatekeeper environment, and there are many methods and ways of implementing this technical solution. The above description is only a preferred embodiment of the invention. It should be noted that those skilled in the art can make several improvements and modifications without departing from the principle of the invention, and these improvements and modifications should also be regarded as falling within the protection scope of the invention. All components not specified in this embodiment can be implemented using the prior art.