CN115766715B - Super-fusion cluster monitoring method and system - Google Patents

Publication number
CN115766715B
Authority
CN
China
Prior art keywords
monitoring
virtual machine
super
service
configuration file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211331535.8A
Other languages
Chinese (zh)
Other versions
CN115766715A (en)
Inventor
杜英杰
徐文豪
张凯
王弘毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SmartX Inc
Original Assignee
SmartX Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SmartX Inc
Priority to CN202211331535.8A
Publication of CN115766715A
Application granted
Publication of CN115766715B
Legal status: Active
Anticipated expiration

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Debugging And Monitoring (AREA)

Abstract

The application provides a high-availability super-fusion cluster monitoring method and system. The method comprises the following steps: deploying a monitoring service in the form of a virtual machine, wherein the monitoring service is built into a virtual machine image, and presetting a corresponding configuration file according to a monitoring policy; monitoring each node in the super-fusion cluster according to the acquired configuration file, acquiring monitoring data corresponding to the monitoring policy, and aggregating the monitoring data; and establishing communication between the virtual machine and the super-fusion cluster, and providing the corresponding data query service and/or alarm service according to the configuration file and the monitoring data. Deploying the monitoring service in the form of a virtual machine isolates it from the super-fusion cluster and ensures its high availability, so that monitoring data is collected continuously; updating the configuration file by hot reloading further improves the stability of cluster monitoring.

Description

Super-fusion cluster monitoring method and system
Technical Field
The application relates to the technical field of server clusters, in particular to a high-availability super-fusion cluster monitoring method and system.
Background
The monitoring and/or alarm function of a super-fusion cluster collects cluster state data, provides monitoring data queries to external users, and/or sends alarms. Implementing super-fusion cluster monitoring requires the following functions: collecting the monitoring indicators of all nodes in the cluster (including resource utilization, service running state, cluster performance indicators and the like); providing monitoring data queries to external users; and triggering alarms according to preset alarm rules and delivering them to the corresponding target users in various forms.
Existing super-fusion cluster monitoring systems mainly have the following problems:
First, when the monitoring service is deployed directly on the super-fusion cluster nodes, it cannot be isolated from the super-fusion cluster software, so the stability of that software suffers when resources such as CPU, memory, disk and network bandwidth are under pressure. The storage and computing resources required for cluster monitoring cannot be configured on demand; if the monitoring service is deployed on every node of the cluster, resources are duplicated and the consistency of monitoring data across nodes cannot be guaranteed.
Second, when the monitoring service is deployed directly on a super-fusion cluster node, a serious fault on that node that causes data loss also loses the monitoring data of the whole cluster. Because the cluster monitoring service runs on a single node, the monitoring service becomes unavailable whenever that node as a whole is unavailable, which makes the monitoring function of the super-fusion cluster unstable.
Third, when monitoring data is obtained directly from the cluster nodes, any change to the source or the provider of the monitoring data requires the monitoring service to be changed and adapted as well, so the components are tightly coupled.
Fourth, in practice users often have varied requirements for querying monitoring data and/or receiving alarms, but current implementations can only query and deliver alarms in a fixed manner, so the data query and/or alarm functions of cluster monitoring are not flexible enough.
Disclosure of Invention
In the prior art, monitoring services are deployed directly on cluster nodes, so the storage and computing resources they require cannot be configured on demand, resources are wasted, cluster monitoring is unstable, and the query and/or alarm services for monitoring data are not flexible enough. To solve these problems, the application provides a high-availability super-fusion cluster monitoring method and system. Specifically, a first aspect of the present application provides a high-availability super-fusion cluster monitoring method, including the following steps:
deploying a monitoring service in the form of a virtual machine, wherein the monitoring service is built into a virtual machine image;
once deployment of the monitoring service is complete, presetting a corresponding configuration file according to a monitoring policy;
monitoring each node in the super-fusion cluster according to the acquired configuration file, acquiring monitoring data corresponding to the monitoring policy, and aggregating the monitoring data;
and establishing communication between the virtual machine and the super-fusion cluster, and providing the corresponding data query service and/or alarm service according to the configuration file and the monitoring data.
In one possible implementation of the first aspect, building the monitoring service into the virtual machine image includes:
constructing a container image of the monitoring service;
loading the container image of the monitoring service once installation of the virtual machine operating system is complete;
building the monitoring service into the virtual machine image based on the container image.
In a possible implementation of the first aspect, deploying the monitoring service in the form of a virtual machine further includes:
pre-configuring a lifecycle management service;
when the virtual machine starts, invoking the lifecycle management service to control the state of the virtual machine and of the container inside the virtual machine through a periodic task.
In one possible implementation of the first aspect, invoking the lifecycle management service to control the state of the virtual machine and of the container inside the virtual machine through the periodic task includes:
acquiring the real-time current state of the virtual machine and the expected state of the virtual machine in the current checking period;
determining the virtual machine operation instruction of the current checking period based on the real-time current state and the expected state of the virtual machine, wherein the virtual machine operation instruction is at least one of: creating the virtual machine; starting the virtual machine and starting the container; shutting down the virtual machine and deleting the virtual machine; or a null value;
executing the virtual machine operation instruction to control the state of the virtual machine and of the container inside the virtual machine.
In a possible implementation of the first aspect, when the virtual machine operation instruction is a null value, the method waits for the next checking period and determines the virtual machine operation instruction again.
In one possible implementation of the first aspect, after the corresponding configuration file is preset according to the monitoring policy, establishing communication between the virtual machine and the super-fusion cluster further includes:
configuring an internal virtual network bridge on a node of the super-fusion cluster, wherein the internal virtual network bridge is preset with a static IP;
realizing communication between the virtual machine that deploys the monitoring service and the related services of the super-fusion cluster through the static IP;
wherein the virtual machine is provided with a network card connected to the internal virtual network bridge of the host machine.
In one possible implementation of the first aspect, monitoring each node in the super-fusion cluster according to the acquired configuration file, acquiring monitoring data corresponding to the monitoring policy, and aggregating the monitoring data includes:
setting a data acquisition interface at each node of the super-fusion cluster, wherein the data acquisition interface is used for acquiring monitoring data;
registering the configuration file to a distributed database, wherein the configuration file at least comprises the data acquisition interface, the data acquisition configuration and the alarm rules;
and collecting the monitoring data corresponding to the monitoring policy through the data acquisition interface, and aggregating the monitoring data.
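As a non-limiting illustration, the three parts of the configuration file named above (data acquisition interface, data acquisition configuration, alarm rules) can be sketched in Python. The class names, field names, the JSON encoding and the example URL are all assumptions made for this sketch, and a plain dict stands in for the distributed database; the application does not specify any of them.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Dict, List


@dataclass
class AlarmRule:
    name: str         # hypothetical rule name
    metric: str       # metric the rule watches
    threshold: float  # value at or above which the alarm fires


@dataclass
class MonitoringConfig:
    # The three parts named in the text; the field names are assumptions.
    acquisition_interfaces: List[str] = field(default_factory=list)
    scrape_interval_seconds: int = 30
    alarm_rules: List[AlarmRule] = field(default_factory=list)


def register_config(store: Dict[str, str], key: str, config: MonitoringConfig) -> None:
    """Serialize the configuration and write it to the store.

    `store` is a plain dict standing in for the distributed database.
    """
    store[key] = json.dumps(asdict(config), sort_keys=True)


def load_config(store: Dict[str, str], key: str) -> MonitoringConfig:
    """Read the configuration back and rebuild the dataclasses."""
    raw = json.loads(store[key])
    raw["alarm_rules"] = [AlarmRule(**rule) for rule in raw["alarm_rules"]]
    return MonitoringConfig(**raw)
```

Keeping the acquisition endpoints in the registered configuration, rather than in the service code, is what lets the data source change without changing the monitoring service.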
In a possible implementation of the first aspect, the monitoring method further includes:
when the configuration file in the distributed database is updated, updating the configuration file by hot reloading.
In a possible implementation of the first aspect, providing the corresponding data query service and/or alarm service according to the configuration file and the monitoring data includes: delivering the results of the query and/or alarm service through an application-layer protocol, an email system, or web page messages.
A second aspect of the present application provides a high-availability super-fusion cluster monitoring system, which applies the aforementioned high-availability super-fusion cluster monitoring method. The system includes:
a deployment module, used for deploying the monitoring service in the form of a virtual machine, the monitoring service being built into the virtual machine image;
an acquisition module, used for presetting a corresponding configuration file according to a monitoring policy once deployment of the monitoring service is complete;
a monitoring module, used for monitoring each node in the super-fusion cluster according to the acquired configuration file, acquiring monitoring data corresponding to the monitoring policy, and aggregating the monitoring data;
and a processing module, used for establishing communication between the virtual machine and the super-fusion cluster and providing the corresponding data query service and/or alarm service according to the configuration file and the monitoring data.
The technical solution proposed in this application has at least the following beneficial technical effects:
1. the monitoring service is deployed rapidly in the form of a virtual machine image, and a user can deploy it with one click simply by uploading the virtual machine image file; running the monitoring service in an independent virtual machine isolates its computing, storage, network and other resources from those of the other system services of the super-fusion cluster; because the monitoring service virtual machine is deployed and run independently, there is no need to consider where in the cluster the monitoring service is running, and when the monitoring service becomes abnormal and has to be investigated, there is no need to consider the earlier history of node role changes in the cluster, which reduces operation and maintenance difficulty;
2. the virtual machine high-availability function provided by the super-fusion cluster improves the stability of the monitoring service, the monitoring service virtual machine keeps the monitoring-related containers inside it running normally by means of a state machine, and the distributed storage function provided by the super-fusion cluster prevents monitoring data from being lost;
4. when the source of the monitoring data needs to be modified, only the configuration file corresponding to the relevant monitoring data provider needs to be modified, which decouples cluster monitoring from the monitoring data provider;
5. multiple ways of sending alarms and querying monitoring data can be configured as required, for example sending query results and alarm results through the SNMP protocol, mail, web page messages and the like;
6. when the configuration file corresponding to the monitoring service changes, the latest configuration file can be applied by hot loading, so the monitoring system does not need to be restarted and monitoring data collection is not interrupted.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the detailed description of non-limiting embodiments, made with reference to the following drawings.
FIG. 1 is a flow diagram of a high-availability super-fusion cluster monitoring method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of the architecture of a virtual machine lifecycle management service, according to some embodiments of the present application;
FIG. 3 is a flow diagram of a method for controlling the state of a virtual machine and of the container inside the virtual machine, according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a process for implementing super-fusion cluster monitoring data query and/or monitoring alarm according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a super-fusion cluster monitoring system according to an embodiment of the present application.
Detailed Description
The present application is described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the present application, but are not intended to limit the present application in any way. It should be noted that variations and modifications could be made by those skilled in the art without departing from the spirit of the present application. These are all within the scope of the present application.
The term "comprising" and variations thereof as used herein are open ended, i.e., "including but not limited to". The term "or" means "and/or" unless specifically stated otherwise. The term "based on" means "based at least in part on". The terms "one example embodiment" and "one embodiment" mean "at least one example embodiment". The term "another embodiment" means "at least one additional embodiment". The terms "first," "second," and the like may refer to different or the same objects. Other explicit and implicit definitions may also appear below.
To solve the problems existing in the prior art, some embodiments of the present application provide a high-availability super-fusion cluster monitoring method, and FIG. 1 shows its flow diagram. As shown in FIG. 1, the high-availability super-fusion cluster monitoring method may include:
Step 100: deploying the monitoring service in the form of a virtual machine, wherein the monitoring service is built into the virtual machine image. It can be understood that the purpose of deploying monitoring is to monitor the whole super-fusion cluster. When the monitoring service is deployed in the form of a virtual machine, deployment only requires configuring the storage, computing and network resources of the virtual machine, without additionally occupying resources of the super-fusion cluster. This isolates the monitoring service from the super-fusion cluster, and the actual condition of the current super-fusion cluster does not need to be considered when the monitoring service is deployed.
Specifically, all preset components required by the monitoring service are packaged into a virtual machine image file, and the virtual machine image is uploaded through the virtual machine management function provided by the super-fusion cluster. Once the virtual machine image has been uploaded and the virtual machine started, the monitoring service for the whole super-fusion cluster is deployed.
Step 200: once deployment of the monitoring service is complete, presetting a corresponding configuration file according to a monitoring policy. It can be understood that the source of the monitoring data is determined by the monitoring policy, and the corresponding data of the super-fusion cluster nodes is the monitoring data. The configuration file set according to the monitoring policy stores the configuration of the monitoring data provider, so that when the monitoring policy changes, the monitoring data provider is decoupled from the monitoring system as a whole, which makes cluster monitoring more flexible and easier to use.
Step 300: monitoring each node in the super-fusion cluster according to the acquired configuration file, acquiring monitoring data corresponding to the monitoring policy, and aggregating the monitoring data. It can be understood that the configuration file set according to the monitoring policy specifies how the monitoring data is collected, the collection configuration, any alarm rules, and how the monitoring data is aggregated.
For example, when the aggregated monitoring data reaches a certain threshold, the alarm rule in the configuration file is used to judge whether the alarm service for that threshold should be triggered.
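The threshold check in this example can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `evaluate_rule`, the choice of the arithmetic mean as the default aggregation, and the "greater than or equal" comparison are not taken from the application.

```python
from statistics import mean
from typing import Callable, Sequence


def evaluate_rule(
    samples: Sequence[float],
    threshold: float,
    aggregate: Callable[[Sequence[float]], float] = mean,
) -> bool:
    """Return True when the aggregated samples reach the threshold.

    `aggregate` is whatever aggregation the configuration file specifies;
    the mean is only an assumed default.
    """
    if not samples:
        return False  # no data collected yet, so nothing can trigger
    return aggregate(samples) >= threshold
```

For instance, with CPU usage samples from three nodes, `evaluate_rule([0.8, 0.95, 0.98], 0.9)` evaluates the mean against 0.9, while passing `aggregate=max` would instead trigger as soon as any single node reaches the threshold.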
Step 400: establishing communication between the virtual machine and the super-fusion cluster, and providing the corresponding data query service and/or alarm service according to the configuration file and the monitoring data. It can be understood that, to carry out operations such as monitoring queries and alarm sending, the monitoring service virtual machine still depends on related services running on the host machines of the super-fusion cluster. Communication between the virtual machine and the super-fusion cluster therefore needs to be established, so that monitoring query instructions, alarm sending instructions and the like can be sent, and the corresponding data query service and/or alarm service can be provided to the target user according to the configuration file and the monitoring data.
In step 100, building the monitoring service into the virtual machine image includes:
constructing a container image of the monitoring service;
loading the container image of the monitoring service once installation of the virtual machine operating system is complete;
building the monitoring service into the virtual machine image based on the container image.
It can be understood that the monitoring service is deployed rapidly on the basis of the virtual machine image. A container image of the monitoring service is constructed; after the virtual machine operating system is installed, the container image is loaded, so that the monitoring service is built into the virtual machine image in the form of a container image; the virtual machine image is then packaged and compressed. The user only needs to upload the virtual machine image to deploy the monitoring service rapidly in the form of a virtual machine.
In step 100, deploying the monitoring service in the form of a virtual machine further includes: pre-configuring a lifecycle management service; and, when the virtual machine starts, invoking the lifecycle management service to control the state of the virtual machine and of the container inside the virtual machine through a periodic task. FIG. 2 shows a schematic diagram of the architecture of a virtual machine lifecycle management service that implements the basic lifecycle functions, according to some embodiments of the present application. The specific steps for controlling the state of the virtual machine and of the container inside it through periodic tasks are described in detail below.
FIG. 3 shows a flow diagram of a method for controlling the state of a virtual machine and of the container inside the virtual machine, according to some embodiments of the present application. As shown in FIG. 3, invoking the lifecycle management service to control the state of the virtual machine and of the container inside the virtual machine through the periodic task may include:
Step 001: acquiring the real-time current state of the virtual machine and the expected state of the virtual machine in the current checking period;
Step 002: determining the virtual machine operation instruction of the current checking period based on the real-time current state and the expected state of the virtual machine, wherein the virtual machine operation instruction is at least one of: creating the virtual machine; starting the virtual machine and starting the container; shutting down the virtual machine and deleting the virtual machine; or a null value;
Step 003: executing the virtual machine operation instruction to control the state of the virtual machine and of the container inside the virtual machine.
Further, in step 002, when the virtual machine operation instruction is a null value, the method waits for the next checking period and repeats step 001 to determine the virtual machine operation instruction again.
In some embodiments of the present application, the virtual machine states stored in the database may include the following:
INIT (initializing), CREATING (creating), STOPPED (stopped), STARTING (starting), DELETED (deleted), RUNNING (running), STOPPING (stopping), etc.
It can be understood that the lifecycle management service obtains the real-time current state and the expected state of the virtual machine for the current checking period from the database that stores the virtual machine states, and determines the operation to execute on the virtual machine in the current checking period from the difference between the two states. The virtual machine high-availability function provided by the super-fusion cluster ensures the high availability of the monitoring service virtual machine.
The monitoring service virtual machine keeps the monitoring-related containers inside the virtual machine running normally by means of a state machine.
For example, if the virtual machine does not exist but the expected state in the database is RUNNING, the virtual machine is created;
if the virtual machine is in the STOPPED state and the expected state is RUNNING, the virtual machine operation instruction for starting the virtual machine and starting the container is executed;
if the virtual machine is in the RUNNING state and the expected state is DELETED, the virtual machine is shut down and deleted;
if the virtual machine is in any other state, no virtual machine operation instruction is issued in the current checking period. The period is skipped, and in the next checking period the real-time current state and the expected state of the virtual machine are acquired again to obtain the corresponding virtual machine operation instruction, so that the monitoring service virtual machine remains continuously in a high-availability state.
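The comparison of current and expected state in the examples above can be sketched as a small lookup table driven by a periodic loop. This is a non-limiting sketch: the `ABSENT` marker for a virtual machine that does not exist, the treatment of `DELETED` as an expected state, and the names `decide_operation` and `run_check_cycles` are assumptions introduced for illustration, and the callables passed to the loop stand in for the database and the virtual machine management interface.

```python
import time
from enum import Enum
from typing import Callable, Optional, Tuple


class VmState(Enum):
    ABSENT = "ABSENT"    # assumption: marker for "virtual machine does not exist"
    STOPPED = "STOPPED"
    RUNNING = "RUNNING"
    DELETED = "DELETED"  # assumption: used here only as an expected state


class Operation(Enum):
    CREATE_VM = "create_vm"
    START_VM_AND_CONTAINER = "start_vm_and_container"
    SHUTDOWN_AND_DELETE_VM = "shutdown_and_delete_vm"


# (current state, expected state) -> operation, following the three examples.
_TRANSITIONS = {
    (VmState.ABSENT, VmState.RUNNING): Operation.CREATE_VM,
    (VmState.STOPPED, VmState.RUNNING): Operation.START_VM_AND_CONTAINER,
    (VmState.RUNNING, VmState.DELETED): Operation.SHUTDOWN_AND_DELETE_VM,
}


def decide_operation(current: VmState, expected: VmState) -> Optional[Operation]:
    """Return the operation for this checking period, or None (the null value)."""
    return _TRANSITIONS.get((current, expected))


def run_check_cycles(
    read_states: Callable[[], Tuple[VmState, VmState]],
    execute: Callable[[Operation], None],
    cycles: int,
    interval_seconds: float = 0.0,
) -> None:
    """Run a fixed number of checking periods (steps 001 to 003)."""
    for _ in range(cycles):
        current, expected = read_states()             # step 001
        operation = decide_operation(current, expected)  # step 002
        if operation is not None:
            execute(operation)                        # step 003
        time.sleep(interval_seconds)                  # wait for the next period
```

Any state pair not listed in the table yields `None`, so the loop simply waits for the next checking period, which matches the skip behaviour described for other states.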
In some embodiments of the present application, the lifecycle management service may also implement abnormal state detection and recovery of the monitoring service, and specific implementation manners may be implemented by those skilled in the art according to the existing detection and recovery functions of the lifecycle management service, which are not limited herein.
In some embodiments of the present application, the lifecycle management service is also responsible for discovery of the monitoring service and for proxying monitoring requests, such as data query requests, so as to enable communication with the services within the super-fusion cluster.
In some embodiments of the present application, establishing communication between the virtual machine and the super-fusion cluster may further include:
configuring an internal virtual network bridge on a node of the super-fusion cluster, wherein the internal virtual network bridge is preset with a static IP;
the virtual machine that deploys the monitoring service communicating with the other services of the super-fusion cluster through the static IP;
wherein the virtual machine is configured with a network card connected to the internal virtual network bridge of the host machine.
It can be understood that the monitoring service in the form of a virtual machine, that is, the monitoring service virtual machine, runs only monitoring-related services. To issue alarms, execute monitoring data queries and the like, it still depends on the related services in the super-fusion cluster, so the monitoring service virtual machine needs to communicate with the other services in the cluster through a fixed static IP.
It can be understood that each super-fusion cluster node is configured with an internal virtual bridge that is not associated with any physical network port. The internal virtual bridge is configured with a fixed static IP (for example, 169.254.169.254) and runs a DHCP service. An additional network card is configured for the monitoring service virtual machine and connected to the internal virtual bridge of the host machine, so the virtual machine automatically acquires another IP in the same network segment as the static IP. Using this IP, the virtual machine can access the relevant services on the host machine in the super-fusion cluster, which realizes communication between the monitoring service virtual machine and the super-fusion cluster.
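The "same network segment" condition can be checked with Python's standard `ipaddress` module, as sketched below. The text gives only the example bridge address 169.254.169.254; the /24 prefix length and the virtual machine addresses used in the usage note are assumptions made for illustration.

```python
import ipaddress


def same_segment(bridge_cidr: str, vm_ip: str) -> bool:
    """Return True when vm_ip lies in the network of the bridge address.

    bridge_cidr is the bridge's static IP with a prefix length,
    for example "169.254.169.254/24" (the /24 is an assumption).
    """
    network = ipaddress.ip_interface(bridge_cidr).network
    return ipaddress.ip_address(vm_ip) in network
```

With the assumed /24 prefix, an address such as 169.254.169.10 handed out by the bridge's DHCP service is in the bridge's segment, while 169.254.170.10 is not and could not reach the host services through the bridge without routing.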
In some embodiments of the present application, when the configuration file in the distributed database is updated, the configuration file is updated by hot reloading. It can be understood that when the configuration file changes, the latest configuration file can be applied by hot loading, so the monitoring system does not need to be restarted and monitoring data collection is not interrupted.
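One way to picture hot reloading is a holder that polls the stored configuration and swaps it in place only when its content has changed, so the collecting process itself never restarts. The sketch below is an assumption-laden illustration: the class name `HotReloadingConfig`, change detection by SHA-256 digest, and polling (rather than a watch or notification from the database) are not specified by the application.

```python
import hashlib
import threading
from typing import Callable


class HotReloadingConfig:
    """Holds the active configuration text and swaps it in place on change."""

    def __init__(self, fetch: Callable[[], str]) -> None:
        # `fetch` stands in for reading the configuration from the
        # distributed database; it returns the configuration as text.
        self._fetch = fetch
        self._lock = threading.Lock()
        self._text = fetch()
        self._digest = self._hash(self._text)

    @staticmethod
    def _hash(text: str) -> str:
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def current(self) -> str:
        """Return the configuration currently in effect."""
        with self._lock:
            return self._text

    def poll_once(self) -> bool:
        """Fetch the stored configuration; return True if it was reloaded."""
        text = self._fetch()
        digest = self._hash(text)
        with self._lock:
            if digest == self._digest:
                return False  # unchanged, keep collecting with the old config
            self._text, self._digest = text, digest
            return True
```

Collection code that calls `current()` on each cycle picks up the new configuration on its next cycle, which is why no restart is needed.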
In some embodiments of the present application, providing the corresponding data query service and/or alarm service according to the configuration file and the monitoring data includes: delivering the results of the query and/or alarm service through an application-layer protocol, an email system, or web page messages. It can be understood that multiple ways of sending alarms and querying monitoring data, such as the SNMP protocol, mail and web page messages, can be configured flexibly as required.
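Configurable delivery channels can be sketched as a mapping from channel name to a sender callable, with the list of channels to use coming from the configuration. The function name `dispatch_alarm` and the channel names are hypothetical, and the senders are left as plain callables; real SNMP, mail or web delivery is deliberately not shown.

```python
from typing import Callable, Dict, List

Sender = Callable[[str], None]


def dispatch_alarm(
    message: str,
    channels: List[str],
    senders: Dict[str, Sender],
) -> List[str]:
    """Send the message on each configured channel.

    Returns the configured channels for which no sender is registered,
    so that a misconfigured channel is reported rather than silently dropped.
    """
    missing = []
    for channel in channels:
        sender = senders.get(channel)
        if sender is None:
            missing.append(channel)
            continue
        sender(message)
    return missing
```

Adding a delivery channel then means registering one more sender and naming it in the configuration, without touching the alarm evaluation logic.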
Based on the description of the foregoing embodiments, the technical solution provided by the present invention can implement a monitoring method for a super-fusion cluster. Specific application examples are provided below to provide details of implementing data query and/or monitoring alarm functions.
FIG. 4 shows a schematic diagram of a process for implementing super-fusion cluster monitoring data query and/or monitoring alarm, according to some embodiments of the present application.
Taking node A, node B and node C of the super-fusion cluster as an example, a data acquisition interface (exporter) is set on each node and a configuration file is provided. According to the configuration file, monitoring data is collected from each node in the cluster and aggregated in the manner the configuration file specifies. The configuration file registered in the distributed database is watched for updates; if it has been updated, the latest configuration file takes effect through hot reloading, without interrupting the service. According to the configuration file, the service evaluates whether any alarm has currently reached its trigger threshold, provides the user with multiple ways of querying monitoring data, and sends triggered alarms to the user through multiple channels.
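The collection and aggregation part of this example can be sketched as follows. Each node's exporter is modelled as a callable that returns a mapping of metric names to values; the metric name `mem_used_gb`, the aggregation modes and the function names are assumptions for illustration, not part of the application.

```python
from typing import Callable, Dict

Exporter = Callable[[], Dict[str, float]]


def collect(exporters: Dict[str, Exporter]) -> Dict[str, Dict[str, float]]:
    """Call each node's exporter once and key the samples by node name."""
    return {node: exporter() for node, exporter in exporters.items()}


def aggregate(samples: Dict[str, Dict[str, float]], metric: str, how: str = "avg") -> float:
    """Aggregate one metric across nodes in the way the configuration asks for."""
    values = [metrics[metric] for metrics in samples.values() if metric in metrics]
    if not values:
        raise ValueError(f"no samples for metric {metric!r}")
    if how == "avg":
        return sum(values) / len(values)
    if how == "max":
        return max(values)
    if how == "sum":
        return sum(values)
    raise ValueError(f"unknown aggregation {how!r}")


# Usage with three stand-in exporters for nodes A, B and C.
exporters = {
    "node-a": lambda: {"mem_used_gb": 1.0},
    "node-b": lambda: {"mem_used_gb": 2.0},
    "node-c": lambda: {"mem_used_gb": 3.0},
}
samples = collect(exporters)
total_mem = aggregate(samples, "mem_used_gb", how="sum")
```

Because the monitoring service only sees the exporter interface, replacing what a node exposes changes the exporter mapping in the configuration, not the collection code.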
In some embodiments of the present application, FIG. 5 shows a schematic structural diagram of a super-fusion cluster monitoring system, which applies the super-fusion cluster monitoring method provided in the foregoing embodiments. As shown in FIG. 5, the super-fusion cluster monitoring system may include:
a deployment module 1, used for deploying the monitoring service in the form of a virtual machine, the monitoring service being built into the virtual machine image;
an acquisition module 2, used for presetting a corresponding configuration file according to a monitoring policy once deployment of the monitoring service is complete;
a monitoring module 3, used for monitoring each node in the super-fusion cluster according to the acquired configuration file, acquiring monitoring data corresponding to the monitoring policy, and aggregating the monitoring data;
and a processing module 4, used for establishing communication between the virtual machine and the super-fusion cluster and providing the corresponding data query service and/or alarm service according to the configuration file and the monitoring data.
It can be understood that the functions implemented by modules 1 to 4 above correspond one to one to the operations performed in steps 100 to 400, and are not described again here.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
It is to be appreciated that aspects of the present subject matter can be implemented as a system, method, or program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, micro-code, etc.), or an embodiment combining hardware and software aspects, which may be commonly referred to herein as a "circuit," "module," or "platform."
It will be appreciated by those skilled in the art that the elements, modules, or steps of the invention described above may be implemented on a general-purpose computing device. They may be centralized on a single computing device or distributed over a network of computing devices. Alternatively, they may be implemented in program code executable by computing devices, so that they are stored in a storage medium and executed by the computing devices, in some cases in an order different from that shown or described; or they may be implemented as individual integrated circuit modules.
Although this embodiment does not enumerate other specific implementations, in some possible implementations the aspects described in the technical solutions of the present invention may also be implemented as a program product that includes program code; when the program product runs on a terminal device, the program code causes the terminal device to execute the steps described in the method embodiments above.
The foregoing description of the embodiments of the present disclosure has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the various embodiments described. The terminology used herein was chosen in order to best explain the principles of the embodiments, the practical application, or the technical improvements in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (8)

1. The super-fusion cluster monitoring method is characterized by comprising the following steps of:
deploying a monitoring service in the form of a virtual machine, wherein the monitoring service is built in a virtual machine image;
under the condition that the deployment of the monitoring service is completed, presetting a corresponding configuration file according to a monitoring strategy;
monitoring each node in the super-fusion cluster according to the obtained configuration file, obtaining monitoring data corresponding to the monitoring strategy, and aggregating the monitoring data;
establishing communication between the virtual machine and the super-fusion cluster, and providing corresponding data query requirements and/or alarm service requirements according to the configuration file and the monitoring data;
wherein deploying the monitoring service in the form of a virtual machine further comprises:
pre-configuring life cycle management services;
under the condition that the virtual machine is started, invoking the life cycle management service to control the state of the virtual machine and of the containers inside the virtual machine through a periodic task;
wherein establishing communication between the virtual machine and the super-fusion cluster comprises:
configuring an internal virtual network bridge on the node of the super-fusion cluster, wherein the internal virtual network bridge is preset with a static IP;
the virtual machine deploying the monitoring service communicates with the super-fusion cluster through the static IP;
the virtual machine is configured with a network card correspondingly connected with an internal virtual network bridge of the host machine.
2. The method of claim 1, wherein the monitoring service is built in a virtual machine image, comprising:
constructing a container image of the monitoring service;
loading the container image of the monitoring service once installation of the virtual machine operating system is completed;
and building the monitoring service into the virtual machine image based on the container image.
3. The method of claim 1, wherein invoking the lifecycle management service to control the virtual machine and the container state within the virtual machine via the periodic tasks comprises:
acquiring a real-time state of the virtual machine and an expected state of the virtual machine in a current check period;
determining a virtual machine operation instruction for the current check period based on the real-time state of the virtual machine and the expected state of the virtual machine, wherein the virtual machine operation instruction at least comprises creating a virtual machine, starting the virtual machine, starting a container, powering off the virtual machine, deleting the virtual machine, and a null value;
and executing the virtual machine operation instruction to control the virtual machine and the state of a container in the virtual machine.
4. The super-fusion cluster monitoring method according to claim 3, wherein, in the case that the virtual machine operation instruction is a null value, the method waits until the next check period is entered and re-determines the virtual machine operation instruction.
5. The method for monitoring a super-fusion cluster according to claim 1, wherein monitoring each node in the super-fusion cluster according to the obtained configuration file, obtaining monitoring data corresponding to the monitoring policy, and aggregating the monitoring data comprises:
setting a data acquisition interface at a node of the super-fusion cluster, wherein the data acquisition interface is used for acquiring the monitoring data;
registering the configuration file to a distributed database, wherein the configuration file at least comprises the data acquisition interface, data acquisition configuration and alarm rules;
and collecting the monitoring data corresponding to the monitoring strategy according to the data collection interface, and aggregating the monitoring data.
6. The method of claim 5, further comprising:
in the event that the configuration file in the distributed database is updated, updating the configuration file by hot reloading.
7. The method of claim 1, wherein providing corresponding data query requirements and/or alarm service requirements based on the configuration file and the monitoring data comprises:
displaying the results of the query and/or alarm service through application layer protocols, an email system, and web-page messages.
8. A super-fusion cluster monitoring system, characterized in that the system applies the super-fusion cluster monitoring method according to any one of claims 1 to 7, the system comprising:
the deployment module is used for deploying the monitoring service in the form of a virtual machine, and the monitoring service is built in the virtual machine image;
the acquisition module is used for presetting a corresponding configuration file according to a monitoring strategy under the condition that the deployment of the monitoring service is completed;
the monitoring module is used for monitoring each node in the super-fusion cluster according to the acquired configuration file, acquiring monitoring data corresponding to the monitoring strategy and aggregating the monitoring data;
the processing module is used for establishing communication between the virtual machine and the super-fusion cluster and providing corresponding data query requirements and/or alarm service requirements according to the configuration file and the monitoring data;
wherein deploying the monitoring service in the form of a virtual machine further comprises:
pre-configuring life cycle management services; under the condition that the virtual machine is started, invoking the life cycle management service to control the state of the virtual machine and of the containers inside the virtual machine through a periodic task;
wherein establishing communication between the virtual machine and the super-fusion cluster comprises:
configuring an internal virtual network bridge on the node of the super-fusion cluster, wherein the internal virtual network bridge is preset with a static IP;
the virtual machine deploying the monitoring service communicates with the super-fusion cluster through the static IP;
the virtual machine is configured with a network card correspondingly connected with an internal virtual network bridge of the host machine.
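
The periodic check recited in claims 3 and 4 (compare the real-time state with the expected state, pick one operation instruction, and do nothing when the instruction is null) can be illustrated with a minimal Python sketch. The four-state model (`ABSENT`, `STOPPED`, `RUNNING`, `SERVING`), the one-step-per-period policy, and the `simulate` helper are assumptions made for illustration; the claims name only the operation instructions, not the states or how an instruction is chosen.

```python
from enum import Enum, IntEnum
from typing import List, Optional


class VmState(IntEnum):
    """Assumed state model, ordered from 'nothing exists' to 'fully serving'."""
    ABSENT = 0    # the virtual machine does not exist
    STOPPED = 1   # the virtual machine exists but is powered off
    RUNNING = 2   # the virtual machine is up, the container is not
    SERVING = 3   # the virtual machine and its monitoring container are up


class Op(Enum):
    """Operation instructions listed in claim 3; None plays the null value."""
    CREATE_VM = "create_vm"
    START_VM = "start_vm"
    START_CONTAINER = "start_container"
    POWER_OFF_VM = "power_off_vm"
    DELETE_VM = "delete_vm"


_STEP_UP = {
    VmState.ABSENT: Op.CREATE_VM,
    VmState.STOPPED: Op.START_VM,
    VmState.RUNNING: Op.START_CONTAINER,
}
_STEP_DOWN = {
    VmState.SERVING: Op.POWER_OFF_VM,
    VmState.RUNNING: Op.POWER_OFF_VM,
    VmState.STOPPED: Op.DELETE_VM,
}
# Assumed effect of each instruction, used only by the simulation below.
_EFFECT = {
    Op.CREATE_VM: VmState.STOPPED,
    Op.START_VM: VmState.RUNNING,
    Op.START_CONTAINER: VmState.SERVING,
    Op.POWER_OFF_VM: VmState.STOPPED,
    Op.DELETE_VM: VmState.ABSENT,
}


def next_operation(actual: VmState, expected: VmState) -> Optional[Op]:
    """Choose the instruction for one check period; None means 'wait'."""
    if actual == expected:
        return None
    if actual < expected:
        return _STEP_UP[actual]
    return _STEP_DOWN[actual]


def simulate(actual: VmState, expected: VmState, max_periods: int = 10) -> List[Op]:
    """Run check periods until the instruction is null; return what was issued."""
    issued: List[Op] = []
    for _ in range(max_periods):
        op = next_operation(actual, expected)
        if op is None:
            break
        issued.append(op)
        actual = _EFFECT[op]
    return issued
```

Issuing at most one instruction per check period keeps each period short and lets the next period re-read the real-time state, so a failed operation is simply retried.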
CN202211331535.8A 2022-10-28 2022-10-28 Super-fusion cluster monitoring method and system Active CN115766715B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211331535.8A CN115766715B (en) 2022-10-28 2022-10-28 Super-fusion cluster monitoring method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211331535.8A CN115766715B (en) 2022-10-28 2022-10-28 Super-fusion cluster monitoring method and system

Publications (2)

Publication Number Publication Date
CN115766715A CN115766715A (en) 2023-03-07
CN115766715B true CN115766715B (en) 2024-01-30

Family

ID=85354770

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211331535.8A Active CN115766715B (en) 2022-10-28 2022-10-28 Super-fusion cluster monitoring method and system

Country Status (1)

Country Link
CN (1) CN115766715B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117170985B (en) * 2023-11-02 2024-01-12 武汉大学 Distributed monitoring method and system for open geographic information network service

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103368785A (en) * 2012-04-09 2013-10-23 鸿富锦精密工业(深圳)有限公司 Server operation monitoring system and method
CN103957237A (en) * 2014-04-03 2014-07-30 华南理工大学 Architecture of elastic cloud
CN107122228A (en) * 2017-04-18 2017-09-01 北京华云网际科技有限公司 The dispositions method and device of the management platform of super emerging system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2011138225A (en) * 2009-12-25 2011-07-14 Canon Inc Cluster system, information processing apparatus, control method, and program

Also Published As

Publication number Publication date
CN115766715A (en) 2023-03-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant