CN115766715A - High-availability super-fusion cluster monitoring method and system - Google Patents

High-availability super-fusion cluster monitoring method and system

Publication number
CN115766715A
Authority
CN
China
Prior art keywords
monitoring
virtual machine
super
cluster
service
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211331535.8A
Other languages
Chinese (zh)
Other versions
CN115766715B (en
Inventor
杜英杰
徐文豪
张凯
王弘毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SmartX Inc
Original Assignee
SmartX Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SmartX Inc
Priority to CN202211331535.8A
Publication of CN115766715A
Application granted
Publication of CN115766715B
Active
Anticipated expiration

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application provides a highly available hyper-converged cluster monitoring method and system. The method comprises the following steps: deploying a monitoring service in the form of a virtual machine, wherein the monitoring service is built into a virtual machine image; presetting a corresponding configuration file according to a monitoring policy; monitoring each node in the hyper-converged cluster according to the acquired configuration file, acquiring the monitoring data corresponding to the monitoring policy, and aggregating the monitoring data; and establishing communication between the virtual machine and the hyper-converged cluster, and serving the corresponding data query and/or alarm requirements according to the configuration file and the monitoring data. Deploying the monitoring service as a virtual machine isolates it from the hyper-converged cluster and ensures its high availability, so that monitoring data is acquired continuously; updating the configuration file by hot reloading further improves the stability of cluster monitoring.

Description

High-availability super-fusion cluster monitoring method and system
Technical Field
The present application relates to the technical field of server clustering, and in particular, to a method and a system for monitoring a super-fusion cluster with high availability.
Background
The monitoring and/or alarm function of a hyper-converged cluster collects cluster state data and provides monitoring data for querying and/or for sending alarms. Hyper-converged cluster monitoring needs to implement the following functions: collecting all monitoring metrics of all nodes in the cluster (including resource utilization, service running state, cluster performance metrics and the like); providing monitoring data queries to external users; and triggering alarms according to preset alarm rules and sending them to the corresponding target users in various forms.
The existing super-convergence cluster monitoring system mainly has the following problems in terms of function realization:
First, when the monitoring service is deployed directly on the hyper-converged cluster nodes, it cannot be isolated from the hyper-converged cluster software, so the stability of that software is affected when resources such as CPU (central processing unit), memory, disk and network bandwidth run short. The storage and computing resources required by cluster monitoring cannot be configured on demand; if the monitoring service is deployed on every node of the cluster, resources are duplicated and the consistency of monitoring data across nodes cannot be guaranteed.
Second, when the monitoring service is deployed directly on a hyper-converged cluster node, a serious fault on that node that causes data loss also loses the monitoring data of the whole cluster. Because the cluster monitoring service runs on a single node, the monitoring service becomes unavailable whenever that node fails as a whole, which makes the monitoring function of the hyper-converged cluster unstable.
Third, when monitoring data is obtained directly from the cluster nodes, any change to the source or the provider of the monitoring data requires the monitoring service to be changed and adapted as well, so the components are tightly coupled.
Fourth, in practical applications users often have varied requirements for querying monitoring data and/or receiving alarms, but current implementations usually support querying and alarm delivery only in a fixed manner, so the data query and/or alarm functions of cluster monitoring are not flexible enough.
Disclosure of Invention
The present application aims to solve the problems in the prior art that arise when the monitoring service is deployed directly on cluster nodes: the storage and computing resources required by the monitoring service cannot be configured on demand, resources are wasted, cluster monitoring is unstable, and the query and/or alarm services for monitoring data are not flexible enough. To this end, the application provides a highly available hyper-converged cluster monitoring method and system. Specifically, a first aspect of the present application provides a highly available hyper-converged cluster monitoring method, which includes the following steps:
deploying a monitoring service in the form of a virtual machine, wherein the monitoring service is built into a virtual machine image;
presetting a corresponding configuration file according to a monitoring policy once deployment of the monitoring service is complete;
monitoring each node in the hyper-converged cluster according to the acquired configuration file, acquiring the monitoring data corresponding to the monitoring policy, and aggregating the monitoring data;
and establishing communication between the virtual machine and the hyper-converged cluster, and serving the corresponding data query and/or alarm requirements according to the configuration file and the monitoring data.
In a possible implementation of the first aspect, building the monitoring service into the virtual machine image includes:
constructing a container image of the monitoring service;
loading the container image of the monitoring service once installation of the virtual machine operating system is complete;
building the monitoring service into the virtual machine image based on the container image.
In a possible implementation of the first aspect, deploying the monitoring service in the form of a virtual machine further includes:
pre-configuring a lifecycle management service;
when the virtual machine is started, invoking the lifecycle management service to control the state of the virtual machine and of the containers inside it through a periodic task.
In one possible implementation of the first aspect, invoking the lifecycle management service to control the state of the virtual machine and of the containers inside it through the periodic task includes:
acquiring the real-time state and the expected state of the virtual machine in the current check period;
determining a virtual machine operation instruction for the current check period based on the real-time state and the expected state of the virtual machine, wherein the virtual machine operation instruction is at least one of: creating the virtual machine; starting the virtual machine and starting the containers; shutting down the virtual machine and deleting the virtual machine; or a null value;
executing the virtual machine operation instruction to control the state of the virtual machine and of the containers inside it.
In a possible implementation of the first aspect, when the virtual machine operation instruction is null, the method waits for the next check period and determines the virtual machine operation instruction again.
In a possible implementation of the first aspect, after the corresponding configuration file is preset according to the monitoring policy, establishing communication between the monitoring service virtual machine and the hyper-converged cluster further includes:
configuring an internal virtual bridge on a node of the hyper-converged cluster, wherein the internal virtual bridge is preset with a static IP;
implementing communication between the virtual machine on which the monitoring service is deployed and the related services of the hyper-converged cluster through the static IP;
wherein the virtual machine is provided with a network card connected to the internal virtual bridge of the host.
In a possible implementation of the first aspect, monitoring each node in the super-fusion cluster according to the obtained configuration file, obtaining monitoring data corresponding to the monitoring policy, and aggregating the monitoring data includes:
setting a data acquisition interface at a node of the hyper-converged cluster, wherein the data acquisition interface is used for acquiring monitoring data;
registering a configuration file to a distributed database, wherein the configuration file at least comprises a data acquisition interface, data acquisition configuration and an alarm rule;
and acquiring monitoring data corresponding to the monitoring strategy according to the acquisition interface, and aggregating the monitoring data.
In a possible implementation of the first aspect, the monitoring method further includes:
when the configuration file in the distributed database is updated, applying the updated configuration file by hot reloading.
In a possible implementation of the first aspect, serving the corresponding data query and/or alarm requirements according to the configuration file and the monitoring data includes: delivering the results of the query and/or alarm service through an application layer protocol, an e-mail system and web page messages.
A second aspect of the present application provides a highly available hyper-converged cluster monitoring system, which applies the highly available hyper-converged cluster monitoring method described above, and the system includes:
the deployment module, used for deploying a monitoring service in the form of a virtual machine, wherein the monitoring service is built into a virtual machine image;
the acquisition module, used for presetting a corresponding configuration file according to a monitoring policy once deployment of the monitoring service is complete;
the monitoring module, used for monitoring each node in the hyper-converged cluster according to the acquired configuration file, acquiring the monitoring data corresponding to the monitoring policy, and aggregating the monitoring data;
and the processing module, used for establishing communication between the virtual machine and the hyper-converged cluster and serving the corresponding data query and/or alarm requirements according to the configuration file and the monitoring data.
The technical solution provided by this application has at least the following beneficial technical effects:
1. the monitoring service is deployed quickly by means of a virtual machine image, and a user can deploy it with one click simply by uploading the virtual machine image file; running the monitoring service in an independent virtual machine isolates its computing, storage and network resources from the other systems of the hyper-converged cluster; because the monitoring service runs in its own virtual machine, there is no need to consider where in the cluster it is running, and when the monitoring service is abnormal and needs troubleshooting, there is no need to consider how the role nodes in the cluster changed during the preceding period, which reduces the operation and maintenance difficulty;
2. the stability of the monitoring service is improved through the virtual machine high-availability function provided by the hyper-converged cluster; the monitoring service virtual machine keeps the monitoring-related containers inside it running normally by means of a state machine; and loss of monitoring data is avoided thanks to the distributed storage function provided by the hyper-converged cluster;
4. when the source of monitoring data needs to be modified, only the configuration file of the corresponding monitoring data provider needs to be modified, which decouples cluster monitoring from the monitoring data provider;
5. various ways of sending alarms and querying monitoring data can be configured as required, for example sending query results and alarm results via the SNMP protocol, e-mail, web page messages and the like;
6. when the configuration file of the monitoring service changes, the latest configuration file can be applied by hot reloading, so the monitoring system does not need to be restarted and monitoring data acquisition is not interrupted.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof with reference to the accompanying drawings.
FIG. 1 is a schematic flowchart illustrating a highly available hyper-converged cluster monitoring method according to an embodiment of the present application;
FIG. 2 illustrates a schematic diagram of an architecture of a virtual machine lifecycle management service, according to some embodiments of the present application;
FIG. 3 illustrates a flow diagram of a method of controlling virtual machines and container states within the virtual machines, according to an embodiment of the present application;
FIG. 4 is a diagram illustrating a hyper-converged cluster monitoring data query and/or monitoring alarm implementation process according to an embodiment of the application;
FIG. 5 is a schematic structural diagram of a hyper-converged cluster monitoring system according to an embodiment of the present application.
Detailed Description
The present application will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the present application, but are not intended to limit the present application in any way. It should be noted that those skilled in the art can make various changes and modifications without departing from the spirit of the application, all of which fall within the scope of protection of the present application.
The term "include" and variations thereof as used herein is meant to be inclusive in an open-ended manner, i.e., "including but not limited to". Unless specifically stated otherwise, the term "or" means "and/or". The term "based on" means "based at least in part on". The terms "one example embodiment" and "one embodiment" mean "at least one example embodiment". The term "another embodiment" means "at least one additional embodiment". The terms "first," "second," and the like may refer to different or the same object. Other explicit and implicit definitions are also possible below.
To solve the problems existing in the prior art, some embodiments of the present application provide a highly available hyper-converged cluster monitoring method, a flowchart of which is shown in FIG. 1. As shown in FIG. 1, the method may include:
Step 100: deploying a monitoring service in the form of a virtual machine, wherein the monitoring service is built into a virtual machine image. It can be understood that the monitoring service is deployed in order to monitor the whole hyper-converged cluster. Deploying it in the form of a virtual machine means that it is provisioned simply by configuring the storage, computing and network resources of the virtual machine, without additionally occupying resources of the hyper-converged cluster; the monitoring service is thereby isolated from the hyper-converged cluster, and the actual condition of the current cluster does not need to be considered when the service is deployed.
Specifically, all preset components required by the monitoring service are packaged into a virtual machine image file, and the image is uploaded through the virtual machine management function of the hyper-converged cluster. Once the virtual machine image has been uploaded and the virtual machine started, the monitoring service for the whole hyper-converged cluster is deployed and begins monitoring immediately.
Step 200: presetting a corresponding configuration file according to a monitoring policy once deployment of the monitoring service is complete. It can be understood that the monitoring policy determines the source of the monitoring data, namely the data on the corresponding nodes of the hyper-converged cluster, and that the configuration file set according to the monitoring policy stores the relevant configuration of the monitoring data provider. When the monitoring policy changes, only this configuration changes, so the monitoring data provider is decoupled from the rest of the monitoring system and cluster monitoring becomes more flexible and easier to use.
Step 300: monitoring each node in the hyper-converged cluster according to the acquired configuration file, acquiring the monitoring data corresponding to the monitoring policy, and aggregating the monitoring data. It can be understood that the configuration file specifies the acquisition method and acquisition configuration for the monitoring data, any alarm rules, and the like; the way monitoring data is acquired and aggregated is thus defined by the configuration file set according to the monitoring policy.
For example, when the aggregated monitoring data reaches a certain threshold, the alarm rules in the configuration file are used to judge whether the alarm for that threshold should be triggered.
Step 400: establishing communication between the virtual machine and the hyper-converged cluster, and serving the corresponding data query and/or alarm requirements according to the configuration file and the monitoring data. It can be understood that the monitoring service virtual machine still depends on related services running on the hosts of the hyper-converged cluster to carry out operations such as monitoring queries and alarm sending. Communication between the virtual machine and the hyper-converged cluster therefore needs to be established so that monitoring query instructions, alarm sending instructions and the like can be sent, and the corresponding data query and/or alarm services can be provided to the target user according to the configuration file and the monitoring data.
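The threshold check mentioned for step 300 can be sketched roughly as follows. This is a minimal illustration only: the function name `evaluate_alarms`, the metric names and the "value reaches threshold" comparison are assumptions, since the application does not define a rule format.

```python
from typing import Dict, List, Tuple


def evaluate_alarms(aggregated: Dict[str, float],
                    rules: List[Tuple[str, float]]) -> List[str]:
    """Return the metrics whose aggregated value reaches the rule threshold."""
    triggered = []
    for metric, threshold in rules:
        value = aggregated.get(metric)
        # A metric with no data is skipped rather than treated as an alarm.
        if value is not None and value >= threshold:
            triggered.append(metric)
    return triggered


# Hypothetical aggregated monitoring data and alarm rules (metric, threshold).
aggregated = {"cpu_usage": 0.93, "disk_usage": 0.40}
rules = [("cpu_usage", 0.90), ("disk_usage", 0.85), ("mem_usage", 0.80)]
triggered = evaluate_alarms(aggregated, rules)
```

Keeping the rules as plain data mirrors the idea that alarm behavior is driven by the configuration file rather than by code changes in the monitoring service.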
In step 100, building the monitoring service into the virtual machine image includes:
constructing a container image of the monitoring service;
loading the container image of the monitoring service once installation of the virtual machine operating system is complete;
building the monitoring service into the virtual machine image based on the container image.
It can be understood that the monitoring service is deployed quickly on the basis of the virtual machine image: a container image of the monitoring service is constructed; the virtual machine operating system is installed; the monitoring service container image is loaded, so that the monitoring service is built into the virtual machine image in the form of a container image; and the virtual machine image is packaged and compressed. A user then only needs to upload the virtual machine image to deploy the monitoring service quickly in the form of a virtual machine.
In step 100, deploying the monitoring service in the form of a virtual machine further includes: pre-configuring a lifecycle management service; when the virtual machine is started, invoking the lifecycle management service to control the state of the virtual machine and of the containers inside it through a periodic task. FIG. 2 illustrates a schematic diagram of the architecture of a virtual machine lifecycle management service that implements the basic lifecycle functions, according to some embodiments of the present application. The specific steps for controlling the state of the virtual machine and of the containers inside it through the periodic task are described in detail below.
FIG. 3 illustrates a flow diagram of a method of controlling virtual machines and container states within virtual machines, according to some embodiments of the present application. As shown in fig. 3, invoking the lifecycle management service to control the virtual machine and the container state inside the virtual machine through the periodic tasks may include:
Step 001: acquiring the real-time state and the expected state of the virtual machine in the current check period;
Step 002: determining a virtual machine operation instruction for the current check period based on the real-time state and the expected state of the virtual machine, wherein the virtual machine operation instruction is at least one of: creating the virtual machine; starting the virtual machine and starting the containers; shutting down the virtual machine and deleting the virtual machine; or a null value;
Step 003: executing the virtual machine operation instruction to control the state of the virtual machine and of the containers inside it.
Further, in step 002, when the virtual machine operation instruction is null, step 001 is repeated; that is, the method waits for the next check period and determines the virtual machine operation instruction again.
In some embodiments of the present application, the storage of the virtual machine state in the database may include the following states:
INIT (initialization), CREATING, STOPPED, STARTING, DELETING, RUNNING, STOPPING, etc.
It can be understood that the real-time current state and the expected state of the virtual machine in the current check period are acquired from the database for storing the states of the virtual machines based on the life cycle management service, the operation executed by the virtual machine in the current check period can be set according to the change trend of the states, and the high availability of the monitoring service virtual machine is ensured through the high availability function of the virtual machine provided by the hyper-fusion cluster.
The monitoring service virtual machine guarantees normal operation of monitoring related containers in the virtual machine in a state machine mode.
For example, if the virtual machine does not exist but the expected state in the database is RUNNING, the virtual machine is created;
if the virtual machine is in the STOPPED state and the expected state is RUNNING, the operation instruction to start the virtual machine and start the containers is executed;
if the virtual machine is in the RUNNING state and the expected state is DELETED, the virtual machine is shut down and deleted;
and if the virtual machine is in any other state, no virtual machine operation instruction is issued for the current check period; the period is skipped, and in the next check period the real-time state and the expected state of the virtual machine are acquired again and the corresponding operation instruction is determined, so that the monitoring service virtual machine remains continuously in a highly available state.
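The decision made in each check period can be sketched as a small pure function. The instruction names below are hypothetical labels for the operations described above, and `None` stands both for "virtual machine does not exist" (as the current state) and for the null instruction (as the result); the application does not prescribe these representations.

```python
from typing import Optional

# Hypothetical instruction names; the application only describes the operations.
CREATE_VM = "create_vm"
START_VM_AND_CONTAINERS = "start_vm_and_containers"
STOP_AND_DELETE_VM = "stop_and_delete_vm"


def decide_instruction(current: Optional[str], expected: str) -> Optional[str]:
    """Map (real-time state, expected state) to one operation instruction.

    Returns None (the null instruction) when nothing should be done in this
    check period, in which case the caller waits for the next period.
    """
    if current is None and expected == "RUNNING":
        return CREATE_VM
    if current == "STOPPED" and expected == "RUNNING":
        return START_VM_AND_CONTAINERS
    if current == "RUNNING" and expected == "DELETED":
        return STOP_AND_DELETE_VM
    # Any other combination, e.g. a transitional state such as STARTING.
    return None
```

A periodic task would call `decide_instruction` once per check period with the states read from the database and execute whatever instruction comes back, which keeps the decision logic separate from the side effects of actually operating on the virtual machine.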
In some embodiments of the present application, the lifecycle management service may also implement abnormal state detection and recovery of the monitoring service, and a specific implementation manner may be implemented by a person skilled in the art according to the detection and recovery function of the existing lifecycle management service, which is not limited herein.
In some embodiments of the present application, the lifecycle management service is also responsible for discovery of the monitoring service and for proxying monitoring requests, such as data query requests, so as to enable communication with each service within the hyper-converged cluster.
In some embodiments of the present application, establishing communication between the virtual machine and the hyper-converged cluster may further include:
configuring an internal virtual bridge on a node of the hyper-converged cluster, wherein the internal virtual bridge is preset with a static IP;
implementing communication between the virtual machine on which the monitoring service is deployed and other services of the hyper-converged cluster through the static IP;
wherein the virtual machine is provided with a network card connected to the internal virtual bridge of the host.
It can be understood that the monitoring service in the form of a virtual machine, i.e. the monitoring service virtual machine, runs only monitoring-related services and still depends on related services in the hyper-converged cluster to execute alarm sending instructions, monitoring data query instructions and the like; the monitoring service virtual machine therefore needs to communicate with other services in the cluster through a fixed static IP.
It can be understood that an internal virtual bridge that is not associated with any physical port is configured on the hyper-converged cluster node. The internal virtual bridge is configured with a fixed static IP (e.g. 169.254.169.254) and runs a DHCP service. An additional network card is configured for the monitoring service virtual machine and connected to the internal virtual bridge of the host, so the virtual machine automatically obtains another IP in the same network segment as that static IP. Through this additional IP the virtual machine can access each related service on the host in the hyper-converged cluster, which realizes communication between the monitoring service virtual machine and the hyper-converged cluster.
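The "same network segment" condition can be checked with Python's standard `ipaddress` module, as in the sketch below. The bridge address is the example from the text; the `/24` prefix length and the function name `in_bridge_segment` are assumptions, because the application does not state the subnet mask.

```python
import ipaddress

# Static IP of the internal virtual bridge, taken from the example in the text.
BRIDGE_IP = ipaddress.ip_address("169.254.169.254")
# Assumed prefix length; the application does not state the subnet mask.
SEGMENT = ipaddress.ip_network("169.254.169.0/24")


def in_bridge_segment(vm_ip: str) -> bool:
    """True if a DHCP-assigned VM address shares the bridge's network segment."""
    addr = ipaddress.ip_address(vm_ip)
    # The VM must get a different address than the bridge itself.
    return addr in SEGMENT and addr != BRIDGE_IP
```

For instance, under the assumed prefix, `in_bridge_segment("169.254.169.10")` holds while an address such as `10.0.0.5` falls outside the segment and could not reach the bridge directly.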
In some embodiments of the present application, in the case of an update of a configuration file in a distributed database, the configuration file is updated in a hot reload manner. It can be understood that the latest configuration file can be applied by means of hot loading when the configuration file changes, and the monitoring system does not need to be restarted to avoid interruption of monitoring data acquisition.
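One way to picture hot reloading is a holder object that swaps the active configuration in place when a newer version appears, so readers never see a restart. The sketch below is illustrative only: the class `HotReloadingConfig`, the version-number scheme and the `fetch` callable standing in for a read from the distributed database are all assumptions.

```python
import threading
from typing import Callable, Dict, Tuple


class HotReloadingConfig:
    """Holds the active configuration and swaps it when the source changes."""

    def __init__(self, fetch: Callable[[], Tuple[int, Dict]]):
        # fetch() is a hypothetical callable returning (version, config),
        # standing in for a read from the distributed database.
        self._fetch = fetch
        self._lock = threading.Lock()
        self._version, self._config = fetch()

    def current(self) -> Dict:
        with self._lock:
            return self._config

    def poll_once(self) -> bool:
        """Apply a newer configuration if one exists; return True if reloaded."""
        version, config = self._fetch()
        with self._lock:
            if version == self._version:
                return False
            self._version, self._config = version, config
            return True


# An in-memory dict plays the role of the database in this sketch.
store = {"version": 1, "config": {"scrape_interval_seconds": 30}}
cfg = HotReloadingConfig(lambda: (store["version"], store["config"]))
store.update(version=2, config={"scrape_interval_seconds": 10})
reloaded = cfg.poll_once()
```

Because collectors read the configuration through `current()` on every cycle, the new settings take effect on the next cycle without stopping data acquisition.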
In some embodiments of the present application, serving the corresponding data query and/or alarm requirements according to the configuration file and the monitoring data includes: delivering the results of the query and/or alarm service through an application layer protocol, an e-mail system and web page messages. It will be appreciated that a variety of ways of sending alarms and querying monitoring data may be flexibly configured as desired, such as the SNMP protocol, e-mail, web page messages, and the like.
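Configurable delivery channels can be sketched as a dispatcher that fans one alarm out to whichever channels the configuration enables. The channel names and the helper `make_dispatcher` are hypothetical, and the senders here only record messages; real SNMP, mail or web delivery is outside the scope of this sketch.

```python
from typing import Callable, Dict, List

Channel = Callable[[str], None]


def make_dispatcher(channels: Dict[str, Channel],
                    enabled: List[str]) -> Callable[[str], List[str]]:
    """Build a function that sends one alarm message to every enabled channel."""
    def dispatch(message: str) -> List[str]:
        delivered = []
        for name in enabled:
            sender = channels.get(name)
            if sender is None:
                continue  # Channel names with no registered sender are ignored.
            sender(message)
            delivered.append(name)
        return delivered
    return dispatch


# Stand-in senders that only record messages in per-channel lists.
outbox: Dict[str, List[str]] = {"snmp": [], "mail": [], "web": []}
channels = {name: outbox[name].append for name in outbox}
dispatch = make_dispatcher(channels, enabled=["mail", "web", "sms"])
delivered = dispatch("cpu_usage above threshold on node A")
```

Adding a new delivery method then only requires registering another sender and listing its name in the configuration.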
Based on the description of the foregoing embodiment, the technical solution provided by the present invention can implement the monitoring method for the super-fusion cluster. The following provides a detailed description of a specific application example for implementing data query and/or monitoring alarm functions.
FIG. 4 is a diagram illustrating a super-converged cluster monitoring data query and/or monitoring alarm implementation process, according to some embodiments of the present application.
Taking node A, node B and node C of the hyper-converged cluster as an example, a data acquisition interface (exporter) and a configuration file are provided for the nodes. The monitoring service is responsible for the following: acquiring monitoring data from each node in the cluster according to the configuration file and aggregating it in the manner the configuration file specifies; watching the configuration files registered in the distributed database for updates and, when one is updated, putting the latest configuration file into effect by hot reloading without interrupting the service; evaluating, according to the configuration file, whether any alarm has currently reached its trigger threshold; providing the user with multiple ways of querying monitoring data; and sending alarms in the triggered state to the user through multiple channels.
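The collection and aggregation part of this example can be sketched as follows. Each exporter is modeled as a plain callable returning that node's metrics; this modeling, the metric name and the aggregation modes (`avg`, `max`, `sum`) are assumptions for illustration, not details given in the application.

```python
from statistics import mean
from typing import Callable, Dict, List

# Hypothetical per-node exporters; each returns that node's current metrics.
Exporter = Callable[[], Dict[str, float]]


def collect_and_aggregate(exporters: Dict[str, Exporter],
                          mode: str = "avg") -> Dict[str, float]:
    """Collect metrics from every node and reduce them per metric name."""
    samples: Dict[str, List[float]] = {}
    for exporter in exporters.values():
        for metric, value in exporter().items():
            samples.setdefault(metric, []).append(value)
    # The aggregation mode would come from the configuration file.
    reducers = {"avg": mean, "max": max, "sum": sum}
    reduce_values = reducers[mode]
    return {metric: reduce_values(values) for metric, values in samples.items()}


exporters = {
    "A": lambda: {"cpu_usage": 0.25},
    "B": lambda: {"cpu_usage": 0.5},
    "C": lambda: {"cpu_usage": 0.75},
}
average = collect_and_aggregate(exporters)
peak = collect_and_aggregate(exporters, mode="max")
```

Changing where the data comes from then amounts to changing the `exporters` mapping, which reflects the decoupling between the monitoring service and the monitoring data provider.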
In some embodiments of the present invention, Fig. 5 provides a schematic structural diagram of a hyper-converged cluster monitoring system, which is applied to the hyper-converged cluster monitoring method provided in the foregoing embodiments. Specifically, as shown in Fig. 5, the hyper-converged cluster monitoring system may include:
the deployment module 1, used for deploying the monitoring service in the form of a virtual machine, wherein the monitoring service is built into a virtual machine image;
the acquisition module 2, used for presetting a corresponding configuration file according to a monitoring policy once deployment of the monitoring service is complete;
the monitoring module 3, used for monitoring each node in the hyper-converged cluster according to the acquired configuration file, acquiring the monitoring data corresponding to the monitoring policy, and aggregating the monitoring data;
and the processing module 4, used for establishing communication between the virtual machine and the hyper-converged cluster, and providing the corresponding data query and/or alarm services according to the configuration file and the monitoring data.
It can be understood that, among the above functional modules, the functions implemented by the deployment module 1 through the processing module 4 correspond one-to-one to the operations executed in the foregoing steps 100 to 400, and are not described again here.
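The collect-and-aggregate behaviour of the monitoring module can be sketched as below. This is a minimal sketch under stated assumptions: the exporters are simulated by plain callables, and the metric names and the aggregation modes (`sum`, `avg`, `max`) are hypothetical examples of what a configuration file might specify, not details given in the patent.

```python
from statistics import mean

# Hypothetical aggregation modes that a configuration file might name.
AGGREGATORS = {"sum": sum, "avg": mean, "max": max}


def collect(exporters, metric_names):
    """Call each node's exporter and keep only the configured metrics."""
    wanted = list(metric_names)
    collected = {}
    for node, exporter in exporters.items():
        sample = exporter()
        collected[node] = {m: sample[m] for m in wanted if m in sample}
    return collected


def aggregate(collected, rules):
    """Combine per-node values of each metric using the mode named in rules."""
    result = {}
    for metric, mode in rules.items():
        values = [s[metric] for s in collected.values() if metric in s]
        if values:  # skip metrics that no node reported
            result[metric] = AGGREGATORS[mode](values)
    return result


# Simulated exporters for nodes A, B, and C (values are made up).
exporters = {
    "node-a": lambda: {"disk_used_gib": 100, "cpu_usage": 0.5},
    "node-b": lambda: {"disk_used_gib": 200, "cpu_usage": 0.7},
    "node-c": lambda: {"disk_used_gib": 300, "cpu_usage": 0.3},
}
rules = {"disk_used_gib": "sum", "cpu_usage": "max"}

per_node = collect(exporters, rules.keys())
cluster_view = aggregate(per_node, rules)
```

Keeping collection and aggregation as separate functions mirrors the description, in which the configuration file determines both what is collected and how it is combined.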
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
It should be understood that aspects of the present technology may be implemented as a system, method, or program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining software and hardware aspects, all of which may generally be referred to herein as a "circuit," "module," or "platform."
Those skilled in the art will understand that the units, modules, or steps of the present invention described above may be implemented by a general-purpose computing device. They may be centralized in a single computing device or distributed across a network of multiple computing devices. Alternatively, they may be implemented as program code executable by a computing device, so that they can be stored in a storage medium and executed by the computing device. In some cases, the steps shown or described may be executed in an order different from that described herein; the units, modules, or steps may also be fabricated as separate integrated circuit modules, or several of them may be fabricated as a single integrated circuit module.
Although this embodiment does not exhaustively list other specific implementations, in some possible implementations the aspects described in the technical solution of the present invention can also be implemented in the form of a program product that includes program code. When the program product runs on a terminal device, the program code causes the terminal device to perform the steps of the hyper-converged cluster monitoring method according to the various embodiments of the present invention.
Having described embodiments of the present disclosure, the foregoing description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application, or their technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (10)

1. A high-availability hyper-converged cluster monitoring method, characterized by comprising the following steps:
deploying a monitoring service in the form of a virtual machine, wherein the monitoring service is built into a virtual machine image;
presetting a corresponding configuration file according to a monitoring policy once deployment of the monitoring service is complete;
monitoring each node in the hyper-converged cluster according to the acquired configuration file, acquiring the monitoring data corresponding to the monitoring policy, and aggregating the monitoring data;
and establishing communication between the virtual machine and the hyper-converged cluster, and providing the corresponding data query and/or alarm services according to the configuration file and the monitoring data.
2. The high-availability hyper-converged cluster monitoring method according to claim 1, wherein building the monitoring service into a virtual machine image comprises:
constructing a container image of the monitoring service;
loading the container image of the monitoring service when installation of the virtual machine operating system is complete;
and building the monitoring service into the virtual machine image based on the container image.
3. The high-availability hyper-converged cluster monitoring method according to claim 1, wherein deploying the monitoring service in the form of a virtual machine further comprises:
pre-configuring a lifecycle management service;
and, when the virtual machine is started, invoking the lifecycle management service to control the state of the virtual machine and of the container inside the virtual machine through a periodic task.
4. The high-availability hyper-converged cluster monitoring method according to claim 3, wherein invoking the lifecycle management service to control the state of the virtual machine and of the container inside the virtual machine through the periodic task comprises:
acquiring the real-time state of the virtual machine and the expected state of the virtual machine in the current check period;
determining a virtual machine operation instruction for the current check period based on the real-time state of the virtual machine and the expected state of the virtual machine, wherein the virtual machine operation instruction comprises at least one of creating the virtual machine, starting the virtual machine, starting a container, shutting down the virtual machine, deleting the virtual machine, or a null value;
and executing the virtual machine operation instruction to control the state of the virtual machine and of the container inside the virtual machine.
5. The high-availability hyper-converged cluster monitoring method according to claim 4, wherein, in a case where the virtual machine operation instruction is null, the method waits for the next check period and then re-determines the virtual machine operation instruction.
6. The high-availability hyper-converged cluster monitoring method according to claim 1, wherein establishing communication between the virtual machine and the hyper-converged cluster comprises:
configuring an internal virtual bridge on the nodes of the hyper-converged cluster, wherein the internal virtual bridge is preset with a static IP;
the virtual machine on which the monitoring service is deployed communicates with the hyper-converged cluster through the static IP;
and the virtual machine is provided with a network interface card connected to the internal virtual bridge of the host.
7. The high-availability hyper-converged cluster monitoring method according to claim 1, wherein monitoring each node in the hyper-converged cluster according to the acquired configuration file, acquiring the monitoring data corresponding to the monitoring policy, and aggregating the monitoring data comprises:
setting a data acquisition interface at a node of the hyper-converged cluster, wherein the data acquisition interface is used for acquiring the monitoring data;
registering the configuration file in a distributed database, wherein the configuration file comprises at least the data acquisition interface, a data acquisition configuration, and alarm rules;
and collecting the monitoring data corresponding to the monitoring policy through the data acquisition interface, and aggregating the monitoring data.
8. The high-availability hyper-converged cluster monitoring method according to claim 1, further comprising:
updating the configuration file by hot reloading in a case where the configuration file in the distributed database is updated.
9. The high-availability hyper-converged cluster monitoring method according to claim 1, wherein providing the corresponding data query and/or alarm services according to the configuration file and the monitoring data comprises:
delivering the results of the data query and/or alarm service through an application-layer protocol, an e-mail system, and web page messages.
10. A high-availability hyper-converged cluster monitoring system, applied to the high-availability hyper-converged cluster monitoring method according to any one of claims 1 to 9, comprising:
the deployment module, used for deploying a monitoring service in the form of a virtual machine, wherein the monitoring service is built into a virtual machine image;
the acquisition module, used for presetting a corresponding configuration file according to a monitoring policy once deployment of the monitoring service is complete;
the monitoring module, used for monitoring each node in the hyper-converged cluster according to the acquired configuration file, acquiring the monitoring data corresponding to the monitoring policy, and aggregating the monitoring data;
and the processing module, used for establishing communication between the virtual machine and the hyper-converged cluster, and providing the corresponding data query and/or alarm services according to the configuration file and the monitoring data.
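
As an illustration only, the periodic lifecycle check recited in claims 3 to 5 can be sketched as a small state reconciler. The state names, the ordering of states, and the transition tables are assumptions made for this sketch; the claims themselves only name the operations (create, start, start container, shut down, delete, or null).

```python
from enum import Enum


class VmState(Enum):
    ABSENT = 0    # virtual machine does not exist
    STOPPED = 1   # virtual machine exists but is powered off
    RUNNING = 2   # virtual machine is up, container not started
    SERVING = 3   # virtual machine is up and the container is running


class Op(Enum):
    CREATE_VM = "create_vm"
    START_VM = "start_vm"
    START_CONTAINER = "start_container"
    SHUTDOWN_VM = "shutdown_vm"
    DELETE_VM = "delete_vm"


# Hypothetical transition tables: state -> (operation, resulting state).
UP = {
    VmState.ABSENT: (Op.CREATE_VM, VmState.STOPPED),
    VmState.STOPPED: (Op.START_VM, VmState.RUNNING),
    VmState.RUNNING: (Op.START_CONTAINER, VmState.SERVING),
}
DOWN = {
    VmState.SERVING: (Op.SHUTDOWN_VM, VmState.STOPPED),
    VmState.RUNNING: (Op.SHUTDOWN_VM, VmState.STOPPED),
    VmState.STOPPED: (Op.DELETE_VM, VmState.ABSENT),
}


def next_operation(current, desired):
    """Return (operation, resulting state), or None for the null instruction."""
    if current is desired:
        return None
    table = UP if current.value < desired.value else DOWN
    return table[current]


def reconcile(current, desired, max_periods=10):
    """Simulate successive check periods until the states agree."""
    ops = []
    for _ in range(max_periods):
        step = next_operation(current, desired)
        if step is None:
            # Null instruction: a real service would wait for the next
            # check period here; the simulation simply stops.
            break
        op, current = step
        ops.append(op)
    return ops, current


bring_up = reconcile(VmState.ABSENT, VmState.SERVING)
tear_down = reconcile(VmState.SERVING, VmState.ABSENT)
```

Each check period issues at most one operation, so a failed step is simply retried in the next period because the observed state will not have changed.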

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211331535.8A CN115766715B (en) 2022-10-28 2022-10-28 Super-fusion cluster monitoring method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211331535.8A CN115766715B (en) 2022-10-28 2022-10-28 Super-fusion cluster monitoring method and system

Publications (2)

Publication Number Publication Date
CN115766715A true CN115766715A (en) 2023-03-07
CN115766715B CN115766715B (en) 2024-01-30

Family

ID=85354770

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211331535.8A Active CN115766715B (en) 2022-10-28 2022-10-28 Super-fusion cluster monitoring method and system

Country Status (1)

Country Link
CN (1) CN115766715B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117170985A (en) * 2023-11-02 2023-12-05 武汉大学 Distributed monitoring method and system for open geographic information network service

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110161724A1 (en) * 2009-12-25 2011-06-30 Canon Kabushiki Kaisha Data management apparatus, monitoring apparatus, replica apparatus, cluster system, control method and computer-readable medium
CN103368785A (en) * 2012-04-09 2013-10-23 鸿富锦精密工业(深圳)有限公司 Server operation monitoring system and method
CN103957237A (en) * 2014-04-03 2014-07-30 华南理工大学 Architecture of elastic cloud
CN107122228A (en) * 2017-04-18 2017-09-01 北京华云网际科技有限公司 Deployment method and device for a management platform of a hyper-converged system

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117170985A (en) * 2023-11-02 2023-12-05 武汉大学 Distributed monitoring method and system for open geographic information network service
CN117170985B (en) * 2023-11-02 2024-01-12 武汉大学 Distributed monitoring method and system for open geographic information network service

Also Published As

Publication number Publication date
CN115766715B (en) 2024-01-30

Similar Documents

Publication Publication Date Title
JP7362650B2 (en) Task processing methods, devices and systems
CN111277460B (en) ZooKeeper containerization control method and device, storage medium and electronic equipment
CN110830283B (en) Fault detection method, device, equipment and system
CN104486108A (en) Node configuration method based on Zookeeper and node configuration system based on Zookeeper
CN115248826B (en) Method and system for large-scale distributed graph database cluster operation and maintenance management
CN111090495A (en) Node management method, device, equipment, storage medium and system
EP4030690A1 (en) Device management method, apparatus, and system
CN115766715B (en) Super-fusion cluster monitoring method and system
CN112583630B (en) Device management method, device, system, device and storage medium
CN114629883B (en) Service request processing method and device, electronic equipment and storage medium
CN117130730A (en) Metadata management method for federal Kubernetes cluster
CN112087516A (en) Storage upgrading method and device based on Docker virtualization technology
CN112199240A (en) Method for switching nodes during node failure and related equipment
CN114185734A (en) Cluster monitoring method and device and electronic equipment
CN112231123A (en) Message processing method, message processing device, storage medium and electronic device
CN113765690A (en) Cluster switching method, system, device, terminal, server and storage medium
CN109510730A (en) Distributed system and its monitoring method, device, electronic equipment and storage medium
US9973569B2 (en) System, method and computing apparatus to manage process in cloud infrastructure
CN115981670A (en) Container cluster service deployment method, device, server and storage medium
CN111309515A (en) Disaster recovery control method, device and system
CN110650059B (en) Fault cluster detection method, device, computer equipment and storage medium
CN113157493A (en) Backup method, device and system based on ticket checking system and computer equipment
CN112714035A (en) Monitoring method and system
CN115905271B (en) Virus library updating method and device and multi-engine detection system
CN116820686B (en) Physical machine deployment method, virtual machine and container unified monitoring method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant