CN116302896A - Cluster performance monitoring method, device, apparatus, storage medium and program product

Info

Publication number: CN116302896A
Application number: CN202310539636.2A
Authority: CN
Inventors: 尤明超; 何宏烨; 刘轶伦; 何嘉珉
Original assignee: Industrial and Commercial Bank of China Ltd ICBC
Current assignee: Industrial and Commercial Bank of China Ltd ICBC
Priority date: 2023-05-12
Filing date: 2023-05-12
Publication date: 2023-06-23

Abstract

The application relates to a cluster performance monitoring method, apparatus, device, storage medium and program product, and relates to the technical field of big data. The method comprises the following steps: acquiring attribute information of each cluster and of the cluster nodes in each cluster, and historical performance indexes of each cluster and of the cluster nodes in each cluster in a preset time period; determining first performance anomaly data of each cluster and of the cluster nodes in each cluster according to the historical performance indexes in the preset time period; determining second performance anomaly data of each cluster and of the cluster nodes in each cluster according to the attribute information; and determining a performance anomaly detection result of each cluster and of the cluster nodes in each cluster according to the first performance anomaly data and the second performance anomaly data, the performance anomaly detection result being used for regulating and controlling each cluster and the cluster nodes in each cluster to achieve load balancing. By adopting the method, load balancing among the cluster nodes can be ensured.

Description

Cluster performance monitoring method, device, apparatus, storage medium and program product

Technical Field

The present invention relates to the field of big data technologies, and in particular, to a method, an apparatus, a device, a storage medium, and a program product for monitoring cluster performance.

Background

With the rapid development of the field of computer technology, computers are becoming indispensable devices in various industries. The IT (Information Technology) industry typically employs distributed high-availability clusters to support continuous normal operation of computers, thereby achieving the goal of reducing service interruption time.

At present, when a distributed high-availability cluster is built in the IT industry, the build is usually completed regardless of cost, so that the nodes of the clusters can automatically switch over and take over services and service continuity is guaranteed. Consequently, when a cluster built in this way is put into operation, the load of the clusters or of the cluster nodes may be unbalanced. A solution to this problem is needed.

Disclosure of Invention

In view of the foregoing, it is desirable to provide a cluster performance monitoring method, apparatus, device, storage medium, and program product that can ensure load balancing among cluster nodes.

In a first aspect, the present application provides a cluster performance monitoring method. The method comprises the following steps:

acquiring attribute information of each cluster and cluster nodes in each cluster, and historical performance indexes of each cluster and cluster nodes in each cluster in a preset time period;

according to historical performance indexes of each cluster and cluster nodes in each cluster in a preset time period, determining first performance abnormal data of each cluster and cluster nodes in each cluster;

determining second performance abnormality data of each cluster and cluster nodes in each cluster according to attribute information of each cluster and cluster nodes in each cluster;

determining each cluster and a performance abnormality detection result of cluster nodes in each cluster according to the first performance abnormality data and the second performance abnormality data; the performance abnormality detection result is used for regulating and controlling each cluster and cluster nodes in each cluster to achieve load balancing.

In one embodiment, determining the performance anomaly detection result of each cluster and the cluster nodes in each cluster according to the first performance anomaly data and the second performance anomaly data includes:

screening candidate performance anomaly data falling in the second performance anomaly data from the first performance anomaly data;

and determining the performance abnormality detection results of each cluster and cluster nodes in each cluster according to the candidate performance abnormality data.

In one embodiment, determining the performance anomaly detection result of each cluster and the cluster nodes in each cluster according to the candidate performance anomaly data includes:

judging whether the duration and the abnormal severity of the candidate performance abnormal data exceed preset requirements;

if yes, determining that the performance abnormality detection results of all clusters and cluster nodes in all clusters are clusters and/or cluster node abnormalities corresponding to the candidate performance abnormality data.

In one embodiment, determining first performance anomaly data of each cluster and cluster nodes in each cluster according to historical performance indexes of each cluster and cluster nodes in each cluster in a preset time period includes:

processing historical performance indexes of each cluster and cluster nodes in each cluster in a preset time period through an operation trend prediction model to obtain initial performance abnormal data of each cluster and cluster nodes in each cluster;

and screening the first performance anomaly data from the initial performance anomaly data according to the anomaly severity of the initial performance anomaly data through a data mining model.

In one embodiment, processing the historical performance indexes of each cluster and cluster nodes in each cluster in a preset time period through the operation trend prediction model includes:

performing time-series preprocessing on the historical performance indexes of each cluster and cluster nodes in each cluster in a preset time period to obtain processed historical performance indexes;

and processing the processed historical performance indexes through an operation trend prediction model.

In one embodiment, obtaining attribute information of each cluster and cluster nodes in each cluster, and historical performance indexes of each cluster and cluster nodes in each cluster in a preset time period includes:

acquiring attribute information of each cluster and cluster nodes in each cluster through monitoring tools deployed on each cluster;

and acquiring historical performance indexes of each cluster and cluster nodes in each cluster in a preset time period through a coordination tool deployed on each cluster.

In one embodiment, the obtaining, by a coordination tool deployed on each cluster, historical performance indexes of each cluster and cluster nodes in each cluster in a preset time period includes:

acquiring hardware resource performance data of virtual machines installed by cluster nodes in each cluster through a coordination tool deployed on each cluster, and taking the hardware resource performance data as historical performance indexes of each cluster in a preset time period;

transaction performance data recorded in hosts to which the cluster nodes in the clusters belong are obtained through coordination tools deployed on the clusters to serve as historical performance indexes of the cluster nodes in the clusters in a preset time period.

In a second aspect, the present application further provides a cluster performance monitoring apparatus. The device comprises:

the data acquisition module is used for acquiring attribute information of each cluster and cluster nodes in each cluster and historical performance indexes of each cluster and cluster nodes in each cluster in a preset time period;

the first determining module is used for determining first performance abnormality data of each cluster and cluster nodes in each cluster according to historical performance indexes of each cluster and cluster nodes in each cluster in a preset time period;

the second determining module is used for determining second performance abnormal data of each cluster and cluster nodes in each cluster according to the attribute information of each cluster and the cluster nodes in each cluster;

the detection result determining module is used for determining the performance abnormality detection results of each cluster and cluster nodes in each cluster according to the first performance abnormality data and the second performance abnormality data; the performance abnormality detection result is used for regulating and controlling each cluster and cluster nodes in each cluster to achieve load balancing.

In a third aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor which, when executing the computer program, performs the steps of the cluster performance monitoring method of the first aspect.

In a fourth aspect, the present application also provides a computer-readable storage medium. The computer-readable storage medium has stored thereon a computer program which, when executed by a processor, performs the steps of the cluster performance monitoring method of the first aspect.

In a fifth aspect, the present application also provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, performs the steps of the cluster performance monitoring method of the first aspect.

The cluster performance monitoring method, apparatus, device, storage medium and program product first acquire attribute information of each cluster and of the cluster nodes in each cluster, and historical performance indexes of each cluster and of the cluster nodes in each cluster in a preset time period. First performance anomaly data is then determined from the historical performance indexes and second performance anomaly data from the attribute information, and the two are combined to determine a more accurate performance anomaly detection result. Because the first performance anomaly data and the second performance anomaly data are derived from information of different dimensions about each cluster and the cluster nodes in each cluster, a detection result determined from both together is more accurate. The load strategy for adjusting the clusters and cluster nodes that is determined according to the performance anomaly detection result is therefore also more accurate, so that the clusters and the cluster nodes in each cluster better achieve load balancing.

Drawings

Fig. 1 is an application environment diagram of a cluster performance monitoring method provided in this embodiment;

Fig. 2 is a flow chart of a first cluster performance monitoring method provided in the present embodiment;

Fig. 3 is a flowchart illustrating a step of determining abnormal performance detection results of each cluster and cluster nodes in each cluster according to the present embodiment;

Fig. 4 is a flow chart of a second cluster performance monitoring method according to the present embodiment;

Fig. 5 is a block diagram of a first cluster performance monitoring device according to the present embodiment;

Fig. 6 is a block diagram of a second cluster performance monitoring apparatus according to the present embodiment;

Fig. 7 is a block diagram of a third cluster performance monitoring apparatus according to the present embodiment;

Fig. 8 is a block diagram of a fourth cluster performance monitoring apparatus according to the present embodiment;

Fig. 9 is an internal structure diagram of a computer device according to the present embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.

The cluster performance monitoring method provided by the embodiment of the application can be applied to an application environment shown in fig. 1. In one embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in FIG. 1. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is used for storing relevant data for cluster performance monitoring. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by a processor, implements a cluster performance monitoring method.

In one embodiment, as shown in fig. 2, a cluster performance monitoring method is provided. The method is described by taking its application to the computer device in fig. 1 as an example, and includes the following steps:

S201, obtaining attribute information of each cluster and cluster nodes in each cluster, and historical performance indexes of each cluster and cluster nodes in each cluster in a preset time period.

Each cluster may be a distributed cluster; in this embodiment, the clusters may be at least two clusters that synchronize data with one another. A data synchronization technology may be adopted among the clusters: the information of the at least two clusters and of the cluster nodes in them is monitored and obtained in real time through a distributed coordination service, and requests are then directed to the corresponding data service cluster according to the information obtained in real time. The attribute information of each cluster and of the cluster nodes in each cluster includes at least state information and IP (Internet Protocol) address information. The historical performance indexes of each cluster and of the cluster nodes in each cluster include at least performance indexes such as memory, disk I/O (Input/Output) and TCP traffic conditions.

In this embodiment, the attribute information and the historical performance indexes of each cluster and of the cluster nodes in each cluster may be obtained in a number of ways. In one implementation, the attribute information of each cluster node is obtained from the device attribute information of the device to which the cluster node belongs, and the attribute information of each cluster is determined from the attribute information of its cluster nodes. For example, the device attribute information of the device to which a cluster node belongs is used as the attribute information of that cluster node; the attribute information of the cluster nodes of the same cluster is then analyzed, and the attribute information shared by all of those cluster nodes is taken as the attribute information of the cluster. For the historical performance indexes, the performance index data of each cluster and of the cluster nodes in each cluster during data synchronization within the preset time period are collected, and the historical performance indexes in the preset time period are then determined from the collected performance index data according to a predetermined historical performance index determining strategy.
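The rule that a cluster's attribute information is whatever its cluster nodes share can be sketched as a set intersection over key/value pairs. This is a minimal illustration only: the function name `shared_attributes`, the attribute keys (`state`, `ip`, `subnet`) and the sample values are assumptions, not part of the application.

```python
from typing import Dict, List


def shared_attributes(node_attrs: List[Dict[str, str]]) -> Dict[str, str]:
    """Return the key/value pairs that every node of one cluster has in common."""
    if not node_attrs:
        return {}
    # Start from the first node's pairs and keep only pairs present on every other node.
    common = set(node_attrs[0].items())
    for attrs in node_attrs[1:]:
        common &= set(attrs.items())
    return dict(common)


# Hypothetical attribute records for two nodes of the same cluster.
nodes = [
    {"state": "running", "ip": "10.0.0.1", "subnet": "10.0.0.0/24"},
    {"state": "running", "ip": "10.0.0.2", "subnet": "10.0.0.0/24"},
]
cluster_attrs = shared_attributes(nodes)
```

Here the per-node IP addresses differ and drop out, while the state and subnet that all nodes share become the cluster-level attributes.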

In another preferred embodiment, the attribute information of each cluster and of the cluster nodes in each cluster is obtained through a monitoring tool deployed on each cluster, and the historical performance indexes of each cluster and of the cluster nodes in each cluster in the preset time period are obtained through a coordination tool deployed on each cluster. The monitoring tool may be a tool deployed on each cluster in advance for monitoring the attribute information of the cluster and of the cluster nodes in the cluster. The coordination tool may be a distributed coordination service tool deployed on each cluster in advance for acquiring the performance index data of each cluster and of the cluster nodes in each cluster. In this embodiment, the attribute information and the historical performance indexes are thus obtained through different tools deployed on each cluster. Because each tool is specialized for the data it collects, the acquired attribute information and historical performance indexes are more accurate.

Further, in one embodiment, the historical performance indexes of the clusters and those of the cluster nodes are obtained in different ways. Specifically, hardware resource performance data of the virtual machines installed on the cluster nodes in each cluster are obtained through the coordination tool deployed on each cluster and used as the historical performance indexes of each cluster in the preset time period; transaction performance data recorded in the hosts to which the cluster nodes in each cluster belong are obtained through the coordination tool deployed on each cluster and used as the historical performance indexes of the cluster nodes in each cluster in the preset time period. The hardware resource performance data may include performance indexes such as central processing unit (CPU), memory, disk I/O and Transmission Control Protocol (TCP) traffic conditions of the virtual machines installed on the cluster nodes in each cluster. The transaction performance data may include performance indexes such as database transaction rate, response time and disk I/O of each database deployed in the host. In this embodiment, the coordination tool deployed on each cluster obtains the historical performance indexes from the virtual machines and from the hosts respectively, so that the historical performance indexes are obtained in more varied and more specialized ways, and the obtained historical performance indexes are more accurate.
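To make the two kinds of historical performance index concrete, the following sketch shows one possible record layout for them, plus a filter for the preset time period. The class names, field names and units are hypothetical; the application names only the categories of data (CPU, memory, disk I/O, TCP traffic; transaction rate, response time, disk I/O).

```python
from dataclasses import dataclass
from typing import List, TypeVar


@dataclass
class HardwareSample:
    """One virtual-machine reading, used as a cluster-level historical index."""
    timestamp: int            # seconds (assumed unit)
    cpu_percent: float
    memory_percent: float
    disk_io_mbps: float
    tcp_connections: int      # stand-in for "TCP traffic condition" (assumption)


@dataclass
class TransactionSample:
    """One host database reading, used as a node-level historical index."""
    timestamp: int            # seconds (assumed unit)
    transactions_per_sec: float
    response_time_ms: float
    disk_io_mbps: float


S = TypeVar("S", HardwareSample, TransactionSample)


def within_period(samples: List[S], start: int, end: int) -> List[S]:
    """Keep only the samples inside the preset time period [start, end]."""
    return [s for s in samples if start <= s.timestamp <= end]


# Hypothetical readings; only the first falls inside the period 0..200.
hw = [
    HardwareSample(100, 35.0, 60.0, 12.5, 240),
    HardwareSample(500, 72.0, 61.0, 30.0, 410),
]
recent_hw = within_period(hw, 0, 200)
```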

S202, determining first performance abnormal data of each cluster and cluster nodes in each cluster according to historical performance indexes of each cluster and cluster nodes in each cluster in a preset time period.

The first performance anomaly data may be determined according to historical performance indexes of each cluster and cluster nodes in each cluster in a preset time period, and is used for representing the performance anomaly data of each cluster and cluster nodes in each cluster.

Optionally, in this embodiment, according to the obtained historical performance indexes of each cluster and the cluster nodes in each cluster in a preset time period, a predetermined first performance anomaly data determining algorithm may be combined to determine the first performance anomaly data of each cluster and the cluster nodes in each cluster. In addition, in this embodiment, the obtained historical performance indexes of each cluster and the cluster nodes in each cluster in a preset time period may also be processed through a pre-trained first performance anomaly data prediction model, so as to output first performance anomaly data of each cluster and the cluster nodes in each cluster.

Optionally, to improve the accuracy of the first performance anomaly data of each cluster and of the cluster nodes in each cluster, the first performance anomaly data may be predicted through a plurality of models. For example, the historical performance indexes of each cluster and of the cluster nodes in each cluster in the preset time period are processed through an operation trend prediction model to obtain initial performance anomaly data of each cluster and of the cluster nodes in each cluster; the first performance anomaly data is then screened from the initial performance anomaly data according to the anomaly severity of the initial performance anomaly data through a data mining model. The operation trend prediction model may be a model trained in advance that processes the historical performance indexes in the preset time period and outputs the operation trend of each cluster and of the cluster nodes in each cluster in a future preset time period. The initial performance anomaly data may be the operation trend data, output by the operation trend prediction model, of each cluster and of the cluster nodes in each cluster in the future preset time period. When the first performance anomaly data is screened according to anomaly severity, the initial performance anomaly data may, for example, be sorted by anomaly severity, and a predetermined number of the top-ranked items are taken as the first performance anomaly data.

In the above embodiment, the initial performance anomaly data of each cluster and the cluster nodes in each cluster output by the operation trend prediction model are not used as the first performance anomaly data, but more representative data are screened out as the first performance anomaly data, so that the first performance anomaly data are more accurate, and a guarantee is provided for subsequently determining the performance anomaly detection result of the cluster nodes in each cluster.
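The severity-based screening can be illustrated with a plain sort followed by a cut-off. Note that this sketch substitutes ordinary sorting for the data mining model named in the text, and the `Anomaly` record, its fields and the severity scale are assumptions made for illustration.

```python
from typing import List, NamedTuple


class Anomaly(NamedTuple):
    entity: str      # cluster or cluster node identifier (hypothetical)
    metric: str      # which performance index is affected
    severity: float  # higher means more severe (assumed scale)


def screen_first_anomalies(initial: List[Anomaly], top_n: int) -> List[Anomaly]:
    """Sort initial anomalies by severity, most severe first, and keep the top_n."""
    ranked = sorted(initial, key=lambda a: a.severity, reverse=True)
    return ranked[:top_n]


# Hypothetical output of an operation trend prediction model.
initial = [
    Anomaly("node-a", "cpu", 0.40),
    Anomaly("node-b", "disk_io", 0.95),
    Anomaly("cluster-1", "memory", 0.70),
]
first = screen_first_anomalies(initial, top_n=2)
```

With `top_n=2`, the disk I/O anomaly on `node-b` and the memory anomaly on `cluster-1` are kept and the low-severity CPU item is dropped.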

Further, to make the result of processing the historical performance indexes through the operation trend prediction model more accurate, in one embodiment, time-series preprocessing is performed on the historical performance indexes of each cluster and of the cluster nodes in each cluster in the preset time period to obtain processed historical performance indexes, and the processed historical performance indexes are then processed through the operation trend prediction model. The time-series preprocessing may convert historical performance indexes of different dimensions within the preset time period into data in the same time unit. For example, if historical performance index 1 is acquired once per second within the preset time period and historical performance index 2 is acquired once per minute, the time-series preprocessing may convert both historical performance index 1 and historical performance index 2 into data in time units of seconds. The processed historical performance indexes are then processed through the operation trend prediction model to obtain the initial performance anomaly data of each cluster and of the cluster nodes in each cluster.

In the above embodiment, before the historical performance indexes are processed through the operation trend prediction model, time-series preprocessing is performed on the historical performance indexes of each cluster and of the cluster nodes in each cluster in the preset time period, so that the processed historical performance indexes are easier to process, which further improves the accuracy and representativeness of the initial performance anomaly data of each cluster and of the cluster nodes in each cluster.
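The per-second and per-minute example can be sketched as follows. The text only requires that both indexes end up in the same time unit; repeating each coarse sample for every second of its acquisition cycle (forward fill) is an assumption of this sketch, as are the function names.

```python
from typing import Dict, List, Tuple


def to_per_second(samples: List[Tuple[int, float]], period_s: int) -> Dict[int, float]:
    """Expand samples taken every period_s seconds into one value per second.

    Each sample value is repeated for every second of its cycle (forward fill).
    """
    out: Dict[int, float] = {}
    for ts, value in samples:
        for offset in range(period_s):
            out[ts + offset] = value
    return out


def align(a: Dict[int, float], b: Dict[int, float]) -> List[Tuple[int, float, float]]:
    """Join two per-second series on the timestamps they have in common."""
    common = sorted(a.keys() & b.keys())
    return [(t, a[t], b[t]) for t in common]


# Index 1: one reading per second for two minutes; index 2: one reading per minute.
index_1 = [(t, float(t)) for t in range(120)]
index_2 = [(0, 0.5), (60, 0.7)]

rows = align(to_per_second(index_1, 1), to_per_second(index_2, 60))
```

Each row of `rows` now carries both indexes for the same second, which is the shape a trend model that consumes several indexes at once would typically expect.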

S203, determining second performance abnormality data of each cluster and cluster nodes in each cluster according to the attribute information of each cluster and the cluster nodes in each cluster.

Optionally, in this embodiment, according to attribute information of each cluster and cluster nodes in each cluster, a predetermined second performance anomaly data determining rule is combined to determine second performance anomaly data of each cluster and cluster nodes in each cluster. In addition, in this embodiment, the obtained attribute information of each cluster and the cluster node in each cluster may be processed by using a pre-trained second performance anomaly data prediction model, so as to output second performance anomaly data of each cluster and the cluster node in each cluster.
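The application does not spell out the second performance anomaly data determining rule. As one hypothetical rule based on the state information and IP address information mentioned earlier, the sketch below flags an entity whose state is not `running` or whose IP address does not parse. The rule, the state value and the attribute keys are all assumptions.

```python
import ipaddress
from typing import Dict, Set


def second_anomalies(attrs_by_entity: Dict[str, Dict[str, str]]) -> Set[str]:
    """Flag clusters or nodes whose attribute information looks abnormal."""
    flagged: Set[str] = set()
    for entity, attrs in attrs_by_entity.items():
        # Hypothetical rule 1: any state other than "running" is abnormal.
        if attrs.get("state") != "running":
            flagged.add(entity)
            continue
        # Hypothetical rule 2: a missing or malformed IP address is abnormal.
        try:
            ipaddress.ip_address(attrs.get("ip", ""))
        except ValueError:
            flagged.add(entity)
    return flagged


attrs = {
    "node-a": {"state": "running", "ip": "10.0.0.1"},
    "node-b": {"state": "down", "ip": "10.0.0.2"},
    "node-c": {"state": "running", "ip": "not-an-ip"},
}
second = second_anomalies(attrs)
```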

S204, determining the performance abnormality detection results of each cluster and cluster nodes in each cluster according to the first performance abnormality data and the second performance abnormality data.

The performance abnormality detection result is used for regulating and controlling each cluster and cluster nodes in each cluster to achieve load balancing, and can characterize accurate performance anomaly data of each cluster and of the cluster nodes in each cluster.

Specifically, in this embodiment, the performance anomaly detection result may be determined by the second performance anomaly data and the first performance anomaly data together. For example, the first performance anomaly data and the second performance anomaly data may be merged and used as the performance anomaly detection result of each cluster and the cluster nodes in each cluster. Alternatively, the anomaly data contained in both the first performance anomaly data and the second performance anomaly data may be used as the performance anomaly detection result of each cluster and the cluster nodes in each cluster.

Therefore, the determined performance abnormality detection results of each cluster and the cluster nodes in each cluster are more accurate. And further, the effect of regulating and controlling each cluster and cluster nodes in each cluster according to the performance abnormality detection result to achieve load balancing is better.
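The two ways of combining the first and second performance anomaly data described for S204 can be sketched with set operations, assuming each data set is reduced to a set of cluster or node identifiers. Reading the second option as a set intersection is an interpretation, and the function name and `mode` parameter are hypothetical.

```python
from typing import Set


def combine(first: Set[str], second: Set[str], mode: str = "intersection") -> Set[str]:
    """Combine first and second performance anomaly data into a detection result."""
    if mode == "union":
        # Merge both data sets: anything flagged by either source is reported.
        return first | second
    if mode == "intersection":
        # Keep only what both sources agree on.
        return first & second
    raise ValueError(f"unknown mode: {mode!r}")


first_data = {"node-a", "node-b"}
second_data = {"node-b", "node-c"}

merged = combine(first_data, second_data, mode="union")
agreed = combine(first_data, second_data, mode="intersection")
```

The union reports more entities at the risk of false alarms; the intersection reports fewer but requires both dimensions to agree.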

In the cluster performance monitoring method, attribute information of each cluster and of the cluster nodes in each cluster, and historical performance indexes of each cluster and of the cluster nodes in each cluster in a preset time period, are first acquired. First performance anomaly data is then determined from the historical performance indexes and second performance anomaly data from the attribute information, and the two are combined to determine a more accurate performance anomaly detection result. Because the first performance anomaly data and the second performance anomaly data are derived from information of different dimensions about each cluster and the cluster nodes in each cluster, a detection result determined from both together is more accurate. The load strategy for adjusting the clusters and cluster nodes that is determined according to the performance anomaly detection result is therefore also more accurate, so that the clusters and the cluster nodes in each cluster better achieve load balancing.

Further, to make the performance anomaly detection result of each cluster and the cluster nodes in each cluster determined according to the first performance anomaly data and the second performance anomaly data more accurate, in one embodiment, as shown in fig. 3, the method may include:

S301, screening candidate performance anomaly data falling in the second performance anomaly data from the first performance anomaly data.

Wherein the candidate performance anomaly data may be first performance anomaly data falling within the second performance anomaly data.

Specifically, in this embodiment, the determined first performance anomaly data and the determined second performance anomaly data may be compared, and the first performance anomaly data having an intersection with the second performance anomaly data may be used as candidate performance anomaly data.

S302, determining the performance abnormality detection results of each cluster and cluster nodes in each cluster according to the candidate performance abnormality data.

In this embodiment, the candidate performance anomaly data is screened again to obtain the performance anomaly detection results of each cluster and the cluster nodes in each cluster. Optionally, the candidate performance anomaly data may be processed by a predetermined performance anomaly detection result determination model to determine the performance anomaly detection results of each cluster and the cluster nodes in each cluster. In another preferred embodiment, it is judged whether the duration and the anomaly severity of the candidate performance anomaly data both exceed preset requirements; if so, the performance anomaly detection result of each cluster and the cluster nodes in each cluster is determined to be an anomaly of the cluster and/or cluster node corresponding to the candidate performance anomaly data. Specifically, the duration and the anomaly severity of each item of candidate performance anomaly data are determined, and a predetermined performance anomaly detection result determining strategy is applied to judge whether any item has a duration and an anomaly severity that both exceed the preset requirements. If so, the cluster and/or cluster node corresponding to that candidate performance anomaly data is reported as abnormal in the performance anomaly detection result. If not, the candidate performance anomaly data is directly output as the performance anomaly detection result of each cluster and the cluster nodes in each cluster.

For example, the predetermined performance anomaly detection result determining strategy may be: "if the load of any CPU in a cluster node of the cluster exceeds a maximum load threshold (for example, 50%), the anomaly severity of the CPU is determined to exceed the preset requirement; if the time for which the load exceeds the maximum load threshold is longer than a duration threshold (for example, 2 hours), the output performance anomaly detection result is that the cluster is abnormal".

In the above embodiment, candidate performance anomaly data falling in the second performance anomaly data is screened out from the first performance anomaly data, and then the candidate performance anomaly data is updated according to a predetermined performance anomaly detection result determining strategy, so that the output performance anomaly detection result is more accurate, and the load strategy of adjusting clusters and cluster nodes determined according to the performance anomaly detection result is more accurate, so that each cluster and cluster nodes in each cluster achieve load balancing better.
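The example strategy (CPU load above 50% for longer than 2 hours) can be sketched as a check on the longest uninterrupted overload run in a load series. The sample layout of `(timestamp_seconds, load_percent)` pairs, the function names and the choice to measure a run from its first to its last overloaded sample are assumptions of this sketch; the two thresholds are the example values from the text.

```python
from typing import List, Tuple


def longest_overload_seconds(samples: List[Tuple[int, float]], max_load: float) -> int:
    """Length in seconds of the longest uninterrupted run with load above max_load.

    samples must be sorted by timestamp; a run is measured from its first
    overloaded sample to its last overloaded sample.
    """
    longest = 0
    run_start = None
    for ts, load in samples:
        if load > max_load:
            if run_start is None:
                run_start = ts
            longest = max(longest, ts - run_start)
        else:
            run_start = None  # the run is broken by a sample at or below the threshold
    return longest


def is_cluster_abnormal(
    samples: List[Tuple[int, float]],
    max_load: float = 50.0,          # severity threshold from the example (50%)
    min_duration_s: int = 2 * 3600,  # duration threshold from the example (2 hours)
) -> bool:
    """True only if both the severity and the duration requirements are exceeded."""
    return longest_overload_seconds(samples, max_load) > min_duration_s


# Hypothetical CPU load series sampled every 10 minutes.
sustained = [(t, 80.0) for t in range(0, 7801, 600)]                      # 130 min overloaded
interrupted = [(t, 10.0 if t == 3600 else 80.0) for t in range(0, 7801, 600)]

sustained_result = is_cluster_abnormal(sustained)
interrupted_result = is_cluster_abnormal(interrupted)
```

A single dip below the threshold resets the run, so the interrupted series is not reported even though most of its samples are overloaded.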

To facilitate understanding by those skilled in the art, the above cluster performance monitoring method is described in detail below. As shown in fig. 4, the method may include:

S401, acquiring attribute information of each cluster and cluster nodes in each cluster through monitoring tools deployed on each cluster.

S402, acquiring hardware resource performance data of virtual machines installed by cluster nodes in each cluster through a coordination tool deployed on each cluster, and taking the hardware resource performance data as historical performance indexes of each cluster in a preset time period.

S403, acquiring transaction performance data recorded in hosts to which the cluster nodes in the clusters belong through coordination tools deployed on the clusters, and taking the transaction performance data as historical performance indexes of the cluster nodes in the clusters in a preset time period.

S404, performing time-series preprocessing on the historical performance indexes of each cluster and cluster nodes in each cluster in a preset time period to obtain the processed historical performance indexes.

S405, processing the processed historical performance indexes through an operation trend prediction model to obtain initial performance abnormal data of each cluster and cluster nodes in each cluster.

S406, screening the first performance anomaly data from the initial performance anomaly data according to the anomaly severity of the initial performance anomaly data through a data mining model.

S407, determining second performance abnormal data of each cluster and cluster nodes in each cluster according to the attribute information of each cluster and the cluster nodes in each cluster.

S408, screening candidate performance anomaly data falling in the second performance anomaly data from the first performance anomaly data.

S409, judging whether the duration and the severity of the candidate performance anomaly data exceed the preset requirements, if so, executing S410, and if not, executing S411.

S410, determining that the performance abnormality detection result of each cluster and the cluster nodes in each cluster is an abnormality of the cluster and/or cluster node corresponding to the candidate performance abnormality data. The performance abnormality detection result is used for regulating and controlling each cluster and cluster nodes in each cluster to achieve load balancing.

S411, determining candidate performance abnormality data as the performance abnormality detection results of each cluster and cluster nodes in each cluster.
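Steps S408 to S411 can be sketched as follows. This is only an illustration under stated assumptions: `Anomaly`, `DetectionResult`, `screen_candidates` and `detect` are hypothetical names, "falling in the second performance anomaly data" is interpreted as matching on a (target, metric) key, and the preset requirements are modelled as simple numeric thresholds.

```python
from dataclasses import dataclass, field
from datetime import timedelta
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Anomaly:
    """One item of performance anomaly data (hypothetical structure)."""
    target: str          # cluster or cluster-node identifier
    metric: str          # e.g. "cpu"
    severity: float      # anomaly severity on an assumed 0..1 scale
    duration: timedelta  # how long the anomaly has lasted


@dataclass
class DetectionResult:
    abnormal_targets: List[str] = field(default_factory=list)  # S410 output
    candidates: List[Anomaly] = field(default_factory=list)    # S411 output


def screen_candidates(first: List[Anomaly],
                      second_keys: Set[Tuple[str, str]]) -> List[Anomaly]:
    """S408: keep first-pass anomalies that also appear in the second data."""
    return [a for a in first if (a.target, a.metric) in second_keys]


def detect(first: List[Anomaly], second_keys: Set[Tuple[str, str]],
           min_severity: float, min_duration: timedelta) -> DetectionResult:
    candidates = screen_candidates(first, second_keys)
    # S409: both duration and severity must exceed the preset requirements.
    confirmed = [a for a in candidates
                 if a.severity > min_severity and a.duration > min_duration]
    if confirmed:
        # S410: report the corresponding clusters and/or nodes as abnormal.
        return DetectionResult(
            abnormal_targets=sorted({a.target for a in confirmed}))
    # S411: otherwise output the candidate data itself as the result.
    return DetectionResult(candidates=candidates)
```

Keeping S410 and S411 as two fields of one result object lets a caller tell confirmed abnormalities apart from unconfirmed candidate data.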

It should be understood that, although the steps in the flowcharts of the embodiments described above are shown in the sequence indicated by the arrows, these steps are not necessarily performed in that sequence. Unless explicitly stated herein, the execution of these steps is not strictly limited in order, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts of the above embodiments may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different times; nor are these sub-steps or stages necessarily performed in sequence, and they may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

Based on the same inventive concept, the embodiments of the present application also provide a cluster performance monitoring device for implementing the cluster performance monitoring method described above. The device solves the problem in a manner similar to that described for the method, so for the specific limitations in the one or more cluster performance monitoring device embodiments provided below, reference may be made to the limitations of the cluster performance monitoring method above, which are not repeated here.

In one embodiment, as shown in fig. 5, there is provided a cluster performance monitoring apparatus 1, including: a data acquisition module 10, a first determination module 11, a second determination module 12, and a detection result determination module 13, wherein:

the data acquisition module 10 is configured to acquire attribute information of each cluster and each cluster node in each cluster, and historical performance indexes of each cluster and each cluster node in each cluster in a preset time period.

The first determining module 11 is configured to determine first performance abnormality data of each cluster and cluster nodes in each cluster according to historical performance indexes of each cluster and cluster nodes in each cluster in a preset time period.

The second determining module 12 is configured to determine second performance abnormality data of each cluster and the cluster nodes in each cluster according to attribute information of each cluster and the cluster nodes in each cluster.

The detection result determining module 13 is configured to determine, according to the first performance anomaly data and the second performance anomaly data, a performance anomaly detection result of each cluster and a cluster node in each cluster.

The performance abnormality detection result is used for regulating and controlling each cluster and cluster nodes in each cluster to achieve load balancing.
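The way the four modules hand data to one another can be sketched as below. `ClusterPerformanceMonitor` and its field names are hypothetical, and each module is reduced to a plain callable; the sketch only shows the wiring between modules 10 to 13, not how any module is implemented.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass
class ClusterPerformanceMonitor:
    """Wires the four modules together (hypothetical composition)."""
    # Module 10: returns (attribute information, historical performance indexes).
    acquire: Callable[[], Tuple[Any, Any]]
    # Module 11: historical performance indexes -> first anomaly data.
    first_determine: Callable[[Any], Any]
    # Module 12: attribute information -> second anomaly data.
    second_determine: Callable[[Any], Any]
    # Module 13: (first, second) -> performance anomaly detection result.
    determine_result: Callable[[Any, Any], Any]

    def run(self) -> Any:
        attributes, history = self.acquire()
        first = self.first_determine(history)
        second = self.second_determine(attributes)
        return self.determine_result(first, second)
```

Because each module is injected as a callable, any one of them can be replaced (for example, by a different anomaly model) without touching the others.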

In one embodiment, as shown in fig. 6, the detection result determining module 13 includes an abnormal data screening unit 130 and a detection result determining unit 131, wherein:

The abnormal data screening unit 130 is configured to screen out, from the first performance anomaly data, candidate performance anomaly data falling in the second performance anomaly data.

The detection result determining unit 131 is configured to determine, according to the candidate performance anomaly data, performance anomaly detection results of each cluster and cluster nodes in each cluster.

In one embodiment, the detection result determining unit 131 includes a judging subunit and a detection result determining subunit, wherein,

The judging subunit is configured to judge whether the duration and the anomaly severity of the candidate performance anomaly data exceed the preset requirements.

The detection result determining subunit is configured to, if so, determine that the performance abnormality detection result of each cluster and the cluster nodes in each cluster is an abnormality of the cluster and/or cluster node corresponding to the candidate performance abnormality data.

In one embodiment, as shown in fig. 7, the first determination module 11 includes an initial data determination unit 110 and a first determination unit 111, wherein:

the initial data determining unit 110 is configured to process, through the running trend prediction model, historical performance indexes of each cluster and cluster nodes in each cluster in a preset time period, to obtain initial performance abnormal data of each cluster and cluster nodes in each cluster.

The first determining unit 111 is configured to screen, through the data mining model, the first performance anomaly data from the initial performance anomaly data according to the anomaly severity of the initial performance anomaly data.
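The two units can be illustrated with simple stand-ins. The embodiment does not specify the running trend prediction model or the data mining model here, so the sketch below swaps in a trailing-mean trend and a fixed severity cutoff purely for illustration; `initial_anomalies`, `screen_by_severity`, `window`, `tolerance` and `min_severity` are all hypothetical names and parameters.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def initial_anomalies(values: Sequence[float], window: int = 3,
                      tolerance: float = 0.5) -> List[Tuple[int, float]]:
    """Stand-in for unit 110: flag points that depart from the recent trend.

    The trend is the mean of the previous `window` values, and severity is
    the relative deviation from that trend. Returns (index, severity) pairs.
    """
    flagged = []
    for i in range(window, len(values)):
        trend = mean(values[i - window:i])
        # Guard against division by zero when the trend is exactly 0.
        severity = abs(values[i] - trend) / max(abs(trend), 1e-9)
        if severity > tolerance:
            flagged.append((i, severity))
    return flagged


def screen_by_severity(flagged: List[Tuple[int, float]],
                       min_severity: float) -> List[Tuple[int, float]]:
    """Stand-in for unit 111: keep only sufficiently severe anomalies."""
    return [(i, s) for i, s in flagged if s >= min_severity]
```

A real deployment would replace both functions with trained models; the point of the sketch is only the two-stage shape, in which a broad first pass is narrowed by severity.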

In one embodiment, the initial data determination unit 110 includes a preprocessing subunit and a processing subunit, wherein,

the preprocessing subunit is used for performing time sequence preprocessing on the historical performance indexes of each cluster and cluster nodes in each cluster in a preset time period to obtain the processed historical performance indexes.

The processing subunit is configured to process the processed historical performance indexes through the running trend prediction model.

In one embodiment, as shown in fig. 8, the data acquisition module 10 includes a first acquisition unit 100 and a second acquisition unit 101, wherein:
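The time-series preprocessing is not detailed in this passage. One plausible form, shown here only as an assumption, is to sort the samples, resolve duplicate timestamps, and forward-fill gaps onto a regular grid before the data reaches the trend model. The function name `preprocess` and the requirement that timestamps already lie on the grid are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple

Point = Tuple[datetime, float]


def preprocess(points: Iterable[Point], step: timedelta) -> List[Point]:
    """Sort, de-duplicate and forward-fill a metric series.

    Assumes timestamps are already aligned to multiples of `step` from the
    earliest sample; off-grid samples are ignored by this sketch.
    """
    latest = {}
    for ts, value in sorted(points, key=lambda p: p[0]):
        latest[ts] = value  # for duplicate timestamps, the later entry wins
    if not latest:
        return []
    end = max(latest)
    current = min(latest)
    last_value = latest[current]
    filled = []
    while current <= end:
        # Use the observed value if present, else carry the last one forward.
        last_value = latest.get(current, last_value)
        filled.append((current, last_value))
        current += step
    return filled
```

Forward-filling is only one choice; interpolation or dropping incomplete windows would fit the same step equally well.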

the first obtaining unit 100 is configured to obtain, through a monitoring tool deployed on each cluster, attribute information of each cluster and cluster nodes in each cluster.

The second obtaining unit 101 is configured to obtain, through a coordination tool deployed on each cluster, historical performance indexes of each cluster and cluster nodes in each cluster in a preset time period.

In one embodiment, the second acquisition unit 101 comprises a first acquisition subunit and a second acquisition subunit, wherein,

The first obtaining subunit is configured to obtain, through a coordination tool deployed on each cluster, hardware resource performance data of a virtual machine installed by a cluster node in each cluster, where the hardware resource performance data is used as a historical performance index of each cluster in a preset time period.

The second obtaining subunit is configured to obtain, through a coordination tool deployed on each cluster, transaction performance data recorded in a host to which a cluster node in each cluster belongs, as a historical performance index of the cluster node in each cluster in a preset time period.
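The split between the two subunits can be pictured as two groupings of collected records: virtual machine hardware data keyed by cluster, and host transaction data keyed by cluster node. The record types `VmMetric` and `TransactionRecord`, their fields, and both function names are hypothetical; how the coordination tool actually exposes these records is not specified here.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class VmMetric:
    """Hardware resource sample from a node's virtual machine (assumed shape)."""
    cluster: str
    node: str
    cpu_load: float


@dataclass(frozen=True)
class TransactionRecord:
    """Transaction performance sample recorded on a node's host (assumed shape)."""
    cluster: str
    node: str
    latency_ms: float


def cluster_level_history(metrics: Iterable[VmMetric]) -> Dict[str, List[VmMetric]]:
    """First subunit: VM hardware data becomes the per-cluster history."""
    grouped = defaultdict(list)
    for metric in metrics:
        grouped[metric.cluster].append(metric)
    return dict(grouped)


def node_level_history(
    records: Iterable[TransactionRecord],
) -> Dict[Tuple[str, str], List[TransactionRecord]]:
    """Second subunit: host transaction data becomes the per-node history."""
    grouped = defaultdict(list)
    for record in records:
        grouped[(record.cluster, record.node)].append(record)
    return dict(grouped)
```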

The modules in the cluster performance monitoring device may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in, or be independent of, a processor in the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can call them and execute the operations corresponding to each module.

In one embodiment, a computer device is provided, which may be a terminal, and the internal structure thereof may be as shown in fig. 9. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The communication interface of the computer device is used for carrying out wired or wireless communication with an external terminal, and the wireless mode can be realized through WIFI, a mobile cellular network, NFC (near field communication) or other technologies. The computer program, when executed by a processor, implements a cluster performance monitoring method. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, can also be keys, a track ball or a touch pad arranged on the shell of the computer equipment, and can also be an external keyboard, a touch pad or a mouse and the like.

It will be appreciated by those skilled in the art that the structure shown in fig. 9 is merely a block diagram of a portion of the structure associated with the present application and is not limiting of the computer device to which the present application applies, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.

In one embodiment, a computer device is provided comprising a memory and a processor, the memory having stored therein a computer program, the processor when executing the computer program performing the steps of:

In one embodiment, the processor when executing the computer program further performs the steps of:

In one embodiment, a computer readable storage medium is provided having a computer program stored thereon, which when executed by a processor, performs the steps of:

In one embodiment, the computer program when executed by the processor further performs the steps of:

In one embodiment, a computer program product is provided comprising a computer program which, when executed by a processor, performs the steps of:

The information (including, but not limited to, attribute information of each cluster and cluster nodes in each cluster) and the data (including, but not limited to, first performance anomaly data and second performance anomaly data) related in the present application are both information and data authorized by the user or sufficiently authorized by each party.

Those skilled in the art will appreciate that all or part of the processes of the above method embodiments may be implemented by a computer program stored on a non-transitory computer readable storage medium, and that the program, when executed, may include the steps of the method embodiments described above. Any reference to memory, database, or other medium used in the various embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (Magnetoresistive Random Access Memory, MRAM), ferroelectric memory (Ferroelectric Random Access Memory, FRAM), phase change memory (Phase Change Memory, PCM), graphene memory, and the like. Volatile memory may include random access memory (Random Access Memory, RAM) or external cache memory, and the like. By way of illustration and not limitation, RAM may take a variety of forms, such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM), and the like. The databases referred to in the various embodiments provided herein may include at least one of relational databases and non-relational databases. The non-relational database may include, but is not limited to, a blockchain-based distributed database, and the like. The processors referred to in the embodiments provided herein may be general purpose processors, central processing units, graphics processors, digital signal processors, programmable logic units, quantum computing-based data processing logic units, etc., without being limited thereto.

The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered to fall within the scope of this description.

The foregoing examples represent only a few embodiments of the present application. Although they are described specifically and in detail, they are not to be construed as limiting the scope of the present application. It should be noted that those skilled in the art may make various modifications and improvements without departing from the spirit of the present application, all of which fall within the scope of the present application. Accordingly, the scope of protection of the present application shall be subject to the appended claims.

Claims

1. A method for monitoring cluster performance, the method comprising:

determining first performance abnormality data of each cluster and cluster nodes in each cluster according to historical performance indexes of each cluster and cluster nodes in each cluster in a preset time period;

Determining second performance abnormal data of each cluster and cluster nodes in each cluster according to the attribute information of each cluster and the cluster nodes in each cluster;

determining the performance abnormality detection results of each cluster and cluster nodes in each cluster according to the first performance abnormality data and the second performance abnormality data; and the performance abnormality detection result is used for regulating and controlling each cluster and cluster nodes in each cluster to achieve load balancing.

2. The method of claim 1, wherein determining the performance anomaly detection result for each cluster and the cluster nodes in each cluster according to the first performance anomaly data and the second performance anomaly data comprises:

and determining the performance abnormality detection results of each cluster and the cluster nodes in each cluster according to the candidate performance abnormality data.

3. The method according to claim 2, wherein determining the performance anomaly detection result of each cluster and the cluster nodes in each cluster according to the candidate performance anomaly data comprises:

Judging whether the duration time and the abnormal severity degree of the candidate performance abnormal data exceed preset requirements;

if yes, determining that the performance abnormality detection results of all clusters and cluster nodes in all clusters are clusters and/or cluster nodes corresponding to the candidate performance abnormality data.

4. The method of claim 1, wherein determining the first performance anomaly data for each of the clusters and the cluster nodes in each of the clusters based on the historical performance indicators for each of the clusters and the cluster nodes in each of the clusters over a predetermined period of time comprises:

and screening first performance anomaly data from the initial performance anomaly data according to the anomaly severity of the initial performance anomaly data through a data mining model.

5. The method according to claim 4, wherein the historical performance indexes of each cluster and the cluster nodes in each cluster in a preset time period are processed through a running trend prediction model, and the method further comprises:

performing time-series preprocessing on the historical performance indexes of each cluster and cluster nodes in each cluster in a preset time period to obtain the processed historical performance indexes;

6. The method according to any one of claims 1 to 5, wherein the obtaining attribute information of each cluster and cluster nodes in each cluster, and historical performance indexes of each cluster and cluster nodes in each cluster in a preset time period includes:

acquiring attribute information of each cluster and cluster nodes in each cluster through a monitoring tool deployed on each cluster;

7. The method of claim 6, wherein the obtaining, by the coordination tool deployed on each cluster, the historical performance indicators of each cluster and the cluster nodes in each cluster in a preset time period includes:

acquiring hardware resource performance data of a virtual machine installed by a cluster node in each cluster through a coordination tool deployed on each cluster, and taking the hardware resource performance data as a historical performance index of each cluster in a preset time period;

And acquiring transaction performance data recorded in hosts to which the cluster nodes in each cluster belong through coordination tools deployed on the clusters, and taking the transaction performance data as historical performance indexes of the cluster nodes in each cluster in a preset time period.

8. A cluster performance monitoring apparatus, the apparatus comprising:

the first determining module is used for determining first performance abnormal data of each cluster and cluster nodes in each cluster according to historical performance indexes of each cluster and cluster nodes in each cluster in a preset time period;

the second determining module is used for determining second performance abnormality data of each cluster and each cluster node in each cluster according to the attribute information of each cluster and each cluster node in each cluster;

the detection result determining module is used for determining the performance abnormality detection results of each cluster and cluster nodes in each cluster according to the first performance abnormality data and the second performance abnormality data; and the performance abnormality detection result is used for regulating and controlling each cluster and cluster nodes in each cluster to achieve load balancing.

9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 7 when the computer program is executed.

10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 7.

11. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 7.