CN113051147A - Database cluster monitoring method, device, system and equipment

Info

Publication number: CN113051147A
Application number: CN202110448149.6A
Authority: CN
Inventors: 刘煜; 郭玉章; 陈洁; 李颖; 李颢; 马力; 苏德庭
Original assignee: China Construction Bank Corp
Current assignee: China Construction Bank Corp
Priority date: 2021-04-25
Filing date: 2021-04-25
Publication date: 2021-06-29

Abstract

The application discloses a monitoring method, apparatus, system and device for a database cluster. A result file produced by the main node after information acquisition is obtained. When the value of an alarm index is detected to be abnormal, an alarm prompt is sent to the user, indicating that a functional fault has occurred in the database cluster. The values of the inspection indexes are input into a health degree model, and the health degree model is displayed to the user through a preset interface. Compared with the prior art, functional faults of the database cluster are detected without manual intervention, which markedly improves operation and maintenance efficiency and saves operation and maintenance manpower. In addition, the health degree model and the inspection indexes help the user judge how well the hardware of the database cluster is performing, so the health state of the database cluster can be effectively perceived.

Description

Database cluster monitoring method, device, system and equipment

Technical Field

The present application relates to the field of database technologies, and in particular, to a method, an apparatus, a system, and a device for monitoring a database cluster.

Background

GP stands for the Greenplum database, a massively parallel processing database built on PostgreSQL whose architecture is designed for managing large-scale analytical data warehouses and business intelligence workloads. GPCC (Greenplum Command Center), the existing GP monitoring tool, is GP's native automated operation and maintenance tool. It is aimed at database administrators and users and provides monitoring and management functions through a browser-based graphical interface. However, with the existing monitoring tool, hardware inspection of GP clusters must be performed manually. Inspection efficiency is very low in environments with a very large number of GP clusters, and the health state of each GP cluster cannot be perceived, which reduces the working efficiency of the GP clusters.

Therefore, how to improve the hardware routing inspection efficiency of the GP cluster and effectively sense the health state of the GP cluster becomes a problem to be solved urgently in the field.

Disclosure of Invention

The application provides a monitoring method, a monitoring device, a monitoring system and monitoring equipment of a database cluster, and aims to improve hardware inspection efficiency of a GP cluster and effectively sense health status of the GP cluster.

In order to achieve the above object, the present application provides the following technical solutions:

a monitoring method of a database cluster comprises the following steps:

when a triggering operation of a user is received, distributing a pre-configured acquisition task to a main node of a database cluster, so that the main node acquires information according to the acquisition task; the acquisition task comprises information acquisition tasks for a plurality of indexes; the indexes comprise alarm indexes and inspection indexes; an alarm index indicates an information item that affects the service function of the database cluster; an inspection index indicates an information item that affects the hardware performance of the database cluster but does not affect the service function;

acquiring a result file obtained after the main node acquires information; the result file comprises an alarm log and an inspection log; the alarm log records the values of the alarm indexes; the inspection log records the values of the inspection indexes;

sending an alarm prompt to the user when the value of an alarm index is detected to be abnormal, prompting the user that the database cluster has a functional fault;

and inputting the values of the inspection indexes into a pre-constructed health degree model, and displaying the health degree model to the user through a preset front-end interface.

Optionally, the method further includes:

inputting the values of the inspection indexes recorded in the inspection log over a preset historical time period into a preset trend prediction model to obtain a predicted value for each inspection index; the predicted value reflects the hardware performance of the database cluster;

counting the number of inspection indexes whose predicted values are greater than a preset threshold;

and prompting the user that the health state of the database cluster is poor when the number of inspection indexes whose predicted values are greater than the preset threshold is greater than a preset number.

Optionally, sending an alarm prompt to the user when the value of an alarm index is detected to be abnormal, prompting the user that the database cluster has a functional fault, includes:

carrying out keyword detection on the alarm log;

determining that the numerical value of the alarm index is abnormal under the condition that the alarm log is detected to contain preset alarm characters;

and sending an alarm prompt to a preset monitoring alarm platform, and triggering the monitoring alarm platform to send a prompt that the database cluster has a functional fault to the user.

A monitoring apparatus of a database cluster, comprising:

the distribution unit is configured to, when a triggering operation of a user is received, distribute a pre-configured acquisition task to a main node of a database cluster, so that the main node acquires information according to the acquisition task; the acquisition task comprises information acquisition tasks for a plurality of indexes; the indexes comprise alarm indexes and inspection indexes; an alarm index indicates an information item that affects the service function of the database cluster; an inspection index indicates an information item that affects the hardware performance of the database cluster but does not affect the service function;

the acquisition unit is configured to acquire a result file obtained after the main node acquires information; the result file comprises an alarm log and an inspection log; the alarm log records the values of the alarm indexes; the inspection log records the values of the inspection indexes;

the alarm unit is configured to send an alarm prompt to the user when the value of an alarm index is detected to be abnormal, prompting the user that the database cluster has a functional fault;

and the display unit is configured to input the values of the inspection indexes into a pre-constructed health degree model and display the health degree model to the user through a preset front-end interface.

Optionally, the apparatus further includes:

the evaluation unit is configured to input the values of the inspection indexes recorded in the inspection log over a preset historical time period into a preset trend prediction model to obtain a predicted value for each inspection index, where the predicted value reflects the hardware performance of the database cluster; to count the number of inspection indexes whose predicted values are greater than a preset threshold; and to prompt the user that the health state of the database cluster is poor when the number of inspection indexes whose predicted values are greater than the preset threshold is greater than a preset number.

Optionally, the alarm unit is configured to:

carrying out keyword detection on the alarm log;

A monitoring system for a database cluster, comprising:

the system comprises a scheduling module, an acquisition module and an analysis module;

the scheduling module is configured to, when a triggering operation of a user is received, pre-configure an acquisition task according to information of the database cluster and send the acquisition task to the acquisition module; the acquisition task comprises information acquisition tasks for a plurality of indexes; the indexes comprise alarm indexes and inspection indexes; an alarm index indicates an information item that affects the service function of the database cluster; an inspection index indicates an information item that affects the hardware performance of the database cluster but does not affect the service function;

the acquisition module is configured to, when the acquisition task is received, distribute the acquisition task to the main node of the database cluster, so that the main node acquires information according to the acquisition task, and to acquire a result file obtained after the main node acquires the information; the result file comprises an alarm log and an inspection log; the alarm log records the values of the alarm indexes; the inspection log records the values of the inspection indexes;

the acquisition module is further configured to send the alarm log and the inspection log to the analysis module;

the analysis module is configured to send an alarm prompt to the user when the value of an alarm index is detected to be abnormal, prompting the user that the database cluster has a functional fault;

the analysis module is further configured to input the values of the inspection indexes into a pre-constructed health degree model and display the health degree model to the user through a preset front-end interface.

Optionally, the analysis module is further configured to:

A computer-readable storage medium comprising a stored program, wherein the program performs the database cluster monitoring method.

A monitoring device of a database cluster, comprising: a processor, a memory, and a bus; the processor and the memory are connected through the bus;

the memory is used for storing a program, and the processor is used for running the program, wherein the monitoring method of the database cluster is executed when the program runs.

According to the above technical solution, when a triggering operation of the user is received, the pre-configured acquisition task is distributed to the main node of the database cluster, so that the main node acquires information according to the acquisition task. The acquisition task comprises information acquisition tasks for a plurality of indexes. The indexes comprise alarm indexes and inspection indexes: an alarm index indicates an information item that affects the service function of the database cluster, and an inspection index indicates an information item that affects the hardware performance of the database cluster but does not affect the service function. A result file obtained after the main node acquires information is then acquired. The result file includes an alarm log, which records the values of the alarm indexes, and an inspection log, which records the values of the inspection indexes. When the value of an alarm index is detected to be abnormal, an alarm prompt is sent to the user, indicating that a functional fault has occurred in the database cluster. The values of the inspection indexes are input into a pre-constructed health degree model, and the health degree model is displayed to the user through a preset front-end interface. Compared with the prior art, functional faults of the database cluster are detected without manual intervention, operation and maintenance efficiency is markedly improved, and a large amount of operation and maintenance manpower is saved. In addition, the health degree model and the inspection indexes help the user judge how well the hardware of the database cluster is performing, so the health state of the database cluster can be effectively perceived. With this solution, the hardware inspection efficiency of GP clusters can therefore be significantly improved, and the health state of GP clusters can be effectively perceived.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

Fig. 1a is a schematic architecture diagram of a monitoring system of a database cluster according to an embodiment of the present application;

Fig. 1b is a schematic diagram of a health degree model provided by an embodiment of the present application;

Fig. 1c is a schematic diagram of a monitoring process implemented by a monitoring system of a database cluster according to an embodiment of the present application;

Fig. 2 is a schematic diagram of a monitoring method for a database cluster according to an embodiment of the present application;

Fig. 3 is a schematic structural diagram of a monitoring apparatus of a database cluster according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

Fig. 1a shows a schematic architecture diagram of a monitoring system of a database cluster provided in an embodiment of the present application. The system includes:

a scheduling module 101, an acquisition module 102, and an analysis module 103.

The scheduling module 101 is configured to pre-configure an acquisition task according to information of a database cluster (hereinafter referred to as a cluster) when a triggering operation of a user is received. The information of the cluster includes but is not limited to: the cluster name, the floating IP of the master node of the cluster, the database name, the user of the cluster, and the password of that user. The scheduling module 101 is further configured to issue the acquisition task to the acquisition module 102. Specifically, the acquisition task may be sent to the acquisition module 102 when a triggering operation of a user is received, or it may be sent to the acquisition module 102 on a timed schedule.

It should be noted that the specific way of sending the acquisition task to the acquisition module 102 on a timed schedule is common knowledge to those skilled in the art. For example, a preset crontab process may be called to issue the acquisition task at scheduled times.

It is emphasized that the acquisition task comprises information acquisition tasks for a plurality of indexes, and the indexes comprise alarm indexes and inspection indexes. An alarm index indicates an information item that affects the service function of the cluster, and an inspection index indicates an information item that affects the hardware performance of the cluster but does not affect the service function.

In the embodiment of the application, the acquisition time specified in the information acquisition task of each index can be set by technical personnel according to actual conditions. Specifically, the acquisition interval specified for an alarm index is shorter, and the acquisition interval specified for an inspection index is longer.
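The acquisition task described above can be pictured as a small data structure. The following is a minimal Python sketch, not the applicant's implementation: the class names, the index names (`segment_down`, `disk_usage`), the interval values and the `due_tasks` helper are all hypothetical and chosen only to illustrate cluster information plus per-index acquisition intervals, with a shorter interval for the alarm index than for the inspection index.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ClusterInfo:
    # Fields mirror the cluster information listed in the description.
    cluster_name: str
    master_floating_ip: str
    database_name: str
    user: str
    password: str


@dataclass
class IndexTask:
    name: str              # hypothetical index name
    kind: str              # "alarm" or "inspection"
    interval_seconds: int  # acquisition interval chosen by the operator


@dataclass
class CollectionTask:
    cluster: ClusterInfo
    index_tasks: List[IndexTask] = field(default_factory=list)


def due_tasks(task: CollectionTask, elapsed_seconds: int) -> List[str]:
    """Return the names of index tasks whose interval divides the elapsed time."""
    return [t.name for t in task.index_tasks
            if elapsed_seconds % t.interval_seconds == 0]


# Example values are placeholders (192.0.2.10 is a documentation address).
cluster = ClusterInfo("gp_demo", "192.0.2.10", "warehouse", "gpadmin", "secret")
task = CollectionTask(cluster, [
    IndexTask("segment_down", "alarm", 60),        # alarm index: short interval
    IndexTask("disk_usage", "inspection", 3600),   # inspection index: long interval
])
```

In this sketch, `due_tasks(task, 120)` selects only the alarm index, while `due_tasks(task, 3600)` selects both, which reflects the shorter-versus-longer interval rule stated above.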

The acquisition module 102 is configured to distribute the acquisition task to the master node of the cluster through a preset acquisition server under the condition that the acquisition task is received, so that the master node of the cluster performs information acquisition according to the acquisition task. In addition, the acquisition module 102 is further configured to acquire a result file obtained after the master node of the cluster performs information acquisition, and send the result file to the analysis module 103.

It should be noted that, during information acquisition, the master nodes of different clusters all use the same acquisition script (for example, the same shell script with the same parameter items), and the result files of different clusters share a uniform file format that the preset monitoring alarm platform can recognize.

Specifically, the directory structure of the collection script can be seen from table 1.

TABLE 1

In table 1, the "serial number", "first-level directory", "second-level directory", "third-level directory", and "description" are commonly used in the art for directory structures. In addition, the contents of each of the "serial number", "primary directory", "secondary directory", "tertiary directory", and "description" are well known to those skilled in the art.

It should be noted that the contents shown in table 1 are only for illustration.

In the embodiment of the application, the result file comprises an alarm log and an inspection log, wherein the alarm log is used for recording the numerical value of the alarm index, and the inspection log is used for recording the numerical value of the inspection index.

Specifically, the specific setting style of the alarm indicator can be seen in table 2.

TABLE 2

In table 2, the so-called "check item", that is, the information item described in the embodiment of the present application, the "function index" and the "performance index" are further sub-divisions of the information item, the "monitoring frequency" is used to represent the collection time interval specified in the information collection task, and the contents indicated in the "serial number", the "check item", the "alarm or not", and the "log code" are common knowledge familiar to those skilled in the art, and are not described herein again.

It should be noted that the contents described in table 2 above are only for illustration.

Specifically, the specific setting style of the routing inspection index can be seen in table 3.

TABLE 3

In table 3, the so-called "check item" is the information item described in the embodiment of the present application, and the "function index" and the "performance index" are further sub-divisions of the information item, and the "timing inspection", "daily monitoring", and "real-time inspection" are all used to represent the acquisition time interval specified in the information acquisition task, and the contents indicated in the "serial number", "check item", and "log code" are all common knowledge familiar to those skilled in the art, and are not described herein again.

It should be noted that the contents described in table 3 above are only for illustration.

The analysis module 103 is configured to send an alarm prompt to the user when it detects that the value of an alarm index is abnormal, prompting the user that the database cluster has a functional fault. Specifically, the analysis module 103 performs keyword detection on the alarm log, determines that the value of the alarm index is abnormal when the alarm log is found to contain preset alarm characters, sends an alarm prompt to a preset monitoring alarm platform, and triggers the monitoring alarm platform to send the user a prompt that the database cluster has a functional fault.
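As an illustration of the keyword detection step, the sketch below scans alarm log lines for preset alarm characters and calls a notification hook when any are found. It is only a sketch under assumptions: the source does not give the log format or the alarm characters, so the keywords `ERROR` and `FATAL`, the sample log lines and the `notify` callback (standing in for the monitoring alarm platform) are hypothetical.

```python
from typing import Callable, Iterable, List, Sequence

# Hypothetical preset alarm characters; the source does not list the real ones.
ALARM_KEYWORDS = ("ERROR", "FATAL")


def find_alarm_lines(log_lines: Iterable[str],
                     keywords: Sequence[str] = ALARM_KEYWORDS) -> List[str]:
    """Return the log lines that contain at least one preset alarm keyword."""
    return [line for line in log_lines if any(k in line for k in keywords)]


def check_alarm_log(log_lines: Iterable[str],
                    notify: Callable[[str], None],
                    keywords: Sequence[str] = ALARM_KEYWORDS) -> bool:
    """Treat the alarm index as abnormal if any keyword appears, then notify."""
    hits = find_alarm_lines(log_lines, keywords)
    if hits:
        # `notify` stands in for forwarding the prompt to the alarm platform.
        notify(f"database cluster functional fault: {len(hits)} abnormal alarm entries")
    return bool(hits)


sent: List[str] = []
sample_log = [
    "2021-04-25 10:00:00 segment_status OK",
    "2021-04-25 10:01:00 segment_status ERROR segment down",
]
abnormal = check_alarm_log(sample_log, sent.append)
```

Passing the notifier in as a callback keeps the detection logic independent of whichever alarm platform receives the prompt.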

The analysis module 103 is further configured to input the numerical value of the inspection index into a pre-constructed health degree model, and display the health degree model to the user through a preset front-end interface.

The health degree model is a health degree evaluation index system, that is, an index system that expresses the health state of the cluster through indexes in multiple dimensions. A health degree model of the cluster is shown in Fig. 1b. The health degree model of the cluster can be optimized by using a machine learning algorithm.
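The source does not specify how the health degree model combines its indexes, so the following is only one possible reading, sketched in Python. It assumes that each inspection index has an observed value and an operator-chosen limit, that each index is scored from 0 to 100 by its headroom below the limit, and that a dimension's score is the mean of its index scores. The dimension names, index names and limits are hypothetical.

```python
from statistics import mean
from typing import Dict, Tuple

# dimension -> {index name: (observed value, limit)}; all names are hypothetical.
Readings = Dict[str, Dict[str, Tuple[float, float]]]


def index_score(value: float, limit: float) -> float:
    """Score one index from 0 (at or over the limit) to 100 (no load)."""
    return max(0.0, min(100.0, 100.0 * (limit - value) / limit))


def health_scores(readings: Readings) -> Dict[str, float]:
    """Average the index scores inside each dimension."""
    return {
        dimension: mean([index_score(v, lim) for v, lim in indexes.values()])
        for dimension, indexes in readings.items()
    }


readings: Readings = {
    "storage": {"disk_usage_pct": (80.0, 100.0), "inode_usage_pct": (40.0, 100.0)},
    "compute": {"cpu_usage_pct": (50.0, 100.0)},
}
scores = health_scores(readings)
overall = mean(list(scores.values()))
```

A per-dimension dictionary like `scores` is the kind of structure a front-end interface could render, for instance as one axis per dimension.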

Furthermore, the analysis module 103 is further configured to: input the values of the inspection indexes recorded in the inspection log over a preset historical time period into a preset trend prediction model to obtain a predicted value for each inspection index, where the predicted value reflects the hardware performance of the cluster; count the number of inspection indexes whose predicted values are greater than a preset threshold; and prompt the user that the health state of the cluster is poor when the number of inspection indexes whose predicted values are greater than the preset threshold is greater than a preset number.
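The predict, count and compare logic can be sketched as follows. The source leaves the trend prediction model unspecified, so this example substitutes a plain least-squares line extrapolated one step ahead. That choice, the index names, the threshold of 85 and the preset number of 1 are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def predict_next(history: Sequence[float]) -> float:
    """Fit a least-squares line to equally spaced samples and extrapolate one step."""
    n = len(history)
    if n < 2:
        return float(history[-1])
    x_mean = (n - 1) / 2
    y_mean = sum(history) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(history))
    slope = sxy / sxx
    return y_mean + slope * (n - x_mean)


def assess_health(histories: Dict[str, Sequence[float]],
                  threshold: float,
                  preset_number: int) -> Tuple[bool, List[str]]:
    """Return (is_unhealthy, names of indexes predicted to exceed the threshold)."""
    over = [name for name, values in histories.items()
            if predict_next(values) > threshold]
    return len(over) > preset_number, over


# Hypothetical inspection-log history for three indexes.
histories = {
    "disk_usage_pct": [70, 75, 80, 85],
    "cpu_usage_pct": [30, 30, 30],
    "mem_usage_pct": [60, 70, 80],
}
unhealthy, over = assess_health(histories, threshold=85, preset_number=1)
```

Here two indexes are predicted to exceed the threshold, which is more than the preset number of one, so the cluster would be reported as being in poor health.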

It should be noted that the trend prediction model is common knowledge familiar to those skilled in the art, and will not be described herein.

It is emphasized that existing GP cluster monitoring covers only a narrow set of indexes and lacks a GP performance monitoring index system that collects a rich set of indexes, is highly extensible, and can serve as operation and maintenance support for an enterprise-level big data platform. In addition, existing GP cluster monitoring tools do not distinguish between alarm indexes and inspection indexes. The system of the embodiment of the application separates function alarms from hardware inspection according to index type. Alarm indexes, which affect GP cluster functions, are connected to the preset monitoring alarm platform, so that alarms are raised automatically and operation and maintenance efficiency is improved. Inspection indexes, which affect GP cluster hardware performance, are displayed through the health degree model, giving users (such as operation and maintenance personnel) an effective reference for optimizing cluster performance.

In view of the functions of the above modules, as shown in fig. 1c, the monitoring system of the database cluster according to the present application implements a process of monitoring the hardware performance and the service function of the database cluster, and includes the following steps:

S101: When a triggering operation of the user is received, the scheduling module pre-configures an acquisition task according to the information of the database cluster.

S102: The scheduling module sends the acquisition task to the acquisition module.

S103: the acquisition module distributes the acquisition task to the main node of the database cluster, so that the main node acquires information according to the acquisition task.

S104: the acquisition module acquires a result file obtained after the information acquisition is carried out by the main node.

Wherein, the result file comprises an alarm log and a patrol log.

S105: the acquisition module sends an alarm log and a routing inspection log to the analysis module.

S106: When the value of an alarm index is detected to be abnormal, the analysis module sends an alarm prompt to the user, prompting the user that the database cluster has a functional fault.

S107: the analysis module inputs the numerical value of the routing inspection index into a pre-constructed health degree model, and the health degree model is displayed to a user through a preset front-end interface.
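Steps S101 to S107 can be traced end to end with a toy model of the three modules. This is a sketch under stated assumptions rather than the described system: the main node is replaced by a stub that returns a canned result file, the log line formats and the `ERROR` keyword are hypothetical, and the health degree model is reduced to collecting the inspection values that would be fed into it.

```python
from typing import Callable, Dict, List


class FakeMasterNode:
    """Stand-in for a cluster main node; returns a canned result file."""

    def collect(self, task: Dict) -> Dict[str, List[str]]:
        return {
            "alarm_log": ["segment_status ERROR segment down"],
            "inspection_log": ["disk_usage_pct 72"],
        }


class SchedulingModule:
    def configure_task(self, cluster_name: str) -> Dict:
        # S101: pre-configure the acquisition task from cluster information.
        return {"cluster": cluster_name,
                "indexes": ["segment_status", "disk_usage_pct"]}


class AnalysisModule:
    def __init__(self, notify: Callable[[str], None]) -> None:
        self.notify = notify
        self.health_inputs: Dict[str, float] = {}

    def handle(self, alarm_log: List[str], inspection_log: List[str]) -> None:
        # S106: alert the user when the alarm log looks abnormal.
        if any("ERROR" in line for line in alarm_log):
            self.notify("database cluster has a functional fault")
        # S107: gather inspection values as input for the health degree model.
        for line in inspection_log:
            name, value = line.split()
            self.health_inputs[name] = float(value)


class AcquisitionModule:
    def __init__(self, master: FakeMasterNode, analysis: AnalysisModule) -> None:
        self.master = master
        self.analysis = analysis

    def run(self, task: Dict) -> None:
        # S103 and S104: distribute the task and fetch the result file.
        result = self.master.collect(task)
        # S105: forward both logs to the analysis module.
        self.analysis.handle(result["alarm_log"], result["inspection_log"])


alerts: List[str] = []
analysis = AnalysisModule(alerts.append)
acquisition = AcquisitionModule(FakeMasterNode(), analysis)
acquisition.run(SchedulingModule().configure_task("gp_demo"))  # S102
```

After the run, `alerts` holds one fault prompt and `analysis.health_inputs` holds the inspection value parsed from the stub's inspection log.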

In summary, compared with the prior art, functional faults of the database cluster no longer need to be discovered through manual intervention, so operation and maintenance efficiency is significantly improved and a large amount of operation and maintenance manpower is saved. In addition, the health degree model and the inspection indexes help the user judge how well the hardware of the database cluster is performing, so the health state of the database cluster can be effectively perceived. With the solution of this embodiment, the hardware inspection efficiency of GP clusters can therefore be significantly improved, and the health state of GP clusters can be effectively perceived.

As shown in fig. 2, a schematic diagram of a monitoring method for a database cluster provided in the embodiment of the present application includes the following steps:

S201: When a triggering operation of the user is received, the pre-configured acquisition task is distributed to the main node of the database cluster, so that the main node acquires information according to the acquisition task.

The collection task comprises a plurality of indexes of information collection tasks, the indexes comprise an alarm index and a routing inspection index, the alarm index is used for indicating information items influencing the service function of the database cluster, and the routing inspection index is used for indicating information items influencing the hardware performance of the database cluster but not influencing the service function.

S202: A result file obtained after the main node acquires information is acquired.

The result file comprises an alarm log and an inspection log, wherein the alarm log is used for recording the numerical value of the alarm index, and the inspection log is used for recording the numerical value of the inspection index.

S203: When the value of an alarm index is detected to be abnormal, an alarm prompt is sent to the user, prompting the user that a functional fault has occurred in the database cluster.

Optionally, detecting keywords in the alarm log; determining that the numerical value of the alarm index is abnormal under the condition that the alarm log is detected to contain preset alarm characters; and sending an alarm prompt to a preset monitoring alarm platform, and triggering the monitoring alarm platform to send a prompt that the database cluster has a functional fault to a user.

S204: The values of the inspection indexes are input into a pre-constructed health degree model, and the health degree model is displayed to the user through a preset front-end interface.

Optionally, the values of the inspection indexes recorded in the inspection log over a preset historical time period are input into a preset trend prediction model to obtain a predicted value for each inspection index, where the predicted value reflects the hardware performance of the database cluster; the number of inspection indexes whose predicted values are greater than a preset threshold is counted; and the user is prompted that the health state of the database cluster is poor when the number of inspection indexes whose predicted values are greater than the preset threshold is greater than a preset number.

Corresponding to the monitoring method for the database cluster provided by the embodiment of the application, the embodiment of the application also provides a monitoring device for the database cluster.

As shown in fig. 3, an architecture diagram of a monitoring device of a database cluster provided in the embodiment of the present application is shown, including:

the distribution unit 100 is configured to distribute a preconfigured acquisition task to a master node of a database cluster under the condition that a triggering operation of a user is received, so that the master node performs information acquisition according to the acquisition task; the acquisition task comprises a plurality of index information acquisition tasks; the indexes comprise alarm indexes and routing inspection indexes; the alarm indicator is used for indicating information items influencing the service function of the database cluster; the patrol indicator is used to indicate an item of information that affects database cluster hardware performance but does not affect service functionality.

An obtaining unit 200, configured to obtain a result file obtained after the master node performs information acquisition; the result file comprises an alarm log and a routing inspection log; the alarm log is used for recording the numerical value of the alarm index; the inspection log is used for recording numerical values of inspection indexes.

The alarm unit 300 is configured to send an alarm prompt to the user when it detects that the value of an alarm index is abnormal, prompting the user that the database cluster has a functional fault.

Wherein, the alarm unit 300 is configured to: carrying out keyword detection on the alarm log; determining that the numerical value of the alarm index is abnormal under the condition that the alarm log is detected to contain preset alarm characters; and sending an alarm prompt to a preset monitoring alarm platform, and triggering the monitoring alarm platform to send a prompt that the database cluster has a functional fault to a user.

The display unit 400 is configured to input the values of the inspection indexes into a pre-constructed health degree model and display the health degree model to the user through a preset front-end interface.

The evaluation unit 500 is configured to input the values of the inspection indexes recorded in the inspection log over a preset historical time period into a preset trend prediction model to obtain a predicted value for each inspection index, where the predicted value reflects the hardware performance of the database cluster; to count the number of inspection indexes whose predicted values are greater than a preset threshold; and to prompt the user that the health state of the database cluster is poor when the number of inspection indexes whose predicted values are greater than the preset threshold is greater than a preset number.

The application also provides a computer readable storage medium, which includes a stored program, wherein the program executes the monitoring method of the database cluster provided by the application.

The present application further provides a monitoring device of a database cluster, including: a processor, a memory, and a bus. The processor is connected with the memory through a bus, the memory is used for storing programs, and the processor is used for running the programs, wherein when the programs run, the monitoring method for the database cluster provided by the application comprises the following steps:

Optionally, the method further includes:

carrying out keyword detection on the alarm log;

The functions described in the method of the embodiment of the present application, if implemented in the form of software functional units and sold or used as independent products, may be stored in a storage medium readable by a computing device. Based on this understanding, the part of the embodiments of the present application that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product that is stored in a storage medium and includes several instructions for causing a computing device (which may be a personal computer, a server, a mobile computing device or a network device) to execute all or part of the steps of the method described in the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and various other media capable of storing program code.

The embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts among the embodiments are referred to each other.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A monitoring method for a database cluster is characterized by comprising the following steps:

2. The method of claim 1, further comprising:

3. The method according to claim 1, wherein sending an alarm prompt to the user when the value of an alarm index is detected to be abnormal, prompting the user that the database cluster has a functional fault, comprises:

carrying out keyword detection on the alarm log;

4. A monitoring apparatus for a database cluster, comprising:

5. The apparatus of claim 4, further comprising:

6. The apparatus of claim 4, wherein the alarm unit is configured to:

carrying out keyword detection on the alarm log;

7. A monitoring system for a database cluster, comprising:

8. The database cluster monitoring system of claim 7, wherein the analysis module is further configured to:

9. A computer-readable storage medium, characterized in that the computer-readable storage medium comprises a stored program, wherein the program performs the method of monitoring a database cluster according to any one of claims 1 to 3.

10. A monitoring device of a database cluster, comprising: a processor, a memory, and a bus; the processor and the memory are connected through the bus;

the memory is used for storing a program and the processor is used for running the program, wherein the program is used for executing the database cluster monitoring method in any one of claims 1-3 during running.