CN113051147A - Database cluster monitoring method, device, system and equipment - Google Patents

Database cluster monitoring method, device, system and equipment

Info

Publication number
CN113051147A
Authority
CN
China
Prior art keywords
alarm
database cluster
index
user
routing inspection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110448149.6A
Other languages
Chinese (zh)
Inventor
刘煜
郭玉章
陈洁
李颖
李颢
马力
苏德庭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Construction Bank Corp
Original Assignee
China Construction Bank Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Construction Bank Corp
Priority to CN202110448149.6A
Publication of CN113051147A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/30: Monitoring
    • G06F11/34: Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3409: Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment, for performance assessment
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/30: Monitoring
    • G06F11/32: Monitoring with visual or acoustical indication of the functioning of the machine
    • G06F11/324: Display of status information
    • G06F11/327: Alarm or error message display

Abstract

The application discloses a monitoring method, apparatus, system and device for a database cluster. A result file produced by the master node after it performs information acquisition is obtained. When the value of an alarm index is detected to be abnormal, an alarm prompt is sent to the user, indicating that a functional fault has occurred in the database cluster. The values of the inspection indexes are input into a health degree model, which is displayed to the user through a preset interface. Compared with the prior art, functional faults of the database cluster are detected without manual intervention, which significantly improves operation and maintenance efficiency and saves operation and maintenance manpower. In addition, the health degree model and the inspection indexes help the user judge how good or bad the hardware performance of the database cluster is, so that the health state of the database cluster can be effectively perceived.

Description

Database cluster monitoring method, device, system and equipment
Technical Field
The present application relates to the field of database technologies, and in particular, to a method, an apparatus, a system, and a device for monitoring a database cluster.
Background
GP is short for the Greenplum database, a massively parallel processing database developed on the basis of PostgreSQL, whose architecture is designed for managing large-scale analytical data warehouses and business intelligence workloads. The existing GP monitoring tool, GPCC, is GP's native automated operation and maintenance tool. It is aimed at database administrators and users and provides monitoring and management functions through a browser-based graphical interface. However, with the existing monitoring tool, hardware inspection of GP clusters must be performed manually. In large-scale GP cluster environments, inspection efficiency is very low and the health state of each GP cluster cannot be perceived, which reduces the working efficiency of the GP clusters.
Therefore, how to improve the hardware inspection efficiency of GP clusters and effectively perceive their health state has become an urgent problem in the field.
Disclosure of Invention
The application provides a monitoring method, a monitoring device, a monitoring system and monitoring equipment of a database cluster, and aims to improve hardware inspection efficiency of a GP cluster and effectively sense health status of the GP cluster.
In order to achieve the above object, the present application provides the following technical solutions:
A monitoring method for a database cluster comprises the following steps:
when a trigger operation of a user is received, distributing a pre-configured acquisition task to a master node of the database cluster, so that the master node acquires information according to the acquisition task; the acquisition task comprises information acquisition tasks for a plurality of indexes; the indexes comprise alarm indexes and inspection indexes; an alarm index indicates an information item that affects the service function of the database cluster; an inspection index indicates an information item that affects the hardware performance of the database cluster but does not affect the service function;
acquiring a result file obtained after the master node acquires the information; the result file comprises an alarm log and an inspection log; the alarm log records the values of the alarm indexes; the inspection log records the values of the inspection indexes;
when the value of an alarm index is detected to be abnormal, sending an alarm prompt to the user, indicating that the database cluster has a functional fault;
and inputting the values of the inspection indexes into a pre-constructed health degree model, and displaying the health degree model to the user through a preset front-end interface.
Optionally, the method further includes:
inputting the values of the inspection indexes recorded in the inspection log within a preset historical time period into a preset trend prediction model to obtain predicted values of the inspection indexes; the predicted values reflect how good or bad the hardware performance of the database cluster is;
counting the number of inspection indexes whose predicted values are greater than a preset threshold;
and when the number of inspection indexes whose predicted values are greater than the preset threshold is greater than a preset number, prompting the user that the health state of the database cluster is poor.
Optionally, sending an alarm prompt to the user when the value of an alarm index is detected to be abnormal, indicating that the database cluster has a functional fault, includes:
performing keyword detection on the alarm log;
determining that the value of the alarm index is abnormal when the alarm log is detected to contain preset alarm characters;
and sending an alarm prompt to a preset monitoring and alarm platform, triggering the platform to notify the user that the database cluster has a functional fault.
A monitoring apparatus of a database cluster, comprising:
a distribution unit, configured to distribute a pre-configured acquisition task to a master node of a database cluster when a trigger operation of a user is received, so that the master node acquires information according to the acquisition task; the acquisition task comprises information acquisition tasks for a plurality of indexes; the indexes comprise alarm indexes and inspection indexes; an alarm index indicates an information item that affects the service function of the database cluster; an inspection index indicates an information item that affects the hardware performance of the database cluster but does not affect the service function;
an obtaining unit, configured to obtain a result file produced after the master node acquires the information; the result file comprises an alarm log and an inspection log; the alarm log records the values of the alarm indexes; the inspection log records the values of the inspection indexes;
an alarm unit, configured to send an alarm prompt to the user when the value of an alarm index is detected to be abnormal, indicating that the database cluster has a functional fault;
and a display unit, configured to input the values of the inspection indexes into a pre-constructed health degree model and display the health degree model to the user through a preset front-end interface.
Optionally, the apparatus further includes:
an evaluation unit, configured to: input the values of the inspection indexes recorded in the inspection log within a preset historical time period into a preset trend prediction model to obtain predicted values of the inspection indexes, where the predicted values reflect how good or bad the hardware performance of the database cluster is; count the number of inspection indexes whose predicted values are greater than a preset threshold; and, when the number of inspection indexes whose predicted values are greater than the preset threshold is greater than a preset number, prompt the user that the health state of the database cluster is poor.
Optionally, the alarm unit is configured to:
carrying out keyword detection on the alarm log;
determining that the numerical value of the alarm index is abnormal under the condition that the alarm log is detected to contain preset alarm characters;
and sending an alarm prompt to a preset monitoring alarm platform, and triggering the monitoring alarm platform to send a prompt that the database cluster has a functional fault to the user.
A monitoring system for a database cluster, comprising:
the system comprises a scheduling module, an acquisition module and an analysis module;
the scheduling module is used for pre-configuring an acquisition task according to the information of the database cluster under the condition of receiving the triggering operation of the user and sending the acquisition task to the acquisition module; the acquisition task comprises a plurality of index information acquisition tasks; the indexes comprise alarm indexes and routing inspection indexes; the alarm indicator is used for indicating an information item influencing the service function of the database cluster; the inspection index is used for indicating information items which affect the hardware performance of the database cluster but do not affect the service function;
the acquisition module is used for distributing the acquisition task to the main node of the database cluster under the condition of receiving the acquisition task, so that the main node acquires information according to the acquisition task and acquires a result file obtained after the main node acquires the information; the result file comprises an alarm log and a routing inspection log; the alarm log is used for recording the numerical value of the alarm index; the inspection log is used for recording the numerical value of the inspection index;
the acquisition module is also used for sending the alarm log and the routing inspection log to the analysis module;
the analysis module is used for sending an alarm prompt to the user when the abnormal numerical value of the alarm index is detected, and prompting the user that the database cluster has a functional fault;
the analysis module is further used for inputting the numerical value of the routing inspection index into a pre-constructed health degree model, and displaying the health degree model to the user through a preset front-end interface.
Optionally, the analysis module is further configured to:
inputting the numerical value of the routing inspection index recorded in the routing inspection log in a preset historical time period into a preset trend prediction model to obtain the predicted value of the routing inspection index; the predicted value is used for reflecting the performance of the database cluster hardware;
counting the number of routing inspection indexes with the predicted values larger than a preset threshold value;
and when the number of inspection indexes whose predicted values are greater than the preset threshold is greater than a preset number, prompting the user that the health state of the database cluster is poor.
A computer-readable storage medium comprising a stored program, wherein the program performs the database cluster monitoring method.
A monitoring device of a database cluster, comprising: a processor, a memory, and a bus; the processor and the memory are connected through the bus;
the memory is used for storing a program, and the processor is used for running the program, wherein the monitoring method of the database cluster is executed when the program runs.
According to the above technical solution, when a trigger operation of the user is received, the pre-configured acquisition task is distributed to the master node of the database cluster, so that the master node acquires information according to the acquisition task. The acquisition task comprises information acquisition tasks for a plurality of indexes. The indexes comprise alarm indexes and inspection indexes: an alarm index indicates an information item that affects the service function of the database cluster, and an inspection index indicates an information item that affects the hardware performance of the database cluster but does not affect the service function. A result file produced after the master node acquires the information is then obtained. The result file comprises an alarm log, which records the values of the alarm indexes, and an inspection log, which records the values of the inspection indexes. When the value of an alarm index is detected to be abnormal, an alarm prompt is sent to the user, indicating that a functional fault has occurred in the database cluster. The values of the inspection indexes are input into a pre-constructed health degree model, which is displayed to the user through a preset front-end interface. Compared with the prior art, functional faults of the database cluster are detected without manual intervention, which significantly improves operation and maintenance efficiency and saves a large amount of operation and maintenance manpower.
In addition, the health degree model and the inspection indexes help the user judge how good or bad the hardware performance of the database cluster is, so that the health state of the database cluster can be effectively perceived. With this solution, the hardware inspection efficiency of GP clusters can therefore be significantly improved, and the health state of GP clusters can be effectively perceived.
Drawings
To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1a is a schematic architecture diagram of a monitoring system of a database cluster according to an embodiment of the present application;
Fig. 1b is a schematic diagram of a health degree model provided by an embodiment of the present application;
Fig. 1c is a schematic diagram of a monitoring process implemented by a monitoring system of a database cluster according to an embodiment of the present application;
Fig. 2 is a schematic diagram of a monitoring method for a database cluster according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of a monitoring apparatus of a database cluster according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
As shown in fig. 1a, an architectural schematic diagram of a monitoring system of a database cluster provided in the embodiment of the present application includes:
a scheduling module 101, an acquisition module 102, and an analysis module 103.
The scheduling module 101 is configured to, when a trigger operation of a user is received, pre-configure an acquisition task according to information of a database cluster (hereinafter referred to as a cluster). The information of the cluster includes but is not limited to: a cluster name, a floating IP of the master node of the cluster, a database name, a user of the cluster, and a password of the user of the cluster. In addition, the scheduling module 101 is further configured to issue an acquisition task to the acquisition module 102, and specifically, the acquisition task may be sent to the acquisition module 102 under the condition that a trigger operation of a user is received, and the acquisition task may also be sent to the acquisition module 102 at regular time.
It should be noted that the specific way of sending the acquisition task to the acquisition module 102 at regular intervals is common knowledge to those skilled in the art. For example, a preset crontab process may be called to issue the acquisition task on a timer.
It is emphasized that the acquisition task comprises information acquisition tasks for a plurality of indexes, and the indexes comprise alarm indexes and inspection indexes. An alarm index indicates an information item that affects the service function of the cluster, and an inspection index indicates an information item that affects the hardware performance of the cluster but does not affect the service function.
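As a rough illustration of the data involved, the cluster information and the two kinds of index can be modelled as follows. This is only a sketch under stated assumptions: the class names, field names, and the example index names (`segment_down`, `disk_usage`, `cpu_load`) are hypothetical and do not come from the application.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class IndexKind(Enum):
    ALARM = "alarm"            # information item affecting the service function
    INSPECTION = "inspection"  # affects hardware performance, not the service


@dataclass(frozen=True)
class ClusterInfo:
    # Fields mirror the cluster information listed in the description.
    cluster_name: str
    master_floating_ip: str
    database_name: str
    user: str
    password: str


@dataclass(frozen=True)
class IndexTask:
    index_name: str  # hypothetical index identifier
    kind: IndexKind


@dataclass
class AcquisitionTask:
    cluster: ClusterInfo
    index_tasks: List[IndexTask] = field(default_factory=list)

    def by_kind(self, kind: IndexKind) -> List[IndexTask]:
        """Return the per-index tasks of one kind, in configured order."""
        return [t for t in self.index_tasks if t.kind is kind]


def build_acquisition_task(cluster, alarm_names, inspection_names):
    """Pre-configure one acquisition task covering both kinds of index."""
    tasks = [IndexTask(name, IndexKind.ALARM) for name in alarm_names]
    tasks += [IndexTask(name, IndexKind.INSPECTION) for name in inspection_names]
    return AcquisitionTask(cluster=cluster, index_tasks=tasks)
```

Keeping the kind on each per-index task lets later stages route alarm values and inspection values to different handlers without re-deriving the classification.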
In the embodiment of the application, the acquisition time specified in the information acquisition task of each index can be set by technical personnel according to actual conditions. Specifically, the acquisition time interval specified for an alarm index is shorter, and the acquisition time interval specified for an inspection index is longer.
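The difference in acquisition intervals can be sketched as a small "which indexes are due" check. The interval values (60 seconds and one hour) and the index names are assumptions chosen for illustration; the application leaves the actual values to the technical personnel.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScheduledIndex:
    name: str              # hypothetical index name
    kind: str              # "alarm" or "inspection"
    interval_seconds: int  # acquisition time interval for this index


def due_indexes(schedule, last_run, now):
    """Return names of indexes whose interval has elapsed since the last run.

    `last_run` maps index name to the timestamp (in seconds) of its last
    acquisition; an index that has never run is always due.
    """
    due = []
    for item in schedule:
        previous = last_run.get(item.name)
        if previous is None or now - previous >= item.interval_seconds:
            due.append(item.name)
    return due


# Assumed example: the alarm index is polled far more often than the
# inspection index.
SCHEDULE = [
    ScheduledIndex("segment_down", "alarm", 60),
    ScheduledIndex("disk_usage", "inspection", 3600),
]
```

In a deployment driven by crontab, the same effect would come from giving the two kinds of task different cron schedules; the function above only makes the interval logic explicit.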
The acquisition module 102 is configured to distribute the acquisition task to the master node of the cluster through a preset acquisition server under the condition that the acquisition task is received, so that the master node of the cluster performs information acquisition according to the acquisition task. In addition, the acquisition module 102 is further configured to acquire a result file obtained after the master node of the cluster performs information acquisition, and send the result file to the analysis module 103.
It should be noted that, in the information acquisition process, the master nodes of different clusters all use the same acquisition script (for example, the shell script and the parameter items used by the script are the same), and the file formats of the result files of different clusters are uniform and can be identified by the preset monitoring and warning platform.
Specifically, the directory structure of the collection script can be seen from table 1.
TABLE 1
[Table 1: directory structure of the acquisition script; provided as an image in the original publication.]
In table 1, the "serial number", "first-level directory", "second-level directory", "third-level directory", and "description" are commonly used in the art for directory structures. In addition, the contents of each of the "serial number", "primary directory", "secondary directory", "tertiary directory", and "description" are well known to those skilled in the art.
It should be noted that the contents shown in table 1 are only for illustration.
In the embodiment of the application, the result file comprises an alarm log and an inspection log, wherein the alarm log is used for recording the numerical value of the alarm index, and the inspection log is used for recording the numerical value of the inspection index.
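To make the split between the two logs concrete, the sketch below separates a result file into an alarm log and an inspection log. The line format `TYPE|index_name|value` is purely hypothetical; the application states only that result files have a uniform format, not what that format is.

```python
def split_result_file(text):
    """Split result-file text into (alarm_log, inspection_log) dictionaries.

    Assumed line format (hypothetical): TYPE|index_name|value, where TYPE is
    ALARM or INSPECTION. Blank lines and lines starting with '#' are skipped.
    """
    alarm_log, inspection_log = {}, {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        log_type, index_name, value = (part.strip() for part in line.split("|", 2))
        if log_type == "ALARM":
            alarm_log[index_name] = value
        elif log_type == "INSPECTION":
            inspection_log[index_name] = value
        else:
            raise ValueError(f"unknown log type: {log_type!r}")
    return alarm_log, inspection_log
```

Values are kept as strings here because alarm values may be textual status codes, while inspection values are typically numeric and can be converted by the consumer.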
Specifically, the specific setting style of the alarm indicator can be seen in table 2.
TABLE 2
[Table 2: alarm index settings; provided as images in the original publication.]
In table 2, the so-called "check item", that is, the information item described in the embodiment of the present application, the "function index" and the "performance index" are further sub-divisions of the information item, the "monitoring frequency" is used to represent the collection time interval specified in the information collection task, and the contents indicated in the "serial number", the "check item", the "alarm or not", and the "log code" are common knowledge familiar to those skilled in the art, and are not described herein again.
It should be noted that the contents described in table 2 above are only for illustration.
Specifically, the specific setting style of the routing inspection index can be seen in table 3.
TABLE 3
[Table 3: inspection index settings; provided as images in the original publication.]
In table 3, the so-called "check item" is the information item described in the embodiment of the present application, and the "function index" and the "performance index" are further sub-divisions of the information item, and the "timing inspection", "daily monitoring", and "real-time inspection" are all used to represent the acquisition time interval specified in the information acquisition task, and the contents indicated in the "serial number", "check item", and "log code" are all common knowledge familiar to those skilled in the art, and are not described herein again.
It should be noted that the contents described in table 3 above are only for illustration.
The analysis module 103 is configured to send an alarm prompt to the user when the value of an alarm index is detected to be abnormal, indicating that the database cluster has a functional fault. Specifically, the analysis module 103 performs keyword detection on the alarm log; when the alarm log is detected to contain a preset alarm character, it determines that the value of the alarm index is abnormal, sends an alarm prompt to a preset monitoring and alarm platform, and triggers the platform to notify the user that the database cluster has a functional fault.
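The keyword detection step can be sketched as a substring scan over the alarm log. The keywords `ERROR` and `FATAL` and the `notify` callback are assumptions standing in for the preset alarm characters and the monitoring and alarm platform, neither of which is specified in the application.

```python
# Hypothetical preset alarm characters; the real set is deployment-specific.
ALARM_KEYWORDS = ("ERROR", "FATAL")


def find_alarm_lines(alarm_log_text, keywords=ALARM_KEYWORDS):
    """Return the alarm-log lines that contain any preset alarm keyword."""
    return [
        line
        for line in alarm_log_text.splitlines()
        if any(keyword in line for keyword in keywords)
    ]


def check_alarm_log(alarm_log_text, notify, keywords=ALARM_KEYWORDS):
    """Treat the alarm index as abnormal if any keyword is found.

    `notify(message, lines)` stands in for forwarding the alarm prompt to the
    monitoring and alarm platform. Returns True when an alarm was raised.
    """
    hits = find_alarm_lines(alarm_log_text, keywords)
    if hits:
        notify("Database cluster functional fault detected", hits)
    return bool(hits)
```

Passing the notifier in as a callable keeps the detection logic independent of whichever alarm platform is actually used.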
The analysis module 103 is further configured to input the numerical value of the inspection index into a pre-constructed health degree model, and display the health degree model to the user through a preset front-end interface.
The health degree model is a health degree evaluation index system, that is, an index system that expresses the health state of the cluster through multi-dimensional indexes. An example health degree model of the cluster is shown in Fig. 1b. The health degree model of the cluster can be optimized by using a machine learning algorithm.
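One very simple way to turn multi-dimensional inspection values into a single displayable figure is a weighted score, sketched below. This is an illustrative stand-in, not the model of Fig. 1b: the scoring formula, the per-index limits, and the weights are all assumptions.

```python
def health_score(values, limits, weights):
    """Combine inspection values into a 0-100 score (higher is healthier).

    Each value is divided by its limit and clamped to [0, 1]; an index at or
    above its limit contributes nothing, an index at zero contributes its
    full weight. All three arguments are dicts keyed by index name.
    """
    total_weight = sum(weights[name] for name in values)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    score = 0.0
    for name, value in values.items():
        ratio = min(max(value / limits[name], 0.0), 1.0)
        score += weights[name] * (1.0 - ratio)
    return 100.0 * score / total_weight
```

For example, with equal weights, a disk usage of 50 against a limit of 100 and a CPU load of 0 against a limit of 1 give a score of 75. A front-end would more likely display the per-dimension contributions alongside the aggregate.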
Furthermore, the analysis module 103 is further configured to: inputting the numerical value of the routing inspection index recorded in the routing inspection log in a preset historical time period into a preset trend prediction model to obtain a predicted value of the routing inspection index, wherein the predicted value is used for reflecting the performance degree of cluster hardware; counting the number of routing inspection indexes with predicted values larger than a preset threshold value; and prompting the user that the cluster health state is not good under the condition that the number of the routing inspection indexes with the predicted values larger than the preset threshold is larger than a preset numerical value.
It should be noted that the trend prediction model is common knowledge familiar to those skilled in the art, and will not be described herein.
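The predict, count, and compare logic described above can be sketched as follows. The application does not name a specific trend prediction model, so a least-squares straight line extrapolated one step ahead is used here purely as an assumed stand-in; the index names and thresholds in the usage note are likewise hypothetical.

```python
def predict_next(history):
    """Extrapolate the next value from equally spaced historical values.

    Fits a least-squares line through the points (0, y0), (1, y1), ... and
    evaluates it at x = len(history).
    """
    n = len(history)
    if n == 0:
        raise ValueError("history must not be empty")
    if n == 1:
        return float(history[0])
    mean_x = (n - 1) / 2
    mean_y = sum(history) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(history))
    slope = sxy / sxx
    return mean_y + slope * (n - mean_x)


def health_is_poor(histories, thresholds, max_exceeding):
    """Count indexes whose predicted value exceeds its preset threshold.

    Returns (poor, names): `poor` is True when the count is greater than
    `max_exceeding`, and `names` lists the exceeding indexes.
    """
    exceeding = [
        name
        for name, history in histories.items()
        if predict_next(history) > thresholds[name]
    ]
    return len(exceeding) > max_exceeding, exceeding
```

With histories of `[70, 80, 90]` for a disk usage index (threshold 95) and a flat `[0.5, 0.5, 0.5]` for a CPU load index (threshold 0.9), only the disk index is predicted to exceed its threshold, so the cluster is flagged when `max_exceeding` is 0 but not when it is 1.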
It is emphasized that existing monitoring of GP clusters relies on a narrow set of monitoring indexes and lacks a GP performance monitoring index system, with rich acquisition indexes and strong extensibility, that can serve as operation and maintenance support for an enterprise-level big data platform. In addition, existing GP cluster monitoring tools do not distinguish alarm indexes from inspection indexes. The system of the embodiment of the application separates function alarms from hardware inspection for the different types of indexes. Alarm indexes that affect GP cluster functions are connected to the preset monitoring and alarm platform, so that alarms are raised automatically and operation and maintenance efficiency is improved. Inspection indexes that affect GP cluster hardware performance are displayed through the health degree model, giving users (such as operation and maintenance personnel) an effective reference for optimizing cluster performance.
In view of the functions of the above modules, as shown in Fig. 1c, the process by which the monitoring system of the present application monitors the hardware performance and the service function of the database cluster includes the following steps:
S101: When a trigger operation of the user is received, the scheduling module pre-configures an acquisition task according to the information of the database cluster.
S102: The scheduling module sends the acquisition task to the acquisition module.
S103: The acquisition module distributes the acquisition task to the master node of the database cluster, so that the master node acquires information according to the acquisition task.
S104: The acquisition module obtains the result file produced after the master node acquires the information.
The result file comprises an alarm log and an inspection log.
S105: The acquisition module sends the alarm log and the inspection log to the analysis module.
S106: When the value of an alarm index is detected to be abnormal, the analysis module sends an alarm prompt to the user, indicating that the database cluster has a functional fault.
S107: The analysis module inputs the values of the inspection indexes into a pre-constructed health degree model, and the health degree model is displayed to the user through a preset front-end interface.
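Steps S101 to S107 can be read as one monitoring cycle. The sketch below wires the steps together with injected callables; every function name and parameter is hypothetical, and the callables stand in for the scheduling, acquisition, and analysis modules rather than reproducing them.

```python
def run_monitoring_cycle(cluster_info, configure_task, collect,
                         alarm_is_abnormal, send_alarm, show_health):
    """Run one cycle of the S101-S107 flow and report whether an alarm fired."""
    # S101: the scheduling step pre-configures the acquisition task.
    task = configure_task(cluster_info)
    # S102-S104: the task is dispatched to the master node and the result
    # file (alarm log plus inspection log) is fetched back.
    alarm_log, inspection_log = collect(task)
    # S105-S106: the analysis step checks the alarm log and alerts the user.
    abnormal = alarm_is_abnormal(alarm_log)
    if abnormal:
        send_alarm("database cluster functional fault")
    # S107: inspection values feed the health degree model shown to the user.
    show_health(inspection_log)
    return abnormal
```

Note that the health display in S107 runs regardless of the alarm outcome, which matches the flow above: alarm handling and hardware inspection are independent branches of the analysis.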
In summary, compared with the prior art, the functional fault condition of the database cluster no longer needs to be learned through manual intervention, so operation and maintenance efficiency is significantly improved and a large amount of operation and maintenance manpower is saved. In addition, the health degree model and the inspection indexes help the user judge how good or bad the hardware performance of the database cluster is, so that the health state of the database cluster can be effectively perceived. With the scheme of this embodiment, the hardware inspection efficiency of GP clusters can therefore be significantly improved, and the health state of GP clusters can be effectively perceived.
As shown in Fig. 2, a monitoring method for a database cluster provided in the embodiment of the present application includes the following steps:
S201: When a trigger operation of the user is received, distribute the pre-configured acquisition task to the master node of the database cluster, so that the master node acquires information according to the acquisition task.
The acquisition task comprises information acquisition tasks for a plurality of indexes. The indexes comprise alarm indexes and inspection indexes: an alarm index indicates an information item that affects the service function of the database cluster, and an inspection index indicates an information item that affects the hardware performance of the database cluster but does not affect the service function.
S202: Obtain the result file produced after the master node acquires the information.
The result file comprises an alarm log, which records the values of the alarm indexes, and an inspection log, which records the values of the inspection indexes.
S203: When the value of an alarm index is detected to be abnormal, send an alarm prompt to the user, indicating that a functional fault has occurred in the database cluster.
Optionally, keyword detection is performed on the alarm log; when the alarm log is detected to contain preset alarm characters, the value of the alarm index is determined to be abnormal; an alarm prompt is then sent to a preset monitoring and alarm platform, triggering the platform to notify the user that the database cluster has a functional fault.
S204: Input the values of the inspection indexes into a pre-constructed health degree model, and display the health degree model to the user through a preset front-end interface.
Optionally, the values of the inspection indexes recorded in the inspection log within a preset historical time period are input into a preset trend prediction model to obtain predicted values of the inspection indexes, where the predicted values reflect how good or bad the hardware performance of the database cluster is; the number of inspection indexes whose predicted values are greater than a preset threshold is counted; and when that number is greater than a preset number, the user is prompted that the health state of the database cluster is poor.
In summary, compared with the prior art, the functional fault condition of the database cluster no longer needs to be learned through manual intervention, so operation and maintenance efficiency is significantly improved and a large amount of operation and maintenance manpower is saved. In addition, the health degree model and the inspection indexes help the user judge how good or bad the hardware performance of the database cluster is, so that the health state of the database cluster can be effectively perceived. With the scheme of this embodiment, the hardware inspection efficiency of GP clusters can therefore be significantly improved, and the health state of GP clusters can be effectively perceived.
Corresponding to the monitoring method for the database cluster provided by the embodiment of the application, the embodiment of the application also provides a monitoring device for the database cluster.
Fig. 3 shows an architecture diagram of a monitoring device of a database cluster provided in the embodiment of the present application, which includes:
The distribution unit 100 is configured to distribute a preconfigured acquisition task to a master node of a database cluster when a triggering operation of a user is received, so that the master node performs information acquisition according to the acquisition task; the acquisition task comprises acquisition tasks for information on a plurality of indexes; the indexes comprise alarm indexes and routing inspection indexes; an alarm index indicates an information item that affects the service function of the database cluster; a routing inspection index indicates an information item that affects the hardware performance of the database cluster but does not affect the service function.
An obtaining unit 200, configured to obtain a result file obtained after the master node performs information acquisition; the result file comprises an alarm log and a routing inspection log; the alarm log is used for recording the numerical value of the alarm index; the inspection log is used for recording numerical values of inspection indexes.
The alarm unit 300 is configured to send an alarm prompt to the user when detecting that the value of the alarm index is abnormal, prompting the user that the database cluster has a functional fault.
Specifically, the alarm unit 300 is configured to: perform keyword detection on the alarm log; determine that the value of the alarm index is abnormal when the alarm log is detected to contain preset alarm characters; and send an alarm prompt to a preset monitoring alarm platform, triggering the monitoring alarm platform to prompt the user that the database cluster has a functional fault.
The display unit 400 is configured to input the value of the routing inspection index into a pre-constructed health degree model and display the health degree model to the user through a preset front-end interface.
The evaluation unit 500 is configured to input the values of the routing inspection indexes recorded in the inspection log within a preset historical time period into a preset trend prediction model to obtain predicted values of the routing inspection indexes, where the predicted values reflect the hardware performance of the database cluster; count the number of routing inspection indexes whose predicted values are greater than a preset threshold; and prompt the user that the health state of the database cluster is poor when that number is greater than a preset number.
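How the units hand data to one another can be sketched as follows. This is only an illustrative wiring under stated assumptions: the class names `ResultFile` and `MonitoringDevice`, the callables injected for each unit, and the sample log contents are all hypothetical, and the actual communication with the master node is replaced by a stub.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class ResultFile:
    # Alarm log: lines recording alarm index values.
    alarm_log: List[str] = field(default_factory=list)
    # Inspection log: inspection index name -> recorded value.
    inspection_log: Dict[str, float] = field(default_factory=dict)

@dataclass
class MonitoringDevice:
    # Each callable stands in for one unit so the flow can be exercised in isolation.
    collect_on_master: Callable[[List[str]], ResultFile]  # distribution + obtaining units
    is_abnormal: Callable[[List[str]], bool]               # alarm unit: abnormality check
    notify: Callable[[str], None]                          # alarm unit: prompt to the user
    display: Callable[[Dict[str, float]], None]            # display unit

    def on_user_trigger(self, task: List[str]) -> ResultFile:
        """Run one monitoring pass when the user triggers it."""
        result = self.collect_on_master(task)
        if self.is_abnormal(result.alarm_log):
            self.notify("database cluster has a functional fault")
        self.display(result.inspection_log)
        return result

# Stubbed usage: the master node "returns" one alarm line and one inspection value.
sent, shown = [], []
device = MonitoringDevice(
    collect_on_master=lambda task: ResultFile(["segment down: FATAL"], {"disk_usage_pct": 72.0}),
    is_abnormal=lambda log: any("FATAL" in line for line in log),
    notify=sent.append,
    display=shown.append,
)
result = device.on_user_trigger(["segment_status", "disk_usage_pct"])
```

Injecting the units as callables is a design choice of this sketch, not something the embodiment prescribes; it simply keeps the control flow visible without committing to any transport or storage detail.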
In summary, compared with the prior art, the functional fault status of the database cluster no longer needs to be learned through manual intervention, so operation and maintenance (O&M) efficiency is significantly improved and a large amount of O&M manpower is saved. In addition, the health degree model and the routing inspection indexes help the user judge how good the hardware performance of the database cluster is, so that the health state of the database cluster can be effectively perceived. Therefore, with the scheme of this embodiment, the hardware inspection efficiency of the GP cluster can be significantly improved, and the health state of the GP cluster can be effectively perceived.
The application also provides a computer readable storage medium, which includes a stored program, wherein the program executes the monitoring method of the database cluster provided by the application.
The present application further provides a monitoring device of a database cluster, including a processor, a memory, and a bus. The processor is connected to the memory through the bus, the memory is used for storing a program, and the processor is used for running the program, wherein the program, when run, executes the monitoring method for the database cluster provided by the present application, which includes the following steps:
under the condition that a triggering operation of a user is received, a pre-configured acquisition task is distributed to a main node of a database cluster, so that the main node acquires information according to the acquisition task; the acquisition task comprises a plurality of index information acquisition tasks; the indexes comprise alarm indexes and routing inspection indexes; the alarm indicator is used for indicating information items influencing the service function of the database cluster; the inspection index is used for indicating information items which affect the hardware performance of the database cluster but do not affect the service function;
acquiring a result file obtained after the main node acquires information; the result file comprises an alarm log and a routing inspection log; the alarm log is used for recording the numerical value of the alarm index; the inspection log is used for recording the numerical value of the inspection index;
sending an alarm prompt to the user when the numerical value of the alarm index is detected to be abnormal, and prompting the user that the database cluster has a functional fault;
and inputting the numerical value of the routing inspection index into a pre-constructed health degree model, and displaying the health degree model to the user through a preset front-end interface.
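The passage above does not define the internals of the health degree model. As a hedged illustration only, the sketch below assumes the model is a 0 to 100 score obtained by weighted deductions as each inspection index approaches an assumed limit; the function name `health_degree`, the index names, the limits, and the weights are all hypothetical.

```python
def health_degree(inspection_values, limits, weights):
    """Return a 0-100 score (assumed scoring rule, not taken from the source).

    Each index contributes a penalty proportional to value / limit,
    clamped to [0, 1] and scaled by its weight.
    """
    total_weight = sum(weights.values())
    penalty = 0.0
    for name, value in inspection_values.items():
        ratio = min(max(value / limits[name], 0.0), 1.0)
        penalty += weights[name] * ratio
    return round(100.0 * (1.0 - penalty / total_weight), 1)

# Hypothetical inspection index values taken from an inspection log.
values = {"disk_usage_pct": 50.0, "mem_usage_pct": 80.0}
limits = {"disk_usage_pct": 100.0, "mem_usage_pct": 100.0}
weights = {"disk_usage_pct": 1.0, "mem_usage_pct": 1.0}
score = health_degree(values, limits, weights)
```

A front-end interface could then render `score` together with the per-index values; how the result is displayed is left open by the text.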
Optionally, the method further includes:
inputting the numerical value of the routing inspection index recorded in the routing inspection log in a preset historical time period into a preset trend prediction model to obtain the predicted value of the routing inspection index; the predicted value is used for reflecting the performance of the database cluster hardware;
counting the number of routing inspection indexes with the predicted values larger than a preset threshold value;
and prompting the user that the health state of the database cluster is poor when the number of routing inspection indexes whose predicted values are greater than the preset threshold is greater than a preset number.
Optionally, sending an alarm prompt to the user when the value of the alarm index is detected to be abnormal, and prompting the user that the database cluster has a functional fault, includes:
carrying out keyword detection on the alarm log;
determining that the numerical value of the alarm index is abnormal under the condition that the alarm log is detected to contain preset alarm characters;
and sending an alarm prompt to a preset monitoring alarm platform, and triggering the monitoring alarm platform to send a prompt that the database cluster has a functional fault to the user.
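The keyword-based alarm steps above can be sketched in Python as follows. The keyword list, the log line format, and the `send_to_platform` callback standing in for the monitoring alarm platform are assumptions made for illustration; the source does not name the preset alarm characters or the platform interface.

```python
import re

# Hypothetical preset alarm characters.
ALARM_KEYWORDS = ("ERROR", "FATAL", "PANIC")

def find_alarm_lines(alarm_log_text, keywords=ALARM_KEYWORDS):
    """Return the alarm log lines that contain any preset keyword."""
    pattern = re.compile("|".join(re.escape(k) for k in keywords))
    return [line for line in alarm_log_text.splitlines() if pattern.search(line)]

def check_and_alarm(alarm_log_text, send_to_platform):
    """Treat the alarm index as abnormal if any keyword is found, then notify.

    send_to_platform is a stand-in for the call to the monitoring alarm
    platform, which in turn would prompt the user.
    """
    hits = find_alarm_lines(alarm_log_text)
    if hits:
        send_to_platform({
            "message": "database cluster has a functional fault",
            "evidence": hits,
        })
    return bool(hits)

log = "segment 3 status: up\nsegment 4 status: FATAL connection refused\n"
outbox = []
fired = check_and_alarm(log, outbox.append)
```

Passing the platform call in as a function keeps the detection logic independent of whatever alerting channel is actually deployed.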
The functions described in the method of the embodiment of the present application, if implemented in the form of software functional units and sold or used as independent products, may be stored in a storage medium readable by a computing device. Based on this understanding, the part of the embodiments of the present application that contributes over the prior art, or a part of the technical solution, may be embodied in the form of a software product that is stored in a storage medium and includes several instructions for causing a computing device (which may be a personal computer, a server, a mobile computing device, or a network device) to execute all or part of the steps of the method described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the same or similar parts the embodiments may be referred to one another.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A monitoring method for a database cluster is characterized by comprising the following steps:
under the condition that a triggering operation of a user is received, a pre-configured acquisition task is distributed to a main node of a database cluster, so that the main node acquires information according to the acquisition task; the acquisition task comprises a plurality of index information acquisition tasks; the indexes comprise alarm indexes and routing inspection indexes; the alarm indicator is used for indicating information items influencing the service function of the database cluster; the inspection index is used for indicating information items which affect the hardware performance of the database cluster but do not affect the service function;
acquiring a result file obtained after the main node acquires information; the result file comprises an alarm log and a routing inspection log; the alarm log is used for recording the numerical value of the alarm index; the inspection log is used for recording the numerical value of the inspection index;
sending an alarm prompt to the user when the numerical value of the alarm index is detected to be abnormal, and prompting the user that the database cluster has a functional fault;
and inputting the numerical value of the routing inspection index into a pre-constructed health degree model, and displaying the health degree model to the user through a preset front-end interface.
2. The method of claim 1, further comprising:
inputting the numerical value of the routing inspection index recorded in the routing inspection log in a preset historical time period into a preset trend prediction model to obtain the predicted value of the routing inspection index; the predicted value is used for reflecting the performance of the database cluster hardware;
counting the number of routing inspection indexes with the predicted values larger than a preset threshold value;
and prompting the user that the health state of the database cluster is poor when the number of routing inspection indexes whose predicted values are greater than the preset threshold is greater than a preset number.
3. The method according to claim 1, wherein sending an alarm prompt to the user when the value of the alarm index is detected to be abnormal, and prompting the user that the database cluster has a functional fault, comprises:
carrying out keyword detection on the alarm log;
determining that the numerical value of the alarm index is abnormal under the condition that the alarm log is detected to contain preset alarm characters;
and sending an alarm prompt to a preset monitoring alarm platform, and triggering the monitoring alarm platform to send a prompt that the database cluster has a functional fault to the user.
4. A monitoring apparatus for a database cluster, comprising:
the distribution unit is used for distributing a pre-configured acquisition task to a main node of a database cluster under the condition of receiving a triggering operation of a user, so that the main node acquires information according to the acquisition task; the acquisition task comprises a plurality of index information acquisition tasks; the indexes comprise alarm indexes and routing inspection indexes; the alarm indicator is used for indicating information items influencing the service function of the database cluster; the inspection index is used for indicating information items which affect the hardware performance of the database cluster but do not affect the service function;
the acquisition unit is used for acquiring a result file obtained after the information acquisition is carried out by the main node; the result file comprises an alarm log and a routing inspection log; the alarm log is used for recording the numerical value of the alarm index; the inspection log is used for recording the numerical value of the inspection index;
the alarm unit is used for sending an alarm prompt to the user when the numerical value of the alarm index is detected to be abnormal, and prompting the user that the database cluster has a functional fault;
and the display unit is used for inputting the numerical value of the inspection index into a pre-constructed health degree model and displaying the health degree model to the user through a preset front-end interface.
5. The apparatus of claim 4, further comprising:
the evaluation unit is used for inputting the numerical value of the routing inspection index recorded in the routing inspection log in a preset historical time period into a preset trend prediction model to obtain the predicted value of the routing inspection index; the predicted value is used for reflecting the performance of the database cluster hardware; counting the number of routing inspection indexes with the predicted values larger than a preset threshold value; and prompting the user that the health state of the database cluster is poor when the number of routing inspection indexes whose predicted values are greater than the preset threshold is greater than a preset number.
6. The apparatus of claim 4, wherein the alarm unit is configured to:
carrying out keyword detection on the alarm log;
determining that the numerical value of the alarm index is abnormal under the condition that the alarm log is detected to contain preset alarm characters;
and sending an alarm prompt to a preset monitoring alarm platform, and triggering the monitoring alarm platform to send a prompt that the database cluster has a functional fault to the user.
7. A monitoring system for a database cluster, comprising:
the system comprises a scheduling module, an acquisition module and an analysis module;
the scheduling module is used for pre-configuring an acquisition task according to the information of the database cluster under the condition of receiving the triggering operation of the user and sending the acquisition task to the acquisition module; the acquisition task comprises a plurality of index information acquisition tasks; the indexes comprise alarm indexes and routing inspection indexes; the alarm indicator is used for indicating an information item influencing the service function of the database cluster; the inspection index is used for indicating information items which affect the hardware performance of the database cluster but do not affect the service function;
the acquisition module is used for distributing the acquisition task to the main node of the database cluster under the condition of receiving the acquisition task, so that the main node acquires information according to the acquisition task and acquires a result file obtained after the main node acquires the information; the result file comprises an alarm log and a routing inspection log; the alarm log is used for recording the numerical value of the alarm index; the inspection log is used for recording the numerical value of the inspection index;
the acquisition module is also used for sending the alarm log and the routing inspection log to the analysis module;
the analysis module is used for sending an alarm prompt to the user when the numerical value of the alarm index is detected to be abnormal, and prompting the user that the database cluster has a functional fault;
the analysis module is further used for inputting the numerical value of the routing inspection index into a pre-constructed health degree model, and displaying the health degree model to the user through a preset front-end interface.
8. The database cluster monitoring system of claim 7, wherein the analysis module is further configured to:
inputting the numerical value of the routing inspection index recorded in the routing inspection log in a preset historical time period into a preset trend prediction model to obtain the predicted value of the routing inspection index; the predicted value is used for reflecting the performance of the database cluster hardware;
counting the number of routing inspection indexes with the predicted values larger than a preset threshold value;
and prompting the user that the health state of the database cluster is poor when the number of routing inspection indexes whose predicted values are greater than the preset threshold is greater than a preset number.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium comprises a stored program, wherein the program performs the method of monitoring a database cluster according to any one of claims 1 to 3.
10. A monitoring device of a database cluster, comprising: a processor, a memory, and a bus; the processor and the memory are connected through the bus;
the memory is used for storing a program and the processor is used for running the program, wherein the program is used for executing the database cluster monitoring method in any one of claims 1-3 during running.
CN202110448149.6A 2021-04-25 2021-04-25 Database cluster monitoring method, device, system and equipment Pending CN113051147A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110448149.6A CN113051147A (en) 2021-04-25 2021-04-25 Database cluster monitoring method, device, system and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110448149.6A CN113051147A (en) 2021-04-25 2021-04-25 Database cluster monitoring method, device, system and equipment

Publications (1)

Publication Number Publication Date
CN113051147A true CN113051147A (en) 2021-06-29

Family

ID=76520419

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110448149.6A Pending CN113051147A (en) 2021-04-25 2021-04-25 Database cluster monitoring method, device, system and equipment

Country Status (1)

Country Link
CN (1) CN113051147A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113505044A (en) * 2021-09-09 2021-10-15 格创东智(深圳)科技有限公司 Database warning method, device, equipment and storage medium
CN113641567A (en) * 2021-10-13 2021-11-12 北京易真学思教育科技有限公司 Database inspection method and device, electronic equipment and storage medium
CN114090382A (en) * 2021-11-22 2022-02-25 北京志凌海纳科技有限公司 Health inspection method and device for super-converged cluster
CN114584455A (en) * 2022-03-04 2022-06-03 吉林大学 Small and medium-sized high-performance cluster monitoring system based on enterprise WeChat
CN114598624A (en) * 2022-03-15 2022-06-07 平安科技(深圳)有限公司 Cluster monitoring method and device, electronic equipment and readable storage medium
CN116032574A (en) * 2022-12-16 2023-04-28 深圳市网安信科技有限公司 Intelligent safe operation and maintenance monitoring data processing system
CN116127149A (en) * 2023-04-14 2023-05-16 杭州悦数科技有限公司 Quantification method and system for health degree of graph database cluster

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140195853A1 (en) * 2013-01-09 2014-07-10 Microsoft Corporation Cloud management using a component health model
CN105337765A (en) * 2015-10-10 2016-02-17 上海新炬网络信息技术有限公司 Distributed hadoop cluster fault automatic diagnosis and restoration system
CN109857613A (en) * 2018-12-25 2019-06-07 南京南瑞信息通信科技有限公司 A kind of automation operational system based on acquisition cluster
CN111984499A (en) * 2020-08-04 2020-11-24 中国建设银行股份有限公司 Fault detection method and device for big data cluster

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113505044A (en) * 2021-09-09 2021-10-15 格创东智(深圳)科技有限公司 Database warning method, device, equipment and storage medium
CN113641567A (en) * 2021-10-13 2021-11-12 北京易真学思教育科技有限公司 Database inspection method and device, electronic equipment and storage medium
CN113641567B (en) * 2021-10-13 2022-03-25 北京易真学思教育科技有限公司 Database inspection method and device, electronic equipment and storage medium
CN114090382A (en) * 2021-11-22 2022-02-25 北京志凌海纳科技有限公司 Health inspection method and device for super-converged cluster
CN114090382B (en) * 2021-11-22 2022-07-22 北京志凌海纳科技有限公司 Health inspection method and device for super-converged cluster
CN114584455A (en) * 2022-03-04 2022-06-03 吉林大学 Small and medium-sized high-performance cluster monitoring system based on enterprise WeChat
CN114598624A (en) * 2022-03-15 2022-06-07 平安科技(深圳)有限公司 Cluster monitoring method and device, electronic equipment and readable storage medium
CN114598624B (en) * 2022-03-15 2023-11-07 平安科技(深圳)有限公司 Cluster monitoring method and device, electronic equipment and readable storage medium
CN116032574A (en) * 2022-12-16 2023-04-28 深圳市网安信科技有限公司 Intelligent safe operation and maintenance monitoring data processing system
CN116127149A (en) * 2023-04-14 2023-05-16 杭州悦数科技有限公司 Quantification method and system for health degree of graph database cluster

Similar Documents

Publication Publication Date Title
CN113051147A (en) Database cluster monitoring method, device, system and equipment
CN111614491B (en) Power monitoring system oriented safety situation assessment index selection method and system
CN104468282B (en) cluster monitoring processing system and method
CN106951360B (en) Data statistical integrity calculation method and system
CN111241059B (en) Database optimization method and device based on database
CN110109906B (en) Data storage system and method
CN111221890A (en) Automatic monitoring and early warning method and device for general indexes
CN114138601A (en) Service alarm method, device, equipment and storage medium
CN113886130A (en) Method, device and medium for processing database fault
CN108337100B (en) Cloud platform monitoring method and device
CN108809729A (en) The fault handling method and device that CTDB is serviced in a kind of distributed system
CN113504996A (en) Load balance detection method, device, equipment and storage medium
CN114726649B (en) Situation awareness evaluation method and device, terminal equipment and storage medium
CN110162444A (en) A kind of system performance monitoring method and platform
CN115529219A (en) Alarm analysis method and device, computer readable storage medium and electronic equipment
CN112965793B (en) Identification analysis data-oriented data warehouse task scheduling method and system
CN108551444A (en) A kind of log processing method, device and equipment
CN113778831A (en) Data application performance analysis method, device, equipment and medium
CN104516916A (en) Method and device for analyzing network report incidence relation
CN114039878A (en) Network request processing method and device, electronic equipment and storage medium
CN112883253A (en) Data processing method, device, equipment and readable storage medium
CN111080325A (en) System and method for analyzing civil aviation customer relationship
CN110995500A (en) Node log management and control method, system and related components
CN115794479B (en) Log data processing method and device, electronic equipment and storage medium
CN112783732B (en) Database table capacity monitoring method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination