CN112486780A - Database performance real-time monitoring and diagnosing method and system based on message middleware - Google Patents

Database performance real-time monitoring and diagnosing method and system based on message middleware Download PDF

Info

Publication number
CN112486780A
CN112486780A CN202011502406.1A CN202011502406A CN112486780A CN 112486780 A CN112486780 A CN 112486780A CN 202011502406 A CN202011502406 A CN 202011502406A CN 112486780 A CN112486780 A CN 112486780A
Authority
CN
China
Prior art keywords
database
performance
data
model
unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011502406.1A
Other languages
Chinese (zh)
Inventor
赵祖国
刘莉
熊文
王帆
崔波波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Secondary Vocational Internet Of Things Hubei Information Technology Co ltd
Original Assignee
Secondary Vocational Internet Of Things Hubei Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Secondary Vocational Internet Of Things Hubei Information Technology Co ltd filed Critical Secondary Vocational Internet Of Things Hubei Information Technology Co ltd
Priority to CN202011502406.1A priority Critical patent/CN112486780A/en
Publication of CN112486780A publication Critical patent/CN112486780A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/3003Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/302Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a software system
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/34Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3457Performance evaluation by simulation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/34Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3466Performance evaluation by tracing or monitoring
    • G06F11/3476Data logging

Abstract

The invention discloses a database performance real-time monitoring and diagnosing method and system based on message middleware, wherein the method comprises the following steps: collecting database system running characteristics and operating system running characteristics of a node where a database is located; sending the collected data to a message middleware; the control node acquires the data from the message middleware in real time and stores the data; training a performance model by taking the data as input and taking the performance index of the database as output; and calling the performance model in real time, positioning database operation influencing performance indexes according to information output by the model, and guiding a system administrator to perform problem disposal. The system adopts the method to carry out real-time monitoring and diagnosis. The invention transmits the collected data to the control node through the message middleware, the control node establishes the performance model through the collected data, calls the performance model in real time, positions the database operation influencing the performance index according to the output information, and achieves the aims of real-time, intelligent diagnosis and large-scale management.

Description

Database performance real-time monitoring and diagnosing method and system based on message middleware
Technical Field
The invention relates to the technical field of databases, in particular to a database performance monitoring and diagnosing technology.
Background
The database is a core component of an information infrastructure, and database abnormality diagnosis and performance optimization are important problems faced by software developers, database managers (hereinafter referred to as DBAs) and system operation and maintenance engineers and the like, and are key for stable operation of various service systems. When a business system uses a database, two performance indexes are generally considered: the first indicator is latency, which refers to the time required from the submission of a request (operation to add, delete, query, or modify a database table) to completion, e.g., 10 milliseconds, and the smaller the value of this indicator, the better. The second metric is throughput, which refers to the requested value that the database can handle per unit time, e.g., 1200 requests/second, with larger values being better. Normally, these two performances can only be kept within a certain range, and the service system can normally operate.
Due to hardware failure (such as damage to a hard disk), software failure (such as unreasonable SQL statements), and the like, the database may fail or the performance may be drastically degraded, so that the service is interrupted. For example, the performance level delay is changed from 10 milliseconds to 5 seconds. The conventional database anomaly diagnosis and performance optimization method comprises the following processes: after the database fails or the performance is reduced rapidly, the DBA receives user feedback, firstly checks the database system running log file, tries a solution and implements according to the database log file such as binlog prompt information, analysis and positioning problems by using the personal experience of the DBA, and the process is repeated for many times until the database is recovered to be normal and the service system is operated normally.
Specifically, the monitoring tools pt-query-digest and MySQLdumpshow for MySQL respectively realize the analysis and visual display of the slow log, and the Percona Toolkit tool set also provides a series of tools for monitoring the MySQL database. However, these tools are too dependent on the personal experience of the operation and maintenance personnel.
The chinese patent "CN 111459905A MySQL database monitoring script implementation method and system" proposes a monitoring script implemented based on Python, which obtains some system running states from the database running log, such as the number of adding, deleting, modifying and querying operations, average waiting time, and slow query volume. And further calculating some characteristic indexes such as cache hit rate and the like. The method does so with time-spaced data acquisition and display, but does not directly provide the functions of intelligent diagnosis and performance optimization.
The existing database anomaly diagnosis and performance optimization methods have three main defects, which are respectively as follows:
firstly, in non-real time, the DBA can start to analyze the log and position the problem after the database has a problem, and the normal operation of the business system is affected;
secondly, the intelligent diagnosis is lacked, the problem solving efficiency depends on the personal experience of the DBA seriously, and the excellent DBA employment cost is high or needs to grow for a long time;
and thirdly, the scale is small, one DBA can only process one database server at a time, and if a business system uses a plurality of servers, one DBA can obviously not deal with the database servers.
Disclosure of Invention
The invention aims to provide a database performance real-time monitoring and diagnosing method based on message middleware, and aims to solve the technical problems of non-real-time, lack of intelligent diagnosis and small scale of the existing method.
In order to achieve the purpose, the invention provides the following scheme: a message-middleware based database performance real-time monitoring and diagnosis method, the method comprising: collecting database system operation characteristics of a node where a database is located; collecting operating system operating characteristics of a node where a database is located; transmitting the collected database system running characteristics and operating system running characteristics to a message middleware; the control node acquires and stores the database system running characteristics and the operating system running characteristics in real time from the message middleware; training a performance model by taking the database system operation characteristics and the operation characteristic data of the operating system as input and taking the database performance indexes as output; and calling the performance model in real time, positioning database operation influencing performance indexes according to information output by the performance model, and guiding a system administrator to perform problem disposal.
Optionally, the database system running characteristics and the operating system running characteristics collected at one or more database nodes are coded and compressed, and then submitted to the message middleware through a data sending unit.
Optionally, after acquiring the database system running characteristic and the operating system running characteristic data from the publish/subscribe function of the message middleware in real time, the data receiving unit stores the data according to the database node IP where the characteristic source is located and the timestamp.
Optionally, the database performance index includes throughput T or latency L, the database system operation characteristics include number of queries per second, number of transactions per second, average time consumed by each query, cache hit rate of key values, hit rate of query cache, table locking state, ratio of temporary table to actual table number, number of opened files and tables, time-consuming slow query, and frequency corresponding to slow query, and the operation system operation characteristics include CPU utilization, CPU load, process switching, memory utilization, disk read speed, disk read frequency, disk write speed, disk write frequency, network transmission speed, network transmission frequency, network reception speed, and network reception frequency.
Optionally, the process of constructing the performance model of the performance index throughput T includes: at a certain moment, the operating system operating characteristics, the database system operating characteristics and the performance index throughput T form a vector<a1,a2,a3,......,an,T>Wherein a isiRepresenting the ith characteristic, after a period of time, the operating system operating characteristic, the database system operating characteristic and the performance index throughput T form a matrix Amn
Figure BDA0002843981640000031
Adopting an SGBRT (stored Gradient Regression Tree) random Gradient descent Regression tree to construct a performance model of an independent variable T and calculate the importance of a dependent variable, wherein for any one tree T, the dependent variable T is used
Figure BDA0002843981640000032
To calculate each independent dependent variable metricxjThe calculation formula is shown below,
Figure BDA0002843981640000033
wherein nt is xjNumber of times selected in the decomposition of the T-model of the tree, P2(i) Is that the tree model is in the iththImprovement of variance after sub-decomposition if xjTaking M trees in an integrated model as a decomposition basis, wherein x in the model isjThe importance of (d) can be expressed as:
Figure BDA0002843981640000034
after the importance is calculated, normalizing the importance of the characteristics to make the sum of the importance of all the characteristics be 100%, wherein the larger the importance is, the larger the influence of the dependent variable on the performance index throughput T is;
similarly, a performance model of the performance index delay L is constructed by adopting an SGBRT stochastic gradient descent regression tree, the importance of the dependent variable is calculated, and the larger the importance is, the larger the influence of the dependent variable on the performance index delay L is.
The invention also provides a database performance real-time monitoring and diagnosing system based on the message middleware, which comprises:
the data acquisition unit A is used for acquiring the database system running logs at regular time, calculating the database system running characteristics according to the information of the running logs, coding and compressing the database system running characteristics and submitting the database system running characteristics to the data transmission unit;
the data acquisition unit B is used for acquiring the operating characteristics of the operating system, coding and compressing the operating characteristics of the operating system and submitting the operating characteristics to the data transmission unit;
the data sending unit is used for receiving the data submitted by the data acquisition unit A and the data acquisition unit B and submitting the data to the message middleware;
the data receiving unit acquires data from the message middleware;
the data receiving unit is used for acquiring data from the publish/subscribe function of the message middleware in real time and storing the data in the data storage unit;
the data storage unit is used for storing data according to the IP of the database node where the characteristic source is located and the timestamp;
the performance model training unit is used for establishing a model which takes database system operation characteristics and operating system operation characteristic data as input and takes database performance indexes as output, and the model can discriminate the most key factors influencing the performance;
and the performance diagnosis unit can position the database operation influencing the performance index by utilizing the model trained in the performance model training unit when receiving the database system operation characteristics, the operating system operation characteristic data and the performance index.
Optionally, the data acquisition unit a, the data acquisition unit B, and the data transmission unit operate at a node where the database is located, and may operate on one or more database nodes.
Optionally, the data receiving unit, the data storage unit, the performance model training unit, the performance model diagnosing unit and the user interaction unit operate at a performance model operating node.
Optionally, the performance model training unit constructs a performance model of the performance index by using an SGBRT stochastic gradient descent regression tree, and calculates the importance of the dependent variable, where the greater the importance is, the greater the influence of the dependent variable on the performance index is.
Optionally, the system further comprises a user interaction unit, configured to display diagnostic information of a real-time operating state and an abnormal condition of the database, and an administrator sends a command to the performance diagnostic unit or the performance model training unit through the user interaction unit to construct and update the performance model.
According to the method and the system for monitoring and diagnosing the performance of the database in real time, the operation system operation characteristics, the database operation characteristics and the performance indexes of the node where the database is located are collected in real time and transmitted to the control node through the message middleware, the control node establishes the performance model through the collected data, calls the performance model in real time, positions the database operation influencing the performance indexes according to the information output by the performance model, and guides a system administrator to perform problem handling, so that the aims of real-time, intelligent diagnosis and large-scale management are fulfilled, and the efficiency is greatly improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the structures shown in the drawings without inventive labor.
FIG. 1 is a detailed workflow diagram of a database performance real-time monitoring and diagnostic method according to an embodiment of the invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The technical solution of the present invention is described in detail by taking MySQL database as an example. FIG. 1 is a detailed workflow diagram of a database performance real-time monitoring and diagnostic method according to an embodiment of the invention. As shown in fig. 1, the method for monitoring and diagnosing database performance in real time according to the present invention comprises the following steps: collecting database system operation characteristics of a node where a MySQL database is located; collecting operating system operating characteristics of a node where a database is located; sending the collected database system running characteristics and operating system running characteristics to a message middleware kafka; the control node acquires and stores the database system running characteristics and the operating system running characteristics in real time from the message middleware kafka; training a performance model by taking the database system operation characteristics and the operation characteristic data of the operating system as input and taking the database performance indexes as output; and calling the performance model in real time, positioning database operation influencing performance indexes according to information output by the performance model, and guiding a system administrator to perform problem disposal. In the process, the message middleware kafka is used as a safe, rapid, stable and reliable data transmission channel, the collected database system running characteristics and the collected operating system running characteristic data are received and transmitted, the control node establishes a performance model through the collected data, then the performance model is called in real time, and the database operation influencing the performance index is positioned according to the information output by the performance model, so that the purpose of real-time and intelligent diagnosis is achieved.
Preferably, the database performance real-time monitoring and diagnosing method encodes and compresses the database system running characteristics and operating system running characteristics collected at one or more database nodes, and submits the compressed data to the message middleware kafka through a data sending unit. The method can be used for simultaneously acquiring and transmitting data of a plurality of database nodes so as to realize large-scale management and greatly improve the diagnosis efficiency.
Preferably, after acquiring the database system running characteristics and the operating system running characteristics data from the publish/subscribe function of the message middleware kafka in real time, the data receiving unit stores the data according to the database node IP where the characteristics source is located and the timestamp, so as to retrieve and send the data later.
Preferably, the database performance index includes throughput T or latency L, the database system operation characteristics include number of queries per second, number of transactions per second, average time consumed by each query, cache hit rate of key values, hit rate of query cache, table locking state, ratio of temporary table to actual table number, number of opened files to table number, time-consuming slow query, and frequency corresponding to slow query, and the operation system operation characteristics include CPU utilization, CPU load, process switching, memory utilization, disk read speed, disk read frequency, disk write speed, disk write frequency, network transmission speed, network transmission frequency, network reception speed, and network reception frequency. It should be noted that the collected database operating characteristics and operating system operating characteristics include, but are not limited to, those listed above.
Preferably, the database performance index includes throughput T or delay L, and the process of constructing the performance model of the performance index throughput T includes: at a certain moment, the operating system operating characteristics, the database system operating characteristics and the performance index throughput T form a vector<a1,a2,a3,......,an,T>Wherein a isiRepresenting the ith characteristic, after a period of time, the operating system operating characteristic, the database system operating characteristic and the performance index throughput T form a matrix Amn
Figure BDA0002843981640000061
An SGBRT (stored Gradient Regression Tree) random Gradient descent Regression tree is adopted to construct a performance model of an independent variable T, the input of the model is the operating characteristics of an operating system and MySQL, the model is expressed by < s >, and the output is the performance index throughput T of the database. The importance of the performance characteristics influencing the performance index T can be quantitatively analyzed based on the performance model constructed by the SBBRT.
SGBRT makes final predictions or classifications based on the results output from multiple independent regression trees by combining them. Therefore, it is generally considered in the industry that the model established by the SGBRT has higher accuracy and better robustness than the model established by the traditional machine learning. The SGBRT has good effect in a series of application fields, and only relatively few training sets are needed for construction of the SGBRT. Therefore, the SGBRT is adopted as a performance model in the technical scheme.
First, taking the performance index throughput T as an example, once a performance model of the independent variables T is established, the importance of the dependent variables can be calculated. Wherein, for any one tree T, use
Figure BDA0002843981640000062
To calculate each independent dependent variable metricxjThe calculation formula is shown below,
Figure BDA0002843981640000063
wherein nt is xjNumber of times selected in the decomposition of the T-model of the tree, P2(i) Is that the tree model is in the iththImprovement of variance after sub-decomposition. For example, in the i-ththAfter the sub-decomposition, P (i) is the relative error of T (T)i-Ti-1)/Ti-1. If xjTaking M trees in an integrated model as a decomposition basis, wherein x in the model isjThe importance of (d) can be expressed as:
Figure BDA0002843981640000064
after the importance is calculated, the importance of the features is normalized so that the sum of the importance of all the features is 100%, and the greater the importance, the greater the influence of the dependent variable on the performance index throughput T.
Similarly, a performance model of the performance index delay L is constructed by adopting an SGBRT stochastic gradient descent regression tree, the importance of the dependent variable is calculated, and the larger the importance is, the larger the influence of the dependent variable on the performance index delay L is.
Fig. 1 also shows a real-time database performance monitoring and diagnosis system based on message middleware. As shown in fig. 1, the present invention relates to the following modules: the system comprises a data acquisition unit A, a data acquisition unit B and a data transmission unit which operate at a node where a MySQL database is located; the system comprises a message middleware kafka, a data receiving unit, a data storage unit, a performance model training unit, a performance diagnosis unit and a user interaction unit, wherein the data receiving unit, the data storage unit, the performance model training unit, the performance diagnosis unit and the user interaction unit are operated on a performance model working node, namely a control node.
The data acquisition unit A, the data acquisition unit B, the data transmission unit and the like are deployed and operated on nodes of the MySQL database at the same time. And the method can be deployed on one or more MySQL database nodes. The message middleware kafka is open source message middleware, and provides a safe, quick, stable and reliable data transmission channel.
As shown in fig. 1, the database performance real-time monitoring and diagnosing system proposed by the present invention includes:
(1) and the data acquisition unit A is responsible for acquiring the running logs of the MySQL database system at regular time and calculating a series of database running characteristics according to the log information. The detailed log information is shown in table 1. It should be noted that the collected database operation characteristics include, but are not limited to, the contents in table 1. And after the database operation characteristic data are collected, the database operation characteristic data are coded and compressed and submitted to a data sending unit. It should be noted that, the database operation characteristics may be collected at a node where one or more databases are located and submitted to the data sending unit.
Table 1 MySQL run features
Figure BDA0002843981640000071
(2) And the data acquisition unit B is responsible for acquiring the resource utilization conditions of the CPU, the memory, the disk, the network and other operating systems of the node where the database is located. The specific operating system operating characteristics refer to the contents in table 2, and after operating system operating characteristic data is acquired, the operating system operating characteristic data is encoded and compressed and submitted to the data sending unit. It should be noted that, the operating system operating characteristics may be collected at a node where one or more databases are located and submitted to the data sending unit.
TABLE 2 operating System run characteristics
Figure BDA0002843981640000081
(3) And the data sending unit receives the MySQL operation characteristic data submitted by the data acquisition unit A and the operation system operation characteristic data submitted by the data acquisition unit B, and then submits the data to the message middleware kafka.
(4) The message middleware kafka provides a reliable, safe and efficient message transmission channel for the data sending unit and the data receiving unit in a distributed environment. The data sending unit submits the collected operation characteristics to the message middleware, and the data receiving unit acquires data from the message middleware.
(5) And the data receiving unit acquires data from the publishing/subscribing function of the message middleware kafka in real time and stores the data in the storage unit.
(6) And the data storage unit is used for storing data according to a certain rule, such as storing data according to the IP (Internet protocol) of the database node where the characteristic source is located and a time stamp.
(7) And the performance model training unit is used for establishing a model which takes the operating system operating characteristics and MySQL operating characteristics as input and takes the database performance indexes as output. The model quantifies key factors affecting system performance and discriminates the most key factors affecting performance.
(8) And (3) when the performance diagnosis unit receives the system operation characteristics and the performance indexes, the database operation influencing the performance indexes can be positioned by using the model trained in the step (7), so that a system administrator is further guided to perform problem treatment.
The core of the system performance diagnosis is to construct a performance model, wherein the input of the performance model is the characteristic of an operating system and a database during operation, and the output is the real-time performance index of the database. To build the performance model, data acquisition is first performed.
The characteristics from the operating system may be the number of bytes per second (unit MB/s, megabyte per second), CPU utilization (percentage), etc., as shown in table 2; the characteristics from the MySQL running log may be cache hit rate (unit percentage), table locking status (percentage), as shown in table 1; the performance index of the database MySQL is Throughput T or time delay L, T is Throughput (Throughput), and L is time delay (Latency).
At a certain moment, the operation characteristics of the operating system, the operation characteristics of MySQL and the performance indexes form a vector<a1,a2,a3,......,an,T>Wherein a isiRepresenting the ith characteristic. After a period of time, the operating characteristics and performance indicators of the system form a matrix Amn
Figure BDA0002843981640000091
The invention adopts SGBRT (stored Gradient Regression Tree) random Gradient descent Regression tree to construct a performance model, the input of the model is the operating characteristics of an operating system and MySQL, and is expressed by < s >, and the output is the performance index throughput or time delay of a database, and is expressed by T or L. The importance of the performance characteristics influencing the performance indexes T and L can be quantitatively analyzed based on the performance model constructed by the SBBRT.
SGBRT makes final predictions or classifications based on the results output from multiple independent regression trees by combining them. Therefore, it is generally considered in the industry that the model established by the SGBRT has higher accuracy and better robustness than the model established by the traditional machine learning. The SGBRT has good effect in a series of application fields, and only relatively few training sets are needed for construction of the SGBRT. Therefore, the SGBRT is adopted as a performance model in the technical scheme.
Specifically, taking the performance index throughput T as an example, once a performance model of the independent variables T is established, the importance of these dependent variables may be calculated. Working principle of performance modelThe following were used: for any one tree T, use
Figure BDA0002843981640000092
To calculate each independent dependent variable metricxjIs dependent on x in the process of number decomposition (e.g. the amount of network data transmission)jThe number of times it is selected. The calculation formula is as follows:
Figure BDA0002843981640000093
wherein nt is xjNumber of times selected in the decomposition of the T-model of the tree, P2(i) Is that the tree model is in the iththImprovement of variance after sub-decomposition. For example, in the i-ththAfter the sub-decomposition, P (i) is the relative error of T (T)i-Ti-1)/Ti-1. If xjTaking M trees in an integrated model as a decomposition basis, wherein x in the model isjThe importance of (d) can be expressed as:
Figure BDA0002843981640000094
once the importance of these variables (features) is calculated, the importance of these features is first normalized to give a total of 100% of the importance of all the features. The greater the value of the importance of a feature means that it has the greatest effect on a particular performance metric.
Similarly, similar operations are also adopted for the performance index delay L, an SGBRT random gradient descent regression tree is adopted to construct a performance model of the performance index L, the importance of the dependent variable is calculated, and the larger the importance is, the larger the influence of the dependent variable on the performance index L is.
Preferably, the system may further comprise a user interaction unit for interaction of a system administrator with the monitoring and diagnostic system. The interaction unit has two main functions: firstly, displaying the real-time running state of the database and the diagnosis information of abnormal conditions to an administrator; secondly, the administrator sends a command to the performance diagnosis unit or the performance model training unit through the interaction unit interface, specifies conditions such as a database, a time range or system operation characteristics and the like to construct a new performance model, and updates the performance model.
Specifically, the technical scheme of the invention is implemented on an internet of things big data platform of middle-time internet of things (Beijing) company Limited. The platform deploys 10 MySQL databases, and by the technical scheme of the invention, a system administrator can monitor the performance of the MySQL databases in real time and quickly locate faults by means of the solution. The specific process is described as follows:
(1) using bash and iostat to realize that the automatic script regularly acquires the operating characteristics of an operating system of a node where the database is located, and using bash to operate the MySQL shell to regularly acquire the operating characteristics and performance indexes of the MySQL shell, wherein the operating characteristics and the performance indexes form single sample data;
(2) when the number of samples reaches a certain number, starting to train a performance Model, wherein the Model takes the operation characteristics as input and the performance indexes as output, and is marked as y-Model (x1, x2, x3, … xn);
(3) and (3) the real-time performance diagnosis unit operates the performance model in the step (2) after receiving the latest operation characteristic data and performance indexes. The output of the performance model will mark the importance of the features, so that the features are sorted according to the importance;
(4) and the administrator positions and checks the problems according to the output prompt information. For example, a query Q1 is time consuming, the amount of network traffic is large, and it is likely that a full table scan and data transfer has been performed.
The system provided by the invention regularly acquires data for real-time monitoring and diagnosis, does not depend on personal experience of DBA too much, adopts an SGBRT random gradient descent regression tree to construct a performance model, takes the operating characteristics of an operating system and mysql as input, and takes performance indexes of throughput, time delay and the like of a database as output, so as to construct the performance model. On one hand, the current mysql operation can be positioned, intelligent diagnosis is achieved through the performance model, on the other hand, large-scale management can be achieved, and one DBA can process a plurality of database servers.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention, and all modifications and equivalents of the present invention, which are made by the contents of the present specification and the accompanying drawings, or directly/indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A method for real-time monitoring and diagnosing database performance based on message middleware, the method comprising: collecting database system operation characteristics of a node where a database is located; collecting operating system operating characteristics of a node where a database is located; transmitting the collected database system running characteristics and operating system running characteristics to a message middleware; the control node acquires and stores the database system running characteristics and the operating system running characteristics in real time from the message middleware; training a performance model by taking the database system operation characteristics and the operation characteristic data of the operating system as input and taking the database performance indexes as output; and calling the performance model in real time, positioning database operation influencing performance indexes according to information output by the performance model, and guiding a system administrator to perform problem disposal.
2. The method for real-time monitoring and diagnosing database performance based on message middleware of claim 1, wherein the database system running characteristics and the operating system running characteristics collected at one or more database nodes are coded and compressed and then submitted to the message middleware through a data sending unit.
3. The message-middleware-based database performance real-time monitoring and diagnosis method according to claim 1, wherein after the data receiving unit obtains the database system running characteristic and the operating system running characteristic data from the publish/subscribe function of the message middleware in real time, the data is stored according to the database node IP where the characteristic source is located and the timestamp.
4. The method for real-time monitoring and diagnosing database performance based on message middleware of claim 1, wherein the database performance index includes throughput T or latency L, the database system operation characteristics include number of queries per second, number of transactions per second, average time consumed by each query, cache hit rate of key values, hit rate of query cache, table locking state, ratio of temporary table to actual table, number of opened files and tables, time-consuming slow query, and frequency corresponding to slow query, and the operation system operation characteristics include CPU utilization, CPU load, process switching, memory utilization, disk read speed, disk read times, disk write speed, disk write times, network transmit times, network receive speed, and network receive times.
5. The message-middleware-based database performance real-time monitoring and diagnosis method of claim 4, wherein the process of constructing the performance model of the performance index throughput T comprises: at a certain moment, the operating system operating characteristics, the database system operating characteristics and the performance index throughput T form a vector<a1,a2,a3,......,an,T>Wherein a isiRepresenting the ith characteristic, after a period of time, the operating system operating characteristic, the database system operating characteristic and the performance index throughput T form a matrix Amn
Figure FDA0002843981630000011
Adopting an SGBRT (stored Gradient Regression Tree) random Gradient descent Regression tree to construct a performance model of an independent variable T and calculate the importance of a dependent variable, wherein for any one tree T, the dependent variable T is used
Figure FDA0002843981630000021
To calculate each independent dependent variable metricxjThe calculation formula is shown below,
Figure FDA0002843981630000022
wherein nt is xjNumber of times selected in the decomposition of the T-model of the tree, P2(i) Is that the tree model is in the iththImprovement of variance after sub-decomposition if xjTaking M trees in an integrated model as a decomposition basis, wherein x in the model isjThe importance of (d) can be expressed as:
Figure FDA0002843981630000023
after the importance is calculated, normalizing the importance of the characteristics to make the sum of the importance of all the characteristics be 100%, wherein the larger the importance is, the larger the influence of the dependent variable on the performance index throughput T is;
similarly, a performance model of the performance index delay L is constructed by adopting an SGBRT stochastic gradient descent regression tree, the importance of the dependent variable is calculated, and the larger the importance is, the larger the influence of the dependent variable on the performance index delay L is.
6. A message-middleware based database performance real-time monitoring and diagnostic system, the system comprising:
the data acquisition unit A is used for acquiring the database system running logs at regular time, calculating the database system running characteristics according to the information of the running logs, coding and compressing the database system running characteristics and submitting the database system running characteristics to the data transmission unit;
the data acquisition unit B is used for acquiring the operating characteristics of the operating system, coding and compressing the operating characteristics of the operating system and submitting the operating characteristics to the data transmission unit;
the data sending unit is used for receiving the data submitted by the data acquisition unit A and the data acquisition unit B and submitting the data to the message middleware;
the data receiving unit acquires data from the message middleware;
the data receiving unit is used for acquiring data from the publish/subscribe function of the message middleware in real time and storing the data in the data storage unit;
the data storage unit is used for storing data according to the IP of the database node where the characteristic source is located and the timestamp;
the performance model training unit is used for establishing a model which takes database system operation characteristics and operating system operation characteristic data as input and takes database performance indexes as output, and the model can discriminate the most key factors influencing the performance;
and the performance diagnosis unit can position the database operation influencing the performance index by utilizing the model trained in the performance model training unit when receiving the database system operation characteristics, the operating system operation characteristic data and the performance index.
7. The message-middleware based database performance real-time monitoring and diagnosis system of claim 6, wherein the data acquisition unit A, the data acquisition unit B and the data transmission unit are operated at a node where the database is located and can be operated on one or more database nodes.
8. The message-middleware based database performance real-time monitoring and diagnosis system of claim 6, wherein the data receiving unit, the data storage unit, the performance model training unit, the performance model diagnosis unit and the user interaction unit operate at a performance model work node.
9. The system for real-time monitoring and diagnosing performance of a database based on message middleware as claimed in claim 6, wherein the performance model training unit adopts an SGBRT stochastic gradient descent regression tree to construct the performance model of the performance index, and calculates the importance of the dependent variable, and the greater the importance, the greater the influence of the dependent variable on the performance index.
10. The message-middleware based database performance real-time monitoring and diagnosis system of claim 6, further comprising a user interaction unit for displaying diagnosis information of real-time operation state and abnormal condition of the database and an administrator sending a command to the performance diagnosis unit or the performance model training unit through the user interaction unit to construct and update the performance model.
CN202011502406.1A 2020-12-17 2020-12-17 Database performance real-time monitoring and diagnosing method and system based on message middleware Pending CN112486780A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011502406.1A CN112486780A (en) 2020-12-17 2020-12-17 Database performance real-time monitoring and diagnosing method and system based on message middleware

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011502406.1A CN112486780A (en) 2020-12-17 2020-12-17 Database performance real-time monitoring and diagnosing method and system based on message middleware

Publications (1)

Publication Number Publication Date
CN112486780A true CN112486780A (en) 2021-03-12

Family

ID=74914803

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011502406.1A Pending CN112486780A (en) 2020-12-17 2020-12-17 Database performance real-time monitoring and diagnosing method and system based on message middleware

Country Status (1)

Country Link
CN (1) CN112486780A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113505044A (en) * 2021-09-09 2021-10-15 格创东智(深圳)科技有限公司 Database warning method, device, equipment and storage medium
CN114978889A (en) * 2022-05-13 2022-08-30 厦门兆翔智能科技有限公司 Airport enterprise service bus system
CN115640278A (en) * 2022-09-30 2023-01-24 北京柏睿数据技术股份有限公司 Method and system for intelligently optimizing database performance

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101183377A (en) * 2007-12-10 2008-05-21 华中科技大学 High availability data-base cluster based on message middleware
CN102981945A (en) * 2012-12-31 2013-03-20 北京京东世纪贸易有限公司 System and method for monitoring reliability performance
CN103412805A (en) * 2013-07-31 2013-11-27 交通银行股份有限公司 IT (information technology) fault source diagnosis method and IT fault source diagnosis system
CN110515799A (en) * 2019-08-15 2019-11-29 浙江万朋教育科技股份有限公司 MySQL monitoring system and implementation method based on python language

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101183377A (en) * 2007-12-10 2008-05-21 华中科技大学 High availability data-base cluster based on message middleware
CN102981945A (en) * 2012-12-31 2013-03-20 北京京东世纪贸易有限公司 System and method for monitoring reliability performance
CN103412805A (en) * 2013-07-31 2013-11-27 交通银行股份有限公司 IT (information technology) fault source diagnosis method and IT fault source diagnosis system
CN110515799A (en) * 2019-08-15 2019-11-29 浙江万朋教育科技股份有限公司 MySQL monitoring system and implementation method based on python language

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHIBIN YU 等: "MIA: Metric Importance Analysis for Big Data Workload Characterization", IEEE, pages 1371 - 1384 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113505044A (en) * 2021-09-09 2021-10-15 格创东智(深圳)科技有限公司 Database warning method, device, equipment and storage medium
CN114978889A (en) * 2022-05-13 2022-08-30 厦门兆翔智能科技有限公司 Airport enterprise service bus system
CN114978889B (en) * 2022-05-13 2024-04-16 厦门兆翔智能科技有限公司 Airport enterprise service bus system
CN115640278A (en) * 2022-09-30 2023-01-24 北京柏睿数据技术股份有限公司 Method and system for intelligently optimizing database performance
CN115640278B (en) * 2022-09-30 2023-08-08 北京柏睿数据技术股份有限公司 Method and system for intelligently optimizing database performance
WO2024066635A1 (en) * 2022-09-30 2024-04-04 北京柏睿数据技术股份有限公司 Method and system for intelligently optimizing database performance

Similar Documents

Publication Publication Date Title
CN112486780A (en) Database performance real-time monitoring and diagnosing method and system based on message middleware
US20200372007A1 (en) Trace and span sampling and analysis for instrumented software
CN107220892B (en) Intelligent preprocessing tool and method applied to massive P2P network loan financial data
US20080147586A1 (en) Method and system for obtaining a combination of faulty parts from a dispersed parts tree
CN107783985A (en) A kind of distributed networks database query method, apparatus and management system
CN111737076A (en) Server monitoring method, device, equipment and storage medium
WO2023071761A1 (en) Anomaly positioning method and device
CN110874744A (en) Data anomaly detection method and device
CN112416902A (en) One-key inspection method for host and database
CN113504996B (en) Load balancing detection method, device, equipment and storage medium
CN110489142A (en) Appraisal procedure and device, storage medium, the terminal of equipment software upgrading
CN113553312A (en) Digital twin construction method based on elevator production process
CN112711488B (en) Device initialization method and system based on micro-service and message queue
CN115080363B (en) System capacity evaluation method and device based on service log
CN111190790A (en) Cloud computing cluster monitoring method and system based on peak prediction
CN111209314A (en) System for processing massive log data of power information system in real time
US11838171B2 (en) Proactive network application problem log analyzer
CN111367775B (en) Problem node positioning method, computer device, and computer-readable storage medium
CN112561388A (en) Information processing method, device and equipment based on Internet of things
CN111127208A (en) Abnormal transaction real-time monitoring system and calculation method
CN112988529A (en) Method and system for predicting database system performance based on machine learning
CN110413496A (en) A method of realizing that the operation data modularization of electronics license is collected
CN116743618B (en) Data acquisition and analysis method, equipment and medium of station remote equipment
CN110995815B (en) Information transmission method based on Gaia big data analysis system
CN117931952A (en) Industrial equipment data synchronous processing method and system

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination