CN111913939B

CN111913939B - Database cluster optimization system and method based on reinforcement learning

Info

Publication number: CN111913939B
Application number: CN202010807625.4A
Authority: CN
Inventors: 莫毓昌
Original assignee: Individual
Current assignee: Individual
Priority date: 2020-08-12
Filing date: 2020-08-12
Publication date: 2023-10-03
Anticipated expiration: 2040-08-12
Also published as: CN111913939A

Abstract

The invention discloses a database cluster optimization system and method based on reinforcement learning. The optimization system comprises a current configuration information acquisition subsystem, a current performance information acquisition subsystem, an optimization strategy execution subsystem, and an optimization engine subsystem. The subsystems cooperate to guide the selection of a database cluster optimization strategy according to the current configuration information and current performance information of the database cluster, and the optimization strategy execution subsystem is controlled to adjust the configuration information of the database cluster. The advantages are as follows: reinforcement learning is used to optimize the configuration parameters of the database cluster automatically, which significantly improves its processing performance; when the load changes, the configuration can be re-optimized quickly and adaptively, greatly reducing labor and time costs.

Description

Database cluster optimization system and method based on reinforcement learning

Technical Field

The invention relates to the field of database cluster optimization, in particular to a database cluster optimization system and method based on reinforcement learning.

Background

We live in an information-driven world, and people rely on information systems for life, work, and study. Behind every information system, the final results are stored and processed in a database. Database systems are therefore critically important: if the database fails, the entire application system faces serious losses and consequences.

The term "big data" has become very popular, even though it is not always clear how the concept is put into practice. What is certain is that, with the rise of the Internet of Things and mobile applications, data volumes have grown exponentially compared with the past. To meet this challenge, multiple servers are commonly grouped into a cluster, so that the resources of each server are fully utilized and client load is distributed across servers; as application load grows, new servers only need to be added to the cluster.

Typically, a database cluster administrator tunes the configuration parameters of the database cluster based on its historical operation and its real-time state in order to improve its processing performance.

There is a delay between adjusting the configuration parameters of a database cluster and observing the resulting change in its processing performance. If several optimization actions are taken in succession, it is difficult to determine which action took effect or how much each action contributed to the result. Manual optimization is therefore prone to error, and factors such as the huge parameter search space, the continuity of the load, and the diversity of loads and hardware make traditional manual optimization very inefficient.

Disclosure of Invention

The invention aims to provide a database cluster optimization system and method based on reinforcement learning, so as to solve the problems in the prior art.

In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:

A database cluster optimization system based on reinforcement learning, the optimization system comprising:

A current configuration information acquisition subsystem, configured to receive a current configuration information acquisition command issued by the optimization engine subsystem, collect the current configuration information of the database cluster according to that command, and send the collected current configuration information to the optimization engine subsystem;

A current performance information acquisition subsystem, configured to receive a current performance information acquisition command issued by the optimization engine subsystem, collect the current performance information of the database cluster according to that command, and send the collected current performance information to the optimization engine subsystem;

An optimization strategy execution subsystem, configured to receive an optimization strategy issued by the optimization engine subsystem and adjust the configuration information of the database cluster according to that strategy; the optimization strategy comprises an optimization parameter and an optimization direction;

An optimization engine subsystem, configured to send a current configuration information acquisition command, a current performance information acquisition command, and an optimization strategy to the current configuration information acquisition subsystem, the current performance information acquisition subsystem, and the optimization strategy execution subsystem, respectively; to generate a database cluster optimization strategy from the acquired current configuration information and current performance information of the database cluster; and to control the optimization strategy execution subsystem to adjust the configuration information of the database cluster.

Preferably, the current configuration information acquisition subsystem comprises,

A first acquisition command receiving module, which monitors the network for the current configuration information acquisition command issued by the optimization engine subsystem and, on receiving it, calls the configuration information acquisition module to collect the current configuration information of the database cluster;

A configuration information acquisition module, which calls each configuration information sub-module to collect the corresponding configuration information of the database cluster according to the current configuration information acquisition command, and sends that information to the optimization engine subsystem;

A cache configuration information sub-module, which collects configuration information related to the database cache, including the query cache size, the cache size available to a single query, and the sort cache size;

An operation configuration information sub-module, which collects configuration information related to database operations, including the read operation buffer size, the temporary table size, the maximum heap table size, the index buffer size, the bulk insert buffer size, and the join operation queue size;

A network configuration information sub-module, which collects configuration information related to the database network, including the maximum size of a single packet in network transmission, the maximum number of database connections, and the maximum number of abnormal interruptions of database connection requests;

A system configuration information sub-module, which collects configuration information related to the database system, including the number of files allowed to be open, the number of database requests that can be queued in the stack in a short time, the number of threads kept in the cache, the number of concurrent threads, and the stack size of each thread.

Preferably, the current performance information acquisition subsystem includes,

A second acquisition command receiving module, which monitors the network for the current performance information acquisition command issued by the optimization engine subsystem and, on receiving it, calls the performance information acquisition module to collect the current performance information of the database cluster;

A performance information acquisition module, which calls each performance information sub-module to collect the corresponding performance information of the database cluster according to the current performance information acquisition command, and sends that information to the optimization engine subsystem;

A transaction and query information sub-module, which collects performance information related to database transactions and queries: it uses database management commands to obtain the average number of select, insert, update, and delete statements executed per second, calculates the number of transactions per second and the number of queries per second, and uses a database management command to obtain query operation response time statistics;

A thread performance information sub-module, which collects performance information related to database threads: it uses operating system management commands to obtain the number of threads currently in the active state and the number of threads currently connected;

A network traffic performance information sub-module, which collects performance information related to database network traffic: it uses network management commands to obtain the average number of bytes received from all clients per second and the average number of bytes sent to all clients per second.

Preferably, the optimization strategy execution subsystem comprises,

An optimization strategy receiving module, which monitors the network for the optimization strategy issued by the optimization engine subsystem;

An optimization strategy executing module, which receives the optimization strategy issued by the optimization engine subsystem, searches the configuration file for the configuration parameter corresponding to the optimization parameter, and adjusts that configuration parameter's value according to the optimization direction. The adjustment rules are as follows:

A1. When the optimization direction is + and the configuration parameter corresponding to the optimization parameter is a switch item, set its value to on;

A2. When the optimization direction is - and the configuration parameter corresponding to the optimization parameter is a switch item, set its value to off;

A3. When the optimization direction is + and the configuration parameter corresponding to the optimization parameter is an integer within 10, increase its value by 1;

A4. When the optimization direction is - and the configuration parameter corresponding to the optimization parameter is an integer within 10, decrease its value by 1;

A5. When the optimization direction is + and the configuration parameter corresponding to the optimization parameter is an integer within 256, increase its value by 8;

A6. When the optimization direction is - and the configuration parameter corresponding to the optimization parameter is an integer within 256, decrease its value by 8;

A7. When the optimization direction is + and the configuration parameter corresponding to the optimization parameter is an integer greater than 256, multiply its value by 2;

A8. When the optimization direction is - and the configuration parameter corresponding to the optimization parameter is an integer greater than 256, divide its value by 2.
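Rules A1 to A8 can be sketched as a single function. This is a minimal Python sketch under stated assumptions, not the patented implementation: `apply_direction` is a hypothetical helper name, "within 10" is read as "at most 10", an integer that is not within 10 but within 256 falls under A5/A6, and "divide by 2" is taken as integer division rounding down. The optional `lower`/`upper` bounds mirror the maximum/minimum constraint described in Example 1; their values are left to the caller.

```python
from typing import Optional, Union

Value = Union[bool, int]


def apply_direction(value: Value, direction: str,
                    lower: Optional[int] = None,
                    upper: Optional[int] = None) -> Value:
    """Return the new value of one configuration parameter under rules A1-A8."""
    if direction not in ("+", "-"):
        raise ValueError("direction must be '+' or '-'")
    up = direction == "+"
    # Check bool first: in Python, bool is a subclass of int.
    if isinstance(value, bool):          # A1/A2: switch item -> on/off
        return up
    if value <= 10:                      # A3/A4: integer within 10 (assumed <= 10)
        new = value + 1 if up else value - 1
    elif value <= 256:                   # A5/A6: integer within 256
        new = value + 8 if up else value - 8
    else:                                # A7/A8: integer greater than 256
        new = value * 2 if up else value // 2
    # Optional min/max constraint; the bounds are assumed to be caller-supplied.
    if lower is not None:
        new = max(new, lower)
    if upper is not None:
        new = min(new, upper)
    return new
```

With direction +, a value of 10 becomes 11, after which further + steps move it in increments of 8 until it passes 256 and starts doubling.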

Preferably, the optimization engine subsystem comprises an optimization strategy evaluation network consisting of an input layer, two hidden layers, and an output layer:

The input layer has 17 inputs, which take the current values of the 17 configuration information items of the database cluster;

The first hidden layer comprises 128 neurons and is computed as

$$O_1 = \mathrm{relu}(w_1 \cdot x + b_1)$$

where $x$ is the input of the optimization strategy evaluation network, $w_1$ is a weight matrix, $b_1$ is a bias, and $O_1$ is the 128-dimensional output vector of the first hidden layer;

The second hidden layer comprises 64 neurons and is computed as

$$O_2 = \mathrm{relu}(w_2 \cdot O_1 + b_2)$$

where $w_2$ is a weight matrix, $b_2$ is a bias, and $O_2$ is the 64-dimensional output vector of the second hidden layer;

The output layer comprises 34 neurons and is computed as

$$y = \mathrm{relu}(w_3 \cdot O_2 + b_3)$$

where $w_3$ is a weight matrix, $b_3$ is a bias, and $y$ is the output vector of the output layer. It has 34 outputs, each the evaluation value of one optimization strategy: there are 17 configuration information items and 2 optimization directions per item, giving 34 optimization strategies.
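The forward pass of the 17-128-64-34 network can be written in a few lines of NumPy. The sketch below is illustrative only: the `init_params`/`evaluate` helpers, the parameter dictionary keys, and the normal initialisation with scale 0.1 are assumptions, since the text only says that all weights and biases are initialised to random values (S1).

```python
import numpy as np

# Layer sizes stated in the text: 17 inputs, 128 and 64 hidden units, 34 outputs.
SIZES = (17, 128, 64, 34)


def relu(z: np.ndarray) -> np.ndarray:
    return np.maximum(z, 0.0)


def init_params(rng: np.random.Generator, scale: float = 0.1) -> dict:
    """S1: set every weight matrix w_i and bias b_i to random values."""
    params = {}
    for i, (n_in, n_out) in enumerate(zip(SIZES[:-1], SIZES[1:]), start=1):
        params[f"w{i}"] = rng.normal(0.0, scale, size=(n_out, n_in))
        params[f"b{i}"] = rng.normal(0.0, scale, size=n_out)
    return params


def evaluate(params: dict, x: np.ndarray) -> np.ndarray:
    """Map the 17 configuration values x to 34 strategy evaluation values y."""
    o1 = relu(params["w1"] @ x + params["b1"])     # O_1, 128-dimensional
    o2 = relu(params["w2"] @ o1 + params["b2"])    # O_2, 64-dimensional
    return relu(params["w3"] @ o2 + params["b3"])  # y, 34-dimensional
```

As written in the text, ReLU is applied at the output layer too, so every evaluation value is non-negative even though return values may be negative; the sketch keeps that choice. The text also does not say whether the 17 configuration values are normalised; raw values such as buffer sizes in bytes differ by orders of magnitude, so a practical implementation would probably scale them first.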

Another object of the invention is to provide a database cluster optimization method based on reinforcement learning, implemented using any of the optimization systems described above; the optimization method comprises the following steps:

S1. Initialization: initialize the optimization strategy evaluation network by setting all weight matrix parameters and bias parameters to random values;

S2. Learning: perform one reinforcement learning iteration every first preset time period until the learning process ends, yielding a trained optimization strategy evaluation network;

S3. Application: every first preset time period, use the trained optimization strategy evaluation network to perform one optimization adjustment of the database cluster parameters, until the database cluster stops running.

Preferably, step S2 specifically includes,

S21. Every first preset time period, the optimization engine subsystem commands the current configuration information acquisition subsystem to collect the current configuration information of the database cluster once, and obtains the current configuration information state S of the database cluster;

S22. Input the current configuration information state S into the optimization strategy evaluation network, compute the evaluation value vector V_s of the 34 optimization strategies through the neural network, and select the optimization strategy h_max with the largest evaluation value;

S23. The optimization engine subsystem selects one optimization strategy h from all optimization strategies according to the database cluster optimization strategy selection mechanism;

S24. The optimization engine subsystem sends the selected optimization strategy h to the optimization strategy execution subsystem, which executes h and thereby updates the configuration information state of the database cluster from S to S';

S25. After a delay of a second preset time period, the optimization engine subsystem commands the current performance information acquisition subsystem to collect the current performance information of the database cluster, and calculates the return value r of the optimization strategy h that moved the configuration information state from S to S';

S26. Input the updated configuration information state S' into the optimization strategy evaluation network, compute the evaluation values of the 34 optimization strategies through the neural network, and select the maximum evaluation value V_max among them;

S27. Use the classical Bellman equation of reinforcement learning to calculate the updated evaluation value h_val of the database cluster for configuration information state S and optimization strategy h, where h_val = r + V_max;

S28. Update the evaluation value vector: replace the entry corresponding to optimization strategy h in V_s with h_val to obtain the updated evaluation value vector V_s';

S29. Store the configuration information state S of the database cluster and the updated evaluation value vector V_s' in the replay pool as a training sample;

S210. Repeat steps S21 to S29 32 times, so that the replay pool holds 32 training samples;

S211. Train the optimization strategy evaluation network on these 32 training samples with a gradient descent neural network training algorithm, updating the parameters of the network;

S212. Repeat steps S21 to S211 until the error of the optimization strategy evaluation network falls below a preset threshold, which ends the reinforcement learning process.
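Steps S21 to S212 can be sketched end to end as follows. This is a toy illustration, not the patented implementation: `ToyCluster` is a hypothetical stand-in for the real cluster (17 parameters normalised to [0, 1], with a made-up performance score whose change serves as the return value r), strategy index k is assumed to adjust parameter k // 2 with direction + for even k and - for odd k, and the network is trained with plain full-batch gradient descent on mean squared error, which is one plausible reading of "a gradient descent neural network training algorithm". The stopping rule of S212 is approximated by a loss threshold plus a cap on the number of rounds.

```python
import numpy as np

N_PARAMS, N_ACTIONS, BATCH = 17, 34, 32


class TinyQNet:
    """17-128-64-34 ReLU network trained with plain gradient descent on MSE."""

    def __init__(self, rng, lr=1e-3):
        sizes = [N_PARAMS, 128, 64, N_ACTIONS]
        self.W = [rng.normal(0, 0.1, (o, i)) for i, o in zip(sizes[:-1], sizes[1:])]
        self.b = [np.zeros(o) for o in sizes[1:]]
        self.lr = lr

    def forward(self, X):
        """X has shape (n, 17); returns all layer activations, the last (n, 34)."""
        acts = [X]
        for W, b in zip(self.W, self.b):
            acts.append(np.maximum(acts[-1] @ W.T + b, 0.0))
        return acts

    def predict(self, x):
        return self.forward(x[None, :])[-1][0]

    def train_step(self, X, Y):
        """One gradient-descent step on mean squared error; returns the loss."""
        acts = self.forward(X)
        err = acts[-1] - Y
        loss = float(np.mean(err ** 2))
        grad = 2.0 * err / err.size
        for layer in reversed(range(len(self.W))):
            grad = grad * (acts[layer + 1] > 0)        # ReLU derivative
            g_w, g_b = grad.T @ acts[layer], grad.sum(axis=0)
            grad = grad @ self.W[layer]                # propagate before updating
            self.W[layer] -= self.lr * g_w
            self.b[layer] -= self.lr * g_b
        return loss


class ToyCluster:
    """Hypothetical cluster: the score peaks when every parameter equals 0.5."""

    def __init__(self, rng):
        self.state = rng.uniform(0, 1, N_PARAMS)

    def score(self, s):
        return -float(np.sum((s - 0.5) ** 2))

    def apply(self, action):
        """Apply strategy `action` and return the change in score as r."""
        before = self.score(self.state)
        step = 0.05 if action % 2 == 0 else -0.05
        self.state[action // 2] = np.clip(self.state[action // 2] + step, 0.0, 1.0)
        return self.score(self.state) - before


def train(max_rounds=200, threshold=1e-4, epsilon=0.01, seed=0):
    """S21-S212: fill a 32-sample replay pool, then fit the network, repeatedly."""
    rng = np.random.default_rng(seed)
    net, env = TinyQNet(rng), ToyCluster(rng)
    losses = []
    for _ in range(max_rounds):
        pool_s, pool_v = [], []                        # replay pool (S29)
        for _ in range(BATCH):                         # S210: 32 samples
            s = env.state.copy()                       # S21: state S
            v_s = net.predict(s).copy()                # S22: V_s
            if rng.random() < epsilon:                 # S23: epsilon-greedy choice
                h = int(rng.integers(N_ACTIONS))
            else:
                h = int(np.argmax(v_s))
            r = env.apply(h)                           # S24-S25: S -> S', return r
            v_max = float(net.predict(env.state).max())  # S26: V_max at S'
            v_s[h] = r + v_max                         # S27-S28: h_val = r + V_max
            pool_s.append(s)
            pool_v.append(v_s)
        loss = net.train_step(np.array(pool_s), np.array(pool_v))  # S211
        losses.append(loss)
        if loss < threshold:                           # S212: stop on small error
            break
    return net, losses
```

`net, losses = train()` runs the loop. As in S27, the target has no discount factor (h_val = r + V_max); standard Q-learning multiplies V_max by a discount factor below 1, and without it the targets can keep growing over many rounds, which is one reason the sketch caps the number of rounds.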

Preferably, the optimization strategy selection mechanism is specifically,

With probability ε, randomly select an optimization strategy h_rand from the 34 optimization strategies and use it as the optimization strategy h; otherwise, with probability 1-ε, select the optimization strategy h_max with the largest evaluation value in configuration information state S and use it as h.
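This selection mechanism is the standard ε-greedy rule. A minimal Python sketch, assuming the 34 evaluation values arrive as a plain sequence; `select_strategy` is a hypothetical name, and the default ε = 0.01 is the value suggested in Example 2.

```python
import random
from typing import Optional, Sequence


def select_strategy(evaluations: Sequence[float], epsilon: float = 0.01,
                    rng: Optional[random.Random] = None) -> int:
    """Epsilon-greedy choice over the optimization strategies.

    With probability epsilon return a uniformly random index (h_rand);
    otherwise return the index of the largest evaluation value (h_max).
    """
    rng = rng or random.Random()
    if rng.random() < epsilon:
        return rng.randrange(len(evaluations))
    return max(range(len(evaluations)), key=evaluations.__getitem__)
```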

Preferably, the return value r is calculated by the following steps,

B1. Calculate the difference Dtps between the number of transactions per second tps collected by the optimization engine subsystem after the optimization strategy is executed and the value tps' collected at the previous moment;

where Dtps = tps - tps';

B2. Calculate the difference Dqps between the number of queries per second qps collected by the optimization engine subsystem after the optimization strategy is executed and the value qps' collected at the previous moment;

where Dqps = qps - qps';

B3. Calculate the difference Dquery_response_time between the query operation response time query_response_time collected by the optimization engine subsystem after the optimization strategy is executed and the value query_response_time' collected at the previous moment;

where Dquery_response_time = query_response_time - query_response_time';

B4. Calculate the difference Dthreads_running between the number of threads in the active state threads_running collected by the optimization engine subsystem after the optimization strategy is executed and the value threads_running' collected at the previous moment;

where Dthreads_running = threads_running - threads_running';

B5. Calculate the difference Dthreads_connected between the number of currently connected threads threads_connected collected by the optimization engine subsystem after the optimization strategy is executed and the value threads_connected' collected at the previous moment;

where Dthreads_connected = threads_connected - threads_connected';

B6. Calculate the difference Dbytes_received_ps between the average number of bytes received from all clients per second, bytes_received_ps, collected by the optimization engine subsystem after the optimization strategy is executed and the value bytes_received_ps' collected at the previous moment;

where Dbytes_received_ps = bytes_received_ps - bytes_received_ps';

B7. Calculate the difference Dbytes_send_ps between the average number of bytes sent to all clients per second, bytes_send_ps, collected by the optimization engine subsystem after the optimization strategy is executed and the value bytes_send_ps' collected at the previous moment;

where Dbytes_send_ps = bytes_send_ps - bytes_send_ps';

B8. Calculate the return value r from the differences obtained in steps B1 to B7; the calculation formula is as follows,

where $\gamma_1$ and $\gamma_2$ are weights satisfying $\gamma_1 < \gamma_2$, and $\gamma_3$ is a weight.
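Steps B1 to B7 reduce to element-wise differences between two metric snapshots, one taken after the strategy is executed and one from the previous moment. Because the B8 formula itself does not appear here, the sketch below (hypothetical helper `metric_differences`) stops at the seven differences and does not attempt to compute r or guess how $\gamma_1$, $\gamma_2$ and $\gamma_3$ are applied.

```python
from typing import Dict

# Metric names as used in steps B1-B7.
METRICS = ("tps", "qps", "query_response_time", "threads_running",
           "threads_connected", "bytes_received_ps", "bytes_send_ps")


def metric_differences(current: Dict[str, float],
                       previous: Dict[str, float]) -> Dict[str, float]:
    """B1-B7: D<metric> = value after the strategy - value at the previous moment."""
    return {"D" + name: current[name] - previous[name] for name in METRICS}
```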

Preferably, step S3 comprises in particular,

S31. Every first preset time period, the optimization engine subsystem commands the current configuration information acquisition subsystem to collect the current configuration information of the database cluster once, and obtains the current configuration information state S of the database cluster;

S32. Input the current configuration information state S into the trained optimization strategy evaluation network, compute the evaluation value vector V_s of the 34 optimization strategies through the neural network, and select the optimization strategy h_max with the largest evaluation value;

S33. The optimization engine subsystem sends the optimization strategy h_max to the optimization strategy execution subsystem, which executes h_max and thereby updates the configuration information state of the database cluster from S to S';

S34. Repeat steps S31 to S33 until the database cluster stops working, at which point parameter optimization ends.
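The application stage is a purely greedy loop with no random exploration. A minimal sketch with the cluster interactions injected as callables; `run_application_stage` and all of its parameter names are hypothetical placeholders for the subsystems described above, not an API from the source.

```python
from typing import Callable, List, Sequence


def run_application_stage(read_state: Callable[[], Sequence[float]],
                          evaluate: Callable[[Sequence[float]], Sequence[float]],
                          execute: Callable[[int], None],
                          cluster_running: Callable[[], bool],
                          wait: Callable[[], None]) -> List[int]:
    """S31-S34: greedy optimization loop with a trained evaluation network."""
    executed = []
    while cluster_running():
        s = read_state()                     # S31: configuration state S
        v_s = evaluate(s)                    # S32: evaluation values V_s
        h_max = max(range(len(v_s)), key=v_s.__getitem__)
        execute(h_max)                       # S33: apply h_max, S -> S'
        executed.append(h_max)
        wait()                               # first preset period, e.g. half an hour
    return executed
```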

The beneficial effects of the invention are as follows: reinforcement learning is used to optimize the configuration parameters of the database cluster automatically, which significantly improves its processing performance; when the load changes, the configuration can be re-optimized quickly and adaptively, greatly reducing labor and time costs.

Drawings

FIG. 1 is a flow chart of an optimization method in an embodiment of the invention.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the detailed description is presented by way of example only and is not intended to limit the invention.

Example 1

In this embodiment, there is provided a reinforcement learning-based database cluster optimization system, the optimization system including,

In this embodiment, the current configuration information acquisition subsystem, the current performance information acquisition subsystem, and the optimization policy execution subsystem all operate on each server that constitutes the database cluster.

In this embodiment, the optimization engine subsystem performs one reinforcement learning iteration at fixed intervals (for example, every half hour), as follows: it first commands the current configuration information acquisition subsystem to collect the current configuration information of the database cluster; it computes the evaluation value vector V_s with the optimization strategy evaluation network, selects an optimization strategy h through the database cluster optimization strategy selection mechanism, and sends h to the optimization strategy execution subsystem; after a delay (for example, 5 minutes), it commands the current performance information acquisition subsystem to collect the current performance information; finally, it uses the Bellman equation to obtain the updated evaluation value vector V_s' from the current performance information, which completes one iteration. Through continuous iteration, the optimization engine subsystem keeps training the optimization strategy evaluation network, which in turn guides its selection of optimization strategies. See in particular the following steps S21 to S24.

S24. The optimization engine subsystem sends the selected optimization strategy h to the optimization strategy execution subsystem, which executes h.

In this embodiment, the current configuration information acquisition subsystem includes,

A cache configuration information sub-module, which collects configuration information related to the database cache, including the query cache size query_cache_size, the cache size available to a single query query_cache_limit, and the sort cache size sort_cache_size;

An operation configuration information sub-module, which collects configuration information related to database operations, including the read operation buffer size read_buffer_size, the temporary table size tmp_table_size, the maximum heap table size max_heap_table_size, the index buffer size key_buffer_size, the bulk insert buffer size bulk_insert_buffer_size, and the join operation queue size join_queue_size;

A network configuration information sub-module, which collects configuration information related to the database network, including the maximum size of a single packet in network transmission max_allowed_packet, the maximum number of database connections max_connections, and the maximum number of abnormal interruptions of database connection requests max_connection_error;

A system configuration information sub-module, which collects configuration information related to the database system, including the number of files allowed to be open open_files_limit, the number of database requests that can be queued in the stack in a short time back_log, the number of threads kept in the cache thread_cache_size, the number of concurrent threads thread_concurrency, and the stack size of each thread.
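Since the 34 network outputs correspond to 17 configuration items times 2 directions, an implementation needs a fixed ordering of items and a mapping from output index to (item, direction). The text specifies neither, so both below are assumptions: items are ordered as listed in this embodiment, even indices mean + and odd indices mean -, `max_heap_table_size` and `max_allowed_packet` are used for the heap-table and packet-size items, and `thread_stack` is a hypothetical identifier for the per-thread stack size, for which no name is given.

```python
from typing import Tuple

# Assumed ordering of the 17 configuration items (cache, operation, network, system).
CONFIG_ITEMS = (
    "query_cache_size", "query_cache_limit", "sort_cache_size",
    "read_buffer_size", "tmp_table_size", "max_heap_table_size",
    "key_buffer_size", "bulk_insert_buffer_size", "join_queue_size",
    "max_allowed_packet", "max_connections", "max_connection_error",
    "open_files_limit", "back_log", "thread_cache_size",
    "thread_concurrency", "thread_stack",  # thread_stack: hypothetical name
)


def strategy_to_action(index: int) -> Tuple[str, str]:
    """Map network output index 0..33 to (configuration item, direction)."""
    if not 0 <= index < 2 * len(CONFIG_ITEMS):
        raise ValueError("index out of range")
    item, direction = divmod(index, 2)
    return CONFIG_ITEMS[item], "+" if direction == 0 else "-"
```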

In this embodiment, the current performance information acquisition subsystem includes,

A transaction and query information sub-module, which collects performance information related to database transactions and queries: it uses database management commands to obtain the average number of select, insert, update, and delete statements executed per second (com_select_ps, com_insert_ps, com_update_ps, and com_delete_ps), calculates the number of transactions per second tps and the number of queries per second qps, and uses a database management command to obtain the query response time; here tps = com_insert_ps + com_update_ps + com_delete_ps and qps = com_select_ps + com_insert_ps + com_update_ps + com_delete_ps;
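If the cluster runs a MySQL-compatible server (the document does not name the DBMS, although its configuration identifiers resemble MySQL system variables), `SHOW GLOBAL STATUS` exposes cumulative counters such as `Com_select`, `Com_insert`, `Com_update`, `Com_delete`, `Bytes_received` and `Bytes_sent`. Per-second averages then come from two snapshots taken a known interval apart. The sketch below assumes the snapshots have already been fetched into dictionaries; `per_second_rates` is a hypothetical helper that applies the tps and qps formulas given above.

```python
from typing import Dict

# Cumulative counters assumed to be read from a MySQL-style status listing.
COUNTERS = ("Com_select", "Com_insert", "Com_update", "Com_delete",
            "Bytes_received", "Bytes_sent")


def per_second_rates(before: Dict[str, int], after: Dict[str, int],
                     interval_s: float) -> Dict[str, float]:
    """Turn two snapshots of cumulative counters into per-second averages."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    ps = {name: (after[name] - before[name]) / interval_s for name in COUNTERS}
    tps = ps["Com_insert"] + ps["Com_update"] + ps["Com_delete"]
    qps = ps["Com_select"] + tps          # select + insert + update + delete
    return {"tps": tps, "qps": qps,
            "bytes_received_ps": ps["Bytes_received"],
            "bytes_send_ps": ps["Bytes_sent"]}
```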

A thread performance information sub-module, which collects performance information related to database threads: it uses operating system management commands to obtain the number of threads currently in the active state (threads_running) and the number of threads currently connected (threads_connected);

A network traffic performance information sub-module, which collects performance information related to database network traffic: it uses network management commands to obtain the average number of bytes received from all clients per second, bytes_received_ps, and the average number of bytes sent to all clients per second, bytes_send_ps.

In this embodiment, the optimization strategy execution subsystem includes,

In this embodiment, each configuration parameter has constraint conditions of a maximum value and a minimum value, and when the optimized configuration parameter value is greater than the maximum value or less than the minimum value, the corresponding configuration parameter is set to be the maximum value or the minimum value, so as to ensure the normal operation of the system.

In this embodiment, the optimization engine subsystem includes an optimization strategy evaluation network consisting of an input layer, two hidden layers, and an output layer:

The first hidden layer, comprising 128 neurons, has the calculation formula

$$O_1 = \mathrm{relu}(w_1 \cdot x + b_1)$$

The second hidden layer, comprising 64 neurons, has the calculation formula

$$O_2 = \mathrm{relu}(w_2 \cdot O_1 + b_2)$$

The output layer, comprising 34 neurons, has the calculation formula

$$y = \mathrm{relu}(w_3 \cdot O_2 + b_3)$$

Example 2

In this embodiment, a database cluster optimization method based on reinforcement learning is provided, where the optimization method is implemented using the optimization system described above; the optimization method comprises the following steps,

In this embodiment, the step S2 specifically includes the following,

S21. Every first preset time period, the optimization engine subsystem commands the current configuration information acquisition subsystem to collect the current configuration information of the database cluster once, and obtains the current configuration information state S of the database cluster; the first preset time period can be set according to the specific situation, for example half an hour;

S25. After a delay of a second preset time period, the optimization engine subsystem commands the current performance information acquisition subsystem to collect the current performance information of the database cluster, and calculates the return value r of the optimization strategy h that moved the configuration information state from S to S'; the second preset time period can be set according to the specific situation, for example 5 minutes;

In this embodiment, the optimization strategy selection mechanism is as follows,

with probability ε, an optimization strategy h_rand is selected at random from the 34 optimization strategies and taken as the optimization strategy h; otherwise, with probability 1-ε, the optimization strategy h_max with the maximum evaluation value in configuration information state S is taken as the optimization strategy h. ε is small and is generally set to 0.01.
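This is the standard epsilon-greedy rule. A minimal sketch, assuming the 34 evaluation values for state S are already available as a sequence (for example, the network output); the function name `select_strategy` is hypothetical.

```python
import random
from typing import Optional, Sequence

EPSILON = 0.01  # value suggested in the text


def select_strategy(
    evaluations: Sequence[float],
    epsilon: float = EPSILON,
    rng: Optional[random.Random] = None,
) -> int:
    """Return the index h of the chosen optimization strategy (epsilon-greedy)."""
    rng = rng if rng is not None else random.Random()
    if rng.random() < epsilon:
        return rng.randrange(len(evaluations))  # exploration: h_rand
    # Exploitation: h_max, the strategy with the largest evaluation value.
    return max(range(len(evaluations)), key=evaluations.__getitem__)
```

With ε = 0.01, roughly one selection in a hundred explores at random; the rest follow the current evaluation network.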

In this embodiment, the setting of the return value r is the most critical point of reinforcement learning: the model is trained on r, so how well r is defined often determines whether reinforcement learning can be applied successfully at all. In addition, the load on the database cluster changes continuously. If r were simply defined as the difference between the current cluster performance and the cluster performance at the previous moment, a drastic load change would produce a correspondingly large change in r. The optimization engine subsystem could then not tell whether a given r was caused by the load change or by the database optimization, and reinforcement learning would fail to converge.

In the present invention, therefore, the return value r is calculated as follows,

where dtps = tps - tps';

where dqps = qps - qps';

where dquery_response_time = query_response_time - query_response_time';

where dthreads_running = threads_running - threads_running';

where dthreads_connected = threads_connected - threads_connected';

where dbytes_received_ps = bytes_received_ps - bytes_received_ps';

where dbytes_send_ps = bytes_send_ps - bytes_send_ps';
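The seven difference terms can be computed from two performance snapshots, one unprimed and one primed. The overall formula that combines them into r is not reproduced in this section, so this sketch stops at the differences; the function name `metric_deltas` and the dictionary layout are assumptions.

```python
from typing import Dict

# Metric names as they appear in the definitions above.
METRICS = (
    "tps",
    "qps",
    "query_response_time",
    "threads_running",
    "threads_connected",
    "bytes_received_ps",
    "bytes_send_ps",
)


def metric_deltas(current: Dict[str, float], primed: Dict[str, float]) -> Dict[str, float]:
    """Return d<metric> = metric - metric' for every metric listed in the text."""
    return {f"d{name}": current[name] - primed[name] for name in METRICS}
```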

Transaction and query performance information are fine-grained indicators that reflect the database cluster from the perspective of database operations, whereas thread performance information reflects the database cluster as a whole from the perspective of threads. The two groups are therefore given different weights $\gamma_1$ and $\gamma_2$ satisfying $\gamma_1 < \gamma_2$, which increases the weight of the thread performance information. The specific values of $\gamma_1$ and $\gamma_2$ can be chosen according to the actual situation, provided that $\gamma_1 < \gamma_2$ holds.

Network traffic performance information accurately reflects changes in user load, so a performance improvement that coincides with a change in network traffic is not necessarily caused by the database optimization. Division is therefore used: the greater the network traffic, the greater the user load, and the smaller the share of the performance improvement that counts toward the return value r. To keep r from becoming too small because of an excessively large denominator, a weight $\gamma_3$ can be applied; its specific value can be chosen according to the actual situation, for example 0.001.

In this embodiment, step S3 specifically includes the following,

Example III

In this embodiment, a concrete example illustrates how the optimization system and optimization method of the present invention are implemented.

Setup: a MySQL database cluster consisting of 5 MySQL database servers; 10 database clients, which send database operation requests to the database cluster (these requests constitute the database load); and one database optimization server.

Each of the 5 MySQL database servers runs a current configuration information acquisition subsystem, a current performance information acquisition subsystem and an optimization strategy execution subsystem. The database optimization server runs the optimization engine subsystem.

The specific implementation process comprises the following steps:

1) The 10 database clients randomly generate database operation requests, and each request is sent to a randomly selected MySQL database server.

2) The cluster is tested 3 times with the default MySQL parameter configuration, and the average of the performance information is taken as the baseline for comparison.

3) The reinforcement learning mechanism is started, the optimization strategy evaluation network in the optimization engine subsystem is trained for a fixed period (24 hours), and the trained network is saved for later use.

4) The trained optimization strategy evaluation network is used to generate a MySQL parameter configuration, which is tested 3 times (each test lasting about 10 h), and the average of the performance information is taken as the comparison object; the parameter configuration is shown in the following table.
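The comparison in steps 2) and 4) averages three runs per metric and sets the tuned result against the default-configuration baseline. A minimal sketch, assuming each run is stored as a dictionary of metric values; the helper names are hypothetical, and `relative_change` assumes no baseline value is zero.

```python
from statistics import mean
from typing import Dict, List


def average_runs(runs: List[Dict[str, float]]) -> Dict[str, float]:
    """Average each metric over repeated test runs (3 runs in the example)."""
    if not runs:
        raise ValueError("at least one run is required")
    return {k: mean(run[k] for run in runs) for k in runs[0]}


def relative_change(tuned: Dict[str, float], baseline: Dict[str, float]) -> Dict[str, float]:
    """(tuned - baseline) / baseline for every metric; baseline values must be non-zero."""
    return {k: (tuned[k] - baseline[k]) / baseline[k] for k in baseline}
```

Whether a positive change is an improvement depends on the metric: for a response time, a negative change is the desirable direction.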

As the table shows, once the reinforcement learning mechanism is started, the optimization strategy evaluation network obtained after a period of learning finds efficient database cluster optimization strategies more accurately, which yields better database cluster performance.

The transaction and query performance information shows that the database cluster handles more database operations under the reinforcement-learning parameter configuration.

The thread performance information shows that the database cluster makes full use of more threads to perform database operations under the reinforcement-learning parameter configuration.

The network traffic performance information shows that, at the same volume of database requests, the database cluster under the reinforcement-learning parameter configuration responds to requests more quickly, without delayed or failed responses.

By adopting the technical scheme disclosed by the invention, the following beneficial effects are obtained:

the invention provides a database cluster optimization system and a method based on reinforcement learning, wherein the reinforcement learning method is used for automatically optimizing the configuration parameters of the database cluster, so that the processing performance of the database cluster is remarkably improved; and when the load changes, the dynamic adaptability optimization adjustment can be realized rapidly, and the labor cost and the time cost are reduced greatly.

The foregoing is merely a preferred embodiment of the present invention. It should be noted that those skilled in the art may make modifications and adaptations without departing from the principles of the present invention, and such modifications and adaptations are also intended to be covered by the present invention.

Claims

1. A database cluster optimization method based on reinforcement learning, characterized in that the optimization method comprises the following steps,

S1, initialization: initialize the optimization strategy evaluation network, i.e., set all weight matrix parameters and bias parameters to random values;

S3, application stage: at every first preset interval, use the trained optimization strategy evaluation network to perform one optimization adjustment of the database cluster parameters, until the database cluster stops running;

step S2 specifically includes the following,

S21, at every first preset interval, the optimization engine subsystem commands the current configuration information acquisition subsystem to collect the current configuration information of the database cluster once, and obtains the current configuration information state S of the database cluster;

S27, using the classical reinforcement learning Bellman equation, calculate the updated evaluation value h_val of the database cluster for configuration information state S and optimization strategy h, where h_val = r + V_max;
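Step S27 can be sketched as building a training target for state S in the usual Q-learning style: copy the network's current evaluations for S and overwrite only entry h with h_val. Two points are assumptions, since the step that defines V_max is not reproduced here: V_max is taken to be the largest evaluation value the network gives for the new state S', and the target-vector construction itself is a common convention rather than something the text states. As in the text, no discount factor is applied.

```python
import numpy as np


def bellman_target(
    evals_s: np.ndarray, evals_s_next: np.ndarray, h: int, r: float
) -> np.ndarray:
    """Training target for state S: current evaluations with entry h set to h_val."""
    v_max = float(np.max(evals_s_next))  # assumed: V_max = best evaluation in state S'
    h_val = r + v_max                    # S27: h_val = r + V_max (no discount factor)
    target = np.array(evals_s, dtype=float, copy=True)
    target[h] = h_val
    return target
```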

2. The reinforcement learning-based database cluster optimization method of claim 1, wherein: the optimization policy selection mechanism is specifically that,

3. The reinforcement learning-based database cluster optimization method of claim 1, wherein: the calculation process of the return value r is that,

where dtps = tps - tps';

where dqps = qps - qps';

where dquery_response_time = query_response_time - query_response_time';

where dthreads_running = threads_running - threads_running';

where dthreads_connected = threads_connected - threads_connected';

where dbytes_received_ps = bytes_received_ps - bytes_received_ps';

where dbytes_send_ps = bytes_send_ps - bytes_send_ps';

4. The reinforcement learning-based database cluster optimization method of claim 1, wherein: step S3 specifically includes the following,

5. A reinforcement learning based database cluster optimization system for implementing the optimization method of any one of claims 1 to 4, characterized in that the optimization system comprises,

6. The reinforcement learning based database cluster optimization system of claim 5, wherein: the current configuration information acquisition subsystem includes,

7. The reinforcement learning based database cluster optimization system of claim 5, wherein: the current performance information acquisition subsystem includes,

8. The reinforcement learning based database cluster optimization system of claim 5, wherein: the optimization strategy execution subsystem includes,

9. The reinforcement learning based database cluster optimization system of claim 5, wherein: the optimization engine subsystem includes an optimization policy evaluation network including an input layer, two hidden layers, and an output layer,

the first hidden layer, comprising 128 neurons, is computed as

$$O_1 = \mathrm{relu}(w_1 \cdot x + b_1)$$

the second hidden layer, comprising 64 neurons, is computed as

$$O_2 = \mathrm{relu}(w_2 \cdot O_1 + b_2)$$

and the output layer, comprising 34 neurons, is computed as

$$y = \mathrm{relu}(w_3 \cdot O_2 + b_3)$$

where $w_3$ is the weight matrix; $b_3$ is the bias; y is the output vector of the output layer, which contains 34 outputs, each being the evaluation value of one optimization strategy. Since there are 17 configuration parameters and 2 optimization directions per parameter, there are 34 optimization strategies.
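The count of 34 strategies follows from pairing each of the 17 configuration parameters with two directions. One way to map an output index h back to a parameter and a direction is sketched below; the interleaved layout (even index means increase, odd index means decrease) and the name `decode_strategy` are assumptions, since the text does not specify how the outputs are ordered.

```python
from typing import Tuple

N_PARAMS = 17
N_STRATEGIES = 2 * N_PARAMS  # 34 optimization strategies


def decode_strategy(h: int) -> Tuple[int, int]:
    """Map strategy index h to (parameter index, direction).

    Assumed layout: h = 2 * param + k, where k = 0 means increase (+1)
    and k = 1 means decrease (-1).
    """
    if not 0 <= h < N_STRATEGIES:
        raise ValueError(f"strategy index must be in [0, {N_STRATEGIES})")
    param, k = divmod(h, 2)
    return param, (1 if k == 0 else -1)
```

Under this layout, `decode_strategy(33)` selects the last parameter (index 16) and the decrease direction.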