CN117633109A

CN117633109A - Method, device, equipment and medium for detecting and optimizing data blocks in database cluster

Info

Publication number: CN117633109A
Application number: CN202311630942.3A
Authority: CN
Inventors: 陈婧; 张磊; 孙蕾; 田红策; 黄宇昕
Original assignee: Agricultural Bank of China
Current assignee: Agricultural Bank of China
Priority date: 2023-11-30
Filing date: 2023-11-30
Publication date: 2024-03-01

Abstract

The embodiment of the invention discloses a method, a device, equipment and a medium for detecting and optimizing data blocks in a database cluster. The method comprises the following steps: responding to a detection request of a data block in a database cluster, and acquiring performance index information of the data block; analyzing and processing the performance index information to obtain early warning information of the data block; generating a target tuning task list according to the data block early warning information; executing the target tuning task list according to the set time; updating the target tuning task list based on the execution state of each tuning task in the target tuning task list. With this technical scheme, the performance index information of the data blocks of the database is detected in real time, tuning tasks are generated automatically for the data blocks flagged by the analysis processing, and the tuning tasks are executed at the configured execution time. This avoids the impact on database reads and writes that arises when a database tuning task is triggered automatically, and optimizes the performance of the database cluster.

Description

Method, device, equipment and medium for detecting and optimizing data blocks in database cluster

Technical Field

The present invention relates to the field of big data technologies, and in particular, to a method, an apparatus, a device, and a medium for detecting and optimizing a data block in a database cluster.

Background

With the advent of the big data age, reliably storing and efficiently accessing massive data has become a major challenge for many industries. HBase, one of the distributed databases most commonly used for unstructured data, offers easy scalability, efficient access, and high reliability.

An HBase database cluster has one master node (Master) and multiple slave nodes (Region Servers). Each Region Server hosts multiple data blocks (Regions), each Region contains multiple Stores, and each Store has one MemStore and multiple StoreFiles; a StoreFile is persisted as an HFile.
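
The containment hierarchy described above can be pictured with a few plain data classes. This is only an illustrative sketch: the class and field names (`hfile_path`, `memstore_bytes`, `hfile_count`) and the sample paths are assumptions made for the example and are not HBase APIs.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StoreFile:
    hfile_path: str   # hypothetical location of the backing HFile
    size_bytes: int


@dataclass
class Store:
    memstore_bytes: int = 0                      # in-memory write cache usage
    store_files: List[StoreFile] = field(default_factory=list)


@dataclass
class Region:
    name: str
    stores: List[Store] = field(default_factory=list)

    def hfile_count(self) -> int:
        # Each StoreFile is persisted as one HFile.
        return sum(len(store.store_files) for store in self.stores)


@dataclass
class RegionServer:
    host: str
    regions: List[Region] = field(default_factory=list)


# One Region Server hosting one Region with two Stores.
region = Region(
    "r1",
    [
        Store(store_files=[StoreFile("/hypothetical/a", 10),
                           StoreFile("/hypothetical/b", 20)]),
        Store(store_files=[StoreFile("/hypothetical/c", 5)]),
    ],
)
server = RegionServer("rs-1", [region])
```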

A Region tuning task is usually triggered automatically when a certain condition is met, and HBase then executes the tuning task automatically. With the existing strategy of triggering Region processing automatically when a condition is met, the execution time of the Region management task cannot be controlled and the processing time is effectively random. Server resources are occupied while the Region management task runs, which can degrade the performance of normal HBase read and write services.

Disclosure of Invention

The invention provides a method, a device, equipment and a medium for detecting and optimizing data blocks in a database cluster. The performance index information of the data blocks of the database is detected in real time, tuning tasks are generated automatically for the data blocks flagged by the analysis processing, and the tuning tasks are executed at the configured execution time, so that the impact on database reads and writes that arises when a database tuning task is triggered automatically is avoided and the performance of the database cluster is optimized.

According to an aspect of the present invention, there is provided a method for detecting and optimizing data blocks in a database cluster, including:

responding to a detection request of a data block in a database cluster, and acquiring performance index information of the data block;

analyzing and processing the performance index information to obtain early warning information of the data block;

generating a target tuning task list according to the data block early warning information;

executing the target tuning task list according to the set time;

updating the target tuning task list based on the execution state of each tuning task in the target tuning task list; wherein the execution state includes executed and unexecuted.

Optionally, the performance index information of the data block includes the number information of the bottom files in the data block, the capacity information of the data block and the hot spot information of the data block.

Optionally, analyzing the performance index information to obtain early warning information of the data block, including:

judging whether the quantity information of the bottom files in the data block is larger than a first threshold value or not;

under the condition that the number information of the bottom files in the data block is larger than the first threshold value, the number information of the bottom files in the data block is used as early warning information of the data block;

judging whether the capacity information of the data block is larger than a second threshold or smaller than a third threshold under the condition that the quantity information of the bottom files in the data block is smaller than or equal to the first threshold;

taking the data block capacity information as early warning information of a data block under the condition that the data block capacity information is larger than the second threshold value or smaller than the third threshold value;

judging whether the data block hot spot information is larger than a fourth threshold value or not under the condition that the data block capacity information is smaller than or equal to the second threshold value or larger than or equal to the third threshold value;

and under the condition that the data block hot spot information is larger than the fourth threshold value, taking the data block hot spot information as early warning information of the data block.

Optionally, generating the target tuning task list according to the data block early warning information includes:

acquiring a white list of the data block;

judging whether each data block corresponding to the data block early warning information is in a white list or not;

if the data block corresponding to the data block early warning information is not in the white list, determining a tuning task type based on the data block early warning information so as to obtain an original tuning task list;

and adjusting the original tuning task list based on the set priority to obtain a target tuning task list.

Optionally, the tuning task type includes a compression task, a segmentation task, a merging task and a migration task;

determining the type of the tuning task based on the data block early warning information comprises the following steps:

generating a data block compression task for the data block based on the early warning information of the bottom file quantity information in the data block;

based on the early warning information of the data block capacity information, if the data block capacity information is larger than a second threshold value, generating a segmentation task for the data block; if the capacity information of the data block is smaller than a third threshold value, generating a merging task for the data block;

and generating a data block migration task for the data block based on the early warning information of the data block hot spot information.

Optionally, adjusting the original tuning task list based on the set priority to obtain a target tuning task list, including:

acquiring a first priority order corresponding to the task type;

adjusting the original tuning task list according to the first priority order to obtain a first tuning task list;

and adjusting the tuning tasks with the same task types in the first tuning task list according to a second priority order to obtain a target tuning task list.
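
The two-level ordering above can be sketched as a single sort on a composite key. The passage above does not state the concrete priorities, so both are assumptions here: the type order in `TYPE_PRIORITY` and the use of a hypothetical `severity` value (how far the metric is from its threshold) as the second priority within one task type.

```python
from dataclasses import dataclass
from typing import List

# Assumed first priority order by task type (lower value runs first).
TYPE_PRIORITY = {"compact": 0, "split": 1, "merge": 2, "migrate": 3}


@dataclass(frozen=True)
class TuningTask:
    region: str
    task_type: str
    severity: float  # hypothetical second-priority key


def prioritize(original: List[TuningTask]) -> List[TuningTask]:
    # Primary key: task type priority. Secondary key: higher severity first
    # among tasks of the same type.
    return sorted(
        original,
        key=lambda task: (TYPE_PRIORITY[task.task_type], -task.severity),
    )


original_list = [
    TuningTask("r1", "migrate", 5.0),
    TuningTask("r2", "compact", 1.0),
    TuningTask("r3", "compact", 9.0),
    TuningTask("r4", "split", 2.0),
]
target_list = prioritize(original_list)
```

Sorting once on a tuple gives the same result as ordering by type first and then reordering within each type, and it leaves the original list untouched.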

Optionally, updating the target tuning task list based on the execution state of each tuning task in the target tuning task list includes:

and removing, from the target tuning task list, the tuning tasks whose execution state is executed, so as to update the target tuning task list.

According to another aspect of the present invention, there is provided a data block detection tuning apparatus in a database cluster, including:

the information acquisition module is used for responding to a detection request of a data block in the database cluster and acquiring performance index information of the data block;

the information analysis processing module is used for analyzing and processing the performance index information to obtain early warning information of the data block;

the tuning task generating module is used for generating a target tuning task list according to the data block early warning information;

the tuning task execution module is used for executing the target tuning task list according to the set time;

the list updating module is used for updating the target tuning task list based on the execution state of each tuning task in the target tuning task list; wherein the execution state includes executed and unexecuted.

According to another aspect of the present invention, there is provided an electronic apparatus including:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the method for detecting and optimizing data blocks in a database cluster according to any one of the embodiments of the present invention.

According to another aspect of the present invention, there is provided a computer readable storage medium storing computer instructions for causing a processor to implement a method for detecting and optimizing data blocks in a database cluster according to any one of the embodiments of the present invention when the computer instructions are executed.

According to the technical scheme, performance index information of the data blocks is obtained by responding to detection requests of the data blocks in the database cluster; the performance index information is analyzed and processed to obtain early warning information of the data block; a target tuning task list is generated according to the data block early warning information; the target tuning task list is executed according to the set time; and the target tuning task list is updated based on the execution state of each tuning task in the target tuning task list. With this technical scheme, the performance index information of the data blocks of the database is detected in real time, tuning tasks are generated automatically for the data blocks flagged by the analysis processing, and the tuning tasks are executed at the configured execution time. This avoids the impact on database reads and writes that arises when a database tuning task is triggered automatically, and optimizes the performance of the database cluster.

It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention or to delineate the scope of the invention. Other features of the present invention will become apparent from the description that follows.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flowchart of a method for detecting and optimizing data blocks in a database cluster according to a first embodiment of the present invention;

FIG. 2 is a flowchart of a method for detecting and optimizing data blocks in a database cluster according to a second embodiment of the present invention;

FIG. 3 is a diagram illustrating an overall functional framework provided in accordance with a second embodiment of the present invention;

FIG. 4 is a schematic structural diagram of a data block detecting and optimizing device in a database cluster according to a third embodiment of the present invention;

FIG. 5 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present invention.

Detailed Description

In order that those skilled in the art will better understand the present invention, a technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.

It should be noted that the terms "first," "second," "target," and "original," etc. in the description and claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

Example 1

FIG. 1 is a flowchart of a method for detecting and optimizing data blocks in a database cluster according to a first embodiment of the present invention. The method may be performed by a device for detecting and optimizing data blocks in a database cluster; the device may be implemented in hardware and/or software and may be configured in an electronic device with data processing capability. As shown in FIG. 1, the method includes:

This embodiment may be performed by a detection and tuning system. In this embodiment, the data blocks (Regions) in an HBase database cluster may be automatically detected and tuned to adjust the capacity and number of Regions in the HBase cluster and the number of HFiles in each Region, thereby reducing the read and write pressure on the cluster's memory and disks and achieving load balancing within the cluster. Further, the detection and tuning system of this embodiment may include a detection unit, an analysis unit, an execution unit, a configuration and scheduling unit, and the like.

S110, responding to a detection request of a data block in a database cluster, and acquiring performance index information of the data block;

The database cluster is a system composed of at least two database servers. The database cluster in this embodiment may be an HBase cluster. HBase is a column-oriented NoSQL database whose underlying data is stored in Key-Value format and may be stored on HDFS; it offers fast data access, good scalability, and support for high concurrency. The data block may be a Region in the HBase cluster. A data block Region stores a contiguous segment of the data in a table; typically a table is divided into a plurality of Regions of approximately equal size.

In this embodiment, the data block is obtained by dividing data according to a preset data length. The detection request is a request to detect a data block. It may be preset, triggered in real time, or triggered on a timer, and can be configured according to actual requirements. In this embodiment, the detection unit may respond in real time to the detection request for the data blocks in the database cluster to obtain the performance index information of the data blocks.

In this embodiment, optionally, the performance index information of the data block includes the number information of the bottom files in the data block, the capacity information of the data block, and the hot spot information of the data block.

The bottom file, that is, the underlying file, may be an HFile in the HBase cluster; HFile is the underlying storage file of HBase. The Store in this embodiment is used for storing data and may be divided into a MemStore and StoreFiles. The MemStore serves as an in-memory write cache with a default size of 64 MB; when the MemStore exceeds a threshold, its data is flushed to a StoreFile for persistent storage. A StoreFile is implemented as an HFile in the underlying file system. The hot spot information of a data block can be understood as access frequency information of the data block. The performance index information in this embodiment may include the number of underlying files (HFiles) in the data block Region, the capacity information of the data block Region, and the hot spot information of the data block Region; in addition, information on the number of data block Regions may be included.

With this arrangement, detecting the index information that affects the performance of the HBase database cluster makes it convenient to tune the performance of the cluster.
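
As a rough illustration, the performance index information gathered for each Region could be carried in a small record. The field names, the units (MB, request count) and the raw row layout are assumptions for the sketch; the text above does not prescribe a data format, and the detection unit's actual source of the numbers is not shown.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class RegionMetrics:
    region_name: str
    hfile_count: int      # number of bottom files (HFiles) in the Region
    size_mb: float        # capacity information of the Region
    request_count: int    # hot spot information (access frequency)


def collect_metrics(raw_rows: List[Dict[str, str]]) -> List[RegionMetrics]:
    """Convert raw rows from a hypothetical detection unit into typed records."""
    return [
        RegionMetrics(
            region_name=row["region"],
            hfile_count=int(row["hfiles"]),
            size_mb=float(row["size_mb"]),
            request_count=int(row["requests"]),
        )
        for row in raw_rows
    ]


metrics = collect_metrics(
    [{"region": "r1", "hfiles": "12", "size_mb": "2048.5", "requests": "300"}]
)
```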

S120, analyzing and processing the performance index information to obtain early warning information of the data block.

The analysis processing may be an operation that evaluates the performance index information against set thresholds. The early warning information can be understood as the performance index information that, after analysis and processing, is found to fall outside a set threshold.

Specifically, in this embodiment, the detection thresholds may be set in advance by the configuration and scheduling unit. When the analysis unit receives the performance index information of the data block Regions in the HBase cluster from the detection unit, it analyzes the performance index information against the set thresholds to judge whether tuning is needed, and takes the performance index information of each data block to be tuned as the early warning information of that data block.

In this embodiment, optionally, the analyzing the performance index information to obtain the early warning information of the data block includes: judging whether the quantity information of the bottom files in the data block is larger than a first threshold value or not; under the condition that the number information of the bottom files in the data block is larger than a first threshold value, the number information of the bottom files in the data block is used as early warning information of the data block; judging whether the capacity information of the data block is larger than a second threshold or smaller than a third threshold under the condition that the quantity information of the bottom files in the data block is smaller than or equal to the first threshold; taking the data block capacity information as early warning information of the data block under the condition that the data block capacity information is larger than a second threshold value or smaller than a third threshold value; judging whether the data block hot spot information is larger than a fourth threshold value or not under the condition that the data block capacity information is smaller than or equal to a second threshold value or larger than or equal to a third threshold value; and under the condition that the data block hot spot information is larger than a fourth threshold value, taking the data block hot spot information as early warning information of the data block.

The first threshold may be a number threshold corresponding to the number information of the bottom files, and may be configured according to the actual scenario. The second threshold and the third threshold may be capacity thresholds corresponding to the capacity information of the data block, and may be configured according to actual requirements, for example according to the actual capacity of the system. The second threshold may be a maximum capacity threshold and the third threshold may be a minimum capacity threshold. In this embodiment, the currently used capacity of all the data blocks in the database cluster may be acquired and compared with the set capacity thresholds. The fourth threshold may be a hotspot threshold corresponding to the hot spot information of the data block, and may be configured according to actual requirements; in this embodiment, the current number of access requests for all the data block Regions in the database cluster may be obtained and compared with the set hotspot threshold.

The process of analyzing the performance index information of the data block in this embodiment may be a process of judging whether each item of performance index information exceeds its set threshold. First, it may be judged whether the number information of the bottom files in the data block is greater than the preset first threshold; if so, the number information of the bottom files in the data block is used as early warning information of the data block. If the number information of the bottom files is less than or equal to the first threshold, it is judged whether the data block capacity information is greater than the second threshold or less than the third threshold; if so, the data block capacity information is used as early warning information of the data block. If the data block capacity information is less than or equal to the second threshold or greater than or equal to the third threshold, it may further be judged whether the data block hot spot information is greater than the set fourth threshold; if so, the data block hot spot information is used as early warning information of the data block. If the data block hot spot information is less than or equal to the fourth threshold, it is not used as early warning information of the data block, and the analysis processing may end.

With this arrangement, the performance index information of the data block Region can be analyzed to determine the early warning information that currently affects the performance of the Region, so that the affected Region can be tuned accordingly.
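
The threshold cascade can be sketched as an ordered chain of checks that stops at the first hit. The names (`Thresholds`, `EarlyWarning`, `analyze`) and the sample threshold values are hypothetical. The sketch reads the third branch as "capacity lies within the range from the third to the second threshold", which is what remains once the capacity check has not fired.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Thresholds:
    max_hfiles: int      # first threshold
    max_size_mb: float   # second threshold (maximum capacity)
    min_size_mb: float   # third threshold (minimum capacity)
    max_requests: int    # fourth threshold (hot spot)


@dataclass(frozen=True)
class EarlyWarning:
    region: str
    kind: str     # "hfile_count", "capacity" or "hotspot"
    value: float


def analyze(region: str, hfiles: int, size_mb: float, requests: int,
            th: Thresholds) -> Optional[EarlyWarning]:
    # 1. Number of bottom files against the first threshold.
    if hfiles > th.max_hfiles:
        return EarlyWarning(region, "hfile_count", hfiles)
    # 2. Capacity against the second (max) and third (min) thresholds.
    if size_mb > th.max_size_mb or size_mb < th.min_size_mb:
        return EarlyWarning(region, "capacity", size_mb)
    # 3. Hot spot information against the fourth threshold.
    if requests > th.max_requests:
        return EarlyWarning(region, "hotspot", requests)
    # No threshold violated: no early warning for this Region.
    return None


# Illustrative values only; real thresholds are configured per deployment.
th = Thresholds(max_hfiles=10, max_size_mb=10240, min_size_mb=64,
                max_requests=1000)
```

Because the checks are ordered, a Region produces at most one early warning per pass, matching the "under the condition that" chaining in the text.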

S130, generating a target tuning task list according to the data block early warning information.

The target tuning task list may include a plurality of tuning tasks, each generated according to the data block early warning information. In this embodiment, a corresponding tuning task may be generated for each item of data block early warning information produced by the analysis processing, so as to form the target tuning task list.

S140, executing the target tuning task list according to the set time.

The set time may be understood as the execution time information of the tuning tasks and may be configured in advance. In this embodiment, the executable time of the tuning tasks may be set by the configuration and scheduling unit, and the Region tuning tasks in the target tuning task list are triggered according to that executable time. The tuning tasks in the target tuning task list may be executed within the set time range, and may be executed sequentially at a set time chosen to avoid peak service periods.
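
One way to honor a configured execution window is to check the current time of day before running anything. The sketch below assumes the set time is expressed as a daily start and end time, possibly spanning midnight; the function names and the idea of passing in an `execute` callable are illustrative assumptions, not part of the described system.

```python
from datetime import time
from typing import Callable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def in_window(now: time, start: time, end: time) -> bool:
    """Return True if `now` falls inside the daily window [start, end)."""
    if start <= end:
        return start <= now < end
    # Window wraps past midnight, e.g. 22:00 to 04:00.
    return now >= start or now < end


def run_if_allowed(tasks: List[T], now: time, start: time, end: time,
                   execute: Callable[[T], R]) -> List[R]:
    """Execute tasks in list order only when `now` is inside the window."""
    if not in_window(now, start, end):
        return []  # outside the set time: leave all tasks unexecuted
    return [execute(task) for task in tasks]
```

Choosing a window such as 01:00 to 05:00 would keep tuning work away from daytime traffic; the actual window is a deployment decision.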

S150, updating the target tuning task list based on the execution state of each tuning task in the target tuning task list.

Wherein the execution state includes executed and unexecuted. An execution state of executed may mean that the tuning task has been successfully executed; an execution state of unexecuted means that the tuning task has not started executing. In this embodiment, according to the execution state of each tuning task included in the target tuning task list, the executed tuning tasks may be removed from the target tuning task list, and the target tuning task list is thereby updated.

In this embodiment, optionally, updating the target tuning task list based on the execution state of each tuning task in the target tuning task list includes: removing, from the target tuning task list, the tuning tasks whose execution state is executed, so as to update the target tuning task list.

In this embodiment, the execution state of each tuning task in the target tuning task list may be obtained, and the tuning task whose execution state is successfully executed is directly removed from the target tuning task list, so as to update the target tuning task list. If the tuning task cannot be successfully executed, abnormal information can be recorded.
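
The update step can be sketched as a loop that drops each task once it has run successfully and records a failure otherwise. Keeping failed tasks in the list for a later attempt is an assumption of this sketch (the text only says that abnormal information can be recorded), and `execute_and_update`, `demo_run` and the task strings are hypothetical names.

```python
from typing import Callable, List, Tuple


def execute_and_update(tasks: List[str],
                       run: Callable[[str], None],
                       error_log: List[Tuple[str, str]]) -> List[str]:
    """Run each task; return the updated list without the executed tasks."""
    remaining = []
    for task in tasks:
        try:
            run(task)  # success: state becomes "executed", task is dropped
        except Exception as exc:
            # Failure: record the abnormal information and keep the task
            # (assumed behavior, so it can be retried in a later window).
            error_log.append((task, str(exc)))
            remaining.append(task)
    return remaining


def demo_run(task: str) -> None:
    # Stand-in for a real tuning action; fails for tasks marked "bad".
    if task.startswith("bad"):
        raise RuntimeError("tuning failed")


errors: List[Tuple[str, str]] = []
remaining = execute_and_update(["compact:r1", "bad:r2", "split:r3"],
                               demo_run, errors)
```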

According to the technical scheme, performance index information of the data blocks is obtained by responding to the detection request of the data blocks in the database cluster; the performance index information is analyzed and processed to obtain early warning information of the data block; a target tuning task list is generated according to the data block early warning information; the target tuning task list is executed according to the set time; and the target tuning task list is updated based on the execution state of each tuning task in the target tuning task list. With this technical scheme, the performance index information of the data blocks of the database is detected in real time, tuning tasks are generated automatically for the data blocks flagged by the analysis processing, and the tuning tasks are executed at the configured execution time. This avoids the impact on database reads and writes that arises when a database tuning task is triggered automatically, and optimizes the performance of the database cluster.

Example 2

FIG. 2 is a flowchart of a method for detecting and optimizing data blocks in a database cluster according to a second embodiment of the present invention, which refines the foregoing embodiment. Specifically, generating a target tuning task list according to the data block early warning information includes: acquiring a white list of the data block; judging whether each data block corresponding to the data block early warning information is in the white list; if the data block corresponding to the data block early warning information is not in the white list, determining a tuning task type based on the data block early warning information so as to obtain an original tuning task list; and adjusting the original tuning task list based on the set priority to obtain a target tuning task list. As shown in FIG. 2, the method includes:

S210, responding to a detection request of the data block in the database cluster, and acquiring performance index information of the data block.

S220, analyzing and processing the performance index information to obtain early warning information of the data block.

S230, acquiring a white list of the data block.

The white list of the data blocks can be obtained directly. The data block Region white list in this embodiment contains information on the data block Regions whose tuning tasks are executing, so obtaining the white list yields the list of data block Regions that are currently executing tuning tasks. The white list is mainly used to determine whether a Region is in an early warning state, so that tasks are not generated repeatedly for a Region already in that state. The early warning state may be understood as the state of a Region that is performing a tuning task. In addition, in this embodiment, the Region of a successfully executed tuning task may be deleted from the white list.

S240, judging whether each data block corresponding to the data block early warning information is in a white list or not.

In this embodiment, whether the Region of the data block corresponding to the current data block early warning information is executing the tuning task may be determined by determining whether each data block to be tuned corresponding to the data block early warning information is in the Region white list.

S250, if the data block corresponding to the data block early warning information is not in the white list, determining the tuning task type based on the data block early warning information so as to obtain an original tuning task list.

The tuning task type can be determined according to the data block early warning information, that is, according to the specific content of the early warning information and the amount by which a threshold is exceeded. The tuning task types in this embodiment may include a compression task, a segmentation task, a merging task, a migration task, and the like. The original tuning task list can be understood as the tuning task list obtained by determining the tuning task types according to the early warning information.

In this embodiment, if the data block early warning information indicates that a data block Region needs tuning and that Region is not in the white list, an automatic tuning task may be generated and its tuning task type determined, so as to obtain the original tuning task list. In addition, this embodiment may add the data block Region for which the tuning task is generated to the white list.
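
The white list check amounts to a set-membership filter that also registers each newly accepted Region. In this sketch the white list is modeled as an in-memory Python set of Region names, which is an assumption; the function name `filter_new_regions` is likewise hypothetical.

```python
from typing import Iterable, List, Set


def filter_new_regions(warned_regions: Iterable[str],
                       white_list: Set[str]) -> List[str]:
    """Return Regions that need a new tuning task and add them to the white list.

    Regions already in the white list are treated as having a tuning task
    in progress and are skipped, so no duplicate task is generated.
    """
    new_regions = []
    for region in warned_regions:
        if region in white_list:
            continue
        white_list.add(region)       # mark as having a task in progress
        new_regions.append(region)
    return new_regions


white_list = {"r2"}
to_tune = filter_new_regions(["r1", "r2", "r3", "r1"], white_list)
```

Adding the Region to the white list in the same pass also suppresses duplicates within a single batch of early warnings, as the repeated `"r1"` shows.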

In this embodiment, optionally, the tuning task types include a compression task, a segmentation task, a merging task, and a migration task; determining a tuning task type based on the data block early warning information, including: generating a data block compression task for the data block based on the early warning information of the quantity information of the bottom files in the data block; based on the early warning information of the data block capacity information, if the data block capacity information is larger than a second threshold value, generating a segmentation task for the data block; if the capacity information of the data block is smaller than a third threshold value, generating a merging task for the data block; and generating a data block migration task for the data block based on the early warning information of the data block hot spot information.

The compression task (Region Compact) merges underlying files in the database: small HFiles are merged to reduce the number of files. In this embodiment, as data is continuously written, flush operations keep increasing the number of HFiles on disk, which increases the number of I/O operations HBase needs to query data. HBase therefore merges small HFiles to reduce the file count; this HFile merging operation is called compaction. The segmentation task (Region Split) splits a large-capacity data block Region. In this embodiment, when a Region grows beyond a certain size it is split, and HBase can achieve load balancing through Region Split. The merging task (Region Merge) performs a merge operation on data block Regions. In this embodiment, if a large amount of data is deleted, many Regions become small and keeping them as separate Regions is wasteful, so they can be merged; Region merging is done mainly for maintenance rather than for performance. The migration task moves a data block Region to another server. In this embodiment, Regions with higher hot spot values may be migrated to other servers so that hot spots are dispersed.

Specifically, in this embodiment, a Region compression task may be generated for a data block based on early warning information indicating that the number of underlying files (HFiles) in the data block exceeds the first threshold; one Region compression task may be generated for each such piece of early warning information. Based on early warning information about the data block capacity, a Region segmentation task is generated for the data block if its Region capacity is greater than the second threshold (the maximum threshold), and a Region merging task is generated if its Region capacity is smaller than the third threshold (the minimum threshold). A data block migration task is generated for the data block based on early warning information indicating that its hot spot information is greater than the fourth threshold; in this embodiment, the hot spot information of all Regions on the server hosting that Region may be obtained, and the Regions with higher hot spot values migrated to other servers to disperse the hot spot.

Through the arrangement, the corresponding tuning task type can be determined according to the early warning information, so that the performance of the data block Region and the performance of the database HBase cluster are optimized.
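The mapping from early warning information to tuning task type can be illustrated with the following sketch. The enum `TaskType`, the metric labels, and the parameter names are assumptions introduced for illustration; only the mapping rules themselves come from the description above.

```python
from enum import Enum


class TaskType(Enum):
    COMPACT = "compact"   # Region compression task
    SPLIT = "split"       # Region segmentation task
    MERGE = "merge"       # Region merging task
    MIGRATE = "migrate"   # Region migration task


def determine_task_type(metric, value, max_capacity, min_capacity):
    """Map one early warning to a tuning task type.

    metric        -- "hfile_count", "capacity" or "hotspot" (hypothetical labels)
    value         -- measured value that triggered the warning
    max_capacity  -- second threshold (maximum Region capacity)
    min_capacity  -- third threshold (minimum Region capacity)
    Returns None when the warning does not map to any task.
    """
    if metric == "hfile_count":
        return TaskType.COMPACT
    if metric == "capacity":
        if value > max_capacity:
            return TaskType.SPLIT
        if value < min_capacity:
            return TaskType.MERGE
        return None
    if metric == "hotspot":
        return TaskType.MIGRATE
    return None
```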

And S260, adjusting the original tuning task list based on the set priority to obtain a target tuning task list.

The set priority may include a priority set per task type and a priority set by how far a threshold is exceeded. The target tuning task list may be understood as the tuning task list after adjustment according to the set priority. In this embodiment, the execution order of the tuning tasks in the original tuning task list may be adjusted according to the set priority to obtain the final target tuning task list.

In this embodiment, optionally, the adjusting the original tuning task list based on the set priority to obtain the target tuning task list includes: acquiring a first priority order corresponding to the task type; adjusting the original tuning task list according to the first priority order to obtain a first tuning task list; and adjusting the tuning tasks with the same task types in the first tuning task list according to the second priority order to obtain a target tuning task list.

The first priority order may be a priority order set in advance based on the task type, and may be set according to actual requirements. For example, the first priority order in this embodiment may be, from highest to lowest: Region compression task, Region segmentation task, Region merging task, Region migration task. The first tuning task list may be understood as the task list obtained by adjusting the original tuning task list according to the set first priority order. The second priority order may be a priority order set according to how far the early warning information exceeds the set threshold. The set threshold may be the first threshold, the second threshold, the third threshold, or the fourth threshold. The target tuning task list may be understood as the tuning task list obtained by adjusting the first tuning task list according to the second priority order.

In this embodiment, the first priority order corresponding to the task types may be obtained, and the original tuning task list adjusted according to that order to obtain the first tuning task list. Tuning tasks of the same task type in the first tuning task list are then ordered according to the amount by which they exceed the set threshold, giving the final target tuning task list.

Specifically, in this embodiment, early warning information may be generated from the analysis result, and the corresponding task type determined from the early warning information. The number of HFiles affects cluster performance much more than Region capacity does, while Region hot spots occur infrequently and affect performance less than Region capacity. Therefore, tuning tasks of different types may be executed in the priority order Region compression, Region segmentation, Region merging, Region migration. Tuning tasks of the same type are prioritized by the amount by which the threshold is exceeded: each task is inserted into the tuning task list at the position given by its priority, and the Region white list is updated at the same time. In this embodiment, the further the set threshold is exceeded, the higher the priority of the corresponding task and the earlier its position in the tuning task list, which ensures that the situations affecting service the most are handled first.

Through this arrangement, the execution order of the tuning tasks can be adjusted automatically according to the set priorities, so that the task with the greatest impact on service performance is executed first, and stable and efficient service is guaranteed to the greatest extent.
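The two-level ordering described for S260 can be expressed as a single sort with a composite key. The sketch below is illustrative only; `TuningTask`, `TYPE_PRIORITY`, and the `excess` field (the amount by which the metric lies beyond its threshold, assumed non-negative) are hypothetical names.

```python
from dataclasses import dataclass

# First priority order: lower number means earlier execution.
TYPE_PRIORITY = {"compact": 0, "split": 1, "merge": 2, "migrate": 3}


@dataclass
class TuningTask:
    region: str
    task_type: str   # one of the keys of TYPE_PRIORITY
    excess: float    # distance beyond the threshold (for merge: distance below it)


def prioritize(tasks):
    """Return the target tuning task list.

    Tasks are ordered by task type first (first priority order) and, within
    the same type, by descending threshold exceedance (second priority order).
    """
    return sorted(tasks, key=lambda t: (TYPE_PRIORITY[t.task_type], -t.excess))
```

Because Python's `sorted` is stable, tasks with the same type and the same exceedance keep their original relative order.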

S270, executing the target tuning task list according to the set time.

S280, updating the target tuning task list based on the execution state of each tuning task in the target tuning task list. Wherein the execution state includes executed and unexecuted.

Further, an example of the overall functional framework in this embodiment is shown in fig. 3. The detection and optimization system in this embodiment comprises a detection unit, an analysis unit, an execution unit, and a configuration and scheduling unit. The master node HMaster in the HBase cluster is responsible for management of the whole cluster, including Region allocation, load balancing, data maintenance and the like. The slave nodes (Region Servers) in the HBase cluster directly serve users' read and write requests, and are mainly responsible for managing the Regions allocated by HMaster, interfacing with the underlying HDFS, and the like.

The detection unit comprises an HFile quantity detector, a Region capacity detector and a Region hot spot detector, and is mainly used for detecting performance index information related to data block Regions in the HBase cluster. The analysis unit may comprise an index analyzer and a task generator; it analyzes the performance index information fed back by the detection unit and, based on the data block early warning information obtained from that analysis, automatically generates tuning tasks and the Region white list. The Region white list is used to filter out Regions for which tuning tasks have already been generated, preventing tuning tasks from being generated for them again. The execution unit comprises a segmentation executor, a merging executor, a compression executor and a migration executor, which trigger the different tasks by calling the relevant HBase instructions.

The configuration and scheduling unit comprises a threshold configurator and a task scheduler. The threshold configurator configures the thresholds used by the detection unit. The task scheduler schedules the tasks of the execution unit and triggers execution of the corresponding Region tasks; the execution time is configurable so that execution avoids service peak periods. By adjusting the Region capacity, the number of HFiles in each Region, and Region hot spot information in the HBase cluster, the detection and optimization system in this embodiment reduces cluster memory and disk read/write pressure and achieves load balancing within the cluster.
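One way the task scheduler could honour a configured execution time is to check the current time against an off-peak window before triggering any task. The following is a sketch under that assumption; the function name `in_execution_window` and the idea of a start/end window are illustrative and not specified in the original text.

```python
from datetime import time


def in_execution_window(now, start, end):
    """Return True if `now` falls inside the configured window [start, end).

    All arguments are datetime.time values. A window whose start is later
    than its end (for example 22:00 to 04:00) is treated as wrapping past
    midnight.
    """
    if start <= end:
        return start <= now < end
    return now >= start or now < end
```

A scheduler loop would call this check before dispatching the next task in the target tuning task list, leaving the list untouched outside the window.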

According to the technical scheme of this embodiment, performance index information of the data blocks is obtained in response to a detection request for the data blocks in the database cluster; the performance index information is analyzed and processed to obtain early warning information of the data blocks; a white list of the data blocks is acquired; whether each data block corresponding to the data block early warning information is in the white list is judged; if the data block corresponding to the data block early warning information is not in the white list, a tuning task type is determined based on the data block early warning information so as to obtain an original tuning task list; the original tuning task list is adjusted based on the set priority to obtain a target tuning task list; the target tuning task list is executed according to the set time; and the target tuning task list is updated based on the execution state of each tuning task in it. In this scheme, the performance index information of the database's data blocks is detected in real time, tuning tasks are generated automatically for the data blocks identified by the analysis, and the tuning tasks can be executed at the configured execution time, which avoids the impact on database reads and writes that automatically triggered tuning tasks would otherwise have and optimizes the cluster performance of the database.

Example III

Fig. 4 is a schematic structural diagram of a data block detecting and optimizing device in a database cluster according to a third embodiment of the present invention. As shown in fig. 4, the apparatus includes:

An information obtaining module 410, configured to obtain performance index information of a data block in response to a detection request of the data block in the database cluster;

the information analysis processing module 420 is configured to perform analysis processing on the performance index information to obtain early warning information of the data block;

the tuning task generating module 430 is configured to generate a target tuning task list according to the data block early warning information;

the tuning task execution module 440 is configured to execute the target tuning task list according to the set time;

a list updating module 450, configured to update the target tuning task list based on the execution status of each tuning task in the target tuning task list; wherein the execution state includes executed and unexecuted.

Optionally, the information analysis processing module 420 is specifically configured to: determine whether the number of underlying files in the data block is greater than a first threshold; if the number of underlying files is greater than the first threshold, take the underlying file quantity information as the early warning information of the data block; if the number of underlying files is less than or equal to the first threshold, determine whether the data block capacity information is greater than a second threshold or smaller than a third threshold; if the data block capacity information is greater than the second threshold or smaller than the third threshold, take the data block capacity information as the early warning information of the data block; if the data block capacity information is less than or equal to the second threshold and greater than or equal to the third threshold, determine whether the data block hot spot information is greater than a fourth threshold; and if the data block hot spot information is greater than the fourth threshold, take the data block hot spot information as the early warning information of the data block.
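The cascaded threshold checks performed by the information analysis processing module can be sketched as below. The `Thresholds` container, the metric labels, and the return format are assumptions made for illustration; the order of the checks follows the description.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Thresholds:
    max_hfiles: int       # first threshold
    max_capacity: float   # second threshold
    min_capacity: float   # third threshold
    max_hotspot: float    # fourth threshold


def analyze_region(hfile_count, capacity, hotspot, th) -> Optional[Tuple[str, float]]:
    """Return (metric, value) for the first index that triggers a warning.

    The checks run in the order HFile count, capacity, hot spot; a later
    index is examined only when the earlier ones are within their limits.
    Returns None when no warning is raised.
    """
    if hfile_count > th.max_hfiles:
        return ("hfile_count", hfile_count)
    if capacity > th.max_capacity or capacity < th.min_capacity:
        return ("capacity", capacity)
    if hotspot > th.max_hotspot:
        return ("hotspot", hotspot)
    return None
```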

Optionally, the tuning task generating module 430 includes:

an acquisition unit for acquiring a white list of data blocks;

the judging unit is used for judging whether each data block corresponding to the data block early warning information is in the white list or not;

the task type determining unit is used for determining the tuning task type based on the data block early warning information if the data block corresponding to the data block early warning information is not in the white list so as to obtain an original tuning task list;

and the adjusting unit is used for adjusting the original tuning task list based on the set priority to obtain a target tuning task list.

Optionally, the task type determining unit is specifically configured to: generate a data block compression task for the data block based on early warning information about the number of underlying files in the data block; based on early warning information about the data block capacity, generate a segmentation task for the data block if the data block capacity information is greater than a second threshold, and generate a merging task for the data block if the data block capacity information is smaller than a third threshold; and generate a data block migration task for the data block based on early warning information about the data block hot spot information.

Optionally, the adjusting unit is specifically configured to: obtain a first priority order corresponding to the task type; adjust the original tuning task list according to the first priority order to obtain a first tuning task list; and adjust the tuning tasks of the same task type in the first tuning task list according to the second priority order to obtain a target tuning task list.

Optionally, the list updating module 450 is specifically configured to remove from the target tuning task list those tuning tasks whose execution state is executed, so as to update the target tuning task list.
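The list update performed by module 450 amounts to filtering on execution state. A minimal sketch follows, assuming each task is a dictionary with a hypothetical `"state"` key whose value is either `"executed"` or `"unexecuted"`:

```python
def update_task_list(tasks):
    """Return a new target tuning task list without the executed tasks.

    The relative order of the remaining (unexecuted) tasks is preserved, so
    the priority ordering established earlier still holds.
    """
    return [task for task in tasks if task["state"] != "executed"]
```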

The data block detection and optimization device in the database cluster provided by the embodiment of the invention can execute the data block detection and optimization method in the database cluster provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.

Example IV

Fig. 5 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present invention. The electronic device 10 is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.

As shown in fig. 5, the electronic device 10 includes at least one processor 11, and a memory, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, etc., communicatively connected to the at least one processor 11, in which the memory stores a computer program executable by the at least one processor, and the processor 11 may perform various appropriate actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from the storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data required for the operation of the electronic device 10 may also be stored. The processor 11, the ROM 12 and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to bus 14.

Various components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, etc.; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.

The processor 11 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, Digital Signal Processors (DSPs), and any suitable processor, controller, microcontroller, etc. The processor 11 performs the various methods and processes described above, such as the data block detection tuning method in a database cluster.

In some embodiments, the data block detection tuning method in a database cluster may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as the storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into RAM 13 and executed by processor 11, one or more of the steps of the data block detection tuning method in a database cluster described above may be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform the data block detection tuning method in the database cluster in any other suitable way (e.g. by means of firmware).

Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, and which may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.

A computer program for carrying out methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.

In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the internet.

The computing system may include clients and servers. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, and is a host product in a cloud computing service system, so that the defects of high management difficulty and weak service expansibility in the traditional physical hosts and VPS service are overcome.

It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution of the present invention are achieved; the present invention is not limited in this respect.

The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims

1. A method for detecting and optimizing data blocks in a database cluster, comprising the steps of:

executing the target tuning task list according to the set time;

2. The method of claim 1, wherein the performance index information of the data block includes information of the number of underlying files in the data block, information of capacity of the data block, and hot spot information of the data block.

3. The method according to claim 2, wherein analyzing the performance index information to obtain the early warning information of the data block includes:

4. The method of claim 3, wherein generating a target tuning task list from the data block pre-warning information comprises:

acquiring a white list of the data block;

5. The method of claim 4, wherein the tuning task types include a compression task, a segmentation task, a merge task, and a migration task;

6. The method of claim 4, wherein adjusting the original tuning task list based on the set priority results in a target tuning task list, comprising:

acquiring a first priority order corresponding to the task type;

7. The method of claim 1, wherein updating the target tuning task list based on the execution status of each tuning task in the target tuning task list comprises:

8. A data block detection and optimization device in a database cluster, comprising:

9. An electronic device, the electronic device comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the method of data block detection tuning in a database cluster according to any one of claims 1-7.

10. A computer readable storage medium storing computer instructions for causing a processor to perform the method of detecting and optimizing data blocks in a database cluster according to any one of claims 1-7.