WO2022028033A1 - Hierarchical mapping-based automatic balancing storage method for Ceph storage system

Info

Publication number: WO2022028033A1
Application number: PCT/CN2021/094042
Authority: WO
Inventors: 陈宁江; 卢煜
Original assignee: Guangxi University (广西大学)
Priority date: 2020-08-01
Filing date: 2021-05-17
Publication date: 2022-02-10
Also published as: CN111880747B; CN111880747A; JP2023536693A

Abstract

A hierarchical mapping-based automatic balancing storage method for a Ceph storage system. The method comprises: adding a level attribute to all object storage devices (OSDs) in a storage cluster and dividing them into multiple sub-storage pools according to level; adding a level attribute to placement groups (PGs) based on the OSD levels, so that a PG looks up an OSD combination for storage only in the OSD sub-storage pool of its own level; adding a random factor and an impact factor to guide the process by which PGs select OSDs; and, when the usage of a single OSD in the total storage pool is too high, determining the general direction of PG migration from usage information about the sub-storage pool where the PG is located and other sub-storage pools, while adjusting the migration balance according to the combination of PG level, random factor and impact factor. With this method, OSDs with excessively high usage in the Ceph storage system can migrate their PGs reasonably, keeping system storage balanced and improving system stability.

Description

A Ceph Storage System Automatic Balance Storage Method Based on Hierarchical Mapping

Technical Field

The invention belongs to the technical field of distributed storage, and more particularly, relates to a Ceph storage system automatic balancing storage method based on hierarchical mapping.

Background

The Ceph storage system is an object-based storage system (OBSS). Unlike a traditional OBSS, Ceph has no independent metadata server that records the location of the OSD (Object Storage Device) on which each object shard is stored. Instead, it uses the CRUSH (Controlled Replication Under Scalable Hashing) algorithm to determine where an object and its replicas are stored. When data must later be looked up or modified, reading, writing and addressing can be completed independently on each OSD, so there is no single-node bottleneck. Placement is computed by software rather than assigned manually: when devices are replaced or added, the software automatically recalculates object locations, so data recovery and capacity expansion require no manual intervention. Ceph's original CRUSH algorithm hashes the incoming PG (Placement Group) and selects one primary storage node and several replica nodes. As long as the PG is unchanged, the selected OSD combination does not change, which completes the preliminary addressing for reads and writes. If an OSD changes, data can be recovered automatically from other nodes. A storage request is split into small objects of equal size, and the PGs, which are logical groups formed from sets of these small objects, are distributed across the OSDs according to preset OSD weights, so that neither the system nor the operations staff need to track the state of individual OSDs. However, weights cannot accurately reflect the differences between OSDs: a weight only affects the probability of selection and does not fix a definite ratio. Moreover, the macro-level balanced allocation of PGs to OSDs assumes that every PG holds the same amount of data and ignores the differences between PGs. Although a PG is a logical collection of objects (not a data entity), the PG is the smallest unit selected for data migration and storage. Objects are mapped to PGs by taking the remainder of a hash, so the objects mapped to different PGs differ, and so do the PG sizes. If storage allocation is unbalanced and a single node becomes overloaded, the entire storage system becomes unusable.

Because Ceph's storage selection and mapping process differs from that of traditional storage systems that use an MDS (Metadata Server), existing weight-based adjustment methods cannot accurately control the amount or direction of migration. They also cannot predict whether an adjustment will cause a data avalanche, in which migrating data out of one overloaded OSD causes more OSDs to become overloaded. A new automatic balancing storage method for Ceph is therefore needed, one that migrates data in real time according to actual PG usage and ensures that each migration improves the balance of single-node utilization in the system.

SUMMARY OF THE INVENTION

The technical problem to be solved by the present invention: in view of the above problems of the prior art, an automatic balancing storage method for a Ceph storage system based on hierarchical mapping is provided. In a distributed workload environment built on the Ceph storage system, the invention balances storage automatically, allows a single heavily loaded node to rebalance itself, and precisely controls the direction and amount of data migration, thereby ensuring the stability of the system.

In order to solve the above-mentioned technical problems, the technical scheme adopted in the present invention is:

An automatic balancing storage method for a Ceph storage system based on hierarchical mapping, whose implementation steps include:

(1) Give PGs and OSDs a new level attribute. Logically divide the entire storage pool into multiple sub-storage pools, each aggregating OSDs of the same level. PG levels correspond one-to-one to OSD levels, and a PG can select OSDs only within the OSD sub-storage pool of its own level; by changing its level, a PG can be migrated freely. A random factor is added as a new parameter of the Ceph storage system's original CRUSH algorithm to guide the selection of a new OSD combination, giving the PG more choices for migration;

(2) When inserting data, obtain the difference between the usage rate of a single OSD and the average usage rate of the system, and compare it with the preset threshold. If the threshold is exceeded, go to step (3) to trigger the balanced storage strategy; otherwise, insert the data normally;

(3) Obtain the queue of PGs in the OSD sorted by PG size and select the PG of median size for analysis. Sort the OSD sub-storage pool where the PG is located and the sub-storage pools of adjacent levels by usage rate, and take the level of the sub-storage pool with the lowest usage rate as the PG's new level. Then, based on the configuration of this new level, generate multiple random numbers with the level as the seed to obtain multiple random factors, and pass the random factors as parameters to the CRUSH algorithm so that its selection yields several different OSD combinations for data storage. For each OSD combination generated by a random factor, generate the corresponding impact factor from its effect on the balance of the system. The combination of level and impact factor that has the least impact on the balance of the system is selected and gives the PG its new grouping attribute.
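The first half of step (3), choosing the PG to analyse and its new level, can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, levels are assumed to be integers whose adjacent levels are `level - 1` and `level + 1`, and for a queue of even length the upper median is taken.

```python
from typing import Dict


def pick_median_pg(pg_sizes: Dict[str, int]) -> str:
    """Return the id of the median-sized PG in the size-sorted queue."""
    # Sort by size, breaking ties by PG id so the result is deterministic.
    queue = sorted(pg_sizes.items(), key=lambda item: (item[1], item[0]))
    return queue[len(queue) // 2][0]


def pick_target_level(current_level: int, pool_usage: Dict[int, float]) -> int:
    """Among the current level and its adjacent levels, return the least-used one."""
    # Assumption: "adjacent" means the integer levels directly below and above.
    candidates = [
        level
        for level in (current_level - 1, current_level, current_level + 1)
        if level in pool_usage
    ]
    return min(candidates, key=lambda level: (pool_usage[level], level))


if __name__ == "__main__":
    sizes = {"1.a": 10, "1.b": 50, "1.c": 30}
    usage = {0: 0.7, 1: 0.9, 2: 0.4, 3: 0.1}
    print(pick_median_pg(sizes), pick_target_level(1, usage))
```

Note that level 3 is the least-used pool overall in the example, but it is not adjacent to level 1 and is therefore not considered.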

The initialization in step (1) mainly includes the following. The level attribute of the OSDs is initialized manually. The level attribute of the PGs is initialized by distributing the PGs uniformly across the sub-storage pools: because PG sizes are unpredictable, the PGs are distributed relatively evenly by count in the initial stage, which avoids a large number of balancing migrations when the system first comes into use. Optionally, a consistent hashing algorithm can be used to initialize the PG levels.
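A minimal sketch of the count-based initialization, assuming a simple round-robin assignment over a sorted list of PG ids. The function name is hypothetical, and round-robin is used here only because it is the simplest way to equalize counts; it is not the consistent hashing option mentioned above.

```python
from typing import Dict, List


def init_pg_levels(pg_ids: List[str], levels: List[int]) -> Dict[str, int]:
    """Assign levels round-robin so each sub-storage pool starts with a near-equal PG count."""
    return {pg: levels[i % len(levels)] for i, pg in enumerate(sorted(pg_ids))}


if __name__ == "__main__":
    pgs = [f"1.{i:x}" for i in range(7)]
    print(init_pg_levels(pgs, [0, 1, 2]))
```

With 7 PGs and 3 levels, the per-level counts differ by at most one.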

In step (1), the random factor is used to guide the output of the CRUSH algorithm; it changes the original CRUSH selection process to:

$R^{i}\langle \mathrm{OSD} \rangle = \mathrm{CRUSH}(\mathrm{PGID},\ r^{i})$

In the above formula, $R^{i}\langle \mathrm{OSD} \rangle$ is the i-th selected OSD combination, the input parameters of the CRUSH call are PGID and $r^{i}$, PGID is the unique identifier of the PG, and $r^{i}$ is a random factor. With this algorithm, step (3) can generate multiple OSD combinations and select from them the most suitable one, namely the one with the lowest impact on the balance of the system.
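To illustrate how an extra parameter can steer a deterministic placement function, the sketch below uses rendezvous-style hashing as a stand-in. It is explicitly not Ceph's CRUSH algorithm: `crush_stand_in`, its parameters and the replica count of 3 are assumptions made for illustration only. What it shares with the formula above is the interface: the same (PGID, r) pair always yields the same OSD combination, while a different r may yield a different one.

```python
import hashlib
from typing import List, Tuple


def crush_stand_in(pg_id: str, r: int, pool_osds: List[int], replicas: int = 3) -> Tuple[int, ...]:
    """Pick `replicas` OSDs from one sub-storage pool, deterministically in (pg_id, r).

    Stand-in only: each OSD is ranked by a hash of (pg_id, r, osd) and the
    first `replicas` OSDs are taken; the first one plays the role of primary.
    """
    def score(osd: int) -> bytes:
        return hashlib.sha256(f"{pg_id}:{r}:{osd}".encode()).digest()

    ranked = sorted(pool_osds, key=score)
    return tuple(ranked[:replicas])


if __name__ == "__main__":
    pool = [10, 11, 12, 13, 14]
    for r in (1, 2, 3):
        print(r, crush_stand_in("1.2f", r, pool))
```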

The balancing strategy in step (2) is evaluated and triggered when data is inserted, that is, while the CRUSH algorithm is being executed; this requires introducing global monitoring.

In step (3), the impact factor compares the balanced-storage state of the target sub-storage pool before the PG migration with its state after the PG has been migrated according to the new level and the new impact factor. The balanced-storage state of a sub-storage pool is quantified as:

In the above formula, M is the average usage rate of the sub-storage pool, $x_j$ is the usage rate of each OSD in the sub-storage pool, and n is the number of OSDs in the sub-storage pool.

From the $\beta^{r}$ value before a given migration of a PG and the $\beta^{j}$ value after it, the impact factor δ of this PG on the storage balance of the system in this migration is obtained as:

If the usage rate of a group exceeds 1 after the PG migration, the impact factor of that group is set to -1, which ensures that the migration of the PG does not leave the new OSDs overloaded or completely unavailable.
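The two expressions themselves are not reproduced in this text, so the following sketch rests on loudly stated assumptions: β is taken to be the population standard deviation of the OSD usage rates around their mean M, and δ is taken to be the ratio of β after the migration to β before it, so that smaller values mean a more balanced result. Only the symbols M, $x_j$, n and the -1 sentinel come from the description; the function names and both formulas are hypothetical.

```python
import math
from typing import Sequence


def balance(usages: Sequence[float]) -> float:
    """Assumed form of beta: population standard deviation of usage rates x_j around M."""
    n = len(usages)
    mean = sum(usages) / n  # M, the average usage rate of the sub-storage pool
    return math.sqrt(sum((x - mean) ** 2 for x in usages) / n)


def impact_factor(before: Sequence[float], after: Sequence[float]) -> float:
    """Assumed form of delta: beta after migration divided by beta before migration.

    Returns -1.0 if any OSD usage rate would exceed 1 after the migration.
    """
    if any(x > 1.0 for x in after):
        return -1.0
    beta_before = balance(before)
    beta_after = balance(after)
    if beta_before == 0.0:
        # A perfectly balanced pool cannot be improved; any imbalance is infinitely worse.
        return 0.0 if beta_after == 0.0 else math.inf
    return beta_after / beta_before


if __name__ == "__main__":
    print(impact_factor([0.2, 0.4, 0.6], [0.4, 0.4, 0.6]))  # below 1: pool becomes more even
    print(impact_factor([0.2, 0.4, 0.6], [1.2, 0.4, 0.6]))  # -1.0: an OSD would be overfull
```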

Step (1) also includes planning the hardware in the system and configuring the sub-storage pools:

① Categorize and organize the existing storage devices so that the newly divided sub-storage pools are reasonably sized. Because the usage rate is used as the reference for comparison, the sub-storage pools should preferably be close in size.

② Configure each sub-storage pool; each sub-storage pool can have its own threshold and its own number of random factors.

After step (3) is completed, if there is no suitable migration object, jump to step (2).

Compared with the prior art, the present invention has the following advantages. Balancing is performed in real time rather than after overload has already occurred, and it does not consume extra computing or human resources for monitoring. A level attribute is added to all OSDs of a storage cluster, dividing them into sub-storage pools of multiple levels, and a level attribute based on the OSD level is added to each PG, so that a PG can look up an OSD combination for storage only in the OSD sub-storage pool of its own level. A random factor is added to guide the PG's selection of OSDs, so that more candidate combinations are generated. An impact factor is added to quantify how a change in the selection result caused by a change of PG attributes affects the balanced storage of the system. When the usage rate of a single OSD in the total storage pool is too high, the PG of median size is selected, the direction of PG migration is determined from the usage information of its sub-storage pool and the adjacent sub-storage pools, impact factors are generated for the combinations of PG level and random factor, and the optimal combination of level and impact factor is selected for the balancing adjustment. The invention applies the idea of risk transfer: it divides the entire storage system into storage areas by hierarchical mapping, so that when a local storage device is overloaded, stored data can be moved to an area of low risk (low usage rate). This relieves high-load storage nodes, uses storage resources reasonably, and makes the system more stable.

Description of drawings

FIG. 1 is a basic flow diagram of a method according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of a random factor generation process according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of a selection process of an impact factor according to an embodiment of the present invention.

Detailed Description

In order to make the objectives, technical solutions and advantages of the present invention clearer, the present invention will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the present invention, but not to limit the present invention. In addition, the technical features involved in the various embodiments of the present invention described below can be combined with each other as long as there is no conflict with each other.

As shown in FIG. 1 , the process of automatically balancing storage in the Ceph storage system based on hierarchical mapping in this embodiment includes:

(1) Give PGs and OSDs a new level attribute, and logically divide the entire storage pool into multiple sub-storage pools, each aggregating OSDs of the same level.

(2) PG levels correspond one-to-one to OSD levels. A PG can select OSDs only within the OSD sub-storage pool of its own level, and by changing its level a PG can be migrated freely.

(3) When data is inserted, compare the difference between the OSD usage rate and the system average usage rate with the preset threshold. If the threshold is exceeded, trigger the PG migration method of the present invention to restore balance; otherwise, write the data normally.

(4) If the usage rate of a single OSD exceeds the system average by more than the threshold, the balanced storage strategy of the present invention is triggered, which evens out the storage distribution of the system and reduces the excessive usage rate of individual OSDs. If, after this strategy has been applied, the OSD usage rate is at a preset value (for example, 100), the OSD is fully loaded and the write is rejected; if it is less than 100, the data is written normally.
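The write path in steps (3) and (4) can be sketched as below. This is an illustrative outline only: `handle_write` and the `rebalance` hook are hypothetical names, usage rates are expressed as percentages to match the example value of 100, and the hook is assumed to update the usage table in place after migrating PGs.

```python
from typing import Callable, Dict


def handle_write(
    osd: int,
    usage: Dict[int, float],
    threshold: float,
    rebalance: Callable[[int, Dict[int, float]], None],
    full_at: float = 100.0,
) -> str:
    """Decide the outcome of a write aimed at `osd`, balancing first if needed."""
    mean = sum(usage.values()) / len(usage)
    if usage[osd] - mean > threshold:
        # Hypothetical hook: migrates PGs away from `osd` and updates `usage` in place.
        rebalance(osd, usage)
    if usage[osd] >= full_at:
        return "rejected"  # still fully loaded after balancing
    return "written"


if __name__ == "__main__":
    def move_30_points(osd: int, table: Dict[int, float]) -> None:
        table[osd] -= 30.0
        table[1 - osd] += 30.0

    table = {0: 100.0, 1: 40.0}
    print(handle_write(0, table, 20.0, move_30_points), table)
```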

As shown in FIG. 2, the steps of generating random factors in this embodiment include:

(1) Obtain the configuration of the target sub-storage pool, including the maximum number of random factors. Because dividing the pool reduces the number of OSDs each sub-storage pool computes over, a check is needed here so that the same combination is not generated repeatedly. A low maximum lets random-factor selection finish quickly, keeping balancing efficient; a high maximum allows enough random trials to keep the system highly available.

(2) In this embodiment, the random factor acts as a parameter that perturbs the selection process of the CRUSH algorithm, so it can be generated by using the PG level as the seed of the C language's built-in random number generator. The OSD combination for a given random factor is selected as follows:

$R^{i}\langle \mathrm{OSD} \rangle = \mathrm{CRUSH}(\mathrm{PGID},\ r^{i})$

In the above formula, $R^{i}\langle \mathrm{OSD} \rangle$ is the i-th selected OSD combination; the input parameters of the CRUSH call are PGID and $r^{i}$; PGID is the unique identifier of the PG; and $r^{i}$ is a random factor.

(3) Each time an OSD combination is selected, first determine whether that combination has already been selected. If it already exists in the selection results, skip it and perform OSD selection again; if it does not, save it.

(4) If the number of combinations has reached the number required by the OSD sub-storage pool, end the selection of OSD combinations; otherwise, return to step (2) and continue the selection.
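The loop in steps (1) to (4) might look like the sketch below. The description uses the C library random number generator; here Python's `random.Random` seeded with the PG level plays that role. The function name, the `select` callback (standing for the CRUSH call) and the `max_attempts` guard are assumptions added for illustration; the guard prevents an endless loop when a small sub-storage pool cannot produce as many distinct combinations as requested.

```python
import random
from typing import Callable, Dict, Set, Tuple

Combo = Tuple[int, ...]


def generate_combinations(
    pg_id: str,
    level: int,
    max_factors: int,
    select: Callable[[str, int], Combo],
    max_attempts: int = 1000,
) -> Dict[int, Combo]:
    """Map each accepted random factor r to the distinct OSD combination it produced."""
    rng = random.Random(level)  # the PG level is the seed
    seen: Set[Combo] = set()
    chosen: Dict[int, Combo] = {}
    attempts = 0
    while len(chosen) < max_factors and attempts < max_attempts:
        attempts += 1
        r = rng.randrange(2**31)
        combo = select(pg_id, r)
        if combo in seen:
            continue  # combination already selected: skip and draw again
        seen.add(combo)
        chosen[r] = combo
    return chosen


if __name__ == "__main__":
    # Toy selector with only four possible combinations.
    toy = lambda pg, r: (r % 4, (r % 4 + 1) % 4)
    print(generate_combinations("1.2f", level=2, max_factors=3, select=toy))
```

Because the generator is seeded with the level, repeating the call for the same PG and level reproduces the same random factors.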

As shown in FIG. 3, in this embodiment the calculation and selection of the impact factor guides the change of PG attributes. The impact factor compares the balanced-storage state of the target sub-storage pool before the PG migration with its state after the migration. The steps include:

(1) Load the OSD combinations corresponding to the PG's new level and the random factors.

(2) Iterate over these OSD combinations in a loop. If all combinations have been evaluated, exit this process; if some have not, go to the next step.

(3) Calculate the balance parameter of the current system. The balanced-storage state of a sub-storage pool is quantified as:

In the above formula, M is the average usage rate of the sub-storage pool, $x_j$ is the usage rate of each OSD in the sub-storage pool, and n is the number of OSDs in the sub-storage pool.

(4) Using the above formula, calculate the system balance parameter $\beta^{j}$ that would result if the PG were migrated according to the current random factor.

(5) From the β value before a given migration of a PG and the β value after it, the impact factor δ of this PG on the storage balance of the system in this migration is obtained as:

If the usage rate of a group exceeds 1 after the PG migration, the impact factor of that group is set to -1, which ensures that the migration of the PG does not leave the new OSDs overloaded or completely unavailable. In this embodiment, a combination whose impact factor is -1 is discarded outright rather than used.
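Once an impact factor is known for every candidate, the final choice reduces to a filtered minimum. The sketch below assumes the impact factors have already been computed and that a smaller non-negative value means less disturbance to system balance; `choose_combination` is a hypothetical name. Returning `None` corresponds to the case in which no suitable migration target exists.

```python
from typing import Dict, Optional, Tuple

Combo = Tuple[int, ...]


def choose_combination(impact: Dict[Combo, float]) -> Optional[Combo]:
    """Return the OSD combination with the smallest impact factor, ignoring -1 entries."""
    valid = {combo: delta for combo, delta in impact.items() if delta != -1}
    if not valid:
        return None  # every candidate would overload an OSD
    # Ties are broken by the combination itself so the result is deterministic.
    return min(valid, key=lambda combo: (valid[combo], combo))


if __name__ == "__main__":
    candidates = {(1, 2): 0.8, (2, 3): -1, (3, 4): 0.5}
    print(choose_combination(candidates))
```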

This embodiment of the automatic balancing storage method for a Ceph storage system based on hierarchical mapping aims to solve the problem that overload of a single OSD in the storage cluster makes the entire system unavailable. Because Ceph tends to distribute data evenly across OSDs, local overload can be simulated by giving OSDs that differ from one another the same weight value. In this embodiment, several numbers of OSDs and several ways of dividing the sub-storage pools are used for initialization, and the maximum write amount, that is, the total amount of data written by the time the system crashes, is used as the evaluation criterion for the effectiveness of the present invention. The results show that the present invention effectively alleviates system collapse caused by the overloading of a single OSD.

Those skilled in the art will readily understand that the above are only preferred embodiments of the present invention and are not intended to limit it. Any modifications, equivalent replacements and improvements made within the spirit and principles of the present invention shall be included within its scope of protection.

Claims

1. An automatic balancing storage method for a Ceph storage system based on hierarchical mapping, characterized in that the implementation steps include:

(1) Give PGs and OSDs a new level attribute. Logically divide the entire storage pool into multiple sub-storage pools, each aggregating OSDs of the same level. PG levels correspond one-to-one to OSD levels, and a PG can select OSDs only within the OSD sub-storage pool of its own level; by changing its level, a PG can be migrated freely. A random factor is added as a new parameter of the Ceph storage system's original CRUSH algorithm to guide the selection of a new OSD combination, giving the PG more choices for migration;

(2) When inserting data, obtain the difference between the usage rate of a single OSD and the average usage rate of the system, and compare it with the preset threshold. If the threshold is exceeded, go to step (3) to trigger the balanced storage strategy; otherwise, insert the data normally;

(3) Obtain the queue of PGs in the OSD sorted by PG size and select the PG of median size for analysis. Sort the OSD sub-storage pool where the PG is located and the sub-storage pools of adjacent levels by usage rate, and take the level of the sub-storage pool with the lowest usage rate as the PG's new level. Then, based on the configuration of this new level, generate multiple random numbers with the level as the seed to obtain multiple random factors, and pass the random factors as parameters to the CRUSH algorithm so that its selection yields several different OSD combinations for data storage. For each OSD combination generated by a random factor, generate the corresponding impact factor from its effect on the balance of the system. The combination of level and impact factor that has the least impact on the balance of the system is selected and gives the PG its new grouping attribute.

2. The automatic balancing storage method for a Ceph storage system based on hierarchical mapping according to claim 1, characterized in that, when the level attribute of the OSDs is initialized in step (1), the initialization is performed manually.

3. The automatic balancing storage method for a Ceph storage system based on hierarchical mapping according to claim 1 or 2, characterized in that, when the level attribute of the PGs is initialized in step (1), the PGs are distributed evenly across the storage pools according to a consistent hashing algorithm; because PG sizes are unpredictable, the PGs are distributed relatively evenly by count in the initial stage, so as to avoid a large number of balancing migrations when the system first comes into use.

4. The automatic balancing storage method for a Ceph storage system based on hierarchical mapping according to claim 1 or 2, wherein the random factor in step (1) is used to guide the output of the CRUSH algorithm, changing the original CRUSH selection process to:

$R^{i}\langle \mathrm{OSD} \rangle = \mathrm{CRUSH}(\mathrm{PGID},\ r^{i})$

In the above formula, $R^{i}\langle \mathrm{OSD} \rangle$ is the i-th selected OSD combination, the input parameters of the CRUSH call are PGID and $r^{i}$, PGID is the unique identifier of the PG, and $r^{i}$ is a random factor.

5. The automatic balancing storage method for a Ceph storage system based on hierarchical mapping according to claim 1 or 2, characterized in that the balancing strategy in step (2) is evaluated and triggered when data is inserted, that is, while the CRUSH algorithm is being executed, which requires introducing global monitoring.

6. The automatic balancing storage method for a Ceph storage system based on hierarchical mapping according to claim 2, wherein in step (3) the impact factor is used to compare the balanced-storage state of the target sub-storage pool before the PG migration with its state after the PG has been migrated according to the new level and the new impact factor, specifically:

The balanced-storage state of a sub-storage pool is quantified as:

where M is the average usage rate of the sub-storage pool, $x_j$ is the usage rate of each OSD in the sub-storage pool, and n is the number of OSDs in the sub-storage pool;

From the $\beta^{r}$ value before a given migration of a PG and the $\beta^{j}$ value after it, the impact factor δ of this PG on the storage balance of the system in this migration is obtained as:

where, if the usage rate of a group exceeds 1 after the PG migration, the impact factor of that group is set to -1, which ensures that the migration of the PG does not leave the new OSDs overloaded or completely unavailable.

7. The automatic balancing storage method for a Ceph storage system based on hierarchical mapping according to claim 1, characterized in that step (1) also includes planning the hardware in the storage system and configuring the sub-storage pools, specifically:

Categorize and organize the existing storage devices so that the newly divided sub-storage pools are reasonably sized; given the randomness of PG level assignment and of PG data writes, and because comparisons between storage pools use the usage rate as a reference, the sub-storage pools are kept close in size;

Configure each sub-storage pool; each sub-storage pool can have its own threshold and its own number of random factors.

8. The automatic balancing storage method for a Ceph storage system based on hierarchical mapping according to claim 1, characterized in that, after step (3) is completed, if there is no suitable migration object, the PG is removed from the sorting queue and the method jumps to step (2).