CN110955522A - Resource management method and system for coordination performance isolation and data recovery optimization
- Publication number: CN110955522A
- Application number: CN201911100053.XA
- Authority: CN (China)
- Prior art keywords
- tenant
- request
- data recovery
- priority
- storage resources
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5011—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
- G06F9/5016—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1446—Point-in-time backing up or restoration of persistent data
- G06F11/1458—Management of the backup or restore process
- G06F11/1469—Backup restoration techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
- G06F9/5077—Logical partitioning of resources; Management or configuration of virtualized resources
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/50—Indexing scheme relating to G06F9/50
- G06F2209/5021—Priority
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Quality & Reliability (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a resource management method and system for coordinating performance isolation and data recovery optimization, belonging to the field of cloud storage. The method comprises the following steps: at a client of the cloud storage system, allocating storage resources to each tenant according to the tenant's performance requirements while monitoring whether a data recovery request occurs; if so, making the resource allocation meet only the minimum performance requirements of the tenant, lowering the priority of tenant IO requests as long as the storage resources actually allocated to the tenant still meet the tenant's performance requirements, and then sending the tenant IO requests to a storage node; if not, making the resource allocation maximize the utilization of system resources, and then sending the tenant IO requests directly to the storage node; and, at a storage node of the cloud storage system, receiving the various requests and scheduling the different types of requests in proportion to their priorities, so that storage resources are allocated to the different request types in proportion to their priorities. The invention can shorten the data recovery time while guaranteeing the performance requirements of tenants.
Description
Technical Field
The invention belongs to the field of cloud storage, and particularly relates to a resource management method and system for coordination performance isolation and data recovery optimization.
Background
Cloud storage systems, such as Ceph, Google File System, Azure Storage, and Amazon block storage, often run the workloads of multiple tenants simultaneously in order to reduce costs and simplify management. Specifically, the cloud storage system creates a large number of virtual block devices (for example, Ceph creates RBD images and Amazon block storage creates EBS volumes) and then assigns these virtual block devices to different tenants, thereby providing separate storage services to each tenant. In a cloud storage system, different tenants use different virtual block devices, but the underlying storage resources are shared, so resource competition and performance interference exist among the tenants. To guarantee the performance requirements of tenants, an effective means of performance isolation is needed. In addition, storage resources in a cloud storage system are often over-provisioned: the system must be able to serve each tenant's peak load, but a tenant's workload is at its peak only briefly, so idle resources exist in the cloud storage system most of the time. To improve resource utilization, existing performance isolation methods often allocate the idle resources to tenants that have a minimum performance requirement.
While providing storage services to tenants, a cloud storage system inevitably experiences node failures. A failure may be caused by human factors, software bugs, hardware faults, and so on. To guarantee the reliability of tenant data, a cloud storage system usually stores data with a multi-replica mechanism or an erasure code mechanism. When a node fails, the cloud storage system can automatically recover the lost data. During data recovery, however, tenant requests and data recovery requests compete for resources, which makes storage management more challenging.
In existing storage management methods, the priority assigned to tenant IO requests is much higher than the priority assigned to data recovery requests, so when tenant IO requests and data recovery requests compete for resources, the tenant IO requests are processed first and the data recovery time becomes long. On the one hand, an overly long data recovery time may allow other data copies, or even all data copies, to be lost during recovery, leaving the data completely unrecoverable. On the other hand, because the system is in a degraded state during data recovery, tenant requests may be blocked, so an overly long data recovery process can seriously affect tenant performance. In general, existing storage management methods cannot achieve both performance isolation and data recovery optimization.
Disclosure of Invention
Aiming at the defects and improvement requirements of the prior art, the invention provides a resource management method and system for coordination performance isolation and data recovery optimization, and aims to shorten the data recovery time on the premise of ensuring the performance requirements of tenants.
To achieve the above object, according to a first aspect of the present invention, there is provided a resource management method for coordinating performance isolation and data recovery optimization, comprising:
at a client of the cloud storage system, allocating storage resources to each tenant according to the tenant's performance requirements while monitoring whether a data recovery request occurs; if so, making the resource allocation meet only the minimum performance requirements of the tenant, lowering the priority of tenant IO requests as long as the storage resources actually allocated to the tenant still meet the tenant's performance requirements, and then sending the tenant IO requests to a storage node; if not, making the resource allocation maximize the utilization of system resources, and then sending the tenant IO requests directly to the storage node;
receiving various requests at a storage node end of the cloud storage system, and scheduling different types of requests according to the priority proportion so as to allocate storage resources to the different types of requests according to the priority proportion;
the request types comprise a tenant IO request and a data recovery request.
When a data recovery request occurs, the storage resources allocated to each tenant meet only the tenant's minimum performance requirements, and the priority of tenant IO requests is lowered as long as the storage resources actually allocated to the tenant still meet the tenant's performance requirements. This ensures that the remaining resources can be allocated to data recovery requests when the storage node schedules requests, and it raises the share of storage resources that data recovery requests receive when requests are scheduled in proportion to priority. More storage resources are therefore allocated to data recovery requests while the tenants' performance requirements remain guaranteed, which shortens the data recovery time and achieves the goal of optimizing data recovery.
Further, when a data recovery request occurs, the resource allocation is made to meet only the minimum performance requirement of the tenant, and the method includes:
creating a token bucket for the virtual block device of each tenant;
if the tenant performance requirement indicates that the tenant requires a fixed throughput guarantee of $T_1$, the token generation rate of the token bucket of the virtual block device is set to $T_1$; if the tenant performance requirement indicates that the tenant requires a minimum throughput guarantee of no less than $T_2$, the token generation rate of the token bucket of the virtual block device is set to $T_2$.
Further, when a data recovery request occurs, the priority of the tenant IO request is reduced under the condition that the storage resource actually allocated to the tenant is greater than the storage resource required by the tenant, and the method includes:
(S1) initializing the lowest priority minW of the tenant IO request to be 1, and initializing the highest priority maxW of the tenant IO request to be the current priority of the tenant IO request;
(S2) adjusting the priority of the tenant IO request to (minW + maxW)/2;
(S3) if the storage resources actually allocated to the tenant can not meet the performance requirement of the tenant, the step is shifted to (S4); if the storage resources actually allocated to the tenant can meet the performance requirement of the tenant and the cloud storage system has residual storage resources, the step (S5) is carried out; if the storage resources actually allocated to the tenants can meet the performance requirements of the tenants and no residual storage resources exist in the cloud storage system, the step (S6) is carried out;
(S4) after raising the lowest priority minW within the range (minW, maxW), proceeding to step (S2);
(S5) after decreasing the highest priority maxW within the range of (minW, maxW), proceeding to step (S2);
(S6) the priority adjustment of the tenant IO request is ended.
By adjusting the priority of the tenant IO request with this window-based adjustment method, the invention realizes dynamic resource allocation and optimizes the data recovery process as much as possible without causing an SLO (Service Level Objective) violation.
Further, in step (S4), the lowest priority minW is raised within the range (minW, maxW), specifically: the lowest priority minW is updated to (minW + maxW)/2;
in step (S5), the highest priority maxW is lowered within the range of (minW, maxW), specifically: the highest priority maxW is updated to (minW + maxW)/2.
According to the invention, the regulation window of the tenant IO request priority is reduced by half every time, so that the priority regulation can reach a stable state more quickly.
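Steps (S1) to (S6) with the halving rule amount to a binary search over the priority value. The following Python sketch is one possible reading, not the patented implementation: the `observe` callback, the state labels, integer priorities with floor division, the `max_rounds` cap, and the exit taken when the window can no longer shrink are all assumptions added for illustration.

```python
from typing import Callable

# Hypothetical labels for the three outcomes distinguished in step (S3).
UNMET, SURPLUS, BALANCED = "unmet", "surplus", "balanced"


def adjust_priority(current_priority: int,
                    observe: Callable[[int], str],
                    max_rounds: int = 32) -> int:
    """Search for the lowest tenant IO priority that still meets the SLO.

    `observe(priority)` is an assumed callback that applies the priority,
    measures the system, and returns UNMET, SURPLUS, or BALANCED.
    """
    min_w, max_w = 1, current_priority            # (S1)
    for _ in range(max_rounds):
        priority = (min_w + max_w) // 2           # (S2), integer priorities assumed
        state = observe(priority)                 # (S3)
        if state == BALANCED:                     # (S6): adjustment ends
            return priority
        if state == UNMET:                        # (S4): raise the lower bound
            min_w = priority
        else:                                     # (S5): lower the upper bound
            max_w = priority
        if max_w - min_w <= 1:
            # Assumed guard: the window cannot shrink further, so stop on
            # the side that is known to satisfy the tenant.
            return max_w if state == UNMET else priority
    return max_w
```

Because the window halves on every round, the loop needs only about log2(maxW) measurements, which matches the claim that the adjustment reaches a stable state quickly.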
Further, the method for judging whether the storage resources actually allocated to the tenant can meet the performance requirement of the tenant comprises the following steps:
according to $CR = (TP_A - TP_N)/TP_N$, calculating the SLO compliance rate CR of the tenants in the current cloud storage system;
if $CR < 0$, judging that the storage resources actually allocated to the tenants cannot meet the performance requirements of the tenants; if $CR > Th$, judging that the storage resources actually allocated to the tenants meet the performance requirements of the tenants and that residual storage resources still exist in the cloud storage system; if $0 < CR < Th$, judging that the storage resources actually allocated to the tenants meet the performance requirements of the tenants and that no residual storage resources exist in the cloud storage system;
wherein $TP_A$ represents the sum of the storage resources actually allocated to the tenants, $TP_N$ represents the minimum sum of storage resources required by the tenants, Th represents a preset threshold, and $Th > 0$.
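The three-way judgment can be written as a small pair of functions. This is a minimal Python sketch under stated assumptions: the function names and state labels are hypothetical, and because the text leaves the boundary cases CR = 0 and CR = Th unspecified, the sketch treats both as "balanced".

```python
def slo_compliance_rate(tp_actual: float, tp_needed: float) -> float:
    """CR = (TP_A - TP_N) / TP_N, with both sums in the same unit (e.g. MB/s)."""
    if tp_needed <= 0:
        raise ValueError("TP_N must be positive")
    return (tp_actual - tp_needed) / tp_needed


def classify(cr: float, th: float) -> str:
    """Map CR onto the three cases used by the priority adjustment."""
    if th <= 0:
        raise ValueError("Th must be greater than 0")
    if cr < 0:
        return "unmet"       # tenants' requirements are not met
    if cr > th:
        return "surplus"     # requirements met, residual resources remain
    return "balanced"        # requirements met, nothing left over (boundaries assumed)
```

The returned label is what a priority-adjustment loop would branch on when choosing between steps (S4), (S5), and (S6).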
Further, when no data recovery request occurs, the resource allocation realizes the maximum utilization of system resources, and the method comprises the following steps:
a token bucket is created for the virtual block device of each tenant, and currently available storage resources of the cloud storage system are obtained;
if the tenant performance requirement indicates that the tenant requires a fixed throughput guarantee of $T_1$, the token generation rate of the token bucket of the virtual block device is set to $T_1$; if the tenant performance requirement indicates that the tenant requires a minimum throughput guarantee of no less than $T_2$, the token generation rate of the token bucket of the virtual block device is set to $T_2$;
after storage resources have been allocated to all tenants, if residual storage resources still exist in the cloud storage system, distributing the residual storage resources proportionally among the tenants that require a minimum throughput guarantee, so that the token generation rate of the token bucket of each corresponding virtual block device is increased in the same proportion;
wherein, the distribution proportion for distributing the residual storage resources is the ratio of the corresponding minimum throughput rates.
When no data recovery request occurs, the invention completes the allocation of storage resources in two rounds: in the first round, the minimum performance requirements of all tenants are guaranteed; in the second round, the remaining storage resources are allocated proportionally to the tenants with a minimum throughput guarantee. The invention can therefore improve the tenants' quality of service and raise tenant performance as much as possible when no data recovery optimization is needed.
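The two-round allocation can be sketched in a few lines of Python. The `Tenant` record, the `allocate` function, and the `recovering` flag are hypothetical names introduced for illustration; the returned values stand for the token generation rates of each tenant's bucket.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Tenant:
    name: str
    rate: float   # required throughput (T1 or T2), e.g. in MB/s
    fixed: bool   # True: fixed-throughput guarantee; False: minimum-throughput guarantee


def allocate(tenants: List[Tenant], available: float,
             recovering: bool = False) -> Dict[str, float]:
    """Return the token generation rate for each tenant's bucket."""
    # Round 1: every tenant receives exactly its required rate.
    rates = {t.name: t.rate for t in tenants}
    if recovering:
        return rates  # during data recovery, allocation stops after round 1

    # Round 2: split what is left among minimum-guarantee tenants,
    # in the ratio of their minimum throughput rates.
    leftover = available - sum(rates.values())
    flexible = [t for t in tenants if not t.fixed]
    total_min = sum(t.rate for t in flexible)
    if leftover > 0 and total_min > 0:
        for t in flexible:
            rates[t.name] += leftover * t.rate / total_min
    return rates
```

With two minimum-guarantee tenants requiring 10 MB/s and 20 MB/s and 60 MB/s available, the sketch yields 20 MB/s and 40 MB/s, the same figures the detailed description works through; with `recovering=True` it stops at 10 MB/s and 20 MB/s.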
Further, at the client, sending the tenant IO request to the storage node, the method includes:
when a tenant IO request is sent, consuming tokens in a token bucket of corresponding virtual block equipment, wherein the number of the consumed tokens is equal to the size of the request;
and if the number of tokens in the token bucket is not enough to serve the tenant IO request, enabling the process initiating the tenant IO request to be dormant until enough tokens are generated in the token bucket.
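The client-side throttling described above is a standard token bucket. The Python sketch below is illustrative only: the `capacity` cap on accumulated tokens is an assumption the text does not mention, and the injectable `clock` and `sleep` parameters exist purely so the behaviour can be exercised without real waiting.

```python
import time


class TokenBucket:
    """Token bucket for one virtual block device (illustrative sketch)."""

    def __init__(self, rate: float, capacity: float,
                 clock=time.monotonic, sleep=time.sleep):
        self.rate = rate            # tokens generated per second (e.g. bytes/s)
        self.capacity = capacity    # assumed upper bound on stored tokens
        self.tokens = 0.0
        self._clock = clock
        self._sleep = sleep
        self._last = clock()

    def _refill(self) -> None:
        now = self._clock()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self._last) * self.rate)
        self._last = now

    def consume(self, size: float) -> None:
        """Block until `size` tokens are available, then take them."""
        if size > self.capacity:
            raise ValueError("request larger than bucket capacity")
        self._refill()
        while self.tokens < size:
            # Sleep just long enough for the missing tokens to be generated.
            self._sleep((size - self.tokens) / self.rate)
            self._refill()
        self.tokens -= size
```

A client would call `consume(request_size)` immediately before sending each tenant IO request, so the calling process sleeps whenever the bucket runs dry.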
Further, at a storage node end of the cloud storage system, receiving various requests, and scheduling the requests of different types according to a priority ratio, wherein the method comprises the following steps:
constructing a request queue for each type of request at a storage node, wherein the priority of the request queue is consistent with that of the requests in the request queue;
and carrying out request scheduling from different queues according to the priority proportion.
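The text does not specify how the storage node turns a priority ratio into a dispatch order. One way to do it, shown here as an assumption, is smooth weighted round-robin over per-type queues; the class and method names are hypothetical, and the example priorities 8 and 3 are the ones the detailed description gives for tenant IO and data recovery after adjustment.

```python
from collections import deque
from typing import Any, Dict, Optional, Tuple


class ProportionalScheduler:
    """Dispatch from per-type queues in proportion to queue priority."""

    def __init__(self, priorities: Dict[str, int]):
        self.priorities = dict(priorities)
        self.queues = {kind: deque() for kind in priorities}
        self.credit = {kind: 0 for kind in priorities}

    def set_priority(self, kind: str, priority: int) -> None:
        self.priorities[kind] = priority

    def submit(self, kind: str, request: Any) -> None:
        self.queues[kind].append(request)

    def dispatch(self) -> Optional[Tuple[str, Any]]:
        ready = [kind for kind, queue in self.queues.items() if queue]
        if not ready:
            return None
        # Smooth weighted round-robin: every backlogged queue earns credit
        # equal to its priority; the richest queue is served and pays back
        # the total credit handed out in this round.
        for kind in ready:
            self.credit[kind] += self.priorities[kind]
        chosen = max(ready, key=lambda kind: self.credit[kind])
        self.credit[chosen] -= sum(self.priorities[kind] for kind in ready)
        return chosen, self.queues[chosen].popleft()
```

With both queues backlogged and priorities 8 and 3, every 11 consecutive dispatches serve 8 tenant IO requests and 3 data recovery requests, interleaved rather than in bursts.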
According to a second aspect of the present invention, there is provided a cloud storage system, including a client and a storage node, the client including: the system comprises a monitoring module, a resource allocation module and a priority regulation module; the storage node comprises a request scheduling module;
the monitoring module is used for monitoring the use condition of storage resources in the cloud storage system and whether a data recovery request occurs;
the resource allocation module is used for allocating storage resources for each tenant according to the performance requirements of the tenant, enabling the resource allocation to only meet the lowest performance requirements of the tenant when a data recovery request occurs, and enabling the resource allocation to realize the maximum utilization of system resources when the data recovery request does not occur;
the priority adjusting module is used for reducing the priority of the tenant IO request under the condition that the storage resources which are actually allocated to the tenant meet the performance requirement of the tenant when the data recovery request occurs;
the resource allocation module is also used for sending the tenant IO request to the storage node;
the request scheduling module is used for receiving various requests and scheduling different types of requests according to the priority proportion so as to distribute the storage resources to the different types of requests according to the priority proportion;
the request types comprise a tenant IO request and a data recovery request.
According to a third aspect of the invention, there is provided a system comprising a computer readable storage medium for storing an executable program and a processor;
the processor is used for reading an executable program stored in a computer readable storage medium and executing the resource management method for coordination performance isolation and data recovery optimization provided by the first aspect of the invention.
Generally, by the above technical solution conceived by the present invention, the following beneficial effects can be obtained:
(1) According to the resource management method and system for coordinating performance isolation and data recovery optimization, when a data recovery request occurs, the storage resources allocated to each tenant meet only the tenant's minimum performance requirements, and the priority of tenant IO requests is lowered as long as the storage resources actually allocated to the tenant still meet the tenant's performance requirements. This ensures that the remaining resources can be allocated to data recovery requests when the storage node schedules requests, and it raises the share of storage resources that data recovery requests receive when requests are scheduled in proportion to priority. More storage resources are therefore allocated to data recovery requests while the tenants' performance requirements remain guaranteed, which shortens the data recovery time and achieves the goal of data recovery optimization.
(2) According to the resource management method and system for coordinating performance isolation and data recovery optimization, the priority of tenant IO requests is adjusted by a window-based adjustment method, which realizes dynamic resource allocation and optimizes the data recovery process as much as possible without causing an SLO (Service Level Objective) violation.
(3) According to the resource management method and system for coordination performance isolation and data recovery optimization, provided by the invention, in the preferred scheme, the regulation window of the tenant IO request priority is reduced by half every time, so that the priority regulation can reach a stable state more quickly.
(4) The resource management method and system for coordinating performance isolation and data recovery optimization provided by the invention complete the allocation of storage resources in two rounds when no data recovery request occurs: in the first round, the minimum performance requirements of all tenants are guaranteed; in the second round, the remaining storage resources are allocated proportionally to the tenants with a minimum throughput guarantee. The invention can therefore improve the tenants' quality of service and raise tenant performance as much as possible when no data recovery optimization is needed.
Drawings
Fig. 1 is a schematic diagram of a resource management method for coordination performance isolation and data recovery optimization and a cloud storage system according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
In the present application, the terms "first," "second," and the like (if any) in the description and the drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
In order to shorten the data recovery time on the premise of guaranteeing the performance requirements of tenants, the resource management method for coordination performance isolation and data recovery optimization provided by the invention, as shown in fig. 1, comprises the following steps:
at a client of the cloud storage system, allocating storage resources to each tenant according to the tenant's performance requirements while monitoring whether a data recovery request occurs; if so, making the resource allocation meet only the minimum performance requirements of the tenant, lowering the priority of tenant IO requests as long as the storage resources actually allocated to the tenant still meet the tenant's performance requirements, and then sending the tenant IO requests to a storage node; if not, making the resource allocation maximize the utilization of system resources, and then sending the tenant IO requests directly to the storage node, in which case the priority of the tenant IO requests is the default priority assigned by the cloud storage system; the performance requirement of a tenant is stored in the metadata of the virtual block device used by the tenant, and it indicates either that the tenant has a fixed throughput requirement or that the tenant has a minimum throughput requirement;
receiving various requests at a storage node end of the cloud storage system, and scheduling different types of requests according to the priority proportion so as to allocate storage resources to the different types of requests according to the priority proportion;
the request types comprise a tenant IO request and a data recovery request.
In traditional resource management methods, the priorities of tenant IO requests and data recovery requests are usually fixed, and the priority assigned to tenant IO requests is much higher than that assigned to data recovery requests. For example, in a Ceph system the priority of tenant IO requests is 63 and the priority of data recovery requests is 3. When data recovery requests and tenant IO requests compete for resources, storage resources are therefore allocated preferentially to tenant IO requests, and the data recovery time is long. In the resource management method of the invention, when a data recovery request occurs, the storage resources allocated to each tenant meet only the tenant's minimum performance requirements, and the priority of tenant IO requests is lowered as long as the storage resources actually allocated to the tenant still meet the tenant's performance requirements. For example, in one application example of the invention, the priority of data recovery requests remains 3 while the priority of tenant IO requests is finally adjusted to 8. This ensures that the remaining resources can be allocated to data recovery requests when the storage node schedules requests, and it raises the share of storage resources that data recovery requests receive when requests are scheduled in proportion to priority. More storage resources are therefore allocated to data recovery requests while the tenants' performance requirements remain guaranteed, which shortens the data recovery time and achieves the goal of optimizing data recovery.
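To see what the priority change means under proportional scheduling, the share of dispatches going to data recovery can be worked out from the two priority pairs above. The calculation assumes, purely for illustration, a single tenant IO queue and a single recovery queue that are both backlogged, with the share taken as priority divided by the sum of priorities; the symbols $p_{\text{rec}}$ and $p_{\text{io}}$ are introduced here and do not appear in the original text.

```latex
\[
\text{share}_{\text{rec}} = \frac{p_{\text{rec}}}{p_{\text{rec}} + p_{\text{io}}}
\]
\[
\text{fixed priorities: } \frac{3}{3 + 63} = \frac{3}{66} \approx 4.5\%
\qquad
\text{after adjustment: } \frac{3}{3 + 8} = \frac{3}{11} \approx 27.3\%
\]
```

Under these assumptions, lowering the tenant IO priority from 63 to 8 gives data recovery a share of scheduling slots that is six times larger.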
In an optional embodiment, in the resource management method for coordinating performance isolation and data recovery optimization, when a data recovery request occurs, resource allocation is made to meet only the minimum performance requirement of a tenant, and the method includes:
creating a token bucket for the virtual block device of each tenant;
if the tenant performance requirement indicates that the tenant requires a fixed throughput guarantee of $T_1$, the token generation rate of the token bucket of the virtual block device is set to $T_1$; if the tenant performance requirement indicates that the tenant requires a minimum throughput guarantee of no less than $T_2$, the token generation rate of the token bucket of the virtual block device is set to $T_2$; for example, if the performance requirement of tenant 1 indicates a minimum throughput guarantee of no less than 10 MB/s, the performance requirement of tenant 2 indicates a minimum throughput guarantee of no less than 20 MB/s, and the currently available storage resources are 60 MB/s, then the allocation of storage resources ends once 10 MB/s and 20 MB/s of storage bandwidth have been allocated to tenant 1 and tenant 2 respectively, and the system still has 30 MB/s of idle bandwidth after the allocation;
when a data recovery request occurs, the priority of the tenant IO request is reduced under the condition that the storage resource which is actually allocated to the tenant is ensured to be larger than the storage resource required by the tenant, the adopted method is a window-based adjustment method, and the method specifically comprises the following steps:
(S1) initializing the lowest priority minW of the tenant IO request to be 1, and initializing the highest priority maxW of the tenant IO request to be the current priority of the tenant IO request;
(S2) adjusting the priority of the tenant IO request to (minW + maxW)/2;
(S3) if the storage resources actually allocated to the tenant can not meet the performance requirement of the tenant, the step is shifted to (S4); if the storage resources actually allocated to the tenant can meet the performance requirement of the tenant and the cloud storage system has residual storage resources, the step (S5) is carried out; if the storage resources actually allocated to the tenants can meet the performance requirements of the tenants and no residual storage resources exist in the cloud storage system, the step (S6) is carried out;
(S4) after raising the lowest priority minW within the range (minW, maxW), proceeding to step (S2);
preferably, the lowest priority minW is raised within the range (minW, maxW) as follows: the lowest priority minW is updated to (minW + maxW)/2;
(S5) after decreasing the highest priority maxW within the range of (minW, maxW), proceeding to step (S2);
preferably, the highest priority maxW is lowered in the range of (minW, maxW), in a specific manner: updating the highest priority maxW to (minW + maxW)/2;
(S6) the priority adjustment of the tenant IO request ends;
after the adjustment, the resource management method for coordinating performance isolation and data recovery optimization ensures that the performance requirements of the tenants are just met and that no residual storage resources remain in the system, so the priority of tenant IO requests is lowered as far as possible while the tenants' performance requirements are guaranteed, and the share of storage resources allocated to data recovery requests during scheduling is as large as possible; that is, adjusting the priority of tenant IO requests with the window-based adjustment method realizes dynamic resource allocation and optimizes the data recovery process as much as possible without causing an SLO (Service Level Objective) violation; in a preferred embodiment, halving the adjustment window of the tenant IO request priority each time lets the priority adjustment reach a stable state faster;
in the window-based adjustment method, the method for judging whether the storage resources actually allocated to the tenant can meet the performance requirement of the tenant comprises the following steps:
according to $CR = (TP_A - TP_N)/TP_N$, calculating the current SLO compliance rate CR of the cloud storage system; wherein $TP_A$ represents the sum of the storage resources actually allocated to the tenants, and $TP_N$ represents the minimum sum of storage resources required by the tenants;
if $CR < 0$, judging that the storage resources actually allocated to the tenants cannot meet the performance requirements of the tenants; if $CR > Th$, judging that the storage resources actually allocated to the tenants meet the performance requirements of the tenants and that residual storage resources still exist in the cloud storage system; if $0 < CR < Th$, judging that the storage resources actually allocated to the tenants meet the performance requirements of the tenants and that no residual storage resources exist in the cloud storage system;
wherein Th represents a preset threshold and $Th > 0$; the specific value of the threshold Th can be determined according to the actual tenant performance requirements and the fault condition of the system, so that the data recovery time is shortened as far as possible while the risk of an SLO (Service Level Objective) violation is kept low; in this embodiment, the threshold Th is set to 0.25.
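A short worked example may make the three cases concrete. It takes $TP_N = 30$ MB/s (the sum of the 10 MB/s and 20 MB/s minimum requirements used elsewhere in this description) and $Th = 0.25$; the three values of $TP_A$ are hypothetical measurements chosen only for illustration.

```latex
\[
TP_A = 27:\quad CR = \frac{27 - 30}{30} = -0.1 < 0
\quad\Rightarrow\quad \text{requirements not met, go to (S4)}
\]
\[
TP_A = 45:\quad CR = \frac{45 - 30}{30} = 0.5 > Th
\quad\Rightarrow\quad \text{residual resources remain, go to (S5)}
\]
\[
TP_A = 36:\quad CR = \frac{36 - 30}{30} = 0.2,\quad 0 < 0.2 < Th
\quad\Rightarrow\quad \text{adjustment ends, (S6)}
\]
```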
In an optional embodiment, when no data recovery request occurs, the resource management method for coordinating performance isolation and data recovery optimization makes the resource allocation maximize the utilization of system resources, which includes:
a token bucket is created for the virtual block device of each tenant, and currently available storage resources of the cloud storage system are obtained;
if the tenant performance requirement indicates that the tenant requires a fixed throughput guarantee of size $T_1$, the token generation rate of the token bucket of the virtual block device is set to $T_1$; if the tenant performance requirement indicates that the tenant requires a minimum throughput guarantee of not less than $T_2$, the token generation rate of the token bucket of the virtual block device is set to $T_2$;
after storage resources have been allocated to all tenants, if residual storage resources still exist in the cloud storage system, the residual storage resources are distributed proportionally among the tenants that require a minimum throughput guarantee, so that the token generation rates of the token buckets of the corresponding virtual block devices are increased in the same proportion;
wherein the residual storage resources are distributed in the ratio of the corresponding minimum throughput rates; for example, if the performance requirement of tenant 1 indicates a minimum throughput guarantee of not less than 10 MB/s, the performance requirement of tenant 2 indicates a minimum throughput guarantee of not less than 20 MB/s, and the currently available storage resource is 60 MB/s, then after tenant 1 and tenant 2 have been allocated storage bandwidths of 10 MB/s and 20 MB/s respectively, the system still has an idle bandwidth of 30 MB/s; this idle bandwidth is distributed in the ratio 10 MB/s : 20 MB/s = 1:2, that is, a further 10 MB/s to tenant 1 and 20 MB/s to tenant 2, so that after allocation the storage bandwidths allocated to tenant 1 and tenant 2 are 20 MB/s and 40 MB/s respectively;
according to the resource management method for coordinating performance isolation and data recovery optimization, when no data recovery request occurs, the allocation of storage resources is completed in two rounds: in the first round, the minimum performance requirements of all tenants are guaranteed; in the second round, the residual storage resources are distributed proportionally among the tenants that require a minimum throughput guarantee; therefore, when no data recovery optimization is needed, the quality of service for tenants is improved as much as possible and the resource utilization is maximized.
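The two-round allocation can be sketched in Python as follows. This is an illustrative sketch only: the `Requirement` class, the `allocate` function, and the tenant names are hypothetical, and the returned rates stand for the token generation rates of the tenants' token buckets in MB/s.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Requirement:
    kind: str    # "fixed" for a fixed guarantee T1, "min" for a minimum guarantee T2
    rate: float  # required throughput in MB/s


def allocate(reqs: Dict[str, Requirement], available: float) -> Dict[str, float]:
    """Return the token generation rate (MB/s) for each tenant."""
    # Round 1: every tenant receives exactly its required rate.
    rates = {tenant: req.rate for tenant, req in reqs.items()}
    residual = available - sum(rates.values())

    # Round 2: residual bandwidth goes only to tenants with a minimum
    # guarantee, in the ratio of their minimum throughput rates.
    min_total = sum(req.rate for req in reqs.values() if req.kind == "min")
    if residual > 0 and min_total > 0:
        for tenant, req in reqs.items():
            if req.kind == "min":
                rates[tenant] += residual * req.rate / min_total
    return rates


if __name__ == "__main__":
    demo = {
        "tenant1": Requirement("min", 10.0),
        "tenant2": Requirement("min", 20.0),
    }
    print(allocate(demo, 60.0))  # {'tenant1': 20.0, 'tenant2': 40.0}
```

The demo reproduces the example in the text: 60 MB/s available, minimum guarantees of 10 MB/s and 20 MB/s, and final rates of 20 MB/s and 40 MB/s.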
In an optional embodiment, sending a tenant IO request from the client to a storage node includes:
when a tenant IO request is sent, consuming tokens from the token bucket of the corresponding virtual block device, wherein the number of tokens consumed is equal to the size of the request;
and if the number of tokens in the token bucket is not enough to serve the tenant IO request, putting the process that initiated the tenant IO request to sleep until enough tokens have been generated in the token bucket.
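A client-side token bucket of this kind might look like the following Python sketch. It is a simplified illustration under stated assumptions: the class name `TokenBucket`, the `capacity` cap on accumulated tokens, the initially empty bucket, and the injectable `clock` and `sleep` parameters are all hypothetical and are not specified by the text.

```python
import threading
import time


class TokenBucket:
    """Token bucket for one virtual block device (tokens counted in MB)."""

    def __init__(self, rate, capacity=None, clock=time.monotonic, sleep=time.sleep):
        self.rate = float(rate)  # tokens generated per second
        # Assumption: by default the bucket holds at most one second of tokens.
        self.capacity = float(capacity) if capacity is not None else self.rate
        self.tokens = 0.0        # assumption: the bucket starts empty
        self._clock = clock
        self._sleep = sleep
        self._last = clock()
        self._lock = threading.Lock()

    def _refill(self):
        now = self._clock()
        gained = (now - self._last) * self.rate
        self.tokens = min(self.capacity, self.tokens + gained)
        self._last = now

    def consume(self, size):
        """Take `size` tokens, sleeping until enough have been generated."""
        if size > self.capacity:
            raise ValueError("request is larger than the bucket capacity")
        while True:
            with self._lock:
                self._refill()
                if self.tokens >= size:
                    self.tokens -= size
                    return
                wait = (size - self.tokens) / self.rate
            # Sleep outside the lock so other requests can still refill.
            self._sleep(wait)
```

A caller would invoke `bucket.consume(request_size)` immediately before sending each tenant IO request, so that the sending process blocks whenever the tenant exceeds its configured rate.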
In an optional embodiment, receiving the various types of requests at the storage node of the cloud storage system and scheduling the different types of requests according to the priority ratio includes:
constructing, at the storage node, a request queue for each type of request, wherein the priority of each request queue is the same as the priority of the requests in that queue;
scheduling requests from the different queues according to the priority ratio; the specific request scheduling mechanism is determined by the particular cloud storage system; for example, in a Ceph system, request scheduling is realized by creating a token bucket for each request queue, so scheduling according to the priority ratio of the queues specifically means that the ratio of the token generation rates of the token buckets of the different queues is consistent with the ratio of the priorities of the request queues; the request scheduling mechanisms of other cloud storage systems are not listed one by one here; it should be understood that, after requests are scheduled from the different queues according to the priority ratio, the proportion of the storage resources allocated to each request queue is consistent with the priority ratio of the request queues, that is, the storage resources are allocated to the different types of requests according to the priority ratio.
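The relationship between queue priorities and per-queue token generation rates can be made concrete with a short Python sketch. It only illustrates the proportional split described above and does not reproduce the scheduler of Ceph or of any other system; the function name `queue_token_rates`, the queue names, and the priority values are hypothetical.

```python
from typing import Dict


def queue_token_rates(priorities: Dict[str, int], total_rate: float) -> Dict[str, float]:
    """Split `total_rate` across request queues in the ratio of their priorities."""
    total_priority = sum(priorities.values())
    if total_priority <= 0:
        raise ValueError("at least one queue needs a positive priority")
    return {
        queue: total_rate * priority / total_priority
        for queue, priority in priorities.items()
    }


if __name__ == "__main__":
    # Hypothetical values: tenant IO at priority 12, data recovery at priority 4,
    # and 100 MB/s of storage bandwidth at the node.
    print(queue_token_rates({"tenant_io": 12, "recovery": 4}, 100.0))
```

With these hypothetical values the tenant IO queue receives 75 MB/s and the recovery queue 25 MB/s; lowering the tenant IO priority shifts bandwidth toward data recovery.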
Corresponding to the resource management method for coordinating performance isolation and data recovery optimization, the present invention further provides a cloud storage system, as shown in Fig. 1, including a client and a storage node, where the client includes a monitoring module, a resource allocation module, and a priority adjustment module, and the storage node includes a request scheduling module;
the monitoring module is used for monitoring the usage of storage resources in the cloud storage system and whether a data recovery request occurs;
the resource allocation module is used for allocating storage resources to each tenant according to the tenant's performance requirements, so that the resource allocation only meets the minimum performance requirements of the tenants when a data recovery request occurs, and maximizes the utilization of system resources when no data recovery request occurs;
the priority adjustment module is used for reducing the priority of the tenant IO requests when a data recovery request occurs, while ensuring that the storage resources actually allocated to the tenants still meet the tenants' performance requirements;
the resource allocation module is also used for sending the tenant IO requests to the storage node;
the request scheduling module is used for receiving various requests and scheduling different types of requests according to the priority proportion so as to distribute the storage resources to the different types of requests according to the priority proportion;
the request types comprise a tenant IO request and a data recovery request;
in the embodiment of the present invention, the detailed implementation of each module may refer to the description of the method embodiment described above, and will not be repeated here.
The invention also provides a system comprising a computer-readable storage medium and a processor, the computer-readable storage medium for storing an executable program;
the processor is used for reading an executable program stored in the computer readable storage medium and executing the resource management method for coordinating performance isolation and data recovery optimization.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.
Claims (10)
1. A method for resource management with coordination of performance isolation and data recovery optimization, comprising:
at a client of the cloud storage system, distributing storage resources for each tenant according to tenant performance requirements, simultaneously monitoring whether a data recovery request occurs, if so, enabling the resource distribution to only meet the lowest performance requirements of the tenant, reducing the priority of a tenant IO request under the condition that the storage resources which are distributed to the tenant at the time are guaranteed to meet the tenant performance requirements, and then sending the tenant IO request to a storage node; if not, the resource allocation is enabled to realize the maximum utilization of the system resources, and then the tenant IO request is directly sent to the storage node;
receiving various requests at a storage node end of the cloud storage system, and scheduling different types of requests according to the priority proportion so as to allocate storage resources to the different types of requests according to the priority proportion;
the request types comprise a tenant IO request and a data recovery request.
2. The method of resource management for coordination performance isolation and data recovery optimization according to claim 1, wherein when a data recovery request occurs, resource allocation is made to meet only a minimum performance requirement of a tenant, the method comprising:
creating a token bucket for the virtual block device of each tenant;
if the tenant performance requirement indicates that the tenant requires a fixed throughput guarantee of size $T_1$, setting the token generation rate of the token bucket of the virtual block device to $T_1$; if the tenant performance requirement indicates that the tenant requires a minimum throughput guarantee of not less than $T_2$, setting the token generation rate of the token bucket of the virtual block device to $T_2$.
3. The method for resource management with coordination performance isolation and data recovery optimization according to claim 2, wherein when a data recovery request occurs, the priority of the tenant IO request is reduced under the condition that the storage resource actually allocated to the tenant is greater than the storage resource required by the tenant, and the method comprises:
(S1) initializing the lowest priority minW of the tenant IO request to be 1, and initializing the highest priority maxW of the tenant IO request to be the current priority of the tenant IO request;
(S2) adjusting the priority of the tenant IO request to (minW + maxW)/2;
(S3) if the storage resources actually allocated to the tenant can not meet the performance requirement of the tenant, the step is shifted to (S4); if the storage resources actually allocated to the tenant can meet the performance requirement of the tenant and the cloud storage system has residual storage resources, the step (S5) is carried out; if the storage resources actually allocated to the tenant can meet the performance requirement of the tenant and no residual storage resources exist in the cloud storage system, the step (S6) is carried out;
(S4) after increasing the lowest priority minW within the range of (minW, maxW), proceeding to step (S2);
(S5) after decreasing the highest priority maxW within the range of (minW, maxW), proceeding to step (S2);
(S6) the priority adjustment of the tenant IO request is ended.
4. The method for resource management with coordination performance isolation and data recovery optimization according to claim 3, wherein said step (S4) is to increase said lowest priority minW within the range of (minW, maxW) by: updating the lowest priority minW to (minW + maxW)/2;
in the step (S5), the highest priority maxW is lowered within the range of (minW, maxW), specifically: and updating the highest priority maxW to (minW + maxW)/2.
5. The method for resource management with coordination performance isolation and data recovery optimization according to claim 3, wherein the method for determining whether the storage resources actually allocated to the tenant can meet the performance requirement of the tenant comprises:
calculating the SLO compliance rate CR of the tenants in the current cloud storage system according to $CR = (TP_A - TP_N)/TP_N$;
if CR < 0, judging that the storage resources actually allocated to the tenants cannot meet the performance requirements of the tenants; if CR > Th, judging that the storage resources actually allocated to the tenants can meet the performance requirements of the tenants and that residual storage resources still exist in the cloud storage system; if 0 < CR < Th, judging that the storage resources actually allocated to the tenants can meet the performance requirements of the tenants and that no residual storage resources exist in the cloud storage system;
wherein $TP_A$ represents the sum of storage resources actually allocated to the tenants, $TP_N$ represents the minimum sum of storage resources required by the tenants, Th represents a preset threshold, and Th > 0.
6. The method of resource management for coordination performance isolation and data recovery optimization according to claim 1, wherein said method for maximizing utilization of system resources for resource allocation in the absence of a data recovery request comprises:
creating a token bucket for the virtual block device of each tenant, and obtaining currently available storage resources of the cloud storage system;
if the tenant performance requirement indicates that the tenant requires a fixed throughput guarantee of size $T_1$, setting the token generation rate of the token bucket of the virtual block device to $T_1$; if the tenant performance requirement indicates that the tenant requires a minimum throughput guarantee of not less than $T_2$, setting the token generation rate of the token bucket of the virtual block device to $T_2$;
after storage resources have been allocated to all tenants, if residual storage resources still exist in the cloud storage system, distributing the residual storage resources proportionally among the tenants that require a minimum throughput guarantee, so that the token generation rates of the token buckets of the corresponding virtual block devices are increased in the same proportion;
wherein, the distribution proportion for distributing the residual storage resources is the ratio of the corresponding minimum throughput rates.
7. The method for resource management with coordination performance isolation and data recovery optimization according to any one of claims 1-4, wherein at said client, a tenant IO request is sent to a storage node, the method comprising:
when a tenant IO request is sent, consuming tokens from the token bucket of the corresponding virtual block device, wherein the number of tokens consumed is equal to the size of the request;
and if the number of tokens in the token bucket is not enough to serve the tenant IO request, putting the process that initiated the tenant IO request to sleep until enough tokens have been generated in the token bucket.
8. The method for resource management with coordination performance isolation and data recovery optimization according to any one of claims 1-4, wherein each type of request is received at a storage node of the cloud storage system, and different types of requests are scheduled according to a priority ratio, and the method comprises:
constructing a request queue for each type of request at the storage node, wherein the priority of the request queue is consistent with that of the requests in the request queue;
and carrying out request scheduling from different queues according to the priority proportion.
9. A cloud storage system comprising a client and a storage node, wherein the client comprises a monitoring module, a resource allocation module, and a priority adjustment module; the storage node comprises a request scheduling module;
the monitoring module is used for monitoring the usage of the storage resources in the cloud storage system and whether a data recovery request occurs;
the resource allocation module is used for allocating storage resources to each tenant according to the tenant's performance requirements, so that the resource allocation only meets the minimum performance requirements of the tenants when a data recovery request occurs, and maximizes the utilization of system resources when no data recovery request occurs;
the priority adjustment module is used for reducing the priority of the tenant IO requests when a data recovery request occurs, while ensuring that the storage resources actually allocated to the tenants still meet the tenants' performance requirements; the resource allocation module is further configured to send the tenant IO requests to the storage node;
the request scheduling module is used for receiving various requests and scheduling different types of requests according to the proportion of the priority so as to allocate storage resources to the different types of requests according to the proportion of the priority;
the request types comprise a tenant IO request and a data recovery request.
10. A system comprising a computer-readable storage medium and a processor, wherein the computer-readable storage medium is configured to store an executable program;
the processor is configured to read an executable program stored in the computer-readable storage medium and execute the resource management method for coordination performance isolation and data recovery optimization according to any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911100053.XA CN110955522B (en) | 2019-11-12 | 2019-11-12 | Resource management method and system for coordination performance isolation and data recovery optimization |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911100053.XA CN110955522B (en) | 2019-11-12 | 2019-11-12 | Resource management method and system for coordination performance isolation and data recovery optimization |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110955522A (en) | 2020-04-03 |
CN110955522B (en) | 2022-10-14 |
Family
ID=69977228
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911100053.XA Active CN110955522B (en) | 2019-11-12 | 2019-11-12 | Resource management method and system for coordination performance isolation and data recovery optimization |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110955522B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113687798A (en) * | 2021-10-26 | 2021-11-23 | 苏州浪潮智能科技有限公司 | Method, device and equipment for controlling data reconstruction and readable medium |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030182348A1 (en) * | 2002-03-21 | 2003-09-25 | James Leong | Method and apparatus for runtime resource deadlock avoidance in a raid system |
CN103136056A (en) * | 2013-03-04 | 2013-06-05 | 浪潮电子信息产业股份有限公司 | Cloud computing platform scheduling method |
US20140250440A1 (en) * | 2013-03-01 | 2014-09-04 | Adaptive Computing Enterprises, Inc. | System and method for managing storage input/output for a compute environment |
US20160283274A1 (en) * | 2015-03-27 | 2016-09-29 | Commvault Systems, Inc. | Job management and resource allocation |
CN106484536A (en) * | 2016-09-30 | 2017-03-08 | 杭州朗和科技有限公司 | A kind of I O scheduling method, device and equipment |
CN107249035A (en) * | 2017-06-28 | 2017-10-13 | 重庆大学 | A kind of shared repeated data storage of hierarchical dynamically changeable and reading mechanism |
CN107534583A (en) * | 2015-04-30 | 2018-01-02 | 华为技术有限公司 | The application drive and adaptive unified resource management of data center with multiple resource schedulable unit (MRSU) |
US20180060176A1 (en) * | 2016-08-29 | 2018-03-01 | Vmware, Inc. | Tiered backup archival in multi-tenant cloud computing system |
CN108337109A (en) * | 2017-12-28 | 2018-07-27 | 中兴通讯股份有限公司 | A kind of resource allocation methods and device and resource allocation system |
CN109992418A (en) * | 2019-03-25 | 2019-07-09 | 华南理工大学 | The multi-tenant big data platform resource priority level scheduling method and system of SLA perception |
Also Published As
Publication number | Publication date |
---|---|
CN110955522B (en) | 2022-10-14 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||