CN113127230A - Dynamic resource regulation and control method and system for sensing storage back-end tail delay SLO - Google Patents


Info

Publication number
CN113127230A
CN113127230A
Authority
CN
China
Prior art keywords
tenant
window
request
cpu
slo
Prior art date
Legal status
Granted
Application number
CN202110399392.3A
Other languages
Chinese (zh)
Other versions
CN113127230B
Inventor
马留英 (Ma Liuying)
刘振青 (Liu Zhenqing)
熊劲 (Xiong Jin)
蒋德钧 (Jiang Dejun)
Current Assignee
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS filed Critical Institute of Computing Technology of CAS
Priority to CN202110399392.3A (granted as CN113127230B)
Priority to PCT/CN2021/100821 (published as WO2022217739A1)
Publication of CN113127230A
Application granted
Publication of CN113127230B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/54Interprogram communication
    • G06F9/546Message passing systems or structures, e.g. queues
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083Techniques for rebalancing the load in a distributed system
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/54Indexing scheme relating to G06F9/54
    • G06F2209/548Queue


Abstract

The invention provides a method for dynamically regulating the resources of the storage back end of a distributed storage system in which multiple LC tenants and BE tenants share the storage back end. Each LC tenant has a request queue and a number of CPU cores N_i for that queue; the request queue is divided in window units, N_i being the number of CPU cores allocated for window i. At varying frequencies, all current requests of an LC tenant's request queue are taken as a temporary window; for each temporary window, the number of requests QL_t and the queuing time TW_t of the temporary window's first request are obtained; based on QL_t and TW_t, the number of CPU cores N_t required for the current temporary window is determined; and the LC tenant's number of CPU cores is adjusted according to N_t and the number of CPU cores N_i of the current window. Based on embodiments of the invention, the bandwidth of the BE tenants can be maximized, and CPU resources can be increased rapidly and accurately to avoid situations in which an anomaly causes the target SLO to go unmet: anomalies are detected at appropriate times, and CPU resources are recalculated and reallocated.

Description

Dynamic resource regulation and control method and system for sensing storage back-end tail delay SLO
Technical Field
The invention relates to the field of back-end storage for distributed storage systems, and in particular to techniques for guaranteeing the tail-latency requirements of latency-critical tenants when multiple tenants share the storage back end.
Background
In order to improve resource utilization at the storage back end of a distributed storage system, tenants of different types generally share the storage back end. These tenants can generally be divided into two categories. One category consists of latency-critical (LC) tenants with an explicit Service Level Objective (SLO) on latency, namely a 99th/99.9th-percentile tail-latency requirement, for example a latency-sensitive tenant whose 99.9th-percentile tail latency must not exceed 5 ms; the request granularity of this category is small (e.g., 4 KB). The other category consists of best-effort (BE) tenants that can run in the background with no explicit performance requirement; the request granularity of this category is large (e.g., 64 KB or more). When LC tenants and BE tenants share the storage back end of a distributed storage system, obvious contention for resources (such as threads and CPU cores) prevents the LC tenants' tail-latency requirements from being met and leaves the BE tenants' bandwidth low.
In the scenario where multiple LC tenants and multiple BE tenants share the storage back end of a distributed storage system, the objective is to guarantee the LC tenants' target SLOs while maximizing the BE tenants' bandwidth, so as to improve the resource utilization of the storage back end. A great deal of prior work pursues this objective; it can be divided into four categories.
The first category adopts a shared-thread model and considers the target SLO requirement of the LC tenant, guaranteeing it through request scheduling (e.g., priority scheduling). Methods of this kind adjust the thread-resource sharing ratio between LC and BE tenants based on feedback from historical tail-latency information; determine the highest sending rate and the priority of the LC tenant by analyzing load characteristics offline; or analyze different read/write accesses to the storage device offline and combine this with flow-control-based priority scheduling to guarantee the target SLO requirement of the LC tenant.
The second category dynamically partitions CPU resources among different tenants while considering the target SLO requirements of the LC tenants. These methods monitor historical tail-latency information, compare it with the target SLO requirement of the LC tenant, and apply a trial-and-error incremental core-allocation strategy at fixed time intervals.
The third category dynamically partitions CPU resources with the goal of minimizing the tail latency of LC tenants. These approaches do not consider the target SLO requirements of LC tenants and only reallocate core resources at fixed time intervals based on information such as core-resource usage, request-queue length, and real-time load.
The fourth category first quantifies the real-time load of the LC tenant using a request window, then estimates the number of CPU cores the LC tenant requires by jointly considering its target SLO requirement and real-time load, and dynamically regulates CPU core resources for the LC tenant on a per-window basis; see Chinese patent application No. 202010139287.1.
All four categories of prior art have obvious shortcomings with respect to the goal of guaranteeing the target SLO requirements of LC tenants while maximizing the bandwidth of BE tenants in the multi-tenant shared storage back-end scenario, specifically:
For the first category, because a shared-thread model is adopted, requests of LC and BE tenants are dynamically scheduled according to the LC tenant's target SLO requirement, but a thread needs CPU resources before it can be scheduled to process requests, and since there is no explicit binding between threads and CPU cores, threads compete for CPU cores at all times. Contention for CPU cores severely affects latency, especially tail latency, so the target SLO requirement of the LC tenant may not be met. Eliminating the influence of CPU core contention on tail latency requires allocating more resources to the LC tenant, which lowers resource utilization. In addition, regulation based on offline analysis of historical information or of a tenant's request-access trace has clear limitations: (1) historical information only reflects performance during a past period and has no clear correlation with future performance; when an LC tenant's requests burst, feedback-based regulation cannot detect and respond to the burst in time, so feedback regulation based on historical information can hardly meet future targets; (2) if a tenant's access pattern changes so that it differs from the offline trace, trace-based regulation cannot necessarily guarantee that tenant's target requirement. Both factors make the target SLO requirements of LC tenants hard to meet.
For the second category, although CPU cores are dynamically partitioned between LC and BE tenants, avoiding core contention, these methods converge only gradually to a core count that satisfies the target SLO, via incremental allocation based on the gap between historical tail latency and the target SLO and on real-time load. First, core regulation based on historical information is inaccurate. Second, incremental core allocation needs a long time (on the order of seconds) to converge to a suitable core count, which cannot satisfy millisecond-level target SLOs; during convergence, moreover, the latency (especially tail latency) of the LC tenant keeps suffering from insufficient CPU cores. Finally, these works regulate at fixed time intervals, which makes the target SLO of the LC tenant hard to guarantee.
For the third category, the tail latency of the LC tenant is minimized by considering its real-time load and adjusting CPU cores at fixed time intervals. This regulation is independent of the target SLO, which raises two problems. First, the regulation interval needed to minimize tail latency differs across LC tenants and depends on their load characteristics, so with multiple LC tenants a reasonable interval is hard to determine. Second, allocating cores to minimize tail latency lowers resource utilization: when an LC tenant's target SLO is loose, the method, being unaware of the SLO, still occupies extra CPU cores to minimize that tenant's tail latency, leaving the BE tenants with very low bandwidth.
The second and third categories both require an additional core dedicated to regulating CPU core resources, which clearly wastes resources. Moreover, although they consider real-time load when regulating CPU cores, they ignore the influence of I/O fluctuation of the underlying storage device on core allocation. In a distributed storage system's back end, requests inevitably access the underlying storage device, whose service time fluctuates significantly, especially when the device is an SSD. Therefore, dynamic CPU core allocation must also account for service-time fluctuation of the underlying device in order to guarantee the target SLOs of LC tenants while maximizing BE tenant bandwidth.
For the fourth category, although the number of CPU cores required by LC tenants is estimated by jointly considering target SLO requirements and real-time load, and CPU resources are dynamically regulated per request window, the estimation lacks a theoretical basis, and the same core-allocation strategy is applied to tenants with different target SLO requirements without distinguishing among them, lowering resource utilization. Meanwhile, when CPU cores are re-regulated within a request window, the enqueue and dequeue rates of requests must be monitored to determine whether a load burst or a service-time fluctuation of the underlying storage device has occurred; this computation is complex and insufficiently accurate, so the target SLO of an LC tenant may be violated or the bandwidth of the BE tenants may be low.
Disclosure of Invention
The present invention is directed to the above problems. According to a first aspect of the present invention, a method for dynamically regulating resources of the storage back end of a distributed storage system is provided, in which a plurality of LC tenants share the storage back end; each LC tenant has a request queue and a number of CPU cores N_i for the request queue, the access requests in each request queue are divided in window units, and N_i is the number of CPU cores allocated to window i. The method comprises the following steps:
step 100: taking all current requests of a request queue of each LC tenant as a temporary window;
step 200: obtaining, for each temporary window, the number of requests QL_t and the queuing time TW_t of the first request of the temporary window;
step 300: determining, based on QL_t and TW_t, the number of CPU cores N_t required for the current temporary window;
step 400: adjusting the number of CPU cores of the LC tenant according to the required number of CPU cores N_t and the number of CPU cores N_i of the current window.
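The four claimed steps can be sketched as follows; this is a hypothetical illustration (the function names, and the use of seconds for all times, are not from the patent):

```python
import math

def required_cores(ql_t, tw_t, t_slo, tail_io, t_avg_io):
    """Steps 200-300: cores N_t needed so all QL_t queued requests dequeue
    within the remaining SLO slack (T_slo - Tail_io - TW_t)."""
    dr_avg_t = ql_t / (t_slo - tail_io - tw_t)   # average dequeue rate
    return math.ceil(dr_avg_t * t_avg_io)        # Little's Law: N = DR * T

def regulate(ql_t, tw_t, n_i, t_slo, tail_io, t_avg_io):
    """Steps 100 and 400: snapshot the queue as a temporary window and return
    the core adjustment (positive means preempt cores from BE tenants)."""
    return required_cores(ql_t, tw_t, t_slo, tail_io, t_avg_io) - n_i
```

For instance, with QL_t = 8 requests, TW_t = 1 ms, T_slo = 5 ms, Tail_io = 2 ms, and T_avg_io = 0.5 ms, the slack is 2 ms, the required dequeue rate is 4 requests/ms, and N_t = ceil(4 × 0.5) = 2 cores.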
In one embodiment of the present invention, step 300 comprises:
calculating the number of CPU cores N_t required for the temporary window using the following formula:

N_t = ceil(DR_avg_t × T_avg_io)

where T_avg_io is the average service time of requests and DR_avg_t is the average dequeue rate of requests within the temporary window, calculated using the following formula:

DR_avg_t = QL_t / (T_slo - Tail_io - TW_t)

where QL_t denotes the number of requests in the temporary window, TW_t denotes the queuing time of the first request in the temporary window, T_slo is the target SLO requirement for the tail latency of the LC tenant, and Tail_io is the tail latency of the service time.
In one embodiment of the present invention, wherein the distributed storage system further comprises BE tenants, step 400 comprises:
if N_t > N_i, preempting N_t - N_i CPU cores from the CPU core resources occupied by the BE tenants and adding those N_t - N_i CPU cores to the LC tenant.
In one embodiment of the invention, resources are dynamically regulated using one of the following policies:
a conservative policy, which detects and reallocates CPU core resources only at the beginning of a window;
an aggressive policy, which performs dynamic resource regulation each time a request within a window is dequeued; and
an SLO-aware policy, which, for different SLO requirements, uses temporary windows to detect and reallocate CPU core resources each time a given number of requests has been dequeued.
In one embodiment of the present invention, in the SLO-aware policy, a temporary window is used to detect and reallocate CPU core resources each time budget requests have been dequeued, where

budget = floor((T_slo - Tail_io - TW_i) / T_avg_io)

where T_slo is the target SLO requirement of the LC tenant, Tail_io is the tail latency of the service time, T_avg_io is the average service time of requests, and TW_i is the queuing time of the first request of the current window W_i.
In an embodiment of the present invention, a policy is dynamically selected for the LC tenant according to three thresholds, a window threshold THRESH_WIN, a low threshold THRESH_LOW, and a high threshold THRESH_HIGH, and the method further comprises:
dynamically acquiring tail-latency information once every THRESH_WIN windows and calculating the difference between the target SLO requirement and the acquired tail latency; selecting the conservative policy if the difference exceeds THRESH_HIGH, selecting the aggressive policy if the difference is less than THRESH_LOW, and selecting the SLO-aware policy otherwise.
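A minimal sketch of this dynamic policy selection (hypothetical Python; the policy labels and the sign convention — difference = target SLO minus measured tail latency — follow the text above):

```python
def select_policy(t_slo, measured_tail, thresh_low, thresh_high):
    """Run once every THRESH_WIN windows: choose the anomaly-detection
    frequency from the gap between the target SLO and the measured tail."""
    diff = t_slo - measured_tail
    if diff > thresh_high:        # SLO comfortably met: check rarely
        return "conservative"
    if diff < thresh_low:         # SLO at risk: check on every dequeue
        return "aggressive"
    return "slo-aware"            # otherwise: check once per budget dequeues
```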
In one embodiment of the invention, the method further comprises calculating and allocating the required number of CPU cores N_i for the LC tenant at the beginning of each window W_i according to the following formula:

N_i = ceil(DR_avg_i × T_avg_io)

where DR_avg_i is the average dequeue rate of requests within window W_i and T_avg_io is the average service time of requests, with

DR_avg_i = QL_i / (T_slo - Tail_io - TW_i)

where QL_i is the number of requests in window W_i, T_slo is the target SLO requirement for the tail latency of the LC tenant, Tail_io is the tail latency of the service time, and TW_i is the queuing time of the first request of window W_i.
In one embodiment of the present invention, further comprising:
comparing N_i with the number of CPU cores N_{i-1} occupied at the end of window W_{i-1};
if N_i > N_{i-1}, preempting N_i - N_{i-1} cores from the CPU cores occupied by the BE tenants and allocating them to the LC tenant;
if N_i < N_{i-1}, reducing the LC tenant's cores by N_{i-1} - N_i, the freed CPU cores being used to serve the BE tenants;
if N_i = N_{i-1}, leaving the number of CPU cores occupied by the LC tenant unchanged.
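The three cases above can be condensed into one adjustment function (a hypothetical sketch; the function name and tuple return are illustrative):

```python
def adjust_cores(n_i, n_prev):
    """Returns (cores to preempt from BE tenants, cores to return to them)
    when moving from N_{i-1} to N_i at a window boundary."""
    if n_i > n_prev:
        return n_i - n_prev, 0
    if n_i < n_prev:
        return 0, n_prev - n_i
    return 0, 0
```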
In one embodiment of the invention, the CPU cores used by each LC tenant are responsible both for processing requests and for regulating CPU core resources.
According to a second aspect of the present invention, there is provided a computer readable storage medium having stored therein one or more computer programs which, when executed, implement the method of the present invention for dynamically regulating resources for a distributed storage system storage backend.
According to a third aspect of the invention there is provided a computing system comprising:
a storage device, and one or more processors;
wherein the storage device is used for storing one or more computer programs, and the computer programs are used for realizing the method for dynamically regulating and controlling resources of the storage back end of the distributed storage system when being executed by the processor.
Compared with the prior art, when multiple LC tenants and multiple BE tenants share the storage back end of a distributed storage system, the method combines the target SLO requirement of each LC tenant with a window-based real-time load quantification method to compute and allocate appropriate CPU resources for each window at its start, so that the latency of the requests in the window meets the target SLO requirement. During the processing of each window, for the two possible anomalies in which changed CPU core demand could cause the target SLO to be missed (load bursts and service-time fluctuation of the underlying storage device), a simple temporary-window (temp window) mechanism detects and recomputes the change in CPU core demand. Meanwhile, a suitable CPU core-allocation policy is flexibly selected for different LC tenants, or for different phases of the same LC tenant, so that each LC tenant's demand for CPU cores is satisfied and its target SLO requirement is guaranteed. In addition, the CPU cores occupied by an LC tenant regulate themselves fully autonomously, avoiding wasted resources. Throughout regulation, the remaining CPU cores serve the requests of the BE tenants, maximizing BE tenant bandwidth and improving system resource utilization.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention. It is obvious that the drawings in the following description are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort. In the drawings:
FIG. 1 is a diagram illustrating a process for handling requests at a distributed storage system storage back-end, according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating the use of temp window to detect exceptions and reallocate CPU core resources according to an embodiment of the present invention;
FIG. 3a shows a comparison of the tail latencies of 3 LC tenants under different techniques;
FIG. 3b shows a comparison of the bandwidths of 3 BE tenants under different techniques.
Detailed Description
To solve the problems identified in the background art, the inventors, through research, provide a method for dynamically regulating resources that is aware of the storage back end's tail-latency SLO. First, the invention rests on the following premises: 1) each tenant has a dedicated request queue and dedicated CPU cores, which are not shared among tenants; 2) each queue adopts a first-in-first-out policy: received requests are queued, the request received first is processed first, and "dequeue" refers to removing a processed request from the queue. The method quantifies an LC tenant's load in real time based on request windows: the access requests in an LC tenant's queue are divided in window units. When the first request in the queue is processed, all requests currently in the queue are regarded as a window W_i; the first request to enter the queue after the last request of W_i belongs to window W_{i+1}, i being a positive integer. In the model of the invention, each LC tenant has exactly one request queue.
For example, if there are 8 requests in the queue when the first request in the queue is processed, those 8 requests make up window W_1. While the 8 requests of W_1 are being processed, further requests arrive one after another. Once all 8 requests of W_1 have been processed and dequeued, the first request now in the queue is the first request of window W_2; when processing of that request starts, all requests in the queue constitute window W_2. Suppose W_2 contains 9 requests: when all 9 have been dequeued, the first request in the queue is the first request of window W_3, and when its processing starts, all requests in the queue constitute W_3. Subsequent requests likewise form windows W_4, W_5, and so on.
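The window-division rule of this example can be simulated with a short sketch (hypothetical Python; the 'a'/'d' event encoding is purely illustrative):

```python
from collections import deque

def split_into_windows(events):
    """events: 'a' = a request arrives (enqueue), 'd' = one request is
    processed and dequeued.  When processing starts and the previous
    window has fully drained, the whole current queue becomes window W_i."""
    queue = deque()
    windows = []
    remaining = 0          # requests of the current window not yet dequeued
    for ev in events:
        if ev == 'a':
            queue.append(ev)
        else:                           # 'd'
            if remaining == 0:          # previous window drained:
                remaining = len(queue)  # snapshot the queue as the next window
                windows.append(remaining)
            queue.popleft()
            remaining -= 1
    return windows
```

Replaying the example above — 8 initial arrivals, then 8 arrivals interleaved with W_1's 8 dequeues plus one more arrival, then 9 dequeues — yields window sizes [8, 9].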
The invention designs a CPU core calculation-and-allocation method and an anomaly-detection method, with the following improvements:
(1) Calculating the number of CPU cores to allocate from the target SLO requirement, in combination with Little's Law
The processing of a request at the storage back end of a distributed storage system is illustrated in FIG. 1. Regarding the dashed-box portion of FIG. 1 as a request-processing system and applying the classical Little's Law, L = λW (where L is the request-processing concurrency, λ is the average arrival rate of requests, and W is the average processing time of a request), we identify L with the number of CPU cores N, λ with the average dequeue rate of requests DR_avg, and W with the average service time of requests T_avg_io. The LC tenant's requests are divided into windows, and each window records a pair of information (QL_i, TW_i), where QL_i is the number of queued requests in window W_i and TW_i is the queuing time of the first request of W_i, i.e., the interval from when W_i's first request is enqueued until it is processed (its dequeue time minus its enqueue time). To guarantee that the LC tenant's tail latency meets the target SLO requirement T_slo, all requests within window W_i must dequeue within time T_slo - Tail_io - TW_i, where Tail_io is the tail latency of the underlying device's service time. Thus the average dequeue rate of requests within window W_i, DR_avg_i, can be obtained by the following formula:

DR_avg_i = QL_i / (T_slo - Tail_io - TW_i)    (1)

As long as the requests within a window are dequeued at this rate, no request in the window will have latency exceeding T_slo, i.e., the target SLO requirement is met. Finally, combining this with Little's Law, the number of cores with which the requests within window W_i meet the target SLO requirement can be calculated by the following formula:

N_i = ceil(DR_avg_i × T_avg_io)    (2)
Here the average service time of requests T_avg_io and the tail latency of the service time Tail_io can both be obtained in real time while the system runs. As the LC tenant's windows change, the required number of CPU cores N_i is calculated and allocated for the LC tenant at the beginning of each window, and the remaining CPU cores are used by the BE tenants to maximize resource utilization.
The technical effect of this improvement is that accurate CPU core resources are calculated and allocated for each LC tenant on a per-window basis to guarantee each tenant's target SLO requirement, while the remaining CPU cores process the BE tenants' requests, maximizing BE tenant bandwidth and improving system resource utilization.
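The per-window core computation N_i = ceil(DR_avg_i × T_avg_io), with DR_avg_i = QL_i / (T_slo - Tail_io - TW_i), can be checked with a small sketch (hypothetical values, times in milliseconds):

```python
import math

def cores_for_window(ql_i, tw_i, t_slo, tail_io, t_avg_io):
    """Cores allocated at the start of window W_i from its (QL_i, TW_i) pair."""
    dr_avg_i = ql_i / (t_slo - tail_io - tw_i)   # required dequeue rate
    return math.ceil(dr_avg_i * t_avg_io)

# With QL_i = 20, TW_i = 1 ms, T_slo = 5 ms, Tail_io = 2 ms, T_avg_io = 0.3 ms:
# slack = 2 ms, DR_avg_i = 10 requests/ms, N_i = ceil(3) = 3 cores.
```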
(2) Detecting intra-window load bursts and service-time fluctuations of the underlying storage device, and reallocating CPU cores
Whether the load produces burst request accesses or the service time of the underlying storage device fluctuates, the demand for CPU cores changes, and if that demand cannot be met in time, the LC tenant's target SLO cannot be guaranteed. Analysis shows that a large number of burst requests (a load burst) makes the LC tenant's request queue grow sharply, i.e., the queue lengthens; service-time fluctuation of the underlying storage device causes a CPU core serving a request to be occupied for a long time and unable to process subsequent queued requests promptly, which also lengthens the LC tenant's request queue and the queuing time of its requests. If CPU cores cannot be added on demand in time, the target SLO requirements of the LC tenants cannot be met. Since both a load burst and underlying-storage service-time fluctuation lengthen the request queue, a method is proposed that uses a temporary window (temp window) to detect that an anomaly has occurred, as shown in FIG. 2. At some moment during the processing of window W_i, all requests in the current queue are defined as a temp window, and the temp window's demand for CPU cores, N_t, is then calculated using the core-calculation method of improvement (1). At the time of detection, some requests of window W_i may already have dequeued, and after detection requests may still arrive that belong to window W_{i+1}; thus the temp window comprises the not-yet-dequeued requests of W_i and the already-arrived requests of W_{i+1}. Let QL_t denote the number of requests in the temp window and TW_t the queuing time of the first request in the temp window. The average dequeue rate of requests in the temp window, DR_avg_t, can then be calculated by formula (3):

DR_avg_t = QL_t / (T_slo - Tail_io - TW_t)    (3)

and the number of CPU cores N_t required by the temp window by formula (4):

N_t = ceil(DR_avg_t × T_avg_io)    (4)
If the calculated number of CPU cores exceeds the number currently used by the current window (N_t > N_i), then N_t - N_i CPU cores are preempted from the CPU cores occupied by the BE tenants and added to the LC tenant, to handle the anomaly and ensure that the LC tenant's target SLO is met.
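The temp-window check and BE-core preemption can be sketched as (hypothetical Python; names are illustrative):

```python
import math

def temp_window_preempt(ql_t, tw_t, n_i, t_slo, tail_io, t_avg_io):
    """Snapshot the queue as a temp window, compute N_t per formulas (3)-(4),
    and return how many cores to preempt from the BE tenants."""
    n_t = math.ceil(ql_t * t_avg_io / (t_slo - tail_io - tw_t))
    return max(0, n_t - n_i)   # only preempt here; cores are returned at window start
```

A burst that doubles the queue from 20 to 40 requests (with TW_t = 1 ms, T_slo = 5 ms, Tail_io = 2 ms, T_avg_io = 0.3 ms, N_i = 3) raises N_t from 3 to 6, so 3 BE cores are preempted.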
The technical effect of this improvement is that the simple temp-window-based detection method rapidly detects both possible in-window anomalies, namely changed CPU core demand caused by a load burst or by service-time fluctuation of the underlying storage device, and increases CPU cores quickly and accurately so that an anomaly does not cause the target SLO to go unmet.
(3) CPU core-allocation policies that detect anomalies at different frequencies
The anomaly detection method proposed in (2) above does not need to be performed at any time, because anomalies do not always occur at any time, and detection too frequently causes additional overhead. In addition, target SLO requirements of different LC tenants are different, and even if the same LC tenant is in a different stage, the requirements for CPU core resources are different. Therefore, 3 CPU core allocation strategies for detecting an abnormality at different frequencies have been proposed. The first is a conservative strategy, which only recalculates and allocates CPU core resources at the beginning of the window; the second is an aggressive strategy, namely, each time a request in a window is dequeued, a temp window is used for checking whether the CPU core resource requirement is changed; the third is a strategy for sensing the SLO, and the frequency of checking is determined by using an abnormal detection frequency budget for different SLO requirements, that is, each time a budget request is dequeued, a temp window is used for checking once, and the budget is obtained by calculating according to the following formula:
budget = ⌊(T_slo − Tail_io − TW_i) / T_avg_io⌋
In addition, a method for dynamically selecting a suitable CPU core allocation strategy for each LC tenant can be adopted, based on 3 thresholds: a window threshold THRESH_WIN, a low threshold THRESH_LOW and a high threshold THRESH_HIGH. Since a high-percentile tail latency is statistically meaningful only over enough request latencies, tail latency information is obtained dynamically every THRESH_WIN windows, and the difference between the target SLO requirement and the obtained tail latency is calculated. If the difference exceeds THRESH_HIGH, the target SLO tail latency is met; the conservative strategy is selected, and the bandwidth of the BE tenants is maximized as far as possible while the target SLO remains satisfied. If the difference is smaller than THRESH_LOW, the target SLO is probably not met; the aggressive strategy is then selected to respond quickly and in time to possible SLO violations at any moment. Otherwise, the SLO-aware strategy is selected, and each LC tenant dynamically sets its budget according to its own target SLO to monitor anomalies and regulate CPU core resources. The 3 thresholds can be set according to actual conditions.
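The threshold-based selection above can be sketched as a small helper. All names and the returned string labels are illustrative assumptions, not from the patent:

```python
def select_policy(t_slo, tail, thresh_high, thresh_low):
    """Map the SLO slack (target minus measured tail latency) to a policy."""
    diff = t_slo - tail
    if diff > thresh_high:
        return "conservative"  # SLO comfortably met: recalculate only at window start
    if diff < thresh_low:
        return "aggressive"    # SLO at risk: check after every dequeued request
    return "slo-aware"         # in between: check once every `budget` dequeued requests
```

For example, with a 7 ms target, THRESH_HIGH = 2 ms and THRESH_LOW = 0.5 ms, a measured tail of 3 ms selects the conservative policy, 6.8 ms the aggressive policy, and 6 ms the SLO-aware policy.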
The technical effect of this improvement is that a suitable CPU core allocation strategy (i.e. a different anomaly detection frequency) is dynamically selected according to the difference between the LC tenant's target SLO and the measured tail latency; anomalies are detected at the right time and CPU core resources are recalculated and reallocated, maximizing the bandwidth of the BE tenants while ensuring that the target SLO of each LC tenant is satisfied.
(4) Autonomous regulation of CPU cores
The CPU core regulation processes in (1), (2) and (3) require no additional cores dedicated to regulating CPU core resources. The regulation of core resources by the CPU cores used by an LC tenant is completely autonomous, mainly because those cores can obtain all the information involved in the regulation process, including the LC tenant's queue state, window state, the number and identity of its occupied cores, the currently used core allocation strategy, and so on. Based on this global information, a CPU core can monitor the queue state and regulate CPU core resources as needed, avoiding the resource waste that dedicating extra cores to regulation would cause.
The following is a specific embodiment of the present invention.
This embodiment is a scenario in which several LC tenants and several BE tenants share the storage back end of a distributed storage system. CPU core resources are dynamically regulated in real time according to the target SLO requirement of each LC tenant so that every target is met, while the remaining CPU resources serve BE tenant requests to improve system resource utilization. The CPU cores used by each LC tenant can obtain that tenant's queue information, window information and number of occupied CPU cores in real time, so they can execute the regulation process themselves: the CPU resources used by each LC tenant are responsible both for processing requests and for regulating CPU cores, and no extra core is dedicated to regulation, which avoids resource waste. The implementation of the technical scheme of the invention is described by taking two LC tenants (LC1 and LC2) and two BE tenants (BE1 and BE2) sharing the storage back end of a distributed storage system as an example.
When the requests of tenants LC1 and LC2 start to access the storage back end, respective request windows are created for LC1 and LC2 according to the window determination method, the relevant information of each window is recorded at the beginning of the window, and, combined with the respective target SLO requirements, according to the formula
N_i = ⌈QL_i / (T_slo − Tail_io − TW_i) × T_avg_io⌉
the number of CPU cores required to satisfy the respective target SLO requirements (assumed to be N1 and N2) is calculated separately; N1 CPU cores are allocated to tenant LC1 and N2 CPU cores to tenant LC2. If the total number of CPU cores in the system is N, the remaining N − N1 − N2 CPU cores serve tenants BE1 and BE2, and the two BE tenants share the remaining CPU cores in round-robin fashion. During subsequent request processing, as windows change, if the CPU core counts of tenants LC1 and LC2 change (regulated to meet the target SLO requirements), the number of CPU cores serving the BE tenants is adjusted accordingly.
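The split of cores between LC and BE tenants described above can be sketched as follows (an illustrative helper; names are assumptions, not from the patent):

```python
def partition_cores(total_cores, lc_demands):
    """Give each LC tenant its computed demand; BE tenants share the remainder.

    lc_demands: per-LC-tenant core counts (e.g. [N1, N2] computed per window).
    Returns the per-LC allocation and the size of the pool left for BE tenants,
    which the BE tenants then share in round-robin fashion.
    """
    lc_total = sum(lc_demands)
    if lc_total > total_cores:
        raise ValueError("LC demand exceeds available cores")
    be_pool = total_cores - lc_total
    return lc_demands, be_pool
```

For example, with N = 16 total cores and LC demands N1 = 3 and N2 = 5, the BE tenants share the remaining 8 cores.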
Taking window W_i (the i-th window) of tenant LC1 as an example, the process of adjusting the CPU core allocation policy and the number of CPU cores in a window is described below; tenant LC2 follows a similar regulation process. Assume that the initial CPU core allocation policy of tenant LC1 is the aggressive policy (the initial policy can be flexibly set to any one of the 3 policies).
Step A: when a window starts, calculate and allocate the number of CPU cores required to guarantee the target SLO requirement, and select a suitable CPU core allocation strategy for the tenant at the appropriate time;
step A10: calculate the number of CPU cores required by window W_i (N_i) and compare it with the number currently occupied (N_{i-1}, i.e. the number of CPU cores occupied at the end of window W_{i-1}). If N_i > N_{i-1}, preempt N_i − N_{i-1} cores from those occupied by BE tenants and allocate them to the LC tenant; if N_i < N_{i-1}, reduce the LC tenant's allocation by N_{i-1} − N_i CPU cores, the freed cores being used to serve BE tenants; if N_i = N_{i-1}, the number of CPU cores occupied by the LC tenant need not be adjusted.
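The three cases of step A10 can be sketched as a hypothetical helper; the returned action labels are illustrative assumptions, not terminology from the patent:

```python
def window_start_adjustment(n_i, n_prev):
    """Compare the new demand n_i with the n_prev cores held at the previous window's end."""
    if n_i > n_prev:
        return ("preempt_from_BE", n_i - n_prev)  # take extra cores from BE tenants
    if n_i < n_prev:
        return ("return_to_BE", n_prev - n_i)     # freed cores go back to BE tenants
    return ("no_change", 0)                        # allocation already matches demand
```

A demand rising from 3 to 5 cores preempts 2 from the BE pool; falling from 5 to 3 returns 2 to it.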
Step A20: if the window's sequence number i is an integer multiple of THRESH_WIN, statistically compute the historical tail latency Tail of the LC tenant, compare it with the target SLO (T_slo), and select a CPU core allocation strategy for the LC tenant according to the following three cases:
a) if T_slo − Tail > THRESH_HIGH, select the conservative strategy and set the anomaly detection frequency (budget) in the window to 0, indicating that this strategy recalculates CPU core demand only at the beginning of the window;
b) if T_slo − Tail < THRESH_LOW, select the aggressive strategy and set the anomaly detection frequency (budget) in the window to 1, indicating that this strategy checks for an anomaly after processing each request;
c) otherwise, select the SLO-aware strategy and set the anomaly detection frequency (budget) in the window to:
budget = ⌊(T_slo − Tail_io − TW_i) / T_avg_io⌋
This frequency is related to the target SLO and the information in the window, and indicates that the strategy checks for an anomaly once every budget requests.
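Assuming budget is the SLO slack divided by the average request service time, floored (a reconstruction; the exact formula is an image in the original document), a minimal sketch with a floor of 1 so the aggressive case is the lower bound:

```python
import math

def detection_budget(t_slo, tail_io, tw_i, t_avg_io):
    """Number of requests to process between temp-window checks (at least 1)."""
    slack = t_slo - tail_io - tw_i        # time budget left for this window
    return max(1, math.floor(slack / t_avg_io))
```

With a 5 ms SLO, 1 ms service-time tail, 1 ms already queued and 0.5 ms average service time, the tenant checks once every 6 requests; a nearly exhausted slack degenerates to checking every request.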
Step B: while processing the requests in the window, if the number of processed requests is an integer multiple of the configured anomaly detection frequency (budget), use a temp window to check whether the CPU core demand has changed. The specific check is as follows: all requests currently in the queue belong to the temp window; calculate the number of CPU cores required for the temp window (N_t) and compare it with the number currently occupied (N_i). If N_t > N_i, the CPU core demand has changed, i.e. an anomaly has occurred (a load burst or a service-time fluctuation of the underlying storage). To ensure that the LC tenant's target SLO is still satisfied when an anomaly occurs, N_t − N_i CPU cores must be preempted from those occupied by BE tenants and allocated to the LC tenant.
Step C: when the last request in the window has been processed, if there is no request in the tenant's queue, the subsequent window is an empty window. If the tenant has not logged out from the storage back end, one CPU core is reserved for the tenant and the remaining CPU cores serve BE tenants; if the tenant has logged out from the storage back end, all CPU cores it occupied are released to serve BE tenants. If the tenant queue is not empty, a new window is created and the required CPU core resources are allocated to it; subsequently, reasonable CPU core resources for the window continue to be regulated, as in steps A and B.
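The end-of-window decision in step C can be sketched as follows (illustrative; the returned action labels are assumptions, not from the patent):

```python
def end_of_window(queue_empty, tenant_logged_out, n_held):
    """Decide what happens to a tenant's cores when its current window ends."""
    if not queue_empty:
        return ("new_window", n_held)       # create the next window, re-evaluate demand
    if tenant_logged_out:
        return ("release_all_to_BE", 0)     # every held core goes to serve BE tenants
    return ("reserve_one", 1)               # keep one core; the rest serve BE tenants
```

An idle but still-registered tenant keeps exactly one core; a departed tenant frees all of them.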
The following compares the present invention with the prior art Shenango and Cake.
A scenario in which 3 LC tenants and 3 BE tenants share the storage back end of a distributed storage system is tested; the specific test results are shown in fig. 3.
As can be seen from the test results in fig. 3: when 3 LC tenants with different target SLO requirements (all running Webserver, with target SLOs (99.9th-percentile tail latency) of 4ms/5.5ms/7ms respectively, shown as the dotted lines in fig. 3(a)) share the storage back end of the distributed storage system with 3 BE tenants (whose bandwidth is shown in fig. 3(b)), the method adopted by the invention (denoted QWin in the figure) satisfies the different target SLO requirements of the 3 LC tenants simultaneously, while increasing the bandwidth of the BE tenants by 2-28 times.
The previous description is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to the disclosure will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other variations without departing from the spirit or scope of the disclosure. Moreover, all or a portion of any aspect and/or embodiment may be utilized with all or a portion of any other aspect and/or embodiment, unless stated otherwise. Thus, the disclosure is not intended to be limited to the examples and designs described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (11)

1. A method for dynamically regulating resources of a storage back end of a distributed storage system, wherein a plurality of LC tenants share the storage back end, each LC tenant has a request queue and a number N_i of CPU cores used for that request queue, the access requests in each request queue are divided in units of windows, N_i is the number of CPU cores allocated to window i, and the method comprises
Step 100: taking all current requests of a request queue of each LC tenant as a temporary window;
step 200: obtaining the number of requests QL_t of each temporary window and the queuing time TW_t of the first request of the temporary window;
step 300: determining, based on QL_t and TW_t, the number of CPU cores N_t required for the current requests;
step 400: adjusting the number of CPU cores of the LC tenant according to the required number of CPU cores N_t and the number of CPU cores N_i of the current window.
2. The method of claim 1, wherein step 300 comprises:
calculating the number of CPU cores N_t required for the temporary window using the following formula:
N_t = ⌈ρ_t × T_avg_io⌉
wherein T_avg_io is the average service time of the requests and ρ_t is the average dequeue rate of requests within the temporary window, calculated using the following formula:
ρ_t = QL_t / (T_slo − Tail_io − TW_t)
wherein QL_t indicates the number of requests in the temporary window, TW_t indicates the queuing time of the first request in the temporary window, T_slo is the target SLO requirement for the tail latency of the LC tenant, and Tail_io is the service-time tail latency.
3. The method of claim 1, wherein the distributed storage system further comprises a BE tenant, the step 400 comprising:
if N_t > N_i, preempting N_t − N_i CPU core resources from the CPU core resources occupied by BE tenants and adding the N_t − N_i CPU core resources to the LC tenant.
4. The method of claim 1, wherein dynamic resource regulation is performed using one of the following policies:
a conservative policy that detects and reallocates CPU core resources only at the beginning of a window;
an aggressive policy that performs dynamic resource regulation every time a request within a window is dequeued; and
an SLO-aware policy that, for different SLO requirements, uses a temporary window to detect and reallocate CPU core resources each time one or more requests are dequeued.
5. The method of claim 4, wherein in the SLO-aware policy, a temporary window is used to detect and reallocate CPU core resources every time budget requests are dequeued, wherein
budget = ⌊(T_slo − Tail_io − TW_i) / T_avg_io⌋
wherein T_slo is the target SLO requirement of the LC tenant, Tail_io is the service-time tail latency, T_avg_io is the average service time of the requests, and TW_i is the queuing time of the first request of the current window W_i.
6. The method of claim 4, wherein a policy is dynamically selected for an LC tenant according to 3 thresholds, a window threshold THRESH_WIN, a low threshold THRESH_LOW and a high threshold THRESH_HIGH, the method further comprising:
dynamically acquiring tail latency information once every THRESH_WIN windows, calculating the difference between the target SLO requirement and the acquired tail latency, selecting the conservative policy if the difference exceeds THRESH_HIGH, selecting the aggressive policy if the difference is less than THRESH_LOW, and selecting the SLO-aware policy otherwise.
7. The method of claim 1, further comprising calculating and allocating, at the beginning of each window i, the number of CPU cores N_i required by the LC tenant using the following formula:
N_i = ⌈ρ_i × T_avg_io⌉
wherein ρ_i is the average dequeue rate of requests within window W_i and T_avg_io is the average service time of the requests, with
ρ_i = QL_i / (T_slo − Tail_io − TW_i)
wherein QL_i is the number of requests in window W_i, T_slo is the target SLO requirement for the tail latency of the LC tenant, Tail_io is the service-time tail latency, and TW_i is the queuing time of the first request in window W_i.
8. The method of claim 7, further comprising:
comparing N_i with the number of CPU cores N_{i-1} occupied at the end of window W_{i-1}:
if N_i > N_{i-1}, preempting N_i − N_{i-1} cores from those occupied by BE tenants and allocating them to the LC tenant;
if N_i < N_{i-1}, reducing the LC tenant's allocation by N_{i-1} − N_i CPU cores, the freed CPU cores being used to serve BE tenants;
if N_i = N_{i-1}, not adjusting the number of CPU cores occupied by the LC tenant.
9. The method of claim 1, wherein the CPU resources used by each LC tenant are responsible both for processing requests and for performing CPU core resource regulation.
10. A computer-readable storage medium, in which one or more computer programs are stored, which, when executed, implement the method of any one of claims 1-9.
11. A computing system, comprising:
a storage device, and one or more processors;
wherein the storage means is for storing one or more computer programs which, when executed by the processor, are for implementing the method of any one of claims 1-9.
CN202110399392.3A 2021-04-14 2021-04-14 Dynamic resource regulation and control method and system for perceiving and storing tail delay SLO Active CN113127230B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110399392.3A CN113127230B (en) 2021-04-14 2021-04-14 Dynamic resource regulation and control method and system for perceiving and storing tail delay SLO
PCT/CN2021/100821 WO2022217739A1 (en) 2021-04-14 2021-06-18 Dynamic resource regulation and control method and system for sensing storage backend tail delay slo

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110399392.3A CN113127230B (en) 2021-04-14 2021-04-14 Dynamic resource regulation and control method and system for perceiving and storing tail delay SLO

Publications (2)

Publication Number Publication Date
CN113127230A true CN113127230A (en) 2021-07-16
CN113127230B CN113127230B (en) 2023-10-03

Family

ID=76776333

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110399392.3A Active CN113127230B (en) 2021-04-14 2021-04-14 Dynamic resource regulation and control method and system for perceiving and storing tail delay SLO

Country Status (2)

Country Link
CN (1) CN113127230B (en)
WO (1) WO2022217739A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115033477A * 2022-06-08 2022-09-09 Shandong Computer Science Center (National Supercomputer Center in Jinan) Large-scale micro-service-oriented active performance anomaly detection and processing method and system
CN116467068A * 2023-03-14 2023-07-21 Zhejiang University Resource scheduling method, equipment and storage medium

Citations (2)

Publication number Priority date Publication date Assignee Title
CN111444012A * 2020-03-03 2020-07-24 Institute of Computing Technology of CAS Dynamic resource regulation and control method and system for guaranteeing delay sensitive application delay SLO
CN112463044A * 2020-11-23 2021-03-09 Institute of Computing Technology of CAS Method and system for ensuring tail reading delay of server side of distributed storage system

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
US5787482A (en) * 1995-07-31 1998-07-28 Hewlett-Packard Company Deadline driven disk scheduler method and apparatus with thresholded most urgent request queue scan window
CN104679593B * 2015-03-13 2017-12-01 Inspur Group Co., Ltd. Task scheduling optimization method based on SMP system
CN109947619B * 2019-03-05 2021-07-13 Shanghai Jiao Tong University Multi-resource management system and server for improving throughput based on service quality perception
CN112165508B * 2020-08-24 2021-07-09 Peking University Resource allocation method for multi-tenant cloud storage request service

Patent Citations (2)

Publication number Priority date Publication date Assignee Title
CN111444012A * 2020-03-03 2020-07-24 Institute of Computing Technology of CAS Dynamic resource regulation and control method and system for guaranteeing delay sensitive application delay SLO
CN112463044A * 2020-11-23 2021-03-09 Institute of Computing Technology of CAS Method and system for ensuring tail reading delay of server side of distributed storage system

Cited By (3)

Publication number Priority date Publication date Assignee Title
CN115033477A * 2022-06-08 2022-09-09 Shandong Computer Science Center (National Supercomputer Center in Jinan) Large-scale micro-service-oriented active performance anomaly detection and processing method and system
CN115033477B * 2022-06-08 2023-06-27 Shandong Computer Science Center (National Supercomputer Center in Jinan) Performance abnormality active detection and processing method and system for large-scale micro-service
CN116467068A * 2023-03-14 2023-07-21 Zhejiang University Resource scheduling method, equipment and storage medium

Also Published As

Publication number Publication date
WO2022217739A1 (en) 2022-10-20
CN113127230B (en) 2023-10-03

Similar Documents

Publication Publication Date Title
Tavakkol et al. FLIN: Enabling fairness and enhancing performance in modern NVMe solid state drives
WO2021174735A1 (en) Dynamic resource scheduling method for guaranteeing latency slo of latency-sensitive application, and system
US8549199B2 (en) Data processing apparatus and a method for setting priority levels for transactions
US8510741B2 (en) Computing the processor desires of jobs in an adaptively parallel scheduling environment
US8667493B2 (en) Memory-controller-parallelism-aware scheduling for multiple memory controllers
CN113127230B (en) Dynamic resource regulation and control method and system for perceiving and storing tail delay SLO
US8701118B2 (en) Adjusting thread priority to optimize computer system performance and the utilization of computer system resources
US8522244B2 (en) Method and apparatus for scheduling for multiple memory controllers
US10545701B1 (en) Memory arbitration techniques based on latency tolerance
US20110167427A1 (en) Computing system, method and computer-readable medium preventing starvation
KR101880452B1 (en) Apparatus and method for scheduling kernel execution order
JP2018517201A (en) Native storage quality of service for virtual machines
JPH09120389A (en) Method and device for job scheduling of cluster type computer
CN102904835A (en) System bandwidth distribution method and device
CN109005130A (en) network resource allocation scheduling method and device
CN108196939B (en) Intelligent virtual machine management method and device for cloud computing
CN113505084A (en) Memory resource dynamic regulation and control method and system based on memory access and performance modeling
Gifford et al. Dna: Dynamic resource allocation for soft real-time multicore systems
Usui et al. Squash: Simple qos-aware high-performance memory scheduler for heterogeneous systems with hardware accelerators
EP2038748A1 (en) Resource-based scheduler
US10733023B1 (en) Oversubscription scheduling
CN109144664B (en) Dynamic migration method of virtual machine based on user service quality demand difference
CN111208943B (en) IO pressure scheduling system of storage system
CN109005052B (en) Network task prediction method and device
CN113835868B (en) Buffer scheduling method based on feedback and fair queue service quality perception

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant