WO2022217739A1 - Method and system for dynamic resource regulation aware of storage-backend tail-latency SLO - Google Patents

Method and system for dynamic resource regulation aware of storage-backend tail-latency SLO (一种感知存储后端尾延迟SLO的动态调控资源方法及系统)

Info

Publication number
WO2022217739A1
Authority
WO
WIPO (PCT)
Prior art date
Application number
PCT/CN2021/100821
Other languages
English (en)
French (fr)
Inventor
马留英
刘振青
熊劲
蒋德钧
Original Assignee
中国科学院计算技术研究所 (Institute of Computing Technology, Chinese Academy of Sciences)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国科学院计算技术研究所
Publication of WO2022217739A1

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/546 — Interprogram communication: message passing systems or structures, e.g. queues
    • G06F9/505 — Allocation of resources to service a request, the resource being a machine (e.g. CPUs, servers, terminals), considering the load
    • G06F9/5083 — Techniques for rebalancing the load in a distributed system
    • G06F2209/548 — Indexing scheme relating to G06F9/54: Queue

Definitions

  • The invention relates to the field of backend storage in distributed storage systems, and in particular to guaranteeing the tail-latency requirements of each latency-critical tenant in scenarios where multiple tenants share the storage backend.
  • The goal is to guarantee the target SLOs of the LC tenants while maximizing the bandwidth of the BE tenants, thereby improving the resource utilization of the storage backend.
  • The existing technologies can be divided into four categories.
  • The first category adopts a shared thread model, takes the LC tenants' target SLOs into account, and guarantees them through request scheduling (e.g., priority scheduling). Such methods either adjust the thread-sharing ratio between LC and BE tenants based on historical tail-latency feedback, determine each LC tenant's maximum sending rate and priority through offline analysis of load characteristics, or combine offline analysis of the storage device's different read/write accesses with flow-control-based priority scheduling to guarantee the LC tenants' target SLOs.
  • The second category dynamically partitions CPU resources among tenants while taking the LC tenants' target SLOs into account. These methods monitor historical tail-latency information, compare it with each LC tenant's target SLO, and apply a tentative, incremental core-allocation strategy at fixed time intervals.
  • The third category dynamically partitions CPU resources with the goal of minimizing LC tenants' tail latency. These methods do not consider the target SLOs and allocate core resources at fixed time intervals based only on signals such as core utilization, request queue length, and real-time load.
  • The fourth category first quantifies the LC tenant's real-time load with request windows, then estimates the number of CPU cores needed from both the target SLO and the real-time load, and dynamically regulates CPU core resources for the LC tenant per request window; for this kind of method, see Chinese patent application No. 202010139287.1.
  • Regulation based on historical feedback has obvious limitations: (1) historical information only reflects performance over a past period and has no clear correlation with future performance, and when an LC tenant's requests burst, feedback-based regulation can neither detect nor handle the burst in time, so it can hardly meet future target requirements; (2) if a tenant's access pattern changes and diverges from its offline trace, trace-based regulation cannot guarantee the target requirements of that tenant.
  • The third category of work aims to minimize LC tenants' tail latency, considers their real-time load, and adjusts CPU core resources at fixed time intervals.
  • Because the CPU regulation of this category is independent of the target SLO, two problems arise.
  • First, the adjustment interval needed to minimize tail latency differs across LC tenants and depends on their load characteristics, so with multiple LC tenants a single reasonable interval is hard to choose.
  • Second, core allocation aimed at minimizing LC tail latency leads to low resource utilization.
  • When an LC tenant's target SLO is loose, these SLO-oblivious methods still occupy many CPU cores to minimize its tail latency, leaving the BE tenants with very low bandwidth.
  • Although the fourth category estimates the CPU cores an LC tenant needs from both its target SLO and its real-time load and regulates CPU resources per request window, the estimation lacks a theoretical basis.
  • It also applies the same core-allocation strategy to tenants with different target SLOs, without distinguishing among them, which lowers resource utilization.
  • When core resources are re-regulated within a request window, both the enqueue rate and the dequeue rate of requests must be monitored to decide whether a load burst or a fluctuation of the underlying device's service time has occurred; this computation is complex and inaccurate, and may leave the LC tenant's target SLO unmet or the BE tenants' bandwidth low.
  • The present invention proposes a method for dynamically regulating resources at the storage backend of a distributed storage system, wherein multiple LC tenants share the storage backend, each LC tenant has a request queue and a number N_i of CPU cores serving that queue, the access requests in each queue are divided into windows, and N_i is the number of CPU cores allocated to window i.
  • The method comprises:
  • Step 100: take all current requests in each LC tenant's request queue as a temporary window;
  • Step 200: obtain the number of requests QL_t of each temporary window and the queuing time TW_t of its first request;
  • Step 300: determine from QL_t and TW_t the number of CPU cores N_t required by the temporary window;
  • Step 400: adjust the LC tenant's CPU core count according to the required core count N_t and the current window's core count N_i.
  • Step 300 comprises computing the number of CPU cores required by the temporary window as N_t = ⌈DR_t × T_avg_io⌉, where T_avg_io is the average service time of a request and DR_t is the average dequeue rate required of requests within the temporary window:
  • DR_t = QL_t / (T_slo − Tail_io − TW_t)
  • where QL_t is the number of requests in the temporary window, TW_t is the queuing time of its first request, T_slo is the target SLO on the LC tenant's tail latency, and Tail_io is the tail latency of the service time.
  • Step 400 comprises: if N_t > N_i, preempting N_t − N_i CPU cores from those occupied by the BE tenants and adding them to the LC tenant.
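The temporary-window check of steps 100 through 400 can be sketched as follows. This is a minimal Python illustration, not the patented implementation: times are assumed to be in milliseconds, and the function and parameter names are invented for clarity.

```python
import math

def cores_needed(ql: int, tw: float, t_slo: float, tail_io: float,
                 t_avg_io: float) -> int:
    """Cores required so that ql queued requests, whose oldest entry has
    already waited tw, all dequeue within the SLO budget."""
    slack = t_slo - tail_io - tw      # time left to drain the temp window
    dequeue_rate = ql / slack         # required average dequeue rate
    return math.ceil(dequeue_rate * t_avg_io)

def check_temp_window(queue_len, first_wait, n_i, t_slo, tail_io, t_avg_io):
    """Steps 100-400: treat the whole queue as a temp window; if it needs
    more cores than the current window holds, preempt the difference
    from the BE tenants (step 400 only grows the allocation)."""
    n_t = cores_needed(queue_len, first_wait, t_slo, tail_io, t_avg_io)
    preempt_from_be = max(0, n_t - n_i)
    return n_t, preempt_from_be

# Example: 40 queued requests, first one has waited 1 ms, 2 cores held,
# 5 ms SLO, 1 ms device-time tail, 0.2 ms average service time.
n_t, extra = check_temp_window(40, 1.0, 2, 5.0, 1.0, 0.2)
```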
  • One of the following strategies is used to perform dynamic resource regulation: a conservative strategy, which detects and reallocates CPU core resources only at the beginning of a window; an aggressive strategy, which checks each time a request in the window is dequeued; and an SLO-aware strategy, which adapts the check frequency to the SLO requirement.
  • In the SLO-aware strategy, a temporary window is used to detect and reallocate CPU core resources every budget dequeued requests, where budget = ⌊(T_slo − Tail_io − TW_i) / T_avg_io⌋,
  • T_slo is the target SLO on the LC tenant's tail latency,
  • Tail_io is the tail latency of the service time,
  • T_avg_io is the average service time of a request, and
  • TW_i is the queuing time of the first request in the current window W_i.
  • A strategy is dynamically selected for the LC tenant according to three thresholds, namely the window threshold THRESH_WIN, the low threshold THRESH_LOW, and the high threshold THRESH_HIGH, and the method further includes:
  • It also includes computing and allocating the required number of CPU cores N_i for the LC tenant at the beginning of each window i according to N_i = ⌈QL_i × T_avg_io / (T_slo − Tail_io − TW_i)⌉, where
  • T_avg_io is the average service time of requests,
  • QL_i is the number of queued requests in window W_i,
  • T_slo is the target SLO on the LC tenant's tail latency,
  • Tail_io is the tail latency of the service time, and
  • TW_i is the queuing time of the first request in window W_i.
  • It also includes comparing N_i with the core count N_{i−1} held at the end of window W_{i−1}:
  • if N_i > N_{i−1}, preempt N_i − N_{i−1} cores from those occupied by the BE tenants and allocate them to the LC tenant; if N_i < N_{i−1}, release N_{i−1} − N_i of the LC tenant's cores to serve the BE tenants; if N_i = N_{i−1}, leave the LC tenant's core count unchanged.
  • The CPU resources used by each LC tenant are responsible both for processing requests and for performing CPU core resource regulation.
  • A computer-readable storage medium is provided, in which one or more computer programs are stored; when executed, the programs implement the method of the present invention for dynamically regulating resources at the storage backend of a distributed storage system.
  • a computing system comprising:
  • a storage device and one or more processors
  • the storage device stores one or more computer programs which, when executed by the processor, implement the method of the present invention for dynamically regulating resources at the storage backend of a distributed storage system.
  • The advantage of the present invention is that when multiple LC tenants and multiple BE tenants share the storage backend of a distributed storage system, each LC tenant's target SLO is combined with a window-based real-time load quantification: at the beginning of each window, appropriate CPU resources are computed and allocated so that the latency of the requests in that window meets the target SLO.
  • A concise temporary-window (temp window) mechanism is used to detect anomalies and compute changes in CPU core demand.
  • A suitable CPU core allocation strategy is flexibly selected to satisfy each LC tenant's core resource demand and guarantee its target SLO.
  • The CPU cores occupied by LC tenants are regulated fully autonomously, which avoids wasting resources; during regulation, the remaining CPU cores process BE tenants' requests, maximizing BE bandwidth and improving system resource utilization.
  • FIG. 1 shows a schematic diagram of a process of processing a request in the storage back-end of a distributed storage system according to an embodiment of the present invention
  • FIG. 2 shows a schematic diagram of using temp window to detect abnormality and reallocate CPU core resources according to an embodiment of the present invention
  • Figure 3a shows a comparison of tail delays of three LC tenants under different technologies
  • Figure 3b shows a comparison diagram of the bandwidth of three BE tenants under different technologies.
  • The present invention is based on the following basic principles: 1) each tenant has a dedicated request queue and dedicated CPU cores, neither shared between tenants; 2) the queue is first-in, first-out: received requests are enqueued and served in arrival order, and a request is dequeued (removed from the queue) before it is processed.
  • The present invention quantifies the LC tenant's load in real time based on request windows: the access requests in the LC tenant's queue are divided into windows. When the first request in the queue is processed, all requests currently in the queue are regarded as one window; for window W_i, the first request enqueued after the last request of W_i is the first request of window W_{i+1}.
  • The present invention designs the CPU core calculation and allocation method and the anomaly detection method, with the following improvements:
  • The processing of a request at the storage backend of a distributed storage system is shown in FIG. 1. Treating the dashed box in FIG. 1 as a request-processing system and applying Little's Law, L = λW, L corresponds to the number of CPU cores (N), λ to the average dequeue rate of requests (DR_avg), and W to the average service time of a request (T_avg_io).
  • Each window records a two-tuple {QL_i, TW_i}, where QL_i is the number of queued requests in window W_i and TW_i is the queuing time of the first request in W_i,
  • that is, the interval from when the first request in window W_i is enqueued until it is processed, i.e., its dequeue time minus its enqueue time.
  • The number of cores with which the requests in window W_i meet the target SLO can be calculated as N_i = ⌈DR_avg × T_avg_io⌉, with DR_avg = QL_i / (T_slo − Tail_io − TW_i).
  • the average request service time T avg_io and the service time tail delay Tail io can be obtained in real time during system operation.
  • At the beginning of each window, the required core count N_i is computed and allocated to the LC tenant, and the remaining CPU core resources are used by the BE tenants to maximize resource utilization.
  • The technical effect of this improvement is that, window by window, each LC tenant receives an accurate CPU core allocation that guarantees its own target SLO, while the remaining cores process BE tenants' requests, maximizing BE bandwidth and improving system resource utilization.
  • Whether the load bursts or the underlying device's service time fluctuates, the demand for CPU core resources changes; if the new demand is not met in time, the LC tenant's target SLO cannot be guaranteed.
  • A large burst of requests makes the LC tenant's queue grow sharply; a service-time fluctuation keeps a core occupied for a long time, delaying subsequent queued requests and likewise lengthening the queue.
  • If CPU core resources are not added promptly in either case, the LC tenant's target SLO will not be met.
  • The temp window includes the not-yet-dequeued requests of window W_i and the already-enqueued requests of window W_{i+1}. Let
  • QL_t denote the number of requests in the temp window and
  • TW_t the queuing time of its first request. The average dequeue rate required of requests within the temp window is then DR_t = QL_t / (T_slo − Tail_io − TW_t) (formula 3),
  • and the number of CPU cores the temp window requires is N_t = ⌈DR_t × T_avg_io⌉ (formula 4).
  • The technical effect of this improvement is that this concise temp-window-based detection quickly catches both kinds of in-window anomaly, namely changes in CPU core demand caused by load bursts or by service-time fluctuations of the underlying storage device, and adds CPU core resources quickly and accurately so that the target SLO is not violated.
  • The anomaly detection proposed in (2) above need not run constantly: anomalies do not occur all the time, and overly frequent detection adds overhead.
  • Different LC tenants have different target SLOs, and even the same LC tenant needs different CPU core resources at different stages. Three CPU core allocation strategies that detect anomalies at different frequencies are therefore proposed.
  • The first is a conservative strategy, which recomputes and reallocates CPU core resources only at the beginning of a window;
  • the second is an aggressive strategy, which checks with a temp window, each time a request in the window is dequeued, whether the CPU core demand has changed;
  • the third is an SLO-aware strategy, which sets the detection frequency budget according to the SLO requirement, performing a temp-window check every budget dequeued requests, with budget = ⌊(T_slo − Tail_io − TW_i) / T_avg_io⌋.
  • In addition, a suitable strategy can be selected dynamically based on three thresholds: the window threshold THRESH_WIN, the low threshold THRESH_LOW, and the high threshold THRESH_HIGH. Because high-percentile tail latency is only meaningful when measured over enough requests, tail-latency information is obtained once every THRESH_WIN windows, and the gap between the target SLO and the measured tail latency is computed.
  • If the gap exceeds THRESH_HIGH, the target SLO is being met comfortably, and the conservative strategy is chosen to maximize BE bandwidth while still meeting the SLO; if the gap is below THRESH_LOW, the target SLO is likely to be violated, and the aggressive strategy is chosen to react quickly to violations that may occur at any moment; otherwise the SLO-aware strategy is chosen, and each LC tenant sets its own budget from its target SLO to monitor anomalies and regulate CPU core resources.
  • The three thresholds can be set according to the actual situation.
  • The technical effect of this improvement is that an appropriate CPU core allocation strategy (i.e., a different anomaly detection frequency) is selected dynamically according to the gap between the LC tenant's target SLO and the measured tail latency, anomalies are detected at the right time, and CPU core resources are recomputed and reallocated, maximizing BE bandwidth while keeping the LC tenants' target SLOs satisfied.
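The threshold-driven choice among the three strategies, and the budget used by the SLO-aware strategy, can be sketched as follows. This is an illustrative Python sketch: the threshold values are invented (the patent leaves them configurable), and the exact budget formula is an assumed form reconstructed from the variables named in the text.

```python
import math

# Example threshold values in milliseconds (invented for illustration).
THRESH_LOW, THRESH_HIGH = 0.5, 2.0

def pick_strategy(t_slo: float, tail_measured: float) -> str:
    """Choose a detection strategy from the gap between the target SLO and
    the tail latency measured over the last THRESH_WIN windows."""
    gap = t_slo - tail_measured
    if gap > THRESH_HIGH:
        return "conservative"   # SLO comfortably met: check only at window start
    if gap < THRESH_LOW:
        return "aggressive"     # SLO at risk: check after every dequeued request
    return "slo-aware"          # check every `budget` dequeued requests

def budget(t_slo: float, tail_io: float, tw_i: float, t_avg_io: float) -> int:
    # Assumed form: budget = floor((T_slo - Tail_io - TW_i) / T_avg_io),
    # i.e., how many average service times fit into the remaining slack.
    return max(1, math.floor((t_slo - tail_io - tw_i) / t_avg_io))
```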
  • The core-count regulation processes (1), (2), and (3) above do not require an additional core dedicated to regulating CPU core resources.
  • The regulation used by each LC tenant is fully autonomous, mainly because the tenant's own cores can obtain all the information involved: the LC tenant's queue state, window state, the number and identity of the cores it occupies, and the core allocation strategy currently in use. With this information, the tenant's CPU cores can monitor the queue and adjust core resources as needed, avoiding the waste of dedicating extra cores to regulation.
  • This embodiment concerns a scenario in which multiple LC tenants and multiple BE tenants share the storage backend of a distributed storage system.
  • CPU core resources are dynamically adjusted in real time to guarantee each LC tenant's target, and the remaining CPU resources serve the BE tenants' requests to improve system resource utilization. Because the cores used by each LC tenant can obtain its queue information, window information, and occupied core count in real time, the regulation described above can be performed by those cores themselves: the CPU resources of each LC tenant both process requests and perform core regulation, so no additional core need be dedicated to regulation, which avoids wasting resources.
  • The implementation of the technical solution of the present invention is described taking two LC tenants (LC1 and LC2) and two BE tenants (BE1 and BE2) sharing the storage backend of a distributed storage system as an example.
  • It is assumed that tenant LC1's initial CPU core allocation strategy is the aggressive strategy (the initial strategy can be set flexibly to any of the three); tenant LC2 follows a similar control process.
  • Step A: at the beginning of the window, compute and allocate the number of CPU cores needed to guarantee the target SLO, and select a suitable CPU core allocation strategy for the tenant at the right time;
  • Step A20: if the window number i is an integer multiple of THRESH_WIN, obtain the LC tenant's historical tail latency Tail, compare it with its target SLO (T_slo), and select a CPU core allocation strategy according to the following three cases:
  • a) if T_slo − Tail > THRESH_HIGH, select the conservative strategy, which re-detects only at the beginning of each window;
  • b) if T_slo − Tail < THRESH_LOW, select the aggressive strategy and set the in-window anomaly detection frequency (budget) to 1, meaning an anomaly check after every processed request;
  • c) otherwise, select the SLO-aware strategy and set the in-window anomaly detection frequency (budget) to budget = ⌊(T_slo − Tail_io − TW_i) / T_avg_io⌋;
  • this frequency depends on the target SLO and the in-window information, meaning an anomaly check after every budget requests.
  • Step B: while requests in the window are being processed, whenever the number of processed requests is an integer multiple of the configured detection frequency (budget), use the temp window to detect whether the CPU core resource demand has changed.
  • The detection works as follows: all requests in the current queue belong to the temp window; compute the number of CPU cores the temp window needs (N_t) and compare it with the number currently occupied (N_i). If N_t > N_i, the CPU core demand has changed, i.e., an anomaly has occurred (a load burst or a fluctuation in the underlying storage service time). To keep the LC tenant's target SLO satisfied despite the anomaly, N_t − N_i CPU cores are preempted from those occupied by the BE tenants and allocated to the LC tenant.
  • Step C: when the last request in the window has been processed, if the tenant's queue is empty, the subsequent window is an empty window. If the tenant has not logged out of the storage backend, one CPU core is reserved for it and the remaining cores serve the BE tenants; if the tenant has logged out, all cores it occupied are released to serve the BE tenants. If the queue is not empty, a new window is created and the required CPU core resources are allocated to it, and regulation continues for that window as in steps A and B above.
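Steps A and B can be tied together in a compact control loop. The sketch below is a simplified single-tenant Python simulation under stated assumptions: queue entries are just the waiting times (ms) of requests at window start, no new arrivals join mid-window, and all concrete numbers are invented.

```python
import math
from collections import deque

def required_cores(ql: int, tw: float, t_slo: float,
                   tail_io: float, t_avg_io: float) -> int:
    """Cores needed so that ql queued requests drain within the SLO budget."""
    return math.ceil(ql * t_avg_io / (t_slo - tail_io - tw))

def run_window(queue: deque, t_slo: float, tail_io: float,
               t_avg_io: float, budget: int) -> int:
    """Step A: allocate cores at window start; Step B: every `budget`
    dequeues, re-check the remaining queue as a temp window and grow the
    allocation if demand rose. Returns the peak core count for the window."""
    n = required_cores(len(queue), queue[0], t_slo, tail_io, t_avg_io)
    window_size = len(queue)
    served = 0
    while served < window_size:
        queue.popleft()                  # dequeue and process one request
        served += 1
        if queue and served % budget == 0:   # temp-window check
            n_t = required_cores(len(queue), queue[0],
                                 t_slo, tail_io, t_avg_io)
            n = max(n, n_t)   # preempt BE cores when N_t > N_i
    return n
```

With an 8-request window, a 5 ms SLO, a 1 ms device-time tail, and a 0.25 ms average service time, one core suffices when the first request has waited 1 ms, while a longer initial wait of 2.5 ms forces a second core.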
  • Tests were carried out in a scenario where three LC tenants and three BE tenants share the storage backend of a distributed storage system.
  • The specific test results are shown in Figure 3.
  • For three LC tenants with different target SLOs (all Webserver workloads, with 99.9th-percentile tail-latency targets of 4 ms, 5.5 ms, and 7 ms respectively), the tail latencies achieved by the method of the present invention (labeled QWin in the figure) and by other techniques are compared in Figure 3a,
  • and the bandwidth of the BE tenants is increased by 2 to 28 times, as shown in Figure 3b.


Abstract

The present invention provides a method for dynamically regulating resources at the storage backend of a distributed storage system, in which multiple LC and BE tenants share the storage backend. Each LC tenant has a request queue and a number N_i of CPU cores serving that queue; the requests in the queue are divided into windows, with N_i the number of CPU cores allocated to window i. At different frequencies, all current requests in an LC tenant's queue are taken as a temporary window; the number of requests QL_t in each temporary window and the queuing time TW_t of its first request are obtained; the number of CPU cores N_t required by the current temporary window is determined from QL_t and TW_t; and the LC tenant's CPU core count is adjusted according to N_t and the current window's core count N_i. Based on embodiments of the invention, the bandwidth of BE tenants can be maximized, CPU resources can be added quickly and accurately so that anomalies do not cause the target SLO to be violated, and anomalies are detected at the right time, with CPU resources recomputed and reallocated.

Description

Method and system for dynamic resource regulation aware of storage-backend tail-latency SLO

Technical Field

The present invention relates to the field of backend storage in distributed storage systems, and in particular to the technical field of guaranteeing the tail-latency requirements of each latency-critical tenant in scenarios where multiple tenants share the storage backend.
Background

To improve resource utilization of the storage backend of a distributed storage system, tenants of different types usually share it. These tenants generally fall into two classes. One class has explicit latency requirements, i.e., a Service Level Objective (SLO) on the 99th/99.9th-percentile tail latency, for example a 99.9th tail latency of at most 5 ms; such latency-critical (LC) tenants issue small requests (e.g., 4 KB). The other class can run in the background with no explicit performance requirement (best-effort, BE); such tenants issue large requests (e.g., 64 KB or more). When LC and BE tenants share the storage backend, pronounced competition for resources (e.g., threads, CPU cores) prevents the LC tenants' tail-latency requirements from being met and leaves the BE tenants with low bandwidth.

When multiple LC and multiple BE tenants share the storage backend, the goal is to guarantee the LC tenants' target SLOs while maximizing BE tenant bandwidth, thereby improving backend resource utilization. A large body of work pursues this goal; the existing techniques can be divided into four categories.
The first category adopts a shared thread model, takes the LC tenants' target SLOs into account, and guarantees them through request scheduling (e.g., priority scheduling). Such methods either adjust the thread-sharing ratio between LC and BE tenants based on historical tail-latency feedback, determine each LC tenant's maximum sending rate and priority through offline analysis of load characteristics, or combine offline analysis of the storage device's different read/write accesses with flow-control-based priority scheduling to guarantee the LC tenants' target SLOs.

The second category dynamically partitions CPU resources among tenants while taking the LC tenants' target SLOs into account. These methods monitor historical tail-latency information, compare it with each LC tenant's target SLO, and apply a tentative, incremental core-allocation strategy at fixed time intervals.

The third category dynamically partitions CPU resources with the goal of minimizing LC tenants' tail latency. These methods do not consider the target SLOs and allocate core resources at fixed time intervals based only on signals such as core utilization, request queue length, and real-time load.

The fourth category first quantifies the LC tenant's real-time load with request windows, then estimates the number of CPU cores needed from both the target SLO and the real-time load, and dynamically regulates CPU core resources for the LC tenant per request window; for this kind of method, see Chinese patent application No. 202010139287.1.
The four categories above have clear shortcomings with respect to the goal of guaranteeing LC tenants' target SLOs while maximizing BE bandwidth when multiple tenants share the storage backend of a distributed storage system, as follows:

For the first category, although requests of LC and BE tenants are scheduled dynamically according to the LC tenants' target SLOs, threads need CPU resources to be scheduled and process requests, and because there is no fixed binding between threads and cores, threads constantly compete for CPU cores. This competition severely affects latency, especially tail latency, so the LC tenants' target SLOs may not be met; eliminating its effect would require allocating more resources to the LC tenants, which lowers resource utilization. Moreover, regulation based on runtime history or on offline analysis of access traces has obvious limitations: (1) historical information only reflects performance over a past period and has no clear correlation with future performance, and when an LC tenant's requests burst, feedback-based regulation can neither detect nor handle the burst in time, so it can hardly meet future targets; (2) if a tenant's access pattern changes and diverges from its offline trace, trace-based regulation cannot guarantee the target of the tenant whose pattern changed. All of these factors make the LC tenants' target SLOs hard to satisfy.

For the second category, dynamically partitioning CPU cores between LC and BE tenants avoids core competition, but these methods still rely on the gap between historical tail latency and the target SLO, plus the real-time load, converging incrementally to a core count that satisfies the SLO. First, regulating cores from historical information is inaccurate. Second, incremental allocation takes a long time (on the order of seconds) to converge to a suitable core count, which cannot satisfy millisecond-level SLOs, and during convergence the LC tenant's latency, especially tail latency, keeps suffering from the core shortage. Finally, regulation at fixed time intervals makes the LC tenants' target SLOs hard to guarantee.

For the third category, which minimizes LC tenants' tail latency using their real-time load and adjusts CPU cores at fixed intervals, the regulation is independent of the target SLO, which raises two problems. First, the adjustment interval needed to minimize the tail latency of different LC tenants differs and depends on their load characteristics, so with multiple LC tenants a single reasonable interval is hard to choose. Second, core allocation aimed at minimizing tail latency leads to low resource utilization: when an LC tenant's target SLO is loose, these SLO-oblivious methods still occupy many cores to minimize its tail latency, leaving the BE tenants with very low bandwidth.

Both the second and third categories also need an extra core dedicated to regulating CPU core resources, which plainly wastes resources. In addition, although both regulate cores according to real-time load, neither considers the effect of IO fluctuations of the underlying storage device on core allocation. At the storage backend of a distributed storage system, requests inevitably access the underlying device, whose service time fluctuates noticeably, especially when the device is an SSD. Dynamic core allocation must therefore account for these service-time fluctuations to guarantee the LC tenants' target SLOs while maximizing BE bandwidth.

For the fourth category, although it estimates the CPU cores an LC tenant needs from both its target SLO and its real-time load and regulates CPU resources per request window, the estimation lacks a theoretical basis, and the same core-allocation strategy is applied to all tenants without distinguishing different target SLOs, which lowers resource utilization. Furthermore, re-regulating core resources within a request window requires monitoring both the enqueue rate and the dequeue rate of requests to decide whether a load burst or a service-time fluctuation of the underlying device has occurred; this computation is complex and inaccurate, and may leave the LC tenant's target SLO unmet or the BE tenants' bandwidth low.
Summary of the Invention

In view of the above problems, according to a first aspect of the present invention, a method is proposed for dynamically regulating resources at the storage backend of a distributed storage system, wherein multiple LC tenants share the storage backend, each LC tenant has a request queue and a number N_i of CPU cores serving that queue, the access requests in each request queue are divided into windows, and N_i is the number of CPU cores allocated to window i, the method comprising:

Step 100: take all current requests in each LC tenant's request queue as a temporary window;

Step 200: obtain the number of requests QL_t of each temporary window and the queuing time TW_t of its first request;

Step 300: determine from QL_t and TW_t the number of CPU cores N_t required by the temporary window;

Step 400: adjust the LC tenant's CPU core count according to the required core count N_t and the current window's core count N_i.
In one embodiment of the invention, step 300 comprises computing the number of CPU cores required by the temporary window as:

N_t = ⌈DR_t × T_avg_io⌉

where T_avg_io is the average service time of a request and DR_t is the average dequeue rate required of requests within the temporary window, computed as:

DR_t = QL_t / (T_slo − Tail_io − TW_t)

where QL_t is the number of requests in the temporary window, TW_t is the queuing time of the first request in the temporary window, T_slo is the target SLO on the LC tenant's tail latency, and Tail_io is the tail latency of the service time.
In one embodiment of the invention, wherein the distributed storage system further comprises BE tenants, step 400 comprises:

if N_t > N_i, preempting N_t − N_i CPU cores from the CPU core resources occupied by the BE tenants and adding them to the LC tenant.
In one embodiment of the invention, one of the following strategies is used to perform dynamic resource regulation:

a conservative strategy, which detects and reallocates CPU core resources only at the beginning of a window;

an aggressive strategy, which performs dynamic resource regulation each time a request in the window is dequeued; and

an SLO-aware strategy, which, according to the SLO requirement, uses a temporary window to detect and reallocate CPU core resources each time one or more requests are dequeued.

In one embodiment of the invention, in the SLO-aware strategy, a temporary window is used to detect and reallocate CPU core resources every budget dequeued requests, where

budget = ⌊(T_slo − Tail_io − TW_i) / T_avg_io⌋

where T_slo is the target SLO on the LC tenant's tail latency, Tail_io is the tail latency of the service time, T_avg_io is the average service time of a request, and TW_i is the queuing time of the first request in the current window W_i.
In one embodiment of the invention, a strategy is dynamically selected for the LC tenant according to three thresholds, namely the window threshold THRESH_WIN, the low threshold THRESH_LOW, and the high threshold THRESH_HIGH, the method further comprising:

obtaining tail-latency information once every THRESH_WIN windows and computing the gap between the target SLO and the obtained tail latency; if the gap exceeds THRESH_HIGH, selecting the conservative strategy; if the gap is below THRESH_LOW, selecting the aggressive strategy; otherwise selecting the SLO-aware strategy.
In one embodiment of the invention, the method further comprises computing and allocating the required number of CPU cores N_i for the LC tenant at the beginning of each window i according to:

N_i = ⌈DR_avg × T_avg_io⌉

where DR_avg is the average dequeue rate required of requests within window W_i and T_avg_io is the average service time of a request, with

DR_avg = QL_i / (T_slo − Tail_io − TW_i)

where QL_i is the number of queued requests in window W_i, T_slo is the target SLO on the LC tenant's tail latency, Tail_io is the tail latency of the service time, and TW_i is the queuing time of the first request in window W_i.
In one embodiment of the invention, the method further comprises:

comparing N_i with the core count N_{i−1} held at the end of window W_{i−1};

if N_i > N_{i−1}, preempting N_i − N_{i−1} cores from those occupied by the BE tenants and allocating them to the LC tenant;

if N_i < N_{i−1}, releasing N_{i−1} − N_i of the LC tenant's CPU cores, the released cores serving the BE tenants;

if N_i = N_{i−1}, leaving the LC tenant's core count unchanged.

In one embodiment of the invention, the CPU resources used by each LC tenant are responsible both for processing requests and for performing CPU core resource regulation.
According to a second aspect of the present invention, a computer-readable storage medium is provided, in which one or more computer programs are stored; when executed, the programs implement the method of the present invention for dynamically regulating resources at the storage backend of a distributed storage system.

According to a third aspect of the present invention, a computing system is provided, comprising:

a storage device and one or more processors;

wherein the storage device stores one or more computer programs which, when executed by the processors, implement the method of the present invention for dynamically regulating resources at the storage backend of a distributed storage system.

Compared with the prior art, the advantage of the present invention is that when multiple LC and BE tenants share the storage backend of a distributed storage system, each LC tenant's target SLO is combined with a window-based real-time load quantification: at the beginning of each window, appropriate CPU resources are computed and allocated so that the latency of the requests in that window meets the target SLO. During the processing of each window, the two anomalies that change CPU core demand and can break the target SLO (load bursts and service-time fluctuations of the underlying storage device) are detected with a concise temporary-window (temp window) mechanism that also computes the change in core demand. Meanwhile, a suitable CPU core allocation strategy is chosen flexibly for different LC tenants, or for the same LC tenant at different stages, to satisfy each tenant's core demand and guarantee its target SLO. In addition, the CPU cores occupied by LC tenants are regulated fully autonomously, avoiding wasted resources; throughout regulation, the remaining CPU core resources process the BE tenants' requests, maximizing BE bandwidth and improving system resource utilization.
Brief Description of the Drawings

The accompanying drawings are incorporated into and constitute a part of this specification; they illustrate embodiments consistent with the invention and, together with the description, explain its principles. Evidently, the drawings described below depict only some embodiments of the invention, and a person of ordinary skill in the art can derive other drawings from them without inventive effort. In the drawings:

FIG. 1 is a schematic diagram of how a request is processed at the storage backend of a distributed storage system according to an embodiment of the invention;

FIG. 2 is a schematic diagram of using a temp window to detect anomalies and reallocate CPU core resources according to an embodiment of the invention;

FIG. 3a compares the tail latency of three LC tenants under different techniques;

FIG. 3b compares the bandwidth of three BE tenants under different techniques.
Detailed Description

In view of the problems raised in the Background, the inventors, after research, propose a method for dynamically regulating resources that is aware of the storage backend's tail-latency SLO. First, the invention rests on the following basic principles: 1) each tenant has a dedicated request queue and dedicated CPU cores, neither shared between tenants; 2) each queue is first-in, first-out: arriving requests are enqueued and served in arrival order, and a request is dequeued (removed from the queue) before it is processed. The adopted request-window-based real-time load quantification divides the access requests in an LC tenant's queue into windows: when the first request in the queue is processed, all requests currently in the queue are regarded as one window; for window W_i, the first request enqueued after the last request of W_i is the first request of window W_{i+1}, with i a positive integer. In the model of the present invention, each LC tenant has exactly one request queue.

For example, when the first request in a queue is processed and the queue currently holds 8 requests, those 8 requests form window W_1. While the 8 requests of W_1 are being processed, further requests keep arriving. Once all 8 requests of W_1 have been dequeued, the first request of the current queue is the first request of window W_2; when it starts being processed, all requests then in the queue constitute W_2. Suppose W_2 holds 9 requests: once those 9 have been dequeued, the first request of the current queue is the first request of window W_3, and when it starts being processed, all requests then in the queue constitute W_3. Subsequent requests likewise form windows W_4, W_5, and so on.
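The window-formation rule in the example above can be sketched as follows; this is a minimal Python illustration with invented request names, not part of the patented system.

```python
from collections import deque

def next_window(queue: deque) -> list:
    """When the first queued request starts processing, every request
    currently in the queue belongs to the new window; later arrivals
    will fall into the following window."""
    window = list(queue)
    queue.clear()   # the window's requests dequeue as they are processed
    return window

q = deque(f"r{k}" for k in range(8))   # 8 requests waiting in the queue
w1 = next_window(q)                    # they form window W1
q.extend(f"s{k}" for k in range(9))    # 9 arrivals while W1 is processed
w2 = next_window(q)                    # they form window W2
```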
The present invention designs the CPU core computation and allocation method and the anomaly detection method, with the following improvements:
(1) Computing the number of CPU cores to allocate from the target SLO requirement using Little's Law
The processing of a request at the storage backend of a distributed storage system is shown in Fig. 1. Treating the dashed box in Fig. 1 as a request-processing system and applying the classical Little's Law, L = λW (where L is the request-processing concurrency, λ is the average arrival rate of requests, and W is the average processing time of a request): here L corresponds to the number of CPU cores (N), λ to the average dequeue rate of requests (DR_avg), and W to the average service time of a request (T_avg_io). The LC tenant's requests are partitioned into windows, and each window records the tuple {QL_i, TW_i}, where QL_i is the number of queued requests in window W_i and TW_i is the queueing time of the first request of W_i, i.e. the interval from when that request entered the queue until it started being processed (its dequeue time minus its enqueue time). To guarantee that the LC tenant's tail latency meets the target SLO requirement (T_slo), all requests in window W_i must be dequeued within T_slo - Tail_io - TW_i, where Tail_io is the tail latency of the service time of the underlying device. Therefore, the average dequeue rate of the requests in window W_i, DR_i^avg, can be computed as:

    DR_i^avg = QL_i / (T_slo - Tail_io - TW_i)        (Formula 1)

As long as the requests in the window are dequeued at this rate, the latency of every request in the window stays within T_slo, i.e. the target SLO requirement is met. Finally, combining this with Little's Law, the number of cores that lets the requests in window W_i meet the target SLO can be computed as:

    N_i = DR_i^avg × T_avg_io                         (Formula 2)

Here the average request service time T_avg_io and the service-time tail latency Tail_io can both be obtained in real time while the system runs. As the LC tenant's windows change, at the start of each window the required number of CPU cores N_i is computed and allocated for the LC tenant, and the remaining CPU cores are used by BE tenants to maximize resource utilization.
The technical effect of this improvement is that, window by window, an accurate amount of CPU cores is computed and allocated for each LC tenant to guarantee its own target SLO, while the remaining cores process BE tenant requests, maximizing BE tenant bandwidth and improving system resource utilization.
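Formulas 1 and 2 reduce to a few arithmetic operations at each window start. A minimal Python sketch (the function name and the numeric values are illustrative, not taken from the patent):

```python
import math

def cores_for_window(ql: int, tw: float, t_slo: float,
                     tail_io: float, t_avg_io: float) -> int:
    """Cores needed so a window of ql requests drains within its SLO slack.

    Implements Formula 1 (required dequeue rate) and Formula 2
    (Little's Law, L = lambda * W), rounded up to whole cores.
    """
    slack = t_slo - tail_io - tw          # time left to drain the window
    if slack <= 0:
        raise ValueError("window can no longer meet its SLO")
    dr_avg = ql / slack                   # Formula 1: required dequeue rate
    return math.ceil(dr_avg * t_avg_io)   # Formula 2

# 50 queued requests, 4 ms SLO, 1 ms device tail latency, first request
# already queued for 1 ms, 0.1 ms average service time (illustrative values):
n_i = cores_for_window(ql=50, tw=0.001, t_slo=0.004,
                       tail_io=0.001, t_avg_io=0.0001)
print(n_i)  # 3
```

Rounding up reflects that a fractional core cannot be allocated; a longer queue or a smaller slack directly raises the core demand.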
(2) Detecting in-window load bursts and fluctuations in the service time of the underlying storage device, and reallocating CPU cores
Whether the load exhibits a burst of request accesses or the service time of the underlying storage device fluctuates, the demand for CPU cores changes; if that demand is not met in time, the LC tenant's target SLO cannot be guaranteed. Analysis shows that a large burst of requests from an LC tenant makes its queue grow sharply, i.e. the queue becomes longer; a fluctuation in the service time of the underlying storage device keeps the CPU core serving that request occupied for a long time, so it cannot promptly process the subsequent requests in the queue, which likewise lengthens the LC tenant's request queue. If CPU cores are not added promptly on demand, the requests need longer queueing times and the LC tenant's target SLO requirement cannot be met. Because both load bursts and fluctuations in underlying storage service time lengthen the request queue, a method is proposed that detects the occurrence of anomalies using a temporary window (temp window), as shown in Fig. 2. At some moment during the processing of window W_i, all requests currently in the queue are defined as a temp window; then a CPU core computation similar to the one in improvement (1) above is used to compute the temp window's demand for CPU cores (N_t). At detection time, some requests of window W_i may already have been dequeued, while after the detection requests may still enter window W_{i+1}; the temp window therefore comprises the not-yet-dequeued requests of W_i and the already-enqueued requests of W_{i+1}. Let QL_t denote the number of requests in the temp window and TW_t the queueing time of its first request. The average dequeue rate of the requests in the temp window, DR_t^avg, can then be computed with Formula 3:

    DR_t^avg = QL_t / (T_slo - Tail_io - TW_t)        (Formula 3)

and the number of CPU cores the temp window needs, N_t, with Formula 4:

    N_t = DR_t^avg × T_avg_io                         (Formula 4)

If the computed core count exceeds the number of cores the current window is using (N_t > N_i), an anomaly has occurred (a load burst or a service-time fluctuation); in that case N_t - N_i CPU cores are preempted from those occupied by the BE tenants and added to the LC tenant, so as to handle the anomaly and keep the LC tenant's target SLO satisfied.
The technical effect of this improvement is that this concise temp-window-based detection can quickly detect both kinds of anomalies that may occur within a window, namely changes in CPU core demand caused by load bursts or by fluctuations in the service time of the underlying storage device, and can quickly and accurately add CPU cores so that the anomaly does not cause the target SLO to be violated.
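The temp-window check reuses the same per-window computation over the live queue. A sketch, with hypothetical function names and illustrative values:

```python
import math

def temp_window_cores(ql_t: int, tw_t: float, t_slo: float,
                      tail_io: float, t_avg_io: float) -> int:
    """N_t for a temp window (Formulas 3 and 4, same shape as Formulas 1 and 2)."""
    return math.ceil(ql_t / (t_slo - tail_io - tw_t) * t_avg_io)

def cores_to_preempt(n_i: int, ql_t: int, tw_t: float, t_slo: float,
                     tail_io: float, t_avg_io: float) -> int:
    """How many cores to take from the BE tenants; 0 means no anomaly."""
    n_t = temp_window_cores(ql_t, tw_t, t_slo, tail_io, t_avg_io)
    return max(0, n_t - n_i)  # N_t > N_i signals a burst or service-time jitter

# Mid-window, a burst has grown the queue to 110 requests while the window
# holds 3 cores: the temp window now needs 6 cores, so 3 are preempted.
extra = cores_to_preempt(n_i=3, ql_t=110, tw_t=0.001,
                         t_slo=0.004, tail_io=0.001, t_avg_io=0.0001)
print(extra)  # 3
```

The same check covers both anomaly types because both manifest as a longer queue, exactly as the analysis above observes.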
(3) CPU core allocation policies that detect anomalies at different frequencies
The anomaly detection proposed in (2) above need not run constantly: anomalies do not occur all the time, and overly frequent detection adds overhead. Moreover, different LC tenants have different target SLO requirements, and even the same LC tenant needs different amounts of CPU cores in different phases. Three CPU core allocation policies that detect anomalies at different frequencies are therefore proposed. The first is a conservative policy, which recomputes and reallocates CPU cores only at the start of a window. The second is an aggressive policy, which checks with a temp window whether the CPU core demand has changed every time a request in the window is dequeued. The third is an SLO-aware policy, which determines the check frequency from an anomaly-detection frequency budget derived from the SLO requirement, i.e. a temp-window check is performed every budget dequeued requests, where budget is computed as:

    budget = (T_slo - Tail_io - TW_i) / T_avg_io

In addition, based on three thresholds (the window threshold THRESH_WIN, the low threshold THRESH_LOW, and the high threshold THRESH_HIGH), a suitable CPU core allocation policy can be selected dynamically for each LC tenant. Because a high-percentile tail latency is only meaningful when computed over sufficiently many requests, the tail latency is sampled once every THRESH_WIN windows, and the gap between the target SLO requirement and the sampled tail latency is computed. If the gap exceeds THRESH_HIGH, the target SLO tail latency is being met, so the conservative policy is chosen, maximizing BE tenant bandwidth while still meeting the SLO. If the gap is below THRESH_LOW, the target SLO is likely to be violated, so the aggressive policy is chosen so that an SLO violation can be handled quickly at any moment. Otherwise the SLO-aware policy is chosen, and each LC tenant sets its budget dynamically according to its own target SLO to monitor anomalies and regulate CPU cores. The three thresholds can be set according to the actual situation.
The technical effect of this improvement is that, according to the gap between the LC tenant's target SLO and the measured tail latency, a suitable CPU core allocation policy (i.e. a different anomaly-detection frequency) is selected dynamically, anomalies are detected at appropriate times, and CPU cores are recomputed and reallocated, maximizing BE tenant bandwidth while keeping the LC tenants' target SLOs satisfied.
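The threshold-driven policy choice can be sketched as a small decision function. The names and threshold values below are placeholders, not values prescribed by the patent:

```python
def choose_policy(t_slo: float, tail: float,
                  thresh_low: float, thresh_high: float):
    """Map the SLO-vs-measured-tail gap to a policy and detection budget.

    budget 0: check only at window start (conservative);
    budget 1: check after every dequeued request (aggressive);
    None: SLO-aware, budget recomputed per window from
          T_slo, Tail_io, TW_i and T_avg_io.
    """
    gap = t_slo - tail
    if gap > thresh_high:
        return "conservative", 0
    if gap < thresh_low:
        return "aggressive", 1
    return "slo-aware", None

def slo_aware_budget(t_slo: float, tail_io: float,
                     t_avg_io: float, tw_i: float) -> int:
    # Slack of the current window divided by per-request service time:
    # roughly how many requests can be dequeued before the SLO is at risk.
    return max(1, round((t_slo - tail_io - tw_i) / t_avg_io))

print(choose_policy(0.004, 0.0015, 0.0005, 0.002))  # ('conservative', 0)
print(choose_policy(0.004, 0.0038, 0.0005, 0.002))  # ('aggressive', 1)
print(choose_policy(0.004, 0.0030, 0.0005, 0.002))  # ('slo-aware', None)
```

A wide gap means the SLO is comfortably met and checks can be sparse; a narrow gap means a violation is imminent and every request warrants a check.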
(4) Self-governing CPU core regulation
None of the regulation processes (1), (2), and (3) above requires an extra core dedicated to regulating CPU core resources. The CPU cores used by an LC tenant regulate core resources in a fully self-governing manner, mainly because they can obtain all the information involved in regulation, including the LC tenant's queue state, window state, the number of occupied cores and which cores they are, the core allocation policy currently in use, and so on. Based on this global information, the CPU cores can monitor the queue and regulate CPU cores on demand, thereby avoiding the waste of resources that would result from having to use extra cores for regulation.
A specific embodiment of the present invention follows.
This embodiment concerns a scenario in which multiple LC tenants and multiple BE tenants share the storage backend of a distributed storage system. CPU cores are regulated dynamically in real time according to each LC tenant's target SLO requirement to guarantee each LC tenant's target, while the remaining CPU resources serve BE tenant requests to improve system resource utilization. Because the CPU cores used by each LC tenant can obtain that tenant's queue information, window information, and number of occupied CPU cores in real time, they can carry out the regulation described above; that is, the CPU resources used by each LC tenant both process requests and perform CPU core regulation, so no extra cores need be dedicated to regulation and no resources are wasted. The implementation of the technical solution of the present invention is described taking two LC tenants (LC1 and LC2) and two BE tenants (BE1 and BE2) sharing the storage backend of a distributed storage system as an example.
When the requests of tenants LC1 and LC2 begin to access the storage backend, request windows are created for LC1 and LC2 according to the window-determination method. At the start of each window, the window information is recorded and, combined with the respective target SLO requirement, the formula

    N_i = DR_i^avg × T_avg_io

is used to compute the number of CPU cores each tenant needs to meet its target SLO (say N_1 and N_2); N_1 CPU cores are allocated to tenant LC1 and N_2 CPU cores to tenant LC2. If the total number of CPU cores in the system is N, the remaining N - N_1 - N_2 cores serve tenants BE1 and BE2, the two BE tenants sharing them in round-robin fashion. During subsequent request processing, as the windows change, if the core counts of tenants LC1 and LC2 change (being regulated to meet the target SLOs), the number of cores serving the BE tenants is adjusted accordingly.
Taking window W_i (the i-th window) of tenant LC1 as an example, the adjustment of the CPU core allocation policy and of the core count within a window is described; tenant LC2 undergoes a similar regulation process. Assume LC1's initial core allocation policy is the aggressive policy (the initial policy can be flexibly set to any of the three).
Step A: at the start of the window, compute and allocate the number of CPU cores needed to guarantee the target SLO, and at appropriate times select a suitable CPU core allocation policy for the tenant;
Step A10: compute the number of cores window W_i needs (N_i) and compare it with the number currently occupied (N_{i-1}, i.e. the cores held at the end of window W_{i-1}). If N_i > N_{i-1}, preempt N_i - N_{i-1} cores from those occupied by the BE tenants and allocate them to the LC tenant; if N_i < N_{i-1}, take N_{i-1} - N_i cores away from the LC tenant, the released cores serving the BE tenants; if N_i = N_{i-1}, no adjustment of the LC tenant's core count is needed.
Step A20: if the window index i is an integer multiple of THRESH_WIN, compute the LC tenant's historical tail latency Tail and compare it with its target SLO (T_slo), selecting a core allocation policy according to the following three cases:
a) if T_slo - Tail > THRESH_HIGH, choose the conservative policy and set the in-window anomaly-detection frequency (budget) to 0, meaning the policy recomputes the core demand only at the start of a window;
b) if T_slo - Tail < THRESH_LOW, choose the aggressive policy and set budget to 1, meaning an anomaly check is performed after every processed request;
c) otherwise, choose the SLO-aware policy and set budget to:

    budget = (T_slo - Tail_io - TW_i) / T_avg_io

This frequency depends on the target SLO and on the window information, meaning the policy checks for anomalies after every budget requests.
Step B: while the requests in the window are being processed, if the number of requests already processed in the window is an integer multiple of the configured detection frequency (budget), use a temp window to check whether the CPU core demand has changed. Concretely: all requests currently in the queue belong to the temp window; the number of cores the temp window needs (N_t) is computed and compared with the number currently occupied (N_i). If N_t > N_i, the core demand has changed, i.e. an anomaly has occurred (a load burst or a fluctuation in underlying storage service time). To keep the LC tenant's target SLO satisfied despite the anomaly, N_t - N_i cores are preempted from those occupied by the BE tenants and allocated to the LC tenant.
Step C: when the last request of the window has been processed, if the tenant's queue is empty, the subsequent window is an empty window. If the tenant has not deregistered from the storage backend, one CPU core is reserved for it and the remaining cores serve the BE tenants; if the tenant has deregistered, all its cores are released to serve the BE tenants. If the queue is not empty, a new window is created and the required CPU cores are allocated to it, and regulation continues for that window as in Steps A and B above.
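For illustration only, Steps A10 and B can be condensed into one per-window control loop. The sketch below is a self-contained Python toy in which the tenant class, the fixed queueing time, and the arrival list are hypothetical simplifications (core handoff to and from the BE tenants is reduced to a counter), not the patent's implementation:

```python
from dataclasses import dataclass
import math

@dataclass
class LCTenant:
    """Minimal stand-in for an LC tenant; names are illustrative."""
    t_slo: float
    tail_io: float
    t_avg_io: float
    cores: int = 0
    budget: int = 1          # aggressive by default: check every request

    def demand(self, ql: int, tw: float) -> int:
        # Formulas 1/2 (window start) and 3/4 (temp window) share this shape.
        return math.ceil(ql / (self.t_slo - self.tail_io - tw) * self.t_avg_io)

def run_window(lc: LCTenant, be_cores: int, window: list, arrivals: list) -> int:
    """Steps A10 and B for one window of one LC tenant.

    The queueing time of the first request is fixed at 1 ms for simplicity.
    """
    tw = 0.001
    n_i = lc.demand(len(window), tw)       # Step A10: size cores at window start
    be_cores -= n_i - lc.cores             # preempt from (or release to) BE tenants
    lc.cores = n_i
    for done in range(1, len(window) + 1):
        # Temp window = not-yet-dequeued part of W_i plus requests for W_{i+1}.
        temp = window[done:] + arrivals
        if lc.budget and done % lc.budget == 0:   # Step B: periodic check
            n_t = lc.demand(len(temp), tw)
            if n_t > lc.cores:                    # burst or service-time jitter
                be_cores -= n_t - lc.cores
                lc.cores = n_t
    return be_cores

lc = LCTenant(t_slo=0.004, tail_io=0.001, t_avg_io=0.0001)
# 50 requests queued at window start; 60 more arrive during the window:
be_left = run_window(lc, be_cores=8, window=list(range(50)), arrivals=list(range(60)))
print(lc.cores, be_left)  # 6 2
```

In this toy run the window starts with 3 cores; the arrivals lengthen the queue, the first temp-window check raises the demand to 6, and the 3 extra cores come out of the BE pool, mirroring Step B.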
The following compares the present invention with the prior-art techniques Shenango and Cake by example.
A scenario in which three LC tenants and three BE tenants share the storage backend of a distributed storage system was tested; the results are shown in Fig. 3.
As the test results in Fig. 3 show, when three LC tenants with different target SLO requirements (all Webserver, with target SLOs (99.9th-percentile tail latency) of 4 ms / 5.5 ms / 7 ms, shown by the dashed lines in Fig. 3(a)) and three BE tenants (whose bandwidths are shown in Fig. 3(b)) share the storage backend of a distributed storage system, the method of the present invention (labeled QWin in the figures) meets the different target SLOs of all three LC tenants simultaneously, while the bandwidth of the BE tenants improves by 2x to 28x.
The above description is provided to enable any person of ordinary skill in the art to implement or use the present disclosure. Various modifications to the disclosure will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other variations without departing from the spirit or scope of the disclosure. Furthermore, unless stated otherwise, all or part of any aspect and/or embodiment may be used with all or part of any other aspect and/or embodiment. Therefore, the disclosure is not limited to the examples and designs described herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (11)

  1. A method for dynamically regulating resources for the storage backend of a distributed storage system, wherein multiple LC tenants share the storage backend, each LC tenant has one request queue and a number N_i of CPU cores for that request queue, the access requests in each request queue are partitioned into windows, and N_i is the number of CPU cores allocated to window i, the method comprising:
    Step 100: taking all current requests in each LC tenant's request queue as a temporary window;
    Step 200: obtaining the number of requests QL_t of each temporary window and the queueing time TW_t of the first request of that temporary window;
    Step 300: determining, based on QL_t and TW_t, the number N_t of CPU cores the current requests need;
    Step 400: adjusting the LC tenant's CPU core count according to the needed core count N_t and the core count N_i of the current window.
  2. The method according to claim 1, wherein Step 300 comprises:
    computing the number N_t of CPU cores the temporary window needs with the formula:
    N_t = DR_t^avg × T_avg_io
    where T_avg_io is the average service time of a request and DR_t^avg is the average dequeue rate of the requests in the temporary window, computed with the formula:
    DR_t^avg = QL_t / (T_slo - Tail_io - TW_t)
    where QL_t denotes the number of requests in the temporary window, TW_t the queueing time of the first request in the temporary window, T_slo the target SLO requirement on the LC tenant's tail latency, and Tail_io the service-time tail latency.
  3. The method according to claim 1, wherein the distributed storage system further comprises BE tenants, and Step 400 comprises:
    if N_t > N_i, preempting N_t - N_i CPU cores from those occupied by the BE tenants and adding N_t - N_i CPU cores to the LC tenant.
  4. The method according to claim 1, wherein dynamic resource regulation is performed with one of the following policies:
    a conservative policy, which detects and reallocates CPU cores only at the start of a window;
    an aggressive policy, which performs dynamic resource regulation every time a request in the window is dequeued; and
    an SLO-aware policy, which, according to the SLO requirement, uses a temporary window to detect and reallocate CPU cores every one or more dequeued requests.
  5. The method according to claim 4, wherein, in the SLO-aware policy, a temporary window is used to detect and reallocate CPU cores every budget dequeued requests, where
    budget = (T_slo - Tail_io - TW_i) / T_avg_io
    and T_slo is the target SLO requirement on the LC tenant's tail latency, Tail_io the service-time tail latency, T_avg_io the average service time of a request, and TW_i the queueing time of the first request in the current window W_i.
  6. The method according to claim 4, wherein a policy is selected dynamically for an LC tenant according to three thresholds, a window threshold THRESH_WIN, a low threshold THRESH_LOW, and a high threshold THRESH_HIGH, the method further comprising:
    obtaining tail-latency information once every THRESH_WIN windows and computing the gap between the target SLO requirement and the obtained tail latency; if the gap exceeds THRESH_HIGH, selecting the conservative policy; if the gap is below THRESH_LOW, selecting the aggressive policy; otherwise, selecting the SLO-aware policy.
  7. The method according to claim 1, further comprising computing and allocating, at the start of each window i, the number N_i of CPU cores the LC tenant needs according to:
    N_i = DR_i^avg × T_avg_io
    where DR_i^avg is the average dequeue rate of the requests in window W_i and T_avg_io is the average service time of a request, with
    DR_i^avg = QL_i / (T_slo - Tail_io - TW_i)
    where QL_i is the number of queued requests in window W_i, T_slo the target SLO requirement on the LC tenant's tail latency, Tail_io the service-time tail latency, and TW_i the queueing time of the first request in window W_i.
  8. The method according to claim 7, further comprising:
    comparing N_i with the number N_{i-1} of CPU cores held at the end of window W_{i-1};
    if N_i > N_{i-1}, preempting N_i - N_{i-1} cores from those occupied by the BE tenants and allocating them to the LC tenant;
    if N_i < N_{i-1}, taking N_{i-1} - N_i CPU cores away from the LC tenant, the released cores serving the BE tenants;
    if N_i = N_{i-1}, leaving the LC tenant's CPU core count unadjusted.
  9. The method according to claim 1, wherein the CPU resources used by each LC tenant both process requests and perform CPU core regulation.
  10. A computer-readable storage medium storing one or more computer programs which, when executed, implement the method of any one of claims 1-9.
  11. A computing system, comprising:
    a storage device and one or more processors;
    wherein the storage device stores one or more computer programs which, when executed by the processors, implement the method of any one of claims 1-9.
PCT/CN2021/100821 2021-04-14 2021-06-18 Method and system for dynamically regulating resources aware of storage-backend tail latency SLO WO2022217739A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110399392.3A CN113127230B (zh) 2021-04-14 2021-04-14 Method and system for dynamically regulating resources aware of storage-backend tail latency SLO
CN202110399392.3 2021-04-14

Publications (1)

Publication Number Publication Date
WO2022217739A1 true WO2022217739A1 (zh) 2022-10-20

Family

ID=76776333

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/100821 WO2022217739A1 (zh) 2021-04-14 2021-06-18 Method and system for dynamically regulating resources aware of storage-backend tail latency SLO

Country Status (2)

Country Link
CN (1) CN113127230B (zh)
WO (1) WO2022217739A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115033477B (zh) * 2022-06-08 2023-06-27 山东省计算中心(国家超级计算济南中心) Proactive detection and handling method and system for performance anomalies in large-scale microservices
CN116467068A (zh) * 2023-03-14 2023-07-21 浙江大学 Resource scheduling method, device, and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5787482A (en) * 1995-07-31 1998-07-28 Hewlett-Packard Company Deadline driven disk scheduler method and apparatus with thresholded most urgent request queue scan window
CN104679593A (zh) * 2015-03-13 2015-06-03 浪潮集团有限公司 Task scheduling optimization method based on an SMP system
CN109947619A (zh) * 2019-03-05 2019-06-28 上海交通大学 Multi-resource management system and server for improving throughput based on quality-of-service awareness
CN111444012A (zh) * 2020-03-03 2020-07-24 中国科学院计算技术研究所 Method and system for dynamically regulating resources to guarantee the latency SLO of latency-sensitive applications
CN112165508A (zh) * 2020-08-24 2021-01-01 北京大学 Resource allocation method for request services in multi-tenant distributed storage
CN112463044A (zh) * 2020-11-23 2021-03-09 中国科学院计算技术研究所 Method and system for guaranteeing server-side read tail latency in a distributed storage system

Also Published As

Publication number Publication date
CN113127230A (zh) 2021-07-16
CN113127230B (zh) 2023-10-03

Similar Documents

Publication Publication Date Title
WO2021174735A1 (zh) Method and system for dynamically regulating resources to guarantee the latency SLO of latency-sensitive applications
Tavakkol et al. FLIN: Enabling fairness and enhancing performance in modern NVMe solid state drives
US10185592B2 (en) Network storage device using dynamic weights based on resource utilization
WO2022217739A1 (zh) Method and system for dynamically regulating resources aware of storage-backend tail latency SLO
Delgado et al. Kairos: Preemptive data center scheduling without runtime estimates
EP2382554B1 (en) System and methods for allocating shared storage resources
US9201816B2 (en) Data processing apparatus and a method for setting priority levels for transactions
US8397236B2 (en) Credit based performance managment of computer systems
US8667493B2 (en) Memory-controller-parallelism-aware scheduling for multiple memory controllers
US20080271030A1 (en) Kernel-Based Workload Management
US8522244B2 (en) Method and apparatus for scheduling for multiple memory controllers
US8799913B2 (en) Computing system, method and computer-readable medium for managing a processing of tasks
US9244733B2 (en) Apparatus and method for scheduling kernel execution order
US9104482B2 (en) Differentiated storage QoS
US10908955B2 (en) Systems and methods for variable rate limiting of shared resource access
US9262093B1 (en) Prioritized rate scheduler for a storage system
US11220688B2 (en) Oversubscription scheduling
US10942850B2 (en) Performance telemetry aided processing scheme
Tang et al. Toward balanced and sustainable job scheduling for production supercomputers
CN114564300A (zh) Method for dynamically allocating memory bandwidth
Ma et al. Qwin: Core allocation for enforcing differentiated tail latency slos at shared storage backend
Kambatla et al. UBIS: Utilization-aware cluster scheduling
CN111208943B (zh) IO pressure scheduling system for a storage system
CN101661406A (zh) Processing unit scheduling apparatus and method
US11921648B1 (en) Statistic-based adaptive polling driver

Legal Events

Code | Description
121 | Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 21936611; Country of ref document: EP; Kind code of ref document: A1)
NENP | Non-entry into the national phase (Ref country code: DE)
122 | Ep: pct application non-entry in european phase (Ref document number: 21936611; Country of ref document: EP; Kind code of ref document: A1)