CN117591001A - Balancing storage device throughput and fairness in a multi-client environment

Info

Publication number: CN117591001A
Application number: CN202310646776.XA (application filed by Google LLC)
Authority: CN (China)
Legal status: Pending
Prior art keywords: commit, requests, priority, client, client computers
Inventors: 桑杰夫·纳拉因·堤里迦, 克里斯托弗·萨博尔
Original and Current Assignee: Google LLC
Other languages: Chinese (zh)
Priority claimed from: US 18/078,605 (published as US20240095074A1)

Classifications

    • G06F3/0607: Improving or facilitating administration, e.g. storage management, by facilitating the process of upgrading existing storage systems, e.g. for improving compatibility between host and storage device
    • G06F3/0613: Improving I/O performance in relation to throughput
    • G06F3/0634: Configuration or reconfiguration of storage systems by changing the state or mode of one or more devices
    • G06F3/0653: Monitoring storage devices or systems
    • G06F3/0658: Controller construction arrangements
    • G06F3/0659: Command handling arrangements, e.g. command buffers, queues, command scheduling
    • G06F3/0679: Non-volatile semiconductor memory device, e.g. flash memory, one time programmable memory [OTP]
    • G06F2206/1012: Load balancing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

In a computer system in which multiple client computers share a storage device, the commit priority of the computers' input-output commands is adjusted when one or more of the client computers exceeds its usage quota. The commit priority of the client computers that exceed their quotas is reduced relative to the commit priority of the client computers that do not exceed their quotas. This allows the processing capacity of the storage device to be fully utilized while minimizing the unfairness and added latency that would otherwise be experienced by the other client computers.

Description

Balancing storage device throughput and fairness in a multi-client environment
Cross Reference to Related Applications
The present application claims the benefit of the filing date of U.S. provisional patent application No. 63/399,333, filed August 19, 2022, the disclosure of which is incorporated herein by reference.
Background
The present disclosure relates to the operation of storage devices, such as solid state disk drives ("SSDs") and conventional disk drives, that serve multiple client computers. For example, in a cloud computing environment, a single storage device may store data for multiple virtual machines operating in a data center. The storage capacity of a device is typically shared among clients by allocating a portion of the storage space to each client. For example, if an SSD has a storage capacity of 4 TB and is shared by four clients, each client may be allocated a share such that the shares total 4 TB.
Storage devices also have a limited processing load capacity, i.e., a limited capacity to handle input and output requests ("IOs"), such as requests to read data from and write data to the device. Heretofore, two arrangements have been used to allocate the processing capacity of storage devices.
In a "performance throttling" arrangement, each client is allocated a portion of the processing load capacity of the device, and IO streams from each client to the device are limited so that the streams do not exceed the allocated portion. For example, if an SSD has a processing load capacity of 1 million IOs per second ("IOPS") and the load capacity is shared equally by 4 clients, each client is allocated 250,000IOPS and the number of flows from each client is limited. Because no clients can exceed their share of allocated load capacity, no clients experience a decrease in persistent performance caused by the demands imposed on the storage device by other clients. However, this method cannot fully utilize the load capacity when the request streams from various clients fluctuate. In the example discussed above, a first one of the clients may require 800,000IOPS, while the other clients are each 10,000IOPS only. In this case, the first client is unnecessarily throttled down while a large portion of the load capacity of the storage device remains unused.
In a "work share" arrangement, each client is allowed to send an unlimited flow of requests as long as the total flow remains below the processing load capacity of the storage device. This provides for an efficient use of the storage device such that the total workload imposed by all clients together is faster than is performed in the performance throttling approach. However, when another client sends a request at a high rate, the client sending the request at a low rate will experience a longer delay. In other words, clients with low processing load are treated unfairly by the storage device.
Disclosure of Invention
One aspect of the present technology provides a method of operation that promotes fairness to clients while allowing full utilization of the performance of the storage hardware. Another aspect of the present technology provides a computer system offering similar benefits.
According to one aspect of the disclosure, a method of processing requests sent by a plurality of client computers to a shared storage device having a processing load capacity includes operating the storage device to fulfill requests at different rates, such that requests with higher commit priorities are fulfilled at a greater rate than requests with lower commit priorities, and monitoring a measure of the processing load represented by the requests sent by each client computer. When the measure of load for a first set of client computers is higher than the processing load quota for those computers, and the measure of load for a second set of client computers is less than or equal to the processing load quota for those computers, commit priorities are assigned to the requests according to a modified allocation scheme such that, compared to the original priority scheme, the commit priority of at least some requests from the client computers in the first set is reduced relative to the commit priority of requests from the client computers in the second set.
In the modified allocation scheme, at least some of the requests from the client computers in the first group may have a lower commit priority than that provided in the original allocation scheme, and the requests from the client computers in the second group may have the same commit priority as that provided in the original allocation scheme.
Requests from the first group of client computers may be throttled when the sum of the load metric values for all client computers exceeds a total load threshold. When the measure of load for every client computer is less than or equal to that computer's processing load quota, commit priorities may be assigned to the requests according to the original priority allocation scheme.
In some examples, operating the storage device to fulfill requests at different rates may include maintaining a plurality of commit queues, each commit queue having a commit priority, and assigning commit priorities to requests may include directing the requests to the commit queues. In some examples, each commit queue may have a weighted round-robin coefficient, and the method may include retrieving requests from the commit queues for fulfillment using a cyclic weighted round-robin process, such that the number of requests retrieved for fulfillment during each cycle of the process is directly related to the weighted round-robin coefficient of the commit queue. In some examples, the same set of commit queues is used in the original and in the modified priority allocation schemes, the method including changing the commit priority of at least one of the commit queues to change from the original allocation scheme to the modified allocation scheme. According to some examples, each client computer may send requests to one or more client queues associated with that client computer, and directing the requests to the commit queues may include directing the requests from each client queue to a corresponding one of the commit queues. Fulfillment may include directing completion commands from the storage device into a set of completion queues such that completion commands generated when fulfilling requests taken from a given commit queue are directed into the completion queue corresponding to that commit queue, whereby the completion commands for requests from a given input queue will be directed into the completion queue corresponding to that input queue.
According to some examples, the requests may be input/output (IO) requests.
According to another aspect of the disclosure, a computer system may include a storage device and a flow controller. The flow controller may be arranged to monitor a measure of the processing load represented by the requests sent by each of a plurality of client computers. When the measure of load for a first set of client computers is higher than the processing load quota for those computers, and the measure of load for a second set of client computers is less than or equal to the processing load quota for those computers, commit priorities may be assigned to the requests according to a modified allocation scheme such that, compared to the original priority scheme, the commit priority of at least some of the requests from the client computers in the first set is reduced relative to the commit priority of the requests from the client computers in the second set. The requests may be directed to the storage device such that requests with higher commit priorities are fulfilled at a greater rate than requests with lower commit priorities.
According to some examples, the computer system may further comprise a set of commit queues, each commit queue having an associated commit priority, and a sampler arranged to take requests from the queues for fulfillment by the storage device at rates directly related to the commit priorities associated with the queues, the flow controller being operable to assign commit priorities to requests by directing the requests to the commit queues. The sampler may be, for example, a weighted round-robin sampler, in which case the commit priority associated with each queue is the weighted round-robin coefficient for that queue. The flow controller may be operable to change the commit priority associated with at least one commit queue to change from the original allocation scheme to the modified allocation scheme.
When the load metric value for every client computer is less than or equal to that computer's processing load quota, the flow controller may be operable to assign commit priorities to the requests according to the original priority allocation scheme. The flow controller may be operable to throttle requests from the first set of client computers when the sum of the load metric values of all client computers exceeds a total load threshold.
According to another aspect of the disclosure, a non-transitory computer-readable medium stores instructions executable by one or more processors for performing a method of processing requests sent by a plurality of client computers to a shared storage device having a processing load capacity. Such a method may include operating the storage device to fulfill requests at different rates, such that requests with higher commit priorities are fulfilled at a greater rate than requests with lower commit priorities, and monitoring a measure of the processing load represented by the requests sent by each client computer. When the measure of load for a first set of client computers is higher than the processing load quota for those computers, and the measure of load for a second set of client computers is less than or equal to the processing load quota for those computers, the requests are assigned commit priorities according to a modified allocation scheme such that, compared to the original priority allocation scheme, the commit priority of at least some of the requests from the client computers in the first set is reduced relative to the commit priority of requests from the client computers in the second set.
In the modified allocation scheme, at least some of the requests from the client computers in the first group may have a lower commit priority than that provided in the original allocation scheme, and the requests from the client computers in the second group may have the same commit priority as that provided in the original allocation scheme.
The instructions may also include throttling requests from the first set of client computers when the sum of the load metric values of all client computers exceeds a total load threshold.
When the measure of load for every client computer is less than or equal to that computer's processing load quota, commit priorities may be assigned to the requests according to the original priority allocation scheme.
Brief description of the drawings
FIG. 1 is a schematic diagram of an apparatus used in one example of the present disclosure, under a first operating condition.
FIG. 2 is a flow chart depicting certain operations in an example of the present technology.
FIG. 3 is a schematic diagram of the apparatus of FIG. 1 in a second operating condition.
FIG. 4 is a schematic diagram of an apparatus in accordance with another example of the present technology.
FIG. 5 is an exemplary block diagram of a computing environment in accordance with aspects of the present disclosure.
Detailed Description
An example of the present technology is implemented in the apparatus shown in FIG. 1. The apparatus includes four client computers 20a-20d, a solid state disk drive or "SSD" 22, and a flow controller 24. These elements are connected to one another through a network (not shown). For example, where the client computers are virtual machines operating in hosts within a data center, the network may be a local network within the data center, and the network connections may be established under the control of a management software program commonly referred to as a "hypervisor." The client computers issue IO commands, such as "read" or "write" commands. Because IO commands request actions by the SSD, these commands are referred to herein as "requests." SSD 22 issues a completion command for each request, indicating the outcome of the request, e.g., whether the request was executed successfully.
The network is configured such that requests from client computers 20a-20d to SSD 22 are routed to the SSD through flow controller 24, and such that completion commands from the SSD are routed to the client computers through the flow controller. Each client computer sends requests and receives completion commands through a plurality of client queue pairs 26, which are associated with that computer and accessible by the flow controller 24. Each queue pair 26 includes a request queue 28 and a completion queue 30. In FIG. 1, the client queue pairs are denoted by ordinal numbers 0-11. The queue pairs numbered 0, 1, and 2 are associated with client computer 20a; the queue pairs numbered 3, 4, and 5 with client computer 20b; and so on. Each client routes its requests to the request queues of its associated queue pairs 26 according to a client-assigned priority. For example, client 20a routes high priority requests to the request queue 28 of pair 0, medium priority requests to the request queue of pair 1, and low priority requests to the request queue of pair 2. Each client computer receives the completion commands for requests routed through the request queue of a given pair via the completion queue of that same pair. For example, client 20a will receive completion commands for its high priority requests from the completion queue of pair 0. For each client computer, these aspects of operation may be the same as the operation of a computer communicating with a dedicated storage device according to the NVMe standard.
The SSD 22 receives requests and sends completion commands via a set of storage queue pairs 32 accessible to the SSD. Each storage queue pair 32 includes a commit queue 34, which receives incoming requests and feeds them to the SSD for fulfillment, and a completion queue, which receives completion commands from the SSD and directs them, via the flow controller, to the completion queue of a client queue pair 26. The storage queue pairs 32 are denoted in FIG. 1 by ordinals 0-11.
SSD 22 includes a memory 38 and a fulfillment processor 40. The fulfillment processor 40 responds to each incoming request by performing the operations necessary to read data from, or write data to, the location within memory 38 specified in the command, generates an appropriate completion command, and routes the completion command for each request to the completion queue of the same pair 32 that supplied the request. For example, the completion command for a request from the commit queue of the pair 32 with ordinal 4 will be routed to the completion queue of that same pair.
SSD 22 also includes a weighted round-robin ("WRR") sampler 42. The WRR sampler maintains data representing a WRR coefficient associated with each commit queue, polls the commit queues cyclically, and submits requests taken from the respective commit queues to the fulfillment processor 40. The cyclic polling process is arranged such that during each full cycle, the number of requests taken from each commit queue corresponds to the weighted round-robin coefficient ("WRRC") associated with that queue, except that empty commit queues are ignored. In other words, requests from commit queues with higher WRRCs are submitted to the fulfillment processor and fulfilled at a greater rate than requests from commit queues with lower WRRCs. Thus, requests from queues with higher WRRCs are processed with higher commit priority than requests from queues with lower WRRCs. Under the conditions shown in FIG. 1, the commit queues 34 of the pairs with ordinals 0, 3, 6, and 9 have a WRRC of 5 and therefore a high commit priority. The commit queues of the pairs with ordinals 1, 4, 7, and 10 have a WRRC of 3 and thus a medium commit priority, and the commit queues of the pairs with ordinals 2, 5, 8, and 11 have a WRRC of 1 and a low commit priority. The WRR sampler may refer to a data table stating the WRRC of each commit queue explicitly. In other examples, the data table may store this data in an implicit form. For example, in an SSD operating according to the NVMe standard, there are preset values for high, medium, and low priorities, and the sampler will apply these preset values as the WRRCs of the individual commit queues according to the characterization of each queue as "high," "medium," or "low."
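The cyclic behavior just described can be modeled with a short sketch; this is an illustrative rendering of the weighted round-robin process, not an implementation of any particular SSD, and the class and variable names are assumptions.

```python
from collections import deque

class WRRSampler:
    """Sketch of cyclic weighted round-robin sampling over commit queues."""

    def __init__(self, commit_queues: list, wrrc: list):
        self.commit_queues = commit_queues  # one deque per commit queue
        self.wrrc = wrrc                    # WRR coefficient per queue

    def one_cycle(self):
        """Yield up to wrrc[i] requests from queue i; empty queues are skipped."""
        for queue, weight in zip(self.commit_queues, self.wrrc):
            for _ in range(weight):
                if not queue:   # empty commit queue: ignore the rest of its share
                    break
                yield queue.popleft()

# Three saturated queues with WRRCs 5, 3, and 1 receive service in roughly
# a 5:3:1 ratio, i.e., 9 requests per full cycle.
queues = [deque(f"q{i}-req{j}" for j in range(10)) for i in range(3)]
sampler = WRRSampler(queues, wrrc=[5, 3, 1])
print(len(list(sampler.one_cycle())))  # 5 + 3 + 1 = 9
```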
The flow controller 24 includes a processor 39 and a memory 41. The memory stores software that instructs the processor to perform the functions discussed below, as well as the data discussed below, in connection with the processor. The flow controller also includes components, such as a conventional network interface (not shown), that interface with the client queue pairs 26 and the storage queue pairs 32.
The flow controller maintains an association table that associates each client queue pair 26 with one of the clients 20a-20d and also associates each client queue pair with one of the storage queue pairs 32. In this example, each client queue pair 26 is associated with the storage queue pair having the same ordinal number, and the association is fixed during normal operation. The flow controller routes requests and completion commands such that requests from the request queue of each client queue pair are routed to the commit queue of the associated storage queue pair, and completion commands from the completion queue of each storage pair 32 are routed to the completion queue of the associated client queue pair 26. For example, a request sent by client 20b through the request queue of the client pair 26 with ordinal 3 is routed to the commit queue of the storage pair 32 with ordinal 3, and a completion command sent from that pair 32 is routed back to the completion queue of the client pair 26 with ordinal 3. The association between the client pairs 26 and the clients, together with the association between the client pairs 26 and the storage pairs, also establishes an association between the storage pairs and the clients. Thus, the storage pairs 32 with ordinals 0, 1, and 2 are associated with client 20a; those with ordinals 3, 4, and 5 with client 20b; and so on. The flow controller also maintains an original WRRC value and a current WRRC value for each storage pair 32. The WRRC values shown in FIG. 1 are the original values. Here again, the WRRC values may be represented as numeric values or as characterizations such as "high," "medium," and "low" that are translated by the SSD into the corresponding preset values.
The flow controller also maintains a performance table with an allocated processing load quota for each client. A processing load quota is a portion of the processing capacity of the SSD, stated in terms of a measure of the processing load imposed on the SSD. In this example, the measure of processing load is the number of IO commands per second ("IOPS"). Thus, if the SSD has the capacity to handle 1 million IOPS and the four clients are assigned equal quotas, each client will have a quota of 250,000 IOPS. The flow controller also stores a value for a total processing load threshold. The total processing load threshold may be equal to the processing capacity of the SSD or, preferably, slightly less than the processing capacity, e.g., 90% of the processing capacity. The flow controller also maintains, for each client, a current processing load that represents the actual processing load imposed by the requests sent from that computer. In this example, the flow controller counts the number of requests sent by each client 20a-20d, by counting the requests sent from the three client queue pairs 26 associated with that client during a counting interval, and calculates the current processing load of that client, for example, by dividing the count by the duration of the counting interval. This process is repeated continually, so that after each counting interval, for example every 100 milliseconds, the current processing load value for each client is updated. The flow controller maintains a current total load value equal to the sum of the current processing loads of all clients 20a-20d. Whenever the current processing load of a client is updated, the controller updates this value.
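This bookkeeping can be summarized in a brief sketch; the 100 ms interval and the IOPS metric follow the example above, while the class and field names are illustrative assumptions.

```python
from collections import defaultdict

class LoadMonitor:
    """Sketch of the flow controller's per-client load accounting."""

    def __init__(self, quotas: dict, total_threshold: float, interval_s: float = 0.1):
        self.quotas = quotas                    # processing load quota per client
        self.total_threshold = total_threshold  # e.g., 90% of SSD capacity
        self.interval_s = interval_s            # counting interval, e.g., 100 ms
        self.counts = defaultdict(int)
        self.current_load = {client: 0.0 for client in quotas}

    def on_request(self, client: str):
        # Counted across all client queue pairs associated with this client.
        self.counts[client] += 1

    def end_interval(self):
        """Convert the interval's counts into current processing loads (IOPS)."""
        for client in self.quotas:
            self.current_load[client] = self.counts[client] / self.interval_s
            self.counts[client] = 0

    @property
    def current_total_load(self) -> float:
        return sum(self.current_load.values())
```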
The flow controller repeatedly performs the process shown in FIG. 2. In this example, the process begins after each update of the current processing loads and the current total load. In block 101, the flow controller identifies a client. In block 103, the controller compares that client's current processing load with the client's processing load quota. Although the operations of FIG. 2 are shown and described in a particular order, it should be understood that the order may be modified, or that operations may be performed at overlapping times or simultaneously. Further, operations may be added or omitted.
If the client's current processing load is less than or equal to the client's processing load quota, the process passes to block 105. At block 105, the flow controller checks the current WRRCs of the commit queues associated with the client; if they differ from the original WRRCs, the flow controller resets the current WRRCs to the original WRRCs. When the flow controller resets WRRCs, it does so by sending a command to WRR sampler 42 specifying the ordinals of the commit queues and the new current WRRCs for those queues. If the current WRRCs of the commit queues associated with the client are equal to the original WRRCs, the flow controller takes no action at block 105.
If, at block 103, the client's current processing load exceeds the client's processing load quota, the process branches to block 107. In block 107, the flow controller compares the current total processing load with the total processing load threshold. If the current total processing load is below the threshold, this indicates that the SSD has capacity to accommodate the excess load imposed by the client above its processing load quota, and the process branches to block 109. If the current total processing load is above the total processing load threshold, this indicates that the total load is close to the capacity of the SSD, and the process branches to block 111.
At block 109, the flow controller checks the current WRRCs of the commit queues associated with the client; if they are the original WRRCs, the flow controller resets the WRRCs of these commit queues to modified WRRCs, such that at least some of the modified WRRCs are lower than the corresponding original WRRCs, none of the modified WRRCs is higher than the corresponding original WRRC, and none of the modified WRRCs is zero. As shown in FIG. 3, the commit queues 34 associated with client 20a have been reset to modified WRRCs of 3, 1, and 1. Thus, the commit queue with ordinal 0 has been reset from its original WRRC of 5 (high priority) to a modified WRRC of 3 (medium priority). The commit queue with ordinal 1 has been reset from its original WRRC of 3 (medium priority) to a modified WRRC of 1 (low priority). The commit queue with ordinal 2 has a modified WRRC equal to its original WRRC of 1.
In block 111, the flow controller begins throttling requests from the client. For example, the flow controller may reduce the rate at which it takes requests from the client request queues 28 associated with that client. In this block, the flow controller does not change the WRRCs of the commit queues.
If the process has passed through either block 105 or block 109, it continues to block 113. In this block, the flow controller ends any throttling of requests from the client that began earlier.
After performing block 111 or block 113, the flow controller determines whether there are any other unprocessed clients. If so, the process returns to block 101 to select the next client and repeats. If not, the process ends. The process may handle the clients in any order.
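Putting the blocks of FIG. 2 together, one pass of the process could look like the following sketch. It assumes the LoadMonitor sketched earlier plus callables for resetting WRRCs and for starting and stopping throttling; all of these names are hypothetical, not elements of the disclosure.

```python
def flow_control_pass(monitor, current_wrrc, original_wrrc, modified_wrrc,
                      set_wrrc, start_throttle, stop_throttle):
    """One pass over all clients, mirroring blocks 101-113 of FIG. 2.

    set_wrrc is assumed to command the WRR sampler and update current_wrrc.
    """
    for client in monitor.quotas:                                    # block 101
        load = monitor.current_load[client]
        if load <= monitor.quotas[client]:                           # block 103
            if current_wrrc[client] != original_wrrc[client]:        # block 105
                set_wrrc(client, original_wrrc[client])
            stop_throttle(client)                                    # block 113
        elif monitor.current_total_load < monitor.total_threshold:   # block 107
            if current_wrrc[client] == original_wrrc[client]:        # block 109
                set_wrrc(client, modified_wrrc[client])
            stop_throttle(client)                                    # block 113
        else:
            start_throttle(client)                                   # block 111
```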
When all of the clients 20a-20d send requests at rates below their processing load quotas, the process of FIG. 2 passes through block 105 for all clients, and the system remains in the state shown in FIG. 1. In this state, the high priority requests from each client are sent to a commit queue 34 with a WRRC of 5 and are therefore assigned a high commit priority. The medium priority requests from each client are sent to a commit queue 34 with a WRRC of 3 and are therefore assigned a medium commit priority, and the low priority requests from each client are sent to a commit queue with a WRRC of 1 and are therefore assigned a low commit priority. This state is referred to as the "original" priority allocation scheme.
When a first group of one or more clients sends requests at rates higher than their processing load quotas, the process of FIG. 2 resets the WRRCs, and thereby the commit priorities, of the commit queues associated with those clients. For example, under the conditions shown in FIG. 3, the first group includes client 20a, which sends requests at a rate higher than its quota. A second group of clients, consisting of client 20b, client 20c, and client 20d, sends requests at rates lower than their quotas. The process of FIG. 2 sets the WRRCs of the commit queues 34 associated with client 20a of the first group to the modified WRRCs described above, but keeps the WRRCs of the other commit queues, associated with the clients of the second group, at their original values. Thus, the high priority requests from client 20a are sent to a commit queue 34 with a WRRC of 3 and are therefore assigned a medium commit priority. The medium priority requests from client 20a are sent to a commit queue 34 with a WRRC of 1 and are therefore assigned a low commit priority, and the low priority requests from client 20a are sent to a commit queue with a WRRC of 1 and are therefore assigned a low commit priority. The requests from the second group of clients (client 20b, client 20c, and client 20d) keep the same commit priorities as in the original priority scheme. In other words, in the modified priority scheme, the commit priorities of requests from the clients of the first group are reduced compared to the commit priorities of the same requests under the original priority scheme. Furthermore, the commit priorities of requests from the clients of the first group are reduced relative to the commit priorities of requests from the clients of the second group.
As the request submission rates change, different clients fall within the first and second groups, resulting in different modified priority allocation schemes.
The reduced commit priorities of the clients of the first group mitigate the impact of excessive requests from those clients on the latency encountered by requests from the clients of the second group, and maintain fairness in allocating the processing resources of the storage device.
The features of the examples discussed above with reference to FIGS. 1-3 may be varied in numerous ways. In one such variant, the commit priorities, e.g., the WRRCs, of the various commit queues are fixed, but the flow controller can change commit priorities by directing requests from clients to a different set of commit queues. For example, the system shown in FIG. 4 is similar to the system shown in FIG. 1, except that the commit queues 134 associated with storage device 122 include three additional commit queues, with ordinals 12, 13, and 14. Furthermore, each commit queue has a fixed WRRC and therefore a fixed commit priority. The commit queues with ordinals 0 through 11 have fixed commit priorities that constitute the original priority allocation scheme described above. The additional commit queues 134 with ordinals 12, 13, and 14 have a fixed WRRC of 1 and therefore a low commit priority. When the original priority allocation scheme is in effect, the flow controller 124 routes requests in the same manner as described above, so that requests from each client request queue 128 are directed to the commit queue 134 having the same ordinal number. The solid arrows in FIG. 4 depict this routing for the request queues associated with client 120a. In this state, the additional commit queues 134 with ordinals 12, 13, and 14 remain empty. The WRR sampler 142 in storage device 122 considers all of the commit queues 134 during its cyclic sampling process but ignores the additional queues because they are empty. When client 120a exceeds its processing load quota, the flow controller reroutes requests from certain client request queues 128 associated with that client, as shown by the dashed arrows in FIG. 4. Thus, requests from the client request queue 128 with ordinal 0 are rerouted to the commit queue 134 with ordinal 1, and requests from the client request queue 128 with ordinal 1 are rerouted to one of the additional commit queues. The route for requests from the client request queue 128 with ordinal 2 remains unchanged, as do the routes for requests from the other clients. This produces the same modified priority allocation scheme as discussed above with reference to FIG. 3: the requests from client 120a again receive commit priorities of 3, 1, and 1 for requests with high, medium, and low client priorities. Requests from the other clients are rerouted in a similar manner when those clients exceed their respective processing load quotas.
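The rerouting tables for this variant can be sketched as follows, using the ordinals from the FIG. 4 example; the choice of extra queue 12 for the medium-priority route is one arbitrary pick among the three additional queues, and all names are illustrative.

```python
# Routing for client 120a's request queues (ordinals 0-2), keyed by whether
# the client is over its processing load quota.
ORIGINAL_ROUTE = {0: 0, 1: 1, 2: 2}   # each client queue -> same-ordinal commit queue
MODIFIED_ROUTE = {0: 1,               # high priority -> the fixed WRRC-3 queue
                  1: 12,              # medium priority -> an extra WRRC-1 queue
                  2: 2}               # low priority route unchanged

def route_request(client_queue_ordinal: int, over_quota: bool) -> int:
    """Return the commit-queue ordinal to which a request is directed."""
    table = MODIFIED_ROUTE if over_quota else ORIGINAL_ROUTE
    return table[client_queue_ordinal]
```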
In the examples discussed above, the clients have equal processing load quotas, and the original priority allocation scheme provides equal commit priorities to requests from all clients. The quotas and original commit priorities of the clients need not be equal, and the number of clients may vary. Further, although the examples shown above include only one storage device, the flow controller is preferably capable of supporting multiple storage devices. In this case, different groups of commit queues are associated with the different storage devices. Where the flow controller is used in a cloud computing system, the flow controller desirably is capable of adding and deleting clients and storage devices as directed by the monitoring software of the computing system.
In the examples discussed above, the measure of processing load is simply the number of IO requests sent per second by each client. Other factors may also be used as desired, such as the number of write requests, write endurance, and the amount of data involved in read or write requests. These may be applied separately so as to apply multiple measures of processing load. For each measure, the flow controller maintains a quota for each client and a total processing load threshold. For each measure, the flow controller updates a current value representing the usage of each client, as well as the current total for all clients. A process similar to that described above may be carried out separately for each measure, so that modified commit priorities are put into effect for a client when any one of the measures for that client exceeds the applicable quota. Likewise, throttling may be initiated when the current total for all clients exceeds the applicable total processing load threshold. In another variation, multiple factors may be combined into a composite score, and that score may be used as a single measure of processing load.
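As one hypothetical way to form such a composite score, several factors can be combined with weights; the factors and weights below are illustrative assumptions only.

```python
def composite_load(iops: float, write_iops: float, bytes_per_sec: float,
                   w_iops: float = 1.0, w_write: float = 2.0,
                   w_bytes: float = 1e-6) -> float:
    """Weighted combination of request rate, write rate, and data volume."""
    return w_iops * iops + w_write * write_iops + w_bytes * bytes_per_sec

# A client's composite score is then compared against a single composite quota.
score = composite_load(iops=50_000, write_iops=10_000, bytes_per_sec=400_000_000)
```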
In the examples discussed above, when the modified priority allocation scheme is in effect, the commit priorities of requests from the clients of the first group (the clients exceeding their processing load quotas) are reduced relative to the commit priorities of requests from the clients of the second group by assigning the first group's requests commit priorities lower than those used in the original priority allocation scheme. In one variant, when the modified priority allocation scheme is in effect, the commit priorities of requests from the clients of the second group are increased above those provided in the original priority allocation scheme, while the commit priorities of requests from the clients of the first group remain the same as those provided in the original priority allocation scheme.
In the examples discussed above, commit priority is implemented by a weighted round-robin sampler forming part of the storage device. However, commit priority may be implemented by a device separate from the storage device. For example, the flow controller may include a weighted round-robin sampler that accepts requests from the commit queues and samples them so as to implement the commit priority scheme. The sampler outputs a single stream of requests to the storage device, and the flow controller receives a single stream of completion commands from the storage device. In this arrangement, the flow controller desirably maintains a record of which requests came from which clients, and uses that record to route the completion command corresponding to each request back to the client that sent the request.
In another variant, the flow controller may assign a commit priority to each individual request as it is received from a client. The commit priority of each request is selected according to the priority allocation scheme then in effect. The flow controller then routes each request to a commit queue having a priority corresponding to the assigned priority. In this arrangement, there is no fixed association between the client request queues and the commit queues; all requests with a given commit priority may be routed to the same commit queue. The commit queues are sampled by a weighted round-robin sampler in the storage device or in the flow controller itself. Here again, the flow controller desirably keeps the records necessary to route each completion command back to the client that originated the request.
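This per-request variant can be sketched as follows, including the record keeping needed to route completions back; the queue labels and the shape of the active scheme are assumptions for illustration.

```python
from collections import deque

commit_queues = {"high": deque(), "medium": deque(), "low": deque()}
origin_of = {}  # request id -> client, consulted when completions arrive

def submit(request_id: int, client: str, client_priority: str, active_scheme: dict):
    """Tag a request with a commit priority from the active allocation scheme."""
    # E.g., an over-quota client's "high" may map to "medium" under the
    # modified scheme; all requests with one commit priority share a queue.
    commit_priority = active_scheme[client][client_priority]
    commit_queues[commit_priority].append(request_id)
    origin_of[request_id] = client

def complete(request_id: int) -> str:
    """Return the client to which this request's completion command is routed."""
    return origin_of.pop(request_id)
```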
FIG. 5 is a simplified block diagram illustrating an example computing environment in which the system described above may be implemented. Controller 590 may include hardware configured to balance the throughput and fairness of devices in a data center 580. According to one example, the controller 590 may reside within and control a particular data center. According to other examples, the controller 590 may be coupled to one or more data centers 580, for example through a network, and may manage the operations of multiple data centers. In some examples, a data center 580 may be located a substantial distance from the controller 590 and/or from other data centers (not shown).
The data center 580 may include one or more computing and/or storage devices 581-586, such as databases, processors, servers, shards, units, and the like. In some examples, the computing/storage devices in a data center may have different capacities. For example, different computing devices may have different processing speeds, workloads, etc. Although only some of these computing/storage devices are shown, it should be understood that each data center 580 may include any number of computing/storage devices, and that the number of computing/storage devices in a first data center may differ from the number in a second data center. Further, it should be appreciated that the number of computing devices in each data center 580 may vary over time, for example as hardware is removed, replaced, upgraded, or expanded.
In some examples, the controller 590 may be in communication with computing/storage devices in the data center 580 and may facilitate execution of programs. For example, the controller 590 may track the capacity, status, workload, or other information of each computing device and use this information to assign tasks. Controller 590 may include a processor 598 and a memory 592 including data 594 and instructions 596. In other examples, such operations may be performed by one or more computing devices in the data center 580, and a separate controller may be omitted from the system.
The controller 590 may include a processor 598, memory 592, and other components typically present in server computing devices. The memory 592 can store information accessible by the processor 598, including instructions 596 that can be executed by the processor 598. The memory can also include data 594 that can be retrieved, manipulated, or stored by the processor 598. The memory 592 may be a type of non-transitory computer readable medium capable of storing information accessible by the processor 598, such as a hard disk drive, solid state drive, tape drive, optical storage, memory card, ROM, RAM, DVD, CD-ROM, or write-capable and read-only memories. The processor 598 may be a well-known processor or another lesser-known type of processor. Alternatively, the processor 598 may be a dedicated controller, such as an ASIC.
The instructions 596 may be a set of instructions, such as machine code, that is executed directly by the processor 598, or a set of instructions, such as scripts, that is executed indirectly. In this regard, the terms "instruction," "step," and "program" are used interchangeably herein. The instructions 596 may be stored in object code format for direct processing by the processor 598, or other types of computer languages, including scripts or collections of individual source code modules that are interpreted or precompiled as desired.
The data 594 can be retrieved, stored, or modified by the processor 598 in accordance with the instructions 596. For instance, although the system and method are not limited by a particular data structure, the data 594 can be stored in computer registers, in a relational database as a table having a plurality of different fields and records, or in XML documents. The data 594 can also be formatted in a computer-readable format such as, but not limited to, binary values, ASCII, or Unicode. Moreover, the data 594 can include information sufficient to identify relevant information, such as numbers, descriptive text, proprietary codes, pointers, references to data stored in other memories, including other network locations, or information that is used by a function to calculate relevant data.
Although fig. 5 functionally shows processor 598 and memory 592 as being within the same block, processor 598 and memory 592 may in fact comprise multiple processors and memories, which may or may not be stored within the same physical housing. For example, some of the instructions 596 and data 594 may be stored on a removable CD-ROM, while others may be stored on a read-only computer chip. Some or all of the instructions and data may be stored in a location that is physically remote from the processor 598 but still accessible to the processor 598. Similarly, processor 598 may actually comprise a collection of processors that may or may not operate in parallel.
Unless otherwise stated, the foregoing alternative examples are not mutually exclusive, but may be implemented in various combinations to achieve unique advantages. As these and other variations and combinations of the features discussed above can be utilized without departing from the subject matter defined by the claims, the foregoing description of the example implementations should be taken by way of illustration rather than by way of limitation of the subject matter defined by the claims. In addition, the provision of the examples described herein, as well as clauses phrased as "such as," "including," and the like, should not be interpreted as limiting the subject matter of the claims to the specific examples; rather, the examples are intended to illustrate only one of many possible implementations. Further, the same reference numbers in different drawings can identify the same or similar elements.

Claims (20)

1. A method of processing requests sent by a plurality of client computers to a shared storage device having a processing load capacity, the method comprising:
operating the storage device to fulfill the requests at different rates such that requests with higher commit priorities are fulfilled at a greater rate than requests with lower commit priorities;
Monitoring a measure of the processing load represented by the request sent by each client computer;
when the measure of the load for a first set of the client computers is higher than the processing load quota for those computers, and the measure of the load for a second set of the client computers is less than or equal to the processing load quota for those computers, the request is assigned a commit priority according to a modified allocation scheme such that the commit priority for at least some of the requests from client computers in the first set is reduced relative to the commit priority for requests from client computers in the second set compared to the original priority scheme.
2. The method of claim 1, wherein in the modified allocation scheme, at least some of the requests from client computers in the first group have lower commit priorities than provided in the original allocation scheme, and requests from client computers in the second group have the same commit priorities as provided in the original allocation scheme.
3. The method of claim 1, comprising throttling requests from the client computers of the first group when a sum of the metrics for the loads of all the client computers exceeds a total load threshold.
4. The method of claim 1, comprising assigning a commit priority to the request according to an original priority assignment scheme when the measure of load for all of the client computers is less than or equal to a processing load quota for the client computers.
5. The method of claim 4, wherein operating the storage device to fulfill the request at a different rate comprises maintaining a plurality of commit queues, each commit queue having a commit priority, and wherein assigning a commit priority to a request comprises directing a request to the commit queue.
6. The method of claim 5, wherein each commit queue has a weighted round-robin coefficient, the method comprising using a cyclic weighted round-robin process to retrieve requests from the commit queues for fulfillment, such that the number of requests retrieved for fulfillment during each cycle of the process is directly related to the weighted round-robin coefficient of the commit queue.
7. The method of claim 5, wherein the same set of commit queues is used in the original and modified priority allocation schemes, the method comprising changing the commit priority of at least one of the commit queues to change from the original allocation scheme to the modified allocation scheme.
8. The method of claim 5, wherein each client computer sends requests to one or more client queues associated with the client computer, and wherein directing requests to the commit queue comprises directing requests from each client queue to a corresponding one of the commit queues.
9. The method of claim 8, wherein the fulfilling comprises directing completion commands from the storage device into a set of completion queues such that completion commands generated when fulfilling requests taken from a given commit queue are directed into completion queues corresponding to the commit queue, whereby the completion commands of requests from a given input queue are to be directed into completion queues corresponding to the input queue.
10. The method of claim 1, wherein the request is an input/output (IO) request.
11. A computer system, comprising:
a storage device; and
a flow controller arranged to:
monitoring a measure of the processing load represented by the request sent by each client computer;
when the measure of the load for a first set of the client computers is higher than the processing load quota for those computers, and the measure of the load for a second set of the client computers is less than or equal to the processing load quota for those computers, assigning a commit priority to the requests according to a modified allocation scheme such that a commit priority for at least some of the requests from client computers in the first set is reduced relative to a commit priority for requests from client computers in the second set compared to the original priority scheme; and
the requests are directed to the storage device such that requests with higher commit priorities are fulfilled at a greater rate than requests with lower commit priorities.
12. The computer system of claim 11, further comprising a set of commit queues, each of the commit queues having an associated commit priority, and a sampler arranged to fetch requests from the queues for fulfillment by the storage device at a rate directly related to the commit priority associated with each queue, the flow controller being operable to assign commit priorities to the requests by directing the requests to the commit queues.
13. The computer system of claim 12, wherein the sampler is a weighted round-robin sampler and the commit priority associated with each queue is the weighted round-robin coefficient of the queue.
14. The computer system of claim 12, wherein the flow controller is operable to change the commit priority associated with at least one of the commit queues to change from an original allocation scheme to the modified allocation scheme.
15. The computer system of claim 11, wherein the flow controller is operable to assign a commit priority to the request according to an original priority assignment scheme when the measure of the load for all of the client computers is less than or equal to a processing load quota for the client computers.
16. The computer system of claim 15, wherein the flow controller is operable to throttle requests from the client computers of the first group when a sum of the metrics for the loads of all the client computers exceeds a total load threshold.
17. A non-transitory computer-readable medium storing instructions executable by one or more processors for performing a method of processing requests sent by a plurality of client computers to a shared storage device having a processing load capacity, the method comprising:
operating the storage device to fulfill the requests at different rates such that requests with higher commit priorities are fulfilled at a greater rate than requests with lower commit priorities;
monitoring a measure of the processing load represented by the request sent by each client computer;
when the measure of the load for a first set of the client computers is higher than the processing load quota for those computers, and the measure of the load for a second set of the client computers is less than or equal to the processing load quota for those computers, the request is assigned a commit priority according to a modified allocation scheme such that the commit priority for at least some of the requests from client computers in the first set is reduced relative to the commit priority for requests from client computers in the second set compared to the original priority scheme.
18. The non-transitory computer-readable medium of claim 17, wherein in the modified allocation scheme, at least some of the requests from client computers in the first group have a lower commit priority than provided in the original allocation scheme, and requests from client computers in the second group have the same commit priority as provided in the original allocation scheme.
19. The non-transitory computer-readable medium of claim 17, wherein the instructions further comprise throttling requests from the client computers of the first group when a sum of the metrics of the loads for all of the client computers exceeds a total load threshold.
20. The non-transitory computer-readable medium of claim 17, comprising assigning a commit priority to the request according to an original priority assignment scheme when the measure of load for all of the client computers is less than or equal to a processing load quota for the client computers.
CN202310646776.XA, priority date 2022-08-19, filed 2023-06-02: Balancing storage device throughput and fairness in a multi-client environment (Pending, CN117591001A)

Applications Claiming Priority (3)

    • US 63/399,333, priority date 2022-08-19
    • US 18/078,605 (published as US20240095074A1), priority date 2022-08-19, filed 2022-12-09: Balancing Throughput And Fairness Of Storage Devices In A Multi-Client Environment

Publications (1)

Publication Number: CN117591001A; Publication Date: 2024-02-23

Family

ID=89913857

Family Applications (1)

    • CN202310646776.XA (pending as CN117591001A), priority date 2022-08-19, filed 2023-06-02: Balancing storage device throughput and fairness in a multi-client environment

Country Status (1)

    • CN: CN117591001A

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination