CN113312323A - IO (input/output) request scheduling method and system for reducing access delay in parallel file system


Info

Publication number
CN113312323A
CN113312323A
Authority
CN
China
Prior art keywords
request
delay
file system
scheduling
parallel file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110620133.9A
Other languages
Chinese (zh)
Other versions
CN113312323B (en)
Inventor
周恩强
董勇
张伟
谢旻
迟万庆
朱清华
邬会军
张文喆
李佳鑫
吴振伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology filed Critical National University of Defense Technology
Priority to CN202110620133.9A priority Critical patent/CN113312323B/en
Publication of CN113312323A publication Critical patent/CN113312323A/en
Application granted granted Critical
Publication of CN113312323B publication Critical patent/CN113312323B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/17Details of further file system functions
    • G06F16/172Caching, prefetching or hoarding of files
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an IO request scheduling method and system for reducing access delay in a parallel file system. The method comprises: calculating and marking a delay sensitivity for IO requests issued by clients of the parallel file system to access server storage devices; and, when multiple IO requests are in contention, preferentially scheduling the IO requests with higher delay sensitivity, so as to reduce their queuing time under contention congestion and thereby reduce delay. The invention aims to reduce the waiting time Tw of some IO requests under IO request contention, thereby reducing the IO response delay perceived by client applications. For the typical workload with mixed demands for high bandwidth and low IO delay, it can improve the overall processing performance of computer systems that use parallel file systems, including large-scale high-performance computer systems.

Description

IO (input/output) request scheduling method and system for reducing access delay in parallel file system
Technical Field
The invention relates to the field of computer operating systems, in particular to an IO (input/output) request scheduling method and system for reducing access delay in a parallel file system.
Background
A parallel file system is an important component of a high-performance computer. It manages a large number of storage devices and is responsible for scheduling a large number of concurrent IO requests, which are sent to the storage devices and executed in sequence; different execution orders of the IO requests yield different performance. Although storage devices are usually equipped with request queues, the depth of a device's IO queue is limited, and under large-scale concurrency IO requests first queue at the software layer. The IO request scheduler of the parallel file system layer therefore has a large influence on the IO response time seen by user programs and on the throughput of the storage devices.
The IO request scheduler of the system software is responsible for ordering the execution sequence of IO requests. For example, the Linux operating system provides various scheduling policies that optimize request scheduling for performance, fairness, starvation prevention, and so on, and users select a scheduling policy according to their workload characteristics. The Lustre parallel file system, a typical representative of parallel file systems in the high-performance computing field, likewise offers several selectable scheduling policies in its request scheduler, such as bandwidth priority. Different schedulers operate according to their respective scheduling policies, and a scheduling policy is generally effective for a particular workload and ineffective for others. When multiple workload patterns coexist, how to schedule IO requests becomes an important and complex issue.
In a typical workload scenario for high-performance computing, a parallel file system faces a mix of workload patterns. Traditional scientific computing applications are sensitive to IO bandwidth, while novel applications such as intensive data processing and artificial intelligence are sensitive to IO delay. As high-performance computing systems come to be shared by applications from multiple domains, the mixed demand for high bandwidth and low IO latency becomes a typical workload. Under this new load pattern, both delay-sensitive and bandwidth-sensitive IO requests queue in the parallel file system, and when requests are numerous, contention occurs; IO requests then inevitably wait a long time to be scheduled, which hurts the performance of delay-sensitive applications. With traditional disk media this problem is not obvious, because disk access latency is large and queuing time is a small fraction of the total. But as new storage media, for example the new non-volatile memory (NVM), come into wide use, with access latencies several orders of magnitude lower than disks, the share of queuing time in the whole IO access flow rises rapidly and its impact on application performance becomes obvious.
Disclosure of Invention
Parallel file systems generally adopt a multi-client, multi-server architecture. IO requests issued simultaneously by multiple types of client applications compete for the server's storage devices; when the number of IO requests exceeds the IO processing capacity of the server's storage devices, the requests queue at the server, and the waiting time Tw becomes a component of IO request delay. The technical problem to be solved by the invention is: to reduce the waiting time Tw of some IO requests under IO request contention, thereby reducing the IO response delay perceived by client applications, and, for the typical workload with mixed demands for high bandwidth and low IO delay, to improve the overall processing performance of computer systems that use parallel file systems, including large-scale high-performance computer systems.
In order to solve the technical problems, the invention adopts the technical scheme that:
An IO request scheduling method for reducing access delay in a parallel file system comprises:
1) calculating and marking a delay sensitivity for IO requests issued by clients of the parallel file system to access server storage devices;
2) when multiple IO requests are in contention, preferentially scheduling the IO requests with higher delay sensitivity, so as to reduce the queuing time of the IO requests under contention congestion and achieve the purpose of reducing delay.
Optionally, the step of calculating the delay sensitivity in step 1) comprises: judging whether the IO request is buffered by a write-back cache WBC; if so, the IO request is a delay-insensitive IO request Rwbc, otherwise it is a delay-sensitive IO request Rdio; for the delay-sensitive IO request Rdio, a preset fixed value is taken as the calculated delay sensitivity; for the delay-insensitive IO request Rwbc, the delay sensitivity of the IO request is calculated based on the correlation between the write-back cache WBC and the IO request.
Optionally, calculating the delay sensitivity of the IO request based on the correlation between the write-back cache WBC and the IO request means: obtaining the available space ratio P of the write-back cache WBC and taking 1 - P as the calculated delay sensitivity of the IO request.
Optionally, marking the delay sensitivity in step 1) comprises adding a delay-sensitivity field to the IO request, the field containing the calculated delay sensitivity S.
Optionally, marking the delay sensitivity in step 1) further comprises adding a type field to the IO request, the field containing a type identifier T that distinguishes the delay-sensitive IO request Rdio from the delay-insensitive IO request Rwbc.
Optionally, the method further comprises the following steps, performed by a server of the parallel file system, for IO request processing:
S1) initializing the maximum waiting time Tdeadline;
S2) receiving the IO request R and recording its timestamp TSr;
S3) parsing the IO request R to obtain the delay sensitivity S and calculating the waiting time Tw = Tdeadline × (1 - S);
S4) determining whether the waiting time Tw is 0; if Tw is 0, activating the scheduling execution workflow of the IO request R; otherwise, putting the IO request R into the scheduling wait queue and activating the queuing wait workflow of the IO request R.
Optionally, the processing step after the schedule execution workflow of the IO request is activated in step S4) includes:
S4.1A) receiving the IO request R and fetching its timestamp TSr;
S4.2A) traversing the wait queue to find an IO request Ri that originates from the same write-back cache WBC as the IO request R;
S4.3A) fetching the timestamp TSi of the IO request Ri;
S4.4A) determining whether the timestamp TSi of the IO request Ri is less than the timestamp TSr of the IO request R; if so, removing the IO request Ri from the wait queue and sending it into the execution queue;
S4.5A) determining whether all IO requests in the wait queue originating from the same WBC as the IO request R have been traversed; if not, continuing to traverse the wait queue to find the next such IO request Ri and jumping to step S4.3A); otherwise, proceeding to the next step;
S4.6A) removing the IO request R from the wait queue and enqueuing it into the execution queue, then executing all enqueued IO requests in turn.
Optionally, the processing step after the queue-waiting workflow of the IO request is activated in step S4) includes:
S4.1B) traversing the scheduling wait queue and fetching an IO request R;
S4.2B) subtracting the preset scheduling time slice t from the waiting time Tw of the IO request R, and determining whether the resulting waiting time Tw is zero; if it is zero, activating the scheduling execution workflow of the IO request R; otherwise, determining whether the scheduling wait queue has been completely traversed; if not, continuing to traverse the scheduling wait queue to fetch the next IO request R and jumping to step S4.2B); otherwise, proceeding to the next step;
S4.3B) determining whether either of the following conditions is satisfied, condition 1: the time slice expires; condition 2: a new IO request enters the scheduling wait queue; if either is satisfied, jumping to step S4.1B); otherwise, waiting for activation.
In addition, the invention also provides an IO request scheduling system for reducing access delay in a parallel file system, which comprises a microprocessor and a memory which are connected with each other, wherein the microprocessor is programmed or configured to execute the steps of the IO request scheduling method for reducing access delay in the parallel file system.
Furthermore, the present invention also provides a computer-readable storage medium having stored therein a computer program programmed or configured to execute an IO request scheduling method for reducing access latency in the parallel file system.
Compared with the prior art, the invention mainly has the following advantages:
1. The invention calculates and marks a delay sensitivity for IO requests issued by clients of a parallel file system to access server storage devices; when multiple IO requests are in contention, the IO requests with higher delay sensitivity are scheduled preferentially, reducing queuing time under contention congestion and thereby reducing delay.
2. The invention is suitable for large-scale high-performance computer systems. Such systems generally adopt a parallel file system to provide IO access service, and the parallel file system usually employs a Write-Back Cache (WBC) on the client: IO calls issued by an application temporarily store their data in a client-side buffer and return immediately, and the WBC chooses a suitable moment to send the data to the back-end storage devices. The WBC can also reassemble discrete IO requests into continuous IO requests, thereby improving access bandwidth.
3. The invention fits the general parallel file system framework and can be used in series with the request scheduling system of the original system; in the request sequence output after scheduling by this embodiment, delay-sensitive requests are placed in front, which helps the subsequent scheduling system execute them first.
Drawings
FIG. 1 is a core flow diagram of a method according to an embodiment of the present invention.
Fig. 2 is a schematic structural diagram of a system according to an embodiment of the present invention.
FIG. 3 is a flowchart illustrating a process of calculating and labeling delay sensitivity according to an embodiment of the present invention.
Fig. 4 is a schematic processing flow diagram of the server according to the embodiment of the present invention.
Detailed Description
As shown in fig. 1, the IO request scheduling method for reducing access delay in a parallel file system in this embodiment includes:
1) calculating and marking a delay sensitivity for IO requests issued by clients of the parallel file system to access server storage devices;
2) when multiple IO requests are in contention, preferentially scheduling the IO requests with higher delay sensitivity, so as to reduce the queuing time of the IO requests under contention congestion and achieve the purpose of reducing delay.
The IO request scheduling method for reducing access delay in the parallel file system of this embodiment assigns a delay sensitivity to each IO request, and the IO request with high delay sensitivity is preferentially scheduled and executed, thereby reducing queuing time under contention congestion, and achieving the purpose of reducing delay.
Referring to fig. 2, as an optional implementation manner, in this embodiment, each IO request is given a delay sensitivity based on the correlation between the WBC and the IO request in the client, and the IO request with high delay sensitivity is preferentially scheduled and executed, so that the queuing time of the IO request under the contention congestion condition is reduced, and the purpose of reducing the delay is achieved.
Referring to fig. 3, the step of calculating the delay sensitivity in step 1) of this embodiment comprises: judging whether the IO request is buffered by a write-back cache WBC; if so, the IO request is a delay-insensitive IO request Rwbc, otherwise it is a delay-sensitive IO request Rdio; for the delay-sensitive IO request Rdio, a preset fixed value is taken as the calculated delay sensitivity; for the delay-insensitive IO request Rwbc, the delay sensitivity is calculated based on the correlation between the write-back cache WBC and the IO request. In this embodiment, the IO requests of an application are divided into two types: requests buffered by the WBC (hereinafter Rwbc) and requests that bypass the WBC (hereinafter Rdio). According to the WBC principle, once the data carried by an Rwbc request has been written into the cache, the application is notified that the IO is finished; the application therefore need not wait for the data to be really written to the storage device, and the file system chooses a suitable moment in the background to send it to the storage device. Since the application does not wait for the true completion of Rwbc, it is not sensitive to its completion time, so this type of request is defined as a delay-insensitive request. For Rdio, the application must wait until the IO request operation is completed at the storage device; the delay of Rdio thus directly influences the execution time of the application, and it is therefore defined as a delay-sensitive IO request.
In this embodiment, whether an IO request enters the write-back cache WBC is not controlled; the IO requests are only distinguished at the client of the parallel file system, and a field describing delay sensitivity is added to both types of IO requests, giving delay-sensitive requests a relatively high delay sensitivity and delay-insensitive requests a relatively low one. The calculation rule of the delay sensitivity is: the delay sensitivity of an Rdio request takes a fixed maximum value, while the delay sensitivity of an Rwbc request is calculated from the space availability of the WBC, so as to prevent Rwbc requests from being scheduled with excessive delay and the data in the WBC from failing to drain smoothly.
Referring to fig. 3, in the present embodiment, calculating the delay sensitivity of the IO request based on the correlation between the write-back cache WBC and the IO request means: obtaining the available space ratio P of the write-back cache WBC and taking 1 - P as the calculated delay sensitivity of the IO request.
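The classification and marking rules above can be sketched in a few lines. This is a minimal illustrative model, not the patent's implementation; the names `WBC`, `IORequest`, and `mark_delay_sensitivity` are assumptions made for this sketch.

```python
from dataclasses import dataclass

@dataclass
class WBC:
    """Minimal stand-in for a client-side write-back cache."""
    capacity: int  # total cache size
    used: int      # currently occupied space

    @property
    def available_ratio(self) -> float:
        # Available space ratio P used by the sensitivity formula S = 1 - P
        return (self.capacity - self.used) / self.capacity

@dataclass
class IORequest:
    buffered_by_wbc: bool
    T: str = ""     # type identifier field (Rwbc / Rdio)
    S: float = 0.0  # delay sensitivity field

def mark_delay_sensitivity(req: IORequest, wbc: WBC) -> IORequest:
    if req.buffered_by_wbc:
        # Rwbc: the application has already returned, so the request is
        # delay-insensitive; its sensitivity grows as the cache fills,
        # so the WBC can still drain
        req.T, req.S = "Rwbc", 1.0 - wbc.available_ratio
    else:
        # Rdio: the application blocks until completion, so the request is
        # delay-sensitive and gets the fixed maximum sensitivity
        req.T, req.S = "Rdio", 1.0
    return req
```

With a quarter of the cache free, an Rwbc request would be marked with S = 0.75, while any Rdio request is always marked with S = 1.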
In this embodiment, marking the delay sensitivity in step 1) includes adding a delay-sensitivity field to the IO request, the field containing the calculated delay sensitivity S.
In this embodiment, marking the delay sensitivity in step 1) further includes adding a type field to the IO request, the field containing a type identifier T that distinguishes the delay-sensitive IO request Rdio from the delay-insensitive IO request Rwbc.
As shown in sub-diagram (a) of fig. 4, this embodiment further includes the following steps, performed by the server of the parallel file system, for IO request processing:
S1) initializing the maximum waiting time Tdeadline;
S2) receiving the IO request R and recording its timestamp TSr;
S3) parsing the IO request R to obtain the delay sensitivity S and calculating the waiting time Tw = Tdeadline × (1 - S); that is, the smaller S is, the larger Tw is, an inverse linear relationship: when S = 0, Tw = Tdeadline and the waiting time is longest; when S = 1, Tw = 0 and no waiting is needed;
S4) determining whether the waiting time Tw is 0; if Tw is 0, activating the scheduling execution workflow of the IO request R; otherwise, putting the IO request R into the scheduling wait queue and activating the queuing wait workflow of the IO request R.
In step S1) of this embodiment, the maximum waiting time Tdeadline is initialized to the time for the WBC available capacity to drop from 100% to 0%, which can be determined by measurement on the specific system.
In addition, step S1) of this embodiment further includes initializing a scheduling time slice t, which is used to process the IO requests in the scheduling wait queue by loop traversal. In this embodiment, assuming the precision of the field S is g, the scheduling time slice is initialized to t = Tdeadline × g, i.e., the minimum granularity of the Rwbc request waiting time; the server activates the scheduling flow once every interval t.
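The two formulas above, Tw = Tdeadline × (1 - S) and t = Tdeadline × g, can be written out directly; the function names below are illustrative assumptions of this sketch.

```python
def waiting_time(t_deadline: float, s: float) -> float:
    # Tw = Tdeadline * (1 - S): S = 1 means schedule immediately,
    # S = 0 means wait the full deadline
    return t_deadline * (1.0 - s)

def scheduling_time_slice(t_deadline: float, g: float) -> float:
    # t = Tdeadline * g, where g is the precision of the sensitivity
    # field S; this is the minimum granularity of Rwbc waiting time
    return t_deadline * g
```

For example, with Tdeadline = 100 ms, an Rdio request (S = 1) waits 0 ms, an Rwbc request from a three-quarters-full cache (S = 0.75) waits 25 ms, and a sensitivity precision of g = 0.01 gives a 1 ms scheduling tick.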
In this embodiment, a scheduling wait queue is set up at the server, request scheduling is carried out according to the delay sensitivity of the received IO requests, and requests with high delay sensitivity are scheduled and executed preferentially. When the two types of requests contend, the scheduled execution of Rwbc is delayed and its Tw increases; but by the working principle of the WBC, as long as the WBC has available space, the application issuing the Rwbc request does not have to wait for the request to complete, so moderately delayed scheduling does not affect the normal execution of the application. Delaying Rwbc instead places Rdio requests in a preferentially scheduled position in the queue, giving them a shorter waiting time Tw and thereby reducing the IO delay perceptible to the application issuing the Rdio request.
As shown in fig. 4, sub-diagram (c), the processing step after the scheduling execution workflow of the IO request is activated in step S4) includes:
S4.1A) receiving the IO request R and fetching its timestamp TSr;
S4.2A) traversing the wait queue to find an IO request Ri that originates from the same write-back cache WBC as the IO request R;
S4.3A) fetching the timestamp TSi of the IO request Ri;
S4.4A) determining whether the timestamp TSi of the IO request Ri is less than the timestamp TSr of the IO request R; if so, removing the IO request Ri from the wait queue and sending it into the execution queue;
S4.5A) determining whether all IO requests in the wait queue originating from the same WBC as the IO request R have been traversed; if not, continuing to traverse the wait queue to find the next such IO request Ri and jumping to step S4.3A); otherwise, proceeding to the next step;
S4.6A) removing the IO request R from the wait queue and enqueuing it into the execution queue, then executing all enqueued IO requests in turn.
Through the loop traversal of steps S4.3A)-S4.5A) in this embodiment, the timestamps {TSi, ..., TSj} of the IO requests {Ri, ..., Rj} originating from the same write-back cache WBC as the IO request R can be fetched one by one, each compared with TSr, and the requests whose timestamps are earlier than TSr removed from the wait queue and sent in sequence to the next stage of the file system for execution.
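The scheduling execution workflow S4.1A)-S4.6A) can be sketched roughly as below. The `Req` record, the list-based queues, and sorting the older requests by timestamp are simplifying assumptions of this sketch, not the patent's data structures.

```python
from dataclasses import dataclass

@dataclass
class Req:
    ts: int      # arrival timestamp TSr
    wbc_id: int  # identifies the client write-back cache the request came from

def activate_execution(req: Req, wait_queue: list, exec_queue: list) -> None:
    # S4.2A)-S4.5A): traverse the wait queue for requests Ri from the same
    # WBC as R whose timestamp TSi is earlier than TSr, and move them into
    # the execution queue first, preserving their arrival order
    older = [r for r in list(wait_queue)
             if r.wbc_id == req.wbc_id and r.ts < req.ts]
    for r in sorted(older, key=lambda r: r.ts):
        wait_queue.remove(r)
        exec_queue.append(r)
    # S4.6A): finally move R itself into the execution queue
    if req in wait_queue:
        wait_queue.remove(req)
    exec_queue.append(req)
```

Activating a request thus drags all earlier same-WBC requests ahead of it, so write-back order within one cache is preserved while unrelated requests keep waiting.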
As shown in fig. 4, sub-diagram (b), the processing steps after the queue-waiting workflow for activating the IO request in step S4) include:
S4.1B) traversing the scheduling wait queue and fetching an IO request R;
S4.2B) subtracting the preset scheduling time slice t from the waiting time Tw of the IO request R, and determining whether the resulting waiting time Tw is zero; if it is zero, activating the scheduling execution workflow of the IO request R; otherwise, determining whether the scheduling wait queue has been completely traversed; if not, continuing to traverse the scheduling wait queue to fetch the next IO request R and jumping to step S4.2B); otherwise, proceeding to the next step;
S4.3B) determining whether either of the following conditions is satisfied, condition 1: the time slice expires; condition 2: a new IO request enters the scheduling wait queue; if either is satisfied, jumping to step S4.1B); otherwise, waiting for activation.
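The queuing wait workflow S4.1B)-S4.3B) amounts to a periodic countdown over the wait queue: each tick subtracts the time slice t from every queued request's remaining Tw and hands expired requests to the execution workflow. A simplified sketch follows; the `Waiting` record and the callback-based activation are assumptions of this sketch.

```python
class Waiting:
    """Minimal queued-request record holding the remaining waiting time Tw."""
    def __init__(self, name: str, tw: float):
        self.name, self.tw = name, tw

def tick_wait_queue(wait_queue: list, t_slice: float, activate) -> None:
    # S4.1B)-S4.2B): on each scheduling tick (triggered when the time slice
    # expires or a new request arrives), decrease every queued request's
    # remaining waiting time Tw by the time slice t; a request whose Tw
    # reaches zero is removed and handed to the execution workflow
    for req in list(wait_queue):
        req.tw -= t_slice
        if req.tw <= 0:
            wait_queue.remove(req)
            activate(req)
```

Because Tw was set to Tdeadline × (1 - S), requests with higher sensitivity S start with a smaller countdown and leave the wait queue earlier.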
As an optional implementation manner, in this embodiment, the flows corresponding to subgraphs (a) to (c) in fig. 4 are executed by using different threads, and the activation is to activate the corresponding thread.
In summary, the parallel file system oriented IO request scheduling method of this embodiment has the following advantages: to reduce IO request delay under a mixed load pattern, the delay sensitivity is calculated in the parallel file system according to the type of each IO request, and the queue order of the IO requests is adjusted according to the delay sensitivity. The method fits the general parallel file system framework and can be used in series with the request scheduling system of the original system; since delay-sensitive requests are placed in front in the request sequence output after scheduling by this method, the subsequent scheduling system can schedule them first.
In addition, the present embodiment also provides an IO request scheduling system for reducing access delay in a parallel file system, including a microprocessor and a memory, which are connected to each other, and the microprocessor is programmed or configured to execute the steps of the IO request scheduling method for reducing access delay in the parallel file system.
Furthermore, the present embodiment also provides a computer-readable storage medium, in which a computer program is stored, the computer program being programmed or configured to execute the IO request scheduling method for reducing access latency in the parallel file system.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-readable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein. The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks. 
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above embodiments, and all technical solutions belonging to the idea of the present invention belong to the protection scope of the present invention. It should be noted that modifications and embellishments within the scope of the invention may occur to those skilled in the art without departing from the principle of the invention, and are considered to be within the scope of the invention.

Claims (10)

1. An IO request scheduling method for reducing access delay in a parallel file system, characterized by comprising:
1) calculating and marking a delay sensitivity for IO requests issued by clients of the parallel file system to access server storage devices;
2) when multiple IO requests are in contention, preferentially scheduling the IO requests with higher delay sensitivity, so as to reduce the queuing time of the IO requests under contention congestion and achieve the purpose of reducing delay.
2. The IO request scheduling method for reducing access delay in a parallel file system according to claim 1, wherein the step of calculating the delay sensitivity in step 1) comprises: judging whether the IO request is an IO request buffered by a write-back buffer WBC, if so, judging the IO request to be a delay-sensitive IO request, otherwise, judging the IO request to be a delay-insensitive IO request, and aiming at the delay-insensitive IO request RdioTaking a preset fixed value as the delay sensitivity obtained by calculation; for delay sensitive IO requests RwbcThen the delay sensitivity of the IO request is calculated based on the correlation of the write-back buffered WBCs and the IO.
3. The IO request scheduling method for reducing access delay in a parallel file system according to claim 2, wherein calculating the delay sensitivity of the IO request based on the correlation between the write-back buffer WBC and the IO means: obtaining the available-space ratio P of the write-back buffer WBC and taking 1 - P as the calculated delay sensitivity of the IO request.
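The sensitivity rule of claims 2-3 can be sketched as follows; the function name, the argument names, and the fixed value chosen for R_dio requests are illustrative assumptions, not terms from the patent:

```python
# Sketch of the delay-sensitivity calculation of claims 2-3.
# A delay-insensitive (direct) IO request R_dio gets a preset fixed value;
# a WBC-buffered request R_wbc gets 1 - P, where P is the WBC free-space ratio.

DIO_FIXED_SENSITIVITY = 0.0  # preset fixed value for R_dio (value assumed)

def delay_sensitivity(is_wbc_buffered: bool,
                      wbc_free: int = 0,
                      wbc_capacity: int = 1) -> float:
    """Return the delay sensitivity S in [0, 1]."""
    if not is_wbc_buffered:           # R_dio: not buffered by a write-back buffer
        return DIO_FIXED_SENSITIVITY
    p = wbc_free / wbc_capacity       # available-space ratio P of the WBC
    return 1.0 - p                    # fuller buffer -> higher sensitivity

# A nearly full WBC yields a sensitivity close to 1:
s = delay_sensitivity(True, wbc_free=10, wbc_capacity=100)  # S = 0.9
```

The intuition is that a nearly full write-back buffer will soon block new client writes, so draining its requests first lowers the delay clients actually observe.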
4. The IO request scheduling method for reducing access delay in a parallel file system according to claim 3, wherein marking the delay sensitivity in step 1) comprises adding a delay-sensitivity field to the IO request, the field containing the calculated delay sensitivity S.
5. The IO request scheduling method for reducing access delay in a parallel file system according to claim 4, wherein marking the delay sensitivity in step 1) further comprises adding a type field to the IO request, the type field containing a type identifier T that distinguishes a delay-insensitive IO request R_dio from a delay-sensitive IO request R_wbc.
6. The IO request scheduling method for reducing access delay in a parallel file system according to any one of claims 1 to 5, further comprising the following steps of IO request processing performed by a server of the parallel file system:
S1) initializing the maximum waiting time T_deadline;
S2) receiving an IO request R and recording its timestamp TS_r;
S3) parsing the IO request R to obtain the delay sensitivity S, and calculating the waiting time T_w according to T_w = T_deadline × (1 - S);
S4) judging whether the waiting time T_w is 0; if T_w is 0, activating the scheduling execution workflow of the IO request R; otherwise, putting the IO request R into a scheduling waiting queue and activating the queuing waiting workflow of the IO request R.
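A minimal sketch of server-side steps S1)-S4) of claim 6; the queue structures, the request representation, and the T_deadline value are assumptions for illustration:

```python
import time
from collections import deque

T_DEADLINE = 1.0  # S1) maximum waiting time T_deadline in seconds (value assumed)

execution_queue = deque()
wait_queue = deque()      # entries: (request, remaining waiting time T_w)

def receive(request: dict, sensitivity: float) -> float:
    """Steps S2)-S4): stamp the request, derive T_w, then dispatch or enqueue."""
    request["ts"] = time.monotonic()           # S2) record timestamp TS_r
    t_w = T_DEADLINE * (1.0 - sensitivity)     # S3) T_w = T_deadline * (1 - S)
    if t_w == 0:                               # S4) fully sensitive: run now
        execution_queue.append(request)
    else:                                      # otherwise: queue and wait
        wait_queue.append((request, t_w))
    return t_w

# A request with S = 1 bypasses the waiting queue entirely:
receive({"id": 1}, sensitivity=1.0)    # T_w = 0    -> execution_queue
receive({"id": 2}, sensitivity=0.25)   # T_w = 0.75 -> wait_queue
```

The formula makes the bound explicit: a request's queuing time can never exceed T_deadline, and higher sensitivity shrinks that bound toward zero.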
7. The IO request scheduling method for reducing access delay in a parallel file system according to claim 6, wherein the processing steps after the scheduling execution workflow of the IO request is activated in step S4) comprise:
S4.1A) receiving the IO request R and taking its timestamp TS_r;
S4.2A) traversing the waiting queue to find an IO request R_i that originates from the same write-back buffer WBC as the IO request R;
S4.3A) taking the timestamp TS_i of the IO request R_i;
S4.4A) judging whether the timestamp TS_i of the IO request R_i is less than the timestamp TS_r of the IO request R; if so, removing the IO request R_i from the waiting queue and putting it into the execution queue;
S4.5A) judging whether all IO requests in the waiting queue that originate from the same WBC as the IO request R have been traversed; if not, continuing to traverse the waiting queue to find the next IO request R_i originating from the same WBC and jumping to step S4.3A); otherwise, proceeding to the next step;
S4.6A) removing the IO request R from the waiting queue, putting the IO request R into the execution queue, and executing all enqueued IO requests in turn.
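The scheduling execution workflow of claim 7 can be sketched as below. For simplicity the activated request R is passed in directly rather than also being removed from the waiting queue as in step S4.6A), and the dictionary field names are assumptions:

```python
from collections import deque

def activate_execution(request: dict, wait_queue: deque, execution_queue: deque):
    """Steps S4.1A)-S4.6A): before executing `request`, promote every waiting
    request from the same WBC whose timestamp is older than TS_r."""
    ts_r = request["ts"]                            # S4.1A) take timestamp TS_r
    remaining = deque()
    for r_i in wait_queue:                          # S4.2A/S4.5A) traverse queue
        same_wbc = r_i["wbc"] == request["wbc"]
        if same_wbc and r_i["ts"] < ts_r:           # S4.3A/S4.4A) older sibling
            execution_queue.append(r_i)             # promote into execution queue
        else:
            remaining.append(r_i)                   # keep waiting
    wait_queue.clear()
    wait_queue.extend(remaining)
    execution_queue.append(request)                 # S4.6A) enqueue R last

wq = deque([{"id": "a", "wbc": 1, "ts": 1}, {"id": "b", "wbc": 2, "ts": 2}])
eq = deque()
activate_execution({"id": "r", "wbc": 1, "ts": 5}, wq, eq)
# "a" (same WBC, older) runs ahead of "r"; "b" keeps waiting
```

Promoting older same-WBC requests preserves the write order seen by one write-back buffer, so an urgent request cannot overtake its own cache's earlier writes.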
8. The IO request scheduling method for reducing access delay in a parallel file system according to claim 7, wherein the processing steps after the queuing waiting workflow of the IO request is activated in step S4) comprise:
S4.1B) traversing the scheduling waiting queue to take out an IO request R;
S4.2B) subtracting a preset scheduling time slice T from the waiting time T_w of the IO request R, and judging whether the resulting waiting time T_w is zero; if so, activating the scheduling execution workflow of the IO request R; otherwise, judging whether the scheduling waiting queue has been fully traversed; if not, continuing to take out the next IO request R from the scheduling waiting queue and jumping to step S4.2B); otherwise, proceeding to the next step;
S4.3B) judging whether either of the following conditions is satisfied: condition 1: the time slice expires; condition 2: a new IO request enters the scheduling waiting queue; if either is satisfied, jumping to step S4.1B); otherwise, waiting for activation.
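A sketch of one pass of the queuing waiting workflow of claim 8, run on each expired time slice. The `<= 0` test replaces the claim's equality test to tolerate floating-point remainders, and the time-slice value is assumed:

```python
TIME_SLICE = 0.1  # preset scheduling time slice T in seconds (value assumed)

def tick(wait_queue: list) -> list:
    """Steps S4.1B)-S4.2B): charge one time slice to every waiting request
    and return the requests whose waiting time has elapsed."""
    activated, still_waiting = [], []
    for request, t_w in wait_queue:          # S4.1B) traverse the waiting queue
        t_w -= TIME_SLICE                    # S4.2B) subtract the time slice T
        if t_w <= 0:                         # wait exhausted -> execute
            activated.append(request)
        else:
            still_waiting.append((request, t_w))
    wait_queue[:] = still_waiting            # keep the rest queued
    return activated

wq = [({"id": 1}, 0.1), ({"id": 2}, 0.3)]
done = tick(wq)   # request 1 expires after one slice; request 2 keeps waiting
```

In a full scheduler, each request returned by `tick` would then be handed to the scheduling execution workflow of claim 7 (condition 1 of step S4.3B)).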
9. An IO request scheduling system for reducing access delay in a parallel file system, comprising a microprocessor and a memory connected to each other, the microprocessor being programmed or configured to perform the steps of the IO request scheduling method for reducing access delay in a parallel file system according to any one of claims 1 to 8.
10. A computer-readable storage medium storing a computer program, the computer program being programmed or configured to perform the IO request scheduling method for reducing access delay in a parallel file system according to any one of claims 1 to 8.
CN202110620133.9A 2021-06-03 2021-06-03 IO (input/output) request scheduling method and system for reducing access delay in parallel file system Active CN113312323B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110620133.9A CN113312323B (en) 2021-06-03 2021-06-03 IO (input/output) request scheduling method and system for reducing access delay in parallel file system

Publications (2)

Publication Number Publication Date
CN113312323A true CN113312323A (en) 2021-08-27
CN113312323B CN113312323B (en) 2022-07-19

Family

ID=77377255

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110620133.9A Active CN113312323B (en) 2021-06-03 2021-06-03 IO (input/output) request scheduling method and system for reducing access delay in parallel file system

Country Status (1)

Country Link
CN (1) CN113312323B (en)

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104321758A (en) * 2013-01-17 2015-01-28 英特尔公司 Arbitrating memory access via a shared memory fabric
US8949489B1 (en) * 2012-03-21 2015-02-03 Google Inc. Method for combining bulk and latency-sensitive input and output
CN107145388A (en) * 2017-05-25 2017-09-08 深信服科技股份有限公司 Method for scheduling task and system under a kind of multitask environment
CN107454017A (en) * 2017-06-05 2017-12-08 上海交通大学 Mixed data flow coordinated dispatching method in a kind of cloud data center network
CN107589997A (en) * 2017-08-29 2018-01-16 山东师范大学 Ensure delay-sensitive program QoS dynamic regulating method under data center environment
CN107707326A (en) * 2017-11-10 2018-02-16 鹤壁天海电子信息系统有限公司 A kind of TDMA two-stage time slot management methods of terminaloriented
CN107885667A (en) * 2016-09-29 2018-04-06 北京忆恒创源科技有限公司 Reduce the method and apparatus of read command processing delay
US20180101486A1 (en) * 2016-10-06 2018-04-12 Vmware, Inc. Automatic System Service Resource Management for Virtualizing Low-Latency Workloads that are Input/Output Intensive
CN109347974A (en) * 2018-11-16 2019-02-15 北京航空航天大学 A kind of online offline mixed scheduling system improving online service quality and cluster resource utilization
CN110716797A (en) * 2019-09-10 2020-01-21 无锡江南计算技术研究所 DDR4 performance balance scheduling structure and method for multiple request sources
CN111328148A (en) * 2020-03-11 2020-06-23 展讯通信(上海)有限公司 Data transmission method and device
CN111444012A (en) * 2020-03-03 2020-07-24 中国科学院计算技术研究所 Dynamic resource regulation and control method and system for guaranteeing delay sensitive application delay S L O
CN111782355A (en) * 2020-06-03 2020-10-16 上海交通大学 Cloud computing task scheduling method and system based on mixed load
CN112463044A (en) * 2020-11-23 2021-03-09 中国科学院计算技术研究所 Method and system for ensuring tail reading delay of server side of distributed storage system

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114124830A (en) * 2021-11-19 2022-03-01 南京大学 RDMA service quality assurance method and system for multiple application scenes of data center
CN114124830B (en) * 2021-11-19 2024-04-30 南京大学 RDMA service quality assurance method and system for multiple application scenes of data center
CN116737673A (en) * 2022-09-13 2023-09-12 荣耀终端有限公司 Scheduling method, equipment and storage medium of file system in embedded operating system
CN116737673B (en) * 2022-09-13 2024-03-15 荣耀终端有限公司 Scheduling method, equipment and storage medium of file system in embedded operating system
CN117453378A (en) * 2023-12-25 2024-01-26 北京卡普拉科技有限公司 Method, device, equipment and medium for scheduling I/O requests among multiple application programs
CN117453378B (en) * 2023-12-25 2024-03-19 北京卡普拉科技有限公司 Method, device, equipment and medium for scheduling I/O requests among multiple application programs

Also Published As

Publication number Publication date
CN113312323B (en) 2022-07-19

Similar Documents

Publication Publication Date Title
US10089142B2 (en) Dynamic task prioritization for in-memory databases
CN113312323B (en) IO (input/output) request scheduling method and system for reducing access delay in parallel file system
US5790851A (en) Method of sequencing lock call requests to an O/S to avoid spinlock contention within a multi-processor environment
US9208116B2 (en) Maintaining I/O priority and I/O sorting
US8397236B2 (en) Credit based performance managment of computer systems
US8392633B2 (en) Scheduling requesters of a shared storage resource
US8108571B1 (en) Multithreaded DMA controller
US8141089B2 (en) Method and apparatus for reducing contention for computer system resources using soft locks
US20090328053A1 (en) Adaptive spin-then-block mutual exclusion in multi-threaded processing
US20060037017A1 (en) System, apparatus and method of reducing adverse performance impact due to migration of processes from one CPU to another
JP2013232207A (en) Method, system, and apparatus for scheduling computer micro-jobs for execution without disruption
US8397234B2 (en) Determining a priority value for a thread for execution on a multithreading processor system
KR20050020942A (en) Continuous media priority aware storage scheduler
AU2011213795A1 (en) Efficient cache reuse through application determined scheduling
US20060037021A1 (en) System, apparatus and method of adaptively queueing processes for execution scheduling
US10409640B1 (en) Methods and apparatus for data request scheduling in performing parallel IO operations
Mutlu et al. Parallelism-aware batch scheduling: Enabling high-performance and fair shared memory controllers
CN111597044A (en) Task scheduling method and device, storage medium and electronic equipment
EP2840513B1 (en) Dynamic task prioritization for in-memory databases
US10713089B2 (en) Method and apparatus for load balancing of jobs scheduled for processing
US20180349180A1 (en) Method and apparatus for scheduling arbitration among a plurality of service requestors
JP5299869B2 (en) Computer micro job
US8245229B2 (en) Temporal batching of I/O jobs
US11372649B2 (en) Flow control for multi-threaded access to contentious resource(s)
US9043353B2 (en) Query stream execution using priority gradient multiprogramming

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant