CN113312323B - IO (input/output) request scheduling method and system for reducing access delay in parallel file system
IO (input/output) request scheduling method and system for reducing access delay in a parallel file system
- Publication number: CN113312323B
- Application number: CN202110620133.9A
- Authority: CN (China)
- Prior art keywords: request, delay, scheduling, file system, requests
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS > G06—COMPUTING; CALCULATING OR COUNTING > G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor > G06F16/10—File systems; File servers > G06F16/17—Details of further file system functions > G06F16/172—Caching, prefetching or hoarding of files
- G06F9/00—Arrangements for program control, e.g. control units > G06F9/06—Arrangements for program control using stored programs > G06F9/46—Multiprogramming arrangements > G06F9/48—Program initiating; Program switching, e.g. by interrupt > G06F9/4806—Task transfer initiation or dispatching > G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system > G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
Abstract
The invention discloses an IO request scheduling method and system for reducing access latency in a parallel file system. The method comprises: calculating and marking a delay sensitivity for the IO requests that clients of the parallel file system issue to the storage devices of the server; and, when multiple IO requests contend, preferentially scheduling the IO requests with higher delay sensitivity, so as to reduce their queuing time under contention congestion and achieve the purpose of reducing delay. The invention aims to reduce the waiting time T_w of some IO requests under IO request contention, thereby lowering the IO response latency perceived by client applications, and can improve the overall processing performance, under the typical workload of mixed high-bandwidth and low-IO-latency demands, of computer systems that use a parallel file system, including large-scale high-performance computer systems.
Description
Technical Field
The invention relates to the field of computer operating systems, and in particular to an IO request scheduling method and an IO request scheduling system for reducing access latency in a parallel file system.
Background
A parallel file system is an important component of a high-performance computer. It manages a large number of storage devices and is responsible for scheduling a large number of concurrent IO requests, which are sent to the storage devices and executed in sequence, and different execution orders of the IO requests yield different performance. Although storage devices are usually equipped with request queues, their IO queue depth is limited, so under large-scale concurrency IO requests inevitably queue first in the software layer. The IO request scheduler of the parallel file system layer therefore has a large influence on the IO response time seen by user programs and on the throughput of the storage devices.
The IO request scheduler of system software is responsible for ordering the execution of IO requests. For example, the Linux operating system provides several scheduling policies that optimize, respectively, for performance, fairness, starvation avoidance, and so on, and users choose among them according to their workload characteristics. The Lustre parallel file system, a typical representative of parallel file systems in the high-performance computing field, likewise offers several selectable scheduling policies in its request scheduler, such as bandwidth-first. Different schedulers operate according to their respective policies, and a scheduling policy is generally effective for a particular workload but ineffective for others. When multiple workload patterns coexist, how to schedule IO requests becomes an important and complex issue.
In a typical workload scenario for high-performance computing, a parallel file system faces a mixture of workload patterns. Traditional scientific computing applications are sensitive to IO bandwidth, whereas newer applications such as intensive data processing and artificial intelligence are sensitive to IO latency. The mixed demand for high bandwidth and low IO latency has become a typical workload as high-performance computing systems are shared, under contention, by applications from many domains. Under this new load pattern, delay-sensitive IO requests and bandwidth-sensitive IO requests queue together in the parallel file system; when requests are numerous, contention arises and inevitably prolongs the waiting and scheduling time of IO requests, degrading the performance of delay-sensitive applications. With traditional disk media this problem is not obvious, because disk access latency is large and queuing time accounts for only a small share of the total IO time; but as new storage media (for example, emerging nonvolatile memory, NVM) come into wide use, with access latency several orders of magnitude lower than that of disks, the share of IO request queuing time in the whole IO access flow rises sharply and its impact on application performance becomes significant.
Disclosure of Invention
A parallel file system generally adopts a multi-client, multi-server architecture. IO requests issued simultaneously by various kinds of client applications compete for the storage devices of the server; when the number of IO requests exceeds the IO processing capacity of those devices, the requests queue at the server, and the waiting time T_w becomes a component of IO request latency. The technical problem to be solved by the invention is: to reduce the waiting time T_w of some IO requests under IO request contention, thereby lowering the IO response latency perceived by client applications, and to improve the overall processing performance, under the typical workload of mixed high-bandwidth and low-IO-latency demands, of computer systems that use a parallel file system, including large-scale high-performance computer systems.
In order to solve the technical problems, the technical scheme adopted by the invention is as follows:
an IO request scheduling method for reducing access delay in a parallel file system includes
1) Calculating and marking a delay sensitivity for the IO requests that clients of the parallel file system issue to the storage devices of the server;
2) When multiple IO requests contend, preferentially scheduling the IO requests with higher delay sensitivity, so as to reduce their queuing time under contention congestion and achieve the purpose of reducing delay.
Optionally, the step of calculating the delay sensitivity in step 1) includes: judging whether the IO request is buffered by the write-back cache WBC; if so, it is judged to be a delay-insensitive IO request, otherwise a delay-sensitive IO request. For the delay-sensitive IO request R_dio, a preset fixed value is taken as the calculated delay sensitivity; for the delay-insensitive IO request R_wbc, the delay sensitivity of the IO request is calculated based on the correlation between the write-back cache WBC and the IO.
Optionally, the calculating the delay sensitivity of the IO request based on the correlation between the write-back buffer WBC and the IO refers to: the available space ratio P of the write-back buffer WBC is obtained, and 1-P is taken as the calculated delay sensitivity of the IO request.
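The sensitivity rule above (a preset maximum for R_dio, 1 - P for R_wbc) can be sketched as follows. This is an illustrative sketch only; the names `WriteBackCache` and `delay_sensitivity` are hypothetical and not part of any real parallel file system API.

```python
FIXED_MAX_SENSITIVITY = 1.0  # preset fixed value for delay-sensitive R_dio

class WriteBackCache:
    """Minimal stand-in for a client-side write-back cache (WBC)."""
    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0

    def available_ratio(self):
        """P: fraction of the WBC that is still free."""
        return (self.capacity - self.used) / self.capacity

def delay_sensitivity(buffered_by_wbc, wbc=None):
    """R_dio (bypasses the WBC) gets the fixed maximum sensitivity;
    R_wbc gets 1 - P, so sensitivity rises as the buffer fills."""
    if not buffered_by_wbc:
        return FIXED_MAX_SENSITIVITY       # delay-sensitive R_dio
    return 1.0 - wbc.available_ratio()     # delay-insensitive R_wbc
```

Note how the 1 - P rule makes a nearly full WBC issue requests with sensitivity approaching the R_dio maximum, which is what prevents the buffer from failing to drain.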
Optionally, when the delay sensitivity is marked in step 1), a delay-sensitivity field is added to the IO request, the field containing the calculated delay sensitivity S.
Optionally, when the delay sensitivity is marked in step 1), a type field is also added to the IO request, the type field containing a type identifier T that distinguishes the delay-sensitive IO request R_dio from the delay-insensitive IO request R_wbc.
Optionally, the method further includes the following steps, performed by the server of the parallel file system when processing IO requests:
S1) initializing the maximum waiting time T_deadline;
S2) receiving the IO request R and recording its timestamp TS_r;
S3) parsing the IO request R to obtain the delay sensitivity S, and calculating the waiting time T_w according to T_w = T_deadline × (1 - S);
S4) judging whether the waiting time T_w is 0; if T_w is 0, activating the schedule-execution workflow of the IO request R; otherwise, putting the IO request R into the scheduling wait queue and activating the queue-waiting workflow of the IO request R.
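Steps S1)-S4) above can be sketched as a small server-side dispatch routine. The names and the value of `T_DEADLINE` are assumptions for illustration; in the patent, T_deadline is measured on the target system.

```python
import time
from collections import deque

T_DEADLINE = 2.0  # seconds; assumed value (measured per system in the patent)

def wait_time(sensitivity):
    """S3): T_w = T_deadline * (1 - S); S = 1 means no wait, S = 0 the longest wait."""
    return T_DEADLINE * (1.0 - sensitivity)

def on_receive(request, wait_queue, execute):
    """S2)-S4): timestamp the request, compute T_w, then either activate
    the schedule-execution workflow or enqueue for queue-waiting."""
    request.ts = time.monotonic()                 # S2) record timestamp TS_r
    request.tw = wait_time(request.sensitivity)   # S3) compute T_w from S
    if request.tw == 0:                           # S4) dispatch decision
        execute(request)                          # schedule-execution workflow
    else:
        wait_queue.append(request)                # queue-waiting workflow
```

An R_dio request carries the maximum sensitivity, so its T_w is 0 and it bypasses the wait queue entirely.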
Optionally, the processing steps after the schedule-execution workflow of the IO request is activated in step S4) include:
S4.1A) receiving the IO request R and fetching its timestamp TS_r;
S4.2A) traversing the wait queue to find an IO request R_i that originates from the same write-back cache WBC as the IO request R;
S4.3A) fetching the timestamp TS_i of the IO request R_i;
S4.4A) judging whether the timestamp TS_i of the IO request R_i is less than the timestamp TS_r of the IO request R; if so, removing the IO request R_i from the wait queue and sending R_i into the execution queue;
S4.5A) judging whether all IO requests in the wait queue that originate from the same WBC as the IO request R have been traversed; if not, continuing to traverse the wait queue for the next IO request R_i from the same WBC and jumping to step S4.3A); otherwise, proceeding to the next step;
S4.6A) removing the IO request R from the wait queue and sending it into the execution queue, then executing all enqueued IO requests in turn.
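A minimal sketch of steps S4.1A)-S4.6A): before R executes, every earlier-timestamped request from the same WBC is flushed ahead of it, preserving write ordering within one write-back cache. The attribute names (`wbc_id`, `ts`) are illustrative assumptions.

```python
def activate_execution(r, wait_queue, execution_queue):
    """S4.1A)-S4.6A): move every request from the same WBC as r whose
    timestamp is earlier than r's from the wait queue to the execution
    queue, then enqueue r itself."""
    for ri in list(wait_queue):                       # S4.2A)/S4.5A): traversal
        if ri.wbc_id == r.wbc_id and ri.ts < r.ts:    # S4.3A)/S4.4A)
            wait_queue.remove(ri)
            execution_queue.append(ri)
    if r in wait_queue:                               # S4.6A): R may itself be queued
        wait_queue.remove(r)
    execution_queue.append(r)
```

The execution queue thus receives the same-WBC requests in timestamp order, with R last.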
Optionally, the processing steps after the queue-waiting workflow of the IO request is activated in step S4) include:
S4.1B) traversing the scheduling wait queue to fetch an IO request R;
S4.2B) subtracting the preset scheduling time slice t from the waiting time T_w of the IO request R, and judging whether the resulting T_w is 0; if it is 0, activating the schedule-execution workflow of the IO request R; otherwise, judging whether the scheduling wait queue has been fully traversed; if not, continuing to traverse the scheduling wait queue to fetch an IO request R and jumping to step S4.2B); otherwise, proceeding to the next step;
S4.3B) judging whether either of the following conditions is satisfied: condition 1, the time slice expires; condition 2, a new IO request enters the scheduling wait queue. If either is satisfied, jump to step S4.1B); otherwise, wait to be activated.
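One pass of the queue-waiting workflow (steps S4.1B)-S4.2B)) can be sketched as follows; the pass is re-run whenever a time slice expires or a new request arrives (S4.3B)). `T_SLICE` is an assumed value, and the comparison uses <= 0 rather than == 0 as a floating-point-safe reading of "reaches 0".

```python
T_SLICE = 0.125  # scheduling time slice t; assumed value for illustration

def queue_waiting_pass(wait_queue, execute):
    """S4.1B)-S4.2B): age every queued request by one time slice and
    dispatch those whose remaining waiting time T_w has run out."""
    for r in list(wait_queue):
        r.tw -= T_SLICE
        if r.tw <= 0:              # T_w reached 0: activate execution workflow
            wait_queue.remove(r)
            execute(r)
```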
In addition, the invention also provides an IO request scheduling system for reducing access delay in a parallel file system, which comprises a microprocessor and a memory which are connected with each other, wherein the microprocessor is programmed or configured to execute the steps of the IO request scheduling method for reducing access delay in the parallel file system.
Furthermore, the present invention also provides a computer-readable storage medium having stored therein a computer program programmed or configured to execute an IO request scheduling method for reducing access latency in the parallel file system.
Compared with the prior art, the invention mainly has the following advantages:
1. A delay sensitivity is calculated and marked for the IO requests that clients of a parallel file system issue to the storage devices of the server; when multiple IO requests contend, the IO requests with higher delay sensitivity are scheduled preferentially, reducing their queuing time under contention congestion and achieving the purpose of reducing delay.
2. A parallel file system generally adopts a write-back cache (WBC, Write-Back Cache) on the client: an IO call issued by an application temporarily stores its data in a client-side buffer and returns immediately, and the WBC later chooses a suitable moment to send the data to the back-end storage devices. The WBC can also reassemble discrete IO requests into continuous ones, improving access bandwidth.
3. The invention is applicable to a general parallel file system framework and can be used in series with the request scheduling system of the original system; in the request sequence output after scheduling by this embodiment, delay-sensitive requests are placed in front, which helps the subsequent scheduling system schedule and execute them first.
Drawings
FIG. 1 is a core flow diagram of the method according to the embodiment of the present invention.
Fig. 2 is a schematic diagram of a system structure in an embodiment of the present invention.
FIG. 3 is a flowchart illustrating a process of calculating and labeling delay sensitivity according to an embodiment of the present invention.
Fig. 4 is a schematic processing flow diagram of the server according to the embodiment of the present invention.
Detailed Description
As shown in fig. 1, the IO request scheduling method for reducing access delay in a parallel file system in this embodiment includes:
1) calculating and marking a delay sensitivity for the IO requests that clients of the parallel file system issue to the storage devices of the server;
2) when multiple IO requests contend, preferentially scheduling the IO requests with higher delay sensitivity, so as to reduce their queuing time under contention congestion and achieve the purpose of reducing delay.
The IO request scheduling method for reducing access delay in the parallel file system of the embodiment endows each IO request with a delay sensitivity, the IO request with high delay sensitivity is scheduled and executed preferentially, queuing time of the IO request under the condition of competitive congestion is reduced, and the purpose of reducing delay is achieved.
Referring to fig. 2, as an optional implementation manner, in this embodiment, based on the correlation between WBCs and IO requests, each IO request is given a delay sensitivity, and IO requests with high delay sensitivities are scheduled and executed preferentially, so that the queuing time of the IO requests under the condition of contention congestion is reduced, and the purpose of reducing delay is achieved.
Referring to fig. 3, the step of calculating the delay sensitivity in step 1) of the present embodiment includes: judging whether the IO request is buffered by the write-back cache WBC; if so, it is judged to be a delay-insensitive IO request, otherwise a delay-sensitive IO request. For the delay-sensitive IO request R_dio, a preset fixed value is taken as the calculated delay sensitivity; for the delay-insensitive IO request R_wbc, the delay sensitivity is calculated based on the correlation between the write-back cache WBC and the IO. In this embodiment, the IO requests of an application are classified into two types: requests buffered by the WBC (hereinafter R_wbc) and requests that bypass the WBC (hereinafter R_dio). According to the WBC principle, once the data carried by R_wbc has been written into the cache, the application is notified that the IO is complete, so the application need not wait for the data to actually reach the storage device; the file system sends the data to the storage device in the background at a time of its choosing. Since the application does not wait for the true completion of R_wbc and is insensitive to its completion time, this type of request is defined as delay-insensitive. For R_dio, the application must wait until the IO operation completes at the storage device; the completion time of R_dio thus directly affects the execution time of the application, so this type is defined as a delay-sensitive IO request.
In this embodiment, whether an IO request enters the write-back cache WBC is not controlled; the IO requests are merely discriminated at the client of the parallel file system, and a field describing delay sensitivity is added to both types: delay-sensitive requests are given relatively high delay sensitivity, and delay-insensitive requests relatively low delay sensitivity. The calculation rule for the delay sensitivity is as follows: the delay sensitivity of an R_dio request takes a fixed maximum value, while the delay sensitivity of an R_wbc request is calculated from the space availability of the WBC, so as to prevent R_wbc requests from being scheduled with excessive delay and the data in the WBC from failing to drain smoothly.
Referring to fig. 3, in the present embodiment, calculating the delay sensitivity of the IO request based on the correlation between the write-back buffer WBC and the IO refers to: the available space ratio P of the write-back buffer WBC is obtained, and 1-P is taken as the calculated delay sensitivity of the IO request.
In this embodiment, when the delay sensitivity is marked in step 1), a delay-sensitivity field is added to the IO request, the field containing the calculated delay sensitivity S.
In this embodiment, when the delay sensitivity is marked in step 1), a type field is also added to the IO request, the type field containing a type identifier T that distinguishes the delay-sensitive IO request R_dio from the delay-insensitive IO request R_wbc.
As shown in sub-diagram (a) of fig. 4, this embodiment further includes the following steps, performed by the server of the parallel file system when processing IO requests:
S1) initializing the maximum waiting time T_deadline;
S2) receiving the IO request R and recording its timestamp TS_r;
S3) parsing the IO request R to obtain the delay sensitivity S and calculating the waiting time T_w according to T_w = T_deadline × (1 - S); that is, the smaller S is, the larger T_w is, an inverse linear relationship: when S = 0, T_w = T_deadline and the wait is longest; when S = 1, T_w = 0 and no waiting is needed;
S4) judging whether the waiting time T_w is 0; if T_w is 0, activating the schedule-execution workflow of the IO request R; otherwise, putting the IO request R into the scheduling wait queue and activating the queue-waiting workflow of the IO request R.
In step S1) of the present embodiment, the maximum waiting time T_deadline is initialized to the time it takes for the available capacity of the WBC to drop from 100% to 0%, which can be obtained from measurements on the specific system.
In addition, step S1) of this embodiment further includes initializing a scheduling time slice t, which is used for processing the IO requests in the scheduling wait queue by cyclic traversal. In this embodiment, assuming the precision of the field S is g, the scheduling time slice is initialized to t = T_deadline × g, i.e., the minimum granularity of the waiting time of R_wbc requests; the server activates the scheduling flow once every interval t.
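As a worked example of the time-slice rule above (with assumed numbers): if T_deadline has been measured as 2 s and the sensitivity field S is stored with precision g = 0.01, then t = T_deadline × g = 0.02 s, and two R_wbc requests whose sensitivities differ by exactly g differ in waiting time by exactly one slice.

```python
T_DEADLINE = 2.0   # assumed measured value of T_deadline, in seconds
G = 0.01           # assumed precision (granularity) of the sensitivity field S

t_slice = T_DEADLINE * G   # minimum granularity of R_wbc waiting times

# waiting times of two requests whose sensitivities differ by exactly g
dt = T_DEADLINE * (1 - 0.42) - T_DEADLINE * (1 - 0.43)
```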
In this embodiment, a scheduling wait queue is set at the server, request scheduling is carried out according to the delay sensitivity of the received IO requests, and requests with high delay sensitivity are scheduled and executed first. When the two types of requests contend, the scheduled execution of R_wbc is delayed and its T_w increases; but by the working principle of the WBC, as long as there is available space in the WBC, the application issuing the R_wbc request does not have to wait for its completion, so moderately delayed scheduling does not affect the normal execution of the application. Delaying R_wbc instead places R_dio requests in a preferentially scheduled position in the queue, giving them a shorter waiting time T_w and thereby reducing the IO latency perceptible to the application issuing the R_dio request.
As shown in sub-diagram (c) of fig. 4, the processing steps after the schedule-execution workflow of the IO request is activated in step S4) include:
S4.1A) receiving the IO request R and fetching its timestamp TS_r;
S4.2A) traversing the wait queue to find an IO request R_i that originates from the same write-back cache WBC as the IO request R;
S4.3A) fetching the timestamp TS_i of the IO request R_i;
S4.4A) judging whether the timestamp TS_i of the IO request R_i is less than the timestamp TS_r of the IO request R; if so, removing the IO request R_i from the wait queue and sending R_i into the execution queue;
S4.5A) judging whether all IO requests in the wait queue that originate from the same write-back cache WBC as the IO request R have been traversed; if not, continuing to traverse the wait queue for the next IO request R_i from the same WBC and jumping to step S4.3A); otherwise, proceeding to the next step;
S4.6A) removing the IO request R from the wait queue and sending it into the execution queue, then executing all enqueued IO requests in turn.
Through the loop traversal of steps S4.3A)-S4.5A) in this embodiment, the timestamps {TS_i, …, TS_j} of the IO requests {R_i, …, R_j} that originate from the same write-back cache WBC as the IO request R can be fetched one by one and compared with TS_r; the requests whose timestamps are earlier than TS_r are removed from the wait queue and sent in order to the next stage of the file system for execution.
As shown in sub-diagram (b) of fig. 4, the processing steps after the queue-waiting workflow of the IO request is activated in step S4) include:
S4.1B) traversing the scheduling wait queue to fetch an IO request R;
S4.2B) subtracting the preset scheduling time slice t from the waiting time T_w of the IO request R, and judging whether the resulting T_w is 0; if it is 0, activating the schedule-execution workflow of the IO request R; otherwise, judging whether the scheduling wait queue has been fully traversed; if not, continuing to traverse the scheduling wait queue to fetch an IO request R and jumping to step S4.2B); otherwise, proceeding to the next step;
S4.3B) judging whether either of the following conditions is satisfied: condition 1, the time slice expires; condition 2, a new IO request enters the scheduling wait queue. If either is satisfied, jump to step S4.1B); otherwise, wait to be activated.
As an optional implementation manner, in this embodiment the flows corresponding to sub-diagrams (a) to (c) in fig. 4 are executed by different threads, and activation means activating the corresponding thread.
In summary, the parallel-file-system-oriented IO request scheduling method of this embodiment has the following advantages: to reduce IO request latency under the mixed load pattern, the delay sensitivity is calculated within the parallel file system according to the type of each IO request, and the queuing order of the IO requests is adjusted accordingly. The method fits a general parallel file system framework and can be used in series with the request scheduling system of the original system; since delay-sensitive requests are placed in front of the request sequence output after scheduling by the method of this embodiment, the subsequent scheduling system can schedule them first.
In addition, the embodiment also provides an IO request scheduling system for reducing access delay in a parallel file system, which includes a microprocessor and a memory connected to each other, where the microprocessor is programmed or configured to execute the steps of the IO request scheduling method for reducing access delay in a parallel file system.
Furthermore, the present embodiment also provides a computer-readable storage medium, in which a computer program programmed or configured to execute the IO request scheduling method for reducing access delay in the parallel file system is stored.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-readable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein. The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks. 
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention, and the scope of the present invention is not limited to the above embodiments, and all technical solutions that belong to the idea of the present invention belong to the scope of the present invention. It should be noted that modifications and adaptations to those skilled in the art without departing from the principles of the present invention should also be considered as within the scope of the present invention.
Claims (5)
1. An IO request scheduling method for reducing access delay in a parallel file system is characterized by comprising
1) Calculating and marking a delay sensitivity for the IO requests that clients of the parallel file system issue to the storage devices of the server;
2) when multiple IO requests contend, preferentially scheduling the IO requests with higher delay sensitivity, so as to reduce their queuing time under contention congestion and achieve the purpose of reducing delay;
the step of calculating the delay sensitivity in step 1) includes: judging whether the IO request is buffered by the write-back cache WBC; if so, it is judged to be a delay-insensitive IO request, otherwise a delay-sensitive IO request; for the delay-sensitive IO request R_dio, a preset fixed value is taken as the calculated delay sensitivity; for the delay-insensitive IO request R_wbc, the delay sensitivity of the IO request is calculated based on the correlation between the write-back cache WBC and the IO, where calculating the delay sensitivity of the IO request based on this correlation means: obtaining the available space ratio P of the write-back cache WBC and taking 1 - P as the calculated delay sensitivity of the IO request;
the method further includes the following steps, performed by the server of the parallel file system when processing IO requests:
S1) initializing the maximum waiting time T_deadline;
S2) receiving the IO request R and recording its timestamp TS_r;
S3) parsing the IO request R to obtain the delay sensitivity S and calculating the waiting time T_w according to T_w = T_deadline × (1 - S);
S4) judging whether the waiting time T_w is 0; if T_w is 0, activating the schedule-execution workflow of the IO request R; otherwise, putting the IO request R into the scheduling wait queue and activating the queue-waiting workflow of the IO request R;
the processing step after the scheduling execution workflow of the IO request is activated in step S4) includes:
S4.1A) receiving the IO request R and fetching its timestamp TS_r;
S4.2A) traversing the wait queue to find an IO request R_i that originates from the same write-back buffer WBC as the IO request R;
S4.3A) fetching the timestamp TS_i of the IO request R_i;
S4.4A) judging whether the timestamp TS_i of the IO request R_i is less than the timestamp TS_r of the IO request R; if so, removing the IO request R_i from the wait queue and sending the IO request R_i into the execution queue;
S4.5A) determining whether all IO requests in the wait queue originating from the same WBC as the IO request R have been traversed; if not, continuing to traverse the wait queue to find another IO request R_i originating from the same WBC as the IO request R and jumping to step S4.3A); otherwise, proceeding to the next step;
S4.6A) removing the IO request R from the wait queue, sending the IO request R into the execution queue, and executing in order all IO requests sent into the execution queue;
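Steps S4.1A–S4.6A can be sketched as below: before R executes, every older request from the same WBC is promoted ahead of it so that per-cache ordering is preserved. The dict fields `wbc` and `ts` are illustrative assumptions.

```python
def execute_workflow(r, wait_queue, exec_queue):
    """Steps S4.1A-S4.6A: promote all same-WBC requests older than R,
    then move R itself into the execution queue."""
    same_wbc = [q for q in wait_queue if q["wbc"] == r["wbc"]]  # S4.2A/S4.5A
    for q in same_wbc:
        if q["ts"] < r["ts"]:  # S4.4A: issued before R
            wait_queue.remove(q)
            exec_queue.append(q)
    if r in wait_queue:  # S4.6A: R may still sit in the wait queue
        wait_queue.remove(r)
    exec_queue.append(r)
```

This keeps writes drained from one write-back buffer in timestamp order even when a later request from that buffer becomes eligible first.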
the processing steps after the queuing wait workflow of the IO request is activated in step S4) include:
S4.1B) traversing the scheduling wait queue and fetching an IO request R;
S4.2B) subtracting a preset scheduling time slice T from the waiting time T_w of the IO request R, and judging whether the waiting time T_w after the subtraction is zero; if so, activating the scheduling execution workflow of the IO request R; otherwise, judging whether the scheduling wait queue has been completely traversed; if not, continuing to traverse the scheduling wait queue, fetching the next IO request R, and jumping to step S4.2B); otherwise, proceeding to the next step;
S4.3B) judging whether either of the following conditions is satisfied: condition 1: the time slice expires; condition 2: a new IO request enters the scheduling wait queue; if either is satisfied, jumping to step S4.1B); otherwise, waiting for activation.
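One tick of the queuing wait workflow (S4.1B–S4.2B) can be sketched as below. As an assumption beyond the claim text, the remaining wait is clamped at zero with `max(0, ...)` so that a coarse time slice cannot step past the exact-zero check.

```python
def wait_tick(wait_queue, time_slice, execute):
    """Steps S4.1B-S4.2B: on each time slice (or new arrival, S4.3B),
    decrement every queued request's remaining wait and promote those
    whose wait reaches zero to the scheduling execution workflow."""
    for q in list(wait_queue):  # copy: we mutate wait_queue while iterating
        q["t_w"] = max(0, q["t_w"] - time_slice)
        if q["t_w"] == 0:
            wait_queue.remove(q)
            execute(q)  # activate the scheduling execution workflow
```

Condition 2 of S4.3B re-runs this scan on every arrival, so a newly enqueued sensitive request never waits a full time slice before being considered.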
2. The IO request scheduling method for reducing access delay in a parallel file system according to claim 1, wherein marking the delay sensitivity in step 1) comprises adding a delay-sensitivity field to the IO request, the field containing the calculated delay sensitivity S.
3. The IO request scheduling method for reducing access delay in a parallel file system according to claim 2, wherein marking the delay sensitivity in step 1) further comprises adding a type field to the IO request, the field containing a type identifier T indicating either a delay-insensitive IO request R_dio or a delay-sensitive IO request R_wbc.
4. An IO request scheduling system for reducing access delay in a parallel file system, comprising a microprocessor and a memory connected to each other, wherein the microprocessor is programmed or configured to execute the steps of the IO request scheduling method for reducing access delay in a parallel file system according to any one of claims 1 to 3.
5. A computer-readable storage medium storing a computer program, the computer program being programmed or configured to perform the IO request scheduling method for reducing access delay in a parallel file system according to any one of claims 1 to 3.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110620133.9A CN113312323B (en) | 2021-06-03 | 2021-06-03 | IO (input/output) request scheduling method and system for reducing access delay in parallel file system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110620133.9A CN113312323B (en) | 2021-06-03 | 2021-06-03 | IO (input/output) request scheduling method and system for reducing access delay in parallel file system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113312323A CN113312323A (en) | 2021-08-27 |
CN113312323B true CN113312323B (en) | 2022-07-19 |
Family
ID=77377255
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110620133.9A Active CN113312323B (en) | 2021-06-03 | 2021-06-03 | IO (input/output) request scheduling method and system for reducing access delay in parallel file system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113312323B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114124830B (en) * | 2021-11-19 | 2024-04-30 | 南京大学 | RDMA service quality assurance method and system for multiple application scenes of data center |
CN116737673B (en) * | 2022-09-13 | 2024-03-15 | 荣耀终端有限公司 | Scheduling method, equipment and storage medium of file system in embedded operating system |
CN117453378B (en) * | 2023-12-25 | 2024-03-19 | 北京卡普拉科技有限公司 | Method, device, equipment and medium for scheduling I/O requests among multiple application programs |
Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104321758A (en) * | 2013-01-17 | 2015-01-28 | 英特尔公司 | Arbitrating memory access via a shared memory fabric |
US8949489B1 (en) * | 2012-03-21 | 2015-02-03 | Google Inc. | Method for combining bulk and latency-sensitive input and output |
CN107145388A (en) * | 2017-05-25 | 2017-09-08 | 深信服科技股份有限公司 | Method for scheduling task and system under a kind of multitask environment |
CN107454017A (en) * | 2017-06-05 | 2017-12-08 | 上海交通大学 | Mixed data flow coordinated dispatching method in a kind of cloud data center network |
CN107589997A (en) * | 2017-08-29 | 2018-01-16 | 山东师范大学 | Ensure delay-sensitive program QoS dynamic regulating method under data center environment |
CN107707326A (en) * | 2017-11-10 | 2018-02-16 | 鹤壁天海电子信息系统有限公司 | A kind of TDMA two-stage time slot management methods of terminaloriented |
CN107885667A (en) * | 2016-09-29 | 2018-04-06 | 北京忆恒创源科技有限公司 | Reduce the method and apparatus of read command processing delay |
CN109347974A (en) * | 2018-11-16 | 2019-02-15 | 北京航空航天大学 | A kind of online offline mixed scheduling system improving online service quality and cluster resource utilization |
CN110716797A (en) * | 2019-09-10 | 2020-01-21 | 无锡江南计算技术研究所 | DDR4 performance balance scheduling structure and method for multiple request sources |
CN111328148A (en) * | 2020-03-11 | 2020-06-23 | 展讯通信(上海)有限公司 | Data transmission method and device |
CN111444012A (en) * | 2020-03-03 | 2020-07-24 | 中国科学院计算技术研究所 | Dynamic resource regulation and control method and system for guaranteeing delay sensitive application delay S L O |
CN111782355A (en) * | 2020-06-03 | 2020-10-16 | 上海交通大学 | Cloud computing task scheduling method and system based on mixed load |
CN112463044A (en) * | 2020-11-23 | 2021-03-09 | 中国科学院计算技术研究所 | Method and system for ensuring tail reading delay of server side of distributed storage system |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10452572B2 (en) * | 2016-10-06 | 2019-10-22 | Vmware, Inc. | Automatic system service resource management for virtualizing low-latency workloads that are input/output intensive |
- 2021-06-03 CN CN202110620133.9A patent/CN113312323B/en active Active
Also Published As
Publication number | Publication date |
---|---|
CN113312323A (en) | 2021-08-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113312323B (en) | IO (input/output) request scheduling method and system for reducing access delay in parallel file system | |
US10089142B2 (en) | Dynamic task prioritization for in-memory databases | |
US8046758B2 (en) | Adaptive spin-then-block mutual exclusion in multi-threaded processing | |
US5790851A (en) | Method of sequencing lock call requests to an O/S to avoid spinlock contention within a multi-processor environment | |
US10133602B2 (en) | Adaptive contention-aware thread placement for parallel runtime systems | |
Reda et al. | Rein: Taming tail latency in key-value stores via multiget scheduling | |
US20060037017A1 (en) | System, apparatus and method of reducing adverse performance impact due to migration of processes from one CPU to another | |
US9875141B2 (en) | Managing pools of dynamic resources | |
JP2009541848A (en) | Method, system and apparatus for scheduling computer microjobs to run uninterrupted | |
JP2003044295A (en) | Sleep queue management | |
US20060037021A1 (en) | System, apparatus and method of adaptively queueing processes for execution scheduling | |
US20230127112A1 (en) | Sub-idle thread priority class | |
US10409640B1 (en) | Methods and apparatus for data request scheduling in performing parallel IO operations | |
WO2005048009A2 (en) | Method and system for multithreaded processing using errands | |
JP2009541851A (en) | Resource-based scheduler | |
Mutlu et al. | Parallelism-aware batch scheduling: Enabling high-performance and fair shared memory controllers | |
Pang et al. | Efficient CUDA stream management for multi-DNN real-time inference on embedded GPUs | |
CN111597044A (en) | Task scheduling method and device, storage medium and electronic equipment | |
EP2840513B1 (en) | Dynamic task prioritization for in-memory databases | |
US10713089B2 (en) | Method and apparatus for load balancing of jobs scheduled for processing | |
US11061724B2 (en) | Programmable hardware scheduler for digital processing systems | |
US11113101B2 (en) | Method and apparatus for scheduling arbitration among a plurality of service requestors | |
CN110837415A (en) | Thread scheduling method and device based on RISC-V multi-core processor | |
JP5299869B2 (en) | Computer micro job | |
US9043353B2 (en) | Query stream execution using priority gradient multiprogramming |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||