CN114217733B - IO (input/output) processing framework and IO request processing method for IO forwarding system - Google Patents

IO (input/output) processing framework and IO request processing method for IO forwarding system

Info

Publication number
CN114217733B
CN114217733B (application CN202110479680.XA)
Authority
CN
China
Prior art keywords
request
requests
scheduling
file
forwarding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110479680.XA
Other languages
Chinese (zh)
Other versions
CN114217733A (en)
Inventor
陈起
陈德训
何晓斌
余婷
高洁
肖伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuxi Jiangnan Computing Technology Institute
Original Assignee
Wuxi Jiangnan Computing Technology Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuxi Jiangnan Computing Technology Institute filed Critical Wuxi Jiangnan Computing Technology Institute
Priority to CN202110479680.XA priority Critical patent/CN114217733B/en
Publication of CN114217733A publication Critical patent/CN114217733A/en
Application granted granted Critical
Publication of CN114217733B publication Critical patent/CN114217733B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061Improving I/O performance
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0629Configuration or reconfiguration of storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671In-line storage system
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an IO processing framework for an IO forwarding system, comprising: a job-granularity IO scheduling unit, which classifies all IO requests on an IO forwarding node according to the job ID of each request; a file-granularity IO scheduling unit, which distributes IO requests from the same job by file and schedules them in units of files; and an IO-operation-level scheduling unit, which analyzes the dependencies among multiple IO requests on the same file during scheduling and, based on those dependencies, merges write requests and optimizes read-ahead. The framework further comprises a dynamic resource scheduling unit, which adjusts system resources according to how each IO scheduling unit executes IO requests. The invention solves the IO scheduling problem of the IO forwarding server under a high-performance computing forwarding architecture.

Description

IO (input/output) processing framework and IO request processing method for IO forwarding system
Technical Field
The invention relates to an IO processing framework and an IO request processing method for an IO forwarding system, and belongs to the technical field of high-performance computing.
Background
High performance storage systems are an important component of high performance computing systems. While a high performance computer is in use, scientific computing applications running on its computing nodes need to save computation results or intermediate temporary states to the storage system, so that these results can later be analyzed or used to recover the application after an abnormal exit. The IO performance an application obtains from the storage system therefore directly limits the efficiency of application execution and the utilization of the compute nodes. The IO performance a storage system can provide is affected by factors such as IO concurrency, the types of IO requests, and interference among multiple jobs.
In recent years, high-performance computer systems have grown rapidly, with core counts reaching millions or even tens of millions. An increase in computing scale means an increase in the size of scientific computing problems, which inevitably raises the IO concurrency the storage system must handle, and high IO concurrency reduces storage system performance. To mitigate the impact of growing compute-node counts on storage performance, IO forwarding architectures are widely used in high performance computing systems. In this architecture, IO requests from large-scale computing nodes are first aggregated onto a relatively small number of IO forwarding nodes, which then access the backend storage system to complete the requests. The IO forwarding architecture is highly effective at reducing IO concurrency and greatly lowers the load on the back-end storage system.
A conventional IO forwarding server adopts a single-layer IO scheduling strategy when processing IO requests. In this strategy, two independent threads are typically used to receive IO requests from the computing nodes and to send the results back to them; in addition, the system runs a large number of worker threads that access the storage system to execute the IO on behalf of the computing nodes. This scheduling algorithm is simple to implement, and the concurrency seen by the storage system can be controlled through the number of worker threads, so it has been widely applied in high-performance computing systems.
Current high-performance computing systems have changed greatly, so the scheduling policy of the traditional IO forwarding system no longer meets their needs. These changes include:
(1) Application IO patterns are becoming richer: traditional high-performance computing IO consisted mainly of large-block write requests, and an IO forwarding system usually assumes that the IO requests it processes are similar. In recent years, however, applications such as AI and high-precision scientific computing place diverse demands on IO access, and patterns such as random reads and mixed read-write workloads have begun to appear.
(2) IO interference between applications is becoming more severe: as the number of computing nodes grows, it has become normal for a single high-performance computing system to run multiple jobs simultaneously. In this case, IO requests from several different applications may be aggregated onto the same IO forwarding node, which then has to process different types of requests from multiple applications at the same time.
(3) The heterogeneity of storage media is becoming more evident: the bottom layer of a traditional storage system is usually built on disks, whose IO access latencies are similar. Today, with the emergence of new media such as SSDs, the back-end storage system may exhibit different latency and performance when executing different IO requests.
In a high-performance computing IO forwarding architecture, an IO forwarding server must simultaneously process highly concurrent IO requests from different computing nodes and different applications, and these requests compete with one another. How the IO requests are scheduled therefore has an important influence on both the IO performance of the high-performance computing system and the performance of the storage system.
The existing technology does not distinguish the type and characteristics of IO requests when processing them, so IO scheduling cannot be optimized per request and proceeds with a certain blindness. The prior art also adopts a one-dimensional partitioning method: all IO requests are placed into several queues with fixed priorities, and the scheduling algorithm completes IO scheduling by processing the requests in each queue linearly. This makes it difficult to merge IO requests, generates a large number of random accesses on the back-end storage, and reduces system performance. Finally, the prior art cannot adapt to heterogeneous storage media: when the performance of the underlying media differs widely, high-latency IO requests can block other IO requests.
Disclosure of Invention
The invention aims to provide an IO processing framework and an IO request processing method for an IO forwarding system, which solve the IO scheduling problem of the IO forwarding server under a high-performance computing forwarding architecture.
In order to achieve the above purpose, the invention adopts the following technical scheme: an IO processing framework facing the IO forwarding system, in which a job ID is added to each IO request and used to distinguish IO request types, comprising the following functional modules:
a job-granularity IO scheduling unit, used to classify all IO requests on the IO forwarding node according to job ID, so as to arrange the scheduling frequency and scheduling counts of IO requests across jobs and realize IO request distinction, isolation, ordering and QoS management among jobs;
a file-granularity IO scheduling unit, used to distribute IO requests from the same job by file and schedule them in units of files;
an IO-operation-level scheduling unit, used to analyze the dependencies among multiple IO requests on the same file during scheduling and, based on those dependencies, merge write requests and optimize read-ahead;
the framework further comprises a dynamic resource scheduling unit, used to collect the average IO execution delay of each IO class according to how each IO scheduling unit executes IO requests; for classes with higher average delay, whether at the file level or the job level, it reduces their concurrency so as to yield system resources to serve the faster IO requests of other classes.
The IO request processing method for the IO forwarding system is based on the IO processing framework above: IO requests are classified, scheduled and executed according to the job information, file information and operation type information they carry. The method specifically comprises the following steps:
s1, a computing node sends an IO request carrying job JobID, file mark information and IO operation type to an IO forwarding node through a network;
s2, the IO forwarding node classifies the received IO requests according to the following steps:
s21, an IO forwarding node storage service program analyzes JobID fields from IO requests, classifies all the IO requests according to JobID, places the same JobID IO request into a first-stage queue, and obtains a plurality of job IO request queues classified according to JobID;
s22, the IO forwarding node storage service program extracts file mark information from the IO requests, classifies the IO requests belonging to the same JobID according to the file mark information, places the IO requests belonging to the same file into a file queue of the same JobID, and obtains a plurality of file IO request queues classified according to the file mark information;
s23, the IO forwarding node storage service program classifies the IO requests belonging to the same file from the IO requests according to the IO operation types, and further classifies the IO requests in the file IO request queues obtained in the S22 to obtain IO request queues of a plurality of operation types classified according to file mark information;
s3, after classification is completed, the dynamic resource scheduling unit of the IO forwarding node performs the following steps to complete the processing of the IO request:
s31, a dynamic resource scheduling unit adopts an IO scheduling algorithm to select a certain job IO request queue for scheduling;
s32, the dynamic resource scheduling unit adopts a polling mode to fairly select an IO request queue of a certain file from the IO request queues of the jobs;
s33, the dynamic resource scheduling unit selects one IO request from the IO request queues belonging to the same file to schedule;
when processing IO requests, metadata requests are selected for scheduling first;
when metadata scheduling is complete, data IO requests are selected for scheduling;
when scheduling IO requests, the dependencies among them are analyzed to determine whether they can be merged; if not, the requests are scheduled in ascending order of the file ranges they access; if so, the file ranges of the operations are merged to form one large IO request, which is then scheduled;
s34, the dynamic resource scheduling unit puts the IO request on the IO execution thread with the corresponding priority for execution according to the IO execution time average delay of the request queue where the selected IO request is located;
and S35, after an IO request has been executed, the IO execution thread records its execution time into the IO request queue of the corresponding file; from the recorded execution times, each queue estimates the likely execution time of its IO requests, which serves as the basis on which the dynamic resource scheduling unit selects an execution thread.
Further improvements within the above technical scheme are as follows:
1. In the above solution, the file operation types in S23 include metadata requests, read data requests and write data requests.
2. In the above scheme, the IO scheduling algorithm in S31 includes existing algorithms such as the token-bucket-based IO scheduling algorithm and the IO scheduling algorithm based on a fair scheduling policy.
Due to the application of the above technical scheme, the invention has the following advantages over the prior art:
Through its multi-level IO request processing framework, the IO processing framework and IO request processing method for the IO forwarding system solve the problems that the IO scheduling algorithm under the traditional IO forwarding framework cannot handle: distinguishing IO from multiple jobs, differences among IO request types, IO access locality under high concurrency, and differences in the underlying storage.
Drawings
FIG. 1 is a schematic diagram of an IO forwarding architecture in the present invention;
FIG. 2 is a schematic diagram of an IO forwarding node scheduling process;
FIG. 3 is a schematic diagram of a multi-level scheduling framework in accordance with the present invention.
Detailed Description
Examples: the invention provides an IO processing framework facing an IO forwarding system, in which a job ID is added to each IO request and used to distinguish IO request types. The framework comprises the following functional modules:
a job-granularity IO scheduling unit, used to classify all IO requests on the IO forwarding node according to job ID, so as to arrange the scheduling frequency and scheduling counts of IO requests across jobs and realize IO request distinction, isolation, ordering and QoS management among jobs;
a file-granularity IO scheduling unit, used to distribute IO requests from the same job by file and schedule them in units of files; this maintains file locality over a longer time span, so that the bottom layer can use read-ahead, write merging and similar functions to reduce network traffic to the back-end file system and improve system performance;
an IO-operation-level scheduling unit, used to analyze the dependencies among multiple IO requests on the same file during scheduling and, based on those dependencies, merge write requests and optimize read-ahead;
the framework further comprises a dynamic resource scheduling unit, used to collect the average IO execution delay of each IO class according to how each IO scheduling unit executes IO requests; for classes with higher average delay, whether at the file level or the job level, it reduces their concurrency so as to yield system resources to serve the faster IO requests of other classes, thereby improving overall performance.
The IO request processing method for the IO forwarding system is based on the IO processing framework above: IO requests are classified, scheduled and executed according to the job information, file information and operation type information they carry. The method specifically comprises the following steps:
s1, a computing node sends an IO request carrying job JobID, file mark information and IO operation type to an IO forwarding node through a network;
s2, the IO forwarding node classifies the received IO requests according to the following steps:
s21, an IO forwarding node storage service program analyzes JobID fields from IO requests, classifies all the IO requests according to JobID, places the same JobID IO request into a first-stage queue, and obtains a plurality of job IO request queues classified according to JobID;
s22, the IO forwarding node storage service program extracts file mark information from the IO requests, classifies the IO requests belonging to the same JobID according to the file mark information, places the IO requests belonging to the same file into a file queue of the same JobID, and obtains a plurality of file IO request queues classified according to the file mark information;
s23, the IO forwarding node storage service program classifies the IO requests belonging to the same file from the IO requests according to the IO operation types, and further classifies the IO requests in the file IO request queues obtained in the S22 to obtain IO request queues of a plurality of operation types classified according to file mark information;
s3, after classification is completed, the dynamic resource scheduling unit of the IO forwarding node performs the following steps to complete the processing of the IO request:
s31, a dynamic resource scheduling unit adopts an IO scheduling algorithm to select a certain job IO request queue for scheduling;
s32, the dynamic resource scheduling unit adopts a polling mode to fairly select an IO request queue of a certain file from the IO request queues of the jobs;
s33, the dynamic resource scheduling unit selects one IO request from the IO request queues belonging to the same file to schedule;
when processing IO requests, metadata requests are selected for scheduling first;
when metadata scheduling is complete, data IO requests are selected for scheduling;
when scheduling IO requests, the dependencies among them are analyzed to determine whether they can be merged; if not, the requests are scheduled in ascending order of the file ranges they access; if so, the file ranges of the operations are merged to form one large IO request, which is then scheduled;
S34, the dynamic resource scheduling unit places the selected IO request on the IO execution thread with the corresponding priority, according to the average IO execution delay of the queue the request came from;
and S35, after an IO request has been executed, the IO execution thread records its execution time into the IO request queue of the corresponding file; from the recorded execution times, each queue estimates the likely execution time of its IO requests, which serves as the basis on which the dynamic resource scheduling unit selects an execution thread.
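The selection rules of S32 and S33, round-robin over files, metadata-first, and ascending file-range order for data requests, can be sketched as follows. The dict-based request shape and the `pick_request` helper are assumptions for illustration.

```python
from itertools import cycle

# S33: metadata requests are scheduled before data IO
OP_ORDER = ("metadata", "read", "write")

def pick_request(file_queues, rr):
    """Select one request from a job's per-file queues.

    `file_queues` maps file_id -> {op_type: [request dicts]}; `rr` is a
    round-robin iterator over the file ids, giving the fair polling of
    S32. Returns None when every queue is empty."""
    for _ in range(len(file_queues)):
        fid = next(rr)                    # S32: round-robin over files
        per_op = file_queues[fid]
        for op in OP_ORDER:               # S33: metadata first, then data
            if per_op.get(op):
                # data requests are taken in ascending file-range order
                per_op[op].sort(key=lambda r: r.get("offset", 0))
                return per_op[op].pop(0)
    return None
```

Repeated calls drain the queues one request at a time, which matches the per-request dispatch of S34.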
The file operation types in S23 include metadata requests, read data requests and write data requests.
The IO scheduling algorithm in S31 includes existing algorithms such as the token-bucket-based IO scheduling algorithm and the IO scheduling algorithm based on a fair scheduling policy.
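One of the existing algorithms named here is token-bucket based. A minimal token-bucket sketch that could gate how often a job's queue is selected in S31 is given below; the rates, capacity, and injectable clock are assumptions for illustration, not details from the patent.

```python
import time

class TokenBucket:
    """Throttles how often a job's IO queue may be selected: `rate`
    tokens accrue per second up to `capacity`, and scheduling one IO
    request from the job consumes one token."""

    def __init__(self, rate, capacity, now=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.now = now          # injectable clock, eases testing
        self.last = now()

    def try_consume(self, n=1):
        """Refill based on elapsed time, then consume `n` tokens if
        available; returns False when the job must wait its turn."""
        t = self.now()
        self.tokens = min(self.capacity,
                          self.tokens + (t - self.last) * self.rate)
        self.last = t
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False
```

Giving each job its own bucket with a per-job rate is one way such a scheduler could enforce the QoS controls described for job-granularity scheduling.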
Further explanation of the above embodiments is as follows:
the invention provides a new IO scheduling framework, which enables an IO scheduling system to divide and distinguish IO requests by using different dimensionalities, avoids blindness of IO scheduling, and also provides a multidimensional scheduling mechanism, which respectively schedules IO requests from dimensionalities such as operation, file, IO operation and the like, digs dependence among the IO requests, maintains the characteristic of data access locality, and also provides a method for IO delay statistics and feedback to solve the problem that heterogeneous storage media cannot be applied in the prior art.
Multi-level IO processing framework: in the invention, IO scheduling under the IO forwarding architecture is divided into three layers, each addressing different scheduling requirements. To realize layered scheduling, the system carries a job ID in each IO request to distinguish IO types.
The scheduling performed at each of the three layers and its targets are described as follows:
(1) Job-granularity IO scheduling: this layer mainly addresses the distinction of IO requests between jobs and the isolation and ordering of IO requests across jobs. An IO forwarding node may serve IO requests from several jobs, and the traditional IO scheduling strategy schedules them in arrival order, so functions such as priority scheduling and QoS guarantees for each job's IO requests cannot be achieved. Job-granularity IO scheduling classifies all IO requests on the IO forwarding node according to job ID, making it easy to arrange the scheduling frequency and counts across jobs based on these classes, thereby realizing IO isolation and QoS control for different jobs.
(2) File-granularity IO scheduling: this layer classifies IO requests from the same job at the file level and schedules in units of files. Because of concurrency, the IO requests arriving at an IO forwarding node may have lost their file-locality characteristics, which can make back-end file system accesses completely random and greatly reduce file system performance. File-level scheduling distributes IO requests by file and maintains file locality over a longer time span, so that the bottom layer can use read-ahead, write merging and similar functions to reduce network traffic to the back-end file system and improve system performance.
(3) IO scheduling at the IO operation level: this layer schedules within a specific file. For the same file the system may have both reads and writes, and the order of the read or write requests affects the order of the back-end file system's IO operations. This layer analyzes the dependencies among multiple IO requests on the same file during scheduling and, based on those dependencies, merges IO requests, optimizes read-ahead, and so on.
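The merging done at this level can be illustrated for requests on one file: contiguous or overlapping `(offset, length)` ranges are combined into one large request, and non-mergeable requests stay in ascending range order, as S33 describes. A hypothetical sketch:

```python
def merge_ranges(requests):
    """Merge IO requests on one file whose ranges are contiguous or
    overlapping into one large request.

    `requests` is a list of (offset, length) tuples; the result is
    sorted by offset, with mergeable neighbours coalesced."""
    ordered = sorted(requests, key=lambda r: r[0])
    merged = []
    for off, length in ordered:
        if merged and off <= merged[-1][0] + merged[-1][1]:
            # range touches or overlaps the previous one: extend it
            last_off, last_len = merged[-1]
            new_end = max(last_off + last_len, off + length)
            merged[-1] = (last_off, new_end - last_off)
        else:
            merged.append((off, length))
    return merged
```

Two adjacent 4 KB writes at offsets 0 and 4096, for example, would be issued to the back end as a single 8 KB request, avoiding a random access.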
Besides the specific targets of the three scheduling layers, the framework also supports a dynamic resource adjustment mechanism. For each layer, the system collects the delay of IO execution based on the scheduling behavior, and for IO classes with higher delay, whether at the file level or the job level, it reduces their concurrency so that system resources are yielded to faster jobs or files, thereby improving overall performance.
The IO forwarding node is provided with several worker threads that concurrently execute the IO requests selected by the IO scheduler, with each worker thread handling IO requests of a different execution-time class. For IO requests with longer execution times, a worker issues the IO asynchronously so that low-delay IO requests can run at higher concurrency, improving throughput; for IO requests with shorter execution times, a worker executes the IO synchronously, reducing pressure on the back-end system.
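The synchronous/asynchronous split described above can be sketched with a thread pool: long-running IO is submitted asynchronously so it does not block the worker, while short IO runs inline. The threshold value and the `backend` callable are assumptions for illustration.

```python
import concurrent.futures

# Assumed cut-off between "short" and "long" IO; not from the patent.
LONG_IO_THRESHOLD = 0.05  # seconds

def execute(request, estimated_delay, pool, backend):
    """Dispatch one IO request to the back end.

    Long requests go through `pool.submit` (asynchronous, raises
    concurrency for slow IO); short requests are executed synchronously
    to limit pressure on the back-end file system. Both paths return a
    Future so callers handle results uniformly."""
    if estimated_delay > LONG_IO_THRESHOLD:
        return pool.submit(backend, request)   # async: returns a Future
    result = backend(request)                  # sync: run inline
    done = concurrent.futures.Future()
    done.set_result(result)
    return done
```

The `estimated_delay` argument would come from the per-queue execution-time estimates recorded in S35.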
Fig. 1 depicts a schematic diagram of the IO forwarding architecture addressed by this patent. In this architecture, a request from a compute node is sent to the IO forwarding node through the IO forwarding service. The IO forwarding node reorders the requests from the compute nodes and submits them to the back-end parallel file system. After the back-end parallel file system finishes executing a request, the IO forwarding node returns the result to the corresponding computing node, completing the processing of the IO request.
Fig. 2 depicts the overall IO scheduling process of an IO forwarding node. First, the IO forwarding service on the node continuously receives IO requests from the network and puts them into an IO request queue to await the underlying IO service threads. A fixed number of IO service threads run on each IO forwarding node; they take IO requests out of the request queues in parallel, submit them to the back-end file system for execution, and return the results to the corresponding computing nodes once execution completes.
Fig. 3 depicts the proposed multi-level IO processing framework. Its main feature is that when an IO service thread receives a network message, the IO requests are classified and reordered according to their JobID information, file identifier, operation type, and so on. When selecting an IO request to execute, the IO service thread chooses according to the three levels: JobID, file and IO operation.
When the IO processing framework facing the IO forwarding system is adopted, its multi-level IO request processing solves the problems that the IO scheduling algorithm under the traditional IO forwarding framework cannot handle: distinguishing IO from multiple jobs, differences among IO request types, IO access locality under high concurrency, and differences in the underlying storage.
In order to facilitate a better understanding of the present invention, the terms used herein will be briefly explained below:
parallel file system: in high performance computing, a shared, high concurrency IO access storage system is provided for the whole machine.
IO forwarding server: in high performance computing systems, IO requests from computing nodes are aggregated onto a smaller number of servers; these servers receive the IO requests from the computing nodes and provide storage services by invoking the interfaces of the back-end parallel file system. Such servers are referred to as IO forwarding servers.
IO request scheduling: the IO operations of an application form many IO requests, which the system may rearrange and reorganize during execution in order to improve system performance and meet IO requirements; this process is called IO request scheduling.
The above embodiments are provided to illustrate the technical concept and features of the present invention and are intended to enable those skilled in the art to understand the content of the present invention and implement the same, and are not intended to limit the scope of the present invention. All equivalent changes or modifications made in accordance with the spirit of the present invention should be construed to be included in the scope of the present invention.

Claims (4)

1. An IO processing framework for an IO forwarding system, characterized in that a job ID is added to each IO request and used to distinguish IO request types, the framework comprising the following functional modules:
a job-granularity IO scheduling unit, used to classify all IO requests on the IO forwarding node by job ID, so as to control the scheduling frequency and scheduling count of IO requests across multiple jobs and to achieve inter-job IO request differentiation as well as inter-job IO request isolation, ordering and QoS management;
a file-granularity IO scheduling unit, used to distribute IO requests from the same job by file and to schedule IO requests on a per-file basis;
an IO-operation-level IO scheduling unit, used to analyze the dependencies among multiple IO requests targeting the same file during IO request scheduling and, according to those dependencies, to merge write requests and optimize read-ahead;
the framework further comprises a dynamic resource scheduling unit, used to collect the average latency of IO execution for each IO class according to how each IO scheduling unit executes IO requests; for IO requests in classes with higher average latency, whether at the file level or the job level, concurrency is reduced so that system resources are yielded to serve the faster IO requests of other classes.
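The dynamic resource scheduling unit's behavior can be sketched as follows: track per-class average IO latency and halve the concurrency cap of classes that exceed a latency threshold. This is a minimal sketch; the class names, the halving policy, and the threshold are illustrative assumptions, as the claim does not specify how concurrency is reduced.

```python
class ClassStats:
    """Per-IO-class statistics: running average latency and a concurrency cap."""
    def __init__(self, max_concurrency=8):
        self.total_latency = 0.0
        self.count = 0
        self.max_concurrency = max_concurrency

    def record(self, latency):
        self.total_latency += latency
        self.count += 1

    @property
    def avg_latency(self):
        return self.total_latency / self.count if self.count else 0.0

def rebalance(stats, threshold):
    """Reduce the concurrency cap of classes slower than `threshold`,
    yielding resources to faster classes."""
    for s in stats.values():
        if s.avg_latency > threshold and s.max_concurrency > 1:
            s.max_concurrency //= 2

stats = {"job1": ClassStats(), "job2": ClassStats()}
stats["job1"].record(0.5); stats["job1"].record(0.7)   # slow class
stats["job2"].record(0.01)                              # fast class
rebalance(stats, threshold=0.1)
```

After rebalancing, the slow class's cap drops while the fast class keeps its full concurrency, which is the "yielding system resources" effect the claim describes.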
2. An IO request processing method for an IO forwarding system, characterized in that: based on the IO processing framework for an IO forwarding system of claim 1, IO requests are classified, scheduled and executed according to the job information, file information and operation type information they carry, specifically comprising the following steps:
S1, a computing node sends an IO request carrying a JobID, file identifier information and an IO operation type to an IO forwarding node over the network;
S2, the IO forwarding node classifies the received IO requests as follows:
S21, the storage service program on the IO forwarding node parses the JobID field from each IO request, classifies all IO requests by JobID, and places requests with the same JobID into the same first-level queue, yielding multiple job IO request queues classified by JobID;
S22, the storage service program extracts the file identifier information from the IO requests, classifies requests belonging to the same JobID by file identifier, and places requests targeting the same file into the same file queue under that JobID, yielding multiple file IO request queues classified by file identifier;
S23, the storage service program further classifies the IO requests within each file IO request queue obtained in S22 by IO operation type, yielding multiple IO request queues classified by operation type;
S3, after classification is complete, the dynamic resource scheduling unit of the IO forwarding node performs the following steps to process the IO requests:
S31, the dynamic resource scheduling unit selects a job IO request queue for scheduling using an IO scheduling algorithm;
S32, the dynamic resource scheduling unit fairly selects a file IO request queue from the job's queues by round-robin polling;
S33, the dynamic resource scheduling unit selects one IO request for scheduling from the queues belonging to the same file:
when processing IO requests, metadata requests are scheduled first;
once metadata scheduling is complete, data IO requests are selected for scheduling;
during IO request scheduling, the dependencies among IO requests are analyzed to determine whether they can be merged; if not, the requests are scheduled in ascending order of the accessed file range; if so, the file ranges of the IO request operations are merged to form one large IO request for scheduling;
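The merge decision in S33 can be sketched as sorting requests by file offset and coalescing contiguous ranges into one larger request; non-contiguous requests remain separate and are scheduled in ascending range order. The `(offset, length)` tuple representation is an illustrative assumption.

```python
def merge_or_sort(ranges):
    """ranges: list of (offset, length) tuples on one file.
    Returns the scheduled ranges in ascending order of file range,
    with contiguous ranges merged into one large request."""
    out = []
    for off, ln in sorted(ranges):             # small-to-large file range
        if out and out[-1][0] + out[-1][1] == off:
            # previous range ends exactly where this one starts: merge
            out[-1] = (out[-1][0], out[-1][1] + ln)
        else:
            out.append((off, ln))
    return out

# Three contiguous 4 KiB writes collapse into one 12 KiB request.
merged = merge_or_sort([(8192, 4096), (0, 4096), (4096, 4096)])
```

A real implementation would also check that the requests have no conflicting dependencies (e.g. an intervening read) before merging, per the dependency analysis described above.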
S34, the dynamic resource scheduling unit places the selected IO request on an IO execution thread of the corresponding priority, according to the average IO execution latency of the request queue the selected request belongs to;
S35, after an IO request completes, the IO execution thread records its execution time into the IO request queue of the corresponding file; each IO request queue estimates the likely execution time of its IO requests from the recorded execution times, and this estimate serves as the basis on which the dynamic resource scheduling unit selects an execution thread.
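The per-queue execution-time estimate in S35 could be maintained with an exponentially weighted moving average. The patent does not specify the estimator; EWMA is one common choice used here purely for illustration.

```python
class FileQueue:
    """Per-file IO request queue that tracks an execution-time estimate."""
    def __init__(self, alpha=0.5):
        self.alpha = alpha       # weight given to the newest sample
        self.estimate = None     # estimated execution time per request

    def record_execution(self, elapsed):
        """Called by the IO execution thread after a request completes."""
        if self.estimate is None:
            self.estimate = elapsed
        else:
            self.estimate = (self.alpha * elapsed
                             + (1 - self.alpha) * self.estimate)

q = FileQueue()
q.record_execution(0.10)
q.record_execution(0.20)   # estimate moves toward the newer sample
```

The dynamic resource scheduling unit could then compare `estimate` values across queues to route requests to execution threads of matching priority.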
3. The IO request processing method for an IO forwarding system according to claim 2, characterized in that: the file operation types in S23 comprise metadata requests, read data requests and write data requests.
4. The IO request processing method for an IO forwarding system according to claim 2, characterized in that: the IO scheduling algorithm in S31 comprises existing token-bucket-based IO scheduling algorithms and IO scheduling algorithms based on a fair scheduling policy.
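The token-bucket selection mentioned in claim 4 works by giving each job queue a bucket that refills at a configured rate; a queue is eligible for scheduling in S31 only when it can consume a token. The sketch below shows the standard token-bucket mechanism, not the patent's specific parameters; rates and names are illustrative assumptions.

```python
import time

class TokenBucket:
    """Standard token bucket: refills at `rate` tokens/second up to `capacity`."""
    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def try_consume(self, n=1):
        """Refill based on elapsed time, then consume n tokens if available."""
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False

bucket = TokenBucket(rate=100, capacity=10)
first = bucket.try_consume()    # bucket starts full, so this succeeds
```

In the scheduler, a job IO request queue whose bucket returns `False` would be skipped this round, bounding its IO rate relative to other jobs.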
CN202110479680.XA 2021-04-30 2021-04-30 IO (input/output) processing framework and IO request processing method for IO forwarding system Active CN114217733B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110479680.XA CN114217733B (en) 2021-04-30 2021-04-30 IO (input/output) processing framework and IO request processing method for IO forwarding system

Publications (2)

Publication Number Publication Date
CN114217733A CN114217733A (en) 2022-03-22
CN114217733B true CN114217733B (en) 2023-10-13

Family

ID=80695853

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20140102478A (en) * 2013-02-14 2014-08-22 한국전자통신연구원 Workflow job scheduling apparatus and method
CN110580127A (en) * 2018-06-07 2019-12-17 华中科技大学 Resource management method and resource management system based on multi-tenant cloud storage
KR20200080165A (en) * 2018-12-26 2020-07-06 중앙대학교 산학협력단 Context information File I/O management system and method for mobile devices
CN112433983A (en) * 2019-08-26 2021-03-02 无锡江南计算技术研究所 File system management method supporting multi-job parallel IO performance isolation

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9442954B2 (en) * 2012-11-12 2016-09-13 Datawise Systems Method and apparatus for achieving optimal resource allocation dynamically in a distributed computing environment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A Distributed Object Storage System for High-Performance Computing; Chen Xi; Zhu Jiantao; He Xiaobin; Computer Engineering (Issue 08); full text *
Agent-Based Metadata Optimization and Implementation for Parallel File Systems; Yi Jianliang; Chen Zhiguang; Xiao Nong; Lu Yutong; Journal of Computer Research and Development (Issue 02); full text *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant