CN117950599B

CN117950599B - I/O stack construction method, device, equipment and medium based on distributed system

Info

Publication number: CN117950599B
Application number: CN202410353708.9A
Authority: CN
Inventors: 李宇奇; 张健; 徐斌; 贺成; 冯景华
Original assignee: National Supercomputer Center In Tianjin
Current assignee: National Supercomputer Center In Tianjin
Priority date: 2024-03-27
Filing date: 2024-03-27
Publication date: 2024-07-19
Anticipated expiration: 2044-03-27
Also published as: CN117950599A

Abstract

The embodiments of the disclosure relate to an I/O stack construction method, device, equipment and medium based on a distributed system. The method comprises: acquiring job description information of a job to be run, and determining I/O type data of the job to be run according to the job description information, wherein the job description information includes at least one of job name information, job path information, and job submission information, and the job name information includes at least one of software name information, software version information, and operation example information; and determining, according to the I/O type data, a target I/O stack for processing the job to be run. The embodiments of the disclosure improve the fit between the I/O stack and the job, realize system-level automatic construction of the I/O stack, reduce the consumption of human resources, make reasonable use of I/O resources, and improve the I/O performance of the job.

Description

I/O stack construction method, device, equipment and medium based on distributed system

Technical Field

The disclosure relates to the technical field of data storage, and in particular relates to an I/O stack construction method, device, equipment and medium based on a distributed system.

Background

With the development of computer technology, the computing power of the supercomputer is rapidly improved, however, the improvement of the I/O (Input/Output) capability of the supercomputer is slower, and the lower I/O capability is difficult to match with the higher computing power.

In the related art, a user can configure an I/O stack according to the job executed by the supercomputer. However, the configuration is limited by the user's knowledge level, so the resulting I/O stack may be poorly suited to the job being executed. In addition, the I/O stack of each job has to be determined manually, one job at a time, which consumes considerable human resources, leads to unreasonable use of I/O resources, and lowers the I/O performance of the job.

Disclosure of Invention

In order to solve the technical problems, the present disclosure provides an I/O stack construction method, apparatus, device and medium based on a distributed system.

The embodiment of the disclosure provides an I/O stack construction method based on a distributed system, which comprises the following steps:

acquiring job description information of a job to be run, and determining I/O type data of the job to be run according to the job description information; wherein the job description information includes at least one of job name information, job path information, and job submission information, and the job name information includes at least one of software name information, software version information, and operation example information;

and determining, according to the I/O type data, a target I/O stack for processing the job to be run.

The embodiment of the disclosure also provides an I/O stack construction device based on the distributed system, which comprises:

a prediction module, configured to acquire job description information of a job to be run and determine I/O type data of the job to be run according to the job description information; wherein the job description information includes at least one of job name information, job path information, and job submission information, and the job name information includes at least one of software name information, software version information, and operation example information;

and a determining module, configured to determine, according to the I/O type data, a target I/O stack for processing the job to be run.

The embodiment of the disclosure also provides an electronic device, which comprises: a processor; a memory for storing the processor-executable instructions; the processor is configured to read the executable instructions from the memory and execute the instructions to implement the distributed system-based I/O stack construction method according to the embodiments of the present disclosure.

The embodiments of the present disclosure also provide a computer readable storage medium storing a computer program for executing the distributed system-based I/O stack construction method as provided by the embodiments of the present disclosure.

Compared with the prior art, the technical solution provided by the embodiments of the disclosure has the following advantages. The I/O stack construction method based on a distributed system acquires job description information of a job to be run and determines I/O type data of the job according to the job description information, wherein the job description information includes at least one of job name information, job path information, and job submission information, and the job name information includes at least one of software name information, software version information, and operation example information; it then determines, according to the I/O type data, a target I/O stack for processing the job to be run. With this technical solution, the characteristics of the job are determined from one or more dimensions, such as the user's naming habits, the user's storage path habits, and the circumstances of the job submission, which improves the comprehensiveness and accuracy of the job attributes. The I/O type corresponding to these characteristics is determined from the job description information, and the I/O stack used to process the job at run time is then determined from the I/O type.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure.

In order to more clearly illustrate the embodiments of the present disclosure or the solutions in the prior art, the drawings that are required for the description of the embodiments or the prior art will be briefly described below, and it will be obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.

FIG. 1 is a schematic flow chart of a method for constructing an I/O stack based on a distributed system according to an embodiment of the disclosure;

FIG. 2 is a flow chart of another method for constructing an I/O stack based on a distributed system according to an embodiment of the disclosure;

FIG. 3 is a schematic structural diagram of an I/O stack building device based on a distributed system according to an embodiment of the disclosure;

FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the disclosure.

Detailed Description

In order that the above objects, features and advantages of the present disclosure may be more clearly understood, a further description of aspects of the present disclosure will be provided below. It should be noted that, without conflict, the embodiments of the present disclosure and features in the embodiments may be combined with each other.

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure, but the present disclosure may be practiced otherwise than as described herein; it will be apparent that the embodiments in the specification are only some, but not all, embodiments of the disclosure.

With the development of computer technology, the computing power of supercomputers has improved rapidly, whereas their I/O capability has improved more slowly, so the lower I/O capability has difficulty keeping up with the higher computing power.

If the computational power matches the I/O capabilities poorly, I/O contention may occur, which may lead to I/O blocking, thereby degrading I/O performance. And in some application scenarios, bursty I/O may occur, which may lead to insufficient I/O capability, thereby causing I/O blocking and also degrading I/O performance. In addition, when the supercomputer performs deep learning related calculation, a large number of small files occupying smaller storage space are generated, and corresponding metadata operation is performed on each small file, so that the burden of a metadata server is increased, and the performance of a file system is reduced.

In the related art, a user can set a corresponding I/O stack according to a job executed by a supercomputer, but the setting of the I/O stack is limited by the knowledge level of the user, access authority and other aspects, so that the suitability of the finally determined I/O stack and the currently executed job is possibly low, the I/O stack of each job needs to be determined manually in sequence, and more manpower resources are consumed.

In order to solve the above-mentioned problems, the embodiments of the present disclosure provide a method for constructing an I/O stack based on a distributed system, and the method is described below with reference to specific embodiments.

Fig. 1 is a schematic flow chart of a method for constructing an I/O stack based on a distributed system according to an embodiment of the present disclosure, where the method may be performed by an I/O stack constructing device based on a distributed system, where the device may be implemented by software and/or hardware, and may generally be integrated in an electronic device, and in some embodiments, the electronic device may be a monitoring operation platform for performing operation maintenance on a supercomputer. As shown in fig. 1, the method includes:

Step 101, acquiring job description information of a job to be run, and determining I/O type data of the job to be run according to the job description information; wherein the job description information includes at least one of job name information, job path information, and job submission information, and the job name information includes at least one of software name information, software version information, and operation example information.

A job may be a computational task run by a supercomputer and can be understood as a computation on given data. The job to be run may be a task waiting to be run by the supercomputer. The I/O type data may be the I/O type, determined by prediction, to which the I/O stack for the job is adapted. This embodiment does not limit the I/O type data; for example, it may include at least one of read-write type data, metadata type data, occupied resource type data, and file type data.

The job description information may be information characterizing the nature of the job itself; it can be understood as job attribute information and is also called job meta information. The job name information may be information characterizing the job file name. When naming a job, a user may use one or more of the simulation software name, the simulation software version, and the example that the job runs. Thus, the job name information may include one or more of software name information, software version information, and operation example information.

The software name information may be information characterizing the name of the simulation software to which the job corresponds. The software version information may be information characterizing the version of that simulation software. The operation example information may be information characterizing the calculation example to which the job to be run corresponds, and may indicate the engineering application type of the job. For example, the operation example information may indicate that the job is a fluid calculation.

The job path information may be information characterizing the storage path of the job. When storing jobs, a user may place jobs of similar type and with similar examples under the same file path, so the job path information can reflect the simulation software, the example, and other information related to the job.

The job submission information may be information recording how the job to be run was submitted. The job submission information may include one or more of submitting user information, submitting user group information, and job submission time information. The submitting user information may be the identity (ID) code of the user who submitted the job to be run. The submitting user group information may be the code of the user group to which that user belongs. The job submission time information may be information recording the time at which the job to be run was submitted to the distributed system. Because a user is likely to submit jobs of the same type within the same period of time, the job submission time information can help determine the I/O type data.

Optionally, the job description information may further include hardware resource information, which may be information recording the hardware resources that the job to be run is estimated to require. The hardware resource information may include central processing unit (Central Processing Unit, CPU) number information and/or node number information.

Optionally, the job description information may further include user description information, which may be information recording attributes of the user who submitted the job. The user description information may include one or more of the user's research field information, information on the organization to which the user belongs, and information on the region to which the user belongs.
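The fields described above can be grouped into a single record. The following is a minimal sketch only: the class names, field names, and the example values are illustrative assumptions, not structures defined by the disclosure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobName:
    # Every field is optional: the text only requires "at least one of" them.
    software_name: Optional[str] = None
    software_version: Optional[str] = None
    operation_example: Optional[str] = None


@dataclass
class JobSubmission:
    user_id: Optional[str] = None
    user_group: Optional[str] = None
    submit_time: Optional[str] = None  # timestamp format is an assumption


@dataclass
class JobDescription:
    name: Optional[JobName] = None
    path: Optional[str] = None
    submission: Optional[JobSubmission] = None
    cpu_count: Optional[int] = None        # optional hardware resource information
    node_count: Optional[int] = None       # optional hardware resource information
    research_field: Optional[str] = None   # optional user description information

    def is_valid(self) -> bool:
        """True when at least one of name, path, or submission is present."""
        return any(v is not None for v in (self.name, self.path, self.submission))


# Hypothetical job: all values are made up for illustration.
job = JobDescription(
    name=JobName(software_name="example_solver", software_version="1.0"),
    path="/home/user_a/cfd/case_01",
)
```

Keeping every field optional mirrors the "at least one of" wording; `is_valid` is a hypothetical helper that enforces that minimum.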

In some embodiments of the present disclosure, the I/O type data can be determined in various ways, which this embodiment does not limit. Examples are as follows:

In an alternative embodiment, determining the I/O type data of the job to be run according to the job description information includes: inputting the job description information into a pre-trained I/O feature classification model to obtain the I/O type data.

The I/O feature classification model may be a model, obtained by training an initial model, that is capable of predicting the I/O type corresponding to a job; it may be a neural network model.

In this embodiment, the pre-trained I/O feature classification model is capable of automatically generating corresponding I/O type data according to the input job description information.

Optionally, the training process of the I/O feature classification model includes: acquiring index information of a sample job, and determining sample I/O type data of the sample job according to the index information and an index threshold of the index information; and training an initial model according to sample description information of the sample job and the corresponding sample I/O type data to obtain the I/O feature classification model; wherein the sample description information includes at least one of sample name information and sample path information, and the sample name information includes at least one of sample software information and sample calculation power information.

The sample job may be a job that has finished running, and the index information may be parameters characterizing the running of the sample job from multiple dimensions, including but not limited to one or more of a file path dimension, a job submission dimension, a hardware resource dimension, and a file name dimension. The hardware resource dimension may include one or more of a read-write dimension, an occupied resource dimension, and a metadata dimension. For example, the index information includes, but is not limited to, one or more of: read file number information, write file number information, read throughput information, write throughput information, read count information, write count information, metadata operation count information, file size information, computing node number information, CPU utilization information, memory utilization information, network receive volume information, network send volume information, and the proportion of I/O time to total running time.

The initial model may be a neural network model to be trained, and the specific type of the initial model is not limited in this embodiment, for example, the initial model may be a Long Short-Term Memory (LSTM) model. The sample description information may be information characterizing the nature of the sample job itself, also known as sample meta information, and the classification of the sample description information may be the same as the classification of the job description information.

In this embodiment, the I/O stack construction device based on the distributed system may acquire the index information of the sample job that has completed running, compare the index information with the index threshold, and determine the sample I/O type data of the sample job according to the comparison result. Further, sample description information of the sample operation is used as input data of an initial model, sample I/O type data corresponding to the sample description information is used as output data of the initial model, and training is carried out on the initial model to obtain an I/O characteristic classification model.
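The threshold comparison that produces the sample I/O type data can be sketched as follows. This is an illustrative sketch only: the index names, threshold values, and label strings are assumptions, since the disclosure leaves the concrete thresholds to the operator.

```python
from typing import Dict, Set

# Hypothetical index thresholds (assumed values, not taken from the disclosure).
INDEX_THRESHOLDS: Dict[str, float] = {
    "read_write_count": 100_000,
    "metadata_op_count": 50_000,
    "memory_utilization": 0.75,
}

# Hypothetical label assigned when an index exceeds its threshold.
LABEL_WHEN_ABOVE: Dict[str, str] = {
    "read_write_count": "high_read_write_frequency",
    "metadata_op_count": "high_metadata_processing",
    "memory_utilization": "high_memory_usage",
}


def label_sample(index_info: Dict[str, float]) -> Set[str]:
    """Compare each recorded index with its threshold and collect the labels."""
    labels: Set[str] = set()
    for name, threshold in INDEX_THRESHOLDS.items():
        value = index_info.get(name)
        # Indexes that were not recorded for this sample job are skipped.
        if value is not None and value > threshold:
            labels.add(LABEL_WHEN_ABOVE[name])
    return labels
```

The resulting label set would serve as the training target, paired with the sample description information as the model input.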

In another alternative embodiment, determining the I/O type data of the job to be run according to the job description information includes: querying a preset description type relationship for the job description information, and determining the I/O type data that successfully matches the job description information.

The description type relationship may record each description information and the corresponding I/O type data thereof.

In this embodiment, I/O type data corresponding to different description information is recorded in advance in the description type relationship. The I/O stack construction device based on the distributed system matches the job description information in the description type relation, and takes the I/O type data corresponding to the description information consistent with the job description information as the I/O type data corresponding to the job description information.
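A description type relationship of this kind can be sketched as a simple lookup table. The key layout (software name plus job path), the entries, and the label strings below are assumptions made for illustration; the disclosure does not specify how the relationship is stored.

```python
from typing import Dict, FrozenSet, Optional, Tuple

# Assumed key: (software name, job path). Entries are made up for illustration.
DescriptionKey = Tuple[str, str]

DESCRIPTION_TYPE_RELATION: Dict[DescriptionKey, FrozenSet[str]] = {
    ("example_solver", "/home/user_a/cfd"): frozenset({"high_read_write_frequency"}),
    ("example_trainer", "/home/user_b/dl"): frozenset({"many_small_files"}),
}


def lookup_io_types(software_name: str, job_path: str) -> Optional[FrozenSet[str]]:
    """Return the I/O type data recorded for an identical description, if any."""
    return DESCRIPTION_TYPE_RELATION.get((software_name, job_path))
```

A `None` result means no recorded description is consistent with the job description information; how to handle that case is not covered by the text above.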

Step 102, determining a target I/O stack for processing the job to be run according to the I/O type data.

The target I/O stack may be understood as the I/O read flow involved in running the job to be run. The target I/O stack may include a data read flow based on one or more of a metadata management service (Metadata Service, MDS), a metadata target (Metadata Target, MDT), an object storage server (Object Storage Server, OSS), and an object storage target (Object Storage Target, OST). Based on the target I/O stack, operations such as storing temporary files and storing result files can be performed while the job to be run is processed.

In this embodiment, after the I/O type data is determined, parameter setting may be performed on the corresponding I/O stack setting policy according to the I/O type data, so as to obtain the target I/O stack.

In some embodiments of the present disclosure, if the I/O type data includes read-write type data and occupied resource type data, determining a target I/O stack for processing a job to be run according to the I/O type data includes:

If the read-write type data includes a first read-write frequency type and the occupied resource type data includes a first memory occupation type, a hard-disk-type burst buffer is configured for a preset initial I/O stack to obtain the target I/O stack. The first read-write frequency type indicates that the predicted read-write frequency of the job to be run is greater than a preset read-write frequency threshold, and the first memory occupation type indicates that the predicted memory utilization of the job to be run is greater than a preset memory utilization threshold.

The read-write type data may be data characterizing the read-write behavior of the job while it runs. This embodiment does not limit the read-write type data; for example, it may be data characterizing the read-write frequency. The predicted read-write frequency may be the predicted frequency of read operations and/or write operations while the job runs. The preset read-write frequency threshold may be set according to user requirements and the like, which this embodiment does not limit.

The occupied resource type data may be data characterizing the hardware resources occupied while the job runs. The predicted memory utilization may be the predicted proportion of total memory used while the job runs. The preset memory utilization threshold may be set according to user requirements and the like, which this embodiment does not limit; for example, it may be set to 50% or 75%. The initial I/O stack may be the originally configured I/O stack, which can be understood as the I/O stack provided for a job by default.

In this embodiment, the first read-write frequency type indicates that read and/or write operations are predicted to be frequent while the job runs. To adapt the I/O stack to this higher frequency, a corresponding burst buffer (Burst Buffer) may be configured; this embodiment does not limit the size of the burst buffer. The first memory occupation type indicates that the job is predicted to use a high proportion of memory while running. To adapt the I/O stack to this, memory is not set aside as the burst buffer; instead, a burst buffer based on a solid-state disk (Solid State Disk, SSD) is configured, yielding the target I/O stack.

If the read-write type data includes the first read-write frequency type and the occupied resource type data includes a second memory occupation type, a memory-type burst buffer is configured for the initial I/O stack to obtain the target I/O stack. The second memory occupation type indicates that the predicted memory utilization of the job to be run is less than or equal to the preset memory utilization threshold.

The memory-type burst buffer may be a burst buffer that performs read operations through memory and write operations through the hard disk.

In this embodiment, read and/or write operations are predicted to be frequent while the job runs. To adapt the I/O stack to this higher frequency, a corresponding burst buffer (Burst Buffer) may be configured, and part of the data generated while the job to be run is running is stored in the burst buffer; this embodiment does not limit the size of the burst buffer. The second memory occupation type indicates that the job is predicted to use a low proportion of memory while running. To make effective use of idle memory resources, a memory-type burst buffer is configured, and part of the data generated while the job to be run is running is stored in it.

In this scheme, a burst buffer is configured when the read-write frequency is high, which improves read-write efficiency. When memory utilization is high, the burst buffer is implemented on the hard disk, which avoids increasing memory utilization further. When memory utilization is low, the burst buffer is implemented in memory, which makes effective use of memory resources and further improves read-write efficiency through the higher read-write speed of memory.
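The two burst buffer cases above reduce to a small decision rule. The sketch below assumes a hypothetical `IOStack` record and numeric predictions; leaving the stack unchanged when the read-write frequency does not exceed its threshold is also an assumption, since the text only describes the high-frequency cases.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class IOStack:
    # Hypothetical representation: None means no burst buffer is configured.
    burst_buffer: Optional[str] = None  # None, "ssd", or "memory"


def configure_burst_buffer(
    initial: IOStack,
    predicted_rw_frequency: float,
    rw_frequency_threshold: float,
    predicted_memory_utilization: float,
    memory_utilization_threshold: float,
) -> IOStack:
    """Pick the burst buffer type from the predicted frequency and memory use."""
    if predicted_rw_frequency <= rw_frequency_threshold:
        # Not the first read-write frequency type: keep the initial stack (assumed).
        return initial
    if predicted_memory_utilization > memory_utilization_threshold:
        # First memory occupation type: memory is busy, so use a disk-based buffer.
        return replace(initial, burst_buffer="ssd")
    # Second memory occupation type: idle memory can serve as the buffer.
    return replace(initial, burst_buffer="memory")
```

For example, with a memory utilization threshold of 0.75, a job predicted to use 90% of memory would receive the SSD-based buffer, while one predicted to use 40% would receive the memory-based buffer.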

In some embodiments of the present disclosure, if the I/O type data includes metadata type data, determining a target I/O stack for processing a job to be run according to the I/O type data includes:

If the metadata type data includes a first processing count type, the metadata management resources in a preset initial I/O stack are striped to obtain the target I/O stack. The first processing count type indicates that the predicted number of metadata processing operations of the job to be run is greater than a preset metadata processing count threshold.

Metadata may be data about the organization of data, data fields, and their relationships; it can be understood as data describing data. The metadata type data may be data characterizing the metadata of the job. The predicted metadata processing count may be the total number of operations on metadata, such as opening, closing, and modifying, predicted by the neural network for the running of the job. The preset metadata processing count threshold may be set according to user requirements and the like, which this embodiment does not limit.

In this embodiment, the first processing count type indicates that the job is predicted to perform many metadata operations while running. In the initial I/O stack, all of these operations are handled by the same metadata management resource, which overloads that resource and degrades I/O performance. Striping the metadata management resources in the initial I/O stack yields the target I/O stack, in which multiple metadata management resources can process the metadata of the job, reducing the load on any single metadata management resource and improving I/O performance.
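The metadata striping decision can be sketched as choosing how many metadata management resources serve the job. The function name is hypothetical, and striping across all available resources is an assumption; the text does not state how many resources the stripe should span.

```python
def metadata_stripe_count(
    predicted_metadata_ops: int,
    metadata_ops_threshold: int,
    available_resources: int,
) -> int:
    """Return how many metadata management resources should serve the job."""
    if available_resources < 1:
        raise ValueError("at least one metadata management resource is required")
    if predicted_metadata_ops > metadata_ops_threshold:
        # First processing count type: spread the load (assumed: use all resources).
        return available_resources
    # Otherwise keep the initial layout with a single resource.
    return 1
```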

In some embodiments of the present disclosure, if the I/O type data includes file type data, determining a target I/O stack for processing the job to be run according to the I/O type data includes:

If the file type data includes a first number type, a metadata storage space for storing files of a second space type is set aside in a preset initial I/O stack to obtain the target I/O stack. The first number type indicates that the predicted number of files of the second space type for the job to be run is greater than a preset file number threshold, and the second space type indicates that the predicted storage space occupied by a file is less than or equal to a preset second space threshold.

The file type data may be data characterizing the files generated by running a job. This embodiment does not limit the file type data; for example, it may include data characterizing the storage space occupied by files and/or data characterizing the number of files. The file number threshold may be the criterion for judging whether there are too many files; it is set according to user requirements and the like, which this embodiment does not limit, and may for example be 10000 or 100000. The preset second space threshold may be the criterion for judging whether the storage space occupied by a file is small; it is set according to user requirements and the like, which this embodiment does not limit, and may for example be 2 KB (kilobytes). The metadata storage space may be space that would otherwise be used to store metadata, also known as a metadata target (Metadata Target, MDT).

In this embodiment, the second space type indicates that a file generated by running the job is predicted to occupy little storage space, that is, small files are predicted to exist. The first number type indicates that the predicted number of files of the second space type is large, that is, many small files are predicted to exist. When there are many small files, to improve I/O performance, a metadata storage space for storing the small files is set aside in the initial I/O stack through the DoM (Data on MDT) function of the Lustre MDT to obtain the target I/O stack, and part of the data generated while the job to be run is running is transferred to the metadata storage space. In the target I/O stack, small files can be stored in the metadata storage space, so that small-file metadata is processed faster during small-file operations, which in turn improves small-file processing efficiency and I/O performance.

If the file type data includes a first space type, the computation data storage space in the initial I/O stack is striped to obtain the target I/O stack. The first space type indicates that the predicted storage space occupied by a file is greater than a preset first space threshold.

The preset first space threshold may be the criterion for judging whether the storage space occupied by a file is too large; it may be set according to user requirements and the like, which this embodiment does not limit. The computation data storage space may be space for storing data generated by computation, also referred to as a volume space or an object storage target (OST).

In this embodiment, the first space type indicates that a file generated by running the job is predicted to occupy a large amount of storage space, that is, large files are predicted to exist. In the initial I/O stack, a single computation data storage space stores the files generated by a job, and because a single file is large, that storage space may run out of room. Striping the computation data storage space in the initial I/O stack yields the target I/O stack, in which a large file can be spread across multiple computation data storage spaces so that no single one runs out of space.

It should be noted that the methods for determining the target I/O stack described above may be used in combination: the corresponding policies are determined according to the specific data included in the I/O type data, and the final target I/O stack is determined according to those policies.

In the scheme, when the number of small files is large, the small files are placed in the MDT, so that the I/O processing efficiency of the small files is improved; when a large file exists, the OST is striped, so that the condition of insufficient space of a single volume is avoided, and the I/O processing can be normally performed.
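The file-type policy can be sketched as two independent checks. In this sketch the `FilePolicy` record and function name are hypothetical, and passing in a list of predicted file sizes is an assumption made for illustration (the disclosure predicts type labels rather than individual sizes). The defaults of 10000 files and 2 KB follow the example values in the text, with 2 KB taken as 2048 bytes; the large-file threshold has no example value in the text and must be supplied.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class FilePolicy:
    small_files_on_mdt: bool = False  # place small files in metadata storage space
    stripe_ost: bool = False          # stripe the computation data storage space


def plan_file_policy(
    predicted_file_sizes: Sequence[int],
    large_file_threshold: int,
    file_count_threshold: int = 10_000,
    small_file_threshold: int = 2 * 1024,
) -> FilePolicy:
    """Decide the small-file and large-file policies from predicted sizes in bytes."""
    # Second space type: files no larger than the small-file threshold.
    small_count = sum(1 for size in predicted_file_sizes if size <= small_file_threshold)
    # First space type: at least one file larger than the large-file threshold.
    has_large = any(size > large_file_threshold for size in predicted_file_sizes)
    return FilePolicy(
        small_files_on_mdt=small_count > file_count_threshold,
        stripe_ost=has_large,
    )
```

Because the two checks are independent, both policies can apply to the same job, which matches the note that the methods may be used in combination.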

The I/O stack construction method based on a distributed system provided by the embodiments of the disclosure acquires job description information of a job to be run and determines I/O type data of the job according to the job description information, wherein the job description information includes at least one of job name information, job path information, and job submission information, and the job name information includes at least one of software name information, software version information, and operation example information; it then determines, according to the I/O type data, a target I/O stack for processing the job to be run. With this technical solution, the characteristics of the job are determined from one or more dimensions, such as the user's naming habits, the user's storage path habits, and the circumstances of the job submission, which improves the comprehensiveness and accuracy of the job attributes. The I/O type corresponding to these characteristics is determined from the job description information, and the I/O stack used to process the job at run time is then determined from the I/O type.

Fig. 2 is a flow chart of another I/O stack construction method based on a distributed system according to an embodiment of the present disclosure. As shown in fig. 2, in some embodiments of the present disclosure, the I/O stack construction method based on the distributed system further includes:

Step 201, obtaining position information of a plurality of computing nodes for computing a job to be run.

The computing nodes may be the nodes in the supercomputer that perform computation. The location information may be information characterizing the specific physical location of a computing node.

In this embodiment, before the job to be executed is executed, the I/O stack building device based on the distributed system may further obtain a plurality of computing nodes in the supercomputer for computing the job to be executed, and determine location information corresponding to each computing node.

Step 202, determining the minimum node distance between a plurality of calculation nodes according to the position information.

The minimum node distance may be the smallest of the distances between each pair of computing nodes.

In this embodiment, the I/O stack building apparatus based on the distributed system may calculate the distance between each pair of computing nodes according to the position information to obtain a plurality of node distances, and determine the minimum value among the plurality of node distances as the minimum node distance.

Step 203, if the minimum node distance is smaller than or equal to the preset node distance, adjusting the computing nodes to obtain an adjustment node, taking the adjustment node as a new computing node, and returning to determine a new minimum node distance, until the new minimum node distance is larger than the preset node distance.

The preset node distance may be a preset lower bound on the node distance. If the distance between two nodes is smaller than or equal to the preset node distance, the two nodes are too close to each other, which may cause bursty I/O. The adjustment node may be a newly selected computing node.

In this embodiment, after determining the minimum node distance, the I/O stack building device based on the distributed system compares the minimum node distance with the preset node distance. If the minimum node distance is less than or equal to the preset node distance, at least two of the current computing nodes are too close to each other. In that case, an adjustment node is selected by adjusting the route, the adjustment node is taken as a new computing node, and the process returns to acquiring the position information of the new computing nodes and calculating the corresponding new minimum node distance. This repeats until the new minimum node distance is greater than the preset node distance, which indicates that the distances between all of the current computing nodes are greater than the preset node distance.

In this scheme, computing nodes whose minimum node distance is larger than the preset node distance are selected, which avoids bursty I/O caused by computing nodes being too close to each other.
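Steps 201 to 203 can be sketched as the loop below. This is only an illustration under stated assumptions: positions are treated as hypothetical coordinates with Euclidean distance, and "adjusting" is modeled as swapping one node of the closest pair for a node from a hypothetical spare list. The disclosure does not specify the distance metric or how the adjustment node is chosen.

```python
from itertools import combinations
from math import dist


def closest_pair(nodes, positions):
    """Return the pair of nodes with the smallest distance between them."""
    return min(
        combinations(nodes, 2),
        key=lambda pair: dist(positions[pair[0]], positions[pair[1]]),
    )


def min_node_distance(nodes, positions):
    """Step 202: the smallest pairwise distance among the given nodes."""
    a, b = closest_pair(nodes, positions)
    return dist(positions[a], positions[b])


def adjust_nodes(nodes, spare_nodes, positions, preset_distance):
    """Step 203: replace nodes until every pair is farther apart than preset_distance.

    Assumption: one node of the closest pair is swapped for the next spare node.
    """
    nodes, spares = list(nodes), list(spare_nodes)
    while min_node_distance(nodes, positions) <= preset_distance:
        if not spares:
            raise RuntimeError("no spare node left for adjustment")
        _, too_close = closest_pair(nodes, positions)
        nodes[nodes.index(too_close)] = spares.pop(0)
    return nodes
```

The loop terminates because each iteration either returns or consumes one spare node; with an exhausted spare list it raises instead of looping forever.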

In some embodiments of the present disclosure, the distributed system-based I/O stack construction method further includes: for each result file of the job to be run, determining a target storage server among a plurality of preset storage servers according to storage load information of the preset storage servers, and storing the result file to the target storage server.

The preset storage server may be a server for storing data, and may be understood as a storage node. The target storage server may be the server determined to store the result file. The result file may be a file that records the calculation result of the job to be run.

In this embodiment, the I/O stack constructing device based on the distributed system may transfer each result file from the storage device in the target I/O stack to a storage server, and release the corresponding storage resources of the storage devices in the target I/O stack. Specifically, for each result file, the device may obtain the load rate of each preset storage server at the current moment, determine the preset storage server with the smallest load rate as the target storage server, and store the result file to the target storage server. This is repeated until every result file has been stored to its corresponding storage server, so that all portions of the storage devices in the target I/O stack that were used to store result files are released.

In this scheme, the storage server is determined on a per-file basis. Compared with the prior art, in which all files of the same job are transferred to the same storage server, this achieves finer-grained file transfer, balances the load across storage servers, and avoids transfer failures caused by excessive load on a single storage server.
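The per-file placement described above can be sketched as follows. The load model is an assumption made for illustration: each server has a hypothetical capacity, and placing a file raises that server's load rate by `size / capacity`, standing in for re-reading the real load rate before the next file is placed. None of the names come from the disclosure.

```python
def place_result_files(file_sizes, server_loads, capacity):
    """Assign each result file to the server with the smallest current load rate.

    file_sizes:   {file name: size}
    server_loads: {server name: load rate between 0 and 1}
    capacity:     {server name: capacity in the same unit as the sizes}
    """
    loads = dict(server_loads)  # work on a copy; the caller's view is untouched
    placement = {}
    for name, size in file_sizes.items():
        target = min(loads, key=loads.get)  # least-loaded server right now
        placement[name] = target
        loads[target] += size / capacity[target]  # assumed effect of the transfer
    return placement, loads
```

Because the choice is made again for every file, a large first file steers the following files toward other servers instead of piling onto the same one.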

The method for constructing an I/O stack based on a distributed system in the embodiment of the present disclosure will be further described by way of a specific example.

In this embodiment, index information of sample jobs is collected along the path from the computing nodes to the storage nodes of a high performance computing (HPC) system, and an I/O feature classification model is obtained by combining the index information with sample description information of the sample jobs. The I/O type data of a job to be run can be predicted by the I/O feature classification model; a matching policy is determined in a preset knowledge base according to the I/O type data, preprocessing is performed according to the policy, and a reasonable target I/O stack is determined. After the job to be run finishes, the storage resources corresponding to the target I/O stack need to be released: a target storage server is determined among the preset storage servers according to their load information, the result files are transferred to the target storage server, and the storage resources that held the result files in the storage devices of the target I/O stack are released.

In this embodiment, the I/O stack construction device based on the distributed system may collect index information of a sample job, where the index information may include one or more of: number of files read, number of files written, read throughput, write throughput, number of reads, number of writes, number of metadata operations, file size, number of nodes, CPU utilization, memory utilization, network receive volume, network transmit volume, and the ratio of I/O time to total run time. The device may also acquire sample description information of the sample job, also referred to as sample job meta information, which may include sample name information, sample path information, and the like.

Sample I/O type data corresponding to each sample job is determined according to the index information of that sample job. For example, the sample I/O type data may include a first read-write frequency type or a second read-write frequency type, a first memory occupation type or a second memory occupation type, a first processing frequency type or a second processing frequency type, and the like. An initial model is then trained with the sample I/O type data as labels to obtain the I/O feature classification model.
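The labeling step can be sketched as a comparison of each index against a threshold. The threshold values and field names below are hypothetical, and adding the read and write counts to obtain a read-write frequency is an assumption; the disclosure only states that a "first" type means the value exceeds a preset threshold.

```python
# Hypothetical index thresholds; the source does not give concrete values.
INDEX_THRESHOLDS = {
    "read_write_count": 1_000_000,
    "memory_utilization": 0.5,
    "metadata_op_count": 100_000,
}


def label_sample(index_info, thresholds=INDEX_THRESHOLDS):
    """Derive sample I/O type data (training labels) from one job's index info."""
    labels = set()

    # Assumption: read-write frequency is the sum of read and write counts.
    rw_count = index_info["read_count"] + index_info["write_count"]
    if rw_count > thresholds["read_write_count"]:
        labels.add("first_read_write_frequency_type")
    else:
        labels.add("second_read_write_frequency_type")

    if index_info["memory_utilization"] > thresholds["memory_utilization"]:
        labels.add("first_memory_occupation_type")
    else:
        labels.add("second_memory_occupation_type")

    if index_info["metadata_op_count"] > thresholds["metadata_op_count"]:
        labels.add("first_processing_frequency_type")
    else:
        labels.add("second_processing_frequency_type")

    return labels
```

The resulting label sets, paired with each sample's description information, would form the training data for whatever classifier is chosen as the initial model.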

Further, a policy base is constructed, and the policy for generating the target I/O stack is determined according to the I/O type data corresponding to the job to be run. For example, the policies include, but are not limited to, the following.

The preprocessing-related policies include:

First, if the I/O type data includes a first processing frequency type, which characterizes an excessive number of metadata operations, the MDS may be overloaded; to solve this problem, striping is configured for the MDS.

Second, if the I/O type data includes a first number type, which characterizes a large number of small files, I/O performance may be poor; to solve this problem, the small files are stored on the MDT through the DoM (Data-on-MDT) function of Lustre.

Third, if the I/O type data includes a first space type, which characterizes the presence of a large file, the space of a single volume may be insufficient; to solve this problem, the volume space is striped.

The resource-allocation policies include:

If the I/O type data includes a first read-write frequency type, which characterizes high I/O throughput, the load on a single OSS or on several OSSs may become too high, causing file system response failures such as disk hardware faults; to solve this problem, a burst cache of the solid state disk type is configured.

If the I/O type data includes a second memory occupation type, which characterizes low memory utilization, a burst cache of the memory type is configured.

After determining the corresponding policy, a target I/O stack is created according to the policy.
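A policy base of this kind can be sketched as a table of rules. The identifiers and action labels below are hypothetical paraphrases of the policies listed above, and the sketch only collects the matching actions; it does not resolve cases where two matched policies would configure different burst caches.

```python
# Hypothetical policy base: (I/O type, stage, action). None of these
# identifiers come from the source; they paraphrase the listed policies.
POLICY_BASE = [
    ("first_processing_frequency_type", "preprocess", "stripe_mds"),
    ("first_number_type", "preprocess", "store_small_files_on_mdt"),
    ("first_space_type", "preprocess", "stripe_volume_space"),
    ("first_read_write_frequency_type", "allocate", "configure_ssd_burst_cache"),
    ("second_memory_occupation_type", "allocate", "configure_memory_burst_cache"),
]


def build_target_io_stack(io_types):
    """Collect, per stage, the actions whose I/O type appears in io_types."""
    stack = {"preprocess": [], "allocate": []}
    for io_type, stage, action in POLICY_BASE:
        if io_type in io_types:
            stack[stage].append(action)
    return stack
```

Keeping the rules in a plain table also makes it easy to show a user which I/O type led to which part of the resulting stack.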

Further, when submitting the job to be run, the user edits the processing path of the job in advance, and the I/O stack construction device based on the distributed system interacts with the user's I/O scheduler through Prolog and Epilog. Specifically, the target I/O stack may be set up by a preprocessing process (Prolog), which also submits the job to be run to the computing nodes; after the job has run, the storage resources in the target I/O stack are released by a post-processing process (Epilog). When the job finishes, the target I/O cache enters a release queue of the I/O scheduler, and the I/O scheduler checks the load information of the storage servers and releases the result files in the target I/O cache to storage servers with low load, until the result files have been completely released.
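The Prolog, job, Epilog ordering and the release queue can be sketched as below. All names are hypothetical, the cache is modeled as a plain list of file names, and running the Epilog in a `finally` block (so it runs even if the job fails) is a design assumption rather than something the disclosure states.

```python
from collections import deque

# Hypothetical stand-in for the I/O scheduler's release queue.
release_queue = deque()


def prolog(job_id, io_types):
    """Set up a (simulated) target I/O stack before the job starts."""
    return {"job": job_id, "configured_for": sorted(io_types), "cache": []}


def epilog(stack):
    """Hand the stack's cache over to the scheduler's release queue."""
    release_queue.append(stack)


def run_job(job_id, io_types, job_fn):
    """Run job_fn between Prolog and Epilog; Epilog runs even on failure."""
    stack = prolog(job_id, io_types)
    try:
        job_fn(stack)
    finally:
        epilog(stack)


def drain_release_queue(store_fn):
    """Release every queued cache by passing each result file to store_fn."""
    released = []
    while release_queue:
        stack = release_queue.popleft()
        while stack["cache"]:
            store_fn(stack["cache"].pop(0))
        released.append(stack["job"])
    return released
```

Here `store_fn` is where a scheduler would pick a low-load storage server for each file; it is left as a callback so the lifecycle stays separate from the placement decision.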

In this scheme, predicting the I/O type data of the job to be run allows a reasonable I/O stack to be allocated, which improves the I/O performance of the job and reduces its I/O time. A policy base is constructed, which makes the I/O stack determined from the I/O type data more reasonable and also lets the user understand how the I/O stack is determined. Preferentially storing result files to storage servers with low load alleviates I/O contention and the file system instability caused by high I/O load. Releasing the I/O cache saves system resources.

Fig. 3 is a schematic structural diagram of an I/O stack building apparatus based on a distributed system according to an embodiment of the disclosure. The apparatus 300 may be implemented by software and/or hardware, and may generally be integrated in an electronic device. As shown in fig. 3, the apparatus includes:

The prediction module 301 is configured to obtain job description information of a job to be run, and input the job description information into a pre-trained I/O feature classification model to obtain I/O type data of the job to be run; wherein the job description information includes: at least one of job name information, job path information, job submission information, the job name information including: at least one of software name information, software version information, and operation example information;

A determining module 302, configured to determine a target I/O stack for processing the job to be executed according to the I/O type data.

In an alternative embodiment, if the I/O type data includes read/write type data and occupied resource type data, the determining module 302 is configured to:

If the read-write type data comprises a first read-write frequency type and the occupied resource type data comprises a first memory occupied type, configuring a hard disk type burst cache for a preset initial I/O stack to obtain the target I/O stack; the first read-write frequency type represents that the predicted read-write frequency of the operation to be operated is larger than a preset read-write frequency threshold, and the first memory occupation type represents that the predicted memory utilization rate of the operation to be operated is larger than a preset memory utilization rate threshold;

If the read-write type data comprises a first read-write frequency type and the occupied resource type data comprises a second memory occupied type, configuring a burst cache of a memory type for the initial I/O stack to obtain the target I/O stack; the second memory occupation type represents that the predicted memory usage rate of the operation to be operated is smaller than or equal to the preset memory usage rate threshold.

In an alternative embodiment, if the I/O type data includes metadata type data, the determining module 302 is configured to:

if the metadata type data comprises a first processing frequency type, performing striping processing on metadata management resources in a preset initial I/O stack to obtain the target I/O stack; the first processing frequency type characterizes that the predicted metadata processing frequency of the job to be operated is larger than a preset metadata processing frequency threshold.

In an alternative embodiment, the I/O type data includes: file type data, the determining module 302 is configured to:

If the file type data comprises a first number type, dividing a metadata storage space for storing files of a second space type in a preset initial I/O stack to obtain the target I/O stack; the first number type represents that the predicted number of files of the second space type of the job to be operated is larger than a preset file number threshold, and the second space type represents that the predicted storage space occupied by a file is smaller than or equal to a preset second space threshold;

If the file type data comprises a first space type, carrying out striping processing on a calculation data storage space in the initial I/O stack to obtain the target I/O stack; the first space type represents that the predicted storage space occupied by a file is larger than a preset first space threshold.

In an alternative embodiment, the apparatus further comprises an adjustment module for:

Acquiring position information of a plurality of computing nodes for computing the job to be operated; determining a minimum node distance between the plurality of computing nodes according to the position information; and if the minimum node distance is smaller than or equal to the preset node distance, adjusting the calculation node to obtain an adjustment node, and returning the adjustment node as a new calculation node to determine a new minimum node distance until the new minimum node distance is larger than the preset node distance.

In an alternative embodiment, the apparatus further comprises:

a storage module, configured to determine, for each result file of the job to be operated, a target storage server among a plurality of preset storage servers according to the storage load rates of the preset storage servers, and to store the result file to the target storage server.

In an alternative embodiment, the apparatus further comprises a training module for:

acquiring index information of a sample operation, and determining sample I/O type data of the sample operation according to the index information and an index threshold value of the index information;

Training an initial model according to sample description information of the sample operation and sample I/O type data corresponding to the sample description information to obtain an I/O characteristic classification model; wherein the sample description information includes: at least one of sample name information and sample path information, the sample name information including: at least one of sample software information and sample calculation power information.

The I/O stack construction device based on the distributed system provided by the embodiment of the disclosure can execute the I/O stack construction method based on the distributed system provided by any embodiment of the disclosure, and has the corresponding functional modules and beneficial effects of the execution method.

Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the disclosure. As shown in fig. 4, electronic device 400 includes one or more processors 401 and memory 402.

The processor 401 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities and may control other components in the electronic device 400 to perform desired functions.

Memory 402 may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM) and/or cache memory (cache), and the like. The non-volatile memory may include, for example, Read-Only Memory (ROM), hard disk, flash memory, and the like. One or more computer program instructions may be stored on the computer readable storage medium and executed by the processor 401 to implement the distributed system based I/O stack construction method and/or other desired functions of the embodiments of the present disclosure described above. Various contents such as an input signal, a signal component, a noise component, and the like may also be stored in the computer-readable storage medium.

In one example, the electronic device 400 may further include: an input device 403 and an output device 404, which are interconnected by a bus system and/or other forms of connection mechanisms (not shown).

In addition, the input device 403 may also include, for example, a keyboard, a mouse, and the like.

The output device 404 may output various information to the outside, including the determined distance information, direction information, and the like. The output device 404 may include, for example, a display, speakers, a printer, and a communication network and remote output devices connected thereto, etc.

Of course, only some of the components of the electronic device 400 that are relevant to the present disclosure are shown in fig. 4, with components such as buses, input/output interfaces, etc. omitted for simplicity. In addition, electronic device 400 may include any other suitable components depending on the particular application.

In addition to the methods and apparatus described above, embodiments of the present disclosure may also be a computer program product comprising computer program instructions which, when executed by a processor, cause the processor to perform the distributed system-based I/O stack construction method provided by embodiments of the present disclosure.

The program code of the computer program product for performing the operations of embodiments of the present disclosure may be written in any combination of one or more programming languages, including object oriented programming languages such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.

Further, embodiments of the present disclosure may also be a computer-readable storage medium having stored thereon computer program instructions that, when executed by a processor, cause the processor to perform the distributed system-based I/O stack construction method provided by embodiments of the present disclosure.

The computer readable storage medium may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may include, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium would include the following: an electrical connection having one or more wires, a portable disk, a hard disk, Random Access Memory (RAM), Read-Only Memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

It should be noted that in this document, relational terms such as "first" and "second" and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

The foregoing is merely a specific embodiment of the disclosure to enable one skilled in the art to understand or practice the disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown and described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. An I/O stack construction method based on a distributed system, comprising:

determining a target I/O stack for processing the job to be operated according to the I/O type data;

The I/O type data comprises read-write type data and occupied resource type data, and the determining a target I/O stack for processing the job to be operated according to the I/O type data comprises the following steps: if the read-write type data comprises a first read-write frequency type and the occupied resource type data comprises a first memory occupied type, configuring a hard disk type burst cache for a preset initial I/O stack to obtain the target I/O stack; the first read-write frequency type represents that the predicted read-write frequency of the operation to be operated is larger than a preset read-write frequency threshold, and the first memory occupation type represents that the predicted memory utilization rate of the operation to be operated is larger than a preset memory utilization rate threshold; if the read-write type data comprises a first read-write frequency type and the occupied resource type data comprises a second memory occupied type, configuring a burst cache of a memory type for the initial I/O stack to obtain the target I/O stack; the second memory occupation type represents that the predicted memory utilization rate of the operation to be operated is smaller than or equal to the preset memory utilization rate threshold; or alternatively

The I/O type data comprises metadata type data, and the determining a target I/O stack for processing the job to be executed according to the I/O type data comprises the following steps: if the metadata type data comprises a first processing frequency type, performing striping processing on metadata management resources in a preset initial I/O stack to obtain the target I/O stack; wherein the first processing frequency type characterizes that the predicted metadata processing frequency of the job to be operated is larger than a preset metadata processing frequency threshold; or alternatively

The I/O type data comprises file type data, and the determining a target I/O stack for processing the job to be executed according to the I/O type data comprises the following steps: if the file type data comprises a first number type, dividing a metadata storage space for storing files of a second space type in a preset initial I/O stack to obtain the target I/O stack; the first number type represents that the predicted number of files of the second space type of the job to be operated is larger than a preset file number threshold, and the second space type represents that the predicted storage space occupied by a file is smaller than or equal to a preset second space threshold; if the file type data comprises a first space type, carrying out striping processing on a calculation data storage space in the initial I/O stack to obtain the target I/O stack; the first space type represents that the predicted storage space occupied by a file is larger than a preset first space threshold.

2. The method according to claim 1, wherein determining the I/O type data of the job to be executed according to the job description information includes:

Inputting the job description information into a pre-trained I/O feature classification model to obtain the I/O type data; or alternatively

Inquiring the job description information in a preset description type relation, and determining the I/O type data successfully matched with the job description information.

3. The method according to claim 1, wherein the method further comprises:

acquiring position information of a plurality of computing nodes for computing the job to be operated;

Determining a minimum node distance between the plurality of computing nodes according to the position information;

And if the minimum node distance is smaller than or equal to the preset node distance, adjusting the calculation node to obtain an adjustment node, and returning the adjustment node as a new calculation node to determine a new minimum node distance until the new minimum node distance is larger than the preset node distance.

4. The method according to claim 1, wherein the method further comprises:

And determining a target storage server in a plurality of preset storage servers according to a plurality of storage load rates of the plurality of preset storage servers for each result file of the job to be operated, and storing the result file to the target storage server.

5. An I/O stack building apparatus based on a distributed system, comprising:

a determining module, configured to determine a target I/O stack for processing the job to be executed according to the I/O type data;

the I/O type data comprises read-write type data and occupied resource type data, and the determining module is used for: if the read-write type data comprises a first read-write frequency type and the occupied resource type data comprises a first memory occupied type, configuring a hard disk type burst cache for a preset initial I/O stack to obtain the target I/O stack; the first read-write frequency type represents that the predicted read-write frequency of the operation to be operated is larger than a preset read-write frequency threshold, and the first memory occupation type represents that the predicted memory utilization rate of the operation to be operated is larger than a preset memory utilization rate threshold; if the read-write type data comprises a first read-write frequency type and the occupied resource type data comprises a second memory occupied type, configuring a burst cache of a memory type for the initial I/O stack to obtain the target I/O stack; the second memory occupation type represents that the predicted memory utilization rate of the operation to be operated is smaller than or equal to the preset memory utilization rate threshold; or alternatively

The I/O type data includes metadata type data, and the determining module is configured to: if the metadata type data comprises a first processing frequency type, performing striping processing on metadata management resources in a preset initial I/O stack to obtain the target I/O stack; wherein the first processing frequency type characterizes that the predicted metadata processing frequency of the job to be operated is larger than a preset metadata processing frequency threshold; or alternatively

The I/O type data includes file type data, and the determining module is used for: if the file type data comprises a first number type, dividing a metadata storage space for storing files of a second space type in a preset initial I/O stack to obtain the target I/O stack; the first number type represents that the predicted number of files of the second space type of the job to be operated is larger than a preset file number threshold, and the second space type represents that the predicted storage space occupied by a file is smaller than or equal to a preset second space threshold; if the file type data comprises a first space type, carrying out striping processing on a calculation data storage space in the initial I/O stack to obtain the target I/O stack; the first space type represents that the predicted storage space occupied by a file is larger than a preset first space threshold.

6. An electronic device, the electronic device comprising:

A processor and a memory;

the processor is adapted to perform the steps of the method according to any of claims 1 to 4 by invoking a program or instruction stored in the memory.

7. A computer readable storage medium storing a program or instructions for causing a computer to perform the steps of the method according to any one of claims 1 to 4.