CN117075800A - I/O-aware adaptive writing method for massive checkpoint data - Google Patents


Info

Publication number
CN117075800A
Authority
CN
China
Prior art keywords
writing
period
data
target process
state
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310865203.6A
Other languages
Chinese (zh)
Inventor
刘轶
贾婕
穆鹏宇
王锐
解晨浩
栾钟治
钱德沛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University filed Critical Beihang University
Priority to CN202310865203.6A (priority application)
Publication of CN117075800A
Legal status: Pending


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061 Improving I/O performance
    • G06F3/0613 Improving I/O performance in relation to throughput
    • G06F3/0614 Improving the reliability of storage systems
    • G06F3/0617 Improving the reliability of storage systems in relation to availability
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0629 Configuration or reconfiguration of storage systems
    • G06F3/0634 Configuration or reconfiguration of storage systems by changing the state or mode of one or more devices
    • G06F3/0653 Monitoring storage devices or systems
    • G06F3/0655 Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0673 Single storage device
    • G06F3/0674 Disk device
    • G06F3/0676 Magnetic disk device

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Retry When Errors Occur (AREA)

Abstract

The application provides an I/O-aware adaptive writing method for massive checkpoint data. An initial state for checkpoint data writing is set according to the number of processes and the attributes of the storage system; the initial state parameters include a write period. When a write period begins, the target processes corresponding to that period are selected from all processes. For each target process, feedback regulation is performed according to the real-time running state of the storage system to determine the amount of data the process writes during the period, and that amount of checkpoint data is then written to the shared storage system within the period. In this way, the write timing and write volume of each process's checkpoint data are adjusted adaptively, avoiding the impact on the storage system caused by the concentrated writing of massive checkpoint data in a large-scale high-performance computing system.

Description

I/O-aware adaptive writing method for massive checkpoint data
Technical Field
The application relates to the field of high-performance computing, and in particular to an I/O-aware adaptive writing method for massive checkpoint data.
Background
With advances in science and technology, high-performance computing (HPC) has become an important tool in the scientific and engineering fields. HPC uses clusters of many compute nodes and storage devices to process ultra-large-scale data with strong computing power, and is widely applied in complex computational fields such as life science, weather forecasting, and deep learning. The performance of HPC systems has increased steadily over the past decades, making them an important support for scientific and engineering innovation.
The performance of an HPC system is measured in floating-point operations per second (FLOPS). In the December 2022 TOP500 supercomputer list, the top-ranked Frontier and Fugaku have huge numbers of processor cores, 8,730,112 and 7,630,848 respectively; Frontier's peak performance reaches 1.102 EFlop/s, making it the first supercomputer to break the exascale (E-level) barrier. Sunway TaihuLight is the most powerful supercomputer in China, with 40,960 processors, 10,649,600 cores, and a peak system performance of 125.436 PFlop/s.
With the increase in performance, the scale of HPC systems has also grown rapidly, with the number of nodes and components rising sharply. An HPC system is not a simple stack of computing resources; nodes must communicate and synchronize with one another to support the execution of applications. However, these nodes and the system itself may suffer various failures, such as node downtime, software crashes, hard disk damage, or interconnect network interruptions, which can prevent programs from running properly. HPC systems therefore need a certain level of reliability: the system should remain stable, tolerate failures, or recover quickly from them. Reliability depends on the number of nodes and the quality of individual nodes. Under current manufacturing processes, the reliability of a single node is relatively stable, so as the scale of an HPC system grows, the whole system becomes less reliable and the probability of failure increases.
Mean Time Between Failures (MTBF) is an important indicator of the reliability of HPC systems. The higher the MTBF, the more reliable the system and the less prone it is to failure. Statistics show that the MTBF of a petascale (P-level) high-performance computing system is about 15 hours. Given the continuing rapid growth in system scale, the MTBF of exascale (E-level) HPC systems such as Frontier will be only a few hours or even less. This is very disadvantageous for applications on HPC systems: they are typically long-running and use large numbers of processes, and are therefore more exposed to software and hardware failures that interrupt program execution.
Checkpointing is a common fault-tolerance technique: the running state of a program is periodically saved so that, after a failure, execution can resume from the last checkpoint rather than from the beginning. However, as program scale and process counts grow, and with the introduction of heterogeneous architectures, the amount of data that must be saved for a checkpoint keeps increasing. Numerous processes generate massive amounts of checkpoint data, which not only lengthens checkpoint save time but also degrades system performance. If many processes simultaneously save large amounts of checkpoint data to a shared storage system, the I/O subsystem becomes severely overloaded, leading to network congestion and I/O bottlenecks. Checkpointing techniques therefore require further optimization, such as reducing the amount of checkpoint data and optimizing storage, to improve scalability and I/O bandwidth utilization.
For checkpoint software in massively parallel systems, reducing the impact of saving checkpoint files on the I/O subsystem is an important measure of its usability. At bottom, this is a matter of controlling the use of the system's shared I/O bandwidth. For checkpoint software using a coordinated, synchronous communication protocol, the coordination guarantee causes a large number of processes to write massive checkpoint data to disk at the same time, hitting the I/O subsystem hard. How to adaptively and dynamically determine the optimal write timing and write volume for checkpoint data, according to the hardware configuration and real-time load of different systems, thus becomes the key to solving the problem.
Disclosure of Invention
Therefore, the application aims to provide an I/O-aware adaptive writing method for massive checkpoint data, which adaptively adjusts the write timing and write volume of each process's checkpoint data, avoids the impact on the storage system caused by the concentrated writing of massive checkpoint data in a large-scale high-performance computing system, and improves storage-bandwidth utilization.
An embodiment of the application provides an I/O-aware adaptive writing method for massive checkpoint data, comprising the following steps:
setting an initial state for checkpoint data writing according to the number of processes and the attributes of the storage system, the initial state parameters including a write period;
when a write period begins, selecting the target processes corresponding to that period from all processes, the target processes differing between write periods;
for each target process, performing feedback regulation according to the real-time running state of the storage system to determine the amount of data the target process writes during the period;
and, during the write period, writing that amount of the target process's checkpoint data to the shared storage system, and starting the next write period when the current one ends.
In some embodiments of the I/O-aware adaptive writing method for massive checkpoint data, the initial state parameters further include a write probability;
when a write period begins, determining the target processes corresponding to that period from all processes comprises:
after entering a new write period, each process generating a random probability by a random method;
and selecting as target processes those processes whose random probability is smaller than the write probability in the initial state parameters.
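The selection step above can be sketched as follows; the function and parameter names are illustrative, not from the patent. Each process draws a uniform random number and is selected if it falls below the configured write probability:

```python
import random

def select_targets(process_ids, write_probability, rng=None):
    """Pick this period's target processes: each process draws a random
    number in [0, 1) and is selected if it falls below write_probability."""
    rng = rng or random.Random()
    return [pid for pid in process_ids if rng.random() < write_probability]
```

With a write probability near 1/k, roughly one process in k writes during any given period, so checkpoint I/O is spread over about k periods instead of arriving all at once.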
In some embodiments of the I/O-aware adaptive writing method for massive checkpoint data, when a write period begins, determining the target processes corresponding to that period from all processes further comprises:
judging whether there is any process that has not been selected in any of N consecutive periods, where N is a preset period threshold;
and if so, selecting each process that has not been selected in N consecutive periods as a target process, so that no process waits indefinitely.
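A hedged sketch of this fairness rule (the original clause appears garbled; the evident intent is starvation avoidance, and the counter bookkeeping below is an assumption):

```python
import random

def select_with_fairness(process_ids, write_probability, missed, n_threshold, rng):
    """Select targets at random, but force-select any process that has
    already missed n_threshold consecutive periods; `missed` maps each
    process id to its current run of unselected periods."""
    targets = []
    for pid in process_ids:
        if missed.get(pid, 0) >= n_threshold or rng.random() < write_probability:
            targets.append(pid)
            missed[pid] = 0          # selected: reset its streak
        else:
            missed[pid] = missed.get(pid, 0) + 1
    return targets
```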
In some embodiments of the I/O-aware adaptive writing method for massive checkpoint data, performing feedback regulation for a target process according to the real-time running state of the storage system, and determining its write volume for the write period, comprises:
for the target process, determining the write volume for the current period according to the target process's write state and the write volume of the previous period;
wherein the write state of the target process is determined, when the previous write period ends, from the write volume or write rate of that period, the write volume and write rate of the previous period reflecting the real-time running state of the storage system.
In some embodiments of the I/O-aware adaptive writing method for massive checkpoint data, the initial state parameters include a write window threshold;
determining the write state of the target process at the end of the previous write period, according to the write volume or write rate of that period, comprises:
when the target process was in the acceleration state during the previous period, if the write volume of that period is greater than or equal to the write window threshold, moving the target process into the steady state;
when the target process was in the acceleration state or the steady state during the previous period, if the write rate of that period is smaller than a dynamic rate threshold, moving the target process into the deceleration state;
when the target process was in the deceleration state during the previous period, if the write rate of that period is greater than or equal to the dynamic rate threshold, moving the target process into the steady state; if the write rate is still smaller than the dynamic rate threshold, moving the target process into the acceleration state, with the write volume reset to the initial volume in the initial state parameters;
wherein the dynamic rate threshold is determined dynamically from the average of the historical write rates before the previous write period.
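The transitions above can be sketched as a small state machine. This is one illustrative reading of the rules, assuming the rate check takes precedence in the acceleration state (the text does not fix the ordering):

```python
def next_state(state, last_volume, last_rate, window_threshold, rate_threshold):
    """One transition of the per-process write state machine.
    States: 'accelerating', 'steady', 'decelerating'."""
    if state == "accelerating":
        if last_rate < rate_threshold:       # storage slowing down: back off
            return "decelerating"
        if last_volume >= window_threshold:  # window reached: hold steady
            return "steady"
        return "accelerating"
    if state == "steady":
        return "decelerating" if last_rate < rate_threshold else "steady"
    # decelerating: recover to steady, or fall back to a fresh acceleration
    return "steady" if last_rate >= rate_threshold else "accelerating"
```

The shape resembles classic congestion control: probe upward while the storage system keeps up, and back off when the observed rate drops below the history-derived threshold.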
In some embodiments, the I/O-aware adaptive writing method for massive checkpoint data further comprises:
when the previous write period ends and the write state of the target process has been determined, updating the historical write-rate average with the write rate of that period.
In some embodiments of the I/O-aware adaptive writing method for massive checkpoint data, updating the historical write-rate average based on the write rate of the previous write period comprises:
computing the historical write-rate average from the historical write rate of each period and a per-period weight, where the later the period, the greater its weight.
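One common way to give later periods greater weight is an exponentially weighted moving average; the smoothing factor below is an assumption, since the text only requires that newer periods weigh more:

```python
def update_rate_average(history_avg, last_rate, alpha=0.5):
    """EWMA update: the most recent rate gets weight alpha, and each
    older period's influence decays geometrically."""
    if history_avg is None:   # first completed period: no history yet
        return last_rate
    return (1.0 - alpha) * history_avg + alpha * last_rate
```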
In some embodiments of the I/O-aware adaptive writing method for massive checkpoint data, determining the write volume for the current period according to the write state of the target process, the write volume of the previous period, and the data block size comprises:
when the write state of the target process is the acceleration state, setting the write volume for the current period to twice that of the previous period;
when the write state is the steady state, setting the write volume for the current period to that of the previous period plus a preset number of data blocks;
when the write state is a first deceleration, setting the write volume for the current period to half that of the previous period;
and when the write state is a second consecutive deceleration, setting the write volume for the current period to the initial write volume in the initial state parameters.
In some embodiments of the I/O-aware adaptive writing method for massive checkpoint data, when the write state of the target process is the deceleration state, the method further comprises:
adjusting the write window threshold to half the write volume of the previous period.
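Putting the per-state volume rules and the window-threshold adjustment together gives a sketch like the following; the names and the first-versus-repeated deceleration flag are illustrative, and applying the window shrink in both deceleration branches is an assumption:

```python
def next_volume(state, prev_volume, block_size, initial_volume,
                first_deceleration, window_threshold, extra_blocks=1):
    """Return (new write volume, new window threshold) for the next period."""
    if state == "accelerating":                # exponential probe: double
        return prev_volume * 2, window_threshold
    if state == "steady":                      # linear growth: a few blocks more
        return prev_volume + extra_blocks * block_size, window_threshold
    # decelerating: the window threshold shrinks to half the last volume
    new_window = prev_volume // 2
    if first_deceleration:                     # first slowdown: halve the volume
        return prev_volume // 2, new_window
    return initial_volume, new_window          # repeated slowdown: restart small
```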
In some embodiments, an I/O-aware adaptive writing device for massive checkpoint data is also provided, the device comprising:
a setting module for setting an initial state for checkpoint data writing according to the number of processes and the attributes of the storage system, the initial state parameters including a write period;
a selection module for selecting, when a write period begins, the target processes corresponding to that period from all processes, the target processes differing between write periods;
a determining module for performing, for each target process, feedback regulation according to the real-time running state of the storage system and determining the amount of data the target process writes during the period;
and a writing module for writing, during the write period, that amount of the target process's checkpoint data to the shared storage system, and starting the next write period when the current one ends.
The application provides an I/O-aware adaptive writing method for massive checkpoint data. An initial state for checkpoint data writing is set according to the number of processes and the attributes of the storage system. When a write period begins, the target processes corresponding to that period are selected from all processes, and the target processes differ between periods. For each target process, the write volume for the period is determined, the corresponding checkpoint data is written to the shared storage system during the period, and the next period begins when the current one ends. In this way, a feedback-regulation mechanism is introduced: before the write module actually performs a write operation, the current hardware usage and system load pressure are probed periodically, and the amount of data each process writes to the I/O system is adjusted adaptively. Write operations are spread over multiple write periods separated in time, and the number of processes checkpointing simultaneously is controlled, which reduces the total volume of concurrent I/O writes, reduces contention among processes for storage-bandwidth resources, lowers the load on the I/O system, and raises its bandwidth utilization. By adaptively adjusting the write timing and write volume of each process's checkpoint data, write throughput can approach the optimum, the impact on the storage system caused by the concentrated writing of massive checkpoint data in a large-scale high-performance computing system is avoided, the instantaneous burst of many write operations and huge data volumes is prevented from exceeding the load capacity of the storage system, and storage-bandwidth utilization is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method for I/O aware adaptive writing of mass checkpoint data in accordance with an embodiment of the present application;
FIG. 2 is a flowchart of a method for determining a target process corresponding to the write cycle from all processes according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a write state transition according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of an I/O aware adaptive writing device for mass checkpoint data according to an embodiment of the present application;
fig. 5 shows a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described with reference to the accompanying drawings in the embodiments of the present application, and it should be understood that the drawings in the present application are for the purpose of illustration and description only and are not intended to limit the scope of the present application. In addition, it should be understood that the schematic drawings are not drawn to scale. A flowchart, as used in this disclosure, illustrates operations implemented according to some embodiments of the present application. It should be understood that the operations of the flow diagrams may be implemented out of order and that steps without logical context may be performed in reverse order or concurrently. Moreover, one or more other operations may be added to or removed from the flow diagrams by those skilled in the art under the direction of the present disclosure.
In addition, the described embodiments are only some, but not all, embodiments of the application. The components of the embodiments of the present application generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the application, as presented in the figures, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by a person skilled in the art without making any inventive effort, are intended to be within the scope of the present application.
It should be noted that the term "comprising" will be used in embodiments of the application to indicate the presence of the features stated hereafter, but not to exclude the addition of other features.
To address these problems, the application provides an I/O-aware adaptive writing method for massive checkpoint data. An initial state for checkpoint data writing is set according to the number of processes and the attributes of the storage system. When a write period begins, the target processes corresponding to that period are selected from all processes, and the target processes differ between periods. For each target process, the write volume for the period is determined, the corresponding checkpoint data is written to the shared storage system during the period, and the next period begins when the current one ends. In this way, a feedback-regulation mechanism is introduced: before the write module actually performs a write operation, the current hardware usage and system load pressure are probed periodically, and the amount of data each process writes to the I/O system is adjusted adaptively. Write operations are spread over multiple write periods separated in time, and the number of processes checkpointing simultaneously is controlled, which reduces the total volume of concurrent I/O writes, reduces contention among processes for storage-bandwidth resources, lowers the load on the I/O system, and raises its bandwidth utilization. By adaptively adjusting the write timing and write volume of each process's checkpoint data, write throughput can approach the optimum, the impact on the storage system caused by the concentrated writing of massive checkpoint data in a large-scale high-performance computing system is avoided, the instantaneous burst of many write operations and huge data volumes is prevented from exceeding the load capacity of the storage system, and storage-bandwidth utilization is improved.
Referring to fig. 1, fig. 1 shows a flowchart of a method for I/O aware adaptive writing of mass checkpoint data according to an embodiment of the present application, and specifically, the method includes steps S101 to S104:
S101, setting an initial state of checkpoint data writing according to the number of processes and the attributes of the storage system; the initial state parameters of the initial state include a writing period;
S102, when a writing period is entered, selecting a target process corresponding to the writing period from all processes; the target processes corresponding to different writing periods are different;
S103, for the target process, determining the writing data amount of the target process in the writing period;
S104, in the writing period, writing checkpoint data of the target process's writing data amount into the storage system, and proceeding to the next writing period when the writing period ends.
In the step S101, the storage system may also be referred to as a shared storage system, I/O subsystem, I/O system, shared I/O system, etc., where the storage system is used to store massive checkpoint data of a process.
The check point data refers to data for saving the running state of the program to a storage medium (such as a hard disk) with higher reliability, and comprises memory data of a process, inter-process communication data, context data and the like.
After the checkpoint data to be written to the hard disk has been prepared, the initial state of checkpoint data writing is first determined; the state parameters include the writing period T, the writing probability P0 of the writing period, the data block size S0, and the write window threshold Sw.
Then, each user process periodically performs write operations to the hard disk. To reduce the contention for I/O system resources caused by a large number of processes writing checkpoint data, in each period every process waits to be selected into a writing period with a certain writing probability, using a random method. The writing probability P0 characterizes the probability that each process is selected in each period.
Here, the initial state of checkpoint data writing is set according to the number of processes and the attributes of the storage system. For example, the attributes of the storage system include disk capacity: the larger the disk capacity, the more processes can write at the same time and the higher the probability that each process is selected, i.e. the larger the writing probability P0; conversely, the greater the number of processes, the lower the probability that each process is selected, i.e. the smaller the writing probability P0.
The data block size S0 is the basic unit of the writing data amount in the embodiment of the application; the writing data amount is adjusted in units of the data block size S0.
The write window threshold Sw represents the limit that the hardware conditions of the storage system impose on the write operations of each period; the better the hardware resources of the storage system, the more data it can withstand being written simultaneously, and the larger the write window threshold Sw.
Therefore, in the application, the initial state of the data writing of the check point is set according to the attribute of the storage system, and the feedback adjustment is carried out according to the initial state in the subsequent step, so that the process number and the data quantity of the writing operation can be dynamically adjusted in a self-adaptive manner according to the comprehensive hardware conditions of different systems.
In an embodiment of the present application, the initial state parameters of the initial state include a writing period and initial write parameters; the initial write parameters are determined by preset values given by the checkpoint software. The initial write parameters include: the initial writing data amount, for example the data block size S0 in the embodiment of the application; the initial historical write rate average, illustratively the write rate measured on the first write to the hard disk; and the initial write window threshold, illustratively Q data blocks (Q is a preset value of the checkpoint software).
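As an illustrative sketch only (the class name `WriteConfig`, its field names, and all default values are assumptions, not taken from the patent), the initial state parameters could be grouped as follows:

```python
from dataclasses import dataclass

@dataclass
class WriteConfig:
    period: float = 1.0      # writing period T (unit assumed: seconds)
    write_prob: float = 0.1  # writing probability P0
    block_size: int = 1      # data block size S0, the basic unit of write amount
    window_q: int = 16       # Q: initial write window threshold, in data blocks

cfg = WriteConfig()
# the write window threshold Sw expressed in units of S0
sw = cfg.window_q * cfg.block_size
```

Grouping the parameters this way makes it easy to vary P0, S0, and Q per storage system, as the preceding paragraphs describe.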
In the embodiment of the application, si is set as the write data volume of the write period of the ith write data, V i The write rate, V, measured for the write cycle of the ith write data i ' is a weighted average of the write cycle history write rates of the first i write data.
In the step S102, when a writing period is entered, selecting a target process corresponding to the writing period from all processes; the corresponding target processes for different write cycles are different.
In the embodiment of the present application, referring to fig. 2, when a writing period is entered, a target process corresponding to the writing period is determined from all processes, and the method includes the following steps S201 to S202:
s201, after a new writing period is entered, each process generates a random probability through a random method;
s202, selecting a process with the random probability smaller than the writing probability in the initial state parameter as a target process.
The random probability described here is a random number R generated in the interval [0, 1].
After entering a new writing period, each process uses a random method to decide, with writing probability P0, whether to perform a write operation. If it decides not to write, the process performs no operation during this writing period; otherwise it writes Si data in this writing period (taking this to be the writing period of the i-th write). Thus, in each writing period only about (P0 × 100%) of the processes perform write operations, which helps reduce the number of processes writing checkpoint data simultaneously and hence the contention for I/O bandwidth resources.
After entering a new writing period, each process generates a random probability by a random method; specifically, in the embodiment of the present application, each process generates a random number R in the interval [0, 1] by a random method, and performs the write operation if R < P0, otherwise does not perform it.
If a certain user process has not performed a write operation for N consecutive writing periods, it performs the write operation in this writing period directly, without using the random method. In the embodiment of the present application, specifically, when entering a writing period, determining the target process corresponding to the writing period from all processes further includes:
judging whether a process which is not selected in all of N continuous periods exists or not; wherein, N is a preset period threshold;
if yes, determining the process that was selected in none of the N consecutive periods as a target process.
Here, N is a natural number, for example 3; that is, a process that has performed no write operation for 3 consecutive periods performs the write in this writing period directly, without the random method.
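The selection rule just described — a random draw with probability P0, plus the rule that a process unselected for N consecutive periods is selected directly — could be sketched as follows; the function and variable names are illustrative assumptions, not from the patent:

```python
import random

def select_targets(miss_counts, p0, n_threshold, rng=random.random):
    """Return the indices of processes selected as write targets this period.

    miss_counts[i] counts how many consecutive periods process i has gone
    unselected; once it reaches n_threshold, the process is selected
    directly, bypassing the random draw (the fairness rule above).
    """
    targets = []
    for i, misses in enumerate(miss_counts):
        if misses >= n_threshold or rng() < p0:
            targets.append(i)
            miss_counts[i] = 0   # selected: reset its miss counter
        else:
            miss_counts[i] += 1  # not selected this period
    return targets

# deterministic demo: rng always returns 0.99, so only the fairness rule fires
misses = [3, 0, 2]
picked = select_targets(misses, p0=0.1, n_threshold=3, rng=lambda: 0.99)
```

With the random draw alone, roughly P0 × 100% of the processes write in each period, matching the text above; the miss counter guarantees every process eventually writes.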
Each process's write-to-hard-disk operation has three write states: the acceleration state, the steady state, and the deceleration state. In the acceleration state, the writing data amount of each writing period is twice that of the previous writing period (exponential growth). In the steady state, the writing data amount of each writing period is one data block S0 more than that of the previous writing period (linear growth). In the deceleration state, the writing data amount and the write window threshold are adjusted accordingly.
When writing starts, a user process is in the acceleration state, and the initial writing data amount is one data block S0. After the writing of each period completes, the data write rate in that period is measured and compared with the weighted average of the previous historical write rates, and the write state is adjusted accordingly, so that, by dynamically adjusting the writing data amount, checkpoint data is written at close to the optimal write rate. Because the data write rate in a period actually reflects the real-time running state of the storage system, the method of the embodiments of the present application essentially performs feedback adjustment based on the real-time running state of the storage system.
In step S103, for the target process, feedback adjustment is performed according to the real-time running state of the storage system, so as to determine the write data amount of the target process in the write cycle.
Specifically, the feedback adjustment is performed according to the writing data volume and/or the writing rate of the target process in the previous period, and the writing data volume of the target process in the writing period is determined.
And aiming at the target process, carrying out feedback regulation according to the real-time running state of the storage system, and determining the writing data quantity of the target process in the writing period, wherein the method comprises the following steps:
aiming at a target process, determining the writing data volume in the next writing period according to the writing state of the target process, the writing data volume in the last writing period and the data block size in the initial state parameter;
wherein the writing state of the target process is determined at the end of the previous writing period according to the writing data amount and/or the write rate of the previous writing period, which characterize the real-time running state of the storage system.
That is, in the embodiment of the present application, when the last writing period ends, the writing state of the target process is determined according to the writing data amount and/or the writing rate of the last writing period, where the writing state is a feedback result determined based on the real-time running state fed back by the storage system, and then the writing data amount of the last writing period is adjusted according to the feedback result, so as to obtain the writing data amount of the current writing period.
Here, since the embodiment of the present application uses the size of the data block included in the initial state parameter as a basic unit, the amount of write data is the number of data blocks.
In the I/O aware self-adaptive writing method of mass checkpoint data according to the embodiment of the present application, the initial state parameter includes a writing window threshold;
referring to fig. 3, fig. 3 is a schematic diagram illustrating a write state transition according to an embodiment of the application. At the end of the last writing period, determining the writing state of the target process according to the writing data amount or the writing rate of the last writing period comprises the following steps:
when the target process is in an acceleration state in the last writing period, if the writing data volume in the last writing period is larger than or equal to a writing window threshold value, the writing state of the target process is adjusted to enter a stable state;
when the target process is in an acceleration state or a stable state in the last writing period, if the writing rate of the last writing period is smaller than a dynamic rate threshold value, the writing state of the target process is adjusted to enter a deceleration state;
when the target process is in a deceleration state in the last writing period, if the writing rate of the last period is greater than or equal to a dynamic rate threshold value, the writing state of the target process is adjusted to enter a stable state; if the writing rate of the last period is still smaller than the dynamic rate threshold, the target process is adjusted to enter an acceleration state, and the writing data quantity is the initial data quantity in the initial state parameters;
Wherein the dynamic rate threshold is determined based on a historical write rate average prior to a previous write cycle.
In the actual writing process, the target process keeps the acceleration state until the writing data amount is greater than or equal to the writing window threshold value, and keeps the stable state until the writing speed is less than the dynamic speed threshold value.
That is, at the end of the last writing period, determining the writing state of the target process according to the writing data amount or the writing rate of the last writing period further includes:
when the target process is in an acceleration state in the last writing period, if the writing data quantity in the last writing period is smaller than a writing window threshold value, the writing state of the target process is kept in the acceleration state;
when the target process is in a stable state in the last writing period, the writing rate of the last writing period is greater than or equal to a dynamic rate threshold value, and the writing state of the target process is kept in a stable state.
Referring to fig. 3, in the embodiment of the present application the write window threshold is denoted Sw. In the acceleration state, if the writing data amount Si of the previous writing period is greater than or equal to the write window threshold, namely:

Si ≥ Sw,

the steady state is entered.
The dynamic rate threshold is the minimum allowed deviation from the historical write rate weighted average Vi-1', namely (1 - ε)Vi-1', where ε is a preset write rate deviation.
In the acceleration state or the steady state, if the write rate Vi of the previous writing period is less than the dynamic rate threshold, namely:

Vi < (1 - ε)Vi-1',

the deceleration state is entered.
Here, the previous writing period is the i-th writing period, and the historical write rate weighted average Vi-1' is the weighted average of the write rates of the first i-1 writing periods.
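A minimal sketch of these state transitions (fig. 3) follows; the names are assumptions, and where the text leaves the order of the two acceleration-state checks ambiguous, the rate check is applied first here:

```python
ACCEL, STEADY, DECEL = "acceleration", "steady", "deceleration"

def next_state(state, s_i, v_i, v_hist, sw, eps):
    """Write state for the coming period, given the period just finished.

    s_i    -- writing data amount of the finished period (Si)
    v_i    -- write rate measured in that period (Vi)
    v_hist -- weighted average of historical write rates before it (Vi-1')
    sw     -- write window threshold; eps -- preset write rate deviation
    """
    slow = v_i < (1 - eps) * v_hist   # below the dynamic rate threshold
    if state == ACCEL:
        if slow:
            return DECEL
        return STEADY if s_i >= sw else ACCEL
    if state == STEADY:
        return DECEL if slow else STEADY
    # deceleration: rate recovered -> steady; still slow -> restart the
    # acceleration state (the amount is then reset to the initial S0)
    return STEADY if not slow else ACCEL
```

The function is pure, so it can be evaluated once per process at the end of each writing period using only that period's measurements.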
In the embodiment of the present application, determining the write data amount in the next write cycle according to the write state of the target process, the write data amount in the last write cycle, and the size of the data block, includes:
when the writing state of the target process is an acceleration state, determining that the writing data quantity in the current writing period is twice the writing data quantity in the previous period;
when the writing state of the target process is a stable state, determining that the writing data quantity in the current writing period is larger than the writing data quantity in the previous period by a preset number of data blocks;
when the writing state of the target process is the deceleration state entered for the first time, determining that the writing data amount in the current writing period is half of the writing data amount in the previous period;
and when the writing state of the target process is the deceleration state for the second consecutive time, determining that the writing data amount in the current writing period is the initial writing data amount in the initial state parameters.
It should be noted that: when the target process enters the writing state for the first time, the writing state is the acceleration state, and the writing data quantity is the initial writing data quantity in the initial state parameters.
Referring to fig. 3, in the embodiment of the present application, the initial write data amount in the initial state parameter is 1 data block.
Specifically, at each write, the writing data amount Si of this writing period is first determined. If the user process is in the acceleration state, the writing data amount is twice that of the previous writing period (exponential growth), namely:

Si = 2·Si-1,

where Si denotes the writing data amount of the i-th writing period and Si-1 the writing data amount of the (i-1)-th writing period.
If the user process is in the steady state, the writing data amount of this writing period is a preset number of data blocks greater than that of the previous writing period (linear growth). In the embodiment of the present application, specifically one data block is added, namely:

Si = Si-1 + S0,

where Si denotes the writing data amount of the i-th writing period and Si-1 the writing data amount of the (i-1)-th writing period.
If the user process is in the deceleration state, the writing data amount is adjusted according to the detected rate, specifically:

Si = Si-1 / 2,

where Si denotes the writing data amount of the i-th writing period and Si-1 the writing data amount of the (i-1)-th writing period.
In the embodiment of the application, if the user process is in the deceleration state, the write window threshold must be adjusted at the same time as the writing data amount is adjusted according to the detected rate.
In the embodiment of the present application, when the writing state of the target process is a deceleration state, the method further includes:
and adjusting the write window threshold to be half of the written data quantity of the previous period.
A user process being in the deceleration state indicates that its write rate is low and data congestion is severe; the write window threshold therefore needs to be adjusted to slow the growth of the writing data amount of all processes.
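The threshold adjustment itself is a one-line rule; a sketch with an assumed name:

```python
def adjust_window_threshold(s_prev):
    """New write window threshold Sw on entering the deceleration state.

    Sw becomes half of the previous period's writing data amount, which
    lowers the point at which accelerating processes switch to linear
    growth and so slows the write growth of every process.
    """
    return s_prev // 2
```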
The I/O perception self-adaptive writing method for massive checkpoint data provided by the embodiment of the application further comprises the following steps:
and when the last writing period is over, after the writing state of the target process is determined, updating the historical writing rate average value based on the writing rate of the last writing period.
Updating the historical write rate average based on the write rate of the last write cycle, comprising:
calculating a historical writing rate average value based on the historical writing rate of each period and the weight of each period; wherein the later the period the greater the corresponding weight.
It should be noted that, the determining of the writing state of the current writing cycle is performed after the writing operation of the previous writing cycle is completed, in other words, after the writing operation of one writing cycle is completed, the writing state of the next writing cycle is determined based on the comparison result of the writing rate and the dynamic rate threshold value in the writing cycle, and the comparison result of the writing data amount and the writing window threshold value.
And updating the historical write rate average based on the write rate in this write period.
Specifically, the updated historical write rate weighted average Vi' is calculated as follows:

Vi' = α·Vi + (1 - α)·Vi-1';

here, Vi is the write rate of the i-th writing period; Vi-1' is the weighted average of the write rates of the first i-1 writing periods; Vi' is the updated historical write rate weighted average, i.e. the weighted average over i writing periods; and α is the weighting coefficient. Expanding Vi' = α·Vi + (1 - α)·Vi-1' shows that the more recent a historical write rate is, the greater its weight, which makes the average comparable with the measured real-time write rate.
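The update Vi' = α·Vi + (1 - α)·Vi-1' is a standard exponentially weighted moving average; a sketch follows (α = 0.5 is an assumed example value, not from the patent):

```python
def update_rate_avg(v_i, v_prev_avg, alpha=0.5):
    """Vi' = alpha*Vi + (1 - alpha)*Vi-1'."""
    return alpha * v_i + (1 - alpha) * v_prev_avg

# fold three measured period rates into the running average
avg = 100.0
for v in (100.0, 80.0, 120.0):
    avg = update_rate_avg(v, avg)
# later rates carry exponentially larger weight, so the most recent
# measurement (120) influences the average the most
```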
S104, in the writing period, the check point data of the data volume written by the target process is written into a storage system, and the next writing period is carried out when the writing period is finished.
That is, after the checkpoint data of the target process writing data amount is written into the storage system, the writing rate in the writing period is calculated, and the writing state of the next writing period is determined based on the writing rate and the writing data amount, so that feedback adjustment is performed according to the real-time running state of the storage system in the writing period, and the writing data amount of the target process in the next writing period is determined.
Here, the target process may or may not be selected again in the next writing period; it performs a write operation only when it is selected again. If it is not selected, it remains in a waiting state, performs no write operation, and need not re-determine the writing state, the writing data amount, and so on.
Based on the same inventive concept, the embodiment of the application also provides an I/O-aware adaptive writing device of mass checkpoint data corresponding to the I/O-aware adaptive writing method of mass checkpoint data, and since the principle of solving the problem of the device in the embodiment of the application is similar to that of the method in the embodiment of the application, the implementation of the device can refer to the implementation of the method, and the repetition is omitted.
Referring to fig. 4, fig. 4 is a schematic structural diagram of an I/O aware adaptive writing device for mass checkpoint data according to an embodiment of the present application, where the device includes:
the setting module 401 is configured to set an initial state of checkpoint data writing according to the number of processes and the attribute of the storage system; the initial state parameters of the initial state comprise a writing period;
a selection module 402, configured to, when entering a writing period, select a target process corresponding to the writing period from all processes; corresponding target processes of different writing periods are different;
a determining module 403, configured to perform feedback adjustment according to a real-time running state of the storage system for a target process, and determine a write data amount of the target process in the write period;
a writing module 404, configured to write checkpoint data of the target process write data amount into the shared storage system in the writing period, and perform a next writing period at the end of the writing period.
The application provides an I/O-aware adaptive writing device for massive checkpoint data, in which an initial state of checkpoint data writing is set according to the number of processes and the attributes of the storage system; when a writing period is entered, a target process corresponding to that writing period is selected from all processes, the target processes of different writing periods being different; for the target process, the writing data amount of the target process in the writing period is determined; checkpoint data of that writing data amount is written into the shared storage system during the writing period, and the next writing period begins when the current one ends. In this way a feedback regulation mechanism is introduced: before the writing module actually performs a write operation, the current hardware usage information and the system load pressure are periodically detected, so that the amount of data each process writes into the I/O system is adaptively adjusted. The write operations are distributed over a number of writing periods separated in time, and the number of processes performing checkpoint writes simultaneously is controlled, which reduces the total amount of I/O writes at any one moment, reduces the contention of many processes for the bandwidth resources of the storage system, lowers the load pressure on the I/O system, and raises the bandwidth utilization of the I/O system. Therefore, by adaptively adjusting the write time and the writing data amount of each process's checkpoint data, the write throughput can approach the optimum when checkpoint data is written, the impact on the storage system caused by the concentrated writing of massive checkpoint data in a large-scale high-performance computing system is avoided, the many write operations and the huge data volume generated instantaneously are prevented from exceeding the load capacity of the storage system, and the bandwidth utilization of the storage system is improved.
In some embodiments, in the I/O aware adaptive writing device of the massive checkpoint data, the initial state parameter of the initial state further includes a writing probability;
the selection module is specifically configured to, when entering a writing period, determine a target process corresponding to the writing period from all processes:
after entering a new writing period, each process generates a random probability by a random method;
and selecting a process with the random probability smaller than the writing probability in the initial state parameter as a target process.
In some embodiments, in the I/O aware adaptive writing device for massive checkpoint data, when entering a writing period, the selecting module is further configured to, when determining, from all processes, a target process corresponding to the writing period:
judging whether a process which is not selected in all of N continuous periods exists or not; wherein, N is a preset period threshold;
if yes, determining that none of the N continuous periods is selected as a target process.
In some embodiments, in the I/O aware adaptive writing device for massive checkpoint data, the determining module is specifically configured to, when performing feedback adjustment according to a real-time running state of a storage system for a target process, determine a write data amount of the target process in the write cycle:
Aiming at a target process, determining the writing data amount in the current writing period according to the writing state of the target process and the writing data amount in the last writing period;
wherein the writing state of the target process is determined at the end of the previous writing period according to the writing data amount or the write rate of the previous writing period, which characterize the real-time running state of the storage system.
In some embodiments, in the I/O aware adaptive writing device of the mass checkpoint data, the initial state parameter includes a writing window threshold;
in some embodiments, in the I/O aware adaptive writing device for massive checkpoint data, the determining module is specifically configured to, when determining, at the end of a previous writing period, a writing state of a target process according to a writing data amount or a writing rate of the previous writing period:
when the target process is in an acceleration state in the last writing period, if the writing data volume in the last writing period is larger than or equal to a writing window threshold value, the writing state of the target process is adjusted to enter a stable state;
when the target process is in an acceleration state or a stable state in the last writing period, if the writing rate of the last writing period is smaller than a dynamic rate threshold value, the writing state of the target process is adjusted to enter a deceleration state;
When the target process is in a deceleration state in the last writing period, if the writing rate of the last period is greater than or equal to a dynamic rate threshold value, the writing state of the target process is adjusted to enter a stable state; if the writing rate of the last period is still smaller than the dynamic rate threshold, the target process is adjusted to enter an acceleration state, and the writing data quantity is the initial data quantity in the initial state parameters;
wherein the dynamic rate threshold is dynamically determined based on an average of historical write rates prior to a previous write cycle.
In some embodiments, in the I/O aware adaptive writing device of mass checkpoint data, the determining module is further configured to:
and when the last writing period is over, after the writing state of the target process is determined, updating the historical writing rate average value based on the writing rate of the last writing period.
In some embodiments, in the I/O aware adaptive writing device of mass checkpoint data, the determining module is specifically configured to, when updating the historical writing rate average value based on the writing rate of the previous writing period:
calculating a historical writing rate average value based on the historical writing rate of each period and the weight of each period; wherein the later the period the greater the corresponding weight.
In some embodiments, in the I/O aware adaptive writing device for massive checkpoint data, the determining module determines, according to a writing state of the target process and a writing data amount and a data block size of a previous writing period, a writing data amount of a next writing period, including:
when the writing state of the target process is an acceleration state, determining that the writing data quantity in the current writing period is twice the writing data quantity in the previous period;
when the writing state of the target process is a stable state, determining that the writing data quantity in the current writing period is larger than the writing data quantity in the previous period by a preset number of data blocks;
when the writing state of the target process is the deceleration state entered for the first time, determining that the writing data amount in the current writing period is half of the writing data amount in the previous period;
and when the writing state of the target process is the deceleration state for the second consecutive time, determining that the writing data amount in the current writing period is the initial writing data amount in the initial state parameters.
In some embodiments, in the I/O aware adaptive writing device of massive checkpoint data, the determining module is further configured to adjust the write window threshold to be half of the amount of write data of the previous cycle when the write state of the target process is a down state.
Based on the same inventive concept, the embodiment of the application also provides an electronic device corresponding to the I/O-aware adaptive writing method of mass checkpoint data, and since the principle of solving the problem of the electronic device in the embodiment of the application is similar to that of the method in the embodiment of the application, the implementation of the electronic device can refer to the implementation of the method, and the repetition is omitted.
Referring to fig. 5, fig. 5 shows a schematic structural diagram of an electronic device according to an embodiment of the application, and the electronic device 500 includes: the system comprises a processor 502, a memory 501 and a bus, wherein the memory 501 stores machine-readable instructions executable by the processor 502, the processor 502 and the memory 501 communicate through the bus when the electronic device 500 is running, and the machine-readable instructions are executed by the processor 502 to perform the steps of the I/O aware adaptive writing method of mass check point data.
In the implementation of the application, the method is aimed at storing massive checkpoint data in a cluster composed of many computing nodes and storage devices, where the program scale and the number of processes are large; therefore, preferably, the processor comprises a plurality of processing modules, or the processor is a cluster composed of a plurality of sub-processors, thereby ensuring operation efficiency.
Based on the same inventive concept, an embodiment of the application further provides a computer readable storage medium corresponding to the I/O aware adaptive writing method of mass checkpoint data. Since the principle by which the storage medium solves the problem is similar to that of the method described above, its implementation may refer to the implementation of the method, and repeated description is omitted.
A computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the I/O aware adaptive writing method of mass checkpoint data.
It will be clear to those skilled in the art that, for convenience and brevity of description, the specific working procedures of the system and apparatus described above may refer to the corresponding procedures in the method embodiments and are not repeated here. In the several embodiments provided by the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. The apparatus embodiments described above are merely illustrative: the division into modules is merely a logical functional division, and other divisions are possible in an actual implementation; for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not performed. Furthermore, the couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some communication interface between devices or modules, and may be electrical, mechanical, or of other form.
The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical units, may be located in one place, or may be distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
If the functions are implemented in the form of software functional units and sold or used as a stand-alone product, they may be stored in a non-volatile computer readable storage medium executable by a processor. Based on this understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device, a plurality of computing devices, or a cluster of computing nodes (which may be a personal computer, a platform server, or a network device) to perform all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.
The foregoing is merely a specific embodiment of the present application, and the protection scope of the present application is not limited thereto; any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed herein shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. An I/O aware adaptive writing method for massive checkpoint data, the method comprising:
setting an initial state of checkpoint data writing according to the number of processes and the attributes of the storage system; the initial state parameters of the initial state comprise a writing period;
when a writing period is entered, selecting a target process corresponding to the writing period from all processes; the target processes corresponding to different writing periods are different;
for the target process, performing feedback adjustment according to the real-time running state of the storage system, and determining the write data amount of the target process in the writing period;
and in the writing period, writing checkpoint data of the determined write data amount of the target process into a shared storage system, and proceeding to the next writing period when the writing period ends.
2. The I/O aware adaptive writing method of mass checkpoint data of claim 1, wherein the initial state parameters of the initial state further include a writing probability;
the selecting, when a writing period is entered, a target process corresponding to the writing period from all processes comprises:
after entering a new writing period, each process generating a random value by a random method;
and selecting a process whose random value is smaller than the writing probability in the initial state parameters as a target process.
3. The method for I/O aware adaptive writing of mass checkpoint data according to claim 2, wherein the selecting, when a writing period is entered, a target process corresponding to the writing period from all processes further comprises:
judging whether there is a process that has not been selected in any of N consecutive periods, wherein N is a preset period threshold;
if so, determining the process that has not been selected in the N consecutive periods as a target process of the current writing period.
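For illustration only, the selection logic of claims 2 and 3 can be sketched as follows; the function name `select_targets`, the `missed_counts` bookkeeping, and all parameter names are assumptions of this sketch, not the patented implementation:

```python
import random

def select_targets(num_procs, write_prob, missed_counts, n_threshold):
    """Per-period target selection (claims 2-3); names are illustrative."""
    targets = []
    for p in range(num_procs):
        # Claim 2: each process draws a random value and becomes a target
        # when the draw is smaller than the configured write probability.
        if random.random() < write_prob:
            targets.append(p)
    # Claim 3: a process left unselected for N consecutive periods is
    # force-selected, so no process is starved indefinitely.
    for p in range(num_procs):
        if p in targets:
            missed_counts[p] = 0
        else:
            missed_counts[p] += 1
            if missed_counts[p] >= n_threshold:
                targets.append(p)
                missed_counts[p] = 0
    return targets
```

Setting the write probability well below 1 spreads the writers of any one period across a fraction of the processes, while the N-period guard bounds how stale an unlucky process's checkpoint can become.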
4. The I/O aware adaptive writing method of mass checkpoint data as claimed in claim 1, wherein:
the performing, for the target process, feedback adjustment according to the real-time running state of the storage system and determining the write data amount of the target process in the writing period comprises:
for the target process, determining the write data amount in the current writing period according to the writing state of the target process and the write data amount in the previous writing period;
wherein the writing state of the target process is determined, at the end of the previous writing period, according to the write data amount or the write rate of the previous writing period, and the write data amount and the write rate of the previous writing period reflect the real-time running state of the storage system.
5. The I/O aware adaptive writing method of mass checkpoint data of claim 4, wherein the initial state parameters include a write window threshold;
the determining, at the end of the previous writing period, the writing state of the target process according to the write data amount or the write rate of the previous writing period comprises:
when the target process was in an acceleration state in the previous writing period, if the write data amount of the previous writing period is greater than or equal to the write window threshold, adjusting the writing state of the target process to a stable state;
when the target process was in an acceleration state or a stable state in the previous writing period, if the write rate of the previous writing period is smaller than a dynamic rate threshold, adjusting the writing state of the target process to a deceleration state;
when the target process was in a deceleration state in the previous writing period, if the write rate of the previous period is greater than or equal to the dynamic rate threshold, adjusting the writing state of the target process to a stable state; if the write rate of the previous period is still smaller than the dynamic rate threshold, adjusting the target process to an acceleration state, with its write data amount set to the initial data amount in the initial state parameters;
wherein the dynamic rate threshold is dynamically determined based on an average of historical write rates prior to the previous writing period.
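A minimal sketch of the state transitions in claim 5, assuming three states and treating the window check as taking precedence over the rate check (the claim does not fix this ordering); all names are illustrative:

```python
def next_state(state, wrote, rate, window_threshold, dyn_rate_threshold):
    """State-transition sketch for claim 5.  States: 'accel', 'stable',
    'decel'; 'accel_reset' marks re-acceleration from the initial amount."""
    if state == "accel" and wrote >= window_threshold:
        return "stable"        # the write window is filled: stop growing
    if state in ("accel", "stable") and rate < dyn_rate_threshold:
        return "decel"         # storage looks congested: back off
    if state == "decel":
        if rate >= dyn_rate_threshold:
            return "stable"    # congestion has cleared
        return "accel_reset"   # still slow: restart from the initial amount
    return state
```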
6. The I/O aware adaptive writing method of mass checkpoint data as in claim 5, further comprising:
and when the previous writing period ends, after the writing state of the target process is determined, updating the historical write rate average based on the write rate of the previous writing period.
7. The I/O aware adaptive writing method of mass checkpoint data as in claim 6, wherein updating the historical write rate average based on the write rate of the previous writing period comprises:
calculating the historical write rate average based on the historical write rate of each period and the weight of each period, wherein the later the period, the greater its weight.
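One weighting scheme consistent with claim 7 ("the later the period, the greater the weight") is linear recency weighting; the claim does not fix the exact weights, so the scheme below is an assumption of this sketch:

```python
def weighted_rate_average(rate_history):
    """Recency-weighted average of historical write rates (claim 7).
    Linear weights 1..n give later periods proportionally more weight."""
    weights = range(1, len(rate_history) + 1)
    weighted_sum = sum(w * r for w, r in zip(weights, rate_history))
    return weighted_sum / sum(weights)
```

For example, with rates [10.0, 10.0, 4.0] MB/s, the recent slow period pulls the average down to 7.0 rather than the unweighted 8.0, making the dynamic rate threshold react faster to current storage load.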
8. The I/O aware adaptive writing method of mass checkpoint data as in claim 5, wherein determining the write data amount of the current writing period according to the writing state of the target process, the write data amount of the previous writing period, and the data block size comprises:
when the writing state of the target process is an acceleration state, determining the write data amount of the current writing period to be twice the write data amount of the previous period;
when the writing state of the target process is a stable state, determining the write data amount of the current writing period to be greater than the write data amount of the previous period by a preset number of data blocks;
when the writing state of the target process is an initial deceleration state, determining the write data amount of the current writing period to be half of the write data amount of the previous period;
and when the writing state of the target process is a secondary deceleration state, determining the write data amount of the current writing period to be the initial write data amount in the initial state parameters.
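The per-state update rules of claim 8 can be sketched as follows; the state names and the helper signature are assumptions of this sketch rather than the patent's API:

```python
def next_write_amount(state, prev_amount, block_size, initial_amount, k_blocks=1):
    """Per-state write-amount update sketched from claim 8."""
    if state == "accel":
        return prev_amount * 2                      # multiplicative increase
    if state == "stable":
        return prev_amount + k_blocks * block_size  # additive increase
    if state == "decel_first":
        return prev_amount // 2                     # multiplicative decrease
    if state == "decel_second":
        return initial_amount                       # restart from the initial amount
    raise ValueError(f"unknown state: {state}")
```

Together with claim 9, which halves the write window threshold on deceleration, these rules resemble the additive-increase/multiplicative-decrease control familiar from TCP congestion avoidance, applied here to per-period checkpoint write volume.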
9. The I/O aware adaptive writing method of mass checkpoint data as in claim 5, further comprising, when the writing state of the target process is a deceleration state:
adjusting the write window threshold to half of the write data amount of the previous period.
10. An I/O aware adaptive writing device for mass checkpoint data, the device comprising:
the setting module, configured to set an initial state of checkpoint data writing according to the number of processes and the attributes of the storage system; the initial state parameters of the initial state comprise a writing period;
the selection module, configured to select, when a writing period is entered, a target process corresponding to the writing period from all processes; the target processes corresponding to different writing periods are different;
the determining module, configured to perform, for the target process, feedback adjustment according to the real-time running state of the storage system, and determine the write data amount of the target process in the writing period;
and the writing module, configured to write, in the writing period, checkpoint data of the determined write data amount of the target process into the shared storage system, and proceed to the next writing period when the writing period ends.
CN202310865203.6A 2023-07-13 2023-07-13 I/O perception self-adaptive writing method for massive check point data Pending CN117075800A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310865203.6A CN117075800A (en) 2023-07-13 2023-07-13 I/O perception self-adaptive writing method for massive check point data


Publications (1)

Publication Number Publication Date
CN117075800A true CN117075800A (en) 2023-11-17

Family

ID=88703235

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310865203.6A Pending CN117075800A (en) 2023-07-13 2023-07-13 I/O perception self-adaptive writing method for massive check point data

Country Status (1)

Country Link
CN (1) CN117075800A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117873789A (en) * 2024-03-13 2024-04-12 之江实验室 Checkpoint writing method and device based on segmentation quantization
CN117873789B (en) * 2024-03-13 2024-05-10 之江实验室 Checkpoint writing method and device based on segmentation quantization

Similar Documents

Publication Publication Date Title
CN106648904B (en) Adaptive rate control method for streaming data processing
CN109324875B (en) Data center server power consumption management and optimization method based on reinforcement learning
CN117075800A (en) I/O perception self-adaptive writing method for massive check point data
CN103595651B (en) Distributed data stream processing method and system
US20120297216A1 (en) Dynamically selecting active polling or timed waits
CN110046048B (en) Load balancing method based on workload self-adaptive fast redistribution
CN103699433B (en) One kind dynamically adjusts number of tasks purpose method and system in Hadoop platform
CN102857577A (en) System and method for automatic load balancing of cluster storage
CN113535409B (en) Server-free computing resource distribution system oriented to energy consumption optimization
US20220413965A1 (en) Data recovery method, apparatus and device, and readable storage medium
CN115237580B (en) Intelligent calculation-oriented flow parallel training self-adaptive adjustment system and method
CN110990155A (en) Parameter communication method for large-scale safety monitoring
CN111142942A (en) Window data processing method and device, server and storage medium
CN114900525B (en) Double-layer cooperative load balancing method for skew data stream and storage medium
WO2024077881A1 (en) Scheduling method and system for neural network training, and computer-readable storage medium
CN112181894B (en) In-core group adaptive adjustment operation method based on Shenwei many-core processor
CN116663639B (en) Gradient data synchronization method, system, device and medium
CN115396377A (en) Method, device and equipment for optimizing service quality of object storage and storage medium
CN113467908B (en) Task execution method, device, computer readable storage medium and terminal equipment
CN108491159B (en) Large-scale parallel system check point data writing method for relieving I/O bottleneck based on random delay
CN114297002A (en) Mass data backup method and system based on object storage
CN113535387A (en) Heterogeneous sensing GPU resource allocation and scheduling method and system
CN111580950A (en) Self-adaptive feedback resource scheduling method for improving cloud reliability
CN115879543B (en) Model training method, device, equipment, medium and system
CN110850957A (en) Scheduling method for reducing system power consumption through dormancy in edge computing scene

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination