CN108491159B - Large-scale parallel system check point data writing method for relieving I/O bottleneck based on random delay

Info

Publication number: CN108491159B
Application number: CN201810188654.XA
Authority: CN (China)
Prior art keywords: time, writing, checkpoint, write, calculation
Inventors: 刘轶, 孙庆峥, 朱延超
Original assignee: Beihang University (application filed by Beihang University)
Current assignee: Kaixi Beijing Information Technology Co ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis)
Legal status: Expired - Fee Related (the legal status is an assumption and is not a legal conclusion)
Other languages: Chinese (zh)
Other versions: CN108491159A (application publication)

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/0611 - Improving I/O performance in relation to response time
    • G06F 3/0613 - Improving I/O performance in relation to throughput
    • G06F 3/0614 - Improving the reliability of storage systems


Abstract

The invention discloses a checkpoint data writing method for large-scale parallel systems that relieves the I/O bottleneck based on random delay. The invention uses a random-delay checkpoint-file processing method to determine a predetermined delayed write time for each node and disperses the write operations over time, reducing the peak I/O write load at any one moment and thereby relieving the I/O bottleneck. Before the large-scale parallel system executes the I/O operation, the associated data information is periodically inspected; if the delay would affect the running application program, the delay operation is abandoned and the write operation is executed immediately, avoiding the impact of long-term occupation of shared resources on the normal operation of the application. Otherwise, the write proceeds at the determined delayed write time. Compared with the traditional centralized writing pattern on different system platforms, the invention reduces the pressure on the I/O subsystem and achieves higher throughput and shorter global blocking time.

Description

Large-scale parallel system check point data writing method for relieving I/O bottleneck based on random delay
Technical Field
The invention relates to a processing method in the field of high-performance computing for dynamically adjusting the optimal write time of checkpoint data, and in particular to a checkpoint-data write control method for relieving the I/O bottleneck caused by the centralized writing of checkpoint data in a large-scale parallel system.
Background
High-performance computing mostly adopts large-scale parallel computing, and on the hardware level a high-performance computing system comprises three major parts: computing nodes, a network interconnection system, and a storage system. The storage system comprises a number of I/O nodes and external storage devices; the I/O nodes run a parallel file system, respond to read-write requests from the computing nodes, and manage and schedule the external storage devices.
With the continuing growth in the scale of high-performance computing systems, software or hardware errors in some nodes become common when a parallel program runs for a long time on a large number of computing nodes, which poses new challenges to system reliability. The mean time between failures (MTBF) of a current full-scale supercomputer has fallen to a few hours. Most existing long-running programs are MPI (Message Passing Interface) programs; when an error occurs in one node, the MPI processes running on that node may hang or abort, so the program on every node must be re-executed and all previous computation results are lost, which is undoubtedly a serious waste of resources. Moreover, since the MTBF of a large-scale high-performance computing system is only a few hours, in the worst case the program restarts repeatedly and never runs to completion. Therefore, to allow parallel programs to execute correctly, rollback recovery is widely used in high-performance computing as a fault-tolerance technique; one representative class is checkpoint software.
Checkpoint software periodically saves the relevant information of the application program on all corresponding nodes at the current moment, forming a checkpoint file set composed of the single-node checkpoint files of every node, and then writes the set into stable storage through the I/O nodes. When a node fails, the checkpoint software reads back the previous checkpoint data, creates processes according to the records, restores the data, and thereby resumes execution of the application program, preserving the earlier computation results.
After the advent of checkpoint-based fault tolerance, reducing the overhead incurred during checkpointing became a major issue in checkpoint research. To date, most checkpoint software has focused on reducing the amount of data saved by a single checkpoint operation on a single node. However, in large-scale or super-large-scale clusters, the number of computing nodes is so large that considering only the size of the data to be saved cannot achieve the best effect. In the currently mainstream checkpoint software using a coordinated synchronization protocol, a global synchronization operation is performed on all nodes before a checkpoint operation to reach a globally consistent state, avoiding a possible domino effect (consecutive rollbacks caused by global state inconsistency). After collecting the checkpoint data, the checkpoint software by default writes it directly into the external storage system, to guard against a node crash that may occur later (if a node goes down before its checkpoint data reaches stable storage, that node's checkpoint data is lost and the processes associated with that node cannot be recovered). Because the number of I/O nodes in the system is far smaller than the number of computing nodes, the huge number of computing nodes writing checkpoint data in a concentrated fashion impacts the I/O system and forms a system bottleneck, a problem that becomes more prominent as high-performance computing systems grow in scale.
For checkpoint software in a massively parallel system, reducing the impact of the checkpoint-data writing process on the I/O subsystem is an important measure of checkpoint software usability. At bottom, it is a matter of controlling the use of the system's shared I/O bandwidth. To better control the peak of I/O bandwidth usage, the concentrated I/O requests can be dispersed in time to some extent: the checkpoint data is first cached in the memory of the local node and handled by an independent write module, and the write operations are then spread over a time interval, reducing the total amount of I/O writing at any one moment. A feedback-regulation mechanism is also introduced: during the delay wait, the hardware-usage information of the local computing node is periodically sampled, and information such as CPU utilization and memory occupancy is provided as feedback to the controller of the write module, finally yielding a reasonable I/O write-timing strategy.
For checkpoint software using a coordinated protocol, determining the degree of parallelism without considering system I/O usage causes a certain impact on the I/O subsystem; the key to solving the problem is therefore how to adaptively and dynamically determine the optimal write timing of the checkpoint data according to the hardware configuration and real-time load of different systems. To address these problems, the invention provides a large-scale parallel-system checkpoint-data writing method that relieves the I/O bottleneck based on random delay.
Disclosure of Invention
The invention discloses a large-scale parallel-system checkpoint data writing method that relieves the system I/O bottleneck through random delay. The method separates the checkpoint process from the write process: after each node in the system generates its checkpoint data, the data is temporarily cached in memory, the delayed write time of the checkpoint data is calculated by the corresponding random-delay checkpoint-file processing method, and the checkpoint data is written into the external storage subsystem once the delayed write time expires. During the delay wait, the usage information of the relevant hardware is periodically inspected; if the running application program would be affected, the delay is abandoned and the write operation is executed immediately, avoiding the impact of long-term occupation of hardware resources on the normal operation of the related programs. Compared with the traditional centralized checkpoint-data writing pattern, the method disperses the checkpoint write operations of the nodes in time, avoiding the peak formed when all nodes write checkpoint data into the external storage system simultaneously; it can relieve the system I/O bottleneck and improve the scalability of the checkpoint system.
The large-scale parallel-system checkpoint data writing method for relieving the I/O bottleneck based on random delay specifically comprises the following steps:
Step A: after the write module (20) finishes caching the associated data information, obtain the current time as the starting point t_start of the write time section;
Step B: each node in the compute-run node set BP = {bp_b, bp_{b+1}, ..., bp_c} obtains the end point t_end of the time section using the random-delay checkpoint-file processing method;
Step C: after the write time section [t_start, t_end] is determined, record the independent random value held by each node in the compute-run node set BP = {bp_b, bp_{b+1}, ..., bp_c};
Step D: under the determined random values, each node in the compute-run node set BP = {bp_b, bp_{b+1}, ..., bp_c} determines its relative time position within the time section [t_start, t_end];
Step E: under the determined relative time positions, each node in the compute-run node set BP = {bp_b, bp_{b+1}, ..., bp_c} determines a predetermined delayed write time; the predetermined delayed write times are distributed uniformly in time order over the whole write time section [t_start, t_end], yielding a time axis;
Step F: judge whether the current program running time has reached the predetermined delayed write time; if so, execute step J; if not, execute step G;
Step G: periodically record the feedback information of each node in the compute-run node set BP = {bp_b, bp_{b+1}, ..., bp_c};
Step H: obtain the evaluation parameter K_{i+x} from the feedback information and compare it with the preset threshold K_p; if K_{i+x} >= K_p, execute step J; if K_{i+x} < K_p, execute step I;
Step I: while K_{i+x} < K_p is satisfied, the local running environment allows the delay to continue; go to step F;
Step J: write the cached associated data information into the external storage system (40), completing the current delayed checkpoint data write operation.
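Steps A through J can be sketched for a single node as follows. This is a minimal sketch: the function and parameter names (delayed_checkpoint_write, get_feedback_k, write_fn, poll_interval) are illustrative assumptions, not part of the patent.

```python
import random
import time

def delayed_checkpoint_write(t_start, t_end, get_feedback_k, write_fn,
                             k_threshold, poll_interval=0.5):
    """Sketch of steps A-J for one compute-run node.

    t_start, t_end -- write time section endpoints (steps A-B)
    get_feedback_k -- returns the evaluation parameter K (steps G-H)
    write_fn       -- flushes the cached checkpoint data (step J)
    k_threshold    -- preset threshold K_p
    """
    r = random.random()                        # step C: independent random value
    t_write = t_start + r * (t_end - t_start)  # steps D-E: position in section
    while time.monotonic() < t_write:          # step F: delay time reached?
        if get_feedback_k() >= k_threshold:    # steps G-H: resources tight
            break                              # abandon the delay
        time.sleep(poll_interval)              # step I: keep delaying
    write_fn()                                 # step J: write to external storage
```

A node whose feedback parameter stays below K_p simply waits out its random position in the section; a node under resource pressure writes immediately, mirroring the two exits of step H.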
The method for writing checkpoint data of a large-scale parallel system based on random delay to relieve the I/O bottleneck has the following advantages:
① For checkpoint programs using a coordinated protocol, the invention slows the checkpoint-data write peak through delayed writing, thereby achieving higher throughput.
② The invention uses the random-delay checkpoint-file processing method to determine the delayed write time of the checkpoint data, and can correspondingly adjust the optimal write time of the checkpoint data as the system load changes.
③ Each node independently uses the random-delay checkpoint-file processing method to calculate its delayed write time, without relying on centralized global scheduling control, which helps reduce the extra overhead of global synchronization in a large-scale parallel computing environment, reduces processing time, and improves the scalability of the checkpoint software.
④ The active-delay technique of the invention is absent from conventional I/O optimization techniques; active delay effectively reduces write conflicts and the I/O performance loss caused by periodic large-scale simultaneous write operations.
Drawings
FIG. 1 is a flow diagram showing modules in a parallel process of adjusting checkpoint process writes according to the present invention.
FIG. 2 is a schematic diagram of a dynamic adjustment process for a predicted delay write time of any compute run node using the checkpoint writing method of the present invention.
Fig. 3 is a comparison graph of the delay time of the operation of writing a checkpoint file under the same bandwidth and node count.
FIG. 4 is a graph illustrating the efficiency of the delayed write time and the total write time of checkpoint data in I/O contention in a massively parallel system.
Reference numerals: 10: execution module; 20: write module; 30: recovery module; 40: external storage system.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples.
Referring to fig. 1, during the writing of checkpoint data in the massively parallel system, the execution module 10, the write module 20 and the recovery module 30 are used to relieve the impact on the I/O subsystem; after the delayed write time expires, the checkpoint data is written into the external storage system 40.
(A) When the large-scale parallel system enters the checkpoint operation, the execution module 10 first suspends the process; the execution module 10 then completes each synchronization step and collects the associated data information. The associated data information can include shared files, process information, memory information, and so on;
(B) after the execution module 10 finishes collecting all the associated data information, the write module 20 caches the associated data information in the memory of the local node; after caching is complete, the execution module 10 outputs a recovery instruction to the recovery module 30;
(C) the recovery module 30 restores the process to its state before the pause and releases the lock so that the process resumes execution; the checkpoint operation process ends at this point.
Within the write module 20, the corresponding delayed write time is calculated using the random-delay checkpoint-file processing method, and the largest delayed write time is selected and recorded as the total write time. When the write module 20 reaches the delayed write time, or a system environment parameter exceeds the preset threshold K_p so that further waiting is inappropriate, the associated data information is written into the external storage system 40.
In the present invention, assume the current computing-node set of the cluster is denoted AP = {P_0, P_1, P_2, ..., P_{a-1}}, where P_0 is the first computing node in the current cluster, P_1 the second, P_2 the third, and P_{a-1} the last; the subscript identifies the computing node within the current cluster, and a is the total number of computing nodes in the current cluster.
In the present invention, the subset of the computing-node set AP = {P_0, P_1, P_2, ..., P_{a-1}} on which the current application runs (abbreviated the compute-run node set) is denoted BP = {bp_b, bp_{b+1}, ..., bp_c}, where bp_b is any compute-run node, the subscript b its identification number, bp_{b+1} the node after bp_b, and bp_c the last compute-run node; BP ⊆ AP and 0 <= b <= c <= a-1. The compute-run node set BP = {bp_b, bp_{b+1}, ..., bp_c} is used to perform the computational tasks.
In the present invention, the write process that writes the checkpoint data is separated from the main process to form the independent write module 20, which runs within the checkpoint software on each node of BP = {bp_b, bp_{b+1}, ..., bp_c}.
In the present invention, the ordinal number of a checkpoint-file write is denoted d, the checkpoint file size is denoted FILE, the checkpoint-file write time is denoted WT (file write time for short), the checkpoint-file write rate is denoted WV (file write rate for short), and the checkpoint-file save period is denoted T_period. With d as the current write, d-1 denotes the previous write and d+1 the next write.
For any compute-run node bp_b, the specific steps of random-delay checkpoint-file processing are as follows:
Step one: the first random-delay calculation for node bp_b;
When bp_b performs the first random-delay calculation, it may refer to the preset checkpoint save period T_period, taking 1/3 or 1/2 of T_period as the length of the first time section [t_start, t_end], so that the corresponding end point (right end point) is t_end^1 = t_start + T_period/3 (or t_start + T_period/2), to ensure that the write-save operation completes before the next checkpoint operation. The write time of bp_b at the first calculation is recorded as WT_1, the checkpoint file size at the first calculation as FILE_1, and the checkpoint-file write rate at the first calculation as WV_1. Since no write rate exists in the initial checkpoint write process, WV_1 is assigned zero.
Step two, for the calculation operation node bpbThe second random delay calculation of (2);
the bp isbThe first write checksum is acquired during the second calculationWrite time of int files
Figure BDA0001591059600000066
Then on the one hand use
Figure BDA0001591059600000067
And
Figure BDA0001591059600000068
calculating the writing speed of the second time of writing into the checkpoint file, and recording as
Figure BDA0001591059600000069
And is
Figure BDA00015910596000000610
On the other hand use
Figure BDA00015910596000000611
And
Figure BDA00015910596000000612
calculating the writing time of writing the checkpoint file for the second time, and recording as
Figure BDA00015910596000000613
And is
Figure BDA00015910596000000614
The bp isbThe second calculation of (2) is noted as the checkpoint file size
Figure BDA00015910596000000615
Step three: the third random-delay calculation for node bp_b;
At the third calculation, bp_b obtains the write time WT_2 of the second checkpoint-file write. On the one hand, WT_2 and FILE_2 are used to calculate the write rate of the third checkpoint-file write, recorded as WV_3, with WV_3 = FILE_2 / WT_2. On the other hand, WV_3 and FILE_3 are used to calculate the write time of the third checkpoint-file write, recorded as WT_3, with WT_3 = FILE_3 / WV_3. The checkpoint file size at the third calculation of bp_b is recorded as FILE_3.
Step four, the processing subsequent to the third calculation is the same as that of step three;
and step five, after the user program exits or the checkpoint software receives the command and does not perform checkpoint operation any more, writing into the checkpoint file is finished.
To state it generally: during the checkpoint write process it is necessary to obtain the write time WT_{d-1} of the previous checkpoint-file write. On the one hand, WT_{d-1} and FILE_{d-1} are used to calculate the write rate of the current checkpoint-file write, recorded as WV_d, with WV_d = FILE_{d-1} / WT_{d-1}. On the other hand, WV_d and FILE_d are used to calculate the write time of the current checkpoint-file write, recorded as WT_d, with WT_d = FILE_d / WV_d. Here WT_d is the write time of the checkpoint-file write at the current time d; WT_{d-1} is the write time at the previous time d-1; WV_d is the write rate at the current time d; WV_{d-1} is the write rate at the previous time d-1; FILE_d is the checkpoint file size at the current time d; and FILE_{d-1} is the checkpoint file size at the previous time d-1.
In the present invention, in order to reserve a time interval of 2 WT_d and thereby ensure that the current checkpoint file is completely written before the next checkpoint operation executes, every right end point other than the first satisfies t_end^d = t_next - 2 WT_d, where t_next is the time of the next checkpoint operation.
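The write-rate and write-time recurrence above, together with the reserved right end point, can be sketched as follows (a minimal sketch; the helper names are assumptions):

```python
def predict_write_time(prev, file_size_d):
    """Recurrence from the description: WV_d = FILE_{d-1} / WT_{d-1},
    then WT_d = FILE_d / WV_d.  `prev` is (FILE_{d-1}, WT_{d-1}), or None
    for the first checkpoint, where no previous rate exists (WV_1 = 0)."""
    if prev is None:
        return None                 # first write: no rate yet, no prediction
    file_prev, wt_prev = prev
    wv_d = file_prev / wt_prev      # estimated write rate WV_d
    return file_size_d / wv_d       # predicted write time WT_d

def section_right_end(t_next_checkpoint, wt_d):
    """Right end point of the write section, reserving 2*WT_d before the
    next checkpoint so the current write can finish in time."""
    return t_next_checkpoint - 2.0 * wt_d
```

For example, if the previous 100 MB checkpoint took 4 s (a rate of 25 MB/s), a 50 MB checkpoint is predicted to take 2 s, and the section's right end point sits 4 s before the next checkpoint.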
In the present invention, once the write time section [t_start, t_end] is determined, each node in the compute-run node set BP = {bp_b, bp_{b+1}, ..., bp_c} holds an independent random value: the random value of node bp_b is recorded as r_b, with r_b in [0, 1); the random value of node bp_{b+1} is recorded as r_{b+1}, with r_{b+1} in [0, 1); and the random value of node bp_c is recorded as r_c, with r_c in [0, 1).
Under the determined random values, each node in the compute-run node set BP = {bp_b, bp_{b+1}, ..., bp_c} determines its relative time position within the time section [t_start, t_end]: the relative time position of node bp_b is recorded as rp_b, with rp_b = r_b (t_end - t_start); that of node bp_{b+1} is recorded as rp_{b+1}, with rp_{b+1} = r_{b+1} (t_end - t_start); and that of node bp_c is recorded as rp_c, with rp_c = r_c (t_end - t_start).
Under the determined relative time positions, each node in the compute-run node set BP = {bp_b, bp_{b+1}, ..., bp_c} determines a predetermined delayed write time: the predetermined delayed write time of node bp_b is recorded as T_b, with T_b = t_start + rp_b; that of node bp_{b+1} is recorded as T_{b+1}, with T_{b+1} = t_start + rp_{b+1}; and that of node bp_c is recorded as T_c, with T_c = t_start + rp_c.
In the present invention, the predetermined delayed write times T_b, T_{b+1}, ..., T_c are distributed uniformly in time order over the whole write time section [t_start, t_end], yielding the time axis.
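As a sketch of how the independent random values map to delayed write times inside the section: drawing each value uniformly from [0, 1) is an assumption, since the exact range is not shown in this text.

```python
import random

def node_delay_times(t_start, t_end, num_nodes, seed=None):
    # Each node draws an independent random value r (assumed uniform in
    # [0, 1)) and maps it to a relative position in the write section,
    # giving its predetermined delayed write time
    # T = t_start + r * (t_end - t_start).
    rng = random.Random(seed)
    return sorted(t_start + rng.random() * (t_end - t_start)
                  for _ in range(num_nodes))
```

With many nodes, the sorted times approximate a uniform spread over [t_start, t_end], which is the time axis the description refers to.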
In the present invention, after the time axis is obtained, the write module 20 performs periodic performance detection to obtain the resource-usage situation of each node in the compute-run node set BP = {bp_b, bp_{b+1}, ..., bp_c}, called the feedback information IM; the feedback information IM is ultimately used to adjust the final write time.
Using the preset initial reference values and the corresponding random-delay calculation algorithm, the invention first determines, within the corresponding time section [t_start, t_end], a predetermined delayed write time for each compute-run node in BP = {bp_b, bp_{b+1}, ..., bp_c}: the predetermined delayed write time of bp_b is T_b, with T_b = t_start + rp_b; that of bp_{b+1} is T_{b+1}, with T_{b+1} = t_start + rp_{b+1}; and that of bp_c is T_c, with T_c = t_start + rp_c. Then, before the write module actually executes the I/O operation, the usage information of the relevant hardware (CPU, memory, etc.) of the node as a whole and of the relevant specific programs on the local node is periodically detected and used as feedback reference values, from which the corresponding value K_{i+x} is calculated and evaluated. If this value exceeds the preset standard K_p, the delay operation is abandoned and the data write is executed immediately (the time is recorded as t_now), to prevent long-term occupation of shared resources from affecting the normal operation of the related programs. If the value K_{i+x} always stays below the preset threshold K_p, i.e., the related programs are not affected by the delay operation, the node continues to wait until the calculated delayed write time T_b and then performs the delayed write operation.
The large-scale parallel-system checkpoint data writing method that senses the real-time state of the local node and relieves the I/O bottleneck through random delay comprises the following processing steps:
Step one: determine the predetermined delayed write time;
Step 11: the following describes, with reference to FIG. 3, the determination process of the delayed write time T_b corresponding to any compute-run node bp_b in the checkpoint software;
First, the write time section, denoted [t_start, t_end], should be determined, i.e., the section formed by the earliest time t_start and the latest time t_end at which a random delayed write operation may be performed. The time at which the write module receives the request is generally taken as the zero point of the time section, i.e., the starting point (left end point) t_start, meaning that some write requests can be executed immediately without any delay operation.
At the same time, the length of the time section [t_start, t_end] should be less than the interval between two adjacent checkpoint operations, i.e., t_end - t_start < T_period, with t_ck < t_start < t_end < t_{ck+1} and T_period = t_{ck+1} - t_ck, to ensure that the current checkpoint save operation has completed before the next checkpoint operation is performed at time t_{ck+1}. On the other hand, before the write module actually performs the write operation, the checkpoint data is temporarily saved in the memory of bp_b; it therefore occupies part of the system resources of bp_b and may affect the performance of the running application to some extent.
Step two: monitoring and feeding back the use efficiency of the local node in real time;
(A) CPU and memory usage information, both for the node as a whole and for the current program (or other relevant programs), is obtained through the various facilities provided by the operating system, such as system calls and terminal commands, and recorded as feedback information. The feedback information includes the current CPU utilization U_cpu, the total memory C_M, the used memory U_M, the remaining memory U_min, the virtual memory swap area size U_vmu, the virtual memory swap buffer size U_vmc, and so on. This collection is periodic and continues until the write operation is actually performed.
(B) The operation of the feedback mechanism is described with reference to FIG. 3. First, the collected utilization information is processed according to judgment rules to obtain the evaluation parameter K_i. The specific rules may vary: for example, whether the remaining memory is below a preset value, whether the CPU utilization is above a preset value, or a formula taking several factors as parameters. Each factor is normalized with its own coefficient a_i, and the resulting K_i serves as the criterion for judging whether the checkpoint data temporarily held in memory currently affects the performance of the running program. Interference in the feedback data can also be filtered out: for example, only when the feedback value exceeds the preset threshold K_p several times in succession, or exceeds it significantly (e.g. K_i > 2K_p), is the current node judged to be in a resource-shortage state, in which case the write operation must be executed immediately to release the currently occupied shared resources. If the judgment indicates that the running program is indeed affected, the node does not wait for the predetermined delayed write time t_rand but performs the write operation immediately, at a time denoted t_now.
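As a concrete, hypothetical instance of the judgment rules just described, the sketch below normalizes CPU and memory utilization with coefficients a_i into a single parameter K_i and applies the two escape conditions the text mentions: several threshold crossings in a row, or one value that significantly exceeds the threshold. All names, the weighting, and the defaults are illustrative assumptions, since the patent deliberately leaves the exact rule open.

```python
def evaluation_parameter(u_cpu: float, mem_used: float, mem_total: float,
                         weights=(0.5, 0.5)) -> float:
    """One plausible K_i: a weighted sum of CPU utilization (0..1) and
    memory pressure; the weights play the role of the a_i coefficients."""
    a_cpu, a_mem = weights
    return a_cpu * u_cpu + a_mem * (mem_used / mem_total)

def must_write_now(history, k_p: float, repeats: int = 3,
                   spike: float = 2.0) -> bool:
    """Resource-shortage test: K_i exceeded K_p `repeats` times in a row,
    or the latest K_i significantly exceeds K_p (e.g. by a factor of 2)."""
    recent = history[-repeats:]
    sustained = len(recent) == repeats and all(k > k_p for k in recent)
    spiked = bool(history) and history[-1] > spike * k_p
    return sustained or spiked
```

Requiring repeated or large exceedances, rather than reacting to a single reading, is what filters out the transient "interference data" the text refers to.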
Step three: delaying writing;
(A) Each node bp_b independently and periodically performs the information-collection operation IM and the judgment operation to obtain the parameter K_i. If the running program is not affected, the node continues to wait until the preset delayed write time t_rand is reached, after which the write operation is performed.
Example 1
As shown in fig. 3, assume the total bandwidth of the I/O subsystem is 100 GB/s and that 16000 nodes perform the checkpoint file write operation in total. If each node writes 10 MB of checkpoint data, then in an ideal environment without I/O conflicts, completing the write within 1 s requires 160 GB/s; if the write is delayed over 5 s, an average of only 32 GB/s needs to be written; if delayed over 10 s, only 16 GB needs to be written per second, bringing the bandwidth demand below the total bandwidth of the system. The actual write time will be longer than the theoretical time, because a large number of simultaneous writes cause conflicts that reduce I/O efficiency; the degree of conflict decreases as the delay increases.
A delay time of 0 in fig. 4 represents the case in which the random delayed write method is not used. As the delay time first increases, the total write time trends downward, because reduced contention improves I/O efficiency. When the delayed write time closely approaches or even exceeds the original write time, I/O efficiency continues to improve, but the total write time no longer benefits; instead it becomes approximately equal to the configured delay time.
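The bandwidth arithmetic of Example 1 can be checked directly. The helper below is illustrative only (decimal units, as in the text) and reproduces the 160 / 32 / 16 GB/s figures for delays of 1 s, 5 s and 10 s.

```python
def required_bandwidth_gbps(nodes: int, mb_per_node: float,
                            delay_s: float) -> float:
    """Average bandwidth needed to write all checkpoint data within delay_s."""
    total_gb = nodes * mb_per_node / 1000  # 16000 x 10 MB = 160 GB in total
    return total_gb / delay_s

print(required_bandwidth_gbps(16000, 10, 1))   # 160.0 -> exceeds 100 GB/s
print(required_bandwidth_gbps(16000, 10, 5))   # 32.0
print(required_bandwidth_gbps(16000, 10, 10))  # 16.0 -> well under the total
```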

Claims (3)

1. A large-scale parallel system checkpoint data writing method for relieving the I/O bottleneck based on random delay, characterized by specifically performing the following steps:

step A, after the write module (20) finishes buffering the associated data information, obtaining the current time as the starting point t_open of the time section;

step B, each node in the compute node set BP = {bp_b, bp_{b+1}, …, bp_c} obtaining the end point t_close of the time section using the random delay checkpoint file processing method;

step C, after the write time section [t_open, t_close] has been determined, recording and computing an independent random value for each node in the compute node set BP = {bp_b, bp_{b+1}, …, bp_c};

step D, given the determined random value, each node in the compute node set BP = {bp_b, bp_{b+1}, …, bp_c} determining its relative time position within the time section [t_open, t_close];

step E, given the determined relative time position, each node in the compute node set BP = {bp_b, bp_{b+1}, …, bp_c} determining its preset delayed write time; the preset delayed write times being distributed uniformly, in time order, over the entire write time section [t_open, t_close], yielding a time axis;
step F, judging whether the current program running time has reached the preset delayed write time; if so, executing step J; if not, executing step G;

step G, periodically recording the feedback information of each node in the compute node set BP = {bp_b, bp_{b+1}, …, bp_c};
step H, obtaining the evaluation parameter K_i from the feedback information and comparing K_i with the preset threshold K_p; if K_i ≥ K_p, executing step J; if K_i < K_p, executing step I;

step I, when K_i < K_p is satisfied, indicating that the local running environment permits the delay to continue, and going to step F;

step J, writing the buffered associated data information to the external storage system (40), recording the current delayed checkpoint data write time as the feedback write time T^w_d, and ending the operation.
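Steps F through J of claim 1 form a simple polling loop. The sketch below is one possible rendering, not part of the claim: the function name, the polling interval, and the callback interface are assumptions. It waits out the preset delayed write time t_rand, breaking out early when the feedback parameter K_i reaches the threshold K_p.

```python
import time

def delayed_checkpoint_write(write_fn, t_rand, collect_feedback, k_p,
                             poll_s=0.1):
    """Sketch of claim 1, steps F-J: delay the write until t_rand has
    elapsed, but write early on resource shortage (K_i >= K_p)."""
    t0 = time.monotonic()
    while time.monotonic() - t0 < t_rand:    # step F: delay not yet over
        k_i = collect_feedback()             # steps G-H: feedback -> K_i
        if k_i >= k_p:                       # step H: shortage -> step J
            break
        time.sleep(poll_s)                   # step I: keep delaying
    start = time.monotonic()
    write_fn()                               # step J: flush buffered data
    return time.monotonic() - start          # feedback write time T^w_d
```

Returning the measured write duration is what feeds the rate estimation of claim 3 for the next checkpoint.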
2. The large-scale parallel system checkpoint data writing method for relieving the I/O bottleneck based on random delay according to claim 1, characterized in that: in the process of writing the checkpoint data of the large-scale parallel system, an execution module (10), a write module (20) and a recovery module (30) are used to relieve the impact on the I/O subsystem, and the checkpoint data is written to the external storage system (40) after the delayed write time has elapsed.
3. The large-scale parallel system checkpoint data writing method for relieving the I/O bottleneck based on random delay according to claim 1, characterized in that the specific steps of the random delay checkpoint file processing are:
step one, the first random delay calculation for the compute node bp_b;

when bp_b performs its first random delay calculation, it can refer to the preset checkpoint save period T_period, taking 1/3 or 1/2 of T_period as the length of the time section [t_open, t_close] for the first calculation, and hence as the corresponding end point (right endpoint) t_close, to ensure that the write save operation completes before the next checkpoint operation; the write time of bp_b's first calculation is recorded as T^w_1, the checkpoint file size of the first calculation as S_1, and the write rate of the first checkpoint file write as v_1; since no write rate exists for the initial checkpoint write process, the value v_1 is zero;
step two, the second random delay calculation for the compute node bp_b;

in its second calculation, bp_b obtains the write time T^w_1 of the first checkpoint file write; then, on the one hand, it uses S_1 and T^w_1 to calculate the write rate of the second checkpoint file write, denoted v_2, with v_2 = S_1 / T^w_1; on the other hand, it uses S_2 and v_2 to calculate the write time of the second checkpoint file write, denoted T^w_2, with T^w_2 = S_2 / v_2; the checkpoint file size of bp_b's second calculation is denoted S_2;
Step three, for the calculation operation node bpbThe third random delay calculation of (4);
the bp isbThe writing time of the second writing check point file is required to be acquired in the third calculation
Figure FDA00024066488500000216
Then on the one hand use
Figure FDA00024066488500000217
And
Figure FDA00024066488500000218
calculating the writing rate of the third time writing check point file, and recording as
Figure FDA00024066488500000219
And is
Figure FDA00024066488500000220
On the other hand use
Figure FDA00024066488500000221
And
Figure FDA00024066488500000222
calculating the writing time of the third time of writing the check point file, and recording as
Figure FDA00024066488500000223
And is
Figure FDA00024066488500000224
The bp isbThird time counterCompute time checkpoint file size as
Figure FDA00024066488500000225
Step four, for the calculation operation node bpbThe processing following the third random delay calculation is:
in the checkpoint writing process, the writing time of the previous checkpoint file needs to be acquired
Figure FDA0002406648850000031
Then on the one hand use
Figure FDA0002406648850000032
And
Figure FDA0002406648850000033
calculating the write rate of the current written checkpoint file and recording as
Figure FDA0002406648850000034
And is
Figure FDA0002406648850000035
On the other hand use
Figure FDA0002406648850000036
And
Figure FDA0002406648850000037
calculating the write time of the current time written into the checkpoint file
Figure FDA0002406648850000038
And is
Figure FDA0002406648850000039
Figure FDA00024066488500000310
Is the write time when the checkpoint file is written at the current time d;
Figure FDA00024066488500000311
is the write time to write the checkpoint file at the previous time d-1;
Figure FDA00024066488500000312
is the write rate of the checkpoint file written at the current time d;
Figure FDA00024066488500000313
is the write rate of the checkpoint file written at the previous time d-1;
Figure FDA00024066488500000314
is the size of the checkpoint file at the current time d;
Figure FDA00024066488500000315
is the size of the checkpoint file at the previous time d-1;
and ending the writing of the checkpoint file until the checkpoint file is finished after the user program exits or the checkpoint software receives a command to stop the checkpoint operation.
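The recurrence of claim 3 (v_d = S_{d-1} / T^w_{d-1}, then T^w_d = S_d / v_d) can be sketched as follows. One simplification is labeled explicitly: here each predicted time is fed back as the "previous write time", whereas in the claim T^w_{d-1} is the measured duration of the previous write; the function name is an assumption for illustration.

```python
def estimate_write_time(sizes, first_write_time):
    """Claim-3 recurrence: v_d = S_{d-1} / T^w_{d-1}, T^w_d = S_d / v_d
    (which collapses to T^w_d = S_d * T^w_{d-1} / S_{d-1}).

    sizes:            checkpoint file sizes S_1..S_n (consistent units).
    first_write_time: write time T^w_1 of the first checkpoint; on the
                      very first checkpoint no rate exists (v_1 = 0 in
                      the claim), so a measured time must seed the chain.
    Returns the list of write-time estimates T^w_1..T^w_n.
    """
    times = [first_write_time]
    for d in range(1, len(sizes)):
        v_d = sizes[d - 1] / times[d - 1]   # rate inferred from previous write
        times.append(sizes[d] / v_d)        # predicted current write time
    return times
```

The estimate is what lets each node size its write time section [t_open, t_close] so that the delayed write still finishes before the next checkpoint.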
CN201810188654.XA 2018-03-07 2018-03-07 Large-scale parallel system check point data writing method for relieving I/O bottleneck based on random delay Expired - Fee Related CN108491159B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810188654.XA CN108491159B (en) 2018-03-07 2018-03-07 Large-scale parallel system check point data writing method for relieving I/O bottleneck based on random delay


Publications (2)

Publication Number Publication Date
CN108491159A CN108491159A (en) 2018-09-04
CN108491159B true CN108491159B (en) 2020-07-17

Family

ID=63338108

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810188654.XA Expired - Fee Related CN108491159B (en) 2018-03-07 2018-03-07 Large-scale parallel system check point data writing method for relieving I/O bottleneck based on random delay

Country Status (1)

Country Link
CN (1) CN108491159B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109992420B (en) * 2019-04-08 2021-10-22 苏州浪潮智能科技有限公司 Parallel PCIE-SSD performance optimization method and system
CN110569201B (en) * 2019-08-23 2021-09-10 苏州浪潮智能科技有限公司 Method and device for reducing write latency under solid state disk GC
CN112817541A (en) * 2021-02-24 2021-05-18 深圳宏芯宇电子股份有限公司 Write bandwidth control method, memory storage device and memory controller
CN118244972A (en) * 2022-12-24 2024-06-25 华为技术有限公司 Data storage method, device and system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102915257A (en) * 2012-09-28 2013-02-06 曙光信息产业(北京)有限公司 TORQUE(tera-scale open-source resource and queue manager)-based parallel checkpoint execution method
CN103631815A (en) * 2012-08-27 2014-03-12 深圳市腾讯计算机系统有限公司 Method, device and system for obtaining check points in block synchronization parallel computing
US8880941B1 (en) * 2011-04-20 2014-11-04 Google Inc. Optimum checkpoint frequency
CN104798059A (en) * 2012-12-20 2015-07-22 英特尔公司 Multiple computer system processing write data outside of checkpointing
US9165012B2 (en) * 2009-10-02 2015-10-20 Symantec Corporation Periodic file system checkpoint manager
CN105573866A (en) * 2009-07-14 2016-05-11 起元技术有限责任公司 Method and system for fault tolerant batch processing




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20210419

Address after: 100160, No. 4, building 12, No. 128, South Fourth Ring Road, Fengtai District, Beijing, China (1515-1516)

Patentee after: Kaixi (Beijing) Information Technology Co.,Ltd.

Address before: 100191 Haidian District, Xueyuan Road, No. 37,

Patentee before: BEIHANG University

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20200717