CN115587014A - Performance calculation method, system and medium for high-performance computer workflow scheduling


Info

Publication number
CN115587014A
Authority
CN
China
Prior art keywords: task, tasks, read, workflow, performance
Prior art date
Legal status
Pending
Application number
CN202211166770.4A
Other languages
Chinese (zh)
Inventor
董勇
戴屹钦
王睿伯
卢凯
张伟
张文喆
谢旻
周恩强
迟万庆
邬会军
李佳鑫
吴振伟
雷斐
Current Assignee
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date
Filing date
Publication date
Application filed by National University of Defense Technology filed Critical National University of Defense Technology
Priority to CN202211166770.4A priority Critical patent/CN115587014A/en
Publication of CN115587014A publication Critical patent/CN115587014A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/34Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3409Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
    • G06F11/3419Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment by assessing time
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/34Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3452Performance evaluation by statistical analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a performance calculation method, system and medium for high-performance computer workflow scheduling. The performance calculation method comprises: initializing a set X of running tasks and a set Y of tasks that can start running, and then iterating as follows: if the end time of a task v_i in X equals the variable k, X and Y are updated; if Y is not empty, a set Z of tasks to start running is selected, the start time of each task in Z is updated to k and the task is added to X; for the tasks in X, the estimated end time of the k-th stage is calculated, and the shortest estimated end time is taken as the completion time of the k-th stage. The iteration is repeated until X and Y are both empty, and the sum of the completion times of all stages is output as the total completion time of the workflow. The invention realizes quantitative performance calculation of high-performance computer workflow scheduling, so that the influence of each variable on workflow scheduling performance during workflow execution can be determined quickly, and the minimum amount of resources required for optimal workflow scheduling performance can be determined conveniently.

Description

Performance calculation method, system and medium for high-performance computer workflow scheduling
Technical Field
The invention relates to a workflow scheduling technology of a high-performance computer, in particular to a performance calculation method, a system and a medium for workflow scheduling of the high-performance computer.
Background
Workflows (scientific workflows) are sequences of tasks defined to achieve various scientific research goals. Driven by the development of service-oriented architectures and their loosely coupled nature, workflows have become a key technology in current distributed and dynamic environments. Workflows have significant advantages in describing complex scientific problems, so they are commonly used to solve large-scale scientific problems in fields such as bioinformatics, astronomy, and physics. Specifically, a workflow is typically composed of multiple independent computing tasks with strict dependencies. A directed acyclic graph is an effective tool for representing a workflow. As shown in FIG. 1, the nodes in the graph represent independent tasks in the workflow, and the directed edges represent dependencies between the tasks. The weight of a node represents the amount of computing resources (number of cores or nodes) required by the task, and the weight of a directed edge represents the data dependency between two tasks. For example, in FIG. 1 the weight 2 of task v_1 indicates that task v_1 requires 2 computing resources (2 cores or 2 compute nodes), and the directed edge from task v_1 to task v_3 indicates that task v_3 must start running after task v_1 finishes and must read the 10 GB of data generated by task v_1.
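For concreteness, the workflow representation described above can be sketched in code as follows. This is a minimal illustration and not part of the patent: the Task fields, the runtime values and the second edge are assumptions made for the example, while the 2-node weight of v1 and the 10 GB edge from v1 to v3 mirror FIG. 1.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    nodes: int       # node weight: number of compute nodes (or cores) required
    runtime: float   # t_i: running time required by the task, in seconds (assumed values)

# Tasks of the workflow (node weights as in the FIG. 1 example: v1 needs 2 nodes).
tasks = {
    "v1": Task("v1", nodes=2, runtime=100.0),
    "v2": Task("v2", nodes=1, runtime=80.0),
    "v3": Task("v3", nodes=3, runtime=120.0),
}

# Directed edges with data-dependency weights in GB:
# ("v1", "v3"): 10 means v3 starts after v1 and reads 10 GB produced by v1.
edges = {
    ("v1", "v3"): 10.0,
    ("v2", "v3"): 5.0,
}

def pred(v):
    """Predecessor task set pred(v)."""
    return [a for (a, b) in edges if b == v]

def succ(v):
    """Successor task set succ(v)."""
    return [b for (a, b) in edges if a == v]

print(pred("v3"))   # ['v1', 'v2']
```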
The goal of workflow scheduling is to maintain good overall performance or throughput of the computing system while meeting user needs and the management metrics of the resource provider. For a single workflow, minimizing its completion time is a common scheduling objective: for a given workflow, the shorter its completion time, the higher the workflow scheduling performance. The amount of computing and I/O resources allocated to a workflow and the scheduling policy all affect the scheduling performance of the workflow. In recent years, with the increasing parallel performance of large-scale high-performance computers, high-performance computers have become important execution platforms for workflows. The scenario of scheduling a workflow on a high-performance computer is complex; FIG. 2 illustrates such a scenario. First, a temporary, independent resource partition is opened on the high-performance computer, and all tasks in the workflow run in this partition; the total number of resources in the partition should be larger than the resource requirement of any single task. Each workflow task is submitted to the high-performance computer as a separate batch job and runs at an appropriate point in time. The shared file system is the storage medium for data transfer between tasks: each workflow task reads data from, and writes data to, the shared file system. During workflow execution there is I/O interference between different tasks running simultaneously, and the read-write rate of the resource partition to the file system may change over time. The burst buffer is a storage technology proposed to meet user demands for better I/O performance; using a burst buffer can increase the total bandwidth available to an application. Therefore, for a high-performance computer with a certain burst buffer capacity, some or all of the tasks may be allowed to use the burst buffer, depending on that capacity, to improve the I/O efficiency of the application. When a task is allowed to use the burst buffer, all of its output is directed to the burst buffer; however, where each task reads its input data from (the shared file system or the burst buffer) depends on whether its predecessors used the burst buffer. The scenario of scheduling a workflow on a high-performance computer is therefore very complex, and there is currently no effective tool for studying the influence of each variable on workflow scheduling performance during workflow execution, in particular the influence of each scheduling policy. Furthermore, for a given workflow, it is difficult to determine the minimum amount of resources needed to achieve optimal workflow scheduling performance. How to implement performance calculation for high-performance computer workflow scheduling has therefore become a key technical problem to be solved urgently.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: in view of the problems in the prior art, the invention provides a performance calculation method, system and medium for high-performance computer workflow scheduling, which can realize quantitative performance calculation of high-performance computer workflow scheduling, so that the influence of each variable on workflow scheduling performance during workflow execution can be determined quickly, and the minimum amount of resources required for optimal workflow scheduling performance can be determined conveniently.
In order to solve the above technical problem, the invention adopts the following technical solution:
a performance calculation method for high performance computer workflow scheduling, comprising:
S1, initializing the variable k to 1 and the completion time T_k of the k-th stage to 0; setting the start time and the end time of all tasks in the workflow to 0; initializing the set X used for recording the currently running tasks as empty, and initializing the set Y of tasks that can start running;
S2, for all tasks in the set X, if the end time of a task v_i equals the variable k, updating the set X and the set Y;
S3, if the set Y is not empty, selecting from the set Y a set Z of tasks to start running; if the set Z is not empty, jumping to S4, otherwise jumping to S5;
S4, for each task in the set Z, updating the start time of the task to the variable k and adding the task to the set X;
S5, for each task v_i in the set X, calculating the read and write bandwidth of task v_i to the shared file system, and calculating the estimated end time F_i^(k) of task v_i at the k-th stage according to the read and write bandwidth of the task to the shared file system;
S6, determining, among all tasks in the set X, the task v_n whose estimated end time F_n^(k) at the k-th stage is shortest, taking the estimated end time F_n^(k) of task v_n as the completion time T_k of the k-th stage, and setting the end time of task v_n to k+1;
S7, if the set X and the set Y are both empty, taking the sum of the completion times T_k of all stages as the total completion time of the workflow, and ending and exiting; otherwise, adding 1 to the variable k and jumping to step S2 to enter the next stage.
Optionally, the functional expressions for updating the set X and the set Y in step S2 are:
X = X - v_i,
Y = Y ∪ { v_j ∈ succ(v_i) | e_m > 0 for every v_m ∈ pred(v_j) },
where v_i is the task whose end time equals the variable k, v_j is a task in the successor task set succ(v_i) of task v_i, v_m is a task in the predecessor task set pred(v_j) of task v_j, and e_m is the end time of task v_m.
Optionally, the functional expression for calculating the estimated end time F_i^(k) of task v_i at the k-th stage in step S5 is:
F_i^(k) = T_k + (C_i - Σ_{m=s_i}^{k-1} C_i^(m)) / min( p_i·s, RF_i^(k)·C_i/FI_i, WF_i^(k)·C_i/FO_i, R_B·C_i/BI_i, W_B·C_i/BO_i ),
where F_i^(k) is the estimated end time of task v_i at the k-th stage, T_k is the completion time of the k-th stage, C_i is the total computation amount of task v_i, C_i^(m) is the computation amount of task v_i in the m-th stage, s_i is the start time of task v_i, p_i is the number of compute nodes used by task v_i, s is the computation speed of a compute node, RF_i^(k) and WF_i^(k) are respectively the read and write bandwidth of task v_i to the shared file system at the k-th stage, R_B and W_B are respectively the read and write bandwidth of the burst buffer, FI_i and FO_i respectively denote the total read and write data size of task v_i to the shared file system, and BI_i and BO_i respectively denote the total read and write data size of task v_i to the burst buffer; in addition, F denotes the shared file system size, B denotes the burst buffer size, I_i and O_i are respectively the total read and write data size of task v_i, and R_i^(k) and W_i^(k) are respectively the read and write data size of task v_i at the k-th stage.
Optionally, step S5 further comprises, for each task v_i in the set X, calculating the computation amount of task v_i in the k-th stage, and the functional expression for the computation amount in the k-th stage is:
C_i^(k) = (T_{k+1} - T_k) · min( p_i·s, RF_i^(k)·C_i/FI_i, WF_i^(k)·C_i/FO_i, R_B·C_i/BI_i, W_B·C_i/BO_i ),
where C_i^(k) is the computation amount of task v_i in the k-th stage, T_{k+1} is the completion time of the (k+1)-th stage, min denotes taking the minimum value, and R_B and W_B are respectively the read bandwidth and the write bandwidth of the burst buffer.
Optionally, the functional expressions for calculating the read bandwidth of task v_i at the k-th stage and the write bandwidth of task v_i at the k-th stage are:
RF_i^(k) = min( p_i·R_s, R_f · ρI_i / Σ_{v_j∈X^(k)} ρI_j ),
WF_i^(k) = min( p_i·W_s, W_f · ρO_i / Σ_{v_j∈X^(k)} ρO_j ),
where RF_i^(k) and WF_i^(k) are respectively the read and write bandwidth of task v_i at the k-th stage, v_j is a task in the set X^(k) of tasks running in the k-th stage, min denotes taking the minimum value, ρI_i and ρO_i respectively denote the read and write density of task v_i to the shared file system, R_s is the read bandwidth of a compute node to the parallel file system, W_s is the write bandwidth of a compute node to the parallel file system, R_f is the read bandwidth of the node partition to the parallel file system, W_f is the write bandwidth of the node partition to the parallel file system, and p_i is the number of compute nodes used by task v_i.
Optionally, the functional expressions for calculating the read-write density of task v_i to the shared file system are:
ρI_i = FI_i / t_i,
ρO_i = FO_i / t_i,
where ρI_i and ρO_i respectively denote the read and write density of task v_i to the shared file system, FI_i and FO_i respectively denote the total read and write data size of task v_i to the shared file system, and t_i is the running time required by task v_i.
Optionally, the functional expressions for calculating the total read and write data size of task v_i to the shared file system are:
FI_i = Σ_{v_j∈pred(v_i)} (1 - β(v_j)) · e_ji,
FO_i = (1 - β(v_i)) · O_i,
and the functional expressions for calculating the total read and write data size of task v_i to the burst buffer are:
BI_i = Σ_{v_j∈pred(v_i)} β(v_j) · e_ji,
BO_i = β(v_i) · O_i,
where FI_i and FO_i respectively denote the total read and write data size of task v_i to the shared file system, BI_i and BO_i respectively denote the total read and write data size of task v_i to the burst buffer, v_j is a task in the predecessor task set pred(v_i) of task v_i, β(v_j) is determined by the preset storage resource scheduling policy β and indicates whether task v_j is allowed to use the burst buffer, e_ji denotes the amount of data transferred between task v_j and task v_i, and O_i is the total write data size of task v_i; the functional expression for calculating the total computation amount of task v_i is:
C_i = p_i * s,
and the functional expressions for calculating the total read and write data size of task v_i are:
I_i = Σ_{v_j∈pred(v_i)} e_ji,
O_i = Σ_{v_j∈succ(v_i)} e_ij,
where e_ij denotes the amount of data transferred between task v_i and task v_j, succ(v_i) is the successor task set of task v_i, and pred(v_i) is the predecessor task set of task v_i.
In addition, the invention also provides a performance calculation method for high-performance computer workflow scheduling, which comprises the following steps:
S101, for the four factors in a high-performance computer that influence workflow scheduling performance, namely system computing resources, system storage resources, the computing resource scheduling policy and the storage resource scheduling policy, generating a plurality of resource configuration schemes by fixing three of the factors and varying the resource configuration of the remaining factor; for a given workflow, invoking the performance calculation method for high-performance computer workflow scheduling described above for each resource configuration scheme to obtain the corresponding total completion time of the workflow, thereby obtaining a relation curve between each influencing factor and the total completion time of the workflow;
s102, based on a relation curve between each influence factor and the total completion time of the workflow, respectively selecting the optimal resource allocation from the four influence factors to obtain the optimal resource allocation scheme of the high-performance computer for the given workflow.
In addition, the invention also provides a performance computing system for high-performance computer workflow scheduling, which comprises a microprocessor and a memory which are connected with each other, wherein the microprocessor is programmed or configured to execute the performance computing method for the high-performance computer workflow scheduling.
Furthermore, the present invention also provides a computer-readable storage medium having stored therein a computer program for being programmed or configured by a microprocessor to perform the performance calculation method of the high-performance computer workflow schedule.
Compared with the prior art, the invention mainly has the following advantages: the invention can realize the performance quantitative calculation of the high-performance computer workflow scheduling so as to quickly determine the influence of each variable on the workflow scheduling performance in the workflow scheduling operation process, thereby conveniently determining the minimum resource quantity required by the optimal workflow scheduling performance.
Drawings
Fig. 1 is a schematic diagram of a conventional workflow.
Fig. 2 is a schematic diagram of a scenario of scheduling a workflow on a conventional high-performance computer.
FIG. 3 is a schematic diagram of a basic flow of a method according to an embodiment of the present invention.
FIG. 4 is a multi-stage schematic of a workflow process according to an embodiment of the invention.
Detailed Description
As shown in FIG. 3, the performance calculation method for high-performance computer workflow scheduling in this embodiment comprises:
S1, initializing the variable k to 1 and the completion time T_k of the k-th stage to 0; setting the start time and the end time of all tasks in the workflow to 0; initializing the set X used for recording the currently running tasks as empty, and initializing the set Y of tasks that can start running;
S2, for all tasks in the set X, if the end time of a task v_i equals the variable k, updating the set X and the set Y;
S3, if the set Y is not empty, selecting from the set Y a set Z of tasks to start running; if the set Z is not empty, jumping to S4, otherwise jumping to S5;
S4, for each task in the set Z, updating the start time of the task to the variable k and adding the task to the set X;
S5, for each task v_i in the set X, calculating the read and write bandwidth of task v_i to the shared file system, and calculating the estimated end time F_i^(k) of task v_i at the k-th stage according to the read and write bandwidth of the task to the shared file system;
S6, determining, among all tasks in the set X, the task v_n whose estimated end time F_n^(k) at the k-th stage is shortest, taking the estimated end time F_n^(k) of task v_n as the completion time T_k of the k-th stage, and setting the end time of task v_n to k+1;
S7, if the set X and the set Y are both empty, taking the sum of the completion times T_k of all stages as the total completion time of the workflow, and ending and exiting; otherwise, adding 1 to the variable k and jumping to step S2 to enter the next stage.
In step S1 of this embodiment, setting the start time and the end time of all tasks in the workflow to 0 may be represented as:
Figure BDA0003861992040000052
in the above formula, G represents a workflow, v i For tasks in workflow G, s i And e i Respectively represent v i The start time and the end time of (c). Specifically, the workflow is represented by a directed acyclic graph in the present embodiment. In a directed acyclic graph G = (V, E). Set of nodes V = { V = 1 ,v 2 ,...,v n Are independent tasks in the workflow. Each task v i There are two attributes: one is task v i Number of compute nodes p used i Second is task v i Time t required for operation i . Thus, the total computation of the definable tasks is:
C i =p i *s,
where s is a constant representing the speed of operation of a single compute node in the system. Edge set
Figure BDA0003861992040000061
Representing dependencies between tasks in the workflow. If (v) 1 ,v 2 ) If it belongs to the edge set E, it indicates v 1 And v 2 Have a dependency relationship between them, i.e. v 2 Must be at v 1 And starting operation after the operation is finished. The weight of each edge represents the data dependency between two tasks with dependenciesIs, for example e ij =10 denotes task v i Writing 10GB data into file system or burst buffer, and reading task vj from file system or burst buffer by task v i 10GB of data are written. Task v i Is pred (v) as a set of predecessor tasks i ) The set of successor tasks is succ (v) i ). Adding an empty task v 0 If a task has no successor, the null task v 0 As a continuation of this task to constitute the initial task. Adding an empty task v n+1 If a task has no successor tasks, the task v is executed n+1 As a successor to the task (end task v) n+1 ). Initial task v 0 And end task v n+1 The number of compute nodes used is 0, as is the run time. In step S1 of this embodiment, initializing a set X for recording that a currently running task is empty and a set Y for starting to run a task is represented as:
X=φ,
Y={v i s.t.pred(v i )=v 0 },
in the above formula,. Phi.represents the null set, pred (v) i ) For task v i V. set of predecessors of 0 Is an initial task, i.e. a task that can be executed independently, independent of other task data.
In step S2 of this embodiment, the function expressions of the set X and the set Y are updated as follows:
X=X-v i
Figure BDA0003861992040000062
in the above formula, v i For tasks whose termination time is equal to the variable k, v j For task v i Set of successor tasks succ (v) i ) Task of (1), v m For task v j Set of predecessor tasks of pred (v) j ) Task of (1), e m For task v m The end time of (c).
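As an illustration of the set update in step S2, the following sketch assumes a workflow representation along the lines of the earlier example; the dictionaries end_time, pred and succ are assumed inputs, not structures defined by the patent.

```python
def update_ready_sets(vi, k, X, Y, end_time, pred, succ):
    """Step S2: task vi has just ended at event k; update running set X and ready set Y."""
    X.discard(vi)                         # X = X - v_i
    for vj in succ[vi]:                   # successors of the finished task
        # v_j becomes ready once every predecessor v_m has terminated (e_m > 0)
        if all(end_time[vm] > 0 for vm in pred[vj]):
            Y.add(vj)
    return X, Y
```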
In step S3 of this embodiment, if the set Y is not empty (Y ≠ φ), the set Z of tasks to start running is selected from the set Y by invoking a preset computing resource scheduling policy η, which can generally be expressed as Z = η(G, P, Y), i.e., a function of the workflow G, the number of compute nodes P and the set Y. The concrete implementation of the computing resource scheduling policy η (for example, random selection or round robin) is not the focus of the method of this embodiment, and its implementation details are not described here.
In step S4 of this embodiment, for each task v_i ∈ Z, updating the start time of the task to the variable k and adding the task to the set X can be expressed as: s_i = k, X = X + v_i.
In step S5, this embodiment calculates the estimated end time F_i^(k) of task v_i at the k-th stage with the functional expression:
F_i^(k) = T_k + (C_i - Σ_{m=s_i}^{k-1} C_i^(m)) / min( p_i·s, RF_i^(k)·C_i/FI_i, WF_i^(k)·C_i/FO_i, R_B·C_i/BI_i, W_B·C_i/BO_i ),
where F_i^(k) is the estimated end time of task v_i at the k-th stage, T_k is the completion time of the k-th stage, C_i is the total computation amount of task v_i, C_i^(m) is the computation amount of task v_i in the m-th stage, s_i is the start time of task v_i, p_i is the number of compute nodes used by task v_i, s is the computation speed of a compute node, RF_i^(k) and WF_i^(k) are respectively the read and write bandwidth of task v_i to the shared file system in the k-th stage, R_B and W_B are respectively the read and write bandwidth of the burst buffer, FI_i and FO_i respectively denote the total read and write data size of task v_i to the shared file system, BI_i and BO_i respectively denote the total read and write data size of task v_i to the burst buffer, F is the shared file system size, B is the burst buffer size, I_i and O_i are respectively the total read and write data size of task v_i, and R_i^(k) and W_i^(k) are respectively the read and write data size of task v_i in the k-th stage. The parameters involved in this functional expression may be calculated in this step or, as needed, in an appropriate earlier step.
In this embodiment, step S5 further includes, for each task v in the set X i Calculation task v i For the calculated amount in the k stage, and calculate task v i The functional expression for the calculated amount at the k-th stage is:
Figure BDA0003861992040000074
in the above formula, the first and second carbon atoms are,
Figure BDA0003861992040000079
for task v i Amount of calculation in the k-th stage, T k+1 Is the completion time of the k +1 th stage, min represents the minimum value, R B And W B Respectively the read and write bandwidths of the burst buffer. The burst buffer typically has a communication link and storage device that is independent of the shared file system, so the read and write bandwidth of the burst buffer is not affected by the shared file system. The probability of I/O interference in a burst buffer is lower than in a shared file system for two reasons. One is that the read-write bandwidth of the burst buffer is usually much larger than that of the shared file system, and the read-write requirement of the task can be completed in a short time by using the burst buffer, so that I/O contention is avoided. The second is due to the capacity of the burst bufferVolume constraints, usually only a portion of discrete tasks may use the burst buffer. In conjunction with the above analysis, for model simplicity, task v will be described i Read bandwidth for burst buffer at any stage is fixed to R B Write bandwidth to burst buffer is fixed as W B . When modeling the workflow operation process, the start or the end of each task is defined as an event, and the task v i Has an initial running time of s i The termination running time is e i . The time period between two consecutive events is defined as a phase. Obviously, the operation cycle of the workflow is composed of a plurality of phases, and the completion time of the workflow is the time when the last event occurs. As shown in FIG. 4, event p is task v i And v j Start running for a time T p Event p +1 is task v i Terminating running simultaneous tasks v j Start running for a time T p+1 The time period between the two is the phase p. In a certain phase, a running task is stable, so the read-write bandwidth of the file system is constant, which is the key property of the phase and is the original intention of the concept of defining the phase. The task set consisting of tasks running in phase k is X (k) . For task v i The amount of computation it performs in stage k is
Figure BDA0003861992040000077
Analyzing an already started stage k and analyzing any task v i ∈X (k) The following five constraints need to be met:
Figure BDA0003861992040000075
Figure BDA0003861992040000076
Figure BDA0003861992040000081
Figure BDA0003861992040000082
Figure BDA0003861992040000083
of the five constraints mentioned above, the first constraint describes that the task must complete all the computing tasks in phase k, and the second and third constraints describe that the task must complete the read and write tasks to the shared file system in phase k. Note that task v is based on the previous assumptions i Read-write demand on shared file system in phase k occupies task v i The proportion of the total read-write demand on the shared file system is equal to task v i The amount of computation in phase k accounts for task v i Ratio of the total calculated amount. Similarly, the fourth and fifth limits describe that this task must complete the read and write tasks to the burst buffer during phase k. According to the five limits, the task v can be obtained i The functional expression for the calculated amount at the k-th stage is shown as the above expression. According to task v i For the functional expression of the calculated quantities in the k-th phase, for each task v running in phase k i Assuming that no event has occurred all the time after phase k begins, task v may be determined i Is estimated to be the end time
Figure BDA00038619920400000811
Taking the minimum task end time as the occurrence time of the next event, namely:
Figure BDA0003861992040000084
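The per-stage progress computation described above can be sketched as follows. It is an illustrative reading of the formulas: the guard clauses for tasks that perform no I/O of a given kind, and the dictionary-based data layout, are assumptions rather than details fixed by the patent.

```python
def effective_rate(p_i, s, RF, WF, R_B, W_B, C, FI, FO, BI, BO):
    """Effective computation rate of a task in the current stage: the minimum of the
    compute rate and the rates implied by its four I/O channels."""
    rates = [p_i * s]
    if FI > 0: rates.append(RF * C / FI)   # read from shared file system
    if FO > 0: rates.append(WF * C / FO)   # write to shared file system
    if BI > 0: rates.append(R_B * C / BI)  # read from burst buffer
    if BO > 0: rates.append(W_B * C / BO)  # write to burst buffer
    return min(rates)

def stage_advance(T_k, running, done_work):
    """Compute F_i^(k) for every running task and the next event time T_{k+1}.

    `running` maps a task name to a dict of the per-task quantities
    (C, p, s, RF, WF, R_B, W_B, FI, FO, BI, BO); `done_work` maps a task
    name to the sum of C_i^(m) over the stages already simulated."""
    F, rate = {}, {}
    for v, q in running.items():
        rate[v] = effective_rate(q["p"], q["s"], q["RF"], q["WF"],
                                 q["R_B"], q["W_B"], q["C"],
                                 q["FI"], q["FO"], q["BI"], q["BO"])
        F[v] = T_k + (q["C"] - done_work[v]) / rate[v]      # estimated end time F_i^(k)
    T_next = min(F.values())                                # next event: T_{k+1}
    # work actually completed by each running task during this stage: C_i^(k)
    C_stage = {v: (T_next - T_k) * rate[v] for v in running}
    return F, T_next, C_stage
```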
The read-write operations of a task are assumed to be uniformly distributed over the task's execution, so that, in the absence of I/O interference, the amount of data the task has read from and written to the file system is proportional to its execution progress. To describe the I/O interference in the shared file system caused by multiple parallel tasks running simultaneously within the same resource partition, the read bandwidth of task v_i at stage k and the write bandwidth of task v_i at stage k are defined. In this embodiment, the functional expressions for the read bandwidth of task v_i at the k-th stage and the write bandwidth of task v_i at the k-th stage are:
RF_i^(k) = min( p_i·R_s, R_f · ρI_i / Σ_{v_j∈X^(k)} ρI_j ),
WF_i^(k) = min( p_i·W_s, W_f · ρO_i / Σ_{v_j∈X^(k)} ρO_j ),
where RF_i^(k) and WF_i^(k) are respectively the read and write bandwidth of task v_i at the k-th stage, v_j is a task in the set X^(k) of tasks running in the k-th stage, min denotes taking the minimum value, ρI_i and ρO_i respectively denote the read and write density of task v_i to the shared file system, R_s is the read bandwidth of a compute node to the parallel file system, W_s is the write bandwidth of a compute node to the parallel file system, R_f is the read bandwidth of the node partition to the parallel file system, W_f is the write bandwidth of the node partition to the parallel file system, and p_i is the number of compute nodes used by task v_i.
The read bandwidth of a single compute node to the parallel file system is R_s, and its write bandwidth is W_s. The read bandwidth of the node partition to the parallel file system is R_f, and the write bandwidth of the node partition to the parallel file system is W_f. The read bandwidth of a single compute node to the burst buffer is R_b, and its write bandwidth is W_b. In a practical scenario the shared file system usually has sufficient capacity. In this embodiment, the functional expressions for calculating the read-write density of task v_i to the shared file system are:
ρI_i = FI_i / t_i,
ρO_i = FO_i / t_i,
where ρI_i and ρO_i respectively denote the read and write density of task v_i to the shared file system, FI_i and FO_i respectively denote the total read and write data size of task v_i to the shared file system, and t_i is the running time required by task v_i. According to the above functional expressions, the share of the read-write bandwidth occupied by task v_i in stage k is positively correlated with the proportion of the task's read-write density in the total read-write density, and cannot exceed the maximum read-write bandwidth corresponding to the computing resources it uses.
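The density-proportional sharing of the partition's file-system bandwidth can be illustrated by the following sketch; the function names, the data layout and the small numerical example are assumptions made for illustration only.

```python
def rw_density(FI, FO, t):
    """Read/write density of a task: total shared-file-system traffic per unit runtime."""
    return FI / t, FO / t

def shared_fs_bandwidth(i, running, R_s, W_s, R_f, W_f):
    """RF_i^(k), WF_i^(k): the shared-file-system bandwidth task i obtains in the
    current stage. `running` maps task name -> (p_i, rho_in, rho_out)."""
    p_i, rho_in_i, rho_out_i = running[i]
    sum_in = sum(r for (_, r, _) in running.values())
    sum_out = sum(w for (_, _, w) in running.values())
    # Share of the partition bandwidth proportional to the task's density,
    # capped by the aggregate per-node bandwidth of the nodes it uses.
    RF = min(p_i * R_s, R_f * rho_in_i / sum_in) if sum_in > 0 else p_i * R_s
    WF = min(p_i * W_s, W_f * rho_out_i / sum_out) if sum_out > 0 else p_i * W_s
    return RF, WF

# Example: two concurrent tasks competing for the partition's file-system bandwidth.
running = {"v1": (2, 0.5, 0.2), "v2": (4, 1.5, 0.2)}   # (p_i, rho_in, rho_out)
print(shared_fs_bandwidth("v1", running, R_s=1.0, W_s=1.0, R_f=3.0, W_f=3.0))
```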
In this embodiment, the functional expressions for calculating the total read and write data size of task v_i to the shared file system are:
FI_i = Σ_{v_j∈pred(v_i)} (1 - β(v_j)) · e_ji,
FO_i = (1 - β(v_i)) · O_i,
and the functional expressions for calculating the total read and write data size of task v_i to the burst buffer are:
BI_i = Σ_{v_j∈pred(v_i)} β(v_j) · e_ji,
BO_i = β(v_i) · O_i,
where FI_i and FO_i respectively denote the total read and write data size of task v_i to the shared file system, BI_i and BO_i respectively denote the total read and write data size of task v_i to the burst buffer, v_j is a task in the predecessor task set pred(v_i) of task v_i, β(v_j) is determined by the preset storage resource scheduling policy β and indicates whether task v_j is allowed to use the burst buffer (a value of 1 indicates that it is allowed; a value of 0 indicates that it is not allowed, in which case only the shared file system can be used), e_ji denotes the amount of data transferred between task v_j and task v_i, and O_i is the total write data size of task v_i. The functional expression for the total computation amount of task v_i is:
C_i = p_i * s,
and the functional expressions for calculating the total read and write data size of task v_i are:
I_i = Σ_{v_j∈pred(v_i)} e_ji,
O_i = Σ_{v_j∈succ(v_i)} e_ij,
where e_ij denotes the amount of data transferred between task v_i and task v_j (the weight of the edge between the nodes corresponding to task v_i and task v_j in the directed acyclic graph G), succ(v_i) is the successor task set of task v_i, and pred(v_i) is the predecessor task set of task v_i. Furthermore, I_i = FI_i + BI_i and O_i = FO_i + BO_i.
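The split of each task's I/O between the shared file system and the burst buffer under the storage resource scheduling policy can be computed as in the following sketch, where beta is the indicator introduced above and the graph maps are assumed to be consistent with the workflow's edge list.

```python
def io_totals(v, pred, succ, edge_gb, beta):
    """Return (FI, FO, BI, BO, I, O) for task v.

    beta[u] is 1 if the storage resource scheduling policy allows task u to use the
    burst buffer (all of u's output then goes to the burst buffer), else 0."""
    O = sum(edge_gb[(v, w)] for w in succ[v])               # total output data O_i
    I = sum(edge_gb[(u, v)] for u in pred[v])               # total input data I_i
    BI = sum(edge_gb[(u, v)] for u in pred[v] if beta[u])   # inputs coming from the burst buffer
    FI = I - BI                                             # inputs read from the shared file system
    BO = O if beta[v] else 0.0                              # outputs written to the burst buffer
    FO = O - BO                                             # outputs written to the shared file system
    return FI, FO, BI, BO, I, O
```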
In this embodiment, step S6 determines, among all tasks in the set X, the task v_n whose estimated end time F_n^(k) at the k-th stage is shortest, which can be expressed as:
v_n = argmin_{v_i∈X} F_i^(k).
Taking the estimated end time F_n^(k) of task v_n as the completion time of the k-th stage and setting the end time of task v_n to k+1 can be expressed as:
T_{k+1} = F_n^(k),
e_n = k + 1.
Finally, in step S7, the set X and the set Y are examined; if both are empty, the sum of the completion times of all stages is taken as the total completion time of the workflow, and the method ends and exits; otherwise the variable k is incremented by 1 (k = k + 1) and the method jumps to step S2 to enter the next stage.
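Putting the steps together, the overall event-driven calculation of the total completion time could look like the following self-contained sketch. It follows the S1-S7 structure, but several choices are illustrative assumptions rather than details fixed by the patent: the data layout, the trivial "start everything that is ready" stand-in for the computing resource scheduling policy η, the folding of step S2 into the end of each iteration, and the default bandwidth values; burst-buffer capacity limits and the partition size are also ignored here.

```python
def simulate(tasks, pred, succ, edge_gb, beta,
             s=1.0, R_s=2.0, W_s=2.0, R_f=10.0, W_f=10.0, R_B=40.0, W_B=40.0):
    """Event-driven estimate of the total completion time of a workflow (steps S1-S7).

    tasks: {name: (p_i, C_i, t_i)}; pred/succ: maps with an entry (possibly empty) for
    every task; edge_gb: {(u, v): data in GB}; beta: {name: 0 or 1} burst-buffer policy.
    The bandwidth defaults are illustrative, not values taken from the patent."""
    def io(v):                                    # FI, FO, BI, BO of task v
        O = sum(edge_gb[(v, w)] for w in succ[v])
        I = sum(edge_gb[(u, v)] for u in pred[v])
        BI = sum(edge_gb[(u, v)] for u in pred[v] if beta[u])
        BO = O if beta[v] else 0.0
        return I - BI, O - BO, BI, BO
    FIO = {v: io(v) for v in tasks}

    k, T = 1, 0.0                                 # S1: stage index, current event time
    start = {v: 0.0 for v in tasks}
    end = {v: 0.0 for v in tasks}
    done = {v: 0.0 for v in tasks}                # sum of C_i^(m) over past stages
    X = set()                                     # running tasks
    Y = {v for v in tasks if not pred[v]}         # ready tasks (only v_0 as predecessor)

    while X or Y:
        for v in list(Y):                         # S3/S4: greedy stand-in for policy eta:
            start[v] = T                          # start every ready task (a real policy
            X.add(v); Y.discard(v)                # would also respect free partition nodes)
        # S5: estimated end time of every running task
        dens_r = {v: FIO[v][0] / tasks[v][2] for v in X}     # rho I_i = FI_i / t_i
        dens_w = {v: FIO[v][1] / tasks[v][2] for v in X}     # rho O_i = FO_i / t_i
        sum_r, sum_w = sum(dens_r.values()), sum(dens_w.values())
        F, rate = {}, {}
        for v in X:
            p, C, t = tasks[v]
            FI, FO, BI, BO = FIO[v]
            RF = min(p * R_s, R_f * dens_r[v] / sum_r) if sum_r else p * R_s
            WF = min(p * W_s, W_f * dens_w[v] / sum_w) if sum_w else p * W_s
            r = [p * s]                           # compute-rate bound
            if FI: r.append(RF * C / FI)          # shared-file-system read bound
            if FO: r.append(WF * C / FO)          # shared-file-system write bound
            if BI: r.append(R_B * C / BI)         # burst-buffer read bound
            if BO: r.append(W_B * C / BO)         # burst-buffer write bound
            rate[v] = min(r)
            F[v] = T + (C - done[v]) / rate[v]    # F_i^(k)
        # S6: shortest estimated end time ends the stage
        v_n = min(F, key=F.get)
        T_next = F[v_n]
        for v in X:
            done[v] += (T_next - T) * rate[v]     # C_i^(k)
        end[v_n] = k + 1                          # v_n terminates at event k+1
        X.discard(v_n)
        for w in succ[v_n]:                       # S2 (folded in): release successors whose
            if end[w] == 0 and w not in X and all(end[u] > 0 for u in pred[w]):
                Y.add(w)                          # predecessors have all terminated
        T, k = T_next, k + 1
    return T                                      # S7: total completion time
```

Called with tasks, pred, succ, edge_gb and beta maps that are mutually consistent (for example, ones built from the FIG. 1-style structures sketched earlier), the function returns the accumulated stage durations as the workflow's total completion time.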
In addition, on the basis of the foregoing method, this embodiment further provides a performance calculating method for high-performance computer workflow scheduling, including:
S101, for the four factors in a high-performance computer that influence workflow scheduling performance, namely system computing resources, system storage resources, the computing resource scheduling policy and the storage resource scheduling policy, generating a plurality of resource configuration schemes by fixing three of the factors and varying the resource configuration of the remaining factor; for a given workflow, invoking the performance calculation method for high-performance computer workflow scheduling described above for each resource configuration scheme to obtain the corresponding total completion time of the workflow, thereby obtaining a relation curve between each influencing factor and the total completion time of the workflow;
s102, based on a relation curve between each influence factor and the total completion time of the workflow, respectively selecting the optimal resource allocation from the four influence factors to obtain the optimal resource allocation scheme of the high-performance computer for the given workflow.
Through step S101, the performance calculation model mentioned above in this embodiment may be utilized to study the influence of the system computing resources, the system storage resources, the computing resource scheduling policy, and the storage resource scheduling policy on the scheduling performance of the workflow. For example, the fixed system storage resources, the computing resource scheduling policy and the storage resource scheduling policy are unchanged, the system computing resources are changed, and the scheduling performance of the workflow under different system computing resources can be obtained by calling the performance model, so that the influence of the number of the system computing resources on the scheduling performance of the workflow is obtained. Using the performance computation model described above, the minimum resource configuration combination that can achieve the best scheduling performance for a given workflow, i.e., the minimum resource partition size and the minimum amount of storage resources allocated to the workflow, can be computed. Specifically, the value ranges of two variables, namely the size of the resource partition and the number of the storage resources, can be set, the combination of the two variables is used as a search space, the minimum completion time of a given workflow is searched in the search space by using a performance model, and the size of the resource partition and the number of the storage resources corresponding to the minimum completion time are the optimal resource configuration.
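Steps S101 and S102 can be illustrated with the following configuration-sweep sketch; the evaluate callable stands in for the performance calculation method of this embodiment, and its signature as well as the tie-breaking rule (smallest partition first, then smallest burst buffer) are assumptions made for illustration.

```python
from itertools import product

def find_min_config(workflow, partition_sizes, bb_capacities, policies, evaluate):
    """Sweep resource configurations (step S101) and pick, among the configurations that
    reach the best total completion time, the one using the fewest resources (step S102).

    `evaluate(workflow, partition, bb_capacity, compute_policy, storage_policy)` is assumed
    to be a performance-calculation routine such as the simulator sketched above; its exact
    signature is a placeholder. Holding all but one iterable fixed yields the relation
    curve between that factor and the total completion time."""
    results = []
    for P, B, (eta, beta) in product(partition_sizes, bb_capacities, policies):
        makespan = evaluate(workflow, P, B, eta, beta)
        results.append((makespan, P, B, eta, beta))
    best_time = min(r[0] for r in results)
    candidates = [r for r in results if r[0] == best_time]
    return min(candidates, key=lambda r: (r[1], r[2]))  # smallest partition, then smallest buffer
```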
In addition, the embodiment also provides a performance computing system for high-performance computer workflow scheduling, which comprises a microprocessor and a memory which are connected with each other, wherein the microprocessor is programmed or configured to execute the performance computing method for high-performance computer workflow scheduling. Furthermore, the present embodiment also provides a computer-readable storage medium, in which a computer program is stored, the computer program being programmed or configured by a microprocessor to execute the performance calculation method of the aforementioned high-performance computer workflow scheduling.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-readable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein. The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above embodiments, and all technical solutions belonging to the idea of the present invention belong to the protection scope of the present invention. It should be noted that modifications and adaptations to those skilled in the art without departing from the principles of the present invention should also be considered as within the scope of the present invention.

Claims (10)

1. A performance calculation method for high-performance computer workflow scheduling, characterized by comprising:
S1, initializing the variable k to 1 and the completion time T_k of the k-th stage to 0; setting the start time and the end time of all tasks in the workflow to 0; initializing the set X used for recording the currently running tasks as empty, and initializing the set Y of tasks that can start running;
S2, for all tasks in the set X, if the end time of a task v_i equals the variable k, updating the set X and the set Y;
S3, if the set Y is not empty, selecting from the set Y a set Z of tasks to start running; if the set Z is not empty, jumping to S4, otherwise jumping to S5;
S4, for each task in the set Z, updating the start time of the task to the variable k and adding the task to the set X;
S5, for each task v_i in the set X, calculating the read and write bandwidth of task v_i to the shared file system, and calculating the estimated end time F_i^(k) of task v_i at the k-th stage according to the read and write bandwidth of the task to the shared file system;
S6, determining, among all tasks in the set X, the task v_n whose estimated end time F_n^(k) at the k-th stage is shortest, taking the estimated end time F_n^(k) of task v_n as the completion time T_k of the k-th stage, and setting the end time of task v_n to k+1;
S7, if the set X and the set Y are both empty, taking the sum of the completion times T_k of all stages as the total completion time of the workflow, and ending and exiting; otherwise, adding 1 to the variable k and jumping to step S2 to enter the next stage.
2. The method of claim 1, wherein the functional expressions for updating the set X and the set Y in step S2 are:
X = X - v_i,
Y = Y ∪ { v_j ∈ succ(v_i) | e_m > 0 for every v_m ∈ pred(v_j) },
where v_i is the task whose end time equals the variable k, v_j is a task in the successor task set succ(v_i) of task v_i, v_m is a task in the predecessor task set pred(v_j) of task v_j, and e_m is the end time of task v_m.
3. The method of claim 1, wherein the functional expression for calculating the estimated end time F_i^(k) of task v_i at the k-th stage in step S5 is:
F_i^(k) = T_k + (C_i - Σ_{m=s_i}^{k-1} C_i^(m)) / min( p_i·s, RF_i^(k)·C_i/FI_i, WF_i^(k)·C_i/FO_i, R_B·C_i/BI_i, W_B·C_i/BO_i ),
where F_i^(k) is the estimated end time of task v_i at the k-th stage, T_k is the completion time of the k-th stage, C_i is the total computation amount of task v_i, C_i^(m) is the computation amount of task v_i in the m-th stage, s_i is the start time of task v_i, p_i is the number of compute nodes used by task v_i, s is the computation speed of a compute node, RF_i^(k) and WF_i^(k) are respectively the read and write bandwidth of task v_i to the shared file system at the k-th stage, R_B and W_B are respectively the read and write bandwidth of the burst buffer, FI_i and FO_i respectively denote the total read and write data size of task v_i to the shared file system, and BI_i and BO_i respectively denote the total read and write data size of task v_i to the burst buffer; in addition, F denotes the shared file system size, B denotes the burst buffer size, I_i and O_i are respectively the total read and write data size of task v_i, and R_i^(k) and W_i^(k) are respectively the read and write data size of task v_i at the k-th stage.
4. The method of claim 3, wherein step S5 further comprises, for each task v_i in the set X, calculating the computation amount of task v_i in the k-th stage, and the functional expression for the computation amount in the k-th stage is:
C_i^(k) = (T_{k+1} - T_k) · min( p_i·s, RF_i^(k)·C_i/FI_i, WF_i^(k)·C_i/FO_i, R_B·C_i/BI_i, W_B·C_i/BO_i ),
where C_i^(k) is the computation amount of task v_i in the k-th stage, T_{k+1} is the completion time of the (k+1)-th stage, min denotes taking the minimum value, and R_B and W_B are respectively the read and write bandwidth of the burst buffer.
5. The method of claim 4, wherein the functional expressions for calculating the read bandwidth of task v_i at the k-th stage and the write bandwidth of task v_i at the k-th stage are:
RF_i^(k) = min( p_i·R_s, R_f · ρI_i / Σ_{v_j∈X^(k)} ρI_j ),
WF_i^(k) = min( p_i·W_s, W_f · ρO_i / Σ_{v_j∈X^(k)} ρO_j ),
where RF_i^(k) and WF_i^(k) are respectively the read and write bandwidth of task v_i at the k-th stage, v_j is a task in the set X^(k) of tasks running in the k-th stage, min denotes taking the minimum value, ρI_i and ρO_i respectively denote the read and write density of task v_i to the shared file system, R_s is the read bandwidth of a compute node to the parallel file system, W_s is the write bandwidth of a compute node to the parallel file system, R_f is the read bandwidth of the node partition to the parallel file system, W_f is the write bandwidth of the node partition to the parallel file system, and p_i is the number of compute nodes used by task v_i.
6. The method of claim 5, wherein the functional expressions for calculating the read-write density of task v_i to the shared file system are:
ρI_i = FI_i / t_i,
ρO_i = FO_i / t_i,
where ρI_i and ρO_i respectively denote the read and write density of task v_i to the shared file system, FI_i and FO_i respectively denote the total read and write data size of task v_i to the shared file system, and t_i is the running time required by task v_i.
7. The method of claim 3, wherein the functional expressions for calculating the total read and write data size of task v_i to the shared file system are:
FI_i = Σ_{v_j∈pred(v_i)} (1 - β(v_j)) · e_ji,
FO_i = (1 - β(v_i)) · O_i,
and the functional expressions for calculating the total read and write data size of task v_i to the burst buffer are:
BI_i = Σ_{v_j∈pred(v_i)} β(v_j) · e_ji,
BO_i = β(v_i) · O_i,
where FI_i and FO_i respectively denote the total read and write data size of task v_i to the shared file system, BI_i and BO_i respectively denote the total read and write data size of task v_i to the burst buffer, v_j is a task in the predecessor task set pred(v_i) of task v_i, β(v_j) is determined by the preset storage resource scheduling policy β and indicates whether task v_j is allowed to use the burst buffer, e_ji denotes the amount of data transferred between task v_j and task v_i, and O_i is the total write data size of task v_i; the functional expression for calculating the total computation amount of task v_i is:
C_i = p_i * s,
and the functional expressions for calculating the total read and write data size of task v_i are:
I_i = Σ_{v_j∈pred(v_i)} e_ji,
O_i = Σ_{v_j∈succ(v_i)} e_ij,
where e_ij denotes the amount of data transferred between task v_i and task v_j, succ(v_i) is the successor task set of task v_i, and pred(v_i) is the predecessor task set of task v_i.
8. A performance calculation method for high-performance computer workflow scheduling, comprising:
S101, for the four factors in a high-performance computer that influence workflow scheduling performance, namely system computing resources, system storage resources, the computing resource scheduling policy and the storage resource scheduling policy, generating a plurality of resource configuration schemes by fixing three of the factors and varying the resource configuration of the remaining factor; for a given workflow, invoking the performance calculation method for high-performance computer workflow scheduling according to any one of claims 1 to 7 for each resource configuration scheme to obtain the corresponding total completion time of the workflow, thereby obtaining a relation curve between each influencing factor and the total completion time of the workflow;
s102, based on a relation curve between each influence factor and the total completion time of the workflow, respectively selecting the optimal resource allocation from the four influence factors to obtain the optimal resource allocation scheme of the high-performance computer for the given workflow.
9. A performance computing system for high performance computer workflow scheduling comprising a microprocessor and a memory connected to each other, wherein the microprocessor is programmed or configured to perform the performance computing method for high performance computer workflow scheduling according to any one of claims 1 to 8.
10. A computer-readable storage medium, in which a computer program is stored, the computer program being adapted to be programmed or configured by a microprocessor to perform a method of performance computation of a high performance computer workflow schedule according to any of the claims 1-8.
CN202211166770.4A 2022-09-23 2022-09-23 Performance calculation method, system and medium for high-performance computer workflow scheduling Pending CN115587014A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211166770.4A CN115587014A (en) 2022-09-23 2022-09-23 Performance calculation method, system and medium for high-performance computer workflow scheduling

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211166770.4A CN115587014A (en) 2022-09-23 2022-09-23 Performance calculation method, system and medium for high-performance computer workflow scheduling

Publications (1)

Publication Number Publication Date
CN115587014A (en) 2023-01-10

Family

ID=84778775

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211166770.4A Pending CN115587014A (en) 2022-09-23 2022-09-23 Performance calculation method, system and medium for high-performance computer workflow scheduling

Country Status (1)

Country Link
CN (1) CN115587014A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination