CN111367644B - Task scheduling method and device for heterogeneous fusion system - Google Patents

Task scheduling method and device for heterogeneous fusion system

Info

Publication number
CN111367644B
Authority
CN
China
Prior art keywords
task
node
processor
degree
scheduling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010187660.0A
Other languages
Chinese (zh)
Other versions
CN111367644A (en)
Inventor
安虹
林晗
李明凡
韩文廷
林增
陈俊仕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology of China USTC
Original Assignee
University of Science and Technology of China USTC
Application filed by University of Science and Technology of China USTC
Priority to CN202010187660.0A
Publication of CN111367644A
Application granted
Publication of CN111367644B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request, the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5038 Allocation of resources, e.g. of the central processing unit [CPU] to service a request, the resource being a machine, e.g. CPUs, Servers, Terminals, considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/48 Indexing scheme relating to G06F9/48
    • G06F2209/484 Precedence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/50 Indexing scheme relating to G06F9/50
    • G06F2209/5021 Priority
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multi Processors (AREA)

Abstract

The invention discloses a task scheduling method for a heterogeneous fusion system, comprising the following steps: when a scheduling request is received, acquiring a directed acyclic graph corresponding to the tasks in the scheduling request, wherein each node in the directed acyclic graph corresponds to one task; for each node in the directed acyclic graph, calculating a weighted out-degree from the successor nodes that have a dependency relationship with it, to obtain the weighted out-degree of every node; sorting the weighted out-degrees and determining the priority order of the tasks based on the sorting result; and, according to the priority order, selecting a target processor for each task from among the processors to complete the scheduling request. In determining the priority of each task, the method only needs to evaluate the successor nodes that have a dependency relationship with the node, rather than traversing all nodes of the directed acyclic graph, which reduces the amount of calculation.

Description

Task scheduling method and device for heterogeneous fusion system
Technical Field
The invention relates to the technical field of computers, in particular to a task scheduling method and device for a heterogeneous fusion system.
Background
The heterogeneous fusion system comprises different types of computing resources deployed locally or remotely, and task scheduling of the heterogeneous fusion system comprises two stages: task selection (or task priority calculation) and processor selection.
In the existing approach, priority is determined from the necessary attributes of each task together with a priority algorithm. The necessary attributes comprise: the amount of computation (processor resources consumed by the task), the amount of memory access (memory resources consumed by the task), the source node and the destination node (the task is started by the source node and finished by the destination node), and the amount of traffic (communication link resources consumed by the task). The priority algorithms include: the shortest-job-first algorithm, in which a shorter (less computationally intensive) task receives a higher priority; and the critical path algorithm, which sets all tasks on the critical path (the path from the start node to the end node with the largest accumulated cost among all paths) to high priority. In determining the priority of each task, all nodes of the directed acyclic graph corresponding to the tasks need to be traversed, so the amount of calculation is large.
Disclosure of Invention
In view of this, the present invention provides a task scheduling method and device for a heterogeneous fusion system, so as to solve the problem that, in the existing process of determining task priority, all nodes of the directed acyclic graph must be traversed for calculation and the amount of calculation is therefore large. The specific scheme is as follows:
a task scheduling method for a heterogeneous fusion system comprises the following steps:
when a scheduling request is received, acquiring a directed acyclic graph corresponding to the tasks in the scheduling request, wherein each node in the directed acyclic graph corresponds to one task;
for each node in the directed acyclic graph, calculating a weighted out-degree from the successor nodes that have a dependency relationship with it, to obtain the weighted out-degree of every node;
sorting the weighted out-degrees, and determining the priority order of the tasks based on the sorting result;
and, according to the priority order, selecting a target processor for each task from among the processors to complete the scheduling request.
Optionally, in the above method, calculating, for each node in the directed acyclic graph, the weighted out-degree from the successor nodes that have a dependency relationship with it includes:
determining the in-degree of each node based on the directed acyclic graph;
for each node, acquiring the in-degrees of the successor nodes that have a dependency relationship with it, and calculating the weighted out-degree of the node according to a target weighted out-degree formula;
the target weighted out-degree formula is
Figure BDA0002414759100000021
or
Figure BDA0002414759100000022
or
Figure BDA0002414759100000023
where ID(v_j) is the in-degree of node v_j, the symbol in Figure BDA0002414759100000024 is a quantity of node v_j, α is the second-order out-degree factor of the node, WOD(v_j) is the first-order weighted out-degree, WOD_2(v_j) is the second-order weighted out-degree, WOD_c(v_j) is the full-order weighted out-degree, v_exit is the exit node, and succ(v_j) is the set of successor nodes of v_j.
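The three formula images are not reproduced in this text. One reconstruction that is consistent with the symbol definitions above and with the worked example in the detailed description (contributions of successor in-degrees, second-order terms scaled by α, and a WOD of 0 at the exit node) is sketched below. The index v_i for the node being evaluated, the exact grouping of the α term, and the recursive form of the full-order expression are assumptions, not the patent's confirmed notation:

```latex
\begin{align}
\mathrm{WOD}(v_i)   &= \sum_{v_j \in \mathrm{succ}(v_i)} \frac{1}{\mathrm{ID}(v_j)} \\
\mathrm{WOD}_2(v_i) &= \sum_{v_j \in \mathrm{succ}(v_i)}
      \left( \frac{1}{\mathrm{ID}(v_j)}
      + \alpha \sum_{v_k \in \mathrm{succ}(v_j)} \frac{1}{\mathrm{ID}(v_k)} \right) \\
\mathrm{WOD}_c(v_i) &= \sum_{v_j \in \mathrm{succ}(v_i)}
      \left( \frac{1}{\mathrm{ID}(v_j)} + \alpha\,\mathrm{WOD}_c(v_j) \right),
      \qquad \mathrm{WOD}_c(v_{exit}) = 0
\end{align}
```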
Optionally, in the above method, selecting a target processor for each task from among the processors to complete the scheduling request includes:
judging whether an idle processor exists among the processors;
if so, judging whether the number of idle processors is sufficient to complete the calculation of all the tasks;
if so, acquiring, in the priority order, the earliest completion time of each task on each idle processor, taking the processor with the shortest of these earliest completion times as the target processor, and scheduling each task to its target processor to complete the scheduling request.
Optionally, the above method further includes:
if not, scheduling tasks to the idle processors for calculation according to the priority order, and, once the idle processors have all been allocated, allocating the remaining unallocated tasks to the remaining processors for calculation.
Optionally, the above method further includes:
if no idle processor exists, determining, in the priority order, the earliest completion time of each task on each processor according to a completion time formula;
the completion time formula is
Figure BDA0002414759100000031
and
EFT_conflict(v_i, p_j) = EST_conflict(v_i, p_j) + w_{i,j}
where T_ava(p_j) is the available time of processor p_j, AFT refers to the actual start time of the task, v_m and v_i are nodes, c_{m,i} is the communication time between task v_m and task v_i, w_{i,j} is the computation overhead of task v_i on processor p_j, EST_conflict(v_i, p_j) is the earliest start time of task v_i on processor p_j, and EFT_conflict(v_i, p_j) is the earliest completion time of task v_i on processor p_j;
and, for each task, taking the processor with the shortest earliest completion time as the target processor, and scheduling each task to its target processor to complete the scheduling request.
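The earliest-start-time formula is an image that is not reproduced in this text. A HEFT-style form that uses only the symbols defined above is sketched here as an assumption. The set pred(v_i) of predecessor tasks is introduced for illustration, and AFT(v_m) is taken to be the time at which the result of v_m becomes available:

```latex
\begin{align}
\mathrm{EST}_{conflict}(v_i, p_j) &= \max\Bigl\{ T_{ava}(p_j),\;
      \max_{v_m \in \mathrm{pred}(v_i)} \bigl( \mathrm{AFT}(v_m) + c_{m,i} \bigr) \Bigr\} \\
\mathrm{EFT}_{conflict}(v_i, p_j) &= \mathrm{EST}_{conflict}(v_i, p_j) + w_{i,j}
\end{align}
```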
A task scheduling device for a heterogeneous fusion system comprises:
an acquisition module, used for acquiring, when a scheduling request is received, a directed acyclic graph corresponding to the tasks in the scheduling request, wherein each node in the directed acyclic graph corresponds to one task;
a calculation module, used for calculating, for each node in the directed acyclic graph, a weighted out-degree from the successor nodes that have a dependency relationship with it, to obtain the weighted out-degree of every node;
a priority determining module, used for sorting the weighted out-degrees and determining the priority order of the tasks based on the sorting result;
and a selecting and calculating module, used for selecting, according to the priority order, a target processor for each task from among the processors to complete the scheduling request.
Optionally, in the above device, the calculation module includes:
a node in-degree determining unit, configured to determine the in-degree of each node based on the directed acyclic graph;
an acquisition and calculation unit, configured to acquire, for each node, the in-degrees of the successor nodes that have a dependency relationship with it, and to calculate the weighted out-degree of the node according to a target weighted out-degree formula;
the target weighted out-degree formula is
Figure BDA0002414759100000041
or
Figure BDA0002414759100000042
or
Figure BDA0002414759100000043
where ID(v_j) is the in-degree of node v_j, the symbol in Figure BDA0002414759100000044 is a quantity of node v_j, α is the second-order out-degree factor of the node, WOD(v_j) is the first-order weighted out-degree, WOD_2(v_j) is the second-order weighted out-degree, WOD_c(v_j) is the full-order weighted out-degree, v_exit is the exit node, and succ(v_j) is the set of successor nodes of v_j.
Optionally, in the above device, the selecting and calculating module includes:
a first judging unit, configured to judge whether an idle processor exists among the processors;
a second judging unit, configured to judge, if idle processors exist, whether the number of idle processors is sufficient to complete the calculation of all the tasks;
and a first selection unit, configured to acquire, if so, the earliest completion time of each task on each idle processor in the priority order, take the processor with the shortest of these earliest completion times as the target processor, and schedule each task to its target processor to complete the scheduling request.
Optionally, the above device further comprises:
a calculating and distributing unit, configured to schedule, if not, tasks to the idle processors for calculation according to the priority order, and, once the idle processors have all been allocated, to allocate the remaining unallocated tasks to the remaining processors for calculation.
Optionally, the above device further comprises:
a determining unit, configured to determine, if no idle processor exists, the earliest completion time of each task on each processor in the priority order according to a completion time formula;
the completion time formula is
Figure BDA0002414759100000051
and
EFT_conflict(v_i, p_j) = EST_conflict(v_i, p_j) + w_{i,j}
where T_ava(p_j) is the available time of processor p_j, AFT refers to the actual start time of the task, v_m and v_i are nodes, c_{m,i} is the communication time between task v_m and task v_i, w_{i,j} is the computation overhead of task v_i on processor p_j, EST_conflict(v_i, p_j) is the earliest start time of task v_i on processor p_j, and EFT_conflict(v_i, p_j) is the earliest completion time of task v_i on processor p_j;
and a second selection unit, configured to take, for each task, the processor with the shortest earliest completion time as the target processor, and to schedule each task to its target processor to complete the scheduling request.
Compared with the prior art, the invention has the following advantages:
The invention discloses a task scheduling method for a heterogeneous fusion system, comprising the following steps: when a scheduling request is received, acquiring a directed acyclic graph corresponding to the tasks in the scheduling request, wherein each node in the directed acyclic graph corresponds to one task; for each node in the directed acyclic graph, calculating a weighted out-degree from the successor nodes that have a dependency relationship with it, to obtain the weighted out-degree of every node; sorting the weighted out-degrees and determining the priority order of the tasks based on the sorting result; and, according to the priority order, selecting a target processor for each task from among the processors to complete the scheduling request. In determining the priority of each task, the method only needs to evaluate the successor nodes that have a dependency relationship with the node, rather than traversing all nodes of the directed acyclic graph, which reduces the amount of calculation.
Of course, it is not necessary for any product in which the invention is practiced to achieve all of the above-described advantages at the same time.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a task scheduling method for a heterogeneous fusion system according to an embodiment of the present disclosure;
Fig. 2 is a schematic diagram of a directed acyclic graph according to an embodiment of the present disclosure;
Fig. 3 is a schematic diagram of a task execution flow according to an embodiment of the present disclosure;
Fig. 4 is a block diagram of a task scheduling device for a heterogeneous fusion system according to an embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The invention discloses a task scheduling method and device for a heterogeneous fusion system, which are applied to the task scheduling process of the heterogeneous fusion system, wherein the heterogeneous fusion system can be applied to High Performance Computing (HPC), cloud computing and deep learning. Typically, a heterogeneous fusion system contains a range of different types of computing resources that can be deployed locally or remotely. The scheduling of parallel programs on a heterogeneous fusion system comprises two stages: task selection (or task priority calculation) and processor selection. Task selection calculates the priority of each task according to the task attributes and selects the task to be scheduled from all candidate tasks; processor selection selects the best processor for the scheduled task.
Research on scheduling algorithms for heterogeneous fusion systems is mainly divided into static scheduling and dynamic scheduling. In dynamic scheduling, the execution overhead, the communication overhead and the relationships between tasks are not known in advance, and decisions are made entirely at run time. In static scheduling, this information is known in advance. In general terms, static scheduling is compile-time scheduling, while dynamic scheduling is run-time scheduling. Static scheduling can be further divided into two categories: guided random-search-based scheduling and heuristic scheduling.
In the heterogeneous computation, in order to fully utilize the heterogeneous fusion system resources, a higher parallelism is preferably maintained in the scheduling process. Based on this assumption, the selected task should increase the overall parallelism of the program as much as possible, and the task with larger out-degree should be scheduled as early as possible to activate the execution of more subsequent tasks, so as to ensure that there is always enough parallelism in the program execution process and the processor resources are utilized more fully. The execution flow of the scheduling method is shown in fig. 1, and includes the steps of:
S101, when a scheduling request is received, acquiring a directed acyclic graph corresponding to the tasks in the scheduling request, wherein each node in the directed acyclic graph corresponds to one task;
In the embodiment of the present invention, when an overall task is to be scheduled, it is first decomposed into the tasks it contains. The decomposition may be carried out according to experience or specific conditions, and the specific decomposition process is not limited in the embodiment of the present invention. The tasks are represented by a directed acyclic graph G = (V, E), where V is the set of nodes and E is the set of edges. Nodes represent specific computational tasks, and edges represent data and control dependencies between different tasks. In the abstract machine model, a plurality of heterogeneous processors form computer nodes through board-level interconnection, and the computer nodes are connected into a computing cluster through a network. The abstract machine model comprises three levels: device, computer node and cluster. It summarizes the hardware levels from heterogeneous processors to large-scale heterogeneous systems in a concise way, and is representative and general. Therefore, when a scheduling request for the tasks is received, the directed acyclic graph corresponding to the tasks is acquired.
S102, for each node in the directed acyclic graph, calculating a weighted out-degree from the successor nodes that have a dependency relationship with it, to obtain the weighted out-degree of every node;
In the embodiment of the present invention, the in-degree of each node is first determined from the directed acyclic graph, where the in-degree of a node is the number of other nodes that the node depends on. Taking the directed acyclic graph shown in Fig. 2 as an example: node 0 is the start node and does not depend on any node, so its in-degree is 0; nodes 1 and 2 depend only on node 0, so their in-degree is 1; node 3 depends on node 1 and node 2, so its in-degree is 2.
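The in-degree step can be illustrated with a short Python sketch. It assumes the graph of Fig. 2 has exactly the edges implied by the text (0→1, 0→2, 1→3, 2→3); the names EDGES and build_graph are hypothetical and chosen only for this illustration.

```python
from collections import defaultdict

# Edges of the four-node example graph, as implied by the description of Fig. 2.
EDGES = [(0, 1), (0, 2), (1, 3), (2, 3)]


def build_graph(edges):
    """Return (successors, in_degree) dictionaries for a DAG given as an edge list."""
    succ = defaultdict(list)
    in_degree = defaultdict(int)
    for src, dst in edges:
        succ[src].append(dst)
        in_degree[dst] += 1
        # Make sure every node appears in both maps, even with no edges of that kind.
        in_degree.setdefault(src, 0)
        succ.setdefault(dst, [])
    return dict(succ), dict(in_degree)


succ, in_degree = build_graph(EDGES)
print(in_degree)  # in-degree per node: node 0 -> 0, nodes 1 and 2 -> 1, node 3 -> 2
```

A single pass over the edge list is enough here, so the in-degrees can be prepared once before any priority is evaluated.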
For each node, the weighted out-degree can be first-order, second-order or full-order. The in-degrees of the successor nodes that have a dependency relationship with the node are obtained, where the set of successor nodes considered depends on the order of the weighted out-degree. The weighted out-degree is calculated as follows:
Figure BDA0002414759100000081
Figure BDA0002414759100000082
Figure BDA0002414759100000083
where ID(v_j) is the in-degree of node v_j, the symbol in Figure BDA0002414759100000084 is a quantity of node v_j, α is the second-order out-degree factor of the node, WOD(v_j) is the first-order weighted out-degree, WOD_2(v_j) is the second-order weighted out-degree, WOD_c(v_j) is the full-order weighted out-degree, v_exit is the exit node, and succ(v_j) is the set of successor nodes of v_j.
In the embodiment of the present invention, taking the directed acyclic graph shown in Fig. 2 as an example, node 0 has two successor nodes (node 1 and node 2); the only successor node of node 1 and of node 2 is node 3; node 3 has no successor nodes. Taking node 0 as an example, if the first-order weighted out-degree is used, only node 1 and node 2, which are directly associated with node 0, need to be considered; if the second-order weighted out-degree is used, node 3, which is directly associated with node 1 and node 2, needs to be considered in addition to node 1 and node 2.
As shown in Fig. 2, the successor nodes of node 0 are node 1 and node 2, and the in-degree of each is 1:
Figure BDA0002414759100000091
The only successor node of node 1 is node 3, and its in-degree is 2:
Figure BDA0002414759100000092
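The first-order calculation can be sketched in Python. This is a minimal illustration that assumes the first-order weighted out-degree of a node is the sum of the reciprocals of its successors' in-degrees, since the formula images are not reproduced in this text. SUCC, IN_DEGREE and first_order_wod are hypothetical names.

```python
# Example graph of Fig. 2: successors and in-degrees of nodes 0..3.
SUCC = {0: [1, 2], 1: [3], 2: [3], 3: []}
IN_DEGREE = {0: 0, 1: 1, 2: 1, 3: 2}


def first_order_wod(node, succ=SUCC, in_degree=IN_DEGREE):
    """Assumed first-order WOD: sum of 1 / in-degree over the direct successors.

    Only the direct successors of `node` are visited; a successor always has
    an in-degree of at least 1, so the division is safe.
    """
    return sum(1.0 / in_degree[s] for s in succ[node])


for v in sorted(SUCC):
    print(v, first_order_wod(v))
# Under this assumption: node 0 -> 2.0, nodes 1 and 2 -> 0.5, node 3 -> 0
```

Node 0 scores highest because completing it releases two tasks that depend on nothing else, whereas completing node 1 satisfies only half of node 3's dependencies.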
The second-order weighted out-degree formula additionally incorporates information about the successors of the current node's successor nodes, as shown in formula 2. The second-order weighted out-degree of node v_i is obtained by finding all successor nodes v_j of v_i and accumulating the reciprocals of their in-degrees, while also finding the successor nodes v_k of each v_j and accumulating the reciprocals of their in-degrees scaled by the factor α. For example, take the coefficient α = 0.5. The successor nodes of node 0 are node 1 and node 2. Node 1 has an in-degree of 1 and its successor node is node 3, whose in-degree is 2; node 2 has an in-degree of 1 and its successor node is node 3, whose in-degree is 2:
Second order:
Figure BDA0002414759100000093
Figure BDA0002414759100000094
Further, the full-order WOD value is calculated in the same manner, continuing until the final end node; since the end node is the exit node and has no successor nodes, its WOD value is defined as 0.
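The second-order and full-order variants can be sketched in the same style. The additive grouping of the α term and the recursive definition of the full-order value are assumptions drawn from the prose description, not the patent's confirmed formulas. ALPHA, second_order_wod and full_order_wod are hypothetical names.

```python
from functools import lru_cache

# Example graph of Fig. 2 and the coefficient used in the text's example.
SUCC = {0: [1, 2], 1: [3], 2: [3], 3: []}
IN_DEGREE = {0: 0, 1: 1, 2: 1, 3: 2}
ALPHA = 0.5


def second_order_wod(node):
    """Assumed form: reciprocal in-degrees of successors, plus ALPHA times the
    reciprocal in-degrees of the successors' own successors."""
    total = 0.0
    for j in SUCC[node]:
        total += 1.0 / IN_DEGREE[j]
        total += ALPHA * sum(1.0 / IN_DEGREE[k] for k in SUCC[j])
    return total


@lru_cache(maxsize=None)
def full_order_wod(node):
    """Assumed recursive form; the exit node has no successors, so its value is 0."""
    return sum(1.0 / IN_DEGREE[j] + ALPHA * full_order_wod(j) for j in SUCC[node])


print(second_order_wod(0))  # 2.5 under the assumed form
print(full_order_wod(0))    # 2.5 here as well, because the graph is only three levels deep
```

The memoisation in full_order_wod means each node is evaluated once even when several predecessors share it, as node 3 is shared by nodes 1 and 2.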
S103, sorting the weighted out-degrees, and determining the priority order of the tasks based on the sorting result;
In the embodiment of the present invention, the node out-degree priority scheduling algorithm takes the weighted out-degree of a node as the priority of the corresponding task during scheduling: all ready tasks are sorted by weighted out-degree, and a node with a higher weighted out-degree is scheduled for execution earlier. The WOD in the algorithm may be the first-order, second-order or full-order WOD. The scheduling process and the priority calculation process can be decoupled: the WOD values can be calculated before scheduling starts (for a static directed acyclic graph) or calculated dynamically at run time (for a dynamic directed acyclic graph).
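Ordering the ready tasks by weighted out-degree can be sketched with a heap. The tie-breaking rule (lower task id first) and the sample WOD values are assumptions made for illustration; order_ready_tasks is a hypothetical name.

```python
import heapq


def order_ready_tasks(ready, wod):
    """Yield ready tasks from highest to lowest weighted out-degree.

    heapq is a min-heap, so the WOD is negated; ties fall back to the task id.
    """
    heap = [(-wod[task], task) for task in ready]
    heapq.heapify(heap)
    while heap:
        _, task = heapq.heappop(heap)
        yield task


# Hypothetical ready set and precomputed WOD values.
sample_wod = {1: 0.5, 2: 0.5, 5: 2.0}
print(list(order_ready_tasks([1, 2, 5], sample_wod)))  # [5, 1, 2]
```

Because the WOD values are passed in as a plain mapping, the same routine works whether they were computed before scheduling or updated at run time.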
S104, according to the priority order, selecting a target processor for each task from among the processors to complete the scheduling request.
In the prior art, a specific processor is called according to the real-time state information of the processors in order to allocate processor resources to each task, but the busy/idle status of the physical network is ignored when allocating processors. If the system selects the most appropriate processor in the processor selection phase by earliest completion time, the earliest completion time estimate may be inaccurate when there is a large amount of communication. To eliminate this potential risk as far as possible, obtain better performance and use processor resources reasonably, the embodiment of the present invention selects the target processor with the shortest completion time to complete the calculation as follows:
First, it is judged whether an idle processor exists among the processors; the judgment may be based on a corresponding state identifier, on the occupation percentage of the processor, or on another method. If so, the number of idle processors and the number of tasks are acquired, and it is further judged whether the number of idle processors is greater than the number of tasks. If so, the tasks are allocated to the idle processors according to the priority order, on the following principle: for each task, the target processor with the shortest completion time is selected from the corresponding completion times to complete the calculation, where the completion time of each task on each processor is known.
If not, the idle processors cannot complete the calculation of all the tasks; according to the priority order, a target processor with the shortest completion time is selected for each task from among the idle processors to complete the calculation. The selection process is the same as above and is not described again.
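The idle-processor branch can be sketched as follows. The sketch assumes that each idle processor receives at most one task in this phase and that the completion times are supplied as a nested mapping; assign_to_idle and the sample data are hypothetical.

```python
def assign_to_idle(tasks_by_priority, idle_processors, completion_time):
    """Assign tasks, in priority order, to the idle processor that finishes each soonest.

    Returns (assignment, leftover): leftover holds the tasks that could not be
    placed because the idle processors ran out.
    """
    idle = set(idle_processors)
    assignment = {}
    leftover = []
    for task in tasks_by_priority:
        if not idle:
            leftover.append(task)
            continue
        # Shortest completion time wins; the processor name breaks ties.
        best = min(idle, key=lambda p: (completion_time[task][p], p))
        assignment[task] = best
        idle.remove(best)
    return assignment, leftover


# Hypothetical data: three tasks already sorted by priority, two idle processors.
times = {
    "t0": {"p0": 4, "p1": 2},
    "t1": {"p0": 3, "p1": 1},
    "t2": {"p0": 5, "p1": 5},
}
print(assign_to_idle(["t0", "t1", "t2"], ["p0", "p1"], times))
# ({'t0': 'p1', 't1': 'p0'}, ['t2'])
```

When there are at least as many idle processors as tasks, leftover comes back empty, which corresponds to the first case in the text; otherwise the leftover tasks go on to the remaining processors.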
When the idle processors have all been allocated, the remaining tasks are allocated to the remaining processors for calculation. If no idle processor exists, a target processor is selected for each task as follows:
In the priority order, for each task, the earliest completion time on each processor is determined based on
Figure BDA0002414759100000101
EFT_conflict(v_i, p_j) = EST_conflict(v_i, p_j) + w_{i,j}    (5)
where T_ava(p_j) is the available time of processor p_j, AFT refers to the actual start time of the task, v_m and v_i are nodes, c_{m,i} is the communication time between task v_m and task v_i, w_{i,j} is the computation overhead of task v_i on processor p_j, EST_conflict(v_i, p_j) is the earliest start time of task v_i on processor p_j, and EFT_conflict(v_i, p_j) is the earliest completion time of task v_i on processor p_j;
and, for each task, the target processor with the shortest completion time is selected from the corresponding completion times to complete the calculation.
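Processor selection by earliest completion time can be sketched in Python. The earliest-start-time formula is an image that is not reproduced in this text, so the sketch assumes a HEFT-style form: the start time is the later of the processor's available time and the latest arrival of data from any predecessor. Here aft[m] is taken to be the time at which the result of predecessor m is available, and link conflicts are not modelled. All function and variable names are hypothetical.

```python
def earliest_finish(task, proc, preds, aft, comm, t_ava, w):
    """Assumed EFT: EST + w, with EST = max(T_ava(proc), max(AFT(m) + c(m, task)))."""
    data_ready = max((aft[m] + comm[(m, task)] for m in preds[task]), default=0.0)
    est = max(t_ava[proc], data_ready)
    return est + w[(task, proc)]


def select_processor(task, procs, preds, aft, comm, t_ava, w):
    """Pick the processor with the smallest earliest completion time (name breaks ties)."""
    return min(
        procs,
        key=lambda p: (earliest_finish(task, p, preds, aft, comm, t_ava, w), p),
    )


# Hypothetical data for node 3 of the example graph, whose predecessors are nodes 1 and 2.
preds = {3: [1, 2]}
aft = {1: 4.0, 2: 5.0}                 # times at which the predecessors' results exist
comm = {(1, 3): 1.0, (2, 3): 2.0}      # communication times c_{m,i}
t_ava = {"p0": 6.0, "p1": 9.0}         # processor available times
w = {(3, "p0"): 5.0, (3, "p1"): 1.0}   # computation overheads w_{i,j}

print(earliest_finish(3, "p0", preds, aft, comm, t_ava, w))  # 12.0
print(earliest_finish(3, "p1", preds, aft, comm, t_ava, w))  # 10.0
print(select_processor(3, ["p0", "p1"], preds, aft, comm, t_ava, w))  # p1
```

In this example p1 becomes available later than p0 but still finishes first, which is why the comparison is made on completion time rather than on start time.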
In the embodiment of the invention, the scheduling algorithm DONF (degree of node first) is based on the weighted out-degree of task nodes, and two variant strategies (second-order and full-order DONF) are derived from it to take more local and global information in the abstract program model into account. The DONF algorithm takes full account of the characteristics of the data flow program execution model and of heterogeneous systems. On the one hand, in the data flow program execution model the task granularity is small and the dependency relationships among tasks are more complex; the DONF scheduling algorithm simplifies the task selection logic, selects the tasks to schedule at lower cost and avoids traversing the program's directed acyclic graph, so that it can handle more complex situations such as dynamic graph scheduling. On the other hand, different hardware in a heterogeneous system differs greatly and communication plays a more important role in task scheduling; the DONF algorithm considers communication link conflicts in the processor selection stage and constructs a new communication model for task scheduling.
In the embodiment of the invention, in the task scheduling problem to be processed, an application program is represented by a directed acyclic graph G = (V, E), where V is the set of nodes and E is the set of edges. Nodes represent specific computational tasks, and edges represent data and control dependencies between different tasks. In the abstract machine model, a plurality of heterogeneous processors form computer nodes through board-level interconnection, and the computer nodes are connected into a computing cluster through a network. The scheduling of a parallel program on the heterogeneous fusion system comprises two stages: task selection (or task priority calculation) and processor selection. Task selection calculates the priority of each task according to the task attributes and selects the task to be scheduled from all candidate tasks; processor selection selects the best processor for the scheduled task.
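The graph model G = (V, E) can be held in two small maps, one for successors and one for in-degrees. The following is a minimal sketch; the function name `build_dag` and the edge-list input format are assumptions made for illustration and do not come from the patent.

```python
from collections import defaultdict


def build_dag(edges):
    """Build successor lists and in-degrees from (src, dst) dependency edges.

    An edge (src, dst) means task dst depends on task src.
    """
    succ = defaultdict(list)
    in_degree = defaultdict(int)
    for src, dst in edges:
        succ[src].append(dst)
        in_degree[dst] += 1
        # Make sure every node appears in both maps, even with no edges of that kind.
        in_degree.setdefault(src, 0)
        succ.setdefault(dst, [])
    return dict(succ), dict(in_degree)


# Diamond-shaped example: a feeds b and c, which both feed d.
succ, in_degree = build_dag([("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")])
entry_nodes = [v for v, d in in_degree.items() if d == 0]
```

Nodes with in-degree 0 are the entry tasks, which are the only tasks that can be scheduled before anything else has run.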
In the embodiment of the present invention, an example based on the foregoing scheduling method is given. The processor and cluster configuration is shown in table 1. There are 3 types of processors: small processors, with a computing speed of 10 GFlops, 1 GB RAM, 1085 MB/s memory bandwidth and a 1562.5 MB/s network I/O port; medium processors, with a computing speed of 100 GFlops, 1 GB RAM, 1310 MB/s memory bandwidth and a 3125 MB/s network port; and large processors, with a computing speed of 1 TFlops, 2 GB RAM, 1310 MB/s memory bandwidth and a 3125 MB/s network port.
Figure BDA0002414759100000121
Figure BDA0002414759100000122
The overall flow of task execution is shown in fig. 3 and is as follows. The global clock (Timer) records the timing information of program execution during the simulation. Based on the system configuration, the runtime maintains 3 important data structures: a waiting list (PendingList), a ready queue (ReadyQueue), and an execution queue (ExecutionQueue). The waiting list stores the number of unsatisfied dependencies of every task. Once the number of unsatisfied dependencies of a task drops to 0, the task is inserted into the ready queue and its state changes to "ready". The ready queue contains all ready tasks during program execution. The execution queue stores all nodes being executed together with their completion times; these task nodes are all in the "executing" state. The entire simulation pipeline can be described by the following steps:
Initialization: the entry node is added to the ready queue.
S1: select a task from the ready queue according to a preset rule, where the preset rule depends on the scheduling strategy;
S2: select a processor to execute the selected task according to the method defined by the scheduling strategy;
S3: start execution: add the selected task to the execution queue, calculate and record its completion time, and update the states of the processor and the network links;
S4: calculate the next decision time point, update the global timer, and jump to S1 or S5 accordingly;
S5: when a task completes, update the states of the corresponding processor and network links and decrement the number of unsatisfied dependencies of all its successor tasks; if the number of unsatisfied dependencies of a task drops to 0, add that task to the ready queue and then reset its number of unsatisfied dependencies;
S6: if both queues are empty and the required number of iterations has been completed, end the simulation and output a simulation report.
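The pipeline above can be sketched as a small event-driven loop. This is a minimal illustration and not the patented implementation: it assumes a single iteration, ignores network links, and uses hypothetical names (`simulate`, `priority`, `cost`). The processor choice in S2 is a placeholder that simply takes the lowest-numbered idle processor.

```python
import heapq


def simulate(succ, in_degree, cost, num_procs, priority):
    """Return task -> (processor, start, finish) for one pass over the DAG."""
    pending = dict(in_degree)                          # PendingList: unsatisfied dependencies
    ready = [t for t, d in pending.items() if d == 0]  # ReadyQueue, seeded with entry nodes
    executing = []                                     # ExecutionQueue: heap of (finish, task, proc)
    idle = list(range(num_procs))
    timer = 0.0                                        # global Timer
    schedule = {}
    while ready or executing:
        while ready and idle:
            # S1: highest priority first; max() keeps the earliest arrival on ties.
            task = max(ready, key=lambda t: priority[t])
            ready.remove(task)
            proc = idle.pop(0)                         # S2: placeholder processor policy
            finish = timer + cost[task]                # S3: record the completion time
            heapq.heappush(executing, (finish, task, proc))
            schedule[task] = (proc, timer, finish)
        # S4: the next decision point is the earliest completion.
        timer, done, proc = heapq.heappop(executing)
        # S5: free the processor and release successors whose dependencies are met.
        idle.append(proc)
        idle.sort()
        for s in succ.get(done, []):
            pending[s] -= 1
            if pending[s] == 0:
                ready.append(s)
    return schedule                                    # S6: both queues are empty


succ = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
in_degree = {"a": 0, "b": 1, "c": 1, "d": 2}
cost = {"a": 1, "b": 2, "c": 3, "d": 1}
priority = {"a": 2, "b": 1, "c": 1, "d": 0}
schedule = simulate(succ, in_degree, cost, num_procs=2, priority=priority)
```

Keeping the execution queue as a heap ordered by completion time makes the "next decision time point" of S4 a single pop.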
At each step, the task with the largest WOD value is selected from the ready queue; the EFT_conflict value of that task on every processor is then computed, and the task is assigned to the processor with the minimum EFT_conflict value. If several tasks share the same maximum WOD value, the task that entered the ready queue earliest is scheduled first, to ensure fairness of scheduling and avoid task starvation. If several processors share the same minimum EFT_conflict value, the algorithm chooses one of them at random, but an idle processor (or the least loaded processor) is preferred in the process.
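The two tie-breaking rules can be illustrated as follows. This is a sketch under stated assumptions: the ready queue is a list kept in arrival order, `load` is a hypothetical per-processor load measure (0 for an idle processor), and the function names are invented for the example.

```python
import random


def pick_task(ready, wod):
    """Pick the task with the largest WOD; the earliest arrival wins ties."""
    best = max(wod[t] for t in ready)
    return next(t for t in ready if wod[t] == best)


def pick_processor(eft, load, rng=random):
    """Pick the processor with the smallest EFT value.

    Among tied processors, the least loaded ones are preferred, and any
    remaining tie is broken at random.
    """
    best = min(eft.values())
    tied = [p for p, v in eft.items() if v == best]
    least = min(load[p] for p in tied)
    tied = [p for p in tied if load[p] == least]
    return rng.choice(tied)


task = pick_task(["t1", "t2", "t3"], {"t1": 1.0, "t2": 2.5, "t3": 2.5})
proc = pick_processor({"p0": 5, "p1": 3, "p2": 3}, {"p0": 0, "p1": 2, "p2": 0})
```

In this example t2 and t3 tie on WOD and t2 arrived first; p1 and p2 tie on completion time and p2 carries less load.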
Based on the foregoing task scheduling method for the heterogeneous fusion system, an embodiment of the present invention further provides a task scheduling apparatus for the heterogeneous fusion system. A structural block diagram of the scheduling apparatus is shown in fig. 4; the apparatus includes:
an acquisition module 201, a calculation module 202, a priority determination module 203 and a selection and calculation module 204.
Wherein,
the obtaining module 201 is configured to obtain, when a scheduling request is received, the directed acyclic graph corresponding to the tasks in the scheduling request, where each node in the directed acyclic graph corresponds to one task;
the calculating module 202 is configured to calculate, for each node in the directed acyclic graph, the corresponding weighted out-degree from the successor nodes having a dependency relationship with the node;
the priority determining module 203 is configured to determine the priority order of the tasks according to the weighted out-degrees;
the selecting and calculating module 204 is configured to select, according to the priority order, for each task the target processor with the shortest completion time from among the processors to complete the calculation.
The invention discloses a task scheduling device for a heterogeneous fusion system, which operates as follows: when a scheduling request is received, the directed acyclic graph corresponding to the tasks in the scheduling request is acquired, wherein each node in the directed acyclic graph corresponds to one task; for each node in the directed acyclic graph, its weighted out-degree is calculated from the successor nodes that have a dependency relationship with it, to obtain the weighted out-degree of every node; the weighted out-degrees are sorted and the priority order of the tasks is determined based on the sorting result; and, according to the priority order, a target processor is selected for each task from among the processors to complete the scheduling request. In this scheduling device, determining the priority of a task requires computation only over the successor nodes that depend on it, and does not require traversing all the nodes in the directed acyclic graph, which reduces the amount of calculation.
In this embodiment of the present invention, the calculating module 202 includes:
a node in-degree determination unit 205 and an acquisition and calculation unit 206.
Wherein,
the node in-degree determining unit 205 is configured to determine the in-degree of each node according to the directed acyclic graph;
the obtaining and calculating unit 206 is configured to obtain, for each node, the in-degree of each successor node having a dependency relationship with the node, and to calculate the weighted out-degree according to
Figure BDA0002414759100000141
or
Figure BDA0002414759100000142
or
Figure BDA0002414759100000143
where ID(v_j) is the in-degree of node v_j,
Figure BDA0002414759100000151
is a quantity associated with node v_j, α is the second-order degree factor of the node, WOD(v_j) is the first-order weighted out-degree, WOD_2(v_j) is the second-order weighted out-degree, WOD_c(v_j) is the full-order weighted out-degree, v_exit is the exit node, and succ(v_j) is the set of successor nodes of v_j.
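As an illustration of how a weighted out-degree can be derived from successor in-degrees, the sketch below assumes that the first-order value of a node is the sum of 1/ID(s) over its successors s. This form is an assumption: the formulas in the source are images and are not reproduced here, and the second-order and full-order variants are not attempted. The function name `weighted_out_degree` is hypothetical.

```python
def weighted_out_degree(succ, in_degree):
    """First-order weighted out-degree, ASSUMED to be sum(1 / ID(s)) over successors s.

    Only the direct successors of each node are visited, so no traversal of
    the whole graph is needed to score a single node.
    """
    return {v: sum(1.0 / in_degree[s] for s in succ[v]) for v in succ}


succ = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
in_degree = {"a": 0, "b": 1, "c": 1, "d": 2}
wod = weighted_out_degree(succ, in_degree)

# Higher weighted out-degree means higher priority; sorted() is stable,
# so nodes with equal values keep their original relative order.
order = sorted(succ, key=lambda v: -wod[v])
```

Under this assumption a successor with many other predecessors contributes less, because finishing the current node unlocks only a fraction of it.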
In this embodiment of the present invention, the selecting and calculating module 204 includes:
a first judgment unit 207, a second judgment unit 208 and a first selection unit 209.
Wherein,
the first determining unit 207 is configured to determine whether an idle processor exists among the processors;
the second determining unit 208 is configured to determine, if idle processors exist, whether the number of idle processors is sufficient to complete the calculation of every task;
the first selecting unit 209 is configured to, if so, select for each task, according to the priority order, the target processor with the shortest completion time from among the idle processors to complete the calculation.
In this embodiment of the present invention, the selecting and calculating module 204 further includes: a calculation and distribution unit 210.
Wherein,
the calculating and allocating unit 210 is configured to, if not, allocate tasks to the idle processors for calculation according to the priority order and, once all the idle processors have been allocated, allocate the remaining unallocated tasks to the remaining processors for calculation.
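The idle-first behaviour of these units can be sketched as below. This is one possible reading, not the patented logic: it assumes each idle processor takes one task, ignores communication, and uses hypothetical inputs (`cost[task][proc]` for execution time, `avail[proc]` for the time a processor becomes free).

```python
def assign(tasks, cost, avail, now=0.0):
    """Assign tasks in priority order, preferring idle processors.

    tasks: task names, highest priority first.
    Returns task -> (processor, finish time).
    """
    avail = dict(avail)  # work on a copy so the caller's map is untouched
    result = {}
    for task in tasks:
        idle = [p for p in avail if avail[p] <= now]
        # Use idle processors while any remain, then fall back to all processors.
        pool = idle if idle else list(avail)
        proc = min(pool, key=lambda p: max(avail[p], now) + cost[task][p])
        finish = max(avail[proc], now) + cost[task][proc]
        avail[proc] = finish  # the processor is now busy until this task finishes
        result[task] = (proc, finish)
    return result


cost = {
    "t1": {"p0": 3, "p1": 2, "p2": 1},
    "t2": {"p0": 2, "p1": 1, "p2": 1},
    "t3": {"p0": 5, "p1": 2, "p2": 1},
}
avail = {"p0": 0, "p1": 0, "p2": 4}  # p2 is busy until time 4
result = assign(["t1", "t2", "t3"], cost, avail)
```

Here t1 and t2 take the two idle processors, and t3, which finds none left, goes to whichever processor finishes it soonest.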
In this embodiment of the present invention, the selecting and calculating module 204 further includes: a determination unit 211 and a second selection unit 212.
Wherein,
the determining unit 211 is configured to, if there is no idle processor, determine the earliest completion time of each task on each processor, in the priority order, according to
Figure BDA0002414759100000152
and
EFT_conflict(v_i, p_j) = EST_conflict(v_i, p_j) + w_{i,j}, where T_ava(p_j) is the available time of processor p_j, AFT refers to the actual start time of the task, v_m and v_i are nodes, c_{m,i} is the communication time between task v_m and task v_i, w_{i,j} is the computation cost of task v_i on processor p_j, EST_conflict(v_i, p_j) is the earliest start time of task v_i on processor p_j, and EFT_conflict(v_i, p_j) is the earliest completion time of task v_i on processor p_j;
the second selecting unit 212 is configured to, for each task, select a target processor with the shortest completion time from the corresponding completion times to complete the calculation.
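A completion-time calculation of this kind can be sketched as follows. Only the relation EFT = EST + w is taken from the text; the start-time rule is an assumption, because the source gives it as an image. The sketch uses a common list-scheduling form in which a task starts once the processor is available and the data of every predecessor has arrived. It treats AFT(v_m) as the time at which the result of v_m is available, sets communication time to zero when both tasks are on the same processor, and does not model link conflicts. All names are hypothetical.

```python
def est(task, proc, preds, aft, placed_on, comm, t_ava):
    """Earliest start time of task on proc (ASSUMED form, no link-conflict model)."""
    data_ready = max(
        (
            aft[m] + (0.0 if placed_on[m] == proc else comm[(m, task)])
            for m in preds.get(task, [])
        ),
        default=0.0,  # an entry task has no predecessors to wait for
    )
    return max(t_ava[proc], data_ready)


def eft(task, proc, preds, aft, placed_on, comm, t_ava, w):
    """Earliest completion time: EFT = EST + w[(task, proc)]."""
    return est(task, proc, preds, aft, placed_on, comm, t_ava) + w[(task, proc)]


preds = {"v3": ["v1", "v2"]}
aft = {"v1": 4, "v2": 6}
placed_on = {"v1": "p0", "v2": "p1"}
comm = {("v1", "v3"): 3, ("v2", "v3"): 1}
t_ava = {"p0": 5, "p1": 6}
w = {("v3", "p0"): 2, ("v3", "p1"): 4}

times = {p: eft("v3", p, preds, aft, placed_on, comm, t_ava, w) for p in t_ava}
target = min(times, key=times.get)  # processor with the shortest completion time
```

In the example both processors could start v3 at time 7, so the lower computation cost on p0 decides the choice.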
Each embodiment is described with emphasis on its differences from the other embodiments, and for the same or similar parts the embodiments may be referred to one another. Since the device embodiment is basically similar to the method embodiment, it is described briefly; for relevant details, refer to the corresponding description of the method embodiment.
Finally, it should also be noted that, in this document, relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
For convenience of description, the above devices are described as being divided into various units by function, respectively. Of course, the functions of the units may be implemented in the same software and/or hardware or in a plurality of software and/or hardware when implementing the invention.
From the above description of the embodiments, it is clear to those skilled in the art that the present invention can be implemented by software together with a necessary general-purpose hardware platform. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which may be stored in a storage medium, such as ROM/RAM, a magnetic disk, or an optical disk, and which includes instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments.
The task scheduling method and device for the heterogeneous fusion system provided by the invention have been described in detail above. A specific example is used in the description to explain the principle and the implementation of the invention, and the description of the above embodiments is only intended to help in understanding the method and the core idea of the invention. Meanwhile, a person skilled in the art may, following the idea of the present invention, vary the specific embodiments and the scope of application. In summary, the content of the present specification should not be construed as a limitation of the present invention.

Claims (10)

1. A task scheduling method for a heterogeneous fusion system, characterized by comprising the following steps:
when a scheduling request is received, acquiring a directed acyclic graph corresponding to each task in the scheduling request, wherein each node in the directed acyclic graph corresponds to each task;
aiming at each node in the directed acyclic graph, calculating the corresponding weighted out-degree through the successor node with the dependency relationship to obtain each weighted out-degree;
sequencing the weighted out-degrees, and determining the priority sequence of each task based on the sequencing result;
and according to the priority sequence, respectively selecting a target processor for each task in each processor to complete the scheduling request.
2. The method according to claim 1, wherein for each node in the directed acyclic graph, calculating a weighted out-degree corresponding to each node according to the successor nodes having a dependency relationship with the node, comprises:
determining the node in-degree of each node based on the directed acyclic graph;
aiming at each node, acquiring the node in-degree of a successor node with a dependency relationship with the node, and calculating the corresponding weighted out-degree of the successor node with the dependency relationship with the node according to a target weighted out-degree calculation formula;
the target weighted out-degree formula is
Figure FDA0002414759090000011
or
Figure FDA0002414759090000012
or
Figure FDA0002414759090000013
wherein ID(v_j) is the in-degree of node v_j,
Figure FDA0002414759090000014
is a quantity associated with node v_j, α is the second-order degree factor of the node, WOD(v_j) is the first-order weighted out-degree, WOD_2(v_j) is the second-order weighted out-degree, WOD_c(v_j) is the full-order weighted out-degree, v_exit is the exit node, and succ(v_j) is the set of successor nodes of v_j.
3. The method of claim 1, wherein selecting a target processor for each task in each processor to fulfill the scheduling request according to the priority order comprises:
judging whether an idle processor exists in each processor;
if yes, judging whether the number of the idle processors can finish the calculation of each task;
if so, respectively acquiring the earliest completion time of each task in each idle processor according to the priority sequence, taking the processor corresponding to the shortest time in each earliest completion time as a target processor, and scheduling each task to the corresponding target processor to complete the scheduling request.
4. The method of claim 3, further comprising:
if not, scheduling tasks to the idle processors for calculation according to the priority order and, once all the idle processors have been scheduled, allocating the remaining unallocated tasks to the remaining processors for calculation.
5. The method of claim 3, further comprising:
if no idle processor exists, determining, in the priority order, the earliest completion time of each task on each processor according to a completion time formula;
the completion time formula is
Figure FDA0002414759090000021
and
EFT_conflict(v_i, p_j) = EST_conflict(v_i, p_j) + w_{i,j}
wherein T_ava(p_j) is the available time of processor p_j, AFT refers to the actual start time of the task, v_m and v_i are nodes, c_{m,i} is the communication time between task v_m and task v_i, w_{i,j} is the computation cost of task v_i on processor p_j, EST_conflict(v_i, p_j) is the earliest start time of task v_i on processor p_j, and EFT_conflict(v_i, p_j) is the earliest completion time of task v_i on processor p_j;
and aiming at each task, taking the processor corresponding to the shortest time in the earliest completion time as a target processor, and scheduling each task to the corresponding target processor to complete the scheduling request.
6. A task scheduling device for a heterogeneous fusion system, characterized by comprising:
an acquisition module, configured to acquire, when a scheduling request is received, the directed acyclic graph corresponding to the tasks in the scheduling request, wherein each node in the directed acyclic graph corresponds to one task;
a calculation module, configured to calculate, for each node in the directed acyclic graph, the corresponding weighted out-degree from the successor nodes having a dependency relationship with the node, to obtain the weighted out-degree of every node;
a priority determining module, configured to sort the weighted out-degrees and determine the priority order of the tasks based on the sorting result;
and a selecting and calculating module, configured to select, according to the priority order, a target processor for each task from among the processors to complete the scheduling request.
7. The apparatus of claim 6, wherein the computing module comprises:
a node in-degree determining unit, configured to determine the in-degree of each node based on the directed acyclic graph;
an acquisition and calculation unit, configured to acquire, for each node, the in-degree of each successor node having a dependency relationship with the node, and to calculate the corresponding weighted out-degree according to a target weighted out-degree calculation formula;
the target weighted out-degree formula is
Figure FDA0002414759090000031
or
Figure FDA0002414759090000032
or
Figure FDA0002414759090000033
wherein ID(v_j) is the in-degree of node v_j,
Figure FDA0002414759090000034
is a quantity associated with node v_j, α is the second-order degree factor of the node, WOD(v_j) is the first-order weighted out-degree, WOD_2(v_j) is the second-order weighted out-degree, WOD_c(v_j) is the full-order weighted out-degree, v_exit is the exit node, and succ(v_j) is the set of successor nodes of v_j.
8. The apparatus of claim 6, wherein the selecting and calculating module comprises:
a first judging unit, configured to judge whether an idle processor exists in each processor;
a second judging unit, configured to judge whether the number of idle processors can complete the calculation for each task if the idle processors exist;
and the first selection unit is used for respectively acquiring the earliest completion time of each task in each idle processor according to the priority sequence if the task can be completed, taking the processor corresponding to the shortest time in each earliest completion time as a target processor, and scheduling each task to the corresponding target processor to complete the scheduling request.
9. The apparatus of claim 8, further comprising:
a calculating and allocating unit, configured to, if not, schedule tasks to the idle processors for calculation according to the priority order and, once all the idle processors have been scheduled, allocate the remaining unallocated tasks to the remaining processors for calculation.
10. The apparatus of claim 8, further comprising:
a determining unit, configured to, if no idle processor exists, determine, in the priority order, the earliest completion time of each task on each processor according to a completion time formula;
the completion time formula is
Figure FDA0002414759090000041
and
EFT_conflict(v_i, p_j) = EST_conflict(v_i, p_j) + w_{i,j}
wherein T_ava(p_j) is the available time of processor p_j, AFT refers to the actual start time of the task, v_m and v_i are nodes, c_{m,i} is the communication time between task v_m and task v_i, w_{i,j} is the computation cost of task v_i on processor p_j, EST_conflict(v_i, p_j) is the earliest start time of task v_i on processor p_j, and EFT_conflict(v_i, p_j) is the earliest completion time of task v_i on processor p_j;
and the second selection unit is used for taking the processor corresponding to the shortest time in the earliest completion time as a target processor for each task, and scheduling each task to the corresponding target processor to complete the scheduling request.
CN202010187660.0A 2020-03-17 2020-03-17 Task scheduling method and device for heterogeneous fusion system Active CN111367644B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010187660.0A CN111367644B (en) 2020-03-17 2020-03-17 Task scheduling method and device for heterogeneous fusion system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010187660.0A CN111367644B (en) 2020-03-17 2020-03-17 Task scheduling method and device for heterogeneous fusion system

Publications (2)

Publication Number Publication Date
CN111367644A CN111367644A (en) 2020-07-03
CN111367644B true CN111367644B (en) 2023-03-14

Family

ID=71210501

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010187660.0A Active CN111367644B (en) 2020-03-17 2020-03-17 Task scheduling method and device for heterogeneous fusion system

Country Status (1)

Country Link
CN (1) CN111367644B (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant