CN111367644B

CN111367644B - Task scheduling method and device for heterogeneous fusion system

Info

Publication number: CN111367644B
Application number: CN202010187660.0A
Authority: CN
Inventors: 安虹; 林晗; 李明凡; 韩文廷; 林增; 陈俊仕
Original assignee: University of Science and Technology of China USTC
Current assignee: University of Science and Technology of China USTC
Priority date: 2020-03-17
Filing date: 2020-03-17
Publication date: 2023-03-14
Anticipated expiration: 2040-03-17
Also published as: CN111367644A

Abstract

The invention discloses a task scheduling method for a heterogeneous fusion system, which comprises the following steps: when a scheduling request is received, acquiring a directed acyclic graph corresponding to each task in the scheduling request, wherein each node in the directed acyclic graph corresponds to each task; aiming at each node in the directed acyclic graph, calculating the corresponding weighted out-degree through the successive node with the dependency relationship to obtain each weighted out-degree; sequencing the weighted out-degrees, and determining the priority order of each task based on the sequencing result; and according to the priority sequence, respectively selecting a target processor for each task in each processor to complete the scheduling request. In the scheduling method, in the process of determining the priority of each task, only the subsequent nodes with the dependency relationship in the directed acyclic graph need to be calculated, and all the nodes in the directed acyclic graph do not need to be traversed for calculation, so that the calculation amount is reduced.

Description

Task scheduling method and device for heterogeneous fusion system

Technical Field

The invention relates to the technical field of computers, in particular to a task scheduling method and device for a heterogeneous fusion system.

Background

The heterogeneous fusion system comprises different types of computing resources deployed locally or remotely, and task scheduling of the heterogeneous fusion system comprises two stages: task selection (or task priority calculation) and processor selection.

In the existing priority determination process, priorities are determined from the necessary attributes of each task together with a priority algorithm. The necessary attributes comprise: the amount of computation (processor resources consumed by the task), the amount of memory access (memory resources consumed by the task), the source node and the destination node (the task is started by the source node and finished by the destination node), and the amount of traffic (communication link resources consumed by the task). The priority algorithms include: the shortest-job-first algorithm, in which a shorter (less computationally intensive) task has a higher priority; and the critical path algorithm, which sets all tasks on the critical path (the path from the start node to the end node with the largest accumulated cost) to high priority. In determining the priority of each task, all nodes of the directed acyclic graph corresponding to the tasks need to be traversed for calculation, so the amount of calculation is large.

Disclosure of Invention

In view of this, the present invention provides a task scheduling method and device for a heterogeneous fusion system, so as to solve the problem that, in the existing process of determining task priorities, all nodes of the directed acyclic graph need to be traversed for calculation and the amount of calculation is therefore large. The specific scheme is as follows:

a task scheduling method for a heterogeneous fusion system comprises the following steps:

when a scheduling request is received, acquiring a directed acyclic graph corresponding to each task in the scheduling request, wherein each node in the directed acyclic graph corresponds to each task;

aiming at each node in the directed acyclic graph, calculating the corresponding weighted out-degree through the successive node with the dependency relationship to obtain each weighted out-degree;

sequencing the weighted out-degrees, and determining the priority order of each task based on the sequencing result;

and according to the priority sequence, respectively selecting a target processor for each task in each processor to complete the scheduling request.

Optionally, in the method, for each node in the directed acyclic graph, the corresponding weighted out-degree is calculated according to the successor node having a dependency relationship with the node, and the method includes:

determining the node in-degree of each node based on the directed acyclic graph;

aiming at each node, acquiring the node in-degree of a subsequent node with a dependency relationship, and calculating the corresponding weighted out-degree of the subsequent node with the dependency relationship according to a target weighted out-degree calculation formula;

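the target weighted out-degree formula is

$$\mathrm{WOD}(v_i) = \sum_{v_j \in \mathrm{succ}(v_i)} \frac{1}{\mathrm{ID}(v_j)}$$

or

$$\mathrm{WOD}_2(v_i) = \sum_{v_j \in \mathrm{succ}(v_i)} \left[ \frac{1}{\mathrm{ID}(v_j)} + \alpha \sum_{v_k \in \mathrm{succ}(v_j)} \frac{1}{\mathrm{ID}(v_k)} \right]$$

or

$$\mathrm{WOD}_c(v_i) = \begin{cases} 0, & v_i = v_{exit} \\ \sum_{v_j \in \mathrm{succ}(v_i)} \left[ \dfrac{1}{\mathrm{ID}(v_j)} + \alpha\, \mathrm{WOD}_c(v_j) \right], & \text{otherwise} \end{cases}$$

wherein ID(v_j) is the node in-degree of node v_j, α is the second-order out-degree factor of the node, WOD(v_i) is the first-order weighted out-degree, WOD_2(v_i) is the second-order weighted out-degree, WOD_c(v_i) is the full-order weighted out-degree, v_exit is the exit node, and succ(v_i) is the set of successor nodes of v_i.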

Optionally, the method for selecting a target processor for each task in each processor to complete the scheduling request includes:

judging whether an idle processor exists in each processor;

if yes, judging whether the number of the idle processors can finish the calculation of each task;

if so, respectively acquiring the earliest completion time of each task in each idle processor according to the priority sequence, taking the processor corresponding to the shortest time in each earliest completion time as a target processor, and scheduling each task to the corresponding target processor to complete the scheduling request.

The method described above, optionally, further includes:

if not, scheduling tasks to the idle processors for calculation according to the priority order, and, once all idle processors have been allocated, allocating the remaining unallocated tasks to the remaining processors for calculation.

The above method, optionally, further includes:

if no idle processor exists, determining the earliest completion time of each task in each processor according to a completion time formula according to the priority sequence;

the completion time formulas are

$$\mathrm{EST}_{conflict}(v_i, p_j) = \max\left\{ T_{ava}(p_j),\ \max_{v_m \in \mathrm{pred}(v_i)} \left( \mathrm{AFT}(v_m) + c_{m,i} \right) \right\}$$

and

$$\mathrm{EFT}_{conflict}(v_i, p_j) = \mathrm{EST}_{conflict}(v_i, p_j) + w_{i,j},$$

wherein T_ava(p_j) is the available time of processor p_j, AFT(v_m) is the actual finish time of task v_m, v_m and v_i are nodes, v_m being a predecessor of v_i, c_{m,i} is the communication time between task v_m and task v_i, w_{i,j} is the computation overhead of task v_i on processor p_j, EST_conflict(v_i, p_j) is the earliest start time of task v_i on processor p_j, and EFT_conflict(v_i, p_j) is the earliest completion time of task v_i on processor p_j;

and aiming at each task, taking the processor corresponding to the shortest time in the earliest completion time as a target processor, and scheduling each task to the corresponding target processor to complete the scheduling request.

A task scheduling device facing a heterogeneous convergence system comprises:

an acquisition module, configured to acquire, when a scheduling request is received, a directed acyclic graph corresponding to each task in the scheduling request, wherein each node in the directed acyclic graph corresponds to a task;

the calculation module is used for calculating the corresponding weighted out-degree of each node in the directed acyclic graph through the subsequent node with the dependency relationship to obtain each weighted out-degree;

the priority determining module is used for sequencing the weighted out-degrees and determining the priority sequence of each task based on the sequencing result;

and the selecting and calculating module is used for selecting a target processor for each task in each processor to complete the scheduling request according to the priority sequence.

The above apparatus, optionally, the calculating module includes:

a node in-degree determining unit, configured to determine the node in-degree of each node based on the directed acyclic graph;

the acquisition and calculation unit is used for acquiring the node in-degree of each node of the successor nodes with the dependency relationship, and calculating the weighted out-degree corresponding to the successor nodes with the dependency relationship according to a target weighted out-degree calculation formula;

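the target weighted out-degree formula is

$$\mathrm{WOD}(v_i) = \sum_{v_j \in \mathrm{succ}(v_i)} \frac{1}{\mathrm{ID}(v_j)}$$

or

$$\mathrm{WOD}_2(v_i) = \sum_{v_j \in \mathrm{succ}(v_i)} \left[ \frac{1}{\mathrm{ID}(v_j)} + \alpha \sum_{v_k \in \mathrm{succ}(v_j)} \frac{1}{\mathrm{ID}(v_k)} \right]$$

or

$$\mathrm{WOD}_c(v_i) = \begin{cases} 0, & v_i = v_{exit} \\ \sum_{v_j \in \mathrm{succ}(v_i)} \left[ \dfrac{1}{\mathrm{ID}(v_j)} + \alpha\, \mathrm{WOD}_c(v_j) \right], & \text{otherwise} \end{cases}$$

wherein ID(v_j) is the node in-degree of node v_j, α is the second-order out-degree factor of the node, WOD(v_i) is the first-order weighted out-degree, WOD_2(v_i) is the second-order weighted out-degree, WOD_c(v_i) is the full-order weighted out-degree, v_exit is the exit node, and succ(v_i) is the set of successor nodes of v_i.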

The above apparatus, optionally, the selecting and calculating module includes:

a first judging unit, configured to judge whether an idle processor exists in each processor;

a second judging unit, configured to judge whether the number of idle processors can complete the calculation of each task if the idle processors exist;

and the first selection unit is used for respectively acquiring the earliest completion time of each task in each idle processor according to the priority sequence if the task can be completed, taking the processor corresponding to the shortest time in each earliest completion time as a target processor, and scheduling each task to the corresponding target processor to complete the scheduling request.

The above apparatus, optionally, further comprises:

a calculating and distributing unit, configured to, if the idle processors cannot complete the calculation of every task, schedule tasks to the idle processors for calculation according to the priority order and, once all idle processors have been allocated, allocate the remaining unallocated tasks to the remaining processors for calculation.

The above apparatus, optionally, further comprises:

the determining unit is used for determining the earliest completion time of each task in each processor according to the completion time formula according to the priority order if no idle processor exists;

the completion time formulas are

$$\mathrm{EST}_{conflict}(v_i, p_j) = \max\left\{ T_{ava}(p_j),\ \max_{v_m \in \mathrm{pred}(v_i)} \left( \mathrm{AFT}(v_m) + c_{m,i} \right) \right\}$$

and

$$\mathrm{EFT}_{conflict}(v_i, p_j) = \mathrm{EST}_{conflict}(v_i, p_j) + w_{i,j},$$

wherein T_ava(p_j) is the available time of processor p_j, AFT(v_m) is the actual finish time of task v_m, v_m and v_i are nodes, v_m being a predecessor of v_i, c_{m,i} is the communication time between task v_m and task v_i, w_{i,j} is the computation overhead of task v_i on processor p_j, EST_conflict(v_i, p_j) is the earliest start time of task v_i on processor p_j, and EFT_conflict(v_i, p_j) is the earliest completion time of task v_i on processor p_j;

and the second selection unit is used for taking the processor corresponding to the shortest time in the earliest completion time as a target processor for each task, and scheduling each task to the corresponding target processor to complete the scheduling request.

Compared with the prior art, the invention has the following advantages:

the invention discloses a task scheduling method for a heterogeneous fusion system, which comprises the following steps: when a scheduling request is received, acquiring a directed acyclic graph corresponding to each task in the scheduling request, wherein each node in the directed acyclic graph corresponds to each task; aiming at each node in the directed acyclic graph, calculating the corresponding weighted out-degree through the successive node with the dependency relationship to obtain each weighted out-degree; sequencing the weighted out-degrees, and determining the priority order of each task based on the sequencing result; and according to the priority sequence, respectively selecting a target processor for each task in each processor to complete the scheduling request. In the scheduling method, only the subsequent nodes with the dependency relationship in the directed acyclic graph need to be calculated in the process of determining the priority of each task, and all the nodes in the directed acyclic graph do not need to be traversed for calculation, so that the calculation amount is reduced.

Of course, it is not necessary for any product in which the invention is practiced to achieve all of the above-described advantages at the same time.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments or the prior art descriptions will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.

Fig. 1 is a flowchart of a task scheduling method for a heterogeneous fusion system according to an embodiment of the present disclosure;

Fig. 2 is a schematic view of a directed acyclic graph according to an embodiment of the present disclosure;

Fig. 3 is a schematic diagram of a task execution flow disclosed in an embodiment of the present application;

Fig. 4 is a block diagram of a task scheduling device for a heterogeneous fusion system according to an embodiment of the present disclosure.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

The invention discloses a task scheduling method and device for a heterogeneous fusion system, which are applied to the task scheduling process of the heterogeneous fusion system, wherein the heterogeneous fusion system can be applied to High Performance Computing (HPC), cloud computing and deep learning. Typically, heterogeneous converged systems contain a range of different types of computing resources that can be deployed locally or remotely. The scheduling of parallel programs on a heterogeneous fusion system comprises two stages: task selection (or task priority calculation) and processor selection. The task selection calculates the priority of the task according to the task attribute, and selects the task to be scheduled from all the candidate tasks; processor selection selects the best processor for the scheduled task.

Research on scheduling algorithms for heterogeneous fusion systems is mainly divided into static scheduling and dynamic scheduling. In dynamic scheduling, the execution overhead, the communication overhead and the relationships between tasks are not known in advance, and all decisions are made at run time. In static scheduling, this information is known in advance. In general terms, static scheduling is compile-time scheduling, while dynamic scheduling is runtime scheduling. Static scheduling can be further divided into two categories: guided random-search-based scheduling and heuristic-based scheduling.

In the heterogeneous computation, in order to fully utilize the heterogeneous fusion system resources, a higher parallelism is preferably maintained in the scheduling process. Based on this assumption, the selected task should increase the overall parallelism of the program as much as possible, and the task with larger out-degree should be scheduled as early as possible to activate the execution of more subsequent tasks, so as to ensure that there is always enough parallelism in the program execution process and the processor resources are utilized more fully. The execution flow of the scheduling method is shown in fig. 1, and includes the steps of:

s101, when a scheduling request is received, acquiring a directed acyclic graph corresponding to each task in the scheduling request, wherein each node in the directed acyclic graph corresponds to each task;

In the embodiment of the present invention, when an overall task is scheduled, it is first decomposed into the tasks it contains. The decomposition may be performed according to experience or to the specific situation, and the specific decomposition process is not limited in the embodiment of the present invention. The tasks are represented by a directed acyclic graph G = (V, E), where V is the set of nodes and E is the set of edges. Nodes represent specific computational tasks, and edges represent data and control dependencies between different tasks. In the abstract machine model, a plurality of heterogeneous processors form computer nodes through board-level interconnection, and the computer nodes are connected into a computing cluster through a network. The abstract machine model comprises three levels: device, computer node and cluster. It summarizes the hardware levels from heterogeneous processors up to large-scale heterogeneous systems in a concise way and is both representative and general. Therefore, when a scheduling request for the tasks is received, the directed acyclic graph corresponding to the tasks is acquired.
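As an illustrative sketch only, the graph G = (V, E) can be held as successor and predecessor maps built from an edge list. The four-node graph below mirrors the example of fig. 2; the names `EDGES` and `build_graph` are assumptions introduced here and do not come from the disclosure.

```python
from collections import defaultdict

# Edges of the example graph of fig. 2: 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3.
EDGES = [(0, 1), (0, 2), (1, 3), (2, 3)]

def build_graph(edges):
    """Return (nodes, succ, pred) for a DAG given as (src, dst) pairs."""
    succ, pred = defaultdict(list), defaultdict(list)
    nodes = set()
    for src, dst in edges:
        succ[src].append(dst)   # dst depends on src
        pred[dst].append(src)
        nodes.update((src, dst))
    return sorted(nodes), dict(succ), dict(pred)

nodes, succ, pred = build_graph(EDGES)
print(nodes)
print(succ)
print(pred)
```

Keeping both maps is a convenience: successors drive the priority calculation, while predecessors are needed later when start times are estimated.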

S102, calculating corresponding weighted out-degrees of each node in the directed acyclic graph through the subsequent nodes with dependency relationship to obtain each weighted out-degree;

In the embodiment of the present invention, the node in-degree of each node is first determined according to the directed acyclic graph, where the node in-degree is the number of other nodes that the node depends on. Taking the directed acyclic graph shown in fig. 2 as an example, node 0 is the starting node and does not depend on any node, so its in-degree is 0; node 1 and node 2 depend only on node 0, so their in-degree is 1; node 3 depends on node 1 and node 2, so its in-degree is 2.
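The in-degree count for the graph of fig. 2 can be checked with a few lines. This is a minimal sketch; the function name `in_degrees` and the edge-list encoding are assumptions made for illustration.

```python
# Edges of the example graph of fig. 2 (src, dst): dst depends on src.
EDGES = [(0, 1), (0, 2), (1, 3), (2, 3)]

def in_degrees(edges):
    """Count, for every node, how many other nodes it depends on."""
    indeg = {n: 0 for edge in edges for n in edge}
    for _, dst in edges:
        indeg[dst] += 1
    return indeg

indeg = in_degrees(EDGES)
print(indeg)
```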

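For each node, the corresponding weighted out-degree can be first-order, second-order or full-order. The node in-degrees of the successor nodes having a dependency relationship with the node are obtained, where the dependency relationship considered depends on the order of the weighted out-degree. The weighted out-degree is calculated as follows:

$$\mathrm{WOD}(v_i) = \sum_{v_j \in \mathrm{succ}(v_i)} \frac{1}{\mathrm{ID}(v_j)} \tag{1}$$

$$\mathrm{WOD}_2(v_i) = \sum_{v_j \in \mathrm{succ}(v_i)} \left[ \frac{1}{\mathrm{ID}(v_j)} + \alpha \sum_{v_k \in \mathrm{succ}(v_j)} \frac{1}{\mathrm{ID}(v_k)} \right] \tag{2}$$

$$\mathrm{WOD}_c(v_i) = \begin{cases} 0, & v_i = v_{exit} \\ \sum_{v_j \in \mathrm{succ}(v_i)} \left[ \dfrac{1}{\mathrm{ID}(v_j)} + \alpha\, \mathrm{WOD}_c(v_j) \right], & \text{otherwise} \end{cases} \tag{3}$$

wherein ID(v_j) is the node in-degree of node v_j, α is the second-order out-degree factor of the node, WOD(v_i) is the first-order weighted out-degree, WOD_2(v_i) is the second-order weighted out-degree, WOD_c(v_i) is the full-order weighted out-degree, v_exit is the exit node, and succ(v_i) is the set of successor nodes of v_i.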

In the embodiment of the present invention, taking the directed acyclic graph shown in fig. 2 as an example, node 0 has two successor nodes (node 1 and node 2); the successor node of both node 1 and node 2 is node 3; node 3 has no successor nodes. Taking node 0 as an example, if the first-order weighted out-degree is calculated, only node 1 and node 2, which are directly associated with node 0, need to be considered; if the second-order weighted out-degree is calculated, node 3, which is directly associated with node 1 and node 2, needs to be considered in addition to node 1 and node 2.

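As shown in fig. 2, the successor nodes of node 0 are node 1 and node 2, each with an in-degree of 1:

$$\mathrm{WOD}(v_0) = \frac{1}{\mathrm{ID}(v_1)} + \frac{1}{\mathrm{ID}(v_2)} = 1 + 1 = 2$$

The only successor node of node 1 is node 3, whose in-degree is 2:

$$\mathrm{WOD}(v_1) = \frac{1}{\mathrm{ID}(v_3)} = \frac{1}{2}$$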
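The first-order values for the graph of fig. 2 can be reproduced with the following sketch, which sums the reciprocal in-degrees of each node's direct successors. The dictionaries `SUCC` and `INDEG` and the function name `first_order_wod` are illustrative assumptions.

```python
# Successor lists and in-degrees of the example graph of fig. 2.
SUCC = {0: [1, 2], 1: [3], 2: [3], 3: []}
INDEG = {0: 0, 1: 1, 2: 1, 3: 2}

def first_order_wod(succ, indeg):
    """First-order weighted out-degree: sum of 1 / in-degree over direct successors."""
    return {v: sum(1.0 / indeg[s] for s in succ[v]) for v in succ}

wod = first_order_wod(SUCC, INDEG)
print(wod)
```

Only the direct successors of a node are visited, so the cost for one node is proportional to its out-degree rather than to the size of the graph.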

The second-order weighted out-degree formula, shown in formula (2), additionally takes into account the successors of the current node's successor nodes. To calculate the second-order weighted out-degree of node v_i, all successor nodes v_j of v_i are found and the reciprocals of their in-degrees are accumulated; at the same time, the successor nodes v_k of each v_j are found and the reciprocals of their in-degrees are accumulated with the factor α. For example, take the coefficient α = 0.5. The successor nodes of node 0 are node 1 and node 2; node 1 has an in-degree of 1 and its successor node is node 3, whose in-degree is 2; node 2 has an in-degree of 1 and its successor node is node 3, whose in-degree is 2. The second-order weighted out-degree of node 0 is therefore:

$$\mathrm{WOD}_2(v_0) = \left( \frac{1}{1} + 0.5 \times \frac{1}{2} \right) + \left( \frac{1}{1} + 0.5 \times \frac{1}{2} \right) = 2.5$$

Further, the full-order WOD value is calculated in the same manner, with the accumulation continuing along the successor nodes until the final end node is reached. Since the end node is the exit node and has no successor node, its WOD value is defined as 0.
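A hedged sketch of the second-order and full-order calculations follows. It assumes the forms described in the text: each successor contributes the reciprocal of its in-degree, and the contribution of the next level is scaled by α; the full-order value applies this recursively and is 0 at the exit node. The names `wod1`, `wod2` and `wod_full` are illustrative.

```python
from functools import lru_cache

# Successor lists and in-degrees of the example graph of fig. 2.
SUCC = {0: [1, 2], 1: [3], 2: [3], 3: []}
INDEG = {0: 0, 1: 1, 2: 1, 3: 2}
ALPHA = 0.5  # second-order factor used in the worked example

def wod1(v):
    """First-order: reciprocal in-degrees of the direct successors."""
    return sum(1.0 / INDEG[s] for s in SUCC[v])

def wod2(v, alpha=ALPHA):
    """Second-order: adds the successors' own first-order values, scaled by alpha."""
    return sum(1.0 / INDEG[s] + alpha * wod1(s) for s in SUCC[v])

@lru_cache(maxsize=None)
def wod_full(v, alpha=ALPHA):
    """Full-order: recurses to the exit node, whose value is 0 (empty sum)."""
    return sum(1.0 / INDEG[s] + alpha * wod_full(s, alpha) for s in SUCC[v])

print(wod2(0), wod_full(0), wod_full(3))
```

On this small graph the second-order and full-order values of node 0 coincide because no path is longer than two edges; memoising the recursion keeps the full-order calculation linear in the number of edges.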

S103, sequencing the weighted out-degrees, and determining the priority order of each task based on the sequencing result;

In the embodiment of the invention, the node-out-degree-first scheduling algorithm takes the weighted out-degree of a node as the priority of the corresponding task during scheduling: among all ready tasks, the weighted out-degrees are sorted, and a node with a higher weighted out-degree is scheduled for execution earlier. The WOD used in the algorithm may be the first-order, second-order or full-order WOD. The scheduling process and the priority calculation process can be decoupled: the WOD values can be calculated before scheduling starts (for a static directed acyclic graph) or dynamically at run time (for a dynamic directed acyclic graph).
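The ordering step can be sketched as a descending sort on precomputed WOD values. The values below are the first-order results for the graph of fig. 2; the function name `priority_order` is an assumption. Python's `sorted` is stable, so tasks with equal WOD keep the order in which they were supplied.

```python
# First-order WOD values of the example graph of fig. 2.
WOD = {0: 2.0, 1: 0.5, 2: 0.5, 3: 0.0}

def priority_order(ready_tasks, wod):
    """Order ready tasks by descending weighted out-degree (stable on ties)."""
    return sorted(ready_tasks, key=lambda t: wod[t], reverse=True)

print(priority_order([3, 2, 1, 0], WOD))
```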

And S104, respectively selecting a target processor for each task in each processor to complete the scheduling request according to the priority sequence.

In the prior art, a specific processor is called according to real-time processor state information in order to allocate processor resources to each task; the busy/idle state of the physical network, however, is ignored when processors are allocated. If the system selects the most appropriate processor in the processor selection phase solely by earliest completion time, the earliest-completion-time estimate may become inaccurate when a large amount of communication is present. To eliminate this potential risk as far as possible, obtain better performance and make reasonable use of processor resources, in the embodiment of the present invention the process of selecting, for each task, the target processor with the shortest completion time is as follows:

First, it is judged whether any of the processors is idle; the judgment may be based on a corresponding state identifier, on the occupancy percentage of the processor, or on another method. If idle processors exist, the number of idle processors and the number of tasks are obtained, and it is further judged whether the number of idle processors is greater than the number of tasks. If so, the tasks are allocated to the idle processors in priority order according to the following principle: for each task, the target processor with the shortest completion time is selected from the corresponding completion times to complete the calculation, the completion time of each task on each processor being known.

If not, the idle processors cannot complete the calculation of every task; in this case, according to the priority order, a target processor with the shortest completion time is selected among the idle processors for each task, the selection process being the same as above and not repeated here.
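The idle-processor branch, together with the fallback to the remaining processors once the idle ones are used up, might be sketched as follows. This is an assumption-laden illustration: each idle processor is given at most one task, the completion-time table `finish_time` is taken as known, and the names `allocate`, `idle` and `others` are hypothetical.

```python
def allocate(tasks_by_priority, idle, others, finish_time):
    """Map each task to a processor; finish_time[(task, proc)] is assumed known."""
    free = list(idle)
    assignment = {}
    for task in tasks_by_priority:
        # Idle processors are used first; later tasks fall back to the others.
        # Assumes `others` is non-empty whenever the idle processors run out.
        pool = free if free else others
        target = min(pool, key=lambda p: finish_time[(task, p)])
        assignment[task] = target
        if free:
            free.remove(target)
    return assignment

# Hypothetical completion times for three tasks on three processors.
finish_time = {
    ("a", "p0"): 3.0, ("a", "p1"): 2.0, ("a", "p2"): 6.0,
    ("b", "p0"): 4.0, ("b", "p1"): 1.0, ("b", "p2"): 7.0,
    ("c", "p0"): 2.0, ("c", "p1"): 2.0, ("c", "p2"): 5.0,
}
assignment = allocate(["a", "b", "c"], ["p0", "p1"], ["p2"], finish_time)
print(assignment)
```

Task "b" would finish soonest on p1, but p1 has already been taken by the higher-priority task "a", which shows how the priority order shapes the outcome.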

Once the idle processors have all been allocated, the remaining tasks are allocated to the other processors for calculation. If no idle processor exists, a target processor is selected for each task as follows:

according to the priority order, for each task, based on

$$\mathrm{EST}_{conflict}(v_i, p_j) = \max\left\{ T_{ava}(p_j),\ \max_{v_m \in \mathrm{pred}(v_i)} \left( \mathrm{AFT}(v_m) + c_{m,i} \right) \right\} \tag{4}$$

$$\mathrm{EFT}_{conflict}(v_i, p_j) = \mathrm{EST}_{conflict}(v_i, p_j) + w_{i,j} \tag{5}$$

the earliest completion time on each processor is determined, wherein T_ava(p_j) is the available time of processor p_j, AFT(v_m) is the actual finish time of task v_m, v_m and v_i are nodes, v_m being a predecessor of v_i, c_{m,i} is the communication time between task v_m and task v_i, w_{i,j} is the computation overhead of task v_i on processor p_j, EST_conflict(v_i, p_j) is the earliest start time of task v_i on processor p_j, and EFT_conflict(v_i, p_j) is the earliest completion time of task v_i on processor p_j;

and, for each task, the processor with the shortest of these earliest completion times is selected as the target processor to complete the calculation.
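The processor selection by earliest completion time can be sketched as below. The earliest-start form used here (the later of the processor's available time and the latest arrival of predecessor data) is an assumption consistent with the symbols defined in the text, and any effect of link conflicts is assumed to be already folded into the communication times in `comm`. All function names and numbers are hypothetical.

```python
def est_conflict(task, proc, t_ava, aft, comm, pred):
    """Earliest start: processor is free and all predecessor data has arrived."""
    data_ready = max(
        (aft[m] + comm[(m, task)] for m in pred.get(task, [])),
        default=0.0,  # an entry task waits for no data
    )
    return max(t_ava[proc], data_ready)

def eft_conflict(task, proc, t_ava, aft, comm, pred, w):
    """Earliest finish: earliest start plus computation overhead w[(task, proc)]."""
    return est_conflict(task, proc, t_ava, aft, comm, pred) + w[(task, proc)]

def select_processor(task, procs, t_ava, aft, comm, pred, w):
    """Pick the processor with the smallest earliest finish time."""
    return min(procs, key=lambda p: eft_conflict(task, p, t_ava, aft, comm, pred, w))

# Hypothetical numbers for task 3 of fig. 2, whose predecessors are tasks 1 and 2.
pred = {3: [1, 2]}
aft = {1: 4.0, 2: 6.0}                 # finish times of the predecessors
comm = {(1, 3): 1.0, (2, 3): 2.0}      # communication times to task 3
t_ava = {"p0": 5.0, "p1": 9.0}         # when each processor becomes available
w = {(3, "p0"): 3.0, (3, "p1"): 1.0}   # computation overhead per processor
target = select_processor(3, ["p0", "p1"], t_ava, aft, comm, pred, w)
print(target)
```

In this example p0 is free earlier, but task 3 cannot start before its data arrives at time 8, so the faster processor p1 finishes first (10 against 11).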

The invention discloses a task scheduling method for a heterogeneous fusion system, which comprises the following steps: when a scheduling request is received, acquiring a directed acyclic graph corresponding to each task in the scheduling request, wherein each node in the directed acyclic graph corresponds to each task; aiming at each node in the directed acyclic graph, calculating the corresponding weighted out-degree through the successive node with the dependency relationship to obtain each weighted out-degree; sequencing the weighted out-degrees, and determining the priority sequence of each task based on the sequencing result; and according to the priority sequence, respectively selecting a target processor for each task in each processor to complete the scheduling request. In the scheduling method, in the process of determining the priority of each task, only the subsequent nodes with the dependency relationship in the directed acyclic graph need to be calculated, and all the nodes in the directed acyclic graph do not need to be traversed for calculation, so that the calculation amount is reduced.

In the embodiment of the invention, the scheduling algorithm DONF (degree of node first), which is based on the weighted out-degree of task nodes, is provided, and two variant strategies (second-order and full-order DONF) are derived from it in order to take more local and global information in the abstract program model into account. The DONF algorithm takes full account of the characteristics of the data flow program execution model and of heterogeneous systems. On the one hand, in the data flow program execution model the task granularity is small and the dependencies among tasks are more complex; the DONF scheduling algorithm simplifies the task selection logic, selects the task to be scheduled at lower cost and avoids traversing the directed acyclic graph of the program, so that it can handle more complex cases such as dynamic graph scheduling. On the other hand, the different hardware in a heterogeneous system differs greatly and communication plays a more important role in task scheduling; the DONF algorithm considers communication link conflicts in the processor selection stage and constructs a novel communication model for task scheduling.

In the embodiment of the invention, in the task scheduling problem to be handled, an application program is represented by a directed acyclic graph G = (V, E), where V is the set of nodes and E is the set of edges. Nodes represent specific computational tasks, and edges represent data and control dependencies between different tasks. In the abstract machine model, a plurality of heterogeneous processors form computer nodes through board-level interconnection, and the computer nodes are connected into a computing cluster through a network. The scheduling of a parallel program on the heterogeneous fusion system comprises two stages: task selection (or task priority calculation) and processor selection. Task selection calculates the priority of each task according to its attributes and selects a task to be scheduled from all candidate tasks; processor selection selects the best processor for the scheduled task.

In the embodiment of the present invention, an example based on the foregoing scheduling method is given. The processor and cluster configuration is shown in table 1. There are 3 types of processors: a small processor with a computing speed of 10 GFlops, 1 GB RAM, 1085 MB/s memory bandwidth and a 1562.5 MB/s network I/O port; a medium processor with a computing speed of 100 GFlops, 1 GB RAM, 1310 MB/s memory bandwidth and a 3125 MB/s network port; and a large processor with a computing speed of 1 TFlops, 2 GB RAM, 1310 MB/s memory bandwidth and a 3125 MB/s network port.
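For a simulation, the three processor types could be captured in a small configuration structure such as the one below. The figures are those stated in the text; the class name, field names and the simple cost model in `compute_time` (work divided by computing speed) are assumptions introduced for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ProcessorType:
    name: str
    gflops: float        # computing speed in GFlops
    ram_gb: float        # memory size in GB
    mem_bw_mb_s: float   # memory bandwidth in MB/s
    net_bw_mb_s: float   # network port bandwidth in MB/s

PROCESSOR_TYPES = {
    "small": ProcessorType("small", 10.0, 1.0, 1085.0, 1562.5),
    "medium": ProcessorType("medium", 100.0, 1.0, 1310.0, 3125.0),
    "large": ProcessorType("large", 1000.0, 2.0, 1310.0, 3125.0),  # 1 TFlops
}

def compute_time(work_gflop: float, proc: ProcessorType) -> float:
    """Assumed cost model: seconds = work in GFlop / speed in GFlops."""
    return work_gflop / proc.gflops

print(compute_time(50.0, PROCESSOR_TYPES["small"]))
print(compute_time(50.0, PROCESSOR_TYPES["medium"]))
```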

The overall execution flow of task execution is shown in fig. 3, and is as follows: the Global clock Timer is used for recording the time sequence information of program execution in the simulation process. The runtime maintains 3 important data structures based on system configuration: a waiting list (PendingList), a ready queue (ReadyQueue), and an execution queue (ExecutionQueue). The number of unsatisfied dependencies for all tasks is stored in the waiting list. Once the number of unsatisfied dependencies of a task has decreased to 0, it will be inserted into the ready queue, and the state will also transition to ready. The ready queue contains all ready tasks during program execution. The execution queue stores all nodes in execution and their completion times, wherein the task nodes are all in "execution" state. The entire pipeline of the simulation can be described in detail by the following steps:

Initialization: the starting node is added to the ready queue.

S1: selecting a task from the ready queue according to a preset principle, wherein the preset principle depends on the scheduling strategy;

S2: selecting a processor to execute the selected task according to the method defined by the scheduling strategy;

S3: starting execution: adding the selected task to the execution queue, calculating and recording its completion time, and updating the states of the processor and the network link;

S4: calculating the next decision time point, updating the global timer, and jumping to S1 or S5 accordingly;

S5: when a task finishes execution, updating the states of the corresponding processor and network link and decrementing the number of unsatisfied dependencies of each of its successor tasks; if the number of unsatisfied dependencies of a task drops to 0, the task is added to the ready queue and its number of unsatisfied dependencies is then reset;

S6: if both queues are empty and the required number of iterations has been completed, ending the simulation and outputting a simulation report.
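The steps above can be sketched as a small event-driven loop. This is a simplified illustration, not the disclosed simulator: processors are treated as identical, communication and network link states are ignored, only one iteration is run, and at least one processor is assumed. The names `simulate`, `pending`, `ready` and `executing` stand in for the waiting list, ready queue and execution queue.

```python
import heapq

def simulate(succ, indeg, wod, cost, num_procs):
    """Return {task: (start, finish)} for a simplified list-scheduling run."""
    pending = dict(indeg)                               # unsatisfied dependency counts
    ready = [t for t, d in pending.items() if d == 0]   # initialization: entry nodes
    executing = []                                      # heap of (finish, task, proc)
    idle = list(range(num_procs))
    timer = 0.0                                         # global clock
    schedule = {}
    while ready or executing:
        # S1-S3: start ready tasks while a processor is idle.
        while ready and idle:
            task = max(ready, key=lambda t: wod[t])     # first maximum wins ties
            ready.remove(task)
            proc = idle.pop(0)
            finish = timer + cost[task]
            heapq.heappush(executing, (finish, task, proc))
            schedule[task] = (timer, finish)
        # S4: advance the global timer to the next completion.
        finish, task, proc = heapq.heappop(executing)
        timer = finish
        # S5: release the processor and satisfy successor dependencies.
        idle.append(proc)
        for s in succ[task]:
            pending[s] -= 1
            if pending[s] == 0:
                ready.append(s)
    return schedule                                     # S6: both queues are empty

SUCC = {0: [1, 2], 1: [3], 2: [3], 3: []}
INDEG = {0: 0, 1: 1, 2: 1, 3: 2}
WOD = {0: 2.0, 1: 0.5, 2: 0.5, 3: 0.0}
COST = {0: 1.0, 1: 2.0, 2: 3.0, 3: 1.0}   # hypothetical execution times
schedule = simulate(SUCC, INDEG, WOD, COST, num_procs=2)
print(schedule)
```

Using a heap for the execution queue makes the next decision time point the smallest recorded completion time, which is what step S4 needs.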

Each time, the task with the largest WOD value is selected from the ready queue; the EFT_conflict value of that task is then computed on all processors, and the task is assigned to the processor with the minimum EFT_conflict value. If multiple tasks share the same maximum WOD value, the task that entered the ready queue earliest is scheduled first, which ensures fair scheduling and avoids task starvation. If multiple processors share the same minimum EFT_conflict value, the algorithm chooses one of them at random, giving preference to an idle (or the least loaded) processor.
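The processor tie-break can be sketched as follows: among the processors sharing the minimum earliest completion time, idle ones are preferred, and the final choice within the preferred group is random. The function name `pick_processor` and the sample values are assumptions; the "least loaded" preference mentioned in the text is not modelled here.

```python
import random

def pick_processor(eft, idle, rng=random):
    """Choose a processor with minimal EFT, preferring idle ones among ties."""
    best = min(eft.values())
    tied = [p for p, t in eft.items() if t == best]
    idle_tied = [p for p in tied if p in idle]
    # Random choice within the preferred group, as described in the text.
    return rng.choice(idle_tied or tied)

# Hypothetical earliest completion times of one task on three processors.
eft = {"p0": 5.0, "p1": 5.0, "p2": 7.0}
print(pick_processor(eft, idle={"p1"}))
```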

Based on the foregoing task scheduling method for the heterogeneous fusion system, an embodiment of the present invention further provides a task scheduling device for the heterogeneous fusion system. A structural block diagram of the scheduling device is shown in fig. 4, and the device includes:

an acquisition module 201, a calculation module 202, a priority determination module 203 and a selection and calculation module 204.

Wherein,

the acquisition module 201 is configured to obtain, when a scheduling request is received, a directed acyclic graph corresponding to the tasks in the scheduling request, where each node in the directed acyclic graph corresponds to one task;

the calculation module 202 is configured to calculate, for each node in the directed acyclic graph, the corresponding weighted out-degree from the successor nodes having a dependency relationship with the node;

the priority determination module 203 is configured to determine the priority order of the tasks according to the weighted out-degrees;

the selection and calculation module 204 is configured to select for each task, according to the priority order, a target processor with the shortest completion time from among the processors to complete the calculation.

The invention discloses a task scheduling device for a heterogeneous fusion system, which operates as follows: when a scheduling request is received, acquiring a directed acyclic graph corresponding to the tasks in the scheduling request, wherein each node in the directed acyclic graph corresponds to one task; for each node in the directed acyclic graph, calculating the corresponding weighted out-degree from the successor nodes having a dependency relationship with the node, to obtain the weighted out-degrees; sorting the weighted out-degrees, and determining the priority order of the tasks based on the sorting result; and, according to the priority order, selecting a target processor for each task from among the processors to complete the scheduling request. In the scheduling device, determining the priority of a task only requires calculation over the successor nodes that have a dependency relationship with that task in the directed acyclic graph; the nodes of the directed acyclic graph need not all be traversed, so the amount of calculation is reduced.

In this embodiment of the present invention, the calculation module 202 includes:

a node in-degree determination unit 205 and an acquisition and calculation unit 206.

Wherein,

the node in-degree determination unit 205 is configured to determine the node in-degree of each node according to the directed acyclic graph;

the acquisition and calculation unit 206 is configured to obtain, for each node, the node in-degree of each successor node having a dependency relationship with the node, and to calculate the weighted out-degree according to one of three alternative formulas (the formulas are not reproduced in this text),

where $ID(v_j)$ is the node in-degree of node $v_j$.
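Because the weighted out-degree formulas themselves are not reproduced in this text, the following sketch uses a purely hypothetical weighting in which each successor contributes the reciprocal of its node in-degree; the formulas of the invention may differ. The function names and the pluggable `weight` parameter are likewise assumptions made for illustration. The sketch shows only that the calculation for a node touches its direct successors and their in-degrees.

```python
def in_degrees(succ):
    """Return the node in-degree ID(v) of every node of the DAG.

    succ: dict mapping each node to the list of its successor nodes
    """
    indeg = {v: 0 for v in succ}
    for v in succ:
        for s in succ[v]:
            indeg[s] += 1
    return indeg


def weighted_out_degree(succ, weight=lambda indeg: 1.0 / indeg):
    """Return a weighted out-degree for every node.

    The default weight 1 / ID(v_j) is a hypothetical choice; any function of
    the successor's in-degree can be passed in its place.
    """
    indeg = in_degrees(succ)
    # Only the direct successors of each node are visited.
    return {v: sum(weight(indeg[s]) for s in succ[v]) for v in succ}


def priority_order(succ):
    """Return the nodes sorted by descending weighted out-degree."""
    wod = weighted_out_degree(succ)
    # sorted() is stable, so nodes with equal values keep their input order.
    return sorted(succ, key=lambda v: wod[v], reverse=True)
```

With this weighting, a successor that also depends on many other nodes adds less to the priority of each of its predecessors than a successor that depends on one node only.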

In this embodiment of the present invention, the selecting and calculating module 204 includes:

a first judgment unit 207, a second judgment unit 208 and a first selection unit 209.

Wherein,

the first judgment unit 207 is configured to judge whether an idle processor exists among the processors;

the second judgment unit 208 is configured to judge, if idle processors exist, whether the number of idle processors is sufficient to complete the calculation of every task;

the first selection unit 209 is configured to, if so, select for each task, according to the priority order, a target processor with the shortest completion time from among the idle processors to complete the calculation.

In this embodiment of the present invention, the selecting and calculating module 204 further includes: a calculation and distribution unit 210.

Wherein,

the calculation and distribution unit 210 is configured to, if not, allocate tasks to the idle processors for calculation according to the priority order and, once all idle processors have been allocated, allocate the remaining unallocated tasks to the remaining processors for calculation.
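The interplay of units 207 to 210 can be sketched as one loop over the tasks in priority order. The function `assign`, the `finish_time` callback, and the fallback to all processors when no busy processor exists are assumptions made for illustration. The sketch also does not update processor availability after an assignment, which a real scheduler would have to do.

```python
def assign(tasks, procs, idle, finish_time):
    """Map each task to a processor, using idle processors first.

    tasks:       list of tasks, already sorted by priority
    procs:       list of all processors
    idle:        set of processors that are idle at decision time
    finish_time: hypothetical callback (task, proc) -> estimated completion time
    """
    free = [p for p in procs if p in idle]       # idle processors not yet used
    busy = [p for p in procs if p not in idle]   # the remaining processors
    assignment = {}
    for task in tasks:
        # While idle processors remain they form the candidate pool; after
        # that the remaining processors are used (all processors if none
        # were busy to begin with).
        pool = free or busy or list(procs)
        target = min(pool, key=lambda p: finish_time(task, p))
        assignment[task] = target
        if free:
            free.remove(target)  # each idle processor receives one task
    return assignment
```

When there are at least as many idle processors as tasks, every task receives the idle processor with its shortest completion time; otherwise the highest-priority tasks consume the idle processors and the rest fall through to the remaining ones.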

In this embodiment of the present invention, the selecting and calculating module 204 further includes: a determination unit 211 and a second selection unit 212.

Wherein,

the determination unit 211 is configured to, if no idle processor exists, determine for each task, according to the priority order, the earliest completion time on each processor based on the formula for $EST_{conflict}(v_i, p_j)$ (not reproduced in this text)

and

$EFT_{conflict}(v_i, p_j) = EST_{conflict}(v_i, p_j) + w_{i,j}$,

where $T_{ava}(p_j)$ is the available time of processor $p_j$, AFT refers to the actual start time of the task, $v_m$ and $v_i$ are nodes, $c_{m,i}$ is the communication time between task $v_m$ and task $v_i$, $w_{i,j}$ is the computation cost of task $v_i$ on processor $p_j$, $EST_{conflict}(v_i, p_j)$ is the earliest start time of task $v_i$ on processor $p_j$, and $EFT_{conflict}(v_i, p_j)$ is the earliest completion time of task $v_i$ on processor $p_j$;

the second selection unit 212 is configured to select, for each task, the target processor with the shortest completion time from the corresponding completion times to complete the calculation.
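The completion-time calculation can be sketched as follows. The relation EFT = EST + w is taken from the text; the expression for the earliest start time is not reproduced, so the sketch assumes the familiar HEFT-style form, namely the later of the processor's available time and the largest AFT(v_m) + c_{m,i} over the predecessors v_m. That form, the function names, and the dictionary-based inputs are all assumptions made for illustration.

```python
def est_conflict(task, proc, preds, t_ava, aft, comm):
    """Earliest start time of `task` on `proc` (assumed HEFT-style form).

    preds: dict mapping each task to the list of its predecessor tasks
    t_ava: dict mapping each processor to its available time T_ava
    aft:   dict mapping each already scheduled task to its AFT value
    comm:  dict mapping (predecessor, task) to the communication time c
    """
    # Time at which the data of all predecessors has arrived; 0.0 if none.
    data_ready = max(
        (aft[m] + comm[(m, task)] for m in preds[task]), default=0.0
    )
    return max(t_ava[proc], data_ready)


def eft_conflict(task, proc, preds, t_ava, aft, comm, w):
    """Earliest completion time: EST plus computation cost w[(task, proc)]."""
    return est_conflict(task, proc, preds, t_ava, aft, comm) + w[(task, proc)]


def select_processor(task, procs, preds, t_ava, aft, comm, w):
    """Return (processor, EFT) with the smallest EFT for `task`."""
    efts = {p: eft_conflict(task, p, preds, t_ava, aft, comm, w) for p in procs}
    target = min(efts, key=efts.get)
    return target, efts[target]
```

In this form a processor that becomes available early is not automatically the best choice, because a higher computation cost on it can still yield a later completion time than on a processor that becomes available later.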

Each embodiment is described with emphasis on its differences from the other embodiments; for parts that are the same or similar, the embodiments may be referred to one another. Since the device embodiment is substantially similar to the method embodiment, it is described briefly; for relevant details, refer to the corresponding description of the method embodiment.

Finally, it should also be noted that, in this document, relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.

For convenience of description, the above devices are described as being divided into various units by function, respectively. Of course, the functions of the units may be implemented in the same software and/or hardware or in a plurality of software and/or hardware when implementing the invention.

From the above description of the embodiments, it is clear to those skilled in the art that the present invention can be implemented by software plus a necessary general-purpose hardware platform. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which may be stored in a storage medium, such as ROM/RAM, a magnetic disk, or an optical disk, and which includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the method according to the embodiments or some parts of the embodiments.

The task scheduling method and device for the heterogeneous fusion system provided by the invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the invention, and the description of the above embodiments is intended only to help in understanding the method and core idea of the invention. Meanwhile, a person skilled in the art may, following the idea of the invention, vary the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the invention.

Claims

1. A task scheduling method for a heterogeneous fusion system, characterized by comprising the following steps:

for each node in the directed acyclic graph, calculating the corresponding weighted out-degree from the successor nodes having a dependency relationship with the node, to obtain the weighted out-degrees;

sorting the weighted out-degrees, and determining the priority order of each task based on the sorting result;

2. The method according to claim 1, wherein calculating, for each node in the directed acyclic graph, the corresponding weighted out-degree from the successor nodes having a dependency relationship with the node comprises:

for each node, acquiring the node in-degree of each successor node having a dependency relationship with the node, and calculating the weighted out-degree of the node according to a target weighted out-degree calculation formula;

the target weighted out-degree formula is one of three alternative formulas (not reproduced in this text),

where $ID(v_j)$ is the node in-degree of node $v_j$.

3. The method of claim 1, wherein selecting a target processor for each task in each processor to fulfill the scheduling request according to the priority order comprises:

judging whether an idle processor exists in each processor;

4. The method of claim 3, further comprising:

if not, scheduling tasks to the idle processors for calculation according to the priority order and, once all idle processors have been scheduled, allocating the remaining unallocated tasks to the remaining processors for calculation.

5. The method of claim 3, further comprising:

the completion time formulas are the formula for $EST_{conflict}(v_i, p_j)$ (not reproduced in this text)

and

$EFT_{conflict}(v_i, p_j) = EST_{conflict}(v_i, p_j) + w_{i,j}$,

where $T_{ava}(p_j)$ is the available time of processor $p_j$, AFT refers to the actual start time of the task, $v_m$ and $v_i$ are nodes, $c_{m,i}$ is the communication time between task $v_m$ and task $v_i$, $w_{i,j}$ is the computation cost of task $v_i$ on processor $p_j$, $EST_{conflict}(v_i, p_j)$ is the earliest start time of task $v_i$ on processor $p_j$, and $EFT_{conflict}(v_i, p_j)$ is the earliest completion time of task $v_i$ on processor $p_j$;

6. A task scheduling device for a heterogeneous fusion system, characterized by comprising:

a priority determining module, configured to sort the weighted out-degrees and determine the priority order of each task based on the sorting result;

7. The apparatus of claim 6, wherein the computing module comprises:

the target weighted out-degree formula is one of three alternative formulas (not reproduced in this text),

where $ID(v_j)$ is the node in-degree of node $v_j$.

8. The apparatus of claim 6, wherein the selecting and calculating module comprises:

a second judging unit, configured to judge whether the number of idle processors can complete the calculation for each task if the idle processors exist;

9. The apparatus of claim 8, further comprising:

a calculating and distributing unit, configured to, if not, schedule tasks to the idle processors for calculation according to the priority order and, once all idle processors have been scheduled, allocate the remaining unallocated tasks to the remaining processors for calculation.

10. The apparatus of claim 8, further comprising:

the completion time formulas are the formula for $EST_{conflict}(v_i, p_j)$ (not reproduced in this text)

and

$EFT_{conflict}(v_i, p_j) = EST_{conflict}(v_i, p_j) + w_{i,j}$,

where $T_{ava}(p_j)$ is the available time of processor $p_j$, AFT refers to the actual start time of the task, $v_m$ and $v_i$ are nodes, $c_{m,i}$ is the communication time between task $v_m$ and task $v_i$, $w_{i,j}$ is the computation cost of task $v_i$ on processor $p_j$, $EST_{conflict}(v_i, p_j)$ is the earliest start time of task $v_i$ on processor $p_j$, and $EFT_{conflict}(v_i, p_j)$ is the earliest completion time of task $v_i$ on processor $p_j$;