CN112328380A - Task scheduling method and device based on heterogeneous computing - Google Patents


Info

Publication number
CN112328380A
Authority
CN
China
Prior art keywords
processor
task
node
population
task scheduling
Prior art date
Legal status
Pending
Application number
CN202011245253.7A
Other languages
Chinese (zh)
Inventor
邹承明
史梦园
Current Assignee
Wuhan University of Technology WUT
Original Assignee
Wuhan University of Technology WUT
Application filed by Wuhan University of Technology WUT filed Critical Wuhan University of Technology WUT
Priority to CN202011245253.7A priority Critical patent/CN112328380A/en
Publication of CN112328380A publication Critical patent/CN112328380A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90 Details of database functions independent of the retrieved data types
    • G06F 16/901 Indexing; Data structures therefor; Storage structures
    • G06F 16/9024 Graphs; Linked lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F 9/4806 Task transfer initiation or dispatching
    • G06F 9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F 9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/54 Interprogram communication
    • G06F 9/546 Message passing systems or structures, e.g. queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/12 Computing arrangements based on biological models using genetic models
    • G06N 3/126 Evolutionary algorithms, e.g. genetic algorithms or genetic programming
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2209/00 Indexing scheme relating to G06F9/00
    • G06F 2209/54 Indexing scheme relating to G06F9/54
    • G06F 2209/548 Queue


Abstract

The invention relates to a task scheduling method based on heterogeneous computing, which comprises the following steps: establishing a DAG model of a task to be scheduled, and establishing a job queue according to the DAG model; establishing a network topology graph among heterogeneous processors, and allocating an initial task amount to each processor according to its calculation speed; randomly distributing the tasks in the job queue to each processor according to the initial task amount, randomly assigning an initial voltage to each processor, and constructing an initial scheduling list; generating an initial population according to the initial scheduling list, initializing the parameters of a genetic algorithm, and having each processor execute the genetic algorithm in parallel to iteratively update the population and obtain an optimal population; and acquiring the task scheduling list corresponding to the optimal population as the optimal task scheduling list. The method can obtain the global optimal solution of task scheduling and fully exploit the advantages of heterogeneous computing.

Description

Task scheduling method and device based on heterogeneous computing
Technical Field
The present invention relates to the field of computing task scheduling technologies, and in particular, to a task scheduling method and apparatus based on heterogeneous computing, and a computer storage medium.
Background
With the rapid development of computer technology, chips have gone through round after round of performance improvement. However, with the explosive growth of the internet and the popularization of information, and the recent rise of fields with high demands on computing performance such as machine learning, deep learning, artificial intelligence and industrial simulation, computing performance bottlenecks have appeared, such as low parallelism, insufficient bandwidth and high latency. Different types of processors differ in computational performance and characteristics. For example: a CPU is composed of arithmetic logic units, register units and control units, but about 70% of its transistors are used to build caches and parts of the control logic, so relatively few are devoted to logic operations and more to control. A GPU is designed as a coprocessor and is suited to large numbers of compute-intensive, highly threaded parallel processing tasks. An FPGA is reprogrammable and low-power, offers a high degree of parallelism, and achieves it mainly through two techniques: concurrency and pipelining. To improve computing performance, different types of processors can be combined into a heterogeneous computing system, so that the strengths of the various processors compensate for one another's weaknesses. At present, heterogeneous computing usually adopts the CPU + GPU and CPU + FPGA modes. The CPU + GPU mode cannot be programmed flexibly and increases energy cost. Compared with the FPGA, the GPU's memory-interface bandwidth is far better; in addition, the computing capability of an FPGA basic unit is limited:
to achieve reconfigurability, the FPGA contains a large number of extremely fine-grained basic units, but the computing power of each unit (relying mainly on LUT lookup tables) is far lower than that of the ALU modules in a CPU or GPU.
When a computing task is executed between heterogeneous processors, the task needs to be reasonably scheduled, so that the task is guaranteed to be completed smoothly, and the scheduling length of task scheduling is minimized. The task scheduling method adopted at present has the following problems:
1. List-scheduling techniques first calculate the priority of each task, then sort the tasks by priority, and finally schedule the tasks to suitable processors in priority order. This scheduling approach has a significant drawback: it does not take energy consumption into account.
2. Heuristic scheduling algorithms, such as task scheduling based on genetic algorithms, simulated annealing or ant colony algorithms, easily drive the result into a local optimum, so a suitable improvement must be found. Although some researchers have studied parallel genetic algorithms, those algorithms have a fatal defect in the parallel stage: sub-populations are distributed uniformly across the processors, even though computing characteristics, computing speed, transmission time and power consumption differ from processor to processor. A uniform-distribution rule is clearly unsuitable for heterogeneous systems and cannot fully exploit the advantages of heterogeneous computing.
3. The technologies currently adopted to reduce power consumption during task scheduling are mainly DPM and DVFS. DPM works by switching an idle component to a low-power mode, or shutting it down, to reduce power consumption. However, DPM reduces the processing speed of the CPU, so it generally carries the constraint that energy consumption must be reduced while the QoS of the system is guaranteed. DVFS, dynamic voltage and frequency scaling, dynamically adjusts the operating frequency and voltage of a chip according to the varying computing demands of the application it runs, thereby saving energy. Under DVFS, power consumption is modeled as coming from two sources: the dynamic power consumed when CMOS circuits switch and the static power lost to CMOS leakage. In practical application scenarios, however, part of the energy consumption also comes from transmission losses, and from the sleep state: although energy drains more slowly in sleep than in the active state, small tasks often leave processors sleeping, so reducing sleep energy consumption is also an effective way to reach a low-energy goal.
In summary, the existing task scheduling method has defects, and a new heterogeneous computing task scheduling method is urgently needed.
Disclosure of Invention
In view of the above, it is necessary to provide a task scheduling method and device based on heterogeneous computing, so as to solve the problem that heterogeneous-computing task scheduling optimization easily falls into a local optimum and cannot exploit the advantages of heterogeneous computing to the greatest extent.
The invention provides a task scheduling method based on heterogeneous computing, which comprises the following steps:
establishing a DAG model of the task to be scheduled, and establishing a job queue according to the DAG model;
establishing a network topological graph among heterogeneous processors, and distributing initial task amount for each processor according to the calculation speed of each processor;
randomly distributing tasks in the job queue to each processor according to the initial task amount, randomly distributing initial voltage to each processor, and constructing an initial scheduling list;
generating an initial population according to the initial scheduling list, initializing parameters of a genetic algorithm, and executing the genetic algorithm in parallel by each processor to perform population iteration updating to obtain an optimal population;
and acquiring a task scheduling list corresponding to the optimal population as an optimal task scheduling list.
Further, establishing a DAG model of the task to be scheduled, specifically:
establishing the DAG model by taking the jobs contained in the task to be scheduled as nodes of the DAG model, taking the job execution time as the node attribute, establishing directed edges between the nodes according to the execution-order dependencies between the jobs, and taking the communication traffic between the jobs as the directed-edge attribute.
Further, establishing a job queue according to the DAG model specifically comprises:
copying all nodes in the DAG model to obtain a node set;
screening out the nodes with in-degree 0: if there is only one node with in-degree 0, putting it directly into the job queue; if there are several nodes with in-degree 0, further screening out the node with the smallest job size; if only one node has the smallest job size, putting it directly into the job queue; and if several nodes share the smallest job size, randomly selecting one of them to put into the job queue;
deleting the enqueued node from the node set, and deleting the dependency relationships between the enqueued node and its successor nodes; judging whether the current node set is empty: if so, outputting the job queue; if not, returning to the previous step to enqueue the next node.
Further, establishing a network topology graph among the heterogeneous processors specifically includes:
taking each processor as a node of the network topology graph, taking the execution speed of the processor as the node attribute, establishing an undirected edge between two nodes according to whether the processors can communicate with each other, and taking the communication speed between the processors as the undirected-edge attribute, so as to establish the network topology graph.
Further, allocating an initial task amount to each processor according to the calculation speed of each processor specifically includes:
according to the calculation speed of each processor, calculating the hyper-parameter of each processor:
θ_a = W(p_a) / Σ_{b=1}^{d} W(p_b);
wherein θ_a is the hyper-parameter of the a-th processor, W(p_a) is the computation rate of the a-th processor, W(p_b) is the computation rate of the b-th processor, 1 ≤ a ≤ d, b = 1, …, d, and d is the number of processors;
allocating an initial task amount to each processor according to the hyper-parameters:
Num(p_a) = θ_a * M;
wherein Num(p_a) is the initial amount of tasks allocated to the a-th processor, and M is the total number of tasks.
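As a minimal sketch, the proportional allocation above can be written as follows. Rounding to whole tasks is an assumption; the patent leaves Num(p_a) = θ_a * M without specifying how fractional task counts are handled.

```python
def allocate_initial_tasks(rates, total_tasks):
    """Split total_tasks among processors in proportion to their compute rates.

    theta_a = W(p_a) / sum_b W(p_b);  Num(p_a) = theta_a * M
    """
    total_rate = sum(rates)
    thetas = [r / total_rate for r in rates]
    # Rounding to integers is an assumption not made explicit in the text.
    return [round(t * total_tasks) for t in thetas]

# Example: three processors with computation rates 4, 2, 2 and M = 16 tasks.
print(allocate_initial_tasks([4, 2, 2], 16))  # -> [8, 4, 4]
```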
Further, each processor executes a genetic algorithm in parallel to perform population iteration updating to obtain an optimal population, specifically:
establishing an optimization model by taking the minimum energy consumption as an objective function and taking the task execution time as a constraint condition;
evaluating the fitness of the current population according to the objective function, and carrying out genetic operations of a selection operator, a crossover operator and a mutation operator on the current population in parallel by each processor; each processor exchanges information to realize population updating;
and judging whether an iteration termination condition is reached, if so, outputting the current population as an optimal population, and otherwise, turning to the previous step for next iteration.
Further, an optimization model is established by taking the minimum energy consumption as an objective function and taking the task execution time as a constraint condition, and specifically comprises the following steps:
obtaining the total energy consumption:
EC_total = EC_dynamic + EC_static + EC_trans + EC_sleep;
wherein EC_total is the total energy consumption, EC_dynamic the dynamic energy consumption, EC_static the static energy consumption, EC_trans the transmission energy consumption, and EC_sleep the sleep energy consumption;
establishing an objective function by taking the minimum total energy consumption as a target:
E_min = min(EC_total);
wherein E_min represents the objective function, and min() takes the minimum value;
sequentially solving the earliest start time and the earliest end time of each job node:
Start(t_i, p_a) = max_{t_j ∈ pred(t_i)} ( End(t_j, p_b) + C_ab );
End(t_i, p_a) = Start(t_i, p_a) + W(t_i) / W(p_a);
wherein Start(t_i, p_a) represents the earliest start time of job node t_i on processor p_a, pred(t_i) is the set of direct predecessor nodes t_j of t_i (each executing on its processor p_b), End(t_j, p_b) represents the earliest end time of job node t_j on processor p_b, and C_ab represents the communication time between the a-th processor and the b-th processor; End(t_i, p_a) represents the earliest end time of job node t_i on processor p_a, W(t_i) represents the job size of job node t_i, and W(p_a) represents the execution rate of the processor;
C_ab = PL_ab / B_ab;
wherein C_ab represents the communication time between the a-th and b-th processors, PL_ab represents the communication load (the amount of data to be transferred) between the a-th and b-th processors, and B_ab represents the communication bandwidth between the a-th and b-th processors;
taking the last executed job node as the exit node, constraining the earliest end time of the exit node, and establishing the constraint condition:
End(t_exit) < deadline;
wherein t_exit represents the exit job node, End(t_exit) represents the earliest end time of the exit job node t_exit, and deadline represents the latest completion time.
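A small worked sketch of the timing recursion, using the symbols Start, End, C_ab, PL_ab, B_ab from above. All numeric values are assumed, and the form C_ab = PL_ab / B_ab is itself a reconstruction, since the original formula image is not reproduced here.

```python
def comm_time(pl, bw):
    # Assumed reconstruction: transfer time = communication load / bandwidth.
    return pl / bw

def earliest_times(job_size, proc_rate, pred_ends_with_comm):
    """Start(t_i,p_a) = max over predecessors of (End(t_j,p_b) + C_ab);
    End(t_i,p_a) = Start(t_i,p_a) + W(t_i)/W(p_a)."""
    start = max(pred_ends_with_comm, default=0.0)  # entry node starts at 0
    end = start + job_size / proc_rate
    return start, end

# t0 (size 4) on p0 (rate 2): entry node, so Start = 0 and End = 2.
s0, e0 = earliest_times(4, 2, [])
# t1 (size 6) on p1 (rate 3), predecessor t0 on p0; PL_01 = 8, B_01 = 4.
s1, e1 = earliest_times(6, 3, [e0 + comm_time(8, 4)])
print(s0, e0, s1, e1)   # -> 0.0 2.0 4.0 6.0
deadline = 10
assert e1 < deadline    # the End(t_exit) < deadline constraint holds
```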
Further, the genetic operation of a mutation operator is performed on the current population, specifically:
respectively calculating the ratio of the fitness of each individual of the current population to the average fitness of the population;
judging whether the ratio is greater than 1, if so, reducing the variation rate of the corresponding individual, otherwise, increasing the variation rate of the corresponding individual;
and performing genetic operation of a mutation operator on the current population according to the regulated mutation rate.
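The adaptive rule above can be sketched as follows. The base rate and the step by which the rate is raised or lowered are assumptions; the patent only states the direction of the adjustment.

```python
def adapt_mutation_rates(fitnesses, base_rate=0.10, step=0.02):
    """Lower the mutation rate for above-average individuals, raise it otherwise.

    base_rate and step are assumed values; the text specifies only that the
    rate decreases when fitness / average > 1 and increases otherwise.
    """
    avg = sum(fitnesses) / len(fitnesses)
    rates = []
    for f in fitnesses:
        if f / avg > 1:                                  # fitter than average
            rates.append(round(max(base_rate - step, 0.0), 6))
        else:                                            # at or below average
            rates.append(round(base_rate + step, 6))
    return rates

print(adapt_mutation_rates([3.0, 1.0, 2.0]))  # avg = 2.0 -> [0.08, 0.12, 0.12]
```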
The invention also provides a task scheduling device based on heterogeneous computing, which comprises a processor and a memory, wherein the memory is stored with a computer program, and the computer program is executed by the processor to realize the task scheduling method based on heterogeneous computing.
The invention also provides a computer storage medium, on which a computer program is stored, which, when executed by a processor, implements the heterogeneous computing-based task scheduling method.
Advantageous effects: before task scheduling, a DAG model is first established so as to split the task and build the job queue, and then the processor network topology graph is established. To avoid uniform task distribution when the processors execute the genetic algorithm in parallel, the task amount is distributed to each processor according to its speed, exploiting the strengths of the heterogeneous system's different processors to a greater extent. An overall scheduling mode is then adopted: the voltage is assigned at the same time as the job nodes are allocated to the processors, and a genetic algorithm performs the task allocation to form an optimal task scheduling list. Once the optimal task scheduling list is determined, the scheduling length and the energy consumption are determined as well. The invention provides a real-time task scheduling method on a heterogeneous computing platform which, unlike existing common heterogeneous computing platforms, makes full use of the available computing resources and improves task scheduling performance through task allocation based on computation rate.
Drawings
FIG. 1 is a flowchart of a task scheduling method based on heterogeneous computing according to a first embodiment of the present invention;
FIG. 2 is a DAG model diagram of a task scheduling method based on heterogeneous computing according to a first embodiment of the present invention;
FIG. 3 is a network topology diagram of a task scheduling method based on heterogeneous computing according to a first embodiment of the present invention;
FIG. 4 is a comparison diagram of the runtime of the first embodiment of the task scheduling method based on heterogeneous computing according to the present invention on different computing platforms;
FIG. 5 is a comparison graph of the running times of a genetic algorithm NPGA and an existing parallel genetic algorithm NGA in the first embodiment of the task scheduling method based on heterogeneous computing according to the present invention;
FIG. 6 is a comparison graph of the energy consumption of the genetic algorithm NPGA and an existing genetic algorithm GA in the first embodiment of the task scheduling method based on heterogeneous computing according to the present invention.
Detailed Description
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate preferred embodiments of the invention and together with the description, serve to explain the principles of the invention and not to limit the scope of the invention.
Example 1
As shown in fig. 1, embodiment 1 of the present invention provides a task scheduling method based on heterogeneous computing, including the following steps:
s1, establishing a DAG model of the task to be scheduled, and establishing a job queue according to the DAG model;
s2, establishing a network topological graph among the heterogeneous processors, and distributing initial task amount for each processor according to the calculation speed of each processor;
s3, randomly distributing the tasks in the job queue to each processor according to the initial task amount, randomly distributing initial voltage to each processor, and constructing an initial scheduling list;
s4, generating an initial population according to the initial scheduling list, initializing genetic algorithm parameters, and executing a genetic algorithm in parallel by each processor to perform population iteration updating to obtain an optimal population;
and S5, acquiring a task scheduling list corresponding to the optimal population as an optimal task scheduling list.
Before the scheduling method provided by the embodiment is operated, a heterogeneous computing system needs to be built to realize effective communication among different processors, and the method for building the heterogeneous system is as follows:
A1. In this embodiment, the heterogeneous computing platform is a heterogeneous system built from CPU + GPU + FPGA on the OpenCL framework; the CPU serves as the host of the heterogeneous system, and the GPU + FPGA serve as its computing devices;
A2. Host code and device code are developed according to the OpenCL framework;
A3. The CPU host executes the host code developed on the OpenCL framework and handles the information communication between the CPU host and the GPU + FPGA computing devices. In addition, the CPU host is responsible for scheduling the computing tasks;
A4. The device code written in step A2 is placed on the computing devices for execution;
A5. After the CPU + GPU + FPGA heterogeneous system detects a computing task, a job queue is generated, jobs are then distributed to the computing nodes, and several jobs form a work-group;
A6. During execution of a computing task, if a large amount of data with low coupling needs to undergo the same operation, the OpenCL framework splits the data and sends the parts to multiple work-items that execute the same command, achieving data parallelism;
A7. Each work-item runs independently; all work-items in the same work-group share the data in local memory, but their operations are independent and do not affect one another, achieving task parallelism;
A8. After the data-parallel and task-parallel execution, the input is converted into a computed output according to the defined function, and the result is written into the device's local memory;
A9. The computation result is returned to the host, which reads the output buffer to obtain the result from the GPU + FPGA computing devices;
A10. When the computation is complete, the resources are released to await the next task.
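Steps A6-A8 describe OpenCL-style data parallelism. The following plain-Python simulation (not actual OpenCL code; the strided chunking scheme is an assumption) illustrates splitting low-coupling data across work-items that each execute the same command and write into a shared output buffer:

```python
from concurrent.futures import ThreadPoolExecutor

def run_data_parallel(data, op, num_items=4):
    """Apply the same operation op to independent chunks in parallel."""
    # Strided split: work-item i gets elements i, i+num_items, i+2*num_items, ...
    chunks = [data[i::num_items] for i in range(num_items)]
    with ThreadPoolExecutor(max_workers=num_items) as pool:
        results = list(pool.map(lambda c: [op(x) for x in c], chunks))
    # Interleave the per-item results back into original order,
    # simulating the shared output buffer of A8.
    out = [None] * len(data)
    for i, chunk_res in enumerate(results):
        out[i::num_items] = chunk_res
    return out

print(run_data_parallel(list(range(8)), lambda x: x * x))
# -> [0, 1, 4, 9, 16, 25, 36, 49]
```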
After the heterogeneous system is built, task scheduling can be carried out. A scheduled task is formed by combining several jobs, and execution-order dependencies exist between the jobs, so a DAG model is established in order to build the job queue. Second, the processor network topology graph is established. To avoid uniform task allocation when the processors execute the genetic algorithm in parallel, this embodiment allocates the task amount to each processor according to its speed, exploiting the strengths of the heterogeneous system's different processors to a greater extent. Overall scheduling is then used: the assignment of a job node to a processor is done at the same time as the voltage is assigned. A genetic algorithm then distributes the tasks to form the optimal task scheduling list. Once the optimal task scheduling list is determined, the scheduling length and the energy consumption are determined as well.
The invention provides a real-time task scheduling method on a heterogeneous computing platform, which is different from the existing common heterogeneous computing platform, fully utilizes the existing computing resources and improves the task scheduling performance by task allocation based on the computing rate.
Preferably, the establishing of the DAG model of the task to be scheduled specifically includes:
establishing the DAG model by taking the jobs contained in the task to be scheduled as nodes of the DAG model, taking the job execution time as the node attribute, establishing directed edges between the nodes according to the execution-order dependencies between the jobs, and taking the communication traffic between the jobs as the directed-edge attribute.
A task is formed by combining a plurality of jobs, and the jobs have a dependency relationship of execution sequence. This relationship is represented by a DAG model, i.e., a directed acyclic graph.
The DAG model specifically comprises:
Task = (T, W, E);
wherein Task is the DAG model; T is the node set, T = {t_0, t_1, …, t_{n-1}}, where t_i represents the i-th job of the task to be scheduled, i = 0, 1, …, n-1, and n is the number of jobs; W is the node execution-time set, W = {w_0, w_1, …, w_{n-1}}, where w_i represents the worst expected execution time of the i-th job; E is the dependency (directed-edge) set, E = {[e_00, e_01, …, e_0,n-1], [e_10, e_11, …, e_1,n-1], …, [e_{n-1,0}, e_{n-1,1}, …, e_{n-1,n-1}]}, where e_ij represents the dependency relationship and the traffic between the i-th job and the j-th job, j = 0, 1, …, n-1. e_ij ≥ 0 means that job t_j depends on job t_i, i.e. t_j can execute only after t_i has finished, and the traffic from t_i to t_j has size e_ij; e_ij = -1 means that job t_j does not depend on job t_i and there is no traffic.
Specifically, the DAG model graph created in this embodiment is shown in fig. 2. The DAG model in fig. 2 consists of 6 nodes, drawn as circles, each representing one job. The value in a node has two parts: the upper value is the job number (i.e. i or j), assigned by hand in order to tell the jobs apart, and the lower value is the size w_i of the job. Two nodes connected by a directed edge indicate a dependency between the jobs, and the arrow gives the direction of the dependency, i.e. the jobs execute in order. For example: job 1 can start, at the earliest, only after job 0 has completed, and job 5 can start only after both job 3 and job 4 have completed. The values on the directed edges represent the traffic.
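The fig. 2 relationships can be encoded in the Task = (T, W, E) convention above. The edge set below follows the successors named in the worked example later in the text; the traffic values are assumed placeholders, since fig. 2 itself is not reproduced here.

```python
n = 6
# E[i][j] >= 0: job t_j depends on t_i, with traffic E[i][j]; -1: no dependency.
E = [[-1] * n for _ in range(n)]
# Edges of the fig. 2 example; the traffic values are assumed placeholders.
for (i, j, traffic) in [(0, 1, 2), (0, 2, 3), (0, 3, 1),
                        (1, 4, 2), (2, 4, 4), (3, 5, 2), (4, 5, 3)]:
    E[i][j] = traffic

def predecessors(E, j):
    """Jobs that must finish before t_j may start."""
    return [i for i in range(len(E)) if E[i][j] >= 0]

print(predecessors(E, 5))  # -> [3, 4]: jobs 3 and 4 must complete before job 5
print(predecessors(E, 0))  # -> []: job 0 is the entry node
```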
Preferably, the job queue is established according to the DAG model, specifically:
copying all nodes in the DAG model to obtain a node set;
screening out the nodes with in-degree 0: if there is only one node with in-degree 0, putting it directly into the job queue; if there are several nodes with in-degree 0, further screening out the node with the smallest job size; if only one node has the smallest job size, putting it directly into the job queue; and if several nodes share the smallest job size, randomly selecting one of them to put into the job queue;
deleting the enqueued node from the node set, and deleting the dependency relationships between the enqueued node and its successor nodes; judging whether the current node set is empty: if so, outputting the job queue; if not, returning to the previous step to enqueue the next node.
Because the DAG has a dependency order, the job nodes must execute in sequence. When task scheduling is performed, the job-node set T is first copied and named T_copy; a node with in-degree 0 is found and put into the job queue TQ, and the enqueued job node is deleted from the T_copy set. Next, the dependencies between that node and its connected successor nodes are removed, i.e. the corresponding e_ij values are set to -1. If there are several nodes with in-degree 0, the job with the smallest job size is selected; if several jobs have the same size, one is selected at random. Finally, these steps are repeated to form the job queue, processors and voltages are then randomly assigned to each task, and a genetic algorithm is used to optimize the result.
Taking fig. 2 as an example, the execution steps are:
finding a node with the degree of income of 0 in the T _ copy operation set to obtain T0,t0In the entry queue TQ, TQ is { t }0}. Deleting T in T _ copy0Node, T _ copy ═ T1,t2,t3,t4,t5}. Will be compared with t0Is directly succeeding node t1、t2、t3Communication amount e of01、e02、e03Is set to-1.
Finding out node T with 0 degree of income from nodes of T _ copy1、t2t 33, according to the principle of selecting the task with the minimum size and randomly selecting if the task with the same size, t2Performing enqueue operation, wherein TQ is equal to { t0,t2},T_copy={t1,t3,t4,t5Will be with t2Is directly succeeding node t4Communication amount e of24Is set to-1.
Finding out the point with the degree of income of 0 from the nodes of T _ copy as T1、t3According to the principle of selecting the minimum task size, t3Performing enqueue operation, wherein TQ is equal to { t0,t2,t3},T_copy={t1,t4,t5Will be with t3Is directly succeeding node t5Communication amount e of35Is set to-1.
Finding out the point with the degree of income of 0 from the nodes of T _ copy as T1,t1Performing enqueue operation, wherein TQ is equal to { t0,t2,t3,t1},T_copy={t4,t5Will be with t1Is directly succeeding node t4Communication amount e of14Is set to-1.
Finding out the point with the degree of income of 0 from the nodes of T _ copy as T4,t4Performing enqueue operation, wherein TQ is equal to { t0,t2,t3,t1,t4},T_copy={t5Will be with t4Is directly succeeding node t5Communication amount e of45Is set to-1.
Finding out the point with the degree of income of 0 from the nodes of T _ copy as T5,t5Performing enqueue operation, wherein TQ is equal to { t0,t2,t3,t1,t4,t5And f, ending the operation.
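The enqueue procedure above can be sketched as follows. This is a minimal illustration, assuming the DAG of fig. 2; the edge weights and the job sizes not stated in the text are invented for the example:

```python
import random

def build_job_queue(sizes, edges):
    """Enqueue jobs in topological order: repeatedly pick an in-degree-0 node,
    preferring the smallest job size and breaking remaining ties at random."""
    t_copy = set(sizes)
    e = dict(edges)                      # e[(i, j)] = communication amount e_ij
    tq = []
    while t_copy:
        ready = [n for n in t_copy
                 if not any(j == n and w != -1 for (i, j), w in e.items())]
        smallest = min(sizes[n] for n in ready)
        node = random.choice(sorted(n for n in ready if sizes[n] == smallest))
        tq.append(node)
        t_copy.remove(node)
        for (i, j) in e:                 # sever dependencies to direct successors
            if i == node:
                e[(i, j)] = -1
    return tq

# DAG of fig. 2; the sizes are illustrative, chosen so that t2 < t3 < t1.
sizes = {0: 1, 1: 4, 2: 2, 3: 3, 4: 5, 5: 6}
edges = {(0, 1): 3, (0, 2): 2, (0, 3): 4,
         (1, 4): 2, (2, 4): 1, (3, 5): 3, (4, 5): 2}
queue = build_job_queue(sizes, edges)    # reproduces TQ = {t0, t2, t3, t1, t4, t5}
```

With these assumed sizes the run is deterministic and yields exactly the order of the worked example.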
Preferably, the establishing of the network topology graph among the heterogeneous processors specifically includes:
the method comprises the steps of taking each processor as a node of a network topological graph, taking the execution speed of the processor as a node attribute, establishing a non-directional edge between the nodes according to whether communication between the processors is available, and taking the communication speed between the processors as a non-directional edge attribute to establish the network topological graph.
The network topology specifically comprises:
Net=(P,V,B);
where Net is the network topology, P is the processor set, P = {p0, p1, …, pd-1}, pa denotes the a-th processor, a = 0, 1, …, d-1, and d is the number of processors; V is the set of processor execution speeds, V = {v0, v1, …, vd-1}, where va denotes the execution speed of the a-th processor; B = {[b00, b01, …, b0,d-1], [b10, b11, …, b1,d-1], …, [bd-1,0, bd-1,1, …, bd-1,d-1]}, where bab denotes the communication relationship and communication bandwidth between the a-th and b-th processors: bab ≥ 0 means the two processors can communicate with each other and the communication bandwidth is bab, while bab = -1 means they cannot communicate.
The processor network topology of this embodiment is shown in fig. 3, which has 4 processor nodes, each drawn as a circle. The value in each node has two parts: the upper half is the processor number, assigned arbitrarily to distinguish the processors, and the lower half is the execution speed of that processor. An edge connecting pa and pb means that processors pa and pb can communicate bidirectionally, and the value on the edge is the communication speed, i.e. the communication bandwidth, between them. In fig. 3, p0 can communicate with every processor, so p0 may act as the host. There is no direct edge between p1 and p2, which means they cannot communicate. The communication speed between p0 and p3 is 2. The execution speeds also differ between processors: for example, p0's speed is 2 while p3's is 5, meaning that in the same time p3 can perform more computation than p0.
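The Net = (P, V, B) model can be represented directly as an adjacency matrix. The sketch below follows the description of fig. 3; only v0 = 2, v3 = 5, b03 = 2, p0 reaching every processor, and the missing p1-p2 link are from the text, while all other speeds and bandwidths are assumptions:

```python
d = 4                                     # number of processors p0..p3
V = [2, 3, 4, 5]                          # execution speeds; only v0 = 2 and v3 = 5 are from the text
NO_LINK = -1                              # b_ab = -1: processors a and b cannot communicate
B = [
    [0, 1, 1, 2],                         # b_03 = 2 as in fig. 3; other bandwidths are assumed
    [1, 0, NO_LINK, NO_LINK],
    [1, NO_LINK, 0, NO_LINK],
    [2, NO_LINK, NO_LINK, 0],
]

def can_communicate(a, b):
    """b_ab >= 0 means p_a and p_b can exchange data at bandwidth b_ab."""
    return B[a][b] >= 0

# A processor that can reach every other one may act as the host (p0 here).
hosts = [a for a in range(d) if all(can_communicate(a, b) for b in range(d))]
```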
Preferably, the allocating the initial task amount to each processor according to the calculation speed of each processor specifically includes:
according to the calculation speed of each processor, calculating the hyper-parameter of each processor:
θa = W(pa) / (W(p1) + W(p2) + … + W(pd));
where θa is the hyper-parameter of the a-th processor, W(pa) is the computation rate of the a-th processor, W(pb) is the computation rate of the b-th processor, 1 ≤ a ≤ d, b = 1, …, d, and d is the number of processors;
allocating initial task amount to each processor according to the hyper-parameters:
Num(pa)=θa*M;
where Num(pa) is the initial amount of tasks allocated to the a-th processor and M is the total number of tasks.
The improvement of the invention lies in that, when performing the genetic evolution operation, tasks are not distributed evenly to the processors; instead, a hyper-parameter θa is introduced to control the distribution of the task amount.
Specifically, Num(pa) is calculated from θa and rounded to the nearest integer, and the tasks are then distributed in descending order of θa. If

Σ a=1..d Num(pa) ≠ M,

the last processor is instead assigned

M − Σ a=1..d−1 Num(pa)

tasks.
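A minimal sketch of this proportional allocation, with the last processor absorbing the rounding remainder (the compute rates used are illustrative, not from the patent):

```python
def allocate(rates, M):
    """Split M tasks across processors in proportion to compute rate W(p_a);
    the last processor absorbs the rounding remainder so the totals match."""
    total = sum(rates)
    theta = [w / total for w in rates]            # hyper-parameters theta_a
    nums = [round(t * M) for t in theta]
    nums[-1] = M - sum(nums[:-1])                 # M - sum_{a=1}^{d-1} Num(p_a)
    return nums

# Illustrative rates, not taken from the patent.
nums = allocate([2, 3, 4, 5], 10)
```

Note that the ordering of the distribution (descending θa) is omitted here; only the amounts are computed.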
After the task allocation amounts are determined, a processor and a voltage are randomly assigned to each job. The finally formed initial scheduling list is S = {(p1,v1), (p1,v2), (p2,v2), (p3,v0), (p3,v1), (p2,v3)}. The specific scheduling process is shown in table 1:
TABLE 1 task scheduling flow sheet
Preferably, each processor executes the genetic algorithm in parallel to perform population iteration updating to obtain an optimal population, specifically:
establishing an optimization model by taking the minimum energy consumption as an objective function and taking the task execution time as a constraint condition;
evaluating the fitness of the current population according to the objective function, and carrying out genetic operations of a selection operator, a crossover operator and a mutation operator on the current population in parallel by each processor; each processor exchanges information to realize population updating;
and judging whether an iteration termination condition is reached, if so, outputting the current population as an optimal population, and otherwise, turning to the previous step for next iteration.
Firstly, an optimization model is established with the minimum task-scheduling energy consumption as the objective, together with a constraint condition; task scheduling adopts an integral scheduling mode in which a voltage is provided at the same time as each task node is allocated to a processor.
The novel parallel genetic algorithm model provided by this embodiment is named NPGA. Each parameter of NPGA is initialized, the best individual is used as the exchange object, and information is exchanged at every evolution generation. To facilitate this exchange, the model is first defined as NPGA = (P, C, F, N, NGA), where P is the set of processors; C is the content exchanged between processors, i.e. the rule by which individuals in a subgroup are exchanged (for example, exchanging the best individual of the subgroup, or a randomly selected individual); F is the frequency of information exchange; N is the amount of information exchanged each time, i.e. the number of exchanged individuals; and NGA is the genetic algorithm run on each processor.
The information exchange is specifically as follows: when the pre-specified information exchange time, i.e. the exchange frequency F, is reached, each processor sends the information C to be exchanged to the other processors while also receiving exchanged information from them. Each processor then replaces one or more of its own individuals with the received exchange information according to a given rule; the number replaced depends on N. Population iteration continues according to these steps until the iteration termination condition is met.
The novel genetic algorithm NGA is as follows. Define a genetic generation counter t and initialize it. Define a chromosome, which must satisfy uniqueness; the scheduling list S is chosen as the chromosome. The initial population is Pl(t), with t initialized to 0. Define the maximum number of generations T, the population size M, the initial crossover rate Pc and the mutation rate Pm.
While t < T, define a parameter l initialized to l = 1, calculate Num(pa), and execute the following steps in parallel in a loop until the termination condition is reached: evaluate the fitness of population Pl(t); apply the selection operator, the crossover operator and the mutation operator to Pl(t). If the defined information-exchange frequency F is reached, exchange information according to the steps above to obtain the offspring population Pl(t+1) = N[Pl(t), C1, C2, …, Ck], where the Ck are the exchanged contents and the size of k depends on the number of exchanges N. The counter t is then incremented.
Specifically, after the initial scheduling list is formed, the fitness values are normalized and the first M individuals are selected proportionally to form the sub-populations. Taking the DAG graph as an example for further analysis: the genetic generation is initialized to 0, and the initial population is S = {(p1,v1), (p1,v2), (p2,v2), (p3,v0), (p3,v1), (p2,v3)}. The maximum number of generations is set to 500 and the population size to 6, so each processor receives a sub-population of size 2; while the generation count is below 500, the NGA algorithm is executed in parallel on each processor. The fitness obtained for the initial population is shown in table 2:
TABLE 2 initialization fitness table
After the selection operation, single-point crossover is performed on the result with a crossover rate of 0.8, and mutation is carried out according to the mutation rate formula. If the exchange frequency is reached, information is exchanged to obtain the next-generation population Pl(t+1) = N[Pl(t), C1, C2, …, Ck], until the 500-iteration limit is reached.
The NPGA algorithm provided in this example is described as follows:
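The NPGA listing appears only as an image in the original publication. As a rough sketch under stated assumptions (the schedule encoding, the toy energy function standing in for EC_total, and every parameter value are illustrative, not from the patent), the island model the text describes, i.e. per-processor NGA generations with periodic exchange of the best individual, could look like:

```python
import random
random.seed(1)

TASKS, PROCS, VOLTS = 6, 3, 4

def energy(s):
    """Stand-in for F(S) = EC_total; always > 0. The real model sums the
    dynamic, static, transmission and sleep energy terms."""
    return 1 + sum(p + v for p, v in s)

def fitness(s):
    return 1.0 / (1.0 + energy(s))       # f(S) = 1 / (1 + F(S)), in (0, 1)

def random_schedule():
    return [(random.randrange(PROCS), random.randrange(VOLTS)) for _ in range(TASKS)]

def nga_generation(pop, pc=0.8, pm=0.1):
    """One NGA step: fitness-proportional selection, single-point crossover, mutation."""
    weights = [fitness(s) for s in pop]
    nxt = []
    while len(nxt) < len(pop):
        a, b = random.choices(pop, weights=weights, k=2)
        if random.random() < pc:                         # single-point crossover
            cut = random.randrange(1, TASKS)
            a = a[:cut] + b[cut:]
        a = [(random.randrange(PROCS), random.randrange(VOLTS))
             if random.random() < pm else gene for gene in a]
        nxt.append(a)
    return nxt

# Island model: one sub-population per processor; every F generations the best
# individual (exchange content C) migrates and replaces a neighbour's worst.
F_EXCHANGE, T_MAX = 5, 50
islands = [[random_schedule() for _ in range(6)] for _ in range(PROCS)]
for t in range(T_MAX):
    islands = [nga_generation(pop) for pop in islands]   # parallel in real NPGA
    if (t + 1) % F_EXCHANGE == 0:
        bests = [max(pop, key=fitness) for pop in islands]
        for i, pop in enumerate(islands):
            pop[pop.index(min(pop, key=fitness))] = bests[(i + 1) % PROCS]
best = max((s for pop in islands for s in pop), key=fitness)
```

Here the islands run sequentially for simplicity; in NPGA each island would run on its own processor.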
preferably, the minimum energy consumption is used as an objective function, the task execution time is used as a constraint condition, and an optimization model is established, specifically:
the total energy consumption is obtained:
ECtotal=ECdynamic+ECstatic+ECtrans+ECsleep
where ECtotal is the total energy consumption, ECdynamic the dynamic energy consumption, ECstatic the static energy consumption, ECtrans the transmission energy consumption, and ECsleep the sleep energy consumption;
establishing an objective function by taking the minimum total energy consumption as a target:
Emin=min(ECtotal);
where Emin denotes the objective function and min() takes the minimum value;
sequentially solving the earliest execution time and the earliest ending time of each operation node:
Start(ti, pa) = max{ End(tj, pb) + Cab : tj ∈ pre(ti) };
End(ti,pa)=Start(ti,pa)+W(ti)/W(pa);
where Start(ti, pa) denotes the earliest start time of job node ti on processor pa, pre(ti) is the set of direct predecessors tj of ti, End(tj, pb) denotes the earliest end time of job node tj on processor pb, and Cab denotes the communication time between the a-th and b-th processors; End(ti, pa) denotes the earliest end time of job node ti on processor pa, W(ti) denotes the job size of node ti, and W(pa) denotes the execution rate of the processor;
the communication time between processors:

Cab = eij / Bab;
where Cab denotes the communication time between the a-th and b-th processors, PLab the communication rate between them, and Bab the communication bandwidth between them;
taking the last executed job node as the exit node, the earliest end time of the exit node is constrained to establish the constraint condition:
End(texit)<deadline;
where texit denotes the exit job node, End(texit) its earliest end time, and deadline the latest completion time.
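The start/end recurrences and the deadline constraint can be computed by one pass over the topologically ordered job queue. A small sketch, assuming Cab = eij / Bab and an entirely invented instance (sizes, rates, bandwidths, assignment and deadline are not from the patent):

```python
def schedule_times(order, assign, sizes, rates, edges, bw):
    """Earliest start/end per the formulas above:
    Start(ti,pa) = max over direct predecessors tj (on pb) of End(tj,pb) + C_ab,
    End(ti,pa)   = Start(ti,pa) + W(ti) / W(pa),
    with C_ab taken as e_ij / B_ab (zero when both jobs share a processor)."""
    start, end = {}, {}
    for ti in order:                      # 'order' is the topological job queue
        pa = assign[ti]
        s = 0.0
        for (tj, tk), e in edges.items():
            if tk == ti:
                pb = assign[tj]
                comm = 0.0 if pa == pb else e / bw[pb][pa]
                s = max(s, end[tj] + comm)
        start[ti] = s
        end[ti] = s + sizes[ti] / rates[pa]
    return start, end

# Tiny illustrative instance (all values assumed).
sizes = {0: 2, 1: 4, 2: 2}
edges = {(0, 1): 4, (0, 2): 2}
rates = [2, 4]
bw = [[0, 2], [2, 0]]
assign = {0: 0, 1: 1, 2: 0}
start, end = schedule_times([0, 1, 2], assign, sizes, rates, edges, bw)
feasible = max(end.values()) < 10        # constraint End(t_exit) < deadline
```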
In this embodiment, DVFS is optimized by also considering the energy consumed during transmission and in the sleep state, achieving a better low-energy-consumption effect. In the transition stage after the previous task has finished and the next has not yet started, the system voltage is chosen by comparing the total energy consumption of keeping the previous task's voltage with that of the sleep state, and the lower-energy option is selected as the voltage for the transition stage.
Specifically, the dynamic power calculation formula is as follows:
Pa,m,k = C * va,m² * fa,k;
the dynamic energy consumption calculation formula is as follows:
ECdynamic=Pa,m,k*t;
where C is the capacitance and Pa,m,k is the dynamic power of processor pa at voltage va,m and frequency fa,k; ECdynamic is the dynamic energy consumption, and t denotes the corresponding time.
The static power calculation formula is as follows:
Pa,s=Ia*va,s
the static energy consumption calculation formula is as follows:
ECstatic=Pa,s*t
where ECstatic is the static energy consumption, Pa,s the static power, Ia the reverse-bias junction current, va,s the voltage assigned to the task, and t the corresponding time.
The transmission energy consumption calculation formula is as follows:
ECtrans = Pa,s * comm(ti,tj) / B(ti,tj);
where ECtrans is the transmission energy consumption, Pa,s the communication power, B(ti,tj) the communication bandwidth between the processor of job ti and the processor of task tj, and comm(ti,tj) the amount of communication data from job ti to task tj.
The sleep energy consumption calculation formula is as follows:
ECsleep=Pa,sleep*t
where ECsleep is the sleep energy consumption, Pa,sleep the sleep power, and t the corresponding time.
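The four energy terms combine into EC_total as defined above. A minimal numeric sketch follows; the C * v² * f dynamic-power form is the usual DVFS expression and is an assumption for the image-only equation, and every numeric value is illustrative:

```python
def ec_dynamic(C, v, f, t):
    """EC_dynamic = P_{a,m,k} * t, with dynamic power assumed as C * v^2 * f."""
    return C * v ** 2 * f * t

def ec_static(I, v, t):
    return I * v * t                     # P_{a,s} = I_a * v_{a,s}

def ec_trans(p_comm, data, bandwidth):
    return p_comm * data / bandwidth     # power * (comm amount / bandwidth)

def ec_sleep(p_sleep, t):
    return p_sleep * t

# Illustrative numbers only: 1 nF effective capacitance, 1.2 V, 1 GHz, etc.
total = (ec_dynamic(1e-9, 1.2, 1e9, 2.0) + ec_static(1e-3, 1.2, 2.0)
         + ec_trans(0.5, 8.0, 4.0) + ec_sleep(0.01, 1.0))
```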
Preferably, the genetic operation of the mutation operator is performed on the current population, specifically:
respectively calculating the ratio of the fitness of each individual of the current population to the average fitness of the population;
judging whether the ratio is greater than 1, if so, reducing the variation rate of the corresponding individual, otherwise, increasing the variation rate of the corresponding individual;
and performing genetic operation of a mutation operator on the current population according to the regulated mutation rate.
Because the present invention solves the minimized objective function problem and the objective function value is greater than 0, the fitness function is chosen to be:
f(S) = 1 / (1 + F(S));
where F(S) is the objective function value, f(S) is the fitness value, and the range of f(S) is (0, 1).
Specifically, the operations of the three genetic operators are as follows:
Selection operator operation: since the objective function value F(S) is greater than 0, as F(S) increases, f(S) decreases; that is, the higher the energy consumption, the lower the fitness and the more likely the individual is eliminated. Conversely, as F(S) decreases, f(S) increases; in other words, the lower the energy consumption, the higher the fitness and the greater the probability of being selected. Because the fitness lies in (0, 1), the fitness values are normalized and the first M individuals are selected proportionally to form the sub-population.
Crossover operator operation: the crossover operator helps pass the chromosome segments of excellent individuals to the offspring; at the same time, it generally performs a global search and can explore unknown regions of the search space.
Mutation operator operation: the mutation operator gives the genetic algorithm a local random-search capability. After the crossover operation, the result of the genetic algorithm approaches the optimal solution, and adding mutation at this point can accelerate convergence to it; mutation also increases the diversity of the population. However, the mutation value adopted should differ at different times, so the invention provides an adaptive calculation method for the mutation operator. If the ratio of an individual's fitness to the average fitness is greater than or equal to 1, the individual tends toward high quality and its mutation rate should be reduced, lowering the mutation probability and preserving its good genes; if the ratio is less than 1, the individual tends toward poor quality and its mutation rate is increased, which helps generate new individuals and raises the chance of improvement, so that the whole population improves gradually. The formula is:
Pm = Pm0 * favg / f;

where Pm is the adjusted mutation rate, Pm0 the initial mutation rate, f the individual's fitness, and favg the average population fitness.
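The adaptive rule can be sketched directly from the qualitative description: individuals above the population average mutate less, those below mutate more. The closed form in the patent is an image, so the inverse scaling (and the base/cap values) here is an assumption:

```python
def adaptive_pm(f, f_avg, base=0.1, cap=0.5):
    """Scale the mutation rate with f_avg / f: if f / f_avg >= 1 the rate drops
    below 'base'; if f / f_avg < 1 it rises, capped at 'cap'."""
    if f <= 0:
        return cap
    return min(cap, base * f_avg / f)

pop_fitness = [0.8, 0.5, 0.2]            # illustrative fitness values
f_avg = sum(pop_fitness) / len(pop_fitness)
rates = [adaptive_pm(f, f_avg) for f in pop_fitness]
```

With the values above, the fittest individual gets the smallest rate and the weakest the largest, matching the stated behaviour.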
This embodiment improves the genetic-algorithm-based scheduling mode, which easily falls into a local optimum, by improving the mutation operator, and thereby realizes real-time task scheduling on a heterogeneous computing platform.
In order to verify the effect of the present invention, the present invention is compared with the prior art from different aspects, and the following specific description is provided:
Fig. 4 shows the runtime of the method on different platforms. As can be seen from fig. 4, at the beginning the advantage of the heterogeneous computing system in data processing is not well reflected: when the number of tasks is small, running on a single multi-core CPU avoids the communication cost between different types of processors and therefore takes less time, whereas running across multiple platforms incurs more inter-processor communication cost and takes longer. As the number of tasks grows, however, the heterogeneous computing platform gradually shows its advantage and far outperforms the single multi-core CPU in running speed. When the task amount is small, the advantages of the FPGA cannot be exploited, so the running time of CPU + GPU is essentially the same as that of CPU + GPU + FPGA; as the task amount increases, the advantage of CPU + GPU + FPGA is gradually revealed.
Fig. 5 compares the running time of the NGA and NPGA algorithms, i.e. the novel genetic algorithm provided by the invention executed in parallel versus non-parallel. As can be seen from fig. 5, as the number of tasks increases, the operating efficiency of the proposed NPGA improves greatly over the serial NGA algorithm. This advantage rests on the rapid development of computer technology: parallel techniques shorten the processing time of the same task, providing a solid guarantee for large-scale computing scenarios. The novel parallel genetic algorithm NPGA is built on the improved NGA algorithm, which redesigns the mutation operator of the traditional GA so that it adapts to the current conditions, providing a guarantee for escaping local optima; the new fitness function also fits the subject of the invention and, compared with the fitness calculation of the traditional GA, increases rigor and goodness of fit, which helps in searching for the global optimum.
Fig. 6 compares the energy consumption of the traditional GA algorithm and the NPGA algorithm of the invention. In terms of energy saving, the parallel genetic algorithm provided by the invention saves energy relative to the traditional genetic algorithm, as follows from the energy-consumption formula: total energy consumption is proportional to running time, and when the number of tasks is large, task scheduling based on the NPGA algorithm takes less time than the traditional GA algorithm, so the energy consumed is correspondingly reduced and the goal of low energy consumption is achieved.
By comparing the running time of different task counts on different execution platforms, and the running time of different task counts under different task scheduling algorithms on the CPU + GPU + FPGA platform, the following conclusions are finally obtained:
1) With a small task amount, the running time of the CPU + GPU + FPGA heterogeneous computing platform is longer than that of the multi-core CPU, because scheduling on the heterogeneous platform carries communication cost. With a large task amount, however, the heterogeneous platform runs faster than the multi-core CPU, because as tasks increase the heterogeneous platform can fully utilize its computing power and the extra communication time is far outweighed by the computation time saved relative to the multi-core CPU.
2) With a small task amount, the running times of the CPU + GPU + FPGA and CPU + GPU heterogeneous computing platforms are almost equal; with a large task amount, the execution time of CPU + GPU + FPGA is better than that of CPU + GPU.
3) Compared with the traditional GA algorithm, the parallelism added in the proposed NPGA algorithm increases the operation speed and shortens the running time, laying a foundation for achieving low energy consumption.
4) As the running time shortens, the goal of low energy consumption follows from the energy-consumption formula. On top of the NPGA algorithm, the adaptive mutation operator and the improved fitness calculation function alleviate the local-optimum problem and provide a new way of approaching the global optimum.
In this embodiment, a CPU + GPU + FPGA heterogeneous computing system is first built with OpenCL. The CPU serves as the host, responsible for distributing and scheduling tasks and collecting the results of the GPU + FPGA computing devices; the GPU and FPGA serve as computing devices responsible for processing the jobs. After the heterogeneous computing system is built, the job execution order is derived from the dependency relationships of the DAG task graph, starting from the job nodes with an in-degree of 0. The task node set is copied to form the T_copy set, an empty queue is constructed, each node with an in-degree of 0 is enqueued and deleted from T_copy, and the communication amount of the directed edges between that node and its direct successors is set to -1. If several nodes have an in-degree of 0, a shortest-job-first rule is followed, and if several such jobs have the same size, one is selected at random; this finally forms the task queue. In the choice of scheduling algorithm, the important differences of heterogeneous computing in computing power, communication time and so on are fully considered, and the novel parallel genetic algorithm NPGA is proposed to solve the problem of the traditional parallel genetic algorithm: there, the number of tasks processed by each processor is fixed, and heterogeneous computing power, communication cost and the like are not considered. The NPGA provided by the invention improves on this by distributing the number of tasks according to the computing power of each processor.
On the basis of the novel parallel genetic algorithm NPGA, each processor runs the novel genetic algorithm NGA. The classic genetic algorithm easily falls into a local optimum; the invention remedies this defect with a novel fitness calculation method and an adaptive mutation-operator probability calculation, which effectively overcome the traditional genetic algorithm's tendency toward local optima. According to the job queue and the number of jobs assigned to each processor, a scheduling queue is generated at random in an integral scheduling mode. The task is modeled from the known conditions: the objective function is the minimum energy consumption, and the constraint is that the latest execution end time be less than the final deadline. On each processor, the population is divided into M individuals according to the number of distributed tasks, and the fitness evaluation, selection operator, crossover operator and mutation operator of the genetic algorithm are executed in parallel. After each generation, every processor receives the exchange information sent by the other processors, which carries the best sub-population information of those processors; at the same time, each processor sends its own best-individual information to the others, achieving information sharing. The genetic iteration is repeated until the maximum number of generations is reached, yielding the optimal scheduling list S. According to this scheduling list, the minimum energy consumption and the optimal distribution are obtained while meeting the real-time requirement.
Example 2
Embodiment 2 of the present invention provides a task scheduling apparatus based on heterogeneous computing, including a processor and a memory, where the memory stores a computer program, and when the computer program is executed by the processor, the task scheduling method based on heterogeneous computing provided in embodiment 1 is implemented.
The task scheduling device based on heterogeneous computing provided in the embodiments of the present invention is used to implement the task scheduling method based on heterogeneous computing, and therefore, the task scheduling device based on heterogeneous computing also has the technical effect, and is not described herein again.
The method provided by the invention can be applied to a heterogeneous system of any form, i.e. CPU + GPU, CPU + FPGA, or CPU + GPU + FPGA. In this embodiment, the CPU + GPU + FPGA heterogeneous system is selected. This heterogeneous computing structure makes full use of the respective advantages of the CPU, GPU and FPGA: the CPU is responsible for logic-control operations, the GPU and FPGA are responsible for parallelized accelerated processing, and in addition the FPGA has low power consumption and supports customizable reprogramming, greatly improving flexibility and performance. However, the prior art rarely uses such a heterogeneous system containing more than two types of processors: although combining heterogeneous processors brings performance improvement, the disadvantage is equally obvious, since switching between different processors increases energy consumption. To address the high energy consumption that accompanies the performance gain, an optimized scheduling algorithm is urgently needed; power is reduced by lowering the processor core frequency, extending the working time, and reducing processor idle time.
Specifically, before task scheduling is carried out, a heterogeneous computing system of CPU + GPU + FPGA is built according to OpenCL. The CPU is used as a host machine and is responsible for distributing and scheduling tasks and counting results of the GPU + FPGA computing equipment. And the GPU and the FPGA are used as computing equipment and are responsible for processing the operation. The specific implementation steps for developing the CPU + GPU + FPGA heterogeneous system by adopting the OpenCL framework are as follows:
firstly, available equipment is obtained, namely, the equipment without tasks at present is obtained, and according to the scheduling principle, the most appropriate equipment is selected from the available equipment to complete loading and initialization.
A context environment is created, the role of the context being responsible for managing the device.
Command queues are created, one for each device, thereby ensuring independence.
And sending the commands to the command queue by using the context according to the context environment created in the step, and executing the commands by the equipment corresponding to the command queue according to the order.
The context environment is used for creating and managing a device cache which is used for storing data to be processed by the program, and one or more devices managed by the context environment can share the data in the device cache.
Because the host machine plays the central, brain-like role in the whole heterogeneous system, its data is written into the device cache so that the other devices can smoothly receive the data it shares.
The source program file is acquired in preparation for later task execution.
And creating a device program, and writing the device program by the OpenCL framework, wherein the part of the program can run on a corresponding device.
And acquiring the corresponding parameter configuration of the equipment program, and initializing the parameters for the smooth execution of the subsequent task.
The index space, workgroup, and working instance also initialize their parameters in preparation for smooth execution of subsequent computational tasks.
And the preparation work is completely finished, and the equipment executes the task according to the equipment program.
After the computing equipment completes the computation, the computation result is written into the local memory, and meanwhile, the computation result is returned to the host machine.
And the host machine receives a result returned by the computing equipment, reads the output cache, acquires the Simon, namely the completion of the current computing task of the computing result, releases resources and waits for the next task.
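The host/device flow described above can be sketched in plain Python, with ordinary queues standing in for OpenCL command queues (the class and function names below are illustrative stand-ins, not the OpenCL API):

```python
from queue import Queue

class Device:
    """Stand-in for an OpenCL compute device (GPU or FPGA)."""
    def __init__(self, name):
        self.name = name
        self.commands = Queue()   # one command queue per device
        self.busy = False

class Context:
    """Stand-in for an OpenCL context: manages devices and a shared cache."""
    def __init__(self, devices):
        self.devices = devices
        self.cache = {}           # device cache shared by all managed devices

def dispatch(ctx, task_id, data):
    """Host side: pick an idle device and enqueue a command for it."""
    idle = [d for d in ctx.devices if not d.busy]
    dev = idle[0]                 # "most suitable" reduced to "first idle" here
    ctx.cache[task_id] = data     # host writes its data into the device cache
    dev.commands.put(task_id)
    dev.busy = True
    return dev

def execute(ctx, dev, kernel):
    """Device side: pop the next command in order, run it, return the result."""
    task_id = dev.commands.get()
    result = kernel(ctx.cache[task_id])
    dev.busy = False              # release resources, wait for the next task
    return result                 # returned to the host

ctx = Context([Device("GPU"), Device("FPGA")])
dev = dispatch(ctx, "t0", [1, 2, 3])
print(execute(ctx, dev, sum))     # prints 6
```

In a real OpenCL host program these roles are played by clCreateContext, clCreateCommandQueue, clCreateBuffer, and the clEnqueue* calls; the sketch only mirrors the control flow of the steps above.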
Example 3
Embodiment 3 of the present invention provides a computer storage medium on which a computer program is stored; when executed by a processor, the computer program implements the heterogeneous-computing-based task scheduling method provided in embodiment 1.
The computer storage medium provided by this embodiment of the invention is used to implement the task scheduling method based on heterogeneous computing, and therefore has the technical effects of that method; the details are not repeated herein.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention.

Claims (10)

1. A task scheduling method based on heterogeneous computing is characterized by comprising the following steps:
establishing a DAG model of a task to be scheduled, and establishing an operation queue according to the DAG model;
establishing a network topological graph among heterogeneous processors, and distributing initial task amount for each processor according to the calculation speed of each processor;
randomly distributing tasks in the job queue to each processor according to the initial task amount, randomly distributing initial voltage to each processor, and constructing an initial scheduling list;
generating an initial population according to the initial scheduling list, initializing parameters of a genetic algorithm, and executing the genetic algorithm in parallel by each processor to perform population iteration updating to obtain an optimal population;
and acquiring a task scheduling list corresponding to the optimal population as an optimal task scheduling list.
2. The task scheduling method based on heterogeneous computing according to claim 1, wherein the establishing of the DAG model of the task to be scheduled specifically comprises:
establishing the DAG model by taking the operations contained in the task to be scheduled as nodes of the DAG model, taking the operation execution time as the node attribute, establishing directed edges between the nodes according to the execution-order dependency relationships between the operations, and taking the communication traffic between the operations as the directed-edge attribute.
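As an illustrative sketch (the jobs, times, and traffic values below are made up for demonstration, not from the patent), the DAG model of claim 2 can be held in two plain dictionaries:

```python
# node attribute: operation -> execution time (made-up values)
exec_time = {"t1": 4.0, "t2": 2.5, "t3": 3.0, "t4": 1.5}

# directed-edge attribute: (predecessor, successor) -> communication traffic
comm = {("t1", "t2"): 10, ("t1", "t3"): 6, ("t2", "t4"): 8, ("t3", "t4"): 4}

def successors(node):
    """Operations that depend on `node` finishing first."""
    return [v for (u, v) in comm if u == node]

def in_degree(node):
    """Number of operations that must finish before `node` can start."""
    return sum(1 for (u, v) in comm if v == node)

print(successors("t1"))   # ['t2', 't3']
print(in_degree("t4"))    # 2
```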
3. The task scheduling method based on heterogeneous computing according to claim 2, wherein the job queue is established according to the DAG model, specifically:
copying all nodes in the DAG model to obtain a node set;
screening out the nodes with an in-degree of 0; if there is only one node with an in-degree of 0, putting it directly into the job queue; if there are multiple nodes with an in-degree of 0, further screening out the node with the smallest job size; if there is only one node with the smallest job size, putting it directly into the job queue; and if there are multiple nodes with the smallest job size, randomly selecting one of them to put into the job queue;
deleting the enqueued node from the node set, and deleting the dependency relationships of its successor nodes on the enqueued node; judging whether the current node set is empty; if so, outputting the job queue, and if not, returning to the previous step to enqueue the next node.
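The enqueue procedure of claim 3 can be sketched as follows (the data values are hypothetical; where the claim chooses randomly among nodes of equal smallest job size, this sketch simply takes the first minimum):

```python
def build_job_queue(nodes, edges, job_size):
    """Repeatedly enqueue a node with no remaining predecessors, breaking
    ties by the smallest job size, then delete the node and the
    dependencies it imposes on its successors."""
    remaining = set(nodes)          # copy of all DAG nodes
    deps = set(edges)               # remaining dependency edges
    queue = []
    while remaining:
        ready = [n for n in remaining if all(v != n for (u, v) in deps)]
        nxt = min(ready, key=lambda n: job_size[n])   # smallest job size wins
        queue.append(nxt)
        remaining.remove(nxt)
        deps = {(u, v) for (u, v) in deps if u != nxt}
    return queue

nodes = ["t1", "t2", "t3", "t4"]
edges = [("t1", "t2"), ("t1", "t3"), ("t2", "t4"), ("t3", "t4")]
size = {"t1": 5, "t2": 1, "t3": 2, "t4": 3}
print(build_job_queue(nodes, edges, size))   # ['t1', 't2', 't3', 't4']
```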
4. The task scheduling method based on heterogeneous computing according to claim 1, wherein the network topology graph between heterogeneous processors is established, specifically:
establishing the network topology graph by taking each processor as a node of the graph, taking the execution speed of the processor as the node attribute, establishing undirected edges between the nodes according to whether the processors can communicate with each other, and taking the communication speed between the processors as the undirected-edge attribute.
5. The task scheduling method based on heterogeneous computing according to claim 1, wherein the allocating of the initial task amount to each processor according to the computing speed of each processor specifically comprises:
according to the calculation speed of each processor, calculating the hyper-parameter of each processor:
θ_a = W(p_a) / ∑_{b=1}^{d} W(p_b);
wherein θ_a is the hyper-parameter of the a-th processor, W(p_a) is the computation rate of the a-th processor, W(p_b) is the computation rate of the b-th processor, 1 ≤ a ≤ d, b = 1, …, d, and d is the number of processors;
allocating initial task amount to each processor according to the hyper-parameters:
Num(p_a) = θ_a * M;
wherein Num(p_a) is the initial task amount allocated to the a-th processor and M is the total number of tasks.
6. The task scheduling method based on heterogeneous computing according to claim 1, wherein each processor executes a genetic algorithm in parallel to perform population iteration updating to obtain an optimal population, specifically:
establishing an optimization model by taking the minimum energy consumption as an objective function and taking the task execution time as a constraint condition;
evaluating the fitness of the current population according to the objective function, and carrying out genetic operations of a selection operator, a crossover operator and a mutation operator on the current population in parallel by each processor; each processor exchanges information to realize population updating;
and judging whether an iteration termination condition is reached, if so, outputting the current population as an optimal population, and otherwise, turning to the previous step for next iteration.
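A serial, single-process sketch of the iteration in claim 6 is given below; two sub-populations stand in for the per-processor populations, the fitness function and all parameter values are illustrative, and the periodic copy of the best individual models the information exchange between processors:

```python
import random

def evolve(fitness, pops, generations=50, migrate_every=10, pm=0.1):
    """Each sub-population applies selection, crossover, and mutation;
    every few generations the global best individual is copied into
    every sub-population (the information exchange of claim 6)."""
    for g in range(generations):
        for i, pop in enumerate(pops):
            pop.sort(key=fitness)                   # lower fitness = less energy
            parents = pop[: len(pop) // 2]          # selection operator (elitist)
            children = []
            while len(parents) + len(children) < len(pop):
                a, b = random.sample(parents, 2)
                cut = random.randrange(1, len(a))   # one-point crossover operator
                child = a[:cut] + b[cut:]
                if random.random() < pm:            # mutation operator
                    j = random.randrange(len(child))
                    child[j] = random.randint(0, 3) # reassign job to a processor
                children.append(child)
            pops[i] = parents + children
        if g % migrate_every == 0:                  # population update by exchange
            best = min((min(p, key=fitness) for p in pops), key=fitness)
            for p in pops:
                p[-1] = best[:]
    return min((min(p, key=fitness) for p in pops), key=fitness)

random.seed(0)
# toy encoding: an individual assigns each of 6 jobs to one of 4 processors,
# and the stand-in "energy" is simply the sum of the assigned processor indices
pops = [[[random.randint(0, 3) for _ in range(6)] for _ in range(8)] for _ in range(2)]
print(sum(evolve(sum, pops)))
```

Because the sorted better half is always carried over, the best individual found so far is never lost between generations.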
7. The task scheduling method based on heterogeneous computing according to claim 6, wherein an optimization model is established with a minimum energy consumption as an objective function and a task execution time as a constraint condition, and specifically comprises:
and (3) obtaining total energy consumption:
ECtotal=ECdynamic+ECstatic+ECtrans+ECsleep
wherein, ECtotalFor total energy consumption, ECdynamicFor dynamic energy consumption, ECstaticFor static energy consumption, ECtransFor transmission of energy consumption, ECsleepEnergy consumption for sleep;
establishing an objective function by taking the minimum total energy consumption as a target:
E_min = min(EC_total);
wherein E_min represents the objective function and min() represents taking the minimum value;
sequentially calculating the earliest execution time and the earliest ending time of each operation node, taking the operation node executed last as an exit node, constraining the earliest ending time of the exit node, and establishing a constraint condition:
End(t_exit) < deadline;
wherein t_exit represents the exit job node, End(t_exit) represents the earliest ending time of the exit job node t_exit, and deadline represents the latest completion deadline.
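The energy model and deadline constraint of claim 7 reduce to the following sketch (all numeric values are made up for demonstration):

```python
def total_energy(dynamic, static, trans, sleep):
    """EC_total = EC_dynamic + EC_static + EC_trans + EC_sleep (claim 7)."""
    return dynamic + static + trans + sleep

def meets_deadline(end_times, deadline):
    """Constraint End(t_exit) < deadline: the exit node is the job that
    finishes last, so its earliest ending time is the largest end time."""
    return max(end_times) < deadline

print(total_energy(12.0, 3.0, 1.5, 0.5))    # 17.0
print(meets_deadline([4.0, 7.5, 9.0], 10.0))  # True
```

A candidate schedule is then evaluated by its total energy, and discarded if its exit node misses the deadline.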
8. The task scheduling method based on heterogeneous computing according to claim 6, wherein the genetic operation of a mutation operator is performed on the current population, specifically:
respectively calculating the ratio of the fitness of each individual of the current population to the average fitness of the population;
judging whether the ratio is greater than 1; if so, reducing the mutation rate of the corresponding individual, and otherwise increasing the mutation rate of the corresponding individual;
and performing genetic operation of a mutation operator on the current population according to the regulated mutation rate.
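The adaptive mutation rate of claim 8 can be sketched as below; the base rate and scaling factor are illustrative choices, not values from the patent:

```python
def adapted_mutation_rates(fitnesses, base_rate=0.1, factor=0.5):
    """Claim 8 sketch: individuals whose fitness-to-average ratio exceeds 1
    get a reduced mutation rate, the rest an increased one."""
    avg = sum(fitnesses) / len(fitnesses)
    return [base_rate * factor if f / avg > 1 else base_rate / factor
            for f in fitnesses]

# fitnesses 2.0, 1.0, 3.0 -> average 2.0; only the third exceeds it
print(adapted_mutation_rates([2.0, 1.0, 3.0]))   # [0.2, 0.2, 0.05]
```

This keeps strong individuals stable while pushing weak ones to explore, which is the usual rationale for fitness-adaptive mutation.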
9. A task scheduling apparatus based on heterogeneous computing, comprising a processor and a memory, wherein the memory stores a computer program, and the computer program, when executed by the processor, implements the task scheduling method based on heterogeneous computing according to any one of claims 1 to 8.
10. A computer storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the heterogeneous computing based task scheduling method according to any one of claims 1 to 8.
CN202011245253.7A 2020-11-10 2020-11-10 Task scheduling method and device based on heterogeneous computing Pending CN112328380A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011245253.7A CN112328380A (en) 2020-11-10 2020-11-10 Task scheduling method and device based on heterogeneous computing

Publications (1)

Publication Number Publication Date
CN112328380A true CN112328380A (en) 2021-02-05

Family

ID=74317869

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011245253.7A Pending CN112328380A (en) 2020-11-10 2020-11-10 Task scheduling method and device based on heterogeneous computing

Country Status (1)

Country Link
CN (1) CN112328380A (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102866912A (en) * 2012-10-16 2013-01-09 首都师范大学 Single-instruction-set heterogeneous multi-core system static task scheduling method
CN106250650A (en) * 2016-08-15 2016-12-21 北京理工大学 The resource allocation and optimization method of model in high flux emulation
CN107133091A (en) * 2017-05-08 2017-09-05 武汉轻工大学 The cloud workflow task dispatching method being classified based on top-down task
CN108416465A (en) * 2018-01-31 2018-08-17 杭州电子科技大学 A kind of Workflow optimization method under mobile cloud environment
CN108762927A (en) * 2018-05-29 2018-11-06 武汉轻工大学 The multiple target method for scheduling task of mobile cloud computing
CN108829501A (en) * 2018-05-18 2018-11-16 天津科技大学 A kind of batch processing scientific workflow task scheduling algorithm based on improved adaptive GA-IAGA
CN109960576A (en) * 2019-03-29 2019-07-02 北京工业大学 A kind of low energy consumption task scheduling strategy towards CPU-GPU isomery
CN110908782A (en) * 2019-11-01 2020-03-24 湖北省楚天云有限公司 Genetic algorithm optimization-based packaging type distributed job task scheduling method and system
CN111061569A (en) * 2019-12-18 2020-04-24 北京工业大学 Heterogeneous multi-core processor task allocation and scheduling strategy based on genetic algorithm
CN111209095A (en) * 2019-08-20 2020-05-29 杭州电子科技大学 Pruning method based on tree search in DAG parallel task scheduling

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113127167B (en) * 2021-03-18 2023-11-03 国家卫星气象中心(国家空间天气监测预警中心) Heterogeneous resource intelligent parallel scheduling method based on improved genetic algorithm
CN113127167A (en) * 2021-03-18 2021-07-16 国家卫星气象中心(国家空间天气监测预警中心) Heterogeneous resource intelligent parallel scheduling method based on improved genetic algorithm
CN113327442A (en) * 2021-04-30 2021-08-31 广州中国科学院软件应用技术研究所 Cooperative control system and method based on end cloud fusion
WO2022236834A1 (en) * 2021-05-14 2022-11-17 Alipay (Hangzhou) Information Technology Co., Ltd. Method and system for scheduling tasks
CN113448736A (en) * 2021-07-22 2021-09-28 东南大学 Task mapping method for approximate computation task on multi-core heterogeneous processing platform based on energy and QoS joint optimization
CN113448736B (en) * 2021-07-22 2024-03-19 东南大学 Task mapping method based on energy and QoS joint optimization for approximate calculation task on multi-core heterogeneous processing platform
CN113568730A (en) * 2021-08-03 2021-10-29 北京八分量信息科技有限公司 Constraint scheduling method and device for heterogeneous tasks and related products
CN114020476A (en) * 2021-12-30 2022-02-08 荣耀终端有限公司 Job processing method, device and medium
CN114020476B (en) * 2021-12-30 2022-06-03 荣耀终端有限公司 Job processing method, device and medium
CN114092073A (en) * 2022-01-21 2022-02-25 苏州浪潮智能科技有限公司 Method, system and device for converting undirected weighted data graph into DAG task graph
CN114092073B (en) * 2022-01-21 2022-04-22 苏州浪潮智能科技有限公司 Method, system and device for converting undirected weighted data graph into DAG task graph
CN115237582B (en) * 2022-09-22 2022-12-09 摩尔线程智能科技(北京)有限责任公司 Method for processing multiple tasks, processing equipment and heterogeneous computing system
CN115237582A (en) * 2022-09-22 2022-10-25 摩尔线程智能科技(北京)有限责任公司 Method for processing multiple tasks, processing equipment and heterogeneous computing system
CN117056089A (en) * 2023-10-11 2023-11-14 创瑞技术有限公司 Service dynamic allocation system and method
CN117056089B (en) * 2023-10-11 2024-02-06 创瑞技术有限公司 Service dynamic allocation system and method
CN117453379A (en) * 2023-12-25 2024-01-26 麒麟软件有限公司 Scheduling method and system for AOE network computing tasks in Linux system
CN117453379B (en) * 2023-12-25 2024-04-05 麒麟软件有限公司 Scheduling method and system for AOE network computing tasks in Linux system
CN117556893A (en) * 2024-01-12 2024-02-13 芯动微电子科技(武汉)有限公司 GPU operation GEMM optimization method and device based on parallel genetic algorithm
CN117556893B (en) * 2024-01-12 2024-05-03 芯动微电子科技(武汉)有限公司 GPU operation GEMM optimization method and device based on parallel genetic algorithm
CN117891584A (en) * 2024-03-15 2024-04-16 福建顶点软件股份有限公司 Task parallelism scheduling method, medium and device based on DAG grouping
CN117891584B (en) * 2024-03-15 2024-05-14 福建顶点软件股份有限公司 Task parallelism scheduling method, medium and device based on DAG grouping

Similar Documents

Publication Publication Date Title
CN112328380A (en) Task scheduling method and device based on heterogeneous computing
Abed-Alguni et al. Distributed Grey Wolf Optimizer for scheduling of workflow applications in cloud environments
Guo et al. Cloud resource scheduling with deep reinforcement learning and imitation learning
CN112416585B (en) Deep learning-oriented GPU resource management and intelligent scheduling method
Ahmed et al. Using differential evolution and Moth–Flame optimization for scientific workflow scheduling in fog computing
CN110717574B (en) Neural network operation method and device and heterogeneous intelligent chip
Tantalaki et al. Pipeline-based linear scheduling of big data streams in the cloud
Rabiee et al. Job scheduling in grid computing with cuckoo optimization algorithm
CN115330189A (en) Workflow optimization scheduling method based on improved moth flame algorithm
Wu et al. Adaptive DAG tasks scheduling with deep reinforcement learning
Shafique et al. Minority-game-based resource allocation for run-time reconfigurable multi-core processors
Asghari et al. Combined use of coral reefs optimization and reinforcement learning for improving resource utilization and load balancing in cloud environments
Mojab et al. iCATS: Scheduling big data workflows in the cloud using cultural algorithms
Ma et al. Adaptive stochastic gradient descent for deep learning on heterogeneous cpu+ gpu architectures
CN115016938A (en) Calculation graph automatic partitioning method based on reinforcement learning
Zhou et al. Deep reinforcement learning-based algorithms selectors for the resource scheduling in hierarchical cloud computing
CN117032807A (en) AI acceleration processor architecture based on RISC-V instruction set
Chraibi et al. Makespan optimisation in cloudlet scheduling with improved DQN algorithm in cloud computing
Öz et al. Scalable parallel implementation of migrating birds optimization for the multi-objective task allocation problem
Pérez et al. Parallel/distributed implementation of cellular training for generative adversarial neural networks
Muthu et al. Optimized scheduling and resource allocation using evolutionary algorithms in cloud environment
CN117579701A (en) Mobile edge network computing and unloading method and system
CN116841710A (en) Task scheduling method, task scheduling system and computer storage medium
Kumar et al. EAEFA: An Efficient Energy-Aware Task Scheduling in Cloud Environment
Singh Hybrid genetic, variable neighbourhood search and particle swarm optimisation-based job scheduling for cloud computing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination