CN112256422A - Heterogeneous platform task scheduling method and system based on Q learning - Google Patents


Info

Publication number
CN112256422A
CN112256422A (application CN202011284585.6A)
Authority
CN
China
Prior art keywords
task
learning
processor
tasks
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011284585.6A
Other languages
Chinese (zh)
Other versions
CN112256422B (en)
Inventor
高博
李娜
谢宗甫
岳春生
张锋印
董春宵
马金全
余果
郭璐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Information Engineering University of PLA Strategic Support Force
Original Assignee
Information Engineering University of PLA Strategic Support Force
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Information Engineering University of PLA Strategic Support Force filed Critical Information Engineering University of PLA Strategic Support Force
Priority to CN202011284585.6A priority Critical patent/CN112256422B/en
Publication of CN112256422A publication Critical patent/CN112256422A/en
Application granted granted Critical
Publication of CN112256422B publication Critical patent/CN112256422B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5038 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/086 Learning methods using evolutionary algorithms, e.g. genetic algorithms or genetic programming
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Physiology (AREA)
  • Evolutionary Biology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention belongs to the technical field of heterogeneous multiprocessor computing, and particularly relates to a heterogeneous platform task scheduling method and system based on Q learning. All tasks are taken as the state space of Q learning and the processor set as the action space; the task awaiting allocation is the current state, and an initial task mapping scheme is obtained in Q learning from the execution time required when a task is mapped to the action space. A genetic algorithm model is established, fitness evaluation is performed on the initial task mapping scheme, the individuals copied into the next generation population are set according to fitness, crossover and mutation are applied to the retained individuals, and the optimization efficiency and minimum threshold of the new population are determined; an approximately optimal solution for the mapping of tasks to processors is obtained from the genetic algorithm model. The approximately optimal solution is then converted into the initial pheromone distribution of the ant colony, and the optimal path is found by iterative search of the ant colony algorithm over this distribution, yielding an optimal task scheduling scheme that better improves the performance of the heterogeneous platform.

Description

Heterogeneous platform task scheduling method and system based on Q learning
Technical Field
The invention belongs to the technical field of heterogeneous multiprocessor computing, and particularly relates to a heterogeneous platform task scheduling method and system based on Q learning.
Background
With the continuously rising high-performance computing demands of various signal processing tasks and the rapid development of hardware accelerators, general-purpose processors cannot meet the requirements of strong real-time, large-scale computing, and heterogeneous computing systems are increasingly used to solve complex task processing problems. A heterogeneous system architecture comprises a series of processors with greatly different structures, such as CPUs, GPUs, FPGAs and DSPs, connected through dedicated networks or interfaces to meet the hardware performance requirements of different types of computing tasks and thereby improve resource utilization and computing efficiency. To meet the demands of increasingly complex signal processing tasks, the efficiency and reliability of heterogeneous multiprocessors are critical. Whether a heterogeneous computing system can deliver its high-performance benefits depends on the hardware resource platform architecture, the degree of matching between tasks and processors, and the task scheduling strategy. Scheduling is inherently a multi-objective NP-hard problem, and the dynamic, heterogeneous characteristics of heterogeneous computing systems add further difficulty to task planning. For a given heterogeneous system, however, an efficient scheduling strategy is the key to improving the strong real-time and high-throughput performance of the platform.
Disclosure of Invention
Aiming at the problems of low flexibility, slow convergence and poor predictability in existing scheduling algorithms, the invention provides a Q-learning-based heterogeneous platform task scheduling method and system that can adjust the network search direction in time while balancing local and global search to obtain better results, so that each processor of the heterogeneous platform can deliver maximum efficiency, parallel task processing is facilitated, and the performance of the heterogeneous platform is improved.
According to the design scheme provided by the invention, the heterogeneous platform task scheduling method based on Q learning comprises the following contents:
taking all tasks as the state space of Q learning and the processor set as the action space, taking the task awaiting allocation as the current state, and obtaining an initial task mapping scheme from the execution time required when a task is mapped to the action space in Q learning;
establishing a genetic algorithm model, performing fitness evaluation on the initial task mapping scheme, setting the individuals copied into the next generation population in the genetic algorithm model according to fitness, performing crossover and mutation on the retained individuals, and determining the optimization efficiency and minimum threshold of the new population; and obtaining an approximately optimal solution for the mapping of tasks in the model to processors from the genetic algorithm model;
and converting the approximately optimal solution of the model into the initial pheromone distribution of the ant colony, and iteratively searching for and outputting the optimal path with the ant colony algorithm according to this distribution to obtain the optimal task scheduling scheme.
In the Q-learning-based heterogeneous platform task scheduling method, further, the system application model is represented as G = {V, E, C, L} and the target system model as P = {N, H, W, T}, where V is the task set, E the set of dependency directed edges, C the set of task computation amounts, L the inter-subtask traffic, N the processor set, H the processor characteristics, W the computation overhead, and T the communication overhead of tasks between processors; the state space and action space of Q learning are obtained from the task set and processor set of the two models.
In the Q-learning-based heterogeneous platform task scheduling method, further, the Q-learning agent executes an action in the current state according to an ε-greedy behavior strategy, obtains the Q value of the task mapped to a processor together with an immediate reward, and transitions to a new state; targeting the minimum Q value after each action completes, the action with the minimum Q value in the new state is selected and executed, the result is stored, the learning rate is decayed, the next state is assigned as the current state, the action in the next state is selected according to the ε-greedy behavior strategy, and the iteration proceeds until an initial task mapping scheme is obtained from the stored values.
In the Q-learning-based heterogeneous platform task scheduling method, the invention further comprises: in establishing the genetic algorithm model, encoding the initial task mapping scheme and mapping tasks to the corresponding processors; evaluating fitness according to a fitness evaluation function, copying individuals that meet the fitness value directly into the next generation population, and performing crossover and mutation on the originally retained individuals; and using the iterative process of the genetic algorithm to compare the population optimization efficiency of each generation with a minimum threshold to determine the current population optimization efficiency, until the population optimization efficiencies of a set number of consecutive generations are all smaller than the minimum threshold, whereupon iteration terminates and an approximately optimal solution set of the genetic algorithm model for the mapping of tasks to processors is obtained.
As the Q learning-based heterogeneous platform task scheduling method, the fitness evaluation function is further expressed as:
[fitness evaluation function, rendered as an image in the original]
t = 1, 2, …, where Q(s, a) denotes the execution time of task s mapped on processor a and t denotes the generation number.
In the Q-learning-based heterogeneous platform task scheduling method, further, in each iteration a minimum threshold is set according to the optimization efficiency of the new population; if the population optimization efficiency of a generation is smaller than the minimum threshold, the threshold is replaced by the current population optimization efficiency.
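As an illustration of this stopping rule, the following is a hypothetical Python helper (the patent specifies no code; the patience of four consecutive generations follows the embodiment described later in the document):

```python
def should_terminate(efficiencies, threshold, patience=4):
    """Sketch of the stopping rule: count consecutive generations whose
    population optimization efficiency falls below the minimum threshold,
    replacing the threshold with the current efficiency each time.
    Terminate once `patience` consecutive generations stay below it."""
    below = 0
    for eff in efficiencies:
        if eff < threshold:
            below += 1
            threshold = eff  # replace the threshold with the current efficiency
        else:
            below = 0        # an improving generation resets the count
    return below >= patience
```

With five steadily declining generations the rule fires; a single generation that beats the threshold resets the count.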
In the Q-learning-based heterogeneous platform task scheduling method, further, in the ant colony algorithm the amount of pheromone an ant releases is determined by the execution time the task requires on a processor; the path the ant has traveled at the current moment is recorded in a tabu table according to the selected processors; and by iteratively outputting the optimal ant path, the optimal allocation of tasks to processors and the shortest task execution time are obtained.
Further, based on the above method, the present invention also provides a Q-learning-based heterogeneous platform task scheduling system, comprising an initial mapping module, a fitness evaluation module and an optimal output module, wherein,
the initial mapping module is used for taking all tasks as the state space of Q learning and the processor set as the action space, taking the task awaiting allocation as the current state, and obtaining an initial task mapping scheme from the execution time required when a task is mapped to the action space in Q learning;
the fitness evaluation module is used for creating a genetic algorithm model, evaluating the fitness of the initial task mapping scheme, setting the individuals copied into the next generation population in the genetic algorithm model according to fitness, performing crossover and mutation on the retained individuals, and determining the optimization efficiency and minimum threshold of the new population; an approximately optimal solution for the mapping of tasks in the model to processors is obtained from the genetic algorithm model;
and the optimal output module is used for converting the approximately optimal solution of the model into the initial pheromone distribution of the ant colony and obtaining the optimal task scheduling scheme by iteratively searching for and outputting the optimal path with the ant colony algorithm according to this distribution.
Further, the present invention also provides a computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, performs the method described above.
Further, the present invention also provides a computer device comprising a processor and a memory, wherein the memory stores machine executable instructions capable of being executed by the processor, and the processor executes the machine executable instructions to execute the method.
The invention has the beneficial effects that:
according to the method, an initial solution set is generated by mapping a task set and a processor set through Q learning and is used as initial information of a genetic algorithm GA, and the performance of the whole scheme is improved by accelerating the search rate of the GA; and heuristic variation operation in the GA algorithm is utilized to improve the local search capability of the ant colony algorithm, maintain the diversity of the colony, accelerate the speed of convergence to the optimal solution, maintain better speed in the whole search process, shorten the scheduling length, enable each processor of the heterogeneous platform to exert the maximum efficiency, improve the utilization rate of hardware resources, facilitate the parallel processing of tasks, improve the performance of the heterogeneous platform and have better application prospect.
Description of the drawings:
FIG. 1 is a flowchart illustrating a task scheduling method for a heterogeneous platform in an embodiment;
FIG. 2 is a schematic comparison of existing algorithms in terms of scheduling length and average resource utilization in the embodiment;
FIG. 3 is a flow diagram of a QGA-ACO algorithm for task scheduling of a heterogeneous platform in the embodiment;
FIG. 4 is a schematic representation of the task and target system models in the embodiment;
FIG. 5 is a schematic diagram of the crossover operation in the embodiment;
FIG. 6 is a schematic diagram of the mutation operation in the embodiment;
FIG. 7 is a graph of algorithm rate versus iteration number in the embodiment.
Detailed description of the embodiments:
in order to make the objects, technical solutions and advantages of the present invention clearer and more obvious, the present invention is further described in detail below with reference to the accompanying drawings and technical solutions.
Aiming at the problems of low flexibility, slow convergence and poor predictability in task scheduling algorithms for existing heterogeneous multiprocessor computing platforms, an embodiment of the invention provides a Q-learning-based heterogeneous platform task scheduling method, as shown in FIG. 1, comprising the following contents:
S101, taking all tasks as the state space of Q learning and the processor set as the action space, taking the task awaiting allocation as the current state, and obtaining an initial task mapping scheme from the execution time required when a task is mapped to the action space in Q learning;
S102, establishing a genetic algorithm model, performing fitness evaluation on the initial task mapping scheme, setting the individuals copied into the next generation population in the genetic algorithm model according to fitness, performing crossover and mutation on the retained individuals, and determining the optimization efficiency and minimum threshold of the new population; and obtaining an approximately optimal solution for the mapping of tasks in the model to processors from the genetic algorithm model;
S103, converting the approximately optimal solution of the model into the initial pheromone distribution of the ant colony, and iteratively searching for and outputting the optimal path with the ant colony algorithm according to this distribution to obtain the optimal task scheduling scheme.
A heterogeneous multiprocessor computing system is dynamic and heterogeneous: resources and tasks may join and leave at random, and the processors differ in structure and performance. These high-performance characteristics have made heterogeneous multiprocessor computing one of the development directions of high-performance computing, but they also complicate resource management and task scheduling. The key to exploiting the high computing efficiency of a heterogeneous system is a suitable task scheduling strategy. Traditional scheduling algorithms have high algorithmic or time complexity on complex problems, so heuristic methods such as genetic algorithms and ant colony algorithms are widely applied. However, the initial information of such algorithms is mostly constructed randomly, and the quality of the initial search information is hard to guarantee, which affects the convergence performance of the algorithms to some extent. In this embodiment, feedback information obtained from the interaction of Q-learning with the environment allows the network search direction to be adjusted in time while balancing local and global search for better results. This result serves as the initial information of the GA, compensating for the low individual quality of a randomly initialized genetic population and its inability to use network feedback; combining with the ACO in the later stage shortens iteration time, makes the search more efficient, and allows application tasks to be optimally allocated to processors.
Because a single task scheduling algorithm has shortcomings and cannot fully meet the platform's scheduling requirements, the strengths and weaknesses of different algorithms are combined to complement one another, meeting the hardware performance requirements of different types of computing tasks and improving resource utilization and computing efficiency.
MLSH, ACO and GAACO respectively represent a traditional scheduling algorithm, a heuristic method, and combined optimization; (a) and (b) in FIG. 2 compare the MLSH, ACO and GAACO algorithms in terms of scheduling length and average resource utilization. For small-scale tasks the scheduling lengths of the three algorithms are close, but as the number of tasks grows, the scheduling lengths of the ACO and GAACO algorithms are nearly identical and lower than that of MLSH, because the complexity of the traditional scheduling algorithm is higher than that of the heuristic algorithms and grows exponentially with the number of tasks. In terms of resource utilization, GAACO's average resource utilization is significantly higher than that of ACO and MLSH. The GAACO algorithm combines the universality, scalability and global convergence of the genetic algorithm with the parallelism and high solution precision of the ant colony algorithm, realizing complementary advantages and improving execution efficiency and solution precision. Thus GAACO achieves the best scheduling performance in both respects, yet it does not achieve a significant improvement in scheduling length.
Referring to FIG. 3, the QGA-ACO algorithm of this embodiment can be designed as follows: in the first stage, the initial allocation of tasks to resources is obtained through Q learning, yielding a good scheme and Q values; on this basis a GA model is established, parameters and constraints are determined, and the initial solutions are crossed, mutated and copied to preserve population diversity and improve convergence speed, further optimizing the result. The second stage adopts the ACO algorithm: if the evolution rates of four consecutive generations in the first stage are all below the threshold, the second stage is entered, the first-stage result is converted into the initial pheromone values of the ACO to avoid blind search by the ant colony, and the search iterates according to the pheromone distribution until the termination condition is met.
In the Q-learning-based heterogeneous platform task scheduling method of the embodiment of the present invention, further, the system application model is represented as G = {V, E, C, L} and the target system model as P = {N, H, W, T}, where V is the task set, E the set of dependency directed edges, C the set of task computation amounts, L the inter-subtask traffic, N the processor set, H the processor characteristics, W the computation overhead, and T the communication overhead of tasks between processors; the state space and action space of Q learning are obtained from the task set and processor set of the two models.
Referring to FIG. 4, the system application model is represented as G = {V, E, C, L}, where V = {v1, v2, …, vn} represents the set of signal processing tasks to be processed, E = {e12, e23, …, eij} is the set of dependency directed edges, C = {c1, c2, …, cn} denotes the set of task computation overheads, and L(vi, vj) indicates the communication overhead between subtasks; if tasks vi and vj are mapped to the same node, the communication overhead is 0. Hardware resources may be represented by an undirected graph abstracted as P = {N, H, W, T}, where N = {n1, n2, …} is the processor set, H is the processor profile, W is the execution rate of the processors, and T represents the inter-processor communication rate.
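To make the two models concrete, the following Python sketch represents G = {V, E, C, L} and P = {N, H, W, T} as plain dictionaries and derives the execution and communication overheads described above. All task names, processor names and numbers are invented for illustration; the patent itself specifies no code:

```python
# Hypothetical application model G = {V, E, C, L}:
# V: tasks, E: dependency directed edges, C: computation amount per task,
# L: communication volume between dependent subtasks.
G = {
    "V": ["v1", "v2", "v3", "v4"],
    "E": [("v1", "v2"), ("v1", "v3"), ("v2", "v4"), ("v3", "v4")],
    "C": {"v1": 4.0, "v2": 3.0, "v3": 5.0, "v4": 2.0},
    "L": {("v1", "v2"): 1.0, ("v1", "v3"): 2.0,
          ("v2", "v4"): 1.5, ("v3", "v4"): 0.5},
}

# Hypothetical target system model P = {N, H, W, T}:
# N: processors, H: processor characteristics, W: execution rate,
# T: inter-processor communication rate.
P = {
    "N": ["n1", "n2"],
    "H": {"n1": "CPU", "n2": "DSP"},
    "W": {"n1": 1.0, "n2": 2.0},
    "T": {("n1", "n2"): 0.5},
}

def exec_time(task, proc):
    """Execution time of `task` on `proc`: computation amount / execution rate."""
    return G["C"][task] / P["W"][proc]

def comm_time(ti, tj, pi, pj):
    """Communication overhead; zero when both tasks map to the same node."""
    if pi == pj:
        return 0.0
    rate = P["T"].get((pi, pj)) or P["T"].get((pj, pi))
    return G["L"][(ti, tj)] / rate
```

Under these numbers, task v1 on the faster processor n2 takes 4.0/2.0 = 2.0 time units, and co-located tasks exchange data for free, matching the zero-overhead rule in the model description.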
In the Q-learning-based heterogeneous platform task scheduling method of the embodiment of the present invention, further, the Q-learning agent executes an action in the current state according to an ε-greedy behavior strategy, obtains the Q value of the task mapped to a processor together with an immediate reward, and transitions to a new state; targeting the minimum Q value after each action completes, the action with the minimum Q value in the new state is selected and executed, the result is stored, the learning rate is decayed, the next state is assigned as the current state, the action in the next state is selected according to the ε-greedy behavior strategy, and the iteration proceeds until an initial task mapping scheme is obtained from the stored values.
Stage one of the algorithm shown in FIG. 3 can be designed as follows:
the task set V may be formed into a state space S and the processor set N as an action space a. Task v to be waiting for allocationiAs the current state s, the execution action n of the current stateiIs a.
Step 1: initialize the state-action space of Q-learning, where Q(S, A) = C is the execution time required to map task S to processor A; the search factor is ε, the discount factor γ, and the learning rate α.
Step 2: following the ε-greedy behavior strategy, the agent executes action a with a certain probability in the current state s, obtains the Q value of the task mapped to the processor and an immediate reward r (the larger the Q value, the longer the task's execution time and the smaller the reward), and then transitions to a new state s′. The ε-greedy strategy balances the agent's exploration of the state space against exploitation of the information already obtained, preventing greedy exploitation from falling into local optima and excessive exploration from hurting the algorithm's performance;
Step 3: targeting the minimum Q value after each action completes, select the action a′ with the minimum Q value in state s′ according to formula (1), compute the agent's Q value at position (s, a), and store it in the Q table;
Q(s_t, a_t) ← Q(s_t, a_t) + α[r_{t+1} + γ·min_{a′} Q(s_{t+1}, a′) − Q(s_t, a_t)]    (1)
Step 4: decay the learning rate, assign the next state to the current state (s ← s′), select the action a′ in the next state s′ according to the ε-greedy behavior strategy, and assign it as the action to execute (a ← a′);
Step 5: increment the step count by 1 and judge whether iteration is complete; if not, return to Step 2 and continue; if iteration is complete, the algorithm ends and the initial task mapping scheme is obtained from the Q table.
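The five steps above can be sketched in Python as follows. This is a minimal illustrative interpretation, not the patent's implementation: the state transition (next task in sequence), the use of execution time directly as the cost/reward signal, and all parameter values are assumptions, and `exec_times` stands in for the initialization Q(S, A) = C:

```python
import random

def q_learning_initial_mapping(exec_times, episodes=200, alpha=0.5,
                               gamma=0.9, epsilon=0.2, seed=0):
    """Stage-one sketch: learn an initial task -> processor mapping.

    exec_times[s][a] is the execution time of task s on processor a.
    The Q table is initialized with these times and updated with the
    minimizing variant of the rule in formula (1):
        Q(s,a) <- Q(s,a) + alpha*(r + gamma*min_a' Q(s',a') - Q(s,a))
    Costs play the role of (negative) rewards, so smaller Q is better.
    """
    rng = random.Random(seed)
    tasks = list(exec_times)
    procs = list(exec_times[tasks[0]])
    Q = {s: dict(exec_times[s]) for s in tasks}   # Q(S, A) = C

    for _ in range(episodes):
        for i, s in enumerate(tasks):
            # epsilon-greedy: explore with probability epsilon,
            # otherwise exploit the minimum-Q action
            if rng.random() < epsilon:
                a = rng.choice(procs)
            else:
                a = min(Q[s], key=Q[s].get)
            r = exec_times[s][a]
            if i + 1 < len(tasks):                # transition to the next state
                target = r + gamma * min(Q[tasks[i + 1]].values())
            else:
                target = r                        # terminal state
            Q[s][a] += alpha * (target - Q[s][a])
        alpha *= 0.999                            # decay the learning rate

    # initial mapping scheme read off the Q table: minimum-Q action per task
    return {s: min(Q[s], key=Q[s].get) for s in tasks}
```

On a toy instance where processor n2 is faster for every task, the learned mapping assigns both tasks to n2, as expected from the minimizing update.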
In the Q-learning-based heterogeneous platform task scheduling method of the embodiment of the present invention, further, in establishing the genetic algorithm model, the initial task mapping scheme is encoded and tasks are mapped to the corresponding processors; fitness is evaluated according to a fitness evaluation function, individuals meeting the fitness value are copied directly into the next generation population, and crossover and mutation are performed on the originally retained individuals; the iterative process of the genetic algorithm compares the population optimization efficiency of each generation with a minimum threshold to determine the current population optimization efficiency, until the population optimization efficiencies of a set number of consecutive generations are all smaller than the minimum threshold, whereupon iteration terminates and an approximately optimal solution set of the genetic algorithm model for the mapping of tasks to processors is obtained. Further, in each iteration a minimum threshold is set according to the optimization efficiency of the new population; if the population optimization efficiency of a generation is smaller than the minimum threshold, the threshold is replaced by the current population optimization efficiency.
The initial population (the mapping scheme) is encoded (encoding rule: chromosome[p] = q, i.e., task p is mapped to processor q) and evaluated for fitness; individuals with high fitness values are copied directly into the next generation population, and crossover and mutation are performed on the originally retained individuals. The fitness evaluation function may be expressed as:
[fitness evaluation function, rendered as an image in the original]
t = 1, 2, …, where Q(s, a) denotes the execution time of task s mapped on processor a and t denotes the generation number. Crossover operator: a crossover point is randomly selected on the chromosome, dividing each of the two parent chromosomes into two parts; as shown in FIG. 5, the gene segment to the left of the crossover point comes from the corresponding side of one parent chromosome and the segment to the right is copied from the corresponding genes of the other chromosome, yielding two new chromosomes. The heuristic mutation operation, as shown in FIG. 6, randomly selects a locus i of a chromosome, searches from that position for the last successor succ(i) of i, randomly selects a locus j ∈ (i, succ(i)), and swaps the positions of i and j to form a new chromosome. The optimization efficiency of the new population obtained by the genetic algorithm is collected and taken as the minimum threshold. The iteration count is incremented by 1 and the procedure returns to the fitness evaluation step of the genetic algorithm; the population optimization efficiency of each generation is compared with the minimum threshold, and if it is smaller than the threshold, the threshold is replaced by the current population optimization efficiency; when the population optimization efficiencies of 4 consecutive generations are all below the threshold, the GA terminates.
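A minimal Python sketch of this GA stage follows, under stated assumptions: the chromosome[p] = q encoding and the crossover/mutation/elitism roles follow the description above, but the fitness function (inverse total mapped execution time), the selection scheme, and all parameters are illustrative stand-ins rather than the patent's exact operators:

```python
import random

def ga_refine(mapping_pop, exec_times, generations=50, pc=0.8, pm=0.1, seed=1):
    """Refine a population of task->processor mappings with a simple GA.

    A chromosome is a dict task -> processor (chromosome[p] = q means
    task p runs on processor q).  Fitness is the inverse of the total
    mapped execution time, so shorter schedules score higher.
    """
    rng = random.Random(seed)
    tasks = list(exec_times)
    procs = list(exec_times[tasks[0]])

    def fitness(ch):
        return 1.0 / sum(exec_times[t][ch[t]] for t in tasks)

    pop = [dict(ch) for ch in mapping_pop]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        next_pop = [dict(pop[0])]                 # elitism: copy the best forward
        while len(next_pop) < len(pop):
            # pick two parents from the fitter half of the population
            p1, p2 = rng.sample(pop[: max(2, len(pop) // 2)], 2)
            child = dict(p1)
            if rng.random() < pc:                 # single-point crossover
                cut = rng.randrange(1, len(tasks))
                for t in tasks[cut:]:
                    child[t] = p2[t]
            if rng.random() < pm:                 # mutation: reassign one task
                child[rng.choice(tasks)] = rng.choice(procs)
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness)
```

Because the elite individual is copied forward unmodified each generation, the best mapping present in the initial population can never be lost, mirroring the "copy directly into the next generation" rule.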
As the heterogeneous platform task scheduling method based on Q learning in the embodiment of the present invention, further, in the ant colony algorithm, the amount of pheromone an ant releases is determined by the execution time the task requires on the processor; the path the ant has traveled at the current moment is recorded in a tabu table according to the selected processor; and the optimal assignment of tasks to processors and the shortest task execution time are obtained by iteratively outputting the ants' optimal path.
Stage two of the algorithm, shown in fig. 3, can be implemented as follows:
First, the approximate optimal solution set {chromosome[i] = j} obtained in stage one is converted into the initial pheromone distribution of the ant colony.
Step 1: initialize the ant colony parameters: the colony size m, the weights α and β, the pheromone volatilization factor ρ, the pheromone intensity τij, the adjustment factor μ, and the maximum number of iterations T. Assume that τ is the pheromone matrix, η is the heuristic function, k is the k-th ant, and l is the processor that ant k can select next.
Step 2: place m ants on n nodes (m > n). The shorter the execution time required by task i on processor j, the more pheromone the ants release. The paths the ants have traveled at the current moment (i.e. the selected processors) are recorded in the tabu table; the next node is selected according to the probability selection formula (2), stored in the tabu table, and the ants' travel paths are recorded.
(Formula (2), probability selection — equation image not reproduced in this text.)
Step 3: after the m ants have traversed the n nodes, all ants finish one round of search and the pheromone is updated according to formula (3); this repeats until iteration terminates, whereupon the optimal path is output.
(Formula (3), pheromone update — equation image not reproduced in this text.)
Therefore, the optimal distribution of the tasks to the processors and the shortest time for executing the tasks can be obtained.
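Steps 1 to 3 can be sketched as follows. Because formulas (2) and (3) appear only as images in this text, the sketch assumes the standard ant-system forms (selection probability proportional to τ^α · η^β with η = 1/execution time, and pheromone update τ ← (1 − ρ)τ + Q/L); the tabu table is implicit here, since each ant assigns every task exactly once:

```python
import random

def aco_assign(exec_time, tau, alpha=1.0, beta=2.0, rho=0.1, Q=1.0,
               n_ants=10, n_iters=50):
    """Assign each task to a processor by ant colony search.
    exec_time[i][j]: execution time of task i on processor j.
    tau: initial pheromone matrix, e.g. seeded from the GA stage's
    near-optimal solution set (assumption)."""
    n_tasks, n_procs = len(exec_time), len(exec_time[0])
    # heuristic function: shorter execution time => stronger attraction
    eta = [[1.0 / exec_time[i][j] for j in range(n_procs)]
           for i in range(n_tasks)]
    best_assign, best_len = None, float("inf")
    for _ in range(n_iters):
        tours = []
        for _k in range(n_ants):
            assign = []
            for i in range(n_tasks):
                # probability selection, assumed form of formula (2)
                w = [tau[i][j] ** alpha * eta[i][j] ** beta
                     for j in range(n_procs)]
                assign.append(random.choices(range(n_procs), weights=w)[0])
            # schedule length: the most-loaded processor's total time
            load = [0.0] * n_procs
            for i, j in enumerate(assign):
                load[j] += exec_time[i][j]
            tours.append((assign, max(load)))
        # pheromone evaporation and deposit, assumed form of formula (3)
        for i in range(n_tasks):
            for j in range(n_procs):
                tau[i][j] *= (1.0 - rho)
        for assign, length in tours:
            for i, j in enumerate(assign):
                tau[i][j] += Q / length
            if length < best_len:
                best_assign, best_len = assign, length
    return best_assign, best_len
```

Shorter schedules deposit more pheromone, so the positive feedback concentrates later ants on the better task-to-processor assignments.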
The GA algorithm converges quickly in the early stage of the search, but because the early search is not adjusted promptly according to feedback information, a large amount of redundant information generated in the later stage occupies resources and the rate drops markedly after a certain number of iterations. The ACO algorithm, by contrast, has little and sparsely distributed pheromone in the early stage, so its random search is slow; once pheromone has accumulated to a certain level and is distributed comprehensively in the later stage, heuristic search accelerates convergence. As the rate-versus-iteration analysis shown in fig. 7 indicates, the QGA-ACO algorithm proposed in the embodiment of the present invention first uses the Q-learning algorithm to raise the initial search rate of the GA algorithm, optimizes through crossover, mutation and similar operations, and, after iterating to a given generation, runs the ACO algorithm, exploiting the parallel search capability and positive feedback of the ant colony algorithm to find an optimal solution of the problem; a good rate is thereby maintained throughout the search and the scheduling length is shortened.
Further, based on the foregoing method, an embodiment of the present invention further provides a Q-learning based heterogeneous platform task scheduling system, comprising: an initial mapping module, a fitness evaluation module and an optimal output module, wherein,
the initial mapping module is used for taking all tasks as the state space of Q learning and the processor set as the action space, taking the task to be allocated as the current state, and obtaining a task initial mapping scheme according to the execution time required by the task mapped to the action space in Q learning;
the fitness evaluation module is used for creating a genetic algorithm model, evaluating the fitness of the task initial mapping scheme, selecting the individuals copied into the next-generation population of the genetic algorithm model according to fitness, performing crossover and mutation on the retained individuals, and determining the optimization efficiency and minimum threshold of the new population; and obtaining an approximate optimal solution for the mapping of tasks in the model to processors according to the genetic algorithm model;
and the optimal output module is used for converting the model's approximate optimal solution into the initial pheromone distribution of the ant colony and, according to this distribution, obtaining the optimal task scheduling scheme by iteratively searching for and outputting the optimal path through the ant colony algorithm.
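The initial mapping module's Q-learning procedure can be sketched minimally as follows. The excerpt does not reproduce the exact Q-value update rule or reward, so this sketch makes assumptions: execution time is treated as a cost to be minimised, tasks are cycled through as states, the learning rate is decayed each episode, and all parameter values are illustrative:

```python
import random

def q_learning_initial_mapping(exec_time, episodes=200, alpha=0.5,
                               gamma=0.9, epsilon=0.2):
    """States are the tasks, actions are the processors; the Q value
    of (task, processor) tracks the (discounted) execution cost, and
    the initial mapping picks the minimum-Q processor per task."""
    n_tasks, n_procs = len(exec_time), len(exec_time[0])
    Q = [[0.0] * n_procs for _ in range(n_tasks)]
    for _ in range(episodes):
        for s in range(n_tasks):        # current state: task awaiting allocation
            # epsilon-greedy behaviour strategy over the action (processor) set
            if random.random() < epsilon:
                a = random.randrange(n_procs)
            else:
                a = min(range(n_procs), key=lambda j: Q[s][j])
            nxt = (s + 1) % n_tasks     # transfer to the next task's state (assumed)
            # update towards cost plus discounted minimum Q of the new state
            target = exec_time[s][a] + gamma * min(Q[nxt])
            Q[s][a] += alpha * (target - Q[s][a])
        alpha *= 0.99                   # attenuate the learning rate
    # initial mapping scheme: minimum-Q processor for each task
    return [min(range(n_procs), key=lambda j: Q[s][j]) for s in range(n_tasks)]
```

The returned mapping is only a starting point; in the method above it is then handed to the GA stage for encoding and fitness evaluation.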
Unless specifically stated otherwise, the relative arrangement of the components and steps, the numerical expressions, and the numerical values set forth in these embodiments do not limit the scope of the present invention.
Based on the above method or system, an embodiment of the present invention further provides a computer device, including: one or more processors; a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the system or perform the method described above.
Based on the system, the embodiment of the invention further provides a computer readable medium, on which a computer program is stored, wherein the program, when executed by a processor, performs the method.
The device provided by the embodiment of the present invention has the same implementation principle and technical effect as the system embodiment, and for the sake of brief description, reference may be made to the corresponding content in the system embodiment for the part where the device embodiment is not mentioned.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to the corresponding processes in the foregoing system embodiments, and are not described herein again.
In all examples shown and described herein, any particular value should be construed as merely exemplary, and not as a limitation, and thus other examples of example embodiments may have different values.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the system according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
Finally, it should be noted that the above-mentioned embodiments are only specific embodiments of the present invention, used to illustrate its technical solutions rather than to limit them, and the protection scope of the present invention is not limited thereto. Although the present invention is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone familiar with the art may still modify the technical solutions described in the foregoing embodiments, or easily conceive of changes, or make equivalent substitutions of some technical features, within the technical scope of the present disclosure; such modifications, changes or substitutions do not depart from the spirit and scope of the embodiments of the present invention and shall be construed as being included therein. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A heterogeneous platform task scheduling method based on Q learning is characterized by comprising the following contents:
taking all tasks as the state space of Q learning and the processor set as the action space, taking the task to be allocated as the current state, and obtaining a task initial mapping scheme according to the execution time required by the task mapped to the action space in Q learning;
establishing a genetic algorithm model, carrying out fitness evaluation on the task initial mapping scheme, selecting the individuals copied into the next-generation population of the genetic algorithm model according to fitness, carrying out crossover and mutation on the retained individuals, and determining the optimization efficiency and minimum threshold of the new population; and obtaining an approximate optimal solution for the mapping of tasks in the model to processors according to the genetic algorithm model;
and converting the model's approximate optimal solution into the initial pheromone distribution of the ant colony, and, according to this distribution, iteratively searching for and outputting the optimal path through the ant colony algorithm to obtain the optimal task scheduling scheme.
2. The Q-learning-based heterogeneous platform task scheduling method according to claim 1, wherein a system application model is represented as G = {V, E, C, L} and a target system model as P = {N, H, W, T}, where V is the task set, E is the set of dependent directed edges, C is the set of task computation amounts, L is the inter-subtask traffic, N is the set of processors, H is the processor characteristics, W is the computation overhead, and T is the communication overhead of tasks among processors; and the state space and action space of Q learning are obtained from the task set and processor set of the system application model and the target system model.
3. The Q learning-based heterogeneous platform task scheduling method according to claim 1 or 2, wherein the Q learning agent executes an action in the current state according to an ε-greedy behavior strategy, obtaining the Q value of the task mapped to a processor without an immediate reward, and transfers to a new state; and, aiming at the minimum Q value after each action is executed, the action with the minimum Q value in the new state is selected for execution and stored, the learning rate is attenuated, the next state is assigned as the current state, the action in the next state is selected according to the ε-greedy behavior strategy, and iteration proceeds to obtain the task initial mapping scheme from the stored results.
4. The Q-learning based heterogeneous platform task scheduling method according to claim 1, wherein, in creating the genetic algorithm model, the task initial mapping scheme is encoded and tasks are mapped onto the corresponding processors; fitness is evaluated according to a fitness evaluation function, individuals meeting the fitness value are copied directly into the next-generation population, and crossover and mutation are performed on the originally retained individuals; and, in the iterative process of the genetic algorithm, the population optimization efficiency of each generation is compared with the minimum threshold to determine the current population optimization efficiency, until the population optimization efficiencies of a set number of successive generations are all smaller than the minimum threshold, whereupon iteration terminates and the approximate optimal solution set of the genetic algorithm model for the mapping of tasks to processors is obtained.
5. The Q-learning based heterogeneous platform task scheduling method according to claim 4, wherein the fitness evaluation function is expressed as:
(Fitness evaluation function — equation image not reproduced in this text.)
wherein Q(s, a) represents the execution time of task s mapped on processor a, and t represents the iteration number.
6. The Q-learning based heterogeneous platform task scheduling method according to claim 4, wherein, in each iteration, the minimum threshold is set according to the optimization efficiency of the new population, and if the population optimization efficiency of a generation of offspring is smaller than the minimum threshold, the minimum threshold is replaced by the current population optimization efficiency.
7. The Q-learning based heterogeneous platform task scheduling method according to claim 1, wherein, in the ant colony algorithm, the amount of pheromone an ant releases is determined according to the execution time of the task on the processor; the path the ant has traveled at the current moment is recorded in a tabu table according to the selected processor; and the optimal assignment of tasks to processors and the shortest task execution time are obtained by iteratively outputting the ants' optimal path.
8. A heterogeneous platform task scheduling system based on Q learning, comprising: an initial mapping module, a fitness evaluation module and an optimal output module, wherein,
the initial mapping module is used for taking all tasks as the state space of Q learning and the processor set as the action space, taking the task to be allocated as the current state, and obtaining a task initial mapping scheme according to the execution time required by the task mapped to the action space in Q learning;
the fitness evaluation module is used for creating a genetic algorithm model, evaluating the fitness of the task initial mapping scheme, selecting the individuals copied into the next-generation population of the genetic algorithm model according to fitness, performing crossover and mutation on the retained individuals, and determining the optimization efficiency and minimum threshold of the new population; and obtaining an approximate optimal solution for the mapping of tasks in the model to processors according to the genetic algorithm model;
and the optimal output module is used for converting the model's approximate optimal solution into the initial pheromone distribution of the ant colony and, according to this distribution, obtaining the optimal task scheduling scheme by iteratively searching for and outputting the optimal path through the ant colony algorithm.
9. A computer-readable storage medium, on which a computer program is stored, wherein the program, when executed by a processor, performs the method of any of claims 1 to 7.
10. A computer device comprising a processor and a memory, the memory storing machine executable instructions executable by the processor, the processor executing the machine executable instructions to perform the method of any one of claims 1 to 7.
CN202011284585.6A 2020-11-17 2020-11-17 Heterogeneous platform task scheduling method and system based on Q learning Active CN112256422B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011284585.6A CN112256422B (en) 2020-11-17 2020-11-17 Heterogeneous platform task scheduling method and system based on Q learning

Publications (2)

Publication Number Publication Date
CN112256422A true CN112256422A (en) 2021-01-22
CN112256422B CN112256422B (en) 2023-08-04

Family

ID=74265947

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011284585.6A Active CN112256422B (en) 2020-11-17 2020-11-17 Heterogeneous platform task scheduling method and system based on Q learning

Country Status (1)

Country Link
CN (1) CN112256422B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114996198A (en) * 2022-08-03 2022-09-02 中国空气动力研究与发展中心计算空气动力研究所 Cross-processor data transmission method, device, equipment and medium
CN116501503A (en) * 2023-06-27 2023-07-28 上海燧原科技有限公司 Architecture mapping method and device for load task, computer equipment and medium

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080162385A1 (en) * 2006-12-28 2008-07-03 Yahoo! Inc. System and method for learning a weighted index to categorize objects
US20120130929A1 (en) * 2010-11-24 2012-05-24 International Business Machines Corporation Controlling quarantining and biasing in cataclysms for optimization simulations
CN103345657A (en) * 2013-04-02 2013-10-09 江苏大学 Task scheduling method based on heredity and ant colony in cloud computing environment
US20140287950A1 (en) * 2013-03-15 2014-09-25 Sera Prognostics, Inc. Biomarkers and methods for predicting preterm birth
CN104811491A (en) * 2015-04-17 2015-07-29 华南理工大学 Cloud computing resource scheduling method based on genetic algorithm
CN107104899A (en) * 2017-06-09 2017-08-29 中山大学 A kind of method for routing based on ant group algorithm being applied in vehicular ad hoc network
CN107563555A (en) * 2017-09-04 2018-01-09 南京信息工程大学 Dynamic multi-objective Scheduling method based on Q study memetic algorithms
CN108776483A (en) * 2018-08-16 2018-11-09 圆通速递有限公司 AGV paths planning methods and system based on ant group algorithm and multiple agent Q study
CN110298589A (en) * 2019-07-01 2019-10-01 河海大学常州校区 Based on heredity-ant colony blending algorithm dynamic Service resource regulating method
CN110515735A (en) * 2019-08-29 2019-11-29 哈尔滨理工大学 A kind of multiple target cloud resource dispatching method based on improvement Q learning algorithm
US20200241921A1 (en) * 2019-01-28 2020-07-30 EMC IP Holding Company LLC Building neural networks for resource allocation for iterative workloads using reinforcement learning
US20200320035A1 (en) * 2019-04-02 2020-10-08 Micro Focus Software Inc. Temporal difference learning, reinforcement learning approach to determine optimal number of threads to use for file copying

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
CHIA-FENG JUANG ET AL: "Ant Colony Optimization Incorporated With Fuzzy Q-Learning for Reinforcement Fuzzy Control", IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans *
FRANCISCO CHAGAS DE LIMA JUNIOR ET AL: "Using Q-learning Algorithm for Initialization of the GRASP Metaheuristic and Genetic Algorithm", Proceedings of International Joint Conference on Neural Networks *
CUI LIMEI: "Research on ACA-GA-based resource scheduling in cloud computing", Bulletin of Science and Technology *
ZHANG XINHUA: "Collaborative dependent multi-task grid cluster scheduling based on reinforcement learning and ant colony algorithm", Journal of Changsha University *
PAN YANCHUN ET AL: "A genetic reinforcement learning algorithm for the job-shop scheduling problem", Computer Engineering *
WANG TIANYI: "Design and implementation of communication middleware for a heterogeneous signal processing platform", China Masters' Theses Full-text Database, Information Science and Technology *
WANG BENNIAN ET AL: "RLGA: a genetic algorithm based on a reinforcement learning mechanism", Acta Electronica Sinica *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114996198A (en) * 2022-08-03 2022-09-02 中国空气动力研究与发展中心计算空气动力研究所 Cross-processor data transmission method, device, equipment and medium
CN114996198B (en) * 2022-08-03 2022-10-21 中国空气动力研究与发展中心计算空气动力研究所 Cross-processor data transmission method, device, equipment and medium
CN116501503A (en) * 2023-06-27 2023-07-28 上海燧原科技有限公司 Architecture mapping method and device for load task, computer equipment and medium
CN116501503B (en) * 2023-06-27 2023-09-15 上海燧原科技有限公司 Architecture mapping method and device for load task, computer equipment and medium

Also Published As

Publication number Publication date
CN112256422B (en) 2023-08-04

Similar Documents

Publication Publication Date Title
Tong et al. A scheduling scheme in the cloud computing environment using deep Q-learning
Zhu et al. An efficient evolutionary grey wolf optimizer for multi-objective flexible job shop scheduling problem with hierarchical job precedence constraints
Chakaravarthy et al. Scalable single source shortest path algorithms for massively parallel systems
US20160171366A1 (en) Solving vehicle routing problems using evolutionary computing techniques
CN109522104B (en) Method for optimizing scheduling of two target tasks of Iaas by using differential evolution algorithm
CN114281104B (en) Multi-unmanned aerial vehicle cooperative regulation and control method based on improved ant colony algorithm
CN112988345A (en) Dependency task unloading method and device based on mobile edge calculation
CN112256422A (en) Heterogeneous platform task scheduling method and system based on Q learning
CN110321217A (en) A kind of cloud resource dispatching method, device, equipment and the storage medium of multiple target
CN109710372B (en) Calculation intensive cloud workflow scheduling method based on owl search algorithm
CN113391894A (en) Optimization method of optimal hyper-task network based on RBP neural network
CN116594748A (en) Model customization processing method, device, equipment and medium for task
CN110163255B (en) Data stream clustering method and device based on density peak value
Kousalya et al. To improve ant algorithm’s grid scheduling using local search
CN111176784A (en) Virtual machine integration method based on extreme learning machine and ant colony system
CN112884368A (en) Multi-target scheduling method and system for minimizing delivery time and delay of high-end equipment
CN114980216A (en) Dependent task unloading system and method based on mobile edge calculation
CN106874215B (en) Serialized storage optimization method based on Spark operator
CN115421885A (en) Distributed multi-target cloud task scheduling method and device and cloud service system
CN112598112B (en) Resource scheduling method based on graph neural network
Esfahanizadeh et al. Stream iterative distributed coded computing for learning applications in heterogeneous systems
CN111813525B (en) Heterogeneous system workflow scheduling method
CN115080225A (en) Single-source shortest path calculation method and system
CN114528094A (en) Distributed system resource optimization allocation method based on LSTM and genetic algorithm
Bazoobandi et al. Solving task scheduling problem in multi-processors with genetic algorithm and task duplication

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant