CN112256422B - Heterogeneous platform task scheduling method and system based on Q learning - Google Patents
- Publication number
- CN112256422B (application number CN202011284585.6A)
- Authority
- CN
- China
- Prior art keywords
- task
- processor
- learning
- model
- tasks
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
- G06F9/4806—Task transfer initiation or dispatching (under G06F9/46—Multiprogramming arrangements; G06F9/48—Program initiating; program switching, e.g. by interrupt)
- G06F9/5038—Allocation of resources, e.g. of the central processing unit [CPU], to service a request, the resource being a machine, e.g. CPUs, servers, terminals, considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
- G06N3/006—Artificial life, i.e. computing arrangements simulating life, based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
- G06N3/086—Learning methods using evolutionary algorithms, e.g. genetic algorithms or genetic programming
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention belongs to the technical field of heterogeneous multiprocessor computing and particularly relates to a heterogeneous platform task scheduling method and system based on Q learning. All tasks are used as the state space of Q learning, the processor set as the action space, and the task waiting to be distributed as the current state; an initial mapping scheme of the tasks is obtained according to the execution time required by mapping the tasks to the action space in Q learning. A genetic algorithm model is created and fitness evaluation is performed on the task initial mapping scheme; individuals in the genetic algorithm model are copied into the next generation population, crossover and mutation are performed on the retained individuals, and the new population optimization efficiency and a minimum threshold are determined; an approximate optimal solution of the task-to-processor mapping is then obtained from the genetic algorithm model. Finally, the model's approximate optimal solution is converted into the initial pheromone distribution of an ant colony algorithm, which iteratively searches for and outputs an optimal path according to this distribution to obtain the optimal task scheduling scheme, so as to better improve the performance of the heterogeneous platform.
Description
Technical Field
The invention belongs to the technical field of heterogeneous multiprocessor computing, and particularly relates to a heterogeneous platform task scheduling method and system based on Q learning.
Background
With the continuous growth of the high-performance computing requirements of various signal processing tasks and the rapid development of hardware accelerators, general-purpose processors can no longer meet the demands of strong real-time, large-scale computing, and heterogeneous computing systems are increasingly used to solve complex task processing problems. A heterogeneous system architecture comprises a series of processors with very different structures, such as CPUs, GPUs, FPGAs and DSPs, connected through dedicated networks or interfaces to meet the hardware performance requirements of different types of computing tasks and thereby improve resource utilization and computing efficiency. To meet the demands of increasingly complex signal processing tasks, the efficiency and reliability of heterogeneous multiprocessors are critical. Whether a heterogeneous computing system can realize its high-performance advantage depends on the following aspects: the hardware resource platform architecture, the degree of matching between tasks and processors, and the task scheduling strategy. Scheduling is essentially a multi-objective, NP-hard problem, and the dynamic, heterogeneous nature of heterogeneous computing systems adds further difficulty to task planning. For a given heterogeneous system, however, an efficient scheduling strategy is critical to improving the strong real-time and high-throughput performance of the platform.
Disclosure of Invention
Aiming at the problems of low flexibility, slow convergence and poor predictability of conventional scheduling algorithms, the invention provides a heterogeneous platform task scheduling method and system based on Q learning, which can adjust the search direction in time while balancing local and global search to obtain a better result, so that each processor of a heterogeneous platform can exert maximum efficiency, facilitating parallel task processing and improving the performance of the heterogeneous platform.
According to the design scheme provided by the invention, the heterogeneous platform task scheduling method based on Q learning comprises the following steps:
taking all tasks as a state space of Q learning, taking a processor set as an action space, taking the tasks to be allocated as current states, and acquiring a task initial mapping scheme according to execution time required by mapping the tasks to the action space in Q learning;
creating a genetic algorithm model, performing fitness evaluation on the task initial mapping scheme, copying individuals in the genetic algorithm model into the next generation population according to fitness, performing crossover and mutation on the retained individuals, and determining the new population optimization efficiency and a minimum threshold; obtaining an approximate optimal solution of the task-to-processor mapping in the model according to the genetic algorithm model;
and converting the model's approximate optimal solution into the initial pheromone distribution of the ant colony algorithm, and obtaining the optimal task scheduling scheme by iteratively searching for and outputting an optimal path through the ant colony algorithm according to the pheromone distribution.
As the heterogeneous platform task scheduling method based on Q learning, further, the system application model is expressed as G = {V, E, C, L} and the target system model as P = {N, H, W, T}, where V is the task set, E is the set of directed edges with dependency relationships, C is the set of task computation amounts, L is the inter-subtask communication traffic, N is the processor set, H is the processor characteristics, W is the calculation cost, and T is the communication cost of tasks among the processors; the state space and action space of Q learning are obtained from the task set and the processor set of the two models.
As the heterogeneous platform task scheduling method based on Q learning, the Q learning agent further executes an action in the current state according to the ε-greedy behavior strategy, obtains the Q value of the task mapped to the processor and an immediate reward, and transfers to a new state; taking the minimum Q value of each action execution as the target, the action with the minimum Q value is selected and executed in the new state and stored, the learning rate is attenuated, the next state is assigned to the current state, and an action in the next state is selected according to the ε-greedy behavior strategy and executed iteratively, so as to obtain the task initial mapping scheme from the stored Q values.
As the heterogeneous platform task scheduling method based on Q learning, the invention further creates a genetic algorithm model, encodes the task initial mapping scheme, and maps each task to a corresponding processor; fitness evaluation is carried out according to a fitness evaluation function, individuals meeting the fitness value are copied directly into the next generation population, and crossover and mutation are performed on the originally retained individuals; the iteration process of the genetic algorithm compares the population optimization efficiency of each offspring generation with a minimum threshold to determine the current population optimization efficiency, and the iteration terminates when the offspring population optimization efficiency of successive set generations is smaller than the minimum threshold, yielding an approximate optimal solution set of the genetic algorithm model for the task-to-processor mapping.
As the heterogeneous platform task scheduling method based on Q learning, the fitness evaluation function is further expressed as f_t = 1/Σ Q(s, a), t = 1, 2, …, i.e. the inverse of the individual's total execution time, where Q(s, a) represents the execution time of task s mapped on processor a and t represents the iteration generation.
As the heterogeneous platform task scheduling method based on Q learning, further, in each iteration a minimum threshold is set according to the optimization efficiency of the new population, and if the population optimization efficiency of an offspring generation is smaller than the minimum threshold, the minimum threshold is replaced by the current population optimization efficiency.
As the heterogeneous platform task scheduling method based on Q learning, further, in the ant colony algorithm, determining the amount of the ant released pheromone according to the execution time of the task in the processor; recording the current walked path of the ants in the tabu list according to the selected processor; and obtaining the optimal allocation of the task to the processor and the shortest task execution time through iterative output of the ant optimal path.
Further, based on the above method, the invention also provides a heterogeneous platform task scheduling system based on Q learning, comprising: an initial mapping module, a fitness evaluation module and an optimal output module, wherein,
the initial mapping module is used for taking all tasks as a state space of Q learning, a processor set as an action space, and the tasks waiting to be allocated as current states, and acquiring a task initial mapping scheme according to the execution time required by mapping the tasks to the action space in the Q learning;
the fitness evaluation module is used for creating a genetic algorithm model, performing fitness evaluation on the task initial mapping scheme, copying individuals in the genetic algorithm model into the next generation population according to fitness, performing crossover and mutation on the retained individuals, and determining the new population optimization efficiency and a minimum threshold; and obtaining an approximate optimal solution of the task-to-processor mapping in the model according to the genetic algorithm model;
and the optimal output module is used for converting the model's approximate optimal solution into the initial pheromone distribution of the ant colony algorithm, and obtaining the optimal task scheduling scheme by iteratively searching for and outputting an optimal path through the ant colony algorithm according to the pheromone distribution.
Further, the present invention also provides a computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, performs the method described above.
Further, the present invention also provides a computer device comprising a processor and a memory storing machine executable instructions executable by the processor, the processor executing the machine executable instructions to perform the method described above.
The invention has the beneficial effects that:
according to the task set and the processor set, the mapping of the task set and the processor set is used for generating an initial solution set serving as initial information of a genetic algorithm GA through Q learning, and the performance of the whole scheme is improved through accelerating the searching rate of the GA; the heuristic mutation operation in the GA algorithm is utilized to improve the local searching capability of the ant colony algorithm, maintain the diversity of the population, accelerate the convergence speed to the optimal solution, maintain a better speed in the whole searching process, shorten the scheduling length, enable each processor of the heterogeneous platform to exert the maximum efficiency, improve the utilization rate of hardware resources, facilitate the parallel processing of tasks, improve the performance of the heterogeneous platform and have better application prospect.
Description of the drawings:
FIG. 1 is a flow chart of a heterogeneous platform task scheduling method in an embodiment;
FIG. 2 is a schematic diagram of a comparison of scheduling length and average resource utilization of an existing algorithm in an embodiment;
FIG. 3 is a flowchart of a heterogeneous platform task scheduling QGA-ACO algorithm in an embodiment;
FIG. 4 is a task and target system model illustration in an embodiment;
FIG. 5 is a schematic illustration of a crossover operation in an embodiment;
FIG. 6 is a schematic diagram of the mutation operation in the example;
fig. 7 is a graph illustrating algorithm rate versus iteration number in an embodiment.
The specific embodiment is as follows:
the present invention will be described in further detail with reference to the drawings and the technical scheme, in order to make the objects, technical schemes and advantages of the present invention more apparent.
Aiming at the problems of low flexibility, slow convergence and poor predictability of task scheduling algorithms for existing heterogeneous multiprocessor computing platforms, an embodiment of the invention provides a heterogeneous platform task scheduling method based on Q learning, as shown in fig. 1, which comprises the following steps:
s101, taking all tasks as a state space of Q learning, taking a processor set as an action space, taking tasks to be allocated as current states, and acquiring a task initial mapping scheme according to execution time required by mapping the tasks to the action space in the Q learning;
S102, creating a genetic algorithm model, performing fitness evaluation on the task initial mapping scheme, copying individuals in the genetic algorithm model into the next generation population according to fitness, performing crossover and mutation on the retained individuals, and determining the new population optimization efficiency and a minimum threshold; obtaining an approximate optimal solution of the task-to-processor mapping in the model according to the genetic algorithm model;
S103, converting the model's approximate optimal solution into the initial pheromone distribution of the ant colony algorithm, and obtaining the optimal task scheduling scheme by iteratively searching for and outputting an optimal path through the ant colony algorithm according to the pheromone distribution.
Heterogeneous multiprocessor computing systems are dynamic and heterogeneous: resources and tasks may join and leave at random, and processor architectures and performance differ from one another. The high performance exhibited by heterogeneous multiprocessor computing systems makes them one of the development directions of high-performance computing, but the resource management and task scheduling problems they face are also more complex. To exploit the high computing efficiency of a heterogeneous system, a task scheduling strategy that meets the requirements is critical. Traditional scheduling algorithms have high algorithmic or time complexity when solving complex problems, so heuristic methods such as genetic algorithms and ant colony algorithms are widely applied. However, the initial information of these algorithms is mostly constructed randomly, and the quality of the initial search information is difficult to guarantee, which affects convergence performance to a certain extent. In this embodiment, Q-Learning acquires feedback information through interaction with the environment, so the search direction can be adjusted in time while local and global search are balanced to obtain a better result. This result is used as the initial information of the GA to compensate for the low individual quality of the GA's random initial population and its inability to use network feedback information; combining the ACO in the later stage shortens the iteration time, makes the search more efficient, and allows application tasks to be optimally distributed to processors.
A single task scheduling algorithm cannot fully meet the scheduling requirements of the platform because of its own shortcomings; combining different algorithms complements their advantages and disadvantages, meeting the hardware performance requirements of different types of computing tasks and thereby improving resource utilization and computing efficiency.
MLSH, ACO and GAACO are representative of traditional scheduling algorithms, heuristic methods and combined optimization algorithms respectively; fig. 2 (a) and (b) compare the three algorithms in terms of scheduling length and average resource utilization. For small-scale tasks, the scheduling lengths of the three algorithms are close, but as the number of tasks increases, the scheduling lengths of ACO and GAACO remain almost equal to each other and lower than that of MLSH, because the complexity of the traditional scheduling algorithm is higher than that of the heuristic algorithms and grows exponentially with the number of tasks. In terms of resource utilization, the average resource utilization of GAACO is significantly higher than that of ACO and MLSH, because GAACO combines the universality, scalability, global convergence and parallelism of the ant colony algorithm with the high solution precision of the genetic algorithm, complementing their strengths and weaknesses and improving execution efficiency and solution precision. GAACO therefore shows the best scheduling performance in both respects, but its scheduling length is not significantly improved.
Referring to FIG. 3, the QGA-ACO algorithm of this embodiment can be designed as follows. In the first stage, the initial allocation of tasks to resources is realized through Q learning, yielding a good scheme and Q values; on this basis a GA model is established, parameters and constraint conditions are determined, and crossover, mutation and replication of the initial solutions ensure population diversity and improve convergence speed to further optimize the result. In the second stage, the ACO algorithm is adopted: if the evolution rate of four successive generations in the first stage is smaller than the threshold, the second stage is entered, the first-stage result is converted into the initial pheromone values of the ACO to avoid blind searching by the ant colony, and iterative searching proceeds according to the pheromone distribution until the termination condition is met.
As the heterogeneous platform task scheduling method based on Q learning in the embodiment of the invention, further, the system application model is expressed as G = {V, E, C, L} and the target system model as P = {N, H, W, T}, where V is the task set, E is the set of directed edges with dependency relationships, C is the set of task computation amounts, L is the inter-subtask communication traffic, N is the processor set, H is the processor characteristics, W is the calculation cost, and T is the communication cost of tasks among the processors; the state space and action space of Q learning are obtained from the task set and the processor set of the two models.
Referring to fig. 4, the system application model is expressed as G = {V, E, C, L}, where V = {v_1, v_2, ..., v_n} represents the set of signal processing tasks to be processed, E = {e_12, e_23, ..., e_ij} is the set of dependency directed edges, C = {c_1, c_2, ..., c_n} represents the set of task computing overheads, and L(v_i, v_j) represents the communication overhead between subtasks; if tasks v_i and v_j are mapped onto the same node, the communication overhead is 0. The hardware resource can be represented by an undirected graph, abstracted as P = {N, H, W, T}, where N = (n_1, n_2, ...) is the processor set, H is the processor characteristics, W is the execution rate of the processors, and T is the inter-processor communication rate.
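The two models above can be captured directly in code. The following is a minimal, illustrative sketch; all task names, computation amounts, and rates are invented for the example (H, the processor characteristics, is omitted for brevity):

```python
from dataclasses import dataclass

@dataclass
class AppModel:
    """Application model G = {V, E, C, L}."""
    tasks: list          # V: signal-processing tasks
    edges: dict          # E with L: (v_i, v_j) -> communication volume
    compute: dict        # C: task -> computation amount

    def comm(self, vi, vj, same_node):
        # Communication overhead is 0 when both tasks map onto the same node.
        return 0 if same_node else self.edges.get((vi, vj), 0)

@dataclass
class Platform:
    """Target system model P = {N, H, W, T} (H omitted)."""
    procs: list          # N: processor set
    exec_rate: dict      # W: processor -> execution rate
    link_rate: float     # T: inter-processor communication rate

def exec_time(g, p, task, proc):
    # Execution time of a task on a processor: computation amount / execution rate.
    return g.compute[task] / p.exec_rate[proc]

g = AppModel(tasks=["v1", "v2"],
             edges={("v1", "v2"): 8.0},
             compute={"v1": 4.0, "v2": 6.0})
p = Platform(procs=["n1", "n2"],
             exec_rate={"n1": 2.0, "n2": 1.0},
             link_rate=4.0)
```

These structures are what stage one initializes its Q table from: Q(s, a) starts as the execution time of task s on processor a.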
As the heterogeneous platform task scheduling method based on Q learning in the embodiment of the present invention, further, the Q learning agent performs an action in the current state according to the ε-greedy behavior policy, obtains the Q value of the task mapped to the processor and an immediate reward, and transitions to a new state; taking the minimum Q value of each action execution as the target, the action with the minimum Q value is selected and executed in the new state and stored, the learning rate is attenuated, the next state is assigned to the current state, and an action in the next state is selected according to the ε-greedy behavior strategy and executed iteratively, so as to obtain the task initial mapping scheme from the stored Q values.
The contents of algorithm stage one shown in fig. 3 can be designed as follows:
the task set V may be configured as a state space S and the processor set N as an action space a. Task v to be assigned i As the current state s, the current state performs action n i A is the number.
Step 1, initialize the state-action space of Q-learning, where Q(S, A) = C is the execution time required for mapping task S to A; the search factor is ε, the discount factor is γ, and the learning rate is α.
Step 2, according to the ε-greedy behavior strategy, the agent executes action a with a certain probability in the current state s, the task is mapped to the processor with Q value Q(s, a), an immediate reward r is obtained (the larger the Q value, the longer the task's execution time and the smaller the immediate reward), and the task transfers to a new state s'. The ε-greedy strategy balances the Agent's exploration of the state space against exploitation of the information already obtained, preventing greedy exploitation from becoming trapped in a local optimum and preventing excessive exploration from degrading the algorithm's performance;
Step 3, taking the minimum Q value of each action execution as the target, select the action a' with the minimum Q value in state s' according to formula (1), execute it, calculate the Q value of the Agent at (s, a), and store it in the Q table;
Q(s_t, a_t) ← Q(s_t, a_t) + α[r_{t+1} + γ min_{a'} Q(s_{t+1}, a') − Q(s_t, a_t)]   (1)
Step 4, attenuate the learning rate, assign the next state s' to the current state s, select an action a' in the next state s' according to the ε-greedy behavior strategy, and assign a' to the execution action a;
Step 5, increment the step count and judge whether the iteration termination condition is met; if not, return to Step 2 and continue; if the iteration is complete, end the algorithm and obtain the task initial mapping scheme from the Q table.
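Steps 1-5 above can be sketched as a small runnable example: states are tasks, actions are processors, Q(S, A) is initialized to the execution times, and the agent minimizes Q. The reward here is taken directly as the execution-time cost, which is the equivalent minimization form of the patent's "longer execution time, smaller reward"; all task and timing values are invented for illustration:

```python
import random

random.seed(0)
tasks = ["t0", "t1", "t2"]                    # state space S
procs = ["p0", "p1"]                          # action space A
exec_time = {("t0", "p0"): 3.0, ("t0", "p1"): 1.0,
             ("t1", "p0"): 2.0, ("t1", "p1"): 4.0,
             ("t2", "p0"): 1.5, ("t2", "p1"): 1.5}

Q = dict(exec_time)                           # Step 1: Q(S, A) = C
eps, gamma, alpha = 0.2, 0.9, 0.5

for episode in range(200):
    for i, s in enumerate(tasks):
        # Step 2: epsilon-greedy - explore with probability eps, else exploit
        if random.random() < eps:
            a = random.choice(procs)
        else:
            a = min(procs, key=lambda p: Q[(s, p)])
        r = exec_time[(s, a)]                 # cost (smaller is better)
        if i + 1 < len(tasks):                # transition to the next state s'
            s2 = tasks[i + 1]
            best_next = min(Q[(s2, p)] for p in procs)
        else:
            best_next = 0.0
        # Step 3: update toward the minimum cumulative cost (equation (1) with min)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    alpha *= 0.995                            # Step 4: decay the learning rate

# Step 5: initial mapping - the minimum-Q processor for each task
mapping = {s: min(procs, key=lambda p: Q[(s, p)]) for s in tasks}
```

After training, `mapping` assigns each task to the processor on which it runs fastest in this toy instance (t2 is a tie, so either choice is valid).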
As the heterogeneous platform task scheduling method based on Q learning in the embodiment of the present invention, further, in creating the genetic algorithm model, the task initial mapping scheme is encoded and each task is mapped to a corresponding processor; fitness evaluation is carried out according to the fitness evaluation function, individuals meeting the fitness value are copied directly into the next generation population, and crossover and mutation are performed on the originally retained individuals; the iteration process of the genetic algorithm compares the population optimization efficiency of each offspring generation with a minimum threshold to determine the current population optimization efficiency, and the iteration terminates when the offspring population optimization efficiency of successive set generations is smaller than the minimum threshold, yielding an approximate optimal solution set of the genetic algorithm model for the task-to-processor mapping. Further, in each iteration a minimum threshold is set according to the optimization efficiency of the new population, and if the population optimization efficiency of an offspring generation is smaller than the minimum threshold, the minimum threshold is replaced by the current population optimization efficiency.
The initial population (the mapping schemes) is encoded (coding rule: chromosome[p] = q, i.e. task p is mapped onto processor q) and evaluated for fitness; individuals with high fitness values are replicated directly into the next generation population, and the originally retained individuals undergo crossover and mutation. The fitness evaluation function may be expressed as f_t = 1/Σ Q(s, a), t = 1, 2, …, where Q(s, a) represents the execution time of task s mapped on processor a and t represents the iteration generation. Crossover operator: randomly select one crossover point on the chromosome, dividing each parent chromosome into two parts; as shown in fig. 5, the gene segment to the left of the crossover point comes from one of the corresponding parent chromosomes and the right segment is copied from the other parent, yielding two new chromosomes. Heuristic mutation operation, as shown in fig. 6: randomly select gene locus i of one chromosome, search from that locus for the last successor succ(i) of i, randomly select a gene locus j ∈ (i, succ(i)), and swap the loci of i and j to form a new chromosome. The optimization efficiency of the new population obtained by the genetic algorithm is collected and set as the minimum threshold.
The iteration count is increased by 1 and the genetic algorithm steps are executed again from the fitness evaluation step. The optimization efficiency of each offspring population is compared with the minimum threshold; if it is smaller than the threshold, the threshold is replaced by the current population optimization efficiency, and the genetic algorithm (GA) terminates when the population optimization efficiency of 4 successive generations of offspring stays below the threshold.
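The adaptive-threshold termination rule can be sketched as follows (a minimal illustration under assumptions: `fitness` returns an execution time to be minimized, `evolve_one_generation` is a hypothetical helper performing selection, crossover and mutation, and a generation counts as stagnant when its improvement does not exceed the threshold):

```python
def run_ga(population, fitness, evolve_one_generation, patience=4, max_iters=200):
    """Iterate a GA; stop once the per-generation improvement
    ('population optimization efficiency') has stayed at or below an
    adaptive minimum threshold for `patience` successive generations."""
    best = min(fitness(ind) for ind in population)  # lower execution time is better
    threshold = None   # adaptive minimum threshold
    stale = 0          # count of successive low-efficiency generations
    for _ in range(max_iters):
        population = evolve_one_generation(population)
        new_best = min(fitness(ind) for ind in population)
        efficiency = best - new_best        # improvement over the previous best
        best = min(best, new_best)
        if threshold is None:
            threshold = efficiency          # first generation seeds the threshold
        elif efficiency <= threshold:
            stale += 1                      # another low-efficiency generation
            threshold = min(threshold, efficiency)  # replace with current efficiency
        else:
            stale = 0                       # real progress resets the counter
        if stale >= patience:               # e.g. 4 successive stagnant generations
            break
    return population
```

With a toy population of integers whose fitness is the value itself and an "evolution" that subtracts 1 each generation, the improvement is a constant 1 per generation, so the counter reaches 4 after five generations and the loop stops early.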
As the heterogeneous platform task scheduling method based on Q learning in the embodiment of the invention, further, in the ant colony algorithm, the amount of pheromone released by an ant is determined according to the execution time of the task on the processor; the path the ant has currently walked (i.e. the selected processors) is recorded in the tabu list; and the optimal allocation of tasks to processors and the shortest task execution time are obtained by iteratively outputting the optimal ant path.
The stage-two implementation of the algorithm, shown in FIG. 3, can be designed as follows:
The approximate optimal solution set {chromosome[i] = j} obtained in stage one is converted into the initial pheromone distribution of the ant colony.
Step 1: initialize the ant colony parameters: the number of ants m, the weights α and β, the pheromone volatilization factor ρ, the pheromone strength τ_ij, the factor μ, and the maximum number of iterations T. Assume that τ is the pheromone matrix, η is the heuristic function, k denotes the k-th ant, and l is a processor that ant k may select in its next step.
Step 2: place m ants on the n nodes (m > n). The shorter the execution time of task i on processor j, the more pheromone the ants release. The path an ant has walked at the current moment (i.e. the selected processors) is recorded in the tabu list; the next node is selected according to probability selection formula (2) and stored in the tabu list, thereby recording the ant's walking path.
Step 3: after the m ants have traversed the n nodes, all ants complete one round of search; the pheromone is updated according to formula (3) until the iteration terminates, and the optimal path is output.
Thus, the optimal allocation of tasks to the processor and the shortest task execution time can be obtained.
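Since probability selection formula (2) and pheromone update formula (3) are not reproduced in the text, the sketch below uses the standard ant colony transition rule p ∝ τ^α · η^β with heuristic η = 1/execution-time and best-path reinforcement; function and parameter names are illustrative, and the schedule length is simplified to the total execution time:

```python
import random

def aco_assign(exec_time, n_ants=20, n_iters=50, alpha=1.0, beta=2.0, rho=0.1, seed=0):
    """Assign each task to a processor with a basic ant colony search.
    exec_time[i][j] = execution time of task i on processor j."""
    rng = random.Random(seed)
    n_tasks, n_procs = len(exec_time), len(exec_time[0])
    tau = [[1.0] * n_procs for _ in range(n_tasks)]        # pheromone matrix
    best_assign, best_len = None, float("inf")
    for _ in range(n_iters):
        for _ in range(n_ants):
            assign = []
            for i in range(n_tasks):
                # transition rule: p ∝ τ^α · η^β, with η = 1 / execution time,
                # so shorter execution time attracts more ants
                w = [tau[i][j] ** alpha * (1.0 / exec_time[i][j]) ** beta
                     for j in range(n_procs)]
                assign.append(rng.choices(range(n_procs), weights=w)[0])
            length = sum(exec_time[i][assign[i]] for i in range(n_tasks))
            if length < best_len:
                best_assign, best_len = assign, length
        # evaporation plus reinforcement along the best path found so far
        for i in range(n_tasks):
            for j in range(n_procs):
                tau[i][j] *= (1.0 - rho)
            tau[i][best_assign[i]] += 1.0 / best_len
    return best_assign, best_len
```

A real heterogeneous-platform schedule length would also account for dependency edges and inter-processor communication cost; the total-execution-time objective here only illustrates the search mechanics.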
The GA algorithm has a high convergence speed in the early search stage, but because the early search is not adjusted in time according to feedback information, a large amount of redundant information is generated in the later stage and occupies resources, so the rate drops markedly after a certain number of iterations. The ACO algorithm, by contrast, has little and sparsely distributed pheromone in the early stage and can only search randomly, which makes it slow at first; after the pheromone accumulates to a certain level and is widely distributed, heuristic search accelerates its convergence. According to the rate-versus-iteration analysis shown in FIG. 7, the QGA-ACO algorithm provided in the embodiment of the present invention first uses the Q-learning algorithm to raise the initial search rate of the GA algorithm, optimizes through crossover, mutation and similar operations, then runs the ACO algorithm after the GA iterations terminate, using the parallel search capability and positive feedback of the ant colony algorithm to find the optimal solution of the problem, so that the overall search maintains a good rate and the scheduling length is shortened.
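The Q-learning stage that seeds the GA can be sketched as follows (a simplified illustration: states are tasks, actions are processors, the immediate reward is the execution time, the minimum-Q action is preferred, and the learning rate is attenuated each episode; the one-step update omits the next-state bootstrap term for brevity, and all names are hypothetical):

```python
import random

def q_learning_initial_mapping(exec_time, episodes=200, eps=0.2,
                               lr=0.5, decay=0.99, seed=0):
    """Learn an initial task-to-processor mapping: states are tasks,
    actions are processors, and the reward is the task's execution time
    on the chosen processor (smaller Q value = better mapping)."""
    rng = random.Random(seed)
    n_tasks, n_procs = len(exec_time), len(exec_time[0])
    Q = [[0.0] * n_procs for _ in range(n_tasks)]
    for _ in range(episodes):
        for s in range(n_tasks):                 # visit tasks in order
            if rng.random() < eps:               # ε-greedy: explore
                a = rng.randrange(n_procs)
            else:                                # exploit: minimum-Q action
                a = min(range(n_procs), key=lambda j: Q[s][j])
            reward = exec_time[s][a]             # immediate reward: execution time
            Q[s][a] += lr * (reward - Q[s][a])   # simplified one-step update
        lr *= decay                              # attenuate the learning rate
    # initial mapping: for each task, the processor with the minimum Q value
    return [min(range(n_procs), key=lambda j: Q[s][j]) for s in range(n_tasks)]
```

The returned list (task index → processor index) corresponds to the chromosome[p] = q encoding used as the GA's initial population.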
Further, based on the above method, the embodiment of the present invention further provides a heterogeneous platform task scheduling system based on Q learning, comprising: an initial mapping module, a fitness evaluation module and an optimal output module, wherein,
the initial mapping module is used for taking all tasks as a state space of Q learning, a processor set as an action space, and the tasks waiting to be allocated as current states, and acquiring a task initial mapping scheme according to the execution time required by mapping the tasks to the action space in the Q learning;
the fitness evaluation module is used for creating a genetic algorithm model, performing fitness evaluation on the task initial mapping scheme, copying individuals in the genetic algorithm model to the next generation population according to their fitness, performing crossover and mutation on the retained individuals, and determining the new population optimization efficiency and the minimum threshold; and obtaining an approximate optimal solution of the task-to-processor mapping in the model according to the genetic algorithm model;
and the optimal output module is used for converting the model's approximate optimal solution into the initial pheromone distribution of the ant colony, and obtaining the optimal task scheduling scheme by iteratively searching for and outputting the optimal path through the ant colony algorithm according to this distribution.
The relative arrangement of the components and steps, the numerical expressions and the numerical values set forth in these embodiments do not limit the scope of the present invention unless specifically stated otherwise.
Based on the above method or system, the embodiment of the present invention further provides a computer device, including: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the system or perform the method described above.
Based on the above system, the embodiment of the present invention further provides a computer readable medium, on which a computer program is stored, wherein the program, when executed by a processor, performs the above method.
The device provided by the embodiment of the present invention has the same implementation principle and technical effects as those of the embodiment of the system, and for the sake of brevity, reference may be made to the corresponding content of the embodiment of the system.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described system and apparatus may refer to corresponding procedures in the foregoing system embodiments, which are not described herein again.
Any particular values in all examples shown and described herein are to be construed as merely illustrative and not a limitation, and thus other examples of exemplary embodiments may have different values.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer readable storage medium executable by a processor. Based on this understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that the above examples are only specific embodiments of the present invention, intended to illustrate rather than limit its technical solutions, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that any person familiar with the art may, within the technical scope of the present disclosure, still modify the technical solutions described in the foregoing embodiments or make equivalent substitutions of some of the technical features; such modifications, changes or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention and are intended to be covered by the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (6)
1. A heterogeneous platform task scheduling method based on Q learning is characterized by comprising the following steps:
taking all tasks as a state space of Q learning, taking a processor set as an action space, taking the tasks to be allocated as current states, and acquiring a task initial mapping scheme according to execution time required by mapping the tasks to the action space in Q learning;
creating a genetic algorithm model, performing fitness evaluation on the task initial mapping scheme, copying individuals in the genetic algorithm model to the next generation population according to their fitness, performing crossover and mutation on the retained individuals, and determining the new population optimization efficiency and the minimum threshold; obtaining an approximate optimal solution of the task-to-processor mapping in the model according to the genetic algorithm model;
converting the model's approximate optimal solution into the initial pheromone distribution of the ant colony, and obtaining the optimal task scheduling scheme by iteratively searching for and outputting the optimal path through the ant colony algorithm according to this distribution;
the system application model is expressed as G = {V, E, C, L}, and the target system model is expressed as P = {N, H, W, T}, wherein V is the task set, E is the set of directed edges with dependency relationships, C is the set of task calculation amounts, L is the inter-subtask communication amount, N is the processor set, H is the processor features, W is the calculation cost, and T is the communication cost of tasks among processors; the state space and action space of Q learning are obtained according to the task set and the processor set in both the system application model and the target system model, and an ε-greedy strategy is adopted to balance the agent's exploration of the state space against its exploitation of the obtained information;
the task set V is formed into a state space S, the processor set N is used as an action space A, and the tasks V waiting to be allocated are formed i As the current state s, the current state performs action n i A is a;
the Q learning agent executes action in the current state according to the epsilon-greedy behavior strategy, obtains the Q value of the task mapped to the processor, obtains immediate rewards, and transfers to a new state; selecting an action with the minimum Q value from which each action is executed as a target, executing the action with the minimum Q value in a new state, storing, attenuating the learning rate, giving the next state to the current state, and selecting the action in the next state according to the epsilon-greedy behavior strategy for iterative execution so as to obtain a task initial mapping scheme according to the storage condition;
creating a genetic algorithm model, encoding a task initial mapping scheme, and mapping the task to a corresponding processor; carrying out fitness evaluation according to a fitness evaluation function, copying individuals meeting fitness values to directly enter a next generation population, and carrying out cross mutation on the original reserved individuals; comparing the population optimization efficiency of each offspring with a minimum threshold value by utilizing an iteration process of the genetic algorithm to determine the current population optimization efficiency until the offspring population optimization efficiency of the successive set generations is smaller than the minimum threshold value, terminating the iteration, and obtaining an approximate optimal solution set of the genetic algorithm model with respect to the task and processor mapping;
the fitness evaluation function is expressed in terms of Q(s, a) and the iteration index t, where Q(s, a) represents the execution time of task s mapped on processor a and t represents the iteration number.
2. The heterogeneous platform task scheduling method based on Q learning according to claim 1, wherein in each iteration, a minimum threshold is set according to the optimization efficiency of the new population, and if the population optimization efficiency of each offspring is smaller than the minimum threshold, the magnitude of the minimum threshold is replaced by the current population optimization efficiency.
3. The heterogeneous platform task scheduling method based on Q learning according to claim 1, wherein in the ant colony algorithm, the amount of the ant released pheromone is determined according to the execution time of the task in the processor; recording the current walked path of the ants in the tabu list according to the selected processor; and obtaining the optimal allocation of the task to the processor and the shortest task execution time through iterative output of the ant optimal path.
4. A heterogeneous platform task scheduling system based on Q learning, implemented based on the method of claim 1, comprising: an initial mapping module, a fitness evaluation module and an optimal output module, wherein,
the initial mapping module is used for taking all tasks as a state space of Q learning, a processor set as an action space, and the tasks waiting to be allocated as current states, and acquiring a task initial mapping scheme according to the execution time required by mapping the tasks to the action space in the Q learning;
the fitness evaluation module is used for creating a genetic algorithm model, performing fitness evaluation on the task initial mapping scheme, setting individuals in the genetic algorithm model, copying the individuals to the next generation population according to the fitness, performing cross mutation on reserved individuals, and determining new population optimization efficiency and a minimum threshold; obtaining an approximate optimal solution of the task-to-processor mapping in the model according to the genetic algorithm model;
and the optimal output module is used for converting the model's approximate optimal solution into the initial pheromone distribution of the ant colony, and obtaining the optimal task scheduling scheme by iteratively searching for and outputting the optimal path through the ant colony algorithm according to this distribution.
5. A computer readable storage medium having stored thereon a computer program, wherein the program when executed by a processor performs the method of any of claims 1-3.
6. A computer device comprising a processor and a memory storing machine executable instructions executable by the processor to perform the method of any one of claims 1 to 3.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011284585.6A CN112256422B (en) | 2020-11-17 | 2020-11-17 | Heterogeneous platform task scheduling method and system based on Q learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112256422A CN112256422A (en) | 2021-01-22 |
CN112256422B true CN112256422B (en) | 2023-08-04 |
Family
ID=74265947
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011284585.6A Expired - Fee Related CN112256422B (en) | 2020-11-17 | 2020-11-17 | Heterogeneous platform task scheduling method and system based on Q learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112256422B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114996198B (en) * | 2022-08-03 | 2022-10-21 | 中国空气动力研究与发展中心计算空气动力研究所 | Cross-processor data transmission method, device, equipment and medium |
CN116501503B (en) * | 2023-06-27 | 2023-09-15 | 上海燧原科技有限公司 | Architecture mapping method and device for load task, computer equipment and medium |
CN118779117A (en) * | 2024-09-10 | 2024-10-15 | 山东省计算中心(国家超级计算济南中心) | Large model wide-area heterogeneous distributed training method and system based on dual optimization |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107563555A (en) * | 2017-09-04 | 2018-01-09 | 南京信息工程大学 | Dynamic multi-objective Scheduling method based on Q study memetic algorithms |
CN108776483A (en) * | 2018-08-16 | 2018-11-09 | 圆通速递有限公司 | AGV paths planning methods and system based on ant group algorithm and multiple agent Q study |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7756845B2 (en) * | 2006-12-28 | 2010-07-13 | Yahoo! Inc. | System and method for learning a weighted index to categorize objects |
US8489526B2 (en) * | 2010-11-24 | 2013-07-16 | International Business Machines Corporation | Controlling quarantining and biasing in cataclysms for optimization simulations |
JP2016518589A (en) * | 2013-03-15 | 2016-06-23 | セラ プログノスティックス, インコーポレイテッド | Biomarkers and methods for predicting preterm birth |
CN103345657B (en) * | 2013-04-02 | 2016-05-25 | 江苏大学 | Method for scheduling task based on heredity and ant group under cloud computing environment |
CN104811491A (en) * | 2015-04-17 | 2015-07-29 | 华南理工大学 | Cloud computing resource scheduling method based on genetic algorithm |
CN107104899B (en) * | 2017-06-09 | 2021-04-20 | 中山大学 | Ant colony algorithm-based routing method applied to vehicle-mounted self-organizing network |
US11461145B2 (en) * | 2019-01-28 | 2022-10-04 | EMC IP Holding Company LLC | Building neural networks for resource allocation for iterative workloads using reinforcement learning |
US20200320035A1 (en) * | 2019-04-02 | 2020-10-08 | Micro Focus Software Inc. | Temporal difference learning, reinforcement learning approach to determine optimal number of threads to use for file copying |
CN110298589A (en) * | 2019-07-01 | 2019-10-01 | 河海大学常州校区 | Based on heredity-ant colony blending algorithm dynamic Service resource regulating method |
CN110515735A (en) * | 2019-08-29 | 2019-11-29 | 哈尔滨理工大学 | A kind of multiple target cloud resource dispatching method based on improvement Q learning algorithm |
- 2020-11-17 CN CN202011284585.6A patent/CN112256422B/en not_active Expired - Fee Related
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107563555A (en) * | 2017-09-04 | 2018-01-09 | 南京信息工程大学 | Dynamic multi-objective Scheduling method based on Q study memetic algorithms |
CN108776483A (en) * | 2018-08-16 | 2018-11-09 | 圆通速递有限公司 | AGV paths planning methods and system based on ant group algorithm and multiple agent Q study |
Non-Patent Citations (1)
Title |
---|
异构信号处理平台通信中间件的设计与实现;王天一;《中国优秀博硕士学位论文全文数据库(硕士)信息科技辑》;全文 * |
Also Published As
Publication number | Publication date |
---|---|
CN112256422A (en) | 2021-01-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Zhu et al. | An efficient evolutionary grey wolf optimizer for multi-objective flexible job shop scheduling problem with hierarchical job precedence constraints | |
CN112256422B (en) | Heterogeneous platform task scheduling method and system based on Q learning | |
Kaur et al. | Deep‐Q learning‐based heterogeneous earliest finish time scheduling algorithm for scientific workflows in cloud | |
Almezeini et al. | Task scheduling in cloud computing using lion optimization algorithm | |
CN110109822B (en) | Regression testing method for carrying out test case priority ranking based on ant colony algorithm | |
CN114281104B (en) | Multi-unmanned aerial vehicle cooperative regulation and control method based on improved ant colony algorithm | |
CN105975342A (en) | Improved cuckoo search algorithm based cloud computing task scheduling method and system | |
CN110321217B (en) | Multi-target cloud resource scheduling method, device, equipment and storage medium | |
Yao et al. | Improved artificial bee colony algorithm for vehicle routing problem with time windows | |
Pooranian et al. | Hybrid metaheuristic algorithm for job scheduling on computational grids | |
CN112199172A (en) | Hybrid task scheduling method for heterogeneous multi-core processor | |
CN104506576B (en) | A kind of wireless sensor network and its node tasks moving method | |
CN109710372B (en) | Calculation intensive cloud workflow scheduling method based on owl search algorithm | |
CN116033026A (en) | Resource scheduling method | |
Entezari-Maleki et al. | A genetic algorithm to increase the throughput of the computational grids | |
CN108415773B (en) | Efficient software and hardware partitioning method based on fusion algorithm | |
CN112884368B (en) | Multi-target scheduling method and system for minimizing delivery time and delay of high-end equipment | |
Wang et al. | A coordinated two-stages virtual network embedding algorithm based on reinforcement learning | |
CN106874215B (en) | Serialized storage optimization method based on Spark operator | |
CN114980216A (en) | Dependent task unloading system and method based on mobile edge calculation | |
CN115756646A (en) | Industrial internet-based edge computing task unloading optimization method | |
Dai et al. | Cloud workflow scheduling algorithm based on multi-objective hybrid particle swarm optimisation | |
CN113191534A (en) | Logistics resource allocation method, device, equipment and storage medium | |
CN112882917A (en) | Virtual machine service quality dynamic prediction method based on Bayesian network migration | |
CN111813525A (en) | Heterogeneous system workflow scheduling method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20230804 |