CN112256422A - Heterogeneous platform task scheduling method and system based on Q learning - Google Patents


Info

Publication number
CN112256422A
CN112256422A (application CN202011284585.6A)
Authority
CN
China
Prior art keywords
task
learning
processor
tasks
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011284585.6A
Other languages
Chinese (zh)
Other versions
CN112256422B (en)
Inventor
高博
李娜
谢宗甫
岳春生
张锋印
董春宵
马金全
余果
郭璐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Information Engineering University of PLA Strategic Support Force
Original Assignee
Information Engineering University of PLA Strategic Support Force
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Information Engineering University of PLA Strategic Support Force filed Critical Information Engineering University of PLA Strategic Support Force
Priority to CN202011284585.6A priority Critical patent/CN112256422B/en
Publication of CN112256422A publication Critical patent/CN112256422A/en
Application granted granted Critical
Publication of CN112256422B publication Critical patent/CN112256422B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5038 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/086 Learning methods using evolutionary algorithms, e.g. genetic algorithms or genetic programming
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Physiology (AREA)
  • Evolutionary Biology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention belongs to the technical field of heterogeneous multiprocessor computing, and particularly relates to a heterogeneous platform task scheduling method and system based on Q learning. All tasks are taken as the state space of Q learning and the processor set as the action space; the task awaiting allocation is the current state, and an initial task mapping scheme is obtained in Q learning from the execution time required when a task is mapped to the action space. A genetic algorithm model is established, fitness evaluation is performed on the initial task mapping scheme, the individuals copied into the next generation population are set according to fitness, crossover and mutation are applied to the retained individuals, and the optimization efficiency and minimum threshold of the new population are determined; an approximately optimal solution for the mapping of tasks to processors is obtained from the genetic algorithm model. The approximately optimal solution is then converted into the initial pheromone distribution of the ant colony, and the optimal path is found by iterative search of the ant colony algorithm over this distribution, yielding an optimal task scheduling scheme that better improves the performance of the heterogeneous platform.

Description

Heterogeneous platform task scheduling method and system based on Q learning
Technical Field
The invention belongs to the technical field of heterogeneous multiprocessor computing, and particularly relates to a heterogeneous platform task scheduling method and system based on Q learning.
Background
With the continuously rising high-performance computing demands of various signal processing tasks and the rapid development of hardware accelerators, general-purpose processors cannot meet the requirements of strong real-time, large-scale computing, and heterogeneous computing systems are increasingly used to solve complex task processing problems. A heterogeneous system architecture comprises a series of processors with greatly different structures, such as CPUs, GPUs, FPGAs and DSPs, connected through dedicated networks or interfaces to meet the hardware performance requirements of different types of computing tasks and thereby improve resource utilization and computing efficiency. To meet the demands of increasingly complex signal processing tasks, the efficiency and reliability of heterogeneous multiprocessors are critical. Whether a heterogeneous computing system can deliver its high-performance benefits depends on the hardware resource platform architecture, the degree of matching between tasks and processors, and the task scheduling strategy. Scheduling is inherently a multi-objective NP-hard problem, and the dynamic, heterogeneous characteristics of heterogeneous computing systems add further difficulty to task planning. For a given heterogeneous system, however, an efficient scheduling strategy is the key to improving the strong real-time and high-throughput performance of the platform.
Disclosure of Invention
Aiming at the problems of low flexibility, slow convergence and poor predictability in existing scheduling algorithms, the invention provides a Q-learning-based heterogeneous platform task scheduling method and system that can adjust the network search direction in time while balancing local and global search to obtain better results, so that each processor of the heterogeneous platform can deliver maximum efficiency, parallel task processing is facilitated, and the performance of the heterogeneous platform is improved.
According to the design scheme provided by the invention, the heterogeneous platform task scheduling method based on Q learning comprises the following contents:
taking all tasks as the state space of Q learning and the processor set as the action space, taking the task awaiting allocation as the current state, and obtaining an initial task mapping scheme from the execution time required when a task is mapped to the action space in Q learning;
establishing a genetic algorithm model, performing fitness evaluation on the initial task mapping scheme, setting the individuals copied into the next generation population in the genetic algorithm model according to fitness, performing crossover and mutation on the retained individuals, and determining the optimization efficiency and minimum threshold of the new population; and obtaining an approximately optimal solution for the mapping of tasks in the model to processors from the genetic algorithm model;
and converting the approximately optimal solution of the model into the initial pheromone distribution of the ant colony, and iteratively searching for and outputting the optimal path with the ant colony algorithm according to this distribution to obtain the optimal task scheduling scheme.
In the Q-learning-based heterogeneous platform task scheduling method, further, the system application model is represented as G = {V, E, C, L} and the target system model as P = {N, H, W, T}, where V is the task set, E the set of dependency directed edges, C the set of task computation amounts, L the inter-subtask traffic, N the processor set, H the processor characteristics, W the computation overhead, and T the communication overhead of tasks between processors; the state space and action space of Q learning are obtained from the task set and processor set of the two models.
In the Q-learning-based heterogeneous platform task scheduling method, further, the Q-learning agent executes an action in the current state according to an ε-greedy behavior strategy, obtains the Q value of the task mapped to a processor together with an immediate reward, and transitions to a new state; targeting the minimum Q value after each action completes, the action with the minimum Q value in the new state is selected and executed, the result is stored, the learning rate is decayed, the next state is assigned as the current state, the action in the next state is selected according to the ε-greedy behavior strategy, and the iteration proceeds until an initial task mapping scheme is obtained from the stored values.
In the Q-learning-based heterogeneous platform task scheduling method, the invention further comprises: in establishing the genetic algorithm model, encoding the initial task mapping scheme and mapping tasks to the corresponding processors; evaluating fitness according to a fitness evaluation function, copying individuals that meet the fitness value directly into the next generation population, and performing crossover and mutation on the originally retained individuals; and using the iterative process of the genetic algorithm to compare the population optimization efficiency of each generation with a minimum threshold to determine the current population optimization efficiency, until the population optimization efficiencies of a set number of consecutive generations are all smaller than the minimum threshold, whereupon iteration terminates and an approximately optimal solution set of the genetic algorithm model for the mapping of tasks to processors is obtained.
As the Q learning-based heterogeneous platform task scheduling method, the fitness evaluation function is further expressed as:
[fitness evaluation function, rendered as an image in the original]
t = 1, 2, …, where Q(s, a) denotes the execution time of task s mapped on processor a and t denotes the generation number.
In the Q-learning-based heterogeneous platform task scheduling method, further, in each iteration a minimum threshold is set according to the optimization efficiency of the new population; if the population optimization efficiency of a generation is smaller than the minimum threshold, the threshold is replaced by the current population optimization efficiency.
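As an illustration of this stopping rule, the following is a hypothetical Python helper (the patent specifies no code; the patience of four consecutive generations follows the embodiment described later in the document):

```python
def should_terminate(efficiencies, threshold, patience=4):
    """Sketch of the stopping rule: count consecutive generations whose
    population optimization efficiency falls below the minimum threshold,
    replacing the threshold with the current efficiency each time.
    Terminate once `patience` consecutive generations stay below it."""
    below = 0
    for eff in efficiencies:
        if eff < threshold:
            below += 1
            threshold = eff  # replace the threshold with the current efficiency
        else:
            below = 0        # an improving generation resets the count
    return below >= patience
```

With five steadily declining generations the rule fires; a single generation that beats the threshold resets the count.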
In the Q-learning-based heterogeneous platform task scheduling method, further, in the ant colony algorithm the amount of pheromone an ant releases is determined by the execution time the task requires on a processor; the path the ant has traveled at the current moment is recorded in a tabu table according to the selected processors; and by iteratively outputting the optimal ant path, the optimal allocation of tasks to processors and the shortest task execution time are obtained.
Further, based on the above method, the present invention also provides a Q-learning-based heterogeneous platform task scheduling system, comprising an initial mapping module, a fitness evaluation module and an optimal output module, wherein,
the initial mapping module is used for taking all tasks as the state space of Q learning and the processor set as the action space, taking the task awaiting allocation as the current state, and obtaining an initial task mapping scheme from the execution time required when a task is mapped to the action space in Q learning;
the fitness evaluation module is used for creating a genetic algorithm model, evaluating the fitness of the initial task mapping scheme, setting the individuals copied into the next generation population in the genetic algorithm model according to fitness, performing crossover and mutation on the retained individuals, and determining the optimization efficiency and minimum threshold of the new population; an approximately optimal solution for the mapping of tasks in the model to processors is obtained from the genetic algorithm model;
and the optimal output module is used for converting the approximately optimal solution of the model into the initial pheromone distribution of the ant colony and obtaining the optimal task scheduling scheme by iteratively searching for and outputting the optimal path with the ant colony algorithm according to this distribution.
Further, the present invention also provides a computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, performs the method described above.
Further, the present invention also provides a computer device comprising a processor and a memory, wherein the memory stores machine executable instructions capable of being executed by the processor, and the processor executes the machine executable instructions to execute the method.
The invention has the beneficial effects that:
according to the method, an initial solution set is generated by mapping a task set and a processor set through Q learning and is used as initial information of a genetic algorithm GA, and the performance of the whole scheme is improved by accelerating the search rate of the GA; and heuristic variation operation in the GA algorithm is utilized to improve the local search capability of the ant colony algorithm, maintain the diversity of the colony, accelerate the speed of convergence to the optimal solution, maintain better speed in the whole search process, shorten the scheduling length, enable each processor of the heterogeneous platform to exert the maximum efficiency, improve the utilization rate of hardware resources, facilitate the parallel processing of tasks, improve the performance of the heterogeneous platform and have better application prospect.
Description of the drawings:
FIG. 1 is a flowchart illustrating a task scheduling method for a heterogeneous platform in an embodiment;
FIG. 2 is a schematic comparison of existing algorithms in terms of scheduling length and average resource utilization in the embodiment;
FIG. 3 is a flow diagram of a QGA-ACO algorithm for task scheduling of a heterogeneous platform in the embodiment;
FIG. 4 is a schematic representation of the task and target system models in the embodiment;
FIG. 5 is a schematic diagram of the crossover operation in the embodiment;
FIG. 6 is a schematic diagram of the mutation operation in the embodiment;
FIG. 7 is a graph of algorithm rate versus iteration number in the embodiment.
Detailed description of the embodiments:
in order to make the objects, technical solutions and advantages of the present invention clearer and more obvious, the present invention is further described in detail below with reference to the accompanying drawings and technical solutions.
Aiming at the problems of low flexibility, slow convergence and poor predictability in task scheduling algorithms for existing heterogeneous multiprocessor computing platforms, an embodiment of the invention provides a Q-learning-based heterogeneous platform task scheduling method, as shown in FIG. 1, comprising the following contents:
S101, taking all tasks as the state space of Q learning and the processor set as the action space, taking the task awaiting allocation as the current state, and obtaining an initial task mapping scheme from the execution time required when a task is mapped to the action space in Q learning;
S102, establishing a genetic algorithm model, performing fitness evaluation on the initial task mapping scheme, setting the individuals copied into the next generation population in the genetic algorithm model according to fitness, performing crossover and mutation on the retained individuals, and determining the optimization efficiency and minimum threshold of the new population; and obtaining an approximately optimal solution for the mapping of tasks in the model to processors from the genetic algorithm model;
S103, converting the approximately optimal solution of the model into the initial pheromone distribution of the ant colony, and iteratively searching for and outputting the optimal path with the ant colony algorithm according to this distribution to obtain the optimal task scheduling scheme.
A heterogeneous multiprocessor computing system is dynamic and heterogeneous: resources and tasks may join and leave at random, and the processors differ in structure and performance. These high-performance characteristics have made heterogeneous multiprocessor computing one of the development directions of high-performance computing, but they also complicate resource management and task scheduling. The key to exploiting the high computing efficiency of a heterogeneous system is a suitable task scheduling strategy. Traditional scheduling algorithms have high algorithmic or time complexity on complex problems, so heuristic methods such as genetic algorithms and ant colony algorithms are widely applied. However, the initial information of such algorithms is mostly constructed randomly, and the quality of the initial search information is hard to guarantee, which affects the convergence performance of the algorithms to some extent. In this embodiment, feedback information obtained from the interaction of Q-learning with the environment allows the network search direction to be adjusted in time while balancing local and global search for better results. This result serves as the initial information of the GA, compensating for the low individual quality of a randomly initialized genetic population and its inability to use network feedback; combining with the ACO in the later stage shortens iteration time, makes the search more efficient, and allows application tasks to be optimally allocated to processors.
Because a single task scheduling algorithm has shortcomings and cannot fully meet the platform's scheduling requirements, the strengths and weaknesses of different algorithms are combined to complement one another, meeting the hardware performance requirements of different types of computing tasks and improving resource utilization and computing efficiency.
MLSH, ACO and GAACO respectively represent a traditional scheduling algorithm, a heuristic method, and combined optimization; (a) and (b) in FIG. 2 compare the MLSH, ACO and GAACO algorithms in terms of scheduling length and average resource utilization. For small-scale tasks the scheduling lengths of the three algorithms are close, but as the number of tasks grows, the scheduling lengths of the ACO and GAACO algorithms are nearly identical and lower than that of MLSH, because the complexity of the traditional scheduling algorithm is higher than that of the heuristic algorithms and grows exponentially with the number of tasks. In terms of resource utilization, GAACO's average resource utilization is significantly higher than that of ACO and MLSH. The GAACO algorithm combines the universality, scalability and global convergence of the genetic algorithm with the parallelism and high solution precision of the ant colony algorithm, realizing complementary advantages and improving execution efficiency and solution precision. Thus GAACO achieves the best scheduling performance in both respects, yet it does not achieve a significant improvement in scheduling length.
Referring to FIG. 3, the QGA-ACO algorithm of this embodiment can be designed as follows: in the first stage, the initial allocation of tasks to resources is obtained through Q learning, yielding a good scheme and Q values; on this basis a GA model is established, parameters and constraints are determined, and the initial solutions are crossed, mutated and copied to preserve population diversity and improve convergence speed, further optimizing the result. The second stage adopts the ACO algorithm: if the evolution rates of four consecutive generations in the first stage are all below the threshold, the second stage is entered, the first-stage result is converted into the initial pheromone values of the ACO to avoid blind search by the ant colony, and the search iterates according to the pheromone distribution until the termination condition is met.
In the Q-learning-based heterogeneous platform task scheduling method of the embodiment of the present invention, further, the system application model is represented as G = {V, E, C, L} and the target system model as P = {N, H, W, T}, where V is the task set, E the set of dependency directed edges, C the set of task computation amounts, L the inter-subtask traffic, N the processor set, H the processor characteristics, W the computation overhead, and T the communication overhead of tasks between processors; the state space and action space of Q learning are obtained from the task set and processor set of the two models.
Referring to FIG. 4, the system application model is represented as G = {V, E, C, L}, where V = {v1, v2, …, vn} represents the set of signal processing tasks to be processed, E = {e12, e23, …, eij} is the set of dependency directed edges, C = {c1, c2, …, cn} denotes the set of task computation overheads, and L(vi, vj) indicates the communication overhead between subtasks; if tasks vi and vj are mapped to the same node, the communication overhead is 0. Hardware resources may be represented by an undirected graph abstracted as P = {N, H, W, T}, where N = {n1, n2, …} is the processor set, H is the processor profile, W is the execution rate of the processors, and T represents the inter-processor communication rate.
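To make the two models concrete, the following Python sketch represents G = {V, E, C, L} and P = {N, H, W, T} as plain dictionaries and derives the execution and communication overheads described above. All task names, processor names and numbers are invented for illustration; the patent itself specifies no code:

```python
# Hypothetical application model G = {V, E, C, L}:
# V: tasks, E: dependency directed edges, C: computation amount per task,
# L: communication volume between dependent subtasks.
G = {
    "V": ["v1", "v2", "v3", "v4"],
    "E": [("v1", "v2"), ("v1", "v3"), ("v2", "v4"), ("v3", "v4")],
    "C": {"v1": 4.0, "v2": 3.0, "v3": 5.0, "v4": 2.0},
    "L": {("v1", "v2"): 1.0, ("v1", "v3"): 2.0,
          ("v2", "v4"): 1.5, ("v3", "v4"): 0.5},
}

# Hypothetical target system model P = {N, H, W, T}:
# N: processors, H: processor characteristics, W: execution rate,
# T: inter-processor communication rate.
P = {
    "N": ["n1", "n2"],
    "H": {"n1": "CPU", "n2": "DSP"},
    "W": {"n1": 1.0, "n2": 2.0},
    "T": {("n1", "n2"): 0.5},
}

def exec_time(task, proc):
    """Execution time of `task` on `proc`: computation amount / execution rate."""
    return G["C"][task] / P["W"][proc]

def comm_time(ti, tj, pi, pj):
    """Communication overhead; zero when both tasks map to the same node."""
    if pi == pj:
        return 0.0
    rate = P["T"].get((pi, pj)) or P["T"].get((pj, pi))
    return G["L"][(ti, tj)] / rate
```

Under these numbers, task v1 on the faster processor n2 takes 4.0/2.0 = 2.0 time units, and co-located tasks exchange data for free, matching the zero-overhead rule in the model description.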
In the Q-learning-based heterogeneous platform task scheduling method of the embodiment of the present invention, further, the Q-learning agent executes an action in the current state according to an ε-greedy behavior strategy, obtains the Q value of the task mapped to a processor together with an immediate reward, and transitions to a new state; targeting the minimum Q value after each action completes, the action with the minimum Q value in the new state is selected and executed, the result is stored, the learning rate is decayed, the next state is assigned as the current state, the action in the next state is selected according to the ε-greedy behavior strategy, and the iteration proceeds until an initial task mapping scheme is obtained from the stored values.
Stage one of the algorithm shown in FIG. 3 can be designed as follows:
the task set V may be formed into a state space S and the processor set N as an action space a. Task v to be waiting for allocationiAs the current state s, the execution action n of the current stateiIs a.
Step 1: initialize the state-action space of Q-learning, where Q(S, A) = C is the execution time required to map task S to processor A; the search factor is ε, the discount factor γ, and the learning rate α.
Step 2: following the ε-greedy behavior strategy, the agent executes action a with a certain probability in the current state s, obtains the Q value of the task mapped to the processor and an immediate reward r (the larger the Q value, the longer the task's execution time and the smaller the reward), and then transitions to a new state s′. The ε-greedy strategy balances the agent's exploration of the state space against exploitation of the information already obtained, preventing greedy exploitation from falling into local optima and excessive exploration from hurting the algorithm's performance;
Step 3: targeting the minimum Q value after each action completes, select the action a′ with the minimum Q value in state s′ according to formula (1), compute the agent's Q value at position (s, a), and store it in the Q table;
Q(s_t, a_t) ← Q(s_t, a_t) + α[r_{t+1} + γ·min_{a′} Q(s_{t+1}, a′) − Q(s_t, a_t)]    (1)
Step 4: decay the learning rate, assign the next state to the current state (s ← s′), select the action a′ in the next state s′ according to the ε-greedy behavior strategy, and assign it as the action to execute (a ← a′);
Step 5: increment the step count by 1 and judge whether iteration is complete; if not, return to Step 2 and continue; if iteration is complete, the algorithm ends and the initial task mapping scheme is obtained from the Q table.
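The five steps above can be sketched in Python as follows. This is a minimal illustrative interpretation, not the patent's implementation: the state transition (next task in sequence), the use of execution time directly as the cost/reward signal, and all parameter values are assumptions, and `exec_times` stands in for the initialization Q(S, A) = C:

```python
import random

def q_learning_initial_mapping(exec_times, episodes=200, alpha=0.5,
                               gamma=0.9, epsilon=0.2, seed=0):
    """Stage-one sketch: learn an initial task -> processor mapping.

    exec_times[s][a] is the execution time of task s on processor a.
    The Q table is initialized with these times and updated with the
    minimizing variant of the rule in formula (1):
        Q(s,a) <- Q(s,a) + alpha*(r + gamma*min_a' Q(s',a') - Q(s,a))
    Costs play the role of (negative) rewards, so smaller Q is better.
    """
    rng = random.Random(seed)
    tasks = list(exec_times)
    procs = list(exec_times[tasks[0]])
    Q = {s: dict(exec_times[s]) for s in tasks}   # Q(S, A) = C

    for _ in range(episodes):
        for i, s in enumerate(tasks):
            # epsilon-greedy: explore with probability epsilon,
            # otherwise exploit the minimum-Q action
            if rng.random() < epsilon:
                a = rng.choice(procs)
            else:
                a = min(Q[s], key=Q[s].get)
            r = exec_times[s][a]
            if i + 1 < len(tasks):                # transition to the next state
                target = r + gamma * min(Q[tasks[i + 1]].values())
            else:
                target = r                        # terminal state
            Q[s][a] += alpha * (target - Q[s][a])
        alpha *= 0.999                            # decay the learning rate

    # initial mapping scheme read off the Q table: minimum-Q action per task
    return {s: min(Q[s], key=Q[s].get) for s in tasks}
```

On a toy instance where processor n2 is faster for every task, the learned mapping assigns both tasks to n2, as expected from the minimizing update.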
In the Q-learning-based heterogeneous platform task scheduling method of the embodiment of the present invention, further, in establishing the genetic algorithm model, the initial task mapping scheme is encoded and tasks are mapped to the corresponding processors; fitness is evaluated according to a fitness evaluation function, individuals meeting the fitness value are copied directly into the next generation population, and crossover and mutation are performed on the originally retained individuals; the iterative process of the genetic algorithm compares the population optimization efficiency of each generation with a minimum threshold to determine the current population optimization efficiency, until the population optimization efficiencies of a set number of consecutive generations are all smaller than the minimum threshold, whereupon iteration terminates and an approximately optimal solution set of the genetic algorithm model for the mapping of tasks to processors is obtained. Further, in each iteration a minimum threshold is set according to the optimization efficiency of the new population; if the population optimization efficiency of a generation is smaller than the minimum threshold, the threshold is replaced by the current population optimization efficiency.
The initial population (the mapping scheme) is encoded (encoding rule: chromosome[p] = q, i.e., task p is mapped to processor q) and evaluated for fitness; individuals with high fitness values are copied directly into the next generation population, and crossover and mutation are performed on the originally retained individuals. The fitness evaluation function may be expressed as:
[fitness evaluation function, rendered as an image in the original]
t = 1, 2, …, where Q(s, a) denotes the execution time of task s mapped on processor a and t denotes the generation number. Crossover operator: a crossover point is randomly selected on the chromosome, dividing each of the two parent chromosomes into two parts; as shown in FIG. 5, the gene segment to the left of the crossover point comes from the corresponding side of one parent chromosome and the segment to the right is copied from the corresponding genes of the other chromosome, yielding two new chromosomes. The heuristic mutation operation, as shown in FIG. 6, randomly selects a locus i of a chromosome, searches from that position for the last successor succ(i) of i, randomly selects a locus j ∈ (i, succ(i)), and swaps the positions of i and j to form a new chromosome. The optimization efficiency of the new population obtained by the genetic algorithm is collected and taken as the minimum threshold. The iteration count is incremented by 1 and the procedure returns to the fitness evaluation step of the genetic algorithm; the population optimization efficiency of each generation is compared with the minimum threshold, and if it is smaller than the threshold, the threshold is replaced by the current population optimization efficiency; when the population optimization efficiencies of 4 consecutive generations are all below the threshold, the GA terminates.
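A minimal Python sketch of this GA stage follows, under stated assumptions: the chromosome[p] = q encoding and the crossover/mutation/elitism roles follow the description above, but the fitness function (inverse total mapped execution time), the selection scheme, and all parameters are illustrative stand-ins rather than the patent's exact operators:

```python
import random

def ga_refine(mapping_pop, exec_times, generations=50, pc=0.8, pm=0.1, seed=1):
    """Refine a population of task->processor mappings with a simple GA.

    A chromosome is a dict task -> processor (chromosome[p] = q means
    task p runs on processor q).  Fitness is the inverse of the total
    mapped execution time, so shorter schedules score higher.
    """
    rng = random.Random(seed)
    tasks = list(exec_times)
    procs = list(exec_times[tasks[0]])

    def fitness(ch):
        return 1.0 / sum(exec_times[t][ch[t]] for t in tasks)

    pop = [dict(ch) for ch in mapping_pop]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        next_pop = [dict(pop[0])]                 # elitism: copy the best forward
        while len(next_pop) < len(pop):
            # pick two parents from the fitter half of the population
            p1, p2 = rng.sample(pop[: max(2, len(pop) // 2)], 2)
            child = dict(p1)
            if rng.random() < pc:                 # single-point crossover
                cut = rng.randrange(1, len(tasks))
                for t in tasks[cut:]:
                    child[t] = p2[t]
            if rng.random() < pm:                 # mutation: reassign one task
                child[rng.choice(tasks)] = rng.choice(procs)
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness)
```

Because the elite individual is copied forward unmodified each generation, the best mapping present in the initial population can never be lost, mirroring the "copy directly into the next generation" rule.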
As the heterogeneous platform task scheduling method based on Q learning in the embodiment of the present invention, further, in the ant colony algorithm, the amount of pheromone an ant releases is determined by the execution time the task requires on the processor; the path the ant has traveled at the current moment is recorded in a tabu table according to the selected processor; and the optimal assignment of tasks to processors and the shortest task execution time are obtained by iteratively outputting the ants' optimal path.
Stage two of the algorithm, shown in fig. 3, can be implemented as follows:
First, the approximate optimal solution set {chromosome[i] = j} obtained in stage one is converted into the initial pheromone distribution of the ant colony.
Step 1: initialize the ant colony parameters: the colony size m, the weights α and β, the pheromone volatilization factor ρ, the pheromone intensity τij, the adjustment factor μ, and the maximum number of iterations T. Assume that τ is the pheromone matrix, η is the heuristic function, k is the k-th ant, and l is the processor that ant k can select next.
Step 2: place m ants on n nodes (m > n). The shorter the execution time required by task i on processor j, the more pheromone the ants release. The paths the ants have traveled at the current moment (i.e. the selected processors) are recorded in the tabu table; the next node is selected according to the probability selection formula (2), stored in the tabu table, and the ants' travel paths are recorded.
(Formula (2), probability selection — equation image not reproduced in this text.)
Step 3: after the m ants have traversed the n nodes, all ants finish one round of search and the pheromone is updated according to formula (3); this repeats until iteration terminates, whereupon the optimal path is output.
(Formula (3), pheromone update — equation image not reproduced in this text.)
Therefore, the optimal distribution of the tasks to the processors and the shortest time for executing the tasks can be obtained.
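Steps 1 to 3 can be sketched as follows. Because formulas (2) and (3) appear only as images in this text, the sketch assumes the standard ant-system forms (selection probability proportional to τ^α · η^β with η = 1/execution time, and pheromone update τ ← (1 − ρ)τ + Q/L); the tabu table is implicit here, since each ant assigns every task exactly once:

```python
import random

def aco_assign(exec_time, tau, alpha=1.0, beta=2.0, rho=0.1, Q=1.0,
               n_ants=10, n_iters=50):
    """Assign each task to a processor by ant colony search.
    exec_time[i][j]: execution time of task i on processor j.
    tau: initial pheromone matrix, e.g. seeded from the GA stage's
    near-optimal solution set (assumption)."""
    n_tasks, n_procs = len(exec_time), len(exec_time[0])
    # heuristic function: shorter execution time => stronger attraction
    eta = [[1.0 / exec_time[i][j] for j in range(n_procs)]
           for i in range(n_tasks)]
    best_assign, best_len = None, float("inf")
    for _ in range(n_iters):
        tours = []
        for _k in range(n_ants):
            assign = []
            for i in range(n_tasks):
                # probability selection, assumed form of formula (2)
                w = [tau[i][j] ** alpha * eta[i][j] ** beta
                     for j in range(n_procs)]
                assign.append(random.choices(range(n_procs), weights=w)[0])
            # schedule length: the most-loaded processor's total time
            load = [0.0] * n_procs
            for i, j in enumerate(assign):
                load[j] += exec_time[i][j]
            tours.append((assign, max(load)))
        # pheromone evaporation and deposit, assumed form of formula (3)
        for i in range(n_tasks):
            for j in range(n_procs):
                tau[i][j] *= (1.0 - rho)
        for assign, length in tours:
            for i, j in enumerate(assign):
                tau[i][j] += Q / length
            if length < best_len:
                best_assign, best_len = assign, length
    return best_assign, best_len
```

Shorter schedules deposit more pheromone, so the positive feedback concentrates later ants on the better task-to-processor assignments.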
The GA algorithm converges quickly in the early stage of the search, but because the early search is not adjusted promptly according to feedback information, a large amount of redundant information generated in the later stage occupies resources and the rate drops markedly after a certain number of iterations. The ACO algorithm, by contrast, has little and sparsely distributed pheromone in the early stage, so its random search is slow; once pheromone has accumulated to a certain level and is distributed comprehensively in the later stage, heuristic search accelerates convergence. As the rate-versus-iteration analysis shown in fig. 7 indicates, the QGA-ACO algorithm proposed in the embodiment of the present invention first uses the Q-learning algorithm to raise the initial search rate of the GA algorithm, optimizes through crossover, mutation and similar operations, and, after iterating to a given generation, runs the ACO algorithm, exploiting the parallel search capability and positive feedback of the ant colony algorithm to find an optimal solution of the problem; a good rate is thereby maintained throughout the search and the scheduling length is shortened.
Further, based on the foregoing method, an embodiment of the present invention further provides a Q-learning based heterogeneous platform task scheduling system, comprising: an initial mapping module, a fitness evaluation module and an optimal output module, wherein,
the initial mapping module is used for taking all tasks as the state space of Q learning and the processor set as the action space, taking the task to be allocated as the current state, and obtaining a task initial mapping scheme according to the execution time required by the task mapped to the action space in Q learning;
the fitness evaluation module is used for creating a genetic algorithm model, evaluating the fitness of the task initial mapping scheme, selecting the individuals copied into the next-generation population of the genetic algorithm model according to fitness, performing crossover and mutation on the retained individuals, and determining the optimization efficiency and minimum threshold of the new population; and obtaining an approximate optimal solution for the mapping of tasks in the model to processors according to the genetic algorithm model;
and the optimal output module is used for converting the model's approximate optimal solution into the initial pheromone distribution of the ant colony and, according to this distribution, obtaining the optimal task scheduling scheme by iteratively searching for and outputting the optimal path through the ant colony algorithm.
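The initial mapping module's Q-learning procedure can be sketched minimally as follows. The excerpt does not reproduce the exact Q-value update rule or reward, so this sketch makes assumptions: execution time is treated as a cost to be minimised, tasks are cycled through as states, the learning rate is decayed each episode, and all parameter values are illustrative:

```python
import random

def q_learning_initial_mapping(exec_time, episodes=200, alpha=0.5,
                               gamma=0.9, epsilon=0.2):
    """States are the tasks, actions are the processors; the Q value
    of (task, processor) tracks the (discounted) execution cost, and
    the initial mapping picks the minimum-Q processor per task."""
    n_tasks, n_procs = len(exec_time), len(exec_time[0])
    Q = [[0.0] * n_procs for _ in range(n_tasks)]
    for _ in range(episodes):
        for s in range(n_tasks):        # current state: task awaiting allocation
            # epsilon-greedy behaviour strategy over the action (processor) set
            if random.random() < epsilon:
                a = random.randrange(n_procs)
            else:
                a = min(range(n_procs), key=lambda j: Q[s][j])
            nxt = (s + 1) % n_tasks     # transfer to the next task's state (assumed)
            # update towards cost plus discounted minimum Q of the new state
            target = exec_time[s][a] + gamma * min(Q[nxt])
            Q[s][a] += alpha * (target - Q[s][a])
        alpha *= 0.99                   # attenuate the learning rate
    # initial mapping scheme: minimum-Q processor for each task
    return [min(range(n_procs), key=lambda j: Q[s][j]) for s in range(n_tasks)]
```

The returned mapping is only a starting point; in the method above it is then handed to the GA stage for encoding and fitness evaluation.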
Unless specifically stated otherwise, the relative arrangement of the components and steps, the numerical expressions, and the numerical values set forth in these embodiments do not limit the scope of the present invention.
Based on the above method or system, an embodiment of the present invention further provides a computer device, including: one or more processors; a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the system or perform the method described above.
Based on the system, the embodiment of the invention further provides a computer readable medium, on which a computer program is stored, wherein the program, when executed by a processor, performs the method.
The device provided by the embodiment of the present invention has the same implementation principle and technical effect as the system embodiment, and for the sake of brief description, reference may be made to the corresponding content in the system embodiment for the part where the device embodiment is not mentioned.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to the corresponding processes in the foregoing system embodiments, and are not described herein again.
In all examples shown and described herein, any particular value should be construed as merely exemplary, and not as a limitation, and thus other examples of example embodiments may have different values.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the system according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
Finally, it should be noted that the above-mentioned embodiments are only specific embodiments of the present invention, used to illustrate its technical solutions rather than to limit them, and the protection scope of the present invention is not limited thereto. Although the present invention is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone familiar with the art may still modify the technical solutions described in the foregoing embodiments, or easily conceive of changes, or make equivalent substitutions of some technical features, within the technical scope of the present disclosure; such modifications, changes or substitutions do not depart from the spirit and scope of the embodiments of the present invention and shall be construed as being included therein. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A heterogeneous platform task scheduling method based on Q learning is characterized by comprising the following contents:
taking all tasks as the state space of Q learning and the processor set as the action space, taking the task to be allocated as the current state, and obtaining a task initial mapping scheme according to the execution time required by the task mapped to the action space in Q learning;
establishing a genetic algorithm model, carrying out fitness evaluation on the task initial mapping scheme, selecting the individuals copied into the next-generation population of the genetic algorithm model according to fitness, carrying out crossover and mutation on the retained individuals, and determining the optimization efficiency and minimum threshold of the new population; and obtaining an approximate optimal solution for the mapping of tasks in the model to processors according to the genetic algorithm model;
and converting the model's approximate optimal solution into the initial pheromone distribution of the ant colony, and, according to this distribution, iteratively searching for and outputting the optimal path through the ant colony algorithm to obtain the optimal task scheduling scheme.
2. The Q-learning-based heterogeneous platform task scheduling method according to claim 1, wherein a system application model is represented as G = {V, E, C, L} and a target system model as P = {N, H, W, T}, where V is the task set, E is the set of dependent directed edges, C is the set of task computation amounts, L is the inter-subtask traffic, N is the set of processors, H is the processor characteristics, W is the computation overhead, and T is the communication overhead of tasks among processors; and the state space and action space of Q learning are obtained from the task set and processor set of the system application model and the target system model.
3. The Q learning-based heterogeneous platform task scheduling method according to claim 1 or 2, wherein the Q learning agent executes an action in the current state according to an ε-greedy behavior strategy, obtaining the Q value of the task mapped to a processor without an immediate reward, and transfers to a new state; and, aiming at the minimum Q value after each action is executed, the action with the minimum Q value in the new state is selected for execution and stored, the learning rate is attenuated, the next state is assigned as the current state, the action in the next state is selected according to the ε-greedy behavior strategy, and iteration proceeds to obtain the task initial mapping scheme from the stored results.
4. The Q-learning based heterogeneous platform task scheduling method according to claim 1, wherein, in creating the genetic algorithm model, the task initial mapping scheme is encoded and tasks are mapped onto the corresponding processors; fitness is evaluated according to a fitness evaluation function, individuals meeting the fitness value are copied directly into the next-generation population, and crossover and mutation are performed on the originally retained individuals; and, in the iterative process of the genetic algorithm, the population optimization efficiency of each generation is compared with the minimum threshold to determine the current population optimization efficiency, until the population optimization efficiencies of a set number of successive generations are all smaller than the minimum threshold, whereupon iteration terminates and the approximate optimal solution set of the genetic algorithm model for the mapping of tasks to processors is obtained.
5. The Q-learning based heterogeneous platform task scheduling method according to claim 4, wherein the fitness evaluation function is expressed as:
(Fitness evaluation function — equation image not reproduced in this text.)
wherein Q(s, a) represents the execution time of task s mapped on processor a, and t represents the iteration number.
6. The Q-learning based heterogeneous platform task scheduling method according to claim 4, wherein, in each iteration, the minimum threshold is set according to the optimization efficiency of the new population, and if the population optimization efficiency of a generation of offspring is smaller than the minimum threshold, the minimum threshold is replaced by the current population optimization efficiency.
7. The Q-learning based heterogeneous platform task scheduling method according to claim 1, wherein, in the ant colony algorithm, the amount of pheromone an ant releases is determined according to the execution time of the task on the processor; the path the ant has traveled at the current moment is recorded in a tabu table according to the selected processor; and the optimal assignment of tasks to processors and the shortest task execution time are obtained by iteratively outputting the ants' optimal path.
8. A heterogeneous platform task scheduling system based on Q learning, comprising: an initial mapping module, a fitness evaluation module and an optimal output module, wherein,
the initial mapping module is used for taking all tasks as the state space of Q learning and the processor set as the action space, taking the task to be allocated as the current state, and obtaining a task initial mapping scheme according to the execution time required by the task mapped to the action space in Q learning;
the fitness evaluation module is used for creating a genetic algorithm model, evaluating the fitness of the task initial mapping scheme, selecting the individuals copied into the next-generation population of the genetic algorithm model according to fitness, performing crossover and mutation on the retained individuals, and determining the optimization efficiency and minimum threshold of the new population; and obtaining an approximate optimal solution for the mapping of tasks in the model to processors according to the genetic algorithm model;
and the optimal output module is used for converting the model's approximate optimal solution into the initial pheromone distribution of the ant colony and, according to this distribution, obtaining the optimal task scheduling scheme by iteratively searching for and outputting the optimal path through the ant colony algorithm.
9. A computer-readable storage medium, on which a computer program is stored, wherein the program, when executed by a processor, performs the method of any of claims 1 to 7.
10. A computer device comprising a processor and a memory, the memory storing machine executable instructions executable by the processor, the processor executing the machine executable instructions to perform the method of any one of claims 1 to 7.
CN202011284585.6A 2020-11-17 2020-11-17 Heterogeneous platform task scheduling method and system based on Q learning Active CN112256422B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011284585.6A CN112256422B (en) 2020-11-17 2020-11-17 Heterogeneous platform task scheduling method and system based on Q learning

Publications (2)

Publication Number Publication Date
CN112256422A true CN112256422A (en) 2021-01-22
CN112256422B CN112256422B (en) 2023-08-04

Family

ID=74265947

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011284585.6A Active CN112256422B (en) 2020-11-17 2020-11-17 Heterogeneous platform task scheduling method and system based on Q learning

Country Status (1)

Country Link
CN (1) CN112256422B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114996198A (en) * 2022-08-03 2022-09-02 中国空气动力研究与发展中心计算空气动力研究所 Cross-processor data transmission method, device, equipment and medium
CN116501503A (en) * 2023-06-27 2023-07-28 上海燧原科技有限公司 Architecture mapping method and device for load task, computer equipment and medium

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080162385A1 (en) * 2006-12-28 2008-07-03 Yahoo! Inc. System and method for learning a weighted index to categorize objects
US20120130929A1 (en) * 2010-11-24 2012-05-24 International Business Machines Corporation Controlling quarantining and biasing in cataclysms for optimization simulations
CN103345657A (en) * 2013-04-02 2013-10-09 江苏大学 Task scheduling method based on heredity and ant colony in cloud computing environment
US20140287950A1 (en) * 2013-03-15 2014-09-25 Sera Prognostics, Inc. Biomarkers and methods for predicting preterm birth
CN104811491A (en) * 2015-04-17 2015-07-29 华南理工大学 Cloud computing resource scheduling method based on genetic algorithm
CN107104899A (en) * 2017-06-09 2017-08-29 中山大学 A kind of method for routing based on ant group algorithm being applied in vehicular ad hoc network
CN107563555A (en) * 2017-09-04 2018-01-09 南京信息工程大学 Dynamic multi-objective Scheduling method based on Q study memetic algorithms
CN108776483A (en) * 2018-08-16 2018-11-09 圆通速递有限公司 AGV paths planning methods and system based on ant group algorithm and multiple agent Q study
CN110298589A (en) * 2019-07-01 2019-10-01 河海大学常州校区 Based on heredity-ant colony blending algorithm dynamic Service resource regulating method
CN110515735A (en) * 2019-08-29 2019-11-29 哈尔滨理工大学 A kind of multiple target cloud resource dispatching method based on improvement Q learning algorithm
US20200241921A1 (en) * 2019-01-28 2020-07-30 EMC IP Holding Company LLC Building neural networks for resource allocation for iterative workloads using reinforcement learning
US20200320035A1 (en) * 2019-04-02 2020-10-08 Micro Focus Software Inc. Temporal difference learning, reinforcement learning approach to determine optimal number of threads to use for file copying

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
CHIA-FENG JUANG ET AL: "Ant Colony Optimization Incorporated With Fuzzy Q-Learning for Reinforcement Fuzzy Control", IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans *
FRANCISCO CHAGAS DE LIMA JUNIOR ET AL: "Using Q-learning Algorithm for Initialization of the GRASP Metaheuristic and Genetic Algorithm", Proceedings of International Joint Conference on Neural Networks *
CUI LIMEI: "Research on ACA-GA-based resource scheduling in cloud computing", Bulletin of Science and Technology *
ZHANG XINHUA: "Collaborative dependent multi-task grid cluster scheduling based on reinforcement learning and ant colony algorithm", Journal of Changsha University *
PAN YANCHUN ET AL: "A genetic reinforcement learning algorithm for the job-shop scheduling problem", Computer Engineering *
WANG TIANYI: "Design and implementation of communication middleware for a heterogeneous signal processing platform", China Masters' Theses Full-text Database, Information Science and Technology *
WANG BENNIAN ET AL: "RLGA: a genetic algorithm based on a reinforcement learning mechanism", Acta Electronica Sinica *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114996198A (en) * 2022-08-03 2022-09-02 中国空气动力研究与发展中心计算空气动力研究所 Cross-processor data transmission method, device, equipment and medium
CN114996198B (en) * 2022-08-03 2022-10-21 中国空气动力研究与发展中心计算空气动力研究所 Cross-processor data transmission method, device, equipment and medium
CN116501503A (en) * 2023-06-27 2023-07-28 上海燧原科技有限公司 Architecture mapping method and device for load task, computer equipment and medium
CN116501503B (en) * 2023-06-27 2023-09-15 上海燧原科技有限公司 Architecture mapping method and device for load task, computer equipment and medium

Also Published As

Publication number Publication date
CN112256422B (en) 2023-08-04

Similar Documents

Publication Publication Date Title
Tong et al. A scheduling scheme in the cloud computing environment using deep Q-learning
Zhu et al. An efficient evolutionary grey wolf optimizer for multi-objective flexible job shop scheduling problem with hierarchical job precedence constraints
Chakaravarthy et al. Scalable single source shortest path algorithms for massively parallel systems
US20160171366A1 (en) Solving vehicle routing problems using evolutionary computing techniques
CN109522104B (en) Method for optimizing scheduling of two target tasks of Iaas by using differential evolution algorithm
CN114281104B (en) Multi-unmanned aerial vehicle cooperative regulation and control method based on improved ant colony algorithm
CN112988345A (en) Dependency task unloading method and device based on mobile edge calculation
CN112256422A (en) Heterogeneous platform task scheduling method and system based on Q learning
CN110321217A (en) A kind of cloud resource dispatching method, device, equipment and the storage medium of multiple target
CN109710372B (en) Calculation intensive cloud workflow scheduling method based on owl search algorithm
CN113391894A (en) Optimization method of optimal hyper-task network based on RBP neural network
CN116594748A (en) Model customization processing method, device, equipment and medium for task
CN110163255B (en) Data stream clustering method and device based on density peak value
Kousalya et al. To improve ant algorithm’s grid scheduling using local search
CN111176784A (en) Virtual machine integration method based on extreme learning machine and ant colony system
CN112884368A (en) Multi-target scheduling method and system for minimizing delivery time and delay of high-end equipment
CN114980216A (en) Dependent task unloading system and method based on mobile edge calculation
CN106874215B (en) Serialized storage optimization method based on Spark operator
CN115421885A (en) Distributed multi-target cloud task scheduling method and device and cloud service system
CN112598112B (en) Resource scheduling method based on graph neural network
Esfahanizadeh et al. Stream iterative distributed coded computing for learning applications in heterogeneous systems
CN111813525B (en) Heterogeneous system workflow scheduling method
CN115080225A (en) Single-source shortest path calculation method and system
CN114528094A (en) Distributed system resource optimization allocation method based on LSTM and genetic algorithm
Bazoobandi et al. Solving task scheduling problem in multi-processors with genetic algorithm and task duplication

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant