CN112256422B - Heterogeneous platform task scheduling method and system based on Q learning - Google Patents
- Publication number
- CN112256422B (application number CN202011284585.6A)
- Authority
- CN
- China
- Prior art keywords
- task
- processor
- learning
- model
- tasks
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
- G06F9/4806—Task transfer initiation or dispatching (under G06F9/46—Multiprogramming arrangements; G06F9/48—Program initiating; program switching, e.g. by interrupt)
- G06F9/5038—Allocation of resources, e.g. of the central processing unit [CPU], to service a request, the resource being a machine, e.g. CPUs, servers, terminals, considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
- G06N3/006—Artificial life, i.e. computing arrangements simulating life, based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
- G06N3/086—Learning methods using evolutionary algorithms, e.g. genetic algorithms or genetic programming
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention belongs to the technical field of heterogeneous multiprocessor computing and particularly relates to a heterogeneous platform task scheduling method and system based on Q learning. All tasks are used as the state space of Q learning, the processor set as the action space, and the task waiting to be distributed as the current state; an initial mapping scheme of the tasks is obtained according to the execution time required by mapping the tasks to the action space in Q learning. A genetic algorithm model is created and fitness evaluation is performed on the task initial mapping scheme; individuals in the genetic algorithm model are copied into the next generation population, crossover and mutation are performed on the retained individuals, and the new population optimization efficiency and a minimum threshold are determined; an approximate optimal solution of the task-to-processor mapping is then obtained from the genetic algorithm model. Finally, the model's approximate optimal solution is converted into the initial pheromone distribution of an ant colony algorithm, which iteratively searches for and outputs an optimal path according to this distribution to obtain the optimal task scheduling scheme, so as to better improve the performance of the heterogeneous platform.
Description
Technical Field
The invention belongs to the technical field of heterogeneous multiprocessor computing, and particularly relates to a heterogeneous platform task scheduling method and system based on Q learning.
Background
With the continuous growth of the high-performance computing requirements of various signal processing tasks and the rapid development of hardware accelerators, general-purpose processors can no longer meet the demands of strong real-time, large-scale computing, and heterogeneous computing systems are increasingly used to solve complex task processing problems. A heterogeneous system architecture comprises a series of processors with very different structures, such as CPUs, GPUs, FPGAs and DSPs, connected through dedicated networks or interfaces to meet the hardware performance requirements of different types of computing tasks and thereby improve resource utilization and computing efficiency. To meet the demands of increasingly complex signal processing tasks, the efficiency and reliability of heterogeneous multiprocessors are critical. Whether a heterogeneous computing system can realize its high-performance advantage depends on the following aspects: the hardware resource platform architecture, the degree of matching between tasks and processors, and the task scheduling strategy. Scheduling is essentially a multi-objective, NP-hard problem, and the dynamic, heterogeneous nature of heterogeneous computing systems adds further difficulty to task planning. For a given heterogeneous system, however, an efficient scheduling strategy is critical to improving the strong real-time and high-throughput performance of the platform.
Disclosure of Invention
Aiming at the problems of low flexibility, slow convergence and poor predictability of conventional scheduling algorithms, the invention provides a heterogeneous platform task scheduling method and system based on Q learning, which can adjust the search direction in time while balancing local and global search to obtain a better result, so that each processor of a heterogeneous platform can exert maximum efficiency, facilitating parallel task processing and improving the performance of the heterogeneous platform.
According to the design scheme provided by the invention, the heterogeneous platform task scheduling method based on Q learning comprises the following steps:
taking all tasks as a state space of Q learning, taking a processor set as an action space, taking the tasks to be allocated as current states, and acquiring a task initial mapping scheme according to execution time required by mapping the tasks to the action space in Q learning;
creating a genetic algorithm model, performing fitness evaluation on the task initial mapping scheme, copying individuals in the genetic algorithm model into the next generation population according to fitness, performing crossover and mutation on the retained individuals, and determining the new population optimization efficiency and a minimum threshold; obtaining an approximate optimal solution of the task-to-processor mapping in the model according to the genetic algorithm model;
and converting the model's approximate optimal solution into the initial pheromone distribution of the ant colony algorithm, and obtaining the optimal task scheduling scheme by iteratively searching for and outputting an optimal path through the ant colony algorithm according to the pheromone distribution.
As the heterogeneous platform task scheduling method based on Q learning, further, the system application model is expressed as G = {V, E, C, L} and the target system model as P = {N, H, W, T}, where V is the task set, E is the set of directed edges with dependency relationships, C is the set of task computation amounts, L is the inter-subtask communication traffic, N is the processor set, H is the processor characteristics, W is the calculation cost, and T is the communication cost of tasks among the processors; the state space and action space of Q learning are obtained from the task set and the processor set of the two models.
As the heterogeneous platform task scheduling method based on Q learning, the Q learning agent further executes an action in the current state according to the ε-greedy behavior strategy, obtains the Q value of the task mapped to the processor and an immediate reward, and transfers to a new state; taking the minimum Q value of each action execution as the target, the action with the minimum Q value is selected and executed in the new state and stored, the learning rate is attenuated, the next state is assigned to the current state, and an action in the next state is selected according to the ε-greedy behavior strategy and executed iteratively, so as to obtain the task initial mapping scheme from the stored Q values.
As the heterogeneous platform task scheduling method based on Q learning, the invention further creates a genetic algorithm model, encodes the task initial mapping scheme, and maps each task to a corresponding processor; fitness evaluation is carried out according to a fitness evaluation function, individuals meeting the fitness value are copied directly into the next generation population, and crossover and mutation are performed on the originally retained individuals; the iteration process of the genetic algorithm compares the population optimization efficiency of each offspring generation with a minimum threshold to determine the current population optimization efficiency, and the iteration terminates when the offspring population optimization efficiency of successive set generations is smaller than the minimum threshold, yielding an approximate optimal solution set of the genetic algorithm model for the task-to-processor mapping.
As the heterogeneous platform task scheduling method based on Q learning, the fitness evaluation function is further expressed as f_t = 1/Σ Q(s, a), t = 1, 2, …, i.e. the inverse of the individual's total execution time, where Q(s, a) represents the execution time of task s mapped on processor a and t represents the iteration generation.
As the heterogeneous platform task scheduling method based on Q learning, further, in each iteration a minimum threshold is set according to the optimization efficiency of the new population, and if the population optimization efficiency of an offspring generation is smaller than the minimum threshold, the minimum threshold is replaced by the current population optimization efficiency.
As the heterogeneous platform task scheduling method based on Q learning, further, in the ant colony algorithm, determining the amount of the ant released pheromone according to the execution time of the task in the processor; recording the current walked path of the ants in the tabu list according to the selected processor; and obtaining the optimal allocation of the task to the processor and the shortest task execution time through iterative output of the ant optimal path.
Further, based on the above method, the invention also provides a heterogeneous platform task scheduling system based on Q learning, comprising: an initial mapping module, a fitness evaluation module and an optimal output module, wherein,
the initial mapping module is used for taking all tasks as a state space of Q learning, a processor set as an action space, and the tasks waiting to be allocated as current states, and acquiring a task initial mapping scheme according to the execution time required by mapping the tasks to the action space in the Q learning;
the fitness evaluation module is used for creating a genetic algorithm model, performing fitness evaluation on the task initial mapping scheme, copying individuals in the genetic algorithm model into the next generation population according to fitness, performing crossover and mutation on the retained individuals, and determining the new population optimization efficiency and a minimum threshold; and obtaining an approximate optimal solution of the task-to-processor mapping in the model according to the genetic algorithm model;
and the optimal output module is used for converting the model's approximate optimal solution into the initial pheromone distribution of the ant colony algorithm, and obtaining the optimal task scheduling scheme by iteratively searching for and outputting an optimal path through the ant colony algorithm according to the pheromone distribution.
Further, the present invention also provides a computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, performs the method described above.
Further, the present invention also provides a computer device comprising a processor and a memory storing machine executable instructions executable by the processor, the processor executing the machine executable instructions to perform the method described above.
The invention has the beneficial effects that:
according to the task set and the processor set, the mapping of the task set and the processor set is used for generating an initial solution set serving as initial information of a genetic algorithm GA through Q learning, and the performance of the whole scheme is improved through accelerating the searching rate of the GA; the heuristic mutation operation in the GA algorithm is utilized to improve the local searching capability of the ant colony algorithm, maintain the diversity of the population, accelerate the convergence speed to the optimal solution, maintain a better speed in the whole searching process, shorten the scheduling length, enable each processor of the heterogeneous platform to exert the maximum efficiency, improve the utilization rate of hardware resources, facilitate the parallel processing of tasks, improve the performance of the heterogeneous platform and have better application prospect.
Description of the drawings:
FIG. 1 is a flow chart of a heterogeneous platform task scheduling method in an embodiment;
FIG. 2 is a schematic diagram of a comparison of scheduling length and average resource utilization of an existing algorithm in an embodiment;
FIG. 3 is a flowchart of a heterogeneous platform task scheduling QGA-ACO algorithm in an embodiment;
FIG. 4 is a task and target system model illustration in an embodiment;
FIG. 5 is a schematic illustration of a crossover operation in an embodiment;
FIG. 6 is a schematic diagram of the mutation operation in the example;
fig. 7 is a graph illustrating algorithm rate versus iteration number in an embodiment.
The specific embodiment is as follows:
the present invention will be described in further detail with reference to the drawings and the technical scheme, in order to make the objects, technical schemes and advantages of the present invention more apparent.
Aiming at the problems of low flexibility, slow convergence and poor predictability of task scheduling algorithms for existing heterogeneous multiprocessor computing platforms, an embodiment of the invention provides a heterogeneous platform task scheduling method based on Q learning, as shown in fig. 1, which comprises the following steps:
s101, taking all tasks as a state space of Q learning, taking a processor set as an action space, taking tasks to be allocated as current states, and acquiring a task initial mapping scheme according to execution time required by mapping the tasks to the action space in the Q learning;
S102, creating a genetic algorithm model, performing fitness evaluation on the task initial mapping scheme, copying individuals in the genetic algorithm model into the next generation population according to fitness, performing crossover and mutation on the retained individuals, and determining the new population optimization efficiency and a minimum threshold; obtaining an approximate optimal solution of the task-to-processor mapping in the model according to the genetic algorithm model;
S103, converting the model's approximate optimal solution into the initial pheromone distribution of the ant colony algorithm, and obtaining the optimal task scheduling scheme by iteratively searching for and outputting an optimal path through the ant colony algorithm according to the pheromone distribution.
Heterogeneous multiprocessor computing systems are dynamic and heterogeneous: resources and tasks may join and leave at random, and processor architectures and performance differ from one another. The high performance exhibited by heterogeneous multiprocessor computing systems makes them one of the development directions of high-performance computing, but the resource management and task scheduling problems they face are also more complex. To exploit the high computing efficiency of a heterogeneous system, a task scheduling strategy that meets the requirements is critical. Traditional scheduling algorithms have high algorithmic or time complexity when solving complex problems, so heuristic methods such as genetic algorithms and ant colony algorithms are widely applied. However, the initial information of these algorithms is mostly constructed randomly, and the quality of the initial search information is difficult to guarantee, which affects convergence performance to a certain extent. In this embodiment, Q-Learning acquires feedback information through interaction with the environment, so the search direction can be adjusted in time while local and global search are balanced to obtain a better result. This result is used as the initial information of the GA to compensate for the low individual quality of the GA's random initial population and its inability to use network feedback information; combining the ACO in the later stage shortens the iteration time, makes the search more efficient, and allows application tasks to be optimally distributed to processors.
A single task scheduling algorithm cannot fully meet the scheduling requirements of the platform because of its own shortcomings; combining different algorithms complements their advantages and disadvantages, meeting the hardware performance requirements of different types of computing tasks and thereby improving resource utilization and computing efficiency.
MLSH, ACO and GAACO are representative of traditional scheduling algorithms, heuristic methods and combined optimization algorithms respectively; fig. 2 (a) and (b) compare the three algorithms in terms of scheduling length and average resource utilization. For small-scale tasks, the scheduling lengths of the three algorithms are close, but as the number of tasks increases, the scheduling lengths of ACO and GAACO remain almost equal to each other and lower than that of MLSH, because the complexity of the traditional scheduling algorithm is higher than that of the heuristic algorithms and grows exponentially with the number of tasks. In terms of resource utilization, the average resource utilization of GAACO is significantly higher than that of ACO and MLSH, because GAACO combines the universality, scalability, global convergence and parallelism of the ant colony algorithm with the high solution precision of the genetic algorithm, complementing their strengths and weaknesses and improving execution efficiency and solution precision. GAACO therefore shows the best scheduling performance in both respects, but its scheduling length is not significantly improved.
Referring to FIG. 3, the QGA-ACO algorithm of this embodiment can be designed as follows. In the first stage, the initial allocation of tasks to resources is realized through Q learning, yielding a good scheme and Q values; on this basis a GA model is established, parameters and constraint conditions are determined, and crossover, mutation and replication of the initial solutions ensure population diversity and improve convergence speed to further optimize the result. In the second stage, the ACO algorithm is adopted: if the evolution rate of four successive generations in the first stage is smaller than the threshold, the second stage is entered, the first-stage result is converted into the initial pheromone values of the ACO to avoid blind searching by the ant colony, and iterative searching proceeds according to the pheromone distribution until the termination condition is met.
As the heterogeneous platform task scheduling method based on Q learning in the embodiment of the invention, further, the system application model is expressed as G = {V, E, C, L} and the target system model as P = {N, H, W, T}, where V is the task set, E is the set of directed edges with dependency relationships, C is the set of task computation amounts, L is the inter-subtask communication traffic, N is the processor set, H is the processor characteristics, W is the calculation cost, and T is the communication cost of tasks among the processors; the state space and action space of Q learning are obtained from the task set and the processor set of the two models.
Referring to fig. 4, the system application model is expressed as G = {V, E, C, L}, where V = {v_1, v_2, ..., v_n} represents the set of signal processing tasks to be processed, E = {e_12, e_23, ..., e_ij} is the set of dependency directed edges, C = {c_1, c_2, ..., c_n} represents the set of task computing overheads, and L(v_i, v_j) represents the communication overhead between subtasks; if tasks v_i and v_j are mapped onto the same node, the communication overhead is 0. The hardware resource can be represented by an undirected graph, abstracted as P = {N, H, W, T}, where N = (n_1, n_2, ...) is the processor set, H is the processor characteristics, W is the execution rate of the processors, and T is the inter-processor communication rate.
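The two models above can be captured directly in code. The following is a minimal, illustrative sketch; all task names, computation amounts, and rates are invented for the example (H, the processor characteristics, is omitted for brevity):

```python
from dataclasses import dataclass

@dataclass
class AppModel:
    """Application model G = {V, E, C, L}."""
    tasks: list          # V: signal-processing tasks
    edges: dict          # E with L: (v_i, v_j) -> communication volume
    compute: dict        # C: task -> computation amount

    def comm(self, vi, vj, same_node):
        # Communication overhead is 0 when both tasks map onto the same node.
        return 0 if same_node else self.edges.get((vi, vj), 0)

@dataclass
class Platform:
    """Target system model P = {N, H, W, T} (H omitted)."""
    procs: list          # N: processor set
    exec_rate: dict      # W: processor -> execution rate
    link_rate: float     # T: inter-processor communication rate

def exec_time(g, p, task, proc):
    # Execution time of a task on a processor: computation amount / execution rate.
    return g.compute[task] / p.exec_rate[proc]

g = AppModel(tasks=["v1", "v2"],
             edges={("v1", "v2"): 8.0},
             compute={"v1": 4.0, "v2": 6.0})
p = Platform(procs=["n1", "n2"],
             exec_rate={"n1": 2.0, "n2": 1.0},
             link_rate=4.0)
```

These structures are what stage one initializes its Q table from: Q(s, a) starts as the execution time of task s on processor a.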
As the heterogeneous platform task scheduling method based on Q learning in the embodiment of the present invention, further, the Q learning agent performs an action in the current state according to the ε-greedy behavior policy, obtains the Q value of the task mapped to the processor and an immediate reward, and transitions to a new state; taking the minimum Q value of each action execution as the target, the action with the minimum Q value is selected and executed in the new state and stored, the learning rate is attenuated, the next state is assigned to the current state, and an action in the next state is selected according to the ε-greedy behavior strategy and executed iteratively, so as to obtain the task initial mapping scheme from the stored Q values.
The contents of algorithm stage one shown in fig. 3 can be designed as follows:
the task set V may be configured as a state space S and the processor set N as an action space a. Task v to be assigned i As the current state s, the current state performs action n i A is the number.
Step 1, initialize the state-action space of Q-learning, where Q(S, A) = C is the execution time required for mapping task S to A; the search factor is ε, the discount factor is γ, and the learning rate is α.
Step 2, according to the ε-greedy behavior strategy, the agent executes action a with a certain probability in the current state s, the task is mapped to the processor with Q value Q(s, a), an immediate reward r is obtained (the larger the Q value, the longer the task's execution time and the smaller the immediate reward), and the task transfers to a new state s'. The ε-greedy strategy balances the Agent's exploration of the state space against exploitation of the information already obtained, preventing greedy exploitation from becoming trapped in a local optimum and preventing excessive exploration from degrading the algorithm's performance;
Step 3, taking the minimum Q value of each action execution as the target, select the action a' with the minimum Q value in state s' according to formula (1), execute it, calculate the Q value of the Agent at (s, a), and store it in the Q table;
Q(s_t, a_t) ← Q(s_t, a_t) + α[r_{t+1} + γ min_{a'} Q(s_{t+1}, a') − Q(s_t, a_t)]   (1)
Step 4, attenuate the learning rate, assign the next state s' to the current state s, select an action a' in the next state s' according to the ε-greedy behavior strategy, and assign a' to the execution action a;
Step 5, increment the step count and judge whether the iteration termination condition is met; if not, return to Step 2 and continue; if the iteration is complete, end the algorithm and obtain the task initial mapping scheme from the Q table.
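Steps 1-5 above can be sketched as a small runnable example: states are tasks, actions are processors, Q(S, A) is initialized to the execution times, and the agent minimizes Q. The reward here is taken directly as the execution-time cost, which is the equivalent minimization form of the patent's "longer execution time, smaller reward"; all task and timing values are invented for illustration:

```python
import random

random.seed(0)
tasks = ["t0", "t1", "t2"]                    # state space S
procs = ["p0", "p1"]                          # action space A
exec_time = {("t0", "p0"): 3.0, ("t0", "p1"): 1.0,
             ("t1", "p0"): 2.0, ("t1", "p1"): 4.0,
             ("t2", "p0"): 1.5, ("t2", "p1"): 1.5}

Q = dict(exec_time)                           # Step 1: Q(S, A) = C
eps, gamma, alpha = 0.2, 0.9, 0.5

for episode in range(200):
    for i, s in enumerate(tasks):
        # Step 2: epsilon-greedy - explore with probability eps, else exploit
        if random.random() < eps:
            a = random.choice(procs)
        else:
            a = min(procs, key=lambda p: Q[(s, p)])
        r = exec_time[(s, a)]                 # cost (smaller is better)
        if i + 1 < len(tasks):                # transition to the next state s'
            s2 = tasks[i + 1]
            best_next = min(Q[(s2, p)] for p in procs)
        else:
            best_next = 0.0
        # Step 3: update toward the minimum cumulative cost (equation (1) with min)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    alpha *= 0.995                            # Step 4: decay the learning rate

# Step 5: initial mapping - the minimum-Q processor for each task
mapping = {s: min(procs, key=lambda p: Q[(s, p)]) for s in tasks}
```

After training, `mapping` assigns each task to the processor on which it runs fastest in this toy instance (t2 is a tie, so either choice is valid).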
As the heterogeneous platform task scheduling method based on Q learning in the embodiment of the present invention, further, in creating the genetic algorithm model, the task initial mapping scheme is encoded and each task is mapped to a corresponding processor; fitness evaluation is carried out according to the fitness evaluation function, individuals meeting the fitness value are copied directly into the next generation population, and crossover and mutation are performed on the originally retained individuals; the iteration process of the genetic algorithm compares the population optimization efficiency of each offspring generation with a minimum threshold to determine the current population optimization efficiency, and the iteration terminates when the offspring population optimization efficiency of successive set generations is smaller than the minimum threshold, yielding an approximate optimal solution set of the genetic algorithm model for the task-to-processor mapping. Further, in each iteration a minimum threshold is set according to the optimization efficiency of the new population, and if the population optimization efficiency of an offspring generation is smaller than the minimum threshold, the minimum threshold is replaced by the current population optimization efficiency.
The initial population (the mapping schemes) is encoded (coding rule: chromosome[p] = q, i.e. task p is mapped onto processor q) and evaluated for fitness; individuals with high fitness values are replicated directly into the next generation population, and the originally retained individuals undergo crossover and mutation. The fitness evaluation function may be expressed as f_t = 1/Σ Q(s, a), t = 1, 2, …, where Q(s, a) represents the execution time of task s mapped on processor a and t represents the iteration generation. Crossover operator: randomly select one crossover point on the chromosome, dividing each parent chromosome into two parts; as shown in fig. 5, the gene segment to the left of the crossover point comes from one of the corresponding parent chromosomes and the right segment is copied from the other parent, yielding two new chromosomes. Heuristic mutation operation, as shown in fig. 6: randomly select gene locus i of one chromosome, search from that locus for the last successor succ(i) of i, randomly select a gene locus j ∈ (i, succ(i)), and swap the loci of i and j to form a new chromosome. The optimization efficiency of the new population obtained by the genetic algorithm is collected and set as the minimum threshold.
The iteration count is increased by 1 and the genetic algorithm steps are executed again from the fitness evaluation step. The optimization efficiency of each offspring population is compared with the minimum threshold; if it is smaller than the threshold, the threshold is replaced by the current population optimization efficiency, and the genetic algorithm (GA) terminates when the population optimization efficiency of 4 successive generations of offspring stays below the threshold.
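The adaptive-threshold termination rule can be sketched as follows (a minimal illustration under assumptions: `fitness` returns an execution time to be minimized, `evolve_one_generation` is a hypothetical helper performing selection, crossover and mutation, and a generation counts as stagnant when its improvement does not exceed the threshold):

```python
def run_ga(population, fitness, evolve_one_generation, patience=4, max_iters=200):
    """Iterate a GA; stop once the per-generation improvement
    ('population optimization efficiency') has stayed at or below an
    adaptive minimum threshold for `patience` successive generations."""
    best = min(fitness(ind) for ind in population)  # lower execution time is better
    threshold = None   # adaptive minimum threshold
    stale = 0          # count of successive low-efficiency generations
    for _ in range(max_iters):
        population = evolve_one_generation(population)
        new_best = min(fitness(ind) for ind in population)
        efficiency = best - new_best        # improvement over the previous best
        best = min(best, new_best)
        if threshold is None:
            threshold = efficiency          # first generation seeds the threshold
        elif efficiency <= threshold:
            stale += 1                      # another low-efficiency generation
            threshold = min(threshold, efficiency)  # replace with current efficiency
        else:
            stale = 0                       # real progress resets the counter
        if stale >= patience:               # e.g. 4 successive stagnant generations
            break
    return population
```

With a toy population of integers whose fitness is the value itself and an "evolution" that subtracts 1 each generation, the improvement is a constant 1 per generation, so the counter reaches 4 after five generations and the loop stops early.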
As the heterogeneous platform task scheduling method based on Q learning in the embodiment of the invention, further, in the ant colony algorithm, the amount of pheromone released by an ant is determined according to the execution time of the task on the processor; the path the ant has currently walked (i.e. the selected processors) is recorded in the tabu list; and the optimal allocation of tasks to processors and the shortest task execution time are obtained by iteratively outputting the optimal ant path.
The stage-two implementation of the algorithm, shown in FIG. 3, can be designed as follows:
The approximate optimal solution set {chromosome[i] = j} obtained in stage one is converted into the initial pheromone distribution of the ant colony.
Step 1: initialize the ant colony parameters: the number of ants m, the weights α and β, the pheromone volatilization factor ρ, the pheromone strength τ_ij, the factor μ, and the maximum number of iterations T. Assume that τ is the pheromone matrix, η is the heuristic function, k denotes the k-th ant, and l is a processor that ant k may select in its next step.
Step 2: place m ants on the n nodes (m > n). The shorter the execution time of task i on processor j, the more pheromone the ants release. The path an ant has walked at the current moment (i.e. the selected processors) is recorded in the tabu list; the next node is selected according to probability selection formula (2) and stored in the tabu list, thereby recording the ant's walking path.
Step 3: after the m ants have traversed the n nodes, all ants complete one round of search; the pheromone is updated according to formula (3) until the iteration terminates, and the optimal path is output.
Thus, the optimal allocation of tasks to the processor and the shortest task execution time can be obtained.
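Since probability selection formula (2) and pheromone update formula (3) are not reproduced in the text, the sketch below uses the standard ant colony transition rule p ∝ τ^α · η^β with heuristic η = 1/execution-time and best-path reinforcement; function and parameter names are illustrative, and the schedule length is simplified to the total execution time:

```python
import random

def aco_assign(exec_time, n_ants=20, n_iters=50, alpha=1.0, beta=2.0, rho=0.1, seed=0):
    """Assign each task to a processor with a basic ant colony search.
    exec_time[i][j] = execution time of task i on processor j."""
    rng = random.Random(seed)
    n_tasks, n_procs = len(exec_time), len(exec_time[0])
    tau = [[1.0] * n_procs for _ in range(n_tasks)]        # pheromone matrix
    best_assign, best_len = None, float("inf")
    for _ in range(n_iters):
        for _ in range(n_ants):
            assign = []
            for i in range(n_tasks):
                # transition rule: p ∝ τ^α · η^β, with η = 1 / execution time,
                # so shorter execution time attracts more ants
                w = [tau[i][j] ** alpha * (1.0 / exec_time[i][j]) ** beta
                     for j in range(n_procs)]
                assign.append(rng.choices(range(n_procs), weights=w)[0])
            length = sum(exec_time[i][assign[i]] for i in range(n_tasks))
            if length < best_len:
                best_assign, best_len = assign, length
        # evaporation plus reinforcement along the best path found so far
        for i in range(n_tasks):
            for j in range(n_procs):
                tau[i][j] *= (1.0 - rho)
            tau[i][best_assign[i]] += 1.0 / best_len
    return best_assign, best_len
```

A real heterogeneous-platform schedule length would also account for dependency edges and inter-processor communication cost; the total-execution-time objective here only illustrates the search mechanics.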
The GA algorithm has a high convergence speed in the early search stage, but because the early search is not adjusted in time according to feedback information, a large amount of redundant information is generated in the later stage and occupies resources, so the rate drops markedly after a certain number of iterations. The ACO algorithm, by contrast, has little and sparsely distributed pheromone in the early stage and can only search randomly, which makes it slow at first; after the pheromone accumulates to a certain level and is widely distributed, heuristic search accelerates its convergence. According to the rate-versus-iteration analysis shown in FIG. 7, the QGA-ACO algorithm provided in the embodiment of the present invention first uses the Q-learning algorithm to raise the initial search rate of the GA algorithm, optimizes through crossover, mutation and similar operations, then runs the ACO algorithm after the GA iterations terminate, using the parallel search capability and positive feedback of the ant colony algorithm to find the optimal solution of the problem, so that the overall search maintains a good rate and the scheduling length is shortened.
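The Q-learning stage that seeds the GA can be sketched as follows (a simplified illustration: states are tasks, actions are processors, the immediate reward is the execution time, the minimum-Q action is preferred, and the learning rate is attenuated each episode; the one-step update omits the next-state bootstrap term for brevity, and all names are hypothetical):

```python
import random

def q_learning_initial_mapping(exec_time, episodes=200, eps=0.2,
                               lr=0.5, decay=0.99, seed=0):
    """Learn an initial task-to-processor mapping: states are tasks,
    actions are processors, and the reward is the task's execution time
    on the chosen processor (smaller Q value = better mapping)."""
    rng = random.Random(seed)
    n_tasks, n_procs = len(exec_time), len(exec_time[0])
    Q = [[0.0] * n_procs for _ in range(n_tasks)]
    for _ in range(episodes):
        for s in range(n_tasks):                 # visit tasks in order
            if rng.random() < eps:               # ε-greedy: explore
                a = rng.randrange(n_procs)
            else:                                # exploit: minimum-Q action
                a = min(range(n_procs), key=lambda j: Q[s][j])
            reward = exec_time[s][a]             # immediate reward: execution time
            Q[s][a] += lr * (reward - Q[s][a])   # simplified one-step update
        lr *= decay                              # attenuate the learning rate
    # initial mapping: for each task, the processor with the minimum Q value
    return [min(range(n_procs), key=lambda j: Q[s][j]) for s in range(n_tasks)]
```

The returned list (task index → processor index) corresponds to the chromosome[p] = q encoding used as the GA's initial population.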
Further, based on the above method, the embodiment of the present invention further provides a heterogeneous platform task scheduling system based on Q learning, comprising: an initial mapping module, a fitness evaluation module and an optimal output module, wherein,
the initial mapping module is used for taking all tasks as a state space of Q learning, a processor set as an action space, and the tasks waiting to be allocated as current states, and acquiring a task initial mapping scheme according to the execution time required by mapping the tasks to the action space in the Q learning;
the fitness evaluation module is used for creating a genetic algorithm model, performing fitness evaluation on the task initial mapping scheme, copying individuals in the genetic algorithm model to the next generation population according to their fitness, performing crossover and mutation on the retained individuals, and determining the new population optimization efficiency and the minimum threshold; and obtaining an approximate optimal solution of the task-to-processor mapping in the model according to the genetic algorithm model;
and the optimal output module is used for converting the model's approximate optimal solution into the initial pheromone distribution of the ant colony, and obtaining the optimal task scheduling scheme by iteratively searching for and outputting the optimal path through the ant colony algorithm according to this distribution.
The relative arrangement of the components and steps, the numerical expressions and the numerical values set forth in these embodiments do not limit the scope of the present invention unless specifically stated otherwise.
Based on the above method or system, the embodiment of the present invention further provides a computer device, including: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the system or perform the method described above.
Based on the above system, the embodiment of the present invention further provides a computer readable medium, on which a computer program is stored, wherein the program, when executed by a processor, performs the above method.
The device provided by the embodiment of the present invention has the same implementation principle and technical effects as those of the embodiment of the system, and for the sake of brevity, reference may be made to the corresponding content of the embodiment of the system.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described system and apparatus may refer to corresponding procedures in the foregoing system embodiments, which are not described herein again.
Any particular values in all examples shown and described herein are to be construed as merely illustrative and not a limitation, and thus other examples of exemplary embodiments may have different values.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer readable storage medium executable by a processor. Based on this understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that the above examples are only specific embodiments of the present invention, intended to illustrate rather than limit its technical solutions, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that any person familiar with the art may, within the technical scope of the present disclosure, still modify the technical solutions described in the foregoing embodiments or make equivalent substitutions of some of the technical features; such modifications, changes or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention and are intended to be covered by the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (6)
1. A heterogeneous platform task scheduling method based on Q learning is characterized by comprising the following steps:
taking all tasks as a state space of Q learning, taking a processor set as an action space, taking the tasks to be allocated as current states, and acquiring a task initial mapping scheme according to execution time required by mapping the tasks to the action space in Q learning;
creating a genetic algorithm model, performing fitness evaluation on the task initial mapping scheme, copying individuals in the genetic algorithm model to the next generation population according to their fitness, performing crossover and mutation on the retained individuals, and determining the new population optimization efficiency and the minimum threshold; obtaining an approximate optimal solution of the task-to-processor mapping in the model according to the genetic algorithm model;
converting the model's approximate optimal solution into the initial pheromone distribution of the ant colony, and obtaining the optimal task scheduling scheme by iteratively searching for and outputting the optimal path through the ant colony algorithm according to this distribution;
the system application model is expressed as G = {V, E, C, L}, and the target system model is expressed as P = {N, H, W, T}, wherein V is the task set, E is the set of directed edges with dependency relationships, C is the set of task calculation amounts, L is the inter-subtask communication amount, N is the processor set, H is the processor features, W is the calculation cost, and T is the communication cost of tasks among processors; the state space and action space of Q learning are obtained according to the task set and the processor set in both the system application model and the target system model, and an ε-greedy strategy is adopted to balance the agent's exploration of the state space against its exploitation of the obtained information;
the task set V is formed into a state space S, the processor set N is used as an action space A, and the tasks V waiting to be allocated are formed i As the current state s, the current state performs action n i A is a;
the Q learning agent executes action in the current state according to the epsilon-greedy behavior strategy, obtains the Q value of the task mapped to the processor, obtains immediate rewards, and transfers to a new state; selecting an action with the minimum Q value from which each action is executed as a target, executing the action with the minimum Q value in a new state, storing, attenuating the learning rate, giving the next state to the current state, and selecting the action in the next state according to the epsilon-greedy behavior strategy for iterative execution so as to obtain a task initial mapping scheme according to the storage condition;
creating a genetic algorithm model, encoding a task initial mapping scheme, and mapping the task to a corresponding processor; carrying out fitness evaluation according to a fitness evaluation function, copying individuals meeting fitness values to directly enter a next generation population, and carrying out cross mutation on the original reserved individuals; comparing the population optimization efficiency of each offspring with a minimum threshold value by utilizing an iteration process of the genetic algorithm to determine the current population optimization efficiency until the offspring population optimization efficiency of the successive set generations is smaller than the minimum threshold value, terminating the iteration, and obtaining an approximate optimal solution set of the genetic algorithm model with respect to the task and processor mapping;
the fitness evaluation function is expressed in terms of Q(s, a) and the iteration index t, where Q(s, a) represents the execution time of task s mapped on processor a and t represents the iteration number.
2. The heterogeneous platform task scheduling method based on Q learning according to claim 1, wherein in each iteration, a minimum threshold is set according to the optimization efficiency of the new population, and if the population optimization efficiency of each offspring is smaller than the minimum threshold, the magnitude of the minimum threshold is replaced by the current population optimization efficiency.
3. The heterogeneous platform task scheduling method based on Q learning according to claim 1, wherein in the ant colony algorithm, the amount of the ant released pheromone is determined according to the execution time of the task in the processor; recording the current walked path of the ants in the tabu list according to the selected processor; and obtaining the optimal allocation of the task to the processor and the shortest task execution time through iterative output of the ant optimal path.
4. A heterogeneous platform task scheduling system based on Q learning, implemented based on the method of claim 1, comprising: an initial mapping module, a fitness evaluation module and an optimal output module, wherein,
the initial mapping module is used for taking all tasks as a state space of Q learning, a processor set as an action space, and the tasks waiting to be allocated as current states, and acquiring a task initial mapping scheme according to the execution time required by mapping the tasks to the action space in the Q learning;
the fitness evaluation module is used for creating a genetic algorithm model, performing fitness evaluation on the task initial mapping scheme, setting individuals in the genetic algorithm model, copying the individuals to the next generation population according to the fitness, performing cross mutation on reserved individuals, and determining new population optimization efficiency and a minimum threshold; obtaining an approximate optimal solution of the task-to-processor mapping in the model according to the genetic algorithm model;
and the optimal output module is used for converting the model's approximate optimal solution into the initial pheromone distribution of the ant colony, and obtaining the optimal task scheduling scheme by iteratively searching for and outputting the optimal path through the ant colony algorithm according to this distribution.
5. A computer readable storage medium having stored thereon a computer program, wherein the program when executed by a processor performs the method of any of claims 1-3.
6. A computer device comprising a processor and a memory storing machine executable instructions executable by the processor to perform the method of any one of claims 1 to 3.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011284585.6A CN112256422B (en) | 2020-11-17 | 2020-11-17 | Heterogeneous platform task scheduling method and system based on Q learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112256422A CN112256422A (en) | 2021-01-22 |
CN112256422B true CN112256422B (en) | 2023-08-04 |
Family
ID=74265947
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011284585.6A Expired - Fee Related CN112256422B (en) | 2020-11-17 | 2020-11-17 | Heterogeneous platform task scheduling method and system based on Q learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112256422B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114996198B (en) * | 2022-08-03 | 2022-10-21 | 中国空气动力研究与发展中心计算空气动力研究所 | Cross-processor data transmission method, device, equipment and medium |
CN116501503B (en) * | 2023-06-27 | 2023-09-15 | 上海燧原科技有限公司 | Architecture mapping method and device for load task, computer equipment and medium |
CN118779117A (en) * | 2024-09-10 | 2024-10-15 | 山东省计算中心(国家超级计算济南中心) | Large model wide-area heterogeneous distributed training method and system based on dual optimization |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107563555A (en) * | 2017-09-04 | 2018-01-09 | 南京信息工程大学 | Dynamic multi-objective Scheduling method based on Q study memetic algorithms |
CN108776483A (en) * | 2018-08-16 | 2018-11-09 | 圆通速递有限公司 | AGV paths planning methods and system based on ant group algorithm and multiple agent Q study |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7756845B2 (en) * | 2006-12-28 | 2010-07-13 | Yahoo! Inc. | System and method for learning a weighted index to categorize objects |
US8489526B2 (en) * | 2010-11-24 | 2013-07-16 | International Business Machines Corporation | Controlling quarantining and biasing in cataclysms for optimization simulations |
JP2016518589A (en) * | 2013-03-15 | 2016-06-23 | セラ プログノスティックス, インコーポレイテッド | Biomarkers and methods for predicting preterm birth |
CN103345657B (en) * | 2013-04-02 | 2016-05-25 | 江苏大学 | Method for scheduling task based on heredity and ant group under cloud computing environment |
CN104811491A (en) * | 2015-04-17 | 2015-07-29 | 华南理工大学 | Cloud computing resource scheduling method based on genetic algorithm |
CN107104899B (en) * | 2017-06-09 | 2021-04-20 | 中山大学 | Ant colony algorithm-based routing method applied to vehicle-mounted self-organizing network |
US11461145B2 (en) * | 2019-01-28 | 2022-10-04 | EMC IP Holding Company LLC | Building neural networks for resource allocation for iterative workloads using reinforcement learning |
US20200320035A1 (en) * | 2019-04-02 | 2020-10-08 | Micro Focus Software Inc. | Temporal difference learning, reinforcement learning approach to determine optimal number of threads to use for file copying |
CN110298589A (en) * | 2019-07-01 | 2019-10-01 | 河海大学常州校区 | Based on heredity-ant colony blending algorithm dynamic Service resource regulating method |
CN110515735A (en) * | 2019-08-29 | 2019-11-29 | 哈尔滨理工大学 | A kind of multiple target cloud resource dispatching method based on improvement Q learning algorithm |
- 2020-11-17 CN CN202011284585.6A patent/CN112256422B/en not_active Expired - Fee Related
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107563555A (en) * | 2017-09-04 | 2018-01-09 | 南京信息工程大学 | Dynamic multi-objective Scheduling method based on Q study memetic algorithms |
CN108776483A (en) * | 2018-08-16 | 2018-11-09 | 圆通速递有限公司 | AGV paths planning methods and system based on ant group algorithm and multiple agent Q study |
Non-Patent Citations (1)
Title |
---|
异构信号处理平台通信中间件的设计与实现;王天一;《中国优秀博硕士学位论文全文数据库(硕士)信息科技辑》;全文 * |
Also Published As
Publication number | Publication date |
---|---|
CN112256422A (en) | 2021-01-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Zhu et al. | An efficient evolutionary grey wolf optimizer for multi-objective flexible job shop scheduling problem with hierarchical job precedence constraints | |
CN112256422B (en) | Heterogeneous platform task scheduling method and system based on Q learning | |
Kaur et al. | Deep‐Q learning‐based heterogeneous earliest finish time scheduling algorithm for scientific workflows in cloud | |
Almezeini et al. | Task scheduling in cloud computing using lion optimization algorithm | |
CN110109822B (en) | Regression testing method for carrying out test case priority ranking based on ant colony algorithm | |
CN114281104B (en) | Multi-unmanned aerial vehicle cooperative regulation and control method based on improved ant colony algorithm | |
CN105975342A (en) | Improved cuckoo search algorithm based cloud computing task scheduling method and system | |
CN110321217B (en) | Multi-target cloud resource scheduling method, device, equipment and storage medium | |
Yao et al. | Improved artificial bee colony algorithm for vehicle routing problem with time windows | |
Pooranian et al. | Hybrid metaheuristic algorithm for job scheduling on computational grids | |
CN112199172A (en) | Hybrid task scheduling method for heterogeneous multi-core processor | |
CN104506576B (en) | A kind of wireless sensor network and its node tasks moving method | |
CN109710372B (en) | Calculation intensive cloud workflow scheduling method based on owl search algorithm | |
CN116033026A (en) | Resource scheduling method | |
Entezari-Maleki et al. | A genetic algorithm to increase the throughput of the computational grids | |
CN108415773B (en) | Efficient software and hardware partitioning method based on fusion algorithm | |
CN112884368B (en) | Multi-target scheduling method and system for minimizing delivery time and delay of high-end equipment | |
Wang et al. | A coordinated two-stages virtual network embedding algorithm based on reinforcement learning | |
CN106874215B (en) | Serialized storage optimization method based on Spark operator | |
CN114980216A (en) | Dependent task unloading system and method based on mobile edge calculation | |
CN115756646A (en) | Industrial internet-based edge computing task unloading optimization method | |
Dai et al. | Cloud workflow scheduling algorithm based on multi-objective hybrid particle swarm optimisation | |
CN113191534A (en) | Logistics resource allocation method, device, equipment and storage medium | |
CN112882917A (en) | Virtual machine service quality dynamic prediction method based on Bayesian network migration | |
CN111813525A (en) | Heterogeneous system workflow scheduling method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20230804 |