CN112256422B - Heterogeneous platform task scheduling method and system based on Q learning - Google Patents

Heterogeneous platform task scheduling method and system based on Q learning

Info

Publication number
CN112256422B
Authority
CN
China
Prior art keywords: task, processor, learning, model, tasks
Prior art date
Legal status
Expired - Fee Related
Application number
CN202011284585.6A
Other languages
Chinese (zh)
Other versions
CN112256422A (en)
Inventor
高博
李娜
谢宗甫
岳春生
张锋印
董春宵
马金全
余果
郭璐
Current Assignee
Information Engineering University of PLA Strategic Support Force
Original Assignee
Information Engineering University of PLA Strategic Support Force
Priority date
Filing date
Publication date
Application filed by Information Engineering University of PLA Strategic Support Force
Priority to CN202011284585.6A
Publication of CN112256422A
Application granted
Publication of CN112256422B


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46: Multiprogramming arrangements
    • G06F 9/48: Program initiating; program switching, e.g. by interrupt
    • G06F 9/4806: Task transfer initiation or dispatching
    • G06F 9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005: Allocation of resources to service a request
    • G06F 9/5027: Allocation of resources to service a request, the resource being a machine, e.g. CPUs, servers, terminals
    • G06F 9/5038: Allocation of resources considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/004: Artificial life, i.e. computing arrangements simulating life
    • G06N 3/006: Artificial life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G06N 3/086: Learning methods using evolutionary algorithms, e.g. genetic algorithms or genetic programming
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention belongs to the technical field of heterogeneous multiprocessor computing and particularly relates to a Q-learning-based heterogeneous platform task scheduling method and system. All tasks form the state space of Q learning, the processor set forms the action space, and the task awaiting allocation is the current state; an initial task mapping scheme is obtained from the execution time required to map each task onto the action space. A genetic algorithm model is then created: the initial mapping scheme undergoes fitness evaluation, the individuals to be copied into the next-generation population are selected, the retained individuals undergo crossover and mutation, and the optimization efficiency of the new population and a minimum threshold are determined; from the genetic algorithm model, a near-optimal solution of the task-to-processor mapping is obtained. Finally, this near-optimal solution is converted into the initial pheromone distribution of an ant colony algorithm, which iteratively searches according to that distribution and outputs the optimal path, yielding an optimal task scheduling scheme and thereby improving the performance of the heterogeneous platform.

Description

Heterogeneous platform task scheduling method and system based on Q learning
Technical Field
The invention belongs to the technical field of heterogeneous multiprocessor computing, and particularly relates to a heterogeneous platform task scheduling method and system based on Q learning.
Background
With the continuously growing high-performance computing demands of signal processing tasks and the rapid development of hardware accelerators, general-purpose processors can no longer meet the requirements of hard real-time, large-scale computation, and heterogeneous computing systems are increasingly used to solve complex task processing problems. A heterogeneous system architecture comprises a series of processors with very different structures, such as CPUs, GPUs, FPGAs and DSPs, connected through dedicated networks or interfaces, so that different types of computing tasks can be matched to suitable hardware and resource utilization and computing efficiency are improved. To meet the demands of increasingly complex signal processing tasks, the efficiency and reliability of heterogeneous multiprocessors are critical. Whether a heterogeneous computing system can exploit its high performance depends on the hardware resource platform architecture, the degree of matching between tasks and processors, and the task scheduling strategy. Scheduling is essentially a multi-objective, NP-hard problem, and the dynamic, heterogeneous nature of heterogeneous computing systems makes task planning harder still. For a given heterogeneous system, however, an efficient scheduling strategy is critical to improving the platform's hard real-time and high-throughput performance.
Disclosure of Invention
To address the low flexibility, slow convergence and poor predictability of conventional scheduling algorithms, the invention provides a Q-learning-based heterogeneous platform task scheduling method and system that can adjust the search direction in time from network feedback while balancing local and global search to obtain better results, so that each processor of a heterogeneous platform delivers its maximum efficiency, parallel task processing is facilitated, and the performance of the heterogeneous platform is improved.
According to the design scheme provided by the invention, the heterogeneous platform task scheduling method based on Q learning comprises the following steps:
taking all tasks as the state space of Q learning, the processor set as the action space, and the task to be allocated as the current state, and obtaining an initial task mapping scheme from the execution time required to map each task onto the action space in Q learning;
creating a genetic algorithm model, performing fitness evaluation on the initial task mapping scheme, selecting the individuals in the genetic algorithm model to be copied into the next-generation population, performing crossover and mutation on the retained individuals, and determining the optimization efficiency of the new population and a minimum threshold; obtaining from the genetic algorithm model a near-optimal solution of the task-to-processor mapping in the model;
and converting the model's near-optimal solution into the initial pheromone distribution of the ant colony, and obtaining the optimal task scheduling scheme by iteratively searching according to that pheromone distribution with the ant colony algorithm and outputting the optimal path.
In the Q-learning-based heterogeneous platform task scheduling method, further, the system application model is expressed as G = {V, E, C, L} and the target system model as P = {N, H, W, T}, where V is the task set, E is the set of directed edges carrying dependency relationships, C is the set of task computation amounts, L is the inter-subtask communication volume, N is the processor set, H is the processor feature, W is the computation cost, and T is the communication cost of tasks between processors; the state space and action space of Q learning are derived from the task set of the application model and the processor set of the target system model.
In the Q-learning-based heterogeneous platform task scheduling method, further, the Q-learning agent executes an action in the current state according to the ε-greedy behavior strategy, obtains the Q value of mapping the task onto the processor together with an immediate reward, and transfers to a new state; targeting the minimum Q value for each executed action, it selects and executes the action with the minimum Q value in the new state, stores the result, decays the learning rate, assigns the next state to the current state, and iteratively selects actions in the next state according to the ε-greedy behavior strategy, so that the initial task mapping scheme is obtained from the stored Q table.
In the Q-learning-based heterogeneous platform task scheduling method, the invention further creates a genetic algorithm model, encodes the initial task mapping scheme, and maps each task to its corresponding processor; fitness is evaluated according to a fitness evaluation function, individuals meeting the fitness value are copied directly into the next-generation population, and the retained individuals undergo crossover and mutation; during the iteration of the genetic algorithm, the optimization efficiency of each offspring population is compared with a minimum threshold to determine the current population optimization efficiency, and the iteration terminates when the optimization efficiency of the offspring populations of a set number of successive generations falls below the minimum threshold, yielding the genetic algorithm model's near-optimal solution set for the task-to-processor mapping.
In the Q-learning-based heterogeneous platform task scheduling method, further, the fitness evaluation function is expressed in terms of Q(s, a) and the generation index t = 1, 2, …, where Q(s, a) represents the execution time of task s mapped on processor a and t represents the iteration generation.
In the Q-learning-based heterogeneous platform task scheduling method, further, in each iteration a minimum threshold is set according to the optimization efficiency of the new population; if the optimization efficiency of an offspring population is smaller than the minimum threshold, the threshold is replaced by the current population optimization efficiency.
In the Q-learning-based heterogeneous platform task scheduling method, further, in the ant colony algorithm the amount of pheromone released by an ant is determined by the execution time of the task on the processor; the path an ant has walked so far, i.e. the processors it has selected, is recorded in the tabu list; and the optimal task-to-processor allocation and the shortest task execution time are obtained by iteratively outputting the optimal ant path.
Further, based on the method, the invention also provides a Q-learning-based heterogeneous platform task scheduling system, comprising an initial mapping module, a fitness evaluation module and an optimal output module, wherein,
the initial mapping module is used for taking all tasks as the state space of Q learning, the processor set as the action space, and the task waiting to be allocated as the current state, and for obtaining the initial task mapping scheme from the execution time required to map each task onto the action space in Q learning;
the fitness evaluation module is used for creating a genetic algorithm model, performing fitness evaluation on the initial task mapping scheme, copying individuals into the next-generation population according to their fitness, performing crossover and mutation on the retained individuals, and determining the optimization efficiency of the new population and a minimum threshold, and for obtaining from the genetic algorithm model a near-optimal solution of the task-to-processor mapping in the model;
and the optimal output module is used for converting the model's near-optimal solution into the initial pheromone distribution of the ant colony and for obtaining the optimal task scheduling scheme by iteratively searching according to that distribution with the ant colony algorithm and outputting the optimal path.
Further, the present invention also provides a computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, performs the method described above.
Further, the present invention also provides a computer device comprising a processor and a memory storing machine executable instructions executable by the processor, the processor executing the machine executable instructions to perform the method described above.
The invention has the following beneficial effects:
From the task set and the processor set, Q learning generates an initial solution set for the task-to-processor mapping that serves as the initial information of the genetic algorithm (GA), accelerating the GA's search rate and improving the performance of the whole scheme. The heuristic mutation operation of the GA improves the local search capability of the ant colony algorithm, maintains population diversity, and accelerates convergence to the optimal solution, so a good rate is maintained throughout the search and the scheduling length is shortened; each processor of the heterogeneous platform can deliver its maximum efficiency, hardware resource utilization is raised, parallel task processing is facilitated, the performance of the heterogeneous platform is improved, and the approach has good application prospects.
Description of the drawings:
FIG. 1 is a flow chart of a heterogeneous platform task scheduling method in an embodiment;
FIG. 2 is a schematic diagram of a comparison of scheduling length and average resource utilization of an existing algorithm in an embodiment;
FIG. 3 is a flowchart of a heterogeneous platform task scheduling QGA-ACO algorithm in an embodiment;
FIG. 4 is a task and target system model illustration in an embodiment;
FIG. 5 is a schematic illustration of a crossover operation in an embodiment;
FIG. 6 is a schematic diagram of the mutation operation in the example;
FIG. 7 is a graph illustrating algorithm rate versus iteration number in an embodiment.
The specific embodiment is as follows:
the present invention will be described in further detail with reference to the drawings and the technical scheme, in order to make the objects, technical schemes and advantages of the present invention more apparent.
Aiming at the low flexibility, slow convergence and poor predictability of existing task scheduling algorithms for heterogeneous multiprocessor computing platforms, an embodiment of the invention, as shown in FIG. 1, provides a Q-learning-based heterogeneous platform task scheduling method comprising the following steps:
s101, taking all tasks as a state space of Q learning, taking a processor set as an action space, taking tasks to be allocated as current states, and acquiring a task initial mapping scheme according to execution time required by mapping the tasks to the action space in the Q learning;
s102, creating a genetic algorithm model, performing fitness evaluation on a task initial mapping scheme, setting individuals in the genetic algorithm model, copying the individuals to a next generation population according to the fitness, performing cross mutation on reserved individuals, and determining new population optimization efficiency and a minimum threshold; obtaining an approximate optimal solution of the task-to-processor mapping in the model according to the genetic algorithm model;
s103, converting the model approximate optimal solution into ant colony information initial information distribution, and obtaining a task scheduling optimal scheme by iteratively searching and outputting an optimal path through an ant colony algorithm according to the information distribution.
Heterogeneous multiprocessor computing systems are dynamic and heterogeneous: resources and tasks may join and leave at random, and the processors differ from one another in architecture and performance. The high performance these systems exhibit makes them one of the development directions of high-performance computing, but the resource management and task scheduling problems they face are correspondingly more complex. To exploit the high computing efficiency of a heterogeneous system, a task scheduling strategy that meets the requirements is critical. Traditional scheduling algorithms suffer from high algorithmic or time complexity on complex problems, so heuristic methods such as genetic algorithms and ant colony algorithms are widely applied. However, the initial information of these algorithms is mostly constructed at random, and the quality of the initial search information is hard to guarantee, which affects convergence performance to some extent. In this embodiment, Q-learning acquires feedback information through interaction with the environment, so the search direction can be adjusted in time while local and global search are both taken into account to obtain better results. That result is used as the GA's initial information, compensating for the low quality of a randomly initialized GA population and its inability to use network feedback; combining with the ACO in the later stage shortens the iteration time, makes the search more efficient, and lets application tasks be optimally allocated to processors. No single task scheduling algorithm can fully satisfy the platform's scheduling requirements on its own; combining different algorithms lets their strengths and weaknesses complement each other, matching different types of computing tasks to the hardware and improving resource utilization and computing efficiency.
MLSH, ACO and GAACO are representative of traditional scheduling algorithms, heuristic methods and combined optimization algorithms respectively; FIG. 2(a) and (b) compare the three in terms of scheduling length and average resource utilization. For small-scale tasks the scheduling lengths of the three algorithms are close, but as the number of tasks increases the scheduling lengths of ACO and GAACO remain almost equal to each other and lower than that of MLSH, because the complexity of the traditional scheduling algorithm is higher than that of the heuristic algorithms and grows exponentially with the number of tasks. In terms of resource utilization, the average resource utilization of GAACO is significantly higher than that of ACO and MLSH, because GAACO combines the universality, scalability, global convergence and parallelism of the ant colony algorithm with the high solution precision of the genetic algorithm; their strengths and weaknesses complement each other, improving both execution efficiency and solution precision. GAACO therefore shows the best scheduling performance on both measures, though its improvement in scheduling length is not significant.
Referring to FIG. 3, the QGA-ACO algorithm of this embodiment can be designed as follows. In the first stage, the initial allocation of tasks to resources is realized through Q learning, yielding a good scheme and Q values; on this basis a GA model is established, parameters and constraints are determined, and crossover, mutation and copy operations on the initial solutions preserve population diversity and raise the convergence speed to further optimize the result. In the second stage, the ACO algorithm is adopted: if the evolution rate of four successive generations in the first stage is smaller than a threshold, the second stage begins, the first-stage result is converted into the initial pheromone values of the ACO to avoid a blind ant colony search, and iterative search proceeds according to the pheromone distribution until the termination condition is met.
In the Q-learning-based heterogeneous platform task scheduling method of this embodiment, further, the system application model is expressed as G = {V, E, C, L} and the target system model as P = {N, H, W, T}, where V is the task set, E is the set of directed edges carrying dependency relationships, C is the set of task computation amounts, L is the inter-subtask communication volume, N is the processor set, H is the processor feature, W is the computation cost, and T is the communication cost of tasks between processors; the state space and action space of Q learning are derived from the task set of the application model and the processor set of the target system model.
Referring to FIG. 4, the system application model is expressed as G = {V, E, C, L}, where V = {v1, v2, …, vn} represents the set of signal processing tasks to be processed, E = {e12, e23, …, eij} represents the set of dependency directed edges, C = {c1, c2, …, cn} represents the set of task computing overheads, and L(vi, vj) represents the communication overhead between subtasks; if tasks vi and vj are mapped onto the same node, the communication overhead is 0. The hardware resource can be represented by an undirected graph and abstracted as P = {N, H, W, T}, where N = (n1, n2, …) is the processor set, H is the processor feature, W is the execution rate of the processors, and T is the inter-processor communication rate.
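For concreteness, the two models can be represented with minimal data structures as in the following Python sketch (illustrative only and not part of the patent; every field name and example value is an assumption):

```python
# Illustrative sketch only: one possible Python representation of the
# application model G = {V, E, C, L} and the target system model P = {N, H, W, T}.
from dataclasses import dataclass

@dataclass
class ApplicationModel:
    V: list   # signal processing tasks, e.g. ["v1", "v2", ...]
    E: set    # dependency directed edges as (vi, vj) pairs
    C: dict   # task computing overhead, C[vi]
    L: dict   # inter-subtask communication overhead, L[(vi, vj)]

    def comm_overhead(self, vi, vj, same_node):
        # Communication overhead is 0 when both tasks map onto the same node.
        return 0.0 if same_node else self.L.get((vi, vj), 0.0)

@dataclass
class TargetSystem:
    N: list   # processors, e.g. ["cpu0", "gpu0", "fpga0"]
    H: dict   # processor features (architecture tags)
    W: dict   # per-processor execution rate
    T: dict   # inter-processor communication rate, T[(ni, nj)]

# Hypothetical example: three dependent tasks on two processors.
G = ApplicationModel(V=["v1", "v2", "v3"],
                     E={("v1", "v2"), ("v2", "v3")},
                     C={"v1": 4.0, "v2": 2.0, "v3": 3.0},
                     L={("v1", "v2"): 1.5, ("v2", "v3"): 0.5})
P = TargetSystem(N=["cpu0", "gpu0"],
                 H={"cpu0": "CPU", "gpu0": "GPU"},
                 W={"cpu0": 1.0, "gpu0": 2.5},
                 T={("cpu0", "gpu0"): 0.8})
```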
In the Q-learning-based heterogeneous platform task scheduling method of this embodiment, further, the Q-learning agent performs an action in the current state according to the ε-greedy behavior strategy, obtains the Q value of mapping the task onto the processor together with an immediate reward, and transitions to a new state; targeting the minimum Q value for each executed action, it selects and executes the action with the minimum Q value in the new state, stores the result, decays the learning rate, assigns the next state to the current state, and iteratively selects actions in the next state according to the ε-greedy behavior strategy, so that the initial task mapping scheme is obtained from the stored Q table.
The contents of algorithm stage one shown in FIG. 3 can be designed as follows:
The task set V forms the state space S and the processor set N forms the action space A. The task vi waiting to be assigned serves as the current state s, and the current state performs an action ni ∈ A.
Step 1, initialize the state-action space of Q-learning, where Q(S, A) = C is the execution time required to map task S onto A; the search factor is ε, the discount factor is γ, and the learning rate is α.
Step 2, following the ε-greedy behavior strategy, the agent executes an action a with a certain probability in the current state s, obtains the Q value of mapping the task onto the processor and an immediate reward r (the larger the Q value, i.e. the longer the task's execution time, the smaller the immediate reward), and transfers to a new state s'. The ε-greedy strategy balances the agent's exploration of the state space against its exploitation of the information already obtained, preventing pure greedy exploitation from getting trapped in a local optimum and preventing excessive exploration from degrading the algorithm's performance;
step 3, selecting an action a 'with the minimum Q value according to the formula (1) to execute in an s' state with the minimum Q value of each action execution, calculating the Q value of the Agent at (s, a), and storing the Q value in a Q table;
Q(s t ,a t )←Q(s t ,a t )+α[r t+1 +γminQ(s t+1 ,a′)-Q(s t ,a t )](1)
Step 4, decay the learning rate, assign the next state s' to the current state s, select an action a' in state s' according to the ε-greedy behavior strategy, and assign a' to the action a to be executed;
step 5, judging whether the iteration termination condition is met or not according to the current Step number +1, and if not, entering Step2 to continue execution; if the iteration is completed, the algorithm is ended, and a task initial mapping scheme is obtained according to the Q table.
In the Q-learning-based heterogeneous platform task scheduling method of this embodiment, further, a genetic algorithm model is created, the initial task mapping scheme is encoded, and each task is mapped to its corresponding processor; fitness is evaluated according to a fitness evaluation function, individuals meeting the fitness value are copied directly into the next-generation population, and the retained individuals undergo crossover and mutation; during the iteration of the genetic algorithm, the optimization efficiency of each offspring population is compared with a minimum threshold to determine the current population optimization efficiency, and the iteration terminates when the optimization efficiency of the offspring populations of a set number of successive generations falls below the minimum threshold, yielding the genetic algorithm model's near-optimal solution set for the task-to-processor mapping. Further, in each iteration a minimum threshold is set according to the optimization efficiency of the new population; if the optimization efficiency of an offspring population is smaller than the minimum threshold, the threshold is replaced by the current population optimization efficiency.
The initial population (the mapping scheme) is encoded (coding rule: chromosome[p] = q, i.e. task p is mapped onto processor q) and evaluated for fitness; individuals with high fitness values are copied directly into the next-generation population, and the retained individuals undergo crossover and mutation. The fitness evaluation function is expressed in terms of Q(s, a) and the generation index t = 1, 2, …, where Q(s, a) represents the execution time of task s mapped on processor a and t represents the iteration generation. Crossover operator: randomly select one crossover point on the chromosome and split each parent chromosome into two parts; as shown in FIG. 5, the gene segment to the left of the crossover point comes from one of the corresponding parent chromosomes, and the right segment is copied from the corresponding genes of the other parent, so two new chromosomes are obtained. Heuristic mutation operation: as shown in FIG. 6, randomly select a gene locus i of one chromosome, search from that locus for the last successor succ(i) of i, randomly select a gene locus j ∈ (i, succ(i)), and swap the genes at loci i and j to form a new chromosome. The optimization efficiency of the new population obtained by the genetic algorithm is collected and set as the minimum threshold. The iteration count is then incremented by 1 and the genetic algorithm steps are executed again from the fitness evaluation step; the optimization efficiency of each offspring population is compared with the minimum threshold, and if it is smaller than the threshold, the threshold is replaced by the current population optimization efficiency. The GA terminates when the optimization efficiency of the offspring populations of 4 successive generations is below the threshold. A sketch of these operators follows.
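The coding rule, crossover operator, heuristic mutation and fitness evaluation can be sketched as follows (hypothetical Python; the inverse-total-execution-time fitness is an assumption standing in for the patent's formula, and `successors` is an assumed precomputed map from a gene locus to its last successor succ(i)):

```python
import random

def crossover(p1, p2):
    # Single-point crossover (FIG. 5): the left segment comes from one parent,
    # the right segment from the other, yielding two new chromosomes.
    cut = random.randrange(1, len(p1))
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def heuristic_mutation(chrom, successors):
    # Heuristic mutation (FIG. 6): pick a gene locus i, look up its last
    # successor succ(i), pick j strictly between them, and swap the genes.
    i = random.randrange(len(chrom) - 1)
    hi = successors.get(i, len(chrom) - 1)   # assumed precomputed succ(i)
    if hi - i < 2:
        return chrom
    j = random.randrange(i + 1, hi)
    chrom = chrom[:]
    chrom[i], chrom[j] = chrom[j], chrom[i]
    return chrom

def fitness(chrom, exec_time):
    # Assumed fitness: inverse of the total mapped execution time, so shorter
    # schedules score higher; chrom[p] = q maps task p onto processor q.
    return 1.0 / sum(exec_time[p][chrom[p]] for p in range(len(chrom)))
```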
In the Q-learning-based heterogeneous platform task scheduling method of this embodiment, further, in the ant colony algorithm the amount of pheromone released by an ant is determined by the execution time of the task on the processor; the path an ant has walked so far, i.e. the processors it has selected, is recorded in the tabu list; and the optimal task-to-processor allocation and the shortest task execution time are obtained by iteratively outputting the optimal ant path.
The implementation of algorithm stage two shown in FIG. 3 can be designed as follows:
The near-optimal solution set {chromosome[i] = j} obtained in stage one is converted into the initial pheromone distribution of the ant colony.
Step 1, initialize the ant colony parameters: the number of ants m, the weights α and β, the pheromone volatilization factor ρ, the pheromone strength τij, the factor μ, and the maximum number of iterations T. Assume that τ is the pheromone matrix, η is the heuristic function, k denotes the k-th ant, and l is a processor that ant k can select in its next step.
Step 2, place the m ants on the n nodes (m > n). The shorter the execution time of task i on processor j, the more pheromone the ants release. The path an ant has walked at the current moment (i.e. the processors it has selected) is recorded in the tabu list; the next node is selected according to the probability selection formula (2) and stored in the tabu list, thereby recording the ant's walking path.
Step 3, after the m ants have traversed the n nodes, all ants complete one round of search; the pheromone is updated according to formula (3) until the iterations terminate, and the optimal path is output.
Thus, the optimal allocation of tasks to the processor and the shortest task execution time can be obtained.
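Stage two can be sketched as follows (illustrative Python; since formulas (2) and (3) are not reproduced here, the classic τ^α·η^β probabilistic selection rule and evaporation-plus-deposit pheromone update are assumed in their place, with the tabu-list role played by each ant's recorded path):

```python
import random

def aco_stage_two(tau, exec_time, tasks, procs, m=20, a=1.0, b=2.0,
                  rho=0.1, iterations=100):
    # tau[s][p] is seeded from the GA's near-optimal solutions; eta is the
    # usual 1/execution-time heuristic (shorter run, more attractive).
    eta = {s: {p: 1.0 / exec_time[s][p] for p in procs} for s in tasks}
    best_path, best_len = None, float("inf")
    for _ in range(iterations):
        round_paths = []
        for _ant in range(m):
            path = {}                      # doubles as the ant's tabu list
            for s in tasks:
                w = [tau[s][p] ** a * eta[s][p] ** b for p in procs]
                path[s] = random.choices(procs, weights=w)[0]
            length = sum(exec_time[s][path[s]] for s in tasks)
            round_paths.append((path, length))
            if length < best_len:
                best_path, best_len = path, length
        # Evaporate, then deposit pheromone inversely proportional to the
        # ant's total execution time (shorter schedule, more pheromone).
        for s in tasks:
            for p in procs:
                tau[s][p] *= (1.0 - rho)
        for path, length in round_paths:
            for s, p in path.items():
                tau[s][p] += 1.0 / length
    return best_path, best_len
```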
The GA has a high convergence speed in the early search stage, but because the early search is not adjusted in time according to feedback information, a large amount of redundant information is generated in the later stage and occupies resources, and the rate drops markedly after a certain number of iterations. The ACO has little pheromone early on, with a limited distribution, so it can only search randomly and is slow; once pheromone has accumulated to a certain level and is distributed comprehensively, heuristic search raises the convergence speed. According to the rate-versus-iteration analysis shown in FIG. 7, the QGA-ACO algorithm of this embodiment first uses Q-learning to raise the GA's initial search rate and optimizes through crossover, mutation and similar operations; once the GA has iterated to a certain generation, the ACO algorithm is run, and the parallel search capability and positive feedback of the ant colony algorithm are used to find the optimal solution of the problem, so the overall search maintains a good rate and the scheduling length is shortened.
Further, based on the above method, an embodiment of the present invention also provides a Q-learning-based heterogeneous platform task scheduling system, comprising an initial mapping module, a fitness evaluation module and an optimal output module, wherein,
the initial mapping module is used for taking all tasks as the state space of Q learning, the processor set as the action space, and the task waiting to be allocated as the current state, and for obtaining the initial task mapping scheme from the execution time required to map each task onto the action space in Q learning;
the fitness evaluation module is used for creating a genetic algorithm model, performing fitness evaluation on the initial task mapping scheme, copying individuals into the next-generation population according to their fitness, performing crossover and mutation on the retained individuals, and determining the optimization efficiency of the new population and a minimum threshold, and for obtaining from the genetic algorithm model a near-optimal solution of the task-to-processor mapping in the model;
and the optimal output module is used for converting the model's near-optimal solution into the initial pheromone distribution of the ant colony and for obtaining the optimal task scheduling scheme by iteratively searching according to that distribution with the ant colony algorithm and outputting the optimal path.
The relative arrangement of the components and steps, the numerical expressions and the numerical values set forth in these embodiments do not limit the scope of the present invention unless specifically stated otherwise.
Based on the above method or system, the embodiment of the present invention further provides a computer device, including: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the system or perform the method described above.
Based on the above system, the embodiment of the present invention further provides a computer readable medium, on which a computer program is stored, wherein the program, when executed by a processor, performs the above method.
The device provided by the embodiment of the present invention has the same implementation principle and technical effects as those of the embodiment of the system, and for the sake of brevity, reference may be made to the corresponding content of the embodiment of the system.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described system and apparatus may refer to corresponding procedures in the foregoing system embodiments, which are not described herein again.
Any particular values in all examples shown and described herein are to be construed as merely illustrative and not a limitation, and thus other examples of exemplary embodiments may have different values.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on this understanding, the technical solution of the present invention may be embodied essentially, or in the part contributing to the prior art, or in part, in the form of a software product stored in a storage medium and comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that: the above examples are only specific embodiments of the present invention, and are not intended to limit the scope of the present invention, but it should be understood by those skilled in the art that the present invention is not limited thereto, and that the present invention is described in detail with reference to the foregoing examples: any person skilled in the art may modify or easily conceive of the technical solution described in the foregoing embodiments, or perform equivalent substitution of some of the technical features, while remaining within the technical scope of the present disclosure; such modifications, changes or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and are intended to be included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (6)

1. A heterogeneous platform task scheduling method based on Q learning is characterized by comprising the following steps:
taking all tasks as the state space of Q learning, the processor set as the action space, and the task to be allocated as the current state, and obtaining an initial task mapping scheme from the execution time required to map each task onto the action space in Q learning;
creating a genetic algorithm model, performing fitness evaluation on the initial task mapping scheme, selecting the individuals in the genetic algorithm model to be copied into the next-generation population, performing crossover and mutation on the retained individuals, and determining the optimization efficiency of the new population and a minimum threshold; obtaining from the genetic algorithm model a near-optimal solution of the task-to-processor mapping in the model;
converting the model's near-optimal solution into the initial pheromone distribution of the ant colony, and obtaining the optimal task scheduling scheme by iteratively searching according to that pheromone distribution with the ant colony algorithm and outputting the optimal path;
the system application model is expressed as G= { V, E, C, L }, the target system model is expressed as P= { N, H, W, T }, V is a task set, E is a directed edge set with a dependency relationship, C is a task calculation amount set, L is inter-subtask communication amount, N is a processor set, H is a processor feature, W is calculation cost, T is communication cost of tasks among processors, a state space and an action space of Q learning are obtained according to the task set and the processor set in both the system application model and the target system model, and an epsilon-greedy strategy is adopted to balance the searching of the state space and the utilization of obtained information by an agent;
the task set V is formed into a state space S, the processor set N is used as an action space A, and the tasks V waiting to be allocated are formed i As the current state s, the current state performs action n i A is a;
the Q learning agent executes action in the current state according to the epsilon-greedy behavior strategy, obtains the Q value of the task mapped to the processor, obtains immediate rewards, and transfers to a new state; selecting an action with the minimum Q value from which each action is executed as a target, executing the action with the minimum Q value in a new state, storing, attenuating the learning rate, giving the next state to the current state, and selecting the action in the next state according to the epsilon-greedy behavior strategy for iterative execution so as to obtain a task initial mapping scheme according to the storage condition;
creating the genetic algorithm model, encoding the initial task mapping scheme, and mapping each task to its corresponding processor; evaluating fitness according to a fitness evaluation function, copying individuals meeting the fitness value directly into the next-generation population, and performing crossover and mutation on the retained individuals; comparing the optimization efficiency of each offspring population with the minimum threshold during the iteration of the genetic algorithm to determine the current population optimization efficiency, and terminating the iteration when the optimization efficiency of the offspring populations of a set number of successive generations is smaller than the minimum threshold, to obtain the genetic algorithm model's near-optimal solution set for the task-to-processor mapping;
the fitness evaluation function being expressed in terms of Q(s, a) and the generation index t = 1, 2, …, where Q(s, a) represents the execution time of task s mapped on processor a and t represents the iteration generation.
2. The Q-learning-based heterogeneous platform task scheduling method according to claim 1, wherein, in each iteration, a minimum threshold is set according to the optimization efficiency of the new population, and if the optimization efficiency of an offspring population is smaller than the minimum threshold, the threshold is replaced by the current population optimization efficiency.
3. The Q-learning-based heterogeneous platform task scheduling method according to claim 1, wherein, in the ant colony algorithm, the amount of pheromone released by an ant is determined by the execution time of the task on the processor; the path an ant has walked so far, i.e. the processors it has selected, is recorded in the tabu list; and the optimal task-to-processor allocation and the shortest task execution time are obtained by iteratively outputting the optimal ant path.
4. A Q-learning-based heterogeneous platform task scheduling system, implemented based on the method of claim 1, comprising an initial mapping module, a fitness evaluation module and an optimal output module, wherein,
the initial mapping module is used for taking all tasks as the state space of Q learning, the processor set as the action space, and the task waiting to be allocated as the current state, and for obtaining the initial task mapping scheme from the execution time required to map each task onto the action space in Q learning;
the fitness evaluation module is used for creating a genetic algorithm model, performing fitness evaluation on the initial task mapping scheme, copying individuals into the next-generation population according to their fitness, performing crossover and mutation on the retained individuals, and determining the optimization efficiency of the new population and a minimum threshold, and for obtaining from the genetic algorithm model a near-optimal solution of the task-to-processor mapping in the model;
and the optimal output module is used for converting the model's near-optimal solution into the initial pheromone distribution of the ant colony and for obtaining the optimal task scheduling scheme by iteratively searching according to that distribution with the ant colony algorithm and outputting the optimal path.
5. A computer readable storage medium having stored thereon a computer program, wherein the program when executed by a processor performs the method of any of claims 1-3.
6. A computer device comprising a processor and a memory storing machine executable instructions executable by the processor to perform the method of any one of claims 1 to 3.
CN202011284585.6A 2020-11-17 2020-11-17 Heterogeneous platform task scheduling method and system based on Q learning Expired - Fee Related CN112256422B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011284585.6A CN112256422B (en) 2020-11-17 2020-11-17 Heterogeneous platform task scheduling method and system based on Q learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011284585.6A CN112256422B (en) 2020-11-17 2020-11-17 Heterogeneous platform task scheduling method and system based on Q learning

Publications (2)

Publication Number Publication Date
CN112256422A CN112256422A (en) 2021-01-22
CN112256422B true CN112256422B (en) 2023-08-04

Family

ID=74265947

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011284585.6A Expired - Fee Related CN112256422B (en) 2020-11-17 2020-11-17 Heterogeneous platform task scheduling method and system based on Q learning

Country Status (1)

Country Link
CN (1) CN112256422B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114996198B (en) * 2022-08-03 2022-10-21 中国空气动力研究与发展中心计算空气动力研究所 Cross-processor data transmission method, device, equipment and medium
CN116501503B (en) * 2023-06-27 2023-09-15 上海燧原科技有限公司 Architecture mapping method and device for load task, computer equipment and medium
CN118779117A (en) * 2024-09-10 2024-10-15 山东省计算中心(国家超级计算济南中心) Large model wide-area heterogeneous distributed training method and system based on dual optimization

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107563555A (en) * 2017-09-04 2018-01-09 南京信息工程大学 Dynamic multi-objective Scheduling method based on Q study memetic algorithms
CN108776483A (en) * 2018-08-16 2018-11-09 圆通速递有限公司 AGV paths planning methods and system based on ant group algorithm and multiple agent Q study

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7756845B2 (en) * 2006-12-28 2010-07-13 Yahoo! Inc. System and method for learning a weighted index to categorize objects
US8489526B2 (en) * 2010-11-24 2013-07-16 International Business Machines Corporation Controlling quarantining and biasing in cataclysms for optimization simulations
JP2016518589A (en) * 2013-03-15 2016-06-23 セラ プログノスティックス, インコーポレイテッド Biomarkers and methods for predicting preterm birth
CN103345657B (en) * 2013-04-02 2016-05-25 江苏大学 Method for scheduling task based on heredity and ant group under cloud computing environment
CN104811491A (en) * 2015-04-17 2015-07-29 华南理工大学 Cloud computing resource scheduling method based on genetic algorithm
CN107104899B (en) * 2017-06-09 2021-04-20 中山大学 Ant colony algorithm-based routing method applied to vehicle-mounted self-organizing network
US11461145B2 (en) * 2019-01-28 2022-10-04 EMC IP Holding Company LLC Building neural networks for resource allocation for iterative workloads using reinforcement learning
US20200320035A1 (en) * 2019-04-02 2020-10-08 Micro Focus Software Inc. Temporal difference learning, reinforcement learning approach to determine optimal number of threads to use for file copying
CN110298589A (en) * 2019-07-01 2019-10-01 河海大学常州校区 Based on heredity-ant colony blending algorithm dynamic Service resource regulating method
CN110515735A (en) * 2019-08-29 2019-11-29 哈尔滨理工大学 A kind of multiple target cloud resource dispatching method based on improvement Q learning algorithm

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107563555A (en) * 2017-09-04 2018-01-09 南京信息工程大学 Dynamic multi-objective Scheduling method based on Q study memetic algorithms
CN108776483A (en) * 2018-08-16 2018-11-09 圆通速递有限公司 AGV paths planning methods and system based on ant group algorithm and multiple agent Q study

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Design and Implementation of Communication Middleware for a Heterogeneous Signal Processing Platform; 王天一 (Wang Tianyi); China Master's Theses Full-text Database, Information Science and Technology; full text *

Also Published As

Publication number Publication date
CN112256422A (en) 2021-01-22

Similar Documents

Publication Publication Date Title
Zhu et al. An efficient evolutionary grey wolf optimizer for multi-objective flexible job shop scheduling problem with hierarchical job precedence constraints
CN112256422B (en) Heterogeneous platform task scheduling method and system based on Q learning
Kaur et al. Deep‐Q learning‐based heterogeneous earliest finish time scheduling algorithm for scientific workflows in cloud
Almezeini et al. Task scheduling in cloud computing using lion optimization algorithm
CN110109822B (en) Regression testing method for carrying out test case priority ranking based on ant colony algorithm
CN114281104B (en) Multi-unmanned aerial vehicle cooperative regulation and control method based on improved ant colony algorithm
CN105975342A (en) Improved cuckoo search algorithm based cloud computing task scheduling method and system
CN110321217B (en) Multi-target cloud resource scheduling method, device, equipment and storage medium
Yao et al. Improved artificial bee colony algorithm for vehicle routing problem with time windows
Pooranian et al. Hybrid metaheuristic algorithm for job scheduling on computational grids
CN112199172A (en) Hybrid task scheduling method for heterogeneous multi-core processor
CN104506576B (en) A kind of wireless sensor network and its node tasks moving method
CN109710372B (en) Calculation intensive cloud workflow scheduling method based on owl search algorithm
CN116033026A (en) Resource scheduling method
Entezari-Maleki et al. A genetic algorithm to increase the throughput of the computational grids
CN108415773B (en) Efficient software and hardware partitioning method based on fusion algorithm
CN112884368B (en) Multi-target scheduling method and system for minimizing delivery time and delay of high-end equipment
Wang et al. A coordinated two-stages virtual network embedding algorithm based on reinforcement learning
CN106874215B (en) Serialized storage optimization method based on Spark operator
CN114980216A (en) Dependent task unloading system and method based on mobile edge calculation
CN115756646A (en) Industrial internet-based edge computing task unloading optimization method
Dai et al. Cloud workflow scheduling algorithm based on multi-objective hybrid particle swarm optimisation
CN113191534A (en) Logistics resource allocation method, device, equipment and storage medium
CN112882917A (en) Virtual machine service quality dynamic prediction method based on Bayesian network migration
CN111813525A (en) Heterogeneous system workflow scheduling method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20230804