CN117971411B - Cloud platform task scheduling method and device based on reinforcement learning - Google Patents


Info

Publication number
CN117971411B
Authority
CN
China
Prior art keywords
task
scheduling
target
tasks
target matrix
Prior art date
Legal status
Active
Application number
CN202311659700.7A
Other languages
Chinese (zh)
Other versions
CN117971411A (en)
Inventor
王月虎
刘军
谭仲春
丁军军
邱明灏
包祥文
韩峰
陶军
郑翔
王超
Current Assignee
Nanjing University of Finance and Economics
Original Assignee
Nanjing University of Finance and Economics
Priority date
Filing date
Publication date
Application filed by Nanjing University of Finance and Economics
Priority to CN202311659700.7A
Publication of CN117971411A
Application granted
Publication of CN117971411B


Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 — Arrangements for program control, e.g. control units
    • G06F 9/06 — Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 — Multiprogramming arrangements
    • G06F 9/48 — Program initiating; program switching, e.g. by interrupt
    • G06F 9/4806 — Task transfer initiation or dispatching
    • G06F 9/4843 — Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F 9/4881 — Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06Q — INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 50/00 — Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q 50/10 — Services
    • G06Q 50/20 — Education
    • G06Q 50/205 — Education administration or guidance
    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D — CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention discloses a cloud platform task scheduling method and device based on reinforcement learning, relating to the technical field of reinforcement learning. By acquiring the effective tracks among the task execution tracks to generate a target matrix, the execution time and result of each task can be predicted more accurately; by generating and then optimizing the target matrix, the optimal scheduling strategy can be found quickly. This avoids designing a reward directly for every scheme generated by scheduling each task under multitasking, and therefore does not occupy a large amount of system resources. Because the tasks in each horizontal axis of the target matrix are optimized according to their occupation ratios and a preset rule, the amount of computation shrinks as the optimization proceeds, improving task scheduling efficiency.

Description

Cloud platform task scheduling method and device based on reinforcement learning
Technical Field
The invention belongs to the technical field of reinforcement learning, and particularly relates to a cloud platform task scheduling method and device based on reinforcement learning.
Background
Task scheduling refers to the process by which a system executes a task at an agreed time in order to complete a specific task automatically, freeing up manual effort. Task scheduling can automatically execute tasks based on a given point in time, a given time interval, or a given number of executions.
Reinforcement learning (RL) is one of the paradigms and methodologies of machine learning. It describes and solves the problem of an agent learning a strategy, during its interactions with an environment, so as to maximize return or achieve a specific goal. Reinforcement learning is a trial-and-error approach that aims to let a software agent take the actions that maximize return in a particular environment.
The mainstream of reinforcement learning is reward-based: its optimization goal is the long-term accumulated reward. Setting a good reward function usually requires a great deal of expertise, and whether a proposed reward function hinders the learning process must also be considered, so reward design is difficult. Current solutions collect samples by crafting rules, or perform imitation learning from given cases, but in some situations such rewards are not well defined. In task scheduling, for example, most workloads are multi-task, and as the number of tasks grows, the number of candidate scheduling schemes grows exponentially. If a reward were designed for each scheme, the amount of computation would be enormous: it would occupy a large amount of system resources, cause the cloud platform's task scheduling to stall, and make cloud platform task scheduling inefficient.
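The exponential growth described above can be made concrete: with n tasks scheduled sequentially there are n! candidate orderings, so per-scheme reward design quickly becomes intractable. A minimal illustration (the task counts are hypothetical):

```python
from math import factorial

# Each ordering of n tasks is one candidate scheduling scheme; designing a
# reward for every scheme means evaluating n! schemes.
for n in (3, 5, 8, 10):
    print(f"{n} tasks -> {factorial(n)} candidate schemes")
```

Already at ten tasks there are over three million orderings, which motivates avoiding per-scheme reward design.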
Disclosure of Invention
The invention aims to solve the problems that designing a reward for every scheme entails an enormous amount of computation, occupies a large amount of system resources, causes cloud platform task scheduling to stall, and makes cloud platform task scheduling inefficient, and provides a cloud platform task scheduling method and device based on reinforcement learning.
In a first aspect of the present invention, a method for scheduling a task of a cloud platform based on reinforcement learning is provided, where the method includes:
judging whether a scheduling-level task set exists in a preset database, and if it does not, randomly ordering the tasks in the scheduling task set to obtain the task execution tracks of each task under different conditions; the scheduling-level task set is a task set obtained by classifying the tasks in the cloud platform according to task level;
acquiring the effective tracks among the task execution tracks to generate a target matrix; an effective track is one under which every task can start executing at its task start time and finish before its task end time; the vertical axis of the target matrix is the execution order of the tasks in one effective track; each horizontal axis of the target matrix holds the task that each effective track places at the same position;
acquiring the occupation ratio of each task in each horizontal axis of the target matrix, and optimizing the tasks in each horizontal axis according to a preset rule and their occupation ratios to obtain the scheduling strategy corresponding to each horizontal axis of the target matrix;
and scheduling each task in the scheduling task set according to the scheduling strategy.
Optionally, judging whether the scheduling-level task set exists in the preset database specifically includes:
acquiring the task sets in the preset database whose level equals that of the scheduling-level task set, and searching among them for a task set whose time interval equals the total task-processing time slice of the scheduling-level task set; the total task-processing time slice of the scheduling-level task set is determined by the start time and end time of each task in it;
if such a task set exists, traversing its tasks and comparing them with those of the scheduling-level task set; task comparison is determined by the start time and end time of each task;
if every task in the scheduling-level task set can be matched in the task set, with no task matched twice, judging that a task set identical to the scheduling-level task set exists in the preset database;
otherwise, judging that the scheduling-level task set does not exist in the preset database.
The preset database is used to store task sets whose task scheduling has been completed, including each set's task level and completion time, and the start time, end time and task scheduling strategy of each scheduled task.
Optionally, if a task set identical to the scheduling-level task set exists in the preset database, task scheduling is performed according to the task scheduling strategy of each task in that task set.
Optionally, in acquiring the occupation ratio of each task in each horizontal axis of the target matrix and optimizing those tasks according to their occupation ratios, the preset rule specifically includes:
acquiring the occupation ratio of each task in each horizontal axis of the target matrix, and taking the task with the largest occupation ratio in the current horizontal axis as the preprocessing task;
obtaining the target task as the preprocessing task with the largest occupation ratio along the vertical axis of the target matrix;
if the occupation ratio of the target task at the current position is larger than the sum of its ratios at all other positions, recording the current position as the scheduling strategy of the target task and locking the horizontal axis corresponding to the target task; the lock fixes the position of the target task in the current horizontal axis.
Optionally, after obtaining the target task from the largest occupation ratio of each preprocessing task along the vertical axis of the target matrix, the method further includes:
if the occupation ratio of the target task at the current position is smaller than the sum of its ratios at the other positions, performing target-position replacement on the positions in the current horizontal axis not occupied by the target task, with probability given by the target task's occupation ratio; target-position replacement means exchanging, within the vertical axis (track) corresponding to a non-target-task position in the current horizontal axis, that task with the target task;
if the execution track corresponding to the target task after replacement is still an effective track, saving that effective track;
repeating the above steps until the occupation ratio of the target task in the current horizontal axis of the target matrix is larger than the sum at the other positions, and recording the current position as the scheduling strategy of the target task;
replacement is applied to the positions in order of the occupation ratio of each preprocessing task in the horizontal axes of the target matrix, from largest to smallest.
Optionally, after performing target-position replacement, with probability given by the target task's occupation ratio, on the positions in the current horizontal axis not occupied by the target task, the method further includes:
if the execution track corresponding to the target task after replacement is a non-effective track, restoring the replaced track and deleting that position of the target task from the vertical axis of the target matrix; a non-effective track is one under which some task cannot start executing at its task start time or cannot finish before its task end time.
In a second aspect of the present invention, a cloud platform task scheduling device based on reinforcement learning is provided, including a preprocessing module, a task matrix module, a scheduling strategy allocation module and a task scheduling module:
the preprocessing module is used to judge whether a scheduling-level task set exists in a preset database and, if not, to randomly order the tasks in the scheduling task set to obtain the task execution tracks of each task under different conditions; the scheduling-level task set is a task set obtained by classifying the tasks in the cloud platform according to task level;
the task matrix module is used to acquire the effective tracks among the task execution tracks and generate a target matrix; an effective track is one under which every task can start executing at its task start time and finish before its task end time; the vertical axis of the target matrix is the execution order of the tasks in one effective track; each horizontal axis of the target matrix holds the task that each effective track places at the same position;
the scheduling strategy allocation module is used to acquire the occupation ratio of each task in each horizontal axis of the target matrix and to optimize those tasks according to a preset rule and their occupation ratios, obtaining the scheduling strategy corresponding to each horizontal axis of the target matrix;
and the task scheduling module is used to schedule the tasks in the scheduling task set according to the scheduling strategy.
The invention has the beneficial effects that:
The invention provides a cloud platform task scheduling method based on reinforcement learning. By acquiring the effective tracks among the task execution tracks to generate a target matrix, the execution time and result of each task can be predicted more accurately; by generating and then optimizing the target matrix, the optimal scheduling strategy can be found quickly. This avoids designing a reward directly for every scheme generated by scheduling each task under multitasking, and therefore does not occupy a large amount of system resources; and because the tasks in each horizontal axis of the target matrix are optimized according to their occupation ratios and a preset rule, the amount of computation shrinks as the optimization proceeds, improving task scheduling efficiency.
Drawings
The invention is further described below with reference to the accompanying drawings.
FIG. 1 is a flowchart of a cloud platform task scheduling method based on reinforcement learning according to an embodiment of the present invention;
fig. 2 is a frame diagram of a cloud platform task scheduling device based on reinforcement learning according to an embodiment of the present invention.
Detailed Description
The following clearly and completely describes the embodiments of the present invention with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art from these embodiments without inventive effort fall within the scope of the invention.
The embodiment of the invention provides a cloud platform task scheduling method based on reinforcement learning. Referring to fig. 1, fig. 1 is a flowchart of a cloud platform task scheduling method based on reinforcement learning according to an embodiment of the present invention. The method comprises the following steps:
S101, judging whether a scheduling-level task set exists in a preset database.
S102, if the scheduling-level task set does not exist in the preset database, randomly ordering the tasks in the scheduling task set to obtain the task execution tracks of each task under different conditions.
S103, acquiring the effective tracks among the task execution tracks to generate a target matrix.
An effective track is one under which every task can start executing at its task start time and finish before its task end time. The vertical axis of the target matrix is the execution order of the tasks in one effective track; each horizontal axis of the target matrix holds the task that each effective track places at the same position.
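Steps S102–S103 can be sketched as follows, under stated assumptions: tasks are modelled as running sequentially, each with a `duration` field the patent does not name, and `Task`, `is_valid_track` and `build_target_matrix` are illustrative helpers, not the patent's implementation:

```python
import itertools
from dataclasses import dataclass

@dataclass(frozen=True)
class Task:
    name: str
    start: int     # task start time (earliest it may begin)
    end: int       # task end time (deadline)
    duration: int  # processing time (assumed field, not named in the patent)

def is_valid_track(order):
    """A track is effective if, executing the tasks sequentially, every task
    begins no earlier than its start time and ends before its end time."""
    t = 0
    for task in order:
        t = max(t, task.start)  # the task can start executing at its start time
        t += task.duration
        if t > task.end:        # it must end before its end time
            return False
    return True

def build_target_matrix(tasks):
    """Columns are effective tracks; each row (horizontal axis) holds the task
    that every effective track places at the same position (vertical axis)."""
    tracks = [p for p in itertools.permutations(tasks) if is_valid_track(p)]
    return [[trk[pos].name for trk in tracks] for pos in range(len(tasks))]
```

With three staggered one-unit tasks (windows [0,2], [1,3], [2,4]) only the in-order track survives, so the target matrix has a single column.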
S104, acquiring the occupation ratio of each task in each horizontal axis of the target matrix, and optimizing the tasks in each horizontal axis according to their occupation ratios and a preset rule to obtain the scheduling strategy corresponding to each horizontal axis of the target matrix.
S105, scheduling each task in the scheduling task set according to the scheduling strategy.
According to the cloud platform task scheduling method based on reinforcement learning, generating the target matrix from the effective tracks among the task execution tracks allows the execution time and result of each task to be predicted more accurately, and generating and then optimizing the target matrix allows the optimal scheduling strategy to be found quickly. This avoids designing a reward directly for every scheme generated by scheduling each task under multitasking, and so does not occupy a large amount of system resources; and because the tasks in each horizontal axis of the target matrix are optimized according to their occupation ratios and the preset rule, the amount of computation shrinks as the optimization proceeds, improving task scheduling efficiency.
In one implementation, the target matrix is generated according to the preset rule and the task execution tracks, and the scheduling strategy is obtained by optimizing the target matrix, so that each task in the task set is scheduled effectively and execution efficiency is improved.
In one implementation, the preset database is used to store task sets whose task scheduling has been completed, including each set's task level and completion time, and the start time, end time and task scheduling strategy of each scheduled task.
In one implementation, if every task in the scheduling-level task set can be scheduled and belongs to an effective track, the start time and end time of each task in the scheduling-level task set and the task scheduling strategy of each scheduled task are stored in the preset database.
In one embodiment, judging whether the scheduling-level task set exists in the preset database specifically includes:
acquiring the task sets in the preset database whose level equals that of the scheduling-level task set, and searching among them for a task set whose time interval equals the total task-processing time slice of the scheduling-level task set;
if such a task set exists, traversing its tasks and comparing them with those of the scheduling-level task set; task comparison is determined by the start time and end time of each task;
if every task in the scheduling-level task set can be matched in the task set, with no task matched twice, judging that a task set identical to the scheduling-level task set exists in the preset database;
otherwise, judging that the scheduling-level task set does not exist in the preset database.
In one implementation, the total task-processing time slice of a scheduling-level task set is determined by the start time and end time of each task in the set.
In one embodiment, if a task set identical to the scheduling-level task set exists in the preset database, task scheduling is performed according to the task scheduling strategy of each task in that task set.
In one implementation, if the scheduling-level task set already exists in the preset database, the tasks have already been scheduled with a definite priority and order, and task execution can proceed directly according to the existing scheduling strategy; this avoids repeated scheduling and improves execution efficiency.
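The database lookup of this embodiment can be sketched as follows, assuming the preset database is a list of records with `level` and `tasks` fields, and that the total task-processing time slice is the interval from the earliest start to the latest end (all field names are illustrative, not the patent's schema):

```python
def total_time_slice(tasks):
    # assumed definition: the slice spans the earliest start to the latest end
    return (min(t["start"] for t in tasks), max(t["end"] for t in tasks))

def find_stored_set(database, level, task_set):
    """Return the stored record whose task set is identical to `task_set`
    (same level, same total time slice, one-to-one match on start/end times
    with no task matched twice), or None if no such set exists."""
    signature = sorted((t["start"], t["end"]) for t in task_set)
    for record in database:
        if record["level"] != level:
            continue
        stored = record["tasks"]
        if total_time_slice(stored) != total_time_slice(task_set):
            continue
        # sorting gives a one-to-one comparison: no task is matched twice
        if sorted((t["start"], t["end"]) for t in stored) == signature:
            return record  # its stored scheduling strategies can be reused
    return None
```

On a hit, the caller schedules directly from the record's stored strategies, skipping matrix generation entirely.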
In one embodiment, in acquiring the occupation ratio of each task in each horizontal axis of the target matrix and optimizing those tasks according to their occupation ratios, the preset rule specifically includes:
acquiring the occupation ratio of each task in each horizontal axis of the target matrix, and taking the task with the largest occupation ratio in the current horizontal axis as the preprocessing task;
obtaining the target task as the preprocessing task with the largest occupation ratio along the vertical axis of the target matrix;
if the occupation ratio of the target task at the current position is larger than the sum of its ratios at all other positions, recording the current position as the scheduling strategy of the target task and locking the horizontal axis corresponding to the target task; the lock fixes the position of the target task in the current horizontal axis.
In one implementation, locking the horizontal axis corresponding to the target task avoids repeated scheduling and unnecessary task conflicts, ensuring accurate and efficient task execution.
In one implementation, the target task is locked according to its scheduling strategy, ensuring that it keeps the maximum occupation ratio at the specific position; this optimizes the task scheduling strategy and improves execution efficiency.
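The ratio computation and locking test can be sketched as follows. This is a simplified reading of the rule: "larger than the sum of the other positions" is taken over the target task's placements, which is equivalent to the task holding more than half of them; the helper names are illustrative:

```python
from collections import Counter

def occupation_ratios(row):
    """Occupation ratio of each task within one horizontal axis (row)."""
    counts = Counter(row)
    return {task: c / len(row) for task, c in counts.items()}

def pick_target_task(matrix):
    """Per row, take the most frequent task as the preprocessing task; then
    return the (position, task) whose occupation ratio is largest overall."""
    candidates = []
    for pos, row in enumerate(matrix):
        ratios = occupation_ratios(row)
        task = max(ratios, key=ratios.get)   # preprocessing task of this row
        candidates.append((ratios[task], pos, task))
    ratio, pos, task = max(candidates)       # largest ratio along the vertical axis
    return pos, task

def can_lock(matrix, pos, task):
    """True iff the task's count at `pos` exceeds the sum of its counts at
    all other positions, i.e. it holds more than half of its placements."""
    here = matrix[pos].count(task)
    elsewhere = sum(row.count(task) for i, row in enumerate(matrix) if i != pos)
    return here > elsewhere
```

When `can_lock` holds, the position is recorded as the task's scheduling strategy and the row is frozen, shrinking the search for the remaining rows.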
In one embodiment, after obtaining the target task from the largest occupation ratio of each preprocessing task along the vertical axis of the target matrix, the method further includes:
if the occupation ratio of the target task at the current position is smaller than the sum of its ratios at the other positions, performing target-position replacement on the positions in the current horizontal axis not occupied by the target task, with probability given by the target task's occupation ratio;
if the execution track corresponding to the target task after replacement is still an effective track, saving that effective track;
repeating the above steps until the occupation ratio of the target task in the current horizontal axis of the target matrix is larger than the sum at the other positions, and recording the current position as the scheduling strategy of the target task;
replacement is applied to the positions in order of the occupation ratio of each preprocessing task in the horizontal axes of the target matrix, from largest to smallest.
In one implementation, target-position replacement means exchanging, within the vertical axis (track) corresponding to a non-target-task position in the current horizontal axis, that task with the target task.
In one implementation, non-target tasks are replaced according to the occupation-ratio probability, the validity of each replaced track is checked, and the process is repeated until the occupation ratio of the target task exceeds the sum at the other positions. This optimizes the task scheduling strategy, improves execution efficiency, and better balances the execution order among different tasks, making task scheduling more reasonable and efficient.
In one embodiment, after performing target-position replacement, with probability given by the target task's occupation ratio, on the positions in the current horizontal axis not occupied by the target task, the method further includes:
if the execution track corresponding to the target task after replacement is a non-effective track, restoring the replaced track and deleting that position of the target task from the vertical axis of the target matrix.
A non-effective track is one under which some task cannot start executing at its task start time or cannot finish before its task end time.
In one implementation, deleting that position of the target task from the vertical axis of the target matrix prevents the determination of the target task's position from interfering with other tasks, improving efficiency.
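The replacement loop of this embodiment can be sketched as follows, with simplifying assumptions flagged in the comments: columns of the matrix are tracks, a caller-supplied `is_valid_column` re-checks a track after a swap (standing in for the effective-track test), replacement columns are drawn uniformly rather than by the exact occupation-ratio probability, and the "delete the position" step is reduced to restoring the swap:

```python
import random

def has_majority(matrix, pos, task):
    """The target task's count at `pos` exceeds the sum at other positions."""
    here = matrix[pos].count(task)
    return here > sum(row.count(task) for i, row in enumerate(matrix) if i != pos)

def consolidate(matrix, pos, task, is_valid_column, rng=None, max_tries=1000):
    """Swap the target task into position `pos` of more columns until it
    holds the majority there; undo any swap yielding a non-effective track.
    Assumes every column (track) contains the target task exactly once."""
    rng = rng or random.Random(0)
    for _ in range(max_tries):
        if has_majority(matrix, pos, task):
            return True
        candidates = [c for c in range(len(matrix[pos])) if matrix[pos][c] != task]
        if not candidates:
            return False
        c = rng.choice(candidates)
        r = next(i for i, row in enumerate(matrix) if row[c] == task)
        matrix[pos][c], matrix[r][c] = matrix[r][c], matrix[pos][c]  # the swap
        column = [matrix[i][c] for i in range(len(matrix))]
        if not is_valid_column(column):
            # non-effective track: restore the replaced track
            matrix[pos][c], matrix[r][c] = matrix[r][c], matrix[pos][c]
    return has_majority(matrix, pos, task)
```

Once `consolidate` returns True, the position is recorded as the target task's scheduling strategy and its row can be locked.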
Based on the same inventive concept, an embodiment of the invention further provides a cloud platform task scheduling device based on reinforcement learning. Referring to fig. 2, fig. 2 is a frame diagram of a cloud platform task scheduling device based on reinforcement learning according to an embodiment of the present invention. The device includes a preprocessing module, a task matrix module, a scheduling strategy allocation module and a task scheduling module:
the preprocessing module is used to judge whether a scheduling-level task set exists in a preset database and, if not, to randomly order the tasks in the scheduling task set to obtain the task execution tracks of each task under different conditions; the scheduling-level task set is a task set obtained by classifying the tasks in the cloud platform according to task level;
the task matrix module is used to acquire the effective tracks among the task execution tracks and generate a target matrix; an effective track is one under which every task can start executing at its task start time and finish before its task end time; the vertical axis of the target matrix is the execution order of the tasks in one effective track; each horizontal axis of the target matrix holds the task that each effective track places at the same position;
the scheduling strategy allocation module is used to acquire the occupation ratio of each task in each horizontal axis of the target matrix and to optimize those tasks according to a preset rule and their occupation ratios, obtaining the scheduling strategy corresponding to each horizontal axis of the target matrix;
and the task scheduling module is used to schedule the tasks in the scheduling task set according to the scheduling strategy.
According to the cloud platform task scheduling device based on reinforcement learning, the preprocessing module obtains the task execution tracks and the task matrix module generates the target matrix from the effective tracks among them, so the execution time and result of each task can be predicted more accurately; generating and then optimizing the target matrix allows the optimal scheduling strategy to be found quickly, which avoids designing a reward directly for every scheme generated by scheduling each task under multitasking and therefore does not occupy a large amount of system resources; and because the tasks in each horizontal axis of the target matrix are optimized according to their occupation ratios and the preset rule, the amount of computation shrinks as the optimization proceeds, improving task scheduling efficiency.
The foregoing describes embodiments of the present invention in detail, but the disclosure is only a preferred embodiment of the invention and should not be construed as limiting its scope. All equivalent changes and modifications within the scope of the present invention are intended to be covered by it.

Claims (6)

1. A cloud platform task scheduling method based on reinforcement learning, characterized by comprising the following steps:
judging whether a scheduling-level task set exists in a preset database, and if it does not, randomly ordering the tasks in the scheduling task set to obtain the task execution tracks of each task under different conditions; the scheduling-level task set is a task set obtained by classifying the tasks in the cloud platform according to task level;
acquiring the effective tracks among the task execution tracks to generate a target matrix; an effective track is one under which every task can start executing at its task start time and finish before its task end time; the vertical axis of the target matrix is the execution order of the tasks in one effective track; each horizontal axis of the target matrix holds the task that each effective track places at the same position;
acquiring the occupation ratio of each task in each horizontal axis of the target matrix, and optimizing the tasks in each horizontal axis according to a preset rule and their occupation ratios to obtain the scheduling strategy corresponding to each horizontal axis of the target matrix;
scheduling each task in the scheduling task set according to the scheduling strategy;
wherein the preset rule specifically comprises:
acquiring the occupation ratio of each task in each horizontal axis of the target matrix, and taking the task with the largest occupation ratio in the current horizontal axis as the preprocessing task;
obtaining the target task as the preprocessing task with the largest occupation ratio along the vertical axis of the target matrix;
if the occupation ratio of the target task at the current position is larger than the sum of its ratios at all other positions, recording the current position as the scheduling strategy of the target task and locking the horizontal axis corresponding to the target task; the lock fixes the position of the target task in the current horizontal axis.
2. The reinforcement-learning-based cloud platform task scheduling method of claim 1, wherein determining whether a task set identical to the scheduling-level task set exists in the preset database specifically comprises:
acquiring, from the preset database, the task sets of the same level as the scheduling-level task set, and searching among them, according to the total task-processing time slice of the scheduling-level task set, for a task set spanning the same time interval; the total task-processing time slice of the scheduling-level task set is determined by the start time and end time of each task in the scheduling-level task set;
if such a task set exists, traversing the tasks in that task set and comparing them with the tasks in the scheduling-level task set; each task comparison is determined by the start time and end time of the task;
if every task in the scheduling-level task set can be successfully matched in that task set, without repetition, determining that a task set identical to the scheduling-level task set exists in the preset database;
otherwise, determining that no task set identical to the scheduling-level task set exists in the preset database;
the preset database stores task sets for which task scheduling has been completed, including the task level, the completion time, the start time and end time of each scheduled task, and the task scheduling policy corresponding to each scheduled task.
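Under the comparison criteria of claim 2 (same level, same total time slice, one-to-one match on start/end times), a lookup might proceed as below. The record structure and field names (`level`, `start`, `end`) are assumptions for illustration only:

```python
def find_identical_task_set(database, sched_set):
    """database: list of completed task sets; each task set is a list of
    dicts with 'level', 'start', 'end'. Returns the stored task set whose
    (start, end) pairs match the scheduling-level set one-to-one, else None."""
    level = sched_set[0]["level"]
    time_slice = (min(t["start"] for t in sched_set),
                  max(t["end"] for t in sched_set))
    target = sorted((t["start"], t["end"]) for t in sched_set)
    for stored in database:
        if stored[0]["level"] != level:
            continue  # only compare sets of the same level
        if (min(t["start"] for t in stored),
                max(t["end"] for t in stored)) != time_slice:
            continue  # total task-processing time slice must match
        if sorted((t["start"], t["end"]) for t in stored) == target:
            return stored  # its stored policies can be reused (claim 3)
    return None
```

Sorting the `(start, end)` pairs gives the "matched without repetition" check: every task must be accounted for exactly once.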
3. The reinforcement-learning-based cloud platform task scheduling method of claim 2, wherein, if a task set identical to the scheduling-level task set exists in the preset database, task scheduling is performed according to the stored task scheduling policy of each task in that task set.
4. The reinforcement-learning-based cloud platform task scheduling method of claim 1, wherein obtaining the target task according to the largest occupancy ratio of each preprocessing task along the vertical axis of the target matrix further comprises:
if the occupancy ratio of the target task at its current position is less than the sum of its ratios at the other positions, performing a target-position replacement on the position of the target task in the current row, with probability given by the target task's occupancy ratio; the target-position replacement exchanges the target task, within the trajectory (vertical axis) to which it belongs, with a position in the current row not occupied by the target task;
if the execution trajectory corresponding to the replaced target task is still a valid trajectory, storing that valid trajectory;
repeating the above steps until the occupancy ratio of the target task in the current row of the target matrix is greater than the sum of its ratios at the other positions, and recording the current position as the scheduling policy of the target task;
and performing the position replacements of the executed tasks in descending order of the occupancy ratios of the preprocessing tasks in the rows of the target matrix.
5. The reinforcement-learning-based cloud platform task scheduling method of claim 4, wherein, after performing the target-position replacement, with the occupancy-ratio probability of the target task, on a position in the current row not occupied by the target task, the method further comprises:
if the execution trajectory corresponding to the replaced target task is an invalid trajectory, restoring the trajectory to its state before the replacement, and deleting that position of the target task along the vertical axis of the target matrix; an invalid trajectory is a trajectory under which some task cannot begin execution at its start time or cannot finish before its end time.
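One possible sketch of the replacement-and-restore step of claims 4 and 5, under the assumptions that tasks run sequentially and that a task occupies exactly its [start, end] window; the names `windows`, `try_replace`, and the weighting scheme are illustrative, not the patent's own definitions:

```python
import random

def is_valid(trajectory, windows):
    """windows: task id -> (start, end). One reading of 'valid trajectory':
    the executor must be free at each task's start time, and the task then
    runs until its end time."""
    free_at = 0
    for tid in trajectory:
        start, end = windows[tid]
        if free_at > start:   # executor still busy past the task's start time
            return False
        free_at = end         # task occupies [start, end]
    return True

def try_replace(trajectory, target_task, position_weights, windows, rng=random):
    """Swap the target task to another position, chosen with probability
    proportional to its row occupancy ratios; if the resulting trajectory
    is invalid, the pre-replacement trajectory is restored (claim 5)."""
    cur = trajectory.index(target_task)
    others = [i for i in range(len(trajectory)) if i != cur]
    weights = [position_weights[i] for i in others]
    new = (rng.choices(others, weights=weights)[0]
           if sum(weights) > 0 else rng.choice(others))
    candidate = trajectory[:]
    candidate[cur], candidate[new] = candidate[new], candidate[cur]
    if is_valid(candidate, windows):
        return candidate      # store the new valid trajectory
    return trajectory         # restore the path before the replacement
```

With non-overlapping windows such as A:[0,1], B:[1,2], C:[2,3], any swap produces an invalid trajectory, so `try_replace` always restores the original order.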
6. A reinforcement-learning-based cloud platform task scheduling device, the device comprising a preprocessing module, a task matrix module, a scheduling policy allocation module, and a task scheduling module, wherein:
the preprocessing module is configured to determine whether a task set identical to the scheduling-level task set exists in a preset database, and if no such task set exists, to randomly order the tasks in the scheduling task set to obtain task execution trajectories of the tasks under different orderings; the scheduling-level task set is a task set obtained by classifying the tasks in the cloud platform according to task level;
the task matrix module is configured to acquire the valid trajectories among the task execution trajectories to generate a target matrix; a valid trajectory is a trajectory under which every task can begin execution at its start time and finish before its end time; the vertical axis of the target matrix is the execution order of the tasks within one valid trajectory; the horizontal axis of the target matrix collects the tasks that occupy the same position in each valid trajectory;
the scheduling policy allocation module is configured to acquire the occupancy ratio of each task along each horizontal axis (row) of the target matrix, and to optimize the tasks in each row according to a preset rule based on those occupancy ratios, to obtain a scheduling policy corresponding to each row of the target matrix;
the task scheduling module is configured to perform, according to the scheduling policy, task scheduling for the tasks in the scheduling task set;
the preset rule specifically comprises:
acquiring the occupancy ratio of each task in a row of the target matrix, and taking the task with the largest occupancy ratio in the current row as a preprocessing task;
obtaining a target task according to the largest occupancy ratio of each preprocessing task along the vertical axis of the target matrix;
if the occupancy ratio of the target task at its current position is greater than the sum of its ratios at all other positions, recording the current position as the scheduling policy of the target task and locking the row corresponding to the target task; locking fixes the position of the target task in the current row.
CN202311659700.7A 2023-12-06 2023-12-06 Cloud platform task scheduling method and device based on reinforcement learning Active CN117971411B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311659700.7A CN117971411B (en) 2023-12-06 2023-12-06 Cloud platform task scheduling method and device based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311659700.7A CN117971411B (en) 2023-12-06 2023-12-06 Cloud platform task scheduling method and device based on reinforcement learning

Publications (2)

Publication Number Publication Date
CN117971411A CN117971411A (en) 2024-05-03
CN117971411B true CN117971411B (en) 2024-08-06

Family

ID=90854366

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311659700.7A Active CN117971411B (en) 2023-12-06 2023-12-06 Cloud platform task scheduling method and device based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN117971411B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210081787A1 (en) * 2019-09-12 2021-03-18 Beijing University Of Posts And Telecommunications Method and apparatus for task scheduling based on deep reinforcement learning, and device
CN114756358A (en) * 2022-06-15 2022-07-15 苏州浪潮智能科技有限公司 DAG task scheduling method, device, equipment and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109857534A (en) * 2019-02-12 2019-06-07 浙江方正印务有限公司 A kind of intelligent task scheduling strategy training method based on Policy-Gradient Reinforcement Learning
US20210256313A1 (en) * 2020-02-19 2021-08-19 Google Llc Learning policies using sparse and underspecified rewards


Also Published As

Publication number Publication date
CN117971411A (en) 2024-05-03


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant