CN117971411B - Cloud platform task scheduling method and device based on reinforcement learning - Google Patents


Info

Publication number
CN117971411B
Authority
CN
China
Prior art keywords
task
scheduling
target
tasks
target matrix
Prior art date
Legal status
Active
Application number
CN202311659700.7A
Other languages
Chinese (zh)
Other versions
CN117971411A (en)
Inventor
王月虎
刘军
谭仲春
丁军军
邱明灏
包祥文
韩峰
陶军
郑翔
王超
Current Assignee
Nanjing University of Finance and Economics
Original Assignee
Nanjing University of Finance and Economics
Priority date
Filing date
Publication date
Application filed by Nanjing University of Finance and Economics
Priority to CN202311659700.7A
Publication of CN117971411A
Application granted
Publication of CN117971411B


Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 — Arrangements for program control, e.g. control units
    • G06F 9/06 — Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 — Multiprogramming arrangements
    • G06F 9/48 — Program initiating; program switching, e.g. by interrupt
    • G06F 9/4806 — Task transfer initiation or dispatching
    • G06F 9/4843 — Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F 9/4881 — Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06Q — INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 50/00 — Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q 50/10 — Services
    • G06Q 50/20 — Education
    • G06Q 50/205 — Education administration or guidance
    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D — CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention discloses a cloud platform task scheduling method and device based on reinforcement learning, relating to the technical field of reinforcement learning. By acquiring the effective tracks among the task execution tracks to generate a target matrix, the execution time and result of each task can be predicted more accurately; by generating and then optimizing the target matrix, the optimal scheduling strategy can be found quickly. This avoids designing a reward directly for every scheme generated by scheduling each task under multitasking, and therefore does not occupy a large amount of system resources. Because the tasks in each horizontal axis of the target matrix are optimized according to their occupation ratios and a preset rule, the amount of computation shrinks as the optimization proceeds, improving task scheduling efficiency.

Description

Cloud platform task scheduling method and device based on reinforcement learning
Technical Field
The invention belongs to the technical field of reinforcement learning, and particularly relates to a cloud platform task scheduling method and device based on reinforcement learning.
Background
Task scheduling refers to the process by which a system executes a task at an agreed time in order to complete a specific task automatically, freeing up manual effort. Task scheduling can automatically execute tasks based on a given point in time, a given time interval, or a given number of executions.
Reinforcement learning (RL) is one of the paradigms and methodologies of machine learning. It describes and solves the problem of an agent learning a strategy, during its interactions with an environment, so as to maximize return or achieve a specific goal. Reinforcement learning is a trial-and-error approach that aims to let a software agent take the actions that maximize return in a particular environment.
The mainstream of reinforcement learning is reward-based: its optimization goal is the long-term accumulated reward. Setting a good reward function usually requires a great deal of expertise, and whether a proposed reward function hinders the learning process must also be considered, so reward design is difficult. Current solutions collect samples by crafting rules, or perform imitation learning from given cases, but in some situations such rewards are not well defined. In task scheduling, for example, most workloads are multi-task, and as the number of tasks grows, the number of candidate scheduling schemes grows exponentially. If a reward were designed for each scheme, the amount of computation would be enormous: it would occupy a large amount of system resources, cause the cloud platform's task scheduling to stall, and make cloud platform task scheduling inefficient.
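The exponential growth described above can be made concrete: with n tasks scheduled sequentially there are n! candidate orderings, so per-scheme reward design quickly becomes intractable. A minimal illustration (the task counts are hypothetical):

```python
from math import factorial

# Each ordering of n tasks is one candidate scheduling scheme; designing a
# reward for every scheme means evaluating n! schemes.
for n in (3, 5, 8, 10):
    print(f"{n} tasks -> {factorial(n)} candidate schemes")
```

Already at ten tasks there are over three million orderings, which motivates avoiding per-scheme reward design.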
Disclosure of Invention
The invention aims to solve the problems that designing a reward for every scheme entails an enormous amount of computation, occupies a large amount of system resources, causes cloud platform task scheduling to stall, and makes cloud platform task scheduling inefficient, and provides a cloud platform task scheduling method and device based on reinforcement learning.
In a first aspect of the present invention, a method for scheduling a task of a cloud platform based on reinforcement learning is provided, where the method includes:
judging whether a scheduling-level task set exists in a preset database, and if it does not, randomly ordering the tasks in the scheduling task set to obtain the task execution tracks of each task under different conditions; the scheduling-level task set is a task set obtained by classifying the tasks in the cloud platform according to task level;
acquiring the effective tracks among the task execution tracks to generate a target matrix; an effective track is one under which every task can start executing at its task start time and finish before its task end time; the vertical axis of the target matrix is the execution order of the tasks in one effective track; each horizontal axis of the target matrix holds the task that each effective track places at the same position;
acquiring the occupation ratio of each task in each horizontal axis of the target matrix, and optimizing the tasks in each horizontal axis according to a preset rule and their occupation ratios to obtain the scheduling strategy corresponding to each horizontal axis of the target matrix;
and scheduling each task in the scheduling task set according to the scheduling strategy.
Optionally, judging whether the scheduling-level task set exists in the preset database specifically includes:
acquiring the task sets in the preset database whose level equals that of the scheduling-level task set, and searching among them for a task set whose time interval equals the total task-processing time slice of the scheduling-level task set; the total task-processing time slice of the scheduling-level task set is determined by the start time and end time of each task in it;
if such a task set exists, traversing its tasks and comparing them with those of the scheduling-level task set; task comparison is determined by the start time and end time of each task;
if every task in the scheduling-level task set can be matched in the task set, with no task matched twice, judging that a task set identical to the scheduling-level task set exists in the preset database;
otherwise, judging that the scheduling-level task set does not exist in the preset database.
The preset database is used to store task sets whose task scheduling has been completed, including each set's task level and completion time, and the start time, end time and task scheduling strategy of each scheduled task.
Optionally, if a task set identical to the scheduling-level task set exists in the preset database, task scheduling is performed according to the task scheduling strategy of each task in that task set.
Optionally, in acquiring the occupation ratio of each task in each horizontal axis of the target matrix and optimizing those tasks according to their occupation ratios, the preset rule specifically includes:
acquiring the occupation ratio of each task in each horizontal axis of the target matrix, and taking the task with the largest occupation ratio in the current horizontal axis as the preprocessing task;
obtaining the target task as the preprocessing task with the largest occupation ratio along the vertical axis of the target matrix;
if the occupation ratio of the target task at the current position is larger than the sum of its ratios at all other positions, recording the current position as the scheduling strategy of the target task and locking the horizontal axis corresponding to the target task; the lock fixes the position of the target task in the current horizontal axis.
Optionally, after obtaining the target task from the largest occupation ratio of each preprocessing task along the vertical axis of the target matrix, the method further includes:
if the occupation ratio of the target task at the current position is smaller than the sum of its ratios at the other positions, performing target-position replacement on the positions in the current horizontal axis not occupied by the target task, with probability given by the target task's occupation ratio; target-position replacement means exchanging, within the vertical axis (track) corresponding to a non-target-task position in the current horizontal axis, that task with the target task;
if the execution track corresponding to the target task after replacement is still an effective track, saving that effective track;
repeating the above steps until the occupation ratio of the target task in the current horizontal axis of the target matrix is larger than the sum at the other positions, and recording the current position as the scheduling strategy of the target task;
replacement is applied to the positions in order of the occupation ratio of each preprocessing task in the horizontal axes of the target matrix, from largest to smallest.
Optionally, after performing target-position replacement, with probability given by the target task's occupation ratio, on the positions in the current horizontal axis not occupied by the target task, the method further includes:
if the execution track corresponding to the target task after replacement is a non-effective track, restoring the replaced track and deleting that position of the target task from the vertical axis of the target matrix; a non-effective track is one under which some task cannot start executing at its task start time or cannot finish before its task end time.
In a second aspect of the present invention, a cloud platform task scheduling device based on reinforcement learning is provided, including a preprocessing module, a task matrix module, a scheduling strategy allocation module and a task scheduling module:
the preprocessing module is used to judge whether a scheduling-level task set exists in a preset database and, if not, to randomly order the tasks in the scheduling task set to obtain the task execution tracks of each task under different conditions; the scheduling-level task set is a task set obtained by classifying the tasks in the cloud platform according to task level;
the task matrix module is used to acquire the effective tracks among the task execution tracks and generate a target matrix; an effective track is one under which every task can start executing at its task start time and finish before its task end time; the vertical axis of the target matrix is the execution order of the tasks in one effective track; each horizontal axis of the target matrix holds the task that each effective track places at the same position;
the scheduling strategy allocation module is used to acquire the occupation ratio of each task in each horizontal axis of the target matrix and to optimize those tasks according to a preset rule and their occupation ratios, obtaining the scheduling strategy corresponding to each horizontal axis of the target matrix;
and the task scheduling module is used to schedule the tasks in the scheduling task set according to the scheduling strategy.
The invention has the beneficial effects that:
The invention provides a cloud platform task scheduling method based on reinforcement learning. By acquiring the effective tracks among the task execution tracks to generate a target matrix, the execution time and result of each task can be predicted more accurately; by generating and then optimizing the target matrix, the optimal scheduling strategy can be found quickly. This avoids designing a reward directly for every scheme generated by scheduling each task under multitasking, and therefore does not occupy a large amount of system resources; and because the tasks in each horizontal axis of the target matrix are optimized according to their occupation ratios and a preset rule, the amount of computation shrinks as the optimization proceeds, improving task scheduling efficiency.
Drawings
The invention is further described below with reference to the accompanying drawings.
FIG. 1 is a flowchart of a cloud platform task scheduling method based on reinforcement learning according to an embodiment of the present invention;
fig. 2 is a frame diagram of a cloud platform task scheduling device based on reinforcement learning according to an embodiment of the present invention.
Detailed Description
The following clearly and completely describes the embodiments of the present invention with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art from these embodiments without inventive effort fall within the scope of the invention.
The embodiment of the invention provides a cloud platform task scheduling method based on reinforcement learning. Referring to fig. 1, fig. 1 is a flowchart of a cloud platform task scheduling method based on reinforcement learning according to an embodiment of the present invention. The method comprises the following steps:
S101, judging whether a scheduling-level task set exists in a preset database.
S102, if the scheduling-level task set does not exist in the preset database, randomly ordering the tasks in the scheduling task set to obtain the task execution tracks of each task under different conditions.
S103, acquiring the effective tracks among the task execution tracks to generate a target matrix.
An effective track is one under which every task can start executing at its task start time and finish before its task end time. The vertical axis of the target matrix is the execution order of the tasks in one effective track; each horizontal axis of the target matrix holds the task that each effective track places at the same position.
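Steps S102–S103 can be sketched as follows, under stated assumptions: tasks are modelled as running sequentially, each with a `duration` field the patent does not name, and `Task`, `is_valid_track` and `build_target_matrix` are illustrative helpers, not the patent's implementation:

```python
import itertools
from dataclasses import dataclass

@dataclass(frozen=True)
class Task:
    name: str
    start: int     # task start time (earliest it may begin)
    end: int       # task end time (deadline)
    duration: int  # processing time (assumed field, not named in the patent)

def is_valid_track(order):
    """A track is effective if, executing the tasks sequentially, every task
    begins no earlier than its start time and ends before its end time."""
    t = 0
    for task in order:
        t = max(t, task.start)  # the task can start executing at its start time
        t += task.duration
        if t > task.end:        # it must end before its end time
            return False
    return True

def build_target_matrix(tasks):
    """Columns are effective tracks; each row (horizontal axis) holds the task
    that every effective track places at the same position (vertical axis)."""
    tracks = [p for p in itertools.permutations(tasks) if is_valid_track(p)]
    return [[trk[pos].name for trk in tracks] for pos in range(len(tasks))]
```

With three staggered one-unit tasks (windows [0,2], [1,3], [2,4]) only the in-order track survives, so the target matrix has a single column.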
S104, acquiring the occupation ratio of each task in each horizontal axis of the target matrix, and optimizing the tasks in each horizontal axis according to their occupation ratios and a preset rule to obtain the scheduling strategy corresponding to each horizontal axis of the target matrix.
S105, scheduling each task in the scheduling task set according to the scheduling strategy.
According to the cloud platform task scheduling method based on reinforcement learning, generating the target matrix from the effective tracks among the task execution tracks allows the execution time and result of each task to be predicted more accurately, and generating and then optimizing the target matrix allows the optimal scheduling strategy to be found quickly. This avoids designing a reward directly for every scheme generated by scheduling each task under multitasking, and so does not occupy a large amount of system resources; and because the tasks in each horizontal axis of the target matrix are optimized according to their occupation ratios and the preset rule, the amount of computation shrinks as the optimization proceeds, improving task scheduling efficiency.
In one implementation, the target matrix is generated according to the preset rule and the task execution tracks, and the scheduling strategy is obtained by optimizing the target matrix, so that each task in the task set is scheduled effectively and execution efficiency is improved.
In one implementation, the preset database is used to store task sets whose task scheduling has been completed, including each set's task level and completion time, and the start time, end time and task scheduling strategy of each scheduled task.
In one implementation, if every task in the scheduling-level task set can be scheduled and belongs to an effective track, the start time and end time of each task in the scheduling-level task set and the task scheduling strategy of each scheduled task are stored in the preset database.
In one embodiment, judging whether the scheduling-level task set exists in the preset database specifically includes:
acquiring the task sets in the preset database whose level equals that of the scheduling-level task set, and searching among them for a task set whose time interval equals the total task-processing time slice of the scheduling-level task set;
if such a task set exists, traversing its tasks and comparing them with those of the scheduling-level task set; task comparison is determined by the start time and end time of each task;
if every task in the scheduling-level task set can be matched in the task set, with no task matched twice, judging that a task set identical to the scheduling-level task set exists in the preset database;
otherwise, judging that the scheduling-level task set does not exist in the preset database.
In one implementation, the total task-processing time slice of a scheduling-level task set is determined by the start time and end time of each task in the set.
In one embodiment, if a task set identical to the scheduling-level task set exists in the preset database, task scheduling is performed according to the task scheduling strategy of each task in that task set.
In one implementation, if the scheduling-level task set already exists in the preset database, the tasks have already been scheduled with a definite priority and order, and task execution can proceed directly according to the existing scheduling strategy; this avoids repeated scheduling and improves execution efficiency.
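The database lookup of this embodiment can be sketched as follows, assuming the preset database is a list of records with `level` and `tasks` fields, and that the total task-processing time slice is the interval from the earliest start to the latest end (all field names are illustrative, not the patent's schema):

```python
def total_time_slice(tasks):
    # assumed definition: the slice spans the earliest start to the latest end
    return (min(t["start"] for t in tasks), max(t["end"] for t in tasks))

def find_stored_set(database, level, task_set):
    """Return the stored record whose task set is identical to `task_set`
    (same level, same total time slice, one-to-one match on start/end times
    with no task matched twice), or None if no such set exists."""
    signature = sorted((t["start"], t["end"]) for t in task_set)
    for record in database:
        if record["level"] != level:
            continue
        stored = record["tasks"]
        if total_time_slice(stored) != total_time_slice(task_set):
            continue
        # sorting gives a one-to-one comparison: no task is matched twice
        if sorted((t["start"], t["end"]) for t in stored) == signature:
            return record  # its stored scheduling strategies can be reused
    return None
```

On a hit, the caller schedules directly from the record's stored strategies, skipping matrix generation entirely.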
In one embodiment, in acquiring the occupation ratio of each task in each horizontal axis of the target matrix and optimizing those tasks according to their occupation ratios, the preset rule specifically includes:
acquiring the occupation ratio of each task in each horizontal axis of the target matrix, and taking the task with the largest occupation ratio in the current horizontal axis as the preprocessing task;
obtaining the target task as the preprocessing task with the largest occupation ratio along the vertical axis of the target matrix;
if the occupation ratio of the target task at the current position is larger than the sum of its ratios at all other positions, recording the current position as the scheduling strategy of the target task and locking the horizontal axis corresponding to the target task; the lock fixes the position of the target task in the current horizontal axis.
In one implementation, locking the horizontal axis corresponding to the target task avoids repeated scheduling and unnecessary task conflicts, ensuring accurate and efficient task execution.
In one implementation, the target task is locked according to its scheduling strategy, ensuring that it keeps the maximum occupation ratio at the specific position; this optimizes the task scheduling strategy and improves execution efficiency.
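The ratio computation and locking test can be sketched as follows. This is a simplified reading of the rule: "larger than the sum of the other positions" is taken over the target task's placements, which is equivalent to the task holding more than half of them; the helper names are illustrative:

```python
from collections import Counter

def occupation_ratios(row):
    """Occupation ratio of each task within one horizontal axis (row)."""
    counts = Counter(row)
    return {task: c / len(row) for task, c in counts.items()}

def pick_target_task(matrix):
    """Per row, take the most frequent task as the preprocessing task; then
    return the (position, task) whose occupation ratio is largest overall."""
    candidates = []
    for pos, row in enumerate(matrix):
        ratios = occupation_ratios(row)
        task = max(ratios, key=ratios.get)   # preprocessing task of this row
        candidates.append((ratios[task], pos, task))
    ratio, pos, task = max(candidates)       # largest ratio along the vertical axis
    return pos, task

def can_lock(matrix, pos, task):
    """True iff the task's count at `pos` exceeds the sum of its counts at
    all other positions, i.e. it holds more than half of its placements."""
    here = matrix[pos].count(task)
    elsewhere = sum(row.count(task) for i, row in enumerate(matrix) if i != pos)
    return here > elsewhere
```

When `can_lock` holds, the position is recorded as the task's scheduling strategy and the row is frozen, shrinking the search for the remaining rows.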
In one embodiment, after obtaining the target task from the largest occupation ratio of each preprocessing task along the vertical axis of the target matrix, the method further includes:
if the occupation ratio of the target task at the current position is smaller than the sum of its ratios at the other positions, performing target-position replacement on the positions in the current horizontal axis not occupied by the target task, with probability given by the target task's occupation ratio;
if the execution track corresponding to the target task after replacement is still an effective track, saving that effective track;
repeating the above steps until the occupation ratio of the target task in the current horizontal axis of the target matrix is larger than the sum at the other positions, and recording the current position as the scheduling strategy of the target task;
replacement is applied to the positions in order of the occupation ratio of each preprocessing task in the horizontal axes of the target matrix, from largest to smallest.
In one implementation, target-position replacement means exchanging, within the vertical axis (track) corresponding to a non-target-task position in the current horizontal axis, that task with the target task.
In one implementation, non-target tasks are replaced according to the occupation-ratio probability, the validity of each replaced track is checked, and the process is repeated until the occupation ratio of the target task exceeds the sum at the other positions. This optimizes the task scheduling strategy, improves execution efficiency, and better balances the execution order among different tasks, making task scheduling more reasonable and efficient.
In one embodiment, after performing target-position replacement, with probability given by the target task's occupation ratio, on the positions in the current horizontal axis not occupied by the target task, the method further includes:
if the execution track corresponding to the target task after replacement is a non-effective track, restoring the replaced track and deleting that position of the target task from the vertical axis of the target matrix.
A non-effective track is one under which some task cannot start executing at its task start time or cannot finish before its task end time.
In one implementation, deleting that position of the target task from the vertical axis of the target matrix prevents the determination of the target task's position from interfering with other tasks, improving efficiency.
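The replacement loop of this embodiment can be sketched as follows, with simplifying assumptions flagged in the comments: columns of the matrix are tracks, a caller-supplied `is_valid_column` re-checks a track after a swap (standing in for the effective-track test), replacement columns are drawn uniformly rather than by the exact occupation-ratio probability, and the "delete the position" step is reduced to restoring the swap:

```python
import random

def has_majority(matrix, pos, task):
    """The target task's count at `pos` exceeds the sum at other positions."""
    here = matrix[pos].count(task)
    return here > sum(row.count(task) for i, row in enumerate(matrix) if i != pos)

def consolidate(matrix, pos, task, is_valid_column, rng=None, max_tries=1000):
    """Swap the target task into position `pos` of more columns until it
    holds the majority there; undo any swap yielding a non-effective track.
    Assumes every column (track) contains the target task exactly once."""
    rng = rng or random.Random(0)
    for _ in range(max_tries):
        if has_majority(matrix, pos, task):
            return True
        candidates = [c for c in range(len(matrix[pos])) if matrix[pos][c] != task]
        if not candidates:
            return False
        c = rng.choice(candidates)
        r = next(i for i, row in enumerate(matrix) if row[c] == task)
        matrix[pos][c], matrix[r][c] = matrix[r][c], matrix[pos][c]  # the swap
        column = [matrix[i][c] for i in range(len(matrix))]
        if not is_valid_column(column):
            # non-effective track: restore the replaced track
            matrix[pos][c], matrix[r][c] = matrix[r][c], matrix[pos][c]
    return has_majority(matrix, pos, task)
```

Once `consolidate` returns True, the position is recorded as the target task's scheduling strategy and its row can be locked.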
Based on the same inventive concept, an embodiment of the invention further provides a cloud platform task scheduling device based on reinforcement learning. Referring to fig. 2, fig. 2 is a frame diagram of a cloud platform task scheduling device based on reinforcement learning according to an embodiment of the present invention. The device includes a preprocessing module, a task matrix module, a scheduling strategy allocation module and a task scheduling module:
the preprocessing module is used to judge whether a scheduling-level task set exists in a preset database and, if not, to randomly order the tasks in the scheduling task set to obtain the task execution tracks of each task under different conditions; the scheduling-level task set is a task set obtained by classifying the tasks in the cloud platform according to task level;
the task matrix module is used to acquire the effective tracks among the task execution tracks and generate a target matrix; an effective track is one under which every task can start executing at its task start time and finish before its task end time; the vertical axis of the target matrix is the execution order of the tasks in one effective track; each horizontal axis of the target matrix holds the task that each effective track places at the same position;
the scheduling strategy allocation module is used to acquire the occupation ratio of each task in each horizontal axis of the target matrix and to optimize those tasks according to a preset rule and their occupation ratios, obtaining the scheduling strategy corresponding to each horizontal axis of the target matrix;
and the task scheduling module is used to schedule the tasks in the scheduling task set according to the scheduling strategy.
According to the cloud platform task scheduling device based on reinforcement learning, the preprocessing module obtains the task execution tracks and the task matrix module generates the target matrix from the effective tracks among them, so the execution time and result of each task can be predicted more accurately; generating and then optimizing the target matrix allows the optimal scheduling strategy to be found quickly, which avoids designing a reward directly for every scheme generated by scheduling each task under multitasking and therefore does not occupy a large amount of system resources; and because the tasks in each horizontal axis of the target matrix are optimized according to their occupation ratios and the preset rule, the amount of computation shrinks as the optimization proceeds, improving task scheduling efficiency.
The foregoing describes embodiments of the present invention in detail, but the disclosure is only a preferred embodiment of the invention and should not be construed as limiting its scope. All equivalent changes and modifications within the scope of the present invention are intended to be covered by it.

Claims (6)

1. A cloud platform task scheduling method based on reinforcement learning, characterized by comprising the following steps:
judging whether a scheduling-level task set exists in a preset database, and if it does not, randomly ordering the tasks in the scheduling task set to obtain the task execution tracks of each task under different conditions; the scheduling-level task set is a task set obtained by classifying the tasks in the cloud platform according to task level;
acquiring the effective tracks among the task execution tracks to generate a target matrix; an effective track is one under which every task can start executing at its task start time and finish before its task end time; the vertical axis of the target matrix is the execution order of the tasks in one effective track; each horizontal axis of the target matrix holds the task that each effective track places at the same position;
acquiring the occupation ratio of each task in each horizontal axis of the target matrix, and optimizing the tasks in each horizontal axis according to a preset rule and their occupation ratios to obtain the scheduling strategy corresponding to each horizontal axis of the target matrix;
scheduling each task in the scheduling task set according to the scheduling strategy;
wherein the preset rule specifically comprises:
acquiring the occupation ratio of each task in each horizontal axis of the target matrix, and taking the task with the largest occupation ratio in the current horizontal axis as the preprocessing task;
obtaining the target task as the preprocessing task with the largest occupation ratio along the vertical axis of the target matrix;
if the occupation ratio of the target task at the current position is larger than the sum of its ratios at all other positions, recording the current position as the scheduling strategy of the target task and locking the horizontal axis corresponding to the target task; the lock fixes the position of the target task in the current horizontal axis.
2. The reinforcement-learning-based cloud platform task scheduling method of claim 1, wherein determining whether a task set identical to the scheduling-level task set exists in the preset database specifically comprises:
acquiring, from the preset database, the task sets of the same level as the scheduling-level task set, and searching among them, according to the total task-processing time slice of the scheduling-level task set, for a task set spanning the same time interval; the total task-processing time slice of the scheduling-level task set is determined by the start time and end time of each task in the scheduling-level task set;
if such a task set exists, traversing the tasks in that task set and comparing them with the tasks in the scheduling-level task set; each task comparison is determined by the start time and end time of the task;
if every task in the scheduling-level task set can be successfully matched in that task set, without repetition, determining that a task set identical to the scheduling-level task set exists in the preset database;
otherwise, determining that no task set identical to the scheduling-level task set exists in the preset database;
the preset database stores task sets for which task scheduling has been completed, including the task level, the completion time, the start time and end time of each scheduled task, and the task scheduling policy corresponding to each scheduled task.
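Under the comparison criteria of claim 2 (same level, same total time slice, one-to-one match on start/end times), a lookup might proceed as below. The record structure and field names (`level`, `start`, `end`) are assumptions for illustration only:

```python
def find_identical_task_set(database, sched_set):
    """database: list of completed task sets; each task set is a list of
    dicts with 'level', 'start', 'end'. Returns the stored task set whose
    (start, end) pairs match the scheduling-level set one-to-one, else None."""
    level = sched_set[0]["level"]
    time_slice = (min(t["start"] for t in sched_set),
                  max(t["end"] for t in sched_set))
    target = sorted((t["start"], t["end"]) for t in sched_set)
    for stored in database:
        if stored[0]["level"] != level:
            continue  # only compare sets of the same level
        if (min(t["start"] for t in stored),
                max(t["end"] for t in stored)) != time_slice:
            continue  # total task-processing time slice must match
        if sorted((t["start"], t["end"]) for t in stored) == target:
            return stored  # its stored policies can be reused (claim 3)
    return None
```

Sorting the `(start, end)` pairs gives the "matched without repetition" check: every task must be accounted for exactly once.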
3. The reinforcement-learning-based cloud platform task scheduling method of claim 2, wherein, if a task set identical to the scheduling-level task set exists in the preset database, task scheduling is performed according to the stored task scheduling policy of each task in that task set.
4. The reinforcement-learning-based cloud platform task scheduling method of claim 1, wherein obtaining the target task according to the largest occupancy ratio of each preprocessing task along the vertical axis of the target matrix further comprises:
if the occupancy ratio of the target task at its current position is less than the sum of its ratios at the other positions, performing a target-position replacement on the position of the target task in the current row, with probability given by the target task's occupancy ratio; the target-position replacement exchanges the target task, within the trajectory (vertical axis) to which it belongs, with a position in the current row not occupied by the target task;
if the execution trajectory corresponding to the replaced target task is still a valid trajectory, storing that valid trajectory;
repeating the above steps until the occupancy ratio of the target task in the current row of the target matrix is greater than the sum of its ratios at the other positions, and recording the current position as the scheduling policy of the target task;
and performing the position replacements of the executed tasks in descending order of the occupancy ratios of the preprocessing tasks in the rows of the target matrix.
5. The reinforcement-learning-based cloud platform task scheduling method of claim 4, wherein, after performing the target-position replacement, with the occupancy-ratio probability of the target task, on a position in the current row not occupied by the target task, the method further comprises:
if the execution trajectory corresponding to the replaced target task is an invalid trajectory, restoring the trajectory to its state before the replacement, and deleting that position of the target task along the vertical axis of the target matrix; an invalid trajectory is a trajectory under which some task cannot begin execution at its start time or cannot finish before its end time.
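One possible sketch of the replacement-and-restore step of claims 4 and 5, under the assumptions that tasks run sequentially and that a task occupies exactly its [start, end] window; the names `windows`, `try_replace`, and the weighting scheme are illustrative, not the patent's own definitions:

```python
import random

def is_valid(trajectory, windows):
    """windows: task id -> (start, end). One reading of 'valid trajectory':
    the executor must be free at each task's start time, and the task then
    runs until its end time."""
    free_at = 0
    for tid in trajectory:
        start, end = windows[tid]
        if free_at > start:   # executor still busy past the task's start time
            return False
        free_at = end         # task occupies [start, end]
    return True

def try_replace(trajectory, target_task, position_weights, windows, rng=random):
    """Swap the target task to another position, chosen with probability
    proportional to its row occupancy ratios; if the resulting trajectory
    is invalid, the pre-replacement trajectory is restored (claim 5)."""
    cur = trajectory.index(target_task)
    others = [i for i in range(len(trajectory)) if i != cur]
    weights = [position_weights[i] for i in others]
    new = (rng.choices(others, weights=weights)[0]
           if sum(weights) > 0 else rng.choice(others))
    candidate = trajectory[:]
    candidate[cur], candidate[new] = candidate[new], candidate[cur]
    if is_valid(candidate, windows):
        return candidate      # store the new valid trajectory
    return trajectory         # restore the path before the replacement
```

With non-overlapping windows such as A:[0,1], B:[1,2], C:[2,3], any swap produces an invalid trajectory, so `try_replace` always restores the original order.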
6. A reinforcement-learning-based cloud platform task scheduling device, the device comprising a preprocessing module, a task matrix module, a scheduling policy allocation module, and a task scheduling module, wherein:
the preprocessing module is configured to determine whether a task set identical to the scheduling-level task set exists in a preset database, and if no such task set exists, to randomly order the tasks in the scheduling task set to obtain task execution trajectories of the tasks under different orderings; the scheduling-level task set is a task set obtained by classifying the tasks in the cloud platform according to task level;
the task matrix module is configured to acquire the valid trajectories among the task execution trajectories to generate a target matrix; a valid trajectory is a trajectory under which every task can begin execution at its start time and finish before its end time; the vertical axis of the target matrix is the execution order of the tasks within one valid trajectory; the horizontal axis of the target matrix collects the tasks that occupy the same position in each valid trajectory;
the scheduling policy allocation module is configured to acquire the occupancy ratio of each task along each horizontal axis (row) of the target matrix, and to optimize the tasks in each row according to a preset rule based on those occupancy ratios, to obtain a scheduling policy corresponding to each row of the target matrix;
the task scheduling module is configured to perform, according to the scheduling policy, task scheduling for the tasks in the scheduling task set;
the preset rule specifically comprises:
acquiring the occupancy ratio of each task in a row of the target matrix, and taking the task with the largest occupancy ratio in the current row as a preprocessing task;
obtaining a target task according to the largest occupancy ratio of each preprocessing task along the vertical axis of the target matrix;
if the occupancy ratio of the target task at its current position is greater than the sum of its ratios at all other positions, recording the current position as the scheduling policy of the target task and locking the row corresponding to the target task; locking fixes the position of the target task in the current row.
CN202311659700.7A 2023-12-06 2023-12-06 Cloud platform task scheduling method and device based on reinforcement learning Active CN117971411B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311659700.7A CN117971411B (en) 2023-12-06 2023-12-06 Cloud platform task scheduling method and device based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311659700.7A CN117971411B (en) 2023-12-06 2023-12-06 Cloud platform task scheduling method and device based on reinforcement learning

Publications (2)

Publication Number Publication Date
CN117971411A CN117971411A (en) 2024-05-03
CN117971411B true CN117971411B (en) 2024-08-06

Family

ID=90854366

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311659700.7A Active CN117971411B (en) 2023-12-06 2023-12-06 Cloud platform task scheduling method and device based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN117971411B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210081787A1 (en) * 2019-09-12 2021-03-18 Beijing University Of Posts And Telecommunications Method and apparatus for task scheduling based on deep reinforcement learning, and device
CN114756358A (en) * 2022-06-15 2022-07-15 苏州浪潮智能科技有限公司 DAG task scheduling method, device, equipment and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109857534A (en) * 2019-02-12 2019-06-07 浙江方正印务有限公司 A kind of intelligent task scheduling strategy training method based on Policy-Gradient Reinforcement Learning
US20210256313A1 (en) * 2020-02-19 2021-08-19 Google Llc Learning policies using sparse and underspecified rewards


Also Published As

Publication number Publication date
CN117971411A (en) 2024-05-03


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant