CN113821323B - Offline job task scheduling algorithm for mixed deployment data center scene - Google Patents

Offline job task scheduling algorithm for mixed deployment data center scene

Info

Publication number
CN113821323B
CN113821323B, CN202111089490.3A, CN202111089490A
Authority
CN
China
Prior art keywords
data center
attention
offline
task scheduling
output
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111089490.3A
Other languages
Chinese (zh)
Other versions
CN113821323A (en)
Inventor
李嘉伦
吴维刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University
Priority to CN202111089490.3A
Publication of CN113821323A
Application granted
Publication of CN113821323B
Legal status: Active (current)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46: Multiprogramming arrangements
    • G06F 9/48: Program initiating; Program switching, e.g. by interrupt
    • G06F 9/4806: Task transfer initiation or dispatching
    • G06F 9/4843: Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides an offline job task scheduling algorithm for a hybrid deployment data center scenario. It applies an advantage actor-critic (A2C) reinforcement learning model, more advanced than Deep Q-Learning and Policy Gradient, to the data center task scheduling scenario in cloud computing resource management. To strengthen the model's understanding during reinforcement learning training, the load state information of the data center is efficiently encoded and ultimately converted into a relational inductive bias. This bias continuously improves the reinforcement learning model's understanding of the environment and of the actions it executes during training, so that a better scheduling policy is formed, and the model as a whole is applied to the task scheduling problem in the hybrid deployment data center scenario.

Description

Offline job task scheduling algorithm for mixed deployment data center scene
Technical Field
The invention relates to the field of large-scale data center resource management in cloud computing, and in particular to an offline job task scheduling algorithm for a hybrid deployment data center scenario.
Background
Improving the resource utilization of a data center is critical to reducing the infrastructure investment of cloud resource providers. However, current real world production data centers typically exhibit relatively low resource utilization.
For data centers with a huge number of nodes, considerable cost savings can be achieved even if the average utilization is only marginally increased. Therefore, improving resource utilization is a major concern for cloud providers seeking to achieve their economic goals. At the same time, it is common practice to co-locate different workloads in the same data center to increase resource utilization, the so-called hybrid deployment.
In a hybrid deployment scenario, there are mainly two kinds of workloads: long-running applications (LRAs, i.e., online jobs) and batch applications (offline jobs). Online jobs have higher priority in a hybrid deployment scenario, while offline jobs are best-effort and may be suspended to reserve resources for the online jobs. Furthermore, the runtime behavior of online jobs typically changes dynamically.
DeepRM is a task scheduling algorithm that uses deep reinforcement learning (DRL) to train a policy network for multidimensional resource scheduling in a data center. It uses a convolutional neural network (CNN) module to extract information from the data center represented as images, and its objectives and methods are relatively close to those of the present work at a high level. However, DeepRM can only treat the data center as a single physical resource pool, without deployed LRAs. Furthermore, DeepRM treats all tasks running in the data center as stateless batch offline tasks. DeepJS also uses DRL for data center task scheduling. It is formulated within the framework of the bin-packing problem, treating the entire data center as a collection of multiple compute nodes. However, DeepJS still does not address the task scheduling problem of the hybrid deployment scenario. Decima combines a graph neural network (GNN) with DRL, processes state information into feature embeddings, and then passes them to its policy network. In addition, Decima is a scheduling algorithm that targets DAG (directed acyclic graph) dependent task scheduling in Spark distributed environments. DeepRM Plus is an extended version of DeepRM: it mainly enhances DeepRM's CNN module while introducing imitation learning, further exploiting expert knowledge (e.g., shortest-task-first, first-come-first-served) to accelerate convergence during DRL training.
Compared with existing task scheduling algorithms, task scheduling in a hybrid deployment data center scenario must consider not only the data center state changes caused by the task scheduling process itself, but also those caused by the deployed online jobs. For example, online jobs deployed in a hybrid deployment data center typically have diverse runtime features, and these features change dynamically; they must be handled properly to ensure the final scheduling quality. However, accounting for a large number of online jobs with unknown, dynamically changing runtime behavior during task scheduling is very challenging, because fully analyzing all online jobs and collecting their runtime behavior features is labor-intensive, and the labor cost outweighs the possible benefit. Therefore, new scheduling algorithms are needed for the co-located data center scenario that do not require prior knowledge of the scheduling scenario's information profile and that can automatically learn the scheduling policy from historical experience.
Disclosure of Invention
The invention provides an offline job task scheduling algorithm for a hybrid deployment data center scenario that can form a better scheduling policy.
To achieve the above technical effects, the technical solution of the invention is as follows:
An offline job task scheduling algorithm for a hybrid deployment data center scenario comprises the following steps:
S1: treating each piece of node state information and offline job information in the data center as an entity, and constructing an entity set E;
S2: converting the entity set into a feature representation of the data center node states and the offline job runtime behavior;
S3: converting the feature representation into a relational inductive bias during deep reinforcement learning (DRL) training.
Further, each entity in the entity set computes pairwise interactions with all other entities, and the accumulation of these pairwise interactions becomes the behavior feature representation of the original entity.
Further, the constructed entity set E is mapped into three matrices Q, K, V; the attention output value is calculated using the matrices Q, K, V; and the resulting set of attention output vectors is used to enhance the decision-making ability of the reinforcement learning agent.
Further, the attention output value is calculated from the matrices Q, K, V as follows: attention_output is the finally computed offline-task-to-policy attention weight, and its value is derived from the three matrices Q, K, V obtained by linear mappings of the original entity set E. The term √d_k is a scaling factor for normalization; it prevents the output distribution of the softmax() function from becoming too steep and thus keeps gradient values stable during deep learning training. Here d_k is the number of columns of the Q and K matrices, i.e., the vector dimension, and the softmax() function normalizes the computed values into the (0, 1) interval, which likewise keeps the gradient values stable during training:
attention_output = Attention(Q, K, V) = softmax(QK^T / √d_k)V
further, in step S3, based on the attention weight vector set output by the attention module, under the feedback mechanism driven by the reward signal, the DRL model captures the attention weight beneficial to promoting the positive reward, and gradually filters out the relationship links that cause the scheduling result to be worse under the driving of the feedback reward, and the required relationship induction bias is obtained by performing the above-mentioned process iteratively.
Further, in step S3, the training process of the DRL model is:
1) Collecting the node states of the data center and the offline job information in the waiting queue at each time point during data center operation;
2) Integrating and combining this information into an entity set E for subsequent use;
3) The self-attention module performs weight calculation and encoding on E and outputs a set of weight vectors;
4) The policy of the DRL model is generated and optimized based on the set of attention weight vectors output by the self-attention module; guided by the reward-signal incentive mechanism, the DRL model gradually filters out invalid attention weights and retains the valid ones, thereby gradually constructing high-quality DRL model policy parameters;
5) Based on the continuously updated policy parameters, selecting appropriate scheduling actions according to the current data center state and obtaining the corresponding reward feedback signals, so that the DRL policy is optimized iteratively.
Further, the relational inductive bias allows the DRL model, during DRL training, to capture the different hybrid deployment state patterns formed jointly by the runtime behavior of currently running online jobs and by offline job tasks arriving with unknown characteristics, without analyzing them manually.
Further, the set of attention weight vectors output by the attention module captures the relational links between the data center node states and the offline job tasks waiting to be scheduled.
Further, each entity in the set of entities will calculate pairwise interactions with all other entities, including itself.
Further, the accumulation of the pairwise interactions with all other entities, including itself, becomes the final representation of the original entity.
Compared with the prior art, the technical solution of the invention has the following beneficial effects:
The invention applies an advantage actor-critic (A2C) reinforcement learning model, more advanced than Deep Q-Learning and Policy Gradient, to the data center task scheduling scenario in cloud computing resource management. To strengthen the model's understanding during reinforcement learning training, the load state information of the data center is efficiently encoded and ultimately converted into a relational inductive bias. This bias continuously improves the reinforcement learning model's understanding of the environment and of the actions it executes during training, so that a better scheduling policy is formed, and the model as a whole is applied to the task scheduling problem in the hybrid deployment data center scenario.
Drawings
FIG. 1 is a diagram of a model structure employed in the method of the present invention.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the present patent;
for the purpose of better illustrating the embodiments, certain elements of the drawings may be omitted, enlarged or reduced and do not represent the actual product dimensions;
it will be appreciated by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
The technical scheme of the invention is further described below with reference to the accompanying drawings and examples.
The invention provides an offline job task scheduling algorithm, Co-ScheRRL, for the hybrid deployment data center scenario. Co-ScheRRL adopts a deep reinforcement learning (DRL) method and automatically learns a scheduling policy from historical experience. The DRL method is well suited to automatically learning scheduling policies without prior knowledge of workload characteristics. Co-ScheRRL encodes its scheduling policy in a neural network through past scheduling experience; during this process, Co-ScheRRL schedules offline jobs, observes the effects, and gradually refines the scheduling policy. In addition, to learn a high-quality scheduling policy effectively in a hybrid deployment data center scenario, a new information-processing representation must be developed and integrated into the DRL method, improving the DRL training technique.
First, the scheduling algorithm must be able to generalize to hundreds of online jobs and millions of offline jobs, and decisions must be made over the tens of possible data center runtime behaviors that may exist at each timestamp. This makes the problem more complex and larger in scale than existing DRL application scenarios (e.g., gaming and robotic control). In addition, the amount of information and the number of available choices in this scenario are also of a larger scale than in existing DRL application scenarios. Therefore, an improved self-attention module is introduced to process information about incoming offline jobs and the runtime behaviors of already deployed online jobs efficiently and effectively, without manual feature engineering.
The improved self-attention module is based on Google's self-attention model, the Transformer. The Transformer structure is as follows. The Transformer model is essentially a seq2seq model with attention, but whereas existing seq2seq models combine an RNN with an attention mechanism, the Transformer replaces the RNN layers entirely with a self-attention structure. Like most seq2seq models, the Transformer consists of an encoder and a decoder. The encoder consists of N identical layers, shown on the left side of FIG. 1. Each layer consists of two sub-layers: a multi-head self-attention mechanism and a fully connected feed-forward network. A residual connection and normalization are added around each sub-layer. The structure of the decoder is similar to that of the encoder.
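For illustration only, the following is a minimal sketch of one encoder layer of the kind described above, written in PyTorch: multi-head self-attention followed by a position-wise feed-forward network, each sub-layer wrapped with a residual connection and layer normalization. The dimensions (d_model = 64, 4 heads, d_ff = 256) are assumed example values and are not taken from the patent.

```python
import torch
import torch.nn as nn

class EncoderLayer(nn.Module):
    """One Transformer encoder layer: multi-head self-attention and a
    feed-forward network, each wrapped with a residual connection + LayerNorm."""
    def __init__(self, d_model: int = 64, n_heads: int = 4, d_ff: int = 256):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn_out, _ = self.attn(x, x, x)     # sub-layer 1: self-attention
        x = self.norm1(x + attn_out)         # residual connection + normalization
        x = self.norm2(x + self.ff(x))       # sub-layer 2: feed-forward
        return x

# Example: one batch of 10 entities, each embedded in 64 dimensions.
layer = EncoderLayer()
out = layer(torch.randn(1, 10, 64))          # output shape: (1, 10, 64)
```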
The multi-head self-attention mechanism is the core of the self-attention module. Each piece of node state information and each piece of offline job information in the data center is treated as one entity, and an entity set E is constructed, which is then converted into the final representation of the data center node states and the offline job runtime behavior characteristics.
In particular, each entity in the entity set computes pairwise interactions with all other entities (including itself), and the accumulation of these pairwise interactions becomes the final representation of the original entity. This is similar to the message-passing mechanism in a graph neural network (GNN); a naive sketch of this accumulation is given below, and the actual calculation procedure used by the module follows it.
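The sketch below assumes a purely illustrative feature layout (3 node entities and 2 offline-job entities with 4 features each); the explicit double loop mirrors the message-passing description, while the module itself performs the vectorized Q, K, V computation shown next.

```python
import numpy as np

def build_entity_set(node_states, job_infos):
    """Stack per-node state vectors and per-job descriptors into one entity
    matrix E of shape (num_entities, feature_dim); both inputs are assumed
    to already be padded to the same feature width."""
    return np.vstack([np.asarray(node_states), np.asarray(job_infos)])

def accumulate_pairwise(E):
    """Every entity i computes a pairwise interaction with every entity j
    (itself included) and accumulates the interaction-weighted messages,
    which become entity i's new representation."""
    n, _ = E.shape
    out = np.zeros_like(E)
    for i in range(n):
        for j in range(n):                    # j == i is included
            interaction = float(E[i] @ E[j])  # simple dot-product interaction
            out[i] += interaction * E[j]      # accumulate the message from j
    return out

# Hypothetical example: 3 node-state entities and 2 offline-job entities.
E = build_entity_set(np.random.rand(3, 4), np.random.rand(2, 4))
reps = accumulate_pairwise(E)                 # shape: (5, 4)
```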
The constructed entity set E is first mapped into three matrices Q, K, V. The attention output value is then calculated with the following formula. The set of attention weight vectors obtained from this calculation is used to enhance the decision-making capability of the reinforcement learning agent.
attention_output = Attention(Q, K, V) = softmax(QK^T / √d_k)V
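Written out in code, the formula can be sketched as follows (NumPy); the projection matrices W_q, W_k, W_v stand in for the learned linear mappings of E, and all dimensions are assumed for illustration. The rows of the result are the attention output vectors referred to above.

```python
import numpy as np

def attention_output(E, W_q, W_k, W_v):
    """Scaled dot-product self-attention over the entity set E:
    softmax(Q K^T / sqrt(d_k)) V, with Q, K, V linear mappings of E."""
    Q, K, V = E @ W_q, E @ W_k, E @ W_v
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # scaled pairwise scores
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over entities
    return weights @ V                               # attention output vectors

# Hypothetical dimensions: 5 entities with 4 features each, projected to d_k = 8.
E = np.random.rand(5, 4)
W_q, W_k, W_v = (np.random.rand(4, 8) for _ in range(3))
out = attention_output(E, W_q, W_k, W_v)             # shape: (5, 8)
```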
Second, existing DRL algorithms cannot handle scenarios with continuously arriving offline job distributions, nor decision-making scenarios in which already deployed LRAs may change their runtime behavior. For example, the quality of a decision may be attributable either to the quality of the current policy or to the state pattern formed by the offline and online jobs currently co-located in the same hybrid deployment. Therefore, effectively understanding the different hybrid deployment state patterns during DRL model training is key to obtaining a high-quality scheduling policy. To handle different hybrid deployment state patterns during training, feature representations are extracted from the output of the improved self-attention module described above and then converted into a relational inductive bias during DRL training.
Specifically, the relational inductive bias is generated as follows. The set of attention weight vectors output by the attention module captures the relational links between the data center node states and the offline job tasks waiting to be scheduled. Under a feedback mechanism driven by the reward signal, the DRL model captures attention weight values that help promote positive rewards and, driven by the feedback rewards, gradually filters out relational links that make the scheduling results worse. By performing this process iteratively, the desired relational inductive bias can be obtained. During feedback-reward training, the relational inductive bias is believed to learn some general, abstract concepts, allowing successful scheduling and generalization to data center node state patterns it has not observed before. In this way, the relational inductive bias helps the DRL model capture the different hybrid deployment state patterns formed jointly by the runtime behavior of currently running online jobs and by offline job tasks arriving with unknown characteristics, without analyzing them manually during DRL training. This helps build a high-quality scheduling policy over the various hybrid deployment states. The Co-ScheRRL training procedure is as follows.
1) At each time point during data center operation, Co-ScheRRL first gathers the node states of the data center and the offline job information in the waiting queue.
2) Co-ScheRRL consolidates this information into one entity set E for subsequent use.
3) The Co-ScheRRL self-attention module performs weight calculation and encoding on E and outputs a set of weight vectors.
4) The DRL model of Co-ScheRRL receives the set of attention weight vectors output by the attention module; its policy is generated and optimized based on this set, and, guided by the reward-signal incentive mechanism, the DRL model gradually filters out invalid attention weights and retains the valid ones, thereby gradually constructing high-quality DRL model policy parameters.
5) Based on the continuously updated policy parameters, Co-ScheRRL selects appropriate scheduling actions according to the current data center state and obtains the corresponding reward feedback signals, so that the DRL policy is optimized iteratively; a minimal training-loop sketch in this spirit is given below.
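To make the loop above concrete, the following is a minimal, self-contained advantage actor-critic (A2C) sketch in the spirit of steps 1) to 5). The toy environment, network sizes, and hyper-parameters are invented stand-ins for the real hybrid-deployment simulator and the Co-ScheRRL policy network; in the actual algorithm, the state fed to the actor-critic would be the set of attention vectors produced by the self-attention module rather than a random vector.

```python
import torch
import torch.nn as nn

class ToyDatacenterEnv:
    """Stand-in for the hybrid-deployment simulator: the 'state' is a random
    8-dimensional summary vector and the reward favours one of two actions."""
    def reset(self):
        self.state = torch.randn(8)
        return self.state

    def step(self, action):
        reward = 1.0 if action == int(self.state.sum() > 0) else -1.0
        self.state = torch.randn(8)
        return self.state, reward

class ActorCritic(nn.Module):
    """Shared body with a policy (actor) head and a value (critic) head."""
    def __init__(self, state_dim=8, n_actions=2):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(state_dim, 32), nn.Tanh())
        self.pi = nn.Linear(32, n_actions)    # policy head
        self.v = nn.Linear(32, 1)             # value head

    def forward(self, s):
        h = self.body(s)
        return torch.distributions.Categorical(logits=self.pi(h)), self.v(h).squeeze()

env, net = ToyDatacenterEnv(), ActorCritic()
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
state = env.reset()
for _ in range(500):                               # steps 1) to 5), iterated
    dist, value = net(state)                       # encode state, evaluate it
    action = dist.sample()                         # choose a scheduling action
    next_state, reward = env.step(action.item())   # apply it, observe the reward
    with torch.no_grad():
        _, next_value = net(next_state)
    advantage = reward + 0.99 * next_value - value          # one-step A2C advantage
    loss = (-dist.log_prob(action) * advantage.detach()     # policy-gradient term
            + advantage.pow(2)                              # critic (value) term
            - 0.01 * dist.entropy())                        # exploration bonus
    opt.zero_grad()
    loss.backward()
    opt.step()
    state = next_state
```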
The same or similar reference numerals correspond to the same or similar components;
the positional relationship depicted in the drawings is for illustrative purposes only and is not to be construed as limiting the present patent;
it is to be understood that the above examples of the present invention are provided by way of illustration only and not by way of limitation of the embodiments of the present invention. Other variations or modifications of the above teachings will be apparent to those of ordinary skill in the art. It is not necessary here nor is it exhaustive of all embodiments. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the invention are desired to be protected by the following claims.

Claims (7)

1. An offline job task scheduling algorithm for a hybrid deployment data center scenario, characterized by comprising the following steps:
S1: treating each piece of node state information and offline job information in the data center as an entity, and constructing an entity set E;
S2: converting the entity set into a feature representation of the data center node states and the offline job runtime behavior;
S3: converting the feature representation into a relational inductive bias during deep reinforcement learning (DRL) training;
mapping the constructed entity set E into three matrices Q, K, V, calculating attention output values using the matrices Q, K, V, and using the resulting set of attention output vectors to enhance the decision-making capability of the reinforcement learning agent;
the attention output value is calculated from the matrices Q, K, V as follows: attention_output is the finally computed offline-task-to-policy attention weight, and its value is derived from the three matrices Q, K, V obtained by linear mappings of the original entity set E; √d_k is a scaling factor for normalization that prevents the output distribution of the softmax() function from becoming too steep and thus keeps gradient values stable during deep learning training; d_k is the number of columns of the Q and K matrices, i.e., the vector dimension; and the softmax() function normalizes the computed values into the (0, 1) interval, which likewise keeps the gradient values stable during training:
attention_output = Attention(Q, K, V) = softmax(QK^T / √d_k)V
in step S3, based on the set of attention weight vectors output by the attention module, and under the feedback mechanism driven by the reward signal, the DRL model captures the attention weight values that help promote positive rewards and, driven by the feedback rewards, gradually filters out the relational links that make the scheduling results worse; the required relational inductive bias is obtained by performing this process iteratively.
2. The offline job task scheduling algorithm for a hybrid deployment data center scenario according to claim 1, wherein each entity in the entity set computes pairwise interactions with all other entities, and the accumulation of these pairwise interactions becomes the behavior feature representation of the original entity.
3. The offline job task scheduling algorithm for a hybrid deployment data center scenario according to claim 1, wherein in step S3, the training process of the DRL model is:
1) Collecting the node states of the data center and the offline job information in the waiting queue at each time point during data center operation;
2) Integrating and combining this information into an entity set E for subsequent use;
3) The self-attention module performs weight calculation and encoding on E and outputs a set of weight vectors;
4) The policy of the DRL model is generated and optimized based on the set of attention weight vectors output by the self-attention module; guided by the reward-signal incentive mechanism, the DRL model gradually filters out invalid attention weights and retains the valid ones, thereby gradually constructing high-quality DRL model policy parameters;
5) Based on the continuously updated policy parameters, selecting appropriate scheduling actions according to the current data center state and obtaining the corresponding reward feedback signals, so that the DRL policy is optimized iteratively.
4. The offline job task scheduling algorithm for a hybrid deployment data center scenario according to claim 3, wherein the relational inductive bias enables the DRL model, during DRL training, to capture the different hybrid deployment state patterns formed jointly by the runtime behavior of currently running online jobs and by offline job tasks arriving with unknown characteristics, without analyzing them manually.
5. The offline job task scheduling algorithm for a hybrid deployment data center scenario according to claim 3, wherein the set of attention weight vectors output by the attention module captures the relational links between the data center node states and the offline job tasks waiting to be scheduled.
6. The offline job task scheduling algorithm for a hybrid deployment data center scenario according to claim 5, wherein each entity in the entity set computes pairwise interactions with all other entities, including itself.
7. The offline job task scheduling algorithm for a hybrid deployment data center scenario according to claim 5, wherein the accumulation of the pairwise interactions with all other entities, including itself, becomes the final representation of the original entity.
CN202111089490.3A 2021-09-16 2021-09-16 Offline job task scheduling algorithm for mixed deployment data center scene Active CN113821323B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111089490.3A CN113821323B (en) 2021-09-16 2021-09-16 Offline job task scheduling algorithm for mixed deployment data center scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111089490.3A CN113821323B (en) 2021-09-16 2021-09-16 Offline job task scheduling algorithm for mixed deployment data center scene

Publications (2)

Publication Number Publication Date
CN113821323A CN113821323A (en) 2021-12-21
CN113821323B (en) 2023-09-19

Family

ID=78922347

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111089490.3A Active CN113821323B (en) 2021-09-16 2021-09-16 Offline job task scheduling algorithm for mixed deployment data center scene

Country Status (1)

Country Link
CN (1) CN113821323B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114580937B (en) * 2022-03-10 2023-04-28 暨南大学 Intelligent job scheduling system based on reinforcement learning and attention mechanism


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11620158B2 (en) * 2020-01-15 2023-04-04 B.G. Negev Technologies & Applications Ltd. At Ben-Gurion University Multi-objective scheduling system and method

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112948070A (en) * 2019-12-10 2021-06-11 百度(美国)有限责任公司 Method for processing data by a data processing accelerator and data processing accelerator
CN111258734A (en) * 2020-01-16 2020-06-09 中国人民解放军国防科技大学 Deep learning task scheduling method based on reinforcement learning
CN112486641A (en) * 2020-11-18 2021-03-12 鹏城实验室 Task scheduling method based on graph neural network
CN112631750A (en) * 2020-12-21 2021-04-09 中山大学 Predictive online scheduling and mixed task deployment method based on compressed sensing and oriented to cloud data center
CN112929849A (en) * 2021-01-27 2021-06-08 南京航空航天大学 Reliable vehicle-mounted edge calculation unloading method based on reinforcement learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Analysis of workload characteristics and task scheduling optimization in co-located data centers; Wang Jiwei et al.; Computer Engineering & Science; Vol. 42, No. 1; full text *

Also Published As

Publication number Publication date
CN113821323A (en) 2021-12-21

Similar Documents

Publication Publication Date Title
Xin et al. Application of deep reinforcement learning in mobile robot path planning
CN108900358B (en) Virtual network function dynamic migration method based on deep belief network resource demand prediction
CN113191484B (en) Federal learning client intelligent selection method and system based on deep reinforcement learning
Xu et al. Evolutionary extreme learning machine–based on particle swarm optimization
CN111966484A (en) Cluster resource management and task scheduling method and system based on deep reinforcement learning
Zhu et al. A deep-reinforcement-learning-based optimization approach for real-time scheduling in cloud manufacturing
CN112052948B (en) Network model compression method and device, storage medium and electronic equipment
CN113361680A (en) Neural network architecture searching method, device, equipment and medium
CN113821323B (en) Offline job task scheduling algorithm for mixed deployment data center scene
CN112732436B (en) Deep reinforcement learning acceleration method of multi-core processor-single graphics processor
CN111198550A (en) Cloud intelligent production optimization scheduling on-line decision method and system based on case reasoning
Moon et al. Smart manufacturing scheduling system: DQN based on cooperative edge computing
Long et al. Complexity-aware adaptive training and inference for edge-cloud distributed AI systems
Bi et al. Multi-swarm Genetic Gray Wolf Optimizer with Embedded Autoencoders for High-dimensional Expensive Problems
Dasgupta et al. Adaptive computational chemotaxis in bacterial foraging algorithm
CN110351561A (en) A kind of efficient intensified learning training method for video encoding optimization
CN111950690A (en) Efficient reinforcement learning strategy model with self-adaptive capacity
CN111950691A (en) Reinforced learning strategy learning method based on potential action representation space
CN110046746B (en) Scheduling method of online public opinion device based on reinforcement learning
CN115989503A (en) Method, computer program product and Reinforced Learning (RL) system for state engineering of the RL system
CN112306641B (en) Training method for virtual machine migration model
Zhai et al. Multi-swarm genetic gray wolf optimizer with embedded autoencoders for high-dimensional expensive problems
Zhao et al. A hybrid approach based on artificial neural network and genetic algorithm for job-shop scheduling problem
CN110618626B (en) Communication energy consumption balancing method and device for multi-unmanned platform cooperative formation maintenance
Ding et al. Guest Editorial Evolutionary Computation Meets Deep Learning

Legal Events

Date Code Title Description
PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant