CN114675975A - Job scheduling method, device and equipment based on reinforcement learning - Google Patents

Job scheduling method, device and equipment based on reinforcement learning Download PDF

Info

Publication number
CN114675975A
CN114675975A CN202210569531.7A CN202210569531A CN114675975A CN 114675975 A CN114675975 A CN 114675975A CN 202210569531 A CN202210569531 A CN 202210569531A CN 114675975 A CN114675975 A CN 114675975A
Authority
CN
China
Prior art keywords
job
neural network
deep neural
scheduling
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210569531.7A
Other languages
Chinese (zh)
Other versions
CN114675975B (en
Inventor
黄慧娟
吴华运
陈拓
范嘉烨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xinhuasan Artificial Intelligence Technology Co ltd
Original Assignee
Xinhuasan Artificial Intelligence Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xinhuasan Artificial Intelligence Technology Co ltd filed Critical Xinhuasan Artificial Intelligence Technology Co ltd
Priority to CN202210569531.7A priority Critical patent/CN114675975B/en
Publication of CN114675975A publication Critical patent/CN114675975A/en
Application granted granted Critical
Publication of CN114675975B publication Critical patent/CN114675975B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5038Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Abstract

The application provides a job scheduling method, a job scheduling device and job scheduling equipment based on reinforcement learning, wherein a management node inputs job information of different jobs into a deep neural network to obtain a job scheduling strategy and a calculation node queue associated with each job; and scheduling and operating each job according to the job scheduling strategy and the calculation node queue, determining an evaluation parameter for evaluating the deep neural network according to the scheduling and operating condition of each job, and updating the model parameter in the deep neural network according to the evaluation parameter and an expected parameter set for each job so as to adjust the job scheduling strategy. Therefore, by applying the technical scheme provided by the embodiment, the job scheduling policy can be adaptively adjusted according to the expected parameters configured by the user, and the user does not need to consume long time to select the scheduling policy, so that the job scheduling policy can be adaptively adjusted, and the job scheduling efficiency can be improved.

Description

Job scheduling method, device and equipment based on reinforcement learning
Technical Field
The present application relates to the field of computer technologies, and in particular, to a job scheduling method, apparatus, and device based on reinforcement learning.
Background
With the rapid development of computer technology, a high-performance computing technology is developed, the core of the high-performance computing technology is Resource management and job scheduling, a sweep (simple Linux Utility for Resource management) job scheduling system for providing a scheduling policy for a job is also deployed at a management node deployed on a high-performance computing platform, and the management node submits the job to a computing node of the high-performance computing platform according to a job execution sequence determined by the sweep job scheduling system.
In practical application, the Slurm job scheduling system has a default scheduling strategy, and a user configures parameters of the scheduling strategy according to attributes such as job priority, job load and the like of each job, so that hardware resources of computing nodes in a high-performance computing platform are more reasonably utilized, a computing node queue related to the jobs is manually allocated to the jobs, and the execution sequence of each job in the computer node queue to which the allocated jobs belong is optimized. However, both the selection of the computing node queue and the scheduling policy parameters of the churm job scheduling system require manual configuration by a user, however, the manual configuration is highly dependent on user experience, and the configuration result cannot be quantized, so that sometimes the job scheduling cannot play an optimization role, and even has a negative influence on the operation of a high-performance computing platform. Meanwhile, the scheduling results generated by the scheduling strategy parameters are not clear, no scheduling result analysis report exists, and a user does not know whether the job queue can reasonably utilize and optimize the hardware resources in the high-performance computing platform after the existing resource parameters are configured, so that once a new job is submitted or a Slurm job scheduling system is changed, the configuration parameters of the job need to be reconfigured, and a large amount of time is consumed.
Disclosure of Invention
In view of the above, the present application provides a job scheduling method, apparatus and device based on reinforcement learning, so as to improve job scheduling efficiency while adaptively adjusting a job scheduling policy.
In a first aspect, an embodiment of the present application provides a job scheduling method based on reinforcement learning, where the method is applied to a management node in a server cluster, where the server cluster further includes at least one computing node for running a job, and the method includes:
inputting the operation information of different operations into a deep neural network to obtain an operation scheduling strategy and a calculation node queue associated with each operation; the job scheduling policy is used for scheduling and running the different jobs, and at least one computing node in a computing node queue associated with any job is used for running the job;
scheduling and operating each job according to the job scheduling strategy and the computing node queue associated with each job, and determining an evaluation parameter for evaluating the deep neural network according to the scheduling and operating condition of each job;
and updating model parameters in the deep neural network according to the evaluation parameters and the expected parameters which are set for each job so as to adjust the job scheduling strategy.
In a second aspect, an embodiment of the present application provides a job scheduling apparatus based on reinforcement learning, where the apparatus is applied to a management node in a server cluster, where the server cluster further includes at least one computing node for running a job, and the apparatus includes:
the strategy and queue obtaining unit is used for inputting the operation information of different operations into the deep neural network to obtain an operation scheduling strategy and a calculation node queue associated with each operation; the job scheduling strategy is used for scheduling and running the different jobs, and at least one computing node in a computing node queue associated with any job is used for running the job;
the evaluation parameter determining unit is used for scheduling and operating each job according to the job scheduling strategy and the computing node queue associated with each job, and determining evaluation parameters for evaluating the deep neural network according to the scheduling and operating conditions of each job;
and the strategy updating unit is used for updating the model parameters in the deep neural network according to the evaluation parameters and the expected parameters which are set aiming at each job so as to adjust the job scheduling strategy.
According to the technical scheme, by applying the embodiment of the application, the management node inputs the operation information of different operations into the deep neural network to obtain the operation scheduling strategy and the calculation node queue associated with each operation;
and scheduling and operating each job according to the job scheduling strategy and the calculation node queue associated with each job, determining an evaluation parameter for evaluating the deep neural network according to the scheduling and operating condition of each job, and updating the model parameter in the deep neural network according to the evaluation parameter and an expected parameter set for each job so as to adjust the job scheduling strategy. It can be seen that, when different jobs are scheduled, experienced workers are not required to participate in configuring job scheduling policy parameters corresponding to each job, job information of different jobs is input into the deep neural network to automatically output a job scheduling policy and a calculation node queue associated with each job, and when it is determined through evaluation that a scheduling policy determined by the deep neural network is not reasonable, the deep neural network is updated by adjusting model parameters of the deep neural network, so as to dynamically adjust the job scheduling policy of subsequent input jobs. Therefore, the technical scheme provided by the embodiment adaptively adjusts the job scheduling policy according to the expected parameters configured by the user, and the user does not need to consume tedious time to select the scheduling policy, so that the job scheduling policy can be adaptively adjusted, and the job scheduling efficiency can be improved.
Drawings
Fig. 1 is a schematic flowchart of a job scheduling method based on reinforcement learning according to an embodiment of the present disclosure;
fig. 2 is a schematic diagram of job scheduling based on a deep neural network according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of an exemplary deep neural network training process provided by an embodiment of the present application;
fig. 4 is a schematic structural diagram of a job scheduling apparatus based on reinforcement learning according to an embodiment of the present disclosure;
fig. 5 is a hardware structure diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. The following description refers to the accompanying drawings in which the same numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. The word "if" as used herein may be interpreted as "at … …" or "when … …" or "in response to a determination", depending on the context.
With the rapid development of computer technology, high-performance computing technology comes, and at present, high-performance computing platform technology is widely applied to various fields, the core objective of the technology is to reasonably utilize high-performance computing cluster resources to complete job computing tasks, a churm job scheduling system for providing scheduling strategies for jobs is also deployed at a management node of the high-performance computing platform, and the management node submits the jobs to the computing nodes of the high-performance computing platform according to a job execution sequence determined by the churm job scheduling system.
In practical application, the churm job scheduling system has a default scheduling policy, a user needs to select a computing node queue associated with a job on a front-end page, and sets scheduling policy parameters such as whether backfill is started or not and whether preemption is preempted or not in the churm job scheduling system according to attributes such as job priority, job load and the like of each job in the computing node queue associated with the job, and the background churm scheduling system executes submitted jobs according to the set scheduling rules.
However, both the selection of the computing node queue and the scheduling policy parameters of the churm job scheduling system require manual configuration by a user, however, the manual configuration is highly dependent on user experience, and the configuration result cannot be quantized, so that sometimes the job scheduling cannot play an optimization role, and even has a negative influence on the operation of a high-performance computing platform. Meanwhile, the scheduling results generated by the scheduling strategy parameters are not clear, no scheduling result analysis report exists, and a user does not know whether the job queue can reasonably utilize and optimize the hardware resources in the high-performance computing platform after the existing resource parameters are configured, so that once a new job is submitted or a Slurm job scheduling system is changed, the configuration parameters of the job need to be reconfigured, and a large amount of time is consumed.
In order to solve the above technical problem, an embodiment of the present application provides a job scheduling method based on reinforcement learning, where the method is applied to a management node in a server cluster, where the server cluster further includes at least one computing node for running a job, and the method includes: inputting the operation information of different operations into a deep neural network to obtain an operation scheduling strategy and a calculation node queue associated with each operation; the job scheduling strategy is used for scheduling and running the different jobs, and at least one computing node in a computing node queue associated with any job is used for running the job; scheduling and operating each job according to the job scheduling strategy and the computing node queue associated with each job, and determining an evaluation parameter for evaluating the deep neural network according to the scheduling and operating condition of each job; and updating model parameters in the deep neural network according to the evaluation parameters and the expected parameters which are set for each job so as to adjust the job scheduling strategy.
Based on the above description, it can be seen that, when different jobs are scheduled, an experienced worker is not required to participate in configuring job scheduling policy parameters corresponding to each job, but job information of different jobs is input into the deep neural network to automatically output a job scheduling policy and a computing node queue associated with each job, and when it is determined through evaluation that a scheduling policy determined by the deep neural network is not reasonable, the deep neural network is updated by adjusting model parameters of the deep neural network to dynamically adjust a job scheduling policy of a subsequent input job, so that the job scheduling policy can be adaptively adjusted, and job scheduling efficiency can be improved.
The technical solution of the present application is explained in detail by the following specific examples.
Fig. 1 is a flowchart of an embodiment of a job scheduling method based on reinforcement learning according to an exemplary embodiment of the present application, where the method is applied to a management node in a server cluster, the server cluster constructs a high-performance computing platform, and the server cluster further includes at least one computing node for running a job.
As shown in fig. 1, the job scheduling method based on reinforcement learning includes the following steps:
step 101: and inputting the operation information of different operations into the deep neural network to obtain an operation scheduling strategy and a computing node queue associated with each operation.
In this embodiment, the different job information at least includes attribute data under different job attributes, as shown in table 1 below.
Figure 36984DEST_PATH_IMAGE001
In the above table 1, the number of CPUs required to be used by each task belonging to one job may be understood as: one job is divided into a plurality of tasks, and for each task, the number of CPUs required to run the task. The number of GPUs that the job needs to use when running on each compute node can be understood as: for each job, the computational resources required to run the job. The number of tasks can be understood as: one job is divided into the number of tasks. The number of task groups can be understood as: and submitting the jobs to the computing nodes according to the same task under the condition that the resources required by the jobs are the same and the jobs are configured with different parameters. The memory required to be used by each compute node when running a job can be understood as: and when the operation is carried out, applying for the memory for use from an operation scheduling system in the management node. The number of nodes can be understood as: the number of nodes required for one job. Job priority can be understood as: the priority assigned to each job. The maximum run time is understood as: the maximum run time required to run a job.
In order to reduce the processing load of the deep neural network, the job attributes may be subjected to dimension reduction, and as an embodiment, attributes with little influence are filtered from the job attributes, for example, the job _ name and time _ limit in table 1 are eliminated. Based on this, as another embodiment, the job information includes attribute data under different job attributes, and the implementation manner of implementing step 101 includes: performing dimension reduction processing on attribute data under different job attributes to obtain target job data; inputting the target operation data into a deep neural network. Based on table 1, the job attributes may be subjected to a dimensionality reduction process using a PCA (Principal Component Analysis) method.
In summary, as an embodiment, the attribute data includes any combination of the following attributes: the number of CPUs (central processing units) required to be used by each task in a job, the number of GPUs required to be used when the job runs on each computing node, the size of an application memory when the job is submitted, the number of computing nodes required by each job, the priority given to each job, the number of tasks, the number of task groups and an identifier for representing the job needing to be submitted preferentially.
It should be noted that, in the scheduling method provided in this embodiment of the present application, if it is recognized that the attribute data includes the identifier, the identifier may be recognized to indicate that the job with the identifier is preferentially arranged in the front of the queue belonging to the compute node, and for example, the identifier may be QoS, that is, after the job with the QoS attribute enters the job scheduling system deployed in the management node, the job is unconditionally preempted or preferentially queued in the queue belonging to the compute node, and only the job with the QoS being unidentified is dynamically adjusted to ensure that the emergency task of the user is not affected.
Based on DNN (Deep Neural Network) technology, as shown in fig. 2, after a user configures resource parameters of job requirements, these jobs are jobs 1, … … in fig. 2, job n, and the configured job information is parameters a1, … … and parameter am configured for job 1 in fig. 2; b1, … … with job 2 configured, parameter bm; … …, respectively; job n is configured with parameters k1, … …, parameter km. m is a job parameter number sequence number, a1 represents a parameter with sequence number 1 belonging to job 1, am represents a parameter with sequence number m belonging to job 1, b1 represents a parameter with sequence number 1 belonging to job 2, bm represents a parameter with sequence number m belonging to job 2, k1 represents a parameter with sequence number 1 belonging to job n, km represents a parameter with sequence number m belonging to job n, the deep neural network algorithm continuously optimizes a training target through iterative learning training, finally outputs a calculation node queue and a job scheduling strategy associated with each job, and determines the execution sequence of each job based on the output calculation node queue and job scheduling strategy associated with each job, such as job 2, job n, … … and job 1 in fig. 2. Whereas a DNN structure must have an input layer, an output layer, and between the input and output layers are hidden layers. Among them, a neural network in which the hidden layer is at least greater than 2 is called a deep neural network. The training process for the deep neural network will be described in detail later, and will not be described herein again.
The computing node queue associated with each job in this embodiment may be understood as that the computing nodes are divided into a plurality of computing node queues, and any computing node queue is used to run at least one job, in other words, at least one computing node in the computing node queue associated with any job is used to run the job.
The job scheduling policy is used for scheduling and running different jobs, and each job corresponds to the job scheduling policy belonging to the job. If the job scheduling strategy corresponding to the job a at least comprises a backfill opening strategy and a preemption opening strategy, and if the job b is the job scheduling strategy corresponding to the job b at least comprises a backfill closing strategy and a preemption opening strategy.
And step 102, scheduling and operating each job according to the job scheduling strategy and the computing node queue associated with each job, and determining an evaluation parameter for evaluating the deep neural network according to the scheduling and operating condition of each job.
As an embodiment, one implementation manner of implementing scheduling of running each job according to the job scheduling policy and the compute node queue associated with each job may be: and according to the calculation node queue associated with each job, inputting the job scheduling policy into a job scheduling system configured in a management node, obtaining the execution sequence of each job in the calculation node queue associated with the job, and submitting the job to the calculation node for running according to the obtained execution sequence.
After each job is scheduled and run, determining an evaluation parameter for evaluating the deep neural network according to the scheduling and running condition of each job to determine whether a job scheduling policy output by the deep neural network is appropriate, and if not, executing step 103 to dynamically adjust the job scheduling policy of the subsequent input job. If so, the job scheduling policy of the subsequent input job does not need to be dynamically adjusted, which indicates that the job scheduling policy output by the deep neural network in the current step 101 is appropriate.
As an embodiment, the evaluation parameter may comprise any combination of the following values: bsld (average bound slowdown), total power consumption of the computing node, and idle rate of the CPU of the computing node. This embodiment is not limited to this.
And 103, updating model parameters in the deep neural network according to the evaluation parameters and the expected parameters set for each job so as to adjust the job scheduling strategy.
In this embodiment, the expected parameter is an evaluation parameter when each job runs simultaneously under the condition of sufficient resources, and accordingly, when the evaluation parameter is selected as an average total delay of all jobs, the expected parameter is an average total delay of all jobs when each job runs simultaneously under the condition of sufficient resources, when the evaluation parameter is selected as a total power consumption of the compute node, the expected parameter is a total power consumption of the compute node when each job runs simultaneously under the condition of sufficient resources, when the evaluation parameter is selected as a CPU idle rate of the compute node, the expected parameter is a CPU idle rate of the compute node when each job runs simultaneously under the condition of sufficient resources.
It should be noted that, in this step, the model parameters in the deep neural network are updated only when it is determined that the model parameters in the deep neural network need to be updated according to the evaluation parameters and the expected parameters set for each job, that is, the job information of different subsequently input jobs is input to the deep neural network after the model parameters are adjusted, so as to achieve the purpose of adjusting the job scheduling policy of the subsequently input jobs. Correspondingly, when the model parameters in the deep neural network are determined not to need to be updated, the subsequent input operation is input into the original deep neural network for processing.
In addition, the high-performance computing platform provided by this embodiment can be widely applied to various fields such as medical treatment, education, machinery, geology, and the like, and for each field, the trained deep neural network of step 101 can be trained according to the operation data of this field, and after being applied to a specific user environment, adaptive training, that is, steps 102 to 103, can be put into according to local environment data, and as time goes on, this deep neural network is more adapted to the business requirements and the individual habits of different users.
So far, the description shown in fig. 1 is completed.
According to the technical scheme, by applying the embodiment of the application, the management node inputs the operation information of different operations into the deep neural network to obtain the operation scheduling strategy and the calculation node queue associated with each operation; and scheduling and operating each job according to the job scheduling strategy and the calculation node queue associated with each job, determining an evaluation parameter for evaluating the deep neural network according to the scheduling and operating condition of each job, and updating the model parameter in the deep neural network according to the evaluation parameter and an expected parameter set for each job so as to adjust the job scheduling strategy. It can be seen that, when different jobs are scheduled, experienced workers are not required to participate in configuring job scheduling policy parameters corresponding to each job, job information of different jobs is input into the deep neural network to automatically output a job scheduling policy and a calculation node queue associated with each job, and when it is determined through evaluation that a scheduling policy determined by the deep neural network is not reasonable, the deep neural network is updated by adjusting model parameters of the deep neural network, so as to dynamically adjust the job scheduling policy of subsequent input jobs. Therefore, the technical scheme provided by the embodiment adaptively adjusts the job scheduling policy according to the expected parameters configured by the user, and the user does not need to consume tedious time to select the scheduling policy, so that the job scheduling policy can be adaptively adjusted, and the job scheduling efficiency can be improved.
Following the flow shown in fig. 1, in one embodiment, the evaluation parameter is the average delay time of all jobs, and the expected parameter is a target delay time;
before updating the model parameters in the deep neural network in step 103, the method may further include:
step A, determining a deviation value between the average delay time and the target delay time according to the average delay time and the target delay time; and if the deviation value is greater than or equal to the threshold value, executing the step B.
If the deviation value between the evaluation parameter and the set expected parameter is greater than or equal to the threshold value, the job scheduling policy output by the current deep neural network model is not appropriate; step B therefore needs to be executed to dynamically adjust the job scheduling policy of subsequently input jobs. If the deviation value between the evaluation parameter and the set expected parameter is smaller than the threshold value, job information of new jobs is acquired and step 101 is executed.
And B, updating the model parameters in the deep neural network.
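A minimal sketch of steps A and B (the function and parameter names are illustrative, not from the patent):

```python
def should_update_model(job_delays, target_delay, threshold):
    """Step A: compute the deviation between the average delay time of all
    jobs and the target delay time. Step B (updating the model parameters)
    is executed only when the deviation reaches the threshold."""
    avg_delay = sum(job_delays) / len(job_delays)
    deviation = abs(avg_delay - target_delay)
    return deviation >= threshold

# Three jobs delayed 4, 6 and 8 time units against a target of 5:
print(should_update_model([4, 6, 8], target_delay=5, threshold=1))  # True: avg 6, deviation 1
```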
As yet another embodiment, the deep neural network includes node weights, and step 101 is implemented as follows: job information of different jobs is input into the deep neural network, and the deep neural network determines the job scheduling policy and the computing node queue associated with each job according to the node weights and the activation function softmax. Updating the model parameters in the deep neural network then means updating the node weights in the deep neural network.
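The node-weight-plus-softmax computation can be sketched as follows (the feature layout, weight shape, and two-action policy space are assumptions for illustration, not the patent's actual network):

```python
import numpy as np

def softmax(z):
    z = z - z.max()                      # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 2))              # node weights: 4 job features -> 2 actions
job_features = np.array([8.0, 2.0, 16.0, 1.0])  # e.g. CPUs, GPUs, memory (GB), priority

action_probs = softmax(job_features @ W)         # probability of each scheduling action
policy = ["enable backfill", "disable backfill"][int(action_probs.argmax())]
print(action_probs.round(3), policy)
```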
As an embodiment, the deep neural network is trained by the following training method:
and step C, acquiring the operation information of different sample operations as sample characteristic attributes.
For example, the attributes of all jobs submitted in the high-performance computing platform are collected, such as the resources used by each job and whether the Slurm backfill strategy is used.
And D, configuring an initial scheduling strategy for the pre-trained deep neural network model to obtain the initialized deep neural network model.
And setting initial values of the deep neural network, such as parameters of learning rate, iteration times, node number, initial weight and the like.
And E, inputting the sample characteristic attributes into the deep neural network model to obtain a computing node queue associated with each sample operation and a sample scheduling strategy corresponding to each sample operation.
And F, generating an execution sequence of the running sample jobs in each computing node queue in the current state by using a sample scheduling strategy according to the computing node queue associated with each sample job.
And G, submitting the sample jobs to the computing nodes according to their execution sequence to obtain evaluation parameters for evaluating the deep neural network. If the deviation value between the evaluation parameters and the expected parameters is smaller than the threshold and the loss function determines that the loss value has converged, the trained deep neural network is output; if the deviation value is greater than or equal to the threshold and the loss function determines that the loss value has not converged, the model parameters of the deep neural network are adjusted, and training of the deep neural network model continues using the deviation value and the sample job features.
For easier understanding, assume the different sample jobs are job 1, job 2, … …, job n, and that the job information corresponding to each of job 1 to job n is acquired as sample feature attributes, as shown in fig. 3. The deep neural network in the DNN reinforcement learning model of fig. 3 has been configured as an initialized deep neural network. The sample feature attributes of the jobs — a1, … …, am for job 1; b1, … …, bm for job 2; … …; k1, … …, km for job n — are input into the deep neural network in the DNN reinforcement learning model. After processing, the deep neural network outputs the computing node queues associated with jobs 1 to n and the sample scheduling policy corresponding to each of jobs 1 to n; in fig. 3, for example, job 2 is set to enable backfill, job n is set to disable backfill, … …, and job 1 is set to enable backfill. According to the output computing node queues associated with jobs 1 to n, the sample scheduling policies corresponding to jobs 1 to n are used to generate the execution sequence for running the sample jobs in each computing node queue in the current state.
Jobs 1 to n are submitted, according to the execution sequence of each job, to the computing nodes of a simulated high-performance computing platform built in a simulation environment, to obtain evaluation parameters for evaluating the deep neural network. If the deviation value between the evaluation parameters and the expected parameters is smaller than the threshold and the loss function determines that the loss value has converged, the trained deep neural network is output; if the deviation value is greater than or equal to the threshold and the loss function determines that the loss value has not converged, the model parameters of the deep neural network are adjusted, and training of the deep neural network model continues using the deviation value and the sample job features.
Specifically, the deviation value and the sample job feature attributes are input into the deep neural network, an Adam optimizer is used to optimize the loss function, and the network weights are updated.
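As a toy illustration of the Adam update, the following drives a single stand-in weight toward the target delay by minimizing the squared delay gap (the one-weight "network" and the hyperparameters are invented for brevity; the patent's network has many weights):

```python
import numpy as np

target = 5.0                  # stand-in for the target delay time
w = 0.0                       # a single stand-in network weight
m = v = 0.0                   # Adam first/second moment estimates
lr, b1, b2, eps = 0.1, 0.9, 0.999, 1e-8

for t in range(1, 201):
    pred = w                             # stand-in for the predicted average delay
    grad = 2.0 * (pred - target)         # gradient of (pred - target)**2
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)            # bias-corrected moment estimates
    v_hat = v / (1 - b2 ** t)
    w -= lr * m_hat / (np.sqrt(v_hat) + eps)

print(round(w, 1))                       # w has moved close to the target
```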
As an embodiment, a loss value of the DNN is determined based on a specified loss function together with the average delay time and the target delay time; if the loss value has not converged, step E is executed.
The loss function measures the degree of inconsistency between the average delay time of all jobs and the target delay time; it is a non-negative real-valued function, and the smaller the loss value, the better the robustness of the deep neural network. Conversely, a larger loss value indicates poorer robustness, and step E needs to be performed to continue training the deep neural network. As an example, the loss function may take: Loss = (y_x − out_x)², where Loss represents the loss value, y_x represents the average delay time of the jobs at iteration number x, and out_x represents the target delay time corresponding to iteration number x. When the difference between the loss value calculated at the latest iteration and the loss value calculated at the previous iteration is greater than or equal to the convergence threshold, the loss value has not converged; the convergence threshold is, for example, 0.1 or 0.01.
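The example loss and its convergence test can be transcribed directly (assuming out_x denotes the target delay time, as the surrounding text suggests):

```python
def loss(avg_delay, target_delay):
    """Loss = (y_x - out_x)**2: non-negative measure of the gap between the
    average delay time of all jobs and the target delay time."""
    return (avg_delay - target_delay) ** 2

def has_converged(loss_curr, loss_prev, conv_threshold=0.01):
    """Converged when successive iterations' loss values differ by less
    than the convergence threshold (e.g. 0.1 or 0.01)."""
    return abs(loss_curr - loss_prev) < conv_threshold

print(loss(6.0, 5.0))               # 1.0
print(has_converged(0.500, 0.505))  # True under the default threshold 0.01
```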
In this step, the sample jobs are submitted to the computing nodes according to their execution sequence. This may mean submitting the sample jobs to the real computing nodes of a real high-performance computing platform, i.e. training the deep neural network with real computing nodes. If conditions are limited, the sample jobs may instead be submitted to the computing nodes of a simulated high-performance computing platform built in a simulation environment, so that the submitted jobs run in a simulated computing environment. Whether the sample jobs are submitted to computing nodes in a simulation environment or in a real environment is not limited by this embodiment, but to obtain a better-performing deep neural network, computing nodes in a real environment are recommended.
Steps E to G are repeated until the number of iterations reaches the value set in the initialization or the average delay time converges.
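The repeat-until loop over steps E to G might be sketched like this (the simulated-platform stand-in and the crude update rule are invented purely to make the loop runnable):

```python
def run_in_simulation(w):
    """Stand-in for steps F-G: schedule and run the sample jobs, then
    measure the average delay. Here a larger weight simply lowers delay."""
    return max(0.0, 10.0 - w)

def train(target_delay=2.0, lr=0.5, max_iters=100, conv_threshold=0.05):
    w, prev_delay, avg_delay = 0.0, None, None
    for _ in range(max_iters):                        # repeat steps E to G
        avg_delay = run_in_simulation(w)              # steps E-G in miniature
        if prev_delay is not None and abs(avg_delay - prev_delay) < conv_threshold:
            break                                     # average delay converged
        w += lr * (avg_delay - target_delay)          # crude parameter adjustment
        prev_delay = avg_delay
    return avg_delay

print(round(train(), 2))                              # settles near the target delay
```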
As an embodiment, after step 103, the method may further include generating charts, reports, and/or other documents describing the scheduling process for the user to view.
Fig. 4 is a block diagram of an embodiment of an apparatus 400 for scheduling a job based on reinforcement learning according to an exemplary embodiment of the present application, as shown in fig. 4, the apparatus is applied to a management node in a server cluster, the server cluster further includes at least one computing node for running a job, and the apparatus includes:
a policy and queue obtaining unit 401, configured to input job information of different jobs to a deep neural network, so as to obtain a job scheduling policy and a compute node queue associated with each job; the job scheduling strategy is used for scheduling and running the different jobs, and at least one computing node in a computing node queue associated with any job is used for running the job;
an evaluation parameter determining unit 402, configured to schedule and run each job according to the job scheduling policy and the computing node queue associated with each job, and determine an evaluation parameter for evaluating the deep neural network according to a scheduling running condition of each job;
a policy updating unit 403, configured to update the model parameters in the deep neural network according to the evaluation parameters and the expected parameters that have been set for each job, so as to adjust the job scheduling policy.
In an optional implementation manner, the job information at least includes attribute data under different job attributes;
the strategy and queue obtaining unit comprises an information input subunit for inputting operation information of different operations into the deep neural network, and the information input subunit is specifically used for:
performing dimension reduction processing on attribute data under different job attributes to obtain target job data;
inputting the target operation data into a deep neural network.
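The dimension-reduction step (claim 2 names a PCA algorithm for it) can be sketched with an SVD-based PCA; the job count and attribute count here are arbitrary:

```python
import numpy as np

def pca_reduce(X, k):
    """Linear dimensionality reduction: centre the attribute matrix and
    project it onto its top-k principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 8))            # 5 jobs x 8 raw job attributes
target_job_data = pca_reduce(X, 3)     # reduced "target job data" fed to the DNN
print(target_job_data.shape)           # (5, 3)
```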
In an alternative implementation, the evaluation parameter is an average delay time of all jobs; the expected parameter is a target delay time;
the device also includes: an update model parameter unit for:
determining a deviation value between the average delay time and the target delay time according to the average delay time and the target delay time;
and if the deviation value is greater than or equal to the threshold value, updating the model parameters in the deep neural network.
In an optional implementation, the deep neural network includes node weights;
the policy and queue obtaining unit is specifically configured to: the method comprises the steps that operation information of different operations is input into a deep neural network, and the deep neural network determines a job scheduling strategy and a calculation node queue associated with each operation according to node weight and through an activation function softmax;
the policy update unit comprises an update model parameter subunit for updating model parameters in the deep neural network;
the update model parameter subunit is specifically configured to: node weights in the deep neural network are updated.
In an alternative implementation, the attribute data includes any combination of the following attributes: the number of CPUs (central processing units) required to be used by each task in a job, the number of GPUs required to be used when the job runs on each computing node, the size of an application memory when the job is submitted, the number of computing nodes required by each job, the priority given to each job, the number of tasks, the number of task groups and an identifier for representing the job needing to be submitted preferentially.
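For illustration, a combination of the attributes above can be encoded as a fixed-order feature vector before being fed to the network (the field names are invented, not the patent's):

```python
import numpy as np

FIELDS = ["cpus_per_task", "gpus_per_node", "memory_gb", "num_nodes",
          "priority", "num_tasks", "num_task_groups", "submit_first"]

job = {
    "cpus_per_task": 4, "gpus_per_node": 1, "memory_gb": 32, "num_nodes": 2,
    "priority": 10, "num_tasks": 8, "num_task_groups": 1,
    "submit_first": 0,   # identifier flag: 1 = job must be submitted preferentially
}

features = np.array([float(job[f]) for f in FIELDS])
print(features)          # one row of the job-information input
```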
According to the technical scheme, the management node inputs the operation information of different operations into the deep neural network to obtain an operation scheduling strategy and a calculation node queue associated with each operation; and scheduling and operating each job according to the job scheduling strategy and the calculation node queue associated with each job, determining an evaluation parameter for evaluating the deep neural network according to the scheduling and operating condition of each job, and updating a model parameter in the deep neural network according to the evaluation parameter and an expected parameter set for each job so as to adjust the job scheduling strategy. It can be seen that, when different jobs are scheduled, experienced workers are not required to participate in configuring job scheduling policy parameters corresponding to each job, job information of different jobs is input into the deep neural network to automatically output a job scheduling policy and a calculation node queue associated with each job, and when it is determined through evaluation that a scheduling policy determined by the deep neural network is not reasonable, the deep neural network is updated by adjusting model parameters of the deep neural network, so as to dynamically adjust the job scheduling policy of subsequent input jobs. Therefore, the technical scheme provided by the embodiment adaptively adjusts the job scheduling policy according to the expected parameters configured by the user, and the user does not need to consume tedious time to select the scheduling policy, so that the job scheduling policy can be adaptively adjusted, and the job scheduling efficiency can be improved.
The implementation process of the functions and actions of each unit in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.
From the hardware level, a schematic diagram of the hardware architecture of the electronic device provided in the embodiment of the present application can be seen in fig. 5. The device includes: a machine-readable storage medium and a processor, wherein: the machine-readable storage medium stores machine-executable instructions executable by the processor; the processor is configured to execute the machine-executable instructions to perform the reinforcement-learning-based job scheduling operations disclosed in the above examples.
Machine-readable storage media are provided by embodiments of the present application that store machine-executable instructions that, when invoked and executed by a processor, cause the processor to implement a reinforcement learning-based job scheduling operation disclosed in the above examples.
Here, a machine-readable storage medium may be any electronic, magnetic, optical, or other physical storage device that can contain or store information such as executable instructions, data, and so forth. For example, the machine-readable storage medium may be: a RAM (Random Access Memory), a volatile memory, a non-volatile memory, a flash memory, a storage drive (e.g., a hard drive), a solid state drive, any type of storage disk (e.g., an optical disk, a DVD, etc.), or a similar storage medium, or a combination thereof.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. A typical implementation device is a computer, which may take the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email messaging device, game console, tablet computer, wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being divided into various units by function, and are described separately. Of course, the functionality of the units may be implemented in one or more software and/or hardware when implementing the present application.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Furthermore, these computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of the application. One of ordinary skill in the art can understand and implement it without inventive effort.
This completes the description of the device shown in fig. 5.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the scope of protection of the present application.

Claims (11)

1. A job scheduling method based on reinforcement learning is applied to a management node in a server cluster, the server cluster further comprises at least one computing node for running a job, and the method comprises the following steps:
inputting the operation information of different operations into a deep neural network to obtain an operation scheduling strategy and a calculation node queue associated with each operation; the job scheduling strategy is used for scheduling and running the different jobs, and at least one computing node in a computing node queue associated with any job is used for running the job;
scheduling and operating each job according to the job scheduling strategy and the computing node queue associated with each job, and determining an evaluation parameter for evaluating the deep neural network according to the scheduling and operating condition of each job;
and updating the model parameters in the deep neural network according to the evaluation parameters and the expected parameters set for each job so as to adjust the job scheduling strategy.
2. The method according to claim 1, wherein the job information includes attribute data under different job attributes;
the inputting of the job information of the different jobs to the deep neural network comprises:
performing linear dimensionality reduction on the attribute data under the different job attributes through a PCA algorithm to obtain target job data;
inputting the target operation data into a deep neural network.
3. The method of claim 1, wherein the evaluation parameter is an average delay time of all jobs; the expected parameter is a target delay time;
before determining to update the model parameters in the deep neural network according to the evaluation parameters and the expected parameters set for the jobs, the method further comprises:
determining a deviation value between the average delay time and the target delay time according to the average delay time and the target delay time;
and if the deviation value is larger than or equal to a threshold value, updating the model parameters in the deep neural network.
4. The method of claim 1, wherein the deep neural network comprises node weights;
the method for inputting the operation information of different operations into the deep neural network to obtain the operation scheduling strategy and the calculation node queues associated with the operations specifically comprises the following steps:
the method comprises the steps that operation information of different operations is input into a deep neural network, and the deep neural network determines a job scheduling strategy and a calculation node queue associated with each operation according to node weight and through an activation function softmax;
the updating model parameters in the deep neural network comprises: node weights in the deep neural network are updated.
5. The method of claim 2, wherein the attribute data comprises any combination of the following attributes:
the number of CPUs (central processing units) required to be used by each task in a job, the number of GPUs required to be used when the job runs on each computing node, the size of an application memory when the job is submitted, the number of computing nodes required by each job, the priority given to each job, the number of tasks, the number of task groups and an identifier for representing the job needing to be submitted preferentially.
6. An apparatus for scheduling jobs based on reinforcement learning, the apparatus being applied to a management node in a server cluster, the server cluster further comprising at least one computing node for running jobs, the apparatus comprising:
the strategy and queue obtaining unit is used for inputting the operation information of different operations into the deep neural network to obtain an operation scheduling strategy and a calculation node queue associated with each operation; the job scheduling strategy is used for scheduling and running the different jobs, and at least one computing node in a computing node queue associated with any job is used for running the job;
the evaluation parameter determining unit is used for scheduling and operating each job according to the job scheduling strategy and the computing node queue associated with each job, and determining evaluation parameters for evaluating the deep neural network according to the scheduling and operating conditions of each job;
and the strategy updating unit is used for updating the model parameters in the deep neural network according to the evaluation parameters and the expected parameters which are set aiming at each job so as to adjust the job scheduling strategy.
7. The apparatus according to claim 6, wherein the job information includes at least attribute data under different job attributes;
the strategy and queue obtaining unit comprises an information input subunit for inputting operation information of different operations into the deep neural network, and the information input subunit is specifically used for:
performing dimension reduction processing on attribute data under different job attributes to obtain target job data;
inputting the target operation data into a deep neural network.
8. The apparatus of claim 6, wherein the evaluation parameter is an average delay time of all jobs; the expected parameter is a target delay time; the device also includes: an update model parameter unit for:
determining a deviation value between the average delay time and the target delay time according to the average delay time and the target delay time;
and if the deviation value is larger than or equal to a threshold value, updating the model parameters in the deep neural network.
9. The apparatus of claim 6, wherein the deep neural network comprises node weights;
the policy and queue obtaining unit is specifically configured to: the method comprises the steps that operation information of different operations is input into a deep neural network, and the deep neural network determines a job scheduling strategy and a calculation node queue associated with each operation according to node weight and through an activation function softmax;
the policy updating unit comprises an update model parameter subunit for updating model parameters in the deep neural network;
the update model parameter subunit is specifically configured to: node weights in the deep neural network are updated.
10. The apparatus of claim 6, wherein the attribute data comprises any combination of the following attributes:
the number of CPUs (central processing units) required to be used by each task in a job, the number of GPUs required to be used when the job runs on each computing node, the size of an application memory when the job is submitted, the number of computing nodes required by each job, the priority given to each job, the number of tasks, the number of task groups and an identifier representing that the job needs to be submitted preferentially.
11. An electronic device comprising a readable storage medium and a processor;
wherein the readable storage medium is configured to store machine executable instructions;
the processor configured to read the machine executable instructions on the readable storage medium and execute the instructions to implement the steps of the method of any one of claims 1-5.
CN202210569531.7A 2022-05-24 2022-05-24 Job scheduling method, device and equipment based on reinforcement learning Active CN114675975B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210569531.7A CN114675975B (en) 2022-05-24 2022-05-24 Job scheduling method, device and equipment based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210569531.7A CN114675975B (en) 2022-05-24 2022-05-24 Job scheduling method, device and equipment based on reinforcement learning

Publications (2)

Publication Number Publication Date
CN114675975A true CN114675975A (en) 2022-06-28
CN114675975B CN114675975B (en) 2022-09-30

Family

ID=82080615

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210569531.7A Active CN114675975B (en) 2022-05-24 2022-05-24 Job scheduling method, device and equipment based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN114675975B (en)

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109829541A (en) * 2019-01-18 2019-05-31 上海交通大学 Deep neural network incremental training method and system based on learning automaton
CN109976909A (en) * 2019-03-18 2019-07-05 中南大学 Low delay method for scheduling task in edge calculations network based on study
CN109992000A (en) * 2019-04-04 2019-07-09 北京航空航天大学 A kind of multiple no-manned plane path collaborative planning method and device based on Hierarchical reinforcement learning
CN110390387A (en) * 2018-04-20 2019-10-29 伊姆西Ip控股有限责任公司 Deep learning application used resource is assessed
CN110580196A (en) * 2019-09-12 2019-12-17 北京邮电大学 Multi-task reinforcement learning method for realizing parallel task scheduling
CN111199272A (en) * 2019-12-30 2020-05-26 同济大学 Adaptive scheduling method for intelligent workshop
CN111915142A (en) * 2020-07-07 2020-11-10 广东工业大学 Unmanned aerial vehicle auxiliary resource allocation method based on deep reinforcement learning
US20200372898A1 (en) * 2019-05-23 2020-11-26 Capital One Services, Llc Adversarial Bootstrapping for Multi-Turn Dialogue Model Training
CN112035251A (en) * 2020-07-14 2020-12-04 中科院计算所西部高等技术研究院 Deep learning training system and method based on reinforcement learning operation layout
CN112784954A (en) * 2019-11-08 2021-05-11 华为技术有限公司 Method and device for determining neural network
CN112911647A (en) * 2021-01-20 2021-06-04 长春工程学院 Calculation unloading and resource allocation method based on deep reinforcement learning
CN114297914A (en) * 2021-12-14 2022-04-08 重庆邮电大学 Deep neural network result credibility guaranteeing method for large power grid reliability evaluation
CN114386843A (en) * 2022-01-10 2022-04-22 四川大学 Flexible workshop scheduling method based on improved deep reinforcement learning algorithm
CN114489966A (en) * 2021-12-24 2022-05-13 中国电信股份有限公司 Job scheduling method and device

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
RONGKAI ZHANG ET AL.: "R3L: Connecting Deep Reinforcement Learning To Recurrent Neural Networks For Image Denoising Via Residual Recovery", 2021 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP) *
ZHOU WANGWANG ET AL.: "Air target combat intention recognition based on deep neural networks", Acta Aeronautica et Astronautica Sinica *
LIANG SHISHUO: "Research on online task offloading in mobile edge computing based on deep reinforcement learning", Wanfang *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115202992A (en) * 2022-09-15 2022-10-18 中国空气动力研究与发展中心计算空气动力研究所 CFD operation convergence monitoring method for slurm scheduling system
CN115202992B (en) * 2022-09-15 2022-11-22 中国空气动力研究与发展中心计算空气动力研究所 CFD operation convergence monitoring method for slurm scheduling system

Also Published As

Publication number Publication date
CN114675975B (en) 2022-09-30

Similar Documents

Publication Publication Date Title
EP3711000B1 (en) Regularized neural network architecture search
WO2019018375A1 (en) Neural architecture search for convolutional neural networks
CN110321222B (en) Decision tree prediction-based data parallel operation resource allocation method
CN114756358B (en) DAG task scheduling method, device, equipment and storage medium
CN109992404A (en) PC cluster resource regulating method, device, equipment and medium
US20210312295A1 (en) Information processing method, information processing device, and information processing program
CN105975342A (en) Improved cuckoo search algorithm based cloud computing task scheduling method and system
CN113778646B (en) Task level scheduling method and device based on execution time prediction
CN110222848A (en) The determination method and device for the integrated model that computer executes
Arnold et al. Towards an augmented Lagrangian constraint handling approach for the (1+ 1)-ES
CN114237869B (en) Ray double-layer scheduling method and device based on reinforcement learning and electronic equipment
CN114675975B (en) Job scheduling method, device and equipment based on reinforcement learning
CN110221909A (en) A kind of Hadoop calculating task supposition execution method based on load estimation
CN113887748B (en) Online federal learning task allocation method and device, and federal learning method and system
CN112099931A (en) Task scheduling method and device
CN115586961A (en) AI platform computing resource task scheduling method, device and medium
CN111950579A (en) Training method and training device for classification model
US20220026862A1 (en) Determination of task automation using an artificial intelligence model
WO2020039790A1 (en) Information processing device, information processing method, and program
CN106897199A (en) A kind of batch job running time prediction method that framework common characteristic is processed based on big data
CN116069473A (en) Deep reinforcement learning-based Yarn cluster workflow scheduling method
CN111523685B (en) Method for reducing performance modeling overhead based on active learning
CN114489966A (en) Job scheduling method and device
de Freitas Cunha et al. An SMDP approach for Reinforcement Learning in HPC cluster schedulers
WO2020134011A1 (en) Method and apparatus for determining display information combination, storage medium, and electronic device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant