CN114860435B - Big data job scheduling method based on task selection process reinforcement learning - Google Patents

Big data job scheduling method based on task selection process reinforcement learning

Info

Publication number
CN114860435B
Authority
CN
China
Prior art keywords
task
job
node
scheduled
scheduling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210449623.1A
Other languages
Chinese (zh)
Other versions
CN114860435A (en)
Inventor
夏莹杰
武建伟
陈天祥
刘瑞峰
张雷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Taizhou Jiema Technology Co ltd
Research Institute of Zhejiang University Taizhou
Original Assignee
Taizhou Jiema Technology Co ltd
Research Institute of Zhejiang University Taizhou
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Taizhou Jiema Technology Co ltd, Research Institute of Zhejiang University Taizhou filed Critical Taizhou Jiema Technology Co ltd
Priority to CN202210449623.1A
Publication of CN114860435A
Application granted
Publication of CN114860435B


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133Distances to prototypes
    • G06F18/24137Distances to cluster centroïds
    • G06F18/2414Smoothing the distance, e.g. radial basis function networks [RBFN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5038Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a big data job scheduling method based on reinforcement learning of the task selection process, comprising the following steps: S1, vectorizing the jobs in the cluster; S2, for the selected task node, the agent determines how many of its task instances to schedule; S3, the resource matching module selects the task executor type whose resources best match the selected task node, then picks an idle task executor of that type to prepare to execute the currently scheduled task instance; S4, the scheduled task instance is deployed and executed on the big data task execution platform. The method is highly adaptable and generalizes across different types of big data jobs.

Description

Big data job scheduling method based on task selection process reinforcement learning
Technical Field
The invention relates to a big data job scheduling method based on reinforcement learning of the task selection process, and belongs to the field of artificial intelligence.
Background
Big data applications exploit the value of data: effective information is mined from massive data through analysis, providing decision support for users. Processing such data efficiently is therefore critical, and the scheduling of tasks during data processing has a major impact on overall performance and resource utilization.
In a workflow, the dependencies between task nodes grow more complex as the number of nodes increases. Traditional heuristic algorithms handle these complex inter-node dependencies poorly and can only cope through hand-crafted feature engineering, which introduces strong human bias. A graph convolutional neural network can learn task node features autonomously and mine the relations hidden in the topology, avoiding human interference. Applying graph convolutional networks to task-dependency processing in the workflow scheduling problem thus improves scheduling quality and has become a hot research direction.
At present, existing methods based on graph convolutional neural networks can only schedule and execute one specific type of big data job (running on a Hadoop computing cluster) and cannot dynamically adjust the parameters of scheduled tasks to optimize the scheduling process, which is a serious limitation.
Disclosure of Invention
The invention aims to overcome the defects in the background art by completing the scheduling of big data jobs in a dynamic workflow environment and improving overall job completion efficiency.
In order to achieve the above object, the present invention provides a big data job scheduling method based on reinforcement learning of the task selection process, comprising the steps of: S1, vectorizing the jobs in the cluster: if the job set has not finished, the embedded vectors are input into the reinforcement learning module for decision making; after the fully-connected layers of the scheduling decision network, the value of each task node is adjusted differently depending on whether it can be scheduled; finally, the agent of the reinforcement learning module selects the next task node to schedule through a softmax layer. S2, for the selected task node, the agent determines how many of its task instances to schedule; after the fully-connected layers and at the softmax layer, the corresponding decision network lowers the probability of selecting jobs that cannot be scheduled, and the number of task executors with the highest probability for the selected node's job is taken as the number of task instances to schedule. S3, the resource matching module selects the task executor type whose resources best match the selected task node, then picks an idle task executor of that type to prepare to execute the currently scheduled task instance. S4, the scheduled task instance is deployed and executed on the big data task execution platform: after the current batch's job subset finishes executing, the reward value of the action is quantified from the overall scheduling time of this round; if the agent's decision improved the scheduling result, the agent receives a positive reward and raises the probability of choosing that decision later; otherwise, it receives a negative reward and lowers that probability.
Further, step S1 includes: S11, transforming the graph with a graph convolutional neural network, the resulting embedded vectors falling into three levels: node level, job level, and global level; S12, after the embedded vectors pass through the last fully-connected layer of the decision network, judging whether each task node can be scheduled: if a node cannot be scheduled, reducing the vector value corresponding to it; otherwise keeping it unchanged; S13, finally, the decision network applies a softmax layer for further processing and selects the next task node to schedule.
Further, step S2 includes: S21, judging whether the job containing the selected task node is selectable: if it cannot be selected, reducing the value corresponding to that job after the decision network's fully-connected layers; otherwise keeping it unchanged; S22, the decision network further lowering, through a softmax layer, the probability that a job which cannot be scheduled is selected; S23, locating the row vector in the matrix that corresponds to the selected node's job, and taking the number of task executors with the highest probability in that row as the number of task instances to schedule.
Further, step S3 includes: S31, once the agent has determined the next task node to schedule, the resource matching module computing the matching value between the task node and each task executor category and selecting the category with the maximum matching value; S32, the resource matching module picking an idle task executor from the selected category to prepare to execute the currently scheduled task instance.
Further, step S4 includes: S41, after the agent has determined the next scheduled task node and its task instances, the task executor chosen by the resource matching module executing the given task instance in the big data job cluster; S42, after the current scheduling period ends, using the overall scheduling time of the jobs as the argument of a reward function to obtain the reward value; the agent continually learns from positive or negative reward feedback to explore the sequence of actions that maximizes the cumulative expected reward.
Compared with the prior art, the invention has the beneficial effect of stronger adaptability: the method generalizes across different types of big data jobs, such as MapReduce and Spark.
Drawings
FIG. 1 is a decision network block diagram of task node scheduling in accordance with one embodiment of the present invention;
FIG. 2 is a block diagram of a resource matching module decision making according to one embodiment of the invention.
Detailed Description
The technical scheme of the invention is further described below with reference to the accompanying drawings and specific embodiments.
As shown in fig. 1 and fig. 2, in one embodiment of the present invention, step 1 vectorizes the jobs in the cluster. If the job set has not finished, the embedded vectors are input into the reinforcement learning module for decision making. In the scheduling decision network, after the fully-connected layers, the value of each task node is re-processed differently depending on whether it can be scheduled. Finally, the agent of the reinforcement learning module selects the next task node to schedule through a softmax layer.
The graph is transformed with a graph convolutional neural network, and the resulting embedded vectors fall into three levels: node level, job level, and global level.
The task node level includes: the execution time dur_i of each task instance in task node v_i; the total task instance count inst_i of v_i; the remaining task instance count rest_i of v_i; the priority rank_i of v_i; the number of CPU cores core_i required by each task instance of v_i; and the memory size mem_i required by each task instance of v_i. The job level includes: the total number of task executors used_num_exec_n already occupied by the current job n, and the current global number of idle task executors gl_rest_num_exec. The global level is obtained from the job level by applying the same formula to the embedded vectors of all jobs.
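For concreteness, a minimal sketch of how the node-level features listed above might be assembled before the graph convolution; the field types and the absence of normalization are assumptions, since the text only names the quantities:

from dataclasses import dataclass
import numpy as np

@dataclass
class TaskNode:
    dur: float   # execution time of each task instance (dur_i)
    inst: int    # total task instance count (inst_i)
    rest: int    # remaining task instance count (rest_i)
    rank: int    # priority of the node (rank_i)
    core: int    # CPU cores required per task instance (core_i)
    mem: float   # memory required per task instance (mem_i)

def node_level_features(v: TaskNode) -> np.ndarray:
    # Raw node-level feature vector, in the order listed in the text;
    # any scaling before the graph convolution is omitted here.
    return np.array([v.dur, v.inst, v.rest, v.rank, v.core, v.mem],
                    dtype=np.float32)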
The decision network that selects the next scheduled task node consists of 4 fully-connected layers and 3 Leaky ReLU activation functions. The 4 fully-connected layers contain 32, 16, 4, and 1 neurons, respectively.
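A minimal PyTorch sketch of this network; the input embedding size (here 16) is an assumption, as the text fixes only the layer widths and activations:

import torch
import torch.nn as nn

class NodeSelectionNet(nn.Module):
    # 4 fully-connected layers (32, 16, 4, 1) with Leaky ReLU between them.
    def __init__(self, embed_dim: int = 16):  # embed_dim is assumed
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim, 32), nn.LeakyReLU(),
            nn.Linear(32, 16), nn.LeakyReLU(),
            nn.Linear(16, 4), nn.LeakyReLU(),
            nn.Linear(4, 1),  # one scalar score per task node
        )

    def forward(self, node_embeds: torch.Tensor) -> torch.Tensor:
        # node_embeds: [total_num_task_nodes, embed_dim]
        # returns:     [total_num_task_nodes, 1] pre-softmax scores
        return self.net(node_embeds)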
The values maintained for each task node, namely the remaining instance count rest_num_inst and the remaining execution time rest_exec_time, change as the agent performs scheduling operations. Both are represented together as a two-dimensional vector:
(rest_num_inst, rest_exec_time)
Here rest_num_inst is the node's total instance count minus the number of instances the scheduling algorithm has recorded as completed for that node, and rest_exec_time is rest_num_inst multiplied by the per-instance execution time dur.
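A small sketch of this bookkeeping, assuming the scheduler tracks completed instances per node:

def node_state(total_inst: int, done_inst: int, dur: float) -> tuple[int, float]:
    # (rest_num_inst, rest_exec_time): remaining instance count, and
    # remaining time = remaining instances * per-instance execution time.
    rest_num_inst = total_inst - done_inst
    rest_exec_time = rest_num_inst * dur
    return rest_num_inst, rest_exec_time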
After the embedded vectors pass through the last fully-connected layer of the decision network, whether each task node can be scheduled is judged. If a node cannot be scheduled, the value corresponding to it in the post-fully-connected-layer matrix is reduced by 1e10; if it can be scheduled, its value is kept unchanged.
The decision network computes a matrix of dimension [total_num_task_nodes, 1], in which each task node of every job has a corresponding value. This value represents the probability that the task node is selected for scheduling.
Finally, the decision network applies a softmax layer for further processing and selects the next task node to schedule.
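The masking-plus-softmax step above might look like the following sketch (greedy argmax is shown; sampling from the distribution is an equally plausible reading):

import torch

def select_next_node(scores: torch.Tensor, schedulable: torch.Tensor) -> int:
    # scores:      [total_num_task_nodes, 1], decision network output
    # schedulable: boolean mask, True where a node can still be scheduled
    masked = scores.squeeze(-1).clone()
    masked[~schedulable] -= 1e10          # push unschedulable nodes toward
    probs = torch.softmax(masked, dim=0)  # zero probability under softmax
    return int(torch.argmax(probs))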
In one specific example with 10 task nodes, the 8th task node receives the highest probability, 0.5723, in the decision network's final result vector and is therefore selected.
Fig. 1 is a block diagram of a decision network for task node scheduling in accordance with one embodiment of the present invention.
In step 2, for the selected task node, the agent further determines how many task instances to schedule. After the fully-connected layers and at the softmax layer, the corresponding decision network lowers the probability of selecting jobs that cannot be scheduled, and the number of task executors with the highest probability for the selected node's job is taken as the number of task instances to schedule.
First, whether the job containing the selected task node is selectable is judged. If it cannot be selected, the value corresponding to that job is reduced after the decision network's fully-connected layers; otherwise it remains unchanged.
The information the agent maintains for a job includes the total number of task executors the job already uses, used_num_exec, and the global number of idle task executors, gl_rest_num_exec, stored as a two-dimensional vector:
(used_num_exec, gl_rest_num_exec)
The state of the task executors must record the current idle count of each executor type, represented as the vector:
(free_num_exec_type_1, free_num_exec_type_2, ..., free_num_exec_type_k)
The global number of idle task executors of a job is then
gl_rest_num_exec = free_num_exec_type_1 + free_num_exec_type_2 + ... + free_num_exec_type_k
that is, the sum of the remaining idle executor counts over all types.
The decision network that performs the job-selection judgment likewise consists of 4 fully-connected layers and 3 Leaky ReLU activation functions, containing 32, 16, 4, and 1 neurons, respectively. If the job containing the task node cannot be selected, the value corresponding to that job in the post-fully-connected-layer matrix is reduced by 1e10; if the job can be selected, it is left unchanged. A job cannot be selected either because it has already finished executing or because the global remaining executor count gl_rest_num_exec is 0.
Finally, through further processing by the softmax layer, the decision network lowers the probability that a job which cannot be scheduled is selected.
After shape reconstruction, the decision network's result has dimension [total_num_jobs, total_num_exec]. In this matrix each job has a corresponding row vector; each dimension of the row represents a possible number of task executors allocated to that job, and the corresponding value is the probability of allocating that many executors. Because the agent must ultimately fix the number of task instances to schedule in the currently selected task node, the scheduling algorithm looks up the row vector corresponding to the selected node's job and takes the number of task executors with the highest probability in that row as the number of task instances to schedule.
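A hedged sketch of this lookup; the pairing of an explicit candidate-count vector with a probability row follows the worked example below, and the helper name is ours:

import numpy as np

def select_num_instances(candidate_counts: np.ndarray,
                         probs: np.ndarray) -> int:
    # candidate_counts: possible task executor counts in the selected job's row
    # probs:            post-softmax probability for each candidate
    return int(candidate_counts[np.argmax(probs)])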
In one embodiment, assuming the task node selected in step 1 is x, the task executor row vector of the corresponding job is
(4, 3, 2, 5, 1)
and the corresponding probability row vector is
(0.35, 0.12, 0.27, 0.15, 0.11)
The task executor count with the highest probability is 4 (probability 0.35), i.e., 4 task executors are selected to execute the job containing task node x.
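Plugging the example's vectors into the helper sketched above reproduces the stated choice:

import numpy as np

counts = np.array([4, 3, 2, 5, 1])
probs = np.array([0.35, 0.12, 0.27, 0.15, 0.11])
assert select_num_instances(counts, probs) == 4  # probability 0.35 wins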
In step 3, the resource matching module selects the task executor type whose resources best match the selected task node. It then picks an idle task executor of that type to prepare to execute the currently scheduled task instance.
Once the agent has determined the next task node to schedule, the resource matching module computes the matching values between the task node and the task executor categories and selects the category with the maximum matching value.
A task instance's resource demand has two parts: CPU and memory. The scheduling algorithm must determine whether the job as a whole should prioritize satisfying CPU or memory: if CPU resources are scarcer, CPU should be satisfied first; otherwise, memory should.
This emphasis is quantified with a weight coefficient alpha, where alpha is the weight for prioritizing CPU and 1-alpha the weight for prioritizing memory. In the computation, the method C(·) evaluates either a task node's required CPU core count or a task executor's CPU core configuration; V denotes the set of all task nodes in all jobs; E denotes the set of all task executor types; N(e) denotes the number of type-e task executors; and the method M(·) likewise evaluates a task node's memory requirement or a task executor's memory configuration.
Once the agent has determined the next task node v to schedule, the resource matching module computes the matching value score_{v,e} between the task node and each task executor category e, and selects for v the category with the maximum matching value, i.e., e_v = argmax over e in E of score_{v,e}.
further, the resource matching module selects an idle task preparation executor from the selected task executor categories to execute the currently scheduled task instance.
FIG. 2 is a block diagram of the decision making of the resource matching module according to an embodiment of the present invention.
In step 4, the scheduled task instance is deployed and executed on the big data task execution platform. After the current batch's job subset finishes executing, the reward value of the action is quantified from the overall scheduling time of this round. If the agent's decision improved the scheduling result, the agent receives a positive reward and raises the probability of choosing that decision later; otherwise, it receives a negative reward and lowers that probability.
Once the agent has determined the next scheduled task node and its task instances, the task executor selected by the resource matching module executes the given task instance in the big data job cluster.
After the current scheduling period ends, the overall scheduling time of the jobs is used as the argument of a reward function to obtain the reward value. The agent continually learns from positive or negative reward feedback to explore the sequence of actions that maximizes the cumulative expected reward.
Once all the scheduled actions have been executed, the environment updates the global time point to gtime; the previous global time point is recorded as prev_gtime, with an initial value of 0. The reward function is designed as:
r = -(gtime - prev_gtime) * r_s
where r_s is a hyper-parameter.
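The reward is directly computable from the two time points:

def reward(gtime: float, prev_gtime: float, r_s: float) -> float:
    # r = -(gtime - prev_gtime) * r_s: a shorter scheduling interval gives a
    # less negative reward; r_s is a hyper-parameter scaling the penalty.
    return -(gtime - prev_gtime) * r_s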
Finally, it should be noted that: the above embodiments are only for illustrating the technical aspects of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the above embodiments, it should be understood by those of ordinary skill in the art that: modifications and equivalents may be made to the specific embodiments of the invention without departing from the spirit and scope of the invention, which is intended to be covered by the claims.

Claims (3)

1. A big data job scheduling method based on reinforcement learning of the task selection process, characterized by comprising the following steps:
s1, carrying out vectorization processing on the jobs in the cluster: if the operation set is not completed, inputting the embedded vector into the reinforcement learning module for decision making; after the full-connection layer processing is completed, the decision network for scheduling carries out a differential value reprocessing process on each task node; finally, the agent of the reinforcement learning module selects the next scheduled task node by setting a softmax layer;
s2, for the selected task node, the agent determines how many task instances of the selected task node are scheduled; the corresponding decision network reduces the probability that the job which cannot be scheduled is selected after the full connection layer is processed and at the softmax layer, and takes the number of task executors with the highest probability of the selected task nodes as the number of task instances of specific scheduling;
s3, selecting a task executor type with the best matched resource from the selected task nodes by the resource matching module, and then selecting an idle task executor from the task executor types to prepare to execute a currently scheduled task instance;
s4, deploying and executing the scheduled task instance on the big data task execution platform: after the execution of the job subset of the current batch is finished, quantifying the rewarding value of the action according to the whole dispatching time of the round of dispatching; if the decision of the agent improves the scheduling effect, the agent will obtain a positive prize value and increase the probability of selecting the decision afterwards; otherwise, the agent will obtain a reverse prize value and reduce the probability of later selecting the decision;
the step S1 includes:
s11, converting the graph by using a graph convolution neural network, wherein the processed embedded vector is divided into three classes of node level, operation level and global level;
and S12, after the embedded vector is processed by the last full connection layer of the decision network, judging whether each task node can be scheduled. If the scheduling is impossible, reducing vector values corresponding to the task nodes; otherwise, the state is maintained unchanged;
s13, finally, the decision network sets a softmax layer for further processing, and selects a next scheduled task node;
the step S4 includes:
s41, after the agent determines the next scheduled task node and the corresponding task node instance, a task executor selected by the resource matching module executes the given task instance in the big data job cluster;
s42, after the current scheduling period is finished, using the whole scheduling time of the job as an independent variable, and obtaining a corresponding rewarding value by using a rewarding function; the agent utilizes the feedback of the forward or reverse prize value to learn constantly to explore a series of optimal actions to maximize the cumulative prize expectation.
2. The big data job scheduling method based on reinforcement learning of the task selection process according to claim 1, characterized in that step S2 includes:
s21, judging whether the job of the selected task node is selectable, if the job of the task node cannot be selected, reducing the value corresponding to the job after the decision network full-connection layer processing; otherwise, the state is maintained unchanged;
s22, the decision network reduces the probability that the job which cannot be scheduled is selected through further processing of a softmax layer;
s23, according to the job where the selected task node is located, calculating a corresponding row vector in the matrix, and selecting the number of task executors with highest probability in the row vector as the number of task instances of specific scheduling.
3. The big data job scheduling method based on reinforcement learning of the task selection process according to claim 2, characterized in that step S3 includes:
s31, when the agent determines the next scheduled task node, the resource matching module calculates the matching value of the task node and the task executor category, and selects the task executor category corresponding to the maximum matching value;
s32, selecting idle task execution from the selected task executor types by the resource matching module
The walker prepares to execute the currently scheduled task instance.
CN202210449623.1A 2022-04-24 2022-04-24 Big data job scheduling method based on task selection process reinforcement learning Active CN114860435B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210449623.1A CN114860435B (en) 2022-04-24 2022-04-24 Big data job scheduling method based on task selection process reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210449623.1A CN114860435B (en) 2022-04-24 2022-04-24 Big data job scheduling method based on task selection process reinforcement learning

Publications (2)

Publication Number Publication Date
CN114860435A CN114860435A (en) 2022-08-05
CN114860435B true CN114860435B (en) 2024-04-05

Family

ID=82633031

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210449623.1A Active CN114860435B (en) 2022-04-24 2022-04-24 Big data job scheduling method based on task selection process reinforcement learning

Country Status (1)

Country Link
CN (1) CN114860435B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114281528A (en) * 2021-12-10 2022-04-05 重庆邮电大学 Energy-saving scheduling method and system based on deep reinforcement learning and heterogeneous Spark cluster

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102020204351A1 * 2020-04-03 2021-10-07 Robert Bosch Gesellschaft mit beschränkter Haftung Device and method for planning a plurality of orders for a plurality of machines

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114281528A (en) * 2021-12-10 2022-04-05 重庆邮电大学 Energy-saving scheduling method and system based on deep reinforcement learning and heterogeneous Spark cluster

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Kang Zhongmiao; Wang Ying; Zhang Peiming; Tao Zhiqiang; Li Jialiang. Virtual machine resource scheduling based on reinforcement learning in a cloud computing environment. Automation & Instrumentation, 2020, (10). *

Also Published As

Publication number Publication date
CN114860435A (en) 2022-08-05

Similar Documents

Publication Publication Date Title
Zhu et al. An efficient evolutionary grey wolf optimizer for multi-objective flexible job shop scheduling problem with hierarchical job precedence constraints
CN108880663B (en) Space-ground integrated network resource allocation method based on improved genetic algorithm
Shen et al. Mathematical modeling and multi-objective evolutionary algorithms applied to dynamic flexible job shop scheduling problems
CN114186749B (en) Flexible workshop scheduling method and model based on reinforcement learning and genetic algorithm
CN113792924A (en) Single-piece job shop scheduling method based on Deep reinforcement learning of Deep Q-network
Xiao et al. A cooperative coevolution hyper-heuristic framework for workflow scheduling problem
CN104077634B (en) active-reactive type dynamic project scheduling method based on multi-objective optimization
CN110969362A (en) Multi-target task scheduling method and system under cloud computing system
CN117271101B (en) Operator fusion method and device, electronic equipment and storage medium
Zhang et al. An integer-coded differential evolution algorithm for simple assembly line balancing problem of type 2
CN111047272A (en) Project scheduling method and device for multi-language collaborative development
CN111813500A (en) Multi-target cloud workflow scheduling method and device
CN115168027A (en) Calculation power resource measurement method based on deep reinforcement learning
CN111831355A (en) Weight precision configuration method, device, equipment and storage medium
CN113157421A (en) Distributed cluster resource scheduling method based on user operation process
CN116700176A (en) Distributed blocking flow shop scheduling optimization system based on reinforcement learning
CN115543626A (en) Power defect image simulation method adopting heterogeneous computing resource load balancing scheduling
CN111831359A (en) Weight precision configuration method, device, equipment and storage medium
CN115827189A (en) Large-scale intelligent cluster task scheduling optimization method
CN111831354A (en) Data precision configuration method, device, chip array, equipment and medium
Nascimento et al. A reinforcement learning scheduling strategy for parallel cloud-based workflows
Ishankhodjayev et al. Development of information support for decision-making in intelligent energy systems
CN116938323B (en) Satellite transponder resource allocation method based on reinforcement learning
CN114860435B (en) Big data job scheduling method based on task selection process reinforcement learning
CN110362378A (en) A kind of method for scheduling task and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant