CN111198754B - Task scheduling method and device - Google Patents
- Publication number
- CN111198754B (application number CN201811376902.XA)
- Authority
- CN
- China
- Prior art keywords
- node
- neural network
- resource
- resource usage
- period
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a task scheduling method and device. The method comprises the following steps: the resource usage data of each node in the current period is obtained; the resource usage data of the current period is taken as the input of a bidirectional recurrent neural network model to predict the resource usage data of each node in the next period; and the tasks in each node can then be scheduled according to the predicted resource usage data of the next period. Because the embodiment of the invention adopts a bidirectional recurrent neural network model, the resource usage data of each node in the next period can be predicted more accurately, which in turn improves the accuracy of scheduling the tasks in each node.
Description
Technical Field
The present invention relates to the field of software technologies, and in particular, to a task scheduling method and apparatus.
Background
The Kubernetes system is an open-source container cluster management project designed and developed by Google. It aims to provide an automated, scalable, and extensible operation platform for container clusters. The Kubernetes system makes it convenient to manage containerized applications and solves the problem of communication between containers.
The design of a task scheduling method in the Kubernetes system should start from the standpoint of maximizing resource utilization, so that dynamic scheduling can be triggered in advance, before a resource bottleneck occurs. For the Kubernetes system to respond before a resource bottleneck occurs, it is necessary to predict the amount of resources an application will demand over a certain period in the future and then schedule resources dynamically based on the predicted value. The prediction methods currently adopted mainly forecast future resource demand through simple regression models. However, these methods are susceptible to the subjective judgment of the person making the prediction, place high quality requirements on historical data, and have low prediction accuracy.
A task scheduling method is therefore needed to solve the problem in the prior art of low scheduling accuracy caused by inaccurate prediction of future resource demand during task scheduling.
Disclosure of Invention
The embodiment of the invention provides a task scheduling method and device, which are used to solve the technical problem in the prior art of low scheduling accuracy caused by inaccurate prediction of future resource demand during task scheduling.
The embodiment of the invention provides a task scheduling method, which comprises the following steps:
acquiring resource usage data of each node in the current period;
taking the resource usage data of each node in the current period as the input of a bidirectional recurrent neural network model, and predicting the resource usage data of each node in the next period; the bidirectional recurrent neural network model is trained on resource usage data of each node in a historical period, where a historical period is a period before the current period;
and scheduling the tasks in each node according to the resource usage data of each node in the next period.
In this way, the embodiment of the invention predicts the resource usage data of each node in the next period by taking the resource usage data of the current period as the input of a bidirectional recurrent neural network model, and can then schedule the tasks in each node according to the predicted resource usage data. Because a bidirectional recurrent neural network model is adopted, the resource usage data of each node in the next period can be predicted more accurately, which improves the accuracy of scheduling the tasks in each node. Furthermore, the embodiment of the invention can automatically monitor, collect, and process data, perform learning analysis, and make predictions through the bidirectional recurrent neural network model, achieving end-to-end intelligence without manual intervention. In addition, the threshold on which scheduling is based can be adjusted flexibly according to the resource usage data of each node in the current period and in the next period, so that task scheduling is more flexible and more reasonable, and tasks can be scheduled dynamically in a timely and accurate manner.
In one possible implementation, the resource usage data includes at least one resource usage amount and a resource usage state, the resource usage state being determined according to the at least one resource usage amount;
the bidirectional recurrent neural network model is trained on at least one resource usage amount and the resource usage state of each node in the historical period.
In one possible implementation, the at least one resource usage includes memory usage, CPU usage, disk usage, and IO throughput.
In one possible implementation manner, the bidirectional recurrent neural network model is trained on at least one resource usage amount and the resource usage state of each node in the historical period, as follows:
acquiring at least one resource usage amount and the resource usage state of each node in each historical period;
taking at least one resource usage amount and the resource usage state of each node in a first historical period as the input parameters of a training sample, and taking at least one resource usage amount and the resource usage state of each node in a second historical period as the output parameters of the training sample; the first historical period is the period immediately preceding the second historical period;
and training the bidirectional recurrent neural network model with the training samples.
In one possible implementation, the bidirectional recurrent neural network model includes a forward neural network layer and a reverse neural network layer;
the method further comprises the steps of:
receiving a model updating instruction of a user;
and modifying the layer number of the forward neural network layer and/or the reverse neural network layer in the bidirectional circulating neural network model according to the model updating instruction.
The embodiment of the invention provides a task scheduling device, which comprises:
the receiving unit is used for acquiring the resource usage data of each node in the current period;
the processing unit is used for taking the resource usage data of each node in the current period as the input of a bidirectional recurrent neural network model, and predicting the resource usage data of each node in the next period; the bidirectional recurrent neural network model is trained on resource usage data of each node in a historical period, where a historical period is a period before the current period;
and the scheduling unit is used for scheduling the tasks in each node according to the resource usage data of each node in the next period.
In one possible implementation, the resource usage data includes at least one resource usage amount and a resource usage state, the resource usage state being determined according to the at least one resource usage amount;
the bidirectional recurrent neural network model is trained on at least one resource usage amount and the resource usage state of each node in the historical period.
In one possible implementation, the at least one resource usage includes memory usage, CPU usage, disk usage, and IO throughput.
In a possible implementation manner, the processing unit is specifically configured to:
acquiring at least one resource usage amount and the resource usage state of each node in each historical period;
taking at least one resource usage amount and the resource usage state of each node in a first historical period as the input parameters of a training sample, and taking at least one resource usage amount and the resource usage state of each node in a second historical period as the output parameters of the training sample; the first historical period is the period immediately preceding the second historical period;
and training the bidirectional recurrent neural network model with the training samples.
In one possible implementation, the bidirectional recurrent neural network model includes a forward neural network layer and a reverse neural network layer;
the receiving unit is also used for receiving a model updating instruction of a user;
the processing unit is further configured to modify the number of layers of the forward neural network layer and/or the reverse neural network layer in the bidirectional recurrent neural network model according to the model update instruction.
The embodiment of the application also provides a device that implements the task scheduling method described above. The functions may be implemented by hardware executing corresponding software. In one possible design, the apparatus comprises a processor, a transceiver, and a memory; the memory is used to store computer-executable instructions, the transceiver is used for communication between the device and other communication entities, and the processor is connected to the memory through a bus. When the device runs, the processor executes the computer-executable instructions stored in the memory, causing the device to perform the task scheduling method described above.
Embodiments of the present invention also provide a computer storage medium having stored therein a software program which, when read and executed by one or more processors, implements the task scheduling method described in the various possible implementations described above.
Embodiments of the present invention also provide a computer program product containing instructions that, when run on a computer, cause the computer to perform the task scheduling method described in the various possible implementations described above.
Drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below.
FIG. 1 is a schematic diagram of a system architecture to which embodiments of the present invention are applicable;
fig. 2 is a schematic flow chart corresponding to a task scheduling method according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a two-way recurrent neural network model according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a task scheduling device according to an embodiment of the present invention.
Detailed Description
The present application is described below with reference to the accompanying drawings; the specific operations in the method embodiments may also be applied to the device embodiments.
Fig. 1 schematically illustrates a system architecture to which an embodiment of the present invention is applied, as shown in fig. 1, a system 100 to which an embodiment of the present invention is applied includes a scheduling device 101 and at least one node, for example, a node 1021, a node 1022, and a node 1023 illustrated in fig. 1. The scheduling apparatus 101 may be connected to each node through a Server interface (API Server), for example, as shown in fig. 1, the scheduling apparatus 101 may be connected to the node 1021 through the API Server, may be connected to the node 1022 through the API Server, or may be connected to the node 1023 through the API Server.
Further, taking the system 100 as a Kubernetes system as an example, the scheduling device 101 may include a management module 1011, a resource monitoring module 1012, a deep analysis module 1013, and a resource scheduling module 1014. Each node may include a Kubelet component and at least one Pod; for example, node 1021 may include Kubelet component 1211, Pod 1212, and Pod 1213; node 1022 may include Kubelet component 1221, Pod 1222, and Pod 1223; and node 1023 may include Kubelet component 1231, Pod 1232, and Pod 1233.
In particular, the management module 1011 may be responsible for creating Pods in the nodes; for example, the management module 1011 may create Pod 1212 and Pod 1213 in node 1021, Pod 1222 and Pod 1223 in node 1022, and Pod 1232 and Pod 1233 in node 1023.
The resource monitoring module 1012 may be configured to collect resource usage data at each node, and may aggregate the resource usage data in units of Pod, and provide the aggregated resource usage data to the deep analysis module 1013.
The deep analysis module 1013 may train a deep learning model on the resource usage data provided by the resource monitoring module 1012 and predict the resource usage data for an upcoming period of time.
Further, as shown in fig. 1, the deep analysis module 1013 may include a data processing module 1131, a depth prediction module 1132, and a model update module 1133. The data processing module 1131 may be configured to preprocess the resource usage data collected by the resource monitoring module 1012; the depth prediction module 1132 may feed the processing result of the data processing module 1131 into a deep learning model for learning, so as to predict future resource usage data; and the model update module 1133 may receive a model update instruction from an administrator over the network and update the model structure as needed.
The resource scheduling module 1014 may generate a scheduling policy based on the prediction results of the deep analysis module 1013 and issue the scheduling policy to each node.
Further, as shown in FIG. 1, the resource scheduling module 1014 may include a policy generation module 1141 and a policy issuing module 1142. The policy generation module 1141 may generate a corresponding scheduling policy according to the prediction result of the deep analysis module 1013; the policy issuing module 1142 may issue the policy generated by the policy generation module 1141 to each node, so as to implement reasonable scheduling of system resources.
Based on the system architecture shown in fig. 1, fig. 2 schematically shows a flow diagram corresponding to a task scheduling method according to an embodiment of the present invention, including the following steps:
Step 201, acquiring resource usage data of each node in the current period.
Step 202, taking the resource usage data of each node in the current period as the input of a bidirectional recurrent neural network model, and predicting the resource usage data of each node in the next period.
Step 203, scheduling the tasks in each node according to the resource usage data of each node in the next period.
In this way, the embodiment of the invention predicts the resource usage data of each node in the next period by taking the resource usage data of the current period as the input of a bidirectional recurrent neural network model, and can then schedule the tasks in each node according to the predicted resource usage data. Because a bidirectional recurrent neural network model is adopted, the resource usage data of each node in the next period can be predicted more accurately, which improves the accuracy of scheduling the tasks in each node. Furthermore, the embodiment of the invention can automatically monitor, collect, and process data, perform learning analysis, and make predictions through the bidirectional recurrent neural network model, achieving end-to-end intelligence without manual intervention. In addition, the threshold on which scheduling is based can be adjusted flexibly according to the resource usage data of each node in the current period and in the next period, so that task scheduling is more flexible and more reasonable, and tasks can be scheduled dynamically in a timely and accurate manner.
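Steps 201 to 203 can be sketched as a single scheduling cycle. The sketch below is a minimal illustration only; the helper names `collect_usage`, `predict_next`, and `apply_schedule` are hypothetical placeholders, not functions defined by this patent.

```python
def schedule_cycle(nodes, model, collect_usage, predict_next, apply_schedule):
    # Step 201: obtain the resource usage data of each node in the current period
    current = {node: collect_usage(node) for node in nodes}
    # Step 202: feed the current-period data to the (bidirectional RNN) model
    predicted = predict_next(model, current)
    # Step 203: schedule the tasks in each node from the predicted next-period data
    return apply_schedule(nodes, predicted)
```

In practice `predict_next` would wrap the trained bidirectional recurrent neural network model described below.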
In particular, the resource usage data may include at least one resource usage amount and a resource usage state.
The resource usage amounts may be memory usage, CPU usage, disk usage, and IO throughput, which are not specifically limited here. Further, as shown in fig. 1, a plurality of Pods may exist in each node; when the resource usage of a node is obtained, the resource usage of each Pod in the node may be obtained first, and the resource usage of the node may then be determined from the resource usage of its Pods.
Table 1 shows an example of the resource usage in the embodiment of the present invention. Node A comprises Pod A-1 and Pod A-2: the memory usage of Pod A-1 is 20%, its CPU usage is 50%, its disk usage is 10%, and its IO throughput is 30%; the memory usage of Pod A-2 is 25%, its CPU usage is 30%, its disk usage is 80%, and its IO throughput is 30%. It follows that the memory usage of node A is 45%, its CPU usage is 80%, its disk usage is 90%, and its IO throughput is 60%. Node B comprises Pod B-1 and Pod B-2: the memory usage of Pod B-1 is 35%, its CPU usage is 55%, its disk usage is 20%, and its IO throughput is 30%; the memory usage of Pod B-2 is 25%, its CPU usage is 45%, its disk usage is 55%, and its IO throughput is 60%. It follows that the memory usage of node B is 60%, its CPU usage is 100%, its disk usage is 75%, and its IO throughput is 90%.
Table 1: an example of the resource usage in the embodiment of the present invention

Node   Pod       Memory   CPU    Disk   IO throughput
A      Pod A-1   20%      50%    10%    30%
A      Pod A-2   25%      30%    80%    30%
A      (total)   45%      80%    90%    60%
B      Pod B-1   35%      55%    20%    30%
B      Pod B-2   25%      45%    55%    60%
B      (total)   60%      100%   75%    90%
It should be noted that the method for calculating the resource usage of a node shown above is only an example; in other possible implementations, the resource usage of a node may also be calculated from the resource usage of each Pod and the weight of each Pod, which will not be described in detail here.
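The per-node aggregation described above (a plain sum of Pod usage, as in Table 1, or the weighted variant) can be sketched as follows; this is a minimal illustration and the function name and dictionary keys are assumptions for the example.

```python
def node_usage(pod_usages, weights=None):
    """Aggregate per-Pod resource usage into per-node resource usage.

    pod_usages: list of dicts, e.g. {"memory": 20, "cpu": 50, "disk": 10, "io": 30}
    weights: optional per-Pod weights; None means a plain sum, as in Table 1.
    """
    if weights is None:
        weights = [1.0] * len(pod_usages)
    totals = {}
    for pod, weight in zip(pod_usages, weights):
        for resource, amount in pod.items():
            # accumulate each resource type across Pods, optionally weighted
            totals[resource] = totals.get(resource, 0.0) + weight * amount
    return totals
```

With the two Pods of node A from Table 1 this returns 45% memory, 80% CPU, 90% disk, and 60% IO throughput.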
In the embodiment of the invention, the resource usage state can be determined according to the at least one resource usage amount. Specifically, taking the resource usage of node A shown in table 1 as an example, the resource usage state of node A may be determined according to the memory usage, CPU usage, disk usage, and IO throughput of node A. For example, the resource usage state of node A may be calculated by determining whether the number of resource usage amounts in node A that exceed a usage threshold is greater than a count threshold; if so, the resource usage state of node A is determined to be a high-load state, and otherwise a low-load state. As another example, the load amount of node A may be determined from the memory usage, CPU usage, disk usage, and IO throughput of node A, and the resource usage state may then be calculated by determining whether the load amount of node A is greater than a load threshold; if so, the resource usage state of node A is determined to be a high-load state, and otherwise a low-load state.
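Both ways of deriving the resource usage state can be sketched briefly. The threshold values and the use of a plain sum as the load amount are illustrative assumptions; the patent does not fix concrete numbers or the aggregation formula.

```python
def usage_state_by_count(usages, usage_threshold=80, count_threshold=2):
    """First variant: the node is high-load if the number of resource usage
    amounts exceeding `usage_threshold` is greater than `count_threshold`.
    (Threshold defaults are illustrative, not values from the patent.)"""
    exceeded = sum(1 for amount in usages.values() if amount > usage_threshold)
    return "high" if exceeded > count_threshold else "low"

def usage_state_by_load(usages, load_threshold=300):
    """Second variant: derive a single load amount from the usage amounts
    (here a plain sum, an assumed aggregation) and compare it with a
    load threshold."""
    load = sum(usages.values())
    return "high" if load > load_threshold else "low"
```

Applied to Table 1, node B (60% + 100% + 75% + 90% = 325) would be high-load under the second variant with the illustrative threshold of 300, while node A (275) would be low-load.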
Prior to performing step 202, the embodiment of the present invention may preprocess the resource usage data of each node. The preprocessing may include two steps: format conversion and dimension reconstruction. Format conversion means converting the acquired resource usage data into a format that the neural network model can recognize, and it differs according to the type of the resource usage data. For example, a Boolean data field can be converted into a binary value and then used as input; a text data field can be converted by the Bag-of-Words (BoW) method before being used as input; and a numeric data field can keep its original type as the input format.
Further, dimension reconstruction can be performed on the format-converted data, so as to construct dimensions that meet the input requirements of the deep learning model.
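The two preprocessing steps can be sketched as below. The per-field rules follow the text above; the whitespace tokenizer and the sliding-window reshaping are simplifying assumptions for the example.

```python
def convert_field(value):
    """Format conversion per field type: Booleans become binary values,
    text becomes a bag-of-words count vector, numerics keep their type."""
    if isinstance(value, bool):
        return 1 if value else 0
    if isinstance(value, str):
        # Minimal bag-of-words: token -> count (a toy stand-in for BoW)
        counts = {}
        for token in value.split():
            counts[token] = counts.get(token, 0) + 1
        return counts
    return value  # numeric fields are kept as-is

def reshape(samples, window):
    """Dimension reconstruction: slice a flat series of per-period feature
    values into overlapping (window, features) inputs for the model."""
    return [samples[i:i + window] for i in range(len(samples) - window + 1)]
```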
In step 202, fig. 3 shows a schematic structural diagram of a bidirectional recurrent neural network model according to an embodiment of the present invention. The bidirectional recurrent neural network model may include one input layer (Input Layer), three forward neural network layers (Forward Layer), three reverse neural network layers (Backward Layer), two fully connected layers (Fully Connected Layer), and one output layer (Output Layer). When the bidirectional recurrent neural network model performs its calculation, the preprocessed data is fed to the input layer; the input layer's result is passed simultaneously to the forward neural network layers and the reverse neural network layers; after the forward and reverse layers finish their calculations, their results are merged and passed to the fully connected layers; and after the fully connected layers process the data, the prediction is emitted through the output layer.
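The data flow of fig. 3 can be illustrated with a toy numeric sketch: a single scalar recurrent pass in each direction stands in for the three stacked layers, the merge of both directions stands in for the fully connected stage, and the fixed weights are arbitrary placeholders (a real implementation would use trained weight matrices in a deep learning framework).

```python
import math

def birnn_forward(sequence, w_in=0.5, w_rec=0.3, w_out=0.8):
    """Toy sketch of the data flow in fig. 3: the input sequence is run
    through a forward recurrent pass and a backward recurrent pass, the
    two final states are merged, and a dense step produces the output."""
    def recurrent_pass(seq):
        h = 0.0
        for x in seq:
            h = math.tanh(w_in * x + w_rec * h)  # simple recurrent update
        return h

    h_fwd = recurrent_pass(sequence)                   # forward layers
    h_bwd = recurrent_pass(list(reversed(sequence)))   # reverse layers
    merged = h_fwd + h_bwd                             # merge both directions
    return w_out * merged                              # fully connected -> output
```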
Specifically, the bidirectional recurrent neural network model may be trained on the resource usage data of each node in a historical period, where a historical period is a period before the current period. Further, the bidirectional recurrent neural network model may be trained on at least one resource usage amount and the resource usage state of each node in the historical period.
Further, when training the bidirectional recurrent neural network model, the resource usage data can be preprocessed to form a time series of training samples. Specifically, at least one resource usage amount and the resource usage state of each node in each historical period are first obtained; then, at least one resource usage amount and the resource usage state of each node in a first historical period are taken as the input parameters of a training sample, and at least one resource usage amount and the resource usage state of each node in a second historical period are taken as the output parameters of the training sample, where the first historical period is the period immediately preceding the second historical period; finally, the training samples are used to train the bidirectional recurrent neural network model.
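The pairing of consecutive historical periods into training samples can be sketched as follows; the function name is an assumption and each period's data may be any preprocessed feature structure.

```python
def build_samples(period_data):
    """Form training pairs from consecutive historical periods: the data of
    a first period is the input parameter and the data of the period that
    immediately follows it is the output parameter."""
    return [(period_data[i], period_data[i + 1])
            for i in range(len(period_data) - 1)]
```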
In step 203, a corresponding scheduling policy may be generated according to the resource usage data of each node in the next period, and the Pods in each node may then be scheduled according to the scheduling policy, so as to implement reasonable resource scheduling.
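One possible shape of the policy-generation step is sketched below. The concrete rule (migrate away from nodes whose predicted CPU usage exceeds a threshold, toward the least loaded node) is an assumed illustration; the patent leaves the policy itself open.

```python
def generate_policy(predicted, usage_threshold=80):
    """Hypothetical scheduling-policy generation from predicted next-period
    usage: nodes whose predicted CPU usage exceeds the threshold are marked
    for migration to the node with the lowest predicted CPU usage."""
    target = min(predicted, key=lambda node: predicted[node]["cpu"])
    policy = []
    for node, usage in predicted.items():
        if node != target and usage["cpu"] > usage_threshold:
            policy.append({"action": "migrate", "from": node, "to": target})
    return policy
```

The policy issuing module would then deliver such entries to the Kubelet components of the affected nodes.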
Considering that the accuracy of the bidirectional recurrent neural network model may decline as the data volume keeps growing, in the embodiment of the invention the accuracy of the model can be improved by updating it. Specifically, a model update instruction from a user may first be received, and the number of forward neural network layers and/or reverse neural network layers in the bidirectional recurrent neural network model may then be modified according to the instruction. In this way, when the accuracy of the model declines, it can be improved by increasing the number of layers, changing the choice of optimizer, or lowering the learning rate.
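The model update can be sketched as a change to a configuration describing the network, after which the model would be rebuilt and retrained. The instruction format and key names here are assumptions; the patent does not prescribe them.

```python
def update_model_config(config, instruction):
    """Apply a model update instruction by changing the number of forward
    and/or reverse layers (a simple config-dict sketch)."""
    updated = dict(config)  # leave the original configuration untouched
    for key in ("forward_layers", "backward_layers"):
        if key in instruction:
            updated[key] = instruction[key]
    return updated
```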
Based on the same concept, as shown in fig. 4, the task scheduling device provided by the embodiment of the present invention includes a receiving unit 401, a processing unit 402, and a scheduling unit 403, wherein:
a receiving unit 401, configured to obtain resource usage data of each node in a current period;
a processing unit 402, configured to take the resource usage data of each node in the current period as the input of a bidirectional recurrent neural network model and predict the resource usage data of each node in the next period, where the bidirectional recurrent neural network model is trained on resource usage data of each node in a historical period, a historical period being a period before the current period;
and the scheduling unit 403 is configured to schedule the tasks in each node according to the resource usage data of each node in the next period.
In one possible implementation, the resource usage data includes at least one resource usage amount and a resource usage state, the resource usage state being determined according to the at least one resource usage amount;
the bidirectional recurrent neural network model is trained on at least one resource usage amount and the resource usage state of each node in the historical period.
In one possible implementation, the at least one resource usage includes memory usage, CPU usage, disk usage, and IO throughput.
In one possible implementation, the processing unit 402 is specifically configured to:
acquiring at least one resource usage amount and the resource usage state of each node in each historical period;
taking at least one resource usage amount and the resource usage state of each node in a first historical period as the input parameters of a training sample, and taking at least one resource usage amount and the resource usage state of each node in a second historical period as the output parameters of the training sample; the first historical period is the period immediately preceding the second historical period;
and training the bidirectional recurrent neural network model with the training samples.
In one possible implementation, the bidirectional recurrent neural network model includes a forward neural network layer and a reverse neural network layer;
the receiving unit 401 is further configured to receive a model update instruction of a user;
the processing unit 402 is further configured to modify the number of layers of the forward neural network layer and/or the reverse neural network layer in the bidirectional recurrent neural network model according to the model update instruction.
The embodiment of the application also provides a device that implements the task scheduling method described above. The functions may be implemented by hardware executing corresponding software. In one possible design, the apparatus comprises a processor, a transceiver, and a memory; the memory is used to store computer-executable instructions, the transceiver is used for communication between the device and other communication entities, and the processor is connected to the memory through a bus. When the device runs, the processor executes the computer-executable instructions stored in the memory, causing the device to perform the task scheduling method described above.
Embodiments of the present invention also provide a computer storage medium having stored therein a software program which, when read and executed by one or more processors, implements the task scheduling method described in the various possible implementations described above.
Embodiments of the present invention also provide a computer program product containing instructions that, when run on a computer, cause the computer to perform the task scheduling method described in the various possible implementations described above.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.
Claims (8)
1. A method of task scheduling, the method comprising:
acquiring resource usage data of each node in a current period, wherein the resource usage data comprises at least one resource usage amount and a resource usage state; the resource usage state is determined from the at least one resource usage amount, by comparing the number of resource usage amounts in the node that exceed a usage threshold against a count threshold; the at least one resource usage amount includes memory usage, CPU usage, disk usage, and IO throughput;
taking the resource usage data of each node in the current period as the input of a bidirectional recurrent neural network model, and predicting the resource usage data of each node in the next period; the bidirectional recurrent neural network model is trained on the resource usage data of each node in a history period, where the history period is a period before the current period;
and scheduling the tasks on each node according to the resource usage data of each node in the next period.
2. The method of claim 1, wherein training the bidirectional recurrent neural network model on the resource usage data of each node in the history period comprises:
acquiring at least one resource usage amount and the resource usage state of each node in each history period;
taking at least one resource usage amount and the resource usage state of each node in a first history period as the input parameters of a training sample, and taking at least one resource usage amount and the resource usage state of each node in a second history period as the output parameters of the training sample, where the first history period is the period immediately preceding the second history period;
and training a bidirectional recurrent neural network with the training samples to obtain the bidirectional recurrent neural network model.
3. The method according to claim 1 or 2, wherein the bidirectional recurrent neural network model comprises a forward neural network layer and a reverse neural network layer;
the method further comprises:
receiving a model update instruction from a user;
and modifying the number of layers of the forward neural network layer and/or the reverse neural network layer in the bidirectional recurrent neural network model according to the model update instruction.
4. A task scheduling device, the device comprising:
a receiving unit, configured to acquire resource usage data of each node in a current period, where the resource usage data includes at least one resource usage amount and a resource usage state; the resource usage state is determined from the at least one resource usage amount, by comparing the number of resource usage amounts in the node that exceed a usage threshold against a count threshold; the at least one resource usage amount includes memory usage, CPU usage, disk usage, and IO throughput;
a processing unit, configured to take the resource usage data of each node in the current period as the input of a bidirectional recurrent neural network model and predict the resource usage data of each node in the next period; the bidirectional recurrent neural network model is trained on the resource usage data of each node in a history period, where the history period is a period before the current period;
and a scheduling unit, configured to schedule the tasks on each node according to the resource usage data of each node in the next period.
5. The apparatus of claim 4, wherein the processing unit is specifically configured to:
acquiring at least one resource usage amount and the resource usage state of each node in each history period;
taking at least one resource usage amount and the resource usage state of each node in a first history period as the input parameters of a training sample, and taking at least one resource usage amount and the resource usage state of each node in a second history period as the output parameters of the training sample, where the first history period is the period immediately preceding the second history period;
and training a bidirectional recurrent neural network with the training samples to obtain the bidirectional recurrent neural network model.
6. The apparatus of claim 4 or 5, wherein the bidirectional recurrent neural network model comprises a forward neural network layer and a reverse neural network layer;
the receiving unit is further configured to receive a model update instruction from a user;
the processing unit is further configured to modify the number of layers of the forward neural network layer and/or the reverse neural network layer in the bidirectional recurrent neural network model according to the model update instruction.
7. A computer readable storage medium storing instructions which, when run on a computer, cause the computer to carry out the method of any one of claims 1 to 3.
8. A computer device, comprising:
a memory for storing program instructions;
a processor, configured to invoke the program instructions stored in the memory and execute the method according to any one of claims 1 to 3.
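The scheduling step in claim 1 can be sketched as follows. This is an illustrative policy under assumptions, not the claimed implementation: the node names, the busy/idle state encoding, and the "drain busy nodes to the least-loaded idle node" rule are hypothetical choices for the example.

```python
# Illustrative sketch (assumed policy): tasks queued on nodes whose
# predicted next-period state is "busy" (1) are reassigned to the
# predicted least-loaded "idle" (0) node.

def schedule(predicted, tasks):
    """predicted: {node: (avg_usage, state)} for the next period.
    tasks: {node: [task, ...]}. Returns a new placement that drains
    busy nodes; the inputs are not mutated."""
    idle = sorted((u, n) for n, (u, s) in predicted.items() if s == 0)
    placement = {n: list(ts) for n, ts in tasks.items()}
    if not idle:
        return placement  # nowhere to move tasks; keep placement as-is
    for node, (usage, state) in predicted.items():
        if state == 1 and placement.get(node):
            _, target = idle[0]  # least-loaded idle node
            placement.setdefault(target, []).extend(placement[node])
            placement[node] = []
    return placement

predicted = {"n1": (0.9, 1), "n2": (0.3, 0)}
tasks = {"n1": ["t1", "t2"], "n2": []}
print(schedule(predicted, tasks))  # tasks move from busy n1 to idle n2
```

A production scheduler would also weigh task sizes and migration cost; the point here is only that the decision is driven by the predicted, not the current, resource usage data.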
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811376902.XA | 2018-11-19 | 2018-11-19 | Task scheduling method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111198754A CN111198754A (en) | 2020-05-26 |
CN111198754B (en) | 2023-07-14 |
Family
ID=70743876
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811376902.XA | Task scheduling method and device | 2018-11-19 | 2018-11-19 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111198754B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112000450A (en) * | 2020-08-18 | 2020-11-27 | China UnionPay Co., Ltd. | Neural network architecture search method and device |
CN112667398B (en) * | 2020-12-28 | 2023-09-01 | Beijing QIYI Century Science & Technology Co., Ltd. | Resource scheduling method and device, electronic equipment and storage medium |
CN114816690A (en) * | 2021-01-29 | 2022-07-29 | China Mobile (Suzhou) Software Technology Co., Ltd. | Task allocation method, device, equipment and storage medium |
CN115604273A (en) * | 2021-06-28 | 2023-01-13 | EMC IP Holding Company LLC (US) | Method, apparatus and program product for managing computing systems |
US11868812B2 (en) | 2021-08-12 | 2024-01-09 | International Business Machines Corporation | Predictive scaling of container orchestration platforms |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103699440A (en) * | 2012-09-27 | 2014-04-02 | Beijing Sohu New Media Information Technology Co., Ltd. | Method and device for a cloud computing platform system to allocate resources to tasks |
CN107613030A (en) * | 2017-11-06 | 2018-01-19 | Wangsu Science & Technology Co., Ltd. | Method and system for processing service requests |
CN107729126A (en) * | 2016-08-12 | 2018-02-23 | China Mobile Group Zhejiang Co., Ltd. | Task scheduling method and device for a container cloud |
CN107734035A (en) * | 2017-10-17 | 2018-02-23 | South China University of Technology | Automatic scaling method for a virtual cluster in a cloud computing environment |
WO2018168521A1 (en) * | 2017-03-14 | 2018-09-20 | Omron Corporation | Learning result identifying apparatus, learning result identifying method, and program therefor |
CN108563755A (en) * | 2018-04-16 | 2018-09-21 | Liaoning Technical University | Personalized recommendation system and method based on a bidirectional recurrent neural network |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111198754B (en) | Task scheduling method and device | |
CN109491790B (en) | Container-based industrial Internet of things edge computing resource allocation method and system | |
US11521067B2 (en) | Decentralized distributed deep learning | |
US9916183B2 (en) | Scheduling mapreduce jobs in a cluster of dynamically available servers | |
US10460241B2 (en) | Server and cloud computing resource optimization method thereof for cloud big data computing architecture | |
US8595735B2 (en) | Holistic task scheduling for distributed computing | |
CN103309946B (en) | Multimedia file processing method, Apparatus and system | |
CN105900064A (en) | Method and apparatus for scheduling data flow task | |
US20150295970A1 (en) | Method and device for augmenting and releasing capacity of computing resources in real-time stream computing system | |
CN109726004B (en) | Data processing method and device | |
CN108270805B (en) | Resource allocation method and device for data processing | |
Choi et al. | pHPA: A proactive autoscaling framework for microservice chain | |
CN109634714B (en) | Intelligent scheduling method and device | |
CN106293947B (en) | GPU-CPU (graphics processing Unit-Central processing Unit) mixed resource allocation system and method in virtualized cloud environment | |
WO2016101799A1 (en) | Service allocation method and device based on distributed system | |
WO2023116067A1 (en) | Power service decomposition method and system for 5g cloud-edge-end collaboration | |
CN111985831A (en) | Scheduling method and device of cloud computing resources, computer equipment and storage medium | |
CN103942108A (en) | Resource parameter optimization method under Hadoop homogenous cluster | |
CN114301980A (en) | Method, device and system for scheduling container cluster and computer readable medium | |
CN115827250A (en) | Data storage method, device and equipment | |
CN116702907A (en) | Server-unaware large language model reasoning system, method and equipment | |
WO2022026044A1 (en) | Sharing of compute resources between the virtualized radio access network (vran) and other workloads | |
CN117972367A (en) | Data storage prediction method, data storage subsystem and intelligent computing platform | |
US20220417175A1 (en) | Method and system for resource governance in a multi-tenant system | |
CN116402318B (en) | Multi-stage computing power resource distribution method and device for power distribution network and network architecture |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||