CN116974768A - Calculation power scheduling method based on deep learning - Google Patents
- Publication number
- CN116974768A CN116974768A CN202311016992.2A CN202311016992A CN116974768A CN 116974768 A CN116974768 A CN 116974768A CN 202311016992 A CN202311016992 A CN 202311016992A CN 116974768 A CN116974768 A CN 116974768A
- Authority
- CN
- China
- Prior art keywords
- module
- model
- task
- resource
- deep learning
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F9/5083—Techniques for rebalancing the load in a distributed system
- G06F9/5038—Allocation of resources to service a request, considering the execution order of a plurality of tasks, e.g. priority or time-dependency constraints
- G06F9/5044—Allocation of resources to service a request, considering hardware capabilities
- G06F9/505—Allocation of resources to service a request, considering the load
- G06N3/08—Neural networks; learning methods
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- G06N3/0442—Recurrent networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
- G06N3/0464—Convolutional networks [CNN, ConvNet]
- G06F2209/5021—Indexing scheme relating to G06F9/50: priority
- G06F2209/508—Indexing scheme relating to G06F9/50: monitor
Abstract
The invention discloses a computational power scheduling method based on deep learning. The method is carried out by a computational power scheduling system comprising a data acquisition module, a deep learning module and a strategy construction module. The data acquisition module acquires various data on nodes, loads and tasks; the deep learning module trains resource allocation strategies using various deep learning models; and the strategy construction module optimizes node performance and the resource allocation strategy. The data acquisition module comprises a task information module, a computing resource information module, a task operation information module, a resource operation information module, a load information collection module and a performance index collection module. The task information module acquires the basic information of tasks, the computing resource information module acquires the basic information of computing resources, and the task operation information module acquires the operation state information of tasks on the computing resources.
Description
Technical Field
The invention relates to the technical field of distributed computing, in particular to a computational power scheduling method based on deep learning.
Background
In the field of distributed computing, many studies and implementations have aimed at improving system performance and resource utilization. The most common approaches are static resource allocation and dynamic resource scheduling. Static resource allocation fixes the resource allocation scheme before system start-up and makes no adjustments during operation. Dynamic resource scheduling adjusts the resource allocation scheme during operation according to factors such as task load and system state.
In the prior art, the disadvantage of static resource allocation is low resource utilization, because the allocation cannot be adjusted to the actual load. Dynamic resource scheduling can adjust to the load, but it often relies on heuristic algorithms and suffers from problems such as inaccurate scheduling decisions and overly long scheduling times. It is therefore necessary to design a computational power scheduling method based on deep learning that supports dynamic adjustment.
Disclosure of Invention
The invention aims to provide a computational power scheduling method based on deep learning, which aims to solve the problems in the background technology.
In order to solve the above technical problems, the invention provides the following technical scheme: a computational power scheduling method based on deep learning, carried out by a computational power scheduling system comprising a data acquisition module, a deep learning module and a strategy construction module. The data acquisition module acquires various data on nodes, loads and tasks; the deep learning module trains resource allocation strategies using various deep learning models; and the strategy construction module optimizes node performance and the resource allocation strategy.
According to the above technical scheme, the data acquisition module comprises a task information module, a computing resource information module, a task operation information module, a resource operation information module, a load information collection module and a performance index collection module. The task information module acquires the basic information of a task; the computing resource information module acquires the basic information of a computing resource; the task operation information module acquires the operation state information of a task on a computing resource; the resource operation information module acquires the operation state information of a computing resource; the load information collection module collects the load information of the computing resources; and the performance index collection module acquires the performance indexes of the computing resources;
the deep learning module comprises a data preparation module, a data division module, a model selection module, a model construction module, a model training module, a model evaluation module and a model tuning module. The data preparation module is electrically connected with the data acquisition module and converts data into a format suitable for processing by a deep learning algorithm; the data division module divides the data into a training set and a test set; the model selection module selects a suitable deep learning algorithm model; the model construction module builds the corresponding deep learning model; the model training module trains the model over repeated iterations; the model evaluation module evaluates the trained model; and the model tuning module tunes the model;
the strategy construction module comprises a model application module, a task scheduling module, a resource allocation module, a task monitoring module and a resource adjustment module. The model construction module is electrically connected with the model application module. The model application module applies the model to the data; the task scheduling module schedules tasks according to their priorities and types; the resource allocation module allocates computing resources according to the prediction results; the task monitoring module monitors the execution of tasks in real time; and the resource adjustment module dynamically adjusts the allocation of resources.
According to the technical scheme, the method comprises the following specific steps:
s1, acquiring node performance, load conditions and task types by arranging a data acquisition module in a system to form a data set for subsequent model training;
s2, establishing a model of node performance and resource allocation strategy by using a deep learning algorithm, taking the acquired data set as input, and training by using the deep learning algorithm such as a neural network and the like so as to obtain a prediction model;
s3, in the actual resource scheduling process, predicting and optimizing the node performance and the resource allocation strategy by using the trained model, predicting the optimal resource allocation strategy by using the deep learning model according to the performance and the load condition of the current node, and applying the optimal resource allocation strategy to the actual resource scheduling process.
According to the above technical solution, the step S1 includes the following specific steps:
s1-1, task information is acquired: the method comprises the steps of task names, task types, task sizes and task priorities;
s1-2, acquiring computing resource information: the method comprises the steps of numbering of the computing nodes, the types of the computing nodes and the states of the computing nodes;
s1-3, acquiring task running state information: the method comprises the steps of starting a task, ending the task, occupying rate of a task CPU and occupying rate of a task memory;
s1-4, acquiring operation state information of computing resources: the method comprises the steps of calculating the CPU utilization rate of the node, the memory utilization rate of the node and the network bandwidth utilization rate of the node;
s1-5, load information is acquired: the load information of the computing resources is collected, wherein the load information comprises the number of tasks currently executed by the computing resources and the total number of tasks currently executed by the computing resources, and the load information can help a dispatching system to better balance the load of the computing resources;
s1-6, obtaining performance indexes: including the processor speed of the compute node, the memory size of the compute node, the storage capacity of the compute node, which may help the scheduling system to better evaluate the performance of the computing resources.
According to the above technical solution, the step S2 includes the following specific steps:
s2-1, data preparation: preprocessing and cleaning the collected task data and computing resource data, and removing invalid data and abnormal data, wherein the steps include converting the data into numerical values and carrying out normalization processing;
s2-2, data division: dividing data into a training set and a testing set according to a certain proportion by adopting a random division method, wherein 70% of data are usually used for training a model, and 30% of data are used for testing the model;
s2-3, model selection: selecting a proper deep learning algorithm model according to the required model type and model performance requirements, wherein the proper deep learning algorithm model comprises a convolutional neural network CNN, a cyclic neural network RNN and a long-short-term memory network LSTM;
s2-4, model construction: constructing a corresponding deep learning model according to the selected model type, wherein the deep learning model comprises an input layer, a hidden layer and an output layer, and setting corresponding parameters comprising learning rate and a loss function;
s2-5, model training: inputting the training set into the constructed model for training, and repeatedly iterating to continuously optimize parameters of the model until the loss function on the training set reaches the minimum value;
s2-6, model evaluation: evaluating the trained model by using a test set, and calculating the accuracy P and recall R, F1 values of the model to evaluate the performance and generalization capability of the model;
s2-7, model tuning: and (3) optimizing the model according to the evaluation result and the actual application requirement, wherein the model parameter is adjusted, the number of layers of the neural network is increased or reduced, and the number of neurons is increased or reduced.
According to the above technical solution, the step S3 includes the following specific steps:
s3-1, application of a prediction model: inputting the acquired data into a trained prediction model, and predicting task and resource conditions in a future period according to a prediction result;
s3-2, task scheduling: scheduling the tasks according to the priorities and types according to the prediction results, and determining the execution sequence and time of the tasks so that the tasks can be completed in the shortest time;
s3-3, resource allocation: according to the prediction result, computing resources are allocated, and idle resources are allocated to resources waiting for executing tasks, so that the utilization rate of the computing resources is improved;
s3-4, task monitoring: the execution condition of the task is monitored in real time, and the state of the task is recorded and fed back so as to carry out subsequent task adjustment and optimization;
s3-5, resource adjustment: and dynamically adjusting the allocation condition of the resources according to the execution condition of the tasks and the actual resource condition so as to achieve optimal resource utilization and task execution efficiency.
According to the above technical scheme, in step S2-6 the F1 value is calculated as follows. When evaluating the trained model on the test set, the model makes a prediction from the earlier data and proposes a resource allocation scheme; the later data is then fed in, and existing algorithms are used to derive dynamic resource allocation schemes. If the resource utilization of any other dynamic allocation scheme exceeds that of the scheme proposed by the model, that best utilization is recorded as the optimum. For each group of data, the number of times the model's resource allocation scheme reaches the optimal resource utilization is counted as TP, and the number of times it fails to reach the optimal resource utilization is counted as FP, from which the accuracy rate P is obtained. If the similarity between the per-node resource allocations of the optimal dynamic scheme and of the scheme proposed by the model falls below a threshold, the case is counted towards FN, which is used to calculate the recall.
Accuracy rate P: P = TP / (TP + FP)
Recall ratio R: R = TP / (TP + FN)
F1 value: F1 = 2 × P × R / (P + R)
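With TP, FP and FN counted as described in S2-6, the accuracy rate P = TP/(TP+FP), recall R = TP/(TP+FN) and F1 = 2PR/(P+R) follow directly; a small self-contained example:

```python
def f1_metrics(tp, fp, fn):
    # Accuracy rate P = TP / (TP + FP); recall R = TP / (TP + FN);
    # F1 = 2PR / (P + R), the harmonic mean of the two.
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    return p, r, 2 * p * r / (p + r)

# Hypothetical counts: the model's scheme hit the optimal utilization
# 80 times (TP), missed it 20 times (FP), and fell below the
# similarity threshold 20 times (FN).
p, r, f1 = f1_metrics(tp=80, fp=20, fn=20)
print(p, r, round(f1, 6))  # -> 0.8 0.8 0.8
```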
According to the above technical scheme, in step S2-5 the specific method of model training is as follows: the learning rate is raised by continuously replacing the learned data and optimizing the parameters of the model. As the learning rate rises, more data is brought into the training range; on the one hand, the amount of resources consumed by task operation increases in proportion, and on the other hand, the amount of resources occupied by the resource allocation work itself increases, so the amount of resources remaining for task operation decreases and the benefit of further raising the learning rate shrinks, until the learning rate reaches its theoretical maximum and the remaining resource amount of the computing node is zero. In the corresponding formula, A is the loss function, namely the amount of resources remaining in the computing node when training with the current model; A₀ is the total amount of resources possessed by a given computing node; A₁ is the amount of resources occupied by the resource allocation work itself; y₁ is the learning rate; and y₀ is the theoretical maximum of the learning rate.
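The closed form of the formula is not reproduced in the text. One reading consistent with the stated boundary condition (the remaining resource amount reaches zero when the learning rate y₁ reaches its theoretical maximum y₀) is the linear relation sketched below; this is an assumption for illustration, not the patent's actual formula:

```python
def remaining_resources(a0, a1, y1, y0):
    # Hypothetical reading of the relation: the resources left for task
    # operation shrink linearly as the learning rate y1 approaches its
    # theoretical maximum y0, reaching zero at y1 = y0.
    # a0: total resources of the node; a1: resources occupied by the
    # resource allocation work itself. The closed form is an assumption.
    return (a0 - a1) * (1 - y1 / y0)

print(remaining_resources(a0=100.0, a1=10.0, y1=0.5, y0=1.0))  # -> 45.0
```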
Compared with the prior art, the invention has the following beneficial effects: relative to a traditional heuristic algorithm, the deep learning algorithm computes faster, so resource scheduling can be completed in a shorter time; and the prediction of node performance and the resource scheduling are carried out automatically, reducing the need for manual intervention and making the method more intelligent and automated.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate the invention and together with the embodiments of the invention, serve to explain the invention. In the drawings:
fig. 1 is a schematic view of the overall module structure of the present invention.
Detailed Description
The following is a clear and complete description of the technical solutions in the embodiments of the present invention with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without inventive effort fall within the scope of the invention.
Referring to fig. 1, the present invention provides the technical solution described above: a computational power scheduling method based on deep learning, carried out by a computational power scheduling system comprising a data acquisition module, a deep learning module and a strategy construction module, whose sub-modules and working steps are as set out in the preceding sections.
the data acquisition module comprises a task information module, a computing resource information module, a task operation information module, a resource operation information module, a load information collection module and a performance index collection module, wherein the task information module is used for acquiring basic information of a task, the computing resource information module is used for acquiring basic information of a computing resource, the task operation information module is used for acquiring operation state information of the task on the computing resource, the load information collection module is used for acquiring operation state information of the computing resource, and the performance index collection module is used for acquiring performance indexes of the computing resource;
the deep learning module comprises a data preparation module, a data dividing module, a model selecting module, a model building module, a model training module, a model evaluating module and a model tuning module, wherein the data preparation module is electrically connected with the data acquisition module, the data preparation module is used for converting data into a format suitable for processing of a deep learning algorithm, the data dividing module is used for dividing the data into a training set and a testing set, the model selecting module is used for selecting a suitable deep learning algorithm model, the model building module is used for building a corresponding deep learning model, the model training module is used for training the model for repeated iteration, the model evaluating module is used for evaluating the trained model, and the model tuning module is used for tuning the model;
the strategy construction module comprises a model application module, a task scheduling module, a resource allocation module, a task monitoring module and a resource adjustment module, wherein the model construction module is electrically connected with the model application module, the model application module is used for applying a model on data, the task scheduling module is used for scheduling tasks according to priorities and types, the resource allocation module is used for allocating computing resources according to prediction results, the task monitoring module is used for monitoring the execution condition of the tasks in real time, and the resource adjustment module is used for dynamically adjusting the allocation condition of the resources;
the method comprises the following specific steps:
s1, acquiring node performance, load conditions and task types by arranging a data acquisition module in a system to form a data set for subsequent model training;
s2, establishing a model of node performance and resource allocation strategy by using a deep learning algorithm, taking the acquired data set as input, and training by using the deep learning algorithm such as a neural network and the like so as to obtain a prediction model;
s3, in the actual resource scheduling process, predicting and optimizing the node performance and the resource allocation strategy by using the trained model, and according to the performance and the load condition of the current node, predicting the optimal resource allocation strategy by using the deep learning model and applying the optimal resource allocation strategy to the actual resource scheduling process;
the step S1 includes the following specific steps:
s1-1, task information is acquired: the method comprises the steps of task names, task types, task sizes and task priorities;
s1-2, acquiring computing resource information: the method comprises the steps of numbering of the computing nodes, the types of the computing nodes and the states of the computing nodes;
s1-3, acquiring task running state information: the method comprises the steps of starting a task, ending the task, occupying rate of a task CPU and occupying rate of a task memory;
s1-4, acquiring operation state information of computing resources: the method comprises the steps of calculating the CPU utilization rate of the node, the memory utilization rate of the node and the network bandwidth utilization rate of the node;
s1-5, load information is acquired: the load information of the computing resources is collected, wherein the load information comprises the number of tasks currently executed by the computing resources and the total number of tasks currently executed by the computing resources, and the load information can help a dispatching system to better balance the load of the computing resources;
s1-6, obtaining performance indexes: the method comprises the steps of calculating the processor speed of the node, the memory size of the node and the storage capacity of the node, and the information can help a dispatching system to better evaluate the performance of the computing resource;
the step S2 includes the following specific steps:
s2-1, data preparation: preprocessing and cleaning the collected task data and computing resource data, and removing invalid data and abnormal data, wherein the steps include converting the data into numerical values and carrying out normalization processing;
s2-2, data division: dividing data into a training set and a testing set according to a certain proportion by adopting a random division method, wherein 70% of data are usually used for training a model, and 30% of data are used for testing the model;
s2-3, model selection: selecting a proper deep learning algorithm model according to the required model type and model performance requirements, wherein the proper deep learning algorithm model comprises a convolutional neural network CNN, a cyclic neural network RNN and a long-short-term memory network LSTM;
s2-4, model construction: constructing a corresponding deep learning model according to the selected model type, wherein the deep learning model comprises an input layer, a hidden layer and an output layer, and setting corresponding parameters comprising learning rate and a loss function;
s2-5, model training: inputting the training set into the constructed model for training, and repeatedly iterating to continuously optimize parameters of the model until the loss function on the training set reaches the minimum value;
s2-6, model evaluation: evaluating the trained model by using a test set, and calculating the accuracy P and recall R, F1 values of the model to evaluate the performance and generalization capability of the model;
s2-7, model tuning: according to the evaluation result and the actual application requirement, the model is optimized, including adjusting model parameters, increasing or decreasing the number of layers of the neural network, and increasing or decreasing the number of neurons;
the step S3 includes the following specific steps:
s3-1, application of a prediction model: inputting the acquired data into a trained prediction model, and predicting task and resource conditions in a future period according to a prediction result;
s3-2, task scheduling: scheduling the tasks according to the priorities and types according to the prediction results, and determining the execution sequence and time of the tasks so that the tasks can be completed in the shortest time;
s3-3, resource allocation: computing resources are allocated according to the prediction result, and idle resources are assigned to tasks waiting for execution, so as to improve the utilization rate of the computing resources;
s3-4, task monitoring: the execution condition of the task is monitored in real time, and the state of the task is recorded and fed back so as to carry out subsequent task adjustment and optimization;
s3-5, resource adjustment: according to the execution condition of the task and the actual resource condition, dynamically adjusting the allocation condition of the resource to achieve the optimal resource utilization and task execution efficiency;
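A minimal sketch of S3-2 and S3-3 is a greedy loop: pop tasks in priority order and place each on the node with the most idle capacity. The task tuples, node names and the simple capacity model are illustrative assumptions, not the patent's actual scheduler.

```python
import heapq

def schedule(tasks, free):
    """Greedy sketch of S3-2 (priority-ordered task scheduling) and
    S3-3 (assigning idle resources to waiting tasks).
    tasks: (priority, name, cost) tuples; a lower number means higher priority.
    free:  dict mapping node name -> free capacity (mutated in place)."""
    heap = list(tasks)
    heapq.heapify(heap)                     # pop tasks in priority order
    plan = []
    while heap:
        _, name, cost = heapq.heappop(heap)
        node = max(free, key=free.get)      # node with the most idle capacity
        if free[node] < cost:
            continue                        # even the freest node cannot hold it
        free[node] -= cost
        plan.append((name, node))
    return plan

plan = schedule([(1, "a", 4), (3, "c", 2), (2, "b", 5)], {"n1": 6, "n2": 5})
```

S3-4 and S3-5 would then watch the running tasks and call such a routine again with refreshed `free` capacities.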
in the step S2-6, the calculation method of the F1 value is as follows: when the trained model is evaluated with the test set, the model first makes a prediction from the earlier portion of each data group and proposes a resource allocation scheme; the later portion of the data is then fed in, and existing algorithms are used to derive alternative dynamic resource allocation schemes. If any alternative scheme achieves a higher resource utilization rate than the scheme proposed by the model, that best utilization rate is recorded as the optimum. Over all data groups, TP counts the cases in which the model's resource allocation scheme reaches the optimal resource utilization rate and FP counts the cases in which it does not, giving the accuracy rate P; and whenever the similarity between the optimal dynamic scheme and the model's scheme, in the resources allocated at each computing node, falls below a threshold, the case is added to the recall count FN.
Accuracy rate P: P = TP / (TP + FP)
Recall ratio R: R = TP / (TP + FN)
F1 value: F1 = 2 × P × R / (P + R)
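With TP, FP and FN counted as described, the three S2-6 metrics reduce to a few lines (these are the standard precision/recall/F1 definitions; the function name is ours):

```python
def evaluate(tp, fp, fn):
    """S2-6 metrics: accuracy rate P, recall ratio R and F1 value."""
    p = tp / (tp + fp)              # share of model schemes that were optimal
    r = tp / (tp + fn)              # share not lost to low-similarity cases
    return p, r, 2 * p * r / (p + r)

p, r, f1 = evaluate(tp=8, fp=2, fn=2)
```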
in the step S2-5, the specific method for model training is as follows: the learning rate is improved by continuously replacing the learned data and optimizing the parameters of the model, more data are brought into the training range along with the improvement of the learning rate, on one hand, the consumed resource amount of task operation is increased in proportion, on the other hand, the resource amount occupied by the calculation resource allocation work per se is increased, the residual resource amount for task operation is reduced, the influence weight caused by the improvement of the learning rate is reduced, until the learning rate reaches the theoretical maximum value, and the residual resource amount of the calculation node is zero; the formula is
Wherein A is a loss function, namely the amount of resources remaining in the calculation node is calculated by training by adopting the current model, A 0 For the total resource amount possessed by a certain computing node, A 1 For calculating the amount of resources occupied by the resource allocation work itself, y 1 To learn rate, y 0 Is the theoretical maximum of the learning rate.
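The formula itself appears as an image in the original filing and does not survive in this text. Purely as an assumption, one form consistent with the variables just listed and with the stated boundary condition — remaining resources reach zero when y1 reaches y0 — would be A = (A0 − A1)(1 − y1/y0):

```python
def remaining_resources(a0, a1, y1, y0):
    """Hypothetical reconstruction of the patent's missing formula:
    A = (A0 - A1) * (1 - y1 / y0).
    Chosen only because it satisfies the stated boundary condition
    (A = 0 when y1 = y0); the real formula in the filing may differ."""
    return (a0 - a1) * (1 - y1 / y0)
```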
In order to improve the accuracy and predictive power of the model, note that node performance and load conditions change continuously while the system runs, so the deep learning model needs to be updated continuously. Specifically, the model is retrained and updated with new data sets, which improves its accuracy and adaptability: a model update retrains the existing prediction model on newly collected data.
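The continuous-update loop described above can be sketched as a sliding window that retrains once enough new monitoring samples have arrived. The class name, window sizes and the pluggable `train_fn` are illustrative, not part of the patent.

```python
from collections import deque

class RollingRetrainer:
    """Keep a window of recent samples; retrain the prediction model
    whenever `refresh` new samples have accumulated."""
    def __init__(self, train_fn, window=1000, refresh=100):
        self.train_fn = train_fn            # rebuilds a model from a sample list
        self.window = deque(maxlen=window)  # most recent samples only
        self.refresh = refresh
        self.pending = 0
        self.model = None

    def observe(self, sample):
        self.window.append(sample)
        self.pending += 1
        if self.pending >= self.refresh:    # enough new data: retrain and swap in
            self.model = self.train_fn(list(self.window))
            self.pending = 0
        return self.model

# demo: a "model" that is just the size of the data it was trained on
updater = RollingRetrainer(train_fn=len, refresh=100)
for i in range(250):
    updater.observe(i)
# the model was rebuilt at samples 100 and 200, so it last saw 200 samples
```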
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Finally, it should be noted that the foregoing description covers only preferred embodiments of the present invention and is not limiting. Although the invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that the technical features described therein may still be modified or replaced by equivalents. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention shall be included in its protection scope.
Claims (8)
1. A calculation power scheduling method based on deep learning is characterized in that: the method adopts a computational power scheduling system to work, the computational power scheduling system comprises a data acquisition module, a deep learning module and a strategy construction module, the data acquisition module is used for acquiring various data of nodes, loads and tasks, the deep learning module is used for training a resource allocation strategy by utilizing various deep learning models, and the strategy construction module is used for optimizing the performance of the nodes and the resource allocation strategy.
2. The method for computing power scheduling based on deep learning according to claim 1, wherein the method comprises the following steps: the data acquisition module comprises a task information module, a computing resource information module, a task operation information module, a resource operation information module, a load information collection module and a performance index collection module, wherein the task information module is used for acquiring basic information of a task, the computing resource information module is used for acquiring basic information of a computing resource, the task operation information module is used for acquiring operation state information of the task on the computing resource, the load information collection module is used for acquiring operation state information of the computing resource, and the performance index collection module is used for acquiring performance indexes of the computing resource;
the deep learning module comprises a data preparation module, a data dividing module, a model selecting module, a model constructing module, a model training module, a model evaluating module and a model tuning module, wherein the data preparation module is electrically connected with the data acquisition module, the data preparation module is used for converting data into a format suitable for processing of a deep learning algorithm, the data dividing module is used for dividing the data into a training set and a testing set, the model selecting module is used for selecting a suitable deep learning algorithm model, the model constructing module is used for constructing a corresponding deep learning model, the model training module is used for training the model for repeated iteration, the model evaluating module is used for evaluating the trained model, and the model tuning module is used for tuning the model;
the strategy construction module comprises a model application module, a task scheduling module, a resource allocation module, a task monitoring module and a resource adjustment module, wherein the model construction module is electrically connected with the model application module, the model application module is used for applying a model on data, the task scheduling module is used for scheduling tasks according to priorities and types, the resource allocation module is used for allocating computing resources according to prediction results, the task monitoring module is used for monitoring execution conditions of the tasks in real time, and the resource adjustment module is used for dynamically adjusting allocation conditions of the resources.
3. The method for computing power scheduling based on deep learning according to claim 2, wherein the method comprises the following steps: the method comprises the following specific steps:
s1, acquiring node performance, load conditions and task types by arranging a data acquisition module in a system to form a data set for subsequent model training;
s2, establishing a model of node performance and resource allocation strategy by using a deep learning algorithm: taking the acquired data set as input and training with a deep learning algorithm such as a neural network, so as to obtain a prediction model;
s3, in the actual resource scheduling process, predicting and optimizing the node performance and the resource allocation strategy by using the trained model, predicting the optimal resource allocation strategy by using the deep learning model according to the performance and the load condition of the current node, and applying the optimal resource allocation strategy to the actual resource scheduling process.
4. The deep learning-based computational power scheduling method according to claim 3, wherein: the step S1 includes the following specific steps:
s1-1, acquiring task information: including the task name, task type, task size and task priority;
s1-2, acquiring computing resource information: including the computing node number, computing node type and computing node state;
s1-3, acquiring task running state information: including the task start time, task end time, task CPU occupancy rate and task memory occupancy rate;
s1-4, acquiring computing resource running state information: including the CPU utilization rate, memory utilization rate and network bandwidth utilization rate of the computing node;
s1-5, acquiring load information: collecting load information of the computing resource, including the number of tasks currently being executed by the computing resource and the total number of tasks on the computing resource;
s1-6, acquiring performance indexes: including the processor speed, memory size and storage capacity of the computing node.
5. The deep learning-based computational power scheduling method as defined in claim 4, wherein: the step S2 includes the following specific steps:
s2-1, data preparation: preprocessing and cleaning the collected task data and computing resource data, and removing invalid data and abnormal data, wherein the steps include converting the data into numerical values and carrying out normalization processing;
s2-2, data division: dividing data into a training set and a testing set according to a certain proportion by adopting a random dividing method;
s2-3, model selection: selecting a suitable deep learning algorithm model according to the required model type and performance requirements, including a convolutional neural network (CNN), a recurrent neural network (RNN) and a long short-term memory network (LSTM);
s2-4, model construction: constructing a corresponding deep learning model according to the selected model type, wherein the deep learning model comprises an input layer, a hidden layer and an output layer, and setting corresponding parameters comprising learning rate and a loss function;
s2-5, model training: inputting the training set into the constructed model for training, and repeatedly iterating to continuously optimize parameters of the model until the loss function on the training set reaches the minimum value;
s2-6, model evaluation: evaluating the trained model with the test set, and calculating the accuracy rate P, recall ratio R and F1 value of the model to assess its performance and generalization capability;
s2-7, model tuning: and (3) optimizing the model according to the evaluation result and the actual application requirement, wherein the model parameter is adjusted, the number of layers of the neural network is increased or reduced, and the number of neurons is increased or reduced.
6. The method for computing power scheduling based on deep learning according to claim 5, wherein: the step S3 includes the following specific steps:
s3-1, application of a prediction model: inputting the acquired data into a trained prediction model, and predicting task and resource conditions in a future period according to a prediction result;
s3-2, task scheduling: scheduling the tasks according to the priorities and types according to the prediction results, and determining the execution sequence and time of the tasks so that the tasks can be completed in the shortest time;
s3-3, resource allocation: computing resources are allocated according to the prediction result, and idle resources are assigned to tasks waiting for execution, so as to improve the utilization rate of the computing resources;
s3-4, task monitoring: the execution condition of the task is monitored in real time, and the state of the task is recorded and fed back so as to carry out subsequent task adjustment and optimization;
s3-5, resource adjustment: and dynamically adjusting the allocation condition of the resources according to the execution condition of the tasks and the actual resource condition so as to achieve optimal resource utilization and task execution efficiency.
7. The deep learning-based computational power scheduling method according to claim 6, wherein: in the step S2-6, the calculation method of the F1 value is as follows: TP counts the cases in which the resource allocation scheme given by the model reaches the optimal resource utilization rate and FP counts the cases in which it does not, giving the accuracy rate P; and whenever the similarity between the optimal dynamic resource allocation scheme and the model's scheme, in the resources allocated at each computing node, falls below a threshold, the case is added to the recall count FN;
Accuracy rate P: P = TP / (TP + FP)
Recall ratio R: R = TP / (TP + FN)
F1 value: F1 = 2 × P × R / (P + R)
8. The deep learning-based computational power scheduling method according to claim 7, wherein: in the step S2-5, the specific method for model training is as follows: as the learning rate rises, more data is brought into the training range, until the learning rate reaches its theoretical maximum and the remaining resource amount of the computing node is zero; the formula is
wherein A is the loss function, i.e. the amount of resources remaining at the computing node when training with the current model, A0 is the total resource amount of a given computing node, A1 is the resource amount occupied by the resource allocation work itself, y1 is the learning rate, and y0 is the theoretical maximum of the learning rate.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311016992.2A CN116974768A (en) | 2023-08-11 | 2023-08-11 | Calculation power scheduling method based on deep learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116974768A true CN116974768A (en) | 2023-10-31 |
Family
ID=88471406
Cited By (7)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN117478529A (*) | 2023-12-27 | 2024-01-30 | 环球数科集团有限公司 | Distributed computing power sensing and scheduling system based on AIGC |
CN117472549A (*) | 2023-12-27 | 2024-01-30 | 环球数科集团有限公司 | Distributed computing power dispatching system based on AIGC |
CN117472549B (*) | 2023-12-27 | 2024-03-05 | 环球数科集团有限公司 | Distributed computing power dispatching system based on AIGC |
CN117478529B (*) | 2023-12-27 | 2024-03-12 | 环球数科集团有限公司 | Distributed computing power sensing and scheduling system based on AIGC |
CN117909418A (*) | 2024-03-20 | 2024-04-19 | 广东琴智科技研究院有限公司 | Deep learning model storage consistency method, computing subsystem and computing platform |
CN117909418B (*) | 2024-03-20 | 2024-05-31 | 广东琴智科技研究院有限公司 | Deep learning model storage consistency method, computing subsystem and computing platform |
CN117952669A (*) | 2024-03-27 | 2024-04-30 | 深圳威尔视觉科技有限公司 | Calculation force demand prediction method and device based on deep learning and computer equipment |
Legal Events

Date | Code | Title | Description
---|---|---|---
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||