CN116974768A - Calculation power scheduling method based on deep learning - Google Patents
- Publication number
- CN116974768A CN116974768A CN202311016992.2A CN202311016992A CN116974768A CN 116974768 A CN116974768 A CN 116974768A CN 202311016992 A CN202311016992 A CN 202311016992A CN 116974768 A CN116974768 A CN 116974768A
- Authority
- CN
- China
- Prior art keywords
- module
- model
- task
- resource
- deep learning
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F9/5083—Techniques for rebalancing the load in a distributed system
- G06F9/5038—Allocation of resources to service a request, considering the execution order of a plurality of tasks, e.g. priority or time-dependency constraints
- G06F9/5044—Allocation of resources to service a request, considering hardware capabilities
- G06F9/505—Allocation of resources to service a request, considering the load
- G06N3/08—Neural networks; learning methods
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- G06N3/0442—Recurrent networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
- G06N3/0464—Convolutional networks [CNN, ConvNet]
- G06F2209/5021—Indexing scheme relating to G06F9/50: priority
- G06F2209/508—Indexing scheme relating to G06F9/50: monitor
Abstract
The invention discloses a computational power scheduling method based on deep learning. The method is carried out by a computational power scheduling system comprising a data acquisition module, a deep learning module and a strategy construction module. The data acquisition module acquires various data on nodes, loads and tasks; the deep learning module trains resource allocation strategies using various deep learning models; and the strategy construction module optimizes node performance and the resource allocation strategy. The data acquisition module comprises a task information module, a computing resource information module, a task operation information module, a resource operation information module, a load information collection module and a performance index collection module. The task information module acquires the basic information of tasks, the computing resource information module acquires the basic information of computing resources, and the task operation information module acquires the operation state information of tasks on the computing resources.
Description
Technical Field
The invention relates to the technical field of distributed computing, in particular to a computational power scheduling method based on deep learning.
Background
In the field of distributed computing, many studies and implementations have aimed at improving system performance and resource utilization. The most common approaches are static resource allocation and dynamic resource scheduling. Static resource allocation fixes the resource allocation scheme before system start-up and makes no adjustments during operation. Dynamic resource scheduling adjusts the resource allocation scheme during operation according to factors such as task load and system state.
In the prior art, the disadvantage of static resource allocation is low resource utilization, because the allocation cannot be adjusted to the actual load. Dynamic resource scheduling can adjust to the load, but it often relies on heuristic algorithms and suffers from problems such as inaccurate scheduling decisions and overly long scheduling times. It is therefore necessary to design a computational power scheduling method based on deep learning that supports dynamic adjustment.
Disclosure of Invention
The invention aims to provide a computational power scheduling method based on deep learning, which aims to solve the problems in the background technology.
In order to solve the above technical problems, the invention provides the following technical scheme: a computational power scheduling method based on deep learning, carried out by a computational power scheduling system comprising a data acquisition module, a deep learning module and a strategy construction module. The data acquisition module acquires various data on nodes, loads and tasks; the deep learning module trains resource allocation strategies using various deep learning models; and the strategy construction module optimizes node performance and the resource allocation strategy.
According to the above technical scheme, the data acquisition module comprises a task information module, a computing resource information module, a task operation information module, a resource operation information module, a load information collection module and a performance index collection module. The task information module acquires the basic information of a task; the computing resource information module acquires the basic information of a computing resource; the task operation information module acquires the operation state information of a task on a computing resource; the resource operation information module acquires the operation state information of a computing resource; the load information collection module collects the load information of the computing resources; and the performance index collection module acquires the performance indexes of the computing resources;
the deep learning module comprises a data preparation module, a data division module, a model selection module, a model construction module, a model training module, a model evaluation module and a model tuning module. The data preparation module is electrically connected with the data acquisition module and converts data into a format suitable for processing by a deep learning algorithm; the data division module divides the data into a training set and a test set; the model selection module selects a suitable deep learning algorithm model; the model construction module builds the corresponding deep learning model; the model training module trains the model over repeated iterations; the model evaluation module evaluates the trained model; and the model tuning module tunes the model;
the strategy construction module comprises a model application module, a task scheduling module, a resource allocation module, a task monitoring module and a resource adjustment module. The model construction module is electrically connected with the model application module. The model application module applies the model to the data; the task scheduling module schedules tasks according to their priorities and types; the resource allocation module allocates computing resources according to the prediction results; the task monitoring module monitors the execution of tasks in real time; and the resource adjustment module dynamically adjusts the allocation of resources.
According to the technical scheme, the method comprises the following specific steps:
s1, acquiring node performance, load conditions and task types by arranging a data acquisition module in a system to form a data set for subsequent model training;
s2, establishing a model of node performance and resource allocation strategy by using a deep learning algorithm, taking the acquired data set as input, and training by using the deep learning algorithm such as a neural network and the like so as to obtain a prediction model;
s3, in the actual resource scheduling process, predicting and optimizing the node performance and the resource allocation strategy by using the trained model, predicting the optimal resource allocation strategy by using the deep learning model according to the performance and the load condition of the current node, and applying the optimal resource allocation strategy to the actual resource scheduling process.
According to the above technical solution, the step S1 includes the following specific steps:
s1-1, task information is acquired: the method comprises the steps of task names, task types, task sizes and task priorities;
s1-2, acquiring computing resource information: the method comprises the steps of numbering of the computing nodes, the types of the computing nodes and the states of the computing nodes;
s1-3, acquiring task running state information: the method comprises the steps of starting a task, ending the task, occupying rate of a task CPU and occupying rate of a task memory;
s1-4, acquiring operation state information of computing resources: the method comprises the steps of calculating the CPU utilization rate of the node, the memory utilization rate of the node and the network bandwidth utilization rate of the node;
s1-5, load information is acquired: the load information of the computing resources is collected, wherein the load information comprises the number of tasks currently executed by the computing resources and the total number of tasks currently executed by the computing resources, and the load information can help a dispatching system to better balance the load of the computing resources;
s1-6, obtaining performance indexes: including the processor speed of the compute node, the memory size of the compute node, the storage capacity of the compute node, which may help the scheduling system to better evaluate the performance of the computing resources.
According to the above technical solution, the step S2 includes the following specific steps:
s2-1, data preparation: preprocessing and cleaning the collected task data and computing resource data, and removing invalid data and abnormal data, wherein the steps include converting the data into numerical values and carrying out normalization processing;
s2-2, data division: dividing data into a training set and a testing set according to a certain proportion by adopting a random division method, wherein 70% of data are usually used for training a model, and 30% of data are used for testing the model;
s2-3, model selection: selecting a proper deep learning algorithm model according to the required model type and model performance requirements, wherein the proper deep learning algorithm model comprises a convolutional neural network CNN, a cyclic neural network RNN and a long-short-term memory network LSTM;
s2-4, model construction: constructing a corresponding deep learning model according to the selected model type, wherein the deep learning model comprises an input layer, a hidden layer and an output layer, and setting corresponding parameters comprising learning rate and a loss function;
s2-5, model training: inputting the training set into the constructed model for training, and repeatedly iterating to continuously optimize parameters of the model until the loss function on the training set reaches the minimum value;
s2-6, model evaluation: evaluating the trained model by using a test set, and calculating the accuracy P and recall R, F1 values of the model to evaluate the performance and generalization capability of the model;
s2-7, model tuning: and (3) optimizing the model according to the evaluation result and the actual application requirement, wherein the model parameter is adjusted, the number of layers of the neural network is increased or reduced, and the number of neurons is increased or reduced.
According to the above technical solution, the step S3 includes the following specific steps:
s3-1, application of a prediction model: inputting the acquired data into a trained prediction model, and predicting task and resource conditions in a future period according to a prediction result;
s3-2, task scheduling: scheduling the tasks according to the priorities and types according to the prediction results, and determining the execution sequence and time of the tasks so that the tasks can be completed in the shortest time;
s3-3, resource allocation: according to the prediction result, computing resources are allocated, and idle resources are allocated to resources waiting for executing tasks, so that the utilization rate of the computing resources is improved;
s3-4, task monitoring: the execution condition of the task is monitored in real time, and the state of the task is recorded and fed back so as to carry out subsequent task adjustment and optimization;
s3-5, resource adjustment: and dynamically adjusting the allocation condition of the resources according to the execution condition of the tasks and the actual resource condition so as to achieve optimal resource utilization and task execution efficiency.
According to the above technical scheme, in step S2-6 the F1 value is calculated as follows. When evaluating the trained model on the test set, the model makes a prediction from the earlier data and proposes a resource allocation scheme; the later data is then fed in, and existing algorithms are used to derive dynamic resource allocation schemes. If the resource utilization of any other dynamic allocation scheme exceeds that of the scheme proposed by the model, that best utilization is recorded as the optimum. For each group of data, the number of times the model's resource allocation scheme reaches the optimal resource utilization is counted as TP, and the number of times it fails to reach the optimal resource utilization is counted as FP, from which the accuracy rate P is obtained. If the similarity between the per-node resource allocations of the optimal dynamic scheme and of the scheme proposed by the model falls below a threshold, the case is counted towards FN, which is used to calculate the recall.
Accuracy rate P: P = TP / (TP + FP)
Recall ratio R: R = TP / (TP + FN)
F1 value: F1 = 2 × P × R / (P + R)
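With TP, FP and FN counted as described in S2-6, the accuracy rate P = TP/(TP+FP), recall R = TP/(TP+FN) and F1 = 2PR/(P+R) follow directly; a small self-contained example:

```python
def f1_metrics(tp, fp, fn):
    # Accuracy rate P = TP / (TP + FP); recall R = TP / (TP + FN);
    # F1 = 2PR / (P + R), the harmonic mean of the two.
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    return p, r, 2 * p * r / (p + r)

# Hypothetical counts: the model's scheme hit the optimal utilization
# 80 times (TP), missed it 20 times (FP), and fell below the
# similarity threshold 20 times (FN).
p, r, f1 = f1_metrics(tp=80, fp=20, fn=20)
print(p, r, round(f1, 6))  # -> 0.8 0.8 0.8
```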
According to the above technical scheme, in step S2-5 the specific method of model training is as follows: the learning rate is raised by continuously replacing the learned data and optimizing the parameters of the model. As the learning rate rises, more data is brought into the training range; on the one hand, the amount of resources consumed by task operation increases in proportion, and on the other hand, the amount of resources occupied by the resource allocation work itself increases, so the amount of resources remaining for task operation decreases and the benefit of further raising the learning rate shrinks, until the learning rate reaches its theoretical maximum and the remaining resource amount of the computing node is zero. In the corresponding formula, A is the loss function, namely the amount of resources remaining in the computing node when training with the current model; A₀ is the total amount of resources possessed by a given computing node; A₁ is the amount of resources occupied by the resource allocation work itself; y₁ is the learning rate; and y₀ is the theoretical maximum of the learning rate.
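The closed form of the formula is not reproduced in the text. One reading consistent with the stated boundary condition (the remaining resource amount reaches zero when the learning rate y₁ reaches its theoretical maximum y₀) is the linear relation sketched below; this is an assumption for illustration, not the patent's actual formula:

```python
def remaining_resources(a0, a1, y1, y0):
    # Hypothetical reading of the relation: the resources left for task
    # operation shrink linearly as the learning rate y1 approaches its
    # theoretical maximum y0, reaching zero at y1 = y0.
    # a0: total resources of the node; a1: resources occupied by the
    # resource allocation work itself. The closed form is an assumption.
    return (a0 - a1) * (1 - y1 / y0)

print(remaining_resources(a0=100.0, a1=10.0, y1=0.5, y0=1.0))  # -> 45.0
```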
Compared with the prior art, the invention has the following beneficial effects: relative to a traditional heuristic algorithm, the deep learning algorithm computes faster, so resource scheduling can be completed in a shorter time; and the prediction of node performance and the resource scheduling are carried out automatically, reducing the need for manual intervention and making the method more intelligent and automated.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate the invention and together with the embodiments of the invention, serve to explain the invention. In the drawings:
fig. 1 is a schematic view of the overall module structure of the present invention.
Detailed Description
The following is a clear and complete description of the technical solutions in the embodiments of the present invention with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without inventive effort fall within the scope of the invention.
Referring to fig. 1, the present invention provides the technical solution described above: a computational power scheduling method based on deep learning, carried out by a computational power scheduling system comprising a data acquisition module, a deep learning module and a strategy construction module, whose sub-modules and working steps are as set out in the preceding sections.
the data acquisition module comprises a task information module, a computing resource information module, a task operation information module, a resource operation information module, a load information collection module and a performance index collection module, wherein the task information module is used for acquiring basic information of a task, the computing resource information module is used for acquiring basic information of a computing resource, the task operation information module is used for acquiring operation state information of the task on the computing resource, the load information collection module is used for acquiring operation state information of the computing resource, and the performance index collection module is used for acquiring performance indexes of the computing resource;
the deep learning module comprises a data preparation module, a data dividing module, a model selecting module, a model building module, a model training module, a model evaluating module and a model tuning module, wherein the data preparation module is electrically connected with the data acquisition module, the data preparation module is used for converting data into a format suitable for processing of a deep learning algorithm, the data dividing module is used for dividing the data into a training set and a testing set, the model selecting module is used for selecting a suitable deep learning algorithm model, the model building module is used for building a corresponding deep learning model, the model training module is used for training the model for repeated iteration, the model evaluating module is used for evaluating the trained model, and the model tuning module is used for tuning the model;
the strategy construction module comprises a model application module, a task scheduling module, a resource allocation module, a task monitoring module and a resource adjustment module, wherein the model construction module is electrically connected with the model application module, the model application module is used for applying a model on data, the task scheduling module is used for scheduling tasks according to priorities and types, the resource allocation module is used for allocating computing resources according to prediction results, the task monitoring module is used for monitoring the execution condition of the tasks in real time, and the resource adjustment module is used for dynamically adjusting the allocation condition of the resources;
the method comprises the following specific steps:
s1, acquiring node performance, load conditions and task types by arranging a data acquisition module in a system to form a data set for subsequent model training;
s2, establishing a model of node performance and resource allocation strategy by using a deep learning algorithm, taking the acquired data set as input, and training by using the deep learning algorithm such as a neural network and the like so as to obtain a prediction model;
s3, in the actual resource scheduling process, predicting and optimizing the node performance and the resource allocation strategy by using the trained model, and according to the performance and the load condition of the current node, predicting the optimal resource allocation strategy by using the deep learning model and applying the optimal resource allocation strategy to the actual resource scheduling process;
the step S1 includes the following specific steps:
s1-1, task information is acquired: the method comprises the steps of task names, task types, task sizes and task priorities;
s1-2, acquiring computing resource information: the method comprises the steps of numbering of the computing nodes, the types of the computing nodes and the states of the computing nodes;
s1-3, acquiring task running state information: the method comprises the steps of starting a task, ending the task, occupying rate of a task CPU and occupying rate of a task memory;
s1-4, acquiring operation state information of computing resources: the method comprises the steps of calculating the CPU utilization rate of the node, the memory utilization rate of the node and the network bandwidth utilization rate of the node;
s1-5, load information is acquired: the load information of the computing resources is collected, wherein the load information comprises the number of tasks currently executed by the computing resources and the total number of tasks currently executed by the computing resources, and the load information can help a dispatching system to better balance the load of the computing resources;
s1-6, obtaining performance indexes: the method comprises the steps of calculating the processor speed of the node, the memory size of the node and the storage capacity of the node, and the information can help a dispatching system to better evaluate the performance of the computing resource;
the step S2 includes the following specific steps:
s2-1, data preparation: preprocessing and cleaning the collected task data and computing resource data, and removing invalid data and abnormal data, wherein the steps include converting the data into numerical values and carrying out normalization processing;
s2-2, data division: dividing data into a training set and a testing set according to a certain proportion by adopting a random division method, wherein 70% of data are usually used for training a model, and 30% of data are used for testing the model;
s2-3, model selection: selecting a proper deep learning algorithm model according to the required model type and model performance requirements, wherein the proper deep learning algorithm model comprises a convolutional neural network CNN, a cyclic neural network RNN and a long-short-term memory network LSTM;
s2-4, model construction: constructing a corresponding deep learning model according to the selected model type, wherein the deep learning model comprises an input layer, a hidden layer and an output layer, and setting corresponding parameters comprising learning rate and a loss function;
s2-5, model training: inputting the training set into the constructed model for training, and repeatedly iterating to continuously optimize parameters of the model until the loss function on the training set reaches the minimum value;
s2-6, model evaluation: evaluating the trained model by using a test set, and calculating the accuracy P and recall R, F1 values of the model to evaluate the performance and generalization capability of the model;
s2-7, model tuning: according to the evaluation result and the actual application requirement, the model is optimized, including adjusting model parameters, increasing or decreasing the number of layers of the neural network, and increasing or decreasing the number of neurons;
the step S3 includes the following specific steps:
s3-1, application of a prediction model: inputting the acquired data into a trained prediction model, and predicting task and resource conditions in a future period according to a prediction result;
s3-2, task scheduling: scheduling the tasks according to the priorities and types according to the prediction results, and determining the execution sequence and time of the tasks so that the tasks can be completed in the shortest time;
s3-3, resource allocation: computing resources are allocated according to the prediction result, and idle resources are assigned to tasks waiting for execution, so as to improve the utilization rate of the computing resources;
s3-4, task monitoring: the execution condition of the task is monitored in real time, and the state of the task is recorded and fed back so as to carry out subsequent task adjustment and optimization;
s3-5, resource adjustment: according to the execution condition of the task and the actual resource condition, dynamically adjusting the allocation condition of the resource to achieve the optimal resource utilization and task execution efficiency;
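A minimal sketch of S3-2 and S3-3 is a greedy loop: pop tasks in priority order and place each on the node with the most idle capacity. The task tuples, node names and the simple capacity model are illustrative assumptions, not the patent's actual scheduler.

```python
import heapq

def schedule(tasks, free):
    """Greedy sketch of S3-2 (priority-ordered task scheduling) and
    S3-3 (assigning idle resources to waiting tasks).
    tasks: (priority, name, cost) tuples; a lower number means higher priority.
    free:  dict mapping node name -> free capacity (mutated in place)."""
    heap = list(tasks)
    heapq.heapify(heap)                     # pop tasks in priority order
    plan = []
    while heap:
        _, name, cost = heapq.heappop(heap)
        node = max(free, key=free.get)      # node with the most idle capacity
        if free[node] < cost:
            continue                        # even the freest node cannot hold it
        free[node] -= cost
        plan.append((name, node))
    return plan

plan = schedule([(1, "a", 4), (3, "c", 2), (2, "b", 5)], {"n1": 6, "n2": 5})
```

S3-4 and S3-5 would then watch the running tasks and call such a routine again with refreshed `free` capacities.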
in the step S2-6, the calculation method of the F1 value is as follows: when the trained model is evaluated with the test set, the model first makes a prediction from the earlier portion of each data group and proposes a resource allocation scheme; the later portion of the data is then fed in, and existing algorithms are used to derive alternative dynamic resource allocation schemes. If any alternative scheme achieves a higher resource utilization rate than the scheme proposed by the model, that best utilization rate is recorded as the optimum. Over all data groups, TP counts the cases in which the model's resource allocation scheme reaches the optimal resource utilization rate and FP counts the cases in which it does not, giving the accuracy rate P; and whenever the similarity between the optimal dynamic scheme and the model's scheme, in the resources allocated at each computing node, falls below a threshold, the case is added to the recall count FN.
Accuracy rate P: P = TP / (TP + FP)
Recall ratio R: R = TP / (TP + FN)
F1 value: F1 = 2 × P × R / (P + R)
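With TP, FP and FN counted as described, the three S2-6 metrics reduce to a few lines (these are the standard precision/recall/F1 definitions; the function name is ours):

```python
def evaluate(tp, fp, fn):
    """S2-6 metrics: accuracy rate P, recall ratio R and F1 value."""
    p = tp / (tp + fp)              # share of model schemes that were optimal
    r = tp / (tp + fn)              # share not lost to low-similarity cases
    return p, r, 2 * p * r / (p + r)

p, r, f1 = evaluate(tp=8, fp=2, fn=2)
```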
in the step S2-5, the specific method for model training is as follows: the learning rate is improved by continuously replacing the learned data and optimizing the parameters of the model, more data are brought into the training range along with the improvement of the learning rate, on one hand, the consumed resource amount of task operation is increased in proportion, on the other hand, the resource amount occupied by the calculation resource allocation work per se is increased, the residual resource amount for task operation is reduced, the influence weight caused by the improvement of the learning rate is reduced, until the learning rate reaches the theoretical maximum value, and the residual resource amount of the calculation node is zero; the formula is
Wherein A is a loss function, namely the amount of resources remaining in the calculation node is calculated by training by adopting the current model, A 0 For the total resource amount possessed by a certain computing node, A 1 For calculating the amount of resources occupied by the resource allocation work itself, y 1 To learn rate, y 0 Is the theoretical maximum of the learning rate.
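The formula itself appears as an image in the original filing and does not survive in this text. Purely as an assumption, one form consistent with the variables just listed and with the stated boundary condition — remaining resources reach zero when y1 reaches y0 — would be A = (A0 − A1)(1 − y1/y0):

```python
def remaining_resources(a0, a1, y1, y0):
    """Hypothetical reconstruction of the patent's missing formula:
    A = (A0 - A1) * (1 - y1 / y0).
    Chosen only because it satisfies the stated boundary condition
    (A = 0 when y1 = y0); the real formula in the filing may differ."""
    return (a0 - a1) * (1 - y1 / y0)
```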
In order to improve the accuracy and predictive power of the model, note that node performance and load conditions change continuously while the system runs, so the deep learning model needs to be updated continuously. Specifically, the model is retrained and updated with new data sets, which improves its accuracy and adaptability: a model update retrains the existing prediction model on newly collected data.
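The continuous-update loop described above can be sketched as a sliding window that retrains once enough new monitoring samples have arrived. The class name, window sizes and the pluggable `train_fn` are illustrative, not part of the patent.

```python
from collections import deque

class RollingRetrainer:
    """Keep a window of recent samples; retrain the prediction model
    whenever `refresh` new samples have accumulated."""
    def __init__(self, train_fn, window=1000, refresh=100):
        self.train_fn = train_fn            # rebuilds a model from a sample list
        self.window = deque(maxlen=window)  # most recent samples only
        self.refresh = refresh
        self.pending = 0
        self.model = None

    def observe(self, sample):
        self.window.append(sample)
        self.pending += 1
        if self.pending >= self.refresh:    # enough new data: retrain and swap in
            self.model = self.train_fn(list(self.window))
            self.pending = 0
        return self.model

# demo: a "model" that is just the size of the data it was trained on
updater = RollingRetrainer(train_fn=len, refresh=100)
for i in range(250):
    updater.observe(i)
# the model was rebuilt at samples 100 and 200, so it last saw 200 samples
```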
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Finally, it should be noted that the foregoing description covers only preferred embodiments of the present invention and is not limiting. Although the invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that the technical features described therein may still be modified or replaced by equivalents. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention shall be included in its protection scope.
Claims (8)
1. A calculation power scheduling method based on deep learning is characterized in that: the method adopts a computational power scheduling system to work, the computational power scheduling system comprises a data acquisition module, a deep learning module and a strategy construction module, the data acquisition module is used for acquiring various data of nodes, loads and tasks, the deep learning module is used for training a resource allocation strategy by utilizing various deep learning models, and the strategy construction module is used for optimizing the performance of the nodes and the resource allocation strategy.
2. The method for computing power scheduling based on deep learning according to claim 1, wherein the method comprises the following steps: the data acquisition module comprises a task information module, a computing resource information module, a task operation information module, a resource operation information module, a load information collection module and a performance index collection module, wherein the task information module is used for acquiring basic information of a task, the computing resource information module is used for acquiring basic information of a computing resource, the task operation information module is used for acquiring operation state information of the task on the computing resource, the load information collection module is used for acquiring operation state information of the computing resource, and the performance index collection module is used for acquiring performance indexes of the computing resource;
the deep learning module comprises a data preparation module, a data dividing module, a model selecting module, a model constructing module, a model training module, a model evaluating module and a model tuning module, wherein the data preparation module is electrically connected with the data acquisition module, the data preparation module is used for converting data into a format suitable for processing of a deep learning algorithm, the data dividing module is used for dividing the data into a training set and a testing set, the model selecting module is used for selecting a suitable deep learning algorithm model, the model constructing module is used for constructing a corresponding deep learning model, the model training module is used for training the model for repeated iteration, the model evaluating module is used for evaluating the trained model, and the model tuning module is used for tuning the model;
the strategy construction module comprises a model application module, a task scheduling module, a resource allocation module, a task monitoring module and a resource adjustment module, wherein the model construction module is electrically connected with the model application module, the model application module is used for applying a model on data, the task scheduling module is used for scheduling tasks according to priorities and types, the resource allocation module is used for allocating computing resources according to prediction results, the task monitoring module is used for monitoring execution conditions of the tasks in real time, and the resource adjustment module is used for dynamically adjusting allocation conditions of the resources.
3. The method for computing power scheduling based on deep learning according to claim 2, wherein the method comprises the following steps: the method comprises the following specific steps:
s1, acquiring node performance, load conditions and task types by arranging a data acquisition module in a system to form a data set for subsequent model training;
s2, establishing a model of node performance and resource allocation strategy by using a deep learning algorithm: taking the acquired data set as input and training with a deep learning algorithm such as a neural network, so as to obtain a prediction model;
s3, in the actual resource scheduling process, predicting and optimizing the node performance and the resource allocation strategy by using the trained model, predicting the optimal resource allocation strategy by using the deep learning model according to the performance and the load condition of the current node, and applying the optimal resource allocation strategy to the actual resource scheduling process.
4. The deep learning-based computational power scheduling method according to claim 3, wherein: the step S1 includes the following specific steps:
s1-1, acquiring task information: including the task name, task type, task size and task priority;
s1-2, acquiring computing resource information: including the computing node number, computing node type and computing node state;
s1-3, acquiring task running state information: including the task start time, task end time, task CPU occupancy rate and task memory occupancy rate;
s1-4, acquiring computing resource running state information: including the CPU utilization rate, memory utilization rate and network bandwidth utilization rate of the computing node;
s1-5, acquiring load information: collecting load information of the computing resource, including the number of tasks currently being executed by the computing resource and the total number of tasks on the computing resource;
s1-6, acquiring performance indexes: including the processor speed, memory size and storage capacity of the computing node.
5. The deep learning-based computational power scheduling method as defined in claim 4, wherein: the step S2 includes the following specific steps:
s2-1, data preparation: preprocessing and cleaning the collected task data and computing resource data, and removing invalid data and abnormal data, wherein the steps include converting the data into numerical values and carrying out normalization processing;
s2-2, data division: dividing data into a training set and a testing set according to a certain proportion by adopting a random dividing method;
s2-3, model selection: selecting a suitable deep learning algorithm model according to the required model type and performance requirements, including a convolutional neural network (CNN), a recurrent neural network (RNN) and a long short-term memory network (LSTM);
s2-4, model construction: constructing a corresponding deep learning model according to the selected model type, wherein the deep learning model comprises an input layer, a hidden layer and an output layer, and setting corresponding parameters comprising learning rate and a loss function;
s2-5, model training: inputting the training set into the constructed model for training, and repeatedly iterating to continuously optimize parameters of the model until the loss function on the training set reaches the minimum value;
s2-6, model evaluation: evaluating the trained model with the test set, and calculating the accuracy rate P, recall ratio R and F1 value of the model to assess its performance and generalization capability;
s2-7, model tuning: and (3) optimizing the model according to the evaluation result and the actual application requirement, wherein the model parameter is adjusted, the number of layers of the neural network is increased or reduced, and the number of neurons is increased or reduced.
6. The method for computing power scheduling based on deep learning according to claim 5, wherein: the step S3 includes the following specific steps:
s3-1, application of a prediction model: inputting the acquired data into a trained prediction model, and predicting task and resource conditions in a future period according to a prediction result;
s3-2, task scheduling: scheduling the tasks according to the priorities and types according to the prediction results, and determining the execution sequence and time of the tasks so that the tasks can be completed in the shortest time;
s3-3, resource allocation: computing resources are allocated according to the prediction result, and idle resources are assigned to tasks waiting for execution, so as to improve the utilization rate of the computing resources;
s3-4, task monitoring: the execution condition of the task is monitored in real time, and the state of the task is recorded and fed back so as to carry out subsequent task adjustment and optimization;
s3-5, resource adjustment: and dynamically adjusting the allocation condition of the resources according to the execution condition of the tasks and the actual resource condition so as to achieve optimal resource utilization and task execution efficiency.
7. The deep learning-based computational power scheduling method according to claim 6, wherein: in the step S2-6, the calculation method of the F1 value is as follows: TP counts the cases in which the resource allocation scheme given by the model reaches the optimal resource utilization rate and FP counts the cases in which it does not, giving the accuracy rate P; and whenever the similarity between the optimal dynamic resource allocation scheme and the model's scheme, in the resources allocated at each computing node, falls below a threshold, the case is added to the recall count FN;
Accuracy rate P: P = TP / (TP + FP)
Recall ratio R: R = TP / (TP + FN)
F1 value: F1 = 2 × P × R / (P + R)
8. The deep learning-based computational power scheduling method according to claim 7, wherein: in the step S2-5, the specific method for model training is as follows: as the learning rate rises, more data is brought into the training range, until the learning rate reaches its theoretical maximum and the remaining resource amount of the computing node is zero; the formula is
wherein A is the loss function, i.e. the amount of resources remaining at the computing node when training with the current model, A0 is the total resource amount of a given computing node, A1 is the resource amount occupied by the resource allocation work itself, y1 is the learning rate, and y0 is the theoretical maximum of the learning rate.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311016992.2A CN116974768A (en) | 2023-08-11 | 2023-08-11 | Calculation power scheduling method based on deep learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116974768A true CN116974768A (en) | 2023-10-31 |
Family
ID=88471406
Cited By (7)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN117478529A (*) | 2023-12-27 | 2024-01-30 | 环球数科集团有限公司 | Distributed computing power sensing and scheduling system based on AIGC |
CN117472549A (*) | 2023-12-27 | 2024-01-30 | 环球数科集团有限公司 | Distributed computing power dispatching system based on AIGC |
CN117472549B (*) | 2023-12-27 | 2024-03-05 | 环球数科集团有限公司 | Distributed computing power dispatching system based on AIGC |
CN117478529B (*) | 2023-12-27 | 2024-03-12 | 环球数科集团有限公司 | Distributed computing power sensing and scheduling system based on AIGC |
CN117909418A (*) | 2024-03-20 | 2024-04-19 | 广东琴智科技研究院有限公司 | Deep learning model storage consistency method, computing subsystem and computing platform |
CN117909418B (*) | 2024-03-20 | 2024-05-31 | 广东琴智科技研究院有限公司 | Deep learning model storage consistency method, computing subsystem and computing platform |
CN117952669A (*) | 2024-03-27 | 2024-04-30 | 深圳威尔视觉科技有限公司 | Calculation force demand prediction method and device based on deep learning and computer equipment |
Legal Events

Date | Code | Title | Description
---|---|---|---
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||