CN116974768A - Calculation power scheduling method based on deep learning - Google Patents

Calculation power scheduling method based on deep learning

Info

Publication number
CN116974768A
CN116974768A (application CN202311016992.2A)
Authority
CN
China
Prior art keywords
module
model
task
resource
deep learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311016992.2A
Other languages
Chinese (zh)
Inventor
刘坤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Yindunyun Technology Co ltd
Original Assignee
Zhejiang Yindunyun Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Yindunyun Technology Co ltd filed Critical Zhejiang Yindunyun Technology Co ltd
Priority to CN202311016992.2A priority Critical patent/CN116974768A/en
Publication of CN116974768A publication Critical patent/CN116974768A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083Techniques for rebalancing the load in a distributed system
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5038Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5044Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering hardware capabilities
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/50Indexing scheme relating to G06F9/50
    • G06F2209/5021Priority
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/50Indexing scheme relating to G06F9/50
    • G06F2209/508Monitor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • G06N3/0442Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a computational power scheduling method based on deep learning. The method works through a computational power scheduling system comprising a data acquisition module, a deep learning module and a strategy construction module: the data acquisition module acquires various data about nodes, loads and tasks, the deep learning module trains resource allocation strategies by utilizing various deep learning models, and the strategy construction module optimizes node performance and the resource allocation strategy. The data acquisition module comprises a task information module, a computing resource information module, a task operation information module, a resource operation information module, a load information collection module and a performance index collection module, wherein the task information module acquires basic information of tasks, the computing resource information module acquires basic information of computing resources, and the task operation information module acquires the operation state information of tasks on the computing resources.

Description

Calculation power scheduling method based on deep learning
Technical Field
The invention relates to the technical field of distributed computing, in particular to a computational power scheduling method based on deep learning.
Background
In the field of distributed computing, there have been many studies and implementations aimed at improving the performance and resource utilization of the system. Among the most common methods include static resource allocation and dynamic resource scheduling. Static resource allocation refers to fixing the resource allocation scheme prior to system start-up and not making adjustments during operation. Dynamic resource scheduling is to dynamically adjust the resource allocation scheme according to factors such as task load and system state during the running of the system.
In the prior art, the drawback of static resource allocation is low resource utilization, because the allocation cannot be adjusted dynamically according to the actual load. Dynamic resource scheduling can adjust to the load, but it usually relies on heuristic algorithms and suffers from problems such as inaccurate scheduling results and excessively long scheduling times. It is therefore necessary to design a computational power scheduling method based on deep learning that supports dynamic adjustment.
Disclosure of Invention
The invention aims to provide a computational power scheduling method based on deep learning, which aims to solve the problems in the background technology.
In order to solve the above technical problems, the invention provides the following technical solution: the computational power scheduling method based on deep learning is implemented by a computational power scheduling system, wherein the computational power scheduling system comprises a data acquisition module, a deep learning module and a strategy construction module; the data acquisition module is used for acquiring various data about nodes, loads and tasks, the deep learning module is used for training resource allocation strategies by utilizing various deep learning models, and the strategy construction module is used for optimizing node performance and the resource allocation strategy.
According to the technical scheme, the data acquisition module comprises a task information module, a computing resource information module, a task operation information module, a resource operation information module, a load information collection module and a performance index collection module, wherein the task information module is used for acquiring basic information of tasks, the computing resource information module is used for acquiring basic information of computing resources, the task operation information module is used for acquiring operation state information of tasks on the computing resources, the resource operation information module is used for acquiring operation state information of the computing resources, the load information collection module is used for collecting load information of the computing resources, and the performance index collection module is used for acquiring performance indexes of the computing resources;
the deep learning module comprises a data preparation module, a data dividing module, a model selecting module, a model constructing module, a model training module, a model evaluating module and a model tuning module, wherein the data preparation module is electrically connected with the data acquisition module, the data preparation module is used for converting data into a format suitable for processing by the deep learning algorithm, the data dividing module is used for dividing the data into a training set and a test set, the model selecting module is used for selecting a suitable deep learning algorithm model, the model constructing module is used for constructing the corresponding deep learning model, the model training module is used for iteratively training the model, the model evaluating module is used for evaluating the trained model, and the model tuning module is used for tuning the model;
the strategy construction module comprises a model application module, a task scheduling module, a resource allocation module, a task monitoring module and a resource adjustment module, wherein the model constructing module is electrically connected with the model application module, the model application module is used for applying the trained model to the data, the task scheduling module is used for scheduling tasks according to their priorities and types, the resource allocation module is used for allocating computing resources according to the prediction results, the task monitoring module is used for monitoring the execution of tasks in real time, and the resource adjustment module is used for dynamically adjusting the allocation of resources.
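For illustration only, the overall module structure described above can be pictured as a small Python skeleton; every class and method name below is an assumption of this sketch and is not prescribed by the invention.

from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class DataAcquisitionModule:
    """Collects task, computing-resource, load and performance-index records."""
    records: List[Dict[str, Any]] = field(default_factory=list)

    def collect(self, sample: Dict[str, Any]) -> None:
        self.records.append(sample)


class DeepLearningModule:
    """Prepares the data, then builds, trains, evaluates and tunes a model (steps S2-1 to S2-7)."""

    def __init__(self) -> None:
        self.model = None

    def train(self, dataset: List[Dict[str, Any]]) -> None:
        # Placeholder for data preparation, splitting, model construction,
        # iterative training, evaluation and tuning.
        self.model = lambda record: 0.0  # stand-in for a trained predictor


class PolicyConstructionModule:
    """Turns model predictions into scheduling and allocation actions (steps S3-1 to S3-5)."""

    def schedule(self, model: Any, tasks: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        # Order tasks by priority as a stand-in for model-driven scheduling.
        return sorted(tasks, key=lambda t: t.get("priority", 0), reverse=True)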
According to the technical scheme, the method comprises the following specific steps:
S1, acquiring node performance, load conditions and task types by arranging the data acquisition module in the system, so as to form a data set for subsequent model training;
S2, establishing a model of node performance and resource allocation strategy by using a deep learning algorithm: taking the acquired data set as input and training it with a deep learning algorithm such as a neural network, so as to obtain a prediction model;
S3, in the actual resource scheduling process, predicting and optimizing the node performance and the resource allocation strategy by using the trained model: according to the performance and load condition of the current node, the deep learning model predicts the optimal resource allocation strategy, which is then applied to the actual resource scheduling process.
According to the above technical solution, the step S1 includes the following specific steps:
S1-1, acquiring task information: including the task name, task type, task size and task priority;
S1-2, acquiring computing resource information: including the computing node number, computing node type and computing node state;
S1-3, acquiring task running state information: including the task start time, task end time, task CPU occupancy rate and task memory occupancy rate;
S1-4, acquiring operation state information of the computing resources: including the CPU utilization rate, memory utilization rate and network bandwidth utilization rate of the computing node;
S1-5, acquiring load information: collecting the load information of the computing resources, including the number of tasks currently being executed by each computing resource and its total number of tasks; this load information helps the scheduling system balance the load of the computing resources better;
S1-6, acquiring performance indexes: including the processor speed, memory size and storage capacity of the computing node; this information helps the scheduling system evaluate the performance of the computing resources better (a sketch of the collected fields follows this list).
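For illustration only, the records collected in steps S1-1 to S1-6 might be grouped as the following plain Python dataclasses; all field names and units are assumptions of this sketch rather than identifiers defined by the invention.

from dataclasses import dataclass


@dataclass
class TaskInfo:              # S1-1
    name: str
    task_type: str
    size: float
    priority: int


@dataclass
class NodeInfo:              # S1-2
    node_id: str
    node_type: str
    state: str


@dataclass
class TaskRuntime:           # S1-3
    start_time: float
    end_time: float
    cpu_usage: float         # fraction of CPU occupied by the task
    mem_usage: float         # fraction of memory occupied by the task


@dataclass
class NodeRuntime:           # S1-4 and S1-5
    cpu_util: float
    mem_util: float
    bandwidth_util: float
    running_tasks: int
    total_tasks: int


@dataclass
class NodePerformance:       # S1-6
    processor_speed_ghz: float
    memory_gb: float
    storage_gb: float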
According to the above technical solution, the step S2 includes the following specific steps:
S2-1, data preparation: preprocessing and cleaning the collected task data and computing resource data and removing invalid and abnormal data, including converting the data into numerical form and normalizing it;
S2-2, data division: dividing the data into a training set and a test set in a certain proportion using random division, typically with 70% of the data used to train the model and 30% used to test it;
S2-3, model selection: selecting an appropriate deep learning algorithm model according to the required model type and performance requirements, such as a convolutional neural network CNN, a recurrent neural network RNN or a long short-term memory network LSTM;
S2-4, model construction: building the corresponding deep learning model, comprising an input layer, hidden layers and an output layer, according to the selected model type, and setting the corresponding parameters, including the learning rate and the loss function;
S2-5, model training: feeding the training set into the constructed model and iterating repeatedly to keep optimizing the model parameters until the loss function on the training set reaches its minimum;
S2-6, model evaluation: evaluating the trained model on the test set and calculating the accuracy rate P, recall ratio R and F1 value of the model to assess its performance and generalization capability;
S2-7, model tuning: tuning the model according to the evaluation results and the actual application requirements, including adjusting model parameters, increasing or decreasing the number of neural network layers, and increasing or decreasing the number of neurons (a rough end-to-end sketch of steps S2-2 to S2-6 follows this list).
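For illustration only, steps S2-2 to S2-6 could be realized roughly as in the following sketch, which uses Keras and scikit-learn with placeholder data, an LSTM model and arbitrary hyperparameters; none of these library choices or parameter values are mandated by the invention.

import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow import keras
from tensorflow.keras import layers

# Placeholder dataset: 1000 samples, 10 time steps, 8 features per step,
# one regression target (e.g. a predicted resource-utilization score).
X = np.random.rand(1000, 10, 8).astype("float32")
y = np.random.rand(1000, 1).astype("float32")

# S2-2: random 70/30 split into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# S2-3/S2-4: an LSTM model with input, hidden and output layers.
model = keras.Sequential([
    layers.LSTM(64, input_shape=(10, 8)),
    layers.Dense(32, activation="relu"),
    layers.Dense(1),
])
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3), loss="mse")

# S2-5: iterative training; epoch count and batch size are arbitrary placeholders.
model.fit(X_train, y_train, epochs=20, batch_size=32, validation_split=0.1, verbose=0)

# S2-6: evaluate generalization on the held-out test set.
test_loss = model.evaluate(X_test, y_test, verbose=0)
print(f"test loss: {test_loss:.4f}")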
According to the above technical solution, the step S3 includes the following specific steps:
S3-1, applying the prediction model: inputting the acquired data into the trained prediction model and predicting the task and resource conditions over the coming period;
S3-2, task scheduling: scheduling the tasks by priority and type according to the prediction results, and determining the execution order and timing of the tasks so that they can be completed in the shortest time;
S3-3, resource allocation: allocating computing resources according to the prediction results and assigning idle resources to tasks waiting to be executed, so as to improve the utilization rate of the computing resources;
S3-4, task monitoring: monitoring the execution of tasks in real time, and recording and feeding back the task states for subsequent task adjustment and optimization;
S3-5, resource adjustment: dynamically adjusting the allocation of resources according to the execution of tasks and the actual resource conditions, so as to achieve optimal resource utilization and task execution efficiency (a sketch of this predict-schedule-allocate loop follows this list).
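For illustration only, the predict-schedule-allocate loop of steps S3-1 to S3-5 can be sketched as follows; the data structures and the greedy lowest-predicted-load assignment are assumptions of this sketch, standing in for the model-driven allocation described above.

from typing import Any, Callable, Dict, List


def scheduling_cycle(predict: Callable[[Dict[str, Any]], float],
                     tasks: List[Dict[str, Any]],
                     nodes: List[Dict[str, Any]]) -> List[tuple]:
    """Run one predict -> schedule -> allocate pass and return (task, node) pairs."""
    # S3-1: predict a load score for every node for the coming period.
    for node in nodes:
        node["predicted_load"] = predict(node)

    # S3-2: order tasks by priority (higher first), then by size (smaller first).
    ordered = sorted(tasks, key=lambda t: (-t["priority"], t["size"]))

    # S3-3: greedily assign each task to the node with the lowest predicted load.
    assignments = []
    for task in ordered:
        node = min(nodes, key=lambda n: n["predicted_load"])
        node["predicted_load"] += task["size"]  # account for the newly assigned work
        assignments.append((task["name"], node["node_id"]))

    # S3-4/S3-5 would monitor execution and feed the observed load back into
    # `predict` so that the next cycle can rebalance the allocation.
    return assignments


if __name__ == "__main__":
    nodes = [{"node_id": "n1", "predicted_load": 0.0},
             {"node_id": "n2", "predicted_load": 0.0}]
    tasks = [{"name": "t1", "priority": 2, "size": 0.4},
             {"name": "t2", "priority": 1, "size": 0.2}]
    print(scheduling_cycle(lambda n: n["predicted_load"], tasks, nodes))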
According to the technical scheme, in the step S2-6, the F1 value is calculated as follows: when the trained model is evaluated on the test set, the model first predicts the later data from the earlier input data and proposes a resource allocation scheme; the later data are then fed in, and existing algorithms are used to derive dynamic resource allocation schemes. If the resource utilization rate of any of these dynamic schemes exceeds that of the scheme proposed by the model, the best utilization rate is recorded. For each group of data, the number of times the model's resource allocation scheme reaches the optimal resource utilization rate is counted as TP and the number of times it fails to reach it is counted as FP, from which the accuracy rate P is obtained; if the similarity between the optimal dynamic resource allocation scheme and the scheme provided by the model in the resource allocation of each computing node is lower than a threshold value, the case is counted toward the recall count FN. Then:
Accuracy rate P: P = TP / (TP + FP)
Recall ratio R: R = TP / (TP + FN)
F1 value: F1 = 2 × P × R / (P + R)
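Given the counts TP, FP and FN defined above, the accuracy rate, recall ratio and F1 value follow the standard formulas; a small helper (the function name is ours) for reference:

def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple:
    """Accuracy rate P, recall ratio R and F1 value from the counts defined in step S2-6."""
    p = tp / (tp + fp) if (tp + fp) else 0.0
    r = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f1


# Example: 80 optimal allocations (TP), 20 misses (FP), 10 low-similarity cases (FN).
print(precision_recall_f1(80, 20, 10))  # (0.8, 0.888..., 0.842...)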
According to the technical scheme, in the step S2-5, the specific method of model training is as follows: the learning rate is raised by continuously replacing the learned data and optimizing the model parameters; as the learning rate rises, more data are brought into the training range, so that, on the one hand, the amount of resources consumed by task execution increases proportionally and, on the other hand, the amount of resources occupied by the computing-resource allocation work itself increases; the amount of resources remaining for task execution therefore decreases and the influence weight of further raising the learning rate diminishes, until the learning rate reaches its theoretical maximum and the remaining resource amount of the computing node is zero; the formula is
Wherein A is the loss function, namely the amount of resources remaining on the computing node when training with the current model, A0 is the total amount of resources possessed by a given computing node, A1 is the amount of resources occupied by the computing-resource allocation work itself, y1 is the learning rate, and y0 is the theoretical maximum of the learning rate.
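The formula itself is not reproduced in the available text. Purely as a labelled assumption, one expression that satisfies the stated boundary behaviour (A equals the free capacity A0 - A1 when y1 = 0 and falls to zero when y1 reaches y0) would be:

% Illustrative assumption only; not necessarily the formula claimed in the patent.
A = (A_0 - A_1)\left(1 - \frac{y_1}{y_0}\right)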
Compared with the prior art, the invention has the following beneficial effects: compared with traditional heuristic algorithms, the deep learning algorithm computes faster, so resource scheduling can be completed in a shorter time; moreover, node performance prediction and resource scheduling are carried out automatically, reducing the need for manual intervention and making the scheduling more intelligent and automated.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate the invention and together with the embodiments of the invention, serve to explain the invention. In the drawings:
fig. 1 is a schematic view of the overall module structure of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1, the present invention provides the following technical solution: the computational power scheduling method based on deep learning is implemented by a computational power scheduling system, wherein the computational power scheduling system comprises a data acquisition module, a deep learning module and a strategy construction module; the data acquisition module is used for acquiring various data about nodes, loads and tasks, the deep learning module is used for training resource allocation strategies by utilizing various deep learning models, and the strategy construction module is used for optimizing node performance and the resource allocation strategy;
the data acquisition module comprises a task information module, a computing resource information module, a task operation information module, a resource operation information module, a load information collection module and a performance index collection module, wherein the task information module is used for acquiring basic information of tasks, the computing resource information module is used for acquiring basic information of computing resources, the task operation information module is used for acquiring operation state information of tasks on the computing resources, the resource operation information module is used for acquiring operation state information of the computing resources, the load information collection module is used for collecting load information of the computing resources, and the performance index collection module is used for acquiring performance indexes of the computing resources;
the deep learning module comprises a data preparation module, a data dividing module, a model selecting module, a model constructing module, a model training module, a model evaluating module and a model tuning module, wherein the data preparation module is electrically connected with the data acquisition module, the data preparation module is used for converting data into a format suitable for processing by the deep learning algorithm, the data dividing module is used for dividing the data into a training set and a test set, the model selecting module is used for selecting a suitable deep learning algorithm model, the model constructing module is used for constructing the corresponding deep learning model, the model training module is used for iteratively training the model, the model evaluating module is used for evaluating the trained model, and the model tuning module is used for tuning the model;
the strategy construction module comprises a model application module, a task scheduling module, a resource allocation module, a task monitoring module and a resource adjustment module, wherein the model constructing module is electrically connected with the model application module, the model application module is used for applying the trained model to the data, the task scheduling module is used for scheduling tasks according to their priorities and types, the resource allocation module is used for allocating computing resources according to the prediction results, the task monitoring module is used for monitoring the execution of tasks in real time, and the resource adjustment module is used for dynamically adjusting the allocation of resources;
the method comprises the following specific steps:
S1, acquiring node performance, load conditions and task types by arranging the data acquisition module in the system, so as to form a data set for subsequent model training;
S2, establishing a model of node performance and resource allocation strategy by using a deep learning algorithm: taking the acquired data set as input and training it with a deep learning algorithm such as a neural network, so as to obtain a prediction model;
S3, in the actual resource scheduling process, predicting and optimizing the node performance and the resource allocation strategy by using the trained model: according to the performance and load condition of the current node, the deep learning model predicts the optimal resource allocation strategy, which is then applied to the actual resource scheduling process;
the step S1 includes the following specific steps:
S1-1, acquiring task information: including the task name, task type, task size and task priority;
S1-2, acquiring computing resource information: including the computing node number, computing node type and computing node state;
S1-3, acquiring task running state information: including the task start time, task end time, task CPU occupancy rate and task memory occupancy rate;
S1-4, acquiring operation state information of the computing resources: including the CPU utilization rate, memory utilization rate and network bandwidth utilization rate of the computing node;
S1-5, acquiring load information: collecting the load information of the computing resources, including the number of tasks currently being executed by each computing resource and its total number of tasks; this load information helps the scheduling system balance the load of the computing resources better;
S1-6, acquiring performance indexes: including the processor speed, memory size and storage capacity of the computing node; this information helps the scheduling system evaluate the performance of the computing resources better;
the step S2 includes the following specific steps:
S2-1, data preparation: preprocessing and cleaning the collected task data and computing resource data and removing invalid and abnormal data, including converting the data into numerical form and normalizing it;
S2-2, data division: dividing the data into a training set and a test set in a certain proportion using random division, typically with 70% of the data used to train the model and 30% used to test it;
S2-3, model selection: selecting an appropriate deep learning algorithm model according to the required model type and performance requirements, such as a convolutional neural network CNN, a recurrent neural network RNN or a long short-term memory network LSTM;
S2-4, model construction: building the corresponding deep learning model, comprising an input layer, hidden layers and an output layer, according to the selected model type, and setting the corresponding parameters, including the learning rate and the loss function;
S2-5, model training: feeding the training set into the constructed model and iterating repeatedly to keep optimizing the model parameters until the loss function on the training set reaches its minimum;
S2-6, model evaluation: evaluating the trained model on the test set and calculating the accuracy rate P, recall ratio R and F1 value of the model to assess its performance and generalization capability;
S2-7, model tuning: tuning the model according to the evaluation results and the actual application requirements, including adjusting model parameters, increasing or decreasing the number of neural network layers, and increasing or decreasing the number of neurons;
the step S3 includes the following specific steps:
S3-1, applying the prediction model: inputting the acquired data into the trained prediction model and predicting the task and resource conditions over the coming period;
S3-2, task scheduling: scheduling the tasks by priority and type according to the prediction results, and determining the execution order and timing of the tasks so that they can be completed in the shortest time;
S3-3, resource allocation: allocating computing resources according to the prediction results and assigning idle resources to tasks waiting to be executed, so as to improve the utilization rate of the computing resources;
S3-4, task monitoring: monitoring the execution of tasks in real time, and recording and feeding back the task states for subsequent task adjustment and optimization;
S3-5, resource adjustment: dynamically adjusting the allocation of resources according to the execution of tasks and the actual resource conditions, so as to achieve optimal resource utilization and task execution efficiency;
in the step S2-6, the F1 value is calculated as follows: when the trained model is evaluated on the test set, the model first predicts the later data from the earlier input data and proposes a resource allocation scheme; the later data are then fed in, and existing algorithms are used to derive dynamic resource allocation schemes. If the resource utilization rate of any of these dynamic schemes exceeds that of the scheme proposed by the model, the best utilization rate is recorded. For each group of data, the number of times the model's resource allocation scheme reaches the optimal resource utilization rate is counted as TP and the number of times it fails to reach it is counted as FP, from which the accuracy rate P is obtained; if the similarity between the optimal dynamic resource allocation scheme and the scheme provided by the model in the resource allocation of each computing node is lower than a threshold value, the case is counted toward the recall count FN. Then:
Accuracy rate P: P = TP / (TP + FP)
Recall ratio R: R = TP / (TP + FN)
F1 value: F1 = 2 × P × R / (P + R)
in the step S2-5, the specific method of model training is as follows: the learning rate is raised by continuously replacing the learned data and optimizing the model parameters; as the learning rate rises, more data are brought into the training range, so that, on the one hand, the amount of resources consumed by task execution increases proportionally and, on the other hand, the amount of resources occupied by the computing-resource allocation work itself increases; the amount of resources remaining for task execution therefore decreases and the influence weight of further raising the learning rate diminishes, until the learning rate reaches its theoretical maximum and the remaining resource amount of the computing node is zero; the formula is
Wherein A is the loss function, namely the amount of resources remaining on the computing node when training with the current model, A0 is the total amount of resources possessed by a given computing node, A1 is the amount of resources occupied by the computing-resource allocation work itself, y1 is the learning rate, and y0 is the theoretical maximum of the learning rate.
Because node performance and load conditions change continuously while the system operates, the deep learning model needs to be updated continuously in order to maintain its accuracy and predictive power. Specifically, the existing prediction model is retrained and updated with newly collected data, which improves the accuracy and adaptability of the model. A sketch of such an update step follows.
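For illustration only, such an update could reuse a Keras-style model object as in the earlier training sketch; the function name and the warm-start schedule are assumptions of this sketch.

def update_model(model, new_x, new_y, epochs: int = 5):
    """Warm-start retraining: continue fitting the existing prediction model on newly collected data."""
    model.fit(new_x, new_y, epochs=epochs, batch_size=32, verbose=0)
    return model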
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Finally, it should be noted that the foregoing describes only preferred embodiments of the present invention and does not limit it; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described therein or make equivalent replacements for some of the technical features. Any modification, equivalent replacement or improvement made within the spirit and principle of the present invention shall be included in the protection scope of the present invention.

Claims (8)

1. A calculation power scheduling method based on deep learning, characterized in that: the method is implemented by a computational power scheduling system, wherein the computational power scheduling system comprises a data acquisition module, a deep learning module and a strategy construction module; the data acquisition module is used for acquiring various data about nodes, loads and tasks, the deep learning module is used for training resource allocation strategies by utilizing various deep learning models, and the strategy construction module is used for optimizing node performance and the resource allocation strategy.
2. The computational power scheduling method based on deep learning according to claim 1, characterized in that: the data acquisition module comprises a task information module, a computing resource information module, a task operation information module, a resource operation information module, a load information collection module and a performance index collection module, wherein the task information module is used for acquiring basic information of tasks, the computing resource information module is used for acquiring basic information of computing resources, the task operation information module is used for acquiring operation state information of tasks on the computing resources, the resource operation information module is used for acquiring operation state information of the computing resources, the load information collection module is used for collecting load information of the computing resources, and the performance index collection module is used for acquiring performance indexes of the computing resources;
the deep learning module comprises a data preparation module, a data dividing module, a model selecting module, a model constructing module, a model training module, a model evaluating module and a model tuning module, wherein the data preparation module is electrically connected with the data acquisition module, the data preparation module is used for converting data into a format suitable for processing by the deep learning algorithm, the data dividing module is used for dividing the data into a training set and a test set, the model selecting module is used for selecting a suitable deep learning algorithm model, the model constructing module is used for constructing the corresponding deep learning model, the model training module is used for iteratively training the model, the model evaluating module is used for evaluating the trained model, and the model tuning module is used for tuning the model;
the strategy construction module comprises a model application module, a task scheduling module, a resource allocation module, a task monitoring module and a resource adjustment module, wherein the model constructing module is electrically connected with the model application module, the model application module is used for applying the trained model to the data, the task scheduling module is used for scheduling tasks according to their priorities and types, the resource allocation module is used for allocating computing resources according to the prediction results, the task monitoring module is used for monitoring the execution of tasks in real time, and the resource adjustment module is used for dynamically adjusting the allocation of resources.
3. The computational power scheduling method based on deep learning according to claim 2, characterized in that the method comprises the following specific steps:
S1, acquiring node performance, load conditions and task types by arranging the data acquisition module in the system, so as to form a data set for subsequent model training;
S2, establishing a model of node performance and resource allocation strategy by using a deep learning algorithm: taking the acquired data set as input and training it with a deep learning algorithm such as a neural network, so as to obtain a prediction model;
S3, in the actual resource scheduling process, predicting and optimizing the node performance and the resource allocation strategy by using the trained model: according to the performance and load condition of the current node, the deep learning model predicts the optimal resource allocation strategy, which is then applied to the actual resource scheduling process.
4. The computational power scheduling method based on deep learning according to claim 3, characterized in that the step S1 includes the following specific steps:
S1-1, acquiring task information: including the task name, task type, task size and task priority;
S1-2, acquiring computing resource information: including the computing node number, computing node type and computing node state;
S1-3, acquiring task running state information: including the task start time, task end time, task CPU occupancy rate and task memory occupancy rate;
S1-4, acquiring operation state information of the computing resources: including the CPU utilization rate, memory utilization rate and network bandwidth utilization rate of the computing node;
S1-5, acquiring load information: collecting the load information of the computing resources, including the number of tasks currently being executed by each computing resource and its total number of tasks;
S1-6, acquiring performance indexes: including the processor speed, memory size and storage capacity of the computing node.
5. The computational power scheduling method based on deep learning according to claim 4, characterized in that the step S2 includes the following specific steps:
S2-1, data preparation: preprocessing and cleaning the collected task data and computing resource data and removing invalid and abnormal data, including converting the data into numerical form and normalizing it;
S2-2, data division: dividing the data into a training set and a test set in a certain proportion using random division;
S2-3, model selection: selecting an appropriate deep learning algorithm model according to the required model type and performance requirements, such as a convolutional neural network CNN, a recurrent neural network RNN or a long short-term memory network LSTM;
S2-4, model construction: building the corresponding deep learning model, comprising an input layer, hidden layers and an output layer, according to the selected model type, and setting the corresponding parameters, including the learning rate and the loss function;
S2-5, model training: feeding the training set into the constructed model and iterating repeatedly to keep optimizing the model parameters until the loss function on the training set reaches its minimum;
S2-6, model evaluation: evaluating the trained model on the test set and calculating the accuracy rate P, recall ratio R and F1 value of the model to assess its performance and generalization capability;
S2-7, model tuning: tuning the model according to the evaluation results and the actual application requirements, including adjusting model parameters, increasing or decreasing the number of neural network layers, and increasing or decreasing the number of neurons.
6. The computational power scheduling method based on deep learning according to claim 5, characterized in that the step S3 includes the following specific steps:
S3-1, applying the prediction model: inputting the acquired data into the trained prediction model and predicting the task and resource conditions over the coming period;
S3-2, task scheduling: scheduling the tasks by priority and type according to the prediction results, and determining the execution order and timing of the tasks so that they can be completed in the shortest time;
S3-3, resource allocation: allocating computing resources according to the prediction results and assigning idle resources to tasks waiting to be executed, so as to improve the utilization rate of the computing resources;
S3-4, task monitoring: monitoring the execution of tasks in real time, and recording and feeding back the task states for subsequent task adjustment and optimization;
S3-5, resource adjustment: dynamically adjusting the allocation of resources according to the execution of tasks and the actual resource conditions, so as to achieve optimal resource utilization and task execution efficiency.
7. The computational power scheduling method based on deep learning according to claim 6, characterized in that in the step S2-6 the F1 value is calculated as follows: the number of times the resource allocation scheme given by the model reaches the optimal resource utilization rate is counted as TP, and the number of times it does not reach the optimal resource utilization rate is counted as FP, from which the accuracy rate P is obtained; if the similarity between the optimal dynamic resource allocation scheme and the scheme provided by the model in the resource allocation of each computing node is lower than a threshold value, the case is counted toward the recall count FN; then
Accuracy rate P: P = TP / (TP + FP)
Recall ratio R: R = TP / (TP + FN)
F1 value: F1 = 2 × P × R / (P + R)
8. The computational power scheduling method based on deep learning according to claim 7, characterized in that in the step S2-5 the specific method of model training is as follows: as the learning rate rises, more data are brought into the training range, until the learning rate reaches its theoretical maximum and the remaining resource amount of the computing node is zero; the formula is
Wherein A is the loss function, namely the amount of resources remaining on the computing node when training with the current model, A0 is the total amount of resources possessed by a given computing node, A1 is the amount of resources occupied by the computing-resource allocation work itself, y1 is the learning rate, and y0 is the theoretical maximum of the learning rate.
CN202311016992.2A 2023-08-11 2023-08-11 Calculation power scheduling method based on deep learning Pending CN116974768A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311016992.2A CN116974768A (en) 2023-08-11 2023-08-11 Calculation power scheduling method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311016992.2A CN116974768A (en) 2023-08-11 2023-08-11 Calculation power scheduling method based on deep learning

Publications (1)

Publication Number Publication Date
CN116974768A true CN116974768A (en) 2023-10-31

Family

ID=88471406

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311016992.2A Pending CN116974768A (en) 2023-08-11 2023-08-11 Calculation power scheduling method based on deep learning

Country Status (1)

Country Link
CN (1) CN116974768A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117478529A (en) * 2023-12-27 2024-01-30 环球数科集团有限公司 Distributed computing power sensing and scheduling system based on AIGC
CN117472549A (en) * 2023-12-27 2024-01-30 环球数科集团有限公司 Distributed computing power dispatching system based on AIGC
CN117472549B (en) * 2023-12-27 2024-03-05 环球数科集团有限公司 Distributed computing power dispatching system based on AIGC
CN117478529B (en) * 2023-12-27 2024-03-12 环球数科集团有限公司 Distributed computing power sensing and scheduling system based on AIGC
CN117909418A (en) * 2024-03-20 2024-04-19 广东琴智科技研究院有限公司 Deep learning model storage consistency method, computing subsystem and computing platform
CN117909418B (en) * 2024-03-20 2024-05-31 广东琴智科技研究院有限公司 Deep learning model storage consistency method, computing subsystem and computing platform
CN117952669A (en) * 2024-03-27 2024-04-30 深圳威尔视觉科技有限公司 Calculation force demand prediction method and device based on deep learning and computer equipment

Similar Documents

Publication Publication Date Title
CN116974768A (en) Calculation power scheduling method based on deep learning
CN116646933A (en) Big data-based power load scheduling method and system
CN110135635B (en) Regional power saturated load prediction method and system
CN111274036B (en) Scheduling method of deep learning task based on speed prediction
CN106315319A (en) Intelligent pre-dispatching method and system for elevator
CN110503256A (en) Short-term load forecasting method and system based on big data technology
CN117036104B (en) Intelligent electricity utilization method and system based on electric power Internet of things
CN102622273A (en) Self-learning load prediction based cluster on-demand starting method
CN112685153A (en) Micro-service scheduling method and device and electronic equipment
CN111985845B (en) Node priority optimization method of heterogeneous Spark cluster
CN111325310A (en) Data prediction method, device and storage medium
CN108764588A (en) A kind of temperature influence power prediction method based on deep learning
CN112990500A (en) Transformer area line loss analysis method and system based on improved weighted gray correlation analysis
CN116581750A (en) Intelligent line load charging method based on power grid load level
CN111463782B (en) Voltage sensitive load model and parameter identification method
CN117422274A (en) Comprehensive energy system operation optimization system and method
CN117076882A (en) Dynamic prediction management method for cloud service resources
CN116610416A (en) Load prediction type elastic expansion system and method based on Kubernetes
CN113505879B (en) Prediction method and device based on multi-attention feature memory model
CN102662325A (en) Improved adaptive learning tree power supply management method
CN116128117A (en) Distribution line loss prediction method and device based on digital twinning
CN115794405A (en) Dynamic resource allocation method of big data processing framework based on SSA-XGboost algorithm
CN112187894A (en) Container dynamic scheduling method based on load correlation prediction
CN117634931B (en) Electric automobile adjustment capability prediction method and system considering charging behavior
CN113537575B (en) Trend load prediction method containing distributed photovoltaic and electric automobile grid connection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination