CN109960578A - A data center resource offline scheduling method based on deep reinforcement learning - Google Patents
- Publication number
- CN109960578A (application number CN201711399661.6A)
- Authority
- CN
- China
- Prior art keywords
- offline
- layer
- resource
- deep reinforcement learning
- data center
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5011—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
- G06F9/5061—Partitioning or combining of resources
Abstract
The present invention relates to the field of computer technology, and in particular to a data center resource offline scheduling method based on deep reinforcement learning. Deep reinforcement learning offers a viable alternative to hand-crafted heuristics for resource scheduling and management. Through continuous learning, a deep reinforcement learning method can be optimized for a particular workload (such as periodic or random load) and maintain high-quality scheduling results under a variety of conditions. Taking the minimum average job slowdown (system slowdown time) as the optimization objective, a reward value is computed for each offline scheduling decision, guiding the deep network toward the objective and ultimately training it toward the optimal target. The results show that, across a large number of embodiment tests of the invention, the slowdown of the offline scheduling method based on deep reinforcement learning is far below that of traditional job scheduling optimization methods such as SJF (shortest job first), demonstrating the advantage of deep reinforcement learning in this field.
Description
Technical field
The present invention relates to the field of computer technology, and in particular to a data center resource offline scheduling method based on deep reinforcement learning.
Background technique
Resource management is a fundamental problem in computer networks and operating systems. Resource allocation is typically a combinatorial problem and can often be mapped to NP-hard problems. Although every resource allocation scenario is specific, the general approach is to design efficient heuristic algorithms with performance guarantees under certain conditions. It has recently been shown that machine learning can provide a viable alternative to hand-crafted heuristics for resource management, in particular deep reinforcement learning, which has become an active area of machine learning research.
In fact, deep reinforcement learning methods are particularly well suited to resource management systems. First, the decisions these systems make are often highly repetitive, generating abundant training data for deep reinforcement learning. Second, deep reinforcement learning can model complex systems and decision policies as deep neural networks. Third, even when an accurate model is lacking, objectives that are hard to optimize directly can still be trained for, provided a reward signal correlated with the objective is available. Finally, through continuous learning, a deep reinforcement learning method can be optimized for a particular workload (for example, small jobs, low load, or periodic load) and remain efficient under a variety of conditions.
Summary of the invention
The technical problem to be solved by the present invention is to provide a deep reinforcement learning algorithm for offline scheduling of data center resources, as an optimal alternative to the current efficient heuristic algorithms.
A data center resource offline scheduling method based on deep reinforcement learning, characterized in that the data center resource offline scheduling system comprises a data source module, a running environment module, an evaluation mechanism learning module, and a control strategy learning module;
The data source module is used to generate the data of offline scheduled jobs. The data source includes the resource types required by a job (for example, CPU, memory, I/O), the resource sizes required by a job, and the total number of offline jobs.
The running environment module is used to construct the running environment model. The running environment includes the allocated cluster resources and the waiting job slots. All parts of the running environment module are represented as cell images. The cluster resource image shows how each kind of resource is allocated to the jobs scheduled for service, over T time steps starting from the current time. The job slot images represent the resource requirements of the waiting jobs.
The evaluation mechanism learning module combines the information obtained from the data source module and the running environment module with the evaluation mechanism to obtain the required reward function during operation. The reward function, as the feedback of the evaluation mechanism, is delivered by the evaluation mechanism learning module to the control strategy learning module to optimize the network parameters.
The control strategy learning module is used for learning the optimization strategy of the deep reinforcement learning method. The obtained reward function guides the subsequent job scheduling sequence, and by updating the neural network parameters through the policy, the final concrete operation strategy for resource scheduling jobs is obtained.
The prospects of the invention are broad: it can alleviate the widespread problems of high energy consumption and serious resource waste in data centers. The invention therefore has good applications and can bring economic benefits to many industries. Compared with existing algorithms, the advantages of the deep reinforcement learning algorithm used in the invention are its real-time performance, its speed, and its ability to keep learning.
Description of the drawings
Fig. 1 is a schematic framework diagram of deep reinforcement learning in an embodiment of the present invention.
Fig. 2 is a state diagram of the offline system in an embodiment of the present invention.
Fig. 3 is a flow chart of resource offline scheduling based on deep reinforcement learning in an embodiment of the present invention.
Detailed description of the embodiments
Specific embodiments of the present invention are described in further detail below with reference to the drawings and examples. The following examples are intended to illustrate, not to limit, the scope of the invention.
Fig. 1 is a schematic framework diagram of deep reinforcement learning in an embodiment of the present invention.
As shown in Figure 1, an agent interacts with an environment. At each time step t, the agent observes some state s_t and selects an action a_t. After the action, the environment transitions to state s_(t+1) and the agent receives a reward r_t. The state transitions and rewards are stochastic and are assumed to have the Markov property.
Further, the agent can control only its own behavior; it has no prior knowledge of which state the environment will transition to or what the reward may be. By interacting with the environment during training, the agent can observe these quantities. The goal of learning is to maximize the expected cumulative discounted reward E[Σ_t γ^t·r_t], where γ ∈ (0,1] is the discount factor.
Further, the present invention uses a policy-search-based reinforcement learning method, learned by performing gradient descent on the policy parameters. The objective is to maximize the expected cumulative discounted reward; the gradient of this objective is given by:
∇_θ E_(π_θ)[Σ_t γ^t·r_t] = E_(π_θ)[∇_θ log π_θ(s, a) · Q^(π_θ)(s, a)]
Here Q^(π_θ)(s, a) is the expected cumulative discounted reward obtained by selecting action a in state s and then following policy π_θ. The key idea of policy gradient methods is to estimate the gradient by observing execution trajectories obtained by following the policy. In the simple Monte Carlo method, the agent samples multiple trajectories and uses the empirically computed cumulative discounted reward v_t as an unbiased estimate of Q^(π_θ)(s_t, a_t). The policy parameters are then updated by gradient descent:
θ ← θ + α Σ_t ∇_θ log π_θ(s_t, a_t) · v_t
where α is the step size. This equation yields the well-known REINFORCE algorithm, which can be understood intuitively as follows. The direction ∇_θ log π_θ(s_t, a_t) gives how to change the policy parameters to increase π_θ(s_t, a_t) (the probability of action a_t in state s_t). The update steps in this direction; the size of the step depends on how large the return v_t is. In our design we use a slight variant that subtracts a baseline value from each return v_t, to reduce the variance of the gradient estimate.
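The policy-gradient update above can be sketched as follows. This is an illustrative toy, not the patented implementation: it uses a linear-softmax policy in place of the patent's CNN, and the function names and dimensions are assumptions.

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def policy(theta, s):
    """pi_theta(.|s) for a linear-softmax policy with logits theta @ s."""
    return softmax(theta @ s)

def grad_log_pi(theta, s, a):
    """grad_theta log pi_theta(a|s) = (indicator(a) - pi(.|s)) outer s."""
    p = policy(theta, s)
    g = -np.outer(p, s)
    g[a] += s
    return g

def reinforce_update(theta, trajectory, alpha=0.01, baseline=0.0):
    """One REINFORCE step over a sampled trajectory of (s_t, a_t, v_t):
    theta <- theta + alpha * sum_t grad log pi(s_t, a_t) * (v_t - baseline)."""
    for s, a, v in trajectory:
        theta = theta + alpha * grad_log_pi(theta, s, a) * (v - baseline)
    return theta
```

Repeated updates with a positive return v_t for an action raise that action's probability, which is the intuition stated above; the baseline argument corresponds to the variance-reduction variant.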
Fig. 2 is a state diagram of the offline system in an embodiment of the present invention.
We represent the state of the offline system as different grid images, including the grid image of the currently allocated cluster resources and the grid images of the resource requirements of the waiting job slots. The two leftmost grids in Fig. 2 show how each kind of resource is allocated to the jobs scheduled for service, from the current time to T time steps later. Different colors in these images represent different jobs. The job slot grid images represent the resource requirements of the waiting jobs; the number of job slots equals the number of generated random jobs, so that jobs correspond one-to-one with job slots.
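The grid-image state described above can be sketched as follows. The grid dimensions match the 10-cell-wide, 20-cell-high grids of the embodiment, but the slot count and helper names are illustrative assumptions.

```python
import numpy as np

T = 20        # time horizon: rows of each grid
R_CAP = 10    # resource capacity in cells: width of each grid
N_RES = 2     # two resource types, as in the embodiment
N_SLOTS = 5   # number of waiting-job slots (equals number of generated jobs)

def empty_state():
    # One grid per resource for the cluster, plus one grid per resource
    # per job slot, mirroring the patent's cell-image representation.
    cluster = np.zeros((N_RES, T, R_CAP))
    slots = np.zeros((N_SLOTS, N_RES, T, R_CAP))
    return cluster, slots

def load_job_into_slot(slots, slot_idx, duration, demand):
    """Draw a job's requirement into its slot image:
    demand[r] cells wide and duration rows tall for each resource r."""
    for r in range(N_RES):
        slots[slot_idx, r, :duration, :demand[r]] = 1.0
    return slots

def advance_time(cluster):
    """One time step: the cluster image shifts up one row and the
    bottom row becomes empty (the translation described in step S307)."""
    cluster = np.roll(cluster, -1, axis=1)
    cluster[:, -1, :] = 0.0
    return cluster
```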
Fig. 3 is a flow chart of resource scheduling based on deep reinforcement learning in an embodiment of the present invention.
As shown in Fig. 3, the resource scheduling based on deep reinforcement learning comprises the following steps:
Step S301: randomly generate offline jobs.
Further, we assume two kinds of resources with capacity {1r; 1r}. Job durations and resource demands are chosen as follows: 80% of job durations are chosen uniformly between 1t and 3t; the rest are chosen uniformly between 10t and 15t. Each job has a dominant resource picked independently at random. The demand on the dominant resource is chosen uniformly between 0.25r and 0.5r, and the demand on the other resource is chosen uniformly between 0.05r and 0.1r.
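The job-generation distribution of step S301 can be sketched as follows, treating 1t and 1r as unit quantities; the function name and sample count are illustrative.

```python
import random

def generate_job(rng, r=1.0):
    """Sample one offline job per the embodiment's distribution:
    80% short (1t-3t), 20% long (10t-15t); demand 0.25r-0.5r on a
    randomly picked dominant resource and 0.05r-0.1r on the other."""
    if rng.random() < 0.8:
        duration = rng.randint(1, 3)      # short jobs
    else:
        duration = rng.randint(10, 15)    # long jobs
    dominant = rng.randrange(2)           # dominant resource picked at random
    demand = [0.0, 0.0]
    demand[dominant] = rng.uniform(0.25 * r, 0.5 * r)
    demand[1 - dominant] = rng.uniform(0.05 * r, 0.1 * r)
    return duration, demand

rng = random.Random(0)
jobs = [generate_job(rng) for _ in range(100)]
```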
Step S302: load all offline jobs into job slots. In offline scheduling, the number of jobs and their demands are known. Therefore, the number of job slots is set equal to the number of generated random offline jobs, so that jobs correspond one-to-one with job slots.
Step S303: the deep learning network selects an action value A.
Further, the deep neural network we use is a convolutional neural network (CNN): the first layer is the input layer; the second layer is convolutional layer Conv1; the third layer is pooling layer Pool1 (MaxPooling); the fourth layer is convolutional layer Conv2; the fifth layer is pooling layer Pool2 (MaxPooling); the sixth layer is fully connected layer Local3; the ninth layer is fully connected layer Local4; and the tenth layer is the output layer (Softmax). The action value A is selected according to the probabilities of the output layer.
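The patent names the layer sequence but not kernel sizes, paddings, strides, or channel counts; assuming 3×3 convolutions with padding 1 and 2×2 max pooling purely for illustration, the spatial shapes through the network can be traced as follows.

```python
def conv2d_shape(h, w, k=3, pad=1, stride=1):
    # Standard output-size formula for a square convolution kernel.
    return ((h + 2 * pad - k) // stride + 1,
            (w + 2 * pad - k) // stride + 1)

def maxpool_shape(h, w, k=2):
    return h // k, w // k

def cnn_shapes(h, w):
    """Trace the spatial shape through Conv1 -> Pool1 -> Conv2 -> Pool2;
    Local3, Local4 and the softmax output layer are fully connected,
    so only the convolutional/pooling stages change the spatial shape."""
    shapes = {"input": (h, w)}
    h, w = conv2d_shape(h, w)
    shapes["Conv1"] = (h, w)
    h, w = maxpool_shape(h, w)
    shapes["Pool1"] = (h, w)
    h, w = conv2d_shape(h, w)
    shapes["Conv2"] = (h, w)
    h, w = maxpool_shape(h, w)
    shapes["Pool2"] = (h, w)
    return shapes
```

For example, a 20-row state image formed by placing the cluster grid beside several job slot grids (here 60 cells wide, an assumption) halves its spatial extent at each pooling layer before the fully connected stages.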
Step S304: judge whether job slot A is empty.
Further, the job slot grid width equals the maximum resource capacity 1r, and its height equals 20t, i.e. 20 time steps.
Step S305: judge whether the job in job slot A can be loaded into the cluster schedule.
Further, the cluster grid size equals the job slot grid size.
Step S306: load the job into the cluster and set job slot A to empty.
Further, the resource size required by the loaded job is shown on the cluster grid image.
Step S307: the cluster runs one time step.
Further, the system time increases by one time step, and the cluster grid image translates up by one row. The former first row of the image is overwritten, and the last row is set to empty.
Step S308: judge whether job scheduling is complete.
Further, job scheduling is complete only when all of the following hold: all jobs have been loaded into the system, no job is currently running, and no job is waiting in a job slot.
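The S303–S308 loop can be sketched as follows. This simplified version tracks only job durations, omits the resource-fit check of step S305, and uses a trivial first-fit policy in place of the CNN; the helper names are illustrative.

```python
def first_fit(slots):
    """Stand-in policy: pick the first non-empty slot, else no action."""
    for i, job in enumerate(slots):
        if job is not None:
            return i
    return None

def schedule_offline(jobs, pick_action, max_steps=1000):
    """Sketch of the S303-S308 loop: ask the policy for a slot index A;
    if slot A holds a job, load it into the cluster (S306), otherwise
    run one time step (S307); stop when every job has been loaded and
    has finished running (the completion condition of S308)."""
    slots = list(jobs)          # S302: one slot per job (durations only)
    running = []                # [slot_index, remaining_duration] pairs
    t = 0
    finish = {}
    while (any(s is not None for s in slots) or running) and t < max_steps:
        a = pick_action(slots)
        if a is not None and slots[a] is not None:
            running.append([a, slots[a]])
            slots[a] = None
        else:
            t += 1
            for job in running:
                job[1] -= 1
            for idx, rem in running:
                if rem <= 0:
                    finish[idx] = t   # record completion time
            running = [j for j in running if j[1] > 0]
    return finish
```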
Step S309: update the offline neural network parameters by the reward value.
Further, in one embodiment of the invention, we use minimizing the average job slowdown as the optimization objective. For each job j, the slowdown is given by S_j = C_j / T_j, where C_j is the completion time of the job (i.e. the time between arrival and completed execution) and T_j is the (ideal) duration of the job; note that S_j ≥ 1. We therefore set the reward at each time step to Σ_(j∈J) (−1/T_j), where J is the set of jobs currently in the system (scheduled or waiting for service). Observe that with the discount factor set to γ = 1, the cumulative reward over time coincides with the (negative) sum of job slowdowns, so maximizing the cumulative reward minimizes the average slowdown.
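The slowdown and per-time-step reward defined above can be sketched as follows; the function names are illustrative.

```python
def slowdown(completion_time, ideal_duration):
    # S_j = C_j / T_j, always >= 1 for a completed job
    return completion_time / ideal_duration

def step_reward(jobs_in_system):
    """Per-time-step reward: sum over current jobs of -1/T_j.
    With gamma = 1, a job with ideal duration T_j that spends C_j steps
    in the system contributes -C_j/T_j = -S_j to the return, so the
    total return is minus the sum of slowdowns."""
    return sum(-1.0 / t_j for t_j in jobs_in_system)
```

For example, a job with T_j = 2 that remains in the system for 6 steps accumulates 6 × (−1/2) = −3, exactly −S_j.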
The foregoing is merely a specific embodiment, but the protection scope of the present invention is not limited thereto. Any change or substitution that can readily be conceived by those familiar with the art, within the technical scope disclosed by the present invention, shall be covered by the protection scope of the present invention. The protection scope of the present invention shall therefore be subject to the protection scope of the claims.
Claims (6)
1. A data center resource offline scheduling method based on deep reinforcement learning, characterized in that the data center resource offline scheduling system comprises:
a data source module, for generating the data of offline scheduled jobs, the data including the resource types required by a job (for example, CPU, memory, I/O), the resource sizes required by a job, and the total number of offline jobs;
a running environment module, for constructing the running environment model, the running environment including the allocated cluster resource Cluster and the waiting job slots JobSlot, all parts of the running environment module being represented as cell images;
an evaluation mechanism learning module, for combining the obtained information with the evaluation mechanism to obtain the required reward function, the reward function being delivered as feedback to the control strategy learning module to optimize the network parameters;
a control strategy learning module, for learning the optimization strategy of the deep reinforcement learning method, using the obtained reward function to guide the subsequent offline job scheduling sequence, and updating the neural network parameters through the policy to obtain the final concrete operation strategy for offline resource scheduling jobs.
2. a kind of data center's offline resources dispatching method based on deeply study according to claim 1, special
Sign is, the method for generating offline schedule job are as follows: we assume that two kinds of resources, i.e. capacity { 1r;1r }, the operation duration and
Resource requirement selection is as follows: 80% run duration selects between 1t and 3t;Remaining is uniformly selected from 10t to 15t
It selects.Each work has the independent superior resources selected at random, and the demand to superior resources is generally in 0.25r and 0.5r
Between select, the demand of other resources uniform design between 0.05r and 0.1r.
3. a kind of offline dispatching method of data center resource based on deeply study according to claim 1, special
Sign is that off-line operation environment includes 1 cluster resource Cluster, the N number of waiting operation slot JobSlot of distribution, and wherein N is
The quantity of off-line operation.10 grids of every kind of resource width of cluster resource Cluster, 20 grids of height, wait operation slot
10 grids of every kind of resource width of JobSlot, 20 grids of height.
4. a kind of offline dispatching method of data center resource based on deeply study according to claim 1, special
Sign is that the target of deeply study is to maximize desired progressive award:Wherein γ ∈ (0,1] be
The factor of discount reward, the present invention uses the intensified learning method based on decision search, by executing on policing parameter
Come a kind of nitrification enhancement learnt, target is to maximize expected accumulation discount reward, the ladder of this target for gradient decline
Degree is given by:
Further,It is the movement a selected from state s and the expected accumulation prize for then following tactful π _ θ
It encourages, the key idea of Policy-Gradient method is to follow the execution track that strategy obtains by observation to estimate gradient, simple
In monte carlo method, the intelligent multiple tracks of sampler body, and the accumulation discount that use experience calculates rewards vtAsUnbiased esti-mator, then it pass through gradient decline update policing parameter:
Further, α is step-length.This equation produces well-known enhancing algorithm, can intuitively understand as follows, directionIt gives and how to change policing parameter to increase πθ(st,at) (the movement probability s under at statet),
Equation has stepped a step to this direction;The size of step-length depends on returning to vtHave it is much, in our design, we use
One slight variant, by from each return value vtIn subtract a baseline value reduce gradient estimation variance.
5. a kind of offline dispatching method of data center resource based on deeply study according to claim 1, special
Sign is that in one embodiment of the invention, we are using minimum average operation slowdown as optimization aim.For every
A operation j, slowdown is by Sj=Cj/TjIt provides, wherein CjIt is the deadline of operation (between i.e. arrival and completion execute
Time), TjIt is operation (ideal) duration, pays attention to Sj>=1, we are set as the reward of each time step as a result,Wherein j is the set of current operation (make a reservation for or wait to be serviced) in systems.Overview setup discount factor γ=
1, over time, accumulation remuneration is consistent with the summation of slowdown, therefore maximizes progressive award, minimizes average
slowdown。
6. a kind of offline dispatching method of data center resource based on deeply study according to claim 1, special
Sign is that in one embodiment of the invention, the deep neural network used is convolutional neural networks CNN, the knot in network
Structure is as follows: first layer input layer, second layer convolutional layer Conv1, third layer pond layer Pool1:MaxPooling, the 4th layer of convolution
Layer Conv2, layer 5 pond layer Pool2:MaxPooling, the full articulamentum Local3 of layer 6, the 9th layer of full articulamentum
Local4, the 10th layer of output layer Softmax.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711399661.6A CN109960578A (en) | 2017-12-22 | 2017-12-22 | A kind of offline dispatching method of data center resource based on deeply study |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109960578A true CN109960578A (en) | 2019-07-02 |
Family
ID=67018801
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711399661.6A Pending CN109960578A (en) | 2017-12-22 | 2017-12-22 | A kind of offline dispatching method of data center resource based on deeply study |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109960578A (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110347478A (en) * | 2019-07-08 | 2019-10-18 | 白紫星 | A kind of model-free data center resource dispatching algorithm based on intensified learning |
CN110443412A (en) * | 2019-07-18 | 2019-11-12 | 华中科技大学 | The intensified learning method of Logistic Scheduling and path planning in dynamic optimization process |
CN110609474A (en) * | 2019-09-09 | 2019-12-24 | 创新奇智(南京)科技有限公司 | Data center energy efficiency optimization method based on reinforcement learning |
CN111651220A (en) * | 2020-06-04 | 2020-09-11 | 上海电力大学 | Spark parameter automatic optimization method and system based on deep reinforcement learning |
CN112035251A (en) * | 2020-07-14 | 2020-12-04 | 中科院计算所西部高等技术研究院 | Deep learning training system and method based on reinforcement learning operation layout |
CN113157422A (en) * | 2021-04-29 | 2021-07-23 | 清华大学 | Cloud data center cluster resource scheduling method and device based on deep reinforcement learning |
CN114116183A (en) * | 2022-01-28 | 2022-03-01 | 华北电力大学 | Data center service load scheduling method and system based on deep reinforcement learning |
CN115271130A (en) * | 2022-09-30 | 2022-11-01 | 合肥工业大学 | Dynamic scheduling method and system for maintenance order of ship main power equipment |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103577265A (en) * | 2012-07-25 | 2014-02-12 | 田文洪 | Method and device of offline energy-saving dispatching in cloud computing data center |
CN108595267A (en) * | 2018-04-18 | 2018-09-28 | 中国科学院重庆绿色智能技术研究院 | A kind of resource regulating method and system based on deeply study |
Non-Patent Citations (1)
Title |
---|
HONGZI MAO等: "Resource Management with Deep Reinforcement Learning", 《HOTNETS "16: PROCEEDINGS OF THE 15TH ACM WORKSHOP ON HOT TOPICS IN NETWORKS》 * |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110347478A (en) * | 2019-07-08 | 2019-10-18 | 白紫星 | A kind of model-free data center resource dispatching algorithm based on intensified learning |
CN110443412A (en) * | 2019-07-18 | 2019-11-12 | 华中科技大学 | The intensified learning method of Logistic Scheduling and path planning in dynamic optimization process |
CN110609474A (en) * | 2019-09-09 | 2019-12-24 | 创新奇智(南京)科技有限公司 | Data center energy efficiency optimization method based on reinforcement learning |
CN111651220A (en) * | 2020-06-04 | 2020-09-11 | 上海电力大学 | Spark parameter automatic optimization method and system based on deep reinforcement learning |
CN111651220B (en) * | 2020-06-04 | 2023-08-18 | 上海电力大学 | Spark parameter automatic optimization method and system based on deep reinforcement learning |
CN112035251A (en) * | 2020-07-14 | 2020-12-04 | 中科院计算所西部高等技术研究院 | Deep learning training system and method based on reinforcement learning operation layout |
CN112035251B (en) * | 2020-07-14 | 2023-09-26 | 中科院计算所西部高等技术研究院 | Deep learning training system and method based on reinforcement learning operation layout |
CN113157422A (en) * | 2021-04-29 | 2021-07-23 | 清华大学 | Cloud data center cluster resource scheduling method and device based on deep reinforcement learning |
CN114116183A (en) * | 2022-01-28 | 2022-03-01 | 华北电力大学 | Data center service load scheduling method and system based on deep reinforcement learning |
CN114116183B (en) * | 2022-01-28 | 2022-04-29 | 华北电力大学 | Data center service load scheduling method and system based on deep reinforcement learning |
CN115271130A (en) * | 2022-09-30 | 2022-11-01 | 合肥工业大学 | Dynamic scheduling method and system for maintenance order of ship main power equipment |
Legal Events

Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20190702 |