CN109960578A - A data center resource offline scheduling method based on deep reinforcement learning - Google Patents
- Publication number
- CN109960578A (application number CN201711399661.6A)
- Authority
- CN
- China
- Prior art keywords
- offline
- layer
- resource
- deep reinforcement learning
- data center
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5011—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
- G06F9/5061—Partitioning or combining of resources
Abstract
The present invention relates to the field of computer technology, and in particular to a data center resource offline scheduling method based on deep reinforcement learning. Deep reinforcement learning offers a viable alternative to hand-crafted heuristics for resource scheduling and management. Through continuous learning, a deep reinforcement learning method can be optimized for a particular workload (such as periodic or random load) and maintain high-quality scheduling results under a variety of conditions. Taking the minimum average job slowdown (system slowdown time) as the optimization objective, a reward value is computed for each offline scheduling decision, guiding the deep network toward the objective and ultimately training it toward the optimal target. The results show that, across a large number of embodiment tests of the invention, the slowdown of the offline scheduling method based on deep reinforcement learning is far below that of traditional job scheduling optimization methods such as SJF (shortest job first), demonstrating the advantage of deep reinforcement learning in this field.
Description
Technical field
The present invention relates to the field of computer technology, and in particular to a data center resource offline scheduling method based on deep reinforcement learning.
Background technique
Resource management is a fundamental problem in computer networks and operating systems. Resource allocation is typically a combinatorial problem and can often be mapped to NP-hard problems. Although every resource allocation scenario is specific, the general approach is to design efficient heuristic algorithms with performance guarantees under certain conditions. It has recently been shown that machine learning can provide a viable alternative to hand-crafted heuristics for resource management, in particular deep reinforcement learning, which has become an active area of machine learning research.
In fact, deep reinforcement learning methods are particularly well suited to resource management systems. First, the decisions these systems make are often highly repetitive, generating abundant training data for deep reinforcement learning. Second, deep reinforcement learning can model complex systems and decision policies as deep neural networks. Third, even when an accurate model is lacking, objectives that are hard to optimize directly can still be trained for, provided a reward signal correlated with the objective is available. Finally, through continuous learning, a deep reinforcement learning method can be optimized for a particular workload (for example, small jobs, low load, or periodic load) and remain efficient under a variety of conditions.
Summary of the invention
The technical problem to be solved by the present invention is to provide a deep reinforcement learning algorithm for offline scheduling of data center resources, as an optimal alternative to the current efficient heuristic algorithms.
A data center resource offline scheduling method based on deep reinforcement learning, characterized in that the data center resource offline scheduling system comprises a data source module, a running environment module, an evaluation mechanism learning module, and a control strategy learning module;
The data source module is used to generate the data of offline scheduled jobs. The data source includes the resource types required by a job (for example, CPU, memory, I/O), the resource sizes required by a job, and the total number of offline jobs.
The running environment module is used to construct the running environment model. The running environment includes the allocated cluster resources and the waiting job slots. All parts of the running environment module are represented as cell images. The cluster resource image shows how each kind of resource is allocated to the jobs scheduled for service, over T time steps starting from the current time. The job slot images represent the resource requirements of the waiting jobs.
The evaluation mechanism learning module combines the information obtained from the data source module and the running environment module with the evaluation mechanism to obtain the required reward function during operation. The reward function, as the feedback of the evaluation mechanism, is delivered by the evaluation mechanism learning module to the control strategy learning module to optimize the network parameters.
The control strategy learning module is used for learning the optimization strategy of the deep reinforcement learning method. The obtained reward function guides the subsequent job scheduling sequence, and by updating the neural network parameters through the policy, the final concrete operation strategy for resource scheduling jobs is obtained.
The prospects of the invention are broad: it can alleviate the widespread problems of high energy consumption and serious resource waste in data centers. The invention therefore has good applications and can bring economic benefits to many industries. Compared with existing algorithms, the advantages of the deep reinforcement learning algorithm used in the invention are its real-time performance, its speed, and its ability to keep learning.
Description of the drawings
Fig. 1 is a schematic framework diagram of deep reinforcement learning in an embodiment of the present invention.
Fig. 2 is a state diagram of the offline system in an embodiment of the present invention.
Fig. 3 is a flow chart of resource offline scheduling based on deep reinforcement learning in an embodiment of the present invention.
Detailed description of the embodiments
Specific embodiments of the present invention are described in further detail below with reference to the drawings and examples. The following examples are intended to illustrate, not to limit, the scope of the invention.
Fig. 1 is a schematic framework diagram of deep reinforcement learning in an embodiment of the present invention.
As shown in Figure 1, an agent interacts with an environment. At each time step t, the agent observes some state s_t and selects an action a_t. After the action, the environment transitions to state s_(t+1) and the agent receives a reward r_t. The state transitions and rewards are stochastic and are assumed to have the Markov property.
Further, the agent can control only its own behavior; it has no prior knowledge of which state the environment will transition to or what the reward may be. By interacting with the environment during training, the agent can observe these quantities. The goal of learning is to maximize the expected cumulative discounted reward E[Σ_t γ^t·r_t], where γ ∈ (0,1] is the discount factor.
Further, the present invention uses a policy-search-based reinforcement learning method, learned by performing gradient descent on the policy parameters. The objective is to maximize the expected cumulative discounted reward; the gradient of this objective is given by:
∇_θ E_(π_θ)[Σ_t γ^t·r_t] = E_(π_θ)[∇_θ log π_θ(s, a) · Q^(π_θ)(s, a)]
Here Q^(π_θ)(s, a) is the expected cumulative discounted reward obtained by selecting action a in state s and then following policy π_θ. The key idea of policy gradient methods is to estimate the gradient by observing execution trajectories obtained by following the policy. In the simple Monte Carlo method, the agent samples multiple trajectories and uses the empirically computed cumulative discounted reward v_t as an unbiased estimate of Q^(π_θ)(s_t, a_t). The policy parameters are then updated by gradient descent:
θ ← θ + α Σ_t ∇_θ log π_θ(s_t, a_t) · v_t
where α is the step size. This equation yields the well-known REINFORCE algorithm, which can be understood intuitively as follows. The direction ∇_θ log π_θ(s_t, a_t) gives how to change the policy parameters to increase π_θ(s_t, a_t) (the probability of action a_t in state s_t). The update steps in this direction; the size of the step depends on how large the return v_t is. In our design we use a slight variant that subtracts a baseline value from each return v_t, to reduce the variance of the gradient estimate.
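The policy-gradient update above can be sketched as follows. This is an illustrative toy, not the patented implementation: it uses a linear-softmax policy in place of the patent's CNN, and the function names and dimensions are assumptions.

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def policy(theta, s):
    """pi_theta(.|s) for a linear-softmax policy with logits theta @ s."""
    return softmax(theta @ s)

def grad_log_pi(theta, s, a):
    """grad_theta log pi_theta(a|s) = (indicator(a) - pi(.|s)) outer s."""
    p = policy(theta, s)
    g = -np.outer(p, s)
    g[a] += s
    return g

def reinforce_update(theta, trajectory, alpha=0.01, baseline=0.0):
    """One REINFORCE step over a sampled trajectory of (s_t, a_t, v_t):
    theta <- theta + alpha * sum_t grad log pi(s_t, a_t) * (v_t - baseline)."""
    for s, a, v in trajectory:
        theta = theta + alpha * grad_log_pi(theta, s, a) * (v - baseline)
    return theta
```

Repeated updates with a positive return v_t for an action raise that action's probability, which is the intuition stated above; the baseline argument corresponds to the variance-reduction variant.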
Fig. 2 is a state diagram of the offline system in an embodiment of the present invention.
We represent the state of the offline system as different grid images, including the grid image of the currently allocated cluster resources and the grid images of the resource requirements of the waiting job slots. The two leftmost grids in Fig. 2 show how each kind of resource is allocated to the jobs scheduled for service, from the current time to T time steps later. Different colors in these images represent different jobs. The job slot grid images represent the resource requirements of the waiting jobs; the number of job slots equals the number of generated random jobs, so that jobs correspond one-to-one with job slots.
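The grid-image state described above can be sketched as follows. The grid dimensions match the 10-cell-wide, 20-cell-high grids of the embodiment, but the slot count and helper names are illustrative assumptions.

```python
import numpy as np

T = 20        # time horizon: rows of each grid
R_CAP = 10    # resource capacity in cells: width of each grid
N_RES = 2     # two resource types, as in the embodiment
N_SLOTS = 5   # number of waiting-job slots (equals number of generated jobs)

def empty_state():
    # One grid per resource for the cluster, plus one grid per resource
    # per job slot, mirroring the patent's cell-image representation.
    cluster = np.zeros((N_RES, T, R_CAP))
    slots = np.zeros((N_SLOTS, N_RES, T, R_CAP))
    return cluster, slots

def load_job_into_slot(slots, slot_idx, duration, demand):
    """Draw a job's requirement into its slot image:
    demand[r] cells wide and duration rows tall for each resource r."""
    for r in range(N_RES):
        slots[slot_idx, r, :duration, :demand[r]] = 1.0
    return slots

def advance_time(cluster):
    """One time step: the cluster image shifts up one row and the
    bottom row becomes empty (the translation described in step S307)."""
    cluster = np.roll(cluster, -1, axis=1)
    cluster[:, -1, :] = 0.0
    return cluster
```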
Fig. 3 is a flow chart of resource scheduling based on deep reinforcement learning in an embodiment of the present invention.
As shown in Fig. 3, the resource scheduling based on deep reinforcement learning comprises the following steps:
Step S301: randomly generate offline jobs.
Further, we assume two kinds of resources with capacity {1r; 1r}. Job durations and resource demands are chosen as follows: 80% of job durations are chosen uniformly between 1t and 3t; the rest are chosen uniformly between 10t and 15t. Each job has a dominant resource picked independently at random. The demand on the dominant resource is chosen uniformly between 0.25r and 0.5r, and the demand on the other resource is chosen uniformly between 0.05r and 0.1r.
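The job-generation distribution of step S301 can be sketched as follows, treating 1t and 1r as unit quantities; the function name and sample count are illustrative.

```python
import random

def generate_job(rng, r=1.0):
    """Sample one offline job per the embodiment's distribution:
    80% short (1t-3t), 20% long (10t-15t); demand 0.25r-0.5r on a
    randomly picked dominant resource and 0.05r-0.1r on the other."""
    if rng.random() < 0.8:
        duration = rng.randint(1, 3)      # short jobs
    else:
        duration = rng.randint(10, 15)    # long jobs
    dominant = rng.randrange(2)           # dominant resource picked at random
    demand = [0.0, 0.0]
    demand[dominant] = rng.uniform(0.25 * r, 0.5 * r)
    demand[1 - dominant] = rng.uniform(0.05 * r, 0.1 * r)
    return duration, demand

rng = random.Random(0)
jobs = [generate_job(rng) for _ in range(100)]
```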
Step S302: load all offline jobs into job slots. In offline scheduling, the number of jobs and their demands are known. Therefore, the number of job slots is set equal to the number of generated random offline jobs, so that jobs correspond one-to-one with job slots.
Step S303: the deep learning network selects an action value A.
Further, the deep neural network we use is a convolutional neural network (CNN): the first layer is the input layer; the second layer is convolutional layer Conv1; the third layer is pooling layer Pool1 (MaxPooling); the fourth layer is convolutional layer Conv2; the fifth layer is pooling layer Pool2 (MaxPooling); the sixth layer is fully connected layer Local3; the ninth layer is fully connected layer Local4; and the tenth layer is the output layer (Softmax). The action value A is selected according to the probabilities of the output layer.
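The patent names the layer sequence but not kernel sizes, paddings, strides, or channel counts; assuming 3×3 convolutions with padding 1 and 2×2 max pooling purely for illustration, the spatial shapes through the network can be traced as follows.

```python
def conv2d_shape(h, w, k=3, pad=1, stride=1):
    # Standard output-size formula for a square convolution kernel.
    return ((h + 2 * pad - k) // stride + 1,
            (w + 2 * pad - k) // stride + 1)

def maxpool_shape(h, w, k=2):
    return h // k, w // k

def cnn_shapes(h, w):
    """Trace the spatial shape through Conv1 -> Pool1 -> Conv2 -> Pool2;
    Local3, Local4 and the softmax output layer are fully connected,
    so only the convolutional/pooling stages change the spatial shape."""
    shapes = {"input": (h, w)}
    h, w = conv2d_shape(h, w)
    shapes["Conv1"] = (h, w)
    h, w = maxpool_shape(h, w)
    shapes["Pool1"] = (h, w)
    h, w = conv2d_shape(h, w)
    shapes["Conv2"] = (h, w)
    h, w = maxpool_shape(h, w)
    shapes["Pool2"] = (h, w)
    return shapes
```

For example, a 20-row state image formed by placing the cluster grid beside several job slot grids (here 60 cells wide, an assumption) halves its spatial extent at each pooling layer before the fully connected stages.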
Step S304: judge whether job slot A is empty.
Further, the job slot grid width equals the maximum resource capacity 1r, and its height equals 20t, i.e. 20 time steps.
Step S305: judge whether the job in job slot A can be loaded into the cluster schedule.
Further, the cluster grid size equals the job slot grid size.
Step S306: load the job into the cluster and set job slot A to empty.
Further, the resource size required by the loaded job is shown on the cluster grid image.
Step S307: the cluster runs one time step.
Further, the system time increases by one time step, and the cluster grid image translates up by one row. The former first row of the image is overwritten, and the last row is set to empty.
Step S308: judge whether job scheduling is complete.
Further, job scheduling is complete only when all of the following hold: all jobs have been loaded into the system, no job is currently running, and no job is waiting in a job slot.
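The S303–S308 loop can be sketched as follows. This simplified version tracks only job durations, omits the resource-fit check of step S305, and uses a trivial first-fit policy in place of the CNN; the helper names are illustrative.

```python
def first_fit(slots):
    """Stand-in policy: pick the first non-empty slot, else no action."""
    for i, job in enumerate(slots):
        if job is not None:
            return i
    return None

def schedule_offline(jobs, pick_action, max_steps=1000):
    """Sketch of the S303-S308 loop: ask the policy for a slot index A;
    if slot A holds a job, load it into the cluster (S306), otherwise
    run one time step (S307); stop when every job has been loaded and
    has finished running (the completion condition of S308)."""
    slots = list(jobs)          # S302: one slot per job (durations only)
    running = []                # [slot_index, remaining_duration] pairs
    t = 0
    finish = {}
    while (any(s is not None for s in slots) or running) and t < max_steps:
        a = pick_action(slots)
        if a is not None and slots[a] is not None:
            running.append([a, slots[a]])
            slots[a] = None
        else:
            t += 1
            for job in running:
                job[1] -= 1
            for idx, rem in running:
                if rem <= 0:
                    finish[idx] = t   # record completion time
            running = [j for j in running if j[1] > 0]
    return finish
```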
Step S309: update the offline neural network parameters by the reward value.
Further, in one embodiment of the invention, we use minimizing the average job slowdown as the optimization objective. For each job j, the slowdown is given by S_j = C_j / T_j, where C_j is the completion time of the job (i.e. the time between arrival and completed execution) and T_j is the (ideal) duration of the job; note that S_j ≥ 1. We therefore set the reward at each time step to Σ_(j∈J) (−1/T_j), where J is the set of jobs currently in the system (scheduled or waiting for service). Observe that with the discount factor set to γ = 1, the cumulative reward over time coincides with the (negative) sum of job slowdowns, so maximizing the cumulative reward minimizes the average slowdown.
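The slowdown and per-time-step reward defined above can be sketched as follows; the function names are illustrative.

```python
def slowdown(completion_time, ideal_duration):
    # S_j = C_j / T_j, always >= 1 for a completed job
    return completion_time / ideal_duration

def step_reward(jobs_in_system):
    """Per-time-step reward: sum over current jobs of -1/T_j.
    With gamma = 1, a job with ideal duration T_j that spends C_j steps
    in the system contributes -C_j/T_j = -S_j to the return, so the
    total return is minus the sum of slowdowns."""
    return sum(-1.0 / t_j for t_j in jobs_in_system)
```

For example, a job with T_j = 2 that remains in the system for 6 steps accumulates 6 × (−1/2) = −3, exactly −S_j.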
The foregoing is merely a specific embodiment, but the protection scope of the present invention is not limited thereto. Any change or substitution that can readily be conceived by those familiar with the art, within the technical scope disclosed by the present invention, shall be covered by the protection scope of the present invention. The protection scope of the present invention shall therefore be subject to the protection scope of the claims.
Claims (6)
1. A data center resource offline scheduling method based on deep reinforcement learning, characterized in that the data center resource offline scheduling system comprises:
a data source module, for generating the data of offline scheduled jobs, the data including the resource types required by a job (for example, CPU, memory, I/O), the resource sizes required by a job, and the total number of offline jobs;
a running environment module, for constructing the running environment model, the running environment including the allocated cluster resource Cluster and the waiting job slots JobSlot, all parts of the running environment module being represented as cell images;
an evaluation mechanism learning module, for combining the obtained information with the evaluation mechanism to obtain the required reward function, the reward function being delivered as feedback to the control strategy learning module to optimize the network parameters;
a control strategy learning module, for learning the optimization strategy of the deep reinforcement learning method, using the obtained reward function to guide the subsequent offline job scheduling sequence, and updating the neural network parameters through the policy to obtain the final concrete operation strategy for offline resource scheduling jobs.
2. a kind of data center's offline resources dispatching method based on deeply study according to claim 1, special
Sign is, the method for generating offline schedule job are as follows: we assume that two kinds of resources, i.e. capacity { 1r;1r }, the operation duration and
Resource requirement selection is as follows: 80% run duration selects between 1t and 3t;Remaining is uniformly selected from 10t to 15t
It selects.Each work has the independent superior resources selected at random, and the demand to superior resources is generally in 0.25r and 0.5r
Between select, the demand of other resources uniform design between 0.05r and 0.1r.
3. a kind of offline dispatching method of data center resource based on deeply study according to claim 1, special
Sign is that off-line operation environment includes 1 cluster resource Cluster, the N number of waiting operation slot JobSlot of distribution, and wherein N is
The quantity of off-line operation.10 grids of every kind of resource width of cluster resource Cluster, 20 grids of height, wait operation slot
10 grids of every kind of resource width of JobSlot, 20 grids of height.
4. a kind of offline dispatching method of data center resource based on deeply study according to claim 1, special
Sign is that the target of deeply study is to maximize desired progressive award:Wherein γ ∈ (0,1] be
The factor of discount reward, the present invention uses the intensified learning method based on decision search, by executing on policing parameter
Come a kind of nitrification enhancement learnt, target is to maximize expected accumulation discount reward, the ladder of this target for gradient decline
Degree is given by:
Further,It is the movement a selected from state s and the expected accumulation prize for then following tactful π _ θ
It encourages, the key idea of Policy-Gradient method is to follow the execution track that strategy obtains by observation to estimate gradient, simple
In monte carlo method, the intelligent multiple tracks of sampler body, and the accumulation discount that use experience calculates rewards vtAsUnbiased esti-mator, then it pass through gradient decline update policing parameter:
Further, α is step-length.This equation produces well-known enhancing algorithm, can intuitively understand as follows, directionIt gives and how to change policing parameter to increase πθ(st,at) (the movement probability s under at statet),
Equation has stepped a step to this direction;The size of step-length depends on returning to vtHave it is much, in our design, we use
One slight variant, by from each return value vtIn subtract a baseline value reduce gradient estimation variance.
5. a kind of offline dispatching method of data center resource based on deeply study according to claim 1, special
Sign is that in one embodiment of the invention, we are using minimum average operation slowdown as optimization aim.For every
A operation j, slowdown is by Sj=Cj/TjIt provides, wherein CjIt is the deadline of operation (between i.e. arrival and completion execute
Time), TjIt is operation (ideal) duration, pays attention to Sj>=1, we are set as the reward of each time step as a result,Wherein j is the set of current operation (make a reservation for or wait to be serviced) in systems.Overview setup discount factor γ=
1, over time, accumulation remuneration is consistent with the summation of slowdown, therefore maximizes progressive award, minimizes average
slowdown。
6. a kind of offline dispatching method of data center resource based on deeply study according to claim 1, special
Sign is that in one embodiment of the invention, the deep neural network used is convolutional neural networks CNN, the knot in network
Structure is as follows: first layer input layer, second layer convolutional layer Conv1, third layer pond layer Pool1:MaxPooling, the 4th layer of convolution
Layer Conv2, layer 5 pond layer Pool2:MaxPooling, the full articulamentum Local3 of layer 6, the 9th layer of full articulamentum
Local4, the 10th layer of output layer Softmax.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711399661.6A CN109960578A (en) | 2017-12-22 | 2017-12-22 | A kind of offline dispatching method of data center resource based on deeply study |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109960578A true CN109960578A (en) | 2019-07-02 |
Family
ID=67018801
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711399661.6A Pending CN109960578A (en) | 2017-12-22 | 2017-12-22 | A kind of offline dispatching method of data center resource based on deeply study |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109960578A (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110347478A (en) * | 2019-07-08 | 2019-10-18 | 白紫星 | A kind of model-free data center resource dispatching algorithm based on intensified learning |
CN110443412A (en) * | 2019-07-18 | 2019-11-12 | 华中科技大学 | The intensified learning method of Logistic Scheduling and path planning in dynamic optimization process |
CN110609474A (en) * | 2019-09-09 | 2019-12-24 | 创新奇智(南京)科技有限公司 | Data center energy efficiency optimization method based on reinforcement learning |
CN111651220A (en) * | 2020-06-04 | 2020-09-11 | 上海电力大学 | Spark parameter automatic optimization method and system based on deep reinforcement learning |
CN112035251A (en) * | 2020-07-14 | 2020-12-04 | 中科院计算所西部高等技术研究院 | Deep learning training system and method based on reinforcement learning operation layout |
CN113157422A (en) * | 2021-04-29 | 2021-07-23 | 清华大学 | Cloud data center cluster resource scheduling method and device based on deep reinforcement learning |
CN114116183A (en) * | 2022-01-28 | 2022-03-01 | 华北电力大学 | Data center service load scheduling method and system based on deep reinforcement learning |
CN115271130A (en) * | 2022-09-30 | 2022-11-01 | 合肥工业大学 | Dynamic scheduling method and system for maintenance order of ship main power equipment |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103577265A (en) * | 2012-07-25 | 2014-02-12 | 田文洪 | Method and device of offline energy-saving dispatching in cloud computing data center |
CN108595267A (en) * | 2018-04-18 | 2018-09-28 | 中国科学院重庆绿色智能技术研究院 | A kind of resource regulating method and system based on deeply study |
Non-Patent Citations (1)
Title |
---|
HONGZI MAO等: "Resource Management with Deep Reinforcement Learning", 《HOTNETS "16: PROCEEDINGS OF THE 15TH ACM WORKSHOP ON HOT TOPICS IN NETWORKS》 * |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110347478A (en) * | 2019-07-08 | 2019-10-18 | 白紫星 | A kind of model-free data center resource dispatching algorithm based on intensified learning |
CN110443412A (en) * | 2019-07-18 | 2019-11-12 | 华中科技大学 | The intensified learning method of Logistic Scheduling and path planning in dynamic optimization process |
CN110609474A (en) * | 2019-09-09 | 2019-12-24 | 创新奇智(南京)科技有限公司 | Data center energy efficiency optimization method based on reinforcement learning |
CN111651220A (en) * | 2020-06-04 | 2020-09-11 | 上海电力大学 | Spark parameter automatic optimization method and system based on deep reinforcement learning |
CN111651220B (en) * | 2020-06-04 | 2023-08-18 | 上海电力大学 | Spark parameter automatic optimization method and system based on deep reinforcement learning |
CN112035251A (en) * | 2020-07-14 | 2020-12-04 | 中科院计算所西部高等技术研究院 | Deep learning training system and method based on reinforcement learning operation layout |
CN112035251B (en) * | 2020-07-14 | 2023-09-26 | 中科院计算所西部高等技术研究院 | Deep learning training system and method based on reinforcement learning operation layout |
CN113157422A (en) * | 2021-04-29 | 2021-07-23 | 清华大学 | Cloud data center cluster resource scheduling method and device based on deep reinforcement learning |
CN114116183A (en) * | 2022-01-28 | 2022-03-01 | 华北电力大学 | Data center service load scheduling method and system based on deep reinforcement learning |
CN114116183B (en) * | 2022-01-28 | 2022-04-29 | 华北电力大学 | Data center service load scheduling method and system based on deep reinforcement learning |
CN115271130A (en) * | 2022-09-30 | 2022-11-01 | 合肥工业大学 | Dynamic scheduling method and system for maintenance order of ship main power equipment |
Legal Events

Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20190702 |