CN107273184A

CN107273184A - An optimization algorithm based on cloud big data migration and processing cost

Info

Publication number: CN107273184A
Application number: CN201710445796.5A
Authority: CN
Inventors: 夏辉; 王晓薇; 范书国
Original assignee: Shenyang Normal University
Current assignee: Shenyang Normal University
Priority date: 2017-06-14
Filing date: 2017-06-14
Publication date: 2017-10-20

Abstract

The present invention relates to algorithms for optimizing big data migration to the cloud and belongs to the technical field of cloud computing applications. The invention studies data and resource management for multi-source big data processing in the cloud, in order to optimize the cost of cloud big data processing and improve its quality of service. To this end, the data migration and resource provisioning problem of cloud big data processing is first converted into a joint stochastic optimization problem; the model is then solved using the Lyapunov optimization technique, and a corresponding online decision algorithm is designed. The algorithm does not need to predict future system states and makes decisions based only on the current state of the system.

Description

An optimization algorithm based on cloud big data migration and processing cost

Technical field:

The present invention relates to algorithms for optimizing big data migration to the cloud and belongs to the technical field of cloud computing applications.

Background art:

With the rapid development of network and mobile communication technology, data volume is growing exponentially. In 1998, Turing Award winner Jim Gray proposed a new empirical law: the amount of storage added every 18 months equals the total amount of storage in all prior history. To date, the growth of data volume has broadly followed this law. According to forecasts, the data universe will reach 35.2 ZB in 2020 (1 ZB = 1 million PB), 44 times the 0.8 ZB of 2009.

Driven by such strong practical demand, people continually pursue mass storage capacity, high performance, high security, high availability, scalability, and manageability, and the demands placed on storage keep rising. The explosive growth of information has made storage performance a bottleneck that urgently needs improvement. Storing enterprise data in the cloud greatly increases the available space, but the data must first be migrated. The key is to move these massive data sets to the cloud as safely, efficiently, and cheaply as possible. This is a systems engineering task and is not easily accomplished. Demand for big data migration is now growing geometrically, and according to IDC statistics, data migration accounts for more than 40% of the total cost of big data applications.

Cloud computing operates on a pay-as-you-go model, so users can dynamically adjust the resources they rent according to their own needs. It also offers high performance and high fault tolerance, providing an efficient and economical solution for big data processing. Under the cloud computing model, how to manage data and cloud resources effectively so as to reduce data processing cost is crucial for the data manager. The two most important problems are: 1) how to dynamically distribute large-scale data generated in real time at different locations to geographically distributed data centers; 2) how many computing resources to provision in these data centers to guarantee quality of service while minimizing operating expense. The dynamics and multi-source nature of data generation, together with the dynamics of resource prices, make these problems highly challenging.

At present, research on big data focuses mainly on high-speed parallel processing of different types of data (for example, the MapReduce framework for batch data processing, the Spark system for interactive data, the Dremel system for streaming data processing, and the Pregel system for graph data), on big data analysis applications (such as personalized recommendation, software classification, and gene selection), and on basic big data processing techniques. Research on transferring large-scale data to the cloud and on managing its data and resources is still scarce. To solve the data migration problem, simple and inefficient methods are commonly used: for example, copying the data to high-capacity hard disks and transporting them physically, or even shipping entire machines directly to the data center. These methods not only introduce intolerable data processing delays but also pose serious security risks, since hard disks can be damaged in transit. Some practical projects do implement automatic replication and on-demand transfer of data between data centers, but they focus mainly on the business requirements of the data and do not consider the resources required for data processing.

Summary of the invention

The present invention studies data and resource management for multi-source big data processing in the cloud, in order to optimize the cost of cloud big data processing and improve its quality of service. To this end, the data migration and resource provisioning problem of cloud big data processing is first converted into a joint stochastic optimization problem; the model is then solved using the Lyapunov optimization technique, and a corresponding online decision algorithm is designed. The algorithm does not need to predict future system states and makes decisions based only on the current state of the system. The content of the invention is as follows:

Table 1: Meaning of symbols

(1) A unified model is proposed for jointly optimizing data migration and resource provisioning across data centers. It takes into account the dynamics of data generation at multiple data sources and the dynamics of the different virtual machine types in the cloud and their prices.

(2) The joint stochastic optimization problem is solved using the Lyapunov optimization technique, and an efficient online decision algorithm is designed from the derived analytical solutions. The algorithm makes data migration and resource provisioning decisions simultaneously and can be implemented in a distributed manner.

Specifically, the method comprises the following steps:

(1) In general, different VPNs belong to different Internet service providers, so their bandwidth prices differ. Let $b_r^d$ be the price of transmitting 1 GB of data from data source $r \in R$ to data center $d \in D$. The total bandwidth cost at time $t$ can then be defined as:

$$C_b(t) = \sum_{d \in D} \sum_{r \in R} \lambda_r^d(t) \cdot b_r^d \tag{1}$$

(2) Because big data analysis applications involve very large data volumes, the storage cost of data is also a key factor in choosing a data center. Let $s_d$ be the cost of storing 1 GB of data for a single time slot in data center $d \in D$. The total storage cost incurred by the system at time $t$ is then:

$$C_s(t) = \sum_{d \in D} \sum_{r \in R} \lambda_r^d(t) \cdot s_d \tag{2}$$

(3) Because cloud service providers usually use dynamic pricing, the number of virtual machines rented from the data centers has a major impact on the total cost and quality of service of the system. Let $n_d^k(t)$ be the number of type-$k$ virtual machines rented from data center $d$ at time $t$, and let $p_d^k(t)$ be the price of a type-$k$ virtual machine in data center $d$ at time $t$. The computation cost required for data processing is then:

$$C_p(t) = \sum_{d \in D} \sum_{k \in K} n_d^k(t) \cdot p_d^k(t) \tag{3}$$

(4) Since data sources and data centers are at different geographic locations, latency is treated here as an important performance metric for data processing, and its impact should be reduced as far as possible during data migration. Let $L_r^d$ be the latency of transmitting data from data source $r \in R$ to data center $d$, and let $\alpha$ be the weight coefficient that converts latency into monetary cost. The cost converted from latency is then:

$$C_l(t) = \sum_{d \in D} \sum_{r \in R} \alpha \cdot \lambda_r^d(t) \cdot L_r^d \tag{4}$$

Based on the cost formulas above, the total cost incurred in the system is:

$$C(t) = C_p(t) + C_s(t) + C_b(t) + C_l(t) \tag{5}$$
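The four cost terms and their sum in (5) can be evaluated directly for one time slot. The following is a minimal sketch, assuming the decisions and prices are held in plain dictionaries keyed by (data source, data center) and (data center, VM type); the function name `slot_cost` and all numbers are hypothetical and are not taken from the patent.

```python
from typing import Dict, Tuple

Route = Tuple[str, str]   # (data source r, data center d)
VmKey = Tuple[str, str]   # (data center d, VM type k)


def slot_cost(lam: Dict[Route, float], n: Dict[VmKey, int],
              b: Dict[Route, float], s: Dict[str, float],
              p: Dict[VmKey, float], L: Dict[Route, float],
              alpha: float) -> Dict[str, float]:
    """Cost of a single time slot, following equations (1)-(5)."""
    c_b = sum(x * b[rd] for rd, x in lam.items())           # (1) bandwidth
    c_s = sum(x * s[rd[1]] for rd, x in lam.items())        # (2) storage
    c_p = sum(cnt * p[dk] for dk, cnt in n.items())         # (3) computation
    c_l = sum(alpha * x * L[rd] for rd, x in lam.items())   # (4) latency
    return {"C_b": c_b, "C_s": c_s, "C_p": c_p, "C_l": c_l,
            "C": c_p + c_s + c_b + c_l}                     # (5) total


# Illustrative values only (assumed for this example).
lam = {("r1", "d1"): 10.0, ("r1", "d2"): 5.0}     # GB moved in this slot
n = {("d1", "small"): 2, ("d2", "small"): 1}      # VMs rented in this slot
b = {("r1", "d1"): 0.5, ("r1", "d2"): 0.25}       # bandwidth price per GB
s = {"d1": 0.125, "d2": 0.25}                     # storage price per GB
p = {("d1", "small"): 1.5, ("d2", "small"): 2.0}  # VM price per slot
L = {("r1", "d1"): 2.0, ("r1", "d2"): 4.0}        # latency per route
costs = slot_cost(lam, n, b, s, p, L, alpha=0.5)
print(costs)
```

With these assumed inputs the latency term dominates the total, which illustrates how the weight $\alpha$ controls the influence of latency on the objective.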

Let $a_r(t)$ be the amount of data generated by data source $r$ at time $t$. Since the data generated by any data source can be moved to any data center for processing, let $\lambda_r^d(t)$ be the amount of data moved from data source $r$ to data center $d$ at time $t$, and let $A_r^{\max}$ be the maximum amount of data produced by data source $r$. Then:

$$a_r(t) \le A_r^{\max}, \quad \forall r,\ t \in [1, T] \tag{6}$$

$$a_r(t) \le \sum_{d \in D} \lambda_r^d(t), \quad \forall r,\ t \in [1, T] \tag{7}$$

Based on the cost definitions above, the objective studied here, minimizing the time-averaged cost of data migration and processing over the period $[0, T]$, can be formalized as:

$$\text{P1.}\ \min: \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T-1} E\{C(t)\} \tag{8}$$

$$\text{s.t.}\quad a_r(t) \le A_r^{\max}, \quad \forall r,\ t \in [1, T] \tag{9}$$

$$a_r(t) = \sum_{d \in D} \lambda_r^d(t), \quad \forall r,\ t \in [1, T] \tag{10}$$

$$0 \le n_d^k(t) \le N_d^{k,\max}, \quad \forall d,\ \forall k,\ t \in [1, T] \tag{11}$$

$$n_d^k(t) \in Z^{+} \cup \{0\}, \quad \forall d,\ \forall k,\ t \in [1, T] \tag{12}$$

Here, constraint (10) ensures that the total data distributed to all data centers in a single time slot equals the total data generated in that slot, and constraint (11) ensures that the number of virtual machines requested does not exceed what the data center can provide. From the formulation of problem P1, because data generation is unknown and dynamic and the resource variable $n_d^k(t)$ is an integer, the problem is a constrained stochastic integer optimization problem. The goal of the invention is, under long-run operation, to minimize the long-term average data processing cost by optimizing the amount of data distributed to each data center and the number of virtual machines rented in each data center. To handle this problem, the recently developed Lyapunov optimization framework is adopted; the details of the solution are described below.
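The constraints described above can be checked mechanically for any candidate decision in a single slot. The sketch below is an illustration under assumptions: the function name `is_feasible`, the dictionary layout, and the tolerance `tol` are hypothetical, and only the data-conservation constraint (10) and the virtual machine bounds (11) and (12) are tested.

```python
def is_feasible(a, lam, n, n_max, tol=1e-9):
    """Check constraints (10)-(12) for one time slot.

    a:     data generated per source, {r: a_r(t)}
    lam:   data moved per route, {(r, d): lambda_r^d(t)}
    n:     VMs rented, {(d, k): n_d^k(t)}
    n_max: VM limits, {(d, k): N_d^{k,max}}
    """
    # (10): everything a source generates is distributed, nothing more.
    for r, a_r in a.items():
        routed = sum(x for (src, _), x in lam.items() if src == r)
        if abs(routed - a_r) > tol:
            return False
    # (11) and (12): VM counts are non-negative integers within the limit.
    for dk, cnt in n.items():
        if not isinstance(cnt, int) or cnt < 0 or cnt > n_max[dk]:
            return False
    return True


# Illustrative values only (assumed for this example).
a = {"r1": 15.0}
lam_ok = {("r1", "d1"): 10.0, ("r1", "d2"): 5.0}
lam_bad = {("r1", "d1"): 10.0, ("r1", "d2"): 4.0}   # 1 GB left unassigned
n_max = {("d1", "small"): 3, ("d2", "small"): 3}
n_ok = {("d1", "small"): 2, ("d2", "small"): 1}
n_bad = {("d1", "small"): 5, ("d2", "small"): 1}    # exceeds the limit

print(is_feasible(a, lam_ok, n_ok, n_max))
print(is_feasible(a, lam_bad, n_ok, n_max))
print(is_feasible(a, lam_ok, n_bad, n_max))
```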

Let $H_d(t)$ be the amount of unprocessed data in data center $d$ at time slot $t$. First, we define $H_d(0) = 0$; the evolution of the queue $H_d(t)$ can then be described as:

$$H_d(t+1) = \max\Big[H_d(t) - \sum_{k \in K} n_d^k(t) \cdot v_k,\ 0\Big] + \sum_{r \in R} \lambda_r^d(t) \tag{13}$$

This update rule means that the amount of data processed is $\sum_{k \in K} n_d^k(t) \cdot v_k$ and the amount of newly arrived data is $\sum_{r \in R} \lambda_r^d(t)$. To ensure that, for every queue $H_d(t)$, $\forall d \in D$, the worst-case delay stays within the maximum workload delay $l$, we design a corresponding virtual queue $Z_d(t)$ (which can be regarded as a delay-tolerance queue). The virtual queue is initialized as $Z_d(0) = 0$ and updated according to the following rule:

$$Z_d(t+1) = \max\Big[Z_d(t) + 1_{H_d(t) > 0}\Big(\varepsilon_d - \sum_{k \in K} n_d^k(t) \cdot v_k\Big) - 1_{H_d(t) = 0} \sum_{k \in K} N_d^{k,\max} \cdot v_k,\ 0\Big] \tag{14}$$

Here the indicator function $1_{H_d(t) > 0}$ equals 1 when $H_d(t) > 0$ and 0 otherwise; similarly, $1_{H_d(t) = 0}$ equals 1 when $H_d(t) = 0$ and 0 otherwise. $\varepsilon_d$ is a preset constant used to control the range of the queueing delay. It can be shown that if the proposed algorithm keeps the queues $H_d(t)$ and $Z_d(t)$ stable in the long run, then all data are processed with a delay of at most $l$ time slots, where $l$ can be set in terms of the upper bounds of the queues $H_d(t)$ and $Z_d(t)$.
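The two queue updates can be written as small pure functions. This is a minimal sketch assuming scalar per-data-center quantities; the function names and the parameters `served` (the processed volume $\sum_k n_d^k(t) v_k$), `arrived` (the incoming volume $\sum_r \lambda_r^d(t)$), and `max_service` (the volume $\sum_k N_d^{k,\max} v_k$) are hypothetical names introduced for illustration.

```python
def update_backlog(h, served, arrived):
    """Actual queue update (13): serve first, then add new arrivals."""
    return max(h - served, 0.0) + arrived


def update_virtual(z, h, served, eps, max_service):
    """Virtual (delay-tolerance) queue update (14).

    While backlog exists (h > 0) the virtual queue grows by eps and shrinks
    by the service actually provided; when the backlog is empty it is
    drained by the maximum possible service.
    """
    if h > 0:
        return max(z + eps - served, 0.0)
    return max(z - max_service, 0.0)


# Illustrative values only (assumed for this example).
print(update_backlog(10.0, 4.0, 3.0))            # backlog partly served
print(update_backlog(2.0, 5.0, 3.0))             # backlog fully served
print(update_virtual(1.0, 5.0, 3.0, 2.5, 8.0))   # backlog present
print(update_virtual(1.0, 0.0, 0.0, 2.5, 8.0))   # backlog empty
```

The virtual queue grows whenever data is waiting but little service is given, so keeping it bounded forces the controller to serve waiting data eventually.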

Let $Z(t) = (Z_d(t))$ and $H(t) = (H_d(t))$, $\forall d \in D$, denote the matrices of the virtual queues and the actual queues respectively; the joint matrix of actual and virtual queues can then be written as $\theta(t) = [H(t), Z(t)]$. Following the Lyapunov framework, we define the Lyapunov function as:

$$L(\theta(t)) = \frac{1}{2} \sum_{d \in D} \big\{ Z_d(t)^2 + H_d(t)^2 \big\} \tag{15}$$

where $L(\theta(t))$ measures the queue backlog in the system. The one-slot Lyapunov drift can then be defined as:

$$\Delta(\theta(t)) = E\{ L(\theta(t+1)) - L(\theta(t)) \mid \theta(t) \} \tag{16}$$
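For intuition, the Lyapunov function (15) and a single realized value of the drift can be computed as follows. This is a sketch under assumptions: the function names are hypothetical, and `sample_drift` evaluates one sample path only, whereas the drift in (16) is a conditional expectation.

```python
def lyapunov(H, Z):
    """Lyapunov function (15): half the sum of squared queue lengths."""
    return 0.5 * sum(Z[d] ** 2 + H[d] ** 2 for d in H)


def sample_drift(H, Z, H_next, Z_next):
    """One realized change of the Lyapunov function between two slots.

    Equation (16) is the conditional expectation of this quantity.
    """
    return lyapunov(H_next, Z_next) - lyapunov(H, Z)


# Illustrative values only (assumed for this example).
H = {"d1": 3.0, "d2": 4.0}
Z = {"d1": 0.0, "d2": 0.0}
H_next = {"d1": 1.0, "d2": 2.0}
Z_next = {"d1": 0.0, "d2": 0.0}

print(lyapunov(H, Z))
print(sample_drift(H, Z, H_next, Z_next))
```

A negative value indicates that the queues shrank during the slot, which is the direction a stabilizing policy pushes toward.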

To minimize the cost incurred by the system while keeping the system queues stable, the Lyapunov drift-plus-penalty term is obtained by adding the system cost function to the drift function (16), i.e.:

$$\Delta(\theta(t)) + V \cdot E\{ C(t) \mid \theta(t) \} \tag{17}$$

where $V$ is a non-negative parameter that trades off system stability against cost: the larger $V$ is, the smaller the cost incurred by the system, and vice versa. The original problem P1 is thus transformed into the following problem P2:

$$\text{P2.}\ \min\ (17) \tag{18}$$

$$\text{s.t.}\quad (9),\ (10),\ (11),\ (12) \tag{19}$$
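The role of the parameter V can be illustrated by comparing two candidate actions under the drift-plus-penalty objective (17). The sketch below is illustrative only: the candidate names, their drift and cost values, and the function names are assumptions, and realized values stand in for the expectations.

```python
def drift_plus_penalty(drift, cost, V):
    """Objective (17) evaluated for one candidate action."""
    return drift + V * cost


def pick_action(candidates, V):
    """Return the candidate with the smallest drift-plus-penalty value.

    candidates maps an action name to a (drift, cost) pair.
    """
    return min(
        candidates,
        key=lambda name: drift_plus_penalty(candidates[name][0],
                                            candidates[name][1], V),
    )


# Illustrative values only: renting many VMs drains queues but costs more.
candidates = {
    "rent_many": (-10.0, 8.0),
    "rent_few": (-2.0, 1.0),
}

print(pick_action(candidates, V=0.5))   # small V favors queue stability
print(pick_action(candidates, V=2.0))   # large V favors low cost
```

With the assumed numbers, a small V selects the action that drains the queues, while a large V selects the cheaper action, matching the trade-off described for V.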

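To solve P2, the drift-plus-penalty term is not minimized directly; instead, its upper bound is minimized. Theory shows that this approach does not compromise the optimality or performance of the algorithm, so the key to solving P2 is to find this upper bound. It can be shown by derivation that formula (17) is bounded as follows:

$$\begin{aligned}
\Delta(\theta(t)) + V \cdot E\{ C(t) \mid \theta(t) \} \le{} & B + E\Big\{ \sum_{d \in D} \sum_{k \in K} n_d^k(t) \cdot \big( V p_d^k(t) - H_d(t) v_k - Z_d(t) v_k \big) \,\Big|\, \theta(t) \Big\} \\
& + E\Big\{ \sum_{d \in D} \sum_{r \in R} \lambda_r^d(t) \cdot \big( V s_d + V b_r^d + V \alpha L_r^d + H_d(t) \big) \,\Big|\, \theta(t) \Big\}
\end{aligned} \tag{20}$$

where $B$ is a constant term.

A careful look at the right-hand side of inequality (20) shows that the optimization problem can be equivalently decomposed into two subproblems: the data distribution problem and the resource provisioning problem. The details of solving these two subproblems are as follows.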

A. Data migration: To minimize the right-hand side of (20), observe the relations between the variables; the part related to data distribution can be extracted as:

$$\min\ E\Big\{ \sum_{d \in D} \sum_{r \in R} \lambda_r^d(t) \cdot \big( V s_d + V b_r^d + V \alpha L_r^d + H_d(t) \big) \,\Big|\, \theta(t) \Big\} \tag{21}$$

Moreover, because the data of each data source are generated independently, the global multi-source optimization in (21) can be carried out independently at each data source. Considering the data distribution of data source $r$ at time $t$, the problem reduces to minimizing that data source's own terms in (21).

In fact, this is a generalized minimum-weight problem, in which the weight of moving data from data source $r$ to data center $d$ is the coefficient of $\lambda_r^d(t)$ in (21); it depends on the data backlog $H_d(t)$, the bandwidth price $b_r^d$, the storage cost $s_d$, and the latency $L_r^d$. Using linear programming theory, the solution can be obtained in closed form.

Clearly, at time $t$ the algorithm tends to migrate the data generated by data source $r$ to the data center that currently has the shortest task queue and the lowest operating cost.
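The data migration rule described above can be sketched as a minimum-weight selection per data source. This example assumes the weight of a route is the coefficient of the migration variable in formula (21) of the claims, namely V·s_d + V·b_r^d + V·α·L_r^d + H_d(t), and it assumes all data of a source go to a single best data center, ignoring any bandwidth cap. The function name `migrate` and all numbers are hypothetical.

```python
def migrate(r, a_r, centers, H, s, b, L, V, alpha):
    """Send all data of source r to the data center with the lowest weight.

    The weight combines cost terms (scaled by V) with the current backlog,
    so a long queue or an expensive route both make a center less attractive.
    """
    weight = {
        d: V * s[d] + V * b[(r, d)] + V * alpha * L[(r, d)] + H[d]
        for d in centers
    }
    best = min(centers, key=lambda d: weight[d])
    return {d: (a_r if d == best else 0.0) for d in centers}


# Illustrative values only (assumed for this example).
centers = ["d1", "d2"]
H = {"d1": 20.0, "d2": 5.0}                   # d1 has the longer backlog
s = {"d1": 0.125, "d2": 0.25}
b = {("r1", "d1"): 0.5, ("r1", "d2"): 0.25}
L = {("r1", "d1"): 2.0, ("r1", "d2"): 4.0}

low_v = migrate("r1", 6.0, centers, H, s, b, L, V=2.0, alpha=0.5)
high_v = migrate("r1", 6.0, centers, H, s, b, L, V=100.0, alpha=0.5)
print(low_v)    # backlog dominates: the shorter queue wins
print(high_v)   # cost dominates: the cheaper, lower-latency center wins
```

Because the per-source objective is linear in the migration amounts and the only coupling is the conservation constraint (10), placing all data on the minimum-weight center is one optimal solution under these assumptions.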

B. Resource provisioning: After removing the constant term $B$ on the right-hand side of (20), the part related to the variable $n_d^k(t)$ can be regarded as the resource provisioning problem. The optimal virtual machine provisioning strategy is therefore obtained by minimizing that part.

Similarly, because resource provisioning in each data center is independent, the problem can, like the data distribution problem, be solved in a distributed manner within each data center. For a single data center $d$, the resource provisioning problem can thus be rewritten per data center, and the solution of the resulting linear problem is easily obtained.

The solution shows that when the price $p_d^k(t)$ of type-$k$ virtual machines at time $t$ is lower and their capacity $v_k$ is larger, type-$k$ virtual machines are more likely to be rented. Thus, by transforming the original problem with the Lyapunov framework, the long-term cost minimization problem of data migration and resource provisioning is solved effectively. The brief derivation above helps in deploying the proposed algorithm online in a real system; the online algorithm is shown in Algorithm 1.
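The resource provisioning rule can be sketched per data center. This example assumes the coefficient of the virtual machine count in the bound (20) of the claims, V·p_d^k(t) − (H_d(t) + Z_d(t))·v_k, and assumes a threshold rule that rents the maximum number of a type when its coefficient is negative and none otherwise. The function name `provision` and all numbers are hypothetical.

```python
def provision(d, vm_types, H, Z, p, v, n_max, V):
    """Decide how many VMs of each type to rent in data center d.

    A type is rented (up to its limit) when the queue pressure
    (H_d + Z_d) * v_k outweighs its weighted price V * p_d^k.
    """
    plan = {}
    for k in vm_types:
        coeff = V * p[(d, k)] - (H[d] + Z[d]) * v[k]
        plan[k] = n_max[(d, k)] if coeff < 0 else 0
    return plan


# Illustrative values only (assumed for this example).
vm_types = ["small", "large"]
H = {"d1": 20.0}
Z = {"d1": 4.0}
p = {("d1", "small"): 1.5, ("d1", "large"): 20.0}   # price per slot
v = {"small": 1.0, "large": 1.5}                    # data processed per slot
n_max = {("d1", "small"): 3, ("d1", "large"): 2}

print(provision("d1", vm_types, H, Z, p, v, n_max, V=2.0))
print(provision("d1", vm_types, H, Z, p, v, n_max, V=0.5))
```

With the assumed prices, the expensive type is rented only when V is small enough for queue pressure to dominate, consistent with the observation that cheaper and more capable types are more likely to be rented.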

Formally, the problem considers a set $D$ of geographically distributed data centers, of size $|D|$ and indexed by $d$ ($1 \le d \le |D|$). Each data center offers a set $K$ of virtual machine types, of size $|K|$; each type has a different CPU and memory configuration. The capacity of a type-$k$ virtual machine is denoted $v_k$ and represents the rate at which such a virtual machine processes data. This rate depends on the specific MapReduce application, so different data processing jobs have different rates. The data manager manages $|R|$ data sources (the set $R$), and each data source $r$ ($1 \le r \le |R|$) dynamically produces data that need to be processed. The data of any data source can be moved over a virtual private network (VPN) to a data center it rents for analysis. To model real scenarios, we assume that the bandwidth of the VPN connection $(r, d)$ from data source $r$ to data center $d$ is limited and is one of the system bottlenecks. In addition, the amount of data generated at each geographic location is independent, and the resource prices of each data center (for example, for virtual machines and storage) differ and change over time.

The system runs in time slots $t = 0, 1, \ldots, T$. In each time slot, the data manager must decide how much data to move from each data source $r$ to each data center $d$, and how many resources to rent in each data center to support data processing. The optimization objective is to minimize the total cost of big data analysis in the cloud.

Detailed description of the embodiments:

Steps of the online algorithm:

1. Input: $H_d(t)$, $Z_d(t)$, $a_r(t)$, $v_k$

2. Output: the virtual machine provisioning strategy and the data distribution strategy

3. Resource provisioning

    for each data center $d \in D$ and virtual machine type $k \in K$ do

        solve problem (24)

        obtain the virtual machine provisioning strategy

4. Data migration

    for each data source $r \in R$ and data center $d \in D$ do

        solve problem (21) by applying (23)

        obtain the data distribution strategy ($X_d(t)$)

5. Update the queues $H_d(t)$ and $Z_d(t)$ according to the queue dynamics (13) and (14) respectively.
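The steps of the online algorithm can be combined into a single function that processes one time slot. This is a sketch under stated assumptions rather than the patented implementation: the threshold rule for provisioning, the minimum-weight rule for migration, the function name `online_step`, the layout of the parameter dictionary `prm`, and all numbers are hypothetical, and bandwidth limits are ignored.

```python
def online_step(H, Z, a, prm, V):
    """One slot: provision VMs, migrate data, then update both queues."""
    centers, vm_types = prm["centers"], prm["vm_types"]
    v, n_max = prm["v"], prm["n_max"]

    # Step 3, resource provisioning: rent a type up to its limit when
    # queue pressure outweighs its weighted price (assumed threshold rule).
    n = {}
    for d in centers:
        for k in vm_types:
            coeff = V * prm["p"][(d, k)] - (H[d] + Z[d]) * v[k]
            n[(d, k)] = n_max[(d, k)] if coeff < 0 else 0

    # Step 4, data migration: each source sends its data to the center
    # with the smallest weight (assumed minimum-weight rule).
    lam = {}
    for r, a_r in a.items():
        weight = {
            d: V * (prm["s"][d] + prm["b"][(r, d)]
                    + prm["alpha"] * prm["L"][(r, d)]) + H[d]
            for d in centers
        }
        best = min(centers, key=weight.get)
        for d in centers:
            lam[(r, d)] = a_r if d == best else 0.0

    # Queue updates (13) and (14), both based on the pre-update backlog.
    H_next, Z_next = {}, {}
    for d in centers:
        served = sum(n[(d, k)] * v[k] for k in vm_types)
        arrived = sum(lam[(r, d)] for r in a)
        H_next[d] = max(H[d] - served, 0.0) + arrived
        if H[d] > 0:
            Z_next[d] = max(Z[d] + prm["eps"][d] - served, 0.0)
        else:
            full = sum(n_max[(d, k)] * v[k] for k in vm_types)
            Z_next[d] = max(Z[d] - full, 0.0)
    return n, lam, H_next, Z_next


# Illustrative values only (assumed for this example).
prm = {
    "centers": ["d1", "d2"], "vm_types": ["small"],
    "v": {"small": 2.0},
    "n_max": {("d1", "small"): 3, ("d2", "small"): 3},
    "p": {("d1", "small"): 1.0, ("d2", "small"): 4.0},
    "s": {"d1": 0.125, "d2": 0.25},
    "b": {("r1", "d1"): 0.5, ("r1", "d2"): 0.25},
    "L": {("r1", "d1"): 2.0, ("r1", "d2"): 4.0},
    "alpha": 0.5, "eps": {"d1": 1.0, "d2": 1.0},
}
H0, Z0 = {"d1": 10.0, "d2": 0.0}, {"d1": 0.0, "d2": 0.0}
n, lam, H1, Z1 = online_step(H0, Z0, {"r1": 6.0}, prm, V=2.0)
print(n, lam, H1, Z1)
```

Each decision uses only the current queue lengths, prices, and arrivals, which reflects the property that the algorithm needs no prediction of future system states.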

Claims

1. An optimization algorithm based on cloud big data migration and processing cost, characterized by comprising the following steps:

(1) Define $b_r^d$ as the price of transmitting 1 GB of data from data source $r \in R$ to data center $d \in D$; the total bandwidth cost at time $t$ can then be defined as:

$$C_b(t) = \sum_{d \in D} \sum_{r \in R} \lambda_r^d(t) \cdot b_r^d \tag{1}$$

(2) Let $s_d$ be the cost of storing 1 GB of data for a single time slot in data center $d \in D$; the total cost of the data stored by the system at time $t$ is then:

$$C_s(t) = \sum_{d \in D} \sum_{r \in R} \lambda_r^d(t) \cdot s_d \tag{2}$$

(3) Define $n_d^k(t)$ as the number of type-$k$ virtual machines rented from data center $d$ at time $t$, and let $p_d^k(t)$ be the price of a type-$k$ virtual machine in data center $d$ at time $t$; the computation cost required for data processing is then:

$$C_p(t) = \sum_{d \in D} \sum_{k \in K} n_d^k(t) \cdot p_d^k(t) \tag{3}$$

(4) Let $L_r^d$ be the latency of transmitting data from data source $r \in R$ to data center $d \in D$, and let $\alpha$ be the weight coefficient that converts latency into monetary cost; the cost converted from latency is then:

$$C_l(t) = \sum_{d \in D} \sum_{r \in R} \alpha \cdot \lambda_r^d(t) \cdot L_r^d \tag{4}$$

Based on the cost formulas above, the total cost incurred in the system is:

$$C(t) = C_p(t) + C_s(t) + C_b(t) + C_l(t) \tag{5}$$

Let $a_r(t)$ be the amount of data generated by data source $r$ at time $t$. Since the data generated by any data source can be moved to any data center for processing, let $\lambda_r^d(t)$ be the amount of data moved from data source $r$ to data center $d$ at time $t$, and let $A_r^{\max}$ be the maximum amount of data produced by data source $r$; then:

$$a_r(t) \le A_r^{\max}, \quad \forall r,\ t \in [1, T] \tag{6}$$

$$a_r(t) \le \sum_{d \in D} \lambda_r^d(t), \quad \forall r,\ t \in [1, T] \tag{7}$$

According to the definitions and assumptions above, minimizing the time-averaged cost of data migration and processing over the period $[0, T]$ can be formalized as:

$$\text{P1.}\ \min: \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T-1} E\{C(t)\} \tag{8}$$

$$\text{s.t.}\quad a_r(t) \le A_r^{\max}, \quad \forall r,\ t \in [1, T] \tag{9}$$

$$a_r(t) = \sum_{d \in D} \lambda_r^d(t), \quad \forall r,\ t \in [1, T] \tag{10}$$

$$0 \le n_d^k(t) \le N_d^{k,\max}, \quad \forall d,\ \forall k,\ t \in [1, T] \tag{11}$$

$$n_d^k(t) \in Z^{+} \cup \{0\}, \quad \forall d,\ \forall k,\ t \in [1, T] \tag{12}$$

From the formulation of problem P1, because data generation is unknown and dynamic and the resource variable $n_d^k(t)$ is an integer, the problem is a constrained stochastic integer optimization problem; constraint (10) ensures that the total data distributed to all data centers in a single time slot equals the total data generated in that slot, and constraint (11) ensures that the number of virtual machines requested does not exceed what the data center can provide;

(5) Design an online control algorithm using the theory of the Lyapunov optimization framework

Let $H_d(t)$ be the amount of unprocessed data in data center $d$ at time slot $t$. First, define $H_d(0) = 0$; the evolution of the queue $H_d(t)$ can then be described as:

$$H_d(t+1) = \max\Big[H_d(t) - \sum_{k \in K} n_d^k(t) \cdot v_k,\ 0\Big] + \sum_{r \in R} \lambda_r^d(t) \tag{13}$$

This update rule means that the amount of data processed is $\sum_{k \in K} n_d^k(t) \cdot v_k$ and the amount of newly arrived data is $\sum_{r \in R} \lambda_r^d(t)$. To ensure that, for every queue $H_d(t)$, $\forall d \in D$, the worst-case delay stays within the maximum workload delay $l$, a corresponding virtual queue $Z_d(t)$ is designed; it is initialized as $Z_d(0) = 0$ and updated according to the following rule:

$$Z_d(t+1) = \max\Big[Z_d(t) + 1_{H_d(t) > 0}\Big(\varepsilon_d - \sum_{k \in K} n_d^k(t) \cdot v_k\Big) - 1_{H_d(t) = 0} \sum_{k \in K} N_d^{k,\max} \cdot v_k,\ 0\Big] \tag{14}$$

Here the indicator function $1_{H_d(t) > 0}$ equals 1 when $H_d(t) > 0$ and 0 otherwise; similarly, $1_{H_d(t) = 0}$ equals 1 when $H_d(t) = 0$ and 0 otherwise; $\varepsilon_d$ is a preset constant used to control the range of the queueing delay. It can be shown that if the proposed algorithm keeps the queues $H_d(t)$ and $Z_d(t)$ stable in the long run, then all data are processed with a delay of at most $l$ time slots, where $l$ can be set in terms of the upper bounds of the queues $H_d(t)$ and $Z_d(t)$;

Let $Z(t) = (Z_d(t))$ and $H(t) = (H_d(t))$, $\forall d \in D$, denote the matrices of the virtual queues and the actual queues respectively; the joint matrix of actual and virtual queues can then be written as $\theta(t) = [H(t), Z(t)]$; following the Lyapunov framework, the Lyapunov function is defined as:

$$L(\theta(t)) = \frac{1}{2} \sum_{d \in D} \big\{ Z_d(t)^2 + H_d(t)^2 \big\} \tag{15}$$

where $L(\theta(t))$ measures the queue backlog in the system. The one-slot Lyapunov drift can then be defined as:

$$\Delta(\theta(t)) = E\{ L(\theta(t+1)) - L(\theta(t)) \mid \theta(t) \} \tag{16}$$

To minimize the cost incurred by the system while keeping the system queues stable, the Lyapunov drift-plus-penalty term is obtained by adding the system cost function to the drift function (16), i.e.:

$$\Delta(\theta(t)) + V \cdot E\{ C(t) \mid \theta(t) \} \tag{17}$$

where $V$ is a non-negative parameter that trades off system stability against cost: the larger $V$ is, the smaller the cost incurred by the system, and vice versa. The original problem P1 is thus transformed into the following problem P2:

$$\text{P2.}\ \min\ (17) \tag{18}$$

$$\text{s.t.}\quad (9),\ (10),\ (11),\ (12) \tag{19}$$

The key for solving P2 is to find its upper bound, and provable by deriving, the boundary of formula (17) is：

$$
\begin{aligned}
&\Delta(\theta(t)) + V\cdot E\Big\{\lim_{T\to\infty}\frac{1}{T}\sum_{t=1}^{T-1} C(t)\,\Big|\,\theta(t)\Big\} \\
&\quad\le B + E\Big\{\sum_{d\in D}\sum_{k\in K} n_d^k(t)\cdot\big(V p_d^k(t) - H_d(t)\,v_k - Z_d(t)\,v_k\big)\,\Big|\,\theta(t)\Big\} \\
&\qquad + E\Big\{\sum_{d\in D}\sum_{r\in R} \lambda_d^r(t)\cdot\big(V s_d + V b_r^d + V\alpha L_r^d + H_d(t)\big)\,\Big|\,\theta(t)\Big\}
\end{aligned}
\tag{20}
$$

where B is a constant term.

A close look at the right-hand side of inequality (20) shows that the optimization problem decomposes equivalently into two subproblems: the data distribution problem and the resource provisioning problem.

(6) The two subproblems above are solved as follows:

A. Data migration: to minimize the right-hand side of formula (20), the part related to data migration can be extracted as:

$$
\min\; E\Big\{\sum_{d\in D}\sum_{r\in R} \lambda_d^r(t)\cdot\big(V s_d + V b_r^d + V\alpha L_r^d + H_d(t)\big)\,\Big|\,\theta(t)\Big\}
\tag{21}
$$

In addition, because each data source generates its data independently, the global multi-source optimization described in formula (21) can be carried out independently at each data source. Considering the data distribution of data source r at time t, the data migration problem reduces to solving the following problem:

$$
\begin{aligned}
&\min \sum_{d\in D} \lambda_d^r(t)\,\big[V s_d + V b_r^d + V\alpha L_r^d + H_d(t)\big] \\
&\text{s.t. } (9)\,(10)
\end{aligned}
\tag{22}
$$

This is a generalized minimum-weight problem. The weight of migrating data from data source r to data center d is the bracketed term in formula (22); it depends on the data backlog H_d(t), the bandwidth cost b_r^d, the storage cost s_d, and the delay cost L_r^d. Using mathematical programming theory, we obtain the following solution:

$$
\lambda_d^r(t) =
\begin{cases}
\alpha_r(t), & d = d^{*} \\
0, & \text{else}
\end{cases}
\tag{23}
$$

where d* is the data center that minimizes the bracketed weight in formula (22). Clearly, at time t the algorithm tends to migrate the data generated by data source r to the data center that currently has the shortest task queue and the lowest operating cost.
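The selection rule in formula (23) can be illustrated with a short Python sketch. This is only a minimal illustration under stated assumptions, not the patented implementation: the function name `migrate_from_source` and all parameter names are hypothetical, `latency_weight` stands in for the factor α that multiplies the delay cost, and constraints (9) and (10) are assumed to require only that all data generated by the source in the current slot is assigned to some data center.

```python
from typing import List, Sequence


def migrate_from_source(
    amount: float,
    V: float,
    latency_weight: float,
    storage_cost: Sequence[float],
    bandwidth_cost: Sequence[float],
    latency: Sequence[float],
    backlog: Sequence[float],
) -> List[float]:
    """Return the amount sent to each data center d for one data source r.

    All names are hypothetical. Index d of every sequence refers to the
    same data center; bandwidth_cost and latency are the values for this
    particular source r.
    """
    # Weight of each data center: V*s_d + V*b_r^d + V*alpha*L_r^d + H_d(t).
    weights = [
        V * s + V * b + V * latency_weight * l + h
        for s, b, l, h in zip(storage_cost, bandwidth_cost, latency, backlog)
    ]
    # d* is the data center with the smallest weight (first one on ties).
    best = min(range(len(weights)), key=weights.__getitem__)
    # Send everything generated in this slot to d*, nothing elsewhere.
    allocation = [0.0] * len(weights)
    allocation[best] = amount
    return allocation


if __name__ == "__main__":
    print(
        migrate_from_source(
            amount=10.0,
            V=2.0,
            latency_weight=0.5,
            storage_cost=[1.0, 1.0, 1.0],
            bandwidth_cost=[2.0, 1.0, 3.0],
            latency=[4.0, 2.0, 2.0],
            backlog=[0.0, 20.0, 1.0],
        )
    )
```

In this example the second data center has the cheapest bandwidth and a low latency, but its large backlog raises its weight, so the data goes to the first data center instead. This is the trade-off between queue length and operating cost that the rule encodes.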

B. Resource provisioning: after removing the constant term B from the right-hand side of formula (20), the part related to the variable n_d^k(t) constitutes the resource provisioning problem. The optimal virtual machine provisioning policy is obtained by solving the following problem:

$$
\begin{aligned}
&\min\; E\Big\{\sum_{d\in D}\sum_{k\in K} n_d^k(t)\cdot\big(V p_d^k(t) - H_d(t)\,v_k - Z_d(t)\,v_k\big)\,\Big|\,\theta(t)\Big\} \\
&\text{s.t. } (11)\,(12)
\end{aligned}
\tag{24}
$$

Similarly, because resource provisioning at each data center is independent, formula (24) can, like the data distribution problem, be solved in a distributed manner at each data center. For a single data center d, the resource provisioning problem can therefore be rewritten as:

$$
\begin{aligned}
&\min\; E\Big\{\sum_{k\in K} n_d^k(t)\cdot\big(V p_d^k(t) - H_d(t)\,v_k - Z_d(t)\,v_k\big)\,\Big|\,\theta(t)\Big\} \\
&\text{s.t. } (11)\,(12)
\end{aligned}
\tag{25}
$$

The solution of the above linear problem is readily obtained as:

$$
n_d^k(t) =
\begin{cases}
N_d^{k,\max}, & \text{if } H_d(t) + Z_d(t) > \dfrac{V p_d^k(t)}{v_k} \\
0, & \text{if } H_d(t) + Z_d(t) \le \dfrac{V p_d^k(t)}{v_k}
\end{cases}
\tag{26}
$$
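The threshold rule in formula (26) follows because the objective in formula (25) is linear in n_d^k(t): when the coefficient V p_d^k(t) - (H_d(t) + Z_d(t)) v_k is negative, the largest admissible value minimizes the objective, otherwise zero does. The Python sketch below illustrates this for one data center. It is a minimal illustration, not the patented implementation: the function and parameter names are hypothetical, and it assumes that p_d^k(t) is the current price of a type-k virtual machine, v_k is its processing rate, N_d^{k,max} is the largest number of type-k machines available, and constraints (11) and (12) only bound n_d^k(t) between 0 and that maximum.

```python
from typing import List, Sequence


def provision_vms(
    V: float,
    price: Sequence[float],
    rate: Sequence[float],
    max_vms: Sequence[int],
    backlog_h: float,
    backlog_z: float,
) -> List[int]:
    """Return the number of VMs of each type k to run at one data center d.

    All names are hypothetical. Index k of price, rate and max_vms refers
    to the same VM type; backlog_h and backlog_z are H_d(t) and Z_d(t).
    """
    counts = []
    for p, v, n_max in zip(price, rate, max_vms):
        # Threshold from formula (26): V * p_d^k(t) / v_k.
        threshold = V * p / v
        # Run every available VM of this type only if the combined
        # backlog exceeds the threshold; otherwise run none.
        counts.append(n_max if backlog_h + backlog_z > threshold else 0)
    return counts


if __name__ == "__main__":
    print(
        provision_vms(
            V=10.0,
            price=[0.1, 0.5],
            rate=[2.0, 1.0],
            max_vms=[5, 3],
            backlog_h=1.0,
            backlog_z=0.2,
        )
    )
```

A larger V raises every threshold, so fewer virtual machines are started for the same backlog. This is how the parameter V trades processing cost against queue length.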